[188] Assessing the Capabilities of Large Language Models for Oil and Gas Industry Applications
F. Castanedo, W. Bhattacharya, S. Ghosh, Martin Takáč, Salem Lahlou, Zangir Iklassov, M. Schaffrath, N. Reddicharla, R. Mohan, M. Yaslam, M. Lee
2025
Journal Paper
Abu Dhabi International Petroleum Exhibition and Conference
LLM
Energy Systems
Benchmarking
Natural Language Processing
Domain Adaptation
Keywords:
oil and gas domain evaluation,
domain-specific llm benchmarks,
multiple-choice question answering,
open-ended question answering,
upstream midstream downstream operations,
llama 3.1 405b,
gpt-4o comparison,
model throughput and latency,
performance-to-cost tradeoff,
technical knowledge integration
Abstract
Large Language Models (LLMs) are commonly evaluated on general-domain datasets such as MMLU or GSM8K. However, assessing their performance in a specific domain requires a test set built around domain-specific context. This work provides a comprehensive analysis of the capabilities of state-of-the-art LLMs with respect to oil and gas knowledge, conducted in the context of the EnergyAI project at ADNOC.
Our assessment uses two custom domain evaluation datasets. First, we measure the models' ability to answer Multiple-Choice Questions (MCQs) that require a solid understanding of, and reasoning across, upstream, midstream, and downstream oil and gas operations. Second, we assess their ability to answer open-ended questions in the same domain, quantifying the quality of the responses in terms of relevance, accuracy, and completeness. In addition to these evaluations, we examine other factors such as model size, throughput, response latency, and overall capabilities.
We observe that the Llama 3.1 405B model delivers domain-specific performance comparable to that of proprietary models such as GPT-4o. Furthermore, smaller models, such as Llama 3.1 70B and Llama 3.2 90B, deliver excellent results relative to their size and offer an attractive performance-to-cost tradeoff. Nevertheless, all current LLMs, including the most advanced open-source and proprietary models, exhibit notable limitations in their knowledge and reasoning capabilities within the oil and gas domain. These shortcomings highlight the need for targeted domain adaptation, deeper integration of technical knowledge, and rigorous evaluation tailored to the complexities of the industry.