Subsurface Data Access and Analytics

SIRIUS is building on the Optique platform for ontology-based data access to demonstrate how repositories like DISKOS can be developed into digital platforms for exploration, research and innovation. Once this data is opened up, it needs to be analysed. For this reason we are also working with image analysis, data science and natural language applications in sub-surface data management.

A Platform for Subsurface Innovation

SIRIUS is working on a vision of providing a platform for innovation in the subsurface domain. A recent book by Andrew McAfee and Erik Brynjolfsson, Machine, Platform, Crowd, traces the role of platforms as enablers of innovation by crowds of workers. We believe that there is a need to open up subsurface data to researchers and innovators so that they can try out their ideas on real data. We also believe that national data repositories, like DISKOS, have the potential to provide such a platform. For this to be done, however, we need to improve access to the data and allow it to be linked with data in other databases. We also need to improve access to unstructured text information in these databases.

This vision is shared by the SIRIUS partners Evry and Kadme. By using insights and technologies from several of the SIRIUS research programs, we believe it is possible to extract valuable information, correlations and hints on where to search for new pockets of oil and gas. The value of finding a new Johan Sverdrup based on better analysis of ‘old’ data in DISKOS is the driving force for pursuing this holy grail. But the possibilities of exploiting minor finds, better utilization of infrastructure and smarter production strategies are also ‘buried’ within the databank, waiting to be excavated with the right tools. The purpose and benefits speak for themselves.

SIRIUS has built a prototype of a platform for teaching and research at the University of Oslo. This used the Optique platform to link data in several internal data sources and data in DISKOS. This proved the feasibility of the approach and is a launching point for further work. In 2019 this prototype will be extended into a system that will allow joint work with our colleagues at the Federal University of Rio Grande do Sul. We will also be working with the open data set provided by Equinor from the Volve field. This data is a valuable test of the feasibility of our ideas and methods.

SIRIUS is also supporting and participating in other initiatives for sharing and interoperability of subsurface data. Our research provides a framework that can be used to implement an effective, robust and scalable framework for data access. The Subsea Valley National Centre of Expertise is organising a working party on data sharing, in which both Equinor and SIRIUS are active. We are interacting, together with our partners, with other initiatives, such as DataLink and the Open Subsurface Data Universe.

Subsurface Data Access: Data Wrangling

Digital transformation of sub-surface work processes is about overcoming the bottleneck of data access and increasing the quality of interpretations by means of better use of data. The data access bottleneck means that up to 70% of exploration experts’ time is spent on finding, accessing, integrating and cleaning data before analysis can even start.

Geoscientists find it hard to get an overview of all available data related to an area of interest:

  • The data is spread over different applications and many internal and external data sources. No unified view is, as a rule, available up front, though a Project Data Manager (PDM) helps.
  • It is difficult to extract data from databases. If complex queries have to be written, a Central Data Manager (CDM) is needed.
  • It is challenging to extract data and information based on geological and petrophysical attributes, as it is not possible to run these queries simultaneously on multiple data sources.
  • It is challenging to integrate datasets before analysis can start. This is often tedious manual work that the geoscientists must do themselves.
  • It is difficult to extract data and knowledge from text documents, as very few tools can deal with the contents of unstructured documents and reports.

Geoscientists are well aware of the limitations of this workflow. As a result, valuable analysis does not include all essential data, and conclusions may be inaccurate or, at worst, erroneous, due to an incomplete data foundation.

SIRIUS is developing methods, tools and competence in sub-surface data wrangling. A sub-surface data wrangler has competence in both geoscience and digital technologies. These skills are crucial for integrating the workflows of geoscientists and data managers. A data wrangler can also efficiently exploit opportunities brought by new IT technologies, such as identifying relevant data sources, developing complex ad-hoc queries over federated databases, and retrieving information from reports stored as text documents. In these ways, a data wrangler can bring data much closer to the project teams and give geoscientists a radically better possibility of extracting data and information with the exact specification (in terms of complex geological and petrophysical attributes) they need for their sub-surface evaluation.

For the data wrangler to be less dependent on the data managers than geoscientists are today, we need to capture the specialized knowledge of data managers and build this into data wrangling tools. A successful attempt in this direction was Optique, a €14 million EU project that finished in 2016. Optique showed that geoscience knowledge could be captured in a knowledge graph and that reusable mappings, built by data managers, could connect this knowledge graph to data in databases. Optique then demonstrated that complex queries over several federated data sources (including EPDS, NPD FactPages, OpenWorks installations, GeoChemDB, CoreDB and DDR) could be easily written and efficiently executed. Since this process was fully automated, tasks that normally would take several days could now be performed in minutes.
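The idea of a knowledge graph connected to source databases by reusable mappings can be sketched in a few lines of code. This is a toy illustration only: the table, the vocabulary and the query function below are invented for this sketch and do not reflect the actual Optique implementation, which rewrites high-level queries into SQL and executes them directly on the federated sources.

```python
# A "relational" source: rows from a hypothetical wellbore table.
wellbore_rows = [
    {"wlbName": "15/9-19 A", "wlbField": "VOLVE", "wlbTotalDepth": 3512.0},
    {"wlbName": "16/2-6", "wlbField": "JOHAN SVERDRUP", "wlbTotalDepth": 2205.0},
]

# A reusable mapping, normally written once by a data manager: it
# translates each source row into subject-predicate-object triples
# over a shared vocabulary (the knowledge graph).
def map_row_to_triples(row):
    subject = f"wellbore/{row['wlbName']}"
    yield (subject, "rdf:type", "ns:Wellbore")
    yield (subject, "ns:inField", row["wlbField"])
    yield (subject, "ns:totalDepth", row["wlbTotalDepth"])

graph = [t for row in wellbore_rows for t in map_row_to_triples(row)]

# Geoscientists then query the shared vocabulary, not the source
# schema: e.g. "all wellbores in a given field deeper than 3000 m".
def wellbores_in_field_deeper_than(graph, field, depth):
    fields = {s: o for s, p, o in graph if p == "ns:inField"}
    depths = {s: o for s, p, o in graph if p == "ns:totalDepth"}
    return [s for s in fields if fields[s] == field and depths[s] > depth]

print(wellbores_in_field_deeper_than(graph, "VOLVE", 3000))
```

The point of the pattern is the separation of concerns: the mapping captures the data manager's knowledge of the source schema once, after which users can pose queries in domain terms without knowing where or how the data is stored.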

Optique showed the potential to transform the way data is gathered and analysed by streamlining the workflow and making it more user-friendly. However, Optique had limitations. It could only work with relational databases, had no built-in support for quantitative analytics, could not access unstructured data, and had limited tool support for constructing and maintaining the necessary knowledge graph and mappings.

SIRIUS is working to address these shortcomings and aims to significantly broaden the applicability of the approach for use in exploration projects. Furthermore, as pointed out in the recent KonKraft report, DISKOS is a particularly important data source for exploration.

The SIRIUS Geoscientists (from left to right): Michael Heeremans, Irina Pene and Adnan Latif

This beacon responds to the KonKraft recommendations on improving the user-friendliness of DISKOS by using DISKOS to motivate data wrangling challenges and evaluate results.

Subsurface Data Analytics

Faster access to relevant data is of interest only if the data can be used to create insight and guide decisions. Emerging disciplines like data science can enable end-users to generate valuable insights for subsurface evaluation and can provide significant aid to exploration decisions.

The subsurface analytics part of this beacon focuses on developing methodologies and tools for data analytics, comprising topics like data analysis, machine learning, natural language processing and visualisation. Applications currently in focus are:

  • Reservoir parameter prediction using analogues, where the focus is to investigate techniques to identify and quantify uncertainties, thus predicting more accurate parameters for use in exploration modelling. The primary objective is to extend machine learning models so that they can incorporate oil and gas domain information and handle prediction uncertainties to a reliable extent.
  • Domain-adapted information extraction for the oil and gas domain, where the focus is to construct a system that extracts factual information from large quantities of unstructured, natural-language text and makes it searchable. In the future, we aim to extend NLP/ML capabilities to extract structured (numerical) information from large amounts of unstructured documents and generate valuable insights at a scale beyond human capacity.
  • The engineering aspects of building data access and analytics pipelines.
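The second application, extracting structured numerical facts from report text, can be illustrated with a deliberately simple pattern-based sketch. The report sentence, the patterns and the extracted fields below are all invented for illustration; a domain-adapted NLP system goes far beyond such regular expressions, but the input and output have the same shape.

```python
import re

# A fragment of hypothetical free-text well report.
report = (
    "The well encountered sandstone at 2450 m. "
    "Average porosity of 21% was measured in the upper interval. "
    "A porosity of 14% was recorded below 2600 m."
)

# Toy patterns for two kinds of numerical facts: depths in metres
# and porosity values in percent.
depth_pattern = re.compile(r"(\d+(?:\.\d+)?)\s*m\b")
porosity_pattern = re.compile(r"porosity of (\d+(?:\.\d+)?)\s*%", re.IGNORECASE)

depths = [float(v) for v in depth_pattern.findall(report)]        # metres
porosities = [float(v) for v in porosity_pattern.findall(report)] # percent

print(depths)      # [2450.0, 2600.0]
print(porosities)  # [21.0, 14.0]
```

Once facts like these are pulled out of thousands of documents and attached to wells and intervals in a searchable store, queries such as "all reported porosities above 20% in this basin" become possible over material that previously had to be read page by page.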


External Collaboration