Subsurface Data Access – OBDA Pilot

SIRIUS OBDA Subsurface Pilot

The SIRIUS OBDA subsurface pilot project addresses the shortcomings of ontology-based data access (OBDA) and aims to significantly broaden the applicability of the approach, with a focus on subsurface applications.

Challenges

Subsurface digital transformation is about overcoming the data access bottleneck and improving the quality of interpretations through better use of data. The bottleneck is substantial: up to 70% of subsurface experts’ time is spent finding, accessing, integrating, and cleaning data before analysis can even start (Jim Crompton, Putting the FOCUS on Data, W3C Workshop on Semantic Web in Oil & Gas Industry).

From the geoscientist’s point of view, it is hard to get an overview of all available data related to an area of interest, as this data is spread over different applications and many internal and external data sources. As a rule, no unified view is available up front, though Project Data Managers (PDMs) assist. It is difficult to extract data from databases; when complex queries have to be written, Central Data Managers (CDMs) typically assist. It is challenging to extract data and information based on geological and petrophysical attributes (see the user scenario example below), as such queries cannot be executed on multiple data sources simultaneously. It is challenging to integrate datasets before analysis can start: this is often tedious manual work that geoscientists must do themselves. It is extremely difficult to extract data and knowledge from text documents, as very few tools can deal with the contents of unstructured documents and reports. Geoscientists are well aware of these limitations of the workflow. As a result, valuable analyses are too often not performed, and opportunities in the data are too often not detected.

There is an urgent need to build competence and tools for exploration data wrangling. An exploration data wrangler has competence in both geoscience and digital technologies. This competence is crucial for integrating the workflows of geoscientists, PDMs and CDMs, and plays an important role in enabling digital transformation in exploration work practices. Along with supporting exploration teams in routine data access tasks, data wranglers will efficiently exploit opportunities brought by new IT technologies. This includes efficient handling of critical tasks such as identifying relevant data sources, developing complex ad-hoc queries over federated databases, and retrieving information from reports stored as text documents. In these ways, the data wrangler can bring data much closer to the project teams and give geoscientists a radically better chance of extracting data and information to the exact specification (in terms of complex geological and petrophysical attributes) they need for their subsurface evaluation.

Our Approach

For the data wrangler to be less dependent on the CDMs and PDMs than geoscientists are today, we need to capture the special knowledge of the CDMs and PDMs and build this into data wrangling tools. A successful attempt in this direction was Optique, a 14M Euro EU project that finished in 2016. Optique showed that geoscience knowledge could be reliably captured in a knowledge graph (or an ontology) and that reusable mappings from CDMs could efficiently connect this knowledge graph to data in databases. Optique then demonstrated that complex queries over several federated data sources (including EPDS, NPD FactPages, Open Works installations, GeoChemDB, CoreDB and DDR) could be easily written and efficiently executed. Since the process was fully automated, tasks that would normally take several days could be performed in minutes with the Optique platform.
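
The core idea above, an ontology connected to relational data through mappings, can be illustrated with a minimal Python sketch. It is not the Optique or Ontop implementation: the tables, the ontology terms (`hasName`, `hasCore`, `hasPorosity`), the toy rows and the `unfold` helper are all assumptions made for illustration. Each ontology term is mapped to a SQL query, and a query phrased in ontology terms is unfolded into a single SQL statement.

```python
import sqlite3

# Toy relational source. Schema and rows are invented for illustration;
# they are not the pilot's actual database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wellbore (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE core_sample (id INTEGER PRIMARY KEY, wellbore_id INTEGER, porosity REAL);
    INSERT INTO wellbore VALUES (1, 'W-A'), (2, 'W-B');
    INSERT INTO core_sample VALUES (10, 1, 0.21), (11, 1, 0.08), (12, 2, 0.25);
""")

# Hypothetical mappings: each ontology term is defined by a SQL query.
# Shared column names (wellbore, core) play the role of shared query variables.
MAPPINGS = {
    "hasName": "SELECT id AS wellbore, name FROM wellbore",
    "hasCore": "SELECT wellbore_id AS wellbore, id AS core FROM core_sample",
    "hasPorosity": "SELECT id AS core, porosity FROM core_sample",
}

def unfold(terms, select, condition):
    """Unfold a conjunction of ontology terms into one SQL query."""
    parts = [f"({MAPPINGS[t]}) AS t{i}" for i, t in enumerate(terms)]
    return f"SELECT {select} FROM {' NATURAL JOIN '.join(parts)} WHERE {condition}"

# "Names of wellbores that have a core with porosity above 0.2."
sql = unfold(["hasName", "hasCore", "hasPorosity"], "name, porosity", "porosity > 0.2")
rows = conn.execute(sql + " ORDER BY name").fetchall()
print(rows)
```

The user of such a system only refers to ontology terms; the table and column names stay hidden in the mappings. A production OBDA system does considerably more than this sketch, for example taking the ontology's class hierarchy into account and optimising the generated SQL.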

Optique showed the potential to transform the way data is gathered and analyzed by streamlining the workflow and making it more user-friendly. However, Optique has also revealed shortcomings that impede the realization of its full potential: (i) limitation to relational databases, (ii) lack of built-in support for quantitative analytics, (iii) lack of access to unstructured data, and (iv) limited tool support for constructing and maintaining the necessary ontology and mappings.

Optique: http://optique-project.eu

The SIRIUS OBDA subsurface pilot project addresses these shortcomings and aims to significantly broaden the applicability of the approach for use in subsurface projects.

Work completed

RDB from Volve dataset: The subsurface part of Optique focused on developing OBDA over a relational database at Equinor (Ontology-Based Data Access to Slegge, ISWC 2017), which is no longer active. Furthermore, that database contains proprietary data, and access to it is now restricted. This was a significant setback for researchers wishing to continue experimenting with extensions of OBDA capabilities. In 2021 we established a large in-house relational database from publicly available subsurface datasets (mainly by processing the Volve dataset https://data.equinor.com and NPD FactPages https://factpages.npd.no). This database is now used in various other internal and external research projects that experiment with subsurface data. See the project [SIRIUS Subsurface Lab] for more information.

Mappings and ontology: Adapted the mappings and ontology from Optique to the new schema

Integration & testing

New way of creating and maintaining the ontology and mappings: We experimented with OTTR templates for creating and maintaining both the mappings and the ontology. This new workflow reduces the time required from weeks to minutes.

Technology adoption: Several working sessions with end users, with live demos and feedback
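
The template-based workflow listed above can be pictured with a small Python sketch. It does not use OTTR syntax or OTTR tooling; the patterns, class names and table names are assumptions chosen for illustration. The point is only that one template instance per concept yields both an ontology axiom and a matching mapping, so the two stay in sync.

```python
from string import Template

# Hypothetical patterns: one for ontology axioms (Turtle-like text),
# one for the SQL side of a mapping.
AXIOM = Template(":$cls a owl:Class ; rdfs:subClassOf :$parent .")
SOURCE = Template("SELECT $key AS id FROM $table")

# One instance per concept: (class, parent class, table, key column).
INSTANCES = [
    ("Wellbore", "SubsurfaceObject", "wellbore", "id"),
    ("CoreSample", "SubsurfaceObject", "core_sample", "id"),
]

def expand(instances):
    """Expand every instance into an ontology axiom and a mapping source query."""
    axioms, mappings = [], {}
    for cls, parent, table, key in instances:
        axioms.append(AXIOM.substitute(cls=cls, parent=parent))
        mappings[cls] = SOURCE.substitute(key=key, table=table)
    return axioms, mappings

axioms, mappings = expand(INSTANCES)
for line in axioms:
    print(line)
```

Adding a concept then means adding one row to `INSTANCES` rather than editing the ontology and the mappings separately.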

Work in progress

Extend Ontology & Mappings
Develop a method for OBDA on non-relational databases
Extend OBDA to work with DISKOS & OSDU
Integration with knowledge bases

Results

Public Demo

Demo http://158.39.75.9:8443 (only accessible by Norwegian Universities IPs)

Demo Videos

Using OBDA to formulate and execute a complex query on the Volve dataset. Click here

Documentation

Database Schema (MySQL) Click here

OBDA Subsurface Pilot – Information Slides Click here

Database

API for MySQL [In progress]

Relevant Publications

Latif, Adnan; Kontchakov, Roman; Skjæveland, Martin G. Scalable end-user access to the subsurface data with ontologies. SPDM Online November Conference 2022; UK, 2022-11-28 – 2022-12-02
Latif, Adnan; Kontchakov, Roman. Ontology based data access – subsurface databases on steroids. ECIM annual International E&P DM Conference; Haugesund, 2022-09-12 – 2022-09-14
METHODOLOGY : EU Project Optique (FP7-ICT-318338) | http://optique-project.eu
ONTOP : Guohui Xiao, Davide Lanti, Roman Kontchakov, Sarah Komla-Ebri, Elem Güzel-Kalayci, Linfang Ding, Julien Corman, Benjamin Cogrel, Diego Calvanese, and Elena Botoeva. The Virtual Knowledge Graph System Ontop. In: International Semantic Web Conference (Resource Track), 2020 | https://ontop-vkg.org
VQS : Ahmet Soylu, Evgeny Kharlamov, Dimitry Zheleznyakov, Ernesto Jimenez Ruiz, Martin Giese, Martin G. Skjaeveland, Dag Hovland, Rudolf Schlatte, Sebastian Brandt, Hallstein Lie, Ian Horrocks. OptiqueVQS: a Visual Query System over Ontologies for Industry. Semantic Web Journal, 2018 | https://sws.ifi.uio.no/project/optique-vqs/
ONTOLOGY : EU Project Optique | http://slegger.gitlab.io
G&G Data : Equinor and the Volve license partners | https://data.equinor.com
Semantic Reasoner : RDFox | https://www.oxfordsemantic.tech/

Team

SIRIUS:

Adnan Latif [Contact Person], Martin Georg Skjæveland, Dag Hovland

Birkbeck University of London:

Roman Kontchakov

Partners

Acknowledgements

This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).

G&G dataset from the Volve field, publicly released by Equinor under CC BY 4.0. Equinor and the Volve license partners | https://data.equinor.com