Subsurface Data Access – Subsurface Lab

SIRIUS Subsurface Lab

A testbed for experiments on subsurface (Geological, Geophysical and Petrophysical) data

Equinor has made a complete set of data from a North Sea oil field (the Volve field) available for research, study and development. The dataset consists of a variety of structured and unstructured subsurface data, comprising approximately 40,000 files from the Volve field, which was in production from 2008 to 2016. The data has been released to give students and scientists a realistic case to study, and to support learning, innovation and new solutions for the energy future. This project focuses on pre-processing the Volve dataset and creating a sandbox environment for experimentation.


The dataset is a valuable input for projects that need structured and unstructured subsurface data (geological, geophysical, petrophysical, drilling, etc.), and for testing newly developed tools against real-world data.


The dataset is large (approximately 5 TB in 40,000 files), mixes proprietary and non-proprietary formats, and has limited or missing metadata. This makes it challenging to use for experimentation: a substantial amount of time is needed to make it usable (finding the required information among 40,000 files, reading and converting formats, compensating for the missing metadata, etc.).
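A first step in finding the required information is usually a plain inventory of what the dataset contains. The following is a minimal sketch, assuming the files have been downloaded to a local directory; the function name and the idea of grouping by file extension are illustrative assumptions, not part of the project's actual pipeline.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files and total bytes per file extension under `root`.

    Hypothetical helper: `root` is assumed to be a local copy of the dataset.
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes


# Example usage (the directory name is a placeholder):
# counts, sizes = inventory_by_extension("volve_data")
# for ext, n in counts.most_common(10):
#     print(ext, n, sizes[ext])
```

Such a listing only shows which formats dominate; it does not recover the missing metadata.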

Our Approach

We have processed most of the structured data (most of the well data and the seismic horizons) and loaded it into a relational database (MySQL). For this purpose, we designed a conceptual and a logical model that focus on how the data is represented in the raw data files and on the relationships between the different data types. To enrich the dataset, we have also transformed some of the related information on the NPD FactPages into structured data and loaded it into the same database.

Database Schema: One of the main objectives of developing this database is to apply Ontology-Based Data Access (OBDA) to the subsurface data [see Project SIRIUS OBDA subsurface Pilot]. We also intend to extend the OBDA application to data lake architectures and NoSQL sources. Therefore, instead of using the PPDM or Energistics data models, we designed a new schema with a focus on the following:

  • Database structure stays as close as possible to the data structure present in the raw files
  • Database stays in non-normalized form
  • Architecture is as simple as possible
  • Nomenclature is similar to that of the raw files
  • Other data sources (e.g., NPD FactPages) can be integrated easily
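The design goals above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table name, column name and value is hypothetical; none of them is taken from the actual Volve schema. The log table repeats well-level attributes on every row, as a flat raw export might, and a second table standing in for an external source is joined on the shared well name.

```python
import sqlite3

# Hypothetical rows as they might appear in a flat raw export: well-level
# attributes (WELL_NAME, FIELD_NAME) repeat on every sample line.
RAW_LOG_ROWS = [
    ("WELL-A", "VOLVE", "GR", 1000.0, 45.2),
    ("WELL-A", "VOLVE", "GR", 1000.5, 47.9),
    ("WELL-B", "VOLVE", "GR", 1000.0, 60.1),
]

# Hypothetical rows standing in for an external source (e.g., NPD FactPages).
EXTERNAL_ROWS = [
    ("WELL-A", "PRODUCTION"),
    ("WELL-B", "INJECTION"),
]


def build_database():
    """Create an in-memory database with two non-normalized tables."""
    conn = sqlite3.connect(":memory:")
    # Column names mirror the (hypothetical) raw file headers one-to-one.
    conn.execute(
        "CREATE TABLE well_log ("
        "WELL_NAME TEXT, FIELD_NAME TEXT, CURVE TEXT, "
        "DEPTH_M REAL, CURVE_VALUE REAL)"
    )
    conn.execute("CREATE TABLE external_wellbore (WELL_NAME TEXT, PURPOSE TEXT)")
    conn.executemany("INSERT INTO well_log VALUES (?, ?, ?, ?, ?)", RAW_LOG_ROWS)
    conn.executemany("INSERT INTO external_wellbore VALUES (?, ?)", EXTERNAL_ROWS)
    conn.commit()
    return conn


def mean_curve_by_purpose(conn, curve):
    """Average a log curve per wellbore purpose by joining on the well name."""
    rows = conn.execute(
        "SELECT e.PURPOSE, AVG(l.CURVE_VALUE) "
        "FROM well_log AS l "
        "JOIN external_wellbore AS e ON e.WELL_NAME = l.WELL_NAME "
        "WHERE l.CURVE = ? "
        "GROUP BY e.PURPOSE ORDER BY e.PURPOSE",
        (curve,),
    ).fetchall()
    return dict(rows)


conn = build_database()
print(mean_curve_by_purpose(conn, "GR"))
```

Keeping the tables flat costs some redundancy, but each column can be traced directly back to a field in the raw file, which keeps later mappings simple.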


API: [In-Progress]: We are building an API for this MySQL database that will let researchers connect to it and use the underlying data directly. The access mechanism will be described here as soon as development is finished.

RDF Database: [In-Progress]: To make use of the context as well as the content of this dataset, we are working on transforming the MySQL database into an RDF triplestore. This will support expressive semantic queries and make it possible to derive new information from existing relations through semantic reasoning. We are currently building an ontology and the R2RML mappings that relate this ontology to the database schema. In the end, we will deploy a SPARQL endpoint on the triplestore to make the underlying RDF data queryable.
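To give a rough idea of what such a relational-to-RDF translation produces, the sketch below turns one database row into N-Triples lines using only the Python standard library. It is loosely in the spirit of an R2RML mapping (a subject IRI built from a key column, one predicate per column), but it is not an R2RML processor; the namespace, table name and column names are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical namespace; the project's real IRIs are not published here.
BASE = "http://example.org/volve/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(table, key_column, row):
    """Translate one row (a dict of column -> value) into N-Triples lines."""
    # Subject IRI is built from the table name and the percent-encoded key.
    subject = f"<{BASE}{table}/{quote(str(row[key_column]), safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."]
    for column, value in row.items():
        if value is None:
            # NULL columns produce no triple, as in R2RML.
            continue
        predicate = f"<{BASE}{table}#{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


example_row = {"WELL_NAME": "WELL A", "DEPTH_M": 1000.0, "REMARK": None}
for line in row_to_triples("wellbore", "WELL_NAME", example_row):
    print(line)
```

All values are emitted as plain string literals for brevity; a real mapping would assign datatypes and use ontology terms instead of column-derived predicates.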


Demo Videos 

Using OBDA to formulate and execute complex queries on the Volve dataset. Click here


Database Schema (MySQL) Click here

Database [Coming soon]

  • MySQL and RDF dump
  • SPARQL Endpoint Click here (accessible only from Norwegian university IP addresses)




Adnan Latif [Contact Person], Martin Georg Skjævland, Dag Hovdland

Birkbeck University of London:

Roman Kontchakov



This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).

G&G dataset from Volve Field, publicly released by Equinor under CC BY 4.0. Equinor and the Volve license partners