Page 43 - Sirius_Annual_Report_2021
GeoDataPrep
However, such data-driven decision-making is of little value if the time spent preparing subsurface data for analysis and visualization far exceeds the time saved by decision-makers.
The GeoDataPrep project targets the data preparation workflows needed for dashboarding and business analytics in the subsurface domain.
The point of departure for the GeoDataPrep project was the observation that data scientists in Equinor spend a disproportionate share of their time and energy preparing subsurface data for Big Data analytics. This observation complements SIRIUS’ ongoing engagement with OSDU, a repository for storing and retrieving large datasets for Big Data analytics, by targeting the user side: collecting large subsurface datasets and making them usable for analytics.
In this project, we mainly target the issue of naming variability in geological data and well log metadata.
Naming variability is something geoscientists are well acquainted with, and they have developed robust and reliable methods to handle it. However, these methods and practices do not scale to the high data volumes used in data analytics. The initial study in this research project showed that a dashboarding project could spend several man-years just harmonizing the existing naming variability in large datasets to the point where the algorithms can run and give useful results.
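To make the scaling problem concrete, the sketch below shows the simplest form of name harmonization: normalize each raw name and look it up in an alias table, routing anything unmatched to a domain expert. The alias table, the curve mnemonics in it, and the function names `normalize` and `harmonize` are hypothetical illustrations, not the method used in GeoDataPrep.

```python
import re

# Hypothetical alias table: canonical curve mnemonic -> variant names.
# Illustrative only; not an Equinor or SIRIUS vocabulary.
ALIASES = {
    "GR": ["GR", "GAMMA_RAY", "GR_EDTC"],
    "RHOB": ["RHOB", "DEN", "RHOZ", "BULK_DENSITY"],
    "DT": ["DT", "DTC", "AC", "SONIC"],
}


def normalize(name: str) -> str:
    """Trim, upper-case, and turn runs of spaces, hyphens, or dots into '_'."""
    return re.sub(r"[\s.\-]+", "_", name.strip().upper())


# Invert the alias table: normalized variant -> canonical mnemonic.
LOOKUP = {normalize(v): canon for canon, variants in ALIASES.items() for v in variants}


def harmonize(names):
    """Map raw names to canonical ones; return unmatched names for expert review."""
    mapped, unmatched = {}, []
    for raw in names:
        canon = LOOKUP.get(normalize(raw))
        if canon is None:
            unmatched.append(raw)
        else:
            mapped[raw] = canon
    return mapped, unmatched


mapped, unmatched = harmonize(["gr-edtc ", "Bulk Density", "DTC", "NPHI_2"])
print(mapped)     # {'gr-edtc ': 'GR', 'Bulk Density': 'RHOB', 'DTC': 'DT'}
print(unmatched)  # ['NPHI_2']
```

Each name that lands in `unmatched` needs a specialist's decision, and the alias table has to be extended by hand. This manual step is plausibly what stops scaling as data volumes grow.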
Some key observations from this study
• Data preparation is a large part of the work of building dashboards, yet it is often not easily visible to others in the organization. The time it requires is often hard to estimate in advance.
• Data preparation includes harmonizing the data and inferring missing data to the extent that the algorithms can run and give useful results. Several man-years of preparation for a single dashboard is not unusual.
• Variations in naming must be aligned. This is easy for a domain specialist when the scale is small enough (for example, all wellbore logs from a handful of wells) but unmanageable when the scale is large enough (for instance, the entire Norwegian continental shelf).
The specific use cases in this project
• Metadata associated with well log data. These data are produced during and after drilling and are frequently used to study the petrophysical properties of the subsurface for further hydrocarbon exploration and production.
• Data in geological knowledge bases. These are key information in subsurface studies but are subject to frequent and complex naming variability. One of the main reasons for this variability is that a single subsurface object can be defined, described, and stored under different names, each representing a particular aspect of the geological domain.
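One way to picture the knowledge-base problem: if separate sources assert that several names denote the same subsurface object, those assertions can be closed transitively (much as an OWL reasoner treats `owl:sameAs`) and the facts recorded under each name pooled onto a single object. The sketch below does this with a union-find structure in plain Python. The object names, `same_as` pairs, and attributes are hypothetical, and this is not a description of the SIRIUS demonstrator, which used semantic reasoners.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest grouping names asserted to denote the same object."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_objects(same_as, facts):
    """Close same_as pairs transitively, then pool each group's facts.

    Values are collected into sets so conflicting statements from
    different sources stay visible instead of silently overwriting.
    """
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    groups = defaultdict(set)
    for name in set(facts) | {n for pair in same_as for n in pair}:
        groups[uf.find(name)].add(name)
    merged = {}
    for names in groups.values():
        pooled = defaultdict(set)
        for name in names:
            for key, value in facts.get(name, {}).items():
                pooled[key].add(value)
        merged[frozenset(names)] = dict(pooled)
    return merged


# Hypothetical names and attributes, for illustration only.
same_as = [("Unit A", "Reservoir Zone 3"), ("Reservoir Zone 3", "Horizon H2 interval")]
facts = {
    "Unit A": {"aspect": "lithostratigraphy", "lithology": "sandstone"},
    "Reservoir Zone 3": {"aspect": "reservoir", "porosity_class": "high"},
    "Horizon H2 interval": {"aspect": "seismic", "top_horizon": "H2"},
    "Unit B": {"aspect": "lithostratigraphy", "lithology": "shale"},
}
for names, pooled in merge_objects(same_as, facts).items():
    print(sorted(names), pooled)
```

Keeping conflicting values side by side, rather than letting one source win, leaves the final judgment to a geoscientist, which seems closer to how such variability is handled by hand.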
Highlights of 2021
In 2021, SIRIUS researchers worked on:
• A detailed study to understand the problem. SIRIUS researchers conducted a series of interviews at Equinor to better understand the issues surrounding data preparation.
• Creating a demonstrator for naming variability in the geological knowledge bases. We used semantic reasoners to infer and enrich information about a particular object from different G&G aspects.
• Transforming the publicly available Volve field G&G dataset into a relational database, which serves as the main data source for this project.
• Using machine learning techniques to predict missing or uncertain parts of the curve information block in log headers.
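As a rough illustration of the last item, a missing unit entry in a curve information block could be predicted from the curve's mnemonic and description by comparing them with curves whose units are known. The sketch below uses a similarity-weighted k-nearest-neighbour vote over character trigrams in plain Python. The training rows, the choice of the unit field as the target, and the function names are hypothetical assumptions; the specific machine learning techniques used in the project are not described here.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigram counts of an upper-cased, space-padded string."""
    s = f"  {text.upper()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_unit(mnemonic, description, training, k=3):
    """Similarity-weighted vote among the k most similar labelled curves.

    Returns (unit, support), where support is the winning unit's share of
    the total similarity among the k neighbours.
    """
    query = trigrams(f"{mnemonic} {description}")
    neighbours = sorted(
        ((cosine(query, trigrams(f"{m} {d}")), unit) for m, d, unit in training),
        reverse=True,
    )[:k]
    votes = Counter()
    for score, unit in neighbours:
        votes[unit] += score
    unit, weight = votes.most_common(1)[0]
    total = sum(votes.values())
    return unit, (weight / total if total else 0.0)


# Hypothetical labelled curves: (mnemonic, description, unit).
TRAINING = [
    ("GR", "Gamma Ray", "gAPI"),
    ("GR_EDTC", "Gamma ray corrected", "gAPI"),
    ("RHOB", "Bulk Density", "g/cm3"),
    ("RHOZ", "Bulk density standard resolution", "g/cm3"),
    ("DT", "Compressional slowness", "us/ft"),
    ("DTC", "Delta-T compressional", "us/ft"),
    ("NPHI", "Neutron porosity", "v/v"),
]

unit, support = predict_unit("GRC", "gamma ray", TRAINING)
print(unit, round(support, 2))
```

The `support` value gives a crude confidence score, so low-support predictions could be flagged as uncertain rather than written back into the header.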