Subsurface Data Analytics – Unstructured Data (Images)

SIRIUS Geo-Annotator

Ontology-driven Knowledge Graph Population for Geological Image Annotation

Finding a geological image in a large image database based on its technical content is difficult. Geoscientists typically rely on keyword search over the textual content of the source documents to find relevant images.

To address this problem, an innovation project was initiated in 2019, and a prototype has been developed. The tool supports complex queries that find geological images based on the geological content embedded in them, which significantly reduces the time and effort required to find the most relevant images and their corresponding documents.

Challenges

Workflows used in subsurface evaluation generate a tremendous amount of image data. These images then become part of several different types of documents, e.g. PDF reports, PowerPoint slides, scientific articles and standalone images. The images and the information embedded in them are extremely useful in subsurface evaluation, e.g. for identifying analogues or retrieving geologic information for a specific area.

Retrieving the correct image and its corresponding information from a large image database is a difficult task. The information embedded in these images is:

  1. Complex and condensed
  2. Based on different conventions and standards
  3. Interpretable only with high-level expert knowledge
  4. Available only in a ‘geoscientist-readable’ form

These challenges also create significant problems when training machine-learning-based algorithms to extract information from these images automatically.

Our Approach

We have designed and implemented SiriusGeoAnnotator—a system that generates annotations in the form of a knowledge graph and drives the annotation process according to an ontology and the previously generated annotations. We have also implemented SiriusGeoOnto—an ontology tailored to the semantic annotation of geological images. The creation of a knowledge graph to represent the annotations enables the use of the available Semantic Web infrastructure to perform automated reasoning and explore the annotations via semantic queries. 

SiriusGeoAnnotator applies the notion of a navigation graph to present the user with relevant annotation suggestions. The suggestions are exhaustive but presented on demand to avoid overloading the user interface. SiriusGeoAnnotator also includes autocompletion features to facilitate the search for relevant annotations. The navigation graph relies on the OWL 2 reasoner HermiT to obtain the classification of the ontology and on the triplestore reasoner RDFox to saturate the knowledge graph (i.e., entail new triples) according to the axioms in the ontology.
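To illustrate what saturating a knowledge graph means, the following is a minimal Python sketch of forward chaining over two RDFS-style rules. It is not how RDFox or HermiT work internally, and it covers only a tiny fragment of what OWL 2 axioms can express. The `ex:` names, the `saturate` function and the sample triples are hypothetical and are not taken from SiriusGeoOnto.

```python
# Toy forward-chaining saturation over (subject, predicate, object) triples.
# All names below are hypothetical and not taken from SiriusGeoOnto.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def saturate(triples):
    """Return the closure of `triples` under two RDFS-style rules.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS_OF]
        derived = set()
        for a, b in subclass_pairs:
            for s, p, o in closure:
                if p == SUBCLASS_OF and s == b:
                    derived.add((a, SUBCLASS_OF, o))  # Rule 1
                if p == RDF_TYPE and o == a:
                    derived.add((s, RDF_TYPE, b))  # Rule 2
        new_triples = derived - closure
        if new_triples:
            closure |= new_triples
            changed = True  # keep going until no rule adds anything
    return closure


# Hypothetical ontology axioms and one hypothetical image annotation.
ontology = {
    ("ex:Sandstone", SUBCLASS_OF, "ex:SedimentaryRock"),
    ("ex:SedimentaryRock", SUBCLASS_OF, "ex:Rock"),
}
annotations = {("ex:region1", RDF_TYPE, "ex:Sandstone")}

saturated = saturate(ontology | annotations)
```

In this sketch, an image region annotated as `ex:Sandstone` also becomes an instance of `ex:SedimentaryRock` and `ex:Rock` after saturation, so a query for the more general class would still find it.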

 

Results

Demo

Public Demo 

Source Code

The source code of SiriusGeoAnnotator is available on GitLab under the GNU General Public License version 2 as published by the Free Software Foundation.

Front-end [gitlab project] 

Back-end [gitlab project] 

The back-end relies on RDFox, for which an academic license can be requested.

SiriusGeoAnnotator uses Java 8 features and Maven for project organisation and build. SiriusGeoAnnotator has been tested on Windows, Linux and macOS. The source code also contains a Dockerfile that can be used to provide a Tomcat server ready to run SiriusGeoAnnotator (tested on Tomcat 9).

SiriusGeoOnto Ontology 

We have designed and implemented the SiriusGeoOnto ontology to cover the information embedded in the geological images. SiriusGeoOnto has been modelled in the OWL 2 ontology language using the ontology editor Protégé. SiriusGeoOnto includes the hierarchy of geological periods of time based on the International Chronostratigraphic Chart (v2018/08).

SiriusGeoOnto is publicly available as a Zenodo dataset:

Irina Pene and Ernesto Jiménez-Ruiz. SiriusGeoOnto: an ontology tailored to the semantic annotation of geological images. (Version July 2020 – v.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3600955

Knowledge Graph Exploration

The generated knowledge graph can be accessed, in a standard way, as an RDF graph via SPARQL queries. To this end, we have set up a SPARQL endpoint that can be accessed programmatically (e.g., via Python or Java) or via user-friendly systems such as SemFacet and OptiqueVQS, which take care of query generation.
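As an illustration of programmatic access, here is a minimal Python sketch that uses only the standard library to send a query following the SPARQL 1.1 Protocol and to read the standard SPARQL JSON results format. The endpoint URL, the `ex:` prefix and the property and class names in the query are placeholders assumed for this example; they are not the project's actual endpoint or the SiriusGeoOnto vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; replace it with the real SPARQL endpoint.
ENDPOINT = "http://localhost:8080/sparql"

# Hypothetical vocabulary; the real ontology IRIs will differ.
QUERY = """
PREFIX ex: <http://example.org/geo#>
SELECT ?image ?feature WHERE {
  ?image ex:depicts ?feature .
  ?feature a ex:Sandstone .
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol request (query sent via URL-encoded POST)."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint=ENDPOINT, query=QUERY):
    """Send the query and return the parsed rows (requires a live endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query()` against a reachable endpoint would return one dictionary per result row, keyed by the variable names in the `SELECT` clause. Request building and result parsing are kept separate so that they can be checked without network access.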

Publications

Ernesto Jiménez-Ruiz, Irina Pene, Oliver Stahl, Adnan Latif, Rogerio Abreu de Paula, Arild Waaler and Jens Grimsgaard. Ontology-driven Knowledge Graph Population for Geological Image Annotation. Submitted to a Journal. 2020. 

Marcelo Arenas, Bernardo Cuenca Grau, Evgeny Kharlamov, Sarunas Marciuska, Dmitriy Zheleznyakov. Faceted search over RDF-based knowledge graphs. J. Web Semant. 37-38: 55-74, 2016.

Ahmet Soylu, Evgeny Kharlamov, Dmitriy Zheleznyakov, Ernesto Jiménez-Ruiz, Martin Giese, Martin G. Skjaeveland, Dag Hovland, Rudolf Schlatte, Sebastian Brandt, Hallstein Lie, Ian Horrocks. OptiqueVQS: a Visual Query System over Ontologies for Industry. Semantic Web Journal, 2018.

Acknowledgements

The Geo-Assistant team would like to thank project partners Equinor and IBM for the discussions and feedback on this work.

This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).