Semantic Integration

Data in the oil and gas domain typically resides in several different sources and can differ widely in form and access method. To ensure optimal decision making, all of this data must be taken into account; an end-user needs to be able to view and understand all of it.

Accessing the data in its legacy format requires in-depth, low-level knowledge of how the data is stored, which is a considerable challenge for end-users. By integrating all data under a common ontology, users can view and explore the data in a language they understand. This research program aims to address issues that arise during this integration process, in particular by designing and developing scalable infrastructure for the integration of multiple large datasets and large-scale ontologies.

The Semantic Integration research program designs and develops scalable infrastructure that supports semantic integration using large ontologies (with many thousands of classes) and massive data sets (many billions of tuples). It will demonstrate the efficacy of these tools through deployment in the beacon projects. Specifically, we work with ontology reasoners capable of supporting the development of large-scale ontologies, and with semantic data stores that answer realistic ontology-based queries over massive data sets.

Challenges

Semantic Integration requires both a solid theoretical foundation and proper tooling support. On the theoretical side, we need to understand the formal semantics of semantic integration across its different components: the data sources (e.g., relational databases, JSON files), the mappings from these sources to the ontologies, the ontologies in different profiles (e.g., OWL 2 QL, OWL 2 RL, OWL 2), and the queries. We need to develop efficient algorithms and make sure they are sound and complete. These algorithms need to be implemented in proper software tools, and the tools should be validated in real-world use cases to show their effectiveness.

Approach

Semantic Integration can be realized in two flavors:

Virtual Knowledge Graphs (VKGs). In the virtual approach, the data assertions are not materialized in a separate data store; their presence in the KG is only virtual. Systems operating on VKGs retrieve data directly from the data sources, and only when it is required for a particular user query. Query processing is in fact delegated to the data sources: a query rewriting step first takes the ontology background knowledge into account, and the mappings are then unfolded to translate the user query into queries over the data sources. The advantage of VKGs is that information is always fresh and up-to-date with the data sources. Ontop, for example, is a state-of-the-art VKG system; by implementing the VKG technology it lowers the cost of typical data integration projects. In this way, companies and organizations can readily exploit the value of their data assets and make such data available for Business Intelligence and applications based on Machine Learning.
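To make the rewriting and unfolding steps more concrete, the following is a minimal sketch in Python. It is not how Ontop is implemented: the class names, the tables wells and legacy_wells, the mappings, and the functions rewrite, unfold and answer are all hypothetical, and an in-memory SQLite database stands in for a real data source. The ontology is reduced to subclass axioms, and the only query supported is "all instances of a class".

```python
import sqlite3

# Hypothetical ontology: subclass axioms of the form child -> parent.
SUBCLASS_OF = {
    "ProductionWell": "Well",
    "InjectionWell": "Well",
}

# Hypothetical mappings: ontology class -> SQL query producing instance IRIs.
MAPPINGS = {
    "ProductionWell": "SELECT 'well/' || id FROM wells WHERE kind = 'P'",
    "InjectionWell": "SELECT 'well/' || id FROM wells WHERE kind = 'I'",
    "Well": "SELECT 'well/' || id FROM legacy_wells",
}


def rewrite(cls):
    """Query rewriting: collect cls and every class the ontology places below it."""
    result = [cls]
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for child, sup in SUBCLASS_OF.items():
            if sup == parent and child not in result:
                result.append(child)
                frontier.append(child)
    return sorted(result)


def unfold(classes):
    """Unfolding: replace each class by its mapped SQL and combine with UNION."""
    parts = [MAPPINGS[c] for c in classes if c in MAPPINGS]
    return " UNION ".join(parts)


def answer(conn, cls):
    """Answer 'all instances of cls' by delegating a single SQL query to the source."""
    sql = unfold(rewrite(cls))
    if not sql:
        return []
    return sorted(row[0] for row in conn.execute(sql))


def demo():
    # Toy data source; no triples are ever stored, only queried on demand.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wells (id INTEGER, kind TEXT);
        CREATE TABLE legacy_wells (id INTEGER);
        INSERT INTO wells VALUES (1, 'P'), (2, 'I');
        INSERT INTO legacy_wells VALUES (3);
        """
    )
    return answer(conn, "Well")
```

Calling demo() asks for all instances of Well. The rewriting step adds the two subclasses, the unfolding step turns the three classes into one SQL UNION query, and the database does the actual work, so the answer always reflects the current table contents.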

Materialized Knowledge Graphs (MKGs). Despite the advantages of the virtual approach, it is sometimes convenient to actually materialize the data assertions, in which case we speak of Materialized Knowledge Graphs (MKGs). The main advantage of MKGs over VKGs is that they usually achieve better query answering performance, especially when mappings are very complex and unfolding them in the virtual approach would give rise to complex queries over the data sources. This comes at the cost of maintaining a potentially very large MKG. RDFox, for example, is a powerful system for Materialized Knowledge Graphs: a high-performance in-memory knowledge graph and semantic reasoner, optimised for speed and efficiency. Designed from the ground up with reasoning in mind, it outperforms other graph databases while also providing benefits and insights that cannot be achieved by alternatives.
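The materialized approach can be illustrated with a similarly small sketch. It assumes a toy set of triples and subclass axioms (all names are hypothetical) and uses naive forward chaining to a fixpoint; production systems such as RDFox rely on far more sophisticated algorithms, so this only shows the idea of computing and storing consequences up front so that queries become simple lookups.

```python
# Hypothetical data assertions, e.g. as produced by running the mappings once.
FACTS = {
    ("well/1", "rdf:type", "ProductionWell"),
    ("well/2", "rdf:type", "InjectionWell"),
}

# Hypothetical ontology: subclass axioms of the form child -> parent.
SUBCLASS_OF = {
    "ProductionWell": "Well",
    "InjectionWell": "Well",
    "Well": "Facility",
}


def materialise(facts, subclass_of):
    """Naive forward chaining: apply the subclass rule until nothing new is derived."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in list(closure):
            if predicate == "rdf:type" and obj in subclass_of:
                derived = (subject, "rdf:type", subclass_of[obj])
                if derived not in closure:
                    closure.add(derived)
                    changed = True
    return closure


def instances_of(closure, cls):
    """Query answering over the materialised graph is a plain lookup."""
    return sorted(s for s, p, o in closure if p == "rdf:type" and o == cls)
```

Here materialise(FACTS, SUBCLASS_OF) stores the derived Well and Facility memberships alongside the original assertions. The trade-off described above is visible even at this scale: queries no longer need rewriting or unfolding, but the stored graph grows and must be recomputed or updated whenever the sources change.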

Results

Ontop (https://ontop-vkg.org/) is a state-of-the-art open-source VKG engine. The Ontop project is hosted by the Free University of Bolzano and is also commercially supported by the company Ontopic, which became a SIRIUS partner in 2020.

RDFox (https://www.oxfordsemantic.tech/product) is a high-performance in-memory knowledge graph and semantic reasoner, optimised for speed and efficiency. It was initially developed at the University of Oxford and is now developed by Oxford Semantic Technologies.

Ontopic Studio (https://ontopic.ai/en/ontopic-studio/) provides an integrated low-code environment for building RDF graphs from relational data sources. The comfortable KG authoring environment features a lightweight ontology editor and an advanced mapping editor for connecting relational datasets to ontologies, resulting in either complete Virtual Knowledge Graph setups or R2RML files for generating RDF data from relational data sources. Ontopic Studio is powered by Ontop.

Projects in the Semantic Integration research program

Rule-Based Stream Reasoning [PhD Project]

Materialisation and Data Partitioning Algorithms for Distributed RDF Systems [PhD Project]

Conjunctive Query Answering over Unrestricted OWL 2 Ontologies [PhD Project]

RDFox

Limit Datalog

Team

UiO: Arild Waaler, Egor V. Kostylev, Martin Giese, Jens Otten, Vidar Klungre, Ratan Bahadur Thapa, Eduard Kamburjan, Adnan Latif

University of Oxford: David Tena Cucala, Federico Igne, Ian Horrocks, Jiaoyan Chen, Shuwen Liu, Stefano Germano, Temitope Ajileye, Xiaxia Wang, Xinyue Zhang, Yuan He

City University of London: Ernesto Jimenez-Ruiz, Roman Kontchakov

Bosch: Evgeny Kharlamov

Oxford Semantic Technologies: Peter Crocker

Ontopic: Guohui Xiao, Benjamin Cogrel, Diego Calvanese, Peter

Prediktor: Magnus Bakken

Partners

External Partners

City University of London

Acknowledgements

This work was partially supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).