Page 28 - Sirius_Annual_Report_2021
Semantic Integration
Vision: The Semantic Integration research program will continue improving the software systems
Introduction
The Semantic Integration research program designs and develops scalable infrastructure that supports the semantic integration of massive data sets (many billions of tuples) into Knowledge Graphs using large ontologies (with many thousands of classes). The program will demonstrate the efficacy of these tools through deployment in the demonstration projects. Specifically, we work with ontology reasoners capable of supporting the development of large-scale ontologies, and with semantic data stores that answer realistic ontology-based queries over massive data sets.
Figure 1 shows the conceptual framework of Semantic Integration, also known as ontology-based data access (OBDA) [1][2]. The bottom of the figure shows the different kinds of data sources we integrate in this project. These are typically legacy systems and might come in different forms, such as relational databases (DBs) or files in various formats (such as CSV, XML, JSON, or proprietary formats). The objective is to semantically integrate these data sources into a Knowledge Graph (KG) consisting of a set of data assertions that use the vocabulary of classes and properties provided in the ontology. The data assertions in the KG are often obtained by mapping the data stored in the various data sources to the terms of the ontology vocabulary. Intuitively, a mapping can be thought of as a collection of queries that retrieve the necessary data from the sources and construct RDF triples using the classes and properties of the ontology.
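To make the idea of a mapping concrete, the following minimal sketch pairs each source query with a triple template and applies it to a small relational source. Everything named here is an assumption made for illustration: the `wellbore` table, the `http://example.org/` namespaces, and the `Wellbore` class and `name` property are hypothetical, and triples are represented as plain string tuples rather than with an RDF library. Production systems typically express mappings in a dedicated mapping language rather than in application code.

```python
import sqlite3

# Hypothetical namespaces and vocabulary, used only for illustration.
EX = "http://example.org/ontology#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A mapping as a collection of (source query, triple template) pairs.
# Placeholders {0}, {1}, ... refer to the columns returned by the query.
MAPPINGS = [
    ("SELECT id FROM wellbore",
     (DATA + "wellbore/{0}", RDF_TYPE, EX + "Wellbore")),
    ("SELECT id, name FROM wellbore",
     (DATA + "wellbore/{0}", EX + "name", '"{1}"')),
]


def build_source():
    """Create a toy in-memory relational source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wellbore (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO wellbore VALUES (?, ?)",
                     [(1, "A-1"), (2, "B-7")])
    return conn


def apply_mappings(conn, mappings):
    """Run each source query and instantiate its triple template per row."""
    triples = set()
    for sql, template in mappings:
        for row in conn.execute(sql):
            triples.add(tuple(part.format(*row) for part in template))
    return triples


if __name__ == "__main__":
    for triple in sorted(apply_mappings(build_source(), MAPPINGS)):
        print(triple)
```

Each row returned by a source query yields one triple whose subject, predicate, and object come from the ontology vocabulary, which is the sense in which the mapping connects source data to the Knowledge Graph.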
Semantic Integration can be realized in two flavors:
• Virtual Knowledge Graphs (VKGs). In the virtual approach, the triples are not materialized in a separate triple store; their presence in the KG is only virtual. Systems operating on VKGs retrieve data directly from the data sources only when it is required for a particular user query. In fact, query processing is delegated to the data sources. This is achieved by unfolding the mappings, thus translating user queries into queries over the data sources, while also taking into account the ontology background knowledge through a so-called query rewriting step. The advantage of VKGs is that information is always fresh and up-to-date with the data sources. For example, Ontop is a state-of-the-art VKG system that implements this technology, thus lowering the cost of typical data integration projects. In this way, companies and organizations can readily exploit the value of their data assets and make such data available for Business Intelligence and applications based on Machine Learning.
Figure 1 The Framework of Semantic Integration (OBDA)
• Materialized Knowledge Graphs (MKGs). Despite the advantages of the virtual approach, it is sometimes convenient to actually materialize the triples. In such a case, we talk about Materialized Knowledge Graphs (MKGs). The main advantage of MKGs over VKGs is that usually a more predictable performance in query answering can be achieved, especially in those situations where