RDFox

RDFox is a high-performance knowledge graph and semantic reasoning engine. Originally the result of research at the University of Oxford, RDFox is now developed and marketed by Oxford Semantic Technologies.

RDFox exploits a patented in-memory architecture and parallelised computation to provide high performance for data loading, reasoning and query answering. Key features of RDFox include:

  • RDF triples, rules, and OWL 2 and SWRL axioms can be imported either programmatically or from files in a range of formats including Turtle, Datalog and OWL. RDF data can also be validated using the SHACL constraint language.
  • Information can be accessed directly from external data sources, such as CSV files, relational databases, and Apache Solr.
  • Triples, rules and axioms can be exported into a number of different formats, and the contents of the system can also be (incrementally) saved into a binary file, which can later be used to restore the system’s state.
  • Multi-user support with ACID transactional updates.
  • Access control allows for individual information elements in the system to be assigned different access permissions for different users.
  • Full support for SPARQL 1.1, and functionality for monitoring query answering and accessing query plans.
  • Materialization-based reasoning, where all triples that logically follow from the triples and rules in the system are materialized as new triples.
  • Incremental update of materialized graphs: reasoning does not need to be performed from scratch when the information in the system is updated.
  • Explanation of reasoning results: RDFox is able to return a proof for any new fact added to the store through materialization.
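
To make the idea of materialization concrete, here is a minimal Python sketch of naive forward chaining over triples. It is an illustration only and does not reflect RDFox's API or its internal algorithms; the rule encoding (a head pattern plus a list of body patterns, with variables written as strings starting with "?") and the names match, apply_rule and materialise are assumptions made for this example.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on mismatch."""
    binding = dict(binding)  # copy, so the caller's binding is never mutated
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must map to the same value everywhere it occurs.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def apply_rule(rule, facts):
    """Yield every head instance obtained by one application of `rule` to `facts`."""
    head, body = rule
    bindings = [{}]
    for atom in body:
        # Join the partial bindings found so far with every fact matching this atom.
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    for b in bindings:
        yield tuple(b.get(term, term) for term in head)


def materialise(facts, rules):
    """Apply all rules repeatedly until no new triple can be derived."""
    store = set(facts)
    while True:
        new = {t for r in rules for t in apply_rule(r, store)} - store
        if not new:
            return store
        store |= new


# Hypothetical data: class membership propagates along rdfs:subClassOf.
rules = [
    (("?x", "rdf:type", "?d"),
     [("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")]),
]
facts = {
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}
closure = materialise(facts, rules)
```

After the call, closure contains the three input triples plus the two derived type triples for ex:rex. The loop re-evaluates every rule against the whole store in each round, which is simple but wasteful; it is meant to show what is computed, not how to compute it efficiently.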

Challenges

Challenges of materialization-based reasoning include the time taken to compute an initial materialization, the space required to store such a materialization and what to do when the underlying dataset changes via the addition or deletion of facts.

Approach

RDFox addresses the above challenges by using novel algorithms and data structures developed at Oxford. These data structures support (mostly) lock-free updates, allowing for highly parallelized computation of the materialization, while also being compact, allowing for the storage of large knowledge graphs. The algorithms and data structures also support incremental reasoning for fast updates when the data changes.
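
The benefit of incremental reasoning can be illustrated with a deliberately small case: maintaining the materialization of a single transitivity rule when one fact is added. The sketch below is not RDFox's algorithm; the function name add_edge and the representation of the materialization as a set of pairs are assumptions made for this example. It only shows that an addition can be handled by deriving the consequences of the new fact rather than recomputing everything.

```python
def add_edge(closure, a, b):
    """Update a transitively closed set of pairs in place after adding (a, b).

    Returns the set of pairs that were newly added.
    """
    # Everything that already reaches a, plus a itself.
    sources = {a} | {x for (x, y) in closure if y == a}
    # Everything already reachable from b, plus b itself.
    targets = {b} | {y for (x, y) in closure if x == b}
    # Every new consequence has the form source -> a -> b -> target.
    new = {(x, y) for x in sources for y in targets} - closure
    closure |= new
    return new


# Hypothetical usage: pairs stand for a transitive property such as ex:partOf.
closure = set()
add_edge(closure, "a", "b")
add_edge(closure, "c", "d")
added = add_edge(closure, "b", "c")  # links the two existing chains
```

Only the pairs that involve the new fact are computed in each call; the existing pairs are never re-derived. Deletion is harder, because a derived fact may still have other derivations that do not depend on the deleted one.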

Results

Implementation

RDFox has been tested on a wide range of hardware and data sets and has proved to be both robust and highly scalable. We have conducted tests on Numa-0-0, which has 67 nodes (each with 6 processing units) whose relative distances range from 10 (the intra-node distance) to 200. The tests used the Claros dataset, which is particularly interesting because materialisation increases the size of the graph by a factor of more than 20. Performance depends on node distances, but when execution is bound to nearby nodes it is comparable to that of the SPARC T5-8 which we used in earlier tests (see AAAI paper).

Selected Publications

  • Motik et al. Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. AAAI 2014.
  • Motik et al. Combining Rewriting and Incremental Materialisation Maintenance for Datalog Programs with Equality. IJCAI 2015.
  • Hu et al. Optimised Maintenance of Datalog Materialisations. AAAI 2018.
  • Hu et al. Modular Materialisation of Datalog Programs. AAAI 2019.
  • Potter et al. Dynamic Data Exchange in Distributed RDF Stores. IEEE Trans. on Knowledge and Data Engineering, 30(12), 2018.

Team

Boris Motik, Ian Horrocks, Pan Hu, Anthony Potter

Partners

Acknowledgements

This work was partially supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).