Materialisation and Data Partitioning Algorithms for Distributed RDF Systems [PhD Project]

Many RDF systems support reasoning with Datalog rules via materialisation, where all consequences of the RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data across a cluster of shared-nothing servers. We have developed several techniques that facilitate scalable materialisation in distributed RDF systems.
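To make the notion of materialisation concrete, the following is a minimal single-machine sketch, not the distributed algorithm developed in this project. It assumes that triples are Python tuples, that variables are strings starting with "?", and that every rule is safe (each head variable also occurs in the body). The names materialise, match and is_var are hypothetical and chosen only for this illustration.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings that start with "?".
    return isinstance(term, str) and term.startswith("?")


def match(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`; return None on a clash."""
    binding = dict(binding)
    for a, f in zip(atom, fact):
        if is_var(a):
            if binding.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return binding


def materialise(facts, rules):
    """Naive fixpoint: apply all rules to all facts until nothing new is derived.

    `rules` is a list of (head, body) pairs, where `head` is a triple pattern
    and `body` is a list of triple patterns.
    """
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                # Join the partial bindings with every fact matching this atom.
                bindings = [
                    extended
                    for b in bindings
                    for fact in facts
                    if (extended := match(atom, fact, b)) is not None
                ]
            for b in bindings:
                # Substitute bound variables in the head; constants stay as is.
                derived.add(tuple(b.get(t, t) for t in head))
        new = derived - facts
        if not new:
            return facts
        facts |= new


rules = [
    # Transitivity of subClassOf.
    (("?x", "subClassOf", "?z"),
     [("?x", "subClassOf", "?y"), ("?y", "subClassOf", "?z")]),
    # Class membership propagates along subClassOf.
    (("?i", "type", "?d"),
     [("?i", "type", "?c"), ("?c", "subClassOf", "?d")]),
]
data = {
    ("a", "subClassOf", "b"),
    ("b", "subClassOf", "c"),
    ("x", "type", "a"),
}
closure = materialise(data, rules)
```

Practical systems replace the naive loop with semi-naive evaluation, which only joins newly derived facts in each round. In a distributed setting the facts matched by one rule body may reside on different servers, which is where communication and synchronisation costs arise.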

Challenges

Whereas numerous distributed query answering techniques are known, distributed materialisation is less well understood.

Approach

First, we developed a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we developed two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory.
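The idea of partitioning a dataset in a streaming fashion can be illustrated with a simple greedy heuristic. This is only a sketch of the general technique and not either of the two algorithms developed in this project. It assumes that triples arrive one at a time, that each resource is assigned a "home" partition the first time it is seen, and that a soft capacity limits how many triples a partition attracts. The function name stream_partition and the parameters k and capacity are hypothetical.

```python
def stream_partition(triples, k, capacity):
    """Assign each triple of a stream to one of `k` partitions.

    Only a resource-to-partition map and per-partition counters are kept
    in memory; the triples themselves are emitted as soon as they are placed.
    """
    home = {}          # resource -> partition chosen when first seen
    load = [0] * k     # number of triples placed in each partition

    for s, p, o in triples:
        if s not in home:
            # Prefer the partition of an already placed neighbour, so that
            # connected resources tend to end up together.
            target = home.get(o)
            if target is None or load[target] >= capacity:
                # Otherwise fall back to the least loaded partition.
                target = min(range(k), key=load.__getitem__)
            home[s] = target
        part = home[s]
        # An unseen object follows its subject. The capacity is soft:
        # resources that already have a home are never moved.
        home.setdefault(o, part)
        load[part] += 1
        yield (s, p, o), part


stream = [
    ("a", "p", "b"),
    ("b", "p", "c"),
    ("c", "p", "d"),
    ("x", "q", "y"),
]
placement = list(stream_partition(stream, k=2, capacity=10))
parts = [part for _, part in placement]
```

In this example the chain a, b, c, d stays in one partition, while the unconnected triple about x is sent to the other. Because each placement decision uses only the information seen so far, the quality of the result depends on the order of the stream, which is the usual trade-off against min-cut partitioning of the full graph.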

 

Results

We have evaluated our materialisation algorithm against two state-of-the-art distributed Datalog systems and shown that our technique offers competitive performance, particularly when the rules are complex. Moreover, we have analysed in depth the effects of data partitioning on reasoning performance and have shown that our techniques offer performance comparable or superior to state-of-the-art min-cut partitioning, while computing the partitions requires considerably fewer resources.

Selected Publications

  • Temitope Ajileye, Boris Motik, and Ian Horrocks. Datalog Materialisation in Distributed RDF Stores with Dynamic Data Exchange. In Proc. of the 18th International Semantic Web Conference (ISWC 2019), volume 11778 of Lecture Notes in Computer Science, pages 21-37. Springer, 2019. 
  • Temitope Ajileye, Boris Motik, and Ian Horrocks. Streaming Partitioning of RDF Graphs for Datalog Reasoning. In ESWC, volume 12731 of Lecture Notes in Computer Science, pages 3-22. Springer, 2021.

Team

Temitope Ajileye [PhD Candidate], Boris Motik [Supervisor], Ian Horrocks [Co-supervisor]

Partners

Acknowledgements

This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).