Limit Datalog

Motivated by applications in declarative data analysis, we have been studying extensions of Datalog with arithmetic functions over integers (DatalogZ). This language is known to be undecidable in general, but we have identified two interesting decidable fragments: limit DatalogZ and stable DatalogZ, the latter of which is even tractable (PTime) in data complexity. Stable DatalogZ is interesting not only due to its tractability, but also because we have shown that it can express many useful data analysis tasks.

Challenges

DatalogZ is known to be undecidable in general. The challenge is to find fragments that are at least decidable and preferably tractable while at the same time being powerful enough to express common data analysis tasks.

Approach

We have explored a range of different restrictions to DatalogZ, including linearity conditions, stratification, semi-positive programs and stability conditions.

Results

We have shown that reasoning in limit DatalogZ is decidable if a linearity condition restricting the use of multiplication is satisfied. In particular, limit-linear DatalogZ is complete for ΔEXP2 and captures ΔP2 over ordered datasets in the sense of descriptive complexity. We have also provided a comprehensive study of several fragments of limit-linear DatalogZ. We have shown that semi-positive limit-linear programs (i.e., programs where negation is allowed only in front of extensional atoms) capture coNP over ordered datasets; furthermore, reasoning becomes coNEXP-complete in combined and coNP-complete in data complexity, where the lower bounds hold already for negation-free programs. In order to satisfy the requirements of data-intensive applications, we also proposed an additional stability requirement, which causes the complexity of reasoning to drop to EXP in combined and to P in data complexity, thus obtaining the same bounds as for usual Datalog. Finally, we have compared our formalisms with the languages underpinning existing Datalog-based approaches for data analysis and have shown that core fragments of these languages can be encoded as limit programs; this allows us to transfer decidability and complexity upper bounds from limit programs to other formalisms. Therefore, we have provided a unified logical framework for declarative data analysis which can be used as a basis for understanding the impact on expressive power and computational complexity of the key constructs available in existing languages.

Selected Publications

Mark Kaminski, Egor V. Kostylev, Bernardo Cuenca Grau, Boris Motik, and Ian Horrocks. The Complexity and Expressive Power of Limit Datalog. J. ACM, 69(1):6:1-6:83, 2022.
Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, and Ian Horrocks. Complexity and Expressive Power of Disjunction and Negation in Limit Datalog. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 2862-2869. AAAI Press, 2020.
Bernardo Cuenca Grau, Ian Horrocks, Mark Kaminski, Egor V. Kostylev, and Boris Motik. Limit Datalog: A Declarative Query Language for Data Analysis. SIGMOD Rec., 48(4):6-17, 2019.
Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, and Ian Horrocks. Stratified Negation in Limit Datalog Programs. In Jérôme Lang, editor, Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI 2018), pages 1875-1881. ijcai.org, 2018.

Team

Mark Kaminski, Egor Kostylev, Boris Motik, Ian Horrocks

Partners

Acknowledgements

This work was partially supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).