Sirius_Annual_Report

Page 22 - Sirius_Annual_Report_2021

Domain-Adapted Data Science
Vision: We develop hybrid approaches that exploit both knowledge in data and knowledge in ontologies
Within the artificial intelligence (AI) research community and beyond it, there is interest in developing strong AI, that is, intelligent machines that are indistinguishable from the human mind or that exceed human-level intelligence (superintelligence). However, despite impressive progress in the field over recent decades, we still do not know how to achieve strong AI.
Basil Ell
As an example of human capability, imagine a child who has seen the usual animals that live on Norwegian farms, such as sheep and horses, but has never seen a giraffe, either in person or in pictures. If the child has sufficient language skills, you can tell them, before going to a zoo, that a giraffe is an animal that looks like a horse but with a very long neck. Then, at the zoo, the child will most likely identify a giraffe as a giraffe with ease.
This example shows that humans can combine what they have learned from experience (in our example, what they have seen before) with declarative statements (the description of the similarities and differences between giraffes and horses). Machines are not yet good at this.
In the AI community, there has long been a divide, framed as the debate about «symbolic vs. non-symbolic AI». In symbolic AI, information is structured in ontologies, and conclusions are drawn via reasoning. In sub-symbolic AI, information is obtained from data, and inferences are made via machine learning (ML). One hypothesis is that machines will become better at learning if we can combine these two types of information in the learning process, that is, combine information expressed as declarative statements and ontologies with information encoded in data and made available through statistics and machine learning.
The focus of the SIRIUS domain-adapted data science program is precisely to develop approaches that combine structured knowledge with learning from data in the machine learning process. On the one hand, this means that we try to bridge the traditional divide between symbolic and sub-symbolic learning, developing what we refer to as hybrid approaches. On the other hand, as it turns out, hybrid approaches yield improved machine learning results, especially on «not-so-big data». Combined with the fact that symbolically represented knowledge can often be very small and concise, this makes hybrid approaches a powerful tool for applying machine learning to datasets that are otherwise too small, or otherwise unfit, for classical machine learning tasks.
The overarching goal is a general methodology for how data science tasks can be enhanced through the combined use of symbolic and sub-symbolic knowledge.
Another intriguing feature of hybrid approaches is that the presence of symbolic knowledge in the machine learning process may lead to more explainable predictions. Within our research program, we also develop novel hybrid approaches that identify and exploit this capability.
We use the term domain-adapted approaches to refer to a particular class of hybrid approaches. Machine learning and data science tasks very often process data in the form of textual documents, images, or tables, and in such tasks, making use of domain knowledge, for example in the form of an ontology, can improve the results.
In the context of our domain-adapted data science pipeline project, we develop a catalog of situations related to domain adaptation and describe how these situations relate to each other. For example, consider a situation in which an organization applies machine learning to a tabular dataset and wants to improve the performance of its approach. It has selected the most appropriate approach and tried the usual performance-improvement strategies, such as hyperparameter optimization, but has not yet made use of domain knowledge such as taxonomies or ontologies. (Often, «a little semantics goes a long way», meaning that the ontology need not be very extensive; a couple of statements can already make a significant difference.) One way forward can be to check whether openly available domain knowledge