SIRIUS works at the intersection of semantic technologies, statistical methods and language processing. We see that these distinct fields need to cross-fertilize to produce scalable data-access solutions in and beyond the oil & gas industry. For this reason, we have recruited Basil Ell, a German researcher whose interests lie right at this intersection. He works half-time in Oslo and half-time in Bielefeld, Germany. David Cameron from SIRIUS interviewed Basil in Oslo in January 2019.
Can you tell us a little about Bielefeld?
The University of Bielefeld has around 25,000 students and a strong focus on the social and political sciences; it will soon also have a medical faculty. It was founded in 1969 and is built around a single building housing many faculties, with the aim of fostering interdisciplinarity. The university has outgrown this building, so the CITEC group is in a new building a kilometre from the main campus. I work in a group of fifteen researchers, headed by Professor Philipp Cimiano.
How did you come to work in SIRIUS?
I came to work in Oslo as a result of a meeting between Martin Giese in SIRIUS and Professor Cimiano, where SIRIUS' need for a researcher with a background in both semantics and language was discussed. This led to an invitation to give a guest lecture, and then to the offer of a job starting in September 2018.
Can you tell us a little about your research interests?
I am interested in the gap between humans and machines. Language is difficult for machines to understand: it is ambiguous and vague. There are many ways to say things. Machines need a well-defined way of representing knowledge, an organized way of describing the things expressed in texts and how they fit together. The first thing I am interested in is how to get information out of texts and into a knowledge representation. Once we have this, we can then also do Natural Language Generation: taking information in a knowledge representation and producing text.
Thus far, we have looked at how to take text, for example from Wikipedia, and use it to populate a knowledge base. We want to do this automatically, so we need a system that can be trained rather than built manually. This is easier if we can use existing, publicly available structured data (DBpedia or Wikidata) to guide the extraction, aligning the structured data with the text.
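The alignment idea can be sketched roughly as distant supervision: sentences that mention both the subject and the object of a known triple are taken as candidate training examples for that relation. The function, triples and sentences below are illustrative stand-ins for DBpedia/Wikidata data and Wikipedia text, not an actual system built in the group.

```python
def distant_supervision(triples, sentences):
    """Pair each sentence with the relations of triples it appears to express.

    A sentence containing both the subject and the object of a triple is
    labelled as a candidate training example for that triple's relation.
    """
    examples = []
    for sentence in sentences:
        for subj, relation, obj in triples:
            # Naive substring matching; real systems use entity linking here.
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, relation, obj))
    return examples


# Toy structured data and toy text, standing in for DBpedia and Wikipedia.
triples = [("Bielefeld", "locatedIn", "Germany")]
sentences = [
    "Bielefeld is a city in Germany.",
    "The university was founded in 1969.",
]

for example in distant_supervision(triples, sentences):
    print(example)
```

Only the first sentence mentions both entities, so only it becomes a training example; the noise this matching introduces is one reason such systems still need statistical learning on top.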
What sort of projects can you see being done in SIRIUS?
An immediate area of application is SIRIUS' work on requirements. Manually converting existing requirements from text to a digital form is difficult and time-consuming. However, requirements are usually highly structured, so it is likely that we can automatically parse requirement statements to generate digital requirements. This work requires substantial computing power, so there is scope for designing clever algorithms and using high-performance computers.
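To make the structure point concrete, here is a toy sketch (not SIRIUS's actual method) of how a highly regular requirement sentence might be parsed into fields; the pattern and field names are hypothetical.

```python
import re

# Hypothetical pattern for requirements of the form
# "The <component> shall <action> <bound> <value> <unit>."
REQ_PATTERN = re.compile(
    r"The (?P<component>[\w ]+?) shall (?P<action>[\w ]+?) "
    r"(?P<bound>at least|at most|exactly) (?P<value>[\d.]+) (?P<unit>[\w ]+)\."
)


def parse_requirement(text):
    """Return the requirement's fields as a dict, or None if it is free-form."""
    match = REQ_PATTERN.match(text)
    if match is None:
        return None  # fall back to manual review for free-form requirements
    return match.groupdict()


print(parse_requirement("The pump shall deliver at least 100 litres per minute."))
```

Real requirement corpora are messier than this, which is where trainable language-processing methods, rather than hand-written patterns, come in.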
Similar problems can be found with electronic health records, drilling reports, operational reports and inspection logs.
Within SIRIUS, I will be trying to strengthen the Natural Language Processing (NLP) work and bring it closer to the Knowledge Representation activities. This is, in fact, how we are addressing data science: NLP is a sub-problem of data science. Deep learning has been a popular and successful tool for NLP, but it seems to hit a performance barrier because it does not exploit known structure in the data. Our vision is to create machine learning algorithms that exploit the structure in data and its semantics.