The SIRIUS consortium assembles a team of research and development groups with world-leading expertise in the technologies needed to solve the data access problem: high-performance and cloud computing, which offer almost unlimited computational resources; novel in-memory and distributed database technologies, which offer efficient querying of extremely large volumes of data; semantic technologies, which support integration of heterogeneous data sources and transformation of data into information; and language technologies, which support access to unstructured data, including text documents.
Existing work practices exhibit considerable inertia. They are maintained through cognitive (habits), social (routines) and technological (tools) means, and typically resist imposed change efforts. The end-users within the two SIRIUS innovation domains are highly competent knowledge workers who exercise substantial autonomy and discretion. Change efforts through new tools gain legitimacy by grounding them in the needs and demands of the users. Identifying users’ needs draws on the expertise and experiences of the piloting partners and invariably takes the form of uncovering (by participant observation, interviews and logging of use) rather than merely ‘eliciting’ requirements. User requirements are more often than not diverging or conflicting, severely compounding the challenge of uncovering them. The two innovation domains of SIRIUS comprise thousands of end-users and numerous sub-groups, many with competing requirements.
An organizational implementation process is exactly that: a process. It involves continuously responding to new user demands, facilitating uptake (information, training, support) and resolving barriers arising across silos.
Data is always related to things or ideas. Getting access to data is made much simpler if we are able to link that data to its underlying thing or idea. The things and ideas related to a certain discipline or system, together with the linkages between them, constitute the domain knowledge for that discipline. Given a model of the relationships and connections between these things and ideas, we can reason about the related data, make predictions and check for errors and inconsistencies. This discipline, which lies in the borderlands between philosophy and computer science, is called knowledge representation.
To use a common and simple example: a pump is a widely used piece of equipment in the oil and gas industry. It is linked to other things: pipes, fittings, a motor and various sensors. Each sensor produces measurements: data on things like speed, torque, liquid flowrate, inlet pressure, outlet pressure and temperature. The pump object is also linked to ideas like specifications, requirements and model numbers, and to things like maintenance orders and invoices. The data about a pump is stored in many databases: data historians, ERP systems and design databases. Knowledge representation consists of creating a useful model of things like this pump and using this model to improve access to data about this pump.
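As a toy illustration, the pump and its linkages can be written down as a small knowledge graph of subject–predicate–object triples. The identifiers and predicates below are invented for illustration and do not come from any real SIRIUS dataset:

```python
# A minimal sketch of the pump example as a knowledge graph.
# All names (Pump-101, DataHistorian-A, ...) are illustrative assumptions.

# Domain knowledge as subject-predicate-object triples.
triples = {
    ("Pump-101", "isA", "CentrifugalPump"),
    ("CentrifugalPump", "subClassOf", "Pump"),
    ("Pump-101", "connectedTo", "Pipe-7"),
    ("Pump-101", "drivenBy", "Motor-3"),
    ("Sensor-P1", "measures", "InletPressure"),
    ("Sensor-P1", "attachedTo", "Pump-101"),
    ("Pump-101", "dataStoredIn", "DataHistorian-A"),
    ("Pump-101", "dataStoredIn", "ERP-System"),
}

def objects(subject, predicate):
    """All objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# With the model in place, "where is the data about this pump stored?"
# becomes a query over the graph rather than a search across silos.
print(objects("Pump-101", "dataStoredIn"))  # {'DataHistorian-A', 'ERP-System'}
```

Real systems use standard formats and engines (RDF triple stores, OWL reasoners) rather than Python sets, but the principle is the same: once data is linked to the things it describes, access becomes a traversal of explicit relationships.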
The Knowledge Representation strand specialises in machine-processable representations of domain knowledge. Often, these representations take the shape of an ontology: a tractable representation of a domain vocabulary (things and ideas) and certain kinds of facts (linkages between things and ideas) concerning the domain.
We are interested in using ontologies in a variety of contexts:
We develop ontology-based tools and methods for these and other applications. A particular focus is on tools and methods needed to set up and maintain necessary artefacts, like ontologies and mappings, that control the remaining tool chain.
The Natural Language strand aims to provide linguistically sophisticated analysis of textual documents for use in semantically enriched document retrieval. Typical tasks for this strand include: document processing, including automated syntactic and semantic analysis of documents; document clustering, based in part on information provided by that processing; document enrichment, including merging documents with structured sources, and extraction of meta-data compatible with a specified ontology; and document retrieval, including experiments with federated search against textual and structural data, and query expansion using semantic categories.
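To make the last task concrete, query expansion can be sketched as rewriting a user query with semantically related terms drawn from an ontology. The vocabulary and category mappings below are invented for illustration:

```python
# A minimal sketch of ontology-driven query expansion.
# The term-to-related-terms mapping is an illustrative assumption,
# standing in for categories derived from a real domain ontology.
semantic_categories = {
    "pump": ["centrifugal pump", "compressor"],
    "failure": ["breakdown", "malfunction"],
}

def expand_query(terms):
    """Expand each query term with its semantically related terms."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(semantic_categories.get(term, []))
    return expanded

# A search for "pump failure" also retrieves documents that mention
# related equipment or use different wording for the same event.
print(expand_query(["pump", "failure"]))
# ['pump', 'centrifugal pump', 'compressor', 'failure', 'breakdown', 'malfunction']
```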
The Scalable Computing strand contains activities related to HPC and Cloud Computing. The cloud work aims to provide both new and improved cloud services to the problem owners, as well as to improve the efficiency and flexibility of the cloud architecture itself, as implemented by SIRIUS partners. The corresponding research and innovation will typically include tasks like: parallelisation and scaling strategies for data access; improving data access on-site through distributed cloud architectures, making for efficient data access out in the field; off-site big data analytics accessed through cloud services; cloud federation to seamlessly utilize resources from clouds with different characteristics, as well as for flexible scaling between private and public clouds; and utilization of High-Performance Computing technologies as a foundation for improved cloud architectures that more efficiently support big data and big data analytics.
The HPC work addresses technologies to improve computational performance for applications that require significant QoS guarantees and low-latency communication. SIRIUS partners will continue to build on already-solid research foundations to provide solutions enabling direct storage data transfers and GPU data flows to improve application efficiency. Typical tasks for this strand include: innovative ultra-low-latency unified infrastructure; scalable in-memory compute platforms; and cost-effective solutions for Big Data domains.
The Database strand in SIRIUS builds on the observation that Scalable Data Access depends on good and scalable technologies for data storage and retrieval. The exploration and operations functions in the oil and gas industry rely heavily on Relational Database Management Systems (RDBMS) and will continue to do so for the foreseeable future. Semantic technologies such as Optique can facilitate data access, but the performance of back-end repositories may be a limiting factor. New technologies such as RDFox offer many exciting opportunities, but we need to demonstrate their practicality in real use cases.
Performance and scalability may require innovation in both software and hardware. This means that the database strand will interact closely with the other strands, knowledge representation and scalable computing in particular. Interaction with software-vendor partners such as IBM and OSIsoft and hardware vendors like Dolphin and Numascale is also promising.
The database strand brings theoretical knowledge and practical implementations to bear in the following areas:
The oil and gas industry is full of complex systems. Some of these are computer systems, such as a cloud deployment of an ERP or trading system. Others are cyber-physical systems, such as the safety and automation system on an oil rig or an automated drilling system. Still others are complex logistical and commercial systems, such as the execution of a maintenance turnaround for an offshore facility.
The decisions we make about these systems are critical for safety, efficiency and profitability. We want to be able to predict how these systems will behave so that we can design them properly and run them optimally. We need to know that the system will be safe and reliable. We want to know how we can change operations to deal with surprises and unwanted events. The Execution Modelling & Analysis Strand provides tools that can answer these needs.
Our aim is to predict the behaviour of complex systems using the analysis of models. Decisions can then be made based on these analyses. Our expertise lies in modelling complex parallel and distributed systems, including object-oriented and service-oriented systems, cloud computing and the Internet of Things. Analysis techniques for these models range from simulation, which analyses a single run of a system, to deductive verification, which analyses all possible runs of a system. We analyse both functional and non-functional properties such as safety properties, timing properties, resource management and scaling strategies. Our work on resource-restricted parallel systems with timing constraints is currently being applied to planning and logistics in the context of SIRIUS.
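The contrast between simulation (one run) and exhaustive analysis (all runs) can be sketched on a toy model. The resource-bounded two-counter system below is an invented example, not one of the SIRIUS models; deductive verification in practice uses dedicated tools rather than explicit state enumeration:

```python
# A minimal sketch: one simulated run versus all possible runs.
# The model is an invented nondeterministic system in which two
# processes each increment a counter, subject to a shared bound.
import random
from collections import deque

def transitions(state):
    """Nondeterministic successors: either counter may be incremented,
    but only while the shared resource bound (x + y < 3) permits it."""
    x, y = state
    if x + y < 3:
        return [(x + 1, y), (x, y + 1)]
    return []

def simulate(state, steps=5):
    """Simulation: follow a single, randomly chosen run."""
    for _ in range(steps):
        succ = transitions(state)
        if not succ:
            break
        state = random.choice(succ)
    return state

def reachable(initial):
    """Exhaustive exploration: visit every state any run can reach."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        for nxt in transitions(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Safety property: no reachable state exceeds the resource bound.
# Checking it over `reachable` covers all runs, not just one sample.
assert all(x + y <= 3 for x, y in reachable((0, 0)))
```

A single call to `simulate` can miss a bad state that only a particular interleaving reaches; checking the property over `reachable` rules it out for every interleaving, which is the essential advantage of verification over testing.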
We bring background from two European projects: Envisage and HyVar. Each of these projects provides important tools and experience that can be used in the SIRIUS experiments and pilots.
The Data Science strand in SIRIUS is linked to the DataScience@UiO innovation cluster (http://www.mn.uio.no/english/research/about/centre-focus/innovation/data-science/). This cluster is a collaboration between SIRIUS, the BigInsight Centre for Research-based Innovation and the Department of Mathematics at the University of Oslo. In addition, SIRIUS has appointed an Associate Professor in Machine Learning, starting in September 2017.
The Data Science strand provides SIRIUS with the skills and resources that are needed to use data to support decisions. The initial SIRIUS projects are in two areas.
The first project looks at the problem of missing and inconsistent data in corporate and technical database systems. Decisions about product and system development and troubleshooting are dependent on data sets that are incomplete. This project will examine ways of intelligently detecting and correcting these inconsistencies.
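A simple starting point for this kind of work is rule-based screening of records. The sketch below is illustrative only: the field names, records and consistency rules are invented, and the project itself aims at far more intelligent detection and correction than fixed rules:

```python
# A hedged sketch of detecting missing and inconsistent records.
# Column names, values and rules are illustrative assumptions.
records = [
    {"tag": "PU-101", "max_pressure_bar": 40, "design_pressure_bar": 35},
    {"tag": "PU-102", "max_pressure_bar": None, "design_pressure_bar": 30},
    {"tag": "PU-103", "max_pressure_bar": 25, "design_pressure_bar": 30},
]

def check(record):
    """Return a list of problems found in a single record."""
    problems = []
    if record["max_pressure_bar"] is None:
        problems.append("missing max_pressure_bar")
    elif record["design_pressure_bar"] > record["max_pressure_bar"]:
        problems.append("design pressure exceeds rated maximum")
    return problems

# Flag only the records that violate a rule.
report = {r["tag"]: check(r) for r in records if check(r)}
print(report)
# {'PU-102': ['missing max_pressure_bar'],
#  'PU-103': ['design pressure exceeds rated maximum']}
```

Hand-written rules like these scale poorly across thousands of tables, which is why the project looks at learning such checks, and plausible corrections, from the data itself.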
The second project focuses on use of semantic modelling and machine learning together to address the challenges the oil industry faces in understanding, navigating and analysing the massive amounts of data generated in exploration and operations.