Six interdisciplinary strands

The SIRIUS consortium assembles a team of research and development groups with world-leading expertise in the technologies needed to solve the data access problem: high-performance and cloud computing that offer almost unlimited computational resources; novel in-memory and distributed database technologies that offer efficient querying of extremely large volumes of data; semantic technologies that support integration of heterogeneous data sources and transformation of data into information; and language technologies that support access to unstructured data, including text documents.

Working practices

Existing work practices exhibit considerable inertia. They are maintained through cognitive (habits), social (routines) and technological (tools) means, and typically resist imposed change efforts. The end-users within the two SIRIUS innovation domains are highly competent knowledge workers who exercise substantial autonomy and discretion. Change efforts based on new tools gain legitimacy when they are grounded in the needs and demands of the users. Identifying users’ needs draws on the expertise and experiences of the piloting partners and invariably takes the form of uncovering (by participant observation, interviews and logging of use) rather than merely ‘eliciting’ requirements. User requirements more often than not diverge or conflict, which severely compounds the challenge of uncovering them. The two innovation domains of SIRIUS comprise literally thousands of end-users and numerous sub-groups, many with competing requirements.

An organizational implementation process is exactly that, a process. It involves continuously responding to new user demands, facilitating uptake (information, training, support) and resolving barriers arising across silos.

Semantic technologies

This strand addresses technologies that combine loosely structured data (e.g., RDF) with conceptual models (ontologies) that have a precise, machine-processable semantics. In this research strand, we leverage that principle to facilitate both the seamless and robust integration of heterogeneous data sources and communication between domain experts and information systems.

The Optique project is positioned in this area. Typical tasks for this strand include: the extension of Optique technologies to temporal and geospatial data; the combination of relational with semi-structured and textual data; the definition of refined domain models; and semantics-based user interfaces for data access.
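The combination of loosely structured data and a conceptual model described in this strand can be illustrated with a minimal sketch. It is not Optique's implementation: the triples, the class names (`ex:Well`, `ex:ProductionWell` and so on) and the helper functions are all hypothetical, and the "ontology" is reduced to a simple subclass hierarchy. The point is only that a query phrased in the domain expert's vocabulary ("all wells") can be answered even though no data item is asserted to be a well directly.

```python
# Hypothetical data: (subject, predicate, object) triples in the spirit of RDF.
TRIPLES = [
    ("well:A1", "rdf:type", "ex:ProductionWell"),
    ("well:B7", "rdf:type", "ex:ExplorationWell"),
    ("doc:42", "rdf:type", "ex:Report"),
]

# Hypothetical ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "ex:ProductionWell": "ex:Well",
    "ex:ExplorationWell": "ex:Well",
    "ex:Well": "ex:Asset",
}


def superclasses(cls):
    """Return cls together with all of its ancestors in the hierarchy."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(target):
    """Find subjects whose asserted type is target or one of its subclasses."""
    return sorted(
        subject
        for subject, predicate, obj in TRIPLES
        if predicate == "rdf:type" and target in superclasses(obj)
    )


if __name__ == "__main__":
    print(instances_of("ex:Well"))
    print(instances_of("ex:Report"))
```

Here the data only records the specific kind of each well, while the hierarchy supplies the knowledge that both kinds are wells. A production system would use a real RDF store and ontology language rather than Python dictionaries.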

Natural language technologies

The Language Technology strand aims at providing linguistically sophisticated analysis of textual documents for use in semantically enriched document retrieval. Typical tasks for this strand include: document processing, including automated syntactic and semantic analysis of documents; document clustering, based in part on information provided by processing; document enrichment, including merging documents with structured sources, and extraction of meta-data compatible with a specified ontology; and document retrieval, including experiments with federated search against textual and structural data, and query expansion using semantic categories.
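As an illustration of the last task, query expansion using semantic categories, the following is a minimal sketch under stated assumptions. The category table, the sample documents and the function names are invented for the example, and matching is done on whitespace-separated tokens only; a real system would rely on proper linguistic analysis and an ontology rather than a hand-written dictionary.

```python
# Hypothetical semantic categories: a category label mapped to member terms.
CATEGORIES = {
    "rock_type": ["sandstone", "shale", "limestone"],
    "equipment": ["pump", "valve", "compressor"],
}

# Hypothetical document collection.
DOCUMENTS = {
    "d1": "core sample shows shale layers above the reservoir",
    "d2": "valve replaced during scheduled maintenance",
    "d3": "sandstone porosity measured at three depths",
}


def expand_query(terms):
    """Add every term that shares a semantic category with a query term."""
    expanded = set()
    for term in terms:
        lowered = term.lower()
        expanded.add(lowered)
        for members in CATEGORIES.values():
            if lowered in members:
                expanded.update(members)
    return expanded


def search(terms):
    """Return ids of documents containing at least one expanded query term."""
    wanted = expand_query(terms)
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if wanted & set(text.lower().split())
    )


if __name__ == "__main__":
    print(search(["sandstone"]))
    print(search(["pump"]))
```

In this toy collection a query for "pump" matches no document literally, but expansion through the shared category retrieves the document that mentions a valve. Expansion of this kind trades precision for recall, so in practice the expanded terms would usually be weighted lower than the original ones.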

Cloud computing

The Cloud Computing strand comprises activities, competences and technologies aimed at providing new and improved cloud services to the problem owners, and at improving the efficiency and flexibility of the cloud architecture itself, as implemented by SIRIUS partners. The corresponding research and innovation will typically include tasks such as: parallelisation and scaling strategies for data access; improving data access on-site through distributed cloud architectures, making for efficient data access out in the field; off-site big data analytics accessed through cloud services; cloud federation to seamlessly utilize resources from clouds with different characteristics, as well as for flexible scaling between private and public clouds; and utilization of High-Performance Computing technologies as a foundation for improved cloud architectures that support big data and big data analytics more efficiently.

Database technologies

Traditional RDBMS architectures often prove insufficient to cope with the O&G industry’s processing and storage requirements, while also being poorly suited to modern cloud processing facilities, and increasingly costly to manage. These shortcomings are addressed in this strand via tasks such as: Big Data extensions (partitioning/parallelisation) to RDBMS; ‘always-on’ technologies (management, maintenance, and hardware scaling without downtime); and integration of relational data with RDF, XML and JSON-like data.

High-performance computing

This strand addresses technologies to improve computational performance for applications that require significant QoS guarantees and low-latency communication. SIRIUS partners will build on already-solid research foundations to provide solutions enabling direct storage data transfers and GPU data flows to improve application efficiency. Typical tasks for this strand include: innovative ultra-low-latency unified infrastructure; scalable in-memory compute platforms; and cost-effective solutions for Big Data domains.