Scalable Computing

For industry to gain long-term benefit from the tools and methods developed in SIRIUS, these solutions must scale.

The Scalable Computing research program builds knowledge in high-performance computing (HPC) and couples it with scalable cloud computing to support big-data application processing. Specifically, we look at solutions for scalable and reconfigurable hardware, software design for parallel numerical simulations, and autonomic cross-cloud application deployment and reconfiguration.

Main Achievements

New hardware from Dolphin has been developed and will be integrated with the flexible PCI Express computing infrastructure established at SIMULA Research Laboratory. The NUMA shared-memory computer at the University of Oslo has been moved from the University’s central computing facilities to the Department of Informatics, and work on the reservoir simulation software has progressed well. A call for a PhD position in Cross-Cloud computing is currently open.

High-performance test beds

Based on input and requirements from the SIRIUS centre partners, Dolphin has started research and development activities that will lead to further improvements in functionality, flexibility, scalability and performance. A new PCIe Gen3 switch prototype has been developed to enhance scalability. Internal tests indicate support for up to 60 nodes. Deployment to the eX3 infrastructure installed at SIMULA Research Laboratory, for high-performance testing and tuning with real applications, is planned to begin in 2019. This will enable additional SIRIUS partner laboratory activities. In the meantime, Dolphin has contributed its current PCIe adapter card and PCIe switch, which offer basic infrastructure for flexible sharing of compute and storage resources for Big Data High Performance applications. This gives us an 8-node PCIe Gen3 x8 cluster that is currently available for the SIRIUS partner laboratory activities at SIMULA Research Laboratory.

The NUMA machine available for SIRIUS experimentation was moved to the Department of Informatics in 2018. The machine is reserved for SIRIUS, and the University of Oslo’s central IT system operators wanted the space and the allocated staff capacity for infrastructure more generally needed at the University. Hosting the computer at the Department of Informatics gives the SIRIUS Laboratory unique access to the machine, which is now managed by the local system operators. Although we have suffered downtime from the move, this will be a better and more accessible solution for the SIRIUS Laboratory in the long run. The NUMA machine is expected to be fully operational again from the end of the first quarter of 2019.

The central system operators at the University of Oslo maintain a Cloud infrastructure for research and innovation together with other major universities in Norway. Researchers from the SIRIUS Laboratory have started using this Cloud infrastructure for small and medium-sized computing jobs. In addition, the Cloud infrastructure has been heavily used for testing the Cross-Cloud platform developed in the Horizon 2020 project MELODIC, which is linked to the Scalable Computing effort of the SIRIUS Laboratory.

Generally, experience from using the Cloud has been very positive. As soon as the NUMA machine is operational, we plan to establish a Cloud interface for it as well, so that the machine can be used both as a High-Performance Computer and as a private Cloud for experimentation in the SIRIUS Laboratory. We will then test a real Cross-Cloud, combining this private Cloud with the universities’ research Cloud and other public and commercial Cloud offerings, such as the Cloud platform offered by the SIRIUS partner IBM.

Applications

The work on improving the reservoir simulations progressed well during 2018, in close interaction with Equinor staff to discuss further development of the ongoing collaboration.

An interesting future direction to investigate for both application areas may be to combine High-Performance Computing infrastructures with commercial Cloud scalability. This architecture may allow temporary scaling on rented Cloud resources, with a penalty of some additional execution time, but at reduced cost compared with dedicated and private hardware infrastructures. This work will start when the new PhD student begins working on Scalable Computing in 2019.

An emerging collaboration within SIRIUS is an application within Semantic Natural Language Processing to derive labels from variables in queries formulated in SPARQL, a semantic query language for databases, and from Resource Description Framework (RDF) graphs, and to process these labels to understand the intended meaning. This will require significant computing resources, and the first results are expected as soon as the NUMA machine becomes available again.
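
As a rough illustration of what deriving labels from SPARQL query variables could involve, the sketch below extracts variable names from a query string and splits them into lower-case words. It is not the method used in the SIRIUS collaboration. The function names, the regular expressions, and the example query are assumptions made for illustration only, and the simple pattern matching does not skip string literals or IRIs the way a real SPARQL parser would.

```python
import re

# Hypothetical helpers for illustration; not taken from any SIRIUS code base.
# A SPARQL variable starts with '?' or '$' followed by a name.
# Limitation: this also matches '?name' inside string literals or IRIs.
VAR_PATTERN = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")

# Words inside an identifier: acronyms, capitalised or lower-case words, digits.
WORD_PATTERN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def variable_names(query: str) -> list[str]:
    """Return the distinct variable names in order of first appearance."""
    seen: list[str] = []
    for name in VAR_PATTERN.findall(query):
        if name not in seen:
            seen.append(name)
    return seen


def label_from_variable(name: str) -> str:
    """Split a camelCase or snake_case name into lower-case words."""
    return " ".join(word.lower() for word in WORD_PATTERN.findall(name))


if __name__ == "__main__":
    # Made-up query; the vocabulary is assumed, not from the source.
    query = (
        "SELECT ?wellboreName ?reservoir_id "
        "WHERE { ?w :name ?wellboreName ; :inReservoir ?reservoir_id . }"
    )
    for var in variable_names(query):
        print(var, "->", label_from_variable(var))
```

The resulting word sequences would only be a starting point: interpreting the intended meaning of such labels is the computationally demanding part that the text refers to.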

Projects in the Scalable Computing research program

(click on the project name to read more about the project)

Smart Scalable PCI Express [more information coming soon]
SmartIO: Device sharing and memory disaggregation in PCIe clusters using non-transparent bridging [PhD Project] 
MELODIC
MORPHEMIC

SIRIUS Researchers

SIRIUS Partners