DiUS enables 40x faster analysis of multi-billion-row, highly sensitive patient databases
The Centre for Big Data Research in Health (CBDRH) in the Faculty of Medicine at the University of NSW needed a solution to better enable researchers to analyse multi-billion row datasets using machine learning: a platform that shortened the time it took to set up and manage research projects aimed at delivering better patient care.
Developed in partnership with DiUS, a new cloud-based research platform has delivered up to 40x faster analysis of large and highly sensitive datasets. The platform, known as ERICA (E-Research Institutional Cloud Architecture), empowers researchers to improve the way patients are treated in Australia, and worldwide.
ERICA is a cost-effective, scalable and secure platform that supports faster, more efficient and innovative scientific research. The CBDRH is extending the cloud research platform’s impact by inviting research institutions that also work with sensitive patient data to form a consortium.
Using HPC on demand to boost scientific computing capability
The CBDRH is a research centre of excellence using cutting-edge statistical and machine learning methods on ‘big’ health data. With access to data collected by governments, hospitals and other healthcare providers, the Centre answers important clinical questions—such as what is the best treatment for this disease?—to help optimise the efficiency and quality of Australia’s healthcare system. The Centre delivers research outcomes in areas such as aged health, cancer, childhood development, intensive care and improved care for transplant recipients.
The Centre had access to high performance compute (HPC) resources through the University’s on-premise infrastructure; however, the time taken to set up a research project through the centralised IT department was a significant barrier to launching projects. Additionally, the physical infrastructure was not keeping pace with sophisticated workflows on increasingly large cohorts of data. The research infrastructure needed more scale and speed.
Given the strict privacy and ethical requirements governing the use of health data, the CBDRH had a specific security requirement. Data movement and use had to be strictly controlled and audited, while still supporting fast, scalable and robust workspaces.
Custom platform needed to further accelerate research outcomes
When a market review failed to uncover an appropriate off-the-shelf cloud-based research solution, the Centre reached out to Australian technology company and AWS Advanced Consulting Partner, DiUS, to develop a custom platform using AWS.
DiUS worked closely with the CBDRH to co-create the research platform, ERICA (E-Research Institutional Cloud Architecture), to meet the Centre’s needs. Using a cross-functional team of Human Centred Designers, specialist software engineers and an Iteration Manager, DiUS leveraged agile and lean approaches, which were new to the CBDRH. The DiUS approach soon proved its value by placing working prototypes in the hands of researchers to validate the platform’s functionality and future development plans.
DiUS was involved in all stages of product development, from the original proof of concept that validated the ERICA cloud concept to the development of the initial MVP release. DiUS has continued to work on the ERICA platform, providing a richer feature set for further releases of the platform.
Creating virtual research environments to deliver on-demand fast, scalable and secure insights
ERICA was designed as an encapsulated, self-contained virtual research environment, complete with ready access to an HPC environment. Workspaces can be configured with the right level of access to storage, compute power, cutting-edge data science and machine learning tools, and the specific data sets required for each research project and researcher.
“DiUS was instrumental in bringing our vision for a scalable, cost-effective, reliable and highly secure cloud-based research platform to life.
“The ERICA platform supports faster, more efficient and innovative scientific analysis by enabling researchers at the Centre for Big Data Research in Health UNSW, and our collaborators in other research institutions, to use cutting-edge machine learning and deep learning techniques in the secure analysis of large and highly sensitive health datasets.” Professor Louisa Jorm
DiUS leveraged multiple AWS services, including VPC, S3, ECS, Auto Scaling and CloudFormation, to set up a managed and secure cloud desktop environment that can only be accessed through Amazon WorkSpaces.
Controls were built into ERICA to provide a high level of security, controlled audit of data movement, cost visibility and control, as well as data recovery at any point in time. Data movement in and out of the environment is strictly controlled through custom-built gateways which provide a fail-safe way to lock access to highly sensitive health data.
To make the setup as simple as possible, a project setup in ERICA is orchestrated through a point-and-click web-based interface. The continuous delivery pipeline, built into the system as infrastructure-as-code, means each research project’s resources can be pulled down or suspended when not needed.
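The idea that a project's resources can be suspended or pulled down when not needed can be sketched as a small state model. This is a minimal illustration only, not ERICA's implementation: the class `ProjectEnvironment`, its fields and its method names are all hypothetical, and no real AWS calls are made.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    TORN_DOWN = "torn_down"


@dataclass
class ProjectEnvironment:
    """Hypothetical declarative description of one research project's resources."""
    name: str
    datasets: List[str]
    gpu_instances: int = 0
    state: State = State.ACTIVE
    audit_log: List[str] = field(default_factory=list)

    def _record(self, action: str) -> None:
        # Every lifecycle change is logged so it can be audited later.
        self.audit_log.append(f"{self.name}:{action}")

    def suspend(self) -> None:
        if self.state is not State.ACTIVE:
            raise ValueError("only an active project can be suspended")
        self.state = State.SUSPENDED
        self._record("suspend")

    def resume(self) -> None:
        if self.state is not State.SUSPENDED:
            raise ValueError("only a suspended project can be resumed")
        self.state = State.ACTIVE
        self._record("resume")

    def tear_down(self) -> None:
        if self.state is State.TORN_DOWN:
            raise ValueError("project is already torn down")
        self.state = State.TORN_DOWN
        self._record("tear_down")

    def billable_instances(self) -> int:
        # Compute is only billed while the project is active.
        return self.gpu_instances if self.state is State.ACTIVE else 0


# Illustrative project; the name and dataset label are made up.
env = ProjectEnvironment(name="icu-outcomes", datasets=["icu_admissions"], gpu_instances=4)
env.suspend()
print(env.billable_instances())  # 0 while suspended
env.resume()
print(env.billable_instances())  # 4 once active again
```

Because the environment is described as data rather than configured by hand, the same description can be recreated, paused or removed repeatably, which is the property the infrastructure-as-code pipeline relies on.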
With security being critical, granular access control and two-factor authentication were implemented, along with encryption everywhere, both for data-at-rest and data-in-transit.
Analysing more data, more quickly to drive better patient outcomes
Having ERICA place the control for provisioning research environments directly into the hands of the CBDRH researchers and their collaborators is a real win. Project launch time is now much faster. Previously, setting up a secure research environment could take days to weeks—now it takes minutes.
Researchers can now access multi-billion row databases through ERICA, which previously wasn’t possible. There’s also been a 40x reduction in time taken to train machine learning models, meaning that research projects see faster results.
For example, ERICA is being used by researchers to train a machine learning model using the largest collection of intensive care unit (ICU) patient data in the world. The model is 30-40% more accurate than current models in predicting patient outcomes following admission. It can be trained in under an hour on approximately $40,000 worth of GPU servers, provisioned in the cloud on demand for just a few tens of dollars.
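The on-demand arithmetic behind this example can be sketched as follows. The hourly rate and server count are placeholder assumptions chosen only to land in the "few tens of dollars" range; the source gives only the approximate $40,000 hardware value and the under-an-hour training time.

```python
def on_demand_cost(hourly_rate: float, servers: int, hours: float) -> float:
    """Cost of renting `servers` machines for `hours` at `hourly_rate` each."""
    return hourly_rate * servers * hours


def runs_to_match_purchase(purchase_price: float, cost_per_run: float) -> float:
    """Number of training runs whose rental cost equals buying the hardware."""
    return purchase_price / cost_per_run


# Assumed figures, for illustration only (not taken from the case study):
HOURLY_RATE = 8.0   # dollars per GPU server per hour
SERVERS = 4         # number of GPU servers rented
HOURS = 1.0         # upper bound implied by "under an hour"

cost = on_demand_cost(HOURLY_RATE, SERVERS, HOURS)
print(cost)                                   # 32.0
print(runs_to_match_purchase(40_000, cost))   # 1250.0
```

Under these assumed figures, a research group would need more than a thousand training runs before renting cost as much as the hardware itself, which is why on-demand provisioning suits occasional, bursty workloads.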
Overall, the CBDRH is thrilled with ERICA’s ability to deliver cost savings, shorten the time to research outcomes and scale with ever larger and more complex electronic datasets.
- Faster project launch time: new project environments can be set up in minutes.
- Cost of compute visibility: the cost of the cloud instances used to train machine learning models can be tracked throughout the project.
- Faster time to scientific results: a 40-fold improvement in the time taken to train machine learning models.
- Reduced IT infrastructure provisioning errors: system administrator errors are minimised by using an automated, infrastructure-as-code process to set up research environments and access controls.
- Better support for data-intensive research: analysis of extremely large datasets, for example multi-billion row databases.
- Research collaboration: secure, remote and virtual access for researchers supports inter-faculty, inter-university, inter-colleague collaboration - without having to move or duplicate large and sensitive data sets to multiple locations.
- Scalability and flexibility: ERICA enables infrastructure, informatics and analytics capabilities to be scaled up or down, as required, driven largely by the researcher end-users.
Moving forward, the on-demand and scalable nature of the cloud research platform means that research can keep pace with the advances in HPC power, without having to make capital investment in infrastructure.
Delivering ongoing research benefits through consortium
As ERICA has met a market gap, the CBDRH is extending the reach and capabilities of the research platform by inviting other research-focused organisations to join a consortium. ERICA’s scalability and robustness underpin the CBDRH’s ability to productise the platform and amortise its development costs.
“DiUS was the ideal innovation partner for developing ERICA because of its proven experience using emerging technologies to deliver unique solutions with enduring community benefit. We’re thrilled to be at the leading edge of using big data to enhance the health and well-being of Australians and the global community.” Professor Louisa Jorm
The consortium will empower more researchers to undertake world-class health and medical research using large and highly sensitive patient data through ERICA, scaling the impact beyond CBDRH. DiUS is working with the consortium partners to customise ERICA to meet individual research needs.