The benefits of a federated data platform

What is data federation?

Data federation enables researchers to analyse sensitive data without granting them direct access to the dataset. Instead, they can use approved federated data analysis tools to perform their research without transferring the data.

The problem with direct access

When a research project is granted access to sensitive data, it is common for a copy of the data to be transferred to a trusted research environment. In this instance, researchers are able to view the full dataset and perform their analysis on it directly. Where direct sharing of data is not possible due to security restrictions and/or data protection laws, users need an alternative way to analyse the data.

Diagram: the data is transferred from the database to the user's secure workspace.

How Federated Data Analysis works

Federated Data Analysis means that, rather than being granted direct access to view record-level data and perform analysis, researchers are granted federated access permissions to send approved analytics to the data source and run remote tasks, receiving only aggregate results. This means that sensitive data doesn’t have to be shared to be analysed, so it can be used in a compliant and secure manner.

Diagram: from a secure workspace, the user requests a remote task run. Users send approved analytical tasks to run remotely at the data source, and the aggregate results are returned. No data is transferred to the workspace, only the results of the remote task.
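
The flow described above can be illustrated with a minimal Python sketch. It is not the Federated Node's implementation: the record set, the task names and the run_remote_task function are all hypothetical, and a real deployment would run tasks as containers at the data source rather than as in-process functions. The point is only that the caller names an approved task and gets an aggregate back, never the records themselves.

```python
from statistics import mean

# Hypothetical record-level data held at the data source.
# It is never returned to the caller.
_RECORDS = [
    {"age": 4, "length_of_stay": 3},
    {"age": 7, "length_of_stay": 5},
    {"age": 9, "length_of_stay": 4},
]

# Hypothetical registry of analytics the data owner has approved.
APPROVED_TASKS = {
    "record_count": lambda rows: len(rows),
    "mean_length_of_stay": lambda rows: mean(r["length_of_stay"] for r in rows),
}


def run_remote_task(task_name):
    """Run an approved task at the data source and return only its aggregate result."""
    if task_name not in APPROVED_TASKS:
        raise PermissionError(f"task {task_name!r} has not been approved")
    return APPROVED_TASKS[task_name](_RECORDS)
```

A researcher calling run_remote_task("mean_length_of_stay") receives a single number, while a request for a task that is not in the approved registry is rejected.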

Piloting the use of Federated Analysis to access COVID-19 research data

A paper highlighting the potential of using federated analysis networks to provide access to previously inaccessible datasets for research.

Levels of Data Federation

Data protection laws like GDPR make data sharing across national borders and institutional boundaries increasingly difficult, restricting the ability of researchers to share data with colleagues outside of their organisation. Federated Data Sharing allows researchers to analyse data that they cannot access directly.

At Aridhia we use a three-level model to define data federation:

Centralised

A copy of the data is transferred to the user for direct analysis.

Distributed

The user can query federated data and retrieve matching records for analysis.

Federated

The user has no direct access to the data. They can send queries to the federated data and receive the results.
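
As a rough illustration of how the three levels differ, the sketch below shows what reaches the user in each case. The FederationLevel enum, the what_the_user_receives function and the example records are hypothetical names invented for this illustration. The count returned at the federated level is just one possible aggregate.

```python
from enum import Enum


class FederationLevel(Enum):
    CENTRALISED = "centralised"
    DISTRIBUTED = "distributed"
    FEDERATED = "federated"


def what_the_user_receives(level, records, matches):
    """Return what reaches the user at each level (illustrative only)."""
    if level is FederationLevel.CENTRALISED:
        # A full copy of the data is handed over.
        return [dict(r) for r in records]
    if level is FederationLevel.DISTRIBUTED:
        # Only the records matching the query are handed over.
        return [dict(r) for r in records if matches(r)]
    if level is FederationLevel.FEDERATED:
        # No records are handed over, only a result computed from them.
        return {"matching_records": sum(1 for r in records if matches(r))}
    raise ValueError(f"unknown level: {level!r}")


# Hypothetical example data and query.
records = [{"site": "A", "age": 4}, {"site": "B", "age": 12}, {"site": "A", "age": 9}]
under_ten = lambda r: r["age"] < 10
```

With this data, the centralised level returns all three records, the distributed level returns the two records for patients under ten, and the federated level returns only the count of matching records.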

The Federated Node

Pioneering data federation solutions for life sciences and healthcare industry leaders has led us to develop an open source tool for handling federated analysis tasks. That tool is The Federated Node.

An open source tool for running federated analysis tasks

The Federated Node (FN) is an open source software component built for running federated tasks, and was the first product available under the Aridhia Open Source GitHub organisation.

The Federated Node is built on three existing projects:

The Common API was developed in 2021 as an open standard for data platforms to participate in data sharing networks. It specifies a set of endpoints that provide a framework for organisations that wish to collaborate on federated data sharing and analysis. It provides the structure of the Federated Node API.

Keycloak is used for token and user management, and nginx is used as a reverse proxy to route incoming requests.

Deploying The Federated Node

Federated Node deployments are lightweight and use common technologies. All Federated Nodes are deployed to a Kubernetes cluster and require a Postgres database for storing user credentials.

A deployed Federated Node must be associated with a Docker container registry which is used to store the remote tasks that are run against the data. This architecture gives the data owner full control over what code is run against their data, as only scripts stored in the associated container registry can be used, and only authenticated users have the ability to initiate federated tasks.
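
The control this gives the data owner can be sketched as a simple admission check. The code below is an assumption-laden illustration, not the Federated Node's own logic. The registry host registry.example.org and the function names are hypothetical, and the parsing assumes image references always take the form host/repository:tag.

```python
# Hypothetical registry associated with this Federated Node.
ASSOCIATED_REGISTRY = "registry.example.org"


def image_registry(image_ref):
    """Return the registry host of a reference shaped like host/repository:tag.

    Simplification: references without a host part yield an empty string.
    """
    host, separator, _ = image_ref.partition("/")
    return host if separator else ""


def may_run_task(image_ref, user_is_authenticated):
    """Allow a task only for authenticated users and images from the associated registry."""
    return bool(user_is_authenticated) and image_registry(image_ref) == ASSOCIATED_REGISTRY
```

Under these assumptions, may_run_task("registry.example.org/analytics/mean:1.0", True) is allowed, while the same image requested by an unauthenticated user, or an image from any other registry, is refused.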

Available under the GNU GPL v3 licence

This licence ensures that the Federated Node is free to use, that the source code can be modified and distributed as needed, and that any subsequent projects based on the FN must also be open-source.

Using federated analysis in the real world

The use of a federated data platform for conducting federated analysis presents data networks and consortia with an attractive solution for sharing data while remaining GDPR compliant. Read the PHEMS case study below to learn how Aridhia is providing federated data analysis tools to facilitate compliant data sharing between hospitals.

How Aridhia built a Data Federation Platform to enable the PHEMS project to share health data securely

PHEMS (short for “Paediatric Hospitals as European drivers for multi-party computation and synthetic data generation capabilities across clinical specialities and data types”) is a Europe-wide consortium of paediatric hospitals with the aim of revolutionising the way in which critical health data is managed and utilised across Europe.

Born from the need for a GDPR-compliant solution to sharing data that also takes into account the complex data protection legislation requirements of working across differing jurisdictions, PHEMS is working with Aridhia to provide a federated data network that will allow the partners to collaborate across national borders.

PHEMS use cases for Data Federation

Objective

Hospitals in different European countries want to share benchmarking data for clinical outcomes.

Problem

National and international data protection laws mean this data cannot be shared directly with other members of the network.

Federated solution

Partner hospitals agree key benchmarking statistics. Federated analytics which can generate these stats are hosted in a central analytic library. Data owners can give researchers permission to run the agreed analytics against their data.

The aggregate results of this analysis are returned to a dashboard in the researchers’ secure workspace.

Objective

European research hospitals want to pool their data to train machine learning (ML) models.

Problem

National and international data protection laws mean that data cannot be consolidated in one place for the purposes of training the ML model.

Federated solution

Hospitals agree to the creation of a common ML model. Using the Federated Node, this model can be trained on each dataset independently. Each data owner retains full control over their data and over the release of results from the ML model.
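
The idea of training on each dataset independently can be sketched with a toy model. Everything below is hypothetical: the function names, the single-parameter model y = w * x and the data are invented for illustration. The combination step, a sample-weighted average of the per-site parameters in the style of federated averaging, is an assumption made here; the text above does not say how per-site results are combined.

```python
def fit_locally(xs, ys):
    """Least-squares slope for y = w * x, computed where the data lives."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def site_update(xs, ys, release_approved):
    """Train on one site's data; release parameters only if the data owner approves."""
    if not release_approved:
        return None
    return {"weight": fit_locally(xs, ys), "n": len(xs)}


def combine(updates):
    """Sample-weighted average of released parameters (assumed combination step)."""
    released = [u for u in updates if u is not None]
    total = sum(u["n"] for u in released)
    if total == 0:
        raise ValueError("no site released any results")
    return sum(u["weight"] * u["n"] for u in released) / total


# Hypothetical per-hospital datasets; raw values never leave site_update.
hospital_a = site_update([1, 2, 3], [2, 4, 6], release_approved=True)
hospital_b = site_update([1, 2], [3, 6], release_approved=True)
hospital_c = site_update([1, 2], [10, 20], release_approved=False)
```

Only the fitted parameter and the sample count leave each site, and a site whose owner withholds release (hospital_c here) contributes nothing to the combined model.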

The Aridhia DRE: A federated data platform for sharing health data securely

Delivering real world federated analysis solutions, the Aridhia DRE is giving data owners and researchers new tools to securely share patient data in order to conduct vital medical research.

Paper

Data Federation in a Trusted Research Environment

This whitepaper outlines the concept of data federation within a trusted research environment using the example of Aridhia’s work with the PHEMS consortium.

Article

How the Aridhia DRE is Enabling Federated Analysis in the PHEMS Consortium

An outline of Aridhia’s federated data sharing solution for the PHEMS project. The first in a series of articles about Aridhia’s involvement as a technical partner for providing federated data analysis.

Article

The Federated Node – Development of An Open-Source Component for Federated Analysis

An introduction to The Federated Node, a key component of the Aridhia Federated Data Platform. This article details the Federated Node, including its release, licensing, and upcoming development priorities.

Article

Aridhia Releases Open Source Federated Node

Our recent blogs have covered Aridhia’s involvement in the PHEMS project, and the development of the Federated Node for the PHEMS federated network. This blog provides more information on the PHEMS network and the preview release of the Federated Node.

The Aridhia DRE: A TRE with data federation support

The Federated Node is just one part of a data federation workflow. It is focussed on running federated tasks and is not an entire end-to-end product. As a trusted research environment, the Aridhia DRE integrates with the Federated Node to provide features like a metadata catalogue, data access request (DAR) management, and a UI for submitting federated analysis tasks and viewing the results after they have been run.

Data Federation from within a TRE
  1. Submit data access request
  2. Approve data access request
  3. Issue credentials for FN
  4. Access secure workspace
  5. Submit federated query
  6. Return results
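
The six steps above form an ordered sequence, which can be sketched as a small state tracker. This is an illustration only: the FederationWorkflow class is hypothetical and does not represent how the Aridhia DRE or the Federated Node track requests.

```python
WORKFLOW_STEPS = (
    "Submit data access request",
    "Approve data access request",
    "Issue credentials for FN",
    "Access secure workspace",
    "Submit federated query",
    "Return results",
)


class FederationWorkflow:
    """Hypothetical tracker that only lets the steps happen in order."""

    def __init__(self):
        self.completed = []

    def complete(self, step):
        if len(self.completed) == len(WORKFLOW_STEPS):
            raise ValueError("workflow already finished")
        expected = WORKFLOW_STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)

    @property
    def finished(self):
        return len(self.completed) == len(WORKFLOW_STEPS)
```

In this sketch, a federated query cannot be submitted until the data access request has been approved and credentials for the Federated Node have been issued.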

FAIR Data Services, within the Aridhia DRE, allows data owners to self-serve connections to data deployed with a Federated Node and associate it with dataset metadata. This feature also ensures secure generation and transfer of user authentication tokens into Workspaces.

The benefits of querying a Federated Node from a secure Aridhia workspace include providing research teams with a powerful, easy-to-use user interface and the deployment of custom applications for carrying out federated analysis requests and visualising the returned results.

An enterprise-scale trusted research environment

A TRE with federated analysis capabilities built in, the Aridhia DRE is the platform giving data owners and patients peace of mind that their data is being used for research securely.