The Federated Node – Development of An Open-Source Component for Federated Analysis

Our previous blog on Aridhia’s work as part of the PHEMS consortium introduced the Federated Node and gave a high-level view of its integration with the Aridhia DRE. This blog provides further information on the Federated Node, including its release, licensing, and upcoming development priorities.

Defining Federation

Data protection laws like GDPR make data sharing across national borders and institutional boundaries increasingly difficult, restricting the ability of researchers to share data with colleagues outside of their organisation.

Federated Data Sharing allows researchers to analyse data that they cannot access directly. The video below details a three-tier model for understanding data federation:

Federated Node Overview

The Federated Node (FN) is a component used for running federated tasks, and is based on three existing open-source projects:

The Common API
Keycloak
Nginx

The Common API provides the structure of the API calls, Keycloak is used for token and user management, and Nginx is used as a reverse proxy. The FN needs to be deployed to a Kubernetes cluster, and requires a Postgres database for storing user credentials.
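To make the token step more concrete, here is a minimal sketch of how a client might build a token request for Keycloak's standard OpenID Connect token endpoint. This is an illustration under stated assumptions rather than the FN's documented login flow: the hostname, realm, client ID and credentials are hypothetical, the password grant is only one of the flows Keycloak supports, and the FN may wrap login behind its own endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, realm: str, client_id: str,
                        username: str, password: str) -> Request:
    """Build (but do not send) a password-grant token request for Keycloak."""
    # Keycloak's standard OpenID Connect token endpoint for a realm.
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode("utf-8")
    return Request(url, data=form, method="POST")


def bearer_header(access_token: str) -> dict:
    """Header a client would attach to later API calls once it holds a token."""
    return {"Authorization": f"Bearer {access_token}"}


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://fn.example.org", "example-realm", "example-client", "alice", "not-a-real-password"
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON body containing an `access_token`, which the client would then pass as a bearer token on subsequent calls.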

A deployed Federated Node needs to be associated with a Docker container registry. All analytical tasks that can run against the federated data are hosted as images in this registry, and approved users can initiate these via the FN.
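As a rough illustration of that registry association, the sketch below looks up a task image in the registries linked to a node. It is a minimal sketch, not the FN's implementation: the function name `find_registry`, the registry hostnames and the image names are all hypothetical, and an in-memory dictionary stands in for querying a real container registry.

```python
from typing import Dict, Optional, Set


def find_registry(image: str, catalogues: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first associated registry that hosts `image`, or None."""
    for registry, images in catalogues.items():
        if image in images:
            return registry
    return None


# Hypothetical registries associated with one Federated Node.
catalogues = {
    "registry-a.example.org": {"benchmarking:1.0"},
    "registry-b.example.org": {"descriptive-stats:2.1"},
}

print(find_registry("descriptive-stats:2.1", catalogues))  # registry-b.example.org
print(find_registry("unknown-task:latest", catalogues))    # None
```

In this sketch a task would only go ahead when the lookup returns a registry; an image that is not hosted in any associated registry is rejected before anything runs.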

Diagram showing the Federated Node’s relationship with the Azure Container Registry (ACR)
1a – Before creating the task pod, the FN checks that the required Docker image can be found in one of the container registries associated with the FN.
1b – The task pod is created, and its results are saved in the storage account.
2 – When /results is called and the task pod has completed, a job is created.
3 – The job’s pod has the two storage environments mounted. It fetches the task’s results folder and zips it.
4 – The webserver reads the zip contents from the live job pod and saves them in its own storage account environment.
5 – The resulting archive is returned to the end user.
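
Steps 2 to 5 above can be approximated locally with a short sketch. It is an illustration under stated assumptions, not the FN's code: local directories stand in for the two storage environments, the function name `package_results` and the `<task_id>/results` folder layout are hypothetical, and the Kubernetes job is reduced to a single function call.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_results(task_id: str, task_status: str,
                    task_storage: Path, web_storage: Path) -> Path:
    """Zip a completed task's results folder into the webserver's storage."""
    # Step 2: results are only packaged once the task has completed.
    if task_status != "completed":
        raise RuntimeError(f"task {task_id} is {task_status}; results are not ready")

    # Step 3: locate the task's results folder (hypothetical layout).
    results_dir = task_storage / task_id / "results"

    # Steps 3-4: zip the folder and save the archive in the webserver's storage.
    web_storage.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        str(web_storage / f"{task_id}-results"), "zip", root_dir=results_dir
    )
    return Path(archive)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task_storage = Path(tmp) / "task-storage"
        web_storage = Path(tmp) / "web-storage"
        results = task_storage / "task-1" / "results"
        results.mkdir(parents=True)
        (results / "summary.csv").write_text("metric,value\nexample,1\n")

        archive = package_results("task-1", "completed", task_storage, web_storage)
        # Step 5: this archive is what would be returned to the end user.
        with zipfile.ZipFile(archive) as zf:
            print(archive.name, zf.namelist())
```

Raising an error for a task that has not completed mirrors the condition in step 2: no job is created, and so no archive is produced, until the task pod has finished.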

Full details of the FN tech stack and comprehensive deployment instructions will be available from the project's GitHub repo when the FN is released as open source.

Federated Node Open-Source Release

The FN will be available as an open-source project from October 2024 under the GNU GPL v3 license.
Releasing under this license means that the Federated Node will be free to use, and that other projects can modify and distribute the source code as they need, provided that any distributed projects based on the FN are also released as open source under the same license.

Federated Node Development Priorities

Initial development of the FN is being carried out for the PHEMS project. As noted in the previous blog, PHEMS has two high-level use cases:

Use Case 1 – Hospitals in the PHEMS network share benchmarking data for clinical outcomes.
Use Case 2 – Hospitals in the PHEMS network pool their data to train machine learning (ML) models.

The Federated Node already supports use case one, and developing use case two is one of our immediate priorities:

Priority 1 – responding to initial user feedback to ensure the FN is easy to deploy and all supporting documentation is as clear as possible.
Priority 2 – evaluating open-source ML frameworks and trialling integrations with the Federated Node to meet PHEMS use case two. We aim to have a proof of concept (PoC) for this use case deployed in Q1 2025.

In addition to the above, we have several other development possibilities under consideration:

• Developing a beacon endpoint that will give users limited insight into the federated data before requesting access
• Introducing L1 federation capability
• Improved management functionality, including an edit dataset endpoint
• Integration with other federated analysis frameworks, e.g. DataSHIELD

If you would like to know more about the Federated Node, Ross will be discussing it as part of a public PHEMS meetup on October 22: attendees can sign up here.