AI, Data Hubs, and Federation: Aridhia’s Strategic Insights for 2025

In 2024 our blogs covered a number of topics: Aridhia’s involvement in multi-national research consortia and the creation of national and transnational data hubs, the development of our metadata syndication capability, the increasing adoption of common data models, the emergence of data federation as a technology for enabling collaboration, and the increasing adoption of AI as a research tool. This blog is an attempt to draw out how these trends interact, where they are in tension with each other, and what they mean for our development priorities in 2025.

National and transnational data hubs

The emergence of large-scale national and transnational data hubs was not a new trend for 2024, with platforms like the HDR Innovation Gateway already well established, but it was brought into sharp relief for Aridhia due to our work with EPND and PHEMS, in addition to our long-term partnership with the AD Data Initiative.

These are all initiatives that seek to provide data owners with a secure platform to advertise their data and metadata, and researchers with an easy means of finding and requesting access to it. The direction of travel appears to be greater interoperability between these hubs, something that we have actively supported in our work with EPND and the AD Data Initiative by enabling single sign-on (SSO) between the two platforms.

As the demand for wide, cross-border collaboration increases, the network effect of true transnational collaborative research will be transformational. We are enabling this by building Aridhia Networks and connecting previously siloed environments together.

Enabling SSO between two data hubs is at the far end of the interoperability spectrum and requires a very high degree of trust and cooperation between the organisations involved. Where that is not practical, the DRE provides other means of collaboration.

Metadata Syndication

Metadata syndication, i.e. providing a programmatic means for the metadata from one hub to be published on another, is easier to achieve than SSO while still providing meaningful benefits to both data owners and researchers.

In the Aridhia DRE, the FAIR API makes it easy for approved users to retrieve dataset metadata and push this to other platforms, and our dedicated HDR UK syndication endpoint allows data owners to easily publish their dataset metadata to the Innovation Gateway. One of our ambitions for 2025 is to extend our syndication capabilities, both between DRE hubs in the Aridhia Network and with external partners.

Federated Analytics


The emergence of data hubs, and the drive to make them interoperable, means that it has never been easier for a researcher to discover datasets of interest. It also shows that there is a widespread appetite for data sharing and collaboration within the research community. However, this is tempered by the reality of operating under data protection legislation such as GDPR, which makes direct data sharing more difficult.

Data federation represents a potential technical solution to this difficulty, allowing users to run analysis on data they do not have direct access to. In 2024 we published a series of blogs detailing the development of the Federated Node and Aridhia’s work with the PHEMS project. These blogs explain how we are integrating this capability into the DRE by introducing token transfer to FAIR and the secrets API to Workspaces.
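To make the idea concrete, here is a minimal sketch of a federated calculation. It is an illustration only and does not use the Federated Node or any DRE API: the function names, the `LocalSummary` type and the sample ages are all hypothetical. Each data owner computes a summary next to their own data, and the researcher only ever receives and combines those summaries.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate returned by a node (hypothetical type); holds no row-level data."""
    total: float
    count: int


def summarise_locally(values: Iterable[float]) -> LocalSummary:
    """Assumed to run inside a data owner's environment, next to the data."""
    values = list(values)
    return LocalSummary(total=sum(values), count=len(values))


def combine(summaries: Iterable[LocalSummary]) -> float:
    """Assumed to run on the researcher's side, using the aggregates only."""
    summaries = list(summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no records across the participating nodes")
    return sum(s.total for s in summaries) / count


# Hypothetical patient ages held by two separate data owners.
node_a = [34, 51, 47]
node_b = [62, 58]

# The researcher sees two (total, count) pairs, never the individual ages.
federated_mean = combine([summarise_locally(node_a), summarise_locally(node_b)])
print(federated_mean)
```

Note that the same `summarise_locally` function is sent to both nodes, which only works if both hold the value under the same definition. A real deployment would also need authentication, approval of the analysis and disclosure checks on the returned aggregates, none of which is shown here.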

In 2025 we will continue to develop our federated analysis capabilities, including the introduction of federated learning for the Federated Node and the introduction of user ACR management in Workspaces, with our priorities grounded in our work supporting the development of the Pediatric Health Data Space.

Common Standards

As with data hubs, the emergence of common data standards is not new, but their adoption seems to be reaching a critical mass with, for example, the NHS SDE adoption of the OMOP standard as its common data model. This is at least partially driven by the emergence of data federation: to work effectively, federation requires the reuse of common analytics, which can only be achieved where data owners have adopted a common data standard such as OMOP.

Within the DRE we already support standardised datasets, with features like FAIR dataset duplication, which allows data owners to easily reuse common metadata elements such as dictionaries. In 2025 we intend to continue making it easier for data owners to adopt and manage standardised datasets in the DRE, including extending our Workspaces database to support schemas, which will allow our users to work with multiple OMOP datasets at a time.

Another area in which standards are becoming increasingly important is the provision of Trusted Research Environments (TREs). As the Aridhia DRE is already certified against a variety of national and international standards, we welcome this development, and earlier in the year we published a whitepaper scoring the DRE against the SATRE standard.

A key criterion of the SATRE specification is that a TRE should provide users with appropriate tooling, something that the DRE does across a wide spectrum of data types.

Multi-modal Data Access

Our customers are increasingly using Aridhia DRE workspaces for the analysis of multi-modal data, as demonstrated at LCC. With access requested via FAIR, our platform enables the delivery of imaging, genomic, audio data and more, including biosamples. These datasets are often very large and very complex, requiring dedicated tools and environments. Our platform provides the flexibility to enable this access, and we continue to develop this area to enable faster and smoother access to data.

AI as a research and curation assistant

Our final trend is likely the most important and disruptive: the mainstreaming of AI as a data analysis and transformation tool. Given the nature of healthcare data, researchers and data owners are rightly wary of using cloud AI services to assist in research on sensitive data. Providing access to specialised, offline DRE-hosted LLMs in a secure and audited manner allows researchers, data scientists and curation experts to experiment with these technologies, or to accelerate development using data-aware code generation. They can do so secure in the knowledge that their data is safe, with no possibility of exfiltration or further training on secure data.

We deepened our commitment to this through 2024, with team members attending a Microsoft hackathon on utilising Azure AI services. Work has also commenced on using Azure Vector search in FAIR, and on enabling the use of offline LLMs in a workspace for data-context-aware code generation, assisting in the development of both analysis and data transformation code. We are aiming to deliver production-ready capabilities in early 2025.

2025 Priorities

So what do the above trends mean for our development priorities in 2025? In summary:

– Securely integrating AI capabilities in the DRE
– Expanding our data federation capabilities and growing the Aridhia Network
– Increasing support for common data standards like OMOP and SDTM
– Enabling smooth metadata syndication to drive collaboration across platforms

Authors

Laura joined Aridhia in March 2019 as the Product Owner for Workspaces, having previously worked in the Fintech sector. Laura’s role involves creating and maintaining the product backlog, defining the features of Workspaces and working with the development team to improve the product. Outside of work, she spends all of her spare time with her horse, Brutus.


Ross joined the Aridhia Product Team in January 2022. He is the Product Owner for FAIR Data Services, and Aridhia’s open source federation project. He works with our customers to understand their needs, and with our Development Team to introduce new features and improve our products. Outside of work, he likes to go hill walking and is slowly working his way through Scotland’s Munros.