
A Data Commons – deliver more with collaboration and data sharing

The ‘Commons’ has a long social history as an approach to self-governance of a resource deemed to be valuable and necessary for the well-being of a community. Originating in 17th-century agricultural communities as a model for governing common grazing land, the commons approach has since generated a fair amount of economic and social science study. Today the Data Commons follows in this long tradition by providing a business model for establishing pre-competitive consortia aligned around a particular clinical condition.

The digital community has embraced aspects of the commons model since the late 1970s, with the birth of the free software movement and its evolution into the open source software models we all rely upon for the vast majority of our digital transactions.

Within healthcare, longitudinal studies of patient and citizen cohorts have historically provided the bedrock for public health policy by generating highly correlated data linking lifestyle to disease onset within populations. Those studies have informed everything from medical protocols to consumer lifestyle choices, demonstrating their value to population health.

As scientific progress in understanding disease progression and patient group stratification continues, the collation of longitudinal datasets from cohorts takes on a new set of characteristics. To maximise the development of new stratified treatments and ultimately improve patient outcomes, both the size and the depth of phenotypic data need to increase dramatically. In essence, we need to establish an easy model for creating a ‘Data Commons’ that serves a community with a shared interest in a particular disease or condition (pharma, healthcare providers and academics).

In line with the original thinking of a commons, no individual owns the commons; it is a collective resource that allows the participants within a community to advance what they want to achieve, whether that be raising healthy cows or running an adaptive clinical trial with the data commons as one of the inputs. Pooling data within healthcare and bioscience is fraught with practical challenges in honouring patient privacy and in governing access and intellectual property. At the same time, however, we can’t expect rapid progress in oncology, neurodegenerative disease and other life-affecting conditions unless we can pool data in a way that creates an evergreen data resource.

Fortunately, advances in public cloud solutions are bringing services to market that allow the implementation of multi-institution, geographically diverse data commons for particular disease areas. Our attention now needs to turn to how we make the barriers to creating a disease-specific data commons as low as possible.

At Aridhia, we’re providing services configured to help data controllers easily understand how their data is being consumed, whether by users of the commons or by other machine services. For domain users (clinical researchers, clinicians, data scientists) we’ve implemented the FAIR approach to data (Findability, Accessibility, Interoperability, and Reusability), making the assembly and population of research-relevant datasets as easy as we can. Our Workspace services provide a secure and audited environment for collaborative analysis, insight and publication. Later in the year we’ll release a set of FHIR services to make it easier to connect data commons to electronic patient record systems.

As the barriers to adoption of consortia based on data commons come down, creating and curating extremely large and deeply phenotyped datasets becomes a realistic prospect. As with anything new, we rely on pioneers to demonstrate what can be achieved. The European Prevention of Alzheimer’s Dementia (EPAD) Consortium is probably one of the best examples of a Data Commons in practice. It is funded jointly by the European Union IMI programme and Janssen and is led by Prof Craig Ritchie, an internationally renowned dementia researcher. To quote:

Craig Ritchie, Project Co-coordinator, EPAD

“EPAD is the largest ever public-private partnership in Alzheimer’s disease research, EPAD transcends disciplinary boundaries to combine knowledge and expertise from 39 European partner organisations across multiple sectors. Together, we are developing the EPAD platform on which new therapies can be tested in a streamlined and efficient way, delivering more effective, targeted interventions that can slow or stop dementia”.

The EPAD community recognises that the pooling of well-characterised and well-curated longitudinal data establishes an evergreen resource that can be used as a run-in data set for adaptive clinical trials, an asset that any individual participant would struggle to create alone. The informatics framework establishes the environment that allows the data commons to function, providing confidence and evidence to data controllers that their data is being used and accessed within the information governance protocols established by the commons consortium.

If we’re to make progress in finding new treatments for stratified disease variants and their progression, widespread adoption of data commons models becomes a prerequisite for achieving this in a timely and cost-effective manner. The informatics infrastructure now exists to deliver those services through the public cloud, and the experience of pioneers such as EPAD gives us the governance framework that allows institutions and their data controllers to participate with confidence.


David Sibbald
