July marks a milestone in Aridhia’s history as we celebrate the first release of our FAIR Data Services focusing on data discovery.
In the development of our FAIR Data Services there is much to consider, but first and foremost is the “F” of the FAIR Data Principles, or how to make data Findable. To be Findable:
Although the FAIR Data Principles were published in 2016, their acceptance and implementation in research ecosystems were initially slow. However, there is now substantial demand for making data available and shareable, to improve research quality and hence patient outcomes at a faster pace.
Outside of research circles it might be surprising to learn just how much time is dedicated to simply finding and organising data for analysis. Data scientists have reported that this accounts for up to 80% of their working time. Similarly, without knowing what data already exists, researchers can end up spending their time duplicating existing work. Lifting that burden allows for faster discovery, more efficient analysis and greater innovation.
The challenge is how to enable effective data discovery and help users understand if a particular dataset is relevant to answering their research question. As the FAIR Data Principles outline, there are two parts to addressing this.
A dataset should be described with clear, concise, and exhaustive metadata. The quantity and quality of added metadata have a direct impact on the overall findability of a dataset and on what can be indexed by search resources. A well-described dataset should include:
Leading by example, by curating well-described datasets, can itself encourage other users to add more metadata and can define the standard expected of all users in the research ecosystem. Where possible, dataset descriptions should align with metadata standards, such as the Data Catalog Vocabulary (DCAT) for catalogue entries, to facilitate data sharing and interoperability between data discovery ecosystems. Metadata should also be available in a machine-readable format (e.g. JSON).
Alongside user-added metadata, auto-generated metadata should also allow the user to obtain a greater understanding of the dataset:
A dataset may have exhaustive, high-quality metadata; however, without a search resource, it will never be found. Effective metadata and search capabilities play equally important parts in enabling data discovery.
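As a rough illustration of what a machine-readable, standards-aligned description might look like, the sketch below builds a small catalogue entry using DCAT and Dublin Core term names and serialises it to JSON. The dataset values, the `missing_fields` helper and its choice of "required" fields are assumptions made for this example, not part of FAIR Data Services.

```python
import json

# Hypothetical catalogue entry; every value below is illustrative.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Example cohort study",
    "dct:description": "Synthetic example used to illustrate a catalogue entry.",
    "dcat:keyword": ["cohort", "example"],
    "dct:issued": "2020-01-01",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:mediaType": "text/csv",
        }
    ],
}


def missing_fields(entry, required=("dct:title", "dct:description", "dcat:keyword")):
    """Return the required fields that are absent or empty.

    The default `required` tuple is an assumption for this sketch,
    not a rule taken from the DCAT specification.
    """
    return [field for field in required if not entry.get(field)]


# Serialise the entry so that other tools can read and index it.
catalogue_json = json.dumps(dataset, indent=2)
print(catalogue_json)
print("Missing fields:", missing_fields(dataset))
```

A check like `missing_fields` is one simple way a catalogue could nudge users towards more complete descriptions before an entry is published.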
Search must suit a variety of users with differing levels of “search experience”. The ability to use simple search criteria is typically balanced against the accuracy required of the search engine (i.e. precision vs recall). While complex search criteria may return more relevant results, there are now various user-friendly smart search capabilities that can in part replace the need for complex queries while still returning the results a user expects. For example:
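To make the precision versus recall trade-off concrete, the following minimal sketch computes both measures for a strict and a loose query. The dataset IDs and the set of "relevant" datasets are hypothetical and exist only for illustration.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for a single search.

    returned: dataset IDs the search engine returned
    relevant: dataset IDs that genuinely answer the user's question
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical dataset IDs, for illustration only.
relevant = {"ds1", "ds2", "ds3", "ds4"}
narrow_query = ["ds1", "ds2"]                             # strict criteria
broad_query = ["ds1", "ds2", "ds3", "ds7", "ds8", "ds9"]  # loose criteria

print(precision_recall(narrow_query, relevant))  # (1.0, 0.5)
print(precision_recall(broad_query, relevant))   # (0.5, 0.75)
```

The strict query returns only relevant datasets but misses half of them, while the loose query finds more of them at the cost of extra noise.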
Search should not only enable datasets to become ‘Findable’; the search itself should be ‘Reusable’, so that searches can be saved and re-run at a later date. Furthermore, the URLs of searches should be copyable so that searches can be shared within the community. Options for this are built into the user interface of FAIR Data Services.
Despite the many user-friendly approaches to search, a search engine is ultimately only as good as the data it indexes. While there is some variability in approaches to indexing, certain use cases should be the foundation of searching across datasets and metadata. Users should be able to find data via:
Search should also understand semantics, which can be difficult in a field like biomedicine with so many different ontological systems. What is required is a harmonisation of these semantic repositories.
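One common way to make a search both saveable and shareable is to encode its full state in the URL. The sketch below shows the idea using only the Python standard library; the base URL and the parameter names (`q`, `format`) are hypothetical and do not describe the actual FAIR Data Services URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.org/search"  # hypothetical endpoint


def search_to_url(query, filters=None):
    """Encode a search (free text plus filters) as a shareable URL."""
    params = [("q", query)]
    # Sort the filters so the same search always yields the same URL.
    for key in sorted(filters or {}):
        params.append((key, filters[key]))
    return f"{BASE_URL}?{urlencode(params)}"


def url_to_search(url):
    """Recover the search from a URL so that it can be re-run later."""
    params = parse_qs(urlsplit(url).query)
    query = params.pop("q", [""])[0]
    filters = {key: values[0] for key, values in params.items()}
    return query, filters


url = search_to_url("alzheimer's cohort", {"format": "csv"})
print(url)
print(url_to_search(url))
```

Because the URL carries everything needed to reproduce the search, saving a search amounts to storing a string, and sharing it amounts to copying a link.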
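A very simplified sketch of what such harmonisation could mean for search: different labels for the same idea are mapped to one shared concept before matching. The synonym table, concept IDs and dataset names below are invented for illustration; a real system would draw on curated ontologies and mappings rather than a hand-written dictionary.

```python
# Hypothetical synonym table: each label maps to one shared concept ID.
SYNONYMS = {
    "heart attack": "CONCEPT:001",
    "myocardial infarction": "CONCEPT:001",
    "high blood pressure": "CONCEPT:002",
    "hypertension": "CONCEPT:002",
}


def normalise(term):
    """Map a label to its shared concept, or to a lower-cased fallback."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def search(query, datasets):
    """Return the names of datasets tagged with the query's concept."""
    concept = normalise(query)
    return [
        name
        for name, tags in datasets.items()
        if concept in {normalise(tag) for tag in tags}
    ]


# Hypothetical datasets, tagged using different vocabularies.
DATASETS = {
    "registry_a": ["Myocardial Infarction"],
    "survey_b": ["hypertension"],
    "trial_c": ["heart attack", "high blood pressure"],
}

print(search("Heart attack", DATASETS))  # ['registry_a', 'trial_c']
```

A user searching for "heart attack" finds the registry tagged "Myocardial Infarction" even though the two strings share no words.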
The first release of Aridhia FAIR Data Services addresses much of the above by giving researchers and innovators the ability to discover and understand data through dataset search, classification and efficient metadata browsing capabilities described via dataset catalogues, dictionaries and associated attached assets.
Specifically, this release comprises the following features:
Feature | Description |
Data Discovery | |
Metadata Browsing | |
Metadata Management | |
Role-based Access Control | |
Built on Standards | |
Integration with Aridhia Workspaces | |
Privacy by Design | |
Cloud-native Service | |
Throughout the rest of 2020, more features will be rolled out as they are completed. This includes:
For more information about the service, view our Aridhia FAIR Data Services web page. Alternatively, feel free to contact us.
Andrew joined Aridhia in January 2018 to support the Enablement Team. He studied Ecology and Animal Behaviour at the University of St Andrews before working in various sales and marketing positions for technology companies. Outside of sales/marketing, Andrew also provides client support for the likes of Great Ormond Street Children’s Hospital and the European Prevention of Alzheimer’s Dementia Consortium (EPAD).
July 13, 2020
Gary joined Aridhia in 2019 as the Product Owner of FAIR Data Services. He has three years of prior product management experience in the cloud computing domain, holds a PhD in Computer Science, and has worked in a number of postdoctoral researcher roles, from developing online distance learning courses to improving the usability of HPC infrastructures. He has also worked at Apple and CERN in similar capacities, improving cloud computing infrastructures. As a FAIR data advocate, Gary is now responsible for the development of Aridhia FAIR Data Services and the implementation of the wider product strategy.