Data quality and inclusion
Guidelines and best practices for sequencing and data generation with a focus on ensuring data quality.
Quality metrics for sequencing
Required
In this deliverable, the authors aim to understand how 22 laboratories across 13 European countries carry out clinical protocols and how they address sample quality control at each step of the pipeline. They describe how the laboratories perform the 5 NGS pipeline steps for cancer and germline samples, as well as their WGS and WES workflows.
Best practices for Next Generation Sequencing (NGS)
Recommended
Determining variants in the genome involves several bioinformatic procedures, such as eliminating low-quality sequences, aligning sequencing reads to the human reference genome, and establishing confidence in the presence of a variant based on a threshold. Once a variant is identified, it is annotated to predict its effect. The goal of this deliverable is to establish a best practice protocol for data analysis of whole genome sequencing (WGS) for somatic variants. This protocol includes a recommended suite of software tools with settings that ensure results surpass a required quality threshold.
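The filtering and thresholding steps described above can be illustrated with a minimal sketch. It is not the protocol or tool suite recommended in the deliverable: the function names, the Phred+33 encoding, and the cut-offs (mean base quality of 20, minimum depth of 10, minimum variant allele fraction of 0.05) are assumptions chosen only for illustration. Alignment and annotation are left out.

```python
from typing import Iterable, List, Tuple

# A read is modelled as (sequence, FASTQ quality string); hypothetical type.
Read = Tuple[str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads: Iterable[Read], min_mean_q: float = 20.0) -> List[Read]:
    """Keep reads whose mean base quality meets the illustrative cut-off."""
    return [read for read in reads if mean_phred(read[1]) >= min_mean_q]


def passes_variant_threshold(
    ref_count: int,
    alt_count: int,
    min_depth: int = 10,      # assumed value, not from the deliverable
    min_vaf: float = 0.05,    # assumed value, not from the deliverable
) -> bool:
    """Accept a candidate variant only if depth and allele fraction suffice."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return False
    return alt_count / depth >= min_vaf
```

In a real somatic WGS pipeline these decisions are made by dedicated trimming, alignment, and variant-calling tools; the sketch only shows the kind of threshold logic involved.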
A general conclusion is that all participants have achieved a good level of quality at the sequencing stage, and the metrics measuring it are largely consistent, as there is very little dispersion. Differences in library preparation and sequencing protocols do not appear to significantly impact the expected quality of results. In the deliverable, the authors explain the performance of different participating laboratories for each one of the relevant sequencing metrics, and suggest general best practices to ensure the best quality.
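One common way to quantify how little a metric disperses across laboratories is the coefficient of variation (sample standard deviation divided by the mean). The sketch below assumes this measure for illustration. The deliverable may use a different statistic, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    centre = mean(values)
    if centre == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / centre
```

A value close to zero means the laboratories report nearly identical results for that metric, while larger values indicate that protocol differences may be affecting quality.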
The B1MG data analysis challenge
Informational
This document aims to connect genomic and health data analyses. Doing so requires a careful examination of existing gaps and best practices in germline and tumor WGS. The central focus is the organisation of a somatic WGS benchmarking initiative comprising three distinct challenges: Wet Lab, Full Pipeline, and Dry Lab.
Genome of Europe Plan
The Genome of Europe (GoE) aims to deliver >500,000 whole genome sequences (WGS) as a reference database representing genetic diversity across Europe, and was chosen as a use case in the GDI project. This deliverable for GDI is a short report on how the GoE working group's planning, as a use case, aligns with GDI.
Draft data management policy published including ELSI best practice
Defining the data management policy is an important step to ensure that genomic data in GDI is protected from unauthorised access, use, or disclosure, and that it complies with local legal and regulatory requirements. This is the draft version of the data management policy, and it will evolve as the discussions around data governance, ELSI requirements and data management plans are taken into account in WP6 and in GDI as a whole. This report describes the main elements of the data management plan that need to be taken into account, and as such it should be used as a guide for each node in GDI. Over time, we will describe good common practices for each of these elements, and individual nodes will be able to record any deviations from these common practices to reflect how data management is actually performed.
Report on European resources and data suitable for inclusion into the GDI
This report outlines the contributions of European countries to the Genomic Data Infrastructure (GDI), specifying the expected number of samples, the mix of legacy Whole Genome Sequences (WGS) and new DNA samples, sequencing technologies used, and the infrastructure for managing, storing, and sequencing these samples.
Report outlining the recommendations on data curation and ELSI compliance
Working with genomic data sequenced over several years (legacy and future data) is a challenging task. The aim of this deliverable is to gather recommendations and best practices for the process of curating data for ingestion into GDI nodes. Data curation is a broad term that includes managing data throughout its lifecycle, overlapping with other tasks and aspects of the GDI project. Since one important aspect of data curation is to ensure data quality (its availability and fitness for use and reuse), this report focuses on data quality aspects and ELSI compliance during the submission process to the data repository. It also discusses some possibilities for assuring data quality and elaborates on the concepts of data quality and data curation, with a strong focus on genomic data. The report then provides a list of resources for genomic data quality assessment.
Use case demonstrator package
This deliverable focuses on creating a use case demonstrator package, which includes an initial set of data collections tailored for specific use case scenarios within the European Genomic Data Infrastructure (GDI). Given the current lack of real data in the GDI nodes, efforts were concentrated on gathering synthetic and other available real data that closely align with intended use cases to facilitate comprehensive testing of the GDI infrastructure.
Report to identify the initial set of relevant data for use cases
Establishing minimal datasets and standards within GDI facilitates seamless data exchange, driving innovation in healthcare and improving patient outcomes. Deliverable 7.4 encapsulates the collaborative efforts of GDI, 1+MG, and B1MG initiatives towards establishing minimal datasets and standards for genomic data exchange and integration. Efforts focused on leveraging prior work from 1+MG and B1MG initiatives to define minimal datasets for GDI use cases, particularly in Cancer, Infectious Diseases and the Genome of Europe. Collaborative refinement processes ensured comprehensive standards adherence.