Beyond 1 Million Genomes (B1MG) D3.3: The B1MG data analysis challenge

Abstract

<p>Germline and tumor whole genome sequencing (WGS) have become standard procedures in both research and clinical practice. However, analytical approaches still vary widely across laboratories. This diversity calls for cohesive standards, a need that has not yet been sufficiently addressed. At present, few comprehensive schemes exist to validate or benchmark the performance of germline and tumor WGS pipelines.</p><p>To address this gap, the European 1+ Million Genomes (1+MG) initiative, supported by the H2020 project B1MG, has a specific mission: to link genomic and health data analyses. Achieving this mission requires a careful assessment of current gaps and best practices in germline and tumor WGS. Such an assessment is crucial not only for improving the quality of results but also for fostering reproducibility and building trust among stakeholders.</p><p>To this end, the 1+MG and B1MG projects have joined efforts to organise a somatic WGS benchmarking initiative comprising three challenges:</p><p><strong>Wet Lab Challenge</strong>: This challenge examines the library preparation and sequencing stages, focusing on the precision and robustness of these processes.</p><p><strong>Full Pipeline Challenge</strong>: Covering library preparation, sequencing, and data analysis, this challenge evaluates the end-to-end workflow.
Its goal is to assess whether the complete pipeline produces reliable results.</p><p><strong>Dry Lab Challenge</strong>: This challenge focuses on the data analysis pipeline, appraising the computational methods used to process and interpret the genomic data.</p><p>Through these challenges, the 1+MG and B1MG projects have contributed to harmonising WGS practices, establishing a shared understanding of best practices, and building confidence among stakeholders. This approach supports high-quality results as well as the reproducibility and reliability of genomic and health data analysis.</p><p>To date, 1+MG Working Group 4 (WG4) has organised a comprehensive quality comparison covering all stages of the somatic whole genome variant calling process. As described above, we divided the workflow into three tasks: the wet lab, full pipeline, and dry lab challenges. For each stage, we collected results from all participating labs and computed the relevant quality metrics. Comparing results across labs provided the baseline for building a curated, high-confidence dataset of somatic variants. This goldset is the quality standard against which each laboratory's results are measured.</p><p>Through this comprehensive benchmark of quality metrics across all stages of the process, 1+MG WG4 has provided best practices for whole genome somatic variant calling. The work has also contributed to generating a goldset of somatic variant calls covering both small and large variants. More broadly, 1+MG WG4 sets the quality requirements for genomic data used in cross-border access and in personalised medicine.</p>
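The benchmarking described above compares each laboratory's somatic calls with a curated goldset derived from cross-lab comparison. The deliverable does not state how the goldset was assembled or which quality metrics were used, so the following is only a hypothetical sketch. It assumes that small variants are matched exactly on (chromosome, position, ref, alt). It builds a consensus by keeping variants reported by at least `min_labs` labs, which is an assumed rule rather than the WG4 procedure. It then scores each lab with precision, recall and F1, which are common variant-calling benchmark metrics. The names `build_consensus`, `score_against_goldset` and the toy call sets are illustrative only.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical representation: a small variant keyed by (chromosome, position, ref, alt).
Variant = Tuple[str, int, str, str]


def build_consensus(calls_by_lab: Dict[str, Set[Variant]], min_labs: int) -> Set[Variant]:
    """Keep variants reported by at least `min_labs` labs (an assumed voting rule)."""
    counts = Counter(v for calls in calls_by_lab.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_labs}


def score_against_goldset(calls: Set[Variant], goldset: Set[Variant]) -> Dict[str, float]:
    """Exact-match true/false positives, false negatives, precision, recall and F1."""
    tp = len(calls & goldset)   # called and in the goldset
    fp = len(calls - goldset)   # called but absent from the goldset
    fn = len(goldset - calls)   # in the goldset but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if goldset else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


# Toy call sets for three hypothetical labs.
labs = {
    "lab_A": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "T", "C")},
    "lab_B": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
    "lab_C": {("chr1", 100, "C", "T"), ("chr4", 400, "A", "G")},
}
gold = build_consensus(labs, min_labs=2)  # keeps the two variants called by >= 2 labs
for lab, calls in sorted(labs.items()):
    print(lab, score_against_goldset(calls, gold))
```

Exact matching is only reasonable for small variants. Large (structural) variants usually need tolerant matching on breakpoints and size, which this sketch does not attempt. Scoring labs against a consensus built from their own calls also rewards agreement with the majority, so a curated goldset would likely involve further review beyond a simple vote.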

    Last time updated on 04/05/2024