17 research outputs found

    The COMPARE Data Hubs

    Get PDF
    Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Areas of public and animal health as well as food safety would benefit from rapid data sharing when it comes to emergencies. However, ethical, regulatory and institutional challenges, as well as lack of suitable platforms which provide an infrastructure for data sharing in structured formats, often lead to data not being shared or at most shared in form of supplementary materials in journal publications. Here, we describe an informatics platform that includes workflows for structured data storage, managing and pre-publication sharing of pathogen sequencing data and its analysis interpretations with relevant stakeholders

    Expression Atlas update: gene and protein expression in multiple species.

    Get PDF
    The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy to visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community developed tools. Each study's metadata are annotated using ontologies. The data are re-analyzed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa

    FAIR sharing of health data: a systematic review of applicable solutions

    Get PDF
    PurposeData sharing is essential in health science research. This has also been acknowledged by governments and institutions who have set-up a number of regulations, laws, and initiatives to facilitate it. A large number of initiatives has been trying to address data sharing issues. With the development of the FAIR principles, a set of detailed criteria for evaluating the relevance of such solutions is now available. This article intends to help researchers to choose a suitable solution for sharing their health data in a FAIR way.MethodsWe conducted a systematic literature review of data sharing platforms adapted to health science research. We selected these platforms through a query on Scopus, PubMed, and Web of Science and filtered them based on specific exclusion criteria. We assessed their relevance by evaluating their: implementation of the FAIR principles, ease of use by researchers, ease of implementation by institutions, and suitability for handling Individual Participant Data (IPD).ResultsWe categorized the 35 identified solutions as being either online or on-premises software platforms. Interoperability was the main obstacle for the solutions regarding the fulfilment of the FAIR principles. Additionally, we identified which solutions address sharing of IPD and anonymization issues. Vivli and Dataverse were identified as the two most all-round solutions for sharing health science data in a FAIR way.ConclusionsAlthough no solution is perfectly adapted to share all type of health data, there are work-arounds and interesting solutions to make health research data FAIR

    Metagenomics-based proficiency test of smoked salmon spiked with a mock community

    Get PDF
    An inter-laboratory proficiency test was organized to assess the ability of participants to perform shotgun metagenomic sequencing of cold smoked salmon, experimentally spiked with a mock community composed of six bacteria, one parasite, one yeast, one DNA, and two RNA viruses. Each participant applied its in-house wet-lab workflow(s) to obtain the metagenomic dataset(s), which were then collected and analyzed using MG-RAST. A total of 27 datasets were analyzed. Sample pre-processing, DNA extraction protocol, library preparation kit, and sequencing platform, influenced the abundance of specific microorganisms of the mock community. Our results highlight that despite differences in wet-lab protocols, the reads corresponding to the mock community members spiked in the cold smoked salmon, were both detected and quantified in terms of relative abundance, in the metagenomic datasets, proving the suitability of shotgun metagenomic sequencing as a genomic tool to detect microorganisms belonging to different domains in the same food matrix. The implementation of standardized wet-lab protocols would highly facilitate the comparability of shotgun metagenomic sequencing dataset across laboratories and sectors. Moreover, there is a need for clearly defining a sequencing reads threshold, to consider pathogens as detected or undetected in a food sample

    FAIR+E pathogen data for surveillance and research: lessons from COVID-19

    Get PDF
    The COVID-19 pandemic has exemplified the importance of interoperable and equitable data sharing for global surveillance and to support research. While many challenges could be overcome, at least in some countries, many hurdles within the organizational, scientific, technical and cultural realms still remain to be tackled to be prepared for future threats. We propose to (i) continue supporting global efforts that have proven to be efficient and trustworthy toward addressing challenges in pathogen molecular data sharing; (ii) establish a distributed network of Pathogen Data Platforms to (a) ensure high quality data, metadata standardization and data analysis, (b) perform data brokering on behalf of data providers both for research and surveillance, (c) foster capacity building and continuous improvements, also for pandemic preparedness; (iii) establish an International One Health Pathogens Portal, connecting pathogen data isolated from various sources (human, animal, food, environment), in a truly One Health approach and following FAIR principles. To address these challenging endeavors, we have started an ELIXIR Focus Group where we invite all interested experts to join in a concerted, expert-driven effort toward sustaining and ensuring high-quality data for global surveillance and research

    Comparison of sequencing methods and data processing pipelines for whole genome sequencing and minority single nucleotide variant (mSNV) analysis during an influenza A/H5N8 outbreak

    Get PDF
    As high-throughput sequencing technologies are becoming more widely adopted for analysing pathogens in disease outbreaks there needs to be assurance that the different sequencing technologies and approaches to data analysis will yield reliable and comparable results. Conversely, understanding where agreement cannot be achieved provides insight into the limitations of these approaches and also allows efforts to be focused on areas of the process that need improvement. This manuscript describes the next-generation sequencing of three closely related viruses, each analysed using different sequencing strategies, sequencing instruments and data processing pipelines. In order to determine the comparability of consensus sequences and minority (sub-consensus) single nucleotide variant (mSNV) identification, the biological samples, the sequence data from 3 sequencing platforms and the *.bam quality-trimmed alignment files of raw data of 3 influenza A/H5N8 viruses were shared. This analysis demonstrated that variation in the final result could be attributed to all stages in the process, but the most critical were the well-known homopolymer errors introduced by 454 sequencing, and the alignment processes in the different data processing pipelines which affected the consistency of mSNV detection. However, homopolymer errors aside, there was generally a good agreement between consensus sequences that were obtained for all combinations of sequencing platforms and data processing pipelines. Nevertheless, minority variant analysis will need a different level of careful standardization and awareness about the possible limitations, as shown in this study

    Benchmark of thirteen bioinformatic pipelines for metagenomic virus diagnostics using datasets from clinical samples

    Get PDF
    Introduction: Metagenomic sequencing is increasingly being used in clinical settings for difficult to diagnose cases. The performance of viral metagenomic protocols relies to a large extent on the bioinformatic analysis. In this study, the European Society for Clinical Virology (ESCV) Network on NGS (ENNGS) initiated a benchmark of metagenomic pipelines currently used in clinical virological laboratories.Methods: Metagenomic datasets from 13 clinical samples from patients with encephalitis or viral respiratory infections characterized by PCR were selected. The datasets were analyzed with 13 different pipelines currently used in virological diagnostic laboratories of participating ENNGS members. The pipelines and classification tools were: Centrifuge, DAMIAN, DIAMOND, DNASTAR, FEVIR, Genome Detective, Jovian, MetaMIC, MetaMix,One Codex, RIEMS, VirMet, and Taxonomer. Performance, characteristics, clinical use, and user-friendliness of these pipelines were analyzed.Results: Overall, viral pathogens with high loads were detected by all the evaluated metagenomic pipelines. In contrast, lower abundance pathogens and mixed infections were only detected by 3/13 pipelines, namely DNASTAR, FEVIR, and MetaMix. Overall sensitivity ranged from 80% (10/13) to 100% (13/13 datasets). Overall positive predictive value ranged from 71-100%. The majority of the pipelines classified sequences based on nucleotide similarity (8/13), only a minority used amino acid similarity, and 6 of the 13 pipelines assembled sequences de novo. No clear differences in performance were detected that correlated with these classification approaches. Read counts of target viruses varied between the pipelines over a range of 2-3 log, indicating differences in limit of detection.Conclusion: A wide variety of viral metagenomic pipelines is currently used in the participating clinical diagnostic laboratories. Detection of low abundant viral pathogens and mixed infections remains a challenge, implicating the need for standardization and validation of metagenomic analysis for clinical diagnostic use. Future studies should address the selective effects due to the choice of different reference viral databases.Molecular basis of virus replication, viral pathogenesis and antiviral strategie

    Genome sequencing for viral pathogen detection and surveillance

    Get PDF

    Genome sequencing for viral pathogen detection and surveillance

    Get PDF
    corecore