2 research outputs found

    Genome Annotation using Nanopublications: An Approach to Interoperability of Genetic Data

    No full text
    With the widespread use of Next Generation Sequencing (NGS) technologies, the primary bottleneck of genetic research has shifted from data production to data analysis. However, annotated datasets produced by different research groups are often in different formats, making genetic comparisons and integration with other datasets challenging and time-consuming. Here, we propose a new data interoperability approach that provides an unambiguous (machine-readable) description of genomic annotations, based on a novel method of data publishing called nanopublication. A nanopublication is a schema built on top of existing semantic web technologies that consists of three components: an individual assertion (i.e., the genomic annotation); provenance (containing links to the experimental information and data processing steps); and publication info (information about data ownership and rights, allowing each genomic annotation to be citable and its scientific impact tracked). We use nanopublications to demonstrate automatic interoperability between individual genomic annotations from the FANTOM5 consortium (transcription start sites) and the Leiden Open Variation Database (genetic variants). The nanopublications can also be integrated with data from other semantic web frameworks such as COEUS. Exposing legacy information and new NGS data as nanopublications promises tremendous scaling advantages when integrating very large and heterogeneous genetic datasets.
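
    The three-part structure described in this abstract can be pictured with a minimal sketch. This is not the authors' schema or any real nanopublication library: the class, field types, predicate name, and all values below are hypothetical placeholders, and a real nanopublication would be expressed in RDF rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Nanopublication:
    # The three components named in the abstract; the field types are assumptions.
    assertion: tuple        # (subject, predicate, object) triple
    provenance: dict        # links to the experiment and processing steps
    publication_info: dict  # ownership, rights, and citation details


def subjects_in_common(first, second):
    """Return subjects asserted in both collections (a toy stand-in for integration)."""
    first_subjects = {pub.assertion[0] for pub in first}
    second_subjects = {pub.assertion[0] for pub in second}
    return sorted(first_subjects & second_subjects)


# Placeholder values only; these are not real FANTOM5 or LOVD records.
tss = Nanopublication(
    assertion=("chrN:100-200", "hasAnnotation", "transcription_start_site"),
    provenance={"source": "example-experiment"},
    publication_info={"creator": "example-group-a", "license": "example-license"},
)
variant = Nanopublication(
    assertion=("chrN:100-200", "hasAnnotation", "genetic_variant"),
    provenance={"source": "example-database"},
    publication_info={"creator": "example-group-b", "license": "example-license"},
)

shared = subjects_in_common([tss], [variant])
```

    Because both records describe their subject with the same identifier, the two sources can be joined without format conversion, which is the kind of interoperability the abstract refers to.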

    Standardized analysis and sharing of genome-phenome data for neuromuscular and rare disease research through the RD-Connect platform

    No full text
    Abstract: RD-Connect (rd-connect.eu) is an EU-funded project building an integrated platform to narrow the gaps in rare disease research, where patient populations, clinical expertise and research communities are small in number and highly fragmented. Guided by the needs of rare disease researchers and with neuromuscular and neurodegenerative researchers as its original collaborators, the RD-Connect platform securely integrates multiple types of omics data (genomics, proteomics and transcriptomics) with biosample and clinical information – at the level of an individual patient, a family or a whole cohort, providing not only a centralized data repository but also a sophisticated and user-friendly online analysis system. Whole-genome, exome or gene panel NGS datasets from individuals with rare diseases and family members are deposited at the European Genome-phenome Archive, a longstanding archiving system designed for long-term storage of these large datasets. The raw data is then processed by RD-Connect's standardised analysis and annotation pipeline to make data from different sequencing providers more comparable. The corresponding clinical information from each individual is recorded in a connected PhenoTips instance, a software solution that simplifies the capture of clinical data using the Human Phenotype Ontology, OMIM and Orpha codes. The results are made available to authorised users through the highly configurable online platform (platform.rd-connect.eu), which runs on a Hadoop cluster and uses ElasticSearch – technologies designed to handle big data at high speeds. The user-friendly interface enables filtering and prioritization of variants using the most common quality, genomic location, effect, pathogenicity and population frequency annotations, enabling users from clinical labs without extensive bioinformatics support to do their primary genomic analysis of their own patients online and compare them with other submitted cohorts.
    Additional tools, such as Exomiser, DiseaseCard, Alamut Functional Annotation (ALFA) and UMD Predictor (umd-predictor.eu) are integrated at several levels. The RD-Connect platform is designed to enable data sharing at various levels depending on user permissions. At the most basic level (“does this specific variant exist in any individual in this cohort?”) it has lit a Beacon within the Global Alliance for Genomics and Health’s Beacon Network (www.beacon-network.org). At the next stage of sharing – finding similarities between patients in different databases with a matching phenotype and a candidate variant in the same gene – it is actively involved in the development of Matchmaker Exchange (www.matchmakerexchange.org), allowing users of different systems to securely exchange information to find confirmatory cases. And finally, since all patients within the system have been consented for data sharing, users of the system, after validation and authorization, are able to access datasets from other centres, providing an instant means of gathering cohorts for cross-validation and further study. Although open to any rare disease, the platform is currently enriched for neuromuscular and neurodegenerative phenotypes and includes almost 1000 genomic datasets from the NeurOmics project (www.rd-neuromics.eu) with several other contributions in the pipeline, including 1000 limb-girdle muscular dystrophy index cases from the Myo-Seq project (www.myo-seq.org) and more. The platform is free of charge to use and is open for contributions of NGS and phenotypic data from research labs worldwide via [email protected]
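
    The most basic sharing level mentioned in this abstract, a yes/no answer to whether a variant exists anywhere in a cohort, can be sketched as follows. This is only an illustration of the idea and does not use the Beacon Network API or any RD-Connect code; the `Variant` type, the function name, and the sample values are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    # Hypothetical minimal variant key: chromosome, position, reference and alternate alleles.
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_exists(cohort: Iterable[Set[Variant]], query: Variant) -> bool:
    """Answer only yes or no, without revealing which individual carries the variant."""
    return any(query in individual for individual in cohort)


# Placeholder cohort of two individuals; the values are invented for illustration.
cohort = [
    {Variant("1", 12345, "A", "G")},
    {Variant("2", 67890, "C", "T"), Variant("1", 12345, "A", "G")},
]

found = variant_exists(cohort, Variant("2", 67890, "C", "T"))
missing = variant_exists(cohort, Variant("3", 11111, "G", "A"))
```

    Returning a single boolean rather than the matching records reflects why this is described as the most basic level: it discloses the presence of a variant without exposing any individual-level data.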