

Standardized analysis and sharing of genome-phenome data for neuromuscular and rare disease research through the RD-Connect platform

Abstract: RD-Connect (rd-connect.eu) is an EU-funded project building an integrated platform to narrow the gaps in rare disease research, where patient populations, clinical expertise and research communities are small in number and highly fragmented. Guided by the needs of rare disease researchers and with neuromuscular and neurodegenerative researchers as its original collaborators, the RD-Connect platform securely integrates multiple types of omics data (genomics, proteomics and transcriptomics) with biosample and clinical information – at the level of an individual patient, a family or a whole cohort, providing not only a centralized data repository but also a sophisticated and user-friendly online analysis system. Whole-genome, exome or gene panel NGS datasets from individuals with rare diseases and family members are deposited at the European Genome-phenome Archive, a longstanding archiving system designed for long-term storage of these large datasets. The raw data is then processed by RD-Connect's standardised analysis and annotation pipeline to make data from different sequencing providers more comparable. The corresponding clinical information from each individual is recorded in a connected PhenoTips instance, a software solution that simplifies the capture of clinical data using the Human Phenotype Ontology, OMIM and Orpha codes. The results are made available to authorised users through the highly configurable online platform (platform.rd-connect.eu), which runs on a Hadoop cluster and uses ElasticSearch – technologies designed to handle big data at high speeds. The user-friendly interface enables filtering and prioritization of variants using the most common quality, genomic location, effect, pathogenicity and population frequency annotations, enabling users from clinical labs without extensive bioinformatics support to do their primary genomic analysis of their own patients online and compare them with other submitted cohorts.
Additional tools, such as Exomiser, DiseaseCard, Alamut Functional Annotation (ALFA) and UMD Predictor (umd-predictor.eu) are integrated at several levels. The RD-Connect platform is designed to enable data sharing at various levels depending on user permissions. At the most basic level (“does this specific variant exist in any individual in this cohort?”) it has lit a Beacon within the Global Alliance for Genomics and Health’s Beacon Network (www.beacon-network.org). At the next stage of sharing – finding similarities between patients in different databases with a matching phenotype and a candidate variant in the same gene – it is actively involved in the development of Matchmaker Exchange (www.matchmakerexchange.org), allowing users of different systems to securely exchange information to find confirmatory cases. And finally, since all patients within the system have been consented for data sharing, users of the system, after validation and authorization, are able to access datasets from other centres, providing an instant means of gathering cohorts for cross-validation and further study. Although open to any rare disease, the platform is currently enriched for neuromuscular and neurodegenerative phenotypes and includes almost 1000 genomic datasets from the NeurOmics project (www.rd-neuromics.eu) with several other contributions in the pipeline, including 1000 limb-girdle muscular dystrophy index cases from the Myo-Seq project (www.myo-seq.org) and more. The platform is free of charge to use and is open for contributions of NGS and phenotypic data from research labs worldwide via [email protected]
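The most basic sharing level in the abstract, a yes/no answer to "does this specific variant exist in any individual in this cohort?", can be illustrated with a small sketch. This is not the GA4GH Beacon API or RD-Connect code; the `Variant` type, the `beacon_exists` function, and the in-memory cohort are hypothetical names and data chosen only to show the idea that the response reveals presence or absence and nothing about individuals.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """A variant identified by position and alleles (hypothetical layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def beacon_exists(cohort: Dict[str, Set[Variant]], query: Variant) -> bool:
    """Return True if any individual in the cohort carries the variant.

    Only a boolean is returned; no individual identifiers are exposed.
    """
    return any(query in variants for variants in cohort.values())


# Illustrative cohort: individual ID -> set of observed variants (made-up data).
cohort = {
    "patient_1": {Variant("1", 12345, "A", "G"), Variant("2", 500, "C", "T")},
    "patient_2": {Variant("2", 500, "C", "T")},
}

found = beacon_exists(cohort, Variant("1", 12345, "A", "G"))
missing = beacon_exists(cohort, Variant("X", 999, "G", "A"))
```

Here `found` is true and `missing` is false. Richer sharing levels, such as the phenotype and gene matching described for Matchmaker Exchange, would need more than a boolean response.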

FigShare

The Implicitome: A Resource for Rationalizing Gene-Disease Associations

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or are even unintended by the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.

Directory of Open Access Journals

PubMed Central

FigShare

Correction of literature bias in the match score.

a,b) Distribution of genes and diseases recognized by LWAS when sorted by publication abundance (log number of MEDLINE abstracts). Red lines indicate the 5-abstract cut-off, below which concept profiles are not constructed. c,d) Distribution of gene and disease rank orders, binned in 10-percentile intervals (x-axis). Higher numbers indicate stronger associations (y-axis).

FigShare

Overview of LWAS workflow (concept profile creation and analysis).


FigShare

The relative distribution of LWAS association types.

Distribution of the top 105 highest-ranking implicit gene-disease pairs determined by manual inspection: Type I Gene family member (n = 71) represents gene-disease associations where a family member of the gene causes the disease or a disease with very large phenotypic overlap; Type II Negation (n = 4) and Type III Homonym (n = 11) represent different classes of LWAS false positives, together composing 14% of the cases. Type IV Novel association (n = 19) indicates gene-disease associations of promise for follow-up investigations.

FigShare

Top ranking overlapping concepts between Seckel Syndrome & CENPJ.

The contribution of each concept to the overall match score is given as a percentage.

FigShare

Gene-Disease LWAS using concept profiles and networks of implicit information.

a) Concepts X and Z share an association in a hypothetical concept network via an explicit link (co-occurrence) and multiple implicit links (indirect connections via intermediate concepts Y1, Y2, and Y3). The concept profile for concept X is depicted where the weights (w) between concepts reflect the co-occurrence frequencies of each concept in the data source. b) Concept profiles for concepts X and Z have explicit links to concepts Y1, Y2, and Y3 but no explicit link between themselves, as reflected in their corresponding concept profiles. c) The intermediate shared concepts between concept profiles X and Z constitute implicit information, indirectly linking X and Z (red dotted line). The strength of the implicit link (match score) is computed as the inner product of the weights of matching concepts in the concept profiles. d & e) The distribution of concept profile size for gene (median 1142, maximum 56,028) and disease (median 995, maximum 81,562) concepts. f) The distribution of number of overlapping concepts between gene and disease concept profiles (median 180, maximum overlap 40,725). Only 23 concept pairs had no overlapping concepts. g) Concept profiles for the human gene CWH43 (left) and the disease “Hyperphosphatasia with Mental Retardation” (right), which share no explicit co-occurrence. The 37 overlapping concepts are shown clustered in between. Both the number and weights of these overlapping links contribute to the strength of the implicit association. h) The distribution of match scores (higher numbers indicating stronger associations) for the 204 million LWAS-derived gene-disease pairs for both the explicit (black) and implicit (red) associations.
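The match score described in panel c, an inner product over the concepts two profiles have in common, can be sketched in a few lines. This is a minimal illustration assuming a concept profile is a mapping from concept name to weight; the function names, concept labels, and weights are hypothetical and are not taken from the LWAS data.

```python
from typing import Dict


def match_score(profile_x: Dict[str, float], profile_z: Dict[str, float]) -> float:
    """Inner product of weights over concepts present in both profiles."""
    shared = profile_x.keys() & profile_z.keys()
    return sum(profile_x[c] * profile_z[c] for c in shared)


def contributions(profile_x: Dict[str, float], profile_z: Dict[str, float]) -> Dict[str, float]:
    """Percentage of the match score contributed by each shared concept."""
    score = match_score(profile_x, profile_z)
    shared = profile_x.keys() & profile_z.keys()
    if score == 0:
        return {c: 0.0 for c in shared}
    return {c: 100.0 * profile_x[c] * profile_z[c] / score for c in shared}


# Hypothetical profiles: X and Z share Y1, Y2, Y3 but have no direct link.
profile_x = {"Y1": 0.5, "Y2": 0.25, "Y3": 0.5, "A": 0.75}
profile_z = {"Y1": 0.5, "Y2": 1.0, "Y3": 0.25, "B": 0.5}

score = match_score(profile_x, profile_z)
shares = contributions(profile_x, profile_z)
```

With these made-up weights, the score is 0.25 + 0.25 + 0.125 = 0.625, and Y1, Y2, and Y3 contribute 40%, 40%, and 20% respectively. Concepts that appear in only one profile (A and B here) do not affect the score, which is why both the number and the weights of overlapping concepts matter.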

FigShare

Ranked list of genes having high match scores to Seckel Syndrome based on overlapping concepts in their concept profiles generated until July 2009.

NUP85 is ambiguous: its synonym PCNT causes a homonym problem with the PCNT gene and a large overlap of articles. ANTXR1 was previously labeled as ATR, causing the same sort of problem as for NUP85, with an overlap of articles with ATR. Only ATR had been identified as a causative gene for Seckel Syndrome by July 2009. Bold formatting indicates gene-disease associations derived by implicit information only (i.e., having no co-occurrences in the literature up to July 2009).

FigShare

Overlapping implicit gene-disease associations between LWAS and GWAS.

Green area: GWAS p-value cutoff of 10⁻⁵; yellow area: GWAS p-value cutoff of 10⁻⁸; red horizontal area: LWAS 99th-percentile cutoff; blue horizontal area: LWAS 95th-percentile cutoff.
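The shaded regions in this figure correspond to pairs that pass both an LWAS percentile cutoff and a GWAS p-value cutoff. A rough sketch of that selection follows, assuming each gene-disease pair carries an LWAS percentile rank and a GWAS p-value and that both cutoffs are inclusive; the `Pair` type, the field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pair:
    gene: str
    disease: str
    lwas_percentile: float  # percentile rank of the LWAS match score (0-100)
    gwas_p: float           # GWAS association p-value


def overlapping(pairs: List[Pair], lwas_cutoff: float, gwas_cutoff: float) -> List[Pair]:
    """Keep pairs at or above the LWAS percentile and at or below the GWAS p-value."""
    return [
        p for p in pairs
        if p.lwas_percentile >= lwas_cutoff and p.gwas_p <= gwas_cutoff
    ]


# Made-up pairs for illustration only.
pairs = [
    Pair("GENE_A", "disease_1", 99.5, 1e-9),
    Pair("GENE_B", "disease_2", 96.0, 1e-6),
    Pair("GENE_C", "disease_3", 90.0, 1e-10),
    Pair("GENE_D", "disease_4", 99.9, 1e-3),
]

lenient = overlapping(pairs, lwas_cutoff=95.0, gwas_cutoff=1e-5)
strict = overlapping(pairs, lwas_cutoff=99.0, gwas_cutoff=1e-8)
```

In this example the lenient cutoffs (95th percentile, 10⁻⁵) keep GENE_A and GENE_B, while the strict cutoffs (99th percentile, 10⁻⁸) keep only GENE_A.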

FigShare