22 research outputs found

    Haploinsufficiency predictions without study bias

    Any given human individual carries multiple genetic variants that disrupt protein-coding genes, through structural variation, as well as nucleotide variants and indels. Predicting the phenotypic consequences of a gene disruption remains a significant challenge. Current approaches employ information from a range of biological networks to predict which human genes are haploinsufficient (meaning two copies are required for normal function) or essential (meaning at least one copy is required for viability). Using recently available study gene sets, we show that these approaches are strongly biased towards providing accurate predictions for well-studied genes. By contrast, we derive a haploinsufficiency score from a combination of unbiased large-scale high-throughput datasets, including gene co-expression and genetic variation in over 6000 human exomes. Our approach provides a haploinsufficiency prediction for over twice as many genes currently unassociated with papers listed in Pubmed as three commonly-used approaches, and outperforms these approaches for predicting haploinsufficiency for less-studied genes. We also show that fine-tuning the predictor on a set of well-studied ‘gold standard’ haploinsufficient genes does not improve the prediction for less-studied genes. This new score can readily be used to prioritize gene disruptions resulting from any genetic variant, including copy number variants, indels and single-nucleotide variants

    Genetic diagnosis of autoinflammatory disease patients using clinical exome sequencing

    Autoinflammatory diseases comprise a wide range of syndromes caused by dysregulation of the innate immune response. They are difficult to diagnose due to their phenotypic heterogeneity and variable expressivity. Thus, the genetic origin of the disease remains undetermined for an important proportion of patients. We aim to identify causal genetic variants in patients with suspected autoinflammatory disease and to test the advantages and limitations of the clinical exome gene panels for molecular diagnosis. Twenty-two unrelated patients with clinical features of autoinflammatory diseases were analyzed using clinical exome sequencing (~4800 genes), followed by bioinformatic analyses to detect likely pathogenic variants. By integrating genetic and clinical information, we found a likely causative heterozygous genetic variant in NFKBIA (p.D31N) in a North-African patient with a clinical picture resembling the deficiency of interleukin-1 receptor antagonist, and a heterozygous variant in DNASE2 (p.G322D) in a Spanish patient with a suspected lupus-like monogenic disorder. We also found variants likely to increase the susceptibility to autoinflammatory diseases in three additional Spanish patients: one with an initial diagnosis of juvenile idiopathic arthritis who carries two heterozygous UNC13D variants (p.R727Q and p.A59T), and two with early-onset inflammatory bowel disease harbouring NOD2 variants (p.L221R and p.A728V respectively). Our results show a similar proportion of molecular diagnosis to other studies using whole exome or targeted resequencing in primary immunodeficiencies. Thus, despite its main limitation of not including all candidate genes, clinical exome targeted sequencing can be an appropriate approach to detect likely causative variants in autoinflammatory diseases

    Nat Genet

    Current methods for annotating and interpreting human genetic variation tend to exploit a single information type (for example, conservation) and/or are restricted in scope (for example, to missense changes). Here we describe Combined Annotation-Dependent Depletion (CADD), a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants. We precompute C scores for all 8.6 billion possible human single-nucleotide variants and enable scoring of short insertions-deletions. C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.

    Meta-analysis of host response networks identifies a common core in tuberculosis

    Tuberculosis remains a major global health challenge, causing more than a million deaths annually. To determine newer methods for detecting and combating the disease, it is necessary to characterise global host responses to infection. Several high throughput omics studies have provided a rich resource including a list of several genes differentially regulated in tuberculosis. An integrated analysis of these studies is necessary to identify a unified response to the infection. Such data integration is met with several challenges owing to platform dependency, patient heterogeneity, and variability in the extent of infection, resulting in little overlap among different datasets. Network-based approaches offer newer alternatives to integrate and compare diverse data. In this study, we describe a meta-analysis of host whole-blood transcriptomic profiles that were integrated into a genome-scale protein–protein interaction network to generate response networks in active tuberculosis, and monitor their behaviour over treatment. We report the emergence of a highly active common core in disease, showing partial reversals upon treatment. The core comprises 380 genes in which STAT1, phospholipid scramblase 1 (PLSCR1), C1QB, OAS1, GBP2 and PSMB9 are prominent hubs. This network captures the interplay between several biological processes including pro-inflammatory responses, apoptosis, complement signalling, cytoskeletal rearrangement, and enhanced cytokine and chemokine signalling. The common core is specific to tuberculosis, and was validated on an independent dataset from an Indian cohort. A network-based approach thus enables the identification of common regulators that characterise the molecular response to infection, providing a platform-independent foundation to leverage maximum insights from available clinical data

    cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes

    It remains challenging to predict regulatory variants in particular tissues or cell types due to highly context-specific gene regulation. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues/cell types, we identify critical chromatin features that predict variant regulatory potential. We present cepip, a joint likelihood framework, for estimating a variant’s regulatory probability in a context-dependent manner. Our method exhibits significant GWAS signal enrichment and is superior to existing cell type-specific methods. Furthermore, using phenotypically relevant epigenomes to weight the GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test. The electronic version of this article is the complete one and can be found online at: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1177-

    Network types and their application in natural variation studies in plants

    We are in the age of data-driven biology. Not even a decade after the invention of high-throughput sequencing technologies, there are methods that accurately monitor DNA polymorphisms, transcription profiles, methylation states, transcription factor binding sites, chromatin compactness, nucleosome positions, dynamic histone marks, and so on. We are starting to generate comparable amounts of protein or metabolite data. A key issue is how we are going to make sense of all this information. Network analysis is the most promising method to integrate, query and display large amounts of data for human interpretation. This review briefly summarizes the basic types of networks, their properties and limitations. In addition, I introduce the application of networks to the study of the molecular mechanisms behind natural phenotypic variation

    Using network clustering to predict copy number variations associated with health disparities

    Substantial health disparities exist between African Americans and Caucasians in the United States. Copy number variations (CNVs) are one form of human genetic variations that have been linked with complex diseases and often occur at different frequencies among African Americans and Caucasian populations. In this study, we aimed to investigate whether CNVs with differential population frequencies can contribute to health disparities from the perspective of gene networks. We inferred network clusters from two different human gene/protein networks. We then evaluated each network cluster for the occurrences of known pathogenic genes and genes located in CNVs with different population frequencies, and used false discovery rates (FDRs) to rank network clusters. This approach let us identify five clusters enriched with known pathogenic genes and with genes located in CNVs with different frequencies between African Americans and Caucasians. These clustering patterns predict four candidate causal population-specific CNVs that play potential roles in health disparities