
    Identification of de novo variants in nonsyndromic cleft lip with/without cleft palate patients with low polygenic risk scores

    Background: Nonsyndromic cleft lip with/without cleft palate (nsCL/P) is a congenital malformation of multifactorial etiology. Research has identified >40 genome-wide significant risk loci, which explain less than 40% of nsCL/P heritability. Studies show that some of the hidden heritability is explained by rare penetrant variants. Methods: To identify new candidate genes, we searched for highly penetrant de novo variants (DNVs) in 50 nsCL/P patient/parent-trios with a low polygenic risk for the phenotype (discovery). We prioritized DNV-carrying candidate genes from the discovery for resequencing in independent cohorts of 1010 nsCL/P patients of diverse ethnicities and 1574 population-matched controls (replication). Segregation analyses and rare variant association in the replication cohort, in combination with additional data (genome-wide association data, expression, protein-protein-interactions), were used for final prioritization. Conclusion: In the discovery step, 60 DNVs were identified in 60 genes, including a variant in the established nsCL/P risk gene CDH1. Re-sequencing of 32 prioritized genes led to the identification of 373 rare, likely pathogenic variants. Finally, MDN1 and PAXIP1 were prioritized as top candidates. Our findings demonstrate that DNV detection, including polygenic risk score analysis, is a powerful tool for identifying nsCL/P candidate genes, which can also be applied to other multifactorial congenital malformations. Funding information: The present study was supported by the German Research Foundation (DFG)-Grants BE 3828/8-1, LU 1944/2-1, MA 2546/5-1, and LU1944/3-1. ACKNOWLEDGMENTS: The authors thank all patients, relatives, and control individuals for their participation. We thank the German support group for individuals with cleft lip and/or palate (Wolfgang Rosenthal Gesellschaft) for assistance with recruitment. We acknowledge the invaluable assistance of all clinical, laboratory, and bioinformatic personnel.
The authors thank the Next Generation Sequencing Core Facility of the Medical Faculty of the University of Bonn for sequencing the samples that were used in this study. DbGaP datasets were accessed through dbGaP accession number phs000094.v1.p1 (Supplemental Acknowledgments). Finally, the authors thank the Genome Aggregation Database (gnomAD), and all groups that provided exome and genome variant data to this resource. A full list of gnomAD contributors is provided in the gnomAD flagship paper (Karczewski et al., 2020). Open Access funding enabled and organized by Projekt DEAL

    Extending the allelic spectrum at noncoding risk loci of orofacial clefting

    Genome-wide association studies (GWAS) have generated unprecedented insights into the genetic etiology of orofacial clefting (OFC). The moderate effect sizes of associated noncoding risk variants and limited access to disease-relevant tissue represent considerable challenges for biological interpretation of genetic findings. As rare variants with stronger effect sizes are likely to also contribute to OFC, an alternative approach to delineate pathogenic mechanisms is to identify private mutations and/or an increased burden of rare variants in associated regions. This report describes a framework for targeted resequencing at selected noncoding risk loci contributing to nonsyndromic cleft lip with/without cleft palate (nsCL/P), the most frequent OFC subtype. Based on GWAS data, we selected three risk loci and identified candidate regulatory regions (CRRs) through the integration of credible SNP information, epigenetic data from relevant cells/tissues, and conservation scores. The CRRs (total 57 kb) were resequenced in a multiethnic study population (1061 patients; 1591 controls), using single-molecule molecular inversion probe technology. Combining evidence from in silico variant annotation, pedigree- and burden analyses, we identified 16 likely deleterious rare variants that represent new candidates for functional studies in nsCL/P. Our framework is scalable and represents a promising approach to the investigation of additional congenital malformations with multifactorial etiology

    Identification of de novo variants in nonsyndromic cleft lip with/without cleft palate patients with low polygenic risk scores

    [Background]: Nonsyndromic cleft lip with/without cleft palate (nsCL/P) is a congenital malformation of multifactorial etiology. Research has identified >40 genome-wide significant risk loci, which explain less than 40% of nsCL/P heritability. Studies show that some of the hidden heritability is explained by rare penetrant variants. [Methods]: To identify new candidate genes, we searched for highly penetrant de novo variants (DNVs) in 50 nsCL/P patient/parent-trios with a low polygenic risk for the phenotype (discovery). We prioritized DNV-carrying candidate genes from the discovery for resequencing in independent cohorts of 1010 nsCL/P patients of diverse ethnicities and 1574 population-matched controls (replication). Segregation analyses and rare variant association in the replication cohort, in combination with additional data (genome-wide association data, expression, protein–protein-interactions), were used for final prioritization. [Conclusion]: In the discovery step, 60 DNVs were identified in 60 genes, including a variant in the established nsCL/P risk gene CDH1. Re-sequencing of 32 prioritized genes led to the identification of 373 rare, likely pathogenic variants. Finally, MDN1 and PAXIP1 were prioritized as top candidates. Our findings demonstrate that DNV detection, including polygenic risk score analysis, is a powerful tool for identifying nsCL/P candidate genes, which can also be applied to other multifactorial congenital malformations. The present study was supported by the German Research Foundation (DFG)-Grants BE 3828/8-1, LU 1944/2-1, MA 2546/5-1, and LU1944/3-1

    Knowledge discovery in biological big data : Tailor-made data analysis algorithms integrating expert knowledge

    Over the course of recent decades, rapid technological advances have led to the advent of big data analysis within biology and environmental science fields. This development has been enabled by new technologies such as data sharing and storing, alongside novel high-throughput methods, to generate large datasets at comparably low costs. Biological big data share common characteristics including heterogeneity, a large number of variables, and high noise. Traditional methods for data analysis and visualization are often not able to handle these characteristics and therefore fail to extract biologically meaningful results. To separate relevant knowledge from random patterns, expert knowledge is needed. A promising way to solve this problem is to integrate this expert knowledge in data mining techniques, which are especially suited for the analysis of big data. The aim of this study is the integration of expert knowledge in the analysis of big biological data. To achieve this, a data analysis workflow utilizing the characteristics of biological data was developed. This workflow was applied to three different big biological datasets from environmental research: a) gene expression data from zebrafish (Danio rerio) following exposure to different environmental contaminants; b) taxonomic data and environmental parameters from a global soil-zoology database; c) fungal DNA sequence data from soil samples taken in differently managed forests. All three datasets were analysed via a data mining workflow, which consisted of preprocessing, application of a data mining algorithm, and visualisation, to handle the volume and complexity of the data. At different steps of the analysis workflow, domain-specific expert knowledge was integrated. In this manner, irrelevant or insignificant results were excluded, and only biologically meaningful results were derived.
The integration of expert knowledge in the analysis of the zebrafish data strongly reduced data noise to reveal genes and patterns that react specifically to one of the contaminants. An adapted version of the framework filtered out unimportant variables from the soil-zoology database and helped determine biologically relevant classes of the remaining parameters. Expert knowledge was then used to identify essential patterns in fungal communities and determine habitat-specific ecological guild compositions in the different forests. At specific steps, the collaboration of a domain expert and a data scientist turned out to be crucial for the success of the analysis. The workflow helped to identify these steps by subdividing the complex data analysis into smaller and more straightforward work tasks. Powerful visualizations were essential to enhance and improve the cooperation as they provided a platform for discussion and validation of the results. The ability to show multiple aspects of the data via a wide range of visualizations was one of the keys to the collaboration, and all three applications relied heavily on them. The results of the present thesis demonstrate how domain-specific expert knowledge can be used to improve the results of data mining approaches in the analysis of big, heterogeneous biological data. The cooperation of data scientists and domain experts made it possible to account for the characteristics of the individual subject-specific datasets, whilst maintaining the power of the data mining approaches