8 research outputs found
Identity-by-descent filtering of exome sequence data for disease–gene identification in autosomal recessive disorders
Motivation: Next-generation sequencing and exome-capture technologies are currently revolutionizing the way geneticists screen for disease-causing mutations in rare Mendelian disorders. However, the identification of causal mutations is challenging due to the sheer number of variants that are identified in individual exomes. Although databases such as dbSNP or HapMap can be used to reduce the plethora of candidate genes by filtering out common variants, the remaining set of genes still remains on the order of dozens
Using familial information for variant filtering in high-throughput sequencing studies
High-throughput sequencing studies (HTS) have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are not only applicable to family studies but are also useful in cohorts of apparently unrelated, 'sporadic' cases and small families underpowered for linkage and allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks, insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as in the study of populations poorly represented in variant databases typically used for filtering, and in the case of poor-quality HTS data
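The filtering idea described above can be illustrated with a minimal sketch. It assumes that IBD segments shared by the affected relatives have already been inferred by some other tool, and it simply keeps the candidate variants that fall inside those segments. The `Segment` and `Variant` types, the field names, and the example coordinates are all hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A genomic interval inferred to be shared IBD (hypothetical input)."""
    chrom: str
    start: int  # inclusive, base-pair coordinate
    end: int    # inclusive, base-pair coordinate


class Variant(NamedTuple):
    """A candidate variant left after frequency-based filtering."""
    chrom: str
    pos: int
    gene: str


def in_any_segment(variant: Variant, segments: List[Segment]) -> bool:
    # A variant is retained only if it lies within at least one IBD segment.
    return any(
        s.chrom == variant.chrom and s.start <= variant.pos <= s.end
        for s in segments
    )


def filter_by_ibd(variants: List[Variant], segments: List[Segment]) -> List[Variant]:
    """Keep variants located inside regions shared IBD by the affected relatives."""
    return [v for v in variants if in_any_segment(v, segments)]


# Made-up example data, for illustration only.
segments = [Segment("1", 1_000_000, 2_000_000), Segment("7", 500_000, 900_000)]
variants = [
    Variant("1", 1_500_000, "GENE_A"),  # inside the first segment
    Variant("1", 3_000_000, "GENE_B"),  # outside every segment
    Variant("7", 900_000, "GENE_C"),    # on the inclusive end boundary
]
kept = filter_by_ibd(variants, segments)
```

A linear scan over segments is enough for the handful of segments typical of a single pedigree; an interval tree would be a reasonable substitute for larger inputs.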
Discovering rare-disease-causing genes in the Whole Exome Sequencing (WES) era: analysis of a heterogeneous cohort of families with rare Mendelian diseases
An estimated ~7,000 rare Mendelian diseases exist, and while each is individually rare, together they contribute significantly to morbidity, mortality, and healthcare costs. Providing a timely and molecularly defined diagnosis is a goal of the scientific community: it enables adequate disease management, knowledge-based targeted treatments, surveillance programs for later-onset comorbidities, and genetic counseling on recurrence risks and prenatal diagnosis options for families. The recent development of methods for exome sequence capture, called Whole Exome Sequencing (WES), has made it possible to investigate all the coding variants present in an individual human genome, allowing screening of both known and unknown disease genes rapidly and cost-effectively. Taking advantage of WES, we studied a cohort of 25 families with different rare diseases, including Crisponi syndrome/Cold-induced sweating syndrome type 1, syndromic intellectual disabilities, progeroid-like syndrome, autosomal recessive osteopetrosis, genetic hearing loss, and epileptic encephalopathy. We found one or more pathogenic variants in 15/25 families and a putative pathogenic variant in 3/25 families. We discuss the issues related to the study of rare diseases and to the analysis of WES data, and conclude that WES is a powerful, cost-effective, and rapid way to discover new genes implicated in rare diseases
A Simulation-based Approach to Study Rare Variant Associations Across the Disease Spectrum
Although complete understanding of the mechanisms of rare genetic variants in disease continues to elude us, Next Generation Sequencing (NGS) has facilitated significant gene discoveries across the disease spectrum. However, the cost of NGS hinders its use for identifying rare variants in common diseases that require large samples. To circumvent the need for larger samples, designing efficient sampling studies is crucial in order to detect potential associations. This research therefore evaluates sampling designs for rare variant - quantitative trait association studies and assesses the effect on power that freely available public cohort data can have in the design. Performing simulations and evaluating common and unconventional sampling schemes results in several noteworthy findings. Specifically, the extreme-trait design is the most powerful design for analyzing quantitative traits. This research also shows that sampling more individuals from the extreme of clinical interest does not increase power.
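As a rough illustration of the extreme-trait design mentioned above, the following sketch selects the individuals with the lowest and highest values of a quantitative trait for sequencing. The function name, the equal tail sizes, and the example values are assumptions made for illustration; the thesis may define its sampling schemes differently.

```python
from typing import Dict, List, Tuple


def extreme_trait_sample(traits: Dict[str, float],
                         n_per_tail: int) -> Tuple[List[str], List[str]]:
    """Return (low_tail, high_tail) sample IDs for an extreme-trait design.

    `traits` maps a sample ID to its quantitative trait value.
    """
    if n_per_tail < 1:
        raise ValueError("n_per_tail must be at least 1")
    if 2 * n_per_tail > len(traits):
        raise ValueError("tails would overlap: cohort is too small")
    # Sort sample IDs by trait value, smallest first.
    ranked = sorted(traits, key=traits.get)
    low_tail = ranked[:n_per_tail]
    high_tail = ranked[len(ranked) - n_per_tail:]
    return low_tail, high_tail


# Made-up trait values, for illustration only.
cohort = {"s1": 4.2, "s2": 9.8, "s3": 5.1, "s4": 1.3, "s5": 7.7, "s6": 6.0}
low, high = extreme_trait_sample(cohort, n_per_tail=2)
```

The selected tails would then be sequenced and compared, instead of sequencing a random subset of the cohort.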
Variant filtering has served as a "proof-of-concept" approach for the discovery of disease-causing genes in Mendelian traits and formal statistical methods have been lacking in this area. However, combining variant filtering schemes with existing rare variant association tests is a practical alternative. Thus, this thesis also compares the robustness of six burden-based rare variant association tests for Mendelian traits after a variant filtering step in the presence of genetic heterogeneity and genotyping errors. This research shows that with low locus heterogeneity, these tests are powerful for testing association. With the exception of the weighted sum statistic (WSS), the remaining tests were very conservative in preserving the type I error when the number of affected and unaffected individuals was unequal. The WSS, on the other hand, had inflated type I error as the number of unaffected individuals increased. The framework presented can serve as a catalyst to improve sampling design and to develop robust statistical methods for association testing
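To make the weighted sum statistic more concrete, here is a minimal sketch of its per-individual burden score, using the weighting commonly attributed to Madsen and Browning: each variant is down-weighted by an allele-frequency estimate taken from unaffected individuals only. The abstract does not state which variant of the test was used, so the formula, the function name, and the toy genotypes are assumptions. The full test goes on to rank these scores and assess the rank sum of the affected individuals by permutation, which is omitted here.

```python
import math
from typing import List, Sequence


def wss_scores(genotypes: Sequence[Sequence[int]],
               affected: Sequence[bool]) -> List[float]:
    """Per-individual weighted burden scores.

    genotypes[j][i] is the number of minor alleles (0, 1 or 2) carried by
    individual j at variant i; affected[j] is that individual's status.
    Assumes complete genotype data (no missing calls).
    """
    n_ind = len(genotypes)
    n_var = len(genotypes[0])
    n_unaff = sum(1 for a in affected if not a)

    weights = []
    for i in range(n_var):
        # Minor-allele count among unaffected individuals only.
        m_u = sum(genotypes[j][i] for j in range(n_ind) if not affected[j])
        # Smoothed frequency estimate; always strictly between 0 and 1.
        q = (m_u + 1) / (2 * n_unaff + 2)
        weights.append(math.sqrt(n_ind * q * (1 - q)))

    # Rarer variants have smaller weights and so contribute more per allele.
    return [sum(g[i] / weights[i] for i in range(n_var)) for g in genotypes]


# Toy data: one affected individual carrying a variant unseen in unaffecteds.
genotypes = [[1, 0], [0, 0], [0, 1], [0, 1]]
affected = [True, False, False, False]
scores = wss_scores(genotypes, affected)
```

Because the weights depend on the unaffected group alone, the score is sensitive to the ratio of affected to unaffected individuals, which is the setting in which the abstract reports inflated type I error.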
Quantifying recent variation and relatedness in human populations
Advances in the genetic analysis of humans have revealed a surprising abundance of local relatedness between purportedly unrelated individuals. Where common mutations classically inform us of ancient relationships, such segments of pairwise identical by descent (IBD) sharing from a common ancestor are the observable traces of recent inter-mating. Combining these two distinct sources of information can help disentangle the complex genetic structure and flux in human populations. When considered together with a heritable trait, the segments can also be used to interrogate unascertained rare variation and help in locating trait-affecting loci. This work presents methods for comprehensive analysis of population-wide IBD and explores applications to disease and the understanding of recent genetic variation. We propose several strategies for efficient detection of IBD segments in population genotype data. Our novel seed-based algorithm, GERMLINE, can reduce the computational burden of finding pairwise segments from quadratic to nearly linear time in a general population. We demonstrate that this approach is several orders of magnitude faster than the available all-pairs methods while maintaining higher accuracy. Next, we extended the GERMLINE technique to process cohorts of unlimited size by adaptively adjusting the search mechanism to meet resource restrictions. We confirm its effectiveness with an analysis of 50,000 individuals where contemporary methods can only process a few thousand. One drawback of these two algorithms is the dependence on phased haplotype data as input - a constraint that becomes more difficult with large populations. We propose a solution to this problem with an algorithm that analyzes genotype data directly by exploring all potential haplotypes and scoring each putative segment based on linkage-disequilibrium.
This solution significantly outperforms available methods when applied to full sequence data and is computationally efficient enough to analyze thousands of sequenced genomes where current methods can only determine haplotypes for several hundred. Secondly, we outline two algorithms for analyzing available IBD segments to increase our understanding of rare variation and complex disease. Motivated by whole-genome sequencing, we present the INFOSTIP algorithm, which uses IBD segments to optimize the selection of individuals for complete population ascertainment. In simulations, we show that INFOSTIP selection can significantly increase variant inference accuracy over random sampling and posit inference of 60% of an isolated population from 1% optimally selected individuals. Seeking to move beyond pairwise IBD segment analysis, we describe the DASH algorithm, which groups shared segments into IBD "clusters" that are likely to be commonly co-inherited and uses them as proxies for un-typed variation. In simulated disease studies, we show this reference-free approach to be much more powerful for detecting rare causal variants than either traditional single-marker analysis or imputation from a general reference panel. Applying the DASH algorithm to disease traits from different populations, we identify multiple novel loci of association. Together, these novel techniques integrate the power of population and disease genetics
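The seed-based idea behind fast IBD detection can be sketched in a heavily simplified form: cut each phased haplotype into fixed windows, hash the window contents so that only haplotypes with an identical window are paired, and join adjacent matching windows into longer segments. This is a toy illustration under stated assumptions (exact matches only, no tolerance for genotyping error, no genetic-distance threshold) and is not the GERMLINE implementation. The function name and input encoding are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def seed_matches(haplotypes: Dict[str, str],
                 window: int) -> Dict[Tuple[str, str], List[Tuple[int, int]]]:
    """Find runs of identical windows shared by pairs of haplotypes.

    `haplotypes` maps a haplotype ID to a string of alleles (e.g. "0110"),
    all of equal length. Returns, for each pair, a list of half-open marker
    intervals [start, end) over which the two haplotypes match exactly.
    """
    n_markers = len(next(iter(haplotypes.values())))
    segments: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)

    for start in range(0, n_markers - window + 1, window):
        # Bucket haplotypes by window content, so that only haplotypes
        # in the same bucket are ever compared with one another.
        buckets: Dict[str, List[str]] = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(name)

        for names in buckets.values():
            for a, b in combinations(sorted(names), 2):
                runs = segments[(a, b)]
                if runs and runs[-1][1] == start:
                    # Adjacent to the previous match: extend that segment.
                    runs[-1] = (runs[-1][0], start + window)
                else:
                    runs.append((start, start + window))
    return dict(segments)


# Toy haplotypes, for illustration only.
haps = {"h1": "01011010", "h2": "01011111", "h3": "11110010"}
shared = seed_matches(haps, window=2)
```

In a real analysis, short runs would be discarded and only segments above a length threshold would be reported as putative IBD.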
New techniques to detect genomic variation
Variation in the structure and composition of DNA is found throughout our genome. All types of variation are collectively called genomic variation. Identification and analysis of genomic variation is important to distinguish neutral (non-pathogenic) variants from variants involved in disease (pathogenic). Identification of new disease genes will increase our knowledge of the molecular pathogenesis of genetic disorders. Every technical advance in genetic analysis has revealed new levels of variation, ranging from single nucleotide differences to full chromosome changes. As new DNA methods are applied, increasing numbers of variants with unclear significance to disease (UVs) are identified and choices have to be made regarding the variants that deserve follow-up work. When the pathogenic consequence of a variant is unclear, the effect has to be studied in detail at other levels (functional studies, RNA studies, in silico analysis tools, and databases). The research described in this thesis outlines the rapid development and application of molecular techniques for detecting (pathogenic) genomic variation in the context of genetic disorders.