595 research outputs found

    Successful identification of rare variants using oligogenic segregation analysis as a prioritizing tool for whole-exome sequencing studies

    We aim to identify rare variants that have large effects on trait variance using a cost-efficient strategy. We use an oligogenic segregation analysis as a prioritizing tool for whole-exome sequencing studies to identify families more likely to harbor rare variants, by estimating the mean number of quantitative trait loci (QTLs) in each family. We hypothesize that families with additional QTLs, relative to the other families, are more likely to segregate functional rare variants. We test the association of rare variants with the traits only in regions where at least modest evidence of linkage with the trait is observed, thereby reducing the number of tests performed. We found that family 7 harbored an estimated two, one, and zero additional QTLs for traits Q1, Q2, and Q4, respectively. Two rare variants (C4S4935 and C6S2981) segregating in family 7 were associated with Q1 and explained a substantial proportion of the observed linkage signal. These rare variants have 31 and 22 carriers, respectively, in the 128-member family and entered through a single but different founder. For Q2, we found one rare variant unique to family 7 that showed small effect and weak evidence of association; this was a false positive. These results are a proof of principle that prioritizing the sequencing of carefully selected extended families is a simple and cost-efficient design strategy for sequencing studies that aim to identify functional rare variants.

    Strategies for selection of subjects for sequencing after detection of a linkage peak

    Linkage analysis has the potential to localize disease genes of interest, but the choice of which subjects to select for follow-up sequencing after identifying a linkage peak might influence the ability to find a disease gene. We compare nine different strategies for selection of subjects for follow-up sequencing using sequence data from the Genetic Analysis Workshop 17. We found that our more selective strategies, which included methods to identify case subjects more likely to be affected by genetic causes, outperformed sequencing all case and control subjects in linked pedigrees and required sequencing fewer individuals. We found that using genotype data from population control subjects had a higher benefit-cost ratio than sequencing control subjects selected as being the opposite extreme of the case subjects. We conclude that choosing case subjects for sequencing based on more selective strategies can be reliable and cost-effective.

    Enhancing the discovery of rare disease variants through hierarchical modeling

    Advances in next-generation sequencing technology are enabling researchers to capture a comprehensive picture of genomic variation across large numbers of individuals with unprecedented levels of efficiency. The main analytic challenge in disease mapping is how to mine the data for rare causal variants among a sea of neutral variation. To achieve this goal, investigators have proposed a number of methods that exploit biological knowledge. In this paper, I propose applying a Bayesian stochastic search variable selection algorithm in this context. My multivariate method is inspired by the combined multivariate and collapsing method. In this proposed method, however, I allow an arbitrary number of different sources of biological knowledge to inform the model as prior distributions in a two-level hierarchical model. This allows rare variants with similar prior distributions to share evidence of association. Using the 1000 Genomes Project single-nucleotide polymorphism data provided by Genetic Analysis Workshop 17, I show that through biologically informative prior distributions, some power can be gained over noninformative prior distributions.

    Two-stage study designs combining genome-wide association studies, tag single-nucleotide polymorphisms, and exome sequencing: accuracy of genetic effect estimates

    Genome-wide association studies (GWAS) test for disease-trait associations and estimate effect sizes at tag single-nucleotide polymorphisms (SNPs), which imperfectly capture variation at causal SNPs. Sequencing studies can examine potential causal SNPs directly; however, sequencing the whole genome or exome can be prohibitively expensive. Costs can be limited by using a GWAS to detect the associated region(s) at tag SNPs followed by targeted sequencing to identify and estimate the effect size of the causal variant. Genetic effect estimates obtained from association studies can be inflated because of a form of selection bias known as the winner’s curse. Conversely, estimates at tag SNPs can be attenuated compared to the causal SNP because of incomplete linkage disequilibrium. These two effects oppose each other. Analysis of rare SNPs further complicates our understanding of the winner’s curse because rare SNPs are difficult to tag and analysis can involve collapsing over multiple rare variants. In two-stage analysis of Genetic Analysis Workshop 17 simulated data sets, we find that selection at the tag SNP produces upward bias in the estimate of effect at the causal SNP, even when the tag and causal SNPs are not well correlated. The bias similarly carries through to effect estimates for rare variant summary measures. Replication studies designed with sample sizes computed using biased estimates will be underpowered to detect a disease-causing variant. Accounting for bias in the original study is critical to avoid discarding disease-associated SNPs at follow-up.
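
The winner's curse described in this abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' two-stage analysis: the true effect, standard error, number of simulated studies, and one-sided significance threshold are all assumed values chosen for illustration.

```python
import random
import statistics


def simulate_winners_curse(true_beta=0.1, se=0.05, n_studies=20000,
                           z_threshold=1.96, seed=17):
    """Compare the mean effect estimate over all simulated studies with the
    mean over only those studies that pass a significance threshold.

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # Each study yields an unbiased but noisy estimate of the true effect.
    estimates = [rng.gauss(true_beta, se) for _ in range(n_studies)]
    # Keep only estimates whose z-score exceeds the (one-sided) threshold.
    selected = [b for b in estimates if b / se > z_threshold]
    return statistics.mean(estimates), statistics.mean(selected), len(selected)


mean_all, mean_selected, n_selected = simulate_winners_curse()
print(f"mean over all studies:      {mean_all:.4f}")
print(f"mean over selected studies: {mean_selected:.4f} (n = {n_selected})")
```

Under these assumptions the unselected mean stays close to the true effect, while the mean among the selected studies is inflated. A replication sample size computed from the inflated value would be too small, which is the underpowering the abstract warns about.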

    Shale anisotropy and natural hydraulic fracture propagation: An example from the Jurassic (Toarcian) Posidonienschiefer, Germany

    This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record. Data for this study are available at https://doi.org/10.26208/xny8-4t47. Cores recovered from the Jurassic (Toarcian) Posidonienschiefer (Posidonia Shale) in the Lower Saxony Basin, Germany, contain calcite-filled fractures (veins) at a low angle to bedding. The veins preferentially form where the shale is both organic rich and thermally mature, supporting previous interpretations that the veins formed as hydraulic fractures in response to volumetric expansion of organic material during catagenesis. Despite the presence of hydrocarbons during fracturing, the calcite fill is fibrous and so the veins appear to have contained a mineral-saturated aqueous solution as they formed. The veins also contain myriad host-rock inclusions having sub-millimetric spacing. These inclusions are strands of host rock that were entrained as the veins grew by separating the host rock along bedding planes, rather than cutting across planes. The veins therefore produce significantly more surface area (by a factor of roughly five, for the size of veins observed) compared to an inclusion-free fracture of the same size. Analysis of vein geometry indicates that, with propagation, fracture surface area increases with fracture length raised to a power between 1 and 2, assuming linear aperture-length scaling. As such, this type of fracture efficiently dissipates elastic strain energy as it lengthens, stabilizing propagation and precluding dynamic crack growth. The apparent separation of the host rock along bedding planes suggests that the mechanical weakness of bedding planes is the cause of this inherently stable style of propagation. University of Oxfor

    Identifying influential regions in extremely rare variants using a fixed-bin approach

    In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of single-nucleotide polymorphisms (SNPs) that exhibit a significant influence on response rate (proportion of subjects with an affirmative affected status), called the affected ratio, among rare variants. Under the null hypothesis, the distribution of rare variants is assumed to be uniform over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition is significantly different between case and control events, specifically where there are unusually high numbers of rare variants among affected subjects. We focus on private variants, which require a degree of “collapsing” to combine information over several SNPs, to obtain meaningful results. Instead of implementing a gene-based approach, where regions would vary in size and sometimes be too small to achieve a strong enough signal, we implement a fixed-bin approach, with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Through application of 100-SNP and 30-SNP fixed bins, we identify several of the most influential regions, which later are seen to contain some of the causal SNPs. The 100- and 30-SNP approaches detected seven and three causal SNPs among the most significant regions, respectively, with two overlapping SNPs located in the ELAVL4 gene, reported by both procedures.
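
The fixed-bin idea in this abstract can be sketched as follows. The abstract does not name the test statistic, so this example assumes a one-sided exact binomial test: under the null, each rare allele in a bin falls in a case subject with probability equal to the overall case fraction. The function names, the 0/1 genotype coding, and the toy data are hypothetical.

```python
from math import comb


def fixed_bins(n_snps, bin_size):
    """Split SNP indices 0..n_snps-1 into consecutive bins of bin_size SNPs."""
    return [range(start, min(start + bin_size, n_snps))
            for start in range(0, n_snps, bin_size)]


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def score_bins(genotypes, affected, bin_size):
    """Collapse rare-allele counts within each fixed bin and test for an
    excess among cases. genotypes[s][j] is 1 if subject s carries the rare
    allele at SNP j, else 0; affected[s] is 1 for cases, 0 for controls.
    """
    n_snps = len(genotypes[0])
    case_fraction = sum(affected) / len(affected)
    results = []
    for snps in fixed_bins(n_snps, bin_size):
        case_count = total = 0
        for row, is_case in zip(genotypes, affected):
            carried = sum(row[j] for j in snps)
            total += carried
            case_count += carried if is_case else 0
        p_value = (binomial_upper_tail(case_count, total, case_fraction)
                   if total else 1.0)
        results.append((snps.start, snps.stop, case_count, total, p_value))
    return results


# Toy data: 4 subjects (2 cases, 2 controls), 4 SNPs, bins of 2 SNPs.
genotypes = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
affected = [1, 1, 0, 0]
results = score_bins(genotypes, affected, bin_size=2)
for start, stop, case_count, total, p_value in results:
    print(f"SNPs [{start}, {stop}): {case_count}/{total} in cases, p = {p_value:.3f}")
```

Bins with the smallest p-values would be the candidate "most influential regions". With real data, the bin size (100 or 30 SNPs in the abstract) trades resolution against the number of rare alleles available per bin.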

    A pathway-based association analysis model using common and rare variants

    How various genetic effects in combination affect susceptibility to certain disease states continues to be a major area of methodological research. Various rare variant models have been proposed, in response to a common failure to either identify or validate biologically driven causal genetic variants in genome-wide association studies. Adopting the idea that multiple rare variants may effectively produce a combined effect equal to a single common variant effect through common linkage with this variant, we construct a pathway-based genetic association analysis model using both common and rare variants. This genetic model is applied to the disease status of unrelated individuals in replication 1 from Genetic Analysis Workshop 17. In this simulated example, we were able to identify several pathways that were potentially associated with the disease status and found that common variants showed a stronger genetic effect than rare variants.

    Enriching rare variants using family-specific linkage information

    Genome-wide association studies have been successful in identifying common variants for common complex traits in recent years. However, common variants have generally failed to explain substantial proportions of the trait heritabilities. Rare variants, structural variations, and gene-gene and gene-environment interactions, among others, have been suggested as potential sources of the so-called missing heritability. With the advent of exome-wide and whole-genome next-generation sequencing technologies, finding rare variants in functionally important sites (e.g., protein-coding regions) becomes feasible. We investigate the role of linkage information to select families enriched for rare variants using the simulated Genetic Analysis Workshop 17 data. In each replicate of simulated phenotypes Q1 and Q2 on 697 subjects in 8 extended pedigrees, we select one pedigree with the largest family-specific LOD score. Across all 200 replications, we compare the probability that rare causal alleles will be carried in the selected pedigree versus a randomly chosen pedigree. One example of successful enrichment was exhibited for gene VEGFC. The causal variant had minor allele frequency of 0.0717% in the simulated unrelated individuals and explained about 0.1% of the phenotypic variance. However, it explained 7.9% of the phenotypic variance in the eight simulated pedigrees and 23.8% in the family that carried the minor allele. The carrier’s family was selected in all 200 replications. Thus our results show that family-specific linkage information is useful for selecting families for sequencing, ensuring that rare functional variants are segregating in the sequencing samples.
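
Why the same variant can explain far more variance inside a carrier family than in the population can be sketched with the standard additive-model expression 2p(1 - p)β² for the variance contributed by one biallelic locus. This is an illustration only: it assumes Hardy-Weinberg proportions (which do not hold exactly within a pedigree), and the effect size, residual variance, and within-family allele frequency below are hypothetical values, not figures from the abstract. Only the population allele frequency of 0.0717% is taken from the text.

```python
def variance_explained(maf, beta, residual_var=1.0):
    """Fraction of phenotypic variance due to one additive biallelic variant,
    assuming Hardy-Weinberg proportions and a fixed residual variance."""
    genetic_var = 2.0 * maf * (1.0 - maf) * beta ** 2
    return genetic_var / (genetic_var + residual_var)


beta = 1.0  # hypothetical per-allele effect, in residual standard deviations

# Allele frequency in the unrelated sample, from the abstract (0.0717%).
population = variance_explained(0.000717, beta)

# Hypothetical allele frequency within a family enriched for the variant.
family = variance_explained(0.10, beta)

print(f"population: {population:.4%}")
print(f"family:     {family:.4%}")
```

The effect size is identical in both calls; only the allele frequency changes. That is the sense in which selecting the linked family "enriches" the sample for the rare variant.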

    Do rare variant genotypes predict common variant genotypes?

    The synthetic association hypothesis proposes that common genetic variants detectable in genome-wide association studies may reflect the net phenotypic effect of multiple rare polymorphisms distributed broadly within the focal gene rather than, as often assumed, the effect of common functional variants in high linkage disequilibrium with the focal marker. In a recent study, Dickson and colleagues demonstrated synthetic association in simulations and in two well-characterized, highly polymorphic human disease genes. The converse of this hypothesis is that rare variant genotypes must be correlated with common variant genotypes often enough to make the phenomenon of synthetic association possible. Here we used the exome genotype data provided for Genetic Analysis Workshop 17 to ask how often, how well, and under what conditions rare variant genotypes predict the genotypes of common variants within the same gene. We found nominal evidence of correlation between rare and common variants in 21-30% of cases examined for unrelated individuals; this rate increased to 38-44% for related individuals, underscoring the segregation that underlies synthetic association.