124 research outputs found

    MAPPING GENES FOR QUANTITATIVE TRAITS USING SELECTED SAMPLES OF SIBLING PAIRS

    One of the most important research areas in human genetics is the effort to map genes associated with complex diseases such as cancer, heart disease, and diabetes. The public health relevance of these kinds of work is that gene mapping will bring an understanding of genetic risk and protective factors, and a description of the interaction between environment and genetic variation. In the last ten years there has been a dramatic increase in the number of studies seeking to map genes for quantitative traits. This has caused an explosion of new work on statistical methods for human quantitative trait locus (QTL) mapping. However, little of that work has dealt with selected samples, which are more common than population samples for human studies. This dissertation focuses on sibling pairs and considers the most common types of selected sampling. I surveyed most QTL mapping methods in the literature to evaluate which are appropriate for selected samples, and also developed new statistics for selected samples. Using simulation and analytical approaches, I identified the most powerful statistics for each type of sampling considered. I then compared various sampling designs using the best statistic for each and gave guidelines for choosing appropriate and powerful designs under different scenarios

    A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection

    Background: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome from which genomic signals are detected (e.g. copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution by leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to speed up substantially the GLM+NB method by using a randomized algorithm and we demonstrate here the utility of our approach in the application of detecting copy number variants (CNVs) using a real example. Results: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Conclusions: Our results suggest that the RGE strategy developed here could be applied to other GLM+NB based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power
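
The idea of estimating regression coefficients on a sample of the windows, rather than on all of them, can be illustrated with a minimal sketch. This is not the authors' RGE: it assumes uniform subsampling, swaps the negative-binomial GLM for a simpler Poisson GLM with a single hypothetical covariate (GC fraction), and uses simulated counts. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson variate (Knuth's method; fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_windows(rng, n, b0, b1):
    """Simulated windowed read counts: count ~ Poisson(exp(b0 + b1 * gc))."""
    xs = [rng.uniform(0.3, 0.7) for _ in range(n)]  # hypothetical GC fraction
    ys = [poisson_draw(rng, math.exp(b0 + b1 * x)) for x in xs]
    return xs, ys


def fit_poisson_glm(xs, ys, iters=25):
    """Fit intercept and slope of a log-link Poisson GLM by Newton's method."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # score for the intercept
            g1 += (y - mu) * x      # score for the slope
            h00 += mu               # Fisher information entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def subsample_fit(rng, xs, ys, m):
    """Fit the same model on m windows drawn uniformly without replacement."""
    idx = rng.sample(range(len(xs)), m)
    return fit_poisson_glm([xs[i] for i in idx], [ys[i] for i in idx])


rng = random.Random(0)
xs, ys = simulate_windows(rng, 20000, b0=2.0, b1=1.5)
full = fit_poisson_glm(xs, ys)          # all 20,000 windows
sub = subsample_fit(rng, xs, ys, 2000)  # a 10% sample
print("full-data estimate:", full)
print("subsample estimate:", sub)
```

In this toy setting the subsample fit touches one tenth of the windows per iteration, at the cost of a larger sampling variance in the coefficients; the abstract's consistency and variance results concern that trade-off for the actual estimator.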

    Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/

    Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation

    Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available
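
As a rough illustration of read-depth segmentation with a hidden Markov model, the sketch below decodes the most likely copy-number path with the Viterbi algorithm. It is not GENSENG: it assumes Poisson emissions with no covariates in place of negative binomial regression, and the state set, the `stay` probability, and the `diploid_depth` parameter are all hypothetical choices.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(counts, diploid_depth, states=(1, 2, 3), stay=0.99):
    """Most likely copy-number state per window (log-space Viterbi)."""
    n = len(states)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    # Expected depth scales linearly with copy number (2 copies = diploid).
    means = [diploid_depth * cn / 2.0 for cn in states]

    score = [math.log(1.0 / n) + poisson_logpmf(counts[0], m) for m in means]
    back = []
    for y in counts[1:]:
        prev, pointers, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + poisson_logpmf(y, means[j]))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


# Toy input: a ten-window stretch at half depth inside a diploid background.
toy_counts = [30] * 10 + [15] * 10 + [30] * 10
calls = viterbi_copy_number(toy_counts, diploid_depth=30)
print(calls)
```

The method described in the abstract additionally models confounders inside the emission distribution, which is what allows bias correction and segmentation to happen in one step; this sketch omits that part entirely.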

    CGDSNPdb: a database resource for error-checked and imputed mouse SNPs

    The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600 000 SNPs and over 900 000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login

    A New Method for Detecting Associations with Rare Copy-Number Variants

    Copy number variants (CNVs) play an important role in the etiology of many diseases such as cancers and psychiatric disorders. Due to a modest marginal effect size or the rarity of the CNVs, collapsing rare CNVs together and collectively evaluating their effect serves as a key approach to evaluating the collective effect of rare CNVs on disease risk. While a plethora of powerful collapsing methods are available for sequence variants (e.g., SNPs) in association analysis, these methods cannot be directly applied to rare CNVs due to the CNV-specific challenges, i.e., the multi-faceted nature of CNV polymorphisms (e.g., CNVs vary in size, type, dosage, and details of gene disruption), and etiological heterogeneity (e.g., heterogeneous effects of duplications and deletions that occur within a locus or in different loci). Existing CNV collapsing analysis methods (a.k.a. the burden test) tend to have suboptimal performance due to the fact that these methods often ignore heterogeneity and evaluate only the marginal effects of a CNV feature. We introduce CCRET, a random effects test for collapsing rare CNVs when searching for disease associations. CCRET is applicable to variants measured on a multi-categorical scale, collectively modeling the effects of multiple CNV features, and is robust to etiological heterogeneity. Multiple confounders can be simultaneously corrected. To evaluate the performance of CCRET, we conducted extensive simulations and analyzed large-scale schizophrenia datasets. We show that CCRET has powerful and robust performance under multiple types of etiological heterogeneity, and has performance comparable to or better than existing methods when there is no heterogeneity

    The Recombinational Anatomy of a Mouse Chromosome

    Among mammals, genetic recombination occurs at highly delimited sites known as recombination hotspots. They are typically 1–2 kb long and vary as much as 1,000-fold or more in recombination activity. Although much is known about the molecular details of the recombination process itself, the factors determining the location and relative activity of hotspots are poorly understood. To further our understanding, we have collected and mapped the locations of 5,472 crossover events along mouse Chromosome 1 arising in 6,028 meioses of male and female reciprocal F1 hybrids of C57BL/6J and CAST/EiJ mice. Crossovers were mapped to a minimum resolution of 225 kb, and those in the telomere-proximal 24.7 Mb were further mapped to resolve individual hotspots. Recombination rates were evolutionarily conserved on a regional scale, but not at the local level. There was a clear negative-exponential relationship between the relative activity and abundance of hotspot activity classes, such that a small number of the most active hotspots account for the majority of recombination. Females had 1.2× higher overall recombination than males did, although the sex ratio showed considerable regional variation. Locally, entirely sex-specific hotspots were rare. The initiation of recombination at the most active hotspot was regulated independently on the two parental chromatids, and analysis of reciprocal crosses indicated that parental imprinting has subtle effects on recombination rates. It appears that the regulation of mammalian recombination is a complex, dynamic process involving multiple factors reflecting species, sex, individual variation within species, and the properties of individual hotspots

    An imputed genotype resource for the laboratory mouse

    We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred mouse strains of the laboratory mouse by combining data available from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the quality of the imputed genotypes and demonstrate that they are highly reliable for most inbred strains. The imputed SNP resource will be useful for studies of natural variation and complex traits. It will facilitate association study designs by providing high density SNP genotypes for large numbers of mouse strains. We anticipate that this resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at http://cgd.jax.org/

    One CNV Discordance in NRXN1 Observed Upon Genome-wide Screening in 38 Pairs of Adult Healthy Monozygotic Twins

    Monozygotic (MZ) twins stem from the same single fertilized egg and therefore share all their inherited genetic variation. This is one of the unequivocal facts on which genetic epidemiology and twin studies are based. To what extent this also implies that MZ twins share genotypes in adult tissues is not precisely established, but a common pragmatic assumption is that MZ twins are 100% genetically identical also in adult tissues. During the past decade, this view has been challenged by several reports, with observations of differences in post-zygotic copy number variations (CNVs) between members of the same MZ pair. In this study, we performed a systematic search for differences of CNVs within 38 adult MZ pairs who had been misclassified as dizygotic (DZ) twins by questionnaire-based assessment. Initial scoring by PennCNV suggested a total of 967 CNV discordances. The within-pair correlation in number of CNVs detected was strongly dependent on confidence score filtering and reached a plateau of r = 0.8 when restricting to CNVs detected with confidence score larger than 50. The top-ranked discordances were subsequently selected for validation by quantitative polymerase chain reaction (qPCR), from which one single ~120kb deletion in NRXN1 on chromosome 2 (bp 51017111-51136802) was validated. Despite involving an exon, no sign of cognitive/mental consequences was apparent in the affected twin pair, potentially reflecting limited or lack of expression of the transcripts containing this exon in nerve/brain

    A customized and versatile high-density genotyping array for the mouse

    We designed a high-density mouse genotyping array containing 623,124 SNPs that capture the known genetic variation present in the laboratory mouse. The array also contains 916,269 invariant genomic probes that are targeted to functional elements and regions known to harbor segmental duplications. The array opens the door to the characterization of genetic diversity, copy number variation, allele specific gene expression and DNA methylation and will extend the successes of human genome-wide association studies to the mouse