252 research outputs found
Fourteen years of R/qtl: Just barely sustainable
R/qtl is an R package for mapping quantitative trait loci (genetic loci that
contribute to variation in quantitative traits) in experimental crosses. Its
development began in 2000. There have been 38 software releases since 2001. The
latest release contains 35k lines of R code and 24k lines of C code, plus 15k
lines of code for the documentation. Challenges in the development and
maintenance of the software are discussed. A key to the success of R/qtl is
that it remains a central tool for the chief developer's own research work, and
so its maintenance is of selfish importance.Comment: Previously submission to First Workshop on Sustainable Software for
Science: Practice and Experiences (WSSSPE),
http://wssspe.researchcomputing.org.uk; revised for submission to the Journal
of Open Research Software, http://openresearchsoftware.metajnl.com
USE OF HIDDEN MARKOV MODELS FOR QTL MAPPING
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem
The Genomes of Recombinant Inbred Lines: The Gory Details
Recombinant inbred lines (RILs) can serve as powerful tools for genetic mapping. Recently, members of the Complex Trait Consortium have proposed the development of a large panel of eight-way RILs in the mouse, derived from eight genetically diverse parental strains. Such a panel would be a valuable community resource. The use of such eight-way RILs will require a detailed understanding of the relationship between alleles at linked loci on an RI chromosome. We extend the work of Haldane and Waddington (1931) on twoway RILs and describe the map expansion, clustering of breakpoints, and other features of the genomes of multiple-strain RILs as a function of the level of crossover interference in meiosis.
In this technical report, we present all of our results, in their gory detail. We don’t intend to include such details in the final publication, but want to present them here for those who might be interested
Estimating the Number of Essential Genes in a Genome by Random Transposon Mutagenesis
We describe a Bayesian method for estimating the number of essential genes in a genome, on the basis of data on viable mutants for which a single transposon was inserted after a random TA site in a genome,potentially disrupting a gene. The prior distribution for the number of essential genes was taken to be uniform. A Gibbs sampler was used to estimate the posterior distribution. The method is illustrated with simulated data. Further simulations were used to study the performance of the procedure
Genotype Probabilities at Intermediate Generations in the Construction of Recombinant Inbred Lines
The mouse Collaborative Cross (CC) is a panel of eight-way recombinant inbred lines: eight diverse parental strains are intermated, followed by repeated sibling mating, many times in parallel, to create a new set of inbred lines whose genomes are random mosaics of the genomes of the original eight strains. Many generations are required to reach inbreeding, and so a number of investigators have sought to make use of phenotype and genotype data on mice from intermediate generations during the formation of the CC lines (so-called pre-CC mice). The development of a hidden Markov model for genotype reconstruction in such pre-CC mice, on the basis of incompletely informative genetic markers (such as single-nucleotide polymorphisms), formally requires the two-locus genotype probabilities at an arbitrary generation along the path to inbreeding. In this article, I describe my efforts to calculate such probabilities. While closed-form solutions for the two-locus genotype probabilities could not be derived, I provide a prescription for calculating such probabilities numerically. In addition, I present a number of useful quantities, including single-locus genotype probabilities, two-locus haplotype probabilities, and the fixation probability and map expansion at each generation along the course to inbreeding
Unification of Variance Components and Haseman-Elston Regression for Quantitative Trait Linkage Analysis
Two of the major approaches for linkage analysis with quantitative traits in humans include variance components and Haseman-Elston regression. Previously, these have been viewed as quite separate methods. We describe a general model, fit by use of generalized estimating equations (GEE), for which the variance components and Haseman-Elston methods (including many of the extensions to the original Haseman-Elston method) are special cases, corresponding to different choices for a working covariance matrix. We also show that the regression-based test of Sham et al.(2002) is equivalent to a robust score statistic derived from our GEE approach. These results have several important implications. First, this work provides new insight regarding the connection between these methods. Second, asymptotic approximations for power and sample size allow clear comparisons regarding the relative efficiency of the different methods. Third, our general framework suggests important extensions to the Haseman-Elston approach which make more complete use of the data in extended pedigrees and allow a natural incorporation of environmental and other covariates
POOR PERFORMANCE OF BOOTSTRAP CONFIDENCE INTERVALS FOR THE LOCATION OF A QUANTITATIVE TRAIT LOUCS
The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this context. The profile likelihood (or LOD curve) for QTL location has a tendency to peak at genetic markers, and so the distribution of the maximum likelihood estimate (MLE) of QTL location has the unusual feature of point masses at genetic markers; this contributes to the poor behavior of the bootstrap. Likelihood support intervals and approximate Bayes credible intervals, on the other hand, are shown to behave appropriately
- …