
    Data-driven assessment of eQTL mapping methods

    Background: The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets often go underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. Often these methods are inappropriate for complex traits such as gene expression, particularly in the case of epistasis. Results: Here we compare legacy QTL mapping methods with several modern multi-locus methods and evaluate their ability to produce eQTL that agree with independent external data in a systematic way. We found that the modern multi-locus methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multi-locus methods. Conclusions: Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multi-locus and legacy eQTL mapping methods.
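    The point about epistasis can be illustrated with a toy example. The sketch below is not the benchmark or the Random Forests method from the paper; it assumes a hypothetical balanced cross of three 0/1-coded markers and a purely epistatic expression trait, and shows that a single-marker scan explains none of the variance while the joint genotype at the interacting pair explains all of it.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical toy cross: every combination of three biallelic markers (coded 0/1)
# appears exactly once, so the design is perfectly balanced.
genotypes = list(product([0, 1], repeat=3))

# Assumed purely epistatic trait: expression is high only when markers 0 and 1 differ.
expression = [float(g[0] ^ g[1]) for g in genotypes]


def variance_explained(marker_idx):
    """Fraction of expression variance explained by the joint genotype at marker_idx."""
    groups = {}
    for g, y in zip(genotypes, expression):
        key = tuple(g[i] for i in marker_idx)
        groups.setdefault(key, []).append(y)
    # Fit each individual with the mean of its genotype class, then compare variances.
    fitted = [mean(groups[tuple(g[i] for i in marker_idx)]) for g in genotypes]
    return pvariance(fitted) / pvariance(expression)


# Single-marker scan (one locus at a time) versus a two-locus scan.
single = {m: variance_explained((m,)) for m in range(3)}
pairs = {p: variance_explained(p) for p in combinations(range(3), 2)}

print("single-marker:", single)
print("two-locus:", pairs)
```

    In this constructed case every single marker has zero marginal effect, so any method that tests one locus at a time would miss both regulators; only models that consider loci jointly can recover them.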

    NATURAL AND ANTHROPOGENIC DRIVERS OF TREE EVOLUTIONARY DYNAMICS

    Species of trees inhabit diverse and heterogeneous environments, and often play important ecological roles in such communities. As a result of their vast ecological breadth, trees have become adapted to various environmental pressures. In this dissertation I examine various environmental factors that drive evolutionary dynamics in three Pinus species in California and Nevada, USA. In chapter two, I assess how management through thinning, fire, and their interaction influences fine-scale gene flow within fire-suppressed populations of Pinus lambertiana, a historically dominant and ecologically important member of mixed-conifer forests of the Sierra Nevada, California. Here, I find evidence that treatment prescription differentially affects fine-scale genetic structure and effective gene flow in this species. In my third chapter, I describe the development of a dense linkage map for Pinus balfouriana, which I use in chapter four to assess the quantitative trait locus (QTL) landscape of water-use efficiency across two isolated ranges of the species. I find evidence that precipitation-related variables structure the geographical range of P. balfouriana, that traits related to water-use efficiency are heritable and differentiated across populations, and that the associated QTLs underlying this phenotypic variation explain large proportions of total variation. In chapter five, I assess evidence for local adaptation to the eastern Sierra Nevada rain shadow within P. albicaulis across fine spatial scales of the Lake Tahoe Basin, USA. Here, genetic variation in traits related to water availability was structured across populations more strongly than neutral variation, and loci identified by genome-wide association methods show elevated signals of local adaptation that track soil water availability. In chapter six, I review theory related to polygenic local adaptation and the literature on genotype-phenotype associations in trees. I find that evidence suggests a polygenic basis for many traits important to conservation and industry, and I suggest paths forward for best describing such genetic bases in tree species. Overall, my results show that the spatial and genetic structure of trees is often driven by their environment, and that ongoing selective pressures driven by environmental change will continue to be important in these systems.

    Applications and extensions of Random Forests in genetic and environmental studies

    Transcriptional regulation refers to the molecular systems that control the concentration of mRNA species within the cell. Variation in these controlling systems is not only responsible for many diseases, but also contributes to the vast phenotypic diversity in the biological world. There are powerful experimental approaches to probe these regulatory systems, and the focus of my doctoral research has been to develop and apply effective computational methods that exploit these rich data sets more completely. First, I present a method for mapping genetic regulators of gene expression (expression quantitative trait loci, or eQTL) using Random Forests. This approach allows for flexible modeling and feature selection, and results in eQTL that are more biologically supportable than those mapped with competing methods. Next, I present a method that finds interactions between genes that in turn regulate the expression of other genes. This is accomplished by finding recurring decision motifs in the forest structure that represent dependencies between genetic loci. Third, I present a method to use distributional differences in eQTL data to establish the regulatory roles of genes relative to other disease-associated genes. Using this method, we found that genes that are master regulators of other disease genes are more likely to be consistently associated with the disease in genetic association studies. Finally, I present a novel application of Random Forests to determine the mode of regulation of toxin-perturbed genes, using time-resolved gene expression. The results demonstrate a novel approach to supervised weighted clustering of gene expression data

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research

    Genetic studies and improvement of Pinus caribaea Morelet


    Ecological parameters in a Bombina hybrid zone


    Molecular Marker Linkage Mapping in Southern Pine (Longleaf Pine and Slash Pine).

    The goal of this work was to develop molecular markers for use in a backcross breeding program to speed the introgression of genes influencing rapid early height growth (EHG) from slash pine (Pinus elliottii Engelm. var. elliottii) into longleaf pine (Pinus palustris Mill.). The efficacy of molecular markers for genetic mapping in the Pinaceae was determined in segregating haploid and diploid populations. Initial screening for genetic polymorphisms was conducted using the random amplified polymorphic DNA (RAPD) technique. Using DNAs obtained from haploid megagametophytes, a RAPD-based genetic map for longleaf pine clone 3-356 (16 linkage groups and 6 pairs (133 markers) covering 1,635 cM) was constructed. Concern regarding the efficacy of RAPD data led to a series of computer simulations investigating the effects of missing and mis-scored data on linkage group construction. Given the parameters investigated, levels as high as 15% missing data and 2% mis-scored data still provided accurate low- to medium-density map construction. Individual parental maps were constructed with F1 progeny from a slash pine H-28 (X) x longleaf pine 3-356 (X) cross. The longleaf pine 3-356 map consisted of 18 groups and 3 pairs (122 markers) covering 1367.5 cM, and the slash pine H-28 map of 13 groups and 6 pairs (91 markers) covering 952.9 cM. Orders and distances of loci in common between the two maps constructed for longleaf pine 3-356 were compared. Orders were found to be conserved for those groups containing three or more loci. However, genetic distance estimates varied considerably, though not in any systematic manner. RAPD and allozyme loci identified as being heterozygous in both parents were utilized to combine the parent-specific maps constructed for the slash pine x longleaf pine cross. Five RAPD loci and one allozyme locus suggested homology between the otherwise parent-specific linkage groups. Substantial phenotypic variation for EHG was observed in the F1 population; therefore, the parent-specific markers and maps were used to localize putative EHG QTL. Using simultaneous marker models (multiple regression), marker loci were found to be significantly associated with QTL influencing hypocotyl length, total height, brown spot resistance and root collar diameter.
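    The effect of missing and mis-scored data on a pairwise recombination estimate can be sketched as follows. This is not the simulation design used in the thesis, which examined linkage group construction; it is a minimal sketch that takes the 15% missing and 2% mis-scored rates from the abstract and assumes a hypothetical pair of haploid (megagametophyte) markers with a true recombination fraction of 0.10, 20,000 individuals, and independent scoring errors.

```python
import random


def observe(allele, missing, misscored, rng):
    """Return the scored allele, or None if the data point is missing."""
    if rng.random() < missing:
        return None
    if rng.random() < misscored:
        return 1 - allele  # mis-scored: the opposite allele is recorded
    return allele


def estimate_r(n, true_r, missing, misscored, seed=0):
    """Estimate the recombination fraction between two haploid markers."""
    rng = random.Random(seed)
    recombinant = informative = 0
    for _ in range(n):
        a = rng.randint(0, 1)
        b = 1 - a if rng.random() < true_r else a  # recombination switches the allele
        oa = observe(a, missing, misscored, rng)
        ob = observe(b, missing, misscored, rng)
        if oa is None or ob is None:
            continue  # a pair is informative only if both markers were scored
        informative += 1
        recombinant += oa != ob
    return recombinant / informative, informative


def expected_r(true_r, misscored):
    """Expected apparent recombination fraction under independent mis-scoring."""
    one_flip = 2 * misscored * (1 - misscored)  # exactly one of the two scores is wrong
    return true_r + (1 - 2 * true_r) * one_flip


# Assumed values for illustration: n = 20000, true r = 0.10.
clean_r, clean_n = estimate_r(20000, 0.10, 0.0, 0.0)
noisy_r, noisy_n = estimate_r(20000, 0.10, 0.15, 0.02)

print("clean data:", clean_r, clean_n)
print("15% missing, 2% mis-scored:", noisy_r, noisy_n)
print("expected apparent r:", expected_r(0.10, 0.02))
```

    Under these assumptions, missing data only reduces the number of informative individuals, whereas mis-scoring inflates the apparent recombination fraction (in expectation from 0.10 to about 0.13), which stretches map distances without necessarily breaking linkage.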

    Identification of quantitative trait loci influencing early height growth in longleaf pine (Pinus palustris Mill)

    The delay in early height growth (EHG) has been a limiting factor for artificial regeneration of longleaf pine (Pinus palustris Mill.). Simple Sequence Repeat (SSR) markers were used to map the genome and the quantitative trait loci controlling EHG in a backcross family (longleaf pine x slash pine) x longleaf pine. A total of 228 locus-specific SSR markers were screened against 6 longleaf pine recurrent parents and a sample of 7 longleaf x slash pine hybrid parents. In total, 135 polymorphic markers were identified. Based on the genetic variance in EHG, available sample size, and the number of SSR marker polymorphisms, a half-sib family with a common paternal parent (Derr488) and 6 longleaf maternal parents was selected from 27 backcross families as the final mapping population. One hundred and twenty three (123) markers showed polymorphisms across the half-sib family. An individual linkage map was first built for each full-sib family, and the linkage maps from the different full-sib families were then integrated through common orthologous SSR markers with the software JoinMap (ver3.0). There were 112 polymorphic markers mapped to the integrated map, which contained 16 linkage groups. The observed map length was 1874.3 cM and covered 79.85% of the genome. The estimated 95% confidence interval for genome length was 1781.3-2411.6 cM. Seventeen (17) QTLs were identified by single marker regression using 305 backcross progenies. For the interval mapping, the tallest and shortest 8 percent of seedlings were selected for QTL detection (phase I), and then random selections of 8 percent of the seedlings from the rest of the population and 25 seedlings from both tails of the within-family distributions were used for unbiased QTL verification and mapping (phase II). Nine QTLs were detected and verified as associated with the 5 growth traits under a P=0.05 chromosome-wide threshold. There was only weak evidence of QTL stability during the three years of growth under this study.
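    The map statistics reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers in the abstract; it assumes that coverage means observed map length divided by estimated genome length, and that all 112 mapped markers sit at distinct positions within the 16 linkage groups. The abstract states neither assumption explicitly.

```python
# Values quoted in the abstract.
observed_length_cm = 1874.3
coverage = 0.7985
ci_low_cm, ci_high_cm = 1781.3, 2411.6
screened, polymorphic, mapped = 228, 135, 112
linkage_groups = 16

# Assumption: coverage = observed map length / estimated genome length.
implied_genome_length_cm = observed_length_cm / coverage

# Assumption: each linkage group with k markers contributes k - 1 intervals.
intervals = mapped - linkage_groups
mean_spacing_cm = observed_length_cm / intervals

# Share of the screened SSR markers that ended up on the integrated map.
mapped_fraction = mapped / screened

print(f"implied genome length: {implied_genome_length_cm:.1f} cM")
print(f"inside reported 95% CI: {ci_low_cm <= implied_genome_length_cm <= ci_high_cm}")
print(f"mean marker spacing: {mean_spacing_cm:.1f} cM")
print(f"screened markers that were mapped: {mapped_fraction:.1%}")
```

    Under these assumptions the implied genome length is roughly 2347 cM, which falls inside the reported confidence interval, and the average spacing of about 19.5 cM between adjacent markers indicates a fairly sparse map for QTL localization.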