55 research outputs found

    FaST-LMM for two-way Epistasis tests on high performance clusters

    Get PDF
    We introduce a version of the epistasis test in FaST-LMM for clusters of multithreaded processors. This new software maintains the sensitivity of the original FaST-LMM while delivering close-to-linear acceleration on 12-16 nodes of two recent platforms, relative to the improved implementation of FaST-LMM presented in an earlier work. This efficiency is attained through several enhancements to the original single-node version of FaST-LMM, together with a message passing interface (MPI)-based version that ensures a balanced distribution of the workload and a multi-graphics processing unit (multi-GPU) module that can exploit multiple GPUs per node. The researchers from the Universitat Jaume I were supported by projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER. Martínez, H.; Barrachina, S.; Castillo, M.; Quintana Ortí, ES.; Rambla De Argila, J.; Farre, X.; Navarro, A. (2018). FaST-LMM for two-way Epistasis tests on high performance clusters. Journal of Computational Biology. 25(8):862-870. https://doi.org/10.1089/cmb.2018.0087
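
The abstract mentions an MPI-based version that balances the workload across nodes. One simple way to balance a two-way scan is to split the n(n-1)/2 SNP pairs into contiguous blocks whose sizes differ by at most one. The sketch below illustrates only that idea under our own assumptions: the abstract does not describe FaST-LMM's actual scheme, the helper names are hypothetical, and a real MPI program would obtain `rank` and `n_ranks` from its communicator (for example `MPI.COMM_WORLD.Get_rank()` and `Get_size()` in mpi4py).

```python
from itertools import combinations, islice


def num_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs (i, j) with i < j, i.e. n(n-1)/2."""
    return n_snps * (n_snps - 1) // 2


def rank_range(total: int, n_ranks: int, rank: int) -> tuple:
    """Half-open slice [start, stop) of `total` work items owned by `rank`.

    The first `total % n_ranks` ranks receive one extra item, so block
    sizes differ by at most one across ranks.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must be in [0, n_ranks)")
    base, extra = divmod(total, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def pairs_for_rank(n_snps: int, n_ranks: int, rank: int) -> list:
    """SNP index pairs this rank would test (hypothetical helper)."""
    start, stop = rank_range(num_pairs(n_snps), n_ranks, rank)
    # islice walks the pair sequence up to `stop`; fine for a sketch, but a
    # production version would map a linear index to (i, j) directly.
    return list(islice(combinations(range(n_snps), 2), start, stop))


if __name__ == "__main__":
    # 6 SNPs -> 15 pairs spread over 4 hypothetical ranks.
    for r in range(4):
        print(r, pairs_for_rank(6, 4, r))
```

Contiguous blocks keep each rank's bookkeeping to two integers; because every pair costs roughly the same to test, equal counts approximate equal work.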

    WISH-R: a fast and efficient tool for construction of epistatic networks for complex traits and diseases

    Get PDF
    Background: Genetic epistasis is an often-overlooked area in the study of the genomics of complex traits. Genome-wide association studies are a useful tool for revealing potential causal genetic variants, but in this context, epistasis is generally ignored. Data complexity and interpretation issues make epistasis difficult to process and interpret. Because the number of possible interactions grows combinatorially with the number of variants, computational limitations are a bottleneck. Gene network-based strategies have been successful in integrating biological data and identifying relevant hub genes and pathways related to complex traits. In this study, epistatic interactions and network-based analysis are combined in the Weighted Interaction SNP hub (WISH) method and implemented in an efficient, easy-to-use R package. Results: The WISH R package (WISH-R) was developed to calculate epistatic interactions at a genome-wide level from genomic data. It is easy to install and use, and works on regular genomic data. The package filters data based on linkage disequilibrium and calculates epistatic interaction coefficients between SNP pairs using parallelized, efficient linear model and generalized linear model implementations. Normalized epistatic coefficients are analyzed in a network framework, alleviating multiple-testing issues and integrating biological signal to identify modules and pathways related to complex traits. Functions for visualizing results and testing runtimes are also provided. Conclusion: The WISH-R package is an efficient implementation for analyzing genome-wide epistasis for complex diseases and traits. It includes methods and strategies for analyzing epistasis from initial data filtering to final data interpretation. WISH offers a new way to analyze genomic data by combining epistasis and network-based analysis in one method, and provides options for visualization. This alleviates many of the existing hurdles in the analysis of genomic interactions.
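
WISH-R is described as computing epistatic coefficients for SNP pairs with linear models. A minimal sketch of that idea, assuming a plain ordinary-least-squares model y = b0 + b1*g1 + b2*g2 + b3*g1*g2 with genotypes coded as 0/1/2 dosages, takes the interaction term b3 as the pair's epistatic coefficient. WISH-R itself is an R package; this Python function (`epistasis_coefficient`, a hypothetical name) is not its implementation and omits its linkage-disequilibrium filtering, normalization and parallelization. Since one such fit is needed per pair, the total cost grows with n(n-1)/2.

```python
import numpy as np


def epistasis_coefficient(y, g1, g2) -> float:
    """Return b3 from an OLS fit of y ~ b0 + b1*g1 + b2*g2 + b3*(g1*g2).

    y: phenotype values; g1, g2: genotype dosages (0, 1, 2) for two SNPs.
    """
    y = np.asarray(y, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # Design matrix: intercept, two main effects, and the interaction term.
    X = np.column_stack([np.ones_like(g1), g1, g2, g1 * g2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g1 = rng.integers(0, 3, size=200)
    g2 = rng.integers(0, 3, size=200)
    # Simulated phenotype with a true interaction effect of 0.8 plus noise.
    y = 1.0 + 0.3 * g1 - 0.2 * g2 + 0.8 * g1 * g2 + rng.normal(0, 0.5, size=200)
    print(epistasis_coefficient(y, g1, g2))
```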

    A Modular Parallel Pipeline Architecture for GWAS Applications in a Cluster Environment

    Get PDF
    A Genome-Wide Association Study (GWAS) is an important bioinformatics method to associate variants with traits, identify causes of diseases and increase plant and crop production. There are several optimizations for improving GWAS performance, including running applications in parallel. However, it can be difficult for researchers to utilize different data types and workflows using existing approaches. A potential solution to this problem is to model GWAS algorithms as a set of modular tasks. In this thesis, a modular pipeline architecture for GWAS applications is proposed that can leverage a parallel computing environment as well as store and retrieve data using a shared data cache. To show that the proposed architecture increases the performance of GWAS applications, two case studies are conducted in which the architecture is implemented on a bioinformatics pipeline package called TASSEL and a GWAS application called FaST-LMM, using both Apache Spark and Dask as the parallel processing framework and Redis as the shared data cache. The case studies implement parallel processing modules and shared data cache modules according to the specifications of the proposed architecture. Based on the case studies, a number of experiments are conducted that compare the performance of the implemented architecture on a cluster environment with that of the original programs. The experiments reveal that the modified applications indeed run faster than the original sequential programs. However, the modified applications do not scale with cluster resources, as the sequential part of the operations prevents the parallelization from achieving linear scalability. Finally, the architecture is evaluated based on feedback from software developers and bioinformaticians. The evaluation reveals that the domain experts find the architecture useful: the implementations deliver sufficient performance improvement and are easy to use, although a GUI-based implementation would be preferable.
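
The finding that the sequential part of the pipeline prevents linear scalability is what Amdahl's law quantifies: if a fraction f of the single-worker runtime parallelizes perfectly over p workers, the speedup is 1 / ((1 - f) + f/p), which can never exceed 1 / (1 - f). The sketch below evaluates that bound; the parallel fraction is an illustrative assumption, not a value measured in the thesis.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup on `workers` when only `parallel_fraction` of the
    single-worker runtime can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    f = 0.9  # assumed: 90% of the runtime is parallelizable
    for p in (1, 2, 4, 8, 16, 64):
        print(f"{p:3d} workers -> {amdahl_speedup(f, p):5.2f}x")
    print("upper bound:", 1.0 / (1.0 - f))
```

With f = 0.9, 16 workers give about a 6.4x speedup, and no number of workers pushes it past 10x, so adding cluster nodes soon stops paying off.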

    Genome-Wide Association Studies to Improve Wood Properties: Challenges and Prospects

    Get PDF
    Wood formation is an excellent model system for quantitative trait analysis due to the strong associations between the transcriptional and metabolic traits that contribute to this complex process. Investigating the genetic architecture and regulatory mechanisms underlying wood formation will enhance our understanding of the quantitative genetics and genomics of complex phenotypic variation. Genome-wide association studies (GWASs) represent an ideal statistical strategy for dissecting the genetic basis of complex quantitative traits. However, elucidating the molecular mechanisms underlying many favorable loci that contribute to wood formation and optimizing GWAS design remain challenging in this omics era. In this review, we summarize the recent progress in GWAS-based functional genomics of wood property traits in major timber species such as Eucalyptus, Populus, and various coniferous species. We discuss several appropriate experimental designs for extensive GWAS in a given undomesticated tree population, such as omics-wide association studies and high-throughput phenotyping technologies. We also explain why more attention should be paid to rare allelic and major structural variation. Finally, we explore the potential use of GWAS for the molecular breeding of trees. Such studies will help provide an integrated understanding of complex quantitative traits and should enable the molecular design of new cultivars.

    NATURAL AND ANTHROPOGENIC DRIVERS OF TREE EVOLUTIONARY DYNAMICS

    Get PDF
    Tree species inhabit diverse and heterogeneous environments and often play important ecological roles in their communities. As a result of their vast ecological breadth, trees have become adapted to various environmental pressures. In this dissertation I examine various environmental factors that drive evolutionary dynamics in three Pinus species in California and Nevada, USA. In chapter two, I assess the influence of management (thinning, fire, and their interaction) on fine-scale gene flow within fire-suppressed populations of Pinus lambertiana, a historically dominant and ecologically important member of mixed-conifer forests of the Sierra Nevada, California. Here, I find evidence that treatment prescription differentially affects fine-scale genetic structure and effective gene flow in this species. In my third chapter, I describe the development of a dense linkage map for Pinus balfouriana, which I use in chapter four to assess the quantitative trait locus (QTL) landscape of water-use efficiency across two isolated ranges of the species. I find evidence that precipitation-related variables structure the geographical range of P. balfouriana, that traits related to water-use efficiency are heritable and differentiated across populations, and that associated QTLs underlying this phenotypic variation explain large proportions of total variation. In chapter five, I assess evidence for local adaptation to the eastern Sierra Nevada rain shadow within P. albicaulis across fine spatial scales of the Lake Tahoe Basin, USA. Here, genetic variation in traits related to water availability was structured more strongly across populations than neutral variation, and loci identified by genome-wide association methods show elevated signals of local adaptation that track soil water availability. In chapter six, I review theory related to polygenic local adaptation and the literature on genotype-phenotype associations in trees. I find that the evidence suggests a polygenic basis for many traits important to conservation and industry, and I suggest paths forward for describing such genetic bases in tree species. Overall, my results show that the spatial and genetic structure of trees is often driven by their environment, and that ongoing selective pressures driven by environmental change will continue to be important in these systems.

    The discovery of novel recessive genetic disorders in dairy cattle : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at AL Rae Centre of Genetics and Breeding, Massey University, Palmerston North, New Zealand

    Get PDF
    The selection of desirable characteristics in livestock has resulted in the transmission of advantageous genetic variants for generations. The advent of artificial insemination has accelerated the propagation of these variants and led to tremendous advances in animal productivity. However, this intensive selection has also led to the rapid uptake of deleterious alleles. Recently, a recessive mutation in the GALNT2 gene was identified that dramatically impairs growth and production traits in dairy cattle, causing small calf syndrome. The research presented here seeks to further investigate the presence and impact of recessive mutations in dairy cattle. A primary aim of genetics is to identify causal variants and understand how they act to manipulate a phenotype. As datasets have expanded, larger analyses are now possible and statistical methods to discover causal mutations have become commonplace. One such method, the genome-wide association study (GWAS), offers considerable exploratory utility in identifying quantitative trait loci (QTL) and causal mutations. GWASs have predominantly focused on identifying additive genetic effects, assuming that each allele at a locus acts independently of the other, whereas non-additive effects, including dominant, recessive, and epistatic effects, have been neglected. Here, we developed a single-locus non-additive GWAS model intended for the detection of dominant and recessive genetic mechanisms. We applied our non-additive GWAS model to growth, developmental, and lactation phenotypes in dairy cattle. We identified several candidate causal mutations associated with recessive disorders of animal welfare and production that have moderate to large deleterious effects. These mutations included premature-stop (MUS81, ITGAL, LRCH4, RBM34), splice-disrupting (FGD4, GALNT2), and missense (PLCD4, MTRF1, DPF2, DOCK8, SLC25A4, KIAA0556, IL4R) variants, and these occur at surprisingly high frequencies in cattle. We further investigated these candidates for anatomical, molecular, and metabolic phenotypes to understand how these disorders might manifest. In some cases, these mutations were analogous to disorder-causing mutations in other species, including Coffin-Siris syndrome (DPF2), Charcot-Marie-Tooth disease (FGD4), a congenital disorder of glycosylation (GALNT2), hyper-immunoglobulin E syndrome (DOCK8), Joubert syndrome (KIAA0556), and mitochondrial disease (SLC25A4). These discoveries demonstrate that deleterious recessive mutations exist in dairy cattle at remarkably high frequencies and that these disorders can be detected through modern genotyping and phenotyping capabilities. These findings can be used to improve the health and productivity of dairy cattle in New Zealand and internationally.
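
The non-additive model is described only at a high level. As a minimal sketch, assuming genotypes are coded as alternative-allele dosages 0/1/2, a recessive effect can be tested by regressing the phenotype on an indicator for the homozygous-alternative genotype, and a dominant effect on an indicator for carrying at least one alternative allele. The helper names are hypothetical, and this plain least-squares fit ignores the relatedness and covariates that a real cattle GWAS would model.

```python
import numpy as np


def genotype_codings(dosage):
    """Alternative codings of 0/1/2 alternative-allele dosages."""
    d = np.asarray(dosage)
    if not np.isin(d, (0, 1, 2)).all():
        raise ValueError("dosages must be 0, 1 or 2")
    return {
        "additive": d.astype(float),          # one unit per alternative allele
        "dominant": (d >= 1).astype(float),   # effect needs at least one copy
        "recessive": (d == 2).astype(float),  # effect needs two copies
    }


def single_locus_effect(y, dosage, model="recessive"):
    """OLS slope of y on the chosen genotype coding (plus an intercept)."""
    x = genotype_codings(dosage)[model]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return float(beta[1])


if __name__ == "__main__":
    dosage = [0, 1, 2, 0, 1, 2, 1, 0]
    # Hypothetical trait lowered only in homozygous carriers.
    y = [10.0, 10.2, 6.1, 9.9, 10.1, 5.9, 9.8, 10.0]
    for m in ("additive", "dominant", "recessive"):
        print(m, round(single_locus_effect(y, dosage, m), 3))
```

With a binary coding, the slope is simply the difference in mean phenotype between the two genotype groups, which is why a recessive variant can look weak under the additive coding yet clear under the recessive one.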

    The role of the environment in eco-evolutionary feedback dynamics

    Get PDF
    In my thesis, I studied how environmental changes, such as the induction of abiotic stress and spatial structure, affect the link between evolution and ecology. My aim was to develop an understanding of when, how often, and to what extent ecological and evolutionary dynamics interact to affect the fate of natural populations.

    A phylogenetic method to perform genome-wide association studies in microbes

    Get PDF
    Genome-Wide Association Studies (GWAS) are designed to perform an unbiased search of genetic sequence data with the intent of identifying statistically significant associations with a phenotype or trait of interest. The application of GWAS methods to microbial organisms promises to improve the way we understand, manage, and treat infectious diseases. Yet, while microbial pathogens continue to undermine human health, wealth, and longevity, microbial GWAS methods remain unable to fully capitalise on the growing wealth of bacterial and viral genetic sequence data. Clonal population structure and homologous recombination in microbial organisms make it difficult for existing GWAS methods to achieve both the precision needed to reject false positive findings and the statistical power required to detect genuine associations between microbial genotypic and phenotypic variants. In this thesis, we investigate potential solutions to the most substantial methodological challenges in microbial GWAS, and we introduce a new phylogenetic GWAS approach that has been specifically designed for use in bacterial samples. In presenting our approach, we describe the features that render it robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Our approach is applicable to organisms ranging from purely clonal to frequently recombining, to sequence data from both the core and accessory genome, and to binary, categorical, and continuous phenotypes. We also describe the efforts taken to make our method efficient, scalable, and accessible in its implementation within the open-source R package we have created, called treeWAS. Next, we apply our GWAS method to simulated datasets. We develop multiple frameworks for simulating genotypic and phenotypic data with control over relevant parameters. 
We then present the results of our simulation study, and we use thorough performance testing to demonstrate the power and specificity of our approach, as compared to the performance of alternative cluster-based and dimension-reduction methods. Our approach is then applied to three empirical datasets, from Neisseria gonorrhoeae and Neisseria meningitidis, where we identify core SNPs associated with binary drug resistance and continuous antibiotic minimum inhibitory concentration phenotypes, as well as both core SNP and accessory genome associations with invasive and commensal phenotypes. These applications illustrate the versatility and potential of our method, demonstrating in each case that our approach is capable of confirming known resistance- or virulence-associated loci and discovering novel associations. Our thesis concludes with a review of the previous chapters and an evaluation of the strengths and limitations displayed by the current implementation of our phylogenetic approach to association testing. We discuss key areas for further development, and we propose potential solutions to advance the development of microbial GWAS in future work.

    Investigating the genetic architecture and adaptive relevance of complex traits in Cape Verde Arabidopsis

    Get PDF
    Understanding how organisms adapt to new environments is a key goal of evolutionary biology. Populations subject to abrupt environmental change must adapt quickly to avoid extinction. Small populations are especially vulnerable to habitat changes, facing a high extinction risk due to limited genetic variation and low efficiency of selection. Theory predicts that the age of a population and its long-term effective size should influence adaptation and trait architecture. Here, we investigate the mechanisms of adaptation after a sudden shift to a more arid climate using natural populations of Arabidopsis thaliana in Cape Verde (CVI). CVI Arabidopsis is found on two islands (Santo Antão and Fogo) and comprises diverged, monophyletic lineages, as shown by the near absence of polymorphisms shared with each other or with the continent. Time to flowering was reduced in parallel on the two islands, increasing fitness and allowing adaptation to the arid CVI climate. This change was mediated by convergent de novo loss of function of two core flowering time genes: FRI in Santo Antão and FLC in Fogo. Our results reveal a case where expansion of the new populations coincided with the emergence and proliferation of these novel variants, consistent with models of rapid adaptation and evolutionary rescue. We further contrast the genetic architecture of flowering time in the recently formed, small-Ne Arabidopsis lineages from Cape Verde with that of their much older, larger-Ne progenitor, the Moroccan population. We find that polygenicity is severely reduced in the colonizing populations and that effect sizes of candidate loci are exponentially distributed, consistent with fitness measures showing evidence for directional selection on the islands. In addition to the major-effect variants FRI K232X and FLC R3X, we identify candidate variants from core flowering time pathways as well as variants that indirectly affect flowering time, including those involved in nutrient processing and light sensing. Surprisingly, we find no effect of the well-known Cvi-0-EDI (CRY2 V367M) variant in the natural population. Our results provide a particularly clear empirical example of the effect that demographic history has on trait architecture.

    The genetic and life course epidemiology of familial adiposity

    Get PDF
    The developmental overnutrition hypothesis proposes that prenatal exposure to maternal obesity causes increased risk of obesity and cardiometabolic disease in the offspring in subsequent adult life. Maternal body mass index (BMI) before or during pregnancy is positively associated with offspring adiposity from birth to adulthood, and with adult cardiometabolic disease incidence and mortality, but whether these associations are causal remains uncertain. This thesis aimed to investigate whether greater maternal BMI before or during pregnancy causes greater offspring adiposity in childhood and adolescence, and whether maternal BMI is associated with an adverse offspring cardiometabolic risk factor profile in adulthood. I analysed data from five European prospective birth cohorts: the Northern Finland Birth Cohorts (NFBCs) 1966 and 1986, the Avon Longitudinal Study of Parents and Children (ALSPAC), Born in Bradford (BiB) and Generation R, as well as the UK Biobank. I applied polygenic risk scoring (PRS), intergenerational Mendelian randomization (MR), bivariate Genomic Restricted Maximum Likelihood implemented in the GCTA software package (bivariate GCTA-GREML) and maternal GCTA-GREML. In NFBC1966, greater maternal BMI was associated with greater offspring adiposity and insulin resistance in adulthood, but these associations were somewhat attenuated on adjustment for a PRS partially capturing the offspring’s genetic predisposition to increased BMI. In ALSPAC and BiB, MR analyses suggested that maternal BMI does not have a large causal effect on offspring adiposity in late childhood and adolescence. Bivariate GCTA-GREML analyses in five cohorts showed that imputed offspring single nucleotide polymorphisms (SNPs) explained up to half of the phenotypic covariance between maternal BMI and offspring child and adolescent adiposity. 
Maternal GCTA-GREML analyses in ALSPAC and BiB showed that genetic confounding (the direct effects of maternal alleles inherited by the offspring) is an important explanation for this. My findings suggest that interventions aimed at reducing pre-conceptional adiposity in women are of uncertain effectiveness as a means to reduce offspring obesity risk.
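
Two of the methods named in this abstract reduce to short formulas. A polygenic risk score is a weighted sum of an individual's allele dosages. With a single instrument, the simplest Mendelian randomization estimate is the Wald ratio: the instrument-outcome association divided by the instrument-exposure association. The sketch below assumes that per-SNP weights and summary associations are already available from elsewhere; the function names and numbers are illustrative. It also omits the adjustment for offspring genotype that an intergenerational design needs, because each child inherits half of the mother's alleles.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted allele count per individual.

    dosages: (n_individuals, n_snps) array of 0/1/2 effect-allele counts.
    weights: (n_snps,) per-SNP effect sizes, e.g. GWAS betas (assumed given).
    """
    G = np.asarray(dosages, dtype=float)
    w = np.asarray(weights, dtype=float)
    if G.ndim != 2 or G.shape[1] != w.shape[0]:
        raise ValueError("dosages must be (n, m) with m == len(weights)")
    return G @ w


def wald_ratio(beta_zy: float, beta_zx: float) -> float:
    """Single-instrument MR estimate of the effect of exposure X on outcome Y."""
    if beta_zx == 0:
        raise ValueError("instrument has no association with the exposure")
    return beta_zy / beta_zx


if __name__ == "__main__":
    print(polygenic_score([[0, 1, 2], [2, 2, 0]], [0.1, 0.2, -0.5]))
    # Hypothetical summary statistics: score -> maternal BMI, score -> offspring trait.
    print(wald_ratio(beta_zy=0.02, beta_zx=0.5))
```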
    • …
    corecore