
    Directional genetic differentiation and asymmetric migration

    Understanding the population structure and patterns of gene flow within species is of fundamental importance to the study of evolution. In the fields of population and evolutionary genetics, measures of genetic differentiation are commonly used to gather this information. One potential caveat is that these measures assume gene flow to be symmetric. However, asymmetric gene flow is common in nature, especially in systems driven by physical processes such as wind or water currents. Since information about levels of asymmetric gene flow among populations is essential for the correct interpretation of the distribution of contemporary genetic diversity within species, this should not be overlooked. To obtain information on asymmetric migration patterns from genetic data, complex models based on maximum likelihood or Bayesian approaches generally need to be employed, often at great computational cost. Here, a new, simpler and more efficient approach for understanding gene flow patterns is presented. This approach allows the estimation of directional components of genetic divergence between pairs of populations at low computational effort, using any of the classical or modern measures of genetic differentiation. These directional measures of genetic differentiation can further be used to calculate directional relative migration and to detect asymmetries in gene flow patterns. This can be done in a user-friendly web application called divMigrate-online, introduced in this paper. Using simulated data sets with known gene flow regimes, we demonstrate that the method is capable of resolving complex migration patterns under a range of study designs. Comment: 25 pages, 8 (+3) figures, 1 table
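
The step from directional differentiation to directional relative migration can be illustrated with a minimal sketch. It assumes the classical island-model relation Nm = (1/F_ST - 1)/4 and scales every direction by the largest value; the abstract does not say which conversion the paper or divMigrate-online uses, so the function name, the input matrix and the conversion itself are illustrative assumptions only.

```python
def relative_migration(directional_fst):
    """Convert a matrix of directional differentiation values to relative migration.

    directional_fst[i][j] is a hypothetical differentiation value in (0, 1]
    for the direction i -> j. The diagonal is ignored.
    Assumption: island-model relation Nm = (1 / F_ST - 1) / 4.
    """
    n = len(directional_fst)
    nm = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            f = directional_fst[i][j]
            if not 0.0 < f <= 1.0:
                raise ValueError("differentiation values must lie in (0, 1]")
            nm[i][j] = (1.0 / f - 1.0) / 4.0

    # Scale by the largest off-diagonal value so the strongest direction is 1.
    peak = max(v for row in nm for v in row if v is not None)
    if peak == 0.0:
        return [[None if v is None else 0.0 for v in row] for row in nm]
    return [[None if v is None else v / peak for v in row] for row in nm]


if __name__ == "__main__":
    # Two populations: lower differentiation in direction 0 -> 1 than 1 -> 0.
    fst = [[0.0, 0.1],
           [0.2, 0.0]]
    for row in relative_migration(fst):
        print(row)
```

Unequal values for i -> j and j -> i in the output are what would indicate asymmetric gene flow in this toy setting.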

    Evolution Strategies for Learning Sparse Matrix Representations of Gene Regulatory Networks

    Currently, a massive amount of temporal gene expression data is available to researchers, which makes it possible to infer Gene Regulatory Networks (GRNs). Gene regulatory networks are theoretical models that represent excitatory and inhibitory interactions between genes. GRNs are useful in understanding how genes function, and hence they are also useful in pharmaceutical and other applications in biology and medicine. However, despite the importance of GRNs, the process of inferring GRNs from observational data is very difficult. This thesis applies evolutionary algorithms to the problem of GRN inference. We propose a novel evolutionary algorithm, the hierarchical evolution strategy (HES), to target the specific difficulties in GRN inference. We propose a sparse matrix representation of GRNs to account for sparse connectivity in biological gene interactions. Unlike traditional evolution strategies, we divide our optimization into two concurrent processes: connectivity construction and numerical optimization. In each generation, we first establish the connectivity structure of the GRN. Within the same generation, we apply a secondary ES to find the best numerical values for those fixed connections. We also propose a hybrid crowding method to maintain high population diversity while applying the evolutionary algorithms. High population diversity leads to broader exploration of the search space, thereby preventing premature convergence. The results obtained show that the proposed HES outperforms other algorithms and has the potential to scale up to realistic problems with thousands of genes.
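
The two-level idea described above (choose a connectivity structure, then tune numerical values with that structure fixed) can be sketched as follows. This is a toy illustration, not the thesis's HES: the linear model, the (1+1)-ES inner loop, the single-flip mask mutation, the sparsity penalty and all function names are assumptions, and the hybrid crowding method is omitted.

```python
import random


def predict(weights, x):
    """Assumed toy dynamics: next expression vector is a linear map of x."""
    n = len(x)
    return [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]


def error(weights, data):
    """Sum of squared prediction errors over (input, target) pairs."""
    return sum((p - t) ** 2
               for x, y in data
               for p, t in zip(predict(weights, x), y))


def inner_es(mask, weights, data, rng, steps=30, sigma=0.3):
    """(1+1)-ES over numerical values; the connectivity mask stays fixed."""
    n = len(mask)
    best = [[weights[i][j] if mask[i][j] else 0.0 for j in range(n)]
            for i in range(n)]
    best_err = error(best, data)
    for _ in range(steps):
        cand = [[best[i][j] + rng.gauss(0.0, sigma) if mask[i][j] else 0.0
                 for j in range(n)] for i in range(n)]
        cand_err = error(cand, data)
        if cand_err <= best_err:
            best, best_err = cand, cand_err
    return best, best_err


def mutate_mask(mask, rng):
    """Outer-level mutation: flip one randomly chosen connection."""
    n = len(mask)
    new = [row[:] for row in mask]
    i, j = rng.randrange(n), rng.randrange(n)
    new[i][j] = not new[i][j]
    return new


def hierarchical_es(data, n, generations=40, offspring=6,
                    sparsity_penalty=0.01, seed=0):
    """Outer loop searches connectivity; inner ES tunes the weights."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    weights, score = inner_es(mask, [[0.0] * n for _ in range(n)], data, rng)
    history = [score]
    for _ in range(generations):
        for _ in range(offspring):
            child_mask = mutate_mask(mask, rng)
            child_w, child_err = inner_es(child_mask, weights, data, rng)
            # Penalise each active connection to favour sparse networks.
            child_score = child_err + sparsity_penalty * sum(map(sum, child_mask))
            if child_score <= score:
                mask, weights, score = child_mask, child_w, child_score
        history.append(score)
    return mask, weights, history
```

Entries outside the mask are held at exactly zero, which mirrors the sparse matrix representation, and the elitist acceptance rule means the recorded score never increases across generations.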

    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views to the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures

    Inferring orthologous gene regulatory networks using interspecies data fusion

    MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.