Directional genetic differentiation and asymmetric migration
Understanding the population structure and patterns of gene flow within
species is of fundamental importance to the study of evolution. In the fields
of population and evolutionary genetics, measures of genetic differentiation
are commonly used to gather this information. One potential caveat is that
these measures assume gene flow to be symmetric. However, asymmetric gene flow
is common in nature, especially in systems driven by physical processes such as
wind or water currents. Since information about levels of asymmetric gene flow
among populations is essential for the correct interpretation of the
distribution of contemporary genetic diversity within species, this should not
be overlooked. To obtain information on asymmetric migration patterns from
genetic data, complex models based on maximum likelihood or Bayesian approaches
generally need to be employed, often at great computational cost. Here, a new,
simpler and more efficient approach for understanding gene flow patterns is
presented. This approach allows the estimation of directional components of
genetic divergence between pairs of populations at low computational effort,
using any of the classical or modern measures of genetic differentiation. These
directional measures of genetic differentiation can further be used to
calculate directional relative migration and to detect asymmetries in gene flow
patterns. This can be done in a user-friendly web application called
divMigrate-online introduced in this paper. Using simulated data sets with
known gene flow regimes, we demonstrate that the method is capable of resolving
complex migration patterns under a range of study designs. Comment: 25 pages, 8 (+3) figures, 1 table
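Once directional differentiation values exist for every ordered pair of populations, turning them into relative migration and an asymmetry score is simple arithmetic. The sketch below is not divMigrate's code. It assumes (hypothetically) that relative migration is the inverse of directional differentiation, rescaled so the strongest flow equals 1. The names `relative_migration` and `asymmetry` and the i -> j indexing convention are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def relative_migration(directional_diff: Matrix) -> Matrix:
    """Convert directional differentiation into relative migration.

    Hypothetical convention: directional_diff[i][j] holds the differentiation
    for the direction i -> j. Assumption: migration is taken as the inverse of
    that value and rescaled so the strongest off-diagonal flow equals 1.
    """
    n = len(directional_diff)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-migration; diagonal stays 0
            d = directional_diff[i][j]
            if d <= 0:
                raise ValueError("directional differentiation must be positive")
            raw[i][j] = 1.0 / d  # less differentiation -> more migration
    peak = max(raw[i][j] for i in range(n) for j in range(n) if i != j)
    return [[raw[i][j] / peak for j in range(n)] for i in range(n)]


def asymmetry(migration: Matrix) -> Matrix:
    """Net directional flow: a positive [i][j] means i -> j exceeds j -> i."""
    n = len(migration)
    return [[migration[i][j] - migration[j][i] for j in range(n)] for i in range(n)]
```

Entries of `asymmetry` that are clearly non-zero flag population pairs where gene flow in one direction dominates; by construction the matrix is antisymmetric.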
Evolution Strategies for Learning Sparse Matrix Representations of Gene Regulatory Networks
Currently, a massive amount of temporal gene expression data is available to researchers, which makes it possible to infer Gene Regulatory Networks (GRNs). GRNs are theoretical models that represent excitatory and inhibitory interactions between genes. They are useful for understanding how genes function and hence also in pharmaceutical and other applications in biology and medicine. However, despite their importance, inferring GRNs from observational data is very difficult.
This thesis applies evolutionary algorithms to the problem of GRN inference. We propose a novel evolutionary algorithm, the hierarchical evolution strategy (HES), to target the specific difficulties in GRN inference. We propose a sparse matrix representation of GRNs to account for the sparse connectivity of biological gene interactions. Unlike traditional evolution strategies, we divide our optimization into two concurrent processes: connectivity construction and numerical optimization. In each generation, we first establish the connectivity structure of the GRN. Within the same generation, we then apply a secondary ES to find the best numerical values for those fixed connections. We also propose a hybrid crowding method to maintain high population diversity while applying the evolutionary algorithms. High population diversity leads to broader exploration of the search space and therefore helps prevent premature convergence.
The results obtained show that the proposed HES outperforms other algorithms and has the potential to scale up to realistic problems with thousands of genes.
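The two-level scheme in this abstract (fix the connectivity, then run a secondary ES on the numerical values) can be sketched for the inner step alone. This is a minimal illustration under stated assumptions, not the thesis's HES. The linear one-step expression model in `predict`, the squared-error `loss`, and the (1 + λ) plus-selection in `inner_es` are all hypothetical choices. The outer connectivity search and the hybrid crowding method are omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[bool]]


def predict(w: Matrix, x: List[float]) -> List[float]:
    """Hypothetical linear GRN step: x_next[i] = x[i] + sum_j w[i][j] * x[j]."""
    n = len(x)
    return [x[i] + sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]


def loss(w: Matrix, series: List[List[float]]) -> float:
    """Sum of squared one-step prediction errors over an expression time series."""
    total = 0.0
    for t in range(len(series) - 1):
        pred = predict(w, series[t])
        total += sum((p - o) ** 2 for p, o in zip(pred, series[t + 1]))
    return total


def inner_es(mask: Mask, series: List[List[float]], generations: int = 300,
             offspring: int = 10, sigma: float = 0.05,
             seed: int = 0) -> Tuple[Matrix, float]:
    """(1 + lambda)-ES over the weights allowed by a fixed connectivity mask."""
    rng = random.Random(seed)
    n = len(mask)
    parent = [[0.0] * n for _ in range(n)]  # start with every edge at weight 0
    parent_loss = loss(parent, series)
    for _ in range(generations):
        best_child, best_loss = None, float("inf")
        for _ in range(offspring):
            # Gaussian mutation only on edges present in the mask; absent edges stay 0.
            child = [[parent[i][j] + rng.gauss(0.0, sigma) if mask[i][j] else 0.0
                      for j in range(n)] for i in range(n)]
            child_loss = loss(child, series)
            if child_loss < best_loss:
                best_child, best_loss = child, child_loss
        if best_loss < parent_loss:  # plus-selection: keep the parent unless beaten
            parent, parent_loss = best_child, best_loss
    return parent, parent_loss
```

Keeping absent edges pinned at zero is what makes the representation sparse: mutation never touches entries that the outer connectivity step has switched off, so the inner search runs only over the allowed weights.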
Probabilistic analysis of the human transcriptome with side information
Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new perspectives on the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights into
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories, it has been possible to resolve some of the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community. Comment: Doctoral thesis. 103 pages, 11 figures
Inferring orthologous gene regulatory networks using interspecies data fusion
MOTIVATION:
The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second, hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression.
RESULTS:
Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.
GNGS: An Artificial Intelligent Tool for Generating and Analyzing Gene Networks from Microarray Data
- …