102 research outputs found

    Artificial neural networks: A comparative study of implementations for human chromosome classification

    Artificial neural networks are a popular field of artificial intelligence and have commonly been applied to solve many prediction, classification and diagnostic tasks. One such task is the analysis of human chromosomes. This thesis investigates the use of artificial neural networks (ANNs) as automated chromosome classifiers. The investigation involves the thorough analysis of seven different implementation techniques. These include three techniques using artificial neural networks, two techniques using ANNs supported by another method and two techniques not using ANNs. These seven implementations are evaluated according to the classification accuracy achieved and according to their support of important system measures, such as robustness and validity. The results collected show that ANNs perform relatively well in terms of classification accuracy, though other implementations achieved higher results. However, ANNs provide excellent support of essential system measures. This leads to a well-rounded implementation, consisting of a good balance between accuracy and system features, and thus an effective technique for automated human chromosome classification.

    Modeling the spatio-temporal organization and segregation of bacterial chromosomes

    This work examined the spatio-temporal organization and segregation of bacterial DNA in order to investigate the fundamental processes regulating the inheritance of genetic material and the proliferation of life. For the investigation of the spatio-temporal organization of genetic material in the cell fundamental physical principles were used in this work. The aim was to use concepts of polymer physics to formulate physical models of the complex biological reality. These models were evaluated in computer simulations and compared with experimental data. In the first project of this thesis, the spatial organization of DNA in multipartite bacteria (= bacteria with multiple replicons) was investigated. The results of this work reveal high order of spatial organization even for multipartite bacteria. The organization could be reproduced using a physical model of compacted DNA and geometric constraints on individual genes. Furthermore, it was possible to make accurate predictions for different mutants and to predict interactions between replicons with the developed model. The second project focused on the study of simultaneous replication and segregation of bacterial DNA. Segregation patterns of the ori were analyzed in the model organism Bacillus subtilis. Using Molecular Dynamics simulations, it was shown that entropic segregation of chromosomes is a plausible mechanism for the segregation of genetic material that would also explain the observed variability in the experimental data. The model of entropic segregation of bacterial chromosomes was extended in the third project by the implementation of additional segregation mechanisms, so that a large data set of different trajectories of the ori through the cell could be generated. Thus, machine learning models could be used to classify the different segregation movements. 
The evaluation of the predictions showed very good results and encourages future classification of experimental data based on the developed models. This work is intended to provide new perspectives on the organization of DNA in the bacterial cell as well as a better understanding of the physical basis of cellular processes.

    Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning

    Contains fulltext: 228326pre.pdf (preprint version) (Open Access). Contains fulltext: 228326pub.pdf (publisher's version) (Open Access). BNAIC/BeneLearn 202

    Multivariate Models and Algorithms for Systems Biology

    Rapid advances in high-throughput data acquisition technologies, such as microarrays and next-generation sequencing, have enabled scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developing effective computational methods for analyzing data generated from such platforms. In this dissertation, we address some of these challenges. We divide our work into two parts. In the first part, we present a suite of multivariate approaches for a reliable discovery of gene clusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on the reconstruction of signal transduction mechanisms in the signaling pathway components. We propose gene set based approaches for inferring the structure of a signaling pathway. First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. The informed-case model generalizes the previously known blind-case model by accommodating prior knowledge of replication mechanisms. Second, we generalize the blind-case model by designing a two-component mixture model. Our idea is to strike an optimal balance between a fully constrained correlation structure and an unconstrained one. Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements. We utilize our correlation estimators for clustering real-world replicated complete and incomplete molecular profiling data sets. The above three components constitute the first part of the dissertation.
For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout we treat gene sets as variables and the associated gene orderings as random. The first algorithm has been developed under the Gibbs sampling framework and the second algorithm utilizes the framework of simulated annealing. Finally, we summarize our findings and discuss possible future directions.

    A FAIR approach to genomics

    The aim of this thesis was to increase our understanding of how genome information leads to function and phenotype. To address these questions, I developed a semantic systems biology framework capable of extracting knowledge, biological concepts and emergent system properties, from a vast array of publicly available genome information. In chapter 2, Empusa is described as an infrastructure that bridges the gap between the intended and actual content of a database. This infrastructure was used in chapters 3 and 4 to develop the framework. Chapter 3 describes the development of the Genome Biology Ontology Language and the GBOL stack of supporting tools enforcing consistency within and between the GBOL definitions in the ontology (OWL) and the Shape Expressions (ShEx) language describing the graph structure. A practical implementation of a semantic systems biology framework for FAIR (de novo) genome annotation is provided in chapter 4. The semantic framework and genome annotation tool described in this chapter have been used throughout this thesis to consistently, structurally and functionally annotate and mine the microbial genomes used in chapters 5-10. In chapter 5, we introduced how the concept of protein domains and corresponding architectures can be used in comparative functional genomics to provide a fast, efficient and scalable alternative to sequence-based methods. This allowed us to effectively compare and identify functional variations between hundreds to thousands of genomes. In chapter 6, we used 432 available complete Pseudomonas genomes to study the relationship between domain essentiality and persistence. In this chapter the focus was mainly on domains involved in metabolic functions. The metabolic domain space was explored for domain essentiality and persistence through the integration of heterogeneous data sources including six published metabolic models, a vast gene expression repository and transposon data.
In chapter 7, the correlation between the expected and observed genotypes was explored using 16S-rRNA phylogeny and protein domain class content as input. In this chapter it was shown that domain class content yields a higher resolution in comparison to 16S-rRNA when analysing evolutionary distances. Using protein domain classes, we were also able to identify signifying domains, which may have important roles in shaping a species. To demonstrate the use of semantic systems biology workflows in a biotechnological setting we expanded the resource with more than 80,000 bacterial genomes. The genomic information of this resource was mined using a top down approach to identify strains having the trait for 1,3-propanediol production. This resulted in the molecular identification of 49 new species. In addition, we also experimentally verified that 4 species were capable of producing 1,3-propanediol. As discussed in chapter 10, the semantic systems biology workflows developed here were successfully applied in the discovery of key elements in symbiotic relationships, to improve functional genome annotation and in comparative genomics studies. Wet/dry-lab collaboration was often at the basis of the obtained results. The success of the collaboration between the wet and dry field prompted me to develop an undergraduate course in which the concept of the “Moist” workflow was introduced (Chapter 9).

    Risk Management using Model Predictive Control

    Forward planning and risk management are crucial for the success of any system or business dealing with the uncertainties of the real world. Previous approaches have largely assumed that the future will be similar to the past, or used simple forecasting techniques based on ad-hoc models. Improving solutions requires better projection of future events, and necessitates robust forward planning techniques that consider forecasting inaccuracies. This work advocates risk management through optimal control theory, and proposes several techniques to combine it with time-series forecasting. Focusing on applications in foreign exchange (FX) and battery energy storage systems (BESS), the contributions of this thesis are three-fold. First, a short-term risk management system for FX dealers is formulated as a stochastic model predictive control (SMPC) problem in which the optimal risk-cost profiles are obtained through dynamic control of the dealers’ positions on the spot market. Second, grammatical evolution (GE) is used to automate non-linear time-series model selection, validation, and forecasting. Third, a novel measure for evaluating forecasting models, as a part of the predictive model in finite horizon optimal control applications, is proposed. Using both synthetic and historical data, the proposed techniques were validated and benchmarked. It was shown that the stochastic FX risk management system exhibits better risk management on a risk-cost Pareto frontier compared to rule-based hedging strategies, with up to 44.7% lower cost for the same level of risk. Similarly, for a real-world BESS application, it was demonstrated that the GE-optimised forecasting models outperformed other prediction models by at least 9%, improving the overall peak shaving capacity of the system to 57.6%.

    Speciation and sex-biased gene expression in the scarce swallowtails

    Speciation is the process by which closely related populations of organisms differentiate following reductions in the effective rate of genetic exchange between them over time. For most speciation events, population genetic data is the only available information about how reproductive isolation has arisen. We have a poor understanding of how evolutionary forces and genomic features contribute to reproductive isolation, primarily due to the difficulty of inferring barriers to gene flow. In particular, it is unclear what role genes that are sex-biased in expression and/or sex-linked play in speciation. In my thesis, I aim to infer the locations of putative barriers to gene flow to understand to what extent different genomic features, in particular fast-evolving sex-biased genes, contribute to reproductive isolation between a sister species pair of scarce swallowtail (Iphiclides) butterflies. In my first research project, I estimate core population genetic parameters across all sister species pairs of European butterflies and fit simple models of divergence to ask how well classic phylogeographic hypotheses fit recent diversification events in this taxonomic group. In my second research project, I infer explicit models of the speciation process and model effective migration rates along the genome to locate putative barriers to gene flow. I ask whether these barriers to long-term gene flow are associated with areas of the genome that show a reduction in recent introgression across a hybrid zone. In my third and final research project, I extend the demographic modeling of speciation in the Iphiclides species pair to the Z chromosome and ask whether barrier regions are associated with sex-biased genes, as a result of their faster rate of evolution. 
In summary, my findings suggest that fast-evolving male-biased genes likely contribute to extensive sex-linked reproductive isolation, and they pave the way for future research on the population genetics of European butterflies and the evolutionary genomics of speciation.

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov