
    A framework for evolutionary systems biology

    Background: Many difficult problems in evolutionary genomics are related to mutations that have weak effects on fitness, as the consequences of mutations with large effects are often simple to predict. Current systems biology has accumulated much data on mutations with large effects and can predict the properties of knockout mutants in some systems. However, experimental methods are too insensitive to observe small effects.
    Results: Here I propose a novel framework that brings together evolutionary theory and current systems biology approaches in order to quantify small effects of mutations and their epistatic interactions in silico. Central to this approach is the definition of fitness correlates that can be computed in some current systems biology models employing the rigorous algorithms that are at the core of much work in computational systems biology. The framework exploits synergies between the realism of such models and the need to understand real systems in evolutionary theory. This framework can address many longstanding topics in evolutionary biology by defining various 'levels' of the adaptive landscape. Addressed topics include the distribution of mutational effects on fitness, as well as the nature of advantageous mutations, epistasis and robustness. Combining corresponding parameter estimates with population genetics models raises the possibility of testing evolutionary hypotheses at a new level of realism.
    Conclusion: EvoSysBio is expected to lead to a more detailed understanding of the fundamental principles of life by combining knowledge about well-known biological systems from several disciplines. This will benefit both evolutionary theory and current systems biology. Understanding robustness by analysing distributions of mutational effects and epistasis is pivotal for drug design, cancer research, responsible genetic engineering in synthetic biology and many other practical applications.
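    The core computational move of the framework, treating a quantity computed from a systems biology model as a fitness correlate and sampling many small parameter perturbations in silico to obtain a distribution of mutational effects, can be sketched roughly as follows. This is a minimal illustration only: the Michaelis-Menten flux used as the fitness correlate, the log-normal mutation model and all parameter values are assumptions chosen for the sketch, not part of the proposed framework.

```python
import math
import random

def pathway_flux(vmax, km, substrate=1.0):
    """Toy fitness correlate: steady-state Michaelis-Menten flux of a one-step pathway."""
    return vmax * substrate / (km + substrate)

def mutant_effect(vmax=1.0, km=0.5, sigma=0.02):
    """Apply one small multiplicative 'mutation' to each parameter and return
    the selection coefficient s = (w_mut - w_wt) / w_wt of the mutant."""
    w_wt = pathway_flux(vmax, km)
    vmax_mut = vmax * math.exp(random.gauss(0.0, sigma))
    km_mut = km * math.exp(random.gauss(0.0, sigma))
    w_mut = pathway_flux(vmax_mut, km_mut)
    return (w_mut - w_wt) / w_wt

# Sample a distribution of mutational effects (DFE) on the fitness correlate.
random.seed(1)
dfe = [mutant_effect() for _ in range(10_000)]
beneficial = sum(s > 0 for s in dfe) / len(dfe)
print(f"mean effect {sum(dfe) / len(dfe):+.5f}, fraction beneficial {beneficial:.2f}")
```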

    Building a finite state automaton for physical processes using queries and counterexamples on long short-term memory models

    Most neural networks (NNs) are commonly used as black-box functions: a network takes an input and produces an output, without the user knowing what rules and system dynamics produced that specific output. In some situations, such as safety-critical applications, being able to understand and validate models before applying them can be crucial. In this regard, some approaches for representing NNs in more understandable ways attempt to accurately extract symbolic knowledge from the networks in the form of interpretable and simple systems consisting of a finite set of states and transitions, known as deterministic finite-state automata (DFA). In this thesis, we consider a rule extraction approach developed by Weiss et al. that employs the exact learning method L* to extract DFA from recurrent neural networks (RNNs) trained to classify symbolic data sequences. Our aim has been to study the practicality of applying their rule extraction approach to more complex data based on physical processes consisting of continuous values. Specifically, we experimented with datasets of varying complexity, considering both the inherent complexity of the dataset itself and the complexity introduced by the different discretization intervals used to represent the continuous data values. The datasets incorporated in this thesis encompass sine wave prediction datasets, sequence value prediction datasets, and a safety-critical well-drilling pressure scenario generated using the well-drilling simulator OpenLab and the sparse identification of nonlinear dynamical systems (SINDy) algorithm. We observe that the rule extraction algorithm is able to extract simple and small DFA representations of LSTM models. On the considered datasets, the extracted DFA generally perform worse than the LSTM models used for extraction. Overall, the performance of the extracted DFA decreases both with increasing problem complexity and with more discretization intervals. However, DFA extracted from datasets discretized using few intervals yield better results, and in some cases the algorithm can extract DFA that outperform their respective LSTM models.
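    The step that makes such extraction applicable to physical processes is the discretization of continuous signals into a finite symbolic alphabet, since the L*-based procedure of Weiss et al. operates on symbol sequences. The sketch below shows one plausible equal-width binning scheme applied to a sine wave; the bin count, value range and function names are illustrative assumptions, not the thesis code.

```python
import numpy as np

def discretize(values, n_intervals, lo=-1.0, hi=1.0):
    """Map continuous values onto symbols '0'..'n_intervals-1' using equal-width
    bins over [lo, hi]; finer alphabets (more intervals) were observed in the
    thesis to make the extracted DFA larger and less accurate."""
    inner_edges = np.linspace(lo, hi, n_intervals + 1)[1:-1]
    bins = np.digitize(values, inner_edges)  # indices 0 .. n_intervals-1
    return "".join(str(b) for b in bins)

# Example: a sine wave becomes a symbolic sequence over a 4-letter alphabet,
# which can then be fed to a sequence classifier and to an L*-style extractor.
t = np.linspace(0, 4 * np.pi, 40)
print(discretize(np.sin(t), n_intervals=4))
```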

    Efficient Algorithms And Optimizations For Scientific Computing On Many-Core Processors

    Designing efficient algorithms for many-core and multicore architectures requires using different strategies to allow for the best exploitation of the hardware resources on those architectures. Researchers have ported many scientific applications to modern many-core and multicore parallel architectures, and by doing so they have achieved significant speedups over running on single CPU cores. While many applications have achieved significant speedups, some applications still require more effort to accelerate due to their inherently serial behavior. One class of applications with this serial behavior is Monte Carlo simulations. Monte Carlo simulations have been used to simulate many problems in statistical physics and statistical mechanics that were not possible to simulate using Molecular Dynamics. While there are a fair number of well-known and recognized GPU Molecular Dynamics codes, existing Monte Carlo ensemble simulations have not been ported to the GPU, so they are relatively slow and cannot run large systems in a reasonable amount of time. Due to these shortcomings of existing Monte Carlo ensemble codes and due to researchers' interest in having a fast Monte Carlo simulation framework that can simulate large systems, a new GPU framework called GOMC is implemented to simulate different particle- and molecular-based force fields and ensembles. GOMC simulates different Monte Carlo ensembles such as the canonical, grand canonical, and Gibbs ensembles. This work describes many challenges in developing a GPU Monte Carlo code for such ensembles and how I addressed these challenges. This work also describes efficient many-core and multicore large-scale energy calculations for the Monte Carlo Gibbs ensemble using cell lists. Designing Monte Carlo molecular simulations is challenging, as they have less computation and parallelism when compared to similar molecular dynamics applications. The modified cell list allows for greater speedups in energy calculations on both many-core and multicore architectures when compared to other implementations that do not use such cell lists. The work presents results and analysis of the cell list algorithms for each of the parallel architectures using top-of-the-line GPUs, CPUs, and Intel Xeon Phi coprocessors. In addition, the work evaluates the performance of the cell list algorithms for different problem sizes and different radial cutoffs. Furthermore, this work evaluates two cell list approaches, a hybrid MPI+OpenMP approach and a hybrid MPI+CUDA approach. The cell list methods are evaluated on a small cluster of multicore CPUs, Intel Xeon Phi coprocessors, and GPUs. The performance results are evaluated using different combinations of MPI processes, threads, and problem sizes. Another application presented in this dissertation involves understanding the properties of crystalline materials and their design and control. Recent developments include the introduction of new models to simulate system behavior and properties that are of great experimental and theoretical interest. One of those models is the Phase-Field Crystal (PFC) model. The PFC model has enabled researchers to simulate 2D and 3D crystal structures and study defects such as dislocations and grain boundaries. In this work, GPUs are used to accelerate the computation of various dynamic properties of polycrystals in the 2D PFC model. Some properties require very intensive computation that may involve hundreds of thousands of atoms. The GPU implementation achieves speedups of more than 46 times for some large-system simulations.
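    The cell list idea behind the large-scale energy calculations can be sketched as follows: the simulation box is partitioned into cells no smaller than the interaction cutoff, so the candidate interaction partners of a particle need only be gathered from its own cell and the 26 neighbouring cells instead of the whole system. The sketch below is a generic serial version of the conventional cell list, not GOMC's modified GPU variant; the box size, cutoff and particle count are placeholders.

```python
import itertools
import random

def build_cell_list(positions, box, rcut):
    """Assign each particle index to a cubic cell of side >= rcut (periodic box)."""
    ncell = max(1, int(box // rcut))
    side = box / ncell
    cells = {}
    for i, (x, y, z) in enumerate(positions):
        key = (int(x / side) % ncell, int(y / side) % ncell, int(z / side) % ncell)
        cells.setdefault(key, []).append(i)
    return cells, ncell

def neighbor_candidates(cells, ncell, key):
    """Particles in cell `key` and its 26 periodic neighbours: the only
    candidates that can lie within the cutoff of a particle in that cell."""
    cands = []
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        nkey = ((key[0] + dx) % ncell, (key[1] + dy) % ncell, (key[2] + dz) % ncell)
        cands.extend(cells.get(nkey, []))
    return cands

random.seed(0)
box, rcut = 20.0, 2.5
positions = [tuple(random.uniform(0, box) for _ in range(3)) for _ in range(500)]
cells, ncell = build_cell_list(positions, box, rcut)
key = next(iter(cells))
print(f"{ncell ** 3} cells; searching {len(neighbor_candidates(cells, ncell, key))} "
      f"of {len(positions)} particles for pairs in cell {key}")
```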

    Limits for Stochastic Reaction Networks

    Reaction systems were introduced in the 1970s to model biochemical systems. Nowadays their range of applications has increased and they are fruitfully used in different fields. The concept is simple: some chemical species react, the set of chemical reactions forms a graph, and a rate function is associated with each reaction. Such functions describe the speed of the different reactions, or their propensities. Two modelling regimes are then available: the evolution of the different species concentrations can be deterministically modelled through a system of ODEs, while the counts of the different species at a certain time are stochastically modelled by means of a continuous-time Markov chain. Our work primarily concerns stochastic reaction systems and their asymptotic properties. In Paper I, we consider a reaction system with intermediate species, i.e. species that are produced and quickly degraded along a path of reactions. Let the rates of degradation of the intermediate species be functions of a parameter N that tends to infinity. We consider a reduced system where the intermediate species have been eliminated, and find conditions on the degradation rates of the intermediates such that the behaviour of the reduced network tends to that of the original one. In particular, we prove uniform pointwise convergence in distribution and weak convergence of the integrals of continuous functions along the paths of the two models. Under some extra conditions, we also prove weak convergence of the two processes. The result is stated in the setting of multiscale reaction systems: the amounts of all the species and the rates of all the reactions of the original model can scale as powers of N. A similar result also holds for the deterministic case, as shown in Appendix IA. In Paper II, we focus on the stationary distributions of stochastic reaction systems. Specifically, we build a theory for stochastic reaction systems that is parallel to the deficiency zero theory for deterministic systems, which dates back to the 1970s. A deficiency theory for stochastic reaction systems was missing, and few results connecting deficiency and stochastic reaction systems were known. The theory we build connects a special form of product-form stationary distribution with structural properties of the reaction graph of the system. In Paper III, a special class of reaction systems is considered, namely systems exhibiting absolute concentration robust species. In the deterministic modelling regime, such species always assume the same value at any positive steady state. In the stochastic setting, we prove that, if the initial condition is a point in the basin of attraction of a positive steady state of the corresponding deterministic model and tends to infinity, then up to a fixed time T the counts of the species exhibiting absolute concentration robustness are, on average, close to their equilibrium value. The result is not obvious because when the counts of some species tend to infinity, so do some rate functions, and the study of the system may become hard. Moreover, the result establishes a substantial concordance between the paths of the stochastic and the deterministic models.
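    The stochastic modelling regime described here, in which species counts evolve as a continuous-time Markov chain whose jump rates are the reaction propensities, is commonly simulated with Gillespie's algorithm. The sketch below runs such a simulation for a toy network with a quickly degraded intermediate, loosely in the spirit of Paper I; the network, the mass-action rate constants and the scaling parameter N are illustrative assumptions.

```python
import random

def gillespie(x, reactions, propensities, t_end):
    """Simulate species counts as a continuous-time Markov chain: draw an
    exponential waiting time from the total propensity, then pick a reaction
    with probability proportional to its propensity and apply its change."""
    t, path = 0.0, [(0.0, tuple(x))]
    while True:
        props = propensities(x)
        total = sum(props)
        if total == 0.0:
            break
        t += random.expovariate(total)
        if t > t_end:
            break
        r, acc = random.uniform(0.0, total), 0.0
        for i, p in enumerate(props):
            acc += p
            if r <= acc:
                break
        x = [xi + d for xi, d in zip(x, reactions[i])]
        path.append((t, tuple(x)))
    return path

# Toy network with a quickly degraded intermediate:  A -> I,  I -> B,  I -> 0.
# Species order is (A, I, B); N scales how fast the intermediate is consumed.
random.seed(2)
N = 100.0
reactions = [(-1, +1, 0), (0, -1, +1), (0, -1, 0)]
propensities = lambda x: [1.0 * x[0], 2.0 * N * x[1], 1.0 * N * x[1]]
path = gillespie([200, 0, 0], reactions, propensities, t_end=10.0)
print("final time and counts (A, I, B):", round(path[-1][0], 2), path[-1][1])
```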

    Вычислительный подход к построению биологии (A Computational Approach to Building Biology)

    According to some critics, if biology is a kind of reverse engineering of nature, it is rather poorly prepared for the task; the issue most likely lies with its ontology. Multiple hypotheses and conjectures found in papers on methodological issues claim that living systems should be viewed as complex networks of signal-transmitting paths, both neural and non-neural, that feature modularity and feedback circuits and are prone to emergent properties and increasing complexity. If so, we are on the eve of a new stage in the development of computer models, where not only are computers used to emulate life, but life itself is construed as a complex network of interacting natural computers. In 2002, Yuri Lazebnik used a salient and profound metaphor to clarify the main theoretical shortcoming that keeps biology from being a unified and deductively consistent science modeled after physics. Asking whether a biologist could fix a broken radio, he revealed that what is missing is a unified formal language for describing the ultimate elements of living devices together with their typical combinations, as is commonly done in radio engineering. I specify in the paper that what Lazebnik means by a "formal language" is not a language of propositions about the world, i.e., of asserting some states of affairs, but rather a language for listing relevant types of objects and their relations. I refer to it as a domain ontology. A theory needs another language to describe actual states of affairs, which most probably has to be mathematical in order to represent complicated natural structures in detail. I then touch on the popular view according to which a domain ontology is inferred by the theory proper. The history of science shows that true theories that remain viable today were often paired with now-abandoned ontologies, like those of caloric or phlogiston. I suggest that a theory does not infer its ontology, but rather is interpreted upon it, being inferentially independent of it. I also review some historically important attempts to mathematize the knowledge of life. I mention Alan Turing's article on morphogenesis, where he used linear differential equations to explain the emergence of complexity from homogeneity. Then I briefly touch on the works of Nicolas Rashevsky, whose theories provided inspiration to the inventors of artificial neural networks and enabled the abundant use of different mathematical tools by his disciple Robert Rosen in his study of metabolism. Closer to the present day, various computational theories in biology have emerged. Some of them treat protein combinations as networks of signal-transmitting pathways that can store and process information; moreover, in unicellular organisms, protein-based circuits replace the whole of the nervous system as a behavior-controlling network. Other theories propose a view in which an organism is construed as a system of modules connected by protocols, or interfaces. A domain ontology like this may considerably simplify the task of scientific description. Special attention is paid to applications of the well-known free-energy (minimization) principle to life-science matters, as it was initially intended to explain issues in cognitive science. In general, within this view, for an organism to survive is to minimize its thermodynamic potential energy, for which purpose the living being as a whole, and all its subsystems, must constantly produce statistical models of the environment that are continually updated with incoming data. Some strong Bayesian mathematics combines with this ontology to claim the whole enterprise as the most prominent universal theory of complex developing systems today. As a general outcome of the survey, I propose a computational methodological approach to doing biology based on Marr's famous three-level view of computational systems, together with the necessity of identifying the elementary nodes of which living systems are composed. Such an approach may, I hope, generate a set of competing theories that will eventually help biologists fix their "radio".

    An analysis of nature-based treatment processes for cleaning contaminated surface water runoff from an informal settlement: a case study of the Stiebeuel River catchment, Franschhoek, South Africa

    Contaminated surface water runoff from inadequate drainage and sanitation systems in informal settlements threatens the quality of available freshwater and can negatively impact both human and environmental health. Biofiltration systems (biofilters) provide water pollution control without inputs of additional energy and chemicals, placing them in the overall context of the need for affordable and sustainable stormwater infrastructure in informal settlements. In addition, cleaned waters from biofilters may be suitable for some reuse applications if the systems are well designed and maintained. However, most research is conducted in developed countries where heavy metals are the main surface water pollutant. Consequently, little is known about the extent to which biofilters can meet water quality targets under conditions likely to be found in informal settlements. In addition, no attempts have been made to recover or reuse the surface water runoff from informal settlements, despite its high nutrient loadings. This study analyses the extent to which biofilters can be used to clean and reuse contaminated surface water runoff from informal settlements. The objectives are threefold: (i) to analyse the performance of two field-scale biofiltration cells (one vegetated and one non-vegetated) that are batch-fed with surface water runoff from an upstream informal settlement; (ii) to determine the effects of varying operating, design and environmental parameters on the performance of the cells; and (iii) to develop a model that predicts the outflow pollutant concentrations under varying conditions. Both cells effectively reduced ammonia (NH₃), total phosphate (TP) and Escherichia coli (E. coli) concentrations, but leached nitrate (NO₃⁻) and nitrite (NO₂⁻). The treated waters were suitable for irrigation reuse; however, additional disinfection was required in some cases to reduce faecal contamination. Correlation analyses showed that inflow water quality significantly influenced cell performance, with the vegetated cell outperforming the non-vegetated cell at higher inflow pollutant concentrations. Multiple regression models investigating several parameters influencing outflow NH₃ showed that inflow pH, temperature and NH₃ concentration can be used to predict the outflow NH₃ concentration of the cells. These models are important for predicting cell performance and thus can be used to improve the design and/or operation of the cells for varying inflow water quality conditions.
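    The regression model form reported above relates outflow NH₃ to inflow pH, temperature and inflow NH₃ concentration. A minimal ordinary-least-squares sketch of that form is shown below; the data values are placeholders invented for the example, not the study's measurements, so the fitted coefficients carry no physical meaning.

```python
import numpy as np

# Columns: inflow pH, water temperature (deg C), inflow NH3 (mg/L).
# These rows are illustrative placeholders, not measured data.
X = np.array([
    [7.2, 18.0, 25.0],
    [7.5, 20.5, 32.0],
    [6.9, 16.0, 18.0],
    [7.8, 22.0, 40.0],
    [7.1, 19.0, 27.0],
    [7.4, 21.0, 35.0],
])
y = np.array([3.1, 4.8, 2.0, 6.5, 3.4, 5.2])  # outflow NH3 (mg/L), placeholders

# Ordinary least squares with an intercept term:
# outflow_NH3 ~ b0 + b1*pH_in + b2*temperature + b3*NH3_in
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept and coefficients:", np.round(coeffs, 3))

# Predict outflow NH3 for a new inflow sample (pH 7.3, 19.5 deg C, 30 mg/L).
new = np.array([1.0, 7.3, 19.5, 30.0])
print("predicted outflow NH3 (mg/L):", round(float(new @ coeffs), 2))
```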

    Temporal Data Modeling and Reasoning for Information Systems

    Temporal knowledge representation and reasoning is a major research field in Artificial Intelligence, in Database Systems, and in Web and Semantic Web research. The ability to model and process time and calendar data is essential for many applications like appointment scheduling, planning, Web services, temporal and active database systems, adaptive Web applications, and mobile computing applications. This article aims at three complementary goals: first, to provide a general background in temporal data modeling and reasoning approaches; second, to serve as an orientation guide for further specific reading; third, to point to new application fields and research perspectives on temporal knowledge representation and reasoning in the Web and Semantic Web.