378 research outputs found
The Nondeterministic Waiting Time Algorithm: A Review
We briefly present the Nondeterministic Waiting Time algorithm. Our technique
for simulating biochemical reaction networks can mimic the Gillespie
Algorithm for some networks and the solutions of ordinary differential
equations for other networks, depending on the rules of the system, the
kinetic rates, and the numbers of molecules. We provide a full description of
the algorithm as well as specifics of its implementation. Results for two
well-known models are reported. We have used the algorithm to explore
Fas-mediated apoptosis models in cancerous and HIV-1-infected T cells.
On the Computational Power of DNA Annealing and Ligation
In [20] it was shown that the DNA primitives of Separate,
Merge, and Amplify were not sufficiently powerful to invert
functions defined by circuits in linear time. Dan Boneh et
al. [4] show that the addition of a ligation primitive, Append, provides the
missing power. The question becomes: "How powerful is ligation? Are Separate,
Merge, and Amplify necessary at all?" This paper informally explores
the power of annealing and ligation for DNA computation.
We conclude that annealing and ligation alone are, in fact,
theoretically capable of universal computation.
Exposing and fixing causes of inconsistency and nondeterminism in clustering implementations
Cluster analysis, also known as clustering, is used in myriad applications, including high-stakes domains, by millions of users. Clustering users should be able to assume that clustering implementations are correct, reliable, and, for a given algorithm, interchangeable. Based on observations of a wide range of real-world clustering implementations, this dissertation challenges these assumptions.
This dissertation introduces an approach named SmokeOut that uses differential clustering to show that clustering implementations suffer from nondeterminism and inconsistency: on a given input dataset and using a given clustering algorithm, clustering outcomes and accuracy vary widely between (1) successive runs of the same toolkit, i.e., nondeterminism, and (2) different toolkits, i.e., inconsistency. Using a statistical approach, this dissertation quantifies and exposes statistically significant differences across runs and toolkits. This dissertation exposes the diverse root causes of nondeterminism and inconsistency, such as default parameter settings, noise insertion, distance metrics, and termination criteria. Based on these findings, this dissertation introduces an automatic approach for locating the root causes of nondeterminism and inconsistency.
This dissertation makes several contributions: (1) quantifying clustering outcomes across different algorithms, toolkits, and multiple runs; (2) using a statistically rigorous approach for testing clustering implementations; (3) exposing root causes of nondeterminism and inconsistency; and (4) automatically finding the root causes of nondeterminism and inconsistency.
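The run-to-run nondeterminism this dissertation targets is easy to reproduce even in a toy setting. The sketch below is a plain Lloyd's k-means in pure Python, not SmokeOut or any toolkit's actual implementation, and all names and data are illustrative; it isolates one common source of nondeterminism, the seed-dependent choice of initial centroids, and ends with a differential check of two runs in the spirit of the approach described above.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's algorithm. The only nondeterminism here is the
    seed-dependent choice of initial centroids."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: (p[0] - centroids[j][0]) ** 2 +
                                    (p[1] - centroids[j][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to its cluster mean
        # (keep the old centroid if its cluster emptied out).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels

# Two clear blobs plus one ambiguous stray point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
run_a = kmeans(pts, 2, seed=1)
run_b = kmeans(pts, 2, seed=2)
# Differential check: compare the two partitions up to cluster relabeling.
same = ({frozenset(i for i, l in enumerate(run_a) if l == c) for c in set(run_a)}
        == {frozenset(i for i, l in enumerate(run_b) if l == c) for c in set(run_b)})
```

On larger or less separable data, `same` frequently comes out `False`, which is exactly the seed-induced nondeterminism a differential-testing approach is designed to surface.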
Modelling the evolution of biological complexity with a two-dimensional lattice self-assembly process
Self-assembling systems are prevalent across numerous scales of nature, lying at the heart of diverse physical and biological phenomena.
Individual protein subunits self-assembling into complexes is often a vital first step of biological processes.
Errors during protein assembly, due to mutations or misfolds, can have devastating effects and are responsible for an assortment of protein diseases, known as proteopathies.
With proteins exhibiting endless layers of complexity, building any all-encompassing model is unrealistic.
Coarse-grained models, despite not faithfully capturing every detail of the original system, have great potential to aid in understanding complex phenomena.
The principal actors in self-assembly are the binding interactions between subunits, so geometric constraints, polarity, kinetic forces, etc. can often be marginalised.
This work explores how self-assembly and its outcomes are inextricably tied to the involved interactions through the use of a two-dimensional lattice polyomino model.
First, this thesis addresses how the interaction characteristics of self-assembly building blocks determine which structures they form:
specifically, whether the same structures are consistently produced and whether they remain finite in size.
Assembly graphs store subunit interaction information and are used to classify these two properties, determinism and boundedness respectively.
Arbitrary sets of building blocks are classified without the costly overhead of repeated stochastic assembling, improving both the analysis speed and accuracy.
Furthermore, assembly graphs naturally integrate combinatorial and graph techniques, enabling a wider range of future polyomino studies.
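The costly baseline that assembly graphs improve on, repeated stochastic assembly, can be sketched with a minimal lattice model. Everything below is an illustrative toy rather than the thesis's actual model or code: tiles are four edge colours in (N, E, S, W) order, colour 0 is inert, odd colour c binds colour c + 1, and a naive determinism check simply reassembles with several seeds and compares the resulting shapes.

```python
import random

def binds(a, b):
    """Edge colours: 0 is inert; odd colour c binds colour c + 1."""
    if a == 0 or b == 0:
        return False
    return (a % 2 == 1 and b == a + 1) or (b % 2 == 1 and a == b + 1)

# (dx, dy) of each neighbour, the edge index facing it, and the
# neighbour's edge index facing back; edges are ordered (N, E, S, W).
DIRS = [((0, -1), 0, 2), ((1, 0), 1, 3), ((0, 1), 2, 0), ((-1, 0), 3, 1)]

def assemble(tiles, seed, max_size=64):
    """Seeded stochastic assembly: grow from tiles[0] by repeatedly
    attaching a random tile rotation at a random boundary site where an
    edge pair binds (only the binding edge is checked, for brevity)."""
    rng = random.Random(seed)
    grid = {(0, 0): tiles[0]}
    while len(grid) < max_size:
        options = []
        for (x, y), t in grid.items():
            for (dx, dy), e, opp in DIRS:
                pos = (x + dx, y + dy)
                if pos in grid:
                    continue
                for cand in tiles:
                    for r in range(4):
                        rot = cand[r:] + cand[:r]   # enumerate rotations
                        if binds(t[e], rot[opp]):
                            options.append((pos, rot))
        if not options:
            return grid          # bounded: no further growth is possible
        pos, rot = rng.choice(options)
        grid[pos] = rot
    return grid                  # hit the cap: growth may be unbounded

# Two tiles that deterministically assemble into a fixed dimer.
tiles = [(0, 1, 0, 0), (0, 0, 0, 2)]
shape_a = set(assemble(tiles, seed=1))
shape_b = set(assemble(tiles, seed=99))
```

Classifying a tile set this way requires many seeded runs and still only samples outcomes, which is precisely the overhead and accuracy loss that the graph-based classification avoids.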
The second part narrows in on the implications of nondeterministic assembly for the evolution of interaction strengths.
Generalising subunit binding sites with mutable binary strings introduces such interaction strengths into the polyomino model.
Deterministic assemblies obey analytic expectations.
Conversely, interactions in nondeterministic assemblies rapidly diverge from equilibrium to minimise assembly inconsistency.
Optimal interaction strengths during assembly are also reflected in evolution.
Transitions between certain polyominoes are strongly forbidden when interaction strengths are misaligned.
The third aspect focuses on genetic duplication, an evolutionary event observed in organisms across all taxa.
Through simulated polyomino evolution, a duplication-heteromerisation pathway emerges as an efficient process.
This pathway exploits the advantages of both self-interactions and pairwise-interactions, and accelerates evolution by avoiding complexity bottlenecks.
Several simulation predictions are successfully validated against a large data set of protein complexes.
These results are drawn from coarse-grained models rather than offering quantified biological insight.
Nevertheless, they reinforce existing observations of protein complexes, as well as posing several new mechanisms for the evolution of biological complexity.
PiGx: reproducible genomics analysis pipelines with GNU Guix
In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations and for the wider dissemination of workflows. Providing this type of reproducibility and traceability, however, is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies with GNU Guix. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines called PiGx for the analysis of RNA-seq, ChIP-seq, Bisulfite-seq, and single-cell RNA-seq. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own data sets and to bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pigx
Fuel Efficient Computation in Passive Self-Assembly
In this paper we show that passive self-assembly in the context of the tile
self-assembly model is capable of performing fuel-efficient, universal
computation. The tile self-assembly model is a premier model of self-assembly
in which particles are modeled by four-sided squares with glue types assigned
to each tile edge. The assembly process is driven by positive and negative
force interactions between glue types, allowing for tile assemblies floating in
the plane to combine and break apart over time. We refer to this type of
assembly model as passive in that the constituent parts remain unchanged
throughout the assembly process regardless of their interactions. A
computationally universal system is said to be fuel efficient if the number of
tiles used up per computation step is bounded by a constant. Work within this
model has shown how fuel-guzzling tile systems can perform universal
computation with only positive strength glue interactions. Recent work has
introduced space-efficient, fuel-guzzling universal computation with the
addition of negative glue interactions and the use of a powerful non-diagonal
class of glue interactions. Other recent work has shown how to achieve fuel
efficient computation within active tile self-assembly. In this paper we
utilize negative interactions in the tile self-assembly model to achieve the
first computationally universal passive tile self-assembly system that is both
space- and fuel-efficient. In addition, we achieve this result using a limited
diagonal class of glue interactions.
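The distinction between the diagonal and non-diagonal glue classes mentioned above can be made concrete. In a diagonal glue function only identical glue types interact; a non-diagonal one also assigns strengths (possibly negative) to pairs of distinct types. The sketch below is a generic illustration of temperature-style stability checking, with made-up glue names and strengths, not the paper's construction.

```python
# Diagonal glue function: only identical glue types interact
# (glue names and strengths here are illustrative).
diagonal = {("a", "a"): 2, ("b", "b"): 1, ("n", "n"): -1}

# Non-diagonal glue function: distinct glue types may also attract or repel.
non_diagonal = {("a", "a"): 2, ("a", "b"): 1, ("b", "n"): -2}

def interaction(table, g1, g2):
    """Symmetric lookup; unlisted glue pairs contribute zero force."""
    return table.get((g1, g2), table.get((g2, g1), 0))

def can_attach(table, matched_edges, temperature=2):
    """A tile attaches stably when the summed interaction strength over
    its matched edges meets the temperature threshold."""
    return sum(interaction(table, a, b) for a, b in matched_edges) >= temperature
```

For example, a single strength-2 bond suffices at temperature 2; a strength-1 bond plus a negative (repulsive) glue does not; and two weak non-diagonal bonds can cooperate to reach the threshold, the kind of cooperation that negative and non-diagonal interactions make available.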
Improving GPU Simulations of Spiking Neural P Systems
In this work we present further extensions and improvements of a simulator
for Spiking Neural P systems (SNP systems, for short) on graphics
processing units (GPUs, for short). Using previous results on representing
SNP system computations with linear algebra, we analyze and implement a
computation simulation algorithm on the GPU. A two-level parallelism is
introduced for the computation simulations. We also present a set of
benchmark SNP systems to stress-test the simulation and show the increased
performance obtained using GPUs over conventional CPUs. For a 16-neuron
benchmark SNP system with 65536 nondeterministic rule selection choices, we
report a 2.31× speedup of the GPU-based simulations over CPU-based simulations.
Funding: Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420
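The linear-algebra representation that such simulators build on can be sketched as follows: a configuration vector C holds each neuron's spike count, a spiking vector s records which rules fire in a step (one nondeterministic choice per step), and a transition matrix M encodes each rule's net effect, giving the update C' = C + s·M. The tiny CPU version below illustrates that update only; it is not the GPU kernel or the simulator's code.

```python
def step(config, spiking, M):
    """One SNP computation step: C' = C + s * M, where s is a row vector
    selecting which rules fire and M[r][j] is the net spike change in
    neuron j when rule r fires (consumed spikes negative, delivered
    spikes positive)."""
    return [config[j] + sum(spiking[r] * M[r][j] for r in range(len(M)))
            for j in range(len(config))]

# Two neurons; rule 0 in neuron 1 consumes one spike and sends one to neuron 2.
M = [[-1, +1]]
after = step([1, 0], [1], M)   # -> [0, 1]
```

Because the update is a vector-matrix product, each entry of C' can be computed independently, which is what makes the per-neuron (and, across nondeterministic choices, per-computation) parallelism on a GPU natural.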
Discrete nondeterministic modeling of biochemical networks
The ideas expressed in this work pertain to biochemical modeling. We explore our technique, the Nondeterministic Waiting Time algorithm, for modeling molecular signaling cascades. The algorithm is presented with pseudocode along with an explanation of its implementation. The entire source code can be found in the Appendices. This algorithm builds on earlier work from the lab of Dr. Andrei Nun, the advisor for this dissertation. We discuss several important extensions including: (i) a heap with special maintenance functions for sorting reaction waiting times, (ii) a nondeterministic component for handling reaction competition, and (iii) a memory enhancement allowing slower reactions to compete with faster reactions.
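A hedged sketch of how a heap can drive such a waiting-time scheme follows; the rate law, tie-breaking rule, and data layout here are illustrative assumptions, not the dissertation's exact implementation (which also includes the memory enhancement for slower reactions).

```python
import heapq
import random

def nwt_step(counts, reactions, rng):
    """One step of a toy waiting-time scheme: each applicable reaction gets
    a deterministic waiting time inversely proportional to its mass-action
    propensity; the heap surfaces the soonest reaction, and ties are broken
    nondeterministically (the reaction competition described above)."""
    heap = []
    for i, (rate, reactants, _delta) in enumerate(reactions):
        propensity = rate
        for species in reactants:
            propensity *= counts[species]
        if propensity > 0:
            heapq.heappush(heap, (1.0 / propensity, i))
    if not heap:
        return None  # no reaction can fire
    t_min = heap[0][0]
    # Nondeterministic component: choose among reactions tied at t_min.
    tied = [i for t, i in heap if t == t_min]
    chosen = rng.choice(tied)
    for species, change in reactions[chosen][2].items():
        counts[species] += change
    return t_min

# A -> B at rate 1.0, starting from two copies of A.
counts = {"A": 2, "B": 0}
reactions = [(1.0, ["A"], {"A": -1, "B": +1})]
t = nwt_step(counts, reactions, random.Random(0))  # waiting time 0.5
```

With a single reaction the step is deterministic; once several reactions share the minimal waiting time, the random choice among them is what gives the method its stochastic character.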
Several example systems are provided for comparisons between modeling with systems of ordinary differential equations, the Gillespie Algorithm, and our Nondeterministic Waiting Time algorithm. Our algorithm has a unique ability to exhibit behavior similar to the solutions of systems of ordinary differential equations for certain models and parameter choices, but it also has a nondeterministic component which yields results similar to those of stochastic methods (e.g., the Gillespie Algorithm).
Next, we turn our attention to the Fas-mediated apoptotic signaling cascade. Fas signaling has important implications for research on cancer and on autoimmune and neurodegenerative disorders. We provide an exhaustive account of results from the Nondeterministic Waiting Time algorithm in comparison to solutions of the system of ordinary differential equations described by another modeling group. Our work with the Fas pathway led us to explore a new model focusing on the effects of HIV-1 proteins on the Fas signaling cascade. There is extensive information in the literature on the effects of HIV-1 proteins on this pathway. The model described in this work represents the first attempt at modeling Fas-induced apoptosis in latently infected T cells.
Several extensions of the Fas model are discussed at the end of the work. Calcium signaling would be an interesting avenue to investigate, building on recent results reported in the literature. Several extensions are likewise discussed for the HIV model. We also suggest a new direction for the Nondeterministic Waiting Time algorithm: exploring parallelization options.
Regular Expressions in a CS Formal Languages Course
Regular expressions in an Automata Theory and Formal Languages course are
mostly treated as a theoretical topic: to some degree, their mathematical
properties and their role in describing languages are discussed. This
approach fails to capture the interest of most Computer Science students,
and it is a missed opportunity to engage students who are far more motivated
by practical applications of theory. To this end, regular expressions may be
discussed as the description of an algorithm to generate the words in a
language, one that is easily programmed. This article describes a
programming-based methodology to introduce students to regular expressions
in an Automata Theory and Formal Languages course. The language of
instruction is FSM, which has a regular expression type, thus facilitating
the study of regular expressions and of algorithms based on regular expressions.
Comment: In Proceedings TFPIE 2023, arXiv:2308.0611
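The "regular expression as a word-generating algorithm" view is straightforward to program. The sketch below uses an ad-hoc tuple AST in Python (not FSM's actual regexp type) and enumerates all words of a language up to a length bound.

```python
import itertools

def gen(re, max_len):
    """Yield words of L(re) of length <= max_len, treating the regular
    expression itself as an algorithm for generating words."""
    kind = re[0]
    if kind == "empty":                       # epsilon: the empty word
        yield ""
    elif kind == "sym":                       # a single alphabet symbol
        if max_len >= 1:
            yield re[1]
    elif kind == "union":                     # L(r) | L(s), deduplicated
        seen = set()
        for w in itertools.chain(gen(re[1], max_len), gen(re[2], max_len)):
            if w not in seen:
                seen.add(w)
                yield w
    elif kind == "concat":                    # {uv : u in L(r), v in L(s)}
        for u in gen(re[1], max_len):
            for v in gen(re[2], max_len - len(u)):
                yield u + v
    elif kind == "star":                      # Kleene star, capped at max_len
        yield ""
        for u in gen(re[1], max_len):
            if u:                             # avoid looping on epsilon
                for v in gen(re, max_len - len(u)):
                    yield u + v

# (a|b)a*: enumerate the words of length up to 2.
r = ("concat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
words = sorted({w for w in gen(r, 2)})        # ['a', 'aa', 'b', 'ba']
```

Each constructor of the regular-expression type becomes one branch of the generator, which is exactly the correspondence between syntax and algorithm that makes the topic programmable for students.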