68 research outputs found
Fast Approximate Shortest Hyperpaths for Inferring Pathways in Cell Signaling Hypergraphs
Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions then corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The state of the art for shortest hyperpaths in cell-signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees.
We present for the first time a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. Its accuracy is demonstrated through exhaustive experiments on all instances from the standard NCI-PID and Reactome pathway databases, which show the heuristic finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state of the art, which finds no solution; on every such cyclic instance, enumerating all possible hyperpaths shows that the solution found by the heuristic is in fact optimal.
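To make the notion of a directed hypergraph concrete, here is a minimal Python sketch of forward reachability. It is not the heuristic from the abstract. It assumes the usual convention that a hyperedge (a reaction) can fire only once every vertex in its tail set has been reached; the representation as `(tail, head)` pairs of frozensets and the name `reachable_vertices` are illustrative choices.

```python
def reachable_vertices(sources, hyperedges):
    """Return all vertices reachable from `sources`.

    Each hyperedge is a (tail, head) pair of frozensets. Assumption: an edge
    may be used only when its whole tail is already reached, and using it
    reaches every vertex in its head.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            # Fire the edge if all inputs are available and it adds something new.
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return reached


# Hypothetical toy pathway: receptor "r", transcription factor "tf".
edges = [
    (frozenset({"r"}), frozenset({"a", "b"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
    (frozenset({"c"}), frozenset({"a"})),        # a cycle back to "a"
    (frozenset({"c", "d"}), frozenset({"tf"})),  # needs both "c" and "d"
]
```

In this toy instance the cycle does not stop the propagation, but "tf" is reached only when "d" is also among the sources, which is what distinguishes hyperedges from ordinary graph edges.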
Inferring a DNA sequence from erroneous copies
We suggest a novel approach for efficiently reconstructing an original DNA sequence from erroneous copies.
A branch-and-cut approach to physical mapping with end-probes
A fundamental problem in computational biology is the construction of physical maps of chromosomes from hybridization experiments between unique probes and clones of chromosome fragments in the presence of error. Alizadeh, Karp, Weisser and Zweig (Algorithmica 13:1/2, 52-76, 1995) first considered a maximum-likelihood model of the problem that is equivalent to finding an ordering of the probes that minimizes a weighted sum of errors, and developed several effective heuristics. We show that by exploiting information about the end-probes of clones, this model can be formulated as a weighted Betweenness Problem. This affords the significant advantage of allowing the well-developed tools of integer linear-programming and branch-and-cut algorithms to be brought to bear on physical mapping, enabling us for the first time to solve small mapping instances to optimality even in the presence of high error. We also show that by combining the optimal solution of many small overlapping Betweenness Problems, one can effectively screen errors from larger instances, and solve the edited instance to optimality as a Hamming-Distance Traveling Salesman Problem. This suggests a new combined approach to physical map construction.
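As a rough illustration of a weighted betweenness objective, the sketch below scores an ordering by the total weight of violated triples and finds the best ordering by brute force. It assumes the common form of the problem, in which each triple `(a, b, c, w)` asks that `b` lie between `a` and `c` and costs `w` if it does not. The abstract does not give the exact weights or constraints, and exhaustive enumeration stands in for the branch-and-cut method only on tiny inputs. All names are hypothetical.

```python
from itertools import permutations


def violation_cost(order, triples):
    """Total weight of triples (a, b, c, w) where b is NOT between a and c."""
    pos = {probe: i for i, probe in enumerate(order)}
    cost = 0.0
    for a, b, c, w in triples:
        lo, hi = sorted((pos[a], pos[c]))
        if not lo < pos[b] < hi:
            cost += w
    return cost


def brute_force_ordering(probes, triples):
    """Try every ordering; feasible only for a handful of probes."""
    return min(permutations(probes), key=lambda order: violation_cost(order, triples))


# Hypothetical instance with conflicting constraints.
probes = ["p1", "p2", "p3", "p4"]
triples = [
    ("p1", "p2", "p3", 2.0),
    ("p2", "p3", "p4", 1.0),
    ("p1", "p4", "p2", 0.5),
]
best = brute_force_ordering(probes, triples)
```

The three triples cannot all be satisfied at once, so the best ordering gives up the cheapest one.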
ANTARES: Progress towards building a `Broker' of time-domain alerts
The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is
a joint effort of NOAO and the Department of Computer Science at the University
of Arizona to build prototype software to process alerts from time-domain
surveys, especially LSST, to identify those alerts that must be followed up
immediately. Value is added by annotating incoming alerts with existing
information from previous surveys and compilations across the electromagnetic
spectrum and from the history of past alerts. Comparison against a knowledge
repository of properties and features of known or predicted kinds of variable
phenomena is used for categorization. The architecture and algorithms being
employed are described.
The ANTARES Astronomical Time-Domain Event Broker
We describe the Arizona-NOIRLab Temporal Analysis and Response to Events
System (ANTARES), a software instrument designed to process large-scale streams
of astronomical time-domain alerts. With the advent of large-format CCDs on
wide-field imaging telescopes, time-domain surveys now routinely discover tens
of thousands of new events each night, more than can be evaluated by
astronomers alone. The ANTARES event broker will process alerts, annotating
them with catalog associations and filtering them to distinguish customizable
subsets of events. We describe the data model of the system, the overall
architecture, annotation, implementation of filters, system outputs, provenance
tracking, system performance, and the user interface.Comment: 24 Pages, 8 figures, Accepted by A
Machine Learning-based Brokers for Real-time Classification of the LSST Alert Stream
The unprecedented volume and rate of transient events that will be discovered
by the Large Synoptic Survey Telescope (LSST) demand that the astronomical
community update its followup paradigm. Alert-brokers -- automated software
systems to sift through, characterize, annotate and prioritize events for
followup -- will be critical tools for managing alert streams in the LSST era.
The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is
one such broker. In this work, we develop a machine learning pipeline to
characterize and classify variable and transient sources using only the
available multiband optical photometry. We describe three illustrative stages
of the pipeline, serving the three goals of early, intermediate and
retrospective classification of alerts. The first takes the form of variable vs
transient categorization, the second, a multi-class typing of the combined
variable and transient dataset, and the third, a purity-driven subtyping of a
transient class. While several similar algorithms have proven themselves in
simulations, we validate their performance on real observations for the first
time. We quantitatively evaluate our pipeline on sparse, unevenly sampled,
heteroskedastic data from various existing observational campaigns, and
demonstrate very competitive classification performance. We describe our
progress towards adapting the pipeline developed in this work into a real-time
broker working on live alert streams from time-domain surveys.
Comment: 33 pages, 14 figures, submitted to ApJ
Automatic Design of Synthetic Gene Circuits through Mixed Integer Non-linear Programming
Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems, and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guarantees on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfies the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits.
Exact and approximation algorithms for DNA sequence reconstruction.
The DNA sequence in every human being is a text of three billion characters from a four letter alphabet; determining this sequence is a major project in molecular biology. The fundamental task biologists face is to reconstruct a long sequence given short fragments from unknown locations. These fragments contain errors, and may represent the sequence on one strand of the double-helix, or the reverse complement sequence on the other strand. The Sequence Reconstruction Problem is, given a collection ℱ of fragment sequences and an error rate 0 ≤ ε < 1, to find a shortest sequence S such that every fragment F ∈ ℱ, or its reverse complement, matches a substring of S with at most ε|F| errors. Sequence Reconstruction is NP-complete. We decompose the problem into (1) constructing a graph of approximate overlaps between pairs of fragments, (2) selecting a set of overlaps of maximum total weight that induce a consistent layout of the fragments, (3) merging the overlaps into a multiple sequence alignment and voting on a consensus. A solution to (1) through (3) yields a reconstructed sequence feasible at error rate 2ε/(1-ε) and at most a factor 1/(1-ε) longer than the shortest reconstruction, given some assumptions on fragment error. We define a measure of the overlap in a reconstruction, show that maximizing the overlap minimizes the length, and that approximating (2) within a factor of α approximates Sequence Reconstruction within a factor of (1-ε)α under the overlap measure. We construct the overlap graph for (1) in O(εN²) time given fragments of total length N at error rate ε. We develop two exact and two approximation algorithms for (2). Our best exact algorithm computes an optimal layout for a graph of E overlaps and V fragments in O(K(E + V log V)) time, where K ≤ 2ᴱ is the size of the branch-and-bound search tree. Our best approximation algorithm computes a layout with overlap at least 1/2 the maximum in O(V(E + V log V)log V) time.
This is the first treatment of Sequence Reconstruction with inexact data and unknown complementarity.
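The feasibility condition in the problem statement above can be checked directly. The following Python sketch tests whether every fragment, or its reverse complement, matches some substring of a candidate sequence S within ε|F| errors. It assumes that "errors" means unit-cost edit distance (substitutions, insertions, deletions), which the abstract does not specify, and it uses a standard approximate-matching dynamic program rather than any algorithm from the work itself. The function names are illustrative.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def best_substring_errors(fragment, text):
    """Minimum edit distance between `fragment` and any substring of `text`.

    Row 0 is all zeros, so a match may start anywhere in `text`; taking the
    minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, f_char in enumerate(fragment, 1):
        cur = [i] + [0] * len(text)
        for j, t_char in enumerate(text, 1):
            cur[j] = min(
                prev[j - 1] + (f_char != t_char),  # match or substitution
                prev[j] + 1,                       # fragment char unmatched
                cur[j - 1] + 1,                    # extra char in text
            )
        prev = cur
    return min(prev)


def is_feasible(fragments, s, eps):
    """True if every fragment or its reverse complement fits S within eps*|F| errors."""
    for frag in fragments:
        errors = min(
            best_substring_errors(frag, s),
            best_substring_errors(reverse_complement(frag), s),
        )
        if errors > eps * len(frag):
            return False
    return True
```

The check runs in time proportional to |F|·|S| per fragment, so it only verifies a given reconstruction; it does not search for a shortest one.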