
    Machine Learning-based Brokers for Real-time Classification of the LSST Alert Stream

    The unprecedented volume and rate of transient events that will be discovered by the Large Synoptic Survey Telescope (LSST) demands that the astronomical community update its followup paradigm. Alert-brokers -- automated software systems that sift through, characterize, annotate and prioritize events for followup -- will be critical tools for managing alert streams in the LSST era. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is one such broker. In this work, we develop a machine learning pipeline to characterize and classify variable and transient sources using only the available multiband optical photometry. We describe three illustrative stages of the pipeline, serving the three goals of early, intermediate and retrospective classification of alerts. The first takes the form of variable vs transient categorization, the second, a multi-class typing of the combined variable and transient dataset, and the third, a purity-driven subtyping of a transient class. While several similar algorithms have proven themselves in simulations, we validate their performance on real observations for the first time. We quantitatively evaluate our pipeline on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, and demonstrate very competitive classification performance. We describe our progress towards adapting the pipeline developed in this work into a real-time broker working on live alert streams from time-domain surveys. Comment: 33 pages, 14 figures, submitted to ApJ

    Automatic Design of Synthetic Gene Circuits through Mixed Integer Non-linear Programming

    Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems, and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guarantees on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfies the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits.

    Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree

    We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular $d$-ary trees with $d$ a constant. On such a tree, the algorithm finds a solution within a factor $\frac{d+1}{d-1}$ of the minimum in $O(k^d \, T(d, n) + k^{2d} n^2)$ time, where $k$ is the number of leaves in the tree, $n$ is the length of the longest sequence labeling a leaf, and $T(d, n)$ is the time to compute a Steiner point for $d$ sequences of length at most $n$. (A Steiner point for a set $S$ of sequences is a sequence $P$ that minimizes the sum of the edit distances from $P$ to each sequence in $S$. The time $T(d, n)$ is $O(d \, 2^d n^d)$, given $O(d s^{d+1})$-..

    Aligning Alignments

    While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, there is little known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when either of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely-used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in w..

    Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree

    We consider the problem of aligning sequences related by a given evolutionary tree: given a fixed tree with its leaves labeled with sequences, find ancestral sequences to label the internal nodes so as to minimize the total cost of all the edges in the tree. The cost of an edge is the edit distance between the sequences labeling its endpoints. In this paper, we consider the case when the given tree is a regular $d$-ary tree for some fixed $d$ and provide a $\frac{d+1}{d-1}$-approximation algorithm for this problem that runs in time $O(d(2kn)^d + n^2 k^{2d})$, where $k$ is the number of leaves in the tree and $n$ is the maximum length of any of the sequences labeling the leaves. We also consider a new bottleneck objective in labeling the internal nodes. In this version, we wish to find the labeling of the internal nodes that minimizes the maximum cost of any edge in the tree. For this problem we provide a simple $(2\delta + 1)$-approximation algorithm where $\delta$ is the depth of the given undirected tree def..

    Combinatorial algorithms for DNA sequence assembly

    The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form ..

    Assessing Distant Homology Between an Aligned Family and a Proposed Member Through Accurate Sequence Alignment

    A major challenge in computational biology is the identification of evolutionarily related macromolecules in cases of distant homology, where pairwise sequence similarity may be low, or even insignificant. Methods for the quantitative assessment of such distant relationships must go beyond simple pairwise sequence comparison, and exploit all the information available in multiple alignments of related sequences. Evaluation of the statistical significance of a pairwise alignment score by random shuffling is an established approach, which we extend here to the more general case of evaluating the similarity between a query sequence and a prealigned family of macromolecules whose multiple alignment may contain gaps. The method involves (1) the optimal alignment of the query sequence against the prealigned family, for which we develop a new algorithm that accurately takes into account the structure of gaps in the alignment, followed by (2) repeated shuffling of the query sequence and calcula..