39,316 research outputs found

    Sequence alignment by passing messages

    Get PDF
    BACKGROUND: Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-HMM), which has been especially popular in comparative sequence analysis for several reasons, including their effectiveness in modeling and detecting sequence homology, model simplicity, and the existence of efficient algorithms for applying the model to sequence alignment problems. However, despite these advantages, pair-HMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks. RESULTS: In this work, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the shortcomings of the traditional pair-HMMs. The proposed scheme is based on a simple message-passing approach, where messages are exchanged between neighboring symbol pairs that may be potentially aligned in the optimal sequence alignment. The message-passing process yields probabilistic symbol alignment confidence scores, which may be used for predicting the optimal alignment that maximizes the expected number of correctly aligned symbol pairs. CONCLUSIONS: Extensive performance evaluation on protein alignment benchmark datasets shows that the proposed message-passing scheme clearly outperforms the traditional pair-HMM-based approach, in terms of both alignment accuracy and computational efficiency. Furthermore, the proposed scheme is numerically robust and amenable to massive parallelization

    Interference Alignment via Message-Passing

    Full text link
    We introduce an iterative solution to the problem of interference alignment (IA) over MIMO channels based on a message-passing formulation. We propose a parameterization of the messages that enables the computation of IA precoders by a min-sum algorithm over continuous variable spaces -- under this parameterization, suitable approximations of the messages can be computed in closed-form. We show that the iterative leakage minimization algorithm of Cadambe et al. is a special case of our message-passing algorithm, obtained for a particular schedule. Finally, we show that the proposed algorithm compares favorably to iterative leakage minimization in terms of convergence speed, and discuss a distributed implementation.Comment: Submitted to the IEEE International Conference on Communications (ICC) 201

    Generalized sequential tree-reweighted message passing

    Full text link
    This paper addresses the problem of approximate MAP-MRF inference in general graphical models. Following [36], we consider a family of linear programming relaxations of the problem where each relaxation is specified by a set of nested pairs of factors for which the marginalization constraint needs to be enforced. We develop a generalization of the TRW-S algorithm [9] for this problem, where we use a decomposition into junction chains, monotonic w.r.t. some ordering on the nodes. This generalizes the monotonic chains in [9] in a natural way. We also show how to deal with nested factors in an efficient way. Experiments show an improvement over min-sum diffusion, MPLP and subgradient ascent algorithms on a number of computer vision and natural language processing problems

    Clustering with shallow trees

    Full text link
    We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message--passing method that allows to solve it efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. We analyze with this general scheme three biological/medical structured datasets (human population based on genetic information, proteins based on sequences and verbal autopsies) and show that the interpolation technique provides new insight.Comment: 11 pages, 7 figure

    Models of Interaction as a Grounding for Peer to Peer Knowledge Sharing

    Get PDF
    Most current attempts to achieve reliable knowledge sharing on a large scale have relied on pre-engineering of content and supply services. This, like traditional knowledge engineering, does not by itself scale to large, open, peer to peer systems because the cost of being precise about the absolute semantics of services and their knowledge rises rapidly as more services participate. We describe how to break out of this deadlock by focusing on semantics related to interaction and using this to avoid dependency on a priori semantic agreement; instead making semantic commitments incrementally at run time. Our method is based on interaction models that are mobile in the sense that they may be transferred to other components, this being a mechanism for service composition and for coalition formation. By shifting the emphasis to interaction (the details of which may be hidden from users) we can obtain knowledge sharing of sufficient quality for sustainable communities of practice without the barrier of complex meta-data provision prior to community formation

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing
    • …
    corecore