
    The Rainbow Prim Algorithm for Selecting Putative Orthologous Protein Sequences

    We present a selection method for eliminating species redundancy in clusters of putative orthologous sequences, to be applied as a post-processing step to pre-clustered data obtained from other methods. The algorithm can always reduce the cluster redundancy to zero while preserving the number of species in the original cluster.

    MultiMSOAR 2.0: An Accurate Tool to Identify Ortholog Groups among Multiple Genomes

    The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Building on MSOAR 2.0, a recently developed accurate tool that assigns orthologs between a pair of closely related genomes based on genome rearrangement, we present a new system, MultiMSOAR 2.0, to identify ortholog groups among multiple genomes. In the system, we construct gene families for all the genomes using sequence similarity search and clustering, run MSOAR 2.0 on all pairs of genomes to obtain the pairwise orthology relationships, and partition each gene family into a set of disjoint sets of orthologous genes (called super ortholog groups, or SOGs) such that each SOG contains at most one gene from each genome. For each such SOG, we label the leaves of the species tree with 1 or 0 to indicate whether or not the SOG contains a gene from the corresponding species. The resulting tree is called a tree of ortholog groups (TOG). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with the popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments using gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 is able to infer these evolutionary events much more accurately than the well-known software tool Notung. The software MultiMSOAR 2.0 is freely available to the public.
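The leaf-labeling and parsimony step described in this abstract can be illustrated with a small sketch. It assumes a rooted binary species tree and uses the classical Fitch small-parsimony algorithm on presence/absence (1/0) states. This is not MultiMSOAR 2.0's actual procedure: the abstract's "biological constraints" are not modeled, and the names `fitch_label`, `tree`, and `leaf_state`, the example tree, and the tie-breaking rule are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical species tree: each internal node maps to its (left, right) children.
# Any node that is not a key of this dict is treated as a leaf (a species).
Tree = Dict[str, Tuple[str, str]]


def fitch_label(tree: Tree, root: str, leaf_state: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Label every node with 0/1 using Fitch parsimony.

    Returns (labels, changes), where `changes` is the minimum number of
    state changes (gene gains or losses) needed on the tree.
    """
    candidate: Dict[str, Set[int]] = {}
    changes = 0

    def bottom_up(node: str) -> Set[int]:
        nonlocal changes
        if node not in tree:  # leaf: state is given by presence/absence in the group
            candidate[node] = {leaf_state[node]}
            return candidate[node]
        left, right = tree[node]
        a, b = bottom_up(left), bottom_up(right)
        if a & b:
            candidate[node] = a & b
        else:  # children disagree: one state change is unavoidable here
            candidate[node] = a | b
            changes += 1
        return candidate[node]

    labels: Dict[str, int] = {}

    def top_down(node: str, parent_label: Optional[int]) -> None:
        options = candidate[node]
        if parent_label is not None and parent_label in options:
            labels[node] = parent_label
        else:
            # Assumed tie-break: prefer 0 (gene absent) when both states are optimal.
            labels[node] = min(options)
        for child in tree.get(node, ()):
            top_down(child, labels[node])

    bottom_up(root)
    top_down(root, None)
    return labels, changes


# Three hypothetical species; the ortholog group has genes in A and B but not in C.
tree = {"root": ("AB", "C"), "AB": ("A", "B")}
leaf_state = {"A": 1, "B": 1, "C": 0}
labels, changes = fitch_label(tree, "root", leaf_state)
```

With the assumed tie-break, the root is labeled 0 and the ancestor of A and B is labeled 1, which reads as a single gene birth on the branch leading to that ancestor. Breaking the tie toward 1 instead would give an equally parsimonious labeling with a single loss on the branch to C, which is the kind of ambiguity that additional constraints would have to resolve.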

    A Unified Framework for Integer Programming Formulation of Graph Matching Problems

    Graph theory has been a powerful tool for solving difficult and complex problems arising in all disciplines. In particular, graph matching is a classical problem in pattern analysis with numerous applications. Many graph problems have been formulated as mathematical programs and then solved using exact, heuristic and/or approximation-guaranteed procedures. Conversely, graph theory has been a powerful tool for visualizing and understanding complex mathematical programming problems, especially integer programs. Formulating a graph problem as a natural integer program (IP) is often a challenging task. However, an IP formulation of the problem has many advantages. Several researchers have noted the need for natural IP formulations of graph-theoretic problems. The aim of the present study is to provide a unified framework for IP formulation of graph matching problems. Although there are many surveys on graph matching problems, none is concerned with IP formulation. This paper is the first to provide a comprehensive IP formulation for such problems. The framework includes a variety of graph optimization problems from the literature. While these problems have been studied by different research communities, the framework presented here helps to bring together efforts from different disciplines to tackle such diverse and complex problems. We hope the present study can significantly help to simplify some of the difficult problems arising in practice, especially in pattern analysis.
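As an illustration of what an IP formulation of a graph matching problem can look like, the block below gives one standard textbook formulation and is not taken from the paper. It assumes two graphs $G=(V_G,E_G)$ and $H=(V_H,E_H)$ with $|V_G|=|V_H|$, edge sets written as ordered pairs (each undirected edge listed in both directions), binary variables $x_{ij}$ meaning "vertex $i$ of $G$ is mapped to vertex $j$ of $H$", and auxiliary binary variables $y_{ijkl}$ that linearize the product $x_{ij}x_{kl}$. All symbols are assumptions introduced here.

```latex
\begin{align*}
\max \quad & \sum_{(i,k)\in E_G} \; \sum_{(j,l)\in E_H} y_{ijkl} \\
\text{s.t.} \quad
& \sum_{j\in V_H} x_{ij} = 1 && \forall\, i \in V_G, \\
& \sum_{i\in V_G} x_{ij} = 1 && \forall\, j \in V_H, \\
& y_{ijkl} \le x_{ij}, \quad y_{ijkl} \le x_{kl} && \forall\, (i,k)\in E_G,\ (j,l)\in E_H, \\
& x_{ij} \in \{0,1\}, \quad y_{ijkl} \in \{0,1\}.
\end{align*}
```

The two assignment constraints force $x$ to be a bijection between the vertex sets, and the objective counts edges of $G$ that are mapped onto edges of $H$ (each undirected edge counted twice under the ordered-pair convention). Because the objective is maximized, the two upper bounds on $y_{ijkl}$ are enough to make it behave as the product $x_{ij}x_{kl}$ at an optimum.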

    Guest Editor's Introduction to the Special Section on Computational Biology and Bioinformatics (WABI) -- Part 2

    In this continuation of the special issue of papers invited from the Fifth International Workshop on Algorithms in Bioinformatics (WABI ’05), we present three papers, which are summarized below. An alternative to the classical Hannenhalli-Pevzner theory for sorting by reversal is described by Severine Berard, Anne Bergeron, Cedric Chauve, and Christophe Paul in their paper “Perfect Sorting by Reversal Is Not Always Difficult,” where the authors propose new algorithms for computing pairwise rearrangements that conserve the combinatorial structure of genomes. Ortholog detection is important in functional genome annotation. Akshay Vashist, Casimir A. Kulikowski, and Ilya Muchnik present a new clustering algorithm on a weighted multipartite graph for automatically extracting groups of orthologous genes in their paper “Ortholog Clustering on a Multipartite Graph.” Cryo-Electron Microscopy (EM) is a powerful technique for low-resolution description of large macromolecular assemblies that are difficult to solve at the atomic level. The analysis of low-resolution electron maps, however, may highlight components of complexes, as is shown in the paper “EMatch: Discovery of High Resolution Structural Homologues of Protein Domains in Intermediate Resolution Cryo-EM Maps” by Keren Lasker, Oranit Dror, Maxim Shatsky, Ruth Nussinov, and Haim J. Wolfson. The authors describe a novel integrated approach for recognizing structural homologues of protein domains present in a 6-10 Å resolution cryo-EM map. Many thanks once again to the reviewers who helped make these issues possible.