173 research outputs found

An efficient algorithm for sequence comparison with block reversals

Author: Sahinalp S. Cenk
Muthukrishnan S.
Publication venue: Published by Elsevier B.V.
Publication date: 16/06/2004

Given two sequences X and Y that are strings over some alphabet, we consider the distance d(X,Y) between them, defined as the minimum number of character replacements and block (substring) reversals needed to transform X into Y (or vice versa). The operations are required to be disjoint. This is the “simplest” sequence comparison problem we know of that allows natural block edit operations. Block reversals arise naturally in genomic sequence comparison; they are also of interest in matching music data. We present an algorithm for exactly computing the distance d(X,Y); it takes time O(|X| log^2 |X|) and hence is near-linear. The trivial approach takes quadratic time.
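
As an illustration of the distance defined in this abstract, the following is a minimal sketch of a straightforward quadratic-time baseline, not the near-linear algorithm the paper presents. It assumes that X and Y have equal length (replacements and reversals do not change length) and reads "disjoint" as meaning that a reversed block must match the corresponding block of Y exactly, with no replacements inside it. The function name `block_reversal_distance` is hypothetical.

```python
def block_reversal_distance(x: str, y: str) -> int:
    """Minimum number of disjoint character replacements and block
    reversals turning x into y (quadratic-time baseline sketch)."""
    if len(x) != len(y):
        raise ValueError("replacements and reversals preserve length")
    n = len(x)

    # starts[r] lists every l < r such that reversing x[l..r] gives y[l..r].
    # Valid blocks are found by expanding around each center, as in
    # palindrome detection: the pair (l, r) is valid only if all inner
    # pairs are valid and x[l] == y[r] and x[r] == y[l].
    starts = [[] for _ in range(n)]
    for center_sum in range(2 * n - 1):  # center_sum == l + r
        l = center_sum // 2
        r = center_sum - l
        while l >= 0 and r < n and x[l] == y[r] and x[r] == y[l]:
            if l < r:
                starts[r].append(l)
            l -= 1
            r += 1

    # d[i] is the distance between the prefixes x[:i] and y[:i].
    d = [0] * (n + 1)
    for i in range(1, n + 1):
        r = i - 1
        # Either keep or replace the last character ...
        d[i] = d[i - 1] + (x[r] != y[r])
        # ... or end a reversed block x[l..r] at this position.
        for l in starts[r]:
            d[i] = min(d[i], d[l] + 1)
    return d[n]
```

For instance, under these assumptions "abc" and "cba" are at distance 1 (one reversal of the whole string), whereas counting replacements alone would give 2.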

Elsevier - Publisher Connector

Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

Author: Sahinalp S. Cenk
Salari Raheleh
Schönhuth Alexander
Publication date: 11/06/2010

Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationships with pair hidden Markov models and develop efficient algorithms by which to identify gaps which are significant in terms of length and multiplicity. We evaluate our statistics with respect to the well-established structural alignments from SABmark and find that indel reliability substantially increases with their significance, in particular in worst-case twilight zone alignments. This points out that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions.

Comment: 17 pages, 7 figures
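
To make the term "affine gap penalty" in this abstract concrete, here is a minimal sketch of a standard three-state global alignment score (the Gotoh recurrence). It is not the paper's gap statistics and not its pair HMM; it only illustrates the kind of score-based alignment whose gaps are being re-evaluated. The function name `affine_gap_score` and the default scores are assumptions chosen for illustration; a gap of length k costs gap_open + (k - 1) * gap_extend.

```python
def affine_gap_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap_open: int = -3, gap_extend: int = -1) -> float:
    """Optimal global alignment score under an affine gap penalty."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] is aligned to b[j-1]
    # X: a[i-1] is aligned to a gap (gap in b)
    # Y: b[j-1] is aligned to a gap (gap in a)
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

The three matrices correspond to the match and two gap states of a pair HMM, which is the relationship the abstract says the authors exploit.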

arXiv.org e-Print Archive

CWI's Institutional Repository

Fast prediction of RNA-RNA interaction

Author: Backofen Rolf
Sahinalp S Cenk
Salari Raheleh
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Regulatory antisense RNAs are a class of ncRNAs that regulate gene expression by prohibiting the translation of an mRNA by establishing stable interactions with a target sequence. There is great demand for efficient computational methods to predict the specific interaction between an ncRNA and its target mRNA(s). There are a number of algorithms in the literature which can predict a variety of such interactions - unfortunately at a very high computational cost. Although some existing target prediction approaches are much faster, they are specialized for interactions with a single binding site. Methods: In this paper we present a novel algorithm to accurately predict the minimum free energy structure of RNA-RNA interaction under the most general type of interactions studied in the literature. Moreover, we introduce a fast heuristic method to predict the specific (multiple) binding sites of two interacting RNAs. Results: We verify the performance of our algorithms for joint structure and binding site prediction on a set of known interacting RNA pairs. Experimental results show our algorithms are highly accurate and outperform all competitive approaches.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

A Multi-labeled Tree Edit Distance for Comparing "Clonal Trees" of Tumor Progression

Author: Karpov Nikolai
Malikic Salem
Rahman Md. Khaledur
Sahinalp S. Cenk
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018

We introduce a new edit distance measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree edit distance (MLTED) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximal common tree. We show that the MLTED measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well. We have implemented our algorithm to compute MLTED exactly and applied it to a variety of data sets successfully. The source code of our method can be found in: https://github.com/khaled-rahman/leafDelTED

Dagstuhl Research Online Publication Server

Mirroring co-evolving trees in the light of their topologies

Author: Hajirasouliha Iman
Juan David
Sahinalp S. Cenk
Schönhuth Alexander
Valencia Alfonso
Publication date: 01/01/2011

Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to maximize the distance matrices corresponding to the tree topologies in question. In this paper we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 minute on a single processor vs. 730 hours on a supercomputer. Furthermore we have advantages over the current state-of-the-art heuristic search approach in terms of precision as well as a recently suggested overall performance measure for mirrortree approaches, while incurring only acceptable losses in recall. A C implementation of the method demonstrated in this paper is available at http://compbio.cs.sfu.ca/mirrort.htm

Comment: 13 pages, 2 figures, Iman Hajirasouliha and Alexander Schönhuth are joint first authors

arXiv.org e-Print Archive

CiteSeerX

Sparsification of RNA structure prediction including pseudoknots

Author: Backofen Rolf
Möhl Mathias
Sahinalp S Cenk
Salari Raheleh
Will Sebastian
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to the high running time and space consumption implied by the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among those the most general method available (the Rivas-Eddy algorithm) and the fastest one (Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

CiteSeerX

Crossref

DSpace@MIT

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Not All Scale-Free Networks Are Born Equal: The Role of the Seed Graph in PPI Network Evolution

Author: Fereydoun Hormozdiari
Mark B Gerstein
Nataša Pržulj
Petra Berenbrink
S. Cenk Sahinalp
Publication venue: Public Library of Science
Publication date: 01/01/2007

The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction network analyzed), the duplication model captures many topological features of publicly available protein–protein interaction networks very well.
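
To illustrate how a seed graph enters a duplication-style growth process, the following is a rough sketch of one common formulation of a duplication model. The abstract does not specify the exact model or parameters, so everything here is an assumption for illustration: the function name `grow_by_duplication`, the edge-retention probability `p`, the random-attachment rate `r`, and the use of integer node labels.

```python
import random


def grow_by_duplication(seed_graph, target_size, p=0.4, r=0.1, rng=None):
    """Grow an undirected graph from a non-empty seed graph by node duplication.

    seed_graph maps each integer node to the set of its neighbours.
    """
    rng = rng or random.Random(0)
    adj = {v: set(nbrs) for v, nbrs in seed_graph.items()}
    next_id = max(adj) + 1  # assumes integer node labels
    while len(adj) < target_size:
        existing = sorted(adj)
        parent = rng.choice(existing)
        new = next_id
        next_id += 1
        adj[new] = set()
        # The copy inherits each edge of its parent with probability p.
        for nbr in sorted(adj[parent]):
            if rng.random() < p:
                adj[new].add(nbr)
                adj[nbr].add(new)
        # It also gains a few random edges, about r in expectation.
        for v in existing:
            if v not in adj[new] and rng.random() < r / len(existing):
                adj[new].add(v)
                adj[v].add(new)
    return adj
```

Running the sketch from different seeds (for example a triangle versus a larger dense subgraph) and comparing the resulting topologies is one way to explore the seed-graph dependence the abstract describes.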

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

UCL Discovery

eScholarship - University of California

A Multi-Labeled Tree Dissimilarity Measure for Comparing “Clonal Trees” of Tumor Progression

Author: Karpov Nikolai
Malikic Salem
Rahman Md. Khaledur
Sahinalp S. Cenk
Publication date: 27/07/2019

We introduce a new dissimilarity measure between a pair of “clonal trees”, each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. We show that the MLTD measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well.

Simon Fraser University Institutional Repository