606 research outputs found

On Almost Monge All Scores Matrices

Author: Carmel Amir
Tsur Dekel
Ziv-Ukelson Michal
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

Dagstuhl Research Online Publication Server

Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit

Author: Crochemore Maxime
Landau Gad M.
Schieber Baruch
Ziv-Ukelson Michal
Publication venue: King's College London Publications
Publication date: 01/01/2005

The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends both on the chosen parsing method and on the string-comparison metric used for the alignment.

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Finding the region of pseudo-periodic tandem repeats in biological sequences

Author: CRH Raetz
D Jaitly
EW Myers
GM Landau
H Wan
JP Schmidt
JS Sim
L Li
Lusheng Wang
M Tang
M Vaara
R Vuorio
V Biou
Xiaowen Liu
Publication venue: BioMed Central
Publication date: 01/01/2006

SUMMARY: The genomes of many species are dominated by short sequences repeated consecutively. It is estimated that over 10% of the human genome consists of tandemly repeated sequences. Finding repeated regions in long sequences is important in sequence analysis. We develop a software tool, LocRepeat, that finds regions of pseudo-periodic repeats in a long sequence. We use the definition of Li et al. [1] for the pseudo-periodic partition of a region and extend the algorithm so that it can select the repeated region from a given long sequence and give the pseudo-periodic partition of the region. AVAILABILITY: LocRepeat is available a

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sequence Alignment in Molecular Biology

Author: Apostolico Alberto
Giancarlo Raffaele
Publication venue: 'Purdue University (bepress)'
Publication date: 01/11/1995

Purdue E-Pubs

An Almost Optimal Edit Distance Oracle

Author: Charalampopoulos Panagiotis
Gawrychowski Paweł
Mozes Shay
Weimann Oren
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 01/01/2021

We consider the problem of preprocessing two strings S and T, of lengths m and n, respectively, in order to be able to efficiently answer the following queries: Given positions i,j in S and positions a,b in T, return the optimal alignment score of S[i..j] and T[a..b]. Let N = mn. We present an oracle with preprocessing time N^{1+o(1)} and space N^{1+o(1)} that answers queries in log^{2+o(1)}N time. In other words, we show that we can efficiently query for the alignment score of every pair of substrings after preprocessing the input for almost the same time it takes to compute just the alignment of S and T. Our oracle uses ideas from our distance oracle for planar graphs [STOC 2019] and exploits the special structure of the alignment graph. Conditioned on popular hardness conjectures, this result is optimal up to subpolynomial factors. Our results apply to both edit distance and longest common subsequence (LCS). The best previously known oracle with construction time and size O(N) has slow Ω(√N) query time [Sakai, TCS 2019], and the one with size N^{1+o(1)} and query time log^{2+o(1)}N (using a planar graph distance oracle) has slow Ω(N^{3/2}) construction time [Long & Pettie, SODA 2021]. We improve both approaches by roughly a √N factor.
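
For orientation, the query answered by such an oracle can be sketched with a naive baseline that does no preprocessing and simply reruns the textbook LCS dynamic program on the two requested substrings. This is not the oracle from the paper; it only illustrates what a query returns. The function names `lcs_length` and `substring_lcs_query` and the 0-based inclusive indexing are assumptions made for this sketch.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the textbook DP (two rows)."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for k, cy in enumerate(y, start=1):
            if cx == cy:
                cur.append(prev[k - 1] + 1)
            else:
                cur.append(max(prev[k], cur[k - 1]))
        prev = cur
    return prev[-1]


def substring_lcs_query(s: str, t: str, i: int, j: int, a: int, b: int) -> int:
    """Naive query: LCS score of s[i..j] and t[a..b] (0-based, inclusive).

    Hypothetical helper for illustration; each call costs time proportional
    to (j - i + 1) * (b - a + 1), with no preprocessing at all.
    """
    return lcs_length(s[i:j + 1], t[a:b + 1])


if __name__ == "__main__":
    print(substring_lcs_query("abcbdab", "bdcaba", 0, 6, 0, 5))
```

The gap between this per-query cost and the polylogarithmic query time stated in the abstract is what the preprocessing buys.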

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Computing alignment plots efficiently

Author: Krusche Peter
Tiskin Alexander
Publication date: 10/09/2009

Dot plots are a standard method for local comparison of biological sequences. In a dot plot, a substring to substring distance is computed for all pairs of fixed-size windows in the input strings. Commonly, the Hamming distance is used since it can be computed in linear time. However, the Hamming distance is a rather crude measure of string similarity, and using an alignment-based edit distance can greatly improve the sensitivity of the dot plot method. In this paper, we show how to compute alignment plots of the latter type efficiently. Given two strings of length m and n and a window size w, this problem consists in computing the edit distance between all pairs of substrings of length w, one from each input string. The problem can be solved by repeated application of the standard dynamic programming algorithm in time $O(mnw^2)$. This paper gives an improved data-parallel algorithm, running in time $O(mnw/\gamma/p)$ using vector operations that work on $\gamma$ values in parallel and $p$ processors. We show experimental results from an implementation of this algorithm, which uses Intel's MMX/SSE instructions for vector parallelism and MPI for coarse-grained parallelism. Comment: Presented at ParCo 200
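
The baseline mentioned in the abstract, repeated application of the standard dynamic program to every pair of length-w windows, can be sketched as follows. This is only the naive method, not the paper's data-parallel algorithm. Unit-cost (Levenshtein) edit operations and the names `edit_distance` and `alignment_plot` are assumptions made for this sketch.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via the standard DP."""
    prev = list(range(len(y) + 1))
    for r, cx in enumerate(x, start=1):
        cur = [r]
        for c, cy in enumerate(y, start=1):
            cur.append(min(prev[c] + 1,                # delete cx
                           cur[c - 1] + 1,             # insert cy
                           prev[c - 1] + (cx != cy)))  # match or substitute
        prev = cur
    return prev[-1]


def alignment_plot(a: str, b: str, w: int) -> list[list[int]]:
    """plot[i][j] = edit distance between a[i:i+w] and b[j:j+w].

    Naive baseline: one w-by-w DP per window pair, so the total work grows
    like m * n * w^2 for strings of lengths m and n.
    """
    rows = len(a) - w + 1
    cols = len(b) - w + 1
    return [[edit_distance(a[i:i + w], b[j:j + w]) for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    for row in alignment_plot("abcd", "abcd", 2):
        print(row)
```

Each window pair is solved from scratch here, which is the redundancy an improved algorithm would aim to remove.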

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Exact Distance Oracles for Planar Graphs

Author: Mozes Shay
Sommer Christian
Publication date: 01/01/2010

We present new and improved data structures that answer exact node-to-node distance queries in planar graphs. Such data structures are also known as distance oracles. For any directed planar graph on n nodes with non-negative lengths we obtain the following:
* Given a desired space allocation $S\in[n\lg\lg n,n^2]$, we show how to construct in $\tilde O(S)$ time a data structure of size $O(S)$ that answers distance queries in $\tilde O(n/\sqrt S)$ time per query. As a consequence, we obtain an improvement over the fastest algorithm for k-many distances in planar graphs whenever $k\in[\sqrt n,n)$.
* We provide a linear-space exact distance oracle for planar graphs with query time $O(n^{1/2+\epsilon})$ for any constant $\epsilon>0$. This is the first such data structure with provable sublinear query time.
* For edge lengths at least one, we provide an exact distance oracle of space $\tilde O(n)$ such that for any pair of nodes at distance D the query time is $\tilde O(\min\{D,\sqrt n\})$. Comparable query performance had been observed experimentally but has never been explained theoretically.
Our data structures are based on the following new tool: given a non-self-crossing cycle C with $c = O(\sqrt n)$ nodes, we can preprocess G in $\tilde O(n)$ time to produce a data structure of size $O(n \lg\lg c)$ that can answer the following queries in $\tilde O(c)$ time: for a query node u, output the distance from u to all the nodes of C. This data structure builds on and extends a related data structure of Klein (SODA'05), which reports distances to the boundary of a face, rather than a cycle. The best distance oracles for planar graphs until the current work are due to Cabello (SODA'06), Djidjev (WG'96), and Fakcharoenphol and Rao (FOCS'01). For $\sigma\in(1,4/3)$ and space $S=n^\sigma$, we essentially improve the query time from $n^2/S$ to $\sqrt{n^2/S}$. Comment: To appear in the proceedings of the 23rd ACM-SIAM Symposium on Discrete Algorithms, SODA 201

arXiv.org e-Print Archive

CiteSeerX