
4,212 research outputs found

Matching Subsequences in Trees

Author: D. Harel
H. Yang
P. Kilpeläinen
P. Zezula
R.A. Baeza-Yates
T. Hagerup
T. Schlieder
W. Chen
Publication date: 01/01/2006

Given two rooted, labeled trees $P$ and $T$, the tree path subsequence problem is to determine which paths in $P$ are subsequences of which paths in $T$. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds. Comment: Minor correction of typos, et
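
The problem statement above can be illustrated with a brute-force sketch. This is not the algorithm from the paper, whose improved bounds are not reproduced here; it simply enumerates every root-to-leaf path and tests each pair. The `(label, children)` tuple representation and all function names are assumptions made for illustration.

```python
def root_to_leaf_paths(tree):
    """Return every root-to-leaf label path of a (label, children) tree."""
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + rest
            for child in children
            for rest in root_to_leaf_paths(child)]


def is_subsequence(p, t):
    """True if the labels of p appear in t in the same order (gaps allowed)."""
    remaining = iter(t)
    # Each membership test consumes the iterator up to the match,
    # so later labels must be found strictly after earlier ones.
    return all(label in remaining for label in p)


def tree_path_subsequence(P, T):
    """Naive baseline: pairs (i, j) where path i of P is a subsequence of path j of T."""
    p_paths = root_to_leaf_paths(P)
    t_paths = root_to_leaf_paths(T)
    return [(i, j)
            for i, p in enumerate(p_paths)
            for j, t in enumerate(t_paths)
            if is_subsequence(p, t)]


# Hypothetical example trees.
P = ("a", [("b", []), ("c", [])])
T = ("a", [("x", [("b", [])]), ("c", [("d", [])])])
print(tree_path_subsequence(P, T))  # [(0, 0), (1, 1)]
```

The baseline compares every pair of paths in full, so its cost grows with the product of the total path lengths. It is useful only as a reference point for the definition.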

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Research Database In Technology

Compressed Spaced Suffix Arrays

Author: Gagie Travis
Manzini Giovanni
Valenzuela Daniel
Publication date: 01/01/2014

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an SSA when we already have the SA. We then present experiments indicating that our approach works even better in practice

arXiv.org e-Print Archive

CiteSeerX

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations

Author: Barbay Jérémy
Fischer Johannes
Publication date: 29/09/2010

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and indirect application in time that decreases as the permutation becomes more compressible. Comment: 13 pages, 1 figur
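
As a rough illustration of the structure described above, the sketch below computes a parent array in which each position points to the nearest earlier position holding a strictly smaller value, with `-1` standing for an artificial root. The strict comparison, the `-1` convention, and the function name `lrm_parents` are assumptions; the paper's exact definition may differ in how it treats ties.

```python
def lrm_parents(values):
    """Parent of i = nearest j < i with values[j] < values[i], or -1 (artificial root)."""
    parents = []
    stack = []  # indices whose values are strictly increasing from bottom to top
    for i, v in enumerate(values):
        # Discard earlier positions that are not strictly smaller than v.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        parents.append(stack[-1] if stack else -1)
        stack.append(i)
    return parents


print(lrm_parents([3, 1, 4, 1, 5, 9, 2, 6]))  # [-1, -1, 1, -1, 3, 4, 3, 6]
```

Each index is pushed and popped at most once, so the parent array is built in linear time. A position whose parent is its immediate predecessor extends the current sorted block, and any other position starts a new one.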

arXiv.org e-Print Archive

CiteSeerX

Computer Aided Simulation of DNA Fingerprint Amplified Fragment Length Polymorphism (AFLP) Using Suffix Tree Indexing and Data Mining

Author: Budiman Agung
Prilianti Kestrilia Rega
Putra Sulistyo Emantoko Dwi
Whendy Bhinawan
Publication venue: 'Universitas Gadjah Mada'
Publication date: 01/09/2011

AFLP is a DNA fingerprinting technique that is broadly applied as a genetic marker in various fields. It begins with digestion of the DNA sequence using one or more particular restriction enzymes, then ligation of adapters to the overhanging sticky ends, followed by amplification of the DNA fragments using PCR. The PCR reaction uses primers that match the adapter sequence and carry some (1 to 3) additional “selective” bases, which can be any bases; this reduces the number of bands that will be amplified. The technique is intended to increase the distinctiveness of the amplified fragments so that the polymorphism of the organism being studied can be visualized clearly by gel electrophoresis. The computer-aided AFLP simulation developed in this research aims to predict this electrophoresis result by simulating the digestion, ligation, and PCR processes, using pattern recognition algorithms applied to DNA sequences from online databases. Through this simulation, researchers can determine the best combination of restriction enzyme and selective bases for their laboratory experiment. Suffix tree indexing is applied while exploring the genome sequence (in FASTA format) to find the restriction sites rapidly and cut the sequence into fragments. Data modeling enables the system to draw the fragments as a virtual DNA electrophoresis pattern. Data mining completes the simulation by exploring all possible virtual DNA electrophoresis patterns and determining the best combination of restriction enzyme and selective bases by calculating certain quantitative criteria
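
The digestion step can be sketched in a few lines. The version below uses a plain string scan rather than the suffix tree index the abstract describes, and it ignores ligation, selective bases, and PCR. The function names, the `cut_offset` parameter, and the example sequence are hypothetical.

```python
def find_sites(sequence, site):
    """Start indices of every (possibly overlapping) occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, site, cut_offset):
    """Cut the sequence cut_offset bases into each recognition site."""
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Illustrative input: a site GAATTC cut after its first base (G^AATTC).
print(digest("AAGAATTCTTGAATTCGG", "GAATTC", 1))
# ['AAG', 'AATTCTTG', 'AATTCGG']
```

The scan costs one pass per enzyme. An index such as a suffix tree would presumably pay off when many recognition sites are queried against the same genome.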

University of Surabaya Institutional Repository

Faster Approximate String Matching for Short Patterns

Author: A. Andersson
A.H. Wright
D. Gusfield
D. Harel
D.E. Knuth
E. Ukkonen
E.W. Myers
F.T. Leighton
G. Myers
G. Navarro
G.M. Landau
H. Hyyrö
K.E. Batcher
M. Farach-Colton
M.A. Bender
P. Bille
P. Sellers
Philip Bille
R. Baeza-Yates
R. Cole
R.A. Baeza-Yates
R.A. Wagner
S. Albers
S. Alstrup
S. Wu
S.C. Sahinalp
T. Hagerup
T.H. Cormen
V.L. Arlazarov
W. Masek
Z. Galil
Publication date: 17/03/2011

We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$ we present an algorithm using time $O(nk \cdot \min(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}) + n)$. When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism. Comment: To appear in Theory of Computing System
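
For reference, the problem defined above can be solved with the textbook column-wise dynamic program in $O(nm)$ time. The sketch below is that baseline, not the tabulated Landau-Vishkin variant the abstract describes; the function name and the choice of 1-based end positions are assumptions.

```python
def approx_match_end_positions(pattern, text, k):
    """1-based end positions j in text where some substring ending at j
    has edit distance at most k to pattern."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a substring of
    # text ending at the current position; before any text, col[i] = i.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of row i-1 in the previous column
        col[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match or substitute
                         old + 1,                            # extra text character
                         col[i - 1] + 1)                     # skipped pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_match_end_positions("abc", "xabcx", 0))  # [4]
print(approx_match_end_positions("abc", "xabcx", 1))  # [3, 4, 5]
```

Only one column of the table is kept, so the extra space is $O(m)$.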

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology