
3,062 research outputs found

Efficient CRCW-PRAM Algorithms Combining Multiple Autonomous Databases

Author: Apostolico Alberto
Publication venue: 'Purdue University (bepress)'
Publication date: 18/02/1991

Purdue E-Pubs

Notes on Learning Probabilistic Automata

Author: Apostolico Alberto
Publication venue: 'Purdue University (bepress)'
Publication date: 01/09/1999

Purdue E-Pubs

Archivio istituzionale della ricerca - Università di Padova

Space-efficient detection of unusual words

Author: A Apostolico
CAR Hoare
D Belazzougui
J Herold
J Lin
M Crochemore
S Chairungsee
Publication date: 01/01/2015

Detecting all the strings that occur in a text more frequently or less frequently than expected according to an IID or a Markov model is a basic problem in string mining, yet current algorithms are based on data structures that are either space-inefficient or incur large slowdowns, and current implementations cannot scale to genomes or metagenomes in practice. In this paper we engineer an algorithm based on the suffix tree of a string to use just a small data structure built on the Burrows-Wheeler transform, and a stack of $O(\sigma^2\log^2 n)$ bits, where $n$ is the length of the string and $\sigma$ is the size of the alphabet. The size of the stack is $o(n)$ except for very large values of $\sigma$. We further improve the algorithm by removing its time dependency on $\sigma$, by reporting only a subset of the maximal repeats and of the minimal rare words of the string, and by detecting and scoring candidate under-represented strings that do not occur in the string. Our algorithms are practical and work directly on the BWT, thus they can be immediately applied to a number of existing datasets that are available in this form, returning this string mining problem to a manageable scale.

Comment: arXiv admin note: text overlap with arXiv:1502.0637
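
To make the notion of an over- or under-represented word in the abstract above concrete, the following is a minimal brute-force sketch in Python. It is not the BWT-based algorithm the abstract describes: it assumes an IID model estimated from single-character frequencies, counts every length-k substring explicitly, and uses a binomial variance that ignores overlaps between occurrences. The function name `iid_scores` and its return layout are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def iid_scores(text, k):
    """Score each length-k substring of `text` against an IID model.

    Returns a dict mapping word -> (observed, expected, z), where
    `expected` is the expected number of occurrences if characters were
    drawn independently with their empirical frequencies, and `z` is a
    rough z-score (binomial approximation, overlaps ignored).
    """
    n = len(text)
    positions = n - k + 1  # number of length-k windows
    if k <= 0 or positions <= 0:
        return {}

    # Empirical character probabilities (the IID model).
    char_counts = Counter(text)
    probs = {c: cnt / n for c, cnt in char_counts.items()}

    # Observed counts of every length-k substring (brute force).
    observed = Counter(text[i:i + k] for i in range(positions))

    scores = {}
    for word, obs in observed.items():
        p = 1.0
        for c in word:
            p *= probs[c]
        expected = positions * p
        variance = positions * p * (1.0 - p)
        z = (obs - expected) / sqrt(variance) if variance > 0 else 0.0
        scores[word] = (obs, expected, z)
    return scores


if __name__ == "__main__":
    for word, (obs, exp, z) in sorted(iid_scores("abababab", 2).items()):
        print(word, obs, round(exp, 3), round(z, 3))
```

A positive z marks a word that occurs more often than the model predicts, a negative z one that occurs less often. This sketch uses memory proportional to the number of distinct substrings and cannot score words that never occur, which are the limitations the abstract's space-efficient approach is aimed at.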

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Compression of Biological Sequences by Greedy Off-Line Textual Substitution

Author: Apostolico Alberto
Lonardi Stefano
Publication venue: 'Purdue University (bepress)'
Publication date: 01/11/1999

Purdue E-Pubs

10231 Abstracts Collection -- Structure Discovery in Biology: Motifs, Networks & Phylogenies

Author: Apostolico Alberto
Dress Andreas
Parida Laxmi
Publication venue: Dagstuhl Seminar Proceedings. 10231 - Structure Discovery in Biology: Motifs, Networks & Phylogenies
Publication date: 01/01/2010

From 06.06. to 11.06.2010, the Dagstuhl Seminar 10231 "Structure Discovery in Biology: Motifs, Networks & Phylogenies" was held in Schloss Dagstuhl - Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

Of Periods, Quasiperiods, Repetitions and Covers

Author: Apostolico Alberto
Breslauer Dany
Publication venue: 'Purdue University (bepress)'
Publication date: 01/03/1997

Purdue E-Pubs