3 research outputs found

RIME: Repeat Identification

Authors: Federico M; Peterlongo P; Pisanti Nadia; Sagot MF
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

We present an algorithm for detecting long similar fragments occurring at least twice in a set of biological sequences. The problem becomes computationally challenging when the frequency of a repeat is allowed to increase and when a non-negligible number of insertions, deletions and substitutions are allowed. In this paper we introduce an algorithm, Rime (for Repeat Identification: long, Multiple, and with Edits), that performs this task and handles instances whose size and combination of parameters cannot be handled by other currently existing methods. This is achieved by using a filter as a preprocessing step and then exploiting the information gathered by the filter in the subsequent repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (of length up to a few thousand), possibly occurring several times, and with a rate of differences (substitutions and indels) among copies of the same repeat of 10-15% or even more. (The name Rime is also a reference to Coleridge's poem "The Rime of the Ancient Mariner", which contains many repetitions as a poetic device.)
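To make the "rate of differences" in this abstract concrete, here is a minimal Python sketch that tests whether two fragments could count as copies of one repeat under a 15% threshold. It is not the Rime algorithm and does not reproduce its filter. It uses plain dynamic-programming edit distance, and both the function names and the choice to normalise by the longer fragment's length are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def within_difference_rate(a: str, b: str, max_rate: float = 0.15) -> bool:
    """Hypothetical helper: True if the edit distance, divided by the
    length of the longer fragment (an assumed normalisation), is at
    most max_rate."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return edit_distance(a, b) / longest <= max_rate
```

The quadratic cost of this pairwise check on long fragments is one reason a filtering step before the actual inference, as the abstract describes, is attractive.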

Archivio della Ricerca - Università di Pisa

Extracting string motif bases for quorum higher than two

Author: Rombo Simona Ester
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Bases of generators of motifs consisting of strings in which some positions can be occupied by a don’t care provide a useful conceptual tool for their description and a way to reduce the time and space involved in the discovery process. In the last few years, a few algorithms have been proposed for the extraction of a basis, building in large part on combinatorial properties of strings and their autocorrelations. Currently, the most efficient techniques for binary alphabets and quorum q = 2 require time quadratic in the length of the host string. The present paper explores properties of motif bases for quorum q ≥ 2, with both binary and general alphabets, and shows that important results holding for quorum q = 2 cannot be extended to this more general case. Furthermore, the extraction of motifs in which a bound is set on the maximum allowed number of don’t cares is addressed, and suitable algorithms are proposed whose computational complexity depends on the fixed bound.
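As an illustration of the terms used in this abstract, the following Python sketch checks whether a motif with don't cares reaches a given quorum in a host string, optionally under a bound on the number of don't cares. It only shows what "motif", "quorum" and "don't care" mean and is not the basis-extraction algorithm of the paper. The function names, the `.` wildcard symbol and the `max_dont_cares` parameter are assumptions introduced here.

```python
from typing import List, Optional


def occurrences(motif: str, text: str, wildcard: str = ".") -> List[int]:
    """Start positions where motif matches text; the wildcard symbol
    (a don't care) matches any single character."""
    m = len(motif)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == wildcard or p == c for p, c in zip(motif, text[i:i + m]))
    ]


def meets_quorum(motif: str, text: str, q: int = 2,
                 max_dont_cares: Optional[int] = None,
                 wildcard: str = ".") -> bool:
    """True if motif occurs at least q times in text and, when a bound
    is given, contains at most max_dont_cares don't cares."""
    if max_dont_cares is not None and motif.count(wildcard) > max_dont_cares:
        return False
    return len(occurrences(motif, text, wildcard)) >= q
```

For example, `occurrences("a.b", "aabacbab")` yields the start positions 0 and 3, so the motif meets quorum 2 but not quorum 3.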

Elsevier - Publisher Connector

Crossref

Open Access Repository

Archivio istituzionale della ricerca - Università di Palermo

Extracting string motif bases for quorum higher than two

Authors: Apostolico; Brazma; Grossi; Gusfield; Parida; Pelfrêne; Pisanti; Simona E. Rombo
Publication venue: 'Elsevier BV'

Crossref