Search CORE

38,031 research outputs found

Real-time and distributed applications for dictionary-based data compression

Author: DE AGOSTINO Sergio
Publication venue: Petre Dini
Publication date: 01/01/2015

The greedy approach to dictionary-based static text compression can be executed by a finite state machine. When it is applied in parallel to different blocks of data independently, there is no lack of robustness even on standard large scale distributed systems with input files of arbitrary size. Beyond standard large scale, the very small size of the data blocks has a negative effect on compression effectiveness. A robust approach for extreme distributed systems is presented in this paper, where this problem is fixed by overlapping adjacent blocks and preprocessing the neighborhoods of the boundaries. Moreover, we introduce the notion of a pseudo-prefix dictionary, which allows optimal compression by means of a real-time semi-greedy procedure and a slight improvement on the compression ratio obtained by the distributed implementations.
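
To make the block-independence trade-off concrete, here is a minimal sketch, not the paper's finite-state implementation: a plain greedy longest-match parser over a hypothetical dictionary (assumed to contain every single character so parsing never gets stuck), applied either to the whole text or to fixed-size blocks in parallel. The names `greedy_parse`, `parse_blocks` and the example dictionary are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_parse(text: str, dictionary: set[str]) -> list[str]:
    """Greedy parsing: repeatedly take the longest dictionary word
    that is a prefix of the remaining text."""
    max_len = max(map(len, dictionary))
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            # Only reachable if some single character is missing from the dictionary.
            raise ValueError(f"no dictionary word matches at position {i}")
    return phrases


def parse_blocks(text: str, dictionary: set[str], block_size: int) -> list[list[str]]:
    """Split the text into fixed-size blocks and parse each block independently."""
    blocks = [text[k:k + block_size] for k in range(0, len(text), block_size)]
    # Blocks share no state, so they can be parsed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda block: greedy_parse(block, dictionary), blocks))


if __name__ == "__main__":
    DICT = {"a", "b", "ab", "aba", "abab"}  # hypothetical prefix-closed dictionary
    print(greedy_parse("ababab", DICT))     # 2 phrases on the whole text
    print(parse_blocks("ababab", DICT, 3))  # 3 phrases once split into small blocks
```

With this toy dictionary the whole text parses into two phrases, while blocks of size 3 need three, because the boundary cuts the longest match. That is the small-block loss the abstract says overlapping blocks are meant to mitigate.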

Archivio della ricerca- Università di Roma La Sapienza

Effect of a standardised dietary restriction protocol on multiple laboratory strains of Drosophila melanogaster

Author: Bass T.M.
Grandison R.C.
Partridge L.
Piper M.D.W.
Tanimoto H.
Wong R.
Publication date: 01/01/2009

Background: Outcomes of lifespan studies in model organisms are particularly susceptible to variations in technical procedures. This is especially true of dietary restriction, which is implemented in many different ways among laboratories. Principal Findings: In this study, we have examined the effect of laboratory stock maintenance, genotype differences and microbial infection on the ability of dietary restriction (DR) to extend life in the fruit fly Drosophila melanogaster. None of these factors blocks the DR effect. Conclusions: These data lend support to the idea that nutrient restriction genuinely extends lifespan in flies, and that any mechanistic discoveries made with this model are of potential relevance to the determinants of lifespan in other organisms.

Public Library of Science (PLOS)

Directory of Open Access Journals

UCL Discovery

PubMed Central

The University of Manchester - Institutional Repository

Repetitions in infinite palindrome-rich words

Author: A Blondin Massé
A Glen
A Luca de
A Ostrowski
C Guo
D Angluin
E Pelantová
F Dejean
G Rote
J Currie
J Vesti
M Bucci
M Crochemore
M Rao
M Rubinchik
Narad Rampersad
R Groult
S Brlek
Publication venue: Springer Science and Business Media LLC
Publication date: 22/04/2019

Rich words are characterized by containing the maximum possible number of distinct palindromes. Several characteristic properties of rich words have been studied; yet the analysis of repetitions in rich words still involves some interesting open problems. We address lower bounds on the repetition threshold of infinite rich words over 2- and 3-letter alphabets, and construct a candidate infinite rich word over the alphabet $\Sigma_2=\{0,1\}$ with a small critical exponent of $2+\sqrt{2}/2$. This represents the first progress on an open problem of Vesti from 2017. Comment: 12 pages
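
The two notions in this abstract can be checked by brute force on finite words. The sketch below rests on the standard fact that a word of length $n$ has at most $n+1$ distinct palindromic factors (counting the empty word), so it is rich exactly when it has $n$ distinct non-empty ones. It also computes the largest exponent $|u|/\mathrm{per}(u)$ over all factors $u$; for an infinite word, the critical exponent is the supremum of this quantity over its prefixes. This is a naive cubic-time illustration with hypothetical function names, not the construction from the paper.

```python
from fractions import Fraction


def distinct_palindromes(word: str) -> set[str]:
    """Return the set of distinct non-empty palindromic factors of word."""
    found = set()
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            factor = word[i:j]
            if factor == factor[::-1]:
                found.add(factor)
    return found


def is_rich(word: str) -> bool:
    """A finite word of length n is rich iff it has n distinct non-empty palindromes."""
    return len(distinct_palindromes(word)) == len(word)


def smallest_period(factor: str) -> int:
    """Smallest p >= 1 with factor[k] == factor[k + p] for all valid k."""
    n = len(factor)
    for p in range(1, n + 1):
        if all(factor[k] == factor[k + p] for k in range(n - p)):
            return p
    return n


def max_exponent(word: str) -> Fraction:
    """Largest exponent |u| / per(u) over all non-empty factors u of word."""
    best = Fraction(0)
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            factor = word[i:j]
            best = max(best, Fraction(len(factor), smallest_period(factor)))
    return best


if __name__ == "__main__":
    print(is_rich("0010"), is_rich("abca"))  # True False
    print(max_exponent("01010"))             # 5/2
```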

arXiv.org e-Print Archive

Crossref

Internal Pattern Matching Queries in a Text and Applications

Author: Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication date: 13/10/2014

We consider several types of internal queries: questions about subwords of a text. As the main tool, we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$, we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as part of efficient solutions for subword suffix rank and selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding. Comment: 31 pages, 9 figures; accepted to SODA 201
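
The constant-space claim rests on a known fact: when $|y| \le 2|x|$, the occurrences of $x$ in $y$ form a single arithmetic progression, so three integers describe them all. The sketch below answers such a query naively by scanning, which is a plain linear-time stand-in and not the paper's constant-time data structure. It then derives primitivity from it, using the classic characterization that $w$ is primitive iff it occurs in $ww$ only at positions $0$ and $|w|$. The names `ipm_query` and `is_primitive` and the (first, step, count) encoding are illustrative assumptions.

```python
from typing import Optional


def ipm_query(text: str, x_start: int, x_end: int,
              y_start: int, y_end: int) -> Optional[tuple[int, int, int]]:
    """Report occurrences of text[x_start:x_end] inside text[y_start:y_end].

    Naive scan. Returns (first, step, count) with positions relative to the
    start of y, or None when x does not occur in y.
    """
    x = text[x_start:x_end]
    y = text[y_start:y_end]
    m = len(x)
    hits = [p for p in range(len(y) - m + 1) if y[p:p + m] == x]
    if not hits:
        return None
    step = hits[1] - hits[0] if len(hits) > 1 else 0
    # With |y| <= 2|x| all occurrences form one arithmetic progression.
    if len(y) <= 2 * m:
        assert all(b - a == step for a, b in zip(hits, hits[1:]))
    return hits[0], step, len(hits)


def is_primitive(text: str, start: int, end: int) -> bool:
    """w = text[start:end] is primitive iff w does not occur in ww[1:2|w|-1]."""
    w = text[start:end]
    n = len(w)
    ww = w + w  # copied here for simplicity; here |y| = 2n - 2 <= 2|x|
    return ipm_query(ww, 0, n, 1, 2 * n - 1) is None


if __name__ == "__main__":
    print(ipm_query("aaaaa", 0, 2, 0, 4))  # (0, 1, 3)
    print(is_primitive("abab", 0, 4))      # False, since abab = (ab)^2
```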

arXiv.org e-Print Archive

Crossref

Optimal-Time Text Indexing in BWT-runs Bounded Space

Author: Gagie Travis
Navarro Gonzalo
Prezza Nicola
Publication date: 11/07/2017

Indexing highly repetitive texts, such as genomic databases, software repositories and versioned text collections, has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is $r$, the number of runs in their Burrows-Wheeler Transform (BWT). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used $O(r)$ space and was able to efficiently count the number of occurrences of a pattern of length $m$ in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of $r$. Since then, a number of other indexes with space bounded by other measures of repetitiveness (the number of phrases in the Lempel-Ziv parse, the size of the smallest grammar generating the text, the size of the smallest automaton recognizing the text factors) have been proposed for efficiently locating, but not directly counting, the occurrences of a pattern. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the