42,493 research outputs found

Composite repetition-aware data structures

Author: A Blumer
A Lempel
D Arroyuelo
D Belazzougui
DE Willard
J Radoszewski
J Sirén
J Ziv
M Crochemore
M Raffinot
P Ferragina
S Kreft
T Gagie
V Mäkinen
W Rytter
Publication date: 01/01/2015

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version)
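As a rough illustration of the "number of BWT runs" measure mentioned in the abstract, the following minimal sketch builds the Burrows-Wheeler transform by sorting rotations and then run-length encodes it. The function names `bwt` and `run_length_encode` are hypothetical, the `$` sentinel is assumed not to occur in the input, and the quadratic construction is for illustration only; it is not the construction used in the paper.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (illustration only).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    for example in ("banana", "ab" * 8):
        transformed = bwt(example)
        runs = run_length_encode(transformed)
        print(example, transformed, len(runs))
```

On a periodic input such as `"ab" * 8` the number of runs stays constant as the string grows, which is the behaviour that makes a run-length encoded BWT attractive for repetitive collections.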

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Computing LZ77 in Run-Compressed Space

Author: Policriti Alberto
Prezza Nicola
Publication date: 21/10/2015

In this paper, we show that the LZ77 factorization of a text $T \in \Sigma^n$ can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of T reversed. For extremely repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the text itself. As a direct consequence of our result, we show that a class of repetition-aware self-indexes based on a combination of run-length encoded BWT and LZ77 can be built in asymptotically optimal O(R + z) words of working space, z being the size of the LZ77 parsing.
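To make the quantity z concrete, here is a minimal sketch of one common LZ77 variant, in which each factor is either a single new character or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). The function name `lz77_factors` is hypothetical, and this brute-force scan uses space linear in the text; it only shows what is being counted and is not the run-compressed algorithm the abstract describes.

```python
def lz77_factors(text: str) -> list[str]:
    """Greedy LZ77 parsing by brute force (illustration only).

    Each factor is the longest prefix of text[i:] that also starts at some
    earlier position j < i (the copy may overlap position i), or a single
    character if no such prefix exists.
    """
    factors: list[str] = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # candidate earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a factor of length 1
        factors.append(text[i:i + step])
        i += step
    return factors


if __name__ == "__main__":
    parsing = lz77_factors("abababab")
    print(parsing, "z =", len(parsing))
```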

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Moving Toward Non-transcription Based Discourse Analysis in Stable and Progressive Aphasia

Author: Dalton Sarah Grace
Hubbard H. Isabel
Richardson Jessica D.
Publication venue: e-Publications@Marquette
Publication date: 23/12/2019

Measurement of communication ability at the discourse level holds promise for predicting how well persons with stable (e.g., stroke-induced) or progressive aphasia navigate everyday communicative interactions. However, barriers to the clinical utilization of discourse measures have persisted. Recent advancements in the standardization of elicitation protocols and the existence of large databases for development of normative references have begun to address some of these barriers. Still, time remains a consistently reported barrier by clinicians. Non-transcription based discourse measurement would reduce the time required for discourse analysis, making clinical utilization a reality. The purpose of this article is to present evidence regarding discourse measures (main concept analysis, core lexicon, and derived efficiency scores) that are well suited to non-transcription based analysis. Combined with previous research, our results suggest that these measures are sensitive to changes following stroke or neurodegenerative disease. Given the evidence, further research specifically assessing the reliability of these measures in clinical implementation is warranted.

epublications@Marquette

Fast Label Extraction in the CDAWG

Author: A Blumer
D Belazzougui
D Gusfield
J Sirén
L Gasieniec
LS Russo
M Crochemore
M Raffinot
MA Bender
O Berkman
T Gagie
V Mäkinen
Publication date: 26/09/2017

The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log{\log{n}}+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log{\log{n}})$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864
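For readers unfamiliar with matching statistics, the sketch below computes them directly from the usual definition: for each position i of the query, the length of the longest prefix of query[i:] that occurs somewhere in the text. The function name `matching_statistics` is hypothetical, and this brute-force version relies on Python substring search; it does not reflect the CDAWG-based method or the O(m) bound reported in the abstract.

```python
def matching_statistics(query: str, text: str) -> list[int]:
    """Brute-force matching statistics (illustration only).

    ms[i] is the length of the longest prefix of query[i:] that occurs
    as a substring of text.
    """
    ms: list[int] = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("xabz", "abcab"))
```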

arXiv.org e-Print Archive

Crossref

Recommended from our members

Narrating the archive and archiving narrative: the electronic book and the logic of the index

Author: Davies Rosamund
Publication venue: Common Ground Publishing
Publication date: 01/01/2008

The creation of my hypermedia work Index of Love, which narrates a love story as an archive of moments, images and objects recollected, also articulated for me the potential of the book as electronic text. The book has always existed as both narrative and archive. Tables of contents and indexes allow the book to function simultaneously as linear narrative and non-linear, searchable database. The book therefore has more in common with the so-called 'new media' of the 21st century than it does with the dominant 20th century media of film, video and audiotape, whose logic and mode of distribution are resolutely linear. My thesis is that the non-linear logic of new media brings to the fore an aspect of the book - the index - whose potential for the production of narrative is only just beginning to be explored. When a reader/user accesses an electronic work, such as a website, via its menu, they simultaneously experience it as narrative and archive. The narrative journey taken is created through the menu choices made. Within the electronic book, therefore, the index (or menu) has the potential to function as more than just an analytical or navigational tool. It has the potential to become a creative, structuring device. This opens up new possibilities for the book, particularly as, in its paper-based form, the book indexes factual work, but not fiction. In the electronic book, however, the index offers as rich a potential for fictional narratives as it does for factual volumes. [ABSTRACT FROM AUTHOR]

Greenwich Academic Literature Archive

From Perception to Recollection: A spatio-temporal mediated interaction

Author: Themistokleous George
Publication date: 25/03/2018

De Montfort University Open Research Archive

Common aetiology for diverse language skills in 4½-year-old twins

Author: Bishop D.V.M.
Dale P.S.
Harlaar N.
Hayiou-Thomas M.E.
Kovas Y.
Plomin R.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2006

Multivariate genetic analysis was used to examine the genetic and environmental aetiology of the interrelationships of diverse linguistic skills. This study used data from a large sample of 4½-year-old twins who were tested on measures assessing articulation, phonology, grammar, vocabulary, and verbal memory. Phenotypic analysis suggested two latent factors: articulation (2 measures) and general language (the remaining 7), and a genetic model incorporating these factors provided a good fit to the data. Almost all genetic and shared environmental influences on the 9 measures acted through the two latent factors. There was also substantial aetiological overlap between the two latent factors, with a genetic correlation of 0·64 and shared environment correlation of 1·00. We conclude that, to a large extent, the same genetic and environmental factors underlie the development of individual differences in a wide range of linguistic skills.

Goldsmiths Research Online

Oxford University Research Archive

King's Research Portal

White Rose Research Online