Search CORE


Longest Common Extensions in Sublinear Space

Author: A Amir
D Gusfield
D Harel
EW Myers
G Manacher
GM Landau
MG Main
NJ Fine
P Bille
R Cole
R Kolpakov
RM Karp
Publication date: 01/01/2015

The longest common extension problem (LCE problem) is to construct a data structure for an input string $T$ of length $n$ that supports $\mathrm{LCE}(i,j)$ queries. Such a query returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. This classic problem has a well-known solution that uses $O(n)$ space and $O(1)$ query time. In this paper we show that for any trade-off parameter $1 \leq \tau \leq n$, the problem can be solved in $O(\frac{n}{\tau})$ space and $O(\tau)$ query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound. Comment: An extended abstract of this paper has been accepted to CPM 201
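The abstract states the trade-off only asymptotically. As a rough illustration of how sampling can trade space for query time, here is a minimal Python sketch under our own assumptions. It is not the paper's construction: it stores Karp-Rabin fingerprints of prefixes of $T$ only at every $\tau$-th position and answers $\mathrm{LCE}(i,j)$ by binary search over the common-prefix length. Assuming $T$ itself is available read-only and not counted, it keeps about $n/\tau$ fingerprints, but a query costs roughly $O((\tau + \log n)\log n)$ time, worse than the $O(\tau)$ bound claimed above, and answers are correct only with high probability. The class name `SampledFingerprintLCE` and its parameters are hypothetical.

```python
import random


class SampledFingerprintLCE:
    """Hypothetical sketch: LCE via Karp-Rabin fingerprints sampled every tau positions.

    Keeps 1 + n // tau fingerprints plus read-only access to the text.
    Correct with high probability; a hash collision can make lce() overestimate.
    """

    MOD = (1 << 61) - 1  # Mersenne prime modulus

    def __init__(self, text: str, tau: int):
        if tau < 1:
            raise ValueError("tau must be >= 1")
        self.text, self.tau = text, tau
        self.base = random.randrange(256, self.MOD - 1)  # random evaluation point
        self.samples = [0]  # samples[k] = fingerprint of text[:k * tau]
        fp = 0
        for pos, ch in enumerate(text, start=1):
            fp = (fp * self.base + ord(ch)) % self.MOD
            if pos % tau == 0:
                self.samples.append(fp)

    def _prefix_fp(self, k: int) -> int:
        """Fingerprint of text[:k], rebuilt from the nearest sample in < tau steps."""
        start = (k // self.tau) * self.tau
        fp = self.samples[k // self.tau]
        for ch in self.text[start:k]:
            fp = (fp * self.base + ord(ch)) % self.MOD
        return fp

    def _fp(self, start: int, length: int) -> int:
        """Fingerprint of text[start:start + length]."""
        shift = pow(self.base, length, self.MOD)
        return (self._prefix_fp(start + length) - self._prefix_fp(start) * shift) % self.MOD

    def lce(self, i: int, j: int) -> int:
        """Length of the longest common prefix of text[i:] and text[j:]."""
        lo, hi = 0, len(self.text) - max(i, j)
        # Invariant: a common prefix of length lo exists; the answer is <= hi.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self._fp(i, mid) == self._fp(j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo
```

For example, `SampledFingerprintLCE("banana", tau=2).lce(1, 3)` compares "anana" with "ana" and returns 3, barring a fingerprint collision. Larger `tau` stores fewer samples but makes each `_prefix_fp` call scan more characters.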

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

Practical Performance of Space Efficient Data Structures for Longest Common Extensions

Author: Dinklage Patrick
Fischer Johannes
Herlez Alexander
Kociumaka Tomasz
Kurpicz Florian
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 01/01/2020

KITopen

Dagstuhl Research Online Publication Server

Music Retrieval System Using Query-by-Humming

Author: Patel Parth
Publication venue: SJSU ScholarWorks
Publication date: 11/12/2019

Music Information Retrieval (MIR) is a research area of considerable interest because music can be retrieved using many different strategies. To retrieve music, a system must measure the similarity between the input query and candidate pieces of music. Several solutions are already used in practice, such as Query-by-Example (QBE), which takes a sample of an audio recording playing in the background and retrieves the matching result. However, there is no efficient approach to this problem in Query-by-Humming (QBH) applications, whose aim is to retrieve, efficiently, the music most similar to a hummed query. In this paper, I discuss different music information retrieval techniques and their system architectures. I then discuss the Query-by-Humming approach and the techniques that enable a novel method for music retrieval. Lastly, I conclude that the proposed system was effective when combined with the MIDI dataset and custom hummed queries recorded from a sample of people. Although the Mean Reciprocal Rank (MRR) was measured at 0.82–0.90 for a database of only 100 songs, the retrieval time was very high. Improving retrieval time and exploring Deep Learning approaches are therefore suggested for future work.
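The reported retrieval quality is given as MRR. For readers unfamiliar with the metric, here is a minimal Python sketch of the standard definition, $\mathrm{MRR} = \frac{1}{|Q|}\sum_{q \in Q} \frac{1}{\mathrm{rank}_q}$, where $\mathrm{rank}_q$ is the 1-based position of the correct song in the ranking returned for query $q$. The function name and the convention that a query whose target is never returned contributes 0 are our assumptions, not details taken from the paper.

```python
def mean_reciprocal_rank(rankings, targets):
    """Hypothetical helper: Mean Reciprocal Rank over a set of queries.

    rankings[q] is the ranked list of song IDs returned for query q (best first);
    targets[q] is the ID of the song that was actually hummed. A query whose
    target does not appear in its ranking contributes 0 (assumed convention).
    """
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per query")
    if not rankings:
        raise ValueError("MRR is undefined for zero queries")
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)  # ranks are 1-based
    return total / len(rankings)
```

For instance, three queries whose targets are ranked first, second, and not at all give `(1 + 1/2 + 0) / 3 = 0.5`.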

SJSU ScholarWorks

Universal Compressed Text Indexing

Author: Navarro Gonzalo
Prezza Nicola
Publication date: 06/09/2018

The rise of repetitive datasets has recently generated considerable interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression scheme, several different indexing solutions have been proposed in the last two decades. To date, the fastest indexes for repetitive texts are based on the run-length compressed Burrows-Wheeler transform and on the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on the other hand, are based on the Lempel-Ziv parsing and on grammar compression. Indexes for more universal schemes, such as collage systems and macro schemes, have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed that all dictionary compressors can be interpreted as approximation algorithms for the smallest string attractor, that is, a set of text positions capturing all distinct substrings. Starting from this observation, in this paper we develop the first universal compressed self-index, that is, the first indexing data structure based on string attractors, which can therefore be built on top of any dictionary-compressed text representation. Let $\gamma$ be the size of a string attractor for a text of length $n$. Our index takes $O(\gamma\log(n/\gamma))$ words of space and supports locating the $occ$ occurrences of any pattern of length $m$ in $O(m\log n + occ\log^{\epsilon}n)$ time, for any constant $\epsilon>0$. This is, in particular, the first index for general macro schemes and collage systems. Our result shows that the relation between indexing and compression is much deeper than previously thought: the simple property at the core of all dictionary compressors is sufficient to support fast indexed queries. Comment: Fixed with reviewer's comment
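The string-attractor definition quoted above can be made concrete with a brute-force checker. This is a minimal Python sketch written only for illustration, assuming the definition as stated in the abstract: a set of positions is an attractor if every distinct substring has at least one occurrence covering one of those positions. The function name is hypothetical, positions are 0-based, and the code has nothing to do with how the index itself is built.

```python
def is_string_attractor(text: str, gamma) -> bool:
    """Hypothetical brute-force check that gamma is a string attractor of text.

    gamma holds 0-based positions. It is an attractor if every distinct
    substring of text has an occurrence text[s:s + length] containing some
    position p in gamma, i.e. s <= p < s + length.
    Runs in O(n^3 * |gamma|) time; for illustration only.
    """
    n = len(text)
    positions = set(gamma)
    if any(not 0 <= p < n for p in positions):
        raise ValueError("attractor positions must lie in [0, n)")
    for length in range(1, n + 1):
        # Maps each substring of this length to True once some occurrence
        # of it covers an attractor position.
        covered = {}
        for s in range(n - length + 1):
            sub = text[s:s + length]
            hit = any(s <= p < s + length for p in positions)
            covered[sub] = covered.get(sub, False) or hit
        if not all(covered.values()):
            return False
    return True
```

For `"abab"`, the positions `{0, 1}` form an attractor, whereas `{0}` does not, because no occurrence of `"b"` (at positions 1 and 3) covers position 0.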

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Repositorio Académico de la Universidad de Chile

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Author: Bille Philip
Ettienne Mikko Berggren
Gørtz Inge Li
Vildhøj Hjalte Wedel
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Given a string $S$, the compressed indexing problem is to preprocess $S$ into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of $S$ while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. We obtain the following time-space trade-offs. For constant-sized alphabets: (i) $O(m + occ \lg\lg n)$ time using $O(z\lg(n/z)\lg\lg z)$ space, or (ii) $O(m(1 + \frac{\lg^\epsilon z}{\lg(n/z)}) + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z\lg(n/z))$ space. For integer alphabets polynomially bounded by $n$: (iii) $O(m(1 + \frac{\lg^\epsilon z}{\lg(n/z)}) + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z(\lg(n/z) + \lg\lg z))$ space, or (iv) $O(m + occ(\lg\lg n + \lg^{\epsilon} z))$ time using $O(z(\lg(n/z) + \lg^{\epsilon} z))$ space. Here $n$ and $m$ are the lengths of the input string and the query string, respectively, $z$ is the number of phrases in the LZ77 parse of the input string, $occ$ is the number of occurrences of the query in the input, and $\epsilon > 0$ is an arbitrarily small constant. In particular, (i) improves the leading term in the query time of the previous best solution from $O(m\lg m)$ to $O(m)$ at the cost of increasing the space by a factor $\lg \lg z$. Alternatively, (ii) matches the previous best space bound but has a leading term in the query time of $O(m(1+\frac{\lg^{\epsilon} z}{\lg (n/z)}))$. However, for any polynomial compression ratio, i.e., $z = O(n^{1-\delta})$ for a constant $\delta > 0$, this becomes $O(m)$. Our index also supports extraction of any substring of length $\ell$ in $O(\ell + \lg(n/z))$ time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.
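Every bound above is stated in terms of $z$, so it may help to see what an LZ77 parse looks like. The following minimal Python sketch computes a greedy LZ77 parse by brute force and decodes it again. It assumes one common variant of the parse, in which a phrase is either a character not seen before or the longest copy of the remaining text that starts at an earlier position (copies may overlap the current position). The function names are hypothetical, and papers differ in such details, so the phrase count here may differ slightly from the paper's $z$.

```python
from typing import List, Tuple, Union

# A phrase is a fresh literal character or a (source position, length) copy.
Phrase = Union[str, Tuple[int, int]]


def lz77_parse(text: str) -> List[Phrase]:
    """Hypothetical greedy LZ77 parse (one common variant, self-reference allowed).

    Brute force, cubic time in the worst case; for illustration only.
    """
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_src, best_len = -1, 0
        for src in range(i):
            length = 0
            # The copy may run past position i (overlapping source), as in LZ77.
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        if best_len == 0:
            phrases.append(text[i])  # first occurrence of this character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Invert lz77_parse, copying one character at a time so overlaps work."""
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)
        else:
            src, length = phrase
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```

For example, `lz77_parse("abababab")` yields `['a', 'b', (0, 6)]`, so this variant gives $z = 3$ for a string of length $n = 8$. Highly repetitive inputs keep $z$ small, which is the regime the trade-offs above target.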

arXiv.org e-Print Archive

Online Research Database In Technology