Search CORE

4 research outputs found

Optimal Color Range Reporting in One Dimension

Author: B. Chazelle
D.E. Willard
E.M. McCreight
L. Arge
M. Thorup
M.L. Fredman
P. Beame
P. Emde Boas van
P. Gupta
P.B. Miltersen
Q. Shi
R. Janardan
T.M. Chan
Publication venue
Publication date: 01/01/2013
Field of study

Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a \emph{color}. While the answer to an orthogonal point reporting query contains all points in the query range

Q

, the answer to a color reporting query contains only distinct colors of points in

Q

. In this paper we describe an O(N)-space data structure that answers one-dimensional color reporting queries in optimal

O(k+1)

time, where

k

is the number of colors in the answer and

N

is the number of points in the data structure. Our result can be also dynamized and extended to the external memory model

arXiv.org e-Print Archive

Crossref

On Optimal Top-K String Retrieval

Author: Shah Rahul
Sheng Cheng
Thankachan Sharma V.
Vitter Jeffrey Scott
Publication venue
Publication date: 01/01/2012
Field of study

Let

{\cal{D}}

\{d_1, d_2, d_3, ..., d_D\}

be a given set of

D

(string) documents of total length

n

. The top-

k

document retrieval problem is to index

\cal{D}

such that when a pattern

P

of length

p

, and a parameter

k

come as a query, the index returns the

k

most relevant documents to the pattern

P

. Hon et. al. \cite{HSV09} gave the first linear space framework to solve this problem in

O(p + k\log k)

time. This was improved by Navarro and Nekrich \cite{NN12} to

O(p + k)

. These results are powerful enough to support arbitrary relevance functions like frequency, proximity, PageRank, etc. In many applications like desktop or email search, the data resides on disk and hence disk-bound indexes are needed. Despite of continued progress on this problem in terms of theoretical, practical and compression aspects, any non-trivial bounds in external memory model have so far been elusive. Internal memory (or RAM) solution to this problem decomposes the problem into

O(p)

subproblems and thus incurs the additive factor of

O(p)

. In external memory, these approaches will lead to

O(p)

I/Os instead of optimal

O(p/B)

I/O term where

B

is the block-size. We re-interpret the problem independent of

p

, as interval stabbing with priority over tree-shaped structure. This leads us to a linear space index in external memory supporting top-

k

queries (with unsorted outputs) in near optimal

O(p/B + \log_B n + \log^{(h)} n + k/B)

I/Os for any constant

h

{

\log^{(1)}n =\log n

and

\log^{(h)} n = \log (\log^{(h-1)} n)

}. Then we get

O(n\log^*n)

space index with optimal

O(p/B+\log_B n + k/B)

I/Os.Comment: 3 figure

arXiv.org e-Print Archive

CiteSeerX

Efficient Indexing for Structured and Unstructured Data

Author: Patil Manish Madhukar
Publication venue: LSU Digital Commons
Publication date: 01/01/2014
Field of study

The collection of digital data is growing at an exponential rate. Data originates from wide range of data sources such as text feeds, biological sequencers, internet traffic over routers, through sensors and many other sources. To mine intelligent information from these sources, users have to query the data. Indexing techniques aim to reduce the query time by preprocessing the data. Diversity of data sources in real world makes it imperative to develop application specific indexing solutions based on the data to be queried. Data can be structured i.e., relational tables or unstructured i.e., free text. Moreover, increasingly many applications need to seamlessly analyze both kinds of data making data integration a central issue. Integrating text with structured data needs to account for missing values, errors in the data etc. Probabilistic models have been proposed recently for this purpose. These models are also useful for applications where uncertainty is inherent in data e.g. sensor networks. This dissertation aims to propose efficient indexing solutions for several problems that lie at the intersection of database and information retrieval such as joining ranked inputs, full-text documents searching etc. Other well-known problems of ranked retrieval and pattern matching are also studied under probabilistic settings. For each problem, the worst-case theoretical bounds of the proposed solutions are established and/or their practicality is demonstrated by thorough experimentation

Louisiana State University

Algorithms and Data Structures for Strings, Points and Integers:or, Points about Strings and Strings about Points

Author: Vind Søren Juhl
Publication venue: Technical University of Denmark
Publication date: 01/01/2015
Field of study

Online Research Database In Technology