4 research outputs found

    Optimal Color Range Reporting in One Dimension

    Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a color. While the answer to an orthogonal point reporting query contains all points in the query range $Q$, the answer to a color reporting query contains only the distinct colors of points in $Q$. In this paper we describe an $O(N)$-space data structure that answers one-dimensional color reporting queries in optimal $O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the number of points in the data structure. Our result can also be dynamized and extended to the external memory model.
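
    To make the query semantics concrete, here is a minimal Python sketch of one-dimensional color reporting. It is not the data structure from the paper: it uses the well-known reduction in which each point stores the index of the previous point of the same color, combined with a sparse table for range minima, so it takes O(N log N) space and O(log N + k) query time rather than the O(N) space and O(k+1) time stated above. The class name `ColorRangeReporter` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_left, bisect_right


class ColorRangeReporter:
    """Illustrative 1D color reporting (not the paper's structure)."""

    def __init__(self, points):
        # points: iterable of (coordinate, color) pairs
        pts = sorted(points, key=lambda p: p[0])
        self.xs = [x for x, _ in pts]
        self.colors = [c for _, c in pts]

        # prev[i] = index of the previous point with the same color, or -1
        last_seen = {}
        self.prev = []
        for i, c in enumerate(self.colors):
            self.prev.append(last_seen.get(c, -1))
            last_seen[c] = i

        # Sparse table: table[j][i] is the index of the minimum prev value
        # in the window [i, i + 2^j - 1].
        n = len(pts)
        self.table = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            half = 1 << (j - 1)
            below = self.table[-1]
            row = []
            for i in range(n - (1 << j) + 1):
                a, b = below[i], below[i + half]
                row.append(a if self.prev[a] <= self.prev[b] else b)
            self.table.append(row)
            j += 1

    def _argmin_prev(self, lo, hi):
        # Index of the minimum prev value in the inclusive range [lo, hi].
        j = (hi - lo + 1).bit_length() - 1
        a = self.table[j][lo]
        b = self.table[j][hi - (1 << j) + 1]
        return a if self.prev[a] <= self.prev[b] else b

    def query(self, a, b):
        """Return each distinct color with a point in [a, b] exactly once."""
        lo = bisect_left(self.xs, a)
        hi = bisect_right(self.xs, b) - 1
        result = []
        stack = [(lo, hi)]
        while stack:
            left, right = stack.pop()
            if left > right:
                continue
            m = self._argmin_prev(left, right)
            # A point is the first of its color in the range iff prev < lo.
            if self.prev[m] >= lo:
                continue  # no first occurrences remain in this subrange
            result.append(self.colors[m])
            stack.append((left, m - 1))
            stack.append((m + 1, right))
        return result


reporter = ColorRangeReporter(
    [(1, "red"), (2, "blue"), (3, "red"), (5, "green"), (8, "blue")]
)
print(sorted(reporter.query(2, 5)))
```

    Each reported color costs a constant number of range-minimum lookups, so the work after the initial binary search is proportional to the number of distinct colors rather than the number of points in the range.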

    On Optimal Top-K String Retrieval

    Let $\mathcal{D} = \{d_1, d_2, d_3, \ldots, d_D\}$ be a given set of $D$ (string) documents of total length $n$. The top-$k$ document retrieval problem is to index $\mathcal{D}$ such that when a pattern $P$ of length $p$ and a parameter $k$ come as a query, the index returns the $k$ most relevant documents to the pattern $P$. Hon et al. \cite{HSV09} gave the first linear space framework to solve this problem in $O(p + k\log k)$ time. This was improved by Navarro and Nekrich \cite{NN12} to $O(p + k)$. These results are powerful enough to support arbitrary relevance functions like frequency, proximity, PageRank, etc. In many applications like desktop or email search, the data resides on disk and hence disk-bound indexes are needed. Despite continued progress on this problem in terms of theoretical, practical and compression aspects, any non-trivial bounds in the external memory model have so far been elusive. Internal memory (or RAM) solutions to this problem decompose it into $O(p)$ subproblems and thus incur an additive factor of $O(p)$. In external memory, these approaches lead to $O(p)$ I/Os instead of the optimal $O(p/B)$ I/O term, where $B$ is the block size. We re-interpret the problem independent of $p$, as interval stabbing with priority over a tree-shaped structure. This leads us to a linear space index in external memory supporting top-$k$ queries (with unsorted outputs) in near optimal $O(p/B + \log_B n + \log^{(h)} n + k/B)$ I/Os for any constant $h$, where $\log^{(1)} n = \log n$ and $\log^{(h)} n = \log(\log^{(h-1)} n)$. Then we get an $O(n\log^* n)$ space index with optimal $O(p/B + \log_B n + k/B)$ I/Os. Comment: 3 figure
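
    As a point of reference for the problem statement, the following Python sketch answers a top-$k$ query by brute force, using term frequency (one of the relevance functions mentioned above) as the score. It scans every document on every query, so it costs time proportional to the total length $n$ and is only a baseline for what the indexes in the abstract avoid; it does not reflect any of the cited constructions. The function names `count_occurrences` and `top_k_documents` are hypothetical.

```python
import heapq


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, 0
    while True:
        pos = text.find(pattern, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1  # advance by one so overlapping matches are counted


def top_k_documents(documents, pattern, k):
    """Return up to k (doc_index, frequency) pairs, most frequent first.

    Documents that do not contain the pattern are never returned.
    Ties are broken in favor of the smaller document index.
    """
    scored = ((count_occurrences(doc, pattern), i)
              for i, doc in enumerate(documents))
    matches = [(freq, i) for freq, i in scored if freq > 0]
    best = heapq.nlargest(k, matches, key=lambda t: (t[0], -t[1]))
    return [(i, freq) for freq, i in best]


docs = ["banana", "bandana", "cabana", "apple"]
print(top_k_documents(docs, "ana", 2))
```

    Swapping in a different relevance function only requires replacing `count_occurrences` with another scoring routine; the selection of the $k$ best documents is unchanged.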

    Efficient Indexing for Structured and Unstructured Data

    The collection of digital data is growing at an exponential rate. Data originates from a wide range of sources such as text feeds, biological sequencers, internet traffic over routers, sensors and many others. To mine intelligent information from these sources, users have to query the data. Indexing techniques aim to reduce the query time by preprocessing the data. The diversity of data sources in the real world makes it imperative to develop application-specific indexing solutions based on the data to be queried. Data can be structured (e.g., relational tables) or unstructured (e.g., free text). Moreover, increasingly many applications need to seamlessly analyze both kinds of data, making data integration a central issue. Integrating text with structured data needs to account for missing values, errors in the data, etc. Probabilistic models have been proposed recently for this purpose. These models are also useful for applications where uncertainty is inherent in the data, e.g., sensor networks. This dissertation aims to propose efficient indexing solutions for several problems that lie at the intersection of databases and information retrieval, such as joining ranked inputs and full-text document search. Other well-known problems of ranked retrieval and pattern matching are also studied under probabilistic settings. For each problem, the worst-case theoretical bounds of the proposed solutions are established and/or their practicality is demonstrated by thorough experimentation.