
String Searching with Ranking Constraints and Uncertainty

Author: Biswas Sudip
Publication venue: LSU Digital Commons
Publication date: 01/01/2015

Strings play an important role in many areas of computer science. Searching for a pattern in a string or string collection is one of the most classic problems. Different variations of this problem, such as document retrieval, ranked document retrieval, and dictionary matching, have been well studied. The enormous growth of the internet, large genomic projects, sensor networks, and digital libraries necessitates not just efficient algorithms and data structures for general string indexing, but also indexes for texts with fuzzy information and support for queries with different constraints. This dissertation addresses some of these problems and proposes indexing solutions. One such variation is the document retrieval query with included and excluded/forbidden patterns, where the objective is to retrieve all the relevant documents that contain the included patterns and do not contain the excluded patterns. We continue the previous work done on this problem and propose a more efficient solution. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario when the query consists of more than two patterns. The forbidden pattern problem suffers from the drawback that linear space (in words) solutions are unlikely to yield better than O(\sqrt{n/occ}) per-document reporting time, where n is the total length of the documents and occ is the number of output documents. Continuing along this path, we introduce a new variation, namely document retrieval with forbidden extension query, where the forbidden pattern is an extension of the included pattern. We also address the more general top-k version of the problem, which retrieves the top k documents, where the ranking is based on the PageRank relevance metric. This problem is motivated by search applications. It also holds theoretical interest, as we show that the hardness of the forbidden pattern problem is alleviated in this problem. We achieve linear space and optimal query time for this variation.
We also propose succinct indexes for both these problems. Position-restricted pattern matching considers the scenario where only part of the text is searched. We propose a succinct index for this problem with efficient query time. An important application of this problem stems from searching in genomic sequences, where only part of the gene sequence is searched for interesting patterns. The problem of computing discriminating (resp. generic) words is to report all minimal (resp. maximal) extensions of a query pattern which are contained in at most (resp. at least) a given number of documents. These problems are motivated by applications in computational biology, text mining, and automated text classification. We propose succinct indexes for these problems. Strings with uncertainty and fuzzy information play an important role in an increasing number of applications. We propose a general framework for indexing uncertain strings such that a deterministic query string can be searched efficiently. String matching becomes a probabilistic event when a string contains uncertainty, i.e., each position of the string can have different probable characters with an associated probability of occurrence for each character. Such uncertain strings are prevalent in various applications such as biological sequence data, event monitoring, and automatic ECG annotations. We consider two basic problems of string searching, namely substring searching and string listing. We formulate these well-known problems for the uncertain strings paradigm and propose exact and approximate solutions for them. We also discuss a constrained variation of orthogonal range searching. Given a set of points, the task of orthogonal range searching is to build a data structure such that all the points inside an orthogonal query region can be reported. We introduce a new variation, namely shared constraint range searching, which naturally arises in constrained pattern matching applications.
Shared constraint range searching is a special four-sided range reporting query problem where two constraints have sharing among them, effectively reducing the number of independent constraints. For this problem, we propose a linear space index that can match the best known bound for the three-dimensional dominance reporting problem. We extend our data structure in the external memory model.
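The probabilistic matching described in this abstract can be illustrated with a brute-force sketch. It is not the indexing framework proposed in the dissertation. It assumes that the positions of the uncertain string are independent (an assumption not stated in the abstract), so the probability of a match at a position is the product of the per-character probabilities. The function names and the threshold parameter `tau` are hypothetical.

```python
from math import prod

def match_probability(uncertain, pattern, start):
    """Probability that `pattern` occurs at `start`.

    `uncertain` is a list of dicts mapping character -> probability.
    Assumes positions are independent (illustrative assumption).
    """
    return prod(uncertain[start + j].get(ch, 0.0) for j, ch in enumerate(pattern))

def uncertain_substring_search(uncertain, pattern, tau):
    """Return (start, probability) pairs with match probability >= tau."""
    hits = []
    for start in range(len(uncertain) - len(pattern) + 1):
        p = match_probability(uncertain, pattern, start)
        if p >= tau:
            hits.append((start, p))
    return hits

# Hypothetical uncertain string of length 4.
text = [
    {"A": 0.5, "C": 0.5},
    {"G": 1.0},
    {"A": 0.25, "T": 0.75},
    {"G": 1.0},
]
matches = uncertain_substring_search(text, "AG", 0.2)
```

This scan takes time proportional to the text length times the pattern length per query, which is the cost an index for uncertain strings is meant to avoid.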

Louisiana State University

Range Queries on Uncertain Data

Author: B Chazelle
G Frederickson
J Driscoll
J Mitchell
M Yiu
P Agarwal
Publication date: 09/01/2015

Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k\leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time. Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version.
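To make the three query types concrete, here is a linear-scan baseline. It is only a sketch, not the data structures of the paper. It assumes, purely for illustration, that each uncertain point is uniformly distributed over an interval `(lo, hi)` with `hi > lo`; the function names are hypothetical.

```python
import heapq

def prob_in_interval(point, a, b):
    """Probability that a uniform point on (lo, hi) falls inside [a, b]."""
    lo, hi = point
    overlap = min(hi, b) - max(lo, a)
    return max(0.0, overlap) / (hi - lo)

def top_k(points, a, b, k):
    """Return the k (probability, index) pairs with the highest probabilities."""
    scored = [(prob_in_interval(p, a, b), i) for i, p in enumerate(points)]
    return heapq.nlargest(k, scored)

def threshold(points, a, b, tau):
    """Return indices of points lying in [a, b] with probability >= tau."""
    return [i for i, p in enumerate(points) if prob_in_interval(p, a, b) >= tau]

# Hypothetical input: three uniform uncertain points, query interval I = [1, 4].
points = [(0.0, 4.0), (2.0, 4.0), (3.0, 11.0)]
best_two = top_k(points, 1.0, 4.0, 2)
likely = threshold(points, 1.0, 4.0, 0.5)
```

A top-1 query is the special case `top_k(points, a, b, 1)`. Every query here scans all of the points, which is the cost that a dedicated structure is designed to reduce.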

arXiv.org e-Print Archive

Crossref

A Dynamic I/O-Efficient Structure for One-Dimensional Top-k Range Reporting

Author: Tao Yufei
Publication date: 01/01/2014

We present a structure in external memory for "top-k range reporting", which uses linear space, answers a query in O(lg_B n + k/B) I/Os, and supports an update in O(lg_B n) amortized I/Os, where n is the input size, and B is the block size. This improves the state of the art which incurs O(lg^2_B n) amortized I/Os per update. Comment: In PODS'1
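The abstract does not define the query, so the following in-memory baseline assumes the commonly used formulation: each point has a key and a weight, and a query asks for the k heaviest points whose keys fall in a given range. The class name and its methods are hypothetical, and this sketch does not achieve the I/O bounds stated above.

```python
import bisect
import heapq

class NaiveTopKRange:
    """Static in-memory baseline for top-k range reporting (illustrative only)."""

    def __init__(self, points):
        # points: iterable of (key, weight) pairs, stored sorted by key
        self._points = sorted(points)
        self._keys = [key for key, _ in self._points]

    def query(self, lo, hi, k):
        """Return the k points with keys in [lo, hi] that have the largest weights."""
        left = bisect.bisect_left(self._keys, lo)
        right = bisect.bisect_right(self._keys, hi)
        return heapq.nlargest(k, self._points[left:right], key=lambda p: p[1])

index = NaiveTopKRange([(1, 10), (4, 70), (6, 30), (9, 50), (12, 90)])
result = index.query(3, 10, 2)
```

The binary searches locate the range in logarithmic time, but selecting the top k still touches every point in the range, so the query cost depends on the range size rather than only on k.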

arXiv.org e-Print Archive

University of Queensland eSpace

Orthogonal Range Reporting and Rectangle Stabbing for Fat Rectangles

Author: A Efrat
B Aronov
B Chazelle
EM McCreight
JL Bentley
M Berg de
M Karpinski
MJ Katz
P Afshani
TM Chan
Y Nekrich
Publication date: 06/05/2019

In this paper we study two geometric data structure problems in the special case when input objects or queries are fat rectangles. We show that in this case a significant improvement compared to the general case can be achieved. We describe data structures that answer two- and three-dimensional orthogonal range reporting queries in the case when the query range is a \emph{fat} rectangle. Our two-dimensional data structure uses $O(n)$ words and supports queries in $O(\log\log U + k)$ time, where $n$ is the number of points in the data structure, $U$ is the size of the universe and $k$ is the number of points in the query range. Our three-dimensional data structure needs $O(n\log^{\varepsilon}U)$ words of space and answers queries in $O(\log\log U + k)$ time. We also consider the rectangle stabbing problem on a set of three-dimensional fat rectangles. Our data structure uses $O(n)$ space and answers stabbing queries in $O(\log U\log\log U + k)$ time. Comment: extended version of a WADS'19 paper.

arXiv.org e-Print Archive

Crossref

Carleton University's Institutional Repository

Optimal Color Range Reporting in One Dimension

Author: B. Chazelle
D.E. Willard
E.M. McCreight
L. Arge
M. Thorup
M.L. Fredman
P. Beame
P. Emde Boas van
P. Gupta
P.B. Miltersen
Q. Shi
R. Janardan
T.M. Chan
Publication date: 01/01/2013

Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a \emph{color}. While the answer to an orthogonal point reporting query contains all points in the query range $Q$, the answer to a color reporting query contains only distinct colors of points in $Q$. In this paper we describe an O(N)-space data structure that answers one-dimensional color reporting queries in optimal $O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the number of points in the data structure. Our result can be also dynamized and extended to the external memory model.
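The difference between reporting points and reporting colors can be sketched with a classical reduction: each point stores the coordinate of the previous point with the same color, and a point in the query range [lo, hi] reports its color only if that predecessor lies before lo. This is an illustrative baseline, not the optimal structure of the paper, and the function names are hypothetical.

```python
import bisect

def build(points):
    """points: list of (coordinate, color) pairs.

    Returns sorted coordinates, their colors, and for each point the
    coordinate of the previous point with the same color (-inf if none).
    """
    pts = sorted(points)
    last_seen = {}
    prev = []
    for x, color in pts:
        prev.append(last_seen.get(color, float("-inf")))
        last_seen[color] = x
    return [x for x, _ in pts], [c for _, c in pts], prev

def report_colors(index, lo, hi):
    """Report each distinct color in [lo, hi] exactly once."""
    xs, colors, prev = index
    left = bisect.bisect_left(xs, lo)
    right = bisect.bisect_right(xs, hi)
    # Only the leftmost in-range point of each color has its predecessor before lo.
    return [colors[i] for i in range(left, right) if prev[i] < lo]

index = build([(1, "red"), (2, "blue"), (5, "red"), (7, "green"), (8, "blue"), (9, "red")])
found = report_colors(index, 4, 9)
```

This version still scans every point in the range, so its query time depends on the number of points in the range rather than on the number of colors k.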

arXiv.org e-Print Archive

Crossref

PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn

Author: Aggarwal G.
Greenberg A.
Kreps J.
Voigt P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/09/2018

Preserving privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner, while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture, and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, thereby demonstrating the tradeoffs between privacy and utility needs, and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn. Comment: Conference information: ACM International Conference on Information and Knowledge Management (CIKM 2018)

arXiv.org e-Print Archive

Crossref

I/O-Efficient Planar Range Skyline and Attrition Priority Queues

Author: Kejlberg-Rasmussen Casper
Tao Yufei
Tsakalidis Konstantinos
Tsichlas Kostas
Yoon Jeonghun
Publication date: 01/01/2013

In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings:

* For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases.
* We show that the left-open case is harder, such that any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this case is as difficult as the general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B).
* We give a dynamic structure that supports top-open queries in O(log_{2B^e} (n/B) + k/B^{1-e}) I/Os, and updates in O(log_{2B^e} (n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log (n/B)).

As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case I/Os, and O(1/B) amortized I/Os per operation. We also add the new CatenateAndAttrite operation that catenates two PQAs in O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory. Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.234
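The query itself, reporting the maxima of P \cap Q, can be sketched in internal memory with a filter followed by a sweep. This is a baseline for illustration only, not one of the external-memory structures described above. It assumes a point is dominated when another point has both coordinates at least as large; the function name is hypothetical.

```python
def range_skyline(points, a1, a2, b1, b2):
    """Return the skyline (maxima) of the points inside [a1, a2] x [b1, b2].

    A point is kept if no other point in the rectangle has both
    coordinates at least as large (illustrative dominance convention).
    """
    inside = [(x, y) for x, y in points if a1 <= x <= a2 and b1 <= y <= b2]
    inside.sort(reverse=True)  # decreasing x, ties broken by decreasing y
    skyline = []
    best_y = float("-inf")
    for x, y in inside:
        # Everything seen so far has x at least as large, so the point
        # survives only if it is strictly higher than all of them.
        if y > best_y:
            skyline.append((x, y))
            best_y = y
    return skyline[::-1]  # report in increasing x order

pts = [(1, 9), (2, 4), (3, 7), (5, 6), (6, 2), (8, 3), (9, 1)]
four_sided = range_skyline(pts, 2, 8, 2, 8)
top_open = range_skyline(pts, 1, 9, 5, float("inf"))
```

Passing `float("inf")` as `b2` gives a top-open query, and `float("-inf")` as `a1` gives a left-open one. The sort makes each query cost O(m log m) for m points in the rectangle, independent of the output size k.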

arXiv.org e-Print Archive

Crossref

Hong Kong University of Science and Technology Institutional Repository