On the Cost of Negation for Dynamic Pruning
Negated query terms allow documents containing such terms to be filtered out of a search results list, supporting disambiguation. In this work, the effect of negation on the efficiency of disjunctive, top-k retrieval is examined. First, we show how negation can be integrated efficiently into two popular dynamic pruning algorithms. Then, we explore the efficiency of our approach, and show that while often efficient, negation can negatively impact the dynamic pruning effectiveness for certain queries.
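To make the semantics of a negated term concrete, here is a minimal sketch of disjunctive top-k retrieval with negation. It is an exhaustive baseline, not one of the dynamic pruning algorithms the abstract refers to, and the `postings` layout (term mapped to per-document scores) and the function name are assumptions made for illustration only.

```python
import heapq

def top_k_with_negation(postings, positive, negated, k):
    """Exhaustive disjunctive top-k scoring with negated terms.

    postings: hypothetical layout, dict term -> {doc_id: partial_score}.
    A document containing any negated term is dropped from the results.
    """
    # Collect every document that contains at least one negated term.
    excluded = set()
    for term in negated:
        excluded.update(postings.get(term, {}))

    # Sum partial scores over the positive terms (disjunctive semantics).
    scores = {}
    for term in positive:
        for doc, score in postings.get(term, {}).items():
            if doc not in excluded:
                scores[doc] = scores.get(doc, 0.0) + score

    # Highest score first; ties broken by smaller document id.
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], -kv[0]))


postings = {
    "jaguar": {1: 2.0, 2: 1.5, 3: 1.0},
    "speed": {2: 1.0, 3: 0.5},
    "car": {2: 0.3},
}
result = top_k_with_negation(postings, ["jaguar", "speed"], ["car"], k=2)
```

In this toy query, document 2 would have scored highest but is filtered out because it contains the negated term `car`.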
Reverse k Nearest Neighbor Search over Trajectories
GPS enables mobile devices to continuously provide new opportunities to
improve our daily lives. For example, the data collected in applications
created by Uber or Public Transport Authorities can be used to plan
transportation routes, estimate capacities, and proactively identify low
coverage areas. In this paper, we study a new kind of query, Reverse k Nearest
Neighbor Search over Trajectories (RkNNT), which can be used for route planning
and capacity estimation. Given a set of existing routes DR, a set of passenger
transitions DT, and a query route Q, an RkNNT query returns all transitions that
take Q as one of their k nearest travel routes. To solve the problem, we first
develop an index to handle dynamic trajectory updates, so that the most
up-to-date transition data are available for answering an RkNNT query. Then we
introduce a filter refinement framework for processing RkNNT queries using the
proposed indexes. Next, we show how to use RkNNT to solve the optimal route
planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route
from a start location to an end location that could attract the maximum (or
minimum) number of passengers based on a pre-defined travel distance threshold.
Experiments on real datasets demonstrate the efficiency and scalability of our
approaches. To the best of our knowledge, this is the first work to study
the RkNNT problem for route planning.
Comment: 12 pages
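The query definition can be illustrated with a brute-force sketch. This is not the index or the filter-refinement framework described in the abstract. The distance function is an assumption: a transition is taken to be an (origin, destination) pair, and its distance to a route is the sum of the distances from each endpoint to the nearest route vertex. All names are hypothetical.

```python
from math import hypot

def point_to_route(p, route):
    # Distance from point p to the nearest vertex of the route
    # (vertices only, not segments; a simplifying assumption).
    return min(hypot(p[0] - x, p[1] - y) for x, y in route)

def transition_to_route(transition, route):
    # Assumed distance: origin-to-route plus destination-to-route.
    origin, destination = transition
    return point_to_route(origin, route) + point_to_route(destination, route)

def rknnt_bruteforce(routes, transitions, query, k):
    """Return the transitions that have `query` among their k nearest routes."""
    result = []
    for t in transitions:
        d_query = transition_to_route(t, query)
        # Count existing routes strictly closer to t than the query route.
        closer = sum(1 for r in routes if transition_to_route(t, r) < d_query)
        if closer < k:
            result.append(t)
    return result


routes = [[(0, 0), (10, 0)]]                  # DR: one existing route
query = [(0, 5), (10, 5)]                     # Q: candidate route
transitions = [((0, 4), (10, 4)),             # near Q
               ((0, 1), (10, 1))]             # near the existing route
nearest_1 = rknnt_bruteforce(routes, transitions, query, k=1)
nearest_2 = rknnt_bruteforce(routes, transitions, query, k=2)
```

The scan costs one distance computation per (transition, route) pair, which is the cost an index-based approach would aim to avoid.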
Updatable Learned Indexes Meet Disk-Resident DBMS -- From Evaluations to Design Choices
Although many updatable learned indexes have been proposed in recent years,
whether they can outperform traditional approaches on disk remains unknown. In
this study, we revisit and implement four state-of-the-art updatable learned
indexes on disk, and compare them against the B+-tree under a wide range of
settings. Through our evaluation, we make some key observations: 1) Overall,
the B+-tree performs well across a range of workload types and datasets. 2) A
learned index could outperform B+-tree or other learned indexes on disk for a
specific workload. For example, PGM achieves the best performance in write-only
workloads while LIPP significantly outperforms others in lookup-only workloads.
We further conduct a detailed performance analysis to reveal the strengths and
weaknesses of these learned indexes on disk. Moreover, we summarize the
observed common shortcomings in five categories and propose four design
principles to guide future design of on-disk, updatable learned indexes: (1)
reducing the index's tree height, (2) using better data structures to lower
operation overheads, (3) improving the efficiency of scan operations, and (4)
adopting a more efficient storage layout.
Comment: 22 pages
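For readers unfamiliar with the idea, the following is a minimal in-memory sketch of a learned index: a model predicts a key's position in a sorted array, and a search bounded by the model's maximum error corrects the prediction. It is not PGM, LIPP, or any of the evaluated indexes, it does not support updates or disk residency, and the class name and single linear model are assumptions chosen for brevity. Keys are assumed to be distinct numbers.

```python
from bisect import bisect_left

class LinearLearnedIndex:
    """Toy static learned index: one linear model plus a bounded search."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumes distinct numeric keys
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Linear interpolation mapping the smallest key to 0 and the largest to n - 1.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Largest prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        n = len(self.keys)
        # Clamp the prediction so the window always lies inside the array.
        p = min(max(self._predict(key), 0), n - 1)
        lo = max(0, p - self.max_err)
        hi = min(n, p + self.max_err + 1)
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


keys = [2, 3, 5, 8, 13, 21, 34]
index = LinearLearnedIndex(keys)
positions = [index.lookup(k) for k in keys]
```

The search window shrinks as the model fits the key distribution better, which is the intuition behind replacing tree traversal with a prediction.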
A Linear-Time Algorithm for Finding Induced Planar Subgraphs
In this paper we study the problem of efficiently and effectively extracting induced planar subgraphs. Edwards and Farr proposed an algorithm with O(mn) time complexity to find an induced planar subgraph of at least 3n/(d+1) vertices in a graph of maximum degree d. They also proposed an alternative algorithm with O(mn) time complexity to find an induced planar subgraph of at least 3n/(\bar{d}+1) vertices, where \bar{d} is the average degree of the graph. These two methods appear to be the best known when d and \bar{d} are small. Unfortunately, they sacrifice accuracy for lower time complexity by using indirect indicators of planarity. A limitation of those approaches is that the algorithms do not implicitly test for planarity, and the additional cost of this test can be significant in large graphs. In contrast, we propose a linear-time algorithm that finds an induced planar subgraph of n - \nu vertices in a graph of n vertices, where \nu denotes the total number of vertices shared by the detected Kuratowski subdivisions. An added benefit of our approach is that we are able to detect when a graph is planar, and terminate the reduction. The resulting planar subgraphs also do not have any rigid constraints on the maximum degree of the induced subgraph. The experimental results show that our method achieves better performance than current methods on graphs with small skewness.
Spatial Object Recommendation with Hints: When Spatial Granularity Matters
Existing spatial object recommendation algorithms generally treat objects
identically when ranking them. However, spatial objects often cover different
levels of spatial granularity and thereby are heterogeneous. For example, one
user may prefer to be recommended a region (say Manhattan), while another user
might prefer a venue (say a restaurant). Even for the same user, preferences
can change at different stages of data exploration. In this paper, we study how
to support top-k spatial object recommendations at varying levels of spatial
granularity, so that a spatial object at any granularity, such as a city,
suburb, or building, can serve as a Point of Interest (POI). To solve this problem, we
propose the use of a POI tree, which captures spatial containment relationships
between POIs. We design a novel multi-task learning model called MPR (short for
Multi-level POI Recommendation), where each task aims to return the top-k POIs
at a certain spatial granularity level. Each task consists of two subtasks: (i)
attribute-based representation learning; (ii) interaction-based representation
learning. The first subtask learns the feature representations for both users
and POIs, capturing attributes directly from their profiles. The second subtask
incorporates user-POI interactions into the model. Additionally, MPR can
provide insights into why certain recommendations are being made to a user
based on three types of hints: user-aspect, POI-aspect, and interaction-aspect.
We empirically validate our approach using two real-life datasets, and show
promising performance improvements over several state-of-the-art methods.