22 research outputs found

    Answering Top-k Queries Over a Mixture of Attractive and Repulsive Dimensions

    In this paper, we formulate a top-k query that compares objects in a database to a user-provided query object on a novel scoring function. The proposed scoring function combines the idea of attractive and repulsive dimensions into a general framework to overcome the weakness of traditional distance or similarity measures. We study the properties of the proposed class of scoring functions and develop efficient and scalable index structures that index the isolines of the function. We demonstrate various scenarios where the query finds application. Empirical evaluation demonstrates a performance gain of one to two orders of magnitude on querying time over existing state-of-the-art top-k techniques. Further, a qualitative analysis is performed on a real dataset to highlight the potential of the proposed query in discovering hidden data characteristics. Comment: VLDB201

    A Survey of Techniques for Answering Top-K Queries

    Top-k queries retrieve the k highest-ranked records from a given set of records according to the value of a function F on their attributes. Many techniques have been proposed in the database literature for answering top-k queries. These fall into three main categories: sorted-list based, layer based and view based. In the first category, records are sorted along each dimension and each record is then assigned a rank using a parallel scanning method. The Threshold Algorithm (TA) and Fagin's Algorithm (FA) are examples of the sorted-list based category. In the second, layer-based category, all records are organized into layers, as in the onion technique and the robust indexing technique. The third category includes methods such as PREFER and LPTA (Linear Programming Adaptation of Threshold Algorithm), whose processing is based on materialized views.
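
    The sorted-list category can be illustrated with a minimal in-memory sketch of the Threshold Algorithm. It is not taken from the survey: the function name `threshold_algorithm`, the dictionary layout of `records`, and the default scoring function (a plain sum, which is monotone) are assumptions made for illustration, and higher attribute values are assumed to be better.

```python
import heapq

def threshold_algorithm(records, k, score=sum):
    """Top-k by a monotone `score` over records: dict id -> tuple of attributes.

    Hypothetical helper for illustration; assumes higher values are better.
    """
    if not records or k <= 0:
        return []
    m = len(next(iter(records.values())))
    # One list of ids per attribute, sorted by that attribute, best first.
    sorted_lists = [
        sorted(records, key=lambda rid, d=d: records[rid][d], reverse=True)
        for d in range(m)
    ]
    top = []       # min-heap of (score, id) holding at most k entries
    seen = set()
    for depth in range(len(records)):
        last_seen = []
        for d in range(m):
            # Sorted access: next id in list d.
            rid = sorted_lists[d][depth]
            last_seen.append(records[rid][d])
            if rid not in seen:
                seen.add(rid)
                # Random access: fetch all attributes and compute the score.
                s = score(records[rid])
                if len(top) < k:
                    heapq.heappush(top, (s, rid))
                elif s > top[0][0]:
                    heapq.heapreplace(top, (s, rid))
        # No unseen record can score above the threshold.
        threshold = score(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)

recs = {"a": (9, 8), "b": (5, 10), "c": (7, 1), "d": (2, 3)}
print(threshold_algorithm(recs, 2))
```

    On this toy input the scan stops at the second position of each list, because the two best scores found so far already reach the threshold computed from the last values read.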

    Branch-and-Bound Ranked Search by Minimizing Parabolic Polynomials

    The Branch-and-Bound Ranked Search algorithm (BRS) is an efficient method for answering top-k queries based on R-trees using multivariate scoring functions. To make BRS effective with ascending rankings, the algorithm must be able to identify lower bounds of the scoring functions for exploring search partitions. This paper presents BRS supporting parabolic polynomials. These functions are commonly used to minimize combined scores over different attributes and cover a variety of applications. To the best of our knowledge, the problem of developing an algorithm for computing lower bounds for the BRS method has not yet been well addressed.

    Quasi-Convex Scoring Functions in Branch-and-Bound Ranked Search

    For answering top-k queries in which attributes are aggregated to a scalar value for defining a ranking, the well-known branch-and-bound principle can usually be used for efficient query answering. Standard algorithms (e.g., Branch-and-Bound Ranked Search, BRS for short) require scoring functions to be monotone, such that a top-k ranking can be computed in sublinear time in the average case. If monotonicity cannot be guaranteed, efficient query answering algorithms are not known. To make branch-and-bound effective with descending or ascending rankings (maximum top-k or minimum top-k queries, respectively), BRS must be able to identify bounds for exploring search partitions, which is trivial only for monotonic ranking functions. In this paper, we investigate the class of quasi-convex functions used for scoring objects, and we examine how bounds for exploring data partitions can correctly and efficiently be computed for quasi-convex functions in BRS for maximum top-k queries. Given that quasi-convex scoring functions can usefully be employed for ranking objects in a variety of applications, the mathematical findings presented in this paper are indeed significant for practical top-k query answering.
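
    As a rough illustration of why quasi-convexity helps with bounds for maximum top-k queries: a quasi-convex function attains its maximum over an axis-aligned box at one of the box's corners, so an upper bound for a rectangular partition can be obtained by evaluating the corners. The sketch below relies only on that general property and is not necessarily the bound computation used in the paper; the name `upper_bound_on_box` and the example function are assumptions.

```python
from itertools import product

def upper_bound_on_box(f, lows, highs):
    """Maximum of a quasi-convex f over the box [lows, highs].

    Hypothetical helper: evaluates f at all 2^d corners, which is exact
    for quasi-convex f but exponential in the number of dimensions d.
    """
    corners = product(*zip(lows, highs))
    return max(f(corner) for corner in corners)

# Example scoring function (convex, hence quasi-convex); chosen for illustration.
def f(p):
    return (p[0] - 1) ** 2 + abs(p[1])

print(upper_bound_on_box(f, (0, -1), (3, 2)))
```

    A branch-and-bound search could use such a value to decide whether a partition can still contain a record that beats the current k-th best score.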

    Efficient Evaluation of Multiple Preference Queries

    Research Center, School of Information Systems, Singapore Management University

    Efficient All Top-k Computation - A Unified Solution for All Top-k, Reverse Top-k and Top-m Influential Queries

    Optimization of multi-domain queries on the Web

    Where can I attend an interesting database workshop close to a sunny beach? Who are the strongest experts on service computing based upon their recent publication record and accepted European projects? Can I spend an April weekend in a city served by a low-cost direct flight from Milano offering a Mahler's symphony? We regard the above queries as multi-domain queries, i.e., queries that can be answered by combining knowledge from two or more domains (such as: seaside locations, flights, publications, accepted projects, conference offerings, and so on). This information is available on the Web, but no general-purpose software system can accept the above queries nor compute the answer. At the most, dedicated systems support specific multi-domain compositions (e.g., Google-local locates information such as restaurants and hotels upon geographic maps). This paper presents an overall framework for multi-domain queries on the Web. We address the following problems: (a) expressing multi-domain queries with an abstract formalism, (b) separating the treatment of "search" services within the model, by highlighting their differences from "exact" Web services, (c) explaining how the same query can be mapped to multiple "query plans", i.e., a well-defined scheduling of service invocations, possibly in parallel, which complies with their access limitations and preserves the ranking order in which search services return results; (d) introducing cross-domain joins as first-class operation within plans; (e) evaluating the query plans against several cost metrics so as to choose the most promising one for execution. This framework adapts to a variety of application contexts, ranging from end-user-oriented mash-up scenarios up to complex application integration scenarios.

    Preference Queries in Large Multi-Cost Transportation Networks

    Research on spatial network databases has so far considered that there is a single cost value associated with each road segment of the network. In most real-world situations, however, there may exist multiple cost types involved in transportation decision making. For example, the different costs of a road segment could be its Euclidean length, the driving time, the walking time, possible toll fee, etc. The relative significance of these cost types may vary from user to user. In this paper we consider such multi-cost transportation networks (MCN), where each edge (road segment) is associated with multiple cost values. We formulate skyline and top-k queries in MCNs and design algorithms for their efficient processing. Our solutions have two important properties in preference-based querying; the skyline methods are progressive and the top-k ones are incremental. The performance of our techniques is evaluated with experiments on a real road network

    A Fair Assignment Algorithm for Multiple Preference Queries

    Consider an internship assignment system, where at the end of each academic year, interested university students search and apply for available positions, based on their preferences (e.g., nature of the job, salary, office location, etc.). In a variety of facility, task or position assignment contexts, users have personal preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple simultaneous requests, a single object cannot be assigned to more than one user. The challenge is to compute a fair 1-1 matching between the queries and the objects. We model this as a stable-marriage problem and propose an efficient method for its processing. Our algorithm iteratively finds stable query-object pairs and removes them from the problem. At its core lies a novel skyline maintenance technique, which we prove to be I/O optimal. We conduct an extensive experimental evaluation using real and synthetic data, which demonstrates that our approach outperforms adaptations of previous methods by several orders of magnitude.
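
    The iterative idea of finding a stable query-object pair and removing it can be sketched with a brute-force loop. This is a simplified illustration under stated assumptions, not the paper's algorithm: it assumes each query is a weight vector, each object an attribute vector, and the aggregate score a weighted sum, and it omits the skyline maintenance technique entirely. The name `greedy_stable_assignment` is hypothetical.

```python
def greedy_stable_assignment(queries, objects):
    """1-1 matching of queries to objects by repeatedly taking the best pair.

    queries: dict id -> weight tuple; objects: dict id -> attribute tuple.
    Hypothetical brute-force sketch; scans all remaining pairs each round.
    """
    def score(weights, attrs):
        return sum(w * x for w, x in zip(weights, attrs))

    free_q, free_o = dict(queries), dict(objects)
    matching = {}
    while free_q and free_o:
        # The highest-scoring remaining pair is stable: neither side can
        # find a better partner among the queries and objects still left.
        q, o = max(
            ((q, o) for q in free_q for o in free_o),
            key=lambda pair: score(free_q[pair[0]], free_o[pair[1]]),
        )
        matching[q] = o
        del free_q[q]
        del free_o[o]
    return matching

queries = {"q1": (2, 1), "q2": (1, 1)}
objects = {"o1": (5, 5), "o2": (4, 1)}
print(greedy_stable_assignment(queries, objects))
```

    Both queries rank o1 first here, but q1 scores it higher, so q1 receives o1 and q2 is assigned o2.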