Efficient Computation of Subspace Skyline over Categorical Domains
Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed
the way we search for accommodation, restaurants, etc. The underlying datasets
in such applications have numerous attributes that are mostly Boolean or
Categorical. Discovering the skyline of such datasets over a subset of
attributes would identify entries that stand out while enabling numerous
applications. Only a few algorithms are designed to compute the skyline
over categorical attributes, and they are applicable only when the number of
attributes is small.
In this paper, we place the problem of skyline discovery over categorical
attributes into perspective and design efficient algorithms for two cases. (i)
In the absence of indices, we propose two algorithms, ST-S and ST-P, that
exploit the categorical characteristics of the datasets, organizing tuples in
a tree data structure, supporting efficient dominance tests over the candidate
set. (ii) We then consider the existence of widely used precomputed sorted
lists. After discussing several approaches, and studying their limitations, we
propose TA-SKY, a novel threshold style algorithm that utilizes sorted lists.
We further optimize TA-SKY and explore its progressive nature, making
it suitable for applications with strict interactive requirements. In addition
to the extensive theoretical analysis of the proposed algorithms, we conduct a
comprehensive experimental evaluation on a combination of real (including the
entire AirBnB data collection) and synthetic datasets to study the practicality
of the proposed algorithms. The results showcase the superior performance of
our techniques, outperforming applicable approaches by orders of magnitude.
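To make the notion of a subspace skyline concrete, here is a minimal sketch of a dominance test and a naive skyline computation over a chosen subset of Boolean attributes. It is a plain nested-loop baseline, not ST-S, ST-P, or TA-SKY. The names `dominates`, `subspace_skyline`, and the `listings` data are hypothetical, and the sketch assumes higher values are preferred on every attribute.

```python
from typing import Dict, List, Sequence

Row = Dict[str, int]


def dominates(a: Row, b: Row, attrs: Sequence[str]) -> bool:
    """True if a is at least as good as b on every attribute in attrs
    and strictly better on at least one (assumption: higher is better)."""
    return all(a[x] >= b[x] for x in attrs) and any(a[x] > b[x] for x in attrs)


def subspace_skyline(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Naive skyline over the attribute subset attrs.

    Keeps a window of non-dominated candidates; every new tuple is
    compared against the window, so the cost is quadratic in the worst case.
    """
    skyline: List[Row] = []
    for cand in rows:
        # Skip the candidate if a current skyline tuple dominates it.
        if any(dominates(s, cand, attrs) for s in skyline):
            continue
        # Otherwise evict the tuples it dominates, then keep it.
        skyline = [s for s in skyline if not dominates(cand, s, attrs)]
        skyline.append(cand)
    return skyline


# Hypothetical listings with Boolean amenities (1 = present).
listings = [
    {"id": 1, "wifi": 1, "kitchen": 1, "parking": 0},
    {"id": 2, "wifi": 1, "kitchen": 0, "parking": 0},
    {"id": 3, "wifi": 0, "kitchen": 0, "parking": 1},
    {"id": 4, "wifi": 1, "kitchen": 1, "parking": 0},
]

print([r["id"] for r in subspace_skyline(listings, ["wifi", "kitchen"])])
# prints [1, 4]
print([r["id"] for r in subspace_skyline(listings, ["wifi", "kitchen", "parking"])])
# prints [1, 3, 4]
```

Listings 1 and 4 have identical values, so neither dominates the other and both remain in the skyline. Ties of this kind are common in categorical data, which is one reason dominance tests over the candidate set are worth organizing carefully.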
Integrating OLAP and Ranking: The Ranking-Cube Methodology
Recent years have witnessed an enormous growth of data in business, industry, and Web applications. Database search often returns a large collection of results, which poses challenges to both efficient query processing and effective digestion of the query results. To address this problem, ranked search has been introduced to database systems. We study the problem of On-Line Analytical Processing (OLAP) of ranked queries, where ranked queries are conducted over an arbitrary subset of the data defined by multi-dimensional selections. While pre-computation and multi-dimensional aggregation are the standard solution for OLAP, materializing dynamic ranking results is unrealistic because the ranking criteria are not known until query time. To overcome this difficulty, in this thesis we develop a new ranking-cube method that performs semi-online materialization and semi-online computation. Its complete life cycle, including cube construction, incremental maintenance, and query processing, is also discussed. We further extend the ranking cube in three directions: first, answering queries over high-dimensional data; second, answering queries that involve joins over multiple relations; third, answering general preference queries, such as skyline queries, in addition to ranked queries. Our performance studies show that the ranking cube is orders of magnitude faster than previous approaches.
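The query type studied here, a top-k ranking restricted to a multi-dimensional selection with the ranking function supplied only at query time, can be sketched as follows. This is a naive scan-and-rank baseline, not the ranking-cube method, which the abstract does not describe in detail. The function `top_k`, the `homes` data, and the scoring function are hypothetical, and the sketch assumes lower scores rank first.

```python
import heapq
from typing import Callable, Dict, List

Row = Dict[str, object]


def top_k(rows: List[Row],
          selection: Dict[str, object],
          score: Callable[[Row], float],
          k: int) -> List[Row]:
    """Return the k best rows among those matching the selection.

    selection maps dimension names to required values; score is the
    ranking function, known only at query time (lower is better here).
    """
    matching = (r for r in rows
                if all(r[dim] == val for dim, val in selection.items()))
    return heapq.nsmallest(k, matching, key=score)


# Hypothetical relation: two selection dimensions, two ranking measures.
homes = [
    {"id": 1, "city": "A", "type": "condo", "price": 300, "distance": 5},
    {"id": 2, "city": "A", "type": "condo", "price": 250, "distance": 9},
    {"id": 3, "city": "A", "type": "house", "price": 200, "distance": 2},
    {"id": 4, "city": "B", "type": "condo", "price": 100, "distance": 1},
    {"id": 5, "city": "A", "type": "condo", "price": 280, "distance": 3},
]

# Ranking criterion chosen at query time: price plus a distance penalty.
best = top_k(homes,
             selection={"city": "A", "type": "condo"},
             score=lambda r: r["price"] + 10 * r["distance"],
             k=2)
print([r["id"] for r in best])
# prints [5, 2]
```

Because the scoring function arrives with the query, this baseline must scan and score every matching row each time. That per-query cost is what a precomputed structure would aim to reduce.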