Diamond Dicing

Author: Antony
Bouman
Börzsönyi
Cerf
Daniel Lemire
Donjerkovic
Engene
Fang
Frank
Godin
Hahn
Hazel Webb
Kaser
Knorr
Kondo
Korn
Kumar
Lemire
Ley
Mazón
MonetDB BV
Netflix Inc.
Ng
O'Neil
Owen Kaser
Porter
Rizzi
Sarawagi
Tang
Transaction Processing Performance Council
Turney
Webb
Webb
Wille
Ślezak
Publication venue: 'Elsevier BV'
Publication date: 01/09/2013

In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow both of these thresholds to be applied simultaneously, that is, selecting the products and shops that satisfy both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages.
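
The interaction between dimensions mentioned in this abstract can be illustrated with a small sketch. It assumes the simplest possible approach: recompute the per-product and per-shop totals and drop failing facts until nothing changes. The function name `diamond`, the tuple layout, and the sample data are hypothetical, and this is not presented as any of the algorithms the paper compares.

```python
from collections import defaultdict

def diamond(facts, min_product_total, min_shop_total):
    """Keep only facts whose product AND shop both meet their thresholds.

    facts: list of (product, shop, revenue) tuples (hypothetical layout).
    Removing a shop lowers product totals (and vice versa), so the
    pruning is repeated until a pass removes nothing.
    """
    current = list(facts)
    while True:
        product_total = defaultdict(int)
        shop_total = defaultdict(int)
        for product, shop, revenue in current:
            product_total[product] += revenue
            shop_total[shop] += revenue
        kept = [
            (p, s, r)
            for p, s, r in current
            if product_total[p] >= min_product_total
            and shop_total[s] >= min_shop_total
        ]
        if len(kept) == len(current):  # fixed point reached
            return kept
        current = kept

facts = [
    ("tv", "north", 90_000),
    ("tv", "south", 20_000),
    ("radio", "north", 5_000),
    ("radio", "south", 300_000),
    ("phone", "south", 150_000),
    ("phone", "east", 60_000),
]
result = diamond(facts, 100_000, 400_000)
print(result)
```

In this made-up data, "tv" passes the product threshold at first (110 000 in total) but drops out once the "north" shop is pruned, which is why a single filtering pass is not enough.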

arXiv.org e-Print Archive

R-libre

Crossref

EFFICIENT SKYLINE SYSTEM DEVELOPMENT FOR NORMAL AND HIDDEN DATABASES: APPLICATION FOR GOOGLE FLIGHTS

Author: ADAM GEORGES J.
Publication date: 01/01/2018

Deep web databases provide a strict search interface and limited web access, returning top-k results based on a pre-defined ranking function. However, top-k results may not be suitable for multi-criteria decision making because preferences vary. To make the results more relevant to such a decision maker, skyline records were introduced. By definition, a skyline record is not dominated by any other record, where a record dominates another if it is as good or better in all attributes and better in at least one attribute. In this report, we introduce an algorithm for discovering skyline records from hidden databases using different multi-objective attributes on a real-world database. We predicted a new lower bound for the minimum number of queries that must be issued to extract the skyline. This was supported by our algorithm, which accomplished the above task efficiently, including in the worst-case scenario, hence proving our theory through rigorous experiments on a hidden database given the limitations at hand. This contribution was made possible by NPRP grant #07- 794-1-145 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
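
The dominance definition in this abstract can be written down directly. The sketch below assumes that smaller values are better in every attribute and that all records are available in memory; it is a naive quadratic scan, not the hidden-database algorithm of the report. The names `dominates` and `skyline` and the flight data are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better
    somewhere. Assumption: smaller is better in every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(records):
    """Return the records that no other record dominates (naive O(n^2) scan)."""
    return [r for r in records if not any(dominates(o, r) for o in records)]

# Hypothetical flights as (price, number_of_stops)
flights = [(300, 2), (250, 3), (400, 1), (350, 2), (250, 4)]
print(skyline(flights))
```

Here (350, 2) is dominated by (300, 2) and (250, 4) by (250, 3), so the remaining three flights form the skyline.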

Qatar University Institutional Repository

The right expert at the right time and place: From expertise identification to expertise selection

Author: Apers Peter
Bunningen Arthur van
Evers Sander
Feng Ling
Fokkinga Maarten
Heerde Harold van
Hiemstra Djoerd
Serdyukov Pavel
Publication venue: Springer Verlag
Publication date: 01/01/2008

We propose a unified and complete solution for expert finding in organizations, including not only expertise identification, but also expertise selection functionality. The latter two include the use of implicit and explicit preferences of users on meeting each other, as well as localization and planning as important auxiliary processes. We also propose a solution for privacy protection, which is urgently required in view of the huge amount of privacy-sensitive data involved. Various parts are elaborated elsewhere, and we look forward to the realization and use of the proposed system as a whole.

Radboud Repository

University of Twente Research Information

Complex Query Operators on Modern Parallel Architectures

Author: Zois Vasileios
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection is tasked with retrieving the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples with attributes offering (Pareto) optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations by utilizing elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation involves supporting processing strategies which are geared towards early termination and incomparable tuple pruning. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although the act of migrating query processing in-memory has created many opportunities to improve the associated query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study in depth the Top-K and Skyline selection operators in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the specific details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data on variable query parameters and distributions for both of the aforementioned problems. The experimental results demonstrate several orders of magnitude better throughput and query latency, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
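
As a point of reference for the Top-K definition in this abstract, a minimal sequential baseline can be sketched with a bounded heap. It scores every tuple, so it has none of the early termination or parallelism the thesis is about. The helper `top_k`, the weighted-sum aggregation function, and the sample rows are assumptions made for illustration.

```python
import heapq

def top_k(relation, k, score):
    """Return the k tuples with the highest score under a user-defined
    aggregation function. Baseline only: every tuple is evaluated."""
    return heapq.nlargest(k, relation, key=score)

def weighted_sum(t):
    # Hypothetical aggregation function: equal weights on two attributes.
    return 0.5 * t[0] + 0.5 * t[1]

rows = [(0.9, 0.1), (0.4, 0.8), (0.7, 0.7), (0.2, 0.3)]
best = top_k(rows, 2, weighted_sum)
print(best)
```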

eScholarship - University of California

Integration of Skyline Queries into Spark SQL

Author: Grasmann Lukas
Pichler Reinhard
Selzer Alexander
Publication date: 07/10/2022

Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax for expressing skyline queries. Moreover, our empirical results show that this integrated solution for skyline queries by far outperforms a solution based on rewriting into standard SQL.
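
To give a sense of the rewriting this abstract refers to, the sketch below expresses a two-attribute skyline as a plain `NOT EXISTS` subquery. It runs on SQLite through Python's standard `sqlite3` module rather than on Spark, and the `hotels` table, its columns, and the data are hypothetical. It does not reproduce the syntax the paper adds to Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, price INTEGER, distance INTEGER)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [("A", 100, 5), ("B", 80, 9), ("C", 120, 2), ("D", 110, 6), ("E", 80, 10)],
)

# A hotel is in the skyline if no other hotel is at least as cheap and
# at least as close, and strictly better in one of the two attributes.
query = """
SELECT h.name
FROM hotels AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM hotels AS o
    WHERE o.price <= h.price
      AND o.distance <= h.distance
      AND (o.price < h.price OR o.distance < h.distance)
)
ORDER BY h.name
"""
skyline_names = [row[0] for row in conn.execute(query)]
print(skyline_names)
```

Each extra skyline dimension adds another term to both the conjunction and the disjunction of the subquery.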

arXiv.org e-Print Archive

Maintaining sliding window skylines on data streams

Author: Dimitris Papadias
Yufei Tao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Mining and Managing User-Generated Content and Preferences

Author: Valkanas Georgios
Publication date: 01/08/2014

In this thesis, we present techniques to manage the results of expressive queries, such as skyline, and mine online content that has been generated by users. Given the numerous scenarios and applications where content mining can be applied, we focus, in particular, on two cases: review mining and social media analysis. More specifically, we focus on preference queries, where users can query a set of items, each associated with an attribute set. For each of the attributes, users can specify their preference on whether to minimize or maximize it, e.g., "minimize price", "maximize performance", etc. Such queries are also known as "Pareto optimal" or "skyline" queries. A drawback of this query type is that the result may become too large for the user to inspect manually. We propose an approach that addresses this issue by selecting a set of diverse skyline results. We provide a formal definition of skyline diversification and present efficient techniques to return such a set of points. The result can then be ranked according to established quality criteria. We also propose an alternative scheme for ranking skyline results, following an information retrieval approach.

Digital Repository of Hellenic Managing Authority of the Operational Programme "Education and Lifelong Learning" (EDULLL)

Melody retrieval on the Web

Author: Chai Wei, 1972-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2001

Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. Includes bibliographical references (p. 87-90). The emergence of digital music on the Internet requires new information retrieval methods adapted to specific characteristics and needs. While music retrieval based on text information, such as title, composers, or subject classification, has been implemented in many existing systems, retrieval of a piece of music based on musical content, especially an incomplete, imperfect recall of a fragment of the music, has not yet been fully explored. This thesis will explore both theoretical and practical issues involved in a web-based melody retrieval system. I built a query-by-humming system, which can find a piece of music in the digital music repository based on a few hummed notes. Since an input query (hummed melody) may have various errors due to uncertainty of the user's memory or the user's singing ability, the system should be able to tolerate errors. Furthermore, extracting melodies to build a melody database is also a complicated task. Therefore, melody representation, query construction, melody matching and melody extraction are critical for an efficient and robust query-by-humming system. Thus, these are the main tasks to be addressed in the thesis. Compared to previous systems, a new and more effective melody representation and corresponding matching methods, which combine both pitch and rhythmic information, were adopted; a whole set of tools and deliverable software was implemented; and experiments were conducted to evaluate the system performance as well as to explore other melody perception issues. Experimental results demonstrate that our methods, which incorporate rhythmic information, improved the effectiveness of a query-by-humming system compared with previous pitch-only methods.

DSpace@MIT