
    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow both of these thresholds to be applied simultaneously, that is, selecting the products and shops that satisfy both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages.
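
    The interaction between dimensions described above can be illustrated with a small sketch. It assumes a two-dimensional cube of (product, shop, amount) facts with non-negative amounts and SUM as the aggregate, and it repeatedly prunes until both thresholds hold together. The function name `diamond` and the sample data are hypothetical; this naive loop is not one of the algorithms compared in the paper.

```python
from collections import defaultdict

def diamond(facts, min_product_total, min_shop_total):
    """Keep only facts whose product AND shop both meet their totals.

    facts: iterable of (product, shop, amount), amounts assumed non-negative.
    Removing a shop lowers product totals (and vice versa), so the pruning
    is repeated until nothing more is removed.
    """
    facts = list(facts)
    while True:
        product_total = defaultdict(float)
        shop_total = defaultdict(float)
        for product, shop, amount in facts:
            product_total[product] += amount
            shop_total[shop] += amount
        kept = [
            (p, s, a)
            for p, s, a in facts
            if product_total[p] >= min_product_total
            and shop_total[s] >= min_shop_total
        ]
        if len(kept) == len(facts):  # fixed point reached
            return kept
        facts = kept

# Hypothetical sample data: pruning one dimension invalidates the other.
sample_facts = [
    ("tv", "A", 90),
    ("tv", "B", 30),
    ("radio", "A", 20),
    ("radio", "C", 200),
    ("phone", "B", 10),
]
result = diamond(sample_facts, min_product_total=100, min_shop_total=100)
print(result)
```

    In this sample, "tv" initially passes its threshold (120), but once shop "B" is pruned its total drops to 90, which in turn pulls shop "A" below its threshold. Only the fact ("radio", "C", 200) survives, which is why a single pass over each dimension is not enough.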

    EFFICIENT SKYLINE SYSTEM DEVELOPMENT FOR NORMAL AND HIDDEN DATABASES: APPLICATION FOR GOOGLE FLIGHTS

    Deep web databases provide a strict search interface and limited web access, returning top-k results based on a pre-defined ranking function. However, top-k results may not be suitable for multi-criteria decision making because preferences vary. To make the results more relevant to such a decision maker, skyline records were introduced: by definition, these are the records not dominated by any other record, where a record dominates another if it is at least as good in all attributes and better in at least one attribute. In this report, we introduce an algorithm for discovering skyline records from hidden databases using different multi-objective attributes on a real-world database. We predicted a new lower bound on the minimum number of queries that must be issued to extract the skyline. This was supported by our algorithm, which accomplished the task efficiently even in the worst-case scenario, hence proving our theory through rigorous experiments on a hidden database, given the limitations at hand. This contribution was made possible by NPRP grant #07-794-1-145 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
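
    The dominance definition above can be made concrete with a minimal sketch. It assumes full access to all records and that lower values are better in every attribute; the flight tuples (price, duration in hours, stops) are invented for illustration. This is the plain quadratic baseline, not the hidden-database algorithm of the report, which only sees top-k answers.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere.

    Assumption: lower is better for every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )

def skyline(records):
    """Return the records that no other record dominates (O(n^2) scan)."""
    return [r for r in records if not any(dominates(o, r) for o in records)]

# Hypothetical flights: (price, duration_hours, stops)
flights = [
    (300, 10, 1),
    (250, 12, 1),
    (400, 8, 0),
    (320, 11, 2),  # dominated by (300, 10, 1)
    (250, 12, 2),  # dominated by (250, 12, 1)
]
flight_skyline = skyline(flights)
print(flight_skyline)
```

    Each of the three surviving flights is the best choice under some trade-off between price, duration and stops, which is what makes the skyline useful when preferences are not known in advance.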

    The right expert at the right time and place: From expertise identification to expertise selection

    We propose a unified and complete solution for expert finding in organizations, including not only expertise identification, but also expertise selection functionality. The latter includes the use of implicit and explicit preferences of users on meeting each other, as well as localization and planning as important auxiliary processes. We also propose a solution for privacy protection, which is urgently required in view of the huge amount of privacy-sensitive data involved. Various parts are elaborated elsewhere, and we look forward to the realization and use of the proposed system as a whole.

    Integration of Skyline Queries into Spark SQL

    Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax to input skyline queries. Moreover, our empirical results show that this integrated solution for skyline queries by far outperforms a solution based on rewriting into standard SQL.
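
    To give a sense of the rewriting the abstract calls tedious, the sketch below expresses a two-attribute skyline in plain SQL with a `NOT EXISTS` subquery. It runs on SQLite through Python's standard `sqlite3` module rather than on Spark, and the `hotels` table, its columns, and its rows are hypothetical. It does not show the authors' integrated syntax, which the abstract does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, price REAL, distance REAL)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [
        ("A", 50, 3.0),
        ("B", 80, 1.0),
        ("C", 90, 2.5),  # dominated by B
        ("D", 60, 3.5),  # dominated by A
        ("E", 40, 5.0),
    ],
)

# Skyline minimising both price and distance: keep a row only if no other
# row is at least as good on both attributes and strictly better on one.
query = """
SELECT h.name
FROM hotels AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM hotels AS o
    WHERE o.price <= h.price
      AND o.distance <= h.distance
      AND (o.price < h.price OR o.distance < h.distance)
)
ORDER BY h.name
"""
skyline_names = [row[0] for row in conn.execute(query)]
print(skyline_names)
conn.close()
```

    The dominance condition has to be written out by hand, and it grows by two comparisons for every extra skyline dimension, which is the verbosity a dedicated skyline operator is meant to remove.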

    Maintaining sliding window skylines on data streams


    Mining and Managing User-Generated Content and Preferences

    In this thesis, we present techniques to manage the results of expressive queries, such as skyline queries, and to mine online content that has been generated by users. Given the numerous scenarios and applications where content mining can be applied, we focus, in particular, on two cases: review mining and social media analysis. More specifically, we focus on preference queries, where users can query a set of items, each associated with an attribute set. For each of the attributes, users can specify their preference on whether to minimize or maximize it, e.g., "minimize price", "maximize performance", etc. Such queries are also known as "Pareto optimal" or "skyline" queries. A drawback of this query type is that the result may become too large for the user to inspect manually. We propose an approach that addresses this issue by selecting a set of diverse skyline results. We provide a formal definition of skyline diversification and present efficient techniques to return such a set of points. The result can then be ranked according to established quality criteria. We also propose an alternative scheme for ranking skyline results, following an information retrieval approach.

    Melody retrieval on the Web

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. Includes bibliographical references (p. 87-90). The emergence of digital music on the Internet requires new information retrieval methods adapted to its specific characteristics and needs. While music retrieval based on text information, such as title, composers, or subject classification, has been implemented in many existing systems, retrieval of a piece of music based on musical content, especially an incomplete, imperfect recall of a fragment of the music, has not yet been fully explored. This thesis will explore both theoretical and practical issues involved in a web-based melody retrieval system. I built a query-by-humming system, which can find a piece of music in a digital music repository based on a few hummed notes. Since an input query (a hummed melody) may contain various errors due to the uncertainty of the user's memory or the user's singing ability, the system should be able to tolerate errors. Furthermore, extracting melodies to build a melody database is also a complicated task. Therefore, melody representation, query construction, melody matching and melody extraction are critical for an efficient and robust query-by-humming system, and these are the main tasks addressed in the thesis. Compared to previous systems, a new and more effective melody representation and corresponding matching methods, which combine both pitch and rhythmic information, were adopted; a whole set of tools and deliverable software was implemented; and experiments were conducted to evaluate the system performance as well as to explore other melody perception issues. Experimental results demonstrate that our methods, which incorporate rhythmic information, unlike previous pitch-only methods, did help improve the effectiveness of a query-by-humming system. by Wei Chai. S.M.