128 research outputs found

    Skyline/Preference query processing

    Ph.D. (Doctor of Philosophy)

    Efficient indexing for skyline queries with partially ordered domains

    Master's (Master of Science)

    Supporting Multi-Criteria Decision Support Queries over Disparate Data Sources

    In the era of the big data revolution, marked by exponential growth of information, extracting value from data enables analysts and businesses to address challenging problems such as drug discovery, fraud detection, and earthquake prediction. Multi-Criteria Decision Support (MCDS) queries are at the core of big-data analytics and come in several classes, such as OLAP, Top-K, Pareto-optimal, and nearest-neighbor queries. The intuitive nature of specifying multi-dimensional preferences has made Pareto-optimal queries, also known as skyline queries, popular. Existing skyline algorithms, however, do not address several crucial issues, such as performing skyline evaluation over disparate sources, progressively generating skyline results, or robustly handling workloads with multiple skyline-over-join queries. In this dissertation, we thoroughly investigate topics in the area of skyline-aware query evaluation. We first propose a novel execution framework called SKIN that treats skyline-over-join queries as first-class citizens during query processing. This is in contrast to existing techniques, which treat skylines as an add-on that is loosely integrated with query processing by being placed on top of the query plan. SKIN effectively exploits the skyline characteristics of the tuples within individual data sources as well as across disparate sources. This enables SKIN to significantly reduce two primary costs, namely the cost of generating the join results and the cost of the skyline comparisons needed to compute the final results. Second, we address the crucial business need to report results early, as soon as they are generated, so that users can formulate competitive decisions in near real time. On top of SKIN, we built a progressive query evaluation framework, ProgXe, which makes the execution of skyline-over-join queries non-blocking, i.e., it generates results progressively, early and often.
By exploiting SKIN's principle of processing queries at multiple levels of abstraction, ProgXe is able to: (1) extract the dependencies in the output space by analyzing both the input and output spaces, and (2) exploit this knowledge of abstract-level relationships to guarantee the correctness of early output. Third, real-world applications handle query workloads with diverse Quality of Service (QoS) requirements, also referred to as contracts. Time-sensitive queries, such as fraud detection, require results to be output progressively with minimal delay, while ad-hoc and reporting queries can tolerate delay. Building on the principles of ProgXe, we propose the Contract-Aware Query Execution (CAQE) framework to address the open problem of contract-driven multi-query processing. CAQE employs an adaptive execution strategy that continuously monitors the run-time satisfaction of queries and aggressively takes corrective steps whenever the contracts are not being met. Lastly, to demonstrate the portability of the core principle of this dissertation, namely reasoning and query processing at different levels of data abstraction, we apply it to an orthogonal research question: auto-generating recommendation queries that help users explore a complex database system. User queries are often too strict or too broad, requiring a frustrating trial-and-error refinement process to reach the desired result cardinality while preserving the original query semantics. Based on the principles of SKIN, we propose CAPRI to automatically generate refined queries that: (1) attain the desired cardinality and (2) minimize changes to the original query intentions. In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in both efficiency and resource consumption.

    Skyline queries in dynamic environments

    Ph.D. (Doctor of Philosophy)

    Skyline Query Processing

    This thesis deals with a special subset of a multi-dimensional set of points, called the skyline. These points are the maxima or minima of the complete set and are of special interest in the field of decision support. Starting from basic algorithms for computing the skyline, we develop ideas and algorithms for "on-the-fly" (online) computation of the skyline. We also extend the skyline concept to new application domains, which leads to user profiling with the help of the skyline.
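The skyline notion described in this abstract can be illustrated with a minimal sketch. It assumes that smaller values are better in every dimension and uses a naive pairwise comparison; the function names `dominates` and `skyline` and the hotel data are illustrative and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels described by (price, distance to the beach).
hotels = [(50, 8), (60, 3), (80, 1), (90, 2), (55, 9)]
print(skyline(hotels))
```

Here `(90, 2)` is dominated by `(80, 1)` and `(55, 9)` by `(50, 8)`, so only the remaining three hotels are in the skyline. Practical algorithms avoid the quadratic comparison, but the dominance test is the same.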

    Efficient Algorithms for Similarity and Skyline Summary on Multidimensional Datasets.

    Efficient management of large multidimensional datasets has attracted much attention in the database research community. Such datasets are common, and efficient algorithms are needed to analyze them for a variety of applications. In this thesis, we focus our study on two very common classes of analysis: similarity and skyline summarization. We first focus on similarity when one of the dimensions in the multidimensional dataset is temporal. We then develop algorithms for evaluating skyline summaries effectively for both temporal datasets and datasets with low-cardinality attribute domains, and propose different methods for improving the effectiveness of the skyline summary operation. This thesis begins by studying similarity measures for time-series datasets and efficient algorithms for time-series similarity evaluation. The first contribution of this thesis is a new algorithm that can be used to evaluate similarity methods whose matching criterion is bounded by a specified threshold value. The second contribution of this thesis is the development of a new time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution. The third contribution of this thesis is a novel technique called the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains.
The utility of the skyline as a data summarization technique is often diminished by the volume of points in the skyline. The final contribution of this thesis is a novel scheme that remedies the skyline volume problem by ranking the elements of the skyline based on their importance to the skyline summary. Collectively, the techniques described in this thesis present efficient methods for two common and computationally intensive analysis operations on large multidimensional datasets.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57643/2/mmorse_1.pd
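The idea of a time-interval skyline over a stream can be sketched as follows. This is not the LookOut algorithm from the thesis; it is a naive recomputation under the assumptions that each point carries an arrival and an expiry time, that a point is live while `arrival <= now < expiry`, and that smaller values are better. The names `TimedPoint` and `skyline_at` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class TimedPoint(NamedTuple):
    values: Tuple[float, ...]
    arrival: float  # time at which the point enters the stream
    expiry: float   # time at which the point stops being valid


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_at(points: List[TimedPoint], now: float) -> List[TimedPoint]:
    """Skyline of the points that are live at time `now`, recomputed from scratch."""
    live = [p for p in points if p.arrival <= now < p.expiry]
    return [p for p in live
            if not any(dominates(q.values, p.values) for q in live)]


a = TimedPoint((1, 5), arrival=0, expiry=10)
b = TimedPoint((2, 6), arrival=0, expiry=20)
c = TimedPoint((3, 1), arrival=5, expiry=15)

print(skyline_at([a, b, c], now=1))   # b is dominated by a while a is live
print(skyline_at([a, b, c], now=12))  # a has expired, so b re-enters the skyline
```

The second call shows why a streaming operator cannot simply discard dominated points: `b` becomes a skyline point once the point that dominated it expires.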

    Representing and reasoning with qualitative preferences for compositional systems

    Many applications call for techniques for representing and reasoning about preferences, i.e., relative desirability over a set of alternatives. Preferences over the alternatives are typically derived from preferences with respect to the various attributes of the alternatives (e.g., a student's preference for one course over another may be influenced by their preference for the topic, the time of day when the course is offered, etc.). Such preferences are often qualitative and conditional. When the alternatives are expressed as tuples of valuations of the relevant attributes, preferences between alternatives can often be expressed in the form of (a) preferences over the values of each attribute, and (b) relative importance of certain attributes over others. An important problem in reasoning with multi-attribute qualitative preferences is dominance testing, i.e., determining whether one alternative (an assignment to all attributes) is preferred over another. This problem is hard (PSPACE-complete) in general for well-known qualitative conditional preference languages such as TCP-nets. We provide two practical approaches to dominance testing. First, we study a restricted unconditional preference language, and provide a dominance relation that can be computed in polynomial time by evaluating the satisfiability of an appropriately constructed logic formula. Second, we show how to reduce dominance testing for TCP-nets to reachability analysis in an induced preference graph. We provide an encoding of TCP-nets in the form of a Kripke structure for CTL. We show how to compute dominance using NuSMV, a model checker for CTL. We address the problem of identifying a preferred outcome in a setting where the outcomes or alternatives to be compared are composite in nature (i.e., collections of components that satisfy certain functional requirements).
We define a dominance relation that allows us to compare collections of objects in terms of preferences over attributes of the objects that make up the collection, and show that the dominance relation is a strict partial order under certain conditions. We provide algorithms that use this dominance relation to identify only (sound), all (complete), or at least one (weakly complete) of the most preferred collections. We establish some key properties of the dominance relation and analyze the quality of solutions produced by the algorithms. We present results of simulation experiments aimed at comparing the algorithms, and report interesting conjectures and results that were derived from our analysis. Finally, we show how the above formalism and algorithms can be used in preference-based service composition, substitution, and adaptation.
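The reduction of dominance testing to reachability mentioned in this abstract can be sketched with a plain breadth-first search. The sketch assumes the induced preference graph is already given explicitly as an adjacency mapping in which an edge from `o` to `o2` means that `o2` is directly preferred to `o`; it does not reproduce the Kripke-structure encoding or the NuSMV-based procedure of the thesis, and the function name and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def dominates_via_reachability(
    graph: Dict[Hashable, Iterable[Hashable]],
    worse: Hashable,
    better: Hashable,
) -> bool:
    """Return True if `better` is reachable from `worse` along one or more improving edges."""
    seen = {worse}
    queue = deque([worse])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == better:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy graph over two binary attributes; an upper-case letter is the preferred value.
improving = {
    "ab": ["Ab", "aB"],
    "Ab": ["AB"],
    "aB": ["AB"],
    "AB": [],
}
print(dominates_via_reachability(improving, "ab", "AB"))  # "AB" is preferred to "ab"
print(dominates_via_reachability(improving, "Ab", "aB"))  # these two are incomparable
```

An explicit graph has one node per outcome and therefore grows exponentially with the number of attributes, which is consistent with the hardness result stated in the abstract and is one reason to delegate the search to a model checker.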