89 research outputs found

    Supporting Multi-Criteria Decision Support Queries over Disparate Data Sources

    In the era of the big data revolution, marked by exponential growth of information, extracting value from data enables analysts and businesses to address challenging problems such as drug discovery, fraud detection, and earthquake prediction. Multi-Criteria Decision Support (MCDS) queries are at the core of big-data analytics and include several classes such as OLAP, Top-K, Pareto-optimal, and nearest-neighbor queries. The intuitive nature of specifying multi-dimensional preferences has made Pareto-optimal queries, also known as skyline queries, popular. Existing skyline algorithms, however, do not address several crucial issues such as performing skyline evaluation over disparate sources, progressively generating skyline results, or robustly handling workloads with multiple skyline-over-join queries. In this dissertation we thoroughly investigate topics in the area of skyline-aware query evaluation. First, we propose a novel execution framework called SKIN that treats skylines over joins as first-class citizens during query processing. This is in contrast to existing techniques that treat skylines as an add-on, loosely integrated with query processing by being placed on top of the query plan. SKIN is effective in exploiting the skyline characteristics of the tuples within individual data sources as well as across disparate sources. This enables SKIN to significantly reduce two primary costs, namely the cost of generating the join results and the cost of skyline comparisons to compute the final results. Second, we address the crucial business need to report results early, as soon as they are generated, so that users can formulate competitive decisions in near real-time. On top of SKIN, we built a progressive query evaluation framework, ProgXe, that makes the execution of queries involving skyline over joins non-blocking, i.e., it generates results progressively, early and often. By exploiting SKIN's principle of processing queries at multiple levels of abstraction, ProgXe is able to: (1) extract the dependencies in the output space by analyzing both the input and output spaces, and (2) exploit this knowledge of abstract-level relationships to guarantee correctness of early output. Third, real-world applications handle query workloads with diverse Quality of Service (QoS) requirements, also referred to as contracts. Time-sensitive queries, such as fraud detection, require results to be output progressively with minimal delay, while ad-hoc and reporting queries can tolerate delay. Building on the principles of ProgXe, we propose the Contract-Aware Query Execution (CAQE) framework to address the open problem of contract-driven multi-query processing. CAQE employs an adaptive execution strategy that continuously monitors the run-time satisfaction of queries and aggressively takes corrective steps whenever the contracts are not being met. Lastly, to demonstrate the portability of the core principle of this dissertation, namely reasoning and query processing at different levels of data abstraction, we apply it to an orthogonal research question: auto-generating recommendation queries that help users explore a complex database system. User queries are often too strict or too broad, requiring a frustrating trial-and-error refinement process to meet the desired result cardinality while preserving the original query semantics. Based on the principles of SKIN, we propose CAPRI to automatically generate refined queries that: (1) attain the desired cardinality and (2) minimize changes to the original query intentions. In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in both efficiency and resource consumption.

    Doctor of Philosophy

    We are living in an age where data are being generated faster than anyone has previously imagined across a broad range of application domains, including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities, on the order of terabytes or petabytes. Numerous challenges have emerged in dealing with massive data, including: (1) the explosion in the size of data; (2) data have increasingly complex structures and rich semantics, such as temporal data represented in piecewise linear form; (3) uncertain data are becoming a common occurrence in numerous applications, e.g., scientific measurements or observations such as meteorological measurements; and (4) data are becoming increasingly distributed, e.g., data collected and integrated from distributed locations as well as data stored in a distributed file system within a cluster. Due to the massive nature of modern data, it is oftentimes infeasible for computers to efficiently manage and query them exactly. An attractive alternative is to use data summarization techniques to construct data summaries, although even constructing data summaries efficiently is a challenging task given the enormous size of the data. The data summaries we focus on in this thesis are the histogram and the ranking operator. Both enable us to summarize a massive dataset into a more succinct representation, which can then be used to make queries orders of magnitude more efficient while still allowing approximation guarantees on query answers. Our study has focused on the critical task of designing efficient algorithms to summarize, query, and manage massive data.

    Mining and Managing User-Generated Content and Preferences

    In this thesis, we present techniques to manage the results of expressive queries, such as skyline queries, and to mine online content that has been generated by users. Given the numerous scenarios and applications where content mining can be applied, we focus in particular on two cases: review mining and social media analysis. More specifically, we focus on preference queries, where users can query a set of items, each associated with an attribute set. For each of the attributes, users can specify their preference on whether to minimize or maximize it, e.g., "minimize price", "maximize performance", etc. Such queries are also known as "Pareto optimal" or "skyline" queries. A drawback of this query type is that the result may become too large for the user to inspect manually. We propose an approach that addresses this issue by selecting a set of diverse skyline results. We provide a formal definition of skyline diversification and present efficient techniques to return such a set of points. The result can then be ranked according to established quality criteria. We also propose an alternative scheme for ranking skyline results, following an information retrieval approach.
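
The preference query described in this abstract can be illustrated with a minimal sketch. This is a naive quadratic skyline computation, not the diversification or ranking technique the thesis proposes; the function names, the `"min"`/`"max"` sense labels, and the laptop data are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point, senses: Sequence[str]) -> bool:
    """Return True if `a` is at least as good as `b` on every attribute
    and strictly better on at least one.

    `senses[i]` is "min" or "max" (hypothetical labels for this sketch),
    stating whether attribute i should be minimized or maximized.
    """
    strictly_better = False
    for x, y, sense in zip(a, b, senses):
        if sense == "max":
            # Negate so that every attribute can be treated as "smaller is better".
            x, y = -x, -y
        if x > y:
            return False  # `a` is worse on this attribute, so it cannot dominate
        if x < y:
            strictly_better = True
    return strictly_better


def skyline(points: List[Point], senses: Sequence[str]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p, senses) for q in points)]


# Illustrative (price, performance) pairs: minimize price, maximize performance.
laptops = [(1000, 80), (1200, 90), (1100, 70), (900, 60), (1300, 85)]
result = skyline(laptops, ("min", "max"))
print(result)
```

Here `(1100, 70)` is dropped because `(1000, 80)` is both cheaper and faster, and `(1300, 85)` is dropped because `(1200, 90)` dominates it. The remaining points are mutually incomparable, which is why skyline results can grow too large to inspect by hand.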

    Proteomic insights into the modulation of foetal neurogenesis by the anti-retroviral efavirenz

    Background: South African guidelines recommend that HIV-positive pregnant women immediately initiate antiretroviral therapy (efavirenz, emtricitabine, and tenofovir), regardless of trimester. Efavirenz causes central nervous system neuropathy and has been linked to birth defects such as encephalocoele. Cohort studies of HIV-uninfected children exposed to antiretroviral treatment in utero report minor learning delays but are inconclusive. Non-transformed human-derived neuroepithelial stem (NES) cells represent a unique pre-clinical model in which to investigate the effects of efavirenz on the developing neural system. Efavirenz-induced global cellular molecular changes may be characterised using mass spectrometry (MS). Aims: To optimise an MS-based efavirenz extraction and detection assay, and to investigate efavirenz-induced NES proteomic responses. Methods: A TSQ Vantage triple quadrupole mass spectrometer was employed to optimise targeted detection of efavirenz extracted from cultured cells and supernatant. Cells were cultured for 72 hours, incorporating a 24-hourly efavirenz treatment. Efavirenz concentration dynamics were assessed over this period, and cells were harvested every 24 hours for discovery proteomic analysis using a Q-Exactive quadrupole-Orbitrap mass spectrometer. Results: Drug extraction with acetonitrile was selected as the optimal extraction and detection technique. In cell culture, efavirenz concentration increased after 24 hours and decreased after 48 hours. A total of 1663 protein groups were identified, with 26, 39, and 80 protein groups differentially expressed at 24, 48, and 72 hours, respectively, after efavirenz treatment. The most significantly enriched deregulated pathways included cholesterol biosynthesis, mRNA splicing, and JAK/STAT and Wnt signalling. Conclusions: Efavirenz-altered protein expression reflects functional pathway perturbations, which may contribute to clinically observed neurological effects. Orthogonal and in vivo confirmation is required.

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, joint spatio-temporal modelling is needed to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    South Manti Timber Salvage Draft Environmental Impact Statement

    The South Manti project area is located approximately 45 miles southwest of Price, Utah. The project area consists of approximately 24,597 acres of National Forest System lands within the southern portion of the Wasatch Plateau (Townships 19, 20, and 21 South; Range 4 East; SLM). This project was initiated in response to epidemic spruce beetle (Dendroctonus rufipennis) activity across the South Manti landscape. Extensive Engelmann spruce mortality has occurred as the result of epidemic spruce beetle populations. Representing over 10,000 acres, most of the spruce trees in the project area's Engelmann spruce-Subalpine fir cover type are dead or dying (70% of the spruce trees greater than 5 inches in diameter are dead, 90% of the spruce trees greater than 11 inches in diameter are dead). This Draft Environmental Impact Statement summarizes the analysis that was completed on the resulting alternatives considered for timber salvage harvest and related activities such as road work, road rehabilitation, and reforestation in the project area. This Draft Environmental Impact Statement also discloses the association of each alternative to the Agency's final interim rule of March 1, 1999, which temporarily suspends decisionmaking on road construction and reconstruction in many unroaded areas within the National Forest System until a revised policy is issued or 18 months from the effective rule date, whichever is sooner. The disclosure of information in the Draft Environmental Impact Statement is intended to provide a meaningful basis for public review and comment.