17,383 research outputs found

    How to evaluate multiple range-sum queries progressively

    Get PDF
    Decision support system users typically submit batches of range-sum queries simultaneously rather than issuing individual, unrelated queries. We propose a wavelet based technique that exploits I/O sharing across a query batch to evaluate the set of queries progressively and efficiently. The challenge is that now controlling the structure of errors across query results becomes more critical than minimizing error per individual query. Consequently, we define a class of structural error penalty functions and show how they are controlled by our technique. Experiments demonstrate that our technique is efficient as an exact algorithm, and the progressive estimates are accurate, even after less than one I/O per query

    Progressive Processing of Continuous Range Queries in Hierarchical Wireless Sensor Networks

    Full text link
    In this paper, we study the problem of processing continuous range queries in a hierarchical wireless sensor network. Contrasted with the traditional approach of building networks in a "flat" structure using sensor devices of the same capability, the hierarchical approach deploys devices of higher capability in a higher tier, i.e., a tier closer to the server. While query processing in flat sensor networks has been widely studied, the study on query processing in hierarchical sensor networks has been inadequate. In wireless sensor networks, the main costs that should be considered are the energy for sending data and the storage for storing queries. There is a trade-off between these two costs. Based on this, we first propose a progressive processing method that effectively processes a large number of continuous range queries in hierarchical sensor networks. The proposed method uses the query merging technique proposed by Xiang et al. as the basis and additionally considers the trade-off between the two costs. More specifically, it works toward reducing the storage cost at lower-tier nodes by merging more queries, and toward reducing the energy cost at higher-tier nodes by merging fewer queries (thereby reducing "false alarms"). We then present how to build a hierarchical sensor network that is optimal with respect to the weighted sum of the two costs. It allows for a cost-based systematic control of the trade-off based on the relative importance between the storage and energy in a given network environment and application. Experimental results show that the proposed method achieves a near-optimal control between the storage and energy and reduces the cost by 0.989~84.995 times compared with the cost achieved using the flat (i.e., non-hierarchical) setup as in the work by Xiang et al.Comment: 41 pages, 20 figure

    PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation

    Full text link
    Online aggregation provides estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. This allows for the interactive data exploration of the largest datasets. In this paper we introduce the first framework for parallel online aggregation in which the estimation virtually does not incur any overhead on top of the actual execution. We define a generic interface to express any estimation model that abstracts completely the execution details. We design a novel estimator specifically targeted at parallel online aggregation. When executed by the framework over a massive 8TB8\text{TB} TPC-H instance, the estimator provides accurate confidence bounds early in the execution even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size and without incurring overhead.Comment: 36 page

    QueryOR: a comprehensive web platform for genetic variant analysis and prioritization

    Get PDF
    Background: Whole genome and exome sequencing are contributing to the extraordinary progress in the study of human genetic variants. In this fast developing field, appropriate and easily accessible tools are required to facilitate data analysis. Results: Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features. Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy to use and its intuitive interface allows to handle different kinds of inheritance as well as features related to sharing variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts. Conclusions: QueryOR is a comprehensive and flexible web platform eligible for an easy user-driven variant prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/

    Progressive Simplification of Polygonal Curves

    Get PDF
    Simplifying polygonal curves at different levels of detail is an important problem with many applications. Existing geometric optimization algorithms are only capable of minimizing the complexity of a simplified curve for a single level of detail. We present an O(n3m)O(n^3m)-time algorithm that takes a polygonal curve of n vertices and produces a set of consistent simplifications for m scales while minimizing the cumulative simplification complexity. This algorithm is compatible with distance measures such as the Hausdorff, the Fr\'echet and area-based distances, and enables simplification for continuous scaling in O(n5)O(n^5) time. To speed up this algorithm in practice, we present new techniques for constructing and representing so-called shortcut graphs. Experimental evaluation of these techniques on trajectory data reveals a significant improvement of using shortcut graphs for progressive and non-progressive curve simplification, both in terms of running time and memory usage.Comment: 20 pages, 20 figure

    XWeB: the XML Warehouse Benchmark

    Full text link
    With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and its associate XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems
    • …
    corecore