How to evaluate multiple range-sum queries progressively
Decision support system users typically submit batches of range-sum queries simultaneously rather than issuing individual, unrelated queries. We propose a wavelet-based technique that exploits I/O sharing across a query batch to evaluate the set of queries progressively and efficiently. The challenge is that controlling the structure of errors across query results now becomes more critical than minimizing error per individual query. Consequently, we define a class of structural error penalty functions and show how they are controlled by our technique. Experiments demonstrate that our technique is efficient as an exact algorithm, and that the progressive estimates are accurate, even after less than one I/O per query.
Progressive Processing of Continuous Range Queries in Hierarchical Wireless Sensor Networks
In this paper, we study the problem of processing continuous range queries in
a hierarchical wireless sensor network. Contrasted with the traditional
approach of building networks in a "flat" structure using sensor devices of the
same capability, the hierarchical approach deploys devices of higher capability
in a higher tier, i.e., a tier closer to the server. While query processing in
flat sensor networks has been widely studied, the study on query processing in
hierarchical sensor networks has been inadequate. In wireless sensor networks,
the main costs that should be considered are the energy for sending data and
the storage for storing queries. There is a trade-off between these two costs.
Based on this, we first propose a progressive processing method that
effectively processes a large number of continuous range queries in
hierarchical sensor networks. The proposed method uses the query merging
technique proposed by Xiang et al. as the basis and additionally considers the
trade-off between the two costs. More specifically, it works toward reducing
the storage cost at lower-tier nodes by merging more queries, and toward
reducing the energy cost at higher-tier nodes by merging fewer queries (thereby
reducing "false alarms"). We then present how to build a hierarchical sensor
network that is optimal with respect to the weighted sum of the two costs. It
allows for a cost-based systematic control of the trade-off based on the
relative importance between the storage and energy in a given network
environment and application. Experimental results show that the proposed method
achieves a near-optimal control between the storage and energy and reduces the
cost by 0.989~84.995 times compared with the cost achieved using the flat
(i.e., non-hierarchical) setup as in the work by Xiang et al. Comment: 41 pages, 20 figures
PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation
Online aggregation provides estimates of the final result of a computation
during the actual processing. The user can stop the computation as soon as the
estimate is accurate enough, typically early in the execution. This allows for
the interactive data exploration of the largest datasets. In this paper we
introduce the first framework for parallel online aggregation in which the
estimation virtually does not incur any overhead on top of the actual
execution. We define a generic interface to express any estimation model that
abstracts completely the execution details. We design a novel estimator
specifically targeted at parallel online aggregation. When executed by the
framework over a massive TPC-H instance, the estimator provides
accurate confidence bounds early in the execution even when the cardinality of
the final result is seven orders of magnitude smaller than the dataset size and
without incurring overhead. Comment: 36 pages
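To make the idea of online aggregation concrete, here is a minimal single-threaded sketch. It is not the PF-OLA estimator or its interface; it assumes a plain in-memory list scanned in random order and a normal-approximation confidence bound, and the name `online_sum_estimates` and its parameters are hypothetical.

```python
import math
import random


def online_sum_estimates(values, report_every, z=1.96):
    """Yield (rows_seen, estimated_sum, half_width) while scanning `values`.

    Assumption: rows are visited in uniformly random order, so the rows seen
    so far form a simple random sample without replacement.
    """
    n_total = len(values)
    order = list(range(n_total))
    random.Random(0).shuffle(order)  # fixed seed only to keep the sketch repeatable

    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for idx in order:
        x = values[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            variance = m2 / (count - 1) if count > 1 else 0.0
            # Finite population correction: the bound shrinks to zero
            # once every row has been seen.
            fpc = 1.0 - count / n_total
            half_width = z * n_total * math.sqrt(variance / count * fpc)
            yield count, n_total * mean, half_width
```

A caller would iterate over the generator and stop as soon as `half_width` falls below the accuracy it needs, which mirrors the "stop when the estimate is accurate enough" behaviour described in the abstract.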
QueryOR: a comprehensive web platform for genetic variant analysis and prioritization
Background: Whole genome and exome sequencing are contributing to the extraordinary progress in the study of
human genetic variants. In this fast developing field, appropriate and easily accessible tools are required to facilitate
data analysis.
Results: Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well
as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive,
flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying
formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future
expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features.
Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a
global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy
to use and its intuitive interface makes it possible to handle different kinds of inheritance as well as features related to sharing
variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts.
Conclusions: QueryOR is a comprehensive and flexible web platform suitable for easy, user-driven variant
prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/
Progressive Simplification of Polygonal Curves
Simplifying polygonal curves at different levels of detail is an important
problem with many applications. Existing geometric optimization algorithms are
only capable of minimizing the complexity of a simplified curve for a single
level of detail. We present an -time algorithm that takes a polygonal
curve of n vertices and produces a set of consistent simplifications for m
scales while minimizing the cumulative simplification complexity. This
algorithm is compatible with distance measures such as the Hausdorff, the
Fréchet and area-based distances, and enables simplification for continuous
scaling in time. To speed up this algorithm in practice, we present
new techniques for constructing and representing so-called shortcut graphs.
Experimental evaluation of these techniques on trajectory data reveals a
significant improvement of using shortcut graphs for progressive and
non-progressive curve simplification, both in terms of running time and memory
usage. Comment: 20 pages, 20 figures
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associated XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems.