19 research outputs found
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Thiel-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.Comment: 12 pages, 1 figur
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible - being targeted for range sum queries alone - and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries
Data Centric Sensor Stream Reduction for Real-Time Applications in Wireless Sensor Networks
This work presents a data-centric strategy to meet deadlines in soft real-time applications in wireless sensor networks. This strategy considers three main aspects: (i) The design of real-time application to obtain the minimum deadlines; (ii) An analytic model to estimate the ideal sample size used by data-reduction algorithms; and (iii) Two data-centric stream-based sampling algorithms to perform data reduction whenever necessary. Simulation results show that our data-centric strategies meet deadlines without loosing data representativeness