2,892 research outputs found
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible - being targeted for range sum queries alone - and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries
Classification of functional brain data for multimedia retrieval
This study introduces new signal processing methods for extracting meaningful information from brain signals (functional magnetic resonance imaging and single unit recording) and proposes a content-based retrieval system for functional brain data. First, a new method that combines maximal overlapped discrete wavelet transforms (MODWT) and dynamic time warping (DTW) is presented as a solution for dynamically detecting the hemodynamic response from fMRI data. Second, a new method for neuron spike sorting is presented that uses the maximal overlap discrete wavelet transform and rotated principal component analysis. Third, a procedure to characterize firing patterns of neuron spikes from the human brain, in both the temporal domain and the frequency domain, is presented. The combination of multitaper spectral estimation and a polynomial curve-fitting method is employed to transform the firing patterns to the frequency domain. To generate temporal shapes, eight local maxima are smoothly connected by a cubic spline interpolation. A rotated principal component analysis is used to extract common firing patterns as templates from a training set of 4100 neuron spike signals. Dynamic time warping is then used to assign each neuron firing to the closest template without shift error. These techniques are utilized in the development of a content-based retrieval system for human brain data
An objective based classification of aggregation techniques for wireless sensor networks
Wireless Sensor Networks have gained immense popularity in recent years due to their ever increasing capabilities and wide range of critical applications. A huge body of research efforts has been dedicated to find ways to utilize limited resources of these sensor nodes in an efficient manner. One of the common ways to minimize energy consumption has been aggregation of input data. We note that every aggregation technique has an improvement objective to achieve with respect to the output it produces. Each technique is designed to achieve some target e.g. reduce data size, minimize transmission energy, enhance accuracy etc. This paper presents a comprehensive survey of aggregation techniques that can be used in distributed manner to improve lifetime and energy conservation of wireless sensor networks. Main contribution of this work is proposal of a novel classification of such techniques based on the type of improvement they offer when applied to WSNs. Due to the existence of a myriad of definitions of aggregation, we first review the meaning of term aggregation that can be applied to WSN. The concept is then associated with the proposed classes. Each class of techniques is divided into a number of subclasses and a brief literature review of related work in WSN for each of these is also presented
Digital Image Access & Retrieval
The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio
Accurate and Efficient Private Release of Datacubes and Contingency Tables
A central problem in releasing aggregate information about sensitive data is
to do so accurately while providing a privacy guarantee on the output. Recent
work focuses on the class of linear queries, which include basic counting
queries, data cubes, and contingency tables. The goal is to maximize the
utility of their output, while giving a rigorous privacy guarantee. Most
results follow a common template: pick a "strategy" set of linear queries to
apply to the data, then use the noisy answers to these queries to reconstruct
the queries of interest. This entails either picking a strategy set that is
hoped to be good for the queries, or performing a costly search over the space
of all possible strategies.
In this paper, we propose a new approach that balances accuracy and
efficiency: we show how to improve the accuracy of a given query set by
answering some strategy queries more accurately than others. This leads to an
efficient optimal noise allocation for many popular strategies, including
wavelets, hierarchies, Fourier coefficients and more. For the important case of
marginal queries we show that this strictly improves on previous methods, both
analytically and empirically. Our results also extend to ensuring that the
returned query answers are consistent with an (unknown) data set at minimal
extra cost in terms of time and noise
- …