Structure-Aware Sampling: Flexible and Accurate Summarization

Authors: Cohen Edith; Cormode Graham; Duffield Nick
Publication date: 01/01/2011

In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analysis of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted at range-sum queries alone, and are often quite costly to build and use. In this paper we propose and evaluate variance-optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries.
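To make the "subset-sum queries with unbiased estimators" idea concrete, here is a minimal sketch of a structure-oblivious baseline: Poisson sampling with probabilities proportional to weight, followed by a Horvitz-Thompson estimate. It is not the structure-aware scheme proposed in the paper. The function names, the `seed` parameter, and the capped probability rule `min(1, k * w / total)` are assumptions made for illustration, and weights are assumed to be positive.

```python
import random


def pps_sample(weights, k, seed=0):
    """Poisson sample with inclusion probability p_i = min(1, k * w_i / total).

    Returns {key: (weight, p_i)} for the sampled keys; the expected sample
    size is at most k. Illustrative baseline only (assumes positive weights).
    """
    rng = random.Random(seed)
    total = sum(weights.values())
    sample = {}
    for key, w in weights.items():
        p = min(1.0, k * w / total)
        if rng.random() < p:
            sample[key] = (w, p)
    return sample


def estimate_subset_sum(sample, predicate):
    """Horvitz-Thompson estimate: sum w_i / p_i over sampled keys in the subset."""
    return sum(w / p for key, (w, p) in sample.items() if predicate(key))


# Keys 0..99 with weight key + 1; estimate the range sum over keys 10..19.
weights = {i: float(i + 1) for i in range(100)}
sample = pps_sample(weights, k=20, seed=1)
range_estimate = estimate_subset_sum(sample, lambda key: 10 <= key < 20)
```

The same sample answers any subset query, which is the flexibility the abstract refers to. The sampling step here ignores the order of the keys, which is the limitation a structure-aware scheme targets.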

arXiv.org e-Print Archive

CiteSeerX

On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

Authors: Ham DA; Heinis T
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/06/2015

Spiral - Imperial College Digital Repository

Classification of functional brain data for multimedia retrieval

Author: Cho Hansang
Publication date: 01/01/2005

This study introduces new signal processing methods for extracting meaningful information from brain signals (functional magnetic resonance imaging and single unit recording) and proposes a content-based retrieval system for functional brain data. First, a new method that combines the maximal overlap discrete wavelet transform (MODWT) and dynamic time warping (DTW) is presented as a solution for dynamically detecting the hemodynamic response from fMRI data. Second, a new method for neuron spike sorting is presented that uses the MODWT and rotated principal component analysis. Third, a procedure to characterize firing patterns of neuron spikes from the human brain, in both the temporal domain and the frequency domain, is presented. The combination of multitaper spectral estimation and a polynomial curve-fitting method is employed to transform the firing patterns to the frequency domain. To generate temporal shapes, eight local maxima are smoothly connected by a cubic spline interpolation. A rotated principal component analysis is used to extract common firing patterns as templates from a training set of 4100 neuron spike signals. Dynamic time warping is then used to assign each neuron firing to the closest template without shift error. These techniques are utilized in the development of a content-based retrieval system for human brain data.
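The template-assignment step described above can be sketched with the textbook dynamic time warping recurrence. This is a generic illustration, not the thesis's implementation: the absolute-difference local cost, the absence of a warping window, and the names `dtw_distance` and `closest_template` are all assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses |a_i - b_j| as the local cost and allows match, insertion, and
    deletion steps with no window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def closest_template(signal, templates):
    """Return the index of the template with the smallest DTW distance."""
    return min(range(len(templates)), key=lambda t: dtw_distance(signal, templates[t]))


# A time-shifted copy of the first template is still assigned to it.
templates = [[0, 1, 2, 1, 0], [2, 1, 0, 1, 2]]
best = closest_template([0, 0, 1, 2, 1, 0], templates)
```

Because warping absorbs the extra leading sample, the shifted signal has zero distance to the first template, which is the "without shift error" property the abstract mentions.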

University of Washington Structural Informatics Group Publications

DSpace at The University of Washington

An objective based classification of aggregation techniques for wireless sensor networks

Authors: B. Krishnamachari; B. Sinopoli; B.V. Dasarathy; C. Intanagonwiwat; C.R. Vivek Mhatre; D.L. Hall; E. Fasolo; E.F. Nakamura; H. Cam; J. Hellerstein; K. Kalpakis; L. Wald; O. Ozdemir; P.F. Swaszek; R. Olfati-Saber; R. Rajagopalan; S. Li; S. Madden; T.Q.S. Quek; Y. Borgne Le; Y. Yao; Z. Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Wireless sensor networks (WSNs) have gained immense popularity in recent years due to their ever-increasing capabilities and wide range of critical applications. A large body of research has been dedicated to finding ways to use the limited resources of these sensor nodes efficiently. One common way to minimize energy consumption has been the aggregation of input data. We note that every aggregation technique has an improvement objective to achieve with respect to the output it produces. Each technique is designed to achieve some target, e.g. reduce data size, minimize transmission energy, or enhance accuracy. This paper presents a comprehensive survey of aggregation techniques that can be used in a distributed manner to improve the lifetime and energy conservation of wireless sensor networks. The main contribution of this work is the proposal of a novel classification of such techniques based on the type of improvement they offer when applied to WSNs. Due to the existence of a myriad of definitions of aggregation, we first review the meaning of the term aggregation as it applies to WSNs. The concept is then associated with the proposed classes. Each class of techniques is divided into a number of subclasses, and a brief literature review of related work in WSNs for each of these is also presented.

Southampton (e-Prints Soton)

Crossref

Digital Image Access & Retrieval

Authors: Heidorn P. Bryan; Sandore Beth
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1997

The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Accurate and Efficient Private Release of Datacubes and Contingency Tables

Authors: Cormode Graham; Procopiuc Cecilia M.; Srivastava Divesh; Yaroslavtsev Grigory
Publication date: 25/07/2012

A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise.
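The idea of "answering some strategy queries more accurately than others" can be illustrated with a small budget-allocation sketch. It rests on simplifying assumptions that are not taken from the paper: each group of strategy queries has sensitivity 1, receives Laplace noise of scale 1/eps_i (variance 2/eps_i^2), and the groups compose sequentially so that the eps_i sum to the total budget. Under those assumptions, minimizing the weighted noise variance by Lagrange multipliers gives eps_i proportional to the cube root of the group weight. The function names and the weights are hypothetical.

```python
def allocate_budget(weights, epsilon):
    """Split epsilon across query groups in proportion to weight ** (1/3).

    weights[i] is how much the final answers depend on group i (assumed
    positive). The returned budgets sum to epsilon.
    """
    roots = [w ** (1.0 / 3.0) for w in weights]
    total = sum(roots)
    return [epsilon * r / total for r in roots]


def weighted_noise_variance(weights, budgets):
    """Sum of weight_i * Var[Laplace(1 / eps_i)], with variance 2 / eps_i ** 2."""
    return sum(w * 2.0 / (e * e) for w, e in zip(weights, budgets))


# Compare a uniform split with the cube-root split for two groups.
weights = [1.0, 8.0]
uniform = [0.5, 0.5]
tuned = allocate_budget(weights, epsilon=1.0)
improvement = weighted_noise_variance(weights, uniform) - weighted_noise_variance(weights, tuned)
```

In this toy setting the heavier group gets twice the budget of the lighter one, and the weighted variance is lower than with a uniform split.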

arXiv.org e-Print Archive

CiteSeerX