2,892 research outputs found

    Structure-Aware Sampling: Flexible and Accurate Summarization

    Full text link
    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analysis of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible - being targeted for range sum queries alone - and are often quite costly to build and use. In this paper we propose and evaluate variance optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries

    On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

    Get PDF

    Classification of functional brain data for multimedia retrieval

    Get PDF
    This study introduces new signal processing methods for extracting meaningful information from brain signals (functional magnetic resonance imaging and single unit recording) and proposes a content-based retrieval system for functional brain data. First, a new method that combines maximal overlapped discrete wavelet transforms (MODWT) and dynamic time warping (DTW) is presented as a solution for dynamically detecting the hemodynamic response from fMRI data. Second, a new method for neuron spike sorting is presented that uses the maximal overlap discrete wavelet transform and rotated principal component analysis. Third, a procedure to characterize firing patterns of neuron spikes from the human brain, in both the temporal domain and the frequency domain, is presented. The combination of multitaper spectral estimation and a polynomial curve-fitting method is employed to transform the firing patterns to the frequency domain. To generate temporal shapes, eight local maxima are smoothly connected by a cubic spline interpolation. A rotated principal component analysis is used to extract common firing patterns as templates from a training set of 4100 neuron spike signals. Dynamic time warping is then used to assign each neuron firing to the closest template without shift error. These techniques are utilized in the development of a content-based retrieval system for human brain data

    An objective based classification of aggregation techniques for wireless sensor networks

    No full text
    Wireless Sensor Networks have gained immense popularity in recent years due to their ever increasing capabilities and wide range of critical applications. A huge body of research efforts has been dedicated to find ways to utilize limited resources of these sensor nodes in an efficient manner. One of the common ways to minimize energy consumption has been aggregation of input data. We note that every aggregation technique has an improvement objective to achieve with respect to the output it produces. Each technique is designed to achieve some target e.g. reduce data size, minimize transmission energy, enhance accuracy etc. This paper presents a comprehensive survey of aggregation techniques that can be used in distributed manner to improve lifetime and energy conservation of wireless sensor networks. Main contribution of this work is proposal of a novel classification of such techniques based on the type of improvement they offer when applied to WSNs. Due to the existence of a myriad of definitions of aggregation, we first review the meaning of term aggregation that can be applied to WSN. The concept is then associated with the proposed classes. Each class of techniques is divided into a number of subclasses and a brief literature review of related work in WSN for each of these is also presented

    Digital Image Access & Retrieval

    Get PDF
    The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio

    Accurate and Efficient Private Release of Datacubes and Contingency Tables

    Full text link
    A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise
    • …
    corecore