7 research outputs found

    Linear Sketches for Approximate Aggregate Range Queries

    No full text
    Answering aggregate queries approximately over multidimensional data is an important problem that arises naturally in many applications. One approach is to maintain a succinct (i.e., O(k)-space) representation ĥ, called a sketch, of the frequency distribution h of the data, and to use ĥ for answering queries. Common sketches are constructed via linear mappings of h onto a k-dimensional space, e.g., by mapping h to its top-k Fourier/Wavelet coefficients. We call such sketches linear sketches, since ĥ = P·h for some sketching matrix P. Linear sketches have the benefit that they can easily be maintained incrementally over data streams. Sketches are typically optimized for approximating the data distribution, but not the answers to queries. In this paper, we are concerned with linear sketches that approximate well not only the data but also the answers to aggregate queries. The quality of the approximations is measured using the mean squared and relative errors (MSE and RLE). A query is represented by a column vector q such that its answer is qᵀh; a given set of queries can be represented by an appropriate query matrix Q. We show that the MSE for the queries is minimized when the sketching matrix used to construct a linear sketch of h has as columns the top-k eigenvectors of the query matrix Q. Further, if the quer…
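
    As a rough illustration of the construction described in this abstract, the sketch below builds a sketching matrix from the top-k eigenvectors of a symmetric workload matrix formed as QQᵀ (an assumption; the paper's exact "query matrix" may be defined differently), computes the linear sketch of h, and answers a range query approximately from the sketch alone. The function names and the toy range-query workload are illustrative, not taken from the paper.

```python
import numpy as np

def build_sketching_matrix(Q, k):
    """Columns are the top-k eigenvectors of Q @ Q.T.

    Q: (n, m) matrix whose columns are query vectors q (a query's answer is q.T @ h).
    Hypothetical construction; the paper's exact query matrix may differ.
    """
    A = Q @ Q.T                                   # symmetric summary of the workload
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest eigenvalues
    return eigvecs[:, top]                        # (n, k) sketching matrix

def sketch(h, P):
    """Linear sketch: k numbers obtained by a linear map of the frequency vector h."""
    return P.T @ h

def approximate_answer(q, P, s):
    """Approximate q.T @ h using only the sketch s, i.e. h_hat = P @ s."""
    return q @ (P @ s)

# Toy usage: 1-D frequency distribution and random range-sum queries.
rng = np.random.default_rng(0)
n, k = 64, 8
h = rng.poisson(5.0, size=n).astype(float)
Q = np.zeros((n, 100))
for j in range(100):                              # random range queries as 0/1 columns
    lo, hi = sorted(rng.integers(0, n, size=2))
    Q[lo:hi + 1, j] = 1.0
P = build_sketching_matrix(Q, k)
s = sketch(h, P)
print(Q[:, 0] @ h, approximate_answer(Q[:, 0], P, s))   # exact vs. approximate answer
```

    Because the sketch is a linear map, a stream update of the form h[i] += delta translates directly into s += delta * P[i, :], which is the sense in which linear sketches are easy to maintain incrementally over data streams.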

    Analysis of Sub-block Placement and Victim Caching Techniques

    No full text
    Rapid advances in computer technology have led to the development of processors with peak performance on the order of GHz. Since it is not feasible to have unlimited fast memory, the performance of these processors is handicapped if the performance of the memory hierarchy is poor. Caching techniques have been developed with this in mind. This paper presents an analysis of the performance of two such techniques. Sub-block placement: this technique reduces the miss penalty by reducing the bandwidth required between the cache and its next level. Our results show that sub-block placement enhances performance for both L1 and L2 caches, more significantly in the former, and the improvement grows with the number of sub-blocks. Victim caching: this technique reduces the miss rate by adding a small, fully associative cache between a cache and the next level of the memory hierarchy. The results show that victim caches reduce the miss rate in L1 caches, but the reduction achieved depends on the structure and configuration of the cache and its victim cache. Our study of victim-cache performance as the block size, cache size, and associativity of the caches were varied shows that there can be a significant improvement in performance. However, as cache sizes increase or associativity becomes higher, victim caches do not greatly enhance performance.
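
    The victim-caching idea from this abstract can be pictured with a toy simulator: a direct-mapped L1 whose evicted blocks fall into a small fully associative victim buffer, so that conflict misses to the same set can be serviced without going to the next level. This is only a minimal sketch under simplifying assumptions (block-granularity addresses, no timing model, insertion-order eviction in the victim buffer); it is not the simulation setup used in the paper.

```python
from collections import OrderedDict

class VictimCachedL1:
    """Toy direct-mapped cache backed by a small fully associative victim cache.

    Illustrative only: block addresses are integers, sizes are in blocks,
    and we simply count hits in either structure.
    """
    def __init__(self, num_sets, victim_blocks):
        self.sets = [None] * num_sets                  # direct-mapped: one block per set
        self.victim = OrderedDict()                    # insertion-ordered victim entries
        self.victim_blocks = victim_blocks
        self.hits = self.misses = 0

    def access(self, block_addr):
        idx = block_addr % len(self.sets)
        if self.sets[idx] == block_addr:               # L1 hit
            self.hits += 1
            return
        if block_addr in self.victim:                  # victim-cache hit: swap back into L1
            self.victim.pop(block_addr)
            evicted = self.sets[idx]
            self.sets[idx] = block_addr
            if evicted is not None:
                self._insert_victim(evicted)
            self.hits += 1
            return
        self.misses += 1                               # true miss: fetch from the next level
        evicted = self.sets[idx]
        self.sets[idx] = block_addr
        if evicted is not None:
            self._insert_victim(evicted)

    def _insert_victim(self, block_addr):
        self.victim[block_addr] = True
        if len(self.victim) > self.victim_blocks:
            self.victim.popitem(last=False)            # drop the oldest victim entry

# Two addresses that map to the same set would thrash a plain direct-mapped cache,
# but the victim cache turns the repeated conflict misses into hits.
cache = VictimCachedL1(num_sets=16, victim_blocks=4)
for addr in [0, 16] * 30:
    cache.access(addr)
print(cache.hits, cache.misses)
```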

    On-board Vehicle Data Stream Monitoring using MineFleet and Fast Resource Constrained Monitoring of Correlation Matrices

    No full text
    Abstract This paper considers the problem of monitoring vehicle data streams in a resource-constrained environment. It particularly focuses on a monitoring task that requires frequent computation of correlation matrices using lightweight onboard computing devices. It motivates this problem in the context of the MineFleet Real-Time system and offers a randomized algorithm for fast monitoring of correlation (FMC), inner product, and Euclidean distance matrices, among others. Unlike existing approaches, which compute all the entries of these matrices from a data set, the proposed technique uses a divide-and-conquer approach. This paper presents a probabilistic test for quickly detecting whether or not a subset of coefficients contains a significant one, with a magnitude greater than a user-given threshold. This test is used for quickly identifying the portions of the space that contain significant coefficients. The proposed algorithm is particularly suitable for monitoring correlation and related matrices computed from continuous data streams.
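
    One way to picture the divide-and-conquer scheme described above is sketched below: the index space of the correlation matrix is split recursively, a cheap randomized test prunes blocks that are unlikely to contain a coefficient above the threshold, and only the surviving blocks are computed exactly. The pruning test shown here (a Frobenius-norm estimate of each block obtained from a random projection of the standardized data) is an assumed stand-in, not necessarily the probabilistic test developed in the paper.

```python
import numpy as np

def significant_correlations(X, tau, leaf=8, sketch_rows=64, seed=0):
    """Divide-and-conquer search for correlation coefficients with |r| > tau.

    Illustrative sketch only: a block is pruned when a random-projection
    estimate of its Frobenius norm falls below tau (the largest entry of a
    block cannot exceed its Frobenius norm, so the prune is safe up to the
    estimation error of the projection).
    """
    m, n = X.shape
    Z = (X - X.mean(0)) / (X.std(0) * np.sqrt(m))       # correlation matrix = Z.T @ Z
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    ZS = S @ Z                                          # sketched (compressed) columns
    results = []

    def recurse(rows, cols):
        if len(rows) == 0 or len(cols) == 0:
            return
        if len(rows) <= leaf and len(cols) <= leaf:     # small block: compute exactly
            block = Z[:, rows].T @ Z[:, cols]
            for a, i in enumerate(rows):
                for b, j in enumerate(cols):
                    if i < j and abs(block[a, b]) > tau:
                        results.append((i, j, block[a, b]))
            return
        est = ZS[:, rows].T @ ZS[:, cols]               # cheap estimate of the block
        if np.linalg.norm(est) < tau:                   # no entry is likely to exceed tau
            return
        rmid, cmid = len(rows) // 2, len(cols) // 2
        for r in (rows[:rmid], rows[rmid:]):
            for c in (cols[:cmid], cols[cmid:]):
                recurse(r, c)

    idx = np.arange(n)
    recurse(idx, idx)
    return results

# Toy stream window: 500 samples of 40 features, two of them strongly correlated.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 40))
X[:, 7] = X[:, 3] + 0.1 * rng.standard_normal(500)
print(significant_correlations(X, tau=0.8))
```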

    Distance Measures for Effective Clustering of ARIMA Time-Series

    No full text
    Many environmental and socioeconomic time-series data can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time-series ARIMA time-series. We consider the problem of clustering ARIMA time-series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of a time-series for clustering ARIMA time-series, by using the Euclidean distance between the LPC cepstra of two time-series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time-series. For example, a few LPC cepstral coefficients are sufficient to discriminate between time-series that are modeled by different ARIMA models; in fact, this approach requires fewer coefficients than traditional approaches such as the DFT and DWT. The proposed distance measure can be used for measuring the similarity between different ARIMA models as well.
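
    A minimal version of the dissimilarity measure described above can be sketched as follows: fit an LPC (autoregressive) model to each series via the Levinson-Durbin recursion, convert the LPC coefficients to cepstral coefficients with the standard LPC-to-cepstrum recursion, and take the Euclidean distance between the truncated cepstra. The model order and the number of cepstral coefficients used here are illustrative choices, not parameters taken from the paper.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m - 1:0:-1]
        k = -acc / err
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a[m] = k
        err *= (1.0 - k * k)
    return a[1:]                                       # prediction coefficients a1..ap

def lpc_coefficients(x, order):
    """Fit an order-p LPC/AR model to a (roughly stationary) series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r = np.array([x[:len(x) - lag] @ x[lag:] for lag in range(order + 1)]) / len(x)
    return levinson_durbin(r, order)

def lpc_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1/A(z) via the standard recursion."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def lpc_cepstral_distance(x, y, order=8, n_ceps=12):
    """Euclidean distance between the truncated LPC cepstra of two series."""
    cx = lpc_cepstrum(lpc_coefficients(x, order), n_ceps)
    cy = lpc_cepstrum(lpc_coefficients(y, order), n_ceps)
    return np.linalg.norm(cx - cy)

# Toy usage: two AR(1) series with similar dynamics vs. one with different dynamics.
rng = np.random.default_rng(0)
def ar1(phi, n=500):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x
a, b, c = ar1(0.9), ar1(0.85), ar1(-0.5)
print(lpc_cepstral_distance(a, b))                     # small: similar models
print(lpc_cepstral_distance(a, c))                     # large: different models
```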