77,955 research outputs found

    Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series

    An interesting scheme for estimating and adapting distributions in real time for non-stationary data has recently been studied for several tasks in time series analysis and data mining, namely change point detection, outlier detection, and online compression/sequence prediction. An appealing feature is that, unlike more sophisticated procedures, it is as fast as the related stationary procedures, which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme to introduce an adaptive version of the Context Tree Weighting compression algorithm. The change point and outlier detection methods mentioned above rely on the changing compression ratio of an online compression algorithm. Here we begin to provide theoretical foundations for the use of these adaptive estimation procedures, which have already shown practical promise.
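    A minimal sketch of the kind of discounting the abstract describes, applied to a binary source: before each update, all counts are multiplied by a discount factor so older observations fade, and predictions come from the discounted counts. The Krichevsky-Trofimov-style 1/2 smoothing and the value of `gamma` are illustrative assumptions, not the authors' exact procedure.

```python
# Discounted count estimator for a binary source (illustrative sketch).
# Older observations decay geometrically, letting the predictor track a
# non-stationary distribution; a stationary KT estimator is the gamma=1 case.

class DiscountedBinaryEstimator:
    def __init__(self, gamma=0.95):
        self.gamma = gamma        # discount factor in (0, 1]; assumed value
        self.counts = [0.0, 0.0]  # discounted counts of 0s and 1s

    def predict(self, symbol):
        """Probability of `symbol` (0 or 1) under the discounted counts."""
        total = self.counts[0] + self.counts[1]
        return (self.counts[symbol] + 0.5) / (total + 1.0)

    def update(self, symbol):
        """Discount old counts, then add the new observation."""
        self.counts[0] *= self.gamma
        self.counts[1] *= self.gamma
        self.counts[symbol] += 1.0


est = DiscountedBinaryEstimator()
for bit in [0, 0, 0, 0, 1, 1, 1, 1]:  # source switches from 0s to 1s
    p = est.predict(bit)              # ideal code length is -log2(p) bits
    est.update(bit)
```

    A sudden rise in the code lengths -log2(p) is exactly the kind of change in compression ratio that the change point and outlier detectors mentioned above monitor.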

    FogGIS: Fog Computing for Geospatial Big Data Analytics

    Cloud Geographic Information Systems (GIS) have emerged as a tool for the analysis, processing, and transmission of geospatial data. Fog computing is a paradigm in which Fog devices help to increase throughput and reduce latency at the edge, close to the client. This paper develops a Fog-based framework named FogGIS for mining analytics from geospatial data. We built a prototype using the Intel Edison embedded platform and validated FogGIS through preliminary analyses, including compression and overlay analysis. Results showed that Fog computing holds great promise for the analysis of geospatial data. We used several open-source compression techniques to reduce transmission to the cloud.
    Comment: 6 pages, 4 figures, 1 table, 3rd IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (09-11 December 2016), Indian Institute of Technology (Banaras Hindu University), Varanasi, India
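    The edge-side compression step mentioned in the abstract can be sketched with a standard-library compressor; the GeoJSON payload and the choice of zlib here are assumptions for illustration, not the specific open-source codecs the FogGIS prototype evaluated.

```python
import json
import zlib

# Illustrative Fog-device step: serialize a (hypothetical) GeoJSON feature
# and compress it before transmission to the cloud. zlib stands in for
# whichever open-source compression technique the prototype actually used.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [82.99, 25.26]},
    "properties": {"sensor": "sample reading"},
}

raw = json.dumps(feature).encode("utf-8")
compressed = zlib.compress(raw, 9)
print(f"{len(raw)} bytes -> {len(compressed)} bytes sent upstream")
```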

    CONCISE: Compressed 'n' Composable Integer Set

    Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. However, bitmaps usually occupy considerable storage space, thus requiring compression. Nevertheless, there is a space-time tradeoff among compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades some space to allow bitwise operations without first decompressing bitmaps, and has been recognized as the most efficient scheme in terms of computation time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set), a new scheme that achieves significantly better performance than WAH. In particular, compared to WAH, our algorithm reduces the required memory by up to 50% while offering similar or better computation time. Further, we show that CONCISE can be efficiently used to manipulate bitmaps representing sets of integers in lieu of well-known data structures such as arrays, lists, hash tables, and self-balancing binary search trees. Extensive experiments over synthetic data show the effectiveness of our approach.
    Comment: Preprint submitted to Information Processing Letters, 7 pages
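    A minimal sketch of the word-aligned idea that WAH and CONCISE build on: the bitmap is cut into 31-bit groups, runs of identical all-zero or all-one groups collapse into a single "fill" word, and mixed groups are stored verbatim as "literal" words. The tuple encoding below is a simplification for illustration; it omits CONCISE's extension of fill words with the position of a single set bit, which is where its extra space savings come from.

```python
# Simplified word-aligned bitmap encoding in the spirit of WAH.
# Bitwise operations can run directly on fill and literal words
# without decompressing the whole bitmap first.

def word_aligned_encode(bits):
    groups = [tuple(bits[i:i + 31]) for i in range(0, len(bits), 31)]
    words, i = [], 0
    while i < len(groups):
        g = groups[i]
        if len(g) == 31 and len(set(g)) == 1:   # homogeneous group
            run = 1
            while i + run < len(groups) and groups[i + run] == g:
                run += 1
            words.append(("fill", g[0], run))   # one word covers the run
            i += run
        else:
            words.append(("literal", g))        # mixed group kept verbatim
            i += 1
    return words

bitmap = [1] * 93 + [0] * 62 + [1, 0, 1] * 5
print(word_aligned_encode(bitmap))
# -> [('fill', 1, 3), ('fill', 0, 2), ('literal', (1, 0, 1, ...))]
```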

    An Overview of Moving Object Trajectory Compression Algorithms

    Compression technology is an efficient way to preserve useful and valuable data while removing redundant and inessential data from datasets. With the spread of RFID and GPS devices, more and more moving objects can be traced and their trajectories recorded. However, the exponential increase in the amount of such trajectory data has caused a series of problems in the storage, processing, and analysis of data. Moving object trajectory compression has therefore become one of the hotspots in moving object data mining. To provide an overview, we survey and summarize the development and trends of moving object compression and analyze typical compression algorithms presented in recent years. In this paper, we first summarize the strategies and implementation processes of classical moving object compression algorithms. Second, we discuss the related definitions of moving objects and their trajectories. Third, we introduce the validation criteria for evaluating the performance and efficiency of compression algorithms. Finally, we summarize some application scenarios to point out potential future applications. It is hoped that this research will serve as a stepping stone for those interested in advancing moving object mining.
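    The survey covers many algorithms; the classical line-simplification strategy it refers to is typified by the Douglas-Peucker algorithm, a common baseline for trajectory compression: keep the point that deviates most from the chord between the endpoints if that deviation exceeds a tolerance, and recurse on the two halves. A minimal planar sketch (timestamps, which time-aware variants such as TD-TR also use, are ignored here):

```python
import math

def _dist_to_segment(p, a, b):
    """Distance from point p to segment ab in the plane."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Drop points deviating less than `tolerance` from the local chord."""
    if len(points) < 3:
        return list(points)
    dists = [_dist_to_segment(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1  # farthest point
    if dists[k - 1] <= tolerance:
        return [points[0], points[-1]]       # chord approximates the run
    left = douglas_peucker(points[:k + 1], tolerance)
    right = douglas_peucker(points[k:], tolerance)
    return left[:-1] + right                 # points[k] appears only once

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5.0), (4, 6.1), (5, 7.0)]
print(douglas_peucker(track, tolerance=0.5))
```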

    Reducing the loss of information through annealing text distortion

    Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.
    Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by experimentally evaluating the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words, so that the complexity of a document is slowly reduced, helps compression-based text clustering and improves its accuracy. In fact, we show how clustering of nondistorted text can be improved by means of annealing text distortion. The experimental results presented in this paper are consistent across different data sets and different compression algorithms belonging to the most important compression families: Lempel-Ziv, statistical, and block-sorting. This work was supported by the Spanish Ministry of Education and Science under projects TIN2010-19872 and TIN2010-19607.
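    The compression distances discussed here are typified by the normalized compression distance of Cilibrasi and Vitanyi, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. A minimal sketch, with zlib standing in for the Lempel-Ziv family (the paper's experiments also cover statistical and block-sorting compressors):

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length C(s), using zlib as a stand-in compressor."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar documents compress well together, so their distance is smaller.
doc_a = b"the quick brown fox jumps over the lazy dog " * 20
doc_b = b"the quick brown fox leaps over the lazy dog " * 20
doc_c = b"quarterly revenue figures, audited and restated " * 20
print(ncd(doc_a, doc_b), ncd(doc_a, doc_c))
```

    Clustering then proceeds from the matrix of pairwise NCD values; the annealing text distortion studied in the paper changes the documents fed to C, not the distance itself.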