Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series
An interesting scheme for estimating and adapting distributions in real time for non-stationary data has recently been studied for several tasks in time series analysis and data mining, namely change point detection, outlier detection, and online compression/sequence prediction. An appealing feature is that, unlike more sophisticated procedures, it is as fast as the related stationary procedures, which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme to introduce an adaptive version of the Context Tree Weighting compression algorithm. The change point and outlier detection methods mentioned above rely on the changing compression ratio of an online compression algorithm. Here we begin to provide theoretical foundations for these adaptive estimation procedures, which have already shown practical promise.
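As a rough illustration of the discounting idea described above, the following sketch keeps exponentially discounted counts of zeros and ones and reports the ideal code length of each incoming bit. The class name `DiscountedBernoulli`, the discount factor `gamma=0.9`, and the additive 1/2 smoothing (borrowed from the Krichevsky-Trofimov estimator) are assumptions chosen for illustration, not the estimator analysed in the paper.

```python
import math


class DiscountedBernoulli:
    """Bernoulli estimator whose counts decay by `gamma` at every step (hypothetical sketch)."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma          # assumed discount factor, 0 < gamma <= 1
        self.counts = [0.0, 0.0]    # discounted counts of zeros and ones

    def predict(self, bit):
        """Probability assigned to `bit`, using additive 1/2 smoothing."""
        total = self.counts[0] + self.counts[1]
        return (self.counts[bit] + 0.5) / (total + 1.0)

    def update(self, bit):
        """Discount all past observations, then record the new one."""
        self.counts[0] *= self.gamma
        self.counts[1] *= self.gamma
        self.counts[bit] += 1.0


def code_lengths(bits, gamma=0.9):
    """Ideal code length in bits, -log2 p(bit), for each symbol in sequence."""
    model = DiscountedBernoulli(gamma)
    lengths = []
    for b in bits:
        lengths.append(-math.log2(model.predict(b)))
        model.update(b)
    return lengths
```

On a sequence such as fifty zeros followed by fifty ones, the per-symbol code length jumps at the switch and then decays again as the old counts are discounted away. A spike of this kind in code length is the sort of signal that compression-based change point and outlier detectors look for.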
FogGIS: Fog Computing for Geospatial Big Data Analytics
Cloud Geographic Information Systems (GIS) have emerged as a tool for
analysis, processing and transmission of geospatial data. Fog computing is
a paradigm in which Fog devices help to increase throughput and reduce latency at
the edge of the client. This paper develops a Fog-based framework named
FogGIS for mining analytics from geospatial data. We built a prototype using Intel
Edison, an embedded microprocessor. We validated FogGIS through a
preliminary analysis, including compression and overlay analysis. Results
showed that Fog computing holds great promise for the analysis of geospatial data.
We used several open source compression techniques to reduce the
transmission to the cloud.
Comment: 6 pages, 4 figures, 1 table, 3rd IEEE Uttar Pradesh Section
International Conference on Electrical, Computer and Electronics (09-11
December, 2016) Indian Institute of Technology (Banaras Hindu University)
Varanasi, India
CONCISE: Compressed 'n' Composable Integer Set
Bit arrays, or bitmaps, are used to significantly speed up set operations in
several areas, such as data warehousing, information retrieval, and data
mining, to cite a few. However, bitmaps usually use a large storage space, thus
requiring compression. Nevertheless, there is a space-time tradeoff among
compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades
some space to allow for bitwise operations without first decompressing bitmaps.
WAH has been recognized as the most efficient scheme in terms of computation
time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set),
a new scheme that performs significantly better than WAH.
In particular, when compared to WAH, our algorithm reduces the
required memory by up to 50% while offering similar or better performance in terms of
computation time. Further, we show that CONCISE can be efficiently used to
manipulate bitmaps representing sets of integers in lieu of well-known
data structures such as arrays, lists, hashtables, and self-balancing binary
search trees. Extensive experiments over synthetic data show the effectiveness
of our approach.
Comment: Preprint submitted to Information Processing Letters, 7 pages
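To make the idea of word-aligned bitmap compression more concrete, here is a simplified sketch that splits a bitmap into 31-bit blocks and merges runs of all-zero or all-one blocks into a single "fill" entry, leaving mixed blocks as "literal" entries. It is neither the WAH nor the CONCISE word format. The tuple representation, the function names `encode` and `decode`, and the restriction to non-negative integers are assumptions made for illustration.

```python
BLOCK = 31                 # payload bits per block, as in 32-bit word-aligned schemes
FULL = (1 << BLOCK) - 1    # a block with every bit set


def encode(values):
    """Encode non-negative integers as ("lit", bits) and ("fill", bit, n_blocks) tuples."""
    if not values:
        return []
    bitmap = 0
    for v in values:
        bitmap |= 1 << v
    n_blocks = max(values) // BLOCK + 1
    words = []
    for i in range(n_blocks):
        block = (bitmap >> (i * BLOCK)) & FULL
        if block in (0, FULL):
            bit = 1 if block == FULL else 0
            # Extend the previous fill when it has the same bit value.
            if words and words[-1][0] == "fill" and words[-1][1] == bit:
                words[-1] = ("fill", bit, words[-1][2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("lit", block))
    return words


def decode(words):
    """Return the sorted list of integers represented by `words`."""
    values, base = [], 0
    for w in words:
        if w[0] == "fill":
            if w[1] == 1:
                values.extend(range(base, base + w[2] * BLOCK))
            base += w[2] * BLOCK
        else:
            values.extend(base + j for j in range(BLOCK) if (w[1] >> j) & 1)
            base += BLOCK
    return values
```

For the sparse set `[3, 5, 1000]` this produces three entries (a literal, one fill covering 31 empty blocks, and a second literal) rather than 33 separate blocks, which is the kind of saving that run-based bitmap schemes exploit.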
An Overview of Moving Object Trajectory Compression Algorithms
Compression technology is an efficient way to preserve useful and valuable data while removing redundant and inessential data from datasets. With the development of RFID and GPS devices, more and more moving objects can be traced and their trajectories recorded. However, the exponential increase in the amount of such trajectory data has caused a series of problems in the storage, processing, and analysis of data. Moving object trajectory compression has therefore become one of the hotspots in moving object data mining. To provide an overview, we survey and summarize the development and trends of moving object compression and analyze typical moving object compression algorithms presented in recent years. In this paper, we first summarize the strategies and implementation processes of classical moving object compression algorithms. Second, the related definitions of moving objects and their trajectories are discussed. Third, validation criteria are introduced for evaluating the performance and efficiency of compression algorithms. Finally, some application scenarios are summarized to point out potential future applications. It is hoped that this research will serve as a stepping stone for those interested in advancing moving object mining.
Reducing the loss of information through annealing text distortion
Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.
Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.
This work was supported by the Spanish Ministry of Education and Science under the TIN2010-19872 and TIN2010-19607 projects.
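For readers unfamiliar with compression distances, the sketch below computes the widely used normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It assumes `zlib` (a Lempel-Ziv family compressor from the Python standard library) as the compressor. Whether the paper uses exactly this distance or this compressor is not stated in the abstract, so treat both as assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Two texts that share much of their content compress well together, so their distance is small, while unrelated inputs give values closer to 1. A clustering algorithm can then be run on the resulting pairwise distance matrix.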