19,346 research outputs found

Efficient Compression Technique for Sparse Sets

Author: Kulkarni Raghav
Pratap Rameshwar
Sohony Ishan
Publication date: 16/08/2017

Recent technological advancements have led to the generation of huge amounts of data over the web, such as text, images, audio and video. Most of this data is high dimensional and sparse, e.g., the bag-of-words representation used for text. Many applications, such as clustering, nearest neighbour search, ranking and indexing, require an efficient search for similar data points. Even though there have been significant increases in computational power, a simple brute-force similarity search on such datasets is inefficient and at times impossible. Thus, it is desirable to obtain a compressed representation which preserves the similarity between data points. In this work, we consider the data points as sets and use Jaccard similarity as the similarity measure. Compression techniques are generally evaluated on the following parameters: 1) the randomness required for compression, 2) the time required for compression, 3) the dimension of the data after compression, and 4) the space required to store the compressed data. Ideally, the compressed representation of the data should preserve the similarity between each pair of data points, while keeping the time and the randomness required for compression as low as possible. We show that the compression technique suggested by Pratap and Kulkarni also works well for Jaccard similarity. We present a theoretical proof of the same and complement it with rigorous experiments on synthetic as well as real-world datasets. We also compare our results with the state-of-the-art "min-wise independent permutation", and show that our compression algorithm achieves almost equal accuracy while significantly reducing the compression time and the randomness.
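To make the similarity measure concrete, the following is a minimal sketch of exact Jaccard similarity together with a min-wise hashing estimate, the baseline the abstract compares against. It is not the Pratap and Kulkarni compression scheme, whose details the abstract does not give. The function names, the use of salted BLAKE2b as a stand-in for random permutations, and the default of 256 hash functions are all assumptions made for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A n B| / |A u B| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(item, seed):
    """64-bit hash of `item` under hash function number `seed`.

    Assumption: salted BLAKE2b stands in for a random permutation.
    """
    digest = hashlib.blake2b(
        repr(item).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """Keep, for each hash function, the minimum hash value over the set."""
    items = list(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Two sets agree at a signature position with probability roughly equal to their Jaccard similarity, so the fraction of agreeing positions is an estimate whose variance shrinks as the number of hash functions grows. Each extra hash function also adds compression time and randomness, which are the costs the abstract aims to reduce.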

arXiv.org e-Print Archive

Crossref

The 2-dimensional non-linear sigma-model on a random lattice

Author: A. A. Belavin
A. Di Giacomo
A. M. Polyakov
B. Allés
E. Brézin
E. M. Ilgenfritz
F. Farchioni
G. Parisi
I. Bender
J. Hoek
M. Beccaria
M. Campostrini
M. Falcioni
M. Teper
N. H. Christ
P. Hasenfratz
R. Friedberg
U. Wolff
Z. Qiu
Publication venue: 'American Physical Society (APS)'
Publication date: 28/03/1995

The O(n) non-linear $\sigma$-model is simulated on 2-dimensional regular and random lattices. We use two different levels of randomness in the construction of the random lattices and give a detailed explanation of the geometry of such lattices. In the simulations, we calculate the mass gap for $n = 3, 4$ and 8, analysing the asymptotic scaling of the data and computing the ratio of Lambda parameters $\Lambda_{\rm random}/\Lambda_{\rm regular}$. These ratios are in agreement with previous semi-analytical calculations. We also numerically calculate the topological susceptibility by using the cooling method. Comment: REVTeX file, 23 pages. 13 postscript figures in a separate compressed tar file

arXiv.org e-Print Archive

Crossref

Compressed/reconstructed test images for CRAF/Cassini

Author: Arnold S.
Cheung K.-M.
Dolinar S.
Onyszchuk I.
Pollara F.

A set of compressed, then reconstructed, test images submitted to the Comet Rendezvous Asteroid Flyby (CRAF)/Cassini project is presented as part of its evaluation of near-lossless, high-compression algorithms for representing image data. A total of seven test image files were provided by the project. The seven test images were compressed, then reconstructed with high quality (root mean square error of approximately one or two gray levels on an 8-bit gray scale), using discrete cosine transforms or Hadamard transforms and efficient entropy coders. The resulting compression ratios varied from about 2:1 to about 10:1, depending on the activity or randomness in the source image. This was accomplished without any special effort to optimize the quantizer or to introduce special postprocessing to filter the reconstruction errors. A more complete set of measurements, showing the relative performance of the compression algorithms over a wide range of compression ratios and reconstruction errors, shows that additional compression is possible at a small sacrifice in fidelity.
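The two quality figures quoted in the abstract, root mean square error in gray levels and compression ratio, can be computed as in the sketch below. The function names and the sample pixel values in the usage note are hypothetical and do not come from the CRAF/Cassini test images.

```python
import math


def rms_error(original, reconstructed):
    """Root mean square error between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


def compression_ratio(original_bits, compressed_bits):
    """Ratio of original size to compressed size, e.g. 4.0 for 4:1."""
    if compressed_bits <= 0:
        raise ValueError("compressed size must be positive")
    return original_bits / compressed_bits
```

For instance, with the hypothetical 8-bit pixels `[10, 20, 30, 40]` reconstructed as `[11, 19, 31, 39]`, every pixel is off by one gray level, so `rms_error` returns 1.0, the lower end of the range the abstract reports.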

NASA Technical Reports Server

Computing A Glimpse of Randomness

Author: Cristian S. Calude
Michael J. Dinneen
Chi-Kou Shu
Publication date: 01/01/2002

A Chaitin Omega number is the halting probability of a universal Chaitin (self-delimiting Turing) machine. Every Omega number is both computably enumerable (the limit of a computable, increasing, converging sequence of rationals) and random (its binary expansion is an algorithmic random sequence). In particular, every Omega number is strongly non-computable. The aim of this paper is to describe a procedure, which combines Java programming and mathematical proofs, for computing the exact values of the first 64 bits of a Chaitin Omega: 0000001000000100000110001000011010001111110010111011101000010000. Full description of programs and proofs will be given elsewhere. Comment: 16 pages; Experimental Mathematics (accepted)
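As a small worked illustration of what knowing the first 64 bits means, the sketch below reads the quoted bit string as a binary fraction and returns the exact rational interval in which any real number with that prefix must lie. The function name `prefix_interval` is an assumption for illustration. The code only does arithmetic on the published bits and does not reproduce the paper's procedure for obtaining them.

```python
from fractions import Fraction

# The 64 bits quoted in the abstract, read as the digits after the binary point.
OMEGA_BITS = "0000001000000100000110001000011010001111110010111011101000010000"


def prefix_interval(bits):
    """Return (lower, upper) bounds for a real whose binary expansion starts with `bits`."""
    n = len(bits)
    lower = Fraction(int(bits, 2), 2 ** n)
    return lower, lower + Fraction(1, 2 ** n)


lower, upper = prefix_interval(OMEGA_BITS)
print(float(lower), float(upper - lower))
```

The interval has width 2^-64, so the 64 bits determine the value to about 19 decimal digits. Because the first 1 appears at the seventh position, the value lies between 1/128 and 1/64.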

arXiv.org e-Print Archive

CiteSeerX