658 research outputs found
An Introduction to Neural Data Compression
Neural compression is the application of neural networks and other machine
learning methods to data compression. Recent advances in statistical machine
learning have opened up new possibilities for data compression, allowing
compression algorithms to be learned end-to-end from data using powerful
generative models such as normalizing flows, variational autoencoders,
diffusion probabilistic models, and generative adversarial networks. The
present article aims to introduce this field of research to a broader machine
learning audience by reviewing the necessary background in information theory
(e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image
quality assessment, perceptual metrics), and providing a curated guide through
the essential ideas and methods in the literature thus far
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth limited scenarios.Comment: submitted to IEEE Transactions on Image Processin
An Efficient Light-weight LSB steganography with Deep learning Steganalysis
Active research is going on to securely transmit a secret message or
so-called steganography by using data-hiding techniques in digital images.
After assessing the state-of-the-art research work, we found, most of the
existing solutions are not promising and are ineffective against machine
learning-based steganalysis. In this paper, a lightweight steganography scheme
is presented through graphical key embedding and obfuscation of data through
encryption. By keeping a mindset of industrial applicability, to show the
effectiveness of the proposed scheme, we emphasized mainly deep learning-based
steganalysis. The proposed steganography algorithm containing two schemes
withstands not only statistical pattern recognizers but also machine learning
steganalysis through feature extraction using a well-known pre-trained deep
learning network Xception. We provided a detailed protocol of the algorithm for
different scenarios and implementation details. Furthermore, different
performance metrics are also evaluated with statistical and machine learning
performance analysis. The results were quite impressive with respect to the
state of the arts. We received 2.55% accuracy through statistical steganalysis
and machine learning steganalysis gave maximum of 49.93~50% correctly
classified instances in good condition.Comment: Accepted pape
EFFICIENT IMAGE COMPRESSION AND DECOMPRESSION ALGORITHMS FOR OCR SYSTEMS
This paper presents an efficient new image compression and decompression methods for document images, intended for usage in the pre-processing stage of an OCR system designed for needs of the “Nikola Tesla Museum” in Belgrade. Proposed image compression methods exploit the Run-Length Encoding (RLE) algorithm and an algorithm based on document character contour extraction, while an iterative scanline fill algorithm is used for image decompression. Image compression and decompression methods are compared with JBIG2 and JPEG2000 image compression standards. Segmentation accuracy results for ground-truth documents are obtained in order to evaluate the proposed methods. Results show that the proposed methods outperform JBIG2 compression regarding the time complexity, providing up to 25 times lower processing time at the expense of worse compression ratio results, as well as JPEG2000 image compression standard, providing up to 4-fold improvement in compression ratio. Finally, time complexity results show that the presented methods are sufficiently fast for a real time character segmentation system
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications
Communication systems to date primarily aim at reliably communicating bit
sequences. Such an approach provides efficient engineering designs that are
agnostic to the meanings of the messages or to the goal that the message
exchange aims to achieve. Next generation systems, however, can be potentially
enriched by folding message semantics and goals of communication into their
design. Further, these systems can be made cognizant of the context in which
communication exchange takes place, providing avenues for novel design
insights. This tutorial summarizes the efforts to date, starting from its early
adaptations, semantic-aware and task-oriented communications, covering the
foundations, algorithms and potential implementations. The focus is on
approaches that utilize information theory to provide the foundations, as well
as the significant role of learning in semantics and task-aware communications.Comment: 28 pages, 14 figure
Mapping Stream Programs into the Compressed Domain
Due to the high data rates involved in audio, video, and signalprocessing applications, it is imperative to compress the data todecrease the amount of storage used. Unfortunately, this implies thatany program operating on the data needs to be wrapped by adecompression and re-compression stage. Re-compression can incursignificant computational overhead, while decompression swamps theapplication with the original volume of data.In this paper, we present a program transformation that greatlyaccelerates the processing of compressible data. Given a program thatoperates on uncompressed data, we output an equivalent program thatoperates directly on the compressed format. Our transformationapplies to stream programs, a restricted but useful class ofapplications with regular communication and computation patterns. Ourformulation is based on LZ77, a lossless compression algorithm that isutilized by ZIP and fully encapsulates common formats such as AppleAnimation, Microsoft RLE, and Targa.We implemented a simple subset of our techniques in the StreamItcompiler, which emits executable plugins for two popular video editingtools: MEncoder and Blender. For common operations such as coloradjustment and video compositing, mapping into the compressed domainoffers a speedup roughly proportional to the overall compressionratio. For our benchmark suite of 12 videos in Apple Animationformat, speedups range from 1.1x to 471x, with a median of 15x
- …