VXA: A Virtual Architecture for Durable Compressed Archives
Data compression algorithms change frequently, and obsolete decoders do not
always run on new hardware and operating systems, threatening the long-term
usability of content archived using those algorithms. Re-encoding content into
new formats is cumbersome, and highly undesirable when lossy compression is
involved. Processor architectures, in contrast, have remained comparatively
stable over recent decades. VXA, an archival storage system designed around
this observation, archives executable decoders along with the encoded content
it stores. VXA decoders run in a specialized virtual machine that implements an
OS-independent execution environment based on the standard x86 architecture.
The VXA virtual machine strictly limits access to host system services, making
decoders safe to run even if an archive contains malicious code. VXA's adoption
of a "native" processor architecture instead of type-safe language technology
allows reuse of existing "hand-optimized" decoders in C and assembly language,
and permits decoders access to performance-enhancing architecture features such
as vector processing instructions. The performance cost of VXA's virtualization
is typically less than 15% compared with the same decoders running natively.
The storage cost of archived decoders, typically 30-130KB each, can be
amortized across many archived files sharing the same compression method.
Comment: 14 pages, 7 figures, 2 tables
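To make the amortization argument concrete, here is a minimal Python sketch of the bookkeeping involved; the class names and the 100 KB decoder size are illustrative assumptions, not VXA's actual on-disk format.

# Hypothetical sketch of how a VXA-style archive might amortize decoder
# storage across files; names and sizes are illustrative, not VXA's format.
from dataclasses import dataclass, field

@dataclass
class Decoder:
    name: str          # e.g. "jpeg-x86"
    size_bytes: int    # archived x86 decoder image, ~30-130 KB per the paper

@dataclass
class Archive:
    decoders: dict = field(default_factory=dict)   # method name -> Decoder
    files: list = field(default_factory=list)      # (filename, method) pairs

    def add_file(self, filename, decoder):
        # The decoder body is stored once; every file using the same
        # compression method just references it.
        self.decoders.setdefault(decoder.name, decoder)
        self.files.append((filename, decoder.name))

    def decoder_overhead_per_file(self):
        total = sum(d.size_bytes for d in self.decoders.values())
        return total / max(len(self.files), 1)

arc = Archive()
jpeg = Decoder("jpeg-x86", 100_000)            # ~100 KB decoder
for i in range(1000):
    arc.add_file(f"photo{i}.jpg", jpeg)
print(arc.decoder_overhead_per_file())          # ~100 bytes per file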
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter.
A document image model and estimation algorithm for optimized JPEG decompression
The JPEG standard is one of the most prevalent image compression schemes in use today. While JPEG was designed for natural images, it is also widely used to encode raster documents. Unfortunately, JPEG's characteristic blocking and ringing artifacts can severely degrade the quality of text and graphics in complex documents. We propose a JPEG decompression algorithm designed to produce substantially higher-quality images from the same standard JPEG encodings. The method works by incorporating into the decoding process a document image model that accounts for the wide variety of content in modern complex color documents. It first segments the JPEG-encoded document into regions corresponding to background, text, and picture content. The regions corresponding to text and background are then decoded using maximum a posteriori (MAP) estimation. Most importantly, the MAP reconstruction of the text regions uses a model that accounts for the spatial characteristics of text and graphics. Our experimental comparisons to baseline JPEG decoding, as well as to three other decoding schemes, demonstrate that our method substantially improves the quality of decoded images, both visually and as measured by PSNR.
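In generic form (the abstract does not spell out the authors' exact model), the MAP reconstruction chooses the image that maximizes the posterior under the document image prior:

    \hat{x} = \arg\max_{x} p(x \mid y) = \arg\max_{x} p(y \mid x)\, p(x)

where y denotes the observed JPEG-decoded data, p(y | x) models the coding distortion, and p(x) is the document image model, which differs between the text and background regions.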
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents and, unlike earlier platforms, exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software to the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system, and describes two applications that utilise the system for multimedia information retrieval.
Locally Adaptive Resolution (LAR) codec
The JPEG committee has initiated a study of potential technologies dedicated to future-generation image compression systems. The idea is to design a new image compression standard, named JPEG AIC (Advanced Image Coding), together with advanced evaluation methodologies closely matching human visual system characteristics. JPEG AIC thus aims at defining a complete coding system able to address advanced functionalities such as lossy-to-lossless compression, scalability (spatial, temporal, depth, quality, complexity, component, granularity...), robustness, embeddability, and content description for image handling at object level. The chosen compression method has to fit perceptual metrics defined by the JPEG community within the JPEG AIC project. In this context, we propose the Locally Adaptive Resolution (LAR) codec as a contribution to the related call for technologies, intended to fit all of the previous functionalities. This method is a coding solution that simultaneously provides a relevant representation of the image. This property is exploited through various complementary coding schemes in order to design a highly scalable encoder. The LAR method was initially introduced for lossy image coding. This efficient image compression solution relies on a content-based system driven by a specific quadtree representation, based on the assumption that an image can be represented as layers of basic information and local texture. Multiresolution versions of this codec have shown their efficiency, from low bit rates up to lossless compressed images. An original hierarchical self-extracting region representation has also been elaborated: a segmentation process is realized at both coder and decoder, leading to a free segmentation map. This latter can be further exploited for color region encoding and image handling at region level. Moreover, the inherent structure of the LAR codec can be used for advanced functionalities such as content security purposes. In particular, dedicated Unequal Error Protection systems have been produced and tested for transmission over the Internet or wireless channels. Hierarchical selective encryption techniques have been adapted to our coding scheme. A data hiding system based on the LAR multiresolution description allows efficient content protection. Thanks to the modularity of our coding scheme, complexity can be adjusted to address various embedded systems. For example, a basic version of the LAR coder has been implemented on an FPGA platform while respecting real-time constraints. The pyramidal LAR solution and the hierarchical segmentation process have also been prototyped on heterogeneous DSP architectures. This chapter first introduces the JPEG AIC scope and details the associated requirements. We then develop the technical features of the LAR system and show the originality of the proposed scheme, both in terms of functionalities and services. In particular, we show that the LAR coder remains efficient for natural images, medical images, and art images.
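The quadtree idea at the heart of LAR can be sketched in a few lines of Python: flat regions are kept as large blocks and textured regions are split. The activity measure (dynamic range) and the threshold below are illustrative assumptions, not LAR's actual split criterion.

# A minimal sketch of a content-driven quadtree: a block stays whole
# where the local texture is flat and splits where activity is high.
import numpy as np

def quadtree(img, x, y, size, threshold, min_size, leaves):
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.max() - block.min() <= threshold:
        leaves.append((x, y, size))          # flat enough: one leaf block
        return
    h = size // 2                            # otherwise recurse into quadrants
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        quadtree(img, x + dx, y + dy, h, threshold, min_size, leaves)

img = np.zeros((64, 64), dtype=np.int32)            # three flat quadrants...
img[32:, 32:] = np.random.randint(0, 256, (32, 32)) # ...one textured quadrant
leaves = []
quadtree(img, 0, 0, 64, threshold=24, min_size=2, leaves=leaves)
print(len(leaves), "variable-size blocks")           # few large + many small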
T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network
The problem of generating textual descriptions for visual data has gained research attention in recent years. In contrast, the problem of generating visual data from textual descriptions is still very challenging, because it requires combining both Natural Language Processing (NLP) and Computer Vision techniques. Existing methods utilize Generative Adversarial Networks (GANs) and generate uncompressed images from textual descriptions. In practice, however, most visual data are processed and transmitted in compressed representations. Hence, the proposed work attempts to generate visual data directly in compressed form using Deep Convolutional GANs (DCGANs), to achieve storage and computational efficiency. We propose GAN models for compressed image generation from text. The first model is directly trained with JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions. The second model is trained with RGB images (pixel domain) to generate JPEG compressed DCT representations from text descriptions. The proposed models are tested on the open-source benchmark dataset Oxford-102 Flower images, using both the RGB and JPEG compressed versions, and achieve state-of-the-art performance in the JPEG compressed domain. The code will be publicly released at GitHub after acceptance of the paper.
Comment: Accepted for publication at IAPR's 6th CVIP 202
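For readers unfamiliar with the "compressed domain" target, the following Python sketch computes the JPEG-style 8x8 blockwise DCT representation that the first model is trained against; quantization and entropy coding are omitted, and the function is an illustrative assumption rather than the authors' exact pipeline.

# JPEG-style 8x8 blockwise DCT of a single channel.
import numpy as np
from scipy.fftpack import dct

def blockwise_dct(channel):                 # H and W must be multiples of 8
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            block = channel[i:i+8, j:j+8].astype(np.float64) - 128.0
            # 2-D type-II DCT, applied along rows then columns as in JPEG
            out[i:i+8, j:j+8] = dct(dct(block, axis=0, norm="ortho"),
                                    axis=1, norm="ortho")
    return out

gray = np.random.randint(0, 256, (64, 64))
coeffs = blockwise_dct(gray)
print(coeffs.shape, coeffs[0, 0])           # DC coefficient of the first block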
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting manipulation techniques such as copy-clone, object splicing, and removal, which can mislead viewers. At the same time, identifying these manipulations is a very challenging task, as manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture that utilizes resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating an encoder and an LSTM network. Finally, a decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final (softmax) layer of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at the pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.
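A minimal PyTorch sketch of the hybrid encoder/LSTM/decoder idea follows; all layer sizes are illustrative assumptions, and the paper's resampling-feature extraction is omitted.

# Conv encoder -> LSTM over the sequence of spatial patches -> upsampling
# decoder producing a 2-class (pristine/manipulated) pixel-wise mask.
import torch
import torch.nn as nn

class TamperLocalizer(nn.Module):
    def __init__(self, feat=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(            # H x W -> H/4 x W/4 features
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.decoder = nn.Sequential(            # back to full resolution
            nn.ConvTranspose2d(hidden, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, 2, 4, stride=2, padding=1))

    def forward(self, x):
        f = self.encoder(x)                      # (B, C, h, w)
        b, c, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)       # (B, h*w, C) patch sequence
        out, _ = self.lstm(seq)                  # correlations across patches
        out = out.transpose(1, 2).reshape(b, -1, h, w)
        return self.decoder(out)                 # (B, 2, H, W) logits

model = TamperLocalizer()
logits = model(torch.randn(1, 3, 64, 64))
mask = logits.softmax(dim=1).argmax(dim=1)       # pixel-wise prediction
print(mask.shape)                                # torch.Size([1, 64, 64])

Training such a model end-to-end would minimize a pixel-wise cross-entropy loss between these logits and the ground-truth masks, matching the back-propagation setup the abstract describes.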
Wireless End-to-End Image Transmission System using Semantic Communications
Semantic communication is considered the future of mobile communication; it aims to transmit data beyond the bit-by-bit reconstruction addressed by Shannon's theory of communication, by conveying the semantic meaning of the data instead. The semantic communication paradigm aims to bridge the gap of limited bandwidth in modern high-volume multimedia content transmission. Integrating AI technologies with 6G communications networks has paved the way for semantic communication-based end-to-end communication systems. In this study, we implement a semantic communication-based end-to-end image transmission system and discuss potential design considerations for developing semantic communication systems in conjunction with physical channel characteristics. A pre-trained GAN network is used at the receiver to reconstruct a realistic image from the semantically segmented image at the receiver input. The semantic segmentation task at the transmitter (encoder) and the GAN network at the receiver (decoder) are trained on a common knowledge base, the COCO-Stuff dataset. The research shows that the resource gain, in the form of bandwidth saving, is immense when transmitting the semantic segmentation map through the physical channel instead of the ground-truth image, in contrast to conventional communication systems. Furthermore, the research studies the effect of physical channel distortions and quantization noise on semantic communication-based multimedia content transmission.
Comment: Accepted for IEEE Access
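A back-of-the-envelope Python sketch shows why sending the segmentation map saves bandwidth; the resolution and the one-byte-per-pixel label encoding (COCO-Stuff has a few hundred classes) are illustrative assumptions.

# Raw size comparison: 24-bit RGB image vs. per-pixel class labels.
import zlib
import numpy as np

H, W = 512, 512
rgb_bytes = H * W * 3                       # raw 24-bit image
seg_bytes = H * W                           # one class label per pixel
print(rgb_bytes / seg_bytes)                # 3.0x before any entropy coding

# Segmentation maps are large flat regions, so they compress far better
# than natural images do; the realized saving is therefore much larger.
seg = np.zeros((H, W), dtype=np.uint8)      # toy map with two flat regions
seg[:, W // 2:] = 17
print(rgb_bytes / len(zlib.compress(seg.tobytes())))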
OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model
In today's technological era, document images play an important and integral part in our day-to-day life, and specifically with the surge of Covid-19, digitally scanned documents have become a key medium of communication, avoiding infection through physical contact. Storage and transmission of scanned document images is a very memory-intensive task, hence compression techniques are used to reduce the image size before archival and transmission. To extract information from or operate on compressed images, there are two approaches. The first is to decompress the image, operate on it, and subsequently compress it again for efficient storage and transmission. The other is to use the characteristics of the underlying compression algorithm to process the images directly in their compressed form, without decompression and re-compression. In this paper, we propose a novel idea: an OCR for CCITT (International Telegraph and Telephone Consultative Committee) compressed, machine-printed TIFF document images that works directly in the compressed domain. After segmenting text regions into lines and words, an HMM is applied for recognition using the three CCITT coding modes: horizontal, vertical, and pass. Experimental results show that OCR on the pass mode gives promising results.
Comment: The paper has 14 figures and 1 table
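The recognition step can be illustrated with a toy Viterbi decoder over sequences of CCITT mode symbols; the states and all probabilities below are made-up placeholders, since the paper's HMMs are trained per character class.

# Toy Viterbi decoding: observations are CCITT coding modes, hidden
# states are hypothetical character models. Log-probabilities avoid underflow.
import numpy as np

states = ["char_a", "char_b"]                  # hypothetical character models
obs_symbols = {"pass": 0, "horizontal": 1, "vertical": 2}
start = np.log([0.5, 0.5])
trans = np.log([[0.8, 0.2], [0.3, 0.7]])
emit = np.log([[0.6, 0.2, 0.2],                # P(mode | state), rows sum to 1
               [0.1, 0.4, 0.5]])

def viterbi(obs):
    v = start + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = v[:, None] + trans            # (from, to) log-scores
        back.append(scores.argmax(axis=0))     # best predecessor per state
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for b in reversed(back):                   # trace back pointers
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

modes = [obs_symbols[m] for m in ["pass", "pass", "vertical", "horizontal"]]
print(viterbi(modes))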