A Novel Image Compression Method Based on Classified Energy and Pattern Building Blocks
In this paper, a novel image compression method based on the generation of so-called classified energy and pattern blocks (CEPB) is introduced and evaluation results are presented. The CEPB is constructed from training images and then stored at both the transmitter and receiver sides of the communication system. The energy and pattern blocks of the input images to be reconstructed are then determined in the same way as in the construction of the CEPB. This process is associated with a matching procedure that determines the index numbers of the classified energy and pattern blocks in the CEPB which best represent (match) the energy and pattern blocks of the input images. The encoding parameters are the block scaling coefficient and the index numbers of the energy and pattern blocks determined for each block of the input images. These parameters are sent from the transmitter to the receiver, and the classified energy and pattern blocks associated with the index numbers are retrieved from the CEPB. The input image is then reconstructed block by block at the receiver using a proposed mathematical model. Evaluation results show that the method provides considerable image compression ratios and image quality even at low bit rates.
The work described in this paper was funded by the Isik University Scientific Research Fund (project contract no. 10B301). The author would like to thank Professor B. S. Yarman (Istanbul University, College of Engineering, Department of Electrical-Electronics Engineering), Assistant Professor Hakan Gurkan (Isik University, Engineering Faculty, Department of Electrical-Electronics Engineering), the researchers in the International Computer Science Institute (ICSI) Speech Group, University of California at Berkeley, CA, USA, and the researchers at the SRI International Speech Technology and Research (STAR) Laboratory, Menlo Park, CA, USA, for many helpful discussions on this work during his postdoctoral fellowship years. The author also would like to thank the anonymous reviewers for their valuable comments and suggestions, which substantially improved the quality of this paper.
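The encode/decode round trip described above (match a block against the shared codebook, transmit only an index and a scaling coefficient) can be sketched as follows. The function names and the least-squares choice of scaling coefficient are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def encode_block(block, codebook):
    """Find the index of the codebook block that best matches the input
    block under an optimal (least-squares) scaling coefficient.
    A hypothetical sketch of the CEPB matching procedure."""
    b = block.ravel().astype(float)
    best_idx, best_scale, best_err = 0, 0.0, np.inf
    for idx, pattern in enumerate(codebook):
        p = pattern.ravel().astype(float)
        scale = (b @ p) / (p @ p)            # optimal block scaling coefficient
        err = np.sum((b - scale * p) ** 2)   # matching error for this entry
        if err < best_err:
            best_idx, best_scale, best_err = idx, scale, err
    return best_idx, best_scale              # the parameters actually transmitted

def decode_block(idx, scale, codebook):
    """Receiver side: pull the indexed block from the shared codebook and rescale."""
    return scale * codebook[idx].astype(float)
```

Only `(idx, scale)` crosses the channel; both sides hold the same codebook, which is where the compression comes from.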
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
It is desirable to train convolutional networks (CNNs) to run more
efficiently during inference. In many cases, however, the computational budget
that the system has for inference cannot be known beforehand during training,
or the inference budget is dependent on the changing real-time resource
availability. Thus, it is inadequate to train just inference-efficient CNNs,
whose inference costs are not adjustable and cannot adapt to varied inference
budgets. We propose a novel approach for cost-adjustable inference in CNNs -
Stochastic Downsampling Point (SDPoint). During training, SDPoint applies
feature map downsampling to a random point in the layer hierarchy, with a
random downsampling ratio. The different stochastic downsampling configurations,
known as SDPoint instances (of the same model), have computational costs
different from each other, while being trained to minimize the same prediction
loss. Sharing network parameters across different instances provides a
significant regularization boost. During inference, one may handpick an SDPoint
instance that best fits the inference budget. The effectiveness of SDPoint, as
both a cost-adjustable inference approach and a regularizer, is validated
through extensive experiments on image classification.
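As a rough illustration of the training-time mechanism, the sketch below downsamples at one random point in the layer hierarchy with a random ratio per forward pass. The average-pooling operator, the factor set {2, 4}, and the function names are assumptions for illustration, not the paper's implementation:

```python
import random
import numpy as np

def block_mean_downsample(x, factor):
    """Average-pool a square feature map (H, W) by an integer factor."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def sdpoint_forward(feature_map, layers, rng=random):
    """Run `layers`, inserting downsampling before one randomly chosen layer
    with a randomly chosen ratio (a hypothetical sketch of stochastic
    downsampling). Fixing (point, factor) at inference time selects one
    cost-specific SDPoint instance of the same model."""
    point = rng.randrange(len(layers))   # random downsampling point
    factor = rng.choice([2, 4])          # random downsampling ratio
    x = feature_map
    for i, layer in enumerate(layers):
        if i == point:
            x = block_mean_downsample(x, factor)
        x = layer(x)
    return x
```

All instances share the same `layers` (and hence the same parameters); only the downsampling configuration, and thus the cost, differs between them.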
Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review
With the large volumes of social media data created daily and the
parallel rise of realistic multimedia tampering methods, detecting and
localising tampering in images and videos has become essential. This survey
focusses on approaches for tampering detection in multimedia data using deep
learning models. Specifically, it presents a detailed analysis of benchmark
datasets for malicious manipulation detection that are publicly available. It
also offers a comprehensive list of tampering clues and commonly used deep
learning architectures. Next, it discusses the current state-of-the-art
tampering detection methods, categorizing them into meaningful types, such as
deepfake detection methods, splice tampering detection methods, and copy-move
tampering detection methods, and discussing their strengths and
weaknesses. Top results achieved on benchmark datasets, comparisons of deep
learning approaches against traditional methods, and critical insights from
recent tampering detection methods are also discussed. Lastly, the research
gaps, future directions, and conclusions are discussed to provide an in-depth
understanding of the tampering detection research arena.
Fully Dynamic Inference with Deep Neural Networks
Modern deep neural networks are powerful and widely applicable models that
extract task-relevant information through multi-level abstraction. Their
cross-domain success, however, is often achieved at the expense of high
computational cost, memory bandwidth demands, and long inference latency, which
prevent their deployment in resource-constrained and time-sensitive scenarios,
such as edge-side inference and self-driving cars. While recently developed
methods for creating efficient deep neural networks are making their real-world
deployment more feasible by reducing model size, they do not fully exploit
input properties on a per-instance basis to maximize computational efficiency
and task accuracy. In particular, most existing methods typically use a
one-size-fits-all approach that identically processes all inputs. Motivated by
the fact that different images require different feature embeddings to be
accurately classified, we propose a fully dynamic paradigm that imparts deep
convolutional neural networks with hierarchical inference dynamics at the level
of layers and individual convolutional filters/channels. Two compact networks,
called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance
basis which layers or filters/channels are redundant and therefore should be
skipped. L-Net and C-Net also learn how to scale retained computation outputs
to maximize task accuracy. By integrating L-Net and C-Net into a joint design
framework, called LC-Net, we consistently outperform state-of-the-art dynamic
frameworks with respect to both efficiency and classification accuracy. On the
CIFAR-10 dataset, LC-Net results in up to 11.9x fewer floating-point
operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic
inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4x
fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
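A minimal sketch of per-instance channel gating with learned scaling, in the spirit of the C-Net component described above. The thresholding rule and the sigmoid scaling are illustrative assumptions, not the paper's network:

```python
import numpy as np

def channel_gate(features, gate_logits, threshold=0.0):
    """Zero out (skip) channels whose gate logit falls below `threshold`
    and scale the retained channels by a sigmoid of their logit.
    `features` has shape (C, H, W); `gate_logits` has shape (C,).
    A hypothetical simplification of per-instance channel gating."""
    scale = 1.0 / (1.0 + np.exp(-gate_logits))       # learned scaling in (0, 1)
    keep = (gate_logits >= threshold).astype(float)  # per-channel skip decision
    return features * (scale * keep)[:, None, None]  # broadcast over H, W
```

Because `gate_logits` would come from a small per-input network, different inputs skip different channels, which is what makes the inference dynamics fully input-dependent.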
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic
meaning of an image by exploiting certain manipulation techniques such as
copy-clone, object splicing, and removal, which can mislead viewers. At the
same time, identifying these manipulations is a very challenging task, as
manipulated regions are not visually apparent. This paper proposes a
high-confidence manipulation localization architecture that utilizes
resampling features, Long-Short Term Memory (LSTM) cells, and an encoder-decoder
network to segment manipulated regions from non-manipulated ones.
Resampling features are used to capture artifacts like JPEG quality loss,
upsampling, downsampling, rotation, and shearing. The proposed network exploits
larger receptive fields (spatial maps) and frequency domain correlation to
analyze the discriminative characteristics between manipulated and
non-manipulated regions by incorporating the encoder and LSTM networks. Finally,
the decoder network learns the mapping from low-resolution feature maps to
pixel-wise predictions for image tamper localization. With the predicted mask
provided by the final (softmax) layer of the proposed architecture, end-to-end
training is performed to learn the network parameters through back-propagation
using ground-truth masks. Furthermore, a large image splicing dataset is
introduced to guide the training process. The proposed method is capable of
localizing image manipulations at pixel level with high precision, which is
demonstrated through rigorous experimentation on three diverse datasets.
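The decoder's mapping from low-resolution feature maps to pixel-wise predictions can be caricatured as upsampling followed by a softmax. Nearest-neighbour upsampling here is a stand-in for the learned decoder in this hypothetical sketch, not the paper's network:

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along `axis`."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_mask(low_res_logits, out_h, out_w):
    """Map low-resolution two-class logits (2, h, w) to a pixel-wise
    tamper-probability mask. Nearest-neighbour upsampling stands in for
    the learned decoder in this simplified sketch."""
    _, h, w = low_res_logits.shape
    rows = np.arange(out_h) * h // out_h      # nearest source row per output row
    cols = np.arange(out_w) * w // out_w      # nearest source column
    up = low_res_logits[:, rows][:, :, cols]  # upsampled logits
    return softmax(up, axis=0)[1]             # P(manipulated) per pixel
```

Training would compare this mask against the ground-truth mask with a pixel-wise loss, which is what drives the end-to-end back-propagation described above.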
- …