
    A Novel Image Compression Method Based on Classified Energy and Pattern Building Blocks

    In this paper, a novel image compression method based on the generation of so-called classified energy and pattern blocks (CEPB) is introduced and evaluation results are presented. The CEPB is constructed from training images and then placed at both the transmitter and receiver sides of the communication system. The energy and pattern blocks of the input images to be reconstructed are then determined in the same way as in the construction of the CEPB. This process is accompanied by a matching procedure that determines the index numbers of the classified energy and pattern blocks in the CEPB that best represent (match) the energy and pattern blocks of the input images. The encoding parameters are the block scaling coefficient and the index numbers of the energy and pattern blocks, determined for each block of the input images. These parameters are sent from the transmitter to the receiver, where the classified energy and pattern blocks associated with the index numbers are pulled from the CEPB. The input image is then reconstructed block by block at the receiver using a proposed mathematical model. Evaluation results show that the method provides considerable image compression ratios and image quality even at low bit rates.
    The work described in this paper was funded by the Isik University Scientific Research Fund (Project contract no. 10B301). The author would like to thank Professor B. S. Yarman (Istanbul University, College of Engineering, Department of Electrical-Electronics Engineering), Assistant Professor Hakan Gurkan (Isik University, Engineering Faculty, Department of Electrical-Electronics Engineering), the researchers in the International Computer Science Institute (ICSI), Speech Group, University of California at Berkeley, CA, USA, and the researchers in the SRI International Speech Technology and Research (STAR) Laboratory, Menlo Park, CA, USA, for many helpful discussions on this work during his years as a postdoctoral fellow. The author would also like to thank the anonymous reviewers for their valuable comments and suggestions, which substantially improved the quality of this paper.
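The encode/match/reconstruct loop described in this abstract can be sketched roughly as follows. The abstract does not state the reconstruction model, so this sketch assumes (purely for illustration) that a block is approximated by a scalar times the element-wise product of one energy block and one pattern block. The names `template`, `encode_block`, and `decode_block` are hypothetical, and the exhaustive search is a stand-in for whatever matching procedure the paper actually uses.

```python
from itertools import product


def template(energy, pattern):
    # ASSUMED model: element-wise product of an energy block and a pattern block.
    return [e * p for e, p in zip(energy, pattern)]


def encode_block(block, energy_blocks, pattern_blocks):
    """Return (scale, energy_index, pattern_index) minimising squared error."""
    best = None
    for ie, ip in product(range(len(energy_blocks)), range(len(pattern_blocks))):
        t = template(energy_blocks[ie], pattern_blocks[ip])
        tt = sum(v * v for v in t)
        if tt == 0:
            continue  # an all-zero template cannot be scaled to fit anything
        # Least-squares optimal scaling coefficient for this template.
        c = sum(b * v for b, v in zip(block, t)) / tt
        err = sum((b - c * v) ** 2 for b, v in zip(block, t))
        if best is None or err < best[0]:
            best = (err, c, ie, ip)
    if best is None:
        raise ValueError("codebook contains no usable template")
    _, c, ie, ip = best
    return c, ie, ip


def decode_block(params, energy_blocks, pattern_blocks):
    """Rebuild a block at the receiver from the three transmitted parameters."""
    c, ie, ip = params
    return [c * v for v in template(energy_blocks[ie], pattern_blocks[ip])]


# Toy shared codebook, present at both transmitter and receiver.
energy_blocks = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
pattern_blocks = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]]

block = [2.0, -4.0, 6.0, -8.0]
params = encode_block(block, energy_blocks, pattern_blocks)
restored = decode_block(params, energy_blocks, pattern_blocks)
```

Only the three values in `params` would need to be transmitted per block, which is where the compression comes from under this assumed model.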

    Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

    It is desirable to train convolutional networks (CNNs) to run more efficiently during inference. In many cases, however, the computational budget that the system has for inference cannot be known beforehand during training, or the inference budget depends on changing real-time resource availability. Thus, it is inadequate to train just inference-efficient CNNs, whose inference costs are not adjustable and cannot adapt to varied inference budgets. We propose a novel approach for cost-adjustable inference in CNNs: Stochastic Downsampling Point (SDPoint). During training, SDPoint applies feature map downsampling at a random point in the layer hierarchy, with a random downsampling ratio. The different stochastic downsampling configurations, known as SDPoint instances (of the same model), have computational costs different from each other, while being trained to minimize the same prediction loss. Sharing network parameters across different instances provides a significant regularization boost. During inference, one may handpick an SDPoint instance that best fits the inference budget. The effectiveness of SDPoint, as both a cost-adjustable inference approach and a regularizer, is validated through extensive experiments on image classification.
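A minimal sketch of the idea, under stated assumptions: feature maps are plain 2D lists, "layers" are arbitrary functions on them, downsampling is nearest-neighbour, and cost is approximated by the number of spatial positions each layer processes. None of these choices come from the abstract, and the names `downsample`, `sample_instance`, and `forward` are hypothetical.

```python
import random


def downsample(fmap, ratio):
    """Nearest-neighbour resize of a 2D feature map (list of rows) by `ratio`."""
    h, w = len(fmap), len(fmap[0])
    nh, nw = max(1, round(h * ratio)), max(1, round(w * ratio))
    return [[fmap[i * h // nh][j * w // nw] for j in range(nw)] for i in range(nh)]


def sample_instance(num_layers, ratios, rng):
    """Pick a random downsampling point and ratio; point None means 'no downsampling'."""
    point = rng.choice([None] + list(range(num_layers)))
    ratio = rng.choice(ratios) if point is not None else 1.0
    return point, ratio


def forward(fmap, layers, instance):
    """Run all layers, downsampling once just before layer `point`."""
    point, ratio = instance
    cost = 0
    for idx, layer in enumerate(layers):
        if idx == point:
            fmap = downsample(fmap, ratio)
        fmap = layer(fmap)
        cost += len(fmap) * len(fmap[0])  # proxy cost: positions processed
    return fmap, cost


# Three toy layers that share the same parameters across all instances.
layers = [lambda m: [[2 * v for v in row] for row in m]] * 3
image = [[1] * 8 for _ in range(8)]

full_out, full_cost = forward(image, layers, (None, 1.0))
cheap_out, cheap_cost = forward(image, layers, (1, 0.5))

# During training, a fresh instance would be drawn for each step.
rng = random.Random(0)
point, ratio = sample_instance(len(layers), [0.5, 0.75], rng)
```

At inference time one would fix the `(point, ratio)` pair whose cost fits the available budget rather than sampling it.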

    Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review

    With large volumes of social media data being created daily and the parallel rise of realistic multimedia tampering methods, detecting and localising tampering in images and videos has become essential. This survey focusses on approaches for tampering detection in multimedia data using deep learning models. Specifically, it presents a detailed analysis of publicly available benchmark datasets for malicious manipulation detection. It also offers a comprehensive list of tampering clues and commonly used deep learning architectures. Next, it discusses the current state-of-the-art tampering detection methods, categorizing them into meaningful types such as deepfake detection methods, splice tampering detection methods, and copy-move tampering detection methods, and discussing their strengths and weaknesses. Top results achieved on benchmark datasets, a comparison of deep learning approaches against traditional methods, and critical insights from recent tampering detection methods are also discussed. Lastly, research gaps, future directions, and conclusions are discussed to provide an in-depth understanding of the tampering detection research arena.

    Fully Dynamic Inference with Deep Neural Networks

    Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction. Their cross-domain success, however, is often achieved at the expense of computational cost, high memory bandwidth, and long inference latency, which prevents their deployment in resource-constrained and time-sensitive scenarios, such as edge-side inference and self-driving cars. While recently developed methods for creating efficient deep neural networks are making their real-world deployment more feasible by reducing model size, they do not fully exploit input properties on a per-instance basis to maximize computational efficiency and task accuracy. In particular, most existing methods typically use a one-size-fits-all approach that identically processes all inputs. Motivated by the fact that different images require different feature embeddings to be accurately classified, we propose a fully dynamic paradigm that imparts deep convolutional neural networks with hierarchical inference dynamics at the level of layers and individual convolutional filters/channels. Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped. L-Net and C-Net also learn how to scale retained computation outputs to maximize task accuracy. By integrating L-Net and C-Net into a joint design framework, called LC-Net, we consistently outperform state-of-the-art dynamic frameworks with respect to both efficiency and classification accuracy. On the CIFAR-10 dataset, LC-Net results in up to 11.9× fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4× fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
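The layer-level part of this idea (skip layers judged redundant for a given input, scale the outputs of the ones that are kept) might be sketched as below. This is an illustration only: the residual-style update, the sigmoid gate driven by the input mean, the 0.5 threshold, and the names `toy_gate` and `gated_forward` are all assumptions, not the L-Net design from the paper.

```python
import math


def toy_gate(x, weights, biases):
    """Hypothetical stand-in for a gating network: one saliency in (0, 1) per block."""
    mean = sum(x) / len(x)
    return [1.0 / (1.0 + math.exp(-(w * mean + b))) for w, b in zip(weights, biases)]


def gated_forward(x, blocks, saliency, threshold=0.5):
    """Skip blocks whose saliency is below `threshold`; scale the rest by it."""
    executed = 0
    for block, s in zip(blocks, saliency):
        if s < threshold:
            continue  # block judged redundant for this input: no compute spent
        # ASSUMED residual form: x <- x + s * block(x)
        x = [xi + s * ri for xi, ri in zip(x, block(x))]
        executed += 1
    return x, executed


# Two toy blocks; the second is skipped for this input.
blocks = [lambda v: [1.0 for _ in v], lambda v: [10.0 for _ in v]]
out, executed = gated_forward([0.0, 0.0], blocks, saliency=[0.8, 0.2])

# Per-instance saliencies would normally come from the gate itself.
saliency = toy_gate([0.0, 0.0], weights=[1.0, -1.0], biases=[0.0, 0.0])
```

The same pattern could be applied per channel instead of per block, which is roughly the role the abstract assigns to C-Net.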

    Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

    With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting certain manipulation techniques such as copy-clone, object splicing, and removal, which mislead the viewers. In contrast, the identification of these manipulations becomes a very challenging task as manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture which utilizes resampling features, Long-Short Term Memory (LSTM) cells, and an encoder-decoder network to segment out manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts like JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating the encoder and LSTM network. Finally, the decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final (softmax) layer of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.