3,196 research outputs found

    Cycle-IR: Deep Cyclic Image Retargeting

    Supervised deep learning techniques have achieved great success in various fields because they remove the limitations of handcrafted representations. However, most previous image retargeting algorithms still rely on fixed design principles, such as computing a saliency map from a gradient map or handcrafted features, which inevitably restricts their generality. Deep learning techniques may help address this issue, but training a deep retargeting model would seem to require a large-scale image retargeting dataset, and building such a dataset demands enormous human effort. In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, which is the first to perform image retargeting with a single deep model without relying on any explicit user annotations. Our idea is built on the reverse mapping from the retargeted images to the given input images. If the retargeted image suffers serious distortion or excessive loss of important visual information, the reverse mapping is unlikely to restore the input image well. We enforce this forward-reverse consistency by introducing a cyclic perception coherence loss. In addition, we propose a simple yet effective image retargeting network (IRNet) to implement the retargeting process. IRNet contains a spatial and channel attention layer that effectively discriminates the visually important regions of input images, especially in cluttered scenes. Given an input image of arbitrary size and a desired aspect ratio, Cycle-IR produces a visually pleasing target image directly. Extensive experiments on the standard RetargetMe dataset show the superiority of Cycle-IR; it also outperforms the Multiop method and obtains the best result in the user study. Code is available at https://github.com/mintanwei/Cycle-IR. (Comment: 12 pages)
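
    The cyclic idea above fits in a few lines. Below is a minimal sketch of a forward-reverse coherence loss, assuming PyTorch and two hypothetical callables, retarget_net and feature_extractor; it illustrates the constraint, not the authors' exact IRNet or loss weighting.

```python
import torch.nn.functional as F

def cycle_coherence_loss(retarget_net, feature_extractor, x, target_size):
    """Forward-reverse consistency in the spirit of Cycle-IR; both
    callables are hypothetical placeholders, not the paper's modules."""
    y = retarget_net(x, target_size)        # forward: retarget to target size
    x_back = retarget_net(y, x.shape[-2:])  # reverse: map back to input size
    # Penalize perceptual differences between the input and its reconstruction.
    return F.l1_loss(feature_extractor(x_back), feature_extractor(x))
```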

    Fast Video Retargeting Based on Seam Carving with Parental Labeling

    Seam carving is a state-of-the-art content-aware image resizing technique that effectively preserves the salient areas of an image. When applied to video retargeting, however, it is not only time intensive but also introduces highly visible frame-to-frame discontinuities. In this paper, we propose a novel video retargeting method based on seam carving. First, for a single frame, we locate and remove several seams at once instead of one seam at a time. Second, we use a dynamic spatiotemporal buffer of energy maps and a standard-deviation operator to carve out the same seams across a temporal cube of frames with low variation in energy. Finally, we employ an improved energy function that accounts for motion detected by frame differencing. In our tests, these enhancements yield a 93 percent reduction in processing time and higher frame-to-frame consistency, showing superior performance compared to existing video retargeting methods.
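
    For reference, the single-seam core that this method extends is a short dynamic program. The NumPy sketch below is generic seam carving, not the paper's multi-seam, temporally buffered variant: it finds one minimum-energy vertical seam.

```python
import numpy as np

def min_vertical_seam(energy):
    """Dynamic-programming search for the minimum-energy vertical seam."""
    cost = energy.astype(np.float64)
    h, w = cost.shape
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]   # parent at column j-1
        right = np.r_[cost[i - 1, 1:], np.inf]   # parent at column j+1
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack the cheapest path from the bottom row upward.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]  # one column index per row, top to bottom
```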

    Image Resizing by Reconstruction from Deep Features

    Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while preserving important content. In this paper we perform image resizing in feature space, where the deep layers of a neural network contain rich, semantically important information. We directly adjust the image feature maps extracted from a pre-trained classification network and reconstruct the resized image using neural-network based optimization. This novel approach leverages the hierarchical encoding of the network, and in particular the high-level discriminative power of its deeper layers, which recognize semantic objects and regions and allow their aspect ratios to be maintained. Reconstructing from deep features diminishes the artifacts introduced by image-space resizing operators. We evaluate our method on benchmarks, compare it to alternative approaches, and demonstrate its strength on challenging images. (Comment: 13 pages, 21 figures)
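
    A minimal sketch of the feature-space idea, assuming PyTorch/torchvision: resize the deep feature maps of a frozen VGG-19 encoder to the target size, then optimize the output image so its features match. The layer choice (relu4_1), loss, and step counts are illustrative assumptions, not the authors' exact optimization.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen encoder up to relu4_1 (stride 8); an illustrative layer choice.
enc = vgg19(weights="IMAGENET1K_V1").features[:21].eval()
for p in enc.parameters():
    p.requires_grad_(False)

def resize_in_feature_space(img, out_hw, steps=300, lr=0.05):
    """img: (1, 3, H, W) float tensor; out_hw: (H', W'), assumed
    divisible by 8 so the feature grids line up."""
    with torch.no_grad():
        # Adjust the feature maps themselves to the target spatial size.
        target = F.interpolate(enc(img), size=(out_hw[0] // 8, out_hw[1] // 8),
                               mode="bilinear", align_corners=False)
    # Initialize from a plain bilinear resize, then refine by optimization.
    out = F.interpolate(img, size=out_hw, mode="bilinear",
                        align_corners=False).clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(enc(out), target).backward()
        opt.step()
    return out.detach()
```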

    Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure

    Existing face sketch synthesis (FSS) similarity measures are sensitive to slight image degradation (e.g., noise, blur). Human perception of the similarity of two sketches, however, treats both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches. Consequently, existing similarity measures can assign a better algorithm a lower score than a worse one. This unreliable evaluation has significantly hindered the development of the FSS field. To solve this problem, we propose a novel and robust style similarity measure called the Scoot-measure (Structure CO-Occurrence Texture measure), which simultaneously evaluates "block-level" spatial structure and co-occurrence texture statistics. In addition, we propose 4 new meta-measures and create 2 new datasets to perform a comprehensive evaluation of several widely used FSS measures on two large databases. Experimental results demonstrate that our measure not only provides a reliable evaluation but also achieves significantly improved performance. Specifically, the study indicates a substantially higher correlation with human judgment for our measure (78.8%) than for the best prior measure (58.6%). Our code will be made available. (Comment: 9 pages, 15 figures, conference)
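
    To make the "block-level structure plus co-occurrence texture" idea concrete, here is a loose Python/scikit-image sketch; the grid size, quantization, and cosine comparison are assumptions, not the published Scoot formulation.

```python
import numpy as np
from skimage.feature import graycomatrix

def block_texture_descriptor(gray, levels=8):
    # Quantize to a few gray levels, then take a normalized co-occurrence
    # matrix in two directions as the block's texture statistics.
    q = (gray.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0.0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return glcm.ravel()

def scoot_like_similarity(sketch_a, sketch_b, grid=4):
    # Same-sized grayscale sketches, dimensions divisible by `grid` assumed.
    h, w = sketch_a.shape
    bh, bw = h // grid, w // grid
    feats = []
    for img in (sketch_a, sketch_b):
        blocks = [block_texture_descriptor(img[i:i + bh, j:j + bw])
                  for i in range(0, h, bh) for j in range(0, w, bw)]
        feats.append(np.concatenate(blocks))
    a, b = feats
    # Cosine similarity between the concatenated block descriptors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```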

    A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting

    There is an increasing need for efficient image retargeting techniques that adapt content to various forms of digital media. With the rapid growth of mobile communications and dynamic web page layouts, media content must often be resized to fit the desired display size. Across varied web-page layouts and the typically small screens of handheld devices, uniform scaling obscures the important content of the original image. Images therefore need to be resized in a content-aware manner that automatically discards irrelevant information and gives salient features greater prominence. Several image retargeting techniques have been proposed with the content of the input image in mind, but they fail to be effective across diverse kinds of images and target sizes. The core problem is their inability to resize images with minimal visual distortion while also retaining the meaning the image conveys. In this dissertation, we present a novel perspective on content-aware image retargeting that is readily implementable in real time. We introduce a novel method of analysing semantic information within the input image while also maintaining its important and visually significant features. We develop the various nuances of our algorithm mathematically and logically, and show that the results improve on state-of-the-art techniques. (Comment: 74 pages, 46 figures, Master's thesis)

    CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth

    Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model; changing the camera model therefore requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that takes the camera parameters into account, allowing neural networks to learn calibration-aware patterns. Experiments confirm that this considerably improves the generalization capabilities of depth prediction networks and clearly outperforms the state of the art when the training and test images are acquired with different cameras. (Comment: Camera-ready version for CVPR 2019. Project page: http://webdiis.unizar.es/~jmfacil/camconvs)
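
    The core trick can be sketched as a convolution that sees extra per-pixel channels derived from the intrinsics. The PyTorch layer below uses centered coordinates and field-of-view maps as a simplified stand-in for the paper's exact channel set.

```python
import torch
import torch.nn as nn

class CamConv2d(nn.Module):
    """Convolution with 4 extra calibration-derived input channels
    (a simplified sketch of the CAM-Convs idea, not the released code)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 4, out_ch, k, padding=k // 2)

    def forward(self, x, fx, fy, cx, cy):
        b, _, h, w = x.shape
        u = torch.arange(w, device=x.device, dtype=x.dtype).expand(h, w)
        v = torch.arange(h, device=x.device, dtype=x.dtype).unsqueeze(1).expand(h, w)
        ccx = (u - cx) / fx              # centered, focal-normalized x
        ccy = (v - cy) / fy              # centered, focal-normalized y
        # Per-pixel horizontal/vertical viewing angles (field-of-view maps).
        extra = torch.stack([ccx, ccy, torch.atan(ccx), torch.atan(ccy)])
        extra = extra.unsqueeze(0).expand(b, -1, -1, -1)
        return self.conv(torch.cat([x, extra], dim=1))
```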

    Recognizing Partial Biometric Patterns

    Biometric recognition on partially captured targets is challenging: only a few partial observations of an object are available for matching. Deep learning based methods are widely applied in this area to match partially captured objects, which arise from occlusion, posture variation, or the object being partially out of view, in person re-identification and partial face recognition. However, most current methods cannot identify an individual when some parts of the object are unavailable, while the remainder are specialized to certain constrained scenarios. To this end, we propose a robust general framework for arbitrary biometric matching scenarios without constraints on alignment or input size. We introduce a feature post-processing step to handle the feature maps produced by a fully convolutional network (FCN), together with a dictionary-learning-based Spatial Feature Reconstruction (SFR) to match feature maps of different sizes. Moreover, the batch-hard triplet loss is applied to optimize the model. The applicability and effectiveness of the proposed method are demonstrated by experiments on three person re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial-person datasets (Partial REID and Partial iLIDS), and two partial-face datasets (CASIA-NIR-Distance and Partial LFW), on which it achieves state-of-the-art performance compared with several competing approaches. The code is released online at https://github.com/lingxiao-he/Partial-Person-ReID. (Comment: 13 pages, 11 figures)
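
    The SFR matching step can be approximated in a few lines: treat the gallery feature map's spatial columns as a dictionary and reconstruct the probe's columns from it. The ridge-regularized solver below is a stand-in for the paper's dictionary-learning formulation.

```python
import numpy as np

def sfr_distance(probe_feat, gallery_feat, lam=0.1):
    """probe_feat, gallery_feat: (d, h, w) feature maps, possibly of
    different spatial sizes; a smaller distance means a better match."""
    X = probe_feat.reshape(probe_feat.shape[0], -1)      # d x n probe columns
    D = gallery_feat.reshape(gallery_feat.shape[0], -1)  # d x m "dictionary"
    # Ridge-regularized reconstruction coefficients W solving X ~= D @ W.
    W = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ X)
    return float(np.linalg.norm(X - D @ W))
```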

    ISWAR: An Imaging System with Watermarking and Attack Resilience

    With the explosive growth of internet technology, digital multimedia can be transferred with ease. However, the convenience with which authorized users can access information turns out to be a mixed blessing, owing to information piracy. The emerging field of Digital Rights Management (DRM) addresses issues related to the intellectual property rights of digital content. In this paper, an object-oriented (OO) DRM system, called "Imaging System with Watermarking and Attack Resilience" (ISWAR), is presented that generates and authenticates color images with embedded mechanisms for protection against infringement of ownership rights as well as security attacks. In addition to methods, in the object-oriented sense, for traditional encryption and decryption, the system implements methods for visible and invisible watermarking. This paper presents one visible and one invisible watermarking algorithm that have been integrated into the system. The qualitative and quantitative results obtained for these two algorithms on several benchmark images indicate that they produce high-quality watermarked images. Experimental results further demonstrate that the presented invisible watermarking technique is resilient to the well-known benchmark attacks and hence provides a fail-safe method for protecting ownership rights.
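
    As a generic illustration of the two watermark classes ISWAR integrates, the NumPy sketch below shows a textbook alpha-blend visible watermark and an LSB invisible watermark; these are stand-ins, not the paper's specific algorithms.

```python
import numpy as np

def visible_watermark(host, mark, alpha=0.3):
    # Alpha-blend the watermark into the host image (uint8, same shape).
    return (host * (1 - alpha) + mark * alpha).astype(np.uint8)

def embed_invisible(host, bits):
    # Hide one bit per pixel in the least significant bit of the host.
    flat = host.ravel().copy()
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(host.shape)

def extract_invisible(marked, n_bits):
    # Recover the hidden bit string from the LSB plane.
    return marked.ravel()[:n_bits] & 1
```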

    Texture Segmentation Based Video Compression Using Convolutional Neural Networks

    As demand for web-based video consumption increases, there has been growing interest in approaches that improve the coding efficiency of modern video codecs. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in the texture regions of a video, achieving potential coding gains with the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
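
    The pipeline shape, sketched in PyTorch under assumptions: a small CNN (a toy stand-in for the paper's segmentation network) scores blocks as texture, and texture regions are reconstructed by warping the previous frame with a global affine motion model rather than the AV1 integration itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the paper's texture-segmentation CNN: scores a
# (3, h, w) block as texture vs. non-texture.
texture_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))

def reconstruct_texture(prev_frame, affine_2x3):
    """Warp the previous frame (3, H, W) with a global affine motion
    model given as a 2x3 tensor."""
    theta = affine_2x3.unsqueeze(0)                       # (1, 2, 3)
    grid = F.affine_grid(theta, prev_frame.unsqueeze(0).shape,
                         align_corners=False)
    return F.grid_sample(prev_frame.unsqueeze(0), grid,
                         align_corners=False)[0]
```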

    Engineering Deep Representations for Modeling Aesthetic Perception

    Many aesthetic models in computer vision suffer from two shortcomings: 1) low descriptiveness and interpretability of hand-crafted aesthetic criteria (i.e., they are not indicative of region-level aesthetics), and 2) the difficulty of engineering aesthetic features adaptively and automatically for different image sets. To remedy these problems, we develop a deep architecture that learns aesthetically relevant visual attributes from Flickr, localized by multiple textual attributes in a weakly supervised setting. More specifically, using a bag-of-words (BoW) representation of frequent Flickr image tags, a sparsity-constrained subspace algorithm discovers a compact set of textual attributes (e.g., landscape and sunset) for each image. A weakly supervised learning algorithm then projects the image-level textual attributes onto highly responsive image patches at the pixel level. These patches indicate where humans find appealing regions with respect to each textual attribute, and they are used to learn the visual attributes. Psychological and anatomical studies have shown that humans perceive visual concepts hierarchically; hence, we normalize these patches and feed them into a five-layer convolutional neural network (CNN) to mimic this hierarchy of perception. We apply the learned deep features to image retargeting, aesthetics ranking, and retrieval. Both subjective and objective experimental results thoroughly demonstrate the competitiveness of our approach.
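
    The textual-attribute discovery step can be illustrated with off-the-shelf tools: a bag-of-words over per-image tags followed by a sparsity-constrained decomposition. scikit-learn's SparsePCA below stands in for the paper's subspace algorithm, and the tag data is toy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import SparsePCA

# One tag string per Flickr image (toy data for illustration).
tag_docs = ["landscape sunset beach sky",
            "portrait bokeh sunset face",
            "city night skyline lights"]

bow = CountVectorizer().fit_transform(tag_docs).toarray()     # BoW of tags
subspace = SparsePCA(n_components=2, random_state=0).fit(bow)
print(subspace.components_)  # sparse rows ~ compact textual attributes
```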