Cycle-IR: Deep Cyclic Image Retargeting
Supervised deep learning techniques have achieved great success in various fields because they remove the limitations of handcrafted representations. However, most previous image retargeting algorithms still employ fixed design principles, such as computing saliency maps from gradient maps or handcrafted features, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but the challenge is that training deep retargeting models requires a large-scale image retargeting dataset, and building such a dataset demands enormous human effort.
In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, the first to perform image retargeting with a single deep model without relying on any explicit user annotations. Our idea is built on
the reverse mapping from the retargeted images to the given input images. If
the retargeted image has serious distortion or excessive loss of important
visual information, the reverse mapping is unlikely to restore the input image
well. We constrain this forward-reverse consistency by introducing a cyclic
perception coherence loss. In addition, we propose a simple yet effective image
retargeting network (IRNet) to implement the image retargeting process. Our
IRNet contains a spatial and channel attention layer, which is able to
discriminate visually important regions of input images effectively, especially
in cluttered images. Given arbitrary sizes of input images and desired aspect
ratios, our Cycle-IR can produce visually pleasing target images directly.
Extensive experiments on the standard RetargetMe dataset show the superiority
of our Cycle-IR. In addition, our Cycle-IR outperforms the Multiop method and
obtains the best result in the user study. Code is available at
https://github.com/mintanwei/Cycle-IR.
Comment: 12 pages.
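The cyclic constraint lends itself to a compact training objective. Below is a minimal, hypothetical PyTorch sketch of such a forward-reverse coherence loss; `retarget_net` and `feature_extractor` are placeholders standing in for the paper's IRNet and a perceptual feature backbone, whose exact details the abstract does not give.
```python
import torch.nn.functional as F

def cyclic_coherence_loss(retarget_net, feature_extractor, x, target_size):
    """Forward-retarget x, map the result back to the input size, and
    penalize perceptual differences between input and reconstruction."""
    y = retarget_net(x, target_size)        # forward mapping: input -> retargeted
    x_rec = retarget_net(y, x.shape[-2:])   # reverse mapping: back to input size
    f_in = feature_extractor(x)             # deep features of the original image
    f_rec = feature_extractor(x_rec)        # deep features of the reconstruction
    return F.l1_loss(f_rec, f_in)           # cyclic perception coherence term
```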
Fast Video Retargeting Based on Seam Carving with Parental Labeling
Seam carving is a state-of-the-art content-aware image resizing technique
that effectively preserves the salient areas of an image. However, when applied
to video retargeting, not only is it time intensive, but it also creates highly
visible frame-wise discontinuities. In this paper, we propose a novel video
retargeting method based on seam carving. First, for a single frame, we locate and remove several seams at once instead of one. Second, we use a dynamic spatiotemporal buffer of energy maps and a standard deviation operator to carve out the same seams across a temporal cube of frames with low variation in energy. Finally, we employ an improved energy function that accounts for motion detected by frame differencing. In our tests, these enhancements result in a 93 percent reduction in processing time and higher frame-wise consistency, showing superior performance compared to existing video retargeting methods.
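For readers unfamiliar with the base technique, here is a minimal NumPy sketch of gradient-energy seam carving for a single vertical seam. The paper's contributions, removing several seams per pass, sharing seams across a temporal cube via a buffer of energy maps, and adding a motion term to the energy, are not reproduced here.
```python
import numpy as np

def energy_map(gray):
    """Simple gradient-magnitude energy; the paper adds a motion term."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.abs(gx) + np.abs(gy)

def remove_vertical_seam(gray):
    """Remove the lowest-energy vertical seam from a 2-D grayscale frame."""
    h, w = gray.shape
    cum = energy_map(gray)
    for i in range(1, h):                           # dynamic-programming pass
        left = np.r_[np.inf, cum[i - 1, :-1]]
        right = np.r_[cum[i - 1, 1:], np.inf]
        cum[i] += np.minimum(np.minimum(left, cum[i - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = np.argmin(cum[-1])
    for i in range(h - 2, -1, -1):                  # backtrack the cheapest path
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + np.argmin(cum[i, lo:hi])
    mask = np.ones_like(gray, dtype=bool)
    mask[np.arange(h), seam] = False
    return gray[mask].reshape(h, w - 1)
```
Calling `remove_vertical_seam` repeatedly narrows a frame one column at a time; the reported speedup comes from batching this work across several seams and across frames with similar energy.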
Image Resizing by Reconstruction from Deep Features
Traditional image resizing methods usually work in pixel space and use
various saliency measures. The challenge is to adjust the image shape while
trying to preserve important content. In this paper we perform image resizing
in feature space, where the deep layers of a neural network contain rich, semantically important information. We directly adjust the image feature maps,
extracted from a pre-trained classification network, and reconstruct the
resized image using a neural-network based optimization. This novel approach
leverages the hierarchical encoding of the network, and in particular the high-level discriminative power of its deeper layers, which recognize semantic objects and regions and allow their aspect ratios to be maintained. Our use of reconstruction from deep features diminishes the artifacts introduced by image-space resizing operators. We evaluate our method on benchmarks, compare it to alternative approaches, and demonstrate its strength on challenging images.
Comment: 13 pages, 21 figures.
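The overall recipe, resize the feature maps rather than the pixels and then optimize an image to match them, can be sketched as follows. This is a hedged illustration only: the layer choice (VGG-16 up to relu3_3), bilinear feature resizing, and optimizer settings are our assumptions rather than the paper's exact method, and the target size is assumed divisible by the network's downsampling factor of 4.
```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()  # up to relu3_3
for p in features.parameters():
    p.requires_grad_(False)

def resize_via_features(img, out_hw, steps=300, lr=0.05):
    """img: (1, 3, H, W) tensor in [0, 1]; out_hw: (H', W') target size."""
    with torch.no_grad():                     # feature-space "resizing" target
        target = F.interpolate(features(img),
                               size=(out_hw[0] // 4, out_hw[1] // 4),
                               mode="bilinear", align_corners=False)
    out = F.interpolate(img, size=out_hw, mode="bilinear",
                        align_corners=False).clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=lr)
    for _ in range(steps):                    # reconstruct by optimization
        opt.zero_grad()
        F.mse_loss(features(out), target).backward()
        opt.step()
    return out.detach().clamp(0, 1)
```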
Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure
Existing face sketch synthesis (FSS) similarity measures are sensitive to
slight image degradation (e.g., noise, blur). Human perception of the similarity of two sketches, however, considers both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.
Consequently, the use of existing similarity measures can lead to better
algorithms receiving a lower score than worse algorithms. This unreliable
evaluation has significantly hindered the development of the FSS field. To
solve this problem, we propose a novel and robust style similarity measure
called Scoot-measure (Structure CO-Occurrence Texture Measure), which
simultaneously evaluates "block-level" spatial structure and co-occurrence
texture statistics. In addition, we further propose 4 new meta-measures and
create 2 new datasets to perform a comprehensive evaluation of several
widely-used FSS measures on two large databases. Experimental results
demonstrate that our measure not only provides a reliable evaluation but also
achieves significantly improved performance. Specifically, our study indicates a higher degree of correlation (78.8%) between our measure and human judgment than the best prior measure (58.6%). Our code will be made available.
Comment: 9 pages, 15 figures, conference.
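As a rough illustration of the co-occurrence texture statistics the measure builds on, the following NumPy sketch computes a normalized gray-level co-occurrence matrix and compares two sketches with a total-variation score. The quantization, single offset, and comparison here are our simplifications, not the published Scoot-measure.
```python
import numpy as np

def glcm(gray, levels=8, dy=0, dx=1):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    q = np.clip((gray.astype(int) * levels) // 256, 0, levels - 1)
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]   # reference pixels
    b = q[dy:, dx:]                             # neighbours at offset (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)     # count co-occurring level pairs
    return m / m.sum()

def cooccurrence_similarity(s1, s2):
    """Compare texture statistics of two uint8 sketches; 1.0 means identical."""
    return 1.0 - 0.5 * np.abs(glcm(s1) - glcm(s2)).sum()
```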
A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting
There is an increasing requirement for efficient image retargeting techniques
to adapt the content to various forms of digital media. With rapid growth of
mobile communications and dynamic web page layouts, one often needs to resize
the media content to adapt to the desired display sizes. For various layouts of
web pages and the typically small screens of handheld portable devices, important content in the original image becomes obscured when it is resized by uniform scaling. Thus, there is a need to resize images in a content-aware manner that automatically discards irrelevant information and presents the salient features more prominently. Some image retargeting techniques have been proposed with the content of the input image in mind. However, these techniques fail to be effective across the full range of images and desired sizes.
The major problem is their inability to process images with minimal visual distortion while also retaining the meaning the image conveys. In this dissertation, we present a novel perspective on content-aware image retargeting that can be implemented in real time. We
introduce a novel method of analysing semantic information within the input
image while also maintaining the important and visually significant features.
We present the various nuances of our algorithm mathematically and logically,
and show that the results improve on state-of-the-art techniques.
Comment: 74 pages, 46 figures, Master's thesis.
CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth
Single-view depth estimation suffers from the problem that a network trained
on images from one camera does not generalize to images taken with a different
camera model. Thus, changing the camera model requires collecting an entirely
new training dataset. In this work, we propose a new type of convolution that
can take the camera parameters into account, thus allowing neural networks to
learn calibration-aware patterns. Experiments confirm that this improves the
generalization capabilities of depth prediction networks considerably, and
clearly outperforms the state of the art when the training and test images are acquired with different cameras.
Comment: Camera-ready version for CVPR 2019. Project page: http://webdiis.unizar.es/~jmfacil/camconvs
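The idea of feeding calibration into a convolution can be sketched CoordConv-style: per-pixel maps derived from the intrinsics are concatenated to the features before an ordinary convolution. The four maps below (focal-normalized centered coordinates plus the corresponding view angles) follow the spirit of CAM-Convs, but the exact channel set is our assumption.
```python
import torch
import torch.nn as nn

class CamAwareConv(nn.Module):
    """Concatenate camera-derived coordinate maps to the features, then convolve."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 4, out_ch, 3, padding=1)  # +4 camera maps

    def forward(self, feat, fx, fy, cx, cy):
        b, _, h, w = feat.shape
        u = torch.arange(w, device=feat.device).float().expand(h, w)
        v = torch.arange(h, device=feat.device).float().unsqueeze(1).expand(h, w)
        maps = torch.stack([
            (u - cx) / fx,               # centered, focal-normalized x
            (v - cy) / fy,               # centered, focal-normalized y
            torch.atan((u - cx) / fx),   # horizontal view angle per pixel
            torch.atan((v - cy) / fy),   # vertical view angle per pixel
        ]).expand(b, -1, -1, -1)
        return self.conv(torch.cat([feat, maps], dim=1))
```
Because the maps are recomputed from whatever intrinsics are passed in (scaled to feature-map resolution), the same weights can in principle serve images from different cameras, which is the point of the approach.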
Recognizing Partial Biometric Patterns
Biometric recognition of partially captured targets is challenging, as only a few partial observations of an object are available for matching. In this area, deep learning based methods are widely applied to match partially captured objects, which arise from occlusions, pose variations, or objects simply being partly out of view, in person re-identification and partial face recognition. However, most current methods cannot identify an individual when some parts of the object are unavailable, and the remaining methods are specialized to certain constrained scenarios. To this end, we propose a robust general framework for arbitrary biometric matching scenarios without limitations on alignment or input size. We introduce a feature post-processing step to handle the feature maps from a fully convolutional network (FCN), and a dictionary-learning-based Spatial Feature Reconstruction (SFR) to match feature maps of different sizes. Moreover, the batch hard triplet loss function is applied to
optimize the model. The applicability and effectiveness of the proposed method
are demonstrated by the results from experiments on three person
re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial
person datasets (Partial REID and Partial iLIDS) and two partial face datasets
(CASIA-NIR-Distance and Partial LFW), on which our method achieves state-of-the-art performance in comparison with several competing approaches. The code is released online at https://github.com/lingxiao-he/Partial-Person-ReID.
Comment: 13 pages, 11 figures.
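The batch hard triplet loss mentioned above is a standard formulation (hardest positive and hardest negative per anchor within a batch); a minimal PyTorch version, not the authors' code, looks like this:
```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """embeddings: (N, D) tensor; labels: (N,) identity labels."""
    dist = torch.cdist(embeddings, embeddings)          # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = dist.masked_fill(~same | eye, 0).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()
```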
ISWAR: An Imaging System with Watermarking and Attack Resilience
With the explosive growth of internet technology, digital multimedia can be transferred with ease. However, the convenience with which authorized users can access information turns out to be a mixed blessing due to
information piracy. The emerging field of Digital Rights Management (DRM)
systems addresses issues related to the intellectual property rights of digital
content. In this paper, an object-oriented (OO) DRM system, called "Imaging
System with Watermarking and Attack Resilience" (ISWAR), is presented that
generates and authenticates color images with embedded mechanisms for
protection against infringement of ownership rights as well as security
attacks. In addition to methods, in the object-oriented sense, for performing traditional encryption and decryption, the system implements methods for visible and invisible watermarking. This paper presents one visible and one
invisible watermarking algorithm that have been integrated in the system. The
qualitative and quantitative results obtained for these two watermarking
algorithms with several benchmark images indicate that high-quality watermarked
images are produced by the algorithms. Experimental results demonstrate that the presented invisible watermarking techniques are resilient to well-known benchmark attacks and hence offer a fail-safe method for constant protection of ownership rights.
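The abstract does not detail ISWAR's algorithms, but the two watermarking families it integrates can be illustrated with toy NumPy sketches: a visible watermark by alpha blending and an invisible one by least-significant-bit embedding. Both are generic textbook techniques, not the paper's methods.
```python
import numpy as np

def visible_watermark(image, mark, alpha=0.25, top=10, left=10):
    """Blend `mark` into `image` at (top, left); both float arrays in [0, 1]."""
    out = image.copy()
    h, w = mark.shape[:2]
    out[top:top + h, left:left + w] = ((1 - alpha) * out[top:top + h, left:left + w]
                                       + alpha * mark)
    return out

def embed_lsb(image_u8, bits):
    """Invisible watermark: write a 0/1 bit sequence into the LSBs of a uint8 image."""
    flat = image_u8.ravel().copy()
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=np.uint8)
    return flat.reshape(image_u8.shape)
```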
Texture Segmentation Based Video Compression Using Convolutional Neural Networks
There has been a growing interest in using different approaches to improve
the coding efficiency of modern video codecs in recent years as demand for
web-based video consumption increases. In this paper, we propose a model-based
approach that uses texture analysis/synthesis to reconstruct blocks in texture
regions of a video to achieve potential coding gains using the AV1 codec
developed by the Alliance for Open Media (AOM). The proposed method uses
convolutional neural networks to extract texture regions in a frame, which are
then reconstructed using a global motion model. Our preliminary results show an
increase in coding efficiency while maintaining satisfactory visual quality.
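As a hypothetical sketch of the analysis step, a small CNN could label fixed-size blocks as texture or non-texture so that texture blocks are synthesized from the global motion model instead of being coded as residuals. The architecture and the 32x32 block size below are illustrative guesses, not the authors' network.
```python
import torch.nn as nn

class TextureBlockClassifier(nn.Module):
    """Binary texture / non-texture classifier over fixed-size image blocks."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 32x32 block -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),                       # logits: texture vs. non-texture
        )

    def forward(self, blocks):                      # blocks: (N, 3, 32, 32)
        return self.net(blocks)
```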
Engineering Deep Representations for Modeling Aesthetic Perception
Many aesthetic models in computer vision suffer from two shortcomings: 1) the
low descriptiveness and interpretability of those hand-crafted aesthetic
criteria (i.e., nonindicative of region-level aesthetics), and 2) the
difficulty of engineering aesthetic features adaptively and automatically
toward different image sets. To remedy these problems, we develop a deep architecture to learn aesthetically relevant visual attributes from Flickr, which are localized by multiple textual attributes in a weakly supervised setting. More specifically, using a bag-of-words (BoW) representation of the
frequent Flickr image tags, a sparsity-constrained subspace algorithm discovers
a compact set of textual attributes (e.g., landscape and sunset) for each
image. Then, a weakly-supervised learning algorithm projects the textual
attributes at image-level to the highly-responsive image patches at
pixel-level. These patches indicate where humans attend to appealing regions with respect to each textual attribute, and they are employed to learn the visual attributes. Psychological and anatomical studies have shown that humans perceive visual concepts hierarchically. Hence, we normalize these patches and feed them into a five-layer convolutional neural network (CNN) to mimic the hierarchical way humans perceive visual attributes. We apply the learned deep
features on image retargeting, aesthetics ranking, and retrieval. Both
subjective and objective experimental results thoroughly demonstrate the
competitiveness of our approach.
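For concreteness, a five-layer patch CNN in the spirit of the architecture described above might look like the following sketch; the filter counts, strides, and feature dimensions are our assumptions, since the abstract does not specify them.
```python
import torch.nn as nn

# Five learned layers over normalized input patches; the final 64-D output
# serves as the deep aesthetic feature used for retargeting, ranking, retrieval.
five_layer_cnn = nn.Sequential(
    nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),    # layer 1
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # layer 2
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # layer 3
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 256), nn.ReLU(),                         # layer 4
    nn.Linear(256, 64),                                     # layer 5
)
```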