Deep Aesthetic Quality Assessment with Semantic Information
Human beings often assess the aesthetic quality of an image coupled with the
identification of the image's semantic content. This paper addresses the
correlation between automatic aesthetic quality assessment and semantic
recognition. We cast the assessment problem as the main task within a
multi-task deep model, and argue that the semantic recognition task offers the
key to addressing this problem. Based on convolutional neural networks, we
employ a single, simple multi-task framework to efficiently utilize the
supervision of aesthetic and semantic labels. A correlation term between the
two tasks is further introduced into the framework through inter-task
relationship learning. This term not only provides useful insight into the
correlation but also improves the assessment accuracy of the aesthetic task.
In particular, an effective strategy is developed to keep the two tasks
balanced, which facilitates optimizing the parameters of the framework.
Extensive experiments
on the challenging AVA dataset and Photo.net dataset validate the importance of
semantic recognition in aesthetic quality assessment, and demonstrate that
multi-task deep models can discover an effective aesthetic representation to
achieve state-of-the-art results.
Comment: 13 pages, 10 figures
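As a rough illustration of the approach described above, here is a minimal sketch (in PyTorch) of a multi-task network with a shared trunk, an aesthetic head, and a semantic head, where a weight lam keeps the two tasks balanced. The backbone choice, head sizes, number of semantic tags, and lam are illustrative assumptions, and the paper's explicit inter-task correlation term is omitted for brevity.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiTaskAestheticNet(nn.Module):
    def __init__(self, num_semantic_tags=29):
        super().__init__()
        backbone = models.resnet18(weights=None)     # stand-in backbone
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                  # shared convolutional trunk
        self.backbone = backbone
        self.aesthetic_head = nn.Linear(feat_dim, 2)                  # main task
        self.semantic_head = nn.Linear(feat_dim, num_semantic_tags)   # auxiliary task

    def forward(self, x):
        f = self.backbone(x)
        return self.aesthetic_head(f), self.semantic_head(f)

def multitask_loss(aes_logits, sem_logits, aes_labels, sem_tags, lam=0.1):
    # Aesthetic assessment is the main task; semantic supervision shapes the
    # shared features, with lam keeping the two tasks in balance.
    aes_loss = nn.functional.cross_entropy(aes_logits, aes_labels)
    sem_loss = nn.functional.binary_cross_entropy_with_logits(
        sem_logits, sem_tags.float())                # multi-hot semantic tags
    return aes_loss + lam * sem_loss
```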
A deep architecture for unified aesthetic prediction
Image aesthetics has become an important criterion for visual content
curation on social media sites and media content repositories. Previous work on
aesthetic prediction models in the computer vision community has focused on
aesthetic score prediction or binary image labeling. However, raw aesthetic
annotations are in the form of score histograms and provide richer and more
precise information than binary labels or mean scores. Consequently, in this
work we focus on the rarely-studied problem of predicting aesthetic score
distributions and propose a novel architecture and training procedure for our
model. Our model achieves state-of-the-art results on the standard AVA
large-scale benchmark dataset for three tasks: (i) aesthetic quality
classification; (ii) aesthetic score regression; and (iii) aesthetic score
distribution prediction, all while using one model trained only for the
distribution prediction task. We also introduce a method to modify an image
such that its predicted aesthetics changes, and use this modification to gain
insight into our model.
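To make the distribution-prediction idea concrete, here is a minimal sketch, assuming AVA-style 1-10 score histograms, of a distribution head trained with a squared-EMD loss (a common objective for this task, not necessarily the authors' exact one), from which a mean score and a binary label can both be derived.

```python
import torch
import torch.nn as nn

class DistributionHead(nn.Module):
    def __init__(self, feat_dim=512, num_bins=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_bins)

    def forward(self, feats):
        # Predicted histogram over the ten score bins.
        return torch.softmax(self.fc(feats), dim=-1)

def emd_loss(pred, target):
    # Squared Earth Mover's Distance between cumulative histograms.
    cdf_diff = torch.cumsum(pred, dim=-1) - torch.cumsum(target, dim=-1)
    return torch.mean(torch.sqrt(torch.mean(cdf_diff ** 2, dim=-1)))

def derive_outputs(pred):
    # One distribution model serves all three tasks.
    scores = torch.arange(1, 11, dtype=pred.dtype)
    mean_score = (pred * scores).sum(dim=-1)   # aesthetic score regression
    is_high_quality = mean_score > 5.0         # binary quality classification
    return mean_score, is_high_quality
```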
Aesthetic-based Clothing Recommendation
Recently, product images have gained increasing attention in clothing
recommendation since the visual appearance of clothing products has a
significant impact on consumers' decisions. Most existing methods rely on
conventional features to represent an image, such as the visual features
extracted by convolutional neural networks (CNN features), features from the
scale-invariant feature transform algorithm (SIFT features), color histograms,
and so on. Nevertheless, one important type of feature, the \emph{aesthetic
features}, is seldom considered. These features play a vital role in clothing
recommendation since a user's decision depends largely on whether the clothing
is in line with her aesthetics, yet conventional image features cannot portray
this directly. To bridge this gap, we propose to introduce aesthetic
information, which is highly relevant to user preference, into
clothing recommender systems. To achieve this, we first present the aesthetic
features extracted by a pre-trained neural network, which is a brain-inspired
deep structure trained for the aesthetic assessment task. Considering that the
aesthetic preference varies significantly from user to user and over time, we
then propose a new tensor factorization model to incorporate the aesthetic
features in a personalized manner. We conduct extensive experiments on
real-world datasets, which demonstrate that our approach can capture the
aesthetic preference of users and significantly outperform several
state-of-the-art recommendation methods.
Comment: WWW 201
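A minimal sketch of the personalization idea: a CP-style user-item-time factorization whose item representation is augmented by aesthetic features. All dimensions, the projection matrix W, and the fusion are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 16                                         # latent dimensionality
n_users, n_items, n_times, d_aes = 100, 500, 12, 128

U = rng.normal(scale=0.1, size=(n_users, K))   # user factors
V = rng.normal(scale=0.1, size=(n_items, K))   # item factors
T = rng.normal(scale=0.1, size=(n_times, K))   # time factors (e.g., months)
W = rng.normal(scale=0.1, size=(d_aes, K))     # projects aesthetic features

def predict(u, i, t, aes_feat):
    # CP-style three-way interaction; the item side mixes its own latent
    # factors with a projection of the item's aesthetic features, so the
    # user and time factors can weight aesthetics in a personalized way.
    item_vec = V[i] + aes_feat @ W
    return float(np.sum(U[u] * item_vec * T[t]))
```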
Visually-aware Recommendation with Aesthetic Features
Visual information plays a critical role in the human decision-making process.
While recent developments on visually-aware recommender systems have taken the
product image into account, none of them has considered the aesthetic aspect.
We argue that the aesthetic factor is very important in modeling and predicting
users' preferences, especially for some fashion-related domains like clothing
and jewelry. This work addresses the need to model aesthetic information in
visually-aware recommender systems. Technically speaking, we make three key
contributions in leveraging deep aesthetic features: (1) To describe the
aesthetics of products, we introduce the aesthetic features extracted from
product images by a deep aesthetic network. We incorporate these features into
the recommender system to model users' preferences in the aesthetic aspect. (2)
Since in clothing recommendation time is very important for users' decisions,
we design a new tensor decomposition model for implicit feedback data. The
aesthetic features are then injected into the basic tensor model to
capture the temporal dynamics of aesthetic preferences (e.g., seasonal
patterns). (3) We also use the aesthetic features to optimize the learning
strategy on implicit feedback data. We enrich the pairwise training samples by
considering the similarity among items in the visual space and graph space; the
key idea is that a user is likely to have similar perceptions of similar items. We
perform extensive experiments on several real-world datasets and demonstrate
the usefulness of aesthetic features and the effectiveness of our proposed
methods.
Comment: Accepted by VLDBJ. arXiv admin note: substantial text overlap with
arXiv:1809.0582
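The pairwise-enrichment idea in contribution (3) can be sketched as follows: items close to a liked item in the aesthetic/visual feature space are treated as weak positives ranked between observed positives and random negatives. The cosine threshold and the triple ordering are illustrative assumptions, not the paper's exact sampling scheme.

```python
import numpy as np

def enriched_pairs(user, pos_item, item_feats, interacted, n_items, rng,
                   sim_thresh=0.9):
    # Cosine similarity of every item to the observed positive item.
    f = item_feats[pos_item]
    sims = item_feats @ f / (np.linalg.norm(item_feats, axis=1)
                             * np.linalg.norm(f) + 1e-8)
    weak_pos = [int(j) for j in np.where(sims > sim_thresh)[0]
                if j != pos_item and j not in interacted]
    neg = int(rng.integers(n_items))               # random unobserved negative
    while neg in interacted or neg in weak_pos or neg == pos_item:
        neg = int(rng.integers(n_items))
    # Assumed preference order: observed positive > similar item > random negative.
    triples = [(user, pos_item, neg)]
    for j in weak_pos:
        triples += [(user, pos_item, j), (user, j, neg)]
    return triples
```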
Learning Photography Aesthetics with Deep CNNs
Automatic photo aesthetic assessment is a challenging artificial intelligence
task. Existing computational approaches have focused on modeling a single
aesthetic score or a class (good or bad); however, these do not provide any
details on why the photograph is good or bad, or which attributes contribute to
the quality of the photograph. To obtain both accuracy and a human-interpretable
score, we advocate learning the aesthetic attributes along with the
prediction of the overall score. For this purpose, we propose a novel multi-task
deep convolutional neural network, which jointly learns eight aesthetic
attributes along with the overall aesthetic score. We report near-human
performance in the prediction of the overall aesthetic score. To understand the
internal representation of these attributes in the learned model, we also
develop a visualization technique using back-propagation of gradients. These
visualizations highlight the important image regions for the corresponding
attributes, thus providing insights into the model's representation of these
attributes. We showcase the diversity and complexity associated with different
attributes through a qualitative analysis of the activation maps.
Comment: Accepted at The 28th Modern Artificial Intelligence and Cognitive
Science Conference
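The gradient back-propagation visualization amounts to differentiating one attribute's predicted score with respect to the input pixels. A minimal sketch is shown below, where `model` is assumed (hypothetically) to map an image to a vector of attribute scores.

```python
import torch

def attribute_saliency(model, image, attr_idx):
    # image: (1, 3, H, W) tensor; model output: (1, num_attributes + 1) scores.
    image = image.clone().requires_grad_(True)
    scores = model(image)
    scores[0, attr_idx].backward()
    # Max over color channels gives a per-pixel importance map for the attribute.
    return image.grad.abs().max(dim=1)[0].squeeze(0)
```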
A Computational Approach to Relative Aesthetics
Computational visual aesthetics has recently become an active research area.
Existing state-of-the-art methods formulate this as a binary classification
task in which a given image is predicted to be beautiful or not. In many
applications, such as image retrieval and enhancement, it is more important to
rank images by their aesthetic quality than to sort them into two categories.
Furthermore, in such applications, it may be possible that all images belong to
the same category. Hence determining the aesthetic ranking of the images is
more appropriate. To this end, we formulate a novel problem of ranking images
with respect to their aesthetic quality. We construct a new dataset of image
pairs with relative labels by carefully selecting images from the popular AVA
dataset. Unlike in aesthetics classification, there is no single threshold
which would determine the ranking order of the images across our entire
dataset. We propose a deep neural network based approach that is trained on
image pairs by incorporating principles from relative learning. Results show
that such a relative training procedure allows our network to rank the images
with higher accuracy than a state-of-the-art network trained on the same set of
images using binary labels.
Comment: ICPR 201
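A minimal sketch of training on relative labels: a single network scores both images of a pair and a margin ranking loss pushes the better-rated image's score above the other's. The siamese arrangement and margin value are common choices for relative learning, not necessarily the paper's exact setup.

```python
import torch
import torch.nn as nn

rank_loss = nn.MarginRankingLoss(margin=0.1)

def pair_step(model, img_better, img_worse):
    # The same network scores both images; the loss pushes the better image's
    # score above the worse image's score by at least the margin.
    s1 = model(img_better).squeeze(-1)
    s2 = model(img_worse).squeeze(-1)
    target = torch.ones_like(s1)     # +1: s1 should rank higher than s2
    return rank_loss(s1, s2, target)
```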
CAPTAIN: Comprehensive Composition Assistance for Photo Taking
Many people are interested in taking astonishing photos and sharing them with
others. Emerging high-tech hardware and software facilitate the ubiquity and
functionality of digital photography. Because composition matters in
photography, researchers have leveraged some common composition techniques to
assess the aesthetic quality of photos computationally. However, composition
techniques developed by professionals are far more diverse than well-documented
techniques can cover. We leverage the vast underexplored innovations in
photography for computational composition assistance. We propose a
comprehensive framework, named CAPTAIN (Composition Assistance for Photo
Taking), containing integrated deep-learned semantic detectors, sub-genre
categorization, artistic pose clustering, personalized aesthetics-based image
retrieval, and style set matching. The framework is backed by a large dataset
crawled from a photo-sharing website whose users are mostly photography
enthusiasts and professionals. The work proposes a sequence of steps that have
not been
explored in the past by researchers. The work addresses personal preferences
for composition by presenting a ranked list of photographs to the user
based on user-specified weights in the similarity measure. The matching
algorithm recognizes the best shot among a sequence of shots with respect to
the user's preferred style set. We have conducted a number of experiments on
the newly proposed components and reported findings. A user study demonstrates
that the work is useful to those taking photos.
Comment: 30 pages, 21 figures, 4 tables, submitted to IJCV (International
Journal of Computer Vision)
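The personalized retrieval step can be sketched as a weighted sum of component similarities, with the weights supplied by the user. The component names below are illustrative placeholders, not CAPTAIN's exact feature set.

```python
import numpy as np

def ranked_retrieval(query_feats, gallery_feats, weights):
    # query_feats: dict name -> (d,) vector; gallery_feats: dict name -> (n, d)
    # matrix; weights: dict name -> user-chosen importance for that component.
    n = next(iter(gallery_feats.values())).shape[0]
    score = np.zeros(n)
    for name, w in weights.items():
        q, G = query_feats[name], gallery_feats[name]
        sims = G @ q / (np.linalg.norm(G, axis=1) * np.linalg.norm(q) + 1e-8)
        score += w * sims
    return np.argsort(-score)    # gallery indices, best match first
```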
Engineering Deep Representations for Modeling Aesthetic Perception
Many aesthetic models in computer vision suffer from two shortcomings: 1) the
low descriptiveness and interpretability of hand-crafted aesthetic criteria
(i.e., they are not indicative of region-level aesthetics), and 2) the
difficulty of engineering aesthetic features adaptively and automatically
toward different image sets. To remedy these problems, we develop a deep
architecture to learn aesthetically-relevant visual attributes from Flickr,
which are localized by multiple textual attributes in a weakly-supervised
setting. More specifically, using a bag-of-words (BoW) representation of the
frequent Flickr image tags, a sparsity-constrained subspace algorithm discovers
a compact set of textual attributes (e.g., landscape and sunset) for each
image. Then, a weakly-supervised learning algorithm projects the textual
attributes at image-level to the highly-responsive image patches at
pixel-level. These patches indicate where humans find appealing regions with
respect to each textual attribute, and they are employed to learn the visual
attributes. Psychological and anatomical studies have shown that humans
perceive visual concepts hierarchically. Hence, we normalize these patches and
feed them into a five-layer convolutional neural network (CNN) to mimic the
hierarchical way humans perceive visual attributes. We apply the learned deep
features on image retargeting, aesthetics ranking, and retrieval. Both
subjective and objective experimental results thoroughly demonstrate the
competitiveness of our approach.
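The first stage (a BoW representation of tags followed by a sparsity-constrained discovery of compact textual attributes) can be sketched as follows, with scikit-learn's DictionaryLearning standing in for the paper's specific subspace algorithm; the toy tag documents are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import DictionaryLearning

# Toy tag documents, one per image; real input would be frequent Flickr tags.
tag_docs = ["landscape sunset mountain", "portrait sunset", "landscape lake"]
bow = CountVectorizer().fit_transform(tag_docs).toarray()  # images x vocabulary

# Sparse decomposition groups co-occurring tags into compact textual attributes.
dico = DictionaryLearning(n_components=2, alpha=1.0, random_state=0)
codes = dico.fit_transform(bow)   # per-image activations of each attribute
# Rows of dico.components_ are the discovered textual attributes.
```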
Image aesthetic evaluation using paralleled deep convolution neural network
Image aesthetic evaluation has attracted much attention in recent years.
Evaluation methods depend heavily on effective aesthetic features. Traditional
methods always extract hand-crafted features. However, these hand-crafted
features are typically designed to fit particular datasets, and extracting them
requires special design. Rather than extracting hand-crafted features, this
paper first adopts automatic learning of aesthetic features based on a deep
convolutional neural network (DCNN). For a given training dataset, a DCNN
architecture with high complexity may suffer from over-fitting, while an
architecture with low complexity may not extract effective features
efficiently. For these reasons, we further propose a paralleled convolutional
neural network (PDCNN) with multi-level structures to automatically adapt to
the training dataset. Experimental results show that our proposed PDCNN
architecture achieves better performance than other traditional methods.
Comment: 7 pages, 6 figures, 9 tables
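A minimal sketch of the paralleled idea: branches of different complexity process the same input and their pooled features are concatenated before classification, so that effective capacity can adapt to the training set. The branch shapes and depths are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PDCNNSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.shallow = nn.Sequential(                  # low-complexity branch
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.deep = nn.Sequential(                     # higher-complexity branch
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(16 + 64, num_classes)

    def forward(self, x):
        # Parallel branches see the same input; their features are concatenated.
        return self.classifier(torch.cat([self.shallow(x), self.deep(x)], dim=1))
```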
Automated Deep Photo Style Transfer
Photorealism is a complex concept that cannot easily be formulated
mathematically. Deep Photo Style Transfer is an attempt to transfer the style
of a reference image to a content image while preserving its photorealism. This
is achieved by introducing a constraint that prevents distortions in the
content image and by applying the style transfer independently for semantically
different parts of the images. In addition, an automated segmentation process
is presented that consists of a neural network based segmentation method
followed by a semantic grouping step. To further improve the results, a measure
of image aesthetics is used and elaborated. If the content and style images
are sufficiently similar, the resulting images look very realistic. With the
automation of the image segmentation, the pipeline becomes completely
independent of any user interaction, which allows for new applications.
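Schematically, the objective combines a content loss, segment-wise style losses (so style transfers only between semantically matching regions), and a photorealism regularizer. The sketch below shows only the combination, with the individual terms assumed to be computed elsewhere; the weights and mean-squared forms are illustrative, not the paper's exact formulation.

```python
import torch

def total_loss(content_feats, output_feats, style_grams, output_grams,
               photorealism_penalty, alpha=1.0, beta=100.0, gamma=1e4):
    # Content term: keep the deep features of the content image.
    l_content = torch.mean((output_feats - content_feats) ** 2)
    # Style term: match Gram matrices separately per semantic segment, so sky
    # style transfers to sky, buildings to buildings, and so on.
    l_style = sum(torch.mean((og - sg) ** 2)
                  for og, sg in zip(output_grams, style_grams))
    # Photorealism term: a regularizer (Matting-Laplacian-based in the original
    # Deep Photo Style Transfer) penalizing non-photorealistic distortions.
    return alpha * l_content + beta * l_style + gamma * photorealism_penalty
```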