6,760 research outputs found
Aesthetic-Driven Image Enhancement by Adversarial Learning
We introduce EnhanceGAN, an adversarial-learning-based model that performs
automatic image enhancement. Traditional image enhancement frameworks typically
involve training models in a fully-supervised manner, which requires expensive
annotations in the form of aligned image pairs. In contrast to these
approaches, our proposed EnhanceGAN only requires weak supervision (binary
labels on image aesthetic quality) and is able to learn enhancement operators
for the task of aesthetic-based image enhancement. In particular, we show the
effectiveness of a piecewise color enhancement module trained with weak
supervision, and extend the proposed EnhanceGAN framework to learning a deep
filtering-based aesthetic enhancer. The full differentiability of our image
enhancement operators enables the training of EnhanceGAN in an end-to-end
manner. We further demonstrate the capability of EnhanceGAN in learning
aesthetic-based image cropping without any groundtruth cropping pairs. Our
weakly-supervised EnhanceGAN reports competitive quantitative results on
aesthetic-based color enhancement as well as automatic image cropping, and a
user study confirms that our image enhancement results are on par with, or
even preferred over, professional enhancement.
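The abstract mentions a fully differentiable piecewise color enhancement module but does not specify it. As a rough illustration only (hypothetical code, not the authors' implementation), the sketch below applies a piecewise-linear tone curve with K learned segment slopes to a pixel intensity in [0, 1]; in a GAN setting the slopes would be predicted by the generator, and because the mapping is piecewise-linear in its parameters the whole operator admits end-to-end gradient-based training:

```python
def apply_piecewise_curve(x, slopes):
    """Map intensity x in [0, 1] through a piecewise-linear curve.

    slopes: K non-negative segment slopes (hypothetical learned
    parameters); they are normalized so the curve maps 0 -> 0 and 1 -> 1.
    """
    K = len(slopes)
    total = sum(slopes)
    norm = [s / total for s in slopes]          # normalized segment rises
    # cumulative curve heights at the K + 1 knot points
    knots = [0.0]
    for s in norm:
        knots.append(knots[-1] + s)
    seg = min(int(x * K), K - 1)                # segment containing x
    frac = x * K - seg                          # position within segment
    return knots[seg] + frac * norm[seg]
```

With equal slopes the curve reduces to the identity; raising the first slope brightens shadows while the normalization keeps the output range fixed.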
Improving Outfit Recommendation with Co-supervision of Fashion Generation
The task of fashion recommendation includes two main challenges: visual
understanding and visual matching. Visual understanding aims to extract
effective visual features. Visual matching aims to model a human notion of
compatibility to compute a match between fashion items. Most previous studies
rely on recommendation loss alone to guide visual understanding and matching.
Although the features captured by these methods describe basic characteristics
(e.g., color, texture, shape) of the input items, they are not directly related
to the visual signals of the output items (to be recommended). This is
problematic because the aesthetic characteristics (e.g., style, design), based
on which we can directly infer the output items, are lacking. Features are
learned under the recommendation loss alone, where the supervision signal is
simply whether the given two items are matched or not. To address this problem,
we propose a neural co-supervision learning framework, called the FAshion
Recommendation Machine (FARM). FARM improves visual understanding by
incorporating the supervision of generation loss, which we hypothesize to be
able to better encode aesthetic information. FARM enhances visual matching by
introducing a novel layer-to-layer matching mechanism that fuses aesthetic
information more effectively, while avoiding overemphasizing generation
quality at the expense of recommendation performance.
Extensive experiments on two publicly available datasets show that FARM
outperforms state-of-the-art models on outfit recommendation, in terms of AUC
and MRR. Detailed analyses of generated and recommended items demonstrate that
FARM can encode better features and generate high quality images as references
to improve recommendation performance.
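The co-supervision idea is to train under a recommendation loss and a generation loss jointly. The sketch below is a minimal illustration under assumed loss choices (binary cross-entropy for the match label, mean squared error for reconstruction, and a weighting factor `lam`); FARM's actual losses and weighting are not given in this abstract:

```python
import math

def bce(pred, label):
    """Recommendation loss: binary cross-entropy on the match label."""
    eps = 1e-7
    p = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def mse(gen, target):
    """Generation loss: mean squared error between generated and
    target pixels/features."""
    return sum((g - t) ** 2 for g, t in zip(gen, target)) / len(gen)

def co_supervision_loss(pred, label, gen, target, lam=0.3):
    """Joint objective: recommendation loss plus weighted generation
    loss, so visual features are shaped by both supervision signals."""
    return bce(pred, label) + lam * mse(gen, target)
```

Setting `lam` too high would reproduce exactly the failure mode the abstract warns about: optimizing generation quality while ignoring recommendation performance.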
Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
Image aesthetics assessment (IAA) is a challenging task due to its highly
subjective nature. Most of the current studies rely on large-scale datasets
(e.g., AVA and AADB) to learn a general model for all kinds of photography
images. However, little light has been shed on measuring the aesthetic quality
of artistic images, and the existing datasets only contain relatively few
artworks. Such a defect is a great obstacle to the aesthetic assessment of
artistic images. To fill the gap in the field of artistic image aesthetics
assessment (AIAA), we first introduce a large-scale AIAA dataset: Boldbrush
Artistic Image Dataset (BAID), which consists of 60,337 artistic images
covering various art forms, with more than 360,000 votes from online users. We
then propose a new method, SAAN (Style-specific Art Assessment Network), which
can effectively extract and utilize style-specific and generic aesthetic
information to evaluate artistic images. Experiments demonstrate that our
proposed approach outperforms existing IAA methods on the proposed BAID dataset
according to quantitative comparisons. We believe the proposed dataset and
method can serve as a foundation for future AIAA works and inspire more
research in this field. Dataset and code are available at:
https://github.com/Dreemurr-T/BAID.git
Comment: Accepted by CVPR 2023
Towards Learning Representations in Visual Computing Tasks
The performance of most visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data, such as missing samples and noisy input caused by undesired external factors of variation, and should reduce data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.
Feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires domain expertise and manual labor; however, the resulting feature extraction process is interpretable and explainable. The next group contains latent-feature extraction processes. While the original features lie in a high-dimensional space, the factors relevant to a task often lie on a lower-dimensional manifold. Latent-feature extraction employs hidden variables to expose underlying data properties that cannot be directly measured from the input, and imposes a specific structure, such as sparsity or low rank, on the derived representation through sophisticated optimization techniques. The last category is that of deep features, obtained by passing raw input data with minimal pre-processing through a deep network whose parameters are computed by iteratively minimizing a task-based loss.
In this dissertation, I present four pieces of work in which I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For the last two tasks, I propose novel deep architectures and show significant improvement over previous state-of-the-art approaches. A suitable combination of feature representations, augmented with an appropriate learning approach, can increase performance for most visual computing tasks.
Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
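The third task ranks a pair of images by aesthetic quality. The dissertation's exact objective is not given here; a common choice for such pairwise ranking, shown below purely as an illustrative sketch, is a hinge (margin) loss that penalizes the model when the more aesthetic image fails to outscore the other by a margin:

```python
def pairwise_ranking_loss(score_hi, score_lo, margin=1.0):
    """Hinge ranking loss for a pair of images.

    score_hi: model score for the image labeled more aesthetic.
    score_lo: model score for the other image.
    The loss is zero once score_hi exceeds score_lo by `margin`;
    otherwise it grows linearly with the violation.
    """
    return max(0.0, margin - (score_hi - score_lo))
```

Training on pairs with this kind of loss only constrains score differences, which suits the subjective nature of aesthetics better than regressing an absolute score.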
Subjective Impression Prediction Using Complexity-Related Features
Degree type: Doctorate (course-based). Examination committee: (Chair) Associate Professor Toshihiko Yamasaki (The University of Tokyo); Professor Kiyoharu Aizawa (The University of Tokyo); Professor Shin'ichi Satoh (National Institute of Informatics); Professor Yoichi Sato (The University of Tokyo); Professor Takeshi Naemura (The University of Tokyo). University of Tokyo (東京大学)