Region-Aware Portrait Retouching with Sparse Interactive Guidance
Portrait retouching aims to improve the aesthetic quality of input portrait
photos and especially requires human-region priority. Deep learning-based
methods have greatly improved retouching efficiency and provide
promising retouched results. However, existing portrait retouching methods
focus on automatic retouching, which treats all human regions equally and
ignores users' preferences for specific individuals, thus suffering from
limited flexibility in interactive scenarios. In this work, we emphasize the
importance of users' intents and explore the interactive portrait retouching
task. Specifically, we propose a region-aware retouching framework with two
branches: an automatic branch and an interactive branch. The automatic
branch involves an encoding-decoding process, which searches region candidates
and performs automatic region-aware retouching without user guidance. The
interactive branch encodes sparse user guidance into a priority condition
vector and modulates latent features with a region selection module to further
emphasize the user-specified regions. Experimental results show that our
interactive branch effectively captures users' intents and generalizes well to
unseen scenes with sparse user guidance, while our automatic branch also
outperforms the state-of-the-art retouching methods due to improved
region-awareness.
Enabling Seamless Access to Digital Graphical Contents for Visually Impaired Individuals via Semantic-Aware Processing
Vision is one of the main sources through which people obtain information from the world, but unfortunately, visually-impaired people are partially or completely deprived of this type of information. With the help of computer technologies, people with visual impairment can independently access digital textual information by using text-to-speech and text-to-Braille software. However, in general, there still exists a major barrier for people who are blind to access the graphical information independently in real-time without the help of sighted people. In this paper, we propose a novel multi-level and multi-modal approach aiming at addressing this challenging and practical problem, with the key idea being semantic-aware visual-to-tactile conversion through semantic image categorization and segmentation, and semantic-driven image simplification. An end-to-end prototype system was built based on the approach. We present the details of the approach and the system, report sample experimental results with realistic data, and compare our approach with current typical practice.
Intuitive, Interactive Beard and Hair Synthesis with Generative Models
We present an interactive approach to synthesizing realistic variations in
facial hair in images, ranging from subtle edits to existing hair to the
addition of complex and challenging hair in images of clean-shaven subjects. To
circumvent the tedious and computationally expensive tasks of modeling,
rendering and compositing the 3D geometry of the target hairstyle using the
traditional graphics pipeline, we employ a neural network pipeline that
synthesizes realistic and detailed images of facial hair directly in the target
image in under one second. The synthesis is controlled by simple and sparse
guide strokes from the user defining the general structural and color
properties of the target hairstyle. We qualitatively and quantitatively
evaluate our chosen method compared to several alternative approaches. We show
compelling interactive editing results with a prototype user interface that
allows novice users to progressively refine the generated image to match their
desired hairstyle, and demonstrate that our approach also allows for flexible
and high-fidelity scalp hair synthesis.
Comment: To be presented in the 2020 Conference on Computer Vision and Pattern
Recognition (CVPR 2020, Oral Presentation). Supplementary video can be seen
at: https://www.youtube.com/watch?v=v4qOtBATrv
Improving the Accuracy of Beauty Product Recommendations by Assessing Face Illumination Quality
We focus on addressing the challenges in responsible beauty product
recommendation, particularly when it involves comparing the product's color
with a person's skin tone, such as for foundation and concealer products. To
make accurate recommendations, it is crucial to infer both the product
attributes and the product specific facial features such as skin conditions or
tone. However, while many product photos are taken under good light conditions,
face photos are taken under a wide range of conditions. Features extracted
from photos taken in poorly illuminated environments can be highly misleading or
even impossible to compare with the product attributes. Hence, poor
illumination can severely degrade the quality of the recommendation.
We introduce a machine learning framework for illumination assessment which
classifies images into having either good or bad illumination condition. We
then build an automatic user guidance tool which informs a user holding their
camera if their illumination condition is good or bad. This way, the user is
provided with rapid feedback and can interactively control how the photo is
taken for their recommendation. Only a few studies are dedicated to this
problem, mostly due to the lack of a dataset that is large, labeled, and diverse
both in terms of skin tones and light patterns. The lack of such a dataset leads to
neglecting skin tone diversity. Therefore, we begin by constructing a diverse
synthetic dataset that simulates various skin tones and light patterns in
addition to an existing facial image dataset. Next, we train a Convolutional
Neural Network (CNN) for illumination assessment that outperforms the existing
solutions using the synthetic dataset. Finally, we analyze how our work
improves the shade recommendation for various foundation products.
Comment: 7 pages, 5 figures. Presented in FAccTRec202
High-Quality Face Caricature via Style Translation
Caricature is an exaggerated form of artistic portraiture that accentuates
unique yet subtle characteristics of human faces. Recently, advancements in
deep end-to-end techniques have yielded encouraging outcomes in capturing both
style and elevated exaggerations in creating face caricatures. Most of these
approaches tend to produce cartoon-like results that could be more practical
for real-world applications. In this study, we propose a high-quality,
unpaired face caricature method that is appropriate for use in the real world
and uses computer vision techniques and GAN models. We attain the exaggeration
of facial features and the stylization of appearance through a two-step
process: face caricature generation and face caricature projection. The face
caricature generation step creates new caricature face datasets from real
images and trains a generative model using the real and newly created
caricature datasets. The face caricature projection step employs an encoder trained
with real and caricature faces with the pretrained generator to project real
and caricature faces. We perform an incremental facial exaggeration from the
real image to the caricature faces using the encoder and generator's latent
space. Our projection preserves the facial identity, attributes, and
expressions from the input image. Also, it accounts for facial occlusions, such
as reading glasses or sunglasses, to enhance the robustness of our model.
Furthermore, we conducted a comprehensive comparison of our approach with
various state-of-the-art face caricature methods, highlighting our process's
distinctiveness and exceptional realism.
Comment: 14 pages, 21 figures
Probabilistic framework for image understanding applications using Bayesian Networks
Machine learning algorithms have been successfully utilized in various systems and devices. They can improve the usability and quality of such systems in terms of intelligent user interfaces, fast performance, and, more importantly, high accuracy. In this research, machine learning techniques are used in the field of image understanding, a research area shared by image analysis and computer vision, which involves a higher level of processing of a target image to make sense of the scene captured in it. A general probabilistic framework for image understanding is discussed, covering topics associated with (i) collection of images to generate a comprehensive and valid database, (ii) generation of an unbiased ground truth for that database, (iii) selection of classification features and elimination of the redundant ones, and (iv) usage of such information to test a new sample set. Two research projects have been developed as examples of the general image understanding framework: identification of region(s) of interest, and image segmentation evaluation. These techniques, in addition to others, are combined in an object-oriented rendering system for printing applications. The discussion included in this doctoral dissertation explores the means for developing such a system from an image understanding/processing aspect. It is worth noting that this work does not aim to develop a printing system; it only proposes to add some essential features to current printing pipelines to achieve better visual quality when printing images and photos. Hence, we assume that image regions have been successfully extracted from the printed document. These images are used as input to the proposed object-oriented rendering algorithm, where methodologies for color image segmentation, region-of-interest identification, and semantic feature extraction are employed.
Probabilistic approaches based on Bayesian statistics have been utilized to develop the proposed image understanding techniques.