Mask-guided Style Transfer Network for Purifying Real Images
Recent progress in learning-by-synthesis has made it possible to train models
on synthetic images, which can effectively reduce the cost in human and
material resources. However, because synthetic images follow a different
distribution from real images, the desired performance cannot be achieved.
To solve this problem, previous methods learned a model to improve the
realism of the synthetic images. In contrast, this paper tries to purify real
images by extracting discriminative and robust features to convert outdoor
real images to indoor synthetic images. In this paper, we
first introduce the segmentation masks to construct RGB-mask pairs as inputs,
then we design a mask-guided style transfer network to learn style features
separately from the attention and background regions and learn content
features from the full and attention regions. Moreover, we propose a novel
region-level task-guided loss to constrain the features learnt from style and
content. Both qualitative and quantitative experiments were performed to
demonstrate the possibility of purifying real images in complex directions. We
evaluate the proposed method on various public datasets, including LPW, COCO
and MPIIGaze. Experimental results show that the proposed method is effective
and achieves state-of-the-art results.
Comment: arXiv admin note: substantial text overlap with arXiv:1903.0582
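As an illustration of learning style features separately from the attention and background regions, here is a minimal sketch of region-wise style statistics computed with a masked Gram matrix. The abstract does not say how the style features are computed, so the Gram-matrix formulation (common in neural style transfer), the function name `masked_gram`, and the toy feature and mask values are all assumptions, not the authors' method.

```python
from typing import List

Matrix = List[List[float]]


def masked_gram(features: Matrix, mask: List[float]) -> Matrix:
    """Gram matrix of channel features, weighted by a region mask.

    features: one row per channel, each row a flattened H*W feature map.
    mask: flattened H*W segmentation mask with values in [0, 1].
    """
    area = sum(mask)
    if area == 0:
        raise ValueError("mask selects no pixels")
    channels = len(features)
    gram = [[0.0] * channels for _ in range(channels)]
    for i in range(channels):
        for j in range(channels):
            # Each pixel contributes in proportion to its mask value.
            total = sum(m * a * b
                        for m, a, b in zip(mask, features[i], features[j]))
            gram[i][j] = total / area  # normalise by region size
    return gram


# Toy RGB-mask style input: 2 feature channels over a 2x2 image (4 pixels).
features = [[1.0, 2.0, 3.0, 4.0],
            [0.0, 1.0, 0.0, 1.0]]
attention_mask = [1.0, 1.0, 0.0, 0.0]                 # hypothetical attention region
background_mask = [1.0 - m for m in attention_mask]   # complement = background

style_attention = masked_gram(features, attention_mask)
style_background = masked_gram(features, background_mask)
```

Keeping the two Gram matrices separate is what lets a region-level loss treat the attention and background regions differently; how the paper weights them is not stated in the abstract.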
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for egocentric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better than traditional features, (4)
as opposed to hand regions, the manipulation point is a strongly influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, the manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects.
Comment: presented at WACV 201
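The recurrent model described in this abstract (per-frame deep features integrated over time by a gated recurrent unit to predict the next fixation) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the feature and hidden sizes, the random untrained weights, the sigmoid readout to a normalised (x, y) fixation, and the names `GRUCell` and `predict_fixation` are all hypothetical, and the deep feature extractor is replaced by toy vectors.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU update with biases omitted for brevity."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Input-to-hidden (w*) and hidden-to-hidden (u*) weights per gate.
        self.wz = rand_matrix(hidden_size, input_size, rng)
        self.uz = rand_matrix(hidden_size, hidden_size, rng)
        self.wr = rand_matrix(hidden_size, input_size, rng)
        self.ur = rand_matrix(hidden_size, hidden_size, rng)
        self.wh = rand_matrix(hidden_size, input_size, rng)
        self.uh = rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x: Vector, h: Vector) -> Vector:
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        # Candidate state computed from the reset-gated previous state.
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.wh, x), matvec(self.uh, gated))]
        # Convention used here: z = 1 fully replaces the old state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def predict_fixation(frame_features: List[Vector], cell: GRUCell,
                     readout: Matrix) -> Vector:
    """Run the GRU over per-frame features; map the last state to (x, y) in (0, 1)."""
    h = [0.0] * cell.hidden_size
    for x in frame_features:
        h = cell.step(x, h)
    return [sigmoid(v) for v in matvec(readout, h)]


# Toy stand-in for deep features: 5 frames, 4-dimensional features each.
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
cell = GRUCell(input_size=4, hidden_size=3, seed=0)
readout = rand_matrix(2, 3, random.Random(2))  # hypothetical linear readout
fixation = predict_fixation(frames, cell, readout)
```

With untrained weights the predicted fixation is meaningless; the sketch only shows how information from successive frames is accumulated in the hidden state before a single readout.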