78 research outputs found

Perceptually Guided Photo Retargeting

Author: Hong Richang
Nie Liqiang
Shao Ling
Xia Yingjie
Yan Yan
Zhang Luming
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/04/2016

We propose perceptually guided photo retargeting, which shrinks a photo by simulating a human's process of sequentially perceiving visually/semantically important regions in a photo. In particular, we first project the local features (graphlets in this paper) onto a semantic space, wherein visual cues such as global spatial layout and rough geometric context are exploited. Thereafter, a sparsity-constrained learning algorithm is derived to select semantically representative graphlets of a photo, and the selection process can be interpreted as a path that simulates how a human actively perceives semantics in a photo. Furthermore, we learn the prior distribution of such active graphlet paths (AGPs) from training photos that are marked as esthetically pleasing by multiple users. The learned priors enforce the corresponding AGP of a retargeted photo to be maximally similar to those from the training photos. On top of the retargeting model, we further design an online learning scheme to incrementally update the model with new photos that are esthetically pleasing. The online update module makes the algorithm less dependent on the number and contents of the initial training data. Experimental results show that: 1) the proposed AGP is over 90% consistent with human gaze-shifting paths, as verified by the eye-tracking data, and 2) the retargeting algorithm outperforms its competitors significantly, as AGP is more indicative of photo esthetics than conventional saliency maps.

Crossref

University of East Anglia digital repository

Recycle-GAN: Unsupervised Video Retargeting

Author: C Cao
C Liu
E Hsu
J Walker
N Kholgade
O Ronneberger
O Russakovsky
Qi-Xing Huang
Publication date: 15/08/2018

We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style. Our approach combines both spatial and temporal information along with adversarial losses for content translation and style preservation. In this work, we first study the advantages of using spatiotemporal constraints over spatial constraints for effective retargeting. We then demonstrate the proposed approach on problems where information in both space and time matters, such as face-to-face translation, flower-to-flower, wind and cloud synthesis, sunrise and sunset.

Comment: ECCV 2018; Please refer to project webpage for videos - http://www.cs.cmu.edu/~aayushb/Recycle-GA

arXiv.org e-Print Archive

Crossref

FastShrinkage: Perceptually-aware retargeting toward mobile platforms

Author: LIU Wei
LIU Zhenguang
SHAH Rajiv Ratn
WANG Zepeng
XIA Yingjie
YANG Yi
ZHANG Luming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2017

Institutional Knowledge at Singapore Management University

VITON: An Image-based Virtual Try-on Network

Author: Davis Larry S.
Han Xintong
Wu Zhe
Wu Zuxuan
Yu Ruichi
Publication date: 12/06/2018

We present an image-based Virtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initial blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply it to the person in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art generative models.

arXiv.org e-Print Archive

Crossref

Intelligent visual media processing: when graphics meets vision

Author: Cheng Ming-Ming
Hou Qi-Bin
Rosin Paul L
Zhang Song-Hai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the Internet has created a demand for dealing with the ever increasing, vast amount of resources; ii) powerful processing tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data; iii) new data capture devices, such as the Kinect, bridge between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still in the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.

Online Research @ Cardiff

Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

Author: Matthews I
Theobald B
Publication date: 01/01/2012

We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data and the performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than that of sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicator of viewer perception of quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.

University of East Anglia digital repository
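
The dynamic time warp cost mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the local distance or the step pattern, so the Euclidean frame distance and the standard three-way step below are assumptions, and the name `dtw_cost` is hypothetical.

```python
import math


def dtw_cost(synth, truth):
    """Cumulative cost of the cheapest monotonic alignment between two
    sequences of parameter vectors (assumed: Euclidean frame distance,
    standard match/insert/delete steps)."""
    n, m = len(synth), len(truth)
    # cost[i][j] = cheapest alignment of the first i synthesized frames
    # with the first j ground-truth frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(synth[i - 1], truth[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # synthesized frame repeated in time
                cost[i][j - 1],      # ground-truth frame repeated in time
                cost[i - 1][j - 1],  # one-to-one frame match
            )
    return cost[n][m]
```

Under these assumptions, a synthesized sequence that is only stretched in time relative to the ground truth yields a cost of zero, whereas a frame-by-frame error measure would penalize the timing difference.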