Context-aware Synthesis for Video Frame Interpolation

Authors: Liu Feng; Niklaus Simon
Publication date: 29/03/2018

Video frame interpolation algorithms typically estimate optical flow or its variations and then use it to guide the synthesis of an intermediate frame between two consecutive original frames. To handle challenges like occlusion, bidirectional flow between the two input frames is often estimated and used to warp and blend the input frames. However, effectively blending the two warped frames remains a challenging problem. This paper presents a context-aware synthesis approach that warps not only the input frames but also their pixel-wise contextual information and uses them to interpolate a high-quality intermediate frame. Specifically, we first use a pre-trained neural network to extract per-pixel contextual information for the input frames. We then employ a state-of-the-art optical flow algorithm to estimate bidirectional flow between them and pre-warp both input frames and their context maps. Finally, unlike common approaches that blend the pre-warped frames, our method feeds them and their context maps to a video frame synthesis neural network to produce the interpolated frame in a context-aware fashion. Our neural network is fully convolutional and is trained end to end. Our experiments show that our method can handle challenging scenarios such as occlusion and large motion and outperforms representative state-of-the-art approaches.

Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy
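
As a rough illustration of the baseline this abstract contrasts against (warping both input frames with optical flow and blending the results), the sketch below backward-warps two frames and averages them. It is not the authors' method, which instead feeds pre-warped frames and context maps to a synthesis network. The function names, the nearest-neighbour sampling, the equal-weight blend, and the assumption that flow is already given from the intermediate frame to each input frame are all hypothetical choices made for illustration.

```python
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` at (x + u, y + v) for every output pixel.

    frame: (H, W) array; flow: (H, W, 2) array holding (u, v) per pixel.
    Nearest-neighbour lookup with border clamping (an illustrative choice).
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def blend_midframe(frame0, frame1, flow_t0, flow_t1):
    """Warp both inputs toward the intermediate time and average them.

    flow_t0 / flow_t1 are assumed to point from the intermediate frame
    to frame0 / frame1 respectively (hypothetical inputs).
    """
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return 0.5 * (warped0 + warped1)

# Toy example: a bright pixel moves from x=1 to x=3 across the two frames.
frame0 = np.zeros((1, 5)); frame0[0, 1] = 1.0
frame1 = np.zeros((1, 5)); frame1[0, 3] = 1.0
flow_t0 = np.zeros((1, 5, 2)); flow_t0[..., 0] = -1.0  # look one pixel left in frame0
flow_t1 = np.zeros((1, 5, 2)); flow_t1[..., 0] = 1.0   # look one pixel right in frame1
mid = blend_midframe(frame0, frame1, flow_t0, flow_t1)
```

In this toy case the blended frame places the bright pixel at x=2. A plain average of this kind has no way to decide which warped frame to trust where a region is occluded in one input, which is the blending difficulty the abstract refers to.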

arXiv.org e-Print Archive

Crossref

PDXScholar (Portland State University)

GAGAN: Geometry-Aware Generative Adversarial Networks

Authors: Kossaifi Jean; Panagakis Yannis; Pantic Maja; Tran Linh
Publication date: 27/03/2018

Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, aside from their texture, the visual appearance of objects is significantly influenced by their shape geometry, information that existing generative models do not take into account. This paper introduces Geometry-Aware Generative Adversarial Networks (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that GAGAN can generate realistic images of faces with arbitrary facial attributes, such as facial expression, pose, and morphology, that are of better quality than those of current GAN-based methods. Our method can be used to augment any existing GAN architecture and improve the quality of the generated images.

arXiv.org e-Print Archive

Crossref

Video-based Sign Language Recognition without Temporal Segmentation

Authors: Huang Jie; Li Houqiang; Li Weiping; Zhang Qilin; Zhou Wengang
Publication date: 30/01/2018

Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR): isolated SLR, which recognizes word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically use isolated SLR as a building block, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large-scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.

Comment: 32nd AAAI Conference on Artificial Intelligence (AAAI-18), Feb. 2-7, 2018, New Orleans, Louisiana, US

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Unified Framework for Compositional Fitting of Active Appearance Models

Authors: Alabort-i-Medina Joan; Zafeiriou Stefanos
Publication date: 01/01/2016

Active Appearance Models (AAMs) are one of the most popular and well-established techniques for modeling deformable objects in computer vision. In this paper, we study the problem of fitting AAMs using Compositional Gradient Descent (CGD) algorithms. We present a unified and complete view of these algorithms and classify them with respect to three main characteristics: i) cost function; ii) type of composition; and iii) optimization method. Furthermore, we extend the previous view by: a) proposing a novel Bayesian cost function that can be interpreted as a general probabilistic formulation of the well-known project-out loss; b) introducing two new types of composition, asymmetric and bidirectional, that combine the gradients of both image and appearance model to derive better convergent and more robust CGD algorithms; and c) providing new valuable insights into existent CGD algorithms by reinterpreting them as direct applications of the Schur complement and the Wiberg method. Finally, in order to encourage open research and facilitate future comparisons with our work, we make the implementation of the algorithms studied in this paper publicly available as part of the Menpo Project.

Comment: 39 page

arXiv.org e-Print Archive

Springer - Publisher Connector

Spiral - Imperial College Digital Repository

Facial Expression Recognition from World Wild Web

Authors: Abdollahi Hojjat; Chan David; Hassani Behzad; Mahoor Mohammad H.; Mollahosseini Ali; Salvador Michelle J.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/01/2017

Recognizing facial expressions in a wild setting has remained a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages, and the retrieved images were mapped by two annotators to six basic expressions and neutral. Deep neural networks and noise modeling were used in three different training scenarios to determine how accurately facial expressions can be recognized when trained on noisy images collected from the web using query terms (e.g. happy face, laughing man, etc.). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.

arXiv.org e-Print Archive

Crossref

The KW-boundary hybrid digital waveguide mesh for room acoustics applications

Authors: Beeson Mark; Murphy Damian T.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2007

The digital waveguide mesh is a discrete-time simulation used to model acoustic wave propagation through a bounded medium. It can be applied to the simulation of the acoustics of rooms through the generation of impulse responses suitable for auralization purposes. However, large-scale three-dimensional mesh structures are required for high-quality results. These structures must therefore be efficient and also capable of flexible boundary implementation in terms of both geometrical layout and the possibility for improved mesh termination algorithms. The general one-dimensional N-port boundary termination is investigated, where N depends on the geometry of the modeled domain and the mesh topology used. The equivalence between physical-variable Kirchhoff-model and scattering-based wave-model boundary formulations is proved. This leads to the KW-hybrid one-dimensional N-port boundary-node termination, which is shown to be equivalent to the Kirchhoff- and wave-model cases. The KW-hybrid boundary-node is implemented as part of a new hybrid two-dimensional triangular digital waveguide mesh. This is shown to offer the possibility for large-scale, computationally efficient mesh structures for more complex shapes. It proves more accurate than a similar rectilinear mesh in terms of geometrical fit, and offers significant savings in processing time and memory use over a standard wave-based model. The new hybrid mesh also has the potential for improved real-world room boundary simulations through the inclusion of additional mixed modeling algorithms.
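
To make the Kirchhoff-model side of this abstract more concrete, the sketch below advances a one-dimensional mesh by one time step using the pressure-only update commonly given for lossless equal-impedance junctions, p(n+1) = (2/N) * (sum of neighbour pressures at n) - p(n-1), with N = 2 in one dimension. The boundary rule used, p_B(n+1) = (1 + r) * p_1(n) - r * p_B(n-1) with reflection coefficient r, is a commonly cited one-connection termination and is an assumption here, not something stated in the abstract. The function name and array layout are hypothetical, and the paper's KW-hybrid triangular mesh is not reproduced.

```python
import numpy as np

def step_1d_mesh(p_curr, p_prev, r=0.0):
    """Advance a 1-D Kirchhoff-model mesh by one time step.

    p_curr, p_prev: junction pressures at time steps n and n-1.
    r: boundary reflection coefficient (assumed termination rule).
    """
    p_next = np.zeros_like(p_curr)
    # Interior junctions: two neighbours, so (2/N) = 1.
    p_next[1:-1] = p_curr[:-2] + p_curr[2:] - p_prev[1:-1]
    # Boundary junctions: each is connected to a single interior neighbour.
    p_next[0] = (1.0 + r) * p_curr[1] - r * p_prev[0]
    p_next[-1] = (1.0 + r) * p_curr[-2] - r * p_prev[-1]
    return p_next

# Toy example: excite the centre junction of a 7-node mesh and take one step.
p_prev = np.zeros(7)
p_curr = np.zeros(7)
p_curr[3] = 1.0
p_next = step_1d_mesh(p_curr, p_prev, r=0.0)
```

After one step the excitation has moved to the two junctions adjacent to the centre. Only two pressure arrays need to be stored per step, which hints at why a Kirchhoff-model formulation can save memory relative to a wave-based model that tracks a travelling-wave component per port.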

White Rose Research Online

Disentangling geometry and appearance with regularised geometry-aware generative adversarial networks

Authors: Kossaifi J; Panagakis Y; Pantic M; Tran L
Publication venue: Springer Science and Business Media LLC
Publication date: 29/01/2019

Deep generative models have significantly advanced image generation, enabling generation of visually pleasing images with realistic texture. Apart from the texture, it is the shape geometry of objects that strongly dictates their appearance. However, currently available generative models do not incorporate geometric information into the image generation process. This often yields visual objects of degenerated quality. In this work, we propose a regularized Geometry-Aware Generative Adversarial Network (GAGAN) which disentangles appearance and shape in the latent space. This regularized GAGAN enables the generation of images with both realistic texture and shape. Specifically, we condition the generator on a statistical shape prior. The prior is enforced through mapping the generated images onto a canonical coordinate frame using a differentiable geometric transformation. In addition to incorporating geometric information, this constrains the search space and increases the model's robustness. We show that our approach is versatile, able to generalise across domains (faces, sketches, hands and cats) and sample sizes (from as little as ∼200-30,000 to more than 200,000). We demonstrate superior performance through extensive quantitative and qualitative experiments in a variety of tasks and settings. Finally, we leverage our model to automatically and accurately detect errors or drifting in facial landmark detection and tracking in-the-wild.

Spiral - Imperial College Digital Repository