Self-Tuned Deep Super Resolution

Authors: Chang Shiyu; Han Wei; Huang Thomas S.; Wang Zhangyang; Wang Zhaowen; Yang Jianchao; Yang Yingzhen
Publication date: 21/04/2015

Deep learning has been successfully applied to image super resolution (SR). In this paper, we propose a deep joint super resolution (DJSR) model that exploits both external and self similarities for SR. A Stacked Denoising Convolutional Auto Encoder (SDCAE) is first pre-trained on external examples with proper data augmentations. It is then fine-tuned with multi-scale self examples from each input, where the reliability of self examples is explicitly taken into account. We also enhance the model performance by sub-model training and selection. The DJSR model is extensively evaluated and compared with state-of-the-art methods, and shows noticeable performance improvements, both quantitatively and perceptually, on a wide range of images.

arXiv.org e-Print Archive

Ventral-stream-like shape representation : from pixel intensity values to trainable object-selective COSFIRE models

Authors: Azzopardi George; Petkov Nicolai
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Keywords: hierarchical representation, object recognition, shape, ventral stream, vision and scene understanding, robotics, handwriting analysis

The remarkable abilities of the primate visual system have inspired the construction of computational models of some visual neurons. We propose a trainable hierarchical object recognition model, which we call S-COSFIRE (S stands for Shape and COSFIRE stands for Combination Of Shifted FIlter REsponses), and use it to localize and recognize objects of interest embedded in complex scenes. It is inspired by the visual processing in the ventral stream (V1/V2 → V4 → TEO). Recognition and localization of objects embedded in complex scenes is important for many computer vision applications. Most existing methods require prior segmentation of the objects from the background, which in turn requires recognition. An S-COSFIRE filter is automatically configured to be selective for an arrangement of contour-based features that belong to a prototype shape specified by an example. The configuration comprises selecting relevant vertex detectors and determining certain blur and shift parameters. The response is computed as the weighted geometric mean of the blurred and shifted responses of the selected vertex detectors. S-COSFIRE filters share similar properties with some neurons in inferotemporal cortex, which provided inspiration for this work. We demonstrate the effectiveness of S-COSFIRE filters in two applications: letter and keyword spotting in handwritten manuscripts, and object spotting in complex scenes for the computer vision system of a domestic robot. S-COSFIRE filters are effective at recognizing and localizing (deformable) objects in images of complex scenes without requiring prior segmentation. They are versatile trainable shape detectors, conceptually simple and easy to implement. The presented hierarchical shape representation contributes to a better understanding of the brain and to more robust computer vision algorithms.
Peer-reviewed.
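
The abstract above states that the filter response is a weighted geometric mean of the selected detector responses. The following is a minimal Python sketch of that combination step only, not the authors' implementation: the function name, the plain-list inputs, and the example values are assumptions made for illustration, and the blurring and shifting of responses is left out.

```python
import math


def weighted_geometric_mean(responses, weights):
    """Combine non-negative responses r_i with positive weights w_i as
    (prod r_i ** w_i) ** (1 / sum w_i).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not responses or len(responses) != len(weights):
        raise ValueError("responses and weights must be non-empty and equal in length")
    if any(r < 0 for r in responses) or any(w <= 0 for w in weights):
        raise ValueError("responses must be >= 0 and weights must be > 0")
    # A single zero response drives the whole product to zero.
    if any(r == 0 for r in responses):
        return 0.0
    # Sum in log space to avoid overflow or underflow in the product.
    log_sum = sum(w * math.log(r) for r, w in zip(responses, weights))
    return math.exp(log_sum / sum(weights))


# Made-up detector responses at one image location.
combined = weighted_geometric_mean([4.0, 9.0], [1.0, 1.0])
missing_part = weighted_geometric_mean([4.0, 0.0], [1.0, 1.0])
```

A geometric mean behaves like a soft AND: the combined response is nonzero only when every contributing response is nonzero, which is why `missing_part` evaluates to zero.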

Proceedings - University of Groningen

Directory of Open Access Journals

Frontiers - Publisher Connector

Dissertations of the University of Groningen

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Authors: Bengio Samy; Chorowski Jan; Saurous Rif A.; Weiss Ron J.
Publication date: 08/03/2018

Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source and target utterances. Similar to image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end, we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system is able to extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or reconstruct an utterance in a different voice.
Comment: Accepted to ICASSP 201
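
The core idea in the abstract above, optimizing a cost with respect to the input rather than the network weights, can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: a small fixed linear map `W` stands in for the trained speech recognition network, a three-element list stands in for the waveform, and the cost is a plain squared feature distance rather than the activation statistics the paper matches.

```python
def features(weights, x):
    """Fixed linear 'network': one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cost_and_gradient(weights, x, target_features):
    """Squared feature distance and its gradient with respect to the input x."""
    residual = [f - t for f, t in zip(features(weights, x), target_features)]
    cost = sum(r * r for r in residual)
    # d(cost)/dx_i = 2 * sum_j W[j][i] * residual[j]
    grad = [
        2.0 * sum(weights[j][i] * residual[j] for j in range(len(weights)))
        for i in range(len(x))
    ]
    return cost, grad


def optimize_input(weights, target_features, steps=200, lr=0.1):
    """Gradient descent on the input; the weights are never updated."""
    x = [0.0] * len(weights[0])
    history = []
    for _ in range(steps):
        cost, grad = cost_and_gradient(weights, x, target_features)
        history.append(cost)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x, history


# Hypothetical stand-ins: W plays the network, the list plays a target signal.
W = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8]]
target = features(W, [0.5, -0.3, 0.9])
x_opt, costs = optimize_input(W, target)
final_cost, _ = cost_and_gradient(W, x_opt, target)
```

Because the map has fewer outputs than inputs, many inputs share the same features, so the optimized input matches the target features without necessarily equalling the original signal.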

arXiv.org e-Print Archive

End-to-End Localization and Ranking for Relative Attributes

Authors: A Shrivastava; CL Zitnick; J. R. R. Uijlings; M Rastegari; MH Kiapour; N Kumar; S Branson; S Li
Publication date: 08/08/2016

We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons. Unlike previous methods, our network jointly learns the attribute's features, localization, and ranker. The localization module of our network discovers the most informative image region for the attribute, which is then used by the ranking module to learn a ranking model of the attribute. Our end-to-end framework is also much faster than previous methods. We show state-of-the-art ranking results on various relative attribute datasets, and our qualitative localization results clearly demonstrate our network's ability to learn meaningful image patches.
Comment: Appears in European Conference on Computer Vision (ECCV), 201
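
Learning a ranker from pairwise comparisons, as the abstract above describes, is often done with a logistic loss on the difference of two scores. The Python sketch below shows that common formulation as an assumption; the abstract does not say which loss the authors use, and the function name and example scores are hypothetical.

```python
import math


def pairwise_ranking_loss(score_stronger, score_weaker):
    """Logistic pairwise loss: log(1 + exp(-(s_stronger - s_weaker))).

    Small when the image labelled as having more of the attribute
    receives the higher score. Hypothetical helper for illustration.
    """
    margin = score_stronger - score_weaker
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up scores for one image pair.
correct_order = pairwise_ranking_loss(2.0, 0.0)
tied_loss = pairwise_ranking_loss(1.0, 1.0)
wrong_order = pairwise_ranking_loss(0.0, 2.0)
```

The loss depends only on the score difference, so pairwise labels are enough for training and no absolute attribute strength is needed.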

arXiv.org e-Print Archive