2,271 research outputs found

Colorization as a Proxy Task for Visual Understanding

Author: Larsson Gustav
Maire Michael
Shakhnarovich Gregory
Publication date: 13/08/2017

We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC segmentation and classification tasks, we present results that are state-of-the-art among methods not using ImageNet labels for pretraining representations. Moreover, we present the first in-depth analysis of self-supervision via colorization, concluding that formulation of the loss, training details and network architecture play important roles in its effectiveness. This investigation is further expanded by revisiting the ImageNet pretraining paradigm, asking questions such as: How much training data is needed? How many labels are needed? How much do features change when fine-tuned? We relate these questions back to self-supervision by showing that colorization provides a similarly powerful supervisory signal as various flavors of ImageNet pretraining. Comment: CVPR 2017 (Project page: http://people.cs.uchicago.edu/~larsson/color-proxy/)

arXiv.org e-Print Archive

Crossref

Multi-scale Orderless Pooling of Deep Convolutional Activation Features

Author: D.G. Lowe
F. Perronnin
H. Jégou
J. Sanchez
S. Singh
Publication date: 01/01/2014

Deep convolutional neural networks (CNNs) have shown their promise as a universal representation for recognition. However, global CNN activations lack geometric invariance, which limits their robustness for classification and matching of highly variable scenes. To improve the invariance of CNN activations without degrading their discriminative power, this paper presents a simple but effective scheme called multi-scale orderless pooling (MOP-CNN). This scheme extracts CNN activations for local patches at multiple scale levels, performs orderless VLAD pooling of these activations at each level separately, and concatenates the result. The resulting MOP-CNN representation can be used as a generic feature for either supervised or unsupervised recognition tasks, from image classification to instance-level retrieval; it consistently outperforms global CNN activations without requiring any joint training of prediction layers for a particular target dataset. In absolute terms, it achieves state-of-the-art results on the challenging SUN397 and MIT Indoor Scenes classification datasets, and competitive results on ILSVRC2012/2013 classification and INRIA Holidays retrieval datasets.

arXiv.org e-Print Archive

CiteSeerX

Crossref
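
The pooling scheme summarized in the abstract above (per-level VLAD pooling followed by concatenation) can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for CNN patch activations, the codebooks are random rather than learned by k-means, and the names `vlad_pool` and `mop_pool` are hypothetical.

```python
import numpy as np

def vlad_pool(descriptors, centers):
    """Orderless VLAD pooling: sum the residuals of descriptors to their nearest center."""
    # Squared distances between every descriptor (n, d) and every center (k, d).
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    k, d = centers.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            vlad[i] = (members - centers[i]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    # L2-normalize so levels with many patches do not dominate the final vector.
    return vlad / norm if norm > 0 else vlad

def mop_pool(levels, codebooks):
    """Pool each scale level separately, then concatenate the per-level vectors."""
    return np.concatenate([vlad_pool(x, c) for x, c in zip(levels, codebooks)])

rng = np.random.default_rng(0)
# Assumed stand-ins for activations at three scale levels: 1, 9 and 25 patches of dimension 8.
levels = [rng.normal(size=(n, 8)) for n in (1, 9, 25)]
# Assumed codebooks with 4 centers per level (learned from training data in practice).
codebooks = [rng.normal(size=(4, 8)) for _ in levels]
feature = mop_pool(levels, codebooks)
print(feature.shape)  # (96,)
```

Because each level is pooled by summing residuals, reordering the patches within a level leaves the result unchanged, which is the "orderless" property the abstract refers to.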

Feature Learning from Spectrograms for Assessment of Personality Traits

Author: Attabi Yazid
Carbonneau Marc-André
Gagnon Ghyslain
Granger Eric
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/10/2016

Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on feature learning and spectrogram analysis is proposed to simplify the feature extraction process while maintaining a high level of accuracy. The proposed method learns a dictionary of discriminant features from patches extracted in the spectrogram representations of training speech segments. Each speech segment is then encoded using the dictionary, and the resulting feature set is used to perform classification of personality traits. Experiments indicate that the proposed method achieves state-of-the-art results with a significant reduction in complexity when compared to the most recent reference methods. The number of features and the difficulties linked to the feature extraction process are greatly reduced, as only one type of descriptor is used, for which the 6 parameters can be tuned automatically. In contrast, the simplest reference method uses 4 types of descriptors to which 6 functionals are applied, resulting in over 20 parameters to be tuned. Comment: 12 pages, 3 figures

arXiv.org e-Print Archive

On the use of SIFT features for face authentication

Author: Bicego Manuele
Grosso Enrico
Lagorio Andrea
Tistarelli Massimo
Publication venue: IEEE Computer Society
Publication date: 01/01/2006

Several pattern recognition and classification techniques have been applied to the biometrics domain. Among them, an interesting technique is the Scale Invariant Feature Transform (SIFT), originally devised for object recognition. Even though SIFT features have emerged as very powerful image descriptors, their use in the face analysis context has never been systematically investigated. This paper investigates the application of the SIFT approach in the context of face authentication. In order to determine the real potential and applicability of the method, different matching schemes are proposed and tested using the BANCA database and protocol, showing promising results.

CiteSeerX

UnissResearch

Gabor-Based RCM Features for Ear Recognition

Author: Faez Karim
Yazdanpanah Ali Pour
Publication venue: IntechOpen
Publication date: 27/07/2011

IntechOpen

Review of Person Re-identification Techniques

Author: Aini Hussain
Allouch A.
Bhattacharyya A.
Bilmes J.A.
Cong D‐N.T.
Cong T.
Corvee E.
De Oliveira I.O.
Du Y.
Forssén P.E.
Gheissari N.
Goldmann L.
Halimah Badioze Zaman
Hamdoun O.
Horprasert T.
Kawai R.
Khedher M.I.
Lantagne M.
Layne R.
Mohamad Hanif Md. Saad
Mohammad Ali Saghafi
Musa Z.B.
Nguyen H.Q.
Ohara Y.
Skog D.
Stauffer C.
Sun J.
Wang J.
Xiang J.
Yang H.
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/12/2014

Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational costs. This study summarises several developments in recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared. Comment: Published 201

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals