
    Learning object categories from Google's image search

    Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand-prepared datasets.
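    The model above builds on pLSA applied to visual words. As a point of reference only, the sketch below implements the flat pLSA base model with EM on an images-by-visual-words count matrix; the toy data, function name and topic count are illustrative, and the spatial (translation and scale invariant) extension that defines TSI-pLSA is deliberately omitted.

```python
import numpy as np

def plsa(counts, n_topics, n_iters=100, seed=0):
    """Minimal pLSA via EM on an (images x visual words) count matrix.

    counts[d, w] = occurrences of visual word w in image d.
    Returns P(w|z) and P(z|d). This is only the flat pLSA base model;
    TSI-pLSA additionally models object position/scale, omitted here.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_z_given_d = rng.random((n_docs, n_topics))
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) P(z|d)
        joint = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        expected = counts[:, None, :] * joint                        # (d, z, w)
        p_w_given_z = expected.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_given_d = expected.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_z, p_z_given_d

# Toy usage: 6 "images" over a 20-word visual vocabulary, 2 latent topics.
counts = np.random.default_rng(1).integers(0, 5, size=(6, 20))
p_w_given_z, p_z_given_d = plsa(counts, n_topics=2)
```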

    A Comparison of Affine Region Detectors


    Shape description and matching using integral invariants on eccentricity transformed images

    Matching occluded and noisy shapes is a problem frequently encountered in medical image analysis and more generally in computer vision. To keep track of changes inside the breast, for example, it is important for a computer-aided detection system to establish correspondences between regions of interest. Shape transformations, computed both with integral invariants (II) and with geodesic distance, yield signatures that are invariant to isometric deformations, such as bending and articulations. Integral invariants describe the boundaries of planar shapes. However, they provide no information about where a particular feature lies on the boundary with regard to the overall shape structure. Conversely, eccentricity transforms (Ecc) can match shapes by signatures of geodesic distance histograms based on information from inside the shape, but they ignore the boundary information. We describe a method that combines the boundary signature of a shape obtained from II and structural information from the Ecc to yield results that improve on them separately.
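    The fusion step described above can be pictured with a minimal sketch: two per-shape signatures, one boundary-based (II) and one interior-based (Ecc histogram), are compared separately and the two dissimilarities are blended. The chi-square comparison, the fixed weight alpha and the random stand-in signatures are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-square distance between two L1-normalised histograms."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def combined_dissimilarity(ii_sig_a, ii_sig_b, ecc_hist_a, ecc_hist_b, alpha=0.5):
    """Fuse a boundary-based and an interior-based shape cue.

    ii_sig_*  : histogram summarising an integral-invariant boundary signature
    ecc_hist_*: histogram of the eccentricity transform (geodesic distances)
    alpha     : weight balancing the two cues (hypothetical fusion rule)
    """
    d_boundary = chi2(ii_sig_a, ii_sig_b)
    d_interior = chi2(ecc_hist_a, ecc_hist_b)
    return alpha * d_boundary + (1.0 - alpha) * d_interior

# Toy usage with random 32-bin "signatures" standing in for real ones.
rng = np.random.default_rng(0)
a_ii, b_ii = rng.random(32), rng.random(32)
a_ecc, b_ecc = rng.random(32), rng.random(32)
print(combined_dissimilarity(a_ii, b_ii, a_ecc, b_ecc))
```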

    The PASCAL Visual Object Classes (VOC) Challenge

    The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state of the art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension. © 2009 Springer Science+Business Media, LLC.
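    For concreteness, the detection metric used by the challenge is average precision over a ranked list of detections. Below is a minimal sketch of the 11-point interpolated AP used in the early VOC years, assuming the IoU-based matching of detections to ground truth has already been performed; later editions switched to the area under the full precision/recall curve, which this sketch does not cover.

```python
import numpy as np

def voc_ap_11pt(scores, is_true_positive, n_ground_truth):
    """11-point interpolated average precision (early PASCAL VOC style).

    scores           : confidence of each detection
    is_true_positive : 1 if the detection matched an unclaimed ground-truth
                       box (IoU >= 0.5), else 0 -- matching is assumed done
    n_ground_truth   : total number of ground-truth objects for the class
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(n_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Average the best achievable precision at 11 evenly spaced recall levels.
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0

# Toy usage: 5 detections, 3 ground-truth objects.
print(voc_ap_11pt([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0], 3))
```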

    2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images

    We present a technique for estimating the spatial layout of humans in still images: the position of the head, torso and arms. The theme we explore is that once a person is localized using an upper body detector, the search for their body parts can be considerably simplified using weak constraints on position and appearance arising from that detection. Our approach is capable of estimating upper body pose in highly challenging uncontrolled images, without prior knowledge of background, clothing, lighting, or the location and scale of the person in the image. People are only required to be upright and seen from the front or the back (not side). We evaluate the stages of our approach experimentally using ground truth layout annotation on a variety of challenging material, such as images from the PASCAL VOC 2008 challenge and video frames from TV shows and feature films. We also propose and evaluate techniques for searching a video dataset for people in a specific pose. To this end, we develop three new pose descriptors and compare their classification and retrieval performance.
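    The core idea that an upper-body detection weakly constrains where the parts are searched for can be pictured with a small, purely hypothetical sketch: the fixed fractions below are illustrative stand-ins for the learned location priors, and the real method combines such constraints with appearance models during part estimation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float      # left
    y: float      # top
    w: float      # width
    h: float      # height

def part_search_regions(upper_body: Box, margin: float = 0.25):
    """Derive coarse part-search windows from an upper-body detection.

    The proportions below are illustrative stand-ins for the weak position
    priors the detection provides; the actual model uses learned appearance
    and location terms, not fixed fractions.
    """
    m_w, m_h = margin * upper_body.w, margin * upper_body.h
    head = Box(upper_body.x + 0.25 * upper_body.w,
               upper_body.y - m_h,
               0.5 * upper_body.w,
               0.5 * upper_body.h)
    torso = Box(upper_body.x, upper_body.y + 0.3 * upper_body.h,
                upper_body.w, 0.7 * upper_body.h + m_h)
    arms = Box(upper_body.x - m_w, upper_body.y + 0.2 * upper_body.h,
               upper_body.w + 2 * m_w, upper_body.h + m_h)
    return {"head": head, "torso": torso, "arms": arms}

print(part_search_regions(Box(100, 50, 80, 120)))
```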

    Smooth loss functions for deep top-k classification

    The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require O(n choose k) operations, where n is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of O(kn). Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k = 5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy.
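    The O(n choose k) blow-up mentioned above comes from summing over all k-element subsets of class scores; that sum can be organised through elementary symmetric polynomials of the exponentiated, temperature-scaled scores, which admit a simple O(kn) recursion. The sketch below shows that recursion in log space and the resulting smoothed top-k log-sum-exp term; it is a simplified stand-in, not the paper's divide-and-conquer GPU algorithm or its full margin-based loss.

```python
import numpy as np

def log_elementary_symmetric(log_x, k):
    """log e_j(exp(log_x)) for j = 0..k via the O(k n) recursion
    e_j^(i) = e_j^(i-1) + x_i * e_{j-1}^(i-1), carried out in log space
    for numerical stability.
    """
    n = len(log_x)
    log_e = np.full(k + 1, -np.inf)
    log_e[0] = 0.0                       # e_0 = 1
    for i in range(n):
        # update j from high to low so log_e[j-1] is still the previous row
        for j in range(min(i + 1, k), 0, -1):
            log_e[j] = np.logaddexp(log_e[j], log_x[i] + log_e[j - 1])
    return log_e

def smooth_topk_logsumexp(scores, k, tau=1.0):
    """A smoothed 'max over k-subsets' term:
    tau * log( sum over size-k subsets of exp(sum of scores / tau) ),
    i.e. tau * log e_k(exp(scores / tau)).  This is the kind of quantity
    the smooth top-k losses are built from; it is not the full loss.
    """
    log_e = log_elementary_symmetric(np.asarray(scores) / tau, k)
    return tau * log_e[k]

scores = np.array([2.0, 1.0, 0.5, -1.0, 3.0])
print(smooth_topk_logsumexp(scores, k=2, tau=0.1))  # approaches the sum of the 2 largest scores as tau -> 0
```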

    Relaxed softmax: efficient confidence auto-calibration for safe pedestrian detection

    As machine learning moves from the lab into the real world, reliability is often of paramount importance. The clearest examples are safety-critical applications such as pedestrian detection in autonomous driving. Since algorithms can never be expected to be perfect in all cases, managing reliability becomes crucial. To this end, in this paper we investigate the problem of learning, in an end-to-end manner, object detectors that are accurate while providing an unbiased estimate of the reliability of their own predictions. We do so by proposing a modification of the standard softmax layer where a probabilistic confidence score is explicitly pre-multiplied into the incoming activations to modulate confidence. We adopt a rigorous assessment protocol based on reliability diagrams to evaluate the quality of the resulting calibration and show excellent results in pedestrian detection on two challenging public benchmarks.
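    The layer modification described above is easy to picture: a per-sample confidence scalar multiplies the logits before the softmax, sharpening or flattening the predicted distribution. The sketch below assumes the confidence is simply passed in (in the detector it would be predicted by the network), and the function name is illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # stabilise against overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relaxed_softmax(logits, confidence):
    """Softmax with a per-sample confidence pre-multiplied into the logits.

    confidence > 1 sharpens the distribution (more certain),
    confidence < 1 flattens it (less certain); confidence would normally
    be predicted by the network, here it is just passed in.
    """
    confidence = np.asarray(confidence).reshape(-1, 1)
    return softmax(confidence * logits)

logits = np.array([[2.0, 1.0, 0.1],
                   [2.0, 1.0, 0.1]])
print(relaxed_softmax(logits, confidence=[2.0, 0.5]))
```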

    Action recognition from weak alignment of body parts

    We propose a method for human action recognition from still images that uses the silhouette and the upper body as a proxy for the pose of the person, and also to guide alignment between samples for the purpose of computing registered feature descriptors. Our contributions include an efficient algorithm, formulated as an energy minimization, for using the silhouette to align body parts between imaged human samples. The descriptors computed over the aligned body parts are incorporated, via a multiple kernel framework, together with other standard features (such as a deformable part model (DPM) and dense SIFT), to learn a classifier for each action class. Experiments on the challenging PASCAL VOC 2012 dataset show that our method exceeds the state of the art on the majority of action classes.
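    The multiple kernel combination can be sketched in its simplest fixed-weight form: one precomputed kernel per feature channel, summed with weights and fed to a kernel SVM. The equal weights, linear kernels and random stand-in features below are placeholders; a per-class learned weighting, as a multiple kernel framework would normally use, is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    """Weighted sum of precomputed kernel matrices (fixed-weight combination)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Toy usage: two feature channels (stand-ins for aligned-part descriptors
# and dense SIFT), linear kernels, one binary action class.
rng = np.random.default_rng(0)
X_parts, X_sift = rng.random((40, 64)), rng.random((40, 128))
y = rng.integers(0, 2, size=40)

K_parts = X_parts @ X_parts.T
K_sift = X_sift @ X_sift.T
K = combine_kernels([K_parts, K_sift], weights=[0.5, 0.5])

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:5]))        # predictions for the first 5 training samples
```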

    Trusting SVM for Piecewise Linear CNNs

    We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities, such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. This allows us to design an optimization algorithm with an optimal learning rate that does not require any tuning. Using the MNIST, CIFAR and ImageNet data sets, we show that our approach always improves over state-of-the-art variants of backpropagation and scales to large data and large network settings.
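    The concave-convex procedure (CCCP) at the heart of the optimization can be illustrated on a toy one-dimensional difference-of-convex function: linearise the concave part at the current iterate, minimise the resulting convex surrogate, and repeat. The objective below is an arbitrary example chosen so the surrogate has a closed-form minimiser; it stands in for, and is much simpler than, the latent structured SVM problems solved per layer in the paper.

```python
import numpy as np

# Toy DC objective f(x) = u(x) - v(x) with u(x) = x**4 and v(x) = (x - 1)**2,
# both convex.  CCCP repeatedly linearises the concave part (-v) at the
# current iterate and exactly minimises the resulting convex surrogate:
#     x_{t+1} = argmin_x  u(x) - v'(x_t) * x
# For this toy choice the surrogate minimiser has the closed form
#     4 x^3 = v'(x_t)  =>  x = cbrt(v'(x_t) / 4) = cbrt((x_t - 1) / 2).

def f(x):
    return x**4 - (x - 1.0)**2

def cccp(x0=0.0, n_iters=30):
    x = x0
    for _ in range(n_iters):
        grad_v = 2.0 * (x - 1.0)          # gradient of the convex part v at x_t
        x = np.cbrt(grad_v / 4.0)         # exact minimiser of the convex surrogate
    return x

x_star = cccp()
print(x_star, f(x_star))   # converges to x = -1, the global minimiser here
```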