249,809 research outputs found

Towards Good Practice in Large-Scale Learning for Image Classification

Author: Akata Zeynep
Harchaoui Zaid
Perronnin Florent
Schmid Cordelia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/06/2012

We propose a benchmark of several objective functions for large-scale image classification: we compare the one-vs-rest, multiclass, ranking and weighted average ranking SVMs. Using stochastic gradient descent optimization, we can scale the learning to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform a one-vs-rest strategy and that the gap between the different algorithms narrows for high-dimensional data. We also show that, for one-vs-rest, learning the optimal degree of imbalance between the positive and the negative samples through cross-validation can have a significant impact. Furthermore, early stopping can be used as an effective regularization strategy when training with stochastic gradient algorithms. Following these "good practices", we were able to improve the state of the art on a large subset of ImageNet (10K classes and 9M images) from 16.7% accuracy to 19.1%.
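As a rough illustration of the ingredients named in this abstract (a one-vs-rest hinge-loss classifier trained by SGD, a weight balancing positives against negatives, and early stopping on held-out data), here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `pos_weight` parametrisation of the imbalance, the learning rate, and the `patience` stopping rule are all assumptions made for the example.

```python
import random

def predict_score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def accuracy(w, b, X, y, target):
    """Fraction of samples where sign(score) agrees with 'label == target'."""
    hits = sum((predict_score(w, b, x) > 0) == (label == target)
               for x, label in zip(X, y))
    return hits / len(X)

def train_one_vs_rest(X, y, target, X_val, y_val, pos_weight=0.5,
                      lam=1e-3, lr=0.1, max_epochs=50, patience=3, seed=0):
    """SGD on a weighted hinge loss for class `target` versus all others.

    pos_weight (hypothetical knob) scales updates from positives;
    negatives are scaled by 1 - pos_weight. Training stops once the
    validation accuracy has not improved for `patience` epochs.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    best_w, best_b, best_acc, stale = list(w), b, -1.0, 0
    order = list(range(len(X)))
    epochs_run = 0
    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            t = 1.0 if y[i] == target else -1.0
            weight = pos_weight if t > 0 else 1.0 - pos_weight
            margin = t * predict_score(w, b, X[i])
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * weight * t * xj for wj, xj in zip(w, X[i])]
                b += lr * weight * t
        epochs_run += 1
        acc = accuracy(w, b, X_val, y_val, target)
        if acc > best_acc:
            best_w, best_b, best_acc, stale = list(w), b, acc, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best_w, best_b, epochs_run
```

In a multi-class setting, one such classifier would be trained per class and the class with the highest score predicted; `pos_weight` would be chosen by cross-validation, in the spirit of the abstract's remark about imbalance.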

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Unsupervised Visual Feature Learning with Spike-timing-dependent Plasticity: How Far are we from Traditional Feature Learning Approaches?

Author: Bilasco Ioan Marius
Boulet Pierre
Devienne Philippe
Falez Pierre
Tirilly Pierre
Publication date: 05/04/2019

Spiking neural networks (SNNs) equipped with latency coding and spike-timing-dependent plasticity rules offer an alternative that addresses the data and energy bottlenecks of standard computer vision approaches: they can learn visual features without supervision and can be implemented by ultra-low power hardware architectures. However, their performance in image classification has never been evaluated on recent image datasets. In this paper, we compare SNNs to auto-encoders on three visual recognition datasets, and extend the use of SNNs to color images. The analysis of the results helps us identify some bottlenecks of SNNs: the limits of on-center/off-center coding, especially for color images, and the ineffectiveness of current inhibition mechanisms. These issues should be addressed to build effective SNNs for image recognition.

arXiv.org e-Print Archive

Hal-Diderot

Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification

Author: Fowlkes Charless
Kong Shu
Punyasena Surangi
Publication date: 03/05/2016

We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology. We introduce a novel criterion for selecting meaningful and discriminative exemplar patches. We optimize this function during training using a greedy submodular function optimization framework that gives a near-optimal solution with bounded approximation error. We use these selected exemplars as a dictionary basis and propose a spatially-aware sparse coding method to match testing images for identification while maintaining global shape correspondence. To accelerate the coding process for fast matching, we introduce a relaxed form that uses spatially-aware soft-thresholding during coding. Finally, we carry out an experimental study that demonstrates the effectiveness and efficiency of our exemplar selection and classification mechanisms, achieving 86.13% accuracy on a difficult fine-grained species classification task distinguishing three types of fossil spruce pollen. Comment: CVMI 201
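For readers unfamiliar with soft-thresholding as a fast stand-in for sparse coding, the sketch below shows the plain operator applied to the correlations between a patch and each dictionary atom. It deliberately omits the spatial awareness the abstract describes, which is not specified here; the names `soft_threshold`, `encode` and the threshold `lam` are assumptions for illustration only.

```python
def soft_threshold(values, lam):
    """Shrink each value toward zero by lam; values within [-lam, lam] become 0."""
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out

def encode(patch, dictionary, lam):
    """One-shot approximate sparse code: threshold the atom/patch correlations.

    `dictionary` is a list of atoms, each a vector with the same length as `patch`.
    """
    correlations = [sum(a * p for a, p in zip(atom, patch)) for atom in dictionary]
    return soft_threshold(correlations, lam)
```

Compared with iterative sparse solvers, this costs a single pass over the dictionary, which is the kind of speed-up a relaxed coding step is meant to provide.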

arXiv.org e-Print Archive

Crossref

Convolutional Neural Fabrics

Author: Saxena Shreyas
Verbeek Jakob
Publication date: 04/12/2016

Despite the success of CNNs, selecting the optimal architecture for a given task remains an open problem. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyper-parameters of a fabric are the number of channels and layers. While individual architectures can be recovered as paths, the fabric can in addition ensemble all embedded architectures together, sharing their weights where their paths overlap. Parameters can be learned using standard methods based on back-propagation, at a cost that scales linearly in the fabric size. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset. Comment: Corrected typos (In proceedings of NIPS16)

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Author: Brattoli Biagio
Chalupka Krzysztof
Perona Pietro
Tighe Joseph
Zhdanov Fedor
Publication date: 03/03/2020

Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state of the art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL.
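The abstract does not say how unseen classes are scored at test time. One common ZSL setup, assumed here purely for illustration, maps a video to a vector in the same space as class-name embeddings and predicts the nearest class by cosine similarity. The sketch below shows only that final matching step; the function names and the toy embeddings are hypothetical, and the visual model that would produce `video_embedding` is out of scope.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def zero_shot_predict(video_embedding, class_embeddings):
    """Return the class name whose embedding is most similar to the video's.

    `class_embeddings` maps class names (possibly unseen during training)
    to vectors in the same space as `video_embedding`.
    """
    return max(class_embeddings,
               key=lambda name: cosine(video_embedding, class_embeddings[name]))
```

Because the class set is only an argument at prediction time, new classes can be added by supplying their embeddings, without retraining the visual model.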

arXiv.org e-Print Archive

Crossref

Caltech Authors