Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition

Authors: Caputo Barbara
Carlucci Fabio Maria
Colosi Mirco
D'Innocente Antonio
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. These images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms in on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, increases object recognition performance by up to 7% in experiments on three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Università di Roma La Sapienza
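The abstract describes the augmentation layer only as zooming in on the object of interest and simulating a robot's detection output. The sketch below is one hypothetical way to do that, not the authors' released implementation: it randomly rescales and shifts a ground-truth bounding box, as an imperfect detector might, then crops and resizes that region. The function names (`jitter_box`, `crop_and_resize`, `robot_augment`), the `max_shift`/`max_scale` ranges and the nearest-neighbour resize are all assumptions; images are plain nested lists so the example needs only the standard library.

```python
import random
from typing import List, Tuple

Image = List[List[int]]          # H x W grayscale image as nested lists
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def jitter_box(box: Box, width: int, height: int, rng: random.Random,
               max_shift: float = 0.2, max_scale: float = 0.3) -> Box:
    """Perturb a ground-truth box the way an imperfect detector might (hypothetical)."""
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    # Zoom in or out: rescale both sides by a factor in [1 - max_scale, 1 + max_scale].
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    nw, nh = max(1, round(bw * s)), max(1, round(bh * s))
    # Shift the box centre by up to max_shift times the box size.
    cx = (x0 + x1) / 2 + rng.uniform(-max_shift, max_shift) * bw
    cy = (y0 + y1) / 2 + rng.uniform(-max_shift, max_shift) * bh
    # Clip to the image so the box stays non-empty and in bounds.
    nx0 = min(max(0, round(cx - nw / 2)), width - 1)
    ny0 = min(max(0, round(cy - nh / 2)), height - 1)
    return nx0, ny0, min(width, nx0 + nw), min(height, ny0 + nh)


def crop_and_resize(image: Image, box: Box, out_size: int) -> Image:
    """Crop `box` from `image` and resize it to out_size x out_size (nearest neighbour)."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    return [[image[y0 + (r * h) // out_size][x0 + (c * w) // out_size]
             for c in range(out_size)]
            for r in range(out_size)]


def robot_augment(image: Image, box: Box, rng: random.Random, out_size: int = 32,
                  max_shift: float = 0.2, max_scale: float = 0.3) -> Image:
    """Simulate a robot's detect-then-crop pipeline on a Web image (hypothetical)."""
    height, width = len(image), len(image[0])
    noisy = jitter_box(box, width, height, rng, max_shift, max_scale)
    return crop_and_resize(image, noisy, out_size)


# Usage: a synthetic 100 x 100 image with an object box, turned into a 16 x 16 patch.
rng = random.Random(0)
img = [[100 * r + c for c in range(100)] for r in range(100)]
patch = robot_augment(img, (20, 30, 60, 70), rng, out_size=16)
```

Setting both `max_shift` and `max_scale` to zero reduces the sketch to a plain crop of the ground-truth box. Nearest-neighbour resizing keeps it dependency-free; a real training pipeline would more likely use an image library's interpolating resize.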

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Authors: Brattoli Biagio
Chalupka Krzysztof
Perona Pietro
Tighe Joseph
Zhdanov Fedor
Publication date: 03/03/2020

Trained on large datasets, deep learning (DL) models can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem: ZSL trains a model once and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state of the art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL

arXiv.org e-Print Archive

Crossref

Caltech Authors
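The abstract does not spell out how a video is assigned to a class never seen in training. A common zero-shot recipe, assumed here and not confirmed by the abstract as the authors' exact method, maps each video and each class name into a shared semantic space and predicts the class whose embedding is most similar to the video's. In the sketch below the vectors are hypothetical toy values; in practice the video vector would come from the trained visual model (per the abstract, a 3D CNN) and the class vectors from some text embedding of the class names.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def zero_shot_predict(video_embedding: Sequence[float],
                      class_embeddings: Dict[str, Sequence[float]]) -> str:
    """Pick the class name whose embedding is closest to the video embedding.

    The candidate classes can be entirely disjoint from the training classes:
    only their semantic embeddings are needed at test time.
    """
    return max(class_embeddings,
               key=lambda name: cosine(video_embedding, class_embeddings[name]))


# Usage with hypothetical 3-d embeddings for three unseen classes.
unseen = {"juggling": [1.0, 0.0, 0.0],
          "surfing": [0.0, 1.0, 0.0],
          "archery": [0.0, 0.0, 1.0]}
prediction = zero_shot_predict([0.1, 0.9, 0.2], unseen)  # "surfing"
```

Because the classifier is just a nearest-neighbour lookup over class embeddings, adding a new test class only requires its embedding, not any retraining of the visual model.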