Automatic discovery of image families: Global vs. local features
Gathering a large collection of images has been made quite
easy by social and image sharing websites, e.g. flickr.com.
However, using such collections faces the problem that they
contain a large number of duplicates and highly similar images.
This work tackles the problem of how to automatically
organize image collections into sets of similar images,
called image families hereinafter. We thoroughly compare the
performance of two approaches to measure image similarity:
global descriptors vs. a set of local descriptors. We assess
the performance of these approaches as the problem scales up
to thousands of images and hundreds of families. We present
our results on a new dataset of CD/DVD game covers.
The Caltech-UCSD Birds-200-2011 Dataset
CUB-200-2011 is an extended version of CUB-200 [7], a challenging dataset of 200 bird species. The extended version roughly doubles the number of images per category and adds new part localization annotations. All images are annotated with bounding boxes, part locations, and attribute labels. Images and annotations were filtered by multiple users of Mechanical Turk. We introduce benchmarks and baseline experiments for multi-class categorization and part localization.
Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision making domains. However, robotics poses many challenges for
RL; most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantage of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation to real world transfer
without training on any real world data.
Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
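The asymmetric split described in this abstract (critic sees the full state, actor sees only an observation) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the two-bit environment, the tabular softmax actor, the tabular critic, the reward rule, and every name below (`train_asymmetric_actor_critic`, `visible`, `hidden`, the learning rates) are assumptions made up for illustration, and episodes last a single step so the advantage is simply reward minus the critic's value.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_asymmetric_actor_critic(steps=4000, actor_lr=0.5, critic_lr=0.1, seed=0):
    """Toy one-step actor-critic with asymmetric inputs (illustrative only)."""
    rng = random.Random(seed)
    # Actor: one row of action logits per OBSERVATION (the visible bit only).
    actor_logits = [[0.0, 0.0], [0.0, 0.0]]
    # Critic: one value per FULL STATE (visible bit, hidden bit).
    critic_value = {}
    for _ in range(steps):
        visible = rng.randrange(2)
        hidden = rng.randrange(2)
        state = (visible, hidden)   # available to the critic only
        obs = visible               # all the actor ever sees
        probs = softmax(actor_logits[obs])
        action = 0 if rng.random() < probs[0] else 1
        # Hypothetical reward: matching the visible bit pays off, and the
        # hidden bit scales the payoff (something only the critic can model).
        scale = 1.0 if hidden == 0 else 0.5
        reward = scale if action == obs else 0.0
        # Critic update on the full state.
        value = critic_value.get(state, 0.0)
        advantage = reward - value
        critic_value[state] = value + critic_lr * advantage
        # Policy-gradient update on the observation, using the critic's advantage.
        for a in range(2):
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            actor_logits[obs][a] += actor_lr * advantage * grad_log_prob
    return actor_logits, critic_value
```

In this toy the critic can learn separate values for states that look identical to the actor, which is the kind of extra information a simulator makes available during training; at test time only `actor_logits` would be needed.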
A Lazy Man’s Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
How many labeled examples are needed to estimate a classifier’s performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier’s confidence scores. In addition to estimating the performance of classifiers on new datasets, SPE can be used to recalibrate a classifier by reestimating the class-conditional confidence distributions.
Cascaded Pose Regression
We present cascaded pose regression (CPR), a fast and accurate algorithm for computing the 2D pose of objects in images. CPR progressively refines a loosely specified initial guess, where each refinement is carried out by a different regressor. Each regressor performs simple image measurements that are dependent on the output of the previous regressors; the entire system is automatically learned from human-annotated training examples. CPR is not restricted to rigid transformations: ‘pose’ is any parameterized variation of the object’s appearance such as the degrees of freedom of deformable and articulated objects. We compare CPR against both standard regression techniques and human performance (computed from redundant human annotations). Experiments on three diverse datasets (mice, faces, fish) suggest CPR is fast (2-3 ms per pose estimate), accurate (approaching human performance), and easy to train from small amounts of labeled data.
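The refinement loop this abstract describes can be sketched in a few lines. This is a toy 1D illustration under stated assumptions, not the authors' implementation: the "image" is a list of intensities, the pose is a single integer position, and `make_stage` builds hand-written stand-ins for regressors that CPR would learn from annotated examples. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]
Regressor = Callable[[Image, int], int]


def make_stage(step: int) -> Regressor:
    """Hand-written stand-in for one learned regressor (an assumption).

    It reads two pose-indexed features, the intensities at pose - step and
    pose + step, and proposes a move of `step` toward the brighter side.
    """
    def stage(image: Image, pose: int) -> int:
        left = image[max(pose - step, 0)]
        right = image[min(pose + step, len(image) - 1)]
        if right > left:
            return step
        if left > right:
            return -step
        return 0
    return stage


def cascaded_pose_regression(image: Image, initial_pose: int,
                             stages: List[Regressor]) -> int:
    """Refine a rough initial pose by applying each stage in sequence.

    Every stage measures the image relative to the pose produced by the
    previous stages, so its input depends on their output.
    """
    pose = initial_pose
    for stage in stages:
        pose = pose + stage(image, pose)
        pose = min(max(pose, 0), len(image) - 1)  # keep the pose inside the image
    return pose


# Usage: a signal peaking at index 12, refined with coarse-to-fine stages.
signal = [-abs(i - 12) for i in range(20)]
coarse_to_fine = [make_stage(s) for s in (8, 4, 2, 1, 1)]
estimate = cascaded_pose_regression(signal, 2, coarse_to_fine)
```

The coarse-to-fine step sizes are a design choice of this sketch only; the abstract says nothing about how the stages differ beyond each being a different regressor.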
Caltech-UCSD Birds 200
Caltech-UCSD Birds 200 (CUB-200) is a challenging image dataset annotated with 200 bird species. It was created to enable the study of subordinate categorization, which is not possible with other popular datasets that focus on basic level categories (such as PASCAL VOC, Caltech-101, etc.). The images were downloaded from the website Flickr and filtered by workers on Amazon Mechanical Turk. Each image is annotated with a bounding box, a rough bird segmentation, and a set of attribute labels.
Crowdclustering
Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters/categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations.