Loss Functions for Top-k Error: Analysis and Insights
In order to push performance on realistic computer vision tasks, the
number of classes in modern benchmark datasets has increased significantly
in recent years. This increase comes with greater ambiguity between class
labels, raising the question of whether top-1 error is the right performance
measure. In this paper, we provide an extensive comparison and evaluation of
established multiclass methods, examining their top-k performance from both a
practical and a theoretical perspective.
Moreover, we introduce novel top-k loss functions as modifications of the
softmax and the multiclass SVM losses and provide efficient optimization
schemes for them. In the experiments, we compare all of the proposed and
established methods for top-k error optimization on various datasets. An
interesting insight of this paper is that the softmax loss yields competitive
top-k performance for all k simultaneously. For a specific top-k error, our
new top-k losses typically lead to further improvements while being faster to
train than the softmax.
Comment: In Computer Vision and Pattern Recognition (CVPR), 2016
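As a rough illustration of the quantities involved, here is a minimal NumPy sketch of the top-k error metric and a simplified top-k hinge-style surrogate. The function names are ours, and the surrogate is deliberately simplified; the paper's actual top-k losses and optimization schemes differ in detail.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores.

    scores: (n_samples, n_classes) array of class scores
    labels: (n_samples,) array of integer ground-truth labels
    """
    # Indices of the k largest scores per row (order within the k is irrelevant).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

def top_k_hinge_loss(scores, labels, k=5, margin=1.0):
    """Illustrative top-k hinge surrogate: penalize an example according to
    the k largest margin violations by competing classes. A simplified
    variant, not the exact losses proposed in the paper."""
    n = scores.shape[0]
    true = scores[np.arange(n), labels]
    viol = margin + scores - true[:, None]   # per-class margin violations
    viol[np.arange(n), labels] = -np.inf     # exclude the true class
    k_largest = -np.sort(-viol, axis=1)[:, :k]
    return np.maximum(k_largest, 0.0).mean(axis=1).mean()
```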
The growing U.S. trade imbalance with China
Over the past decade, the United States has gone from enjoying a small trade surplus with China to grappling with an enormous deficit. Just to keep the gap from expanding in 1997, U.S. exports to China would need to grow at an extraordinary rate: four times as fast as Chinese exports to the United States. Despite recent gains and China's efforts at trade liberalization, growth on that order appears unlikely, and the deficit can be expected to widen in the near term.
Keywords: balance of trade; China
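The "four times as fast" figure follows from simple arithmetic: the deficit stays constant when the dollar growth of exports matches the dollar growth of imports, so the required ratio of growth rates equals the ratio of import to export levels. A minimal sketch, where the dollar figures are hypothetical placeholders rather than the actual 1997 trade values:

```python
# Deficit M - X is unchanged when the dollar changes match: gX * X == gM * M,
# so the required export growth rate is gX = gM * (M / X).
X = 12.0   # hypothetical U.S. exports to China, $ billions
M = 48.0   # hypothetical U.S. imports from China, $ billions

gM = 0.10                 # assumed growth of Chinese exports to the U.S.
gX = gM * (M / X)         # export growth needed to hold the gap constant
print(gX / gM)            # -> 4.0: exports must grow four times as fast
```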
Not Using the Car to See the Sidewalk: Quantifying and Controlling the Effects of Context in Classification and Segmentation
The importance of visual context in scene understanding tasks is well
recognized in the computer vision community. However, it is unclear to what
extent computer vision models for image classification and semantic
segmentation depend on context to make their predictions. A model that relies
too heavily on context will fail when it encounters objects in context
distributions that differ from the training data; it is therefore important
to identify these dependencies before deploying models in the real world. We
propose a method to quantify the sensitivity of black-box vision models to
visual context by editing images to remove selected objects and measuring the
response of the target models. We apply this methodology to two tasks, image
classification and semantic segmentation, and discover undesirable
dependencies between objects and context; for example, "sidewalk"
segmentation relies heavily on "cars" being present in the image. We propose
a data augmentation solution based on object removal to mitigate this
dependency and increase the robustness of classification and segmentation
models to contextual variations. Our experiments show that the proposed data
augmentation helps these models improve performance in out-of-context
scenarios while preserving performance on regular data.
Comment: 14 pages (12 figures)
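A minimal sketch of the removal-based probe the abstract describes: compare a black-box model's output on an image before and after one object is erased. The function name is ours, and the paper edits images with an inpainting step; constant fill is used here only to keep the sketch short.

```python
import numpy as np

def context_sensitivity(model, image, object_mask, fill_value=0.0):
    """Score change of a black-box model when one object is removed.

    model: callable image -> per-class score vector (treated as a black box)
    object_mask: boolean HxW mask of the object to remove
    """
    edited = image.copy()
    edited[object_mask] = fill_value   # crude stand-in for inpainting
    before = model(image)
    after = model(edited)
    # Large score changes indicate strong dependence on the removed object.
    return np.abs(before - after)
```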
Taking a Deeper Look at Pedestrians
In this paper we study the use of convolutional neural networks (convnets)
for the task of pedestrian detection. Despite their recent diverse successes,
convnets have historically underperformed compared to other pedestrian
detectors. We deliberately avoid building explicit problem structure into the
network (e.g. parts or occlusion modelling) and show that we can reach
competitive performance
without bells and whistles. In a wide range of experiments we analyse small and
big convnets, their architectural choices, parameters, and the influence of
different training data, including pre-training on surrogate tasks.
We present the best convnet detectors on the Caltech and KITTI datasets. On
Caltech our convnets reach top performance for both the Caltech1x and
Caltech10x training setups. Using additional data at training time, our
strongest convnet model is competitive even with detectors that use
additional data (optical flow) at test time.
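Caltech detection results are conventionally reported as log-average miss rate over the false-positives-per-image (FPPI) range [1e-2, 1e0]. A minimal sketch of that metric, assuming the detector's miss-rate-vs-FPPI curve has already been computed:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Log-average miss rate as commonly used on the Caltech benchmark:
    miss rate sampled at nine FPPI points log-spaced in [1e-2, 1e0], then
    averaged in log space. fppi must be positive and increasing."""
    ref = np.logspace(-2.0, 0.0, num=9)
    # Interpolate the curve in log-FPPI space at the reference points.
    sampled = np.interp(np.log(ref), np.log(fppi), miss_rate)
    return np.exp(np.mean(np.log(np.maximum(sampled, 1e-10))))
```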
Multi-View Priors for Learning Detectors from Sparse Viewpoint Data
While the majority of today's object class models provide only 2D bounding
boxes, far richer output hypotheses are desirable, including viewpoint,
fine-grained category, and 3D geometry estimates. However, models trained to
provide richer output require larger amounts of training data, preferably
covering the relevant aspects well, such as viewpoint and fine-grained
categories. In
this paper, we address this issue from the perspective of transfer learning,
and design an object class model that explicitly leverages correlations between
visual features. Specifically, our model represents prior distributions over
permissible multi-view detectors in a parametric way -- the priors are learned
once from training data of a source object class, and can later be used to
facilitate the learning of a detector for a target class. As we show in our
experiments, this transfer is not only beneficial for detectors based on
basic-level category representations, but also enables the robust learning of
detectors that represent classes at finer levels of granularity, where training
data is typically even scarcer and more unbalanced. As a result, we report
largely improved performance in simultaneous 2D object localization and
viewpoint estimation on a recent dataset of challenging street scenes.
Comment: 13 pages, 7 figures, 4 tables, International Conference on Learning Representations (ICLR), 2015
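One way to read "prior distributions over permissible multi-view detectors in a parametric way" is as a distribution over detector weights fit on source-class detectors and reused as a regularizer when training the target class. The sketch below is our own simplified illustration under that reading (a diagonal Gaussian prior with a ridge-style penalty), not the paper's actual formulation:

```python
import numpy as np

def fit_weight_prior(source_detectors):
    """Fit a diagonal Gaussian over per-view detector weight vectors
    learned for a source class. source_detectors: (n_views, dim) array."""
    mu = source_detectors.mean(axis=0)
    var = source_detectors.var(axis=0) + 1e-6   # avoid zero variance
    return mu, var

def prior_penalty(w, mu, var, strength=1.0):
    """Regularizer pulling target-class weights toward the source prior;
    added to the target detector's training loss in place of plain L2."""
    return strength * np.sum((w - mu) ** 2 / var)
```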
Long-Term Image Boundary Prediction
Boundary estimation in images and videos has been a very active topic of
research, and organizing visual information into boundaries and segments is
believed to be a cornerstone of visual perception. While prior work has
focused on estimating boundaries for observed frames, our work aims at
predicting boundaries of future unobserved frames. This requires our model to
learn about the fate of boundaries and the corresponding motion patterns,
including a notion of "intuitive physics". We experiment on natural video
sequences along with synthetic sequences with deterministic physics-based and
agent-based motions. While not our primary goal, we also show that fusing
RGB and boundary prediction leads to improved RGB predictions.
Comment: Accepted at the AAAI Conference on Artificial Intelligence (AAAI), 2018
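As a toy illustration of the task setup, here is a minimal PyTorch sketch that maps a stack of past binary boundary maps to the next frame's boundary logits. The architecture and names are ours, not the paper's model:

```python
import torch
import torch.nn as nn

class BoundaryPredictor(nn.Module):
    """Toy model: predict the next boundary map from T past boundary maps,
    stacked along the channel axis. Output is a per-pixel boundary logit."""
    def __init__(self, t_past=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(t_past, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, past_boundaries):        # (B, T, H, W)
        return self.net(past_boundaries)       # (B, 1, H, W) logits

# Training step: binary cross-entropy against the observed next boundary map.
model = BoundaryPredictor(t_past=4)
loss_fn = nn.BCEWithLogitsLoss()
past = torch.rand(8, 4, 64, 64)                # dummy batch of past maps
target = (torch.rand(8, 1, 64, 64) > 0.9).float()
loss = loss_fn(model(past), target)
loss.backward()
```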
