Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations
Images seen during test time are often not from the same distribution as
images used for learning. This problem, known as domain shift, arises when
classifiers trained on object-centric internet image databases are applied
directly to scene-understanding tasks. The consequence is often severe
performance degradation, and it is one of the major barriers to deploying
classifiers in real-world systems. In this paper, we show how to learn
transform-based domain adaptation classifiers in a scalable manner. The key
idea is to exploit an implicit rank constraint, which originates from a
max-margin domain adaptation formulation, to make optimization tractable.
Experiments show that the transformation between domains can be very
efficiently learned from data and easily applied to new categories. This begins
to bridge the gap between large-scale internet image collections and object
images captured in everyday life environments.
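
As a rough, hedged illustration of this idea, the sketch below jointly learns a
factorized linear transform between domains and a max-margin (hinge-loss)
classifier. The explicit A @ B factorization, all shapes, and the
hyperparameters are illustrative assumptions standing in for the paper's
implicit rank constraint, not its exact formulation.

import torch

d_src, d_tgt, n_classes = 128, 128, 10

# Toy labeled features from each domain; replace with real data.
Xs = torch.randn(500, d_src)
ys = torch.randint(0, n_classes, (500,))
Xt = torch.randn(100, d_tgt)
yt = torch.randint(0, n_classes, (100,))

# Factorized transform W = A @ B (rank <= r) keeps the number of
# parameters, and hence the optimization, tractable.
r = 16
A = torch.randn(d_src, r, requires_grad=True)
B = torch.randn(r, d_tgt, requires_grad=True)
theta = torch.zeros(n_classes, d_src, requires_grad=True)  # classifier weights

opt = torch.optim.SGD([A, B, theta], lr=1e-2)
hinge = torch.nn.MultiMarginLoss()  # multi-class hinge (max-margin) loss

for step in range(200):
    opt.zero_grad()
    W = A @ B                                   # target -> source transform
    loss = (hinge(Xs @ theta.T, ys)             # classify source features
            + hinge((Xt @ W.T) @ theta.T, yt)   # classify transformed target
            + 1e-3 * (theta.pow(2).sum() + W.pow(2).sum()))
    loss.backward()
    opt.step()

Once W is learned, it can be reused for new categories: train a fresh
classifier on source data and score transformed target features, with no
per-category adaptation step.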
Sequence to Sequence -- Video to Text
Real-world videos often have complex dynamics; methods for generating
open-domain video descriptions should therefore be sensitive to temporal
structure and allow both the input (a sequence of frames) and the output (a
sequence of words) to be of variable length. To approach this problem, we
propose a novel end-to-end
sequence-to-sequence model to generate captions for videos. For this we exploit
recurrent neural networks, specifically LSTMs, which have demonstrated
state-of-the-art performance in image caption generation. Our LSTM model is
trained on video-sentence pairs and learns to associate a sequence of video
frames with a sequence of words in order to generate a description of the
event in the video clip. Our model is naturally able to learn the temporal
structure of the frame sequence as well as a sequence model of the generated
sentences, i.e., a language model. We evaluate several variants of our model
that exploit different visual features on a standard set of YouTube videos and
two movie description datasets (M-VAD and MPII-MD).
Comment: ICCV 2015 camera-ready. Includes code, project page, and LSMDC
challenge results.
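
To make the setup concrete, here is a minimal sequence-to-sequence sketch in
the spirit of the model: one LSTM encodes the frame features and a second LSTM
decodes words. The feature and vocabulary sizes, the single-layer
encoder-decoder layout, and the teacher-forced training step are illustrative
assumptions; the paper's actual architecture (e.g., its stacked-LSTM design)
may differ.

import torch
import torch.nn as nn

class Seq2SeqCaptioner(nn.Module):
    def __init__(self, feat_dim=4096, hidden=512, vocab=10000, emb=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, emb)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, frames, words):
        # frames: (B, T_frames, feat_dim), words: (B, T_words) token ids
        _, state = self.encoder(frames)     # summarize the frame sequence
        dec_out, _ = self.decoder(self.embed(words), state)
        return self.out(dec_out)            # (B, T_words, vocab) logits

model = Seq2SeqCaptioner()
frames = torch.randn(2, 30, 4096)           # e.g., per-frame CNN features
caps = torch.randint(0, 10000, (2, 12))     # tokenized reference captions
logits = model(frames, caps[:, :-1])        # teacher forcing: shift by one
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), caps[:, 1:].reshape(-1))

Because both LSTMs unroll over time, clips and sentences of different lengths
are handled naturally (with padding and masking in a real training loop).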
LSDA: Large Scale Detection Through Adaptation
A major challenge in scaling object detection is the difficulty of obtaining
labeled images for large numbers of categories. Recently, deep convolutional
neural networks (CNNs) have emerged as clear winners on object classification
benchmarks, in part due to training with 1.2M+ labeled classification images.
Unfortunately, only a small fraction of those labels are available for the
detection task. It is much cheaper and easier to collect large quantities of
image-level labels from search engines than it is to collect detection data and
label it with precise bounding boxes. In this paper, we propose Large Scale
Detection through Adaptation (LSDA), an algorithm which learns the difference
between the two tasks and transfers this knowledge to classifiers for
categories without bounding box annotated data, turning them into detectors.
Our method has the potential to enable detection for the tens of thousands of
categories that lack bounding box annotations, yet have plenty of
classification data. Evaluation on the ImageNet LSVRC-2013 detection challenge
demonstrates the efficacy of our approach. The algorithm enables us to produce
a detector spanning more than 7.6K categories by using available
classification data from leaf nodes in the ImageNet tree. We additionally
demonstrate how to modify our architecture to produce a fast detector (running
at 2 fps for the 7.6K-category model). Models and software are available.
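
As a toy, hedged sketch of the transfer idea: for categories with box
annotations we can observe how detection fine-tuning changes the final-layer
weights, and for box-less categories we apply an estimate of that change. The
tensor shapes, the nearest-neighbor variant, and the equal mixing of the two
adaptation terms are assumptions for illustration, not LSDA's exact procedure.

import torch

d, n_boxed, n_unboxed = 512, 100, 500  # real LSDA scale: >7.6K categories

# Final-layer weights: classification weights exist for every category,
# detection (fine-tuned) weights only for categories with box data.
W_cls_boxed = torch.randn(n_boxed, d)
W_det_boxed = W_cls_boxed + 0.1 * torch.randn(n_boxed, d)  # stand-in
W_cls_only = torch.randn(n_unboxed, d)

# Category-invariant adaptation: the mean classification -> detection
# weight change, estimated from the boxed categories.
delta_mean = (W_det_boxed - W_cls_boxed).mean(dim=0)

# Category-specific adaptation: borrow the change observed for the k
# nearest boxed categories in classifier-weight space.
k = 5
dists = torch.cdist(W_cls_only, W_cls_boxed)       # (n_unboxed, n_boxed)
nn_idx = dists.topk(k, largest=False).indices      # k nearest neighbors
delta_nn = (W_det_boxed - W_cls_boxed)[nn_idx].mean(dim=1)

# Estimated detector weights for categories that never had box labels.
W_det_est = W_cls_only + 0.5 * (delta_mean + delta_nn)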