Multitask learning without label correspondences
We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods, which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces for the purpose of classification, such as integrating the Yahoo! and DMOZ web directories.
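The quantity at the heart of this approach can be illustrated with a small sketch. The snippet below is not the paper's optimisation procedure; it only computes the empirical mutual information between two tasks' label assignments over the same instances (a hypothetical `mutual_information` helper), which is the quantity the method seeks to maximise.

```python
import numpy as np

def mutual_information(labels_a, labels_b):
    """Empirical mutual information between two label assignments over the
    same instances (e.g. Yahoo! vs. DMOZ categories for shared documents).
    Illustrative helper only, not the paper's algorithm."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    for i, j in zip(a_idx, b_idx):          # build the joint contingency table
        joint[i, j] += 1
    joint /= joint.sum()                     # joint distribution p(a, b)
    pa = joint.sum(axis=1, keepdims=True)    # marginal p(a)
    pb = joint.sum(axis=0, keepdims=True)    # marginal p(b)
    nz = joint > 0                           # avoid log(0) on empty cells
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())
```

Perfectly correlated label sets give MI equal to the label entropy, while independent labelings give MI near zero, which is why maximising it encourages a coherent correspondence between the two tasks' label spaces.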
Learning Material-Aware Local Descriptors for 3D Shapes
Material understanding is critical for design, geometric modeling, and
analysis of functional objects. We enable material-aware 3D shape analysis by
employing a projective convolutional neural network architecture to learn
material-aware descriptors from view-based representations of 3D points for
point-wise material classification or material-aware retrieval. Unfortunately,
only a small fraction of shapes in 3D repositories are labeled with physical
materials, posing a challenge for learning methods. To address this challenge,
we crowdsource a dataset of 3080 3D shapes with part-wise material labels. We
focus on furniture models, which exhibit interesting structure and material
variability. In addition, we contribute a high-quality expert-labeled
benchmark of 115 shapes from Herman-Miller and IKEA for evaluation. We further
apply a mesh-aware conditional random field, which incorporates rotational and
reflective symmetries, to smooth our local material predictions across
neighboring surface patches. We demonstrate the effectiveness of our learned
descriptors for automatic texturing, material-aware retrieval, and physical
simulation. The dataset and code will be publicly available.
Comment: 3DV 201
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
An important goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Comment: Accepted in ICCV 2017. The arXiv version has an extra analysis on
correlation with human attention.
HPatches: A benchmark and evaluation of handcrafted and learned local descriptors
In this paper, we propose a novel benchmark for evaluating local image
descriptors. We demonstrate that the existing datasets and evaluation protocols
do not specify unambiguously all aspects of evaluation, leading to ambiguities
and inconsistencies in results reported in the literature. Furthermore, these
datasets are nearly saturated due to the recent improvements in local
descriptors obtained by learning them from large annotated datasets. Therefore,
we introduce a new large dataset suitable for training and testing modern
descriptors, together with strictly defined evaluation protocols in several
tasks such as matching, retrieval and classification. This allows for more
realistic, and thus more reliable comparisons in different application
scenarios. We evaluate the performance of several state-of-the-art descriptors
and analyse their properties. We show that a simple normalisation of
traditional hand-crafted descriptors can boost their performance to the level
of deep-learning-based descriptors within a realistic benchmark evaluation.
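The abstract does not name the normalisation, but a well-known example of such a trick is the RootSIFT transform of Arandjelović and Zisserman: L1-normalise the descriptor, then take an element-wise square root, so that Euclidean distance on the result corresponds to the Hellinger kernel on the original histogram. A minimal sketch, assuming non-negative histogram descriptors such as SIFT:

```python
import numpy as np

def root_normalise(desc, eps=1e-12):
    """RootSIFT-style normalisation of a handcrafted descriptor:
    L1-normalise, then take the element-wise square root. For non-negative
    input the result is L2-normalised, and Euclidean distance between
    transformed descriptors equals Hellinger distance on the originals."""
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum() + eps)  # L1 normalisation
    return np.sqrt(desc)                       # square-root mapping
```

Because the output is automatically unit-length, existing L2-based matchers and retrieval pipelines can be reused unchanged on the transformed descriptors.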
On Multilingual Training of Neural Dependency Parsers
We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the-art methods, while having a simple structure.
Comment: preprint accepted into the TSD201
RPNet: an End-to-End Network for Relative Camera Pose Estimation
This paper addresses the task of relative camera pose estimation from raw
image pixels, by means of deep neural networks. The proposed RPNet network
takes pairs of images as input and directly infers the relative poses, without
the need for camera intrinsics or extrinsics. While state-of-the-art systems
based on SIFT + RANSAC are able to recover the translation vector only up to
scale,
RPNet is trained to produce the full translation vector, in an end-to-end way.
Experimental results on the Cambridge Landmark dataset show very promising
results regarding the recovery of the full translation vector. They also show
that RPNet produces more accurate and more stable results than traditional
approaches, especially for hard images (repetitive textures, textureless
images, etc.). To the best of our knowledge, RPNet is the first attempt to
recover full translation vectors in relative pose estimation.
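Why SIFT + RANSAC recovers translation only up to scale: the essential matrix E = [t]ₓR is unchanged when t is multiplied by any non-zero constant, so only the direction of t survives the geometry. A minimal numpy sketch of this scale ambiguity on a synthetic pose (not RPNet itself):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def translation_direction(E):
    """Left null vector of E = [t]_x R: the translation direction,
    recoverable only up to sign and scale."""
    U, _, _ = np.linalg.svd(E)
    return U[:, 2]  # column matching the (near-)zero singular value

# synthetic relative pose: rotation about z, translation of true length 5
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t_true = np.array([3.0, 4.0, 0.0])

E = skew(t_true) @ R               # identical for t_true and 2 * t_true
t_hat = translation_direction(E)   # unit vector, +/- t_true / 5
```

The decomposition returns a unit direction regardless of the true baseline length; RPNet's claim is that, trained end to end, it can regress the missing magnitude as well.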
Panoptic Segmentation
We propose and study a task we name panoptic segmentation (PS). Panoptic
segmentation unifies the typically distinct tasks of semantic segmentation
(assign a class label to each pixel) and instance segmentation (detect and
segment each object instance). The proposed task requires generating a coherent
scene segmentation that is rich and complete, an important step toward
real-world vision systems. While early work in computer vision addressed
related image/scene parsing tasks, these are not currently popular, possibly
due to lack of appropriate metrics or associated recognition challenges. To
address this, we propose a novel panoptic quality (PQ) metric that captures
performance for all classes (stuff and things) in an interpretable and unified
manner. Using the proposed metric, we perform a rigorous study of both human
and machine performance for PS on three existing datasets, revealing
interesting insights about the task. The aim of our work is to revive the
interest of the community in a more unified view of image segmentation.
Comment: accepted to CVPR 201
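The PQ metric has a compact closed form: predicted and ground-truth segments are matched when their IoU exceeds 0.5 (which makes the matching unique), and PQ is the sum of matched IoUs divided by |TP| + ½|FP| + ½|FN|. A minimal sketch that takes the IoUs of already-matched pairs as input, with the matching step itself omitted:

```python
def panoptic_quality(matched_ious, n_fp, n_fn):
    """PQ = sum(IoU over matched TP pairs) / (|TP| + 0.5|FP| + 0.5|FN|).
    `matched_ious` holds the IoUs of predicted/ground-truth segment pairs
    matched at IoU > 0.5; unmatched predictions count as FPs, unmatched
    ground-truth segments as FNs."""
    tp = len(matched_ious)
    denom = tp + 0.5 * n_fp + 0.5 * n_fn
    return sum(matched_ious) / denom if denom else 0.0
```

PQ factors into segmentation quality (the mean IoU of matched pairs) times recognition quality (an F1-style detection score), which is what makes the single number interpretable across stuff and thing classes.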