6,410 research outputs found
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible and employed with several
popular encoding methods, in particular bag-of-words, VLAD and the Fisher
vector. Our geometric-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings.Comment: European Conference on Computer Vision (2014
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation
- …