Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible with, and can be employed
alongside, several popular encoding methods, in particular bag-of-words, VLAD
and the Fisher vector. Our geometry-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford Buildings.
Comment: European Conference on Computer Vision (ECCV), 2014
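The angle-covariant idea can be illustrated with a minimal sketch: each local descriptor is modulated by continuous cos/sin features of its patch's dominant orientation before sum-aggregation, so a global rotation acts on the aggregate predictably instead of being discarded. This is a simplification under our own assumptions (the function name and per-frequency modulation scheme are illustrative, not the paper's exact monomial-embedding construction):

```python
import numpy as np

def covariant_aggregate(descriptors, angles, num_freq=2):
    # Illustrative sketch: jointly encode each patch's dominant angle
    # in a continuous manner during aggregation. For each frequency k,
    # modulate the descriptors by cos(k*angle) and sin(k*angle) and sum.
    blocks = []
    for k in range(1, num_freq + 1):
        c = (np.cos(k * angles)[:, None] * descriptors).sum(axis=0)
        s = (np.sin(k * angles)[:, None] * descriptors).sum(axis=0)
        blocks.append(np.concatenate([c, s]))
    v = np.concatenate(blocks)
    # L2-normalize the final vector representation.
    return v / (np.linalg.norm(v) + 1e-12)
```

Rotating every patch by a common angle δ rotates each frequency-k block pair (c, s) by kδ, which is the covariance property: the transformation of the input induces a known transformation of the aggregate, rather than no change at all.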
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle, their defining characteristic being shift-equivariance:
because a filter slides over the input, shifting the input shifts the response
by the same amount, exploiting the structure of natural images, where semantic
content is independent of absolute pixel position. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present.
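The shift-equivariance property described above is easy to verify numerically. A minimal sketch with a naive circular 1-D convolution (function names are our own, for illustration only):

```python
import numpy as np

def circular_conv(x, w):
    # Naive 1-D circular cross-correlation: slide the filter w over x
    # with wrap-around indexing.
    n = len(x)
    return np.array([sum(x[(i + j) % n] * w[j] for j in range(len(w)))
                     for i in range(n)])

x = np.arange(8.0)
w = np.array([1.0, -1.0, 0.5])

# Shift-equivariance: shifting the input then convolving gives the same
# result as convolving then shifting the output by the same amount.
lhs = circular_conv(np.roll(x, 2), w)
rhs = np.roll(circular_conv(x, w), 2)
assert np.allclose(lhs, rhs)
```

The thesis generalizes exactly this commutation property from translations to rotations, scalings, and symmetries of the sphere and the icosahedron.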
Early burst detection for memory-efficient image retrieval
Recent works show that image comparison based on local descriptors is
corrupted by visual bursts, which tend to dominate the image similarity.
Existing strategies, such as power-law normalization, improve the results by
discounting the contribution of visual bursts to the image similarity. In this
paper, we propose to explicitly detect the visual bursts in an image at an
early stage. We compare several detection strategies that jointly take into
account feature similarity and geometric quantities. The bursty groups are
merged into meta-features, which are used as input to state-of-the-art image
search systems such as VLAD or the selective match kernel. We then show the
benefit of using this strategy asymmetrically, aggregating only the database
features but not those of the query. Extensive experiments performed on public
benchmarks for visual retrieval show the benefits of our method, which
achieves performance on par with the state of the art but with significantly
reduced complexity, thanks to the lower number of features fed to the indexing
system.
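The burst-merging step can be sketched as greedy grouping of mutually similar descriptors into averaged meta-features. This is an illustrative simplification under our own assumptions (similarity threshold and greedy grouping are ours; the paper's detectors also use geometric quantities, and the meta-features then feed VLAD or the selective match kernel):

```python
import numpy as np

def merge_bursts(desc, sim_thresh=0.9):
    # Illustrative sketch: descriptors whose pairwise cosine similarity
    # exceeds sim_thresh are treated as one visual burst and merged into
    # a single averaged, re-normalized "meta-feature".
    d = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    sims = d @ d.T
    unassigned = set(range(len(d)))
    metas = []
    while unassigned:
        i = unassigned.pop()
        group = [i] + [j for j in list(unassigned) if sims[i, j] > sim_thresh]
        for j in group[1:]:
            unassigned.discard(j)
        m = d[group].mean(axis=0)
        metas.append(m / np.linalg.norm(m))
    return np.vstack(metas)
```

Because many bursty descriptors collapse into one meta-feature, the indexing system receives far fewer inputs, which is the source of the memory and complexity savings the abstract reports.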
On the Expressive Power of Geometric Graph Neural Networks
The expressive power of Graph Neural Networks (GNNs) has been studied
extensively through the Weisfeiler-Leman (WL) graph isomorphism test. However,
standard GNNs and the WL framework are inapplicable to geometric graphs
embedded in Euclidean space, such as biomolecules, materials, and other
physical systems. In this work, we propose a geometric version of the WL test
(GWL) for discriminating geometric graphs while respecting the underlying
physical symmetries: permutations, rotation, reflection, and translation. We
use GWL to characterise the expressive power of geometric GNNs that are
invariant or equivariant to physical symmetries in terms of distinguishing
geometric graphs. GWL unpacks how key design choices influence geometric GNN
expressivity: (1) Invariant layers have limited expressivity as they cannot
distinguish one-hop identical geometric graphs; (2) Equivariant layers
distinguish a larger class of graphs by propagating geometric information
beyond local neighbourhoods; (3) Higher order tensors and scalarisation enable
maximally powerful geometric GNNs; and (4) GWL's discrimination-based
perspective is equivalent to universal approximation. Synthetic experiments
supplementing our results are available at
https://github.com/chaitjo/geometric-gnn-dojo
Comment: NeurIPS 2022 Workshop on Symmetry and Geometry in Neural
Representations
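The invariant flavour of such a geometric WL iteration can be sketched as follows: node colours are refined by hashing the multiset of (neighbour colour, inter-node distance) pairs, and since distances are invariant to rotation, reflection, and translation, so is the resulting colouring. This is a toy sketch under our own assumptions (the function is ours; the paper's GWL also characterises equivariant layers, higher-order tensors, and formal symmetry groups):

```python
import numpy as np

def geometric_wl(pos, adj, iters=2):
    # Toy invariant geometric-WL iteration: refine node colours by
    # hashing the sorted multiset of (neighbour colour, distance) pairs.
    # Distances are E(3)-invariant, so the colouring does not change
    # under rotation, reflection, or translation of the coordinates.
    n = len(pos)
    colors = [0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            msgs = sorted((colors[j],
                           round(float(np.linalg.norm(pos[i] - pos[j])), 6))
                          for j in range(n) if adj[i][j])
            new.append(hash((colors[i], tuple(msgs))))
        colors = new
    return sorted(colors)  # graph-level multiset of node colours
```

Because each message only carries invariant scalars, this toy test mirrors finding (1) above: it cannot propagate directional (geometric) information beyond local neighbourhoods the way equivariant layers can.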