Left-invariant evolutions of wavelet transforms on the Similitude Group
Enhancement of multiple-scale elongated structures in noisy image data is
relevant for many biomedical applications but commonly used PDE-based
enhancement techniques often fail at crossings in an image. To get an overview
of how an image is composed of local multiple-scale elongated structures we
construct a multiple scale orientation score, which is a continuous wavelet
transform on the similitude group, SIM(2). Our unitary transform maps the space
of images onto a reproducing kernel space defined on SIM(2), allowing us to
robustly relate Euclidean (and scaling) invariant operators on images to
left-invariant operators on the corresponding continuous wavelet transform.
Rather than often used wavelet (soft-)thresholding techniques, we employ the
group structure in the wavelet domain to arrive at left-invariant evolutions
and flows (diffusion), for contextual crossing preserving enhancement of
multiple scale elongated structures in noisy images. We present experiments
that display benefits of our work compared to recent PDE techniques acting
directly on the images and to our previous work on left-invariant diffusions on
orientation scores defined on the Euclidean motion group. Comment: 40 pages
A reliable order-statistics-based approximate nearest neighbor search algorithm
We propose a new algorithm for fast approximate nearest neighbor search based
on the properties of ordered vectors. Data vectors are classified based on the
index and sign of their largest components, thereby partitioning the space in a
number of cones centered in the origin. The query is itself classified, and the
search starts from the selected cone and proceeds to neighboring ones. Overall,
the proposed algorithm corresponds to locality sensitive hashing in the space
of directions, with hashing based on the order of components. Thanks to the
statistical features emerging through ordering, it deals very well with the
challenging case of unstructured data, and is a valuable building block for
more complex techniques dealing with structured data. Experiments on both
simulated and real-world data show that the proposed algorithm provides
state-of-the-art performance.
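The cone partitioning described in this abstract can be illustrated with a minimal sketch. It assumes that "largest components" means largest in absolute value, and the function names (`cone_key`, `build_index`, `query`) are hypothetical, not taken from the paper. For brevity the sketch searches only the query's own cone and leaves out the step that proceeds to neighboring cones.

```python
import numpy as np

def cone_key(v, k=1):
    """Key of the cone containing v: the indices of its k largest-magnitude
    components (in decreasing order of magnitude), each paired with its sign."""
    order = np.argsort(-np.abs(v), kind="stable")[:k]
    return tuple((int(i), int(np.sign(v[i]))) for i in order)

def build_index(data, k=1):
    """Group the row numbers of `data` by cone key."""
    buckets = {}
    for row, v in enumerate(data):
        buckets.setdefault(cone_key(v, k), []).append(row)
    return buckets

def query(data, buckets, q, k=1):
    """Return the row of the closest vector inside the query's cone,
    or None if that cone is empty (neighboring cones are not visited here)."""
    candidates = buckets.get(cone_key(q, k), [])
    if not candidates:
        return None
    cand = np.asarray(candidates)
    dists = np.linalg.norm(data[cand] - q, axis=1)
    return int(cand[np.argmin(dists)])

# Toy data: rows 0 and 3 share the cone "component 0 is largest and positive".
data = np.array([[1.0, 0.1], [-2.0, 0.3], [0.2, 3.0], [0.9, 0.2]])
buckets = build_index(data)
nearest = query(data, buckets, np.array([1.0, 0.15]))
```

With `k=1` a d-dimensional space is split into 2d cones; a larger `k` gives a finer partition at the cost of emptier buckets.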
3D Shape Understanding and Generation
In recent years, Machine Learning techniques have revolutionized solutions to longstanding image-based problems, like image classification, generation, semantic segmentation, object detection and many others. However, if we want to be able to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a tridimensional space. There are two main challenges while handling 3D information in machine learning models. First, it is not clear what is the best 3D representation. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available on the same scale as images – taking pictures is a common procedure in our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis is focused on addressing both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D?
Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. Those models require significantly less memory while generating higher quality shapes than the ones based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high quality reconstructions and better unsupervised feature learning. However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and were specifically designed for high-level tasks and user interaction. Despite their effectiveness, those approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels are available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks -- using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without using any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks.
Hashing with binary autoencoders
An attractive approach for fast search in image databases is binary hashing,
where each high-dimensional, real-valued image is mapped onto a
low-dimensional, binary vector and the search is done in this binary space.
Finding the optimal hash function is difficult because it involves binary
constraints, and most approaches approximate the optimization by relaxing the
constraints and then binarizing the result. Here, we focus on the binary
autoencoder model, which seeks to reconstruct an image from the binary code
produced by the hash function. We show that the optimization can be simplified
with the method of auxiliary coordinates. This reformulates the optimization as
alternating two easier steps: one that learns the encoder and decoder
separately, and one that optimizes the code for each image. Image retrieval
experiments, using precision/recall and a measure of code utilization, show the
resulting hash function outperforms or is competitive with state-of-the-art
methods for binary hashing. Comment: 22 pages, 11 figures
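The ingredients of the binary autoencoder model in this abstract (a hash function producing binary codes, search in Hamming space, and a decoder reconstructing the input from the code) can be sketched as follows. This is only an illustration under stated assumptions: the encoder weights `W` are random instead of learned, the decoder is a plain least-squares linear map, and the method of auxiliary coordinates itself is not implemented. All function names are hypothetical.

```python
import numpy as np

def encode(X, W):
    """Hash function: linear map followed by a step, one bit per row of W."""
    return (X @ W.T > 0).astype(np.uint8)

def hamming_rank(codes, query_code):
    """Database indices sorted by Hamming distance to the query code,
    together with the distances themselves."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

def fit_linear_decoder(Z, X):
    """Least-squares linear decoder (with bias) from binary codes to inputs."""
    Zb = np.hstack([Z.astype(float), np.ones((Z.shape[0], 1))])
    A, *_ = np.linalg.lstsq(Zb, X, rcond=None)
    return A

def reconstruction_error(Z, X, A):
    """Mean squared error of reconstructing X from its codes Z."""
    Zb = np.hstack([Z.astype(float), np.ones((Z.shape[0], 1))])
    return float(np.mean((X - Zb @ A) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # stand-in for real-valued image features
W = rng.normal(size=(8, 16))     # assumption: random projections, not learned
Z = encode(X, W)                 # 8-bit code per item
A = fit_linear_decoder(Z, X)
err = reconstruction_error(Z, X, A)
order, dists = hamming_rank(Z, Z[0])
```

In the approach the abstract describes, the encoder and decoder would be learned and the codes optimized in alternation; here the reconstruction error only shows the objective that such training would reduce.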
Grassmann Learning for Recognition and Classification
Computational performance associated with high-dimensional data is a common challenge for real-world classification and recognition systems. Subspace learning has received considerable attention as a means of finding an efficient low-dimensional representation that leads to better classification and efficient processing. A Grassmann manifold is a space that promotes smooth surfaces, where points represent subspaces and the relationship between points is defined by a mapping of an orthogonal matrix. Grassmann learning involves embedding high dimensional subspaces and kernelizing the embedding onto a projection space where distance computations can be effectively performed. In this dissertation, Grassmann learning and its benefits towards action classification and face recognition in terms of accuracy and performance are investigated and evaluated. Grassmannian Sparse Representation (GSR) and Grassmannian Spectral Regression (GRASP) are proposed as Grassmann inspired subspace learning algorithms. GSR is a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss ℓ1-norm minimization for improved classification. GRASP is a novel subspace learning algorithm that leverages the benefits of Grassmann manifolds and Spectral Regression in a framework that supports high discrimination between classes and achieves computational benefits by using manifold modeling and avoiding eigen-decomposition. The effectiveness of GSR and GRASP is demonstrated for computationally intensive classification problems: (a) multi-view action classification using the IXMAS Multi-View dataset, the i3DPost Multi-View dataset, and the WVU Multi-View dataset, (b) 3D action classification using the MSRAction3D dataset and MSRGesture3D dataset, and (c) face recognition using the ATT Face Database, Labeled Faces in the Wild (LFW), and the Extended Yale Face Database B (YALE).
Additional contributions include the definition of Motion History Surfaces (MHS) and Motion Depth Surfaces (MDS) as descriptors suitable for activity representations in video sequences and 3D depth sequences. An in-depth analysis of Grassmann metrics is applied on high dimensional data with different levels of noise and data distributions, which reveals that standardized Grassmann kernels are favorable over geodesic metrics on a Grassmann manifold. Finally, an extensive performance analysis is made that supports Grassmann subspace learning as an effective approach for classification and recognition.
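The contrast between geodesic metrics and Grassmann kernels mentioned in this abstract can be made concrete with a small sketch. It assumes the commonly used definitions: principal angles obtained from the singular values of Q1ᵀQ2, the geodesic distance as the 2-norm of those angles, and the projection kernel as the squared Frobenius norm of Q1ᵀQ2. The dissertation's own standardized kernels may differ, and the function names are hypothetical.

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis (columns) for the column space of a full-rank A."""
    Q, _ = np.linalg.qr(A)
    return Q

def principal_angles(Q1, Q2):
    """Principal angles between the subspaces spanned by Q1 and Q2."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def geodesic_distance(Q1, Q2):
    """Geodesic (arc-length) distance: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(Q1, Q2)))

def projection_kernel(Q1, Q2):
    """Projection kernel: squared Frobenius norm of Q1^T Q2,
    i.e. the sum of squared cosines of the principal angles."""
    return float(np.linalg.norm(Q1.T @ Q2, "fro") ** 2)

# Two mutually orthogonal 2-D subspaces of R^4.
P = orthonormal_basis(np.eye(4)[:, :2])
R = orthonormal_basis(np.eye(4)[:, 2:])
k_same = projection_kernel(P, P)   # equals the subspace dimension
k_orth = projection_kernel(P, R)   # orthogonal subspaces give zero
d_orth = geodesic_distance(P, R)   # both principal angles equal pi/2
```

The kernel needs only a matrix product and a norm, whereas the geodesic distance needs an SVD and an arccos that is numerically sensitive near zero angles.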