Total variation regularization for manifold-valued data
We consider total variation minimization for manifold valued data. We propose
a cyclic proximal point algorithm and a parallel proximal point algorithm to
minimize TV functionals with ℓ^p-type data terms in the manifold case.
These algorithms are based on iterative geodesic averaging which makes them
easily applicable to a large class of data manifolds. As an application, we
consider denoising images which take their values in a manifold. We apply our
algorithms to diffusion tensor images, interferometric SAR images as well as
sphere and cylinder valued images. For the class of Cartan-Hadamard manifolds
(which includes the data space in diffusion tensor imaging) we show the
convergence of the proposed TV minimizing algorithms to a global minimizer.
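The cyclic proximal point idea described above can be sketched for a one-dimensional, sphere-valued signal. This is a minimal illustration, not the authors' implementation: it assumes a squared-distance (ℓ^2-type) data term, a step-size sequence 1/k, and the unit sphere S^2 as the data manifold, and all function names (`log_map`, `exp_map`, `geodesic_point`, `cyclic_proximal_point`, `tv_energy`) are hypothetical. Each proximal step only moves points along geodesics, which is the "geodesic averaging" the abstract refers to. Note that the sphere is not a Cartan-Hadamard manifold, so the convergence guarantee stated in the abstract does not cover this toy case.

```python
import numpy as np

def dist(p, q):
    """Geodesic distance between unit vectors p and q on the sphere."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def log_map(p, q):
    """Tangent vector at p that points toward q and has length dist(p, q)."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    v = q - c * p
    n = np.linalg.norm(v)
    if n < 1e-12:                      # q equals p (or is antipodal): no direction
        return np.zeros_like(p)
    return np.arccos(c) * v / n

def exp_map(p, v):
    """Follow the geodesic that starts at p with initial velocity v."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return p
    q = np.cos(n) * p + np.sin(n) * v / n
    return q / np.linalg.norm(q)       # guard against numerical drift

def geodesic_point(p, q, t):
    """Point at fraction t along the geodesic from p to q."""
    return exp_map(p, t * log_map(p, q))

def tv_energy(x, f, alpha):
    """0.5 * sum d(x_i, f_i)^2 + alpha * sum d(x_i, x_{i+1})."""
    data = 0.5 * sum(dist(a, b) ** 2 for a, b in zip(x, f))
    tv = sum(dist(x[i], x[i + 1]) for i in range(len(x) - 1))
    return data + alpha * tv

def cyclic_proximal_point(f, alpha, iters=200):
    """Cycle through the proximal maps of the data term and of each TV term."""
    x = f.copy()
    for k in range(1, iters + 1):
        lam = 1.0 / k                  # assumed step sizes: not summable, square-summable
        # Proximal map of 0.5 * d(x_i, f_i)^2: move x_i toward f_i.
        for i in range(len(x)):
            x[i] = geodesic_point(x[i], f[i], lam / (1.0 + lam))
        # Proximal map of alpha * d(x_i, x_{i+1}): move both points toward
        # each other by lam * alpha, but never past their midpoint.
        for i in range(len(x) - 1):
            d = dist(x[i], x[i + 1])
            if d < 1e-12:
                continue
            t = min(lam * alpha, d / 2.0) / d
            a = geodesic_point(x[i], x[i + 1], t)
            b = geodesic_point(x[i + 1], x[i], t)
            x[i], x[i + 1] = a, b
    return x

# Toy data: a piecewise-constant signal on S^2 with additive noise.
rng = np.random.default_rng(0)
clean = np.array([[0.0, 0.0, 1.0]] * 20 + [[1.0, 0.0, 0.0]] * 20)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
denoised = cyclic_proximal_point(noisy, alpha=0.5)
```

Comparing `tv_energy(denoised, noisy, 0.5)` with `tv_energy(noisy, noisy, 0.5)` gives a quick check that the iteration lowered the functional on this toy signal.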
Large Margin Image Set Representation and Classification
In this paper, we propose a novel image set representation and classification
method by maximizing the margin of image sets. The margin of an image set is
defined as the difference of the distance to its nearest image set from
different classes and the distance to its nearest image set of the same class.
By modeling the image sets by using both their image samples and their affine
hull models, and maximizing the margins of the image sets, the image set
representation parameter learning problem is formulated as a minimization
problem, which is further optimized by an expectation-maximization (EM)
strategy with accelerated proximal gradient (APG) optimization in an iterative
algorithm. To classify a given test image set, we assign it to the class which
could provide the largest margin. Experiments on two applications of
video-sequence-based face recognition demonstrate that the proposed method
significantly outperforms state-of-the-art image set classification methods in
terms of both effectiveness and efficiency.
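The margin definition and the classification rule in the abstract above can be illustrated with a small sketch. It is only a simplification under stated assumptions: the set-to-set distance used here is the minimum pairwise Euclidean distance between samples, whereas the paper models sets with both samples and affine hulls and learns the representation parameters. The names `set_distance`, `margin` and `classify` are hypothetical.

```python
import numpy as np

def set_distance(A, B):
    """Assumed set-to-set distance: smallest Euclidean distance between any
    sample of A and any sample of B (rows are samples)."""
    diffs = A[:, None, :] - B[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())

def margin(i, sets, labels):
    """Distance to the nearest set of a different class minus the distance
    to the nearest set of the same class."""
    same = [set_distance(sets[i], sets[j]) for j in range(len(sets))
            if j != i and labels[j] == labels[i]]
    other = [set_distance(sets[i], sets[j]) for j in range(len(sets))
             if labels[j] != labels[i]]
    return min(other) - min(same)

def classify(test_set, sets, labels):
    """Assign the class under which the test set would get the largest margin."""
    best_label, best_margin = None, -np.inf
    for c in sorted(set(labels)):
        same = min(set_distance(test_set, s) for s, l in zip(sets, labels) if l == c)
        other = min(set_distance(test_set, s) for s, l in zip(sets, labels) if l != c)
        if other - same > best_margin:
            best_label, best_margin = c, other - same
    return best_label

# Toy gallery: two sets of class 0 near the origin, one set of class 1 far away.
sets = [np.array([[0.0, 0.0], [1.0, 0.0]]),
        np.array([[0.0, 1.0], [1.0, 1.0]]),
        np.array([[5.0, 0.0], [6.0, 0.0]])]
labels = [0, 0, 1]

m0 = margin(0, sets, labels)                       # 4.0 - 1.0
predicted = classify(np.array([[0.5, 0.5]]), sets, labels)
```

In this toy gallery the first set lies at distance 1 from its same-class neighbour and at distance 4 from the other class, so its margin is 3.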
Recognizing point clouds using conditional random fields
Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Because of the availability of efficient 3D sensing devices such as the Kinect, methods for the recognition of objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields, a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary sizes and shows significant benefit in terms of computational speed during prediction when compared to a state-of-the-art approach based on constrained optimization.
Peer reviewed. Postprint (author’s final draft)
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
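As a small illustration of the Mahalanobis framework highlighted in the survey abstract above: a Mahalanobis distance has the form d_M(x, y) = sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M, and writing M = L^T L makes it the Euclidean distance after the linear map x -> L x. The sketch below only evaluates such a distance for a fixed, hand-picked L; it does not learn M from data, and the function name `mahalanobis` and the example values are assumptions made for illustration.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

# Hand-picked linear map L; M = L^T L is positive semidefinite by construction.
L = np.array([[2.0, 0.0],
              [0.0, 1.0]])
M = L.T @ L

x = np.array([1.0, 0.0])
y = np.array([0.0, 0.0])

d_m = mahalanobis(x, y, M)                    # first coordinate is stretched by 2
d_euclid_mapped = float(np.linalg.norm(L @ x - L @ y))
d_identity = mahalanobis(x, y, np.eye(2))     # M = I gives the Euclidean distance
```

Metric learning methods in this family choose M (or L) from supervision such as similar and dissimilar pairs, rather than fixing it by hand as done here.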
From small to large baseline multiview stereo : dealing with blur, clutter and occlusions
This thesis addresses the problem of reconstructing the three-dimensional
(3D) digital model of a scene from a collection of two-dimensional (2D)
images taken from it. To address this fundamental computer vision
problem, we propose three algorithms. They are the main contributions
of this thesis.
First, we solve multiview stereo with the off-axis aperture camera.
This system has a very small baseline as images are captured from
viewpoints close to each other. The key idea is to change the size or
the 3D location of the aperture of the camera so as to extract selected
portions of the scene. Our imaging model takes both defocus and
stereo information into account and allows us to solve shape reconstruction
and image restoration in one go. The off-axis aperture camera can
be used in a small-scale space where the camera motion is constrained
by the surrounding environment, such as in 3D endoscopy.
Second, to solve multiview stereo with large baseline, we present a
framework that poses the problem of recovering a 3D surface in the
scene as a regularized minimal partition problem of a visibility function.
The formulation is convex and hence guarantees that the solution
converges to the global minimum. Our formulation is robust
to view-varying extensive occlusions, clutter and image noise. At
no stage of the estimation process does the method rely on
the visual hull, 2D silhouettes, approximate depth maps, or knowing
which views are dependent (i.e., overlapping) and which are independent
(i.e., non-overlapping). Furthermore, the degenerate solution, the
null surface, is not included as a global solution in this formulation.
One limitation of this algorithm is that its computational complexity
grows with the number of views that we combine simultaneously. To
address this limitation, we propose a third formulation. In this formulation,
the visibility functions are integrated within a narrow band
around the estimated surface by setting weights to each point along
optical rays.
This thesis presents technical descriptions for each algorithm and detailed
analyses to show how these algorithms improve existing reconstruction
techniques.