DeepMatching: Hierarchical Deformable Dense Matching
We introduce a novel matching algorithm, called DeepMatching, to compute
dense correspondences between images. DeepMatching relies on a hierarchical,
multi-layer, correlational architecture designed for matching images and was
inspired by deep convolutional approaches. The proposed matching algorithm can
handle non-rigid deformations and repetitive textures and efficiently
determines dense correspondences in the presence of significant changes between
images. We evaluate the performance of DeepMatching, in comparison with
state-of-the-art matching algorithms, on the Mikolajczyk (Mikolajczyk et al
2005), the MPI-Sintel (Butler et al 2012) and the Kitti (Geiger et al 2013)
datasets. DeepMatching outperforms the state-of-the-art algorithms and shows
excellent results, in particular for repetitive textures. We also propose a
method for estimating optical flow, called DeepFlow, by integrating
DeepMatching in the large displacement optical flow (LDOF) approach of Brox and
Malik (2011). Compared to existing matching algorithms, additional robustness
to large displacements and complex motion is obtained thanks to our matching
approach. DeepFlow obtains competitive performance on public benchmarks for
optical flow estimation.
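The bottom-up aggregation idea behind such a hierarchical correlational matcher can be illustrated with a minimal sketch (an illustrative toy, not the authors' implementation): level-0 correlations between small patches are aggregated over 2x2 groups of neighbouring patches, here assuming rigidly aligned children for simplicity, whereas DeepMatching itself lets children shift and max-pools their responses.

```python
import numpy as np

def normalized_patches(im, p=4):
    # Split the image into non-overlapping p x p patches and zero-mean,
    # L2-normalise each one, so dot products become correlation scores.
    h, w = im.shape
    out = []
    for i in range(0, h - p + 1, p):
        for j in range(0, w - p + 1, p):
            patch = im[i:i + p, j:j + p].astype(float).ravel()
            patch -= patch.mean()
            n = np.linalg.norm(patch)
            out.append(patch / n if n > 1e-9 else patch)
    return np.array(out), (h // p, w // p)

def correlation_pyramid(im1, im2, p=4, levels=2):
    # Level 0: correlation of every patch of im1 with every patch of im2.
    P1, (gh, gw) = normalized_patches(im1, p)
    P2, _ = normalized_patches(im2, p)
    maps = [P1 @ P2.T]                 # (patches_in_im1, patches_in_im2)
    # Higher levels: a 2x2 group of child patches is summarised by the
    # mean of its children's correlation maps -- a coarser, more
    # discriminative score that helps disambiguate repetitive texture.
    for _ in range(1, levels):
        s = maps[-1].reshape(gh, gw, -1)
        gh, gw = gh // 2, gw // 2
        agg = np.zeros((gh, gw, s.shape[2]))
        for a in range(gh):
            for b in range(gw):
                agg[a, b] = s[2 * a:2 * a + 2, 2 * b:2 * b + 2].mean(axis=(0, 1))
        maps.append(agg.reshape(gh * gw, -1))
    return maps
```

Matching an image against itself, each level-0 patch scores highest against its own position, while the coarser level trades localisation for discriminativeness.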
LIFT: Learned Invariant Feature Transform
We introduce a novel Deep Network architecture that implements the full
feature point handling pipeline, that is, detection, orientation estimation,
and feature description. While previous works have successfully tackled each
one of these problems individually, we show how to learn to do all three in a
unified manner while preserving end-to-end differentiability. We then
demonstrate that our Deep pipeline outperforms state-of-the-art methods on a
number of benchmark datasets, without the need for retraining.
Comment: Accepted to ECCV 2016 (spotlight).
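The unified pipeline can be caricatured in plain NumPy (a hypothetical toy, not the LIFT networks): a soft-argmax keeps detection differentiable, the detected patch feeds an orientation estimator, and the descriptor is computed relative to that orientation, so in principle gradients could flow end to end through all three stages.

```python
import numpy as np

def soft_argmax(score, beta=10.0):
    # Differentiable keypoint location: softmax-weighted average of
    # pixel coordinates instead of a hard argmax.
    w = np.exp(beta * (score - score.max()))
    w /= w.sum()
    ys, xs = np.indices(score.shape)
    return (w * ys).sum(), (w * xs).sum()

def orientation(patch):
    # Dominant gradient orientation (differentiable via arctan2).
    gy, gx = np.gradient(patch.astype(float))
    return np.arctan2(gy.sum(), gx.sum())

def describe(patch, theta, bins=8):
    # Toy descriptor: gradient-orientation histogram rotated by theta so
    # it is invariant to the estimated orientation (soft binning would
    # make this stage differentiable too; omitted for brevity).
    gy, gx = np.gradient(patch.astype(float))
    ang = (np.arctan2(gy, gx) - theta) % (2 * np.pi)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def keypoint_pipeline(image):
    # Detection -> orientation -> description, chained end to end,
    # using the image itself as a stand-in "score map".
    y, x = soft_argmax(image)
    yi, xi = int(round(y)), int(round(x))
    patch = image[max(yi - 4, 0):yi + 4, max(xi - 4, 0):xi + 4]
    theta = orientation(patch)
    return (y, x), theta, describe(patch, theta)
```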
Geometric and photometric affine invariant image registration
This thesis aims to present a solution to the correspondence problem for the registration
of wide-baseline images taken from uncalibrated cameras. We propose an affine
invariant descriptor that combines the geometry and photometry of the scene to find
correspondences between both views. The geometric affine invariant component of the
descriptor is based on the affine arc-length metric, whereas the photometry is analysed
by invariant colour moments. A graph structure represents the spatial distribution of the
primitive features: nodes correspond to detected high-curvature points, whereas arcs
represent connectivity established by extracted contours. After matching, we refine the search for
correspondences by using a maximum likelihood robust algorithm. We have evaluated
the system over synthetic and real data. The method is prone to the propagation of errors
introduced by approximations in the system.
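The photometric side of such a descriptor can be illustrated with generalized colour moments (a hedged sketch; the thesis' exact invariant combinations and the affine arc-length component are not reproduced here). A moment M_pq^{abc} sums x^p y^q R^a G^b B^c over a region, and suitable ratios of moments cancel photometric scale factors:

```python
import numpy as np

def colour_moment(region, p, q, a, b, c):
    # Generalized colour moment M_pq^{abc} = sum_x sum_y x^p y^q R^a G^b B^c
    # over an image region (H x W x 3, RGB values in [0, 1]).
    h, w, _ = region.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    R, G, B = region[..., 0], region[..., 1], region[..., 2]
    return float(np.sum(xs ** p * ys ** q * R ** a * G ** b * B ** c))

def scale_invariant_R(region):
    # A simple invariant to an independent scaling of the red channel:
    # (M_00^{200} * M_00^{000}) / (M_00^{100})^2 -- if R -> s * R, the
    # factor s^2 appears in both numerator and denominator and cancels.
    m200 = colour_moment(region, 0, 0, 2, 0, 0)
    m000 = colour_moment(region, 0, 0, 0, 0, 0)
    m100 = colour_moment(region, 0, 0, 1, 0, 0)
    return m200 * m000 / (m100 ** 2)
```

Halving the red channel of a region, as a crude model of an illumination change, leaves the invariant unchanged.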
Features for matching people in different views
There have been significant advances in the computer vision field during the last decade.
During this period, many methods have been developed that have been successful in solving
challenging problems including Face Detection, Object Recognition and 3D Scene Reconstruction.
The solutions developed by computer vision researchers have been widely
adopted and used in many real-life applications, such as those found in the medical and
security industries. Among the different branches of computer vision, Object Recognition
has been an area that has advanced rapidly in recent years. The successful introduction of
approaches such as feature extraction and description has been an important factor in the
growth of this area. In recent years, researchers have attempted to use these approaches
and apply them to other problems such as Content Based Image Retrieval and Tracking.
In this work, we present a novel system that finds correspondences between people seen in
different images. Unlike other approaches that rely on a video stream to track the movement
of people between images, here we present a feature-based approach where we find a
target’s new location in an image based only on its visual appearance.
Our proposed system comprises three steps. In the first step, a set of features is extracted
from the target’s appearance. A novel algorithm is developed that extracts features from
the target which are particularly well suited to the modelling task. In the second step,
each feature is characterised using a combined colour and texture descriptor. The inclusion
of information relating to both the colour and texture of a feature adds to the descriptor’s distinctiveness.
Finally, the target’s appearance and pose is modelled as a collection of such
features and descriptors. This collection is then used as a template that allows us to search
for a similar combination of features in other images that correspond to the target’s new
location.
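The three steps above can be sketched with stand-in choices (a hedged toy: per-channel colour histograms plus a gradient-magnitude histogram as the texture cue, and summed best-match similarity for the template search; the thesis' actual feature extractor and descriptors are not reproduced):

```python
import numpy as np

def colour_hist(patch, bins=8):
    # Step 2a: per-channel colour histogram, jointly normalised.
    h = np.concatenate([np.histogram(patch[..., k], bins=bins,
                                     range=(0, 1))[0] for k in range(3)])
    return h / max(h.sum(), 1)

def texture_hist(patch, bins=8):
    # Step 2b: gradient-magnitude histogram of the grey image as a
    # crude texture cue.
    grey = patch.mean(axis=2)
    gy, gx = np.gradient(grey)
    mag = np.hypot(gx, gy)
    h, _ = np.histogram(mag, bins=bins, range=(0, mag.max() + 1e-9))
    return h / max(h.sum(), 1)

def describe_feature(patch):
    # Combined colour + texture descriptor for one extracted feature.
    return np.concatenate([colour_hist(patch), texture_hist(patch)])

def match_template(template, candidates):
    # Step 3: the template is the collection of the target's feature
    # descriptors; each candidate region is scored by the sum of its
    # best per-descriptor similarities, and the best region wins.
    def score(cand):
        return sum(max(float(t @ c) for c in cand) for t in template)
    return int(np.argmax([score(c) for c in candidates]))
```

With distinctly coloured target features, the candidate containing the target's own features outscores an unrelated region.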
We have demonstrated the effectiveness of our system in locating a target’s new position in
an image, despite differences in viewpoint, scale or elapsed time between the images. The
characterisation of a target as a collection of features also allows our system to robustly
deal with the partial occlusion of the target.
Editing faces in videos
Editing faces in movies is of interest in the special effects industry. We aim at
producing effects such as the addition of accessories interacting correctly with
the face or replacing the face of a stuntman with the face of the main actor.
The system introduced in this thesis is based on a 3D generative face model.
Using a 3D model makes it possible to edit the face in the semantic space of pose,
expression, and identity instead of pixel space, and due to its 3D nature allows
a modelling of the light interaction. In our system we first reconstruct, in all frames
of a monocular input video, the 3D face (which deforms due to expressions and speech),
the lighting, and the camera. The face is then edited by
substituting expressions or identities with those of another video sequence or by
adding virtual objects into the scene. The manipulated 3D scene is rendered back
into the original video, correctly simulating the interaction of the light with the
deformed face and virtual objects.
We describe all steps necessary to build and apply the system. This includes
registration of training faces to learn a generative face model, semi-automatic
annotation of the input video, fitting of the face model to the input video, editing
of the fit, and rendering of the resulting scene.
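One of these steps, fitting a linear generative face model to observations, can be sketched as regularised linear least squares (a hypothetical minimal version: real fitting also recovers pose, camera, and lighting, and all names below are illustrative):

```python
import numpy as np

def fit_coefficients(mean_shape, basis, observed, lam=1e-3):
    # Solve min_a ||mean_shape + basis @ a - observed||^2 + lam * ||a||^2:
    # regularised least squares that pulls the expression/identity
    # coefficients toward the neutral face.
    # mean_shape: (3N,), basis: (3N, K), observed: (3N,) landmark stack.
    A = basis.T @ basis + lam * np.eye(basis.shape[1])
    b = basis.T @ (observed - mean_shape)
    return np.linalg.solve(A, b)
```

Given observations generated by known coefficients, the fit recovers them up to the small shrinkage introduced by the regulariser.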
While describing the application we introduce a host of new methods, each
of which is of interest on its own. We start with a new method to register 3D
face scans to use as training data for the face model. For video preprocessing a
new interest point tracking and 2D Active Appearance Model fitting technique
is proposed. For robust fitting we introduce background modelling, model-based
stereo techniques, and a more accurate light model.