Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels
In this paper we propose two distinct enhancements to the basic
"bag-of-keypoints" image categorisation scheme proposed in [4]. In this
approach images are represented as a variable-sized set of local image
features (keypoints). Thus, we require machine learning tools which
can operate on sets of vectors. In [4] this is achieved by representing
the set as a histogram over bins found by k-means. We show how this
approach can be improved and generalised using Gaussian Mixture Models
(GMMs). Alternatively, the set of keypoints can be represented directly
as a probability density function, over which a kernel can be defined. This
approach is shown to give state-of-the-art categorisation performance.
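The difference between the k-means histogram and a GMM-based representation can be illustrated with a minimal sketch. It assumes the cluster centres and mixture parameters have already been fitted, and it uses diagonal covariances for simplicity; the function names `hard_histogram` and `soft_histogram` are hypothetical and are not taken from the paper, which may define its GMM representation differently.

```python
import numpy as np

def hard_histogram(keypoints, centres):
    """Bag-of-keypoints: fraction of keypoints falling in each k-means bin."""
    # Squared Euclidean distance from every keypoint to every centre.
    d2 = ((keypoints[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(centres)).astype(float)
    return counts / counts.sum()

def soft_histogram(keypoints, means, variances, weights):
    """GMM variant (assumed diagonal covariances): mean posterior
    responsibility of each mixture component over the keypoint set."""
    diff = keypoints[:, None, :] - means[None, :, :]
    # Log density of each keypoint under each diagonal Gaussian.
    log_pdf = -0.5 * (diff ** 2 / variances[None, :, :]
                      + np.log(2.0 * np.pi * variances[None, :, :])).sum(axis=2)
    log_joint = log_pdf + np.log(weights)[None, :]
    # Subtract the row maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.mean(axis=0)
```

Both functions map a variable-sized set of keypoint descriptors to a fixed-length vector that a standard classifier can consume; the soft version spreads each keypoint across components instead of committing it to a single bin.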
Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices
In this paper, an automatic method is proposed to perform image registration
on pairs of visible and infrared video sequences for multiple targets. In
multimodal image analysis, such as image fusion systems, color and IR sensors are
placed close to each other and capture the same scene simultaneously, but the
videos are not properly aligned by default because of different fields of view,
image capturing information, working principle and other camera specifications.
Because the scenes are usually not planar, alignment needs to be performed
continuously by extracting relevant common information. In this paper, we
approximate the shape of the targets by polygons and use affine transformation
for aligning the two video sequences. After background subtraction, keypoints
on the contour of the foreground blobs are detected using the DCE (Discrete Curve
Evolution) technique. These keypoints are then described by the local shape at
each point of the obtained polygon. The keypoints are matched based on the
convexity of the polygon's vertices and the Euclidean distance between them. Only good
matches for each local shape polygon in a frame are kept. To achieve a global
affine transformation that maximises the overlapping of infrared and visible
foreground pixels, the matched keypoints of each local shape polygon are stored
temporally in a buffer for a few frames. The matrix is evaluated at
each frame using the temporal buffer and the best matrix is selected, based on
an overlapping ratio criterion. Our experimental results demonstrate that this
method can provide highly accurate registered images and that we outperform a
previous related method.
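The two core steps described above, fitting an affine transformation to matched keypoints and scoring it by foreground overlap, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the affine matrix is fitted by ordinary least squares, and the overlap ratio is assumed to be intersection over union of the two binary foreground masks, since the abstract does not give the exact criterion. The names `fit_affine`, `apply_affine` and `overlap_ratio` are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix M such that dst ~ M @ [x, y, 1]."""
    ones = np.ones((len(src), 1))
    A = np.hstack([src, ones])                      # shape (n, 3)
    # Solve A @ M.T ~ dst in the least-squares sense.
    M_t, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_t.T                                    # shape (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    return pts @ M[:, :2].T + M[:, 2]

def overlap_ratio(mask_a, mask_b):
    """Assumed criterion: intersection over union of two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0
```

In a per-frame loop, one would fit a candidate matrix from the buffered matches, warp the infrared foreground mask with it, and keep the candidate only if its overlap ratio beats that of the best matrix found so far.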
6-DoF Object Pose from Semantic Keypoints
This paper presents a novel approach to estimating the continuous six degree
of freedom (6-DoF) pose (3D translation and rotation) of an object from a
single RGB image. The approach combines semantic keypoints predicted by a
convolutional network (convnet) with a deformable shape model. Unlike prior
work, we are agnostic to whether the object is textured or textureless, as the
convnet learns the optimal representation from the available training image
data. Furthermore, the approach can be applied to instance- and class-based
pose recovery. Empirically, we show that the proposed approach can accurately
recover the 6-DoF object pose for both instance- and class-based scenarios with
a cluttered background. For class-based object pose estimation,
state-of-the-art accuracy is shown on the large-scale PASCAL3D+ dataset.
Comment: IEEE International Conference on Robotics and Automation (ICRA), 201
