An ECC Based Iterative Algorithm For Photometric Invariant Projective Registration
The ability of an algorithm to accurately estimate the parameters of the geometric transformation which aligns two image profiles, even in the presence of photometric distortions, can be considered a basic requirement in many computer vision applications. Projective transformations constitute a general class which includes the affine, as well as the metric, subclasses of transformations as special cases. In this paper, the applicability of a recently proposed iterative algorithm, which uses the Enhanced Correlation Coefficient as a performance criterion, to the projective image registration problem is investigated. The main theoretical results concerning the proposed iterative algorithm are presented. Furthermore, the performance of the iterative algorithm in the presence of nonlinear photometric distortions is compared against the leading Lucas-Kanade algorithm and its simultaneous inverse compositional variant through a series of experiments involving strong or weak geometric deformations, ideal and noisy conditions, and even over-modelling of the warping process. Although under ideal conditions the proposed algorithm and the simultaneous inverse compositional algorithm exhibit similar performance, and both outperform the Lucas-Kanade algorithm, under noisy conditions the proposed algorithm outperforms the other algorithms in convergence speed and accuracy, and exhibits robustness against photometric distortions.
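The criterion at the heart of the algorithm is easy to state: it is the correlation of zero-mean, unit-norm versions of the two profiles, which makes it invariant to gain and bias (affine photometric) changes. A minimal numpy sketch of the criterion itself (not the authors' iterative solver; the data here are invented):

```python
import numpy as np

def ecc(x, y):
    """Enhanced Correlation Coefficient between two image profiles.

    Both profiles are zero-meaned and normalised, so the score is
    invariant to gain and bias photometric distortions.
    """
    xb = x - x.mean()
    yb = y - y.mean()
    return float(xb @ yb / (np.linalg.norm(xb) * np.linalg.norm(yb)))

rng = np.random.default_rng(0)
template = rng.random(100)
# A gain/bias-distorted copy of the template still scores a perfect 1.0 ...
distorted = 2.5 * template + 0.3
print(round(ecc(template, distorted), 6))   # 1.0
# ... while an unrelated profile scores near zero.
unrelated = rng.random(100)
print(abs(ecc(template, unrelated)) < 0.5)  # True
```

The iterative algorithm in the paper maximises this criterion over the warp parameters; the sketch above only evaluates it for fixed profiles.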
Automatic Detection of Calibration Grids in Time-of-Flight Images
It is convenient to calibrate time-of-flight cameras by established methods,
using images of a chequerboard pattern. The low resolution of the amplitude
image, however, makes it difficult to detect the board reliably. Heuristic
detection methods, based on connected image-components, perform very poorly on
this data. An alternative, geometrically-principled method is introduced here,
based on the Hough transform. The projection of a chequerboard is represented
by two pencils of lines, which are identified as oriented clusters in the
gradient-data of the image. A projective Hough transform is applied to each of
the two clusters, in axis-aligned coordinates. The range of each transform is
properly bounded, because the corresponding gradient vectors are approximately
parallel. Each of the two transforms contains a series of collinear peaks; one
for every line in the given pencil. This pattern is easily detected, by
sweeping a dual line through the transform. The proposed Hough-based method is
compared to the standard OpenCV detection routine, by application to several
hundred time-of-flight images. It is shown that the new method detects
significantly more calibration boards, over a greater variety of poses, without
any overall loss of accuracy. This conclusion is based on an analysis of both
geometric and photometric error. Comment: 11 pages, 11 figures, 1 table.
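The step of grouping gradient vectors into two oriented clusters, one per pencil, can be illustrated on synthetic data. This is a hedged sketch using a small 2-means on the double-angle circle (so that opposite gradient directions agree), not the paper's projective Hough transform; the angles and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical pencil directions (radians), plus orientation noise.
a1, a2 = 0.3, 1.8
theta = np.concatenate([
    a1 + 0.05 * rng.standard_normal(500),
    a2 + 0.05 * rng.standard_normal(500),
])

def two_orientation_clusters(theta, iters=20):
    """2-means on the double-angle circle, so theta and theta + pi agree."""
    pts = np.stack([np.cos(2 * theta), np.sin(2 * theta)], axis=1)
    centers = pts[[0, -1]].copy()             # one seed from each end
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            centers[k] = pts[labels == k].mean(axis=0)
    # Recover a representative orientation for each cluster.
    angles = 0.5 * np.arctan2(centers[:, 1], centers[:, 0])
    return labels, np.sort(np.mod(angles, np.pi))

labels, angles = two_orientation_clusters(theta)
print(np.round(angles, 2))  # close to [0.3, 1.8]
```

In the paper, each recovered cluster then feeds its own axis-aligned Hough transform, whose collinear peaks identify the lines of the corresponding pencil.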
Cross-calibration of Time-of-flight and Colour Cameras
Time-of-flight cameras provide depth information, which is complementary to
the photometric appearance of the scene in ordinary images. It is desirable to
merge the depth and colour information, in order to obtain a coherent scene
representation. However, the individual cameras will have different viewpoints,
resolutions and fields of view, which means that they must be mutually
calibrated. This paper presents a geometric framework for this multi-view and
multi-modal calibration problem. It is shown that three-dimensional projective
transformations can be used to align depth and parallax-based representations
of the scene, with or without Euclidean reconstruction. A new evaluation
procedure is also developed; this allows the reprojection error to be
decomposed into calibration and sensor-dependent components. The complete
approach is demonstrated on a network of three time-of-flight and six colour
cameras. The applications of such a system, to a range of automatic
scene-interpretation problems, are discussed. Comment: 18 pages, 12 figures, 3 tables.
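The algebraic core, that a three-dimensional projective transformation relates two representations of the same scene, can be sketched in homogeneous coordinates. The transformation below is a random invertible one, not a calibrated cross-camera mapping:

```python
import numpy as np

def apply_h(H, pts):
    """Apply a 4x4 projective transformation to Nx3 Euclidean points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :3] / ph[:, 3:4]   # dehomogenise

rng = np.random.default_rng(2)
scene = rng.random((50, 3)) + 1.0          # points in front of the cameras
H = np.eye(4)
H[:3, :3] += 0.1 * rng.standard_normal((3, 3))
H[3, :3] = 0.05 * rng.standard_normal(3)   # the projective (non-affine) part
aligned = apply_h(H, scene)
recovered = apply_h(np.linalg.inv(H), aligned)
print(np.allclose(recovered, scene))       # True
```

The round trip illustrates why such transformations can align depth and parallax-based representations without loss: the mapping is exactly invertible, with or without a Euclidean reconstruction step.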
Continuous Action Recognition Based on Sequence Alignment
Continuous action recognition is more challenging than isolated recognition
because classification and segmentation must be simultaneously carried out. We
build on the well known dynamic time warping (DTW) framework and devise a novel
visual alignment technique, namely dynamic frame warping (DFW), which performs
isolated recognition based on per-frame representation of videos, and on
aligning a test sequence with a model sequence. Moreover, we propose two
extensions that enable recognition to be performed concomitantly with
segmentation, namely one-pass DFW and two-pass DFW. These two methods have
their roots in the
domain of continuous recognition of speech and, to the best of our knowledge,
their extension to continuous visual action recognition has been overlooked. We
test and illustrate the proposed techniques with a recently released dataset
(RAVEL) and with two public-domain datasets widely used in action recognition
(Hollywood-1 and Hollywood-2). We also compare the performance of the proposed
isolated and continuous recognition algorithms with several recently published
methods.
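The DTW machinery underlying DFW can be sketched in a few lines. This is the textbook dynamic-programming recurrence with a per-frame Euclidean cost, not the paper's frame-warping variant:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (n, d) and (m, d); the per-frame cost is the
    Euclidean distance, as when aligning per-frame video representations.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

seq = np.array([[0.], [1.], [2.], [3.]])
stretched = np.array([[0.], [0.], [1.], [2.], [2.], [3.]])
print(dtw(seq, stretched))  # 0.0 -- a time-stretched copy aligns exactly
```

Continuous recognition then amounts to running such an alignment against concatenated model sequences, which is what the one-pass and two-pass extensions address.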
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions
Head-pose estimation has many applications, such as social event analysis,
human-robot and human-computer interaction, driving assistance, and so forth.
Head-pose estimation is challenging because it must cope with changing
illumination conditions, variabilities in face orientation and in appearance,
partial occlusions of facial landmarks, as well as bounding-box-to-face
alignment errors. We propose to use a mixture of linear regressions with
partially-latent output. This regression method learns to map high-dimensional
feature vectors (extracted from bounding boxes of faces) onto the joint space
of head-pose angles and bounding-box shifts, such that they are robustly
predicted in the presence of unobservable phenomena. We describe in detail the
mapping method that combines the merits of unsupervised manifold learning
techniques and of mixtures of regressions. We validate our method with three
publicly available datasets and we thoroughly benchmark four variants of the
proposed algorithm with several state-of-the-art head-pose estimation methods. Comment: 12 pages, 5 figures, 3 tables.
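The prediction side of a mixture of linear regressions can be sketched as a responsibility-weighted blend of linear experts. This toy assumes the expert assignments are known at training time, whereas the paper learns them jointly with the partially-latent output via EM; all numbers and regimes here are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical linear regimes mapping a 1-D feature to a pose angle.
x1 = rng.uniform(-2, 0, 200); y1 = 3.0 * x1 + 1.0
x2 = rng.uniform(0, 2, 200);  y2 = -2.0 * x2 + 1.0

def fit(x, y):
    """Least-squares fit of one linear expert (slope, intercept)."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    return np.linalg.lstsq(A, y, rcond=None)[0]

w1, w2 = fit(x1, y1), fit(x2, y2)
mu1, mu2 = x1.mean(), x2.mean()   # Gaussian gating centres
s = 1.0                           # shared gating bandwidth

def predict(x):
    """Responsibility-weighted combination of the two linear experts."""
    r1 = np.exp(-0.5 * ((x - mu1) / s) ** 2)
    r2 = np.exp(-0.5 * ((x - mu2) / s) ** 2)
    p1, p2 = r1 / (r1 + r2), r2 / (r1 + r2)
    return p1 * (w1[0] * x + w1[1]) + p2 * (w2[0] * x + w2[1])

# Near x = -1.9 the first expert dominates the blended prediction.
print(predict(-1.9))
```

The full method additionally augments the output with latent dimensions, so that unobservable phenomena (such as bounding-box shifts) are absorbed rather than corrupting the pose estimate.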
Night-time outdoor surveillance with mobile cameras
This paper addresses the problem of video surveillance with mobile cameras. We present a method that allows online change detection in night-time outdoor surveillance. Because of the camera movement, background frames are not available and must be "localized" in former sequences and registered with the current frames. To this end, we propose a Frame Localization And Registration (FLAR) approach that solves the problem efficiently. Frames of former sequences define a database which is queried by current frames in turn. To quickly retrieve nearest neighbors, the database is indexed through a visual dictionary method based on the SURF descriptor. Furthermore, frame localization benefits from a temporal filter that exploits the temporal coherence of videos. Next, the recently proposed ECC alignment scheme is used to spatially register the synchronized frames. Finally, change detection methods are applied to the aligned frames in order to mark suspicious areas. Experiments with real night sequences recorded by in-vehicle cameras demonstrate the performance of the proposed method and verify its efficiency and effectiveness against other methods.
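The visual-dictionary indexing step can be sketched as bag-of-words retrieval: quantise each frame's local descriptors against a codebook and rank database frames by cosine similarity of their word histograms. A toy sketch with a random codebook standing in for the SURF-based vocabulary; frames and descriptors are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy visual dictionary: k codewords in a d-dimensional descriptor space.
k, d = 8, 16
codebook = rng.standard_normal((k, d))

def bow_histogram(descriptors):
    """Quantise descriptors to nearest codewords; L2-normalised histogram."""
    dist = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    h = np.bincount(dist.argmin(axis=1), minlength=k).astype(float)
    return h / np.linalg.norm(h)

# Database "frames": frame i favours three neighbouring codewords, so the
# frames have distinct word statistics.
database = []
for i in range(5):
    words = rng.integers(i, i + 3, size=40) % k
    database.append(codebook[words] + 0.1 * rng.standard_normal((40, d)))
index = np.stack([bow_histogram(f) for f in database])

# A query that revisits frame 2 (same place, slightly different view).
query = database[2] + 0.05 * rng.standard_normal((40, d))
scores = index @ bow_histogram(query)
print(int(scores.argmax()))  # 2
```

In the full pipeline, the retrieved frame is then refined by the temporal filter and spatially registered to the query with ECC before change detection.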
Skeletal Quads: Human Action Recognition Using Joint Quadruples
Recent advances in human motion analysis have made the extraction of the human skeleton structure feasible, even from single depth images. This structure has proven quite informative for discriminating actions in a recognition scenario. In this context, we propose a local skeleton descriptor that encodes the relative position of joint quadruples. Such a coding implies a similarity normalisation transform that leads to a compact (6D) view-invariant skeletal feature, referred to as a skeletal quad. Further, the use of a Fisher kernel representation is suggested to describe the skeletal quads contained in a (sub)action. A Gaussian mixture model is learnt from training data, so that the generation of any set of quads is encoded by its Fisher vector. Finally, a multi-level representation of Fisher vectors leads to an action description that roughly carries the order of sub-actions within each action sequence. Efficient classification is achieved here with linear SVMs. The proposed action representation is tested on the widely used MSRAction3D and HDM05 datasets. The experimental evaluation shows that the proposed method outperforms state-of-the-art algorithms that rely only on joints, while it competes with methods that combine joints with extra cues.
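The similarity-normalisation idea behind the skeletal quad can be sketched as follows. This simplified version only translates and scales the quadruple; the paper's full coding additionally rotates the second joint onto a canonical position, which is what reduces the descriptor to 6D:

```python
import numpy as np

def quad_descriptor(j1, j2, j3, j4):
    """Simplified similarity normalisation of a joint quadruple.

    Translates j1 to the origin and scales by ||j2 - j1||. The paper's
    full coding also rotates j2 onto a canonical point, leaving only the
    6 coordinates of the normalised j3 and j4 as the descriptor.
    """
    origin = np.asarray(j1, float)
    scale = np.linalg.norm(np.asarray(j2, float) - origin)
    return np.concatenate([(np.asarray(j, float) - origin) / scale
                           for j in (j2, j3, j4)])

quad = [np.array([0., 0., 0.]), np.array([1., 1., 1.]),
        np.array([0.5, 0., 1.]), np.array([0., 2., 0.])]
d0 = quad_descriptor(*quad)

# The coding is invariant to translation and uniform scaling of the skeleton.
moved = [3.0 * j + np.array([1., -2., 5.]) for j in quad]
print(np.allclose(quad_descriptor(*moved), d0))  # True
```

A set of such descriptors per (sub)action is then summarised by its Fisher vector under the learnt Gaussian mixture model.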