3,323 research outputs found
Rectification from Radially-Distorted Scales
This paper introduces the first minimal solvers that jointly estimate lens
distortion and affine rectification from repetitions of rigidly transformed
coplanar local features. The proposed solvers incorporate lens distortion into
the camera model and extend accurate rectification to wide-angle images that
contain nearly any type of coplanar repeated content. We demonstrate a
principled approach to generating stable minimal solvers by the Grobner basis
method, which is accomplished by sampling feasible monomial bases to maximize
numerical stability. Synthetic and real-image experiments confirm that the
solvers give accurate rectifications from noisy measurements when used in a
RANSAC-based estimator. The proposed solvers demonstrate superior robustness to
noise compared to the state-of-the-art. The solvers work on scenes without
straight lines and, in general, relax the strong assumptions on scene content
made by the state-of-the-art. Accurate rectifications on imagery that was taken
with narrow focal length to near fish-eye lenses demonstrate the wide
applicability of the proposed method. The method is fully automated, and the
code is publicly available at https://github.com/prittjam/repeats.Comment: pre-prin
A comparative evaluation of interest point detectors and local descriptors for visual SLAM
Abstract In this paper we compare the behavior of different interest points detectors and descriptors under the
conditions needed to be used as landmarks in vision-based simultaneous localization and mapping (SLAM).
We evaluate the repeatability of the detectors, as well as the invariance and distinctiveness of the descriptors,
under different perceptual conditions using sequences of images representing planar objects as well as 3D scenes.
We believe that this information will be useful when selecting an appropriat
Robust automatic target tracking based on a Bayesian ego-motion compensation framework for airborne FLIR imagery
Automatic target tracking in airborne FLIR imagery is currently a challenge due to the camera ego-motion. This phenomenon distorts the spatio-temporal correlation of the video sequence, which dramatically reduces the tracking performance. Several works address this problem using ego-motion compensation strategies. They use a deterministic approach to compensate the camera motion assuming a specific model of geometric transformation. However, in real sequences a specific geometric transformation can not accurately describe the camera ego-motion for the whole sequence, and as consequence of this, the performance of the tracking stage can significantly decrease, even completely fail. The optimum transformation for each pair of consecutive frames depends on the relative depth of the elements that compose the scene, and their degree of texturization. In this work, a novel Particle Filter framework is proposed to efficiently manage several hypothesis of geometric transformations: Euclidean, affine, and projective. Each type of transformation is used to compute candidate locations of the object in the current frame. Then, each candidate is evaluated by the measurement model of the Particle Filter using the appearance information. This approach is able to adapt to different camera ego-motion conditions, and thus to satisfactorily perform the tracking. The proposed strategy has been tested on the AMCOM FLIR dataset, showing a high efficiency in the tracking of different types of targets in real working conditions
Taking the bite out of automated naming of characters in TV video
We investigate the problem of automatically labelling appearances of characters in TV or film material
with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying
when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series ‘‘Buffy the Vampire Slayer”
Radially-Distorted Conjugate Translations
This paper introduces the first minimal solvers that jointly solve for
affine-rectification and radial lens distortion from coplanar repeated
patterns. Even with imagery from moderately distorted lenses, plane
rectification using the pinhole camera model is inaccurate or invalid. The
proposed solvers incorporate lens distortion into the camera model and extend
accurate rectification to wide-angle imagery, which is now common from consumer
cameras. The solvers are derived from constraints induced by the conjugate
translations of an imaged scene plane, which are integrated with the division
model for radial lens distortion. The hidden-variable trick with ideal
saturation is used to reformulate the constraints so that the solvers generated
by the Grobner-basis method are stable, small and fast.
Rectification and lens distortion are recovered from either one conjugately
translated affine-covariant feature or two independently translated
similarity-covariant features. The proposed solvers are used in a \RANSAC-based
estimator, which gives accurate rectifications after few iterations. The
proposed solvers are evaluated against the state-of-the-art and demonstrate
significantly better rectifications on noisy measurements. Qualitative results
on diverse imagery demonstrate high-accuracy undistortions and rectifications.
The source code is publicly available at https://github.com/prittjam/repeats
Do-It-Yourself Single Camera 3D Pointer Input Device
We present a new algorithm for single camera 3D reconstruction, or 3D input
for human-computer interfaces, based on precise tracking of an elongated
object, such as a pen, having a pattern of colored bands. To configure the
system, the user provides no more than one labelled image of a handmade
pointer, measurements of its colored bands, and the camera's pinhole projection
matrix. Other systems are of much higher cost and complexity, requiring
combinations of multiple cameras, stereocameras, and pointers with sensors and
lights. Instead of relying on information from multiple devices, we examine our
single view more closely, integrating geometric and appearance constraints to
robustly track the pointer in the presence of occlusion and distractor objects.
By probing objects of known geometry with the pointer, we demonstrate
acceptable accuracy of 3D localization.Comment: 8 pages, 6 figures, 2018 15th Conference on Computer and Robot Visio
SceneFlowFields: Dense Interpolation of Sparse Scene Flow Correspondences
While most scene flow methods use either variational optimization or a strong
rigid motion assumption, we show for the first time that scene flow can also be
estimated by dense interpolation of sparse matches. To this end, we find sparse
matches across two stereo image pairs that are detected without any prior
regularization and perform dense interpolation preserving geometric and motion
boundaries by using edge information. A few iterations of variational energy
minimization are performed to refine our results, which are thoroughly
evaluated on the KITTI benchmark and additionally compared to state-of-the-art
on MPI Sintel. For application in an automotive context, we further show that
an optional ego-motion model helps to boost performance and blends smoothly
into our approach to produce a segmentation of the scene into static and
dynamic parts.Comment: IEEE Winter Conference on Applications of Computer Vision (WACV),
201
- …