
    Point and line feature-based observer design on SL(3) for Homography estimation and its application to image stabilization

    This paper presents a new algorithm for online estimation of a sequence of homographies, applicable to image sequences obtained from robotic vehicles equipped with a monocular camera. The approach exploits the underlying Special Linear group SL(3) structure of the set of homographies, along with gyrometer measurements and direct point- and line-feature correspondences between images, to develop a temporal filter for the homography estimate. Theoretical analysis and experimental results are provided to demonstrate the robustness of the proposed algorithm. The experimental results show excellent performance even in the case of very fast camera motion (relative to frame rate) and in the presence of severe occlusion, specular reflection, image blur, and light saturation.
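
    For readers unfamiliar with the group structure mentioned above, the following minimal sketch (not the authors' filter) shows the two operations such an observer rests on: normalising a homography to unit determinant so it lies in SL(3), and applying a correction in the Lie algebra sl(3) via the matrix exponential. The correction term here is a placeholder standing in for the gyro-plus-feature innovation.

```python
import numpy as np
from scipy.linalg import expm

def project_to_sl3(H):
    """Normalise a 3x3 homography to unit determinant (an element of SL(3))."""
    return H / np.cbrt(np.linalg.det(H))

def observer_step(H_hat, correction, dt):
    """One illustrative observer update on SL(3).

    correction: a 3x3 matrix built (in a real filter) from gyrometer
    measurements and point/line feature innovations; here a placeholder.
    """
    # Remove the trace so the increment lies in the Lie algebra sl(3).
    A = correction - np.trace(correction) / 3.0 * np.eye(3)
    # Retract onto the group with the matrix exponential.
    return H_hat @ expm(dt * A)
```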

    Continually improving large scale long term visual navigation of a vehicle in dynamic urban environments

    This paper is about long-term navigation in dynamic environments. In previous work we introduced a framework which stored distinct visual appearances of a workspace, known as experiences; these are used to improve localisation on future visits. In this work we introduce a new introspective process, executed between sorties, that aims, by careful discovery of the relationships between experiences, to further improve the performance of our system. We evaluate our new approach on 37 km of stereo data captured over a three-month period.
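
    The "relationships between experiences" idea admits a simple illustration. The sketch below is one plausible reading, not the paper's implementation: record which stored experiences co-localise during a sortie, then use those co-occurrence counts between sorties to rank which experiences to attempt first.

```python
from collections import defaultdict

class ExperienceMap:
    """Illustrative only: a map of stored experiences plus co-localisation
    statistics gathered between sorties. All names are hypothetical."""
    def __init__(self):
        self.experiences = {}                 # experience_id -> stored appearance data
        self.co_localised = defaultdict(int)  # (id_a, id_b) -> co-occurrence count

    def record_sortie(self, localised_ids):
        """After a sortie, note which experiences matched at the same place."""
        for a in localised_ids:
            for b in localised_ids:
                if a != b:
                    self.co_localised[(a, b)] += 1

    def ranked_candidates(self, seed_id):
        """Given one successful match, rank other experiences to try next."""
        scores = {b: n for (a, b), n in self.co_localised.items() if a == seed_id}
        return sorted(scores, key=scores.get, reverse=True)
```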

    SVO: Fast Semi-Direct Monocular Visual Odometry

    We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need for costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise, high-frame-rate motion estimation brings increased robustness in scenes with little, repetitive, or high-frequency texture. The algorithm is applied to micro-aerial-vehicle state estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.
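
    The "direct" half of the semi-direct approach can be illustrated with a minimal photometric cost (a sketch of the idea, not the SVO code): candidate motions are scored by intensity residuals of reference points sampled at subpixel locations in the current image, with no descriptor matching involved.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample image intensity at a subpixel location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def photometric_cost(ref_img, cur_img, ref_pts, warp):
    """Sum of squared intensity residuals between reference points and their
    warped locations; `warp` maps (x, y) -> (x', y') under a candidate
    camera motion. Minimising this over the motion is direct alignment."""
    cost = 0.0
    for (x, y) in ref_pts:
        xw, yw = warp(x, y)
        r = bilinear_sample(cur_img, xw, yw) - bilinear_sample(ref_img, x, y)
        cost += r * r
    return cost
```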

    Planar Homography Estimation from Traffic Streams via Energy Functional Minimization

    The 3x3 homography matrix specifies the mapping between two images of the same plane as viewed by a pinhole camera. Knowledge of the matrix allows one to remove the perspective distortion and apply any similarity transform, effectively making possible the measurement of distances and angles on the image. A rectified road scene, for instance, where vehicles can be segmented and tracked, gives rise to ready estimates of their velocities and spacing, or categorization of their type. Typical road scenes render the classical approach to homography estimation difficult: the Direct Linear Transform is highly susceptible to noise and usually requires refinement via a further nonlinear penalty minimization. Additionally, the penalty is a function of the displacement between measured and calibrated coordinates, a quantity unavailable in a scene for which we have no knowledge of the road coordinates. We propose instead to achieve metric rectification via the minimization of an energy that measures the violation of two constraints: the divergence-free nature of the traffic flow and the orthogonality of the flow and transverse directions under the true transform. Given that a homography is only determined up to scale, the minimization is performed on the Lie group SL(3), for which we develop a gradient descent algorithm. While easily expressed in the world frame, the energy must be computed from measurements made in the image and thus must be pulled back, using standard differential-geometric machinery, to the image frame. We develop an enhancement to the algorithm by incorporating optical flow ideas and apply it to both a noiseless test case and a suite of real-world video streams to demonstrate its efficacy and convergence. Finally, we discuss the extension to a 3D-to-planar mapping for vehicle height inference and a homography that is allowed to vary over the image, invoking a minimization on Diff(SL(3)).
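
    The gradient descent on SL(3) described above has a compact generic form: estimate the Euclidean gradient of the energy, map it into the Lie algebra sl(3) of traceless 3x3 matrices, and retract onto the group with the matrix exponential. The sketch below uses finite differences and a fixed step size as placeholders; the energy itself is supplied by the caller and is not the paper's functional.

```python
import numpy as np
from scipy.linalg import expm

def sl3_gradient_descent(energy, H0, step=1e-2, iters=100, eps=1e-6):
    """Generic gradient descent over unit-determinant 3x3 matrices."""
    H = H0 / np.cbrt(np.linalg.det(H0))   # start on SL(3)
    for _ in range(iters):
        # Finite-difference estimate of the Euclidean gradient dE/dH.
        G = np.zeros((3, 3))
        for i in range(3):
            for j in range(3):
                dH = np.zeros((3, 3)); dH[i, j] = eps
                G[i, j] = (energy(H + dH) - energy(H - dH)) / (2 * eps)
        # Left-translate into the algebra and remove the trace so the
        # update direction lies in sl(3), keeping the iterate on SL(3).
        A = H.T @ G
        A -= np.trace(A) / 3.0 * np.eye(3)
        H = H @ expm(-step * A)           # retraction via matrix exponential
    return H
```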

    Moving-Camera Video Content Analysis via Action Recognition and Homography Transformation

    Moving-camera video content analysis aims at interpreting useful information in videos taken by moving cameras, including wearable cameras and handheld cameras. It is an essential problem in computer vision and plays an important role in many real-life applications, including understanding social difficulties and enhancing public security. In this work, we study three sub-problems of moving-camera video content analysis. Two concern wearable-camera videos, a special type of moving-camera video: recognizing general actions and recognizing microactions. The third is estimating homographies along moving-camera videos. Recognizing general actions in wearable-camera videos is a challenging task, because the motion features extracted from videos of the same action may show very large variation and inconsistency once the complex, non-stop motion of the camera is mixed in. It is very difficult to collect sufficient videos to cover all such variations and use them to train action classifiers with good generalization ability. To address this, we develop a new approach that trains action classifiers on a relatively smaller set of fixed-camera videos with different views, and then applies them to recognize actions in wearable-camera videos. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, with very promising results. Microactions, such as small hand or head movements, can be difficult to recognize in practice, especially in wearable-camera videos, because only subtle body motion is present. To address this, we propose a new deep-learning-based method to effectively learn mid-layer CNN features for enhancing microaction recognition. More specifically, we develop a dual-branch network for microaction recognition: one branch uses the high-layer CNN features for classification, and the second branch, with a novel subtle motion detector, further explores the mid-layer CNN features for classification. In the experiments, we build a new microaction video dataset, where the micromotions of interest are mixed with larger general motions such as walking. Comprehensive experimental results verify that the proposed method yields new state-of-the-art performance on two microaction video datasets, while its performance on two general-action video datasets is also very promising. A homography is the invertible mapping between two images of the same planar surface. For estimating homographies along moving-camera videos, estimation between non-adjacent frames can be very challenging when their camera view angles differ greatly. To handle this, we propose a new deep-learning-based method for homography estimation along videos that exploits temporal dynamics across frames. More specifically, we develop a recurrent convolutional regression network, consisting of a convolutional neural network and a recurrent neural network with long short-term memory cells, followed by a regression layer for estimating the parameters of the homography. In the experiments, we introduce a new approach to synthesize videos with known ground-truth homographies, and evaluate the proposed method on both the synthesized and real-world videos with good results.
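
    A minimal sketch of a recurrent convolutional regression network of the kind described: a small CNN encodes each frame pair, an LSTM carries temporal state across frames, and a linear layer regresses 8 homography parameters (e.g. four-corner offsets). Layer sizes and the parameterisation are assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class HomographyRCNN(nn.Module):
    """Sketch: per-step CNN features -> LSTM over time -> 8-parameter regression."""
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(            # input: stacked grayscale frame pair (2 channels)
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),         # -> (64, 4, 4) regardless of input size
        )
        self.lstm = nn.LSTM(64 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 8)     # hypothetical 4-point corner offsets

    def forward(self, pairs):                # pairs: (batch, time, 2, H, W)
        b, t = pairs.shape[:2]
        feats = self.cnn(pairs.flatten(0, 1)).flatten(1)  # (b*t, 1024)
        out, _ = self.lstm(feats.view(b, t, -1))          # temporal dynamics across frames
        return self.head(out)                             # (b, t, 8)
```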

    Local Accuracy and Global Consistency for Efficient SLAM

    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM) using visual data only. Given the video stream of a moving camera, we wish to estimate the structure of the environment and the motion of the device as accurately as possible and in real time. Two effective approaches have been presented in the past. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods rely on the optimisation approach of bundle adjustment, but computationally must select only a small number of past frames to process. We perform a rigorous comparison between the two approaches for visual SLAM. In particular, we show that accuracy comes from a large number of points, while the number of intermediate frames has only a minor impact. We conclude that keyframe bundle adjustment is superior to filtering due to its smaller computational cost. Based on these experimental results, we develop an efficient framework for large-scale visual SLAM using the keyframe strategy. We demonstrate that SLAM using a single camera drifts not only in rotation and translation but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account. Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM. By employing a novel double-window scheme, we present a constant-time approach which enables the local accuracy of bundle adjustment while ensuring global consistency. Furthermore, we suggest a new scheme for local registration using metric loop closures and present several improvements for the visual front-end of SLAM. Our contributions are evaluated exhaustively in a number of synthetic experiments and on real-image datasets from single cameras and range imaging devices.
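
    The scale-drift observation above is what pushes monocular loop closure from rigid-body poses to similarity transforms. The following sketch (an illustration of the idea, not the thesis code) shows Sim(3) composition and inversion, and the relative-constraint residual a scale-aware pose-graph optimiser would drive toward zero.

```python
import numpy as np

class Sim3:
    """Similarity transform: scale s, rotation R (3x3), translation t.
    Acts on points as x -> s * R @ x + t."""
    def __init__(self, s, R, t):
        self.s, self.R, self.t = s, np.asarray(R, float), np.asarray(t, float)

    def compose(self, other):
        # (self * other)(x) = self(other(x))
        return Sim3(self.s * other.s,
                    self.R @ other.R,
                    self.s * self.R @ other.t + self.t)

    def inverse(self):
        Rinv = self.R.T
        return Sim3(1.0 / self.s, Rinv, -(1.0 / self.s) * Rinv @ self.t)

def loop_residual(Ti, Tj, Tij_measured):
    """Error of a loop-closure constraint: how far T_i^{-1} * T_j is from the
    measured relative similarity. The log of the scale part captures scale
    drift alongside the rotational and translational error terms."""
    E = Tij_measured.inverse().compose(Ti.inverse().compose(Tj))
    rot_err = np.linalg.norm(E.R - np.eye(3))
    return np.log(E.s) ** 2 + rot_err ** 2 + E.t @ E.t
```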