
    Rolling-Shutter Modelling for Direct Visual-Inertial Odometry

    We present a direct visual-inertial odometry (VIO) method which estimates the motion of the sensor setup and sparse 3D geometry of the environment based on measurements from a rolling-shutter camera and an inertial measurement unit (IMU). The visual part of the system performs a photometric bundle adjustment on a sparse set of points. This direct approach does not extract feature points and is able to track not only corners, but any pixels with sufficient gradient magnitude. Neglecting rolling-shutter effects in the visual part severely degrades the accuracy and robustness of the system. In this paper, we incorporate a rolling-shutter model into the photometric bundle adjustment that estimates a set of recent keyframe poses and the inverse depth of a sparse set of points. IMU information is accumulated between several frames using measurement preintegration and is inserted into the optimization as an additional constraint between selected keyframes. For every keyframe we estimate not only the pose but also the velocity and biases needed to correct the IMU measurements. Unlike systems with global-shutter cameras, we use both the IMU measurements and the rolling-shutter effects of the camera to estimate velocity and biases for every state. Lastly, we evaluate our system on a novel dataset that contains global-shutter and rolling-shutter images, IMU data, and ground-truth poses for ten different sequences, which we make publicly available. The evaluation shows that the proposed method outperforms a system where rolling shutter is not modelled and achieves accuracy similar to that of the global-shutter method on global-shutter data.
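
    To make the modelling step concrete, the following is a minimal sketch (not the authors' implementation) of how a per-row pose can be obtained: with a rolling shutter, each image row has its own capture time, so the pose used to project a point is propagated from the frame-start state using the estimated velocities. The struct and field names, the linear readout model, and the first-order small-angle update are illustrative assumptions.

```rust
// Rolling-shutter pose lookup under a constant-velocity model.
// Illustrative sketch only: a real system composes SE(3) elements
// instead of this first-order update.

#[derive(Clone, Copy)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    fn scaled(self, s: f64) -> Vec3 {
        Vec3 { x: self.x * s, y: self.y * s, z: self.z * s }
    }
    fn add(self, o: Vec3) -> Vec3 {
        Vec3 { x: self.x + o.x, y: self.y + o.y, z: self.z + o.z }
    }
}

/// State estimated per keyframe: pose at the first image row,
/// plus the velocities that make per-row propagation possible.
struct KeyframeState {
    t0: f64,            // timestamp of the first image row (s)
    position: Vec3,     // camera position in the world frame (m)
    rotation: [f64; 3], // rotation vector (axis * angle), small angles only
    lin_vel: Vec3,      // linear velocity (m/s), also constrained by the IMU
    ang_vel: Vec3,      // angular velocity (rad/s)
}

impl KeyframeState {
    /// Capture time of image row `v` for a shutter that reads one
    /// row every `row_time` seconds (linear readout assumption).
    fn row_timestamp(&self, v: f64, row_time: f64) -> f64 {
        self.t0 + v * row_time
    }

    /// First-order propagation of the frame-start pose to the
    /// capture time of row `v`.
    fn pose_at_row(&self, v: f64, row_time: f64) -> (Vec3, [f64; 3]) {
        let dt = self.row_timestamp(v, row_time) - self.t0;
        let p = self.position.add(self.lin_vel.scaled(dt));
        let r = [
            self.rotation[0] + self.ang_vel.x * dt,
            self.rotation[1] + self.ang_vel.y * dt,
            self.rotation[2] + self.ang_vel.z * dt,
        ];
        (p, r)
    }
}

fn main() {
    let kf = KeyframeState {
        t0: 0.0,
        position: Vec3 { x: 0.0, y: 0.0, z: 0.0 },
        rotation: [0.0; 3],
        lin_vel: Vec3 { x: 1.0, y: 0.0, z: 0.0 }, // moving forward at 1 m/s
        ang_vel: Vec3 { x: 0.0, y: 0.0, z: 0.5 }, // yawing at 0.5 rad/s
    };
    // Row 480 of a sensor reading one row every 30 microseconds:
    let (p, r) = kf.pose_at_row(480.0, 30e-6);
    println!("p = ({:.4}, {:.4}, {:.4}), r = {:?}", p.x, p.y, p.z, r);
}
```

    Inside the photometric bundle adjustment, a per-row pose like this replaces the single frame pose in each point projection, which is also what makes velocity observable from the images themselves rather than from the IMU alone.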

    A Comprehensive Mapping and Real-World Evaluation of Multi-Object Tracking on Automated Vehicles

    Multi-Object Tracking (MOT) is a field critical to Automated Vehicle (AV) perception systems. However, it is large, complex, spans multiple research fields, and lacks resources for integration with real sensors and implementation on AVs. Factors such as these make it difficult for new researchers and practitioners to enter the field. This thesis presents two main contributions: 1) a comprehensive mapping of the field of Multi-Object Trackers (MOTs) with a specific focus on Automated Vehicles (AVs), and 2) a real-world evaluation of an MOT developed and tuned using COTS (Commercial Off-The-Shelf) software toolsets. The first contribution aims to give a comprehensive overview of MOTs and the various MOT subfields for AVs that have not been presented as holistically in other papers. The second contribution aims to illustrate some of the benefits of using a COTS MOT toolset and some of the difficulties associated with using real-world data. This MOT performed accurate state estimation of a target vehicle by tracking and fusing data from a radar and a vision sensor using a Central-Level Track Processing approach and a Global Nearest Neighbors assignment algorithm. It achieved a 0.44 m positional Root Mean Squared Error (RMSE) over a 40 m approach test. It is the author's hope that this work provides an overview of the MOT field that will help new researchers and practitioners enter it. Additionally, the author hopes that the evaluation section illustrates some of the difficulties of using real-world data and provides a good pathway for developing MOTs in software toolsets and deploying them on Automated Vehicles.
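
    For illustration (this is not the thesis's code): a Global Nearest Neighbors assignment selects the track-to-measurement pairing that minimises the total statistical distance after gating out implausible pairs. The Rust sketch below solves the problem by exhaustive search, which is only viable for a handful of tracks; COTS toolsets use the Hungarian or auction algorithm for the same step. The gate value, the diagonal-covariance Mahalanobis distance, and the miss penalty are illustrative assumptions.

```rust
// Gated Global Nearest Neighbors (GNN) assignment by exhaustive
// search. Illustrative sketch: production trackers solve this step
// efficiently with the Hungarian/auction algorithm.

const GATE: f64 = 9.21;      // chi-square gate, 2 DoF at 99% (assumed)
const MISS_COST: f64 = GATE; // penalty for leaving a track unassigned

/// Squared Mahalanobis distance for a 2D position, assuming a
/// diagonal innovation covariance `var` for simplicity.
fn maha2(track: [f64; 2], meas: [f64; 2], var: [f64; 2]) -> f64 {
    let dx = meas[0] - track[0];
    let dy = meas[1] - track[1];
    dx * dx / var[0] + dy * dy / var[1]
}

/// Returns `assignment[i] = Some(j)` pairing track i with
/// measurement j so that the summed gated cost is minimal.
fn gnn_assign(cost: &[Vec<f64>]) -> Vec<Option<usize>> {
    let n = cost.len();
    let m = if n > 0 { cost[0].len() } else { 0 };
    let mut best: (f64, Vec<Option<usize>>) = (f64::INFINITY, vec![None; n]);
    let mut used = vec![false; m];
    let mut current: Vec<Option<usize>> = vec![None; n];
    search(cost, 0, 0.0, &mut used, &mut current, &mut best);
    best.1
}

fn search(
    cost: &[Vec<f64>],
    i: usize,
    acc: f64,
    used: &mut Vec<bool>,
    current: &mut Vec<Option<usize>>,
    best: &mut (f64, Vec<Option<usize>>),
) {
    if acc >= best.0 {
        return; // prune branches that cannot improve the best pairing
    }
    if i == cost.len() {
        *best = (acc, current.clone());
        return;
    }
    // Option 1: track i gets no measurement (e.g. a missed detection).
    current[i] = None;
    search(cost, i + 1, acc + MISS_COST, used, current, best);
    // Option 2: track i takes any ungated, still-unused measurement.
    for j in 0..used.len() {
        if !used[j] && cost[i][j] <= GATE {
            used[j] = true;
            current[i] = Some(j);
            search(cost, i + 1, acc + cost[i][j], used, current, best);
            used[j] = false;
            current[i] = None;
        }
    }
}

fn main() {
    // Two predicted track positions, two incoming measurements.
    let tracks = [[0.0, 0.0], [4.9, 5.1]];
    let meas = [[0.2, 0.1], [5.0, 5.0]];
    let var = [0.1, 0.1];
    let cost: Vec<Vec<f64>> = tracks
        .iter()
        .map(|t| meas.iter().map(|z| maha2(*t, *z, var)).collect())
        .collect();
    println!("{:?}", gnn_assign(&cost)); // expected: [Some(0), Some(1)]
}
```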

    Visual-Inertial first responder localisation in large-scale indoor training environments.

    Accurately and reliably determining the position and heading of first responders undertaking training exercises can provide valuable insights into their situational awareness and give a larger context to the decisions made. Measuring first responder movement, however, requires an accurate and portable localisation system. Training exercises often take place in large-scale indoor environments with limited power infrastructure to support localisation. Indoor positioning technologies that use radio or sound waves for localisation require an extensive network of transmitters or receivers to be installed within the environment to ensure reliable coverage. These technologies also need power sources to operate, making their use impractical for this application. Inertial sensors are infrastructure-independent, low-cost, and low-power positioning devices which are attached to the person or object being tracked, but their localisation accuracy deteriorates over long-term tracking due to intrinsic biases and sensor noise. This thesis investigates how inertial sensor tracking can be improved by providing corrections from a visual sensor that uses passive infrastructure (fiducial markers) to calculate accurate position and heading values. Even though a visual sensor increases the accuracy of the localisation system, combining it with inertial sensors is not trivial, especially when the sensors are mounted on different parts of the human body and undergo different motion dynamics. Additionally, visual sensors have higher energy consumption, requiring more batteries to be carried by the first responder. This thesis presents a novel sensor fusion approach that loosely couples visual and inertial sensors to create a positioning system that accurately localises walking humans in large-scale indoor environments. Experimental evaluation of the devised localisation system indicates sub-metre accuracy over a 250 m long indoor trajectory. The thesis also proposes two methods to improve the energy efficiency of the localisation system. The first is a distance-based error correction approach which uses distance estimates from the foot-mounted inertial sensor to reduce the number of corrections required from the visual sensor. Results indicate a 70% decrease in energy consumption while maintaining sub-metre localisation accuracy. The second is a motion-type-adaptive error correction approach, which uses the human walking motion type (forward, backward, or sideways) as an input to further optimise the energy efficiency of the localisation system by modulating the operation of the visual sensor. Results of this approach indicate a 25% reduction in the number of corrections required to maintain sub-metre localisation accuracy. Overall, this thesis advances the state of the art by providing a sensor fusion solution for long-term sub-metre-accurate localisation, together with methods to reduce its energy consumption, making it more practical for use in first responder training exercises.
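
    As a concrete illustration of the distance-based error correction approach (a sketch under assumed names and thresholds, not the thesis's implementation): the foot-mounted inertial sensor dead-reckons and accumulates travelled distance, and the energy-hungry visual sensor is only powered up once that distance exceeds a threshold.

```rust
// Distance-gated visual correction of an inertial position estimate.
// Illustrative sketch: names, the threshold, and the hard reset are
// assumptions, not the thesis's parameters.

struct FusedLocaliser {
    position: [f64; 2],         // fused 2D position estimate (m)
    dist_since_fix: f64,        // distance walked since the last visual fix (m)
    correction_interval_m: f64, // power up the camera after this distance
}

impl FusedLocaliser {
    /// Dead-reckoning update from the foot-mounted inertial sensor
    /// (one detected stride, already resolved into x/y displacement).
    fn inertial_step(&mut self, step: [f64; 2]) {
        self.position[0] += step[0];
        self.position[1] += step[1];
        self.dist_since_fix += (step[0] * step[0] + step[1] * step[1]).sqrt();
    }

    /// True when enough drift may have accumulated to justify
    /// switching the visual sensor on.
    fn needs_visual_fix(&self) -> bool {
        self.dist_since_fix >= self.correction_interval_m
    }

    /// Correction from a detected fiducial marker. A hard reset for
    /// brevity; a real loosely coupled filter would weight the two
    /// estimates (e.g. via a Kalman gain) instead of overwriting.
    fn apply_visual_fix(&mut self, marker_position: [f64; 2]) {
        self.position = marker_position;
        self.dist_since_fix = 0.0;
    }
}

fn main() {
    let mut loc = FusedLocaliser {
        position: [0.0, 0.0],
        dist_since_fix: 0.0,
        correction_interval_m: 10.0, // assumed trigger distance
    };
    for _ in 0..25 {
        loc.inertial_step([0.7, 0.02]); // strides with slight lateral drift
        if loc.needs_visual_fix() {
            // Here the camera would be powered on and a marker detected;
            // we simulate a fix that removes the lateral drift.
            let fix = [loc.position[0], 0.0];
            loc.apply_visual_fix(fix);
        }
    }
    println!("final position: {:?}", loc.position);
}
```

    The motion-type-adaptive approach would additionally modulate `correction_interval_m` depending on whether the walker moves forward, backward, or sideways, since drift accumulates at different rates for each.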

    Interactive computer vision through the Web

    Computer vision is the computational science that aims at reproducing and improving the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications have on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms. The most common consist in outlining contours, drawing bounding boxes around objects, or marking interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium. While evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users is quite different. We therefore describe how the Elm programming language helped us build a reliable image annotation Web application. A tour of its main functionalities and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open-source and available online. In the second part of this thesis we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques that traditionally require considerable processing power to run at real-time framerates. Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target the new WebAssembly standard from the C++ and Rust programming languages. Our library was started from scratch in the Rust programming language, which then allowed us to easily port it to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interaction available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialise camera positions. Combining those interactions enables improvements in the tracking and 3D point reconstruction results.
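
    As a pointer for readers unfamiliar with the toolchain (the function below is an illustrative stand-in, not part of the thesis's library): targeting WebAssembly from Rust is commonly done with the wasm-bindgen crate, which generates the JavaScript glue needed to call Rust from the browser.

```rust
// Exposing a Rust function to JavaScript through WebAssembly.
// `mean_intensity` is a placeholder for a real entry point such as a
// tracking step; it consumes a grayscale frame the same way.

use wasm_bindgen::prelude::*;

/// Mean intensity of a grayscale frame. Callable from JavaScript,
/// where `frame` arrives as a Uint8Array.
#[wasm_bindgen]
pub fn mean_intensity(frame: &[u8]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum: u64 = frame.iter().map(|&p| u64::from(p)).sum();
    sum as f64 / frame.len() as f64
}
```

    With `crate-type = ["cdylib"]` declared in Cargo.toml, running `wasm-pack build --target web` produces the `.wasm` module together with an ES module wrapper that a Web page can import directly.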