104 research outputs found

    Robust convex optimisation techniques for autonomous vehicle vision-based navigation

    This thesis investigates new convex optimisation techniques for motion and pose estimation. Numerous computer vision problems can be formulated as optimisation problems. These optimisation problems are generally solved via linear techniques using the singular value decomposition or iterative methods under an L2 norm minimisation. Linear techniques have the advantage of offering a closed-form solution that is simple to implement. The quantity being minimised is, however, not geometrically or statistically meaningful. Conversely, L2 algorithms rely on iterative estimation, where a cost function is minimised using algorithms such as Levenberg-Marquardt, Gauss-Newton, gradient descent or conjugate gradient. The cost functions involved are geometrically interpretable and can statistically be optimal under an assumption of Gaussian noise. However, in addition to their sensitivity to initial conditions, these algorithms are often slow and bear a high probability of getting trapped in a local minimum or producing infeasible solutions, even for small noise levels. In light of the above, in this thesis we focus on developing new techniques for finding solutions via a convex optimisation framework that are globally optimal. Presently convex optimisation techniques in motion estimation have revealed enormous advantages. Indeed, convex optimisation ensures getting a global minimum, and the cost function is geometrically meaningful. Moreover, robust optimisation is a recent approach for optimisation under uncertain data. In recent years the need to cope with uncertain data has become especially acute, particularly where real-world applications are concerned. In such circumstances, robust optimisation aims to recover an optimal solution whose feasibility must be guaranteed for any realisation of the uncertain data. 
Although many researchers avoid modelling uncertainty, both because constructing a robust optimisation model adds complexity and because little is known about the nature of these uncertainties and especially their propagation, this thesis investigates robust convex optimisation for the motion estimation problem while estimating the uncertainties at every step. First, a solution using convex optimisation coupled to the recursive least squares (RLS) algorithm and the robust H∞ filter is developed for motion estimation. In another solution, uncertainties and their propagation are incorporated in a robust L∞ convex optimisation framework for monocular visual motion estimation. In this solution, robust least squares is combined with a second order cone program (SOCP). A technique to improve the accuracy and the robustness of the fundamental matrix is also investigated in this thesis. This technique uses the covariance intersection approach to fuse feature location uncertainties, which leads to more consistent motion estimates. Loop-closure detection is crucial in improving the robustness of navigation algorithms. In practice, after long navigation in an unknown environment, detecting that a vehicle is in a location it has previously visited gives the opportunity to increase the accuracy and consistency of the estimate. In this context, we have developed an efficient appearance-based method for visual loop-closure detection based on the combination of a Gaussian mixture model with the KD-tree data structure. Deploying this technique for loop-closure detection, a robust L∞ convex pose-graph optimisation solution for monocular motion estimation on unmanned aerial vehicles (UAVs) is introduced as well. In the literature, most proposed solutions formulate the pose-graph optimisation as a least-squares problem by minimising a cost function using iterative methods. 
In this work, robust convex optimisation under the L∞ norm is adopted, which efficiently corrects the UAV’s pose after loop-closure detection. To round out the work in this thesis, a system for cooperative monocular visual motion estimation with multiple aerial vehicles is proposed. The cooperative motion estimation employs state-of-the-art approaches for optimisation, individual motion estimation and registration. Three-view geometry algorithms in a convex optimisation framework are deployed on board the monocular vision system for each vehicle. In addition, vehicle-to-vehicle relative pose estimation is performed with a novel robust registration solution in a global optimisation framework. In parallel, and as a complementary solution for the relative pose, a robust non-linear H∞ solution is designed as well to fuse measurements from the UAVs’ on-board inertial sensors with the visual estimates. The suggested contributions have been exhaustively evaluated over a number of real-image data experiments in the laboratory using monocular vision systems and range imaging devices. In this thesis, we propose several solutions towards the goal of robust visual motion estimation using convex optimisation. We show that the convex optimisation framework may be extended to include uncertainty information, to achieve robust and optimal solutions. We observed that convex optimisation is a practical and very appealing alternative to linear techniques and iterative methods.
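The covariance intersection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the restriction to 2D estimates, and the caller-supplied weight `omega` are all assumptions made for illustration (in practice the weight is usually chosen to minimise the trace or determinant of the fused covariance). The fusion rule is P⁻¹ = ω P₁⁻¹ + (1 − ω) P₂⁻¹ and x = P (ω P₁⁻¹ x₁ + (1 − ω) P₂⁻¹ x₂).

```python
def inv2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def covariance_intersection(x1, P1, x2, P2, omega):
    """Fuse two 2D estimates whose cross-correlation is unknown.

    omega in [0, 1] is a hypothetical caller-supplied weight; this sketch
    does not optimise it.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    I1, I2 = inv2(P1), inv2(P2)
    # Fused information matrix: convex combination of the two inverses.
    info = tuple(
        tuple(omega * I1[i][j] + (1 - omega) * I2[i][j] for j in range(2))
        for i in range(2)
    )
    P = inv2(info)
    # Fused information vector, then map back to state space.
    vec = tuple(
        omega * sum(I1[i][j] * x1[j] for j in range(2))
        + (1 - omega) * sum(I2[i][j] * x2[j] for j in range(2))
        for i in range(2)
    )
    x = tuple(sum(P[i][j] * vec[j] for j in range(2)) for i in range(2))
    return x, P
```

Setting `omega` to 1 returns the first estimate unchanged, and intermediate values trade the two sources off without assuming they are independent.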

    Visual / acoustic detection and localisation in embedded systems

    © Cranfield University. The continuous miniaturisation of sensing and processing technologies is increasingly offering a variety of embedded platforms, enabling the accomplishment of a broad range of tasks using such systems. Motivated by these advances, this thesis investigates embedded detection and localisation solutions using vision and acoustic sensors. Focus is particularly placed on surveillance applications using sensor networks. Existing vision-based detection solutions for embedded systems are sensitive to environmental conditions. In the literature, there seems to be no algorithm able to simultaneously tackle all the challenges inherent to real-world videos. Regarding the acoustic modality, many research works have investigated acoustic source localisation solutions in distributed sensor networks. Nevertheless, it is still a challenging task to develop an efficient algorithm that deals with the experimental issues, approaches the performance required by these systems and performs the data processing in a distributed and robust manner. The movement of scene objects is generally accompanied by sound emissions with features that vary from one environment to another. Therefore, combining the visual and acoustic modalities would offer a significant opportunity for improving detection and/or localisation on the described platforms. In the light of the described framework, we investigate in the first part of the thesis the use of a cost-effective vision-based method that can deal robustly with the issue of motion detection in static, dynamic and moving background conditions. For motion detection in static and dynamic backgrounds, we present the development and the performance analysis of a spatio-temporal form of the Gaussian mixture model. On the other hand, the problem of motion detection in moving backgrounds is addressed by accounting for registration errors in the captured images. 
By adopting a robust optimisation technique that takes into account the uncertainty about the visual measurements, we show that high detection accuracy can be achieved. In the second part of this thesis, we investigate solutions to the problem of acoustic source localisation using a trust region based optimisation technique. The proposed method shows an overall higher accuracy and convergence improvement compared to a linear-search based method. More importantly, we show that through characterising the errors in measurements, which is a common problem for such platforms, higher accuracy in the localisation can be attained. The last part of this work studies the different possibilities of combining visual and acoustic information in a distributed sensor network. In this context, we first propose to include the acoustic information in the visual model. The resulting augmented model provides promising improvements in the detection and localisation processes. The second investigated solution consists in the fusion of the measurements coming from the different sensors. An evaluation of the accuracy of localisation and tracking using a centralised/decentralised architecture is conducted in various scenarios and experimental conditions. Results have shown the capability of this fusion approach to yield higher accuracy in the localisation and tracking of an active acoustic source than using a single type of data.

    Non-iterative RGB-D-inertial Odometry

    This paper presents a non-iterative solution for an RGB-D-inertial odometry system. Traditional odometry methods resort to iterative algorithms, which are usually computationally expensive or require well-designed initialization. To overcome this problem, this paper proposes to combine a non-iterative front-end (odometry) with an iterative back-end (loop closure) for the RGB-D-inertial SLAM system. The main contribution lies in the novel non-iterative front-end, which leverages inertial fusion and kernel cross-correlators (KCC) to match point clouds in the frequency domain. Dominated by the fast Fourier transform (FFT), our method has a complexity of only $\mathcal{O}(n\log n)$, where $n$ is the number of points. Map fusion is conducted by element-wise operations, so that both time and space complexity are further reduced. Extensive experiments show that, owing to the lightweight front-end, the framework runs much faster while retaining accuracy comparable with the state of the art.
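To make the frequency-domain matching idea concrete, here is a minimal sketch of plain circular cross-correlation computed with an FFT. It is not the paper's kernel cross-correlator: it works on 1D signals rather than point clouds, and the function names and the power-of-two length restriction are assumptions made for illustration. It only shows why a correlation-based alignment can be obtained in O(n log n) without iteration.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_shift_estimate(reference, shifted):
    """Return the lag s maximising sum_i reference[i] * shifted[(i + s) % n]."""
    n = len(reference)
    if len(shifted) != n:
        raise ValueError("signals must have equal length")
    F = fft(reference)
    G = fft(shifted)
    # Correlation theorem: the correlation is the inverse FFT of conj(F) * G.
    product = [f.conjugate() * g for f, g in zip(F, G)]
    corr = [c.real / n for c in fft(product, inverse=True)]
    return max(range(n), key=lambda s: corr[s])
```

For a real-valued signal delayed circularly by `d` samples, the peak of the correlation sits at lag `d`, so the shift is read off in a single pass.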

    Development and Implementation of Image-based Algorithms for Measurement of Deformations in Material Testing

    This paper presents the development and implementation of three image-based methods used to detect and measure the displacements of a vast number of points in the case of laboratory testing on construction materials. Starting from the needs of structural engineers, three ad hoc tools for crack measurement in fibre-reinforced specimens and 2D or 3D deformation analysis through digital images were implemented and tested. These tools make use of advanced image processing algorithms and can integrate or even substitute some traditional sensors employed today in most laboratories. In addition, the automation provided by the implemented software, the limited cost of the instruments and the possibility to operate with an indefinite number of points offer new and more extensive analysis in the field of material testing. Several comparisons with other traditional sensors widely adopted inside most laboratories were carried out in order to demonstrate the accuracy of the implemented software. Implementation details, simulations and real applications are reported and discussed in this paper

    Robust Shape Fitting for 3D Scene Abstraction

    Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts
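As a rough illustration of the RANSAC principle this abstract builds on, the sketch below fits a 2D line to points contaminated by outliers. It is deliberately much simpler than the paper's method: there is no neural guidance, no cuboid model and no occlusion-aware distance, and the function name, parameters and defaults are assumptions chosen for illustration.

```python
import random


def ransac_line(points, n_iters=200, inlier_tol=0.1, seed=0):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) to 2D points by RANSAC."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # identical points, no line defined
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Consensus set: points within inlier_tol of the candidate line.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers
```

The same sample, score, keep-best loop carries over to richer primitives; only the minimal solver and the distance function change.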

    Visual Navigation for Robots in Urban and Indoor Environments

    As a fundamental capability for mobile robots, navigation involves multiple tasks including localization, mapping, motion planning, and obstacle avoidance. In unknown environments, a robot has to construct a map of the environment while simultaneously keeping track of its own location within the map. This is known as simultaneous localization and mapping (SLAM). For urban and indoor environments, SLAM is especially important since GPS signals are often unavailable. Visual SLAM uses cameras as the primary sensor and is a highly attractive but challenging research topic. The major challenge lies in the robustness to lighting variation and uneven feature distribution. Another challenge is to build semantic maps composed of high-level landmarks. To meet these challenges, we investigate feature fusion approaches for visual SLAM. The basic rationale is that, since urban and indoor environments contain various feature types such as points and lines, these features in combination should improve robustness, while high-level landmarks can be defined as or derived from these combinations. We design a novel data structure, the multilayer feature graph (MFG), to organize five types of features and their inner geometric relationships. Building upon a two view-based MFG prototype, we extend the application of MFG to image sequence-based mapping by using an EKF. We model and analyze how errors are generated and propagated through the construction of a two view-based MFG. This enables us to treat each MFG as an observation in the EKF update step. We apply the MFG-EKF method to a building exterior mapping task and demonstrate its efficacy. A two view-based MFG requires a sufficient baseline to be successfully constructed, which is not always feasible. Therefore, we further devise a multiple view-based algorithm to construct the MFG as a global map. 
Our proposed algorithm takes a video stream as input, initializes and iteratively updates the MFG based on extracted key frames; it also refines robot localization and MFG landmarks using local bundle adjustment. We show the advantage of our method by comparing it with state-of-the-art methods on multiple indoor and outdoor datasets. To avoid the scale ambiguity in monocular vision, we investigate the application of RGB-D for SLAM. We propose an algorithm that fuses point and line features. We extract 3D points and lines from RGB-D data, analyze their measurement uncertainties, and compute camera motion using maximum likelihood estimation. We validate our method using both uncertainty analysis and physical experiments, where it outperforms its counterparts under both constant and varying lighting conditions. Besides visual SLAM, we also study specular object avoidance, which is a great challenge for range sensors. We propose a vision-based algorithm to detect planar mirrors. We derive geometric constraints for corresponding real-virtual features across images and employ RANSAC to develop a robust detection algorithm. Our algorithm achieves a detection accuracy of 91.0%.

    Event-Based Visual-Inertial Odometry Using Smart Features

    Event-based cameras are a novel type of visual sensor that operate under a unique paradigm, providing asynchronous data on the log-level changes in light intensity for individual pixels. This hardware-level approach to change detection allows these cameras to achieve ultra-wide dynamic range and high temporal resolution. Furthermore, the advent of convolutional neural networks (CNNs) has led to state-of-the-art navigation solutions that now rival or even surpass human engineered algorithms. The advantages offered by event cameras and CNNs make them excellent tools for visual odometry (VO). This document presents the implementation of a CNN trained to detect and describe features within an image as well as the implementation of an event-based visual-inertial odometry (EVIO) pipeline, which estimates a vehicle's 6-degrees-of-freedom (DOF) pose using an affixed event-based camera with an integrated inertial measurement unit (IMU). The front-end of this pipeline utilizes a neural network for generating image frames from asynchronous event camera data. These frames are fed into a multi-state constraint Kalman filter (MSCKF) back-end that uses the output of the developed CNN to perform measurement updates. The EVIO pipeline was tested on a selection from the Event-Camera Dataset [1], and on a dataset collected from a fixed-wing unmanned aerial vehicle (UAV) flight test conducted by the Autonomy and Navigation Technology (ANT) Center.

    A Comprehensive Introduction of Visual-Inertial Navigation

    In this article, a tutorial introduction to visual-inertial navigation (VIN) is presented. Visual and inertial perception are two complementary sensing modalities. Cameras and inertial measurement units (IMUs) are the corresponding sensors for these two modalities. The low cost and light weight of camera-IMU sensor combinations make them ubiquitous in robotic navigation. Visual-inertial navigation is a state estimation problem that estimates the ego-motion and local environment of the sensor platform. This paper presents visual-inertial navigation in the classical state estimation framework, first illustrating the estimation problem in terms of state variables and system models, including representations of the related quantities (parameterizations), IMU dynamic and camera measurement models, and the corresponding general probabilistic graphical models (factor graphs). Secondly, we investigate the existing model-based estimation methodologies, which involve filter-based and optimization-based frameworks and related on-manifold operations. We also discuss the calibration of some relevant parameters and the initialization of the states of interest in optimization-based frameworks. Then the evaluation and improvement of VIN in terms of accuracy, efficiency, and robustness are discussed. Finally, we briefly mention the recent development of learning-based methods that may become alternatives to traditional model-based methods. Comment: 35 pages, 10 figures

    Human motion estimation and analysing feasible collisions between individuals in real settings by using conventional cameras and Deep Learning-based algorithms

    In this final-year degree project (Trabajo Fin de Grado, TFG), a system for estimating and analysing human motion will be implemented to detect and prevent possible collisions between individuals moving freely through a real industrial environment. Two or more conventional cameras will be placed at strategic points in the environment under analysis, and Deep Learning algorithms provided by open-source libraries such as MediaPipe will be used to estimate the trajectories followed and to predict the direction of movement. The results will in turn be validated using low-cost RGB-D cameras, which provide accurate distance measurements to the individuals and thus the ground truth needed to estimate the accuracy achieved by the designed system. The general objective, namely estimating human motion and analysing possible collisions between individuals in real settings using conventional cameras and Deep Learning algorithms, will be met if the following specific sub-objectives are achieved: - Integration of the MediaPipe components that estimate the landmarks for skeleton tracking. - Design of an application that integrates the designed components to estimate the trajectories of different individuals from the feeds of two or more conventional cameras. - Design of a strategy for predicting each individual's direction from the trajectory followed, in order to avoid collisions. - Validation of the system by comparing the results obtained with the conventional cameras against ground-truth data generated from RGB-D cameras that reliably measure the distance to the individuals and their position in an absolute three-dimensional reference frame. Escuela Técnica Superior de Ingeniería Industrial, Universidad Politécnica de Cartagena

    Algorithms and evaluation for object detection and tracking in computer vision

    Vision-based object detection and tracking, especially for video surveillance applications, is studied from algorithms to performance evaluation. This dissertation is composed of four topics: (1) Background Modeling and Detection, (2) Performance Evaluation of Sensitive Target Detection, (3) Multi-view Multi-target Multi-Hypothesis Segmentation and Tracking of People, and (4) A Fine-Structure Image/Video Quality Measure. First, we present a real-time algorithm for foreground-background segmentation. It allows us to capture structural background variation due to periodic-like motion over a long period of time under limited memory. Our codebook-based representation is efficient in memory and speed compared with other background modeling techniques. Our method can handle scenes containing moving backgrounds or illumination variations, and it achieves robust detection for different types of videos. In addition to the basic algorithm, three features improving the algorithm are presented - Automatic Parameter Estimation, Layered Modeling/Detection and Adaptive Codebook Updating. Second, we introduce a performance evaluation methodology called Perturbation Detection Rate (PDR) analysis for measuring performance of foreground-background segmentation. It does not require foreground targets or knowledge of foreground distributions. It measures the sensitivity of a background subtraction algorithm in detecting possible low contrast targets against the background as a function of contrast. We compare four background subtraction algorithms using the methodology. Third, a multi-view multi-hypothesis approach to segmenting and tracking multiple persons on a ground plane is proposed. The tracking state space is the set of ground points of the people being tracked. During tracking, several iterations of segmentation are performed using information from human appearance models and ground plane homography. 
Two innovations are made in this chapter - (1) To more precisely locate the ground location of a person, all center vertical axes of the person across views are mapped to the top-view plane to find the intersection point. (2) To tackle the explosive state space due to multiple targets and views, iterative segmentation-searching is incorporated into a particle filtering framework. By searching for people's ground point locations from segmentations, a set of a few good particles can be identified, resulting in low computational cost. In addition, even if all the particles are away from the true ground point, some of them move towards the true one through the iterated process as long as they are located nearby. Finally, an objective no-reference measure is presented to assess fine-structure image/video quality. The proposed measure using local statistics reflects image degradation well in terms of noise and blur