291 research outputs found

    MRSL: AUTONOMOUS NEURAL NETWORK-BASED SELF-STABILIZING SYSTEM

    Stabilizing and localizing positioning systems autonomously in areas without GPS accessibility is a difficult task. In this thesis we describe a methodology called Most Reliable Straight Line (MRSL) for stabilizing and positioning camera-based objects in 3-D space. The camera-captured images are used to identify easy-to-track points ("interesting points") and track them across two consecutive images. The distances between corresponding interesting points on the two consecutive images are compared, and the one with the maximum length is assigned as the MRSL, which indicates the deviation from the original position. To correct this, our trained algorithm issues relevant commands to reduce the deviation; this action is repeated until the MRSL converges to zero. To test its accuracy and robustness, the algorithm was deployed to control the positioning of a quadcopter. It was demonstrated that the quadcopter (a) is highly robust to external forces, (b) can fly even if it experiences the loss of an engine, and (c) flies smoothly and positions itself at a desired location.
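    The core loop described above can be sketched in a few lines: pick the longest displacement among tracked points as the MRSL and command a move that shrinks it. The function names and the proportional placeholder controller are illustrative assumptions; the thesis uses a trained network for the control step and an upstream tracker for the points.

```python
import math

def mrsl(points_prev, points_curr):
    """Most Reliable Straight Line: the tracked-point pair whose
    displacement between two consecutive frames is longest (a
    hypothetical reading of the method; point tracking is assumed
    to be done upstream)."""
    best, best_len = None, 0.0
    for (x0, y0), (x1, y1) in zip(points_prev, points_curr):
        d = math.hypot(x1 - x0, y1 - y0)
        if d > best_len:
            best_len, best = d, ((x0, y0), (x1, y1))
    return best, best_len

def correction_command(mrsl_line):
    """Toy controller: command a move opposite to the MRSL displacement.
    A placeholder for the thesis's trained network, shown only to make
    the convergence loop concrete."""
    (x0, y0), (x1, y1) = mrsl_line
    return (-(x1 - x0), -(y1 - y0))
```

    Repeating `correction_command` after each new frame drives the MRSL length toward zero, which is the convergence criterion the abstract describes.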

    Featureless Motion Vector-Based Simultaneous Localization, Planar Surface Extraction, and Moving Obstacle Tracking

    Abstract. Motion vectors (MVs) characterize the movement of pixel blocks in video streams and are readily available. MVs not only allow us to avoid expensive feature transform and correspondence computations but also provide the motion information for both the environment and moving obstacles. This enables us to develop a new framework that is capable of simultaneous localization, scene mapping, and moving obstacle tracking. The method first extracts planes from MVs and their corresponding pixel macro blocks (MBs) using properties of plane-induced homographies. We then classify MBs as stationary or moving using geometric constraints on MVs. Planes are labeled as part of the stationary scene or as moving obstacles using MB voting. Therefore, we can establish planes as observations for extended Kalman filters (EKFs) for both the stationary scene and the moving objects. We have implemented the proposed method. The results show that it can establish plane-based rectilinear scene structure and detect moving objects while achieving localization accuracy similar to that of 1-Point EKF. More specifically, the system detects moving obstacles at a true positive rate of 96.6% with a relative absolute trajectory error of no more than 2.53%.
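    The stationary-vs-moving classification step can be sketched as a consistency check against the stationary scene's homography: an MV is "stationary" if it agrees with the displacement the homography predicts for its block center. `H`, the tolerance, and the MV format are illustrative assumptions, not the paper's exact geometric constraints.

```python
import numpy as np

def classify_blocks(mvs, centers, H, tol=2.0):
    """Label each macro block's motion vector as consistent with the
    stationary-scene homography H (True) or moving (False).
    mvs: (u, v) displacements; centers: (x, y) block centers."""
    labels = []
    for (u, v), (x, y) in zip(mvs, centers):
        p = H @ np.array([x, y, 1.0])
        # displacement the stationary-scene motion predicts for this block
        pred = np.array([p[0] / p[2] - x, p[1] / p[2] - y])
        labels.append(bool(np.linalg.norm(pred - np.array([u, v])) <= tol))
    return labels
```

    Blocks voted "moving" would then be grouped into moving-obstacle planes, while the consistent ones feed the stationary-scene EKF.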

    DCSLAM: Real-Time SLAM with Dynamic Constraints

    Localizing a video camera in real time in an unknown or partially known environment is a problem addressed by CSLAM (Constrained Simultaneous Localization And Mapping) algorithms. These use constraints to determine the camera pose and the 3D structure of the environment. However, implementation difficulties restrict such approaches to the use of one or two constraints. To overcome these difficulties, we propose a new real-time CSLAM algorithm designed to dynamically adapt each optimization to a variable number of parameter families as well as to the nature and number of constraints. To do so, we use a method that automatically generates, from an exhaustive list of constraints, an optimization algorithm specialized for the problem. To our knowledge, this is the only implementation that combines both flexibility and performance. The experiments presented show the relevance of our approach in terms of accuracy and execution time compared to the state of the art on several public benchmarks of varying complexity. An augmented-reality application mixing heterogeneous objects and constraints is also presented.
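    The idea of assembling an optimizer from a variable list of constraint families can be illustrated, in a heavily simplified linear form, as stacking weighted residuals into a single least-squares solve. This is only a sketch of the "generate the optimizer from the constraint list" concept; the actual DCSLAM system generates a specialized nonlinear optimizer.

```python
import numpy as np

def solve_constrained(families):
    """Assemble and solve a linear least-squares problem from a variable
    number of constraint families. Each family is (weight, constraints),
    where each constraint is an (a, b) pair encoding a residual a.x - b.
    The family list can change from one optimization to the next."""
    rows, rhs = [], []
    for weight, constraints in families:
        for a, b in constraints:
            rows.append(weight * np.asarray(a, dtype=float))
            rhs.append(weight * b)
    A = np.vstack(rows)
    x, *_ = np.linalg.lstsq(A, np.asarray(rhs), rcond=None)
    return x
```

    Because the rows are rebuilt at every call, adding or dropping a constraint family changes the problem without changing the solver code, which is the flexibility the abstract emphasizes.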

    Exploring Motion Signatures for Vision-Based Tracking, Recognition and Navigation

    As cameras become more and more popular in intelligent systems, algorithms and systems for understanding video data become more and more important. There is a broad range of applications, including object detection, tracking, scene understanding, and robot navigation. Besides stationary information, video data contains rich motion information about the environment. Biological visual systems, like human and animal eyes, are very sensitive to motion information, which has inspired active research on vision-based motion analysis in recent years. The main focus of motion analysis has been on low-level motion representations of pixels and image regions. However, motion signatures can benefit a broader range of applications if further in-depth analysis techniques are developed. In this dissertation, we discuss how to exploit motion signatures to solve problems in two applications: object recognition and robot navigation. First, we use bird species recognition as the application to explore motion signatures for object recognition. We begin with a study of the periodic wingbeat motion of flying birds. To analyze the wing motion of a flying bird, we establish kinematic models for bird wings and obtain wingbeat periodicity in image frames after the perspective projection. Time series of salient extremities on bird images are extracted, and the wingbeat frequency is acquired for species classification. Physical experiments show that the frequency-based recognition method is robust to segmentation errors and measurement loss of up to 30%. In addition to the wing motion, the body motion of the bird is also analyzed to extract the flying velocity in 3D space. An interacting multi-model approach is then designed to capture the combined object motion patterns under different environment conditions. The proposed systems and algorithms are tested in physical experiments, and the results show a false positive rate of around 20% with a low false negative rate close to zero.
Second, we explore motion signatures for vision-based vehicle navigation. We discover that motion vectors (MVs) encoded in Moving Picture Experts Group (MPEG) videos provide rich information about the motion in the environment, which can be used to reconstruct the vehicle's ego-motion and the structure of the scene. However, MVs suffer from a high noise level. To handle this challenge, an error propagation model for MVs is first proposed. Several steps, including MV merging, plane-at-infinity elimination, and planar region extraction, are designed to further reduce noise. The extracted planes are used as landmarks in an extended Kalman filter (EKF) for simultaneous localization and mapping. Results show that the algorithm performs localization and plane mapping with a relative trajectory error below 5.1%. Exploiting the fact that MVs encode both environment information and moving obstacles, we further propose to track moving objects at the same time as localization and mapping. This enables the two critical navigation functionalities, localization and obstacle avoidance, to be performed in a single framework. MVs are labeled as stationary or moving according to their consistency with geometric constraints. Therefore, the extracted planes are separated into moving objects and the stationary scene. Multiple EKFs are used to track the static scene and the moving objects simultaneously. In physical experiments, we show a detection rate of moving objects of 96.6% and a mean absolute localization error below 3.5 meters.
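The wingbeat-frequency step described above can be approximated with a plain FFT peak over the time series of a salient extremity's image coordinate. This is a simplified stand-in for the dissertation's periodicity analysis; the sampling setup and function name are assumptions for illustration.

```python
import numpy as np

def wingbeat_frequency(series, fps):
    """Estimate the dominant wingbeat frequency (Hz) from a time series
    of a salient extremity's image coordinate, via the largest FFT peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                       # remove DC so the peak is the beat
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin
```

A classifier would then compare the estimated frequency against per-species wingbeat ranges; missing samples (the "measurement loss" the abstract mentions) mainly broaden the peak rather than move it, which is consistent with the reported robustness.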

    Appearance and Geometry Assisted Visual Navigation in Urban Areas

    Navigation is a fundamental task for mobile robots in applications such as exploration, surveillance, and search and rescue. The task involves solving the simultaneous localization and mapping (SLAM) problem, where a map of the environment is constructed. In order for this map to be useful for a given application, a suitable scene representation needs to be defined that allows spatial information sharing between robots and also between humans and robots. High-level scene representations have the benefit of being more robust and more exchangeable for interpretation. With the aim of higher-level scene representation, in this work we explore high-level landmarks and their usage, using geometric and appearance information to assist mobile robot navigation in urban areas. In visual SLAM, image registration is a key problem. While feature-based methods such as scale-invariant feature transform (SIFT) matching are popular, they do not utilize appearance information as a whole and perform poorly on low-resolution images. We study appearance-based methods and propose a scale-space integrated Lucas-Kanade method that can estimate geometric transformations while also taking into account image appearance at different resolutions. We compare our method against state-of-the-art methods and show that it can register images efficiently with high accuracy. In urban areas, planar building facades (PBFs) are basic components of the quasi-rectilinear environment. Hence, segmentation and mapping of PBFs can improve a robot's scene understanding and localization abilities. We propose a vision-based PBF segmentation and mapping technique that combines both appearance and geometric constraints to segment out planar regions. Then, geometric constraints such as reprojection errors, orientation constraints, and coplanarity constraints are used in an optimization process to improve the mapping of PBFs. A major issue in monocular visual SLAM is scale drift.
While depth sensors, such as lidar, are free from scale drift, such sensors are usually more expensive than cameras. To enable low-cost mobile robots equipped with monocular cameras to obtain accurate position information, we use a 2D lidar map to rectify imprecise visual SLAM results using planar structures. We propose a two-step optimization approach assisted by a penalty function to improve on low-quality local minima results. Robot paths for navigation can be either automatically generated by a motion planning algorithm or provided by a human. In both cases, a scene representation of the environment, i.e., a map, is useful to specify meaningful tasks for the robot. However, SLAM results usually produce a sparse scene representation that consists of low-level landmarks, such as point clouds, which are neither convenient nor intuitive to use for task specification. We present a system that allows users to program mobile robots using high-level landmarks from appearance data.
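The Lucas-Kanade registration idea behind the appearance-based method above can be sketched as a single linearized step that estimates a small global translation from image gradients; the thesis method adds the scale-space pyramid and richer transform models, and this minimal version only works for sub-pixel-scale shifts on smooth images.

```python
import numpy as np

def lk_shift(img0, img1):
    """One linearized Lucas-Kanade step: estimate a small global 2-D
    translation (dx, dy) such that img1(x) ~= img0(x - d), by solving
    the normal equations built from image gradients."""
    gy, gx = np.gradient(img0.astype(float))   # gradients along rows, cols
    it = (img1.astype(float) - img0.astype(float)).ravel()
    A = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d = np.linalg.lstsq(A, -it, rcond=None)[0]  # It ~= -(gx*dx + gy*dy)
    return d[0], d[1]
```

In a full registration pipeline this step would be iterated with warping, and run coarse-to-fine over a scale-space pyramid to handle large motions, which is the role of the scale-space integration in the thesis.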

    High level 3D structure extraction from a single image using a CNN-based approach

    High-Level Structure (HLS) extraction from a set of images consists of recognizing 3D elements that carry information useful to the user or application. There are several approaches to HLS extraction. However, most of these approaches are based on processing two or more images captured from different camera views, or on processing 3D data in the form of point clouds extracted from the camera images. In contrast, motivated by the extensive work on the problem of depth estimation from a single image, where parallax constraints are not required, in this work we propose a novel methodology for HLS extraction from a single image, with promising results. Our method has four steps. First, we use a CNN to predict the depth of a single image. Second, we propose a region-wise analysis to refine the depth estimates. Third, we introduce a graph analysis to segment the depth into semantic orientations, aiming at identifying potential HLS. Finally, the depth sections are provided to a new CNN architecture that predicts HLS in the shape of cubes and rectangular parallelepipeds.
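    The region-wise refinement step (the second step above) can be sketched as replacing each pixel's CNN depth estimate with a robust statistic of its region. The region labels (e.g. from a superpixel segmentation) and the median rule are illustrative assumptions, not the paper's exact analysis.

```python
from statistics import median

def refine_depth(depth, regions):
    """Region-wise depth refinement: replace each pixel's depth estimate
    with the median depth of its region. `depth` and `regions` are
    flat, pixel-aligned lists; region labels are assumed given upstream."""
    groups = {}
    for d, r in zip(depth, regions):
        groups.setdefault(r, []).append(d)
    meds = {r: median(vs) for r, vs in groups.items()}
    return [meds[r] for r in regions]
```

    The median suppresses per-pixel outliers in the CNN output while keeping depth discontinuities at region boundaries, which is what the later graph analysis needs.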

    Dense and Globally Consistent Multi-View Stereo

    Multi-View Stereo (MVS) aims at reconstructing dense geometry of scenes from a set of overlapping images captured from different viewing angles. This thesis addresses the MVS problem by estimating depth maps, since 2D-space operations are trivially parallelizable in contrast to 3D volumetric techniques. The typical setup of depth-map-based MVS approaches consists of per-view calculation and multi-view merging. Most solutions primarily aim at the most precise and complete surfaces for individual views while relaxing global geometry consistency. The inconsistent estimates therefore lead to a heavy processing workload in the merging stage and diminish the final reconstruction. Another issue is textureless areas, where the photo-consistency constraint cannot discriminate between different depths. These matching ambiguities are normally handled by incorporating plane features or a smoothness assumption, which might produce segmentation effects or depend on the accuracy and completeness of the calculated object edges. This thesis deals with two kinds of input data, photo collections and high-frame-rate videos, by developing distinct MVS algorithms based on their characteristics. For the sparsely sampled photos, we propose an advanced PatchMatch system that alternates between patch-based correlation maximization and pixel-based optimization of cross-view consistency. Thereby we obtain a good trade-off between the photometric and geometric constraints. Moreover, our method achieves high efficiency by combining local pixel traversal and a hierarchical framework for fast depth propagation. For the densely sampled videos, we mainly focus on recovering homogeneous surfaces, because the redundant scene information enables ray-level correlation, which can generate sharp depth discontinuities.
Our approach infers smooth surfaces for the enclosed areas using perspective depth interpolation, and subsequently tackles the occlusion errors connecting the fore- and background edges. In addition, our edge depth estimation is made more robust by accounting for unstructured camera trajectories. Exhaustively calculating depth maps is infeasible when modeling large scenes from videos. This thesis further improves reconstruction scalability using an incremental scheme based on content-aware view selection and clustering. Our goal is to gradually eliminate visibility conflicts and increase surface coverage by processing a minimal subset of views. Constructing view clusters allows us to store merged and locally consistent points at the highest resolution, thus reducing memory requirements. None of the approaches presented in this thesis rely on high-level techniques, so they can be easily parallelized. Evaluations on various datasets and comparisons with existing algorithms demonstrate the superiority of our methods.
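The PatchMatch propagation at the heart of the photo-collection pipeline can be sketched on a single scanline: a depth hypothesis is adopted from the neighbor whenever it lowers the matching cost at the current pixel. Random initialization, plane refinement, and the alternating cross-view optimization are omitted; `cost(i, d)` stands in for any photometric matching cost.

```python
def propagate(depths, cost):
    """One left-to-right PatchMatch propagation sweep over a scanline:
    adopt the left neighbor's depth hypothesis whenever it lowers the
    matching cost at the current pixel."""
    out = list(depths)
    for i in range(1, len(out)):
        if cost(i, out[i - 1]) < cost(i, out[i]):
            out[i] = out[i - 1]
    return out
```

Alternating left-to-right and right-to-left sweeps spreads a single lucky hypothesis across a whole smooth surface in a few passes, which is why PatchMatch converges quickly despite random initialization.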