    A vision system for mobile maritime surveillance platforms

    Mobile surveillance systems play an important role in minimising security and safety threats in high-risk or hazardous environments. Providing a mobile marine surveillance platform with situational awareness of its environment is important for mission success. An essential part of situational awareness is the ability to detect and subsequently track potential target objects. Typically, the exact type of target object is unknown, hence detection is addressed as a problem of finding parts of an image that stand out in relation to their surrounding regions or are atypical to the domain. Contrary to existing saliency methods, this thesis proposes the use of a domain-specific visual attention approach for detecting potential regions of interest in maritime imagery. For this, low-level features that are indicative of maritime targets are identified. These features are then evaluated with respect to their local, regional, and global significance. Together with a domain-specific background segmentation technique, the features are combined in a Bayesian classifier to direct visual attention to potential target objects.
    The maritime environment introduces challenges to the camera system: gusts, wind, swell, or waves can cause the platform to move drastically and unpredictably. Pan-tilt-zoom cameras that are often utilised for surveillance tasks can adjust their orientation to provide a stable view onto the target. However, in rough maritime environments this requires high-speed and precise inputs. In contrast, omnidirectional cameras provide a full spherical view, which allows the acquisition and tracking of multiple targets at the same time; however, the target itself only occupies a small fraction of the overall view. This thesis proposes a novel, target-centric approach for image stabilisation. A virtual camera is extracted from the omnidirectional view for each target and is adjusted based on the measurements of an inertial measurement unit and an image feature tracker. The combination of these two techniques in a probabilistic framework allows for stabilisation of rotational and translational ego-motion. Furthermore, it has the specific advantage of being robust to loosely calibrated and synchronised hardware, since the fusion of tracking and stabilisation means that tracking uncertainty can be used to compensate for errors in calibration and synchronisation. This completely eliminates the need for tedious calibration phases and the adverse effects of assembly slippage over time.
    Finally, this thesis combines the visual attention and omnidirectional stabilisation frameworks and proposes a multi-view tracking system that is capable of detecting potential target objects in the maritime domain. Although the visual attention framework performed well on the benchmark datasets, the evaluation on real-world maritime imagery produced a high number of false positives. An investigation reveals that benchmark datasets are unconsciously influenced by human shot selection, which greatly simplifies the problem of visual attention. Despite the number of false positives, the tracking approach itself remains robust even if a high number of false positives are tracked.
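
    As a hedged illustration of the fusion step described above — combining low-level feature responses in a Bayesian classifier to direct attention — the following NumPy sketch fuses quantized feature maps into a per-pixel target posterior under a naive-Bayes independence assumption. The quantization scheme, the likelihood tables, and all names are illustrative assumptions; the thesis's actual features and classifier are not reproduced here.

```python
import numpy as np

def attention_map(feature_maps, lik_target, lik_background, prior_target=0.01):
    """Per-pixel posterior P(target | features) under a naive-Bayes model.

    feature_maps: list of HxW integer arrays of quantized feature responses.
    lik_target[k], lik_background[k]: 1-D arrays holding learned likelihoods
    P(value | target) and P(value | background) for feature k.
    """
    shape = feature_maps[0].shape
    log_t = np.full(shape, np.log(prior_target))
    log_b = np.full(shape, np.log(1.0 - prior_target))
    for k, fmap in enumerate(feature_maps):
        log_t += np.log(lik_target[k][fmap] + 1e-9)      # fancy indexing: HxW
        log_b += np.log(lik_background[k][fmap] + 1e-9)
    m = np.maximum(log_t, log_b)                          # numerical guard
    pt, pb = np.exp(log_t - m), np.exp(log_b - m)
    return pt / (pt + pb)
```

    A region would then be flagged as a potential target wherever this posterior exceeds a threshold, with the background segmentation suppressing sea and sky responses.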

    Vision-based localization methods under GPS-denied conditions

    This paper reviews vision-based localization methods in GPS-denied environments and classifies the mainstream methods into Relative Vision Localization (RVL) and Absolute Vision Localization (AVL). For RVL, we discuss the broad application of optical flow in feature-extraction-based Visual Odometry (VO) solutions and introduce advanced optical flow estimation methods. For AVL, we review recent advances in Visual Simultaneous Localization and Mapping (VSLAM) techniques, from optimization-based methods to Extended Kalman Filter (EKF) based methods. We also introduce the application of offline map registration and lane vision detection schemes to achieve Absolute Visual Localization. This paper compares the performance and applications of mainstream methods for visual localization and provides suggestions for future studies. Comment: 32 pages, 15 figures
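
    As a minimal illustration of the RVL pipeline the survey discusses — tracking features with pyramidal Lucas-Kanade optical flow and recovering the relative camera pose — the following OpenCV sketch computes a frame-to-frame rotation and up-to-scale translation. The intrinsic matrix K and grayscale frame inputs are assumptions; a real VO system would add keyframing, robust outlier handling, and scale recovery.

```python
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, K):
    """Frame-to-frame rotation R and unit-scale translation t from LK flow."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                 qualityLevel=0.01, minDistance=7)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good0 = p0[status.ravel() == 1]        # keep only successfully tracked
    good1 = p1[status.ravel() == 1]
    E, inliers = cv2.findEssentialMat(good0, good1, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=inliers)
    return R, t                             # monocular: t is up to scale only
```

    Chaining these relative poses gives an odometry estimate that drifts over time, which is precisely why the survey contrasts RVL with the absolute (AVL) methods.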

    Binokulare EigenbewegungsschÀtzung fĂŒr Fahrerassistenzanwendungen (Binocular ego-motion estimation for driver assistance applications)

    Driving can be dangerous. Humans become inattentive when performing a monotonous task like driving, and the risk implied by multi-tasking, such as using a cellular phone while driving, can break the driver's concentration and increase the risk of accidents. Other factors, like exhaustion, nervousness, and excitement, affect the driver's performance and response time. Consequently, car manufacturers have developed systems over the last decades which assist the driver under various circumstances: driver assistance systems. These systems are meant to support the task of driving, and their field of action varies from alerting the driver with acoustical or optical warnings to taking control of the car, such as keeping the vehicle in the traffic lane until the driver resumes control. For such purposes, the vehicle is equipped with on-board sensors which allow the perception of the environment and/or the state of the vehicle. Cameras are sensors which extract useful information about the visual appearance of the environment, and a binocular system additionally allows the extraction of 3D information. One of the main requirements for most camera-based driver assistance systems is accurate knowledge of the motion of the vehicle. Some sources of information, like velocimeters and GPS, are in common use in vehicles today; nevertheless, the resolution and accuracy usually achieved with these systems are not sufficient for many real-time applications. The computation of ego-motion from sequences of stereo images for the implementation of intelligent driving systems, such as autonomous navigation or collision avoidance, constitutes the core of this thesis.
    This dissertation proposes a framework for the simultaneous computation of the 6 degrees of freedom of ego-motion (rotation and translation in 3D Euclidean space), the estimation of the scene structure, and the detection and estimation of independently moving objects. The input is provided exclusively by a binocular system, and the framework does not call for any data acquisition strategy, i.e. the stereo images are processed just as they are provided. Stereo allows one to establish correspondences between left and right images, estimating 3D points of the environment via triangulation. Likewise, feature tracking establishes correspondences between the images acquired at different time instances. When both are used together for a large number of points, the result is a set of clouds of 3D points with point-to-point correspondences between clouds. The apparent motion of the 3D points between consecutive frames has a variety of causes. The most dominant motion for most of the points in the clouds is caused by the ego-motion of the vehicle: as the vehicle moves and images are acquired, the relative position of the world points with respect to the vehicle changes. Motion is also caused by objects moving in the environment; they move independently of the vehicle, so the observed motion for these points is the sum of the ego-vehicle motion and the independent motion of the object. A third cause, of paramount importance in vision applications, is correspondence errors, i.e. the incorrect spatial or temporal assignment of point-to-point correspondences. Furthermore, all the points in the clouds are actually noisy measurements of the real, unknown 3D points of the environment.
    Solving for ego-motion and scene structure from the clouds of points requires some prior analysis of the noise involved in the imaging process and of how it propagates as the data is processed. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the detection of a bias in the estimation of 3D position, which is corrected with a reformulation of the projection equation. Ego-motion is obtained by finding the rotation and translation between the two clouds of points. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature; this thesis reviews the available closed-form solutions to the problem. The proposed framework is divided into three main blocks: 1) stereo and feature tracking computation, 2) ego-motion estimation, and 3) estimation of 3D point position and 3D velocity. The first block solves the correspondence problem, providing the clouds of points as output; no special implementation of this block is required in this thesis. The ego-motion block computes the motion of the cameras by finding the absolute orientation between the clouds of static points in the environment. Since the cloud of points might contain independently moving objects and outliers generated by false correspondences, a direct least-squares computation might lead to an erroneous solution. The first contribution of this thesis is an effective rejection rule that detects outliers based on the distance between predicted and measured quantities, and reduces the effects of noisy measurements by assigning appropriate weights to the data. This method is called the Smoothness Motion Constraint (SMC). The ego-motion of the camera between two frames is obtained by finding the absolute orientation between consecutive clouds of weighted 3D points. The complete ego-motion since initialization is achieved by concatenating the individual motion estimates, which leads to a super-linear propagation of the error, since noise is integrated. A second contribution of this dissertation is a predictor/corrector iterative method which integrates the clouds of 3D points of multiple time instances for the computation of ego-motion; it considerably reduces the accumulation of errors in the estimated ego-position of the camera. Another contribution is a method which recursively estimates the 3D world position of a point and its velocity by fusing stereo, feature tracking, and the estimated ego-motion in a Kalman filter system. An improved estimate of point position is obtained this way, which is used in the subsequent system cycle, resulting in an improved computation of ego-motion. The general contribution of this dissertation is a single framework for the real-time computation of scene structure, independently moving objects, and ego-motion for automotive applications.
    Driving a car can be dangerous. Driving performance is influenced by the physical and psychological limits of the driver and by external factors such as the weather. Driver assistance systems increase driving comfort and support the driver in order to reduce the number of accidents, ranging from warnings with optical or acoustic signals up to the system taking over control of the car. One of the main prerequisites for most driver assistance systems is accurate knowledge of the motion of the ego-vehicle. Today, various sensors are available to measure the motion of the vehicle, for example GPS and speedometers, but the resolution and accuracy of these systems are not sufficient for many real-time applications. The computation of ego-motion from stereo image sequences for driver assistance systems, e.g. for autonomous navigation or collision avoidance, forms the core of this work. This dissertation presents a system for the real-time evaluation of a scene, including the detection and evaluation of independently moving objects as well as the accurate estimation of the six degrees of freedom of ego-motion. These fundamental components are required in order to develop many intelligent automotive applications that support the driver in different traffic situations. The system works exclusively with a stereo camera platform as its sensor.
    Computing the ego-motion and the scene structure requires an analysis of the noise and of the error propagation in the image processing chain. Therefore, this dissertation analyzes the noise properties of the 3D points obtained by stereo triangulation, which leads to the discovery of a systematic error in the estimate of the 3D position that can be corrected with a reformulation of the projection equation. The simulation results show that a significant reduction of the error in the estimated 3D point position is possible. The ego-motion estimate is obtained by estimating the rotation and translation between point clouds. This problem is known as "absolute orientation", and many least-squares-based solutions have been proposed in the literature; this work reviews the available closed-form solutions to the problem. The presented system is divided into three essential building blocks: 1. registration of image features, 2. ego-motion estimation, and 3. iterative estimation of the 3D position and 3D velocity of world points. The first block receives a sequence of rectified images as input and delivers a list of tracked image features together with their corresponding 3D positions. The ego-motion estimation block consists of four main steps in a loop: 1. motion prediction, 2. application of the smoothness motion constraint (SMC), 3. computation of the absolute orientation, and 4. motion integration. The SMC proposed in this dissertation is a powerful criterion for rejecting outliers and for assigning weights to the measured 3D points. Simulations are carried out with Gaussian and slash noise; the results show the superiority of the SMC variant over standard weighting methods. The stability of the results with respect to outliers was analyzed, with the result that the breakdown point is greater than 50%. When the four steps are executed iteratively, a predictor-corrector method is obtained. We call this estimation multi-frame estimation, in contrast to two-frame estimation, which considers only the current and previous image pairs for the computation of ego-motion. The first iteration is carried out between the current and the previous cloud of points; each further iteration integrates an additional point cloud from an earlier time instance. This method reduces the accumulation of errors when integrating several estimates into a single global estimate.
    Simulation results show that, although the error still grows super-linearly over time, its magnitude is reduced by several orders of magnitude. The third block consists of the iterative estimation of the 3D position and 3D velocity of world points. Here, a method based on a Kalman filter is used which fuses stereo, feature tracking, and ego-motion data. Measurements of the position of a world point are obtained by the stereo camera system, and differentiating the estimated point position additionally allows the estimation of its velocity. The measurements are obtained through a measurement model which fuses stereo and motion data; simulation results validate the model, and the reduction of the position uncertainty over time is demonstrated with a Monte Carlo simulation. Experimental results are obtained with long image sequences. Additional tests were carried out, including a 3D reconstruction of a forest scene and the computation of free camera motion in an indoor scenario. The method shows good results in all cases, and the algorithm also delivers acceptable results when estimating the pose of small objects such as the heads and legs of real crash-test dummies.
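
    The central geometric step in this abstract is the closed-form solution of the weighted absolute orientation problem between two corresponding point clouds. The following NumPy sketch shows one classical SVD-based (Kabsch-style) solution; the caller-supplied weights merely stand in for the thesis's SMC weighting, and this is not the dissertation's exact algorithm.

```python
import numpy as np

def absolute_orientation(P, Q, w):
    """Closed-form weighted least-squares fit of R, t such that Q ~ R @ P + t.

    P, Q: (N, 3) arrays of corresponding 3D points; w: (N,) non-negative weights.
    """
    w = w / w.sum()
    p_bar = w @ P                                    # weighted centroids
    q_bar = w @ Q
    H = ((P - p_bar) * w[:, None]).T @ (Q - q_bar)   # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t
```

    Concatenating the per-frame estimates (R, t) then yields the integrated ego-position, which is where the multi-frame predictor/corrector scheme described above reduces the accumulation of errors.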

    Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

    We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy, we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of the number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable predictions from 59% to 92% on the KITTI dataset. We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps to dense maps, and super-resolution for LiDARs. Software and video demonstration are publicly available. Comment: accepted to ICRA 2018. 8 pages, 8 figures, 3 tables. Video at https://www.youtube.com/watch?v=vNIIT_M7x7Y. Code at https://github.com/fangchangma/sparse-to-dense
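
    A hedged sketch of the input encoding described above: a fixed number of spatially random depth samples is kept as a sparse depth channel (zeros elsewhere) and concatenated with the RGB image before regression. The tiny convolutional network below is a stand-in for illustration only; the paper's model is a much deeper encoder-decoder regression network.

```python
import torch
import torch.nn as nn

def sample_sparse_depth(depth, n=100):
    """Keep n spatially random valid pixels of an (H, W) depth map, zero the rest."""
    valid = (depth > 0).nonzero(as_tuple=False)            # (M, 2) pixel coords
    keep = valid[torch.randperm(valid.shape[0])[:n]]
    sparse = torch.zeros_like(depth)
    sparse[keep[:, 0], keep[:, 1]] = depth[keep[:, 0], keep[:, 1]]
    return sparse

class TinyDepthRegressor(nn.Module):
    """Stand-in regression net: RGB (3ch) + sparse depth (1ch) -> dense depth."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, rgb, sparse_depth):                  # (B,3,H,W), (B,1,H,W)
        return self.net(torch.cat([rgb, sparse_depth], dim=1))
```

    Varying n in sample_sparse_depth reproduces the kind of samples-versus-accuracy study the abstract reports.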

    Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors


    Vision-based Learning for Drones: A Survey

    Drones, as advanced cyber-physical systems, are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities under various scenarios. We start by elucidating the fundamental principles of vision-based learning, highlighting how it significantly improves drones' visual perception and decision-making processes. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios, and underscore the challenges and innovations characterizing each area. Finally, we explore open questions and potential solutions, paving the way for ongoing research and development in this dynamic and rapidly evolving field. With the growth of large language models (LLMs) and embodied intelligence, vision-based learning for drones provides a promising but challenging road towards artificial general intelligence (AGI) in the 3D physical world.

    A cognitive ego-vision system for interactive assistance

    With increasing computational power and decreasing size, computers are now wearable and mobile, and they are becoming companions in people's everyday lives. Personal digital assistants and mobile phones equipped with adequate software attract a lot of public interest, although the assistance they provide amounts to little more than a mobile database for appointments, addresses, to-do lists, and photos. Compared to the assistance a human can provide, such systems can hardly be called real assistants. The motivation to construct more human-like assistance systems that develop a certain level of cognitive capabilities leads to the exploration of two central paradigms in this work. The first paradigm is termed cognitive vision systems. Such systems take human cognition as a design principle for their underlying concepts and develop learning and adaptation capabilities to be more flexible in their application; they are embodied, active, and situated. Second, the ego-vision paradigm is introduced as a very tight interaction scheme between a user and a computer system that especially eases close collaboration and assistance between the two. Ego-vision systems (EVS) take the user's (visual) perspective and integrate the human into the system's processing loop by means of shared perception and augmented reality. EVSs adopt techniques of cognitive vision to identify objects, interpret actions, and understand the user's visual perception, and they articulate their knowledge and interpretation by means of augmentations of the user's own view. These two paradigms are studied as rather general concepts, but always with the goal of realizing more flexible assistance systems that closely collaborate with their users.
    This work provides three major contributions. First, a definition and explanation of ego-vision as a novel paradigm is given, and the benefits and challenges of this paradigm are discussed. Second, a configuration of different approaches that permit an ego-vision system to perceive its environment and its user is presented, covering object and action recognition, head gesture recognition, and mosaicing. These account for the specific challenges identified for ego-vision systems, whose perception capabilities are based on wearable sensors only. Finally, a visual active memory (VAM) is introduced as a flexible conceptual architecture for cognitive vision systems in general, and for assistance systems in particular. It adopts principles of human cognition to develop a representation for the information stored in this memory. So-called memory processes continuously analyze, modify, and extend the content of the VAM, and the functionality of the integrated system emerges from the coordinated interplay of these memory processes. An integrated assistance system applying the approaches and concepts outlined above is implemented on the basis of the visual active memory. The system architecture is discussed, and some exemplary processing paths in the system are presented. The system assists users in object manipulation tasks and has reached a maturity level that allows user studies to be conducted. Quantitative results for the different integrated memory processes are presented, as well as an assessment of the interactive system by means of these user studies.
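
    As a purely hypothetical sketch of the visual active memory idea — a shared store whose content is continuously analyzed, modified, and extended by coordinated memory processes — consider the following minimal Python model. The subscription API and the example process are assumptions for illustration; the thesis does not prescribe this interface.

```python
from collections import defaultdict

class VisualActiveMemory:
    """Toy VAM: a shared store plus 'memory processes' reacting to new content."""
    def __init__(self):
        self.items = {}                       # id -> memory element (a dict)
        self.processes = defaultdict(list)    # element type -> callbacks
        self.next_id = 0

    def subscribe(self, element_type, process):
        self.processes[element_type].append(process)

    def insert(self, element):
        self.next_id += 1
        self.items[self.next_id] = element
        # Notify memory processes so they can analyze/extend the new content.
        for process in self.processes[element["type"]]:
            process(self, self.next_id, element)
        return self.next_id

# Hypothetical memory process: tag object percepts that overlap a tracked hand.
def action_interpreter(memory, item_id, element):
    if element.get("label") == "object" and element.get("near_hand"):
        memory.items[item_id]["hypothesis"] = "manipulated"

vam = VisualActiveMemory()
vam.subscribe("percept", action_interpreter)
vam.insert({"type": "percept", "label": "object", "near_hand": True})
```

    The point of the design is that system-level behaviour emerges from the interplay of such processes rather than from a fixed processing pipeline.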

    Real time correlation-based stereo: algorithm, implementations and applications

    This paper describes some of the work on stereo that has been going on at INRIA over the last four years. The work has concentrated on obtaining dense, accurate and reliable range maps of the environment at rates compatible with the real-time constraints of such applications as the navigation of mobile vehicles in man-made or natural environments. The class of algorithms selected among several is that of correlation-based stereo algorithms, because they are the only ones that can produce sufficiently dense range maps with an algorithmic structure which lends itself nicely to fast implementations, thanks to the simplicity of the underlying computation. We describe the various improvements that we have brought to the original idea, including validation and characterization of the quality of the matches, a recursive implementation of the score computation which makes the method independent of the size of the correlation window, and a calibration method which does not require the use of a calibration pattern.
    We then describe two implementations of this algorithm on two very different pieces of hardware. The first implementation is on a board with four digital signal processors designed jointly with Matra MSII; it can produce 64x64 range maps at rates varying between 200 and 400 ms, depending upon the range of disparities. The second implementation is on a board developed by DEC-PRL and can perform the cross-correlation of two 256x256 images in 140 ms. The first implementation has been integrated in the navigation system of the INRIA cart and used to correct for inertial and odometric errors in navigation experiments both indoors and outdoors on roads; this is the first application of our correlation-based algorithm described in the paper. The second application has been carried out jointly with the French national space agency (CNES) to study the possibility of using stereo on a future planetary rover for the construction of digital elevation maps.
    We have shown that real-time stereo is possible today at low cost and can be applied in real applications. The algorithm described is not the most sophisticated available, but we have made it robust and reliable thanks to a number of improvements. Even though each of these improvements is not earth-shattering from a pure research point of view, altogether they have allowed us to go beyond a very important threshold. This threshold measures the difference between a program that runs in the laboratory on a few images and one that works continuously for hours on a sequence of stereo pairs and produces results at such rates and of such quality that they can be used to guide a real vehicle or to produce digital elevation maps. We believe that this threshold has only been reached in a very small number of cases.
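
    A hedged sketch of the recursive score computation mentioned above: by accumulating window sums with 2D cumulative sums (an integral-image formulation), the per-pixel matching cost becomes independent of the correlation window size. The SAD cost and winner-take-all selection below are simplifications; the paper's implementation additionally validates and characterizes the matches.

```python
import numpy as np

def box_filter(img, r):
    """Window sums over (2r+1)x(2r+1) neighbourhoods via 2D cumulative sums."""
    p = np.pad(img, ((r + 1, r), (r + 1, r)))       # zero-pad the borders
    s = p.cumsum(axis=0).cumsum(axis=1)
    k = 2 * r + 1
    return s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]

def sad_stereo(left, right, max_disp=32, r=4):
    """Winner-take-all SAD block matching on rectified float images."""
    h, w = left.shape
    costs = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        costs[d, :, d:] = box_filter(diff, r)       # O(1) per pixel in r
    return np.argmin(costs, axis=0)                 # disparity map
```

    With this formulation, evaluating one disparity hypothesis costs a constant number of operations per pixel regardless of the window radius, which is the property that made the DSP-rate implementations described above feasible.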

    Visual-inertial structure from motion: observability and resolvability

    This paper provides two novel contributions. The first regards the observability of visual-inertial structure from motion. It is proven that the information contained in the data provided by a monocular camera observing a single point-feature and by an Inertial Measurement Unit (IMU) allows estimating the absolute scale, the speed in the local frame, the absolute roll and pitch angles, the biases affecting the accelerometer's and the gyroscope's measurements, the magnitude of the gravitational acceleration, and the extrinsic camera-IMU calibration. The second contribution is the derivation of a new closed-form solution to determine some of the previous observable quantities by using only a few camera measurements collected during a short time interval and the data provided by the IMU during the same interval. This closed-form solution allows us to investigate the intrinsic properties of visual-inertial structure from motion and, in particular, to identify the conditions under which the problem has a finite number of solutions. Specifically, it is shown that the problem can have a unique solution, two distinct solutions, or infinitely many solutions, depending on the trajectory, on the number of point-features and their layout, and on the number of camera images. The proposed closed-form solution is finally used in conjunction with a filter-based approach in order to demonstrate its benefit.
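
    The abstract does not reproduce the derivation, but closed-form visual-inertial formulations of this kind typically exploit a linear structure of the following generic shape (a sketch under assumed notation, not the paper's exact equations): vision provides the camera positions up to an unknown scale, while double integration of the rotated accelerometer data expresses the same positions through the unknown initial velocity and gravity.

```latex
% p_i: up-to-scale camera position from vision at time t_i; s: metric scale;
% v_0: initial velocity; g: gravity; R(t), a(t): IMU rotation and acceleration.
s\,\mathbf{p}_i
  = \mathbf{v}_0\, t_i
  + \tfrac{1}{2}\,\mathbf{g}\, t_i^{2}
  + \int_{0}^{t_i}\!\!\int_{0}^{\tau} R(\sigma)\,\mathbf{a}(\sigma)\,
    \mathrm{d}\sigma\,\mathrm{d}\tau ,
  \qquad i = 1, \dots, N .
```

    Stacking these constraints over the available camera images gives a system in the unknowns (s, v_0, g) whose solution set depends on the trajectory, the number of point-features and their layout, and the number of images — the kind of case analysis (unique, twofold, or infinite solutions) the paper carries out.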
    • 
