    Performance Analysis for Visual Planetary Landing Navigation Using Optical Flow and DEM Matching

    Visual navigation for planetary landing vehicles poses many scientific and technical challenges due to inclined, high-velocity approach trajectories, a complex 3D environment, and high computational requirements for real-time image processing. High relative navigation accuracy at the landing site is required for obstacle avoidance and operational constraints. This paper presents detailed performance analysis results for a recently published visual navigation system concept based on a mono camera as vision sensor and on matching of the recovered and reference 3D models of the landing site. The recovered 3D models are produced by real-time, instantaneous optical flow processing of the navigation camera images. An embedded optical correlator is introduced, which enables robust, ultra-high-speed optical flow processing even under unfavorable illumination conditions. The performance analysis is based on a detailed software simulation model of the visual navigation system, including the optical correlator as the key component for ultra-high-speed image processing. The paper recalls the general structure of the navigation system and presents detailed end-to-end visual navigation performance results for a Mercury landing reference mission as a function of visual navigation entry conditions, reference DEM resolution, navigation camera configuration, and auxiliary sensor information.
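
    The front end of such a system recovers dense image motion between successive navigation-camera frames. As an illustration only (the paper uses an embedded optical correlator, not a software method; filenames below are hypothetical), a minimal dense optical flow sketch with OpenCV's Farneback algorithm:

```python
import cv2
import numpy as np

# Hypothetical frame pair from a descent camera; any two grayscale
# images of the same scene taken a short time apart would do.
prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
nxt = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: one 2D displacement vector per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev, nxt, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# For a translating camera, flow magnitude is inversely related to
# scene depth, which is what allows 3D (DEM) recovery downstream.
magnitude = np.linalg.norm(flow, axis=2)
print("median flow magnitude [px]:", np.median(magnitude))
```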

    Automatic Food Intake Assessment Using Camera Phones

    Obesity is becoming an epidemic in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended, so monitoring everyday food intake is essential for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions, and their accuracy largely depends on uncertain factors such as the user's memory, food knowledge, and portion estimation; as a result, accuracy is often compromised. Accurate and convenient dietary assessment methods are still lacking in both the general population and the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy lifestyle. With this method, users use their smart phones before and after a meal to capture images or videos of the meal. The smart phone recognizes food items, calculates the volume of the food consumed, and provides the results to users. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals: (1) to develop a prototype system with existing methods in order to review the literature, find its drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve recognition accuracy to a field-application level; (3) to design indexing methods for a large-scale image database to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient, and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods. An image feature detector and descriptor were developed, and a nearest neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular-shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for the large-scale image database. Finally, the volume calculation was enhanced by reducing reliance on the marker and introducing IMUs. Sensor fusion techniques combining measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as to reduce noise from these sensors.
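
    The role of the credit card marker is to fix the metric scale of an otherwise scale-ambiguous 3D reconstruction. A minimal sketch of that idea (illustrative, not the thesis implementation; the helper name is made up):

```python
import numpy as np

# A credit card has a standardized size (85.60 mm x 53.98 mm,
# ISO/IEC 7810 ID-1), so measuring it in the reconstructed model
# fixes the metric scale of the whole reconstruction.
CARD_WIDTH_MM = 85.60

def metric_volume(model_volume, card_width_model_units):
    """Convert a volume from arbitrary reconstruction units to mm^3.

    model_volume: food volume measured in the (unitless) 3D model.
    card_width_model_units: the card's width measured in that model.
    """
    scale = CARD_WIDTH_MM / card_width_model_units  # mm per model unit
    return model_volume * scale**3                  # volumes scale cubically

# Example: card measures 1.7 units in the model, food region 0.004 units^3.
print(f"{metric_volume(0.004, 1.7):.1f} mm^3")
```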

    Vision systems for autonomous aircraft guidance

    Detail Enhancing Denoising of Digitized 3D Models from a Mobile Scanning System

    The acquisition process of digitizing a large-scale environment produces an enormous amount of raw geometry data. This data is corrupted by system noise, which leads to 3D surfaces that are not smooth and details that are distorted. Any scanning system has noise associated with the scanning hardware, both digital quantization errors and measurement inaccuracies, but a mobile scanning system has additional system noise introduced by the pose estimation of the hardware during data acquisition. The combined system noise generates data that is not handled well by existing noise reduction and smoothing techniques. This research is focused on enhancing the 3D models acquired by mobile scanning systems used to digitize large-scale environments. These digitization systems combine a variety of sensors, including laser range scanners, video cameras, and pose estimation hardware, on a mobile platform for the quick acquisition of 3D models of real-world environments. The data acquired by such systems are extremely noisy, often with significant details on the same order of magnitude as the system noise. By utilizing a unique 3D signal analysis tool, a denoising algorithm was developed that identifies regions of detail and enhances their geometry while removing the effects of noise on the overall model. The developed algorithm can be useful for a variety of digitized 3D models, not just those produced by mobile scanning systems. The challenges faced in this study were the automatic processing needs of the enhancement algorithm and the need to fill a gap in the area of 3D model analysis in order to reduce the effect of system noise on 3D models. In this context, our main contributions are the automation and integration of a data enhancement method not well known to the computer vision community, and the development of a novel 3D signal decomposition and analysis tool. The new technologies featured in this document are intuitive extensions of existing methods to new dimensionality and applications. The research has been applied to detail-enhancing denoising of scanned data from a mobile range scanning system, and results from both synthetic and real models are presented.
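
    The base/detail separation at the heart of detail-preserving denoising can be illustrated in one dimension (the dissertation's actual tool is a 3D signal decomposition; this sketch only shows the concept):

```python
import numpy as np

# Split the signal into a smooth base and a detail residual, suppress
# small residuals as noise, and keep large residuals as genuine detail.
def detail_preserving_denoise(signal, window=9, noise_thresh=0.05):
    kernel = np.ones(window) / window
    base = np.convolve(signal, kernel, mode="same")   # low-frequency base
    detail = signal - base                            # detail + noise band
    # Soft threshold: zero out sub-threshold wiggles while leaving
    # strong detail features largely intact.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - noise_thresh, 0)
    return base + detail

x = np.linspace(0, 4 * np.pi, 500)
clean = np.sin(x) + 0.5 * (np.abs(x - 6.0) < 0.2)     # smooth + sharp detail
noisy = clean + np.random.normal(0, 0.03, x.size)
print(np.abs(detail_preserving_denoise(noisy) - clean).mean())
```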

    Bio-Inspired Disturbance Rejection with Ocellar and Distributed Acceleration Sensing for Small Unmanned Aircraft Systems

    Rapid sensing of body motion is critical to stabilizing a flight vehicle in the presence of exogenous disturbances, as well as to high-performance tracking of desired control commands. This bandwidth requirement becomes more stringent as vehicle scale decreases. In many flying insects, three simple eyes, known as the ocelli, operate as low-latency visual egomotion sensors. Furthermore, many flying insects employ distributed networks of acceleration-sensitive sensors to provide information about body egomotion and to rapidly detect external forces and torques. In this work, simulation modeling of the ocellar visual system common to flying insects was performed based on physiological and behavioral data. Linear state estimation matrices were derived from the measurement models to form estimates of egomotion states. A fully analog ocellar sensor was designed and constructed based on these models, producing state estimation outputs, and these analog outputs were characterized in the presence of egomotion stimuli. Feedback from the ocellar sensor, with and without complementary input from optic flow sensors, was implemented on a quadrotor to perform stabilization and disturbance rejection, and the performance of the closed-loop sensor feedback was compared to baseline inertial feedback. A distributed array of digital accelerometers was constructed to provide rapid force and torque measurements. The response of the array to induced motion stimuli was characterized, and an automated calibration algorithm was formulated to estimate sensor position and orientation. A linear state estimation matrix was derived from the calibration to directly estimate forces and torques. The force and torque estimates provided by the sensor network were used to augment the quadrotor's inner-loop controller to improve tracking of desired commands in the presence of exogenous force and torque disturbances via force-adaptive feedback control.
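
    Once the array is calibrated, force/torque estimation reduces to a linear least-squares problem. A minimal sketch of that structure (assumed form for illustration, not the thesis code; the calibration matrix here is random):

```python
import numpy as np

# After calibration, stacked accelerometer readings y relate linearly
# to the body force/torque vector w via y = C @ w + noise, so the
# estimator is the least-squares solution w_hat = pinv(C) @ y.
rng = np.random.default_rng(0)

n_sensors = 12                       # accelerometers in the array
C = rng.normal(size=(n_sensors, 6))  # calibration: 6 = 3 forces + 3 torques

w_true = np.array([0.0, 0.0, -9.81, 0.1, -0.05, 0.02])   # [F; tau]
y = C @ w_true + rng.normal(scale=0.05, size=n_sensors)  # noisy readings

w_hat = np.linalg.pinv(C) @ y        # linear state estimation matrix
print(np.round(w_hat, 2))
```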

    On the relationship between neuronal codes and mental models

    The superordinate aim of my work towards this thesis was a better understanding of the relationship between mental models and the underlying principles that lead to the self-organization of neuronal circuitry. The thesis consists of four individual publications, which approach this goal from differing perspectives. While the formation of sparse coding representations in neuronal substrate has been investigated extensively, many research questions on how sparse coding may be exploited for higher cognitive processing are still open. The first two studies, included as chapter 2 and chapter 3, asked to what extent representations obtained with sparse coding match mental models. We identified the following selectivities in sparse coding representations: with stereo images as input, the representation was selective for the disparity of image structures, which can be used to infer the distance of structures to the observer. Furthermore, it was selective to the predominant orientation in textures, which can be used to infer the orientation of surfaces. With optic flow from egomotion as input, the representation was selective to the direction of egomotion in six degrees of freedom. Due to the direct relation between selectivity and physical properties, these representations, obtained with sparse coding, can serve as early sensory models of the environment. The cognitive processes behind spatial knowledge rest on mental models that represent the environment. We presented a topological model for wayfinding in the third study, included as chapter 4. It describes a dual population code, in which the first population code encodes places by means of place fields, and the second encodes motion instructions based on links between place fields. We did not focus on an implementation in biological substrate or on an exact fit to physiological findings. The model is a biologically plausible, parsimonious method for wayfinding, which may be close to an intermediate step of emergent skills in an evolutionary navigational hierarchy. Our automated testing of visual performance in mice, included as chapter 5, is an example of behavioral testing in the perception-action cycle. The goal of this study was to quantify the optokinetic reflex. Due to the rich behavioral repertoire of mice, quantification required many elaborate steps of computational analysis. Animals and humans are embodied living systems, and are therefore composed of strongly enmeshed modules or entities, which are in turn enmeshed with the environment. In order to study living systems as a whole, it is necessary to test hypotheses, for example on the nature of mental models, in the perception-action cycle. In summary, the studies included in this thesis extend our view of the character of early sensory representations as mental models, as well as of high-level mental models for spatial navigation. Additionally, the thesis contains an example of the evaluation of hypotheses in the perception-action cycle.
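
    Sparse coding represents each input as a sparse combination of dictionary elements. A minimal sketch of sparse inference via ISTA (illustrative only; the thesis does not commit to this particular solver):

```python
import numpy as np

# Infer a sparse code a for input x under a fixed dictionary D by
# minimizing ||x - D a||^2 + lam * ||a||_1 with iterative
# shrinkage-thresholding (ISTA).
def ista(x, D, lam=0.1, steps=200):
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return a

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 256))             # overcomplete dictionary
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = D[:, [3, 57, 200]] @ np.array([1.0, -0.5, 2.0])  # 3-atom signal
a = ista(x, D)
print("active atoms:", np.flatnonzero(np.abs(a) > 0.1))
```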

    Recovering Heading for Visually-Guided Navigation

    We present a model for recovering the direction of heading of an observer moving relative to a scene that may contain self-moving objects. The model builds upon an algorithm proposed by Rieger and Lawton (1985), which is based on earlier work by Longuet-Higgins and Prazdny (1981). The algorithm uses velocity differences computed in regions of high depth variation to estimate the location of the focus of expansion, which indicates the observer's heading direction. We relate the behavior of the proposed model to psychophysical observations of the ability of human observers to judge their heading direction, and show how the model can cope with self-moving objects in the environment. We also discuss this model in the broader context of a navigational system that performs tasks requiring rapid sensing and response through the interaction of simple task-specific routines.
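
    For intuition, once rotation has been cancelled (the model uses velocity differences in regions of high depth variation for this), locating the focus of expansion is a linear least-squares problem: each translational flow vector points along the line from the FOE through its image point. A minimal sketch of that final step, on synthetic flow:

```python
import numpy as np

# For purely translational motion, the flow v at image point p points
# along (p - FOE), so the 2D cross product v x (p - FOE) = 0 gives one
# linear constraint on the focus of expansion per flow vector.
def estimate_foe(points, flows):
    A = np.stack([flows[:, 1], -flows[:, 0]], axis=1)          # [vy, -vx]
    b = flows[:, 1] * points[:, 0] - flows[:, 0] * points[:, 1]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

rng = np.random.default_rng(2)
foe_true = np.array([120.0, 80.0])
pts = rng.uniform(0, 640, size=(300, 2))
depths = rng.uniform(1.0, 10.0, size=300)
flows = (pts - foe_true) / depths[:, None]     # pure expansion field
flows += rng.normal(scale=0.01, size=flows.shape)
print(np.round(estimate_foe(pts, flows), 1))   # ~ [120.  80.]
```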

    Computational Imaging Approach to Recovery of Target Coordinates Using Orbital Sensor Data

    This dissertation addresses the components necessary to simulate image-based recovery of a target's position using orbital image sensors. Each component is considered in detail, focusing on the effect that design choices and system parameters have on the accuracy of the position estimate. Changes in sensor resolution, varying amounts of blur, differences in image noise level, the selection of algorithms used for each component, and lag introduced by excessive processing time all contribute to the accuracy of the recovered target coordinates. Using physical targets and sensors in this scenario would be cost-prohibitive in the exploratory setting posed; therefore, a simulated target path is generated using Bézier curves, which approximate representative paths followed by the targets of interest. Orbital trajectories for the sensors are designed on an elliptical model representative of the motion of physical orbital sensors. Images from each sensor are simulated based on the position and orientation of the sensor, the position of the target, and the imaging parameters selected for the experiment (resolution, noise level, blur level, etc.). Post-processing of the simulated imagery seeks to reduce noise and blur and to increase resolution. The only information available for calculating the target position in a fully implemented system would be the sensor position and orientation vectors and the images from each sensor. From these data we develop a reliable method of recovering the target position and analyze the impact on near-real-time processing. We also discuss the influence of adjustments to system components on overall capabilities, and address the potential system size, weight, and power requirements of realistic implementation approaches.
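
    The dissertation only states that target paths are generated with Bézier curves; as an assumed illustration of that step, a cubic Bézier path sampled from four control points via the Bernstein basis (waypoints below are hypothetical):

```python
import numpy as np

# Cubic Bezier curve: a smooth path interpolating p0 and p3, shaped by
# the interior control points p1 and p2.
def cubic_bezier(p0, p1, p2, p3, n=100):
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Hypothetical ground-target waypoints in local coordinates (meters).
path = cubic_bezier(np.array([0.0, 0.0]), np.array([200.0, 50.0]),
                    np.array([400.0, -50.0]), np.array([600.0, 0.0]))
print(path[:3])   # first few sampled target positions
```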

    Differentiable world programs

    Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action. However, these advancements have undermined many "classical" techniques developed over the last few decades. We postulate that a blend of "classical" and "learned" methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents. "What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?" is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed "differentiable world programs", unifies efforts from multiple closely related but currently disjoint fields, including robotics, computer vision, computer graphics, and AI. Our first contribution, gradSLAM, is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning. Our second contribution, taskography, proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes. Our third and final contribution, gradSim, is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.
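
    The core mechanism behind a system like gradSLAM, per the abstract, is making least-squares components differentiable. A toy sketch of that idea in PyTorch (not gradSLAM's code; just an unrolled Gauss-Newton solver through which autograd can flow):

```python
import torch

# Align two point sets by a 2D translation with unrolled Gauss-Newton
# steps; because every step is a differentiable tensor op, gradients
# flow back through the solver to its inputs.
def gauss_newton_translation(src, tgt, steps=5):
    t = torch.zeros(2, dtype=src.dtype)
    for _ in range(steps):
        r = (src + t) - tgt                     # residuals, shape (N, 2)
        J = torch.eye(2, dtype=src.dtype)       # d r_i / d t (constant here)
        g = r.mean(dim=0)                       # averaged J^T r
        t = t - torch.linalg.solve(J.T @ J, g)  # normal-equations step
    return t

src = torch.randn(100, 2)
offset = torch.tensor([0.5, -1.0], requires_grad=True)
tgt = src + offset                              # target depends on `offset`
t_hat = gauss_newton_translation(src, tgt)
t_hat.sum().backward()                          # gradient flows through GN
print(t_hat.detach(), offset.grad)              # t_hat ~ offset, grad ~ [1, 1]
```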

    Two-view Geometry Estimation Unaffected by a Dominant Plane

    A RANSAC-based algorithm for robust estimation of epipolar geometry from point correspondences in the possible presence of a dominant scene plane is presented. The algorithm handles scenes with (i) all points in a single plane, (ii) a majority of points in a single plane and the rest off the plane, and (iii) no dominant plane. It is not required to know a priori which of cases (i)-(iii) occurs. The algorithm exploits a theorem we proved: if five or more of seven correspondences are related by a homography, then there is an epipolar geometry consistent with the seven-tuple as well as with all correspondences related by the homography. This means that a seven-point sample consisting of two outliers and five inliers lying in a dominant plane produces an epipolar geometry that is wrong and yet consistent with a high number of correspondences. The theorem explains why RANSAC often fails to estimate epipolar geometry in the presence of a dominant plane. Rather surprisingly, the theorem also implies that RANSAC-based homography estimation is faster when drawing non-minimal samples of seven correspondences than when drawing minimal samples of four correspondences.
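
    The claim about sample sizes is counterintuitive under the textbook RANSAC analysis, in which the number of required samples grows with sample size. For context, a small sketch of that standard formula (the paper's speedup comes from the degeneracy handling layered on top of this, not from the formula itself):

```python
import math

# Standard RANSAC sample-count formula (textbook result, not from the
# paper): N = log(1 - p) / log(1 - w**s) samples are needed to draw at
# least one all-inlier sample with confidence p, given inlier ratio w
# and sample size s. By itself this favors small s; the paper's point
# is that 7-point samples also expose the dominant-plane degeneracy,
# which is where the overall speedup comes from.
def ransac_samples(p=0.99, w=0.6, s=4):
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))

for s in (4, 7):
    print(f"sample size {s}: {ransac_samples(s=s)} iterations")
```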