1,380 research outputs found

    Motion planning for robot audition

    Get PDF
    International audienceRobot audition refers to a range of hearing capabilities which help robots explore and understand their environment. Among them, sound source localization is the problem of estimating the location of a sound source given measurements of its angle of arrival with respect to a microphone array mounted on the robot. In addition, robot motion can help quickly solve the front-back ambiguity existing in a linear microphone array. In this article, we focus on the problem of exploiting robot motion to improve the estimation of the location of an intermittent and possibly moving source in a noisy and reverberant environment. We first propose a robust extended mixture Kalman filtering framework for jointly estimating the source location and its activity over time. Building on this framework, we then propose a long-term robot motion planning algorithm based on Monte Carlo tree search to find an optimal robot trajectory according to two alternative criteria: the Shannon entropy or the standard deviation of the estimated belief on the source location. These criteria are integrated over time using a discount factor. Experimental results show the robustness of the proposed estimation framework to false angle of arrival measurements within ±20◦ and 10% false source activity detection rate. The proposed robot motion planning technique achieves an average localization error 48.7% smaller than a one-step-ahead method. In addition, we compare the correlation between the estimation error and the two criteria, and investigate the effect of the discount factor on the performance of the proposed motion planning algorithm

    Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems

    Get PDF
    Multi-camera systems have the ability to overcome some of the fundamental limitations of single camera based systems. Having multiple view points of a scene goes a long way in limiting the influence of field of view, occlusion, blur and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of object motion and scene in multi-camera and multi-sensor systems. The first part of the dissertation discusses the role of constraints introduced by projective imaging towards robust inference of multi-camera/sensor based object motion. We discuss the role of the homography and epipolar constraints for fusing object motion perceived by individual cameras. For planar scenes, the homography constraints provide a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space of the problem by introducing a ``shared'' state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fail due to poor SNR or occlusion. The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in the multi-camera and multi-sensor networks deal with complex non-linear models corrupted with non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms, and present a method for implementing the particle filter using the Independent Metropolis Hastings sampler, that is highly amenable to pipelined implementations and parallelization. We analyze the implementations of the proposed algorithm, and in particular concentrate on implementations that have minimum processing times. The last part of the dissertation deals with the efficient sensing paradigm of compressing sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals such as images, reflectance fields, videos etc., are better approximated by this hybrid of two models. We derive optimal hybrid linear projections of the signal and show that theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera, and help in reducing the so called data deluge problem in large multi-camera systems

    Performance Analysis of Bearings-only Tracking Problems for Maneuvering Target and Heterogeneous Sensor Applications

    Get PDF
    State estimation, i.e. determining the trajectory, of a maneuvering target from noisy measurements collected by a single or multiple passive sensors (e.g. passive sonar and radar) has wide civil and military applications, for example underwater surveillance, air defence, wireless communications, and self-protection of military vehicles. These passive sensors are listening to target emitted signals without emitting signals themselves which give them concealing properties. Tactical scenarios exists where the own position shall not be revealed, e.g. for tracking submarines with passive sonar or tracking an aerial target by means of electro-optic image sensors like infrared sensors. This estimation process is widely known as bearings-only tracking. On the one hand, a challenge is the high degree of nonlinearity in the estimation process caused by the nonlinear relation of angular measurements to the Cartesian state. On the other hand, passive sensors cannot provide direct target location measurements, so bearings-only tracking suffers from poor target trajectory estimation accuracy due to marginal observability from sensor measurements. In order to achieve observability, that means to be able to estimate the complete target state, multiple passive sensor measurements must be fused. The measurements can be recorded spatially distributed by multiple dislocated sensor platforms or temporally distributed by a single, moving sensor platform. Furthermore, an extended case of bearings-only tracking is given if heterogeneous measurements from targets emitting different types of signals, are involved. With this, observability can also be achieved on a single, not necessarily moving platform. In this work, a performance bound for complex motion models, i.e. piecewisely maneuvering targets with unknown maneuver change times, by means of bearings-only measurements from a single, moving sensor platform is derived and an efficient estimator is implemented and analyzed. Furthermore, an observability analysis is carried out for targets emitting acoustic and electromagnetic signals. Here, the different signal propagation velocities can be exploited to ensure observability on a single, not necessarily moving platform. Based on the theoretical performance and observability analyses a distributed fusion system has been realized by means of heterogeneous sensors, which shall detect an event and localize a threat. This is performed by a microphone array to detect sound waves emitted by the threat as well as a radar detector that detects electromagnetic emissions from the threat. Since multiple platforms are involved to provide increased observability and also redundancy against possible breakdowns, a WiFi mobile ad hoc network is used for communications. In order to keep up the network in a breakdown OLSR (optimized link state routing) routing approach is employed

    Suivi Multi-Locuteurs avec des Informations Audio-Visuelles pour la Perception des Robots

    Get PDF
    Robot perception plays a crucial role in human-robot interaction (HRI). Perception system provides the robot information of the surroundings and enables the robot to give feedbacks. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where are the people, who are speaking, or what are they talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot’s perception system to achieve the goal. Like seeing and hearing for a human-being, audio and visual information are the critical cues for a robot in a conversational scenario. The advancement of computer vision and audio processing of the last decade has revolutionized the robot perception abilities. In this thesis, we have the following contributions: we first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form tractable problem solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking. Birth and death process are built jointly with the framework to deal with the varying number of the people in the scene. Furthermore, we exploit the complementarity of vision and robot motorinformation. On the one hand, the robot’s active motion can be integrated into the visual tracking system to stabilize the tracking. On the other hand, visual information can be used to perform motor servoing. Moreover, audio and visual information are then combined in the variational framework, to estimate the smooth trajectories of speaking people, and to infer the acoustic status of a person- speaking or silent. In addition, we employ the model to acoustic-only speaker localization and tracking. Online dereverberation techniques are first applied then followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on von-Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets according to applications.La perception des robots joue un rôle crucial dans l’interaction homme-robot (HRI). Le système de perception fournit les informations au robot sur l’environnement, ce qui permet au robot de réagir en consequence. Dans un scénario de conversation, un groupe de personnes peut discuter devant le robot et se déplacer librement. Dans de telles situations, les robots sont censés comprendre où sont les gens, ceux qui parlent et de quoi ils parlent. Cette thèse se concentre sur les deux premières questions, à savoir le suivi et la diarisation des locuteurs. Nous utilisons différentes modalités du système de perception du robot pour remplir cet objectif. Comme pour l’humain, l’ouie et la vue sont essentielles pour un robot dans un scénario de conversation. Les progrès de la vision par ordinateur et du traitement audio de la dernière décennie ont révolutionné les capacités de perception des robots. Dans cette thèse, nous développons les contributions suivantes : nous développons d’abord un cadre variationnel bayésien pour suivre plusieurs objets. Le cadre bayésien variationnel fournit des solutions explicites, rendant le processus de suivi très efficace. Cette approche est d’abord appliqué au suivi visuel de plusieurs personnes. Les processus de créations et de destructions sont en adéquation avecle modèle probabiliste proposé pour traiter un nombre variable de personnes. De plus, nous exploitons la complémentarité de la vision et des informations du moteur du robot : d’une part, le mouvement actif du robot peut être intégré au système de suivi visuel pour le stabiliser ; d’autre part, les informations visuelles peuvent être utilisées pour effectuer l’asservissement du moteur. Par la suite, les informations audio et visuelles sont combinées dans le modèle variationnel, pour lisser les trajectoires et déduire le statut acoustique d’une personne : parlant ou silencieux. Pour experimenter un scenario où l’informationvisuelle est absente, nous essayons le modèle pour la localisation et le suivi des locuteurs basé sur l’information acoustique uniquement. Les techniques de déréverbération sont d’abord appliquées, dont le résultat est fourni au système de suivi. Enfin, une variante du modèle de suivi des locuteurs basée sur la distribution de von-Mises est proposée, celle-ci étant plus adaptée aux données directionnelles. Toutes les méthodes proposées sont validées sur des bases de données specifiques à chaque application

    Sparse Bayesian information filters for localization and mapping

    Get PDF
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution February 2008This thesis formulates an estimation framework for Simultaneous Localization and Mapping (SLAM) that addresses the problem of scalability in large environments. We describe an estimation-theoretic algorithm that achieves significant gains in computational efficiency while maintaining consistent estimates for the vehicle pose and the map of the environment. We specifically address the feature-based SLAM problem in which the robot represents the environment as a collection of landmarks. The thesis takes a Bayesian approach whereby we maintain a joint posterior over the vehicle pose and feature states, conditioned upon measurement data. We model the distribution as Gaussian and parametrize the posterior in the canonical form, in terms of the information (inverse covariance) matrix. When sparse, this representation is amenable to computationally efficient Bayesian SLAM filtering. However, while a large majority of the elements within the normalized information matrix are very small in magnitude, it is fully populated nonetheless. Recent feature-based SLAM filters achieve the scalability benefits of a sparse parametrization by explicitly pruning these weak links in an effort to enforce sparsity. We analyze one such algorithm, the Sparse Extended Information Filter (SEIF), which has laid much of the groundwork concerning the computational benefits of the sparse canonical form. The thesis performs a detailed analysis of the process by which the SEIF approximates the sparsity of the information matrix and reveals key insights into the consequences of different sparsification strategies. We demonstrate that the SEIF yields a sparse approximation to the posterior that is inconsistent, suffering from exaggerated confidence estimates. This overconfidence has detrimental effects on important aspects of the SLAM process and affects the higher level goal of producing accurate maps for subsequent localization and path planning. This thesis proposes an alternative scalable filter that maintains sparsity while preserving the consistency of the distribution. We leverage insights into the natural structure of the feature-based canonical parametrization and derive a method that actively maintains an exactly sparse posterior. Our algorithm exploits the structure of the parametrization to achieve gains in efficiency, with a computational cost that scales linearly with the size of the map. Unlike similar techniques that sacrifice consistency for improved scalability, our algorithm performs inference over a posterior that is conservative relative to the nominal Gaussian distribution. Consequently, we preserve the consistency of the pose and map estimates and avoid the effects of an overconfident posterior. We demonstrate our filter alongside the SEIF and the standard EKF both in simulation as well as on two real-world datasets. While we maintain the computational advantages of an exactly sparse representation, the results show convincingly that our method yields conservative estimates for the robot pose and map that are nearly identical to those of the original Gaussian distribution as produced by the EKF, but at much less computational expense. The thesis concludes with an extension of our SLAM filter to a complex underwater environment. We describe a systems-level framework for localization and mapping relative to a ship hull with an Autonomous Underwater Vehicle (AUV) equipped with a forward-looking sonar. The approach utilizes our filter to fuse measurements of vehicle attitude and motion from onboard sensors with data from sonar images of the hull. We employ the system to perform three-dimensional, 6-DOF SLAM on a ship hull

    USING PROBABILISTIC GRAPHICAL MODELS TO DRAW INFERENCES IN SENSOR NETWORKS WITH TRACKING APPLICATIONS

    Get PDF
    Sensor networks have been an active research area in the past decade due to the variety of their applications. Many research studies have been conducted to solve the problems underlying the middleware services of sensor networks, such as self-deployment, self-localization, and synchronization. With the provided middleware services, sensor networks have grown into a mature technology to be used as a detection and surveillance paradigm for many real-world applications. The individual sensors are small in size. Thus, they can be deployed in areas with limited space to make unobstructed measurements in locations where the traditional centralized systems would have trouble to reach. However, there are a few physical limitations to sensor networks, which can prevent sensors from performing at their maximum potential. Individual sensors have limited power supply, the wireless band can get very cluttered when multiple sensors try to transmit at the same time. Furthermore, the individual sensors have limited communication range, so the network may not have a 1-hop communication topology and routing can be a problem in many cases. Carefully designed algorithms can alleviate the physical limitations of sensor networks, and allow them to be utilized to their full potential. Graphical models are an intuitive choice for designing sensor network algorithms. This thesis focuses on a classic application in sensor networks, detecting and tracking of targets. It develops feasible inference techniques for sensor networks using statistical graphical model inference, binary sensor detection, events isolation and dynamic clustering. The main strategy is to use only binary data for rough global inferences, and then dynamically form small scale clusters around the target for detailed computations. This framework is then extended to network topology manipulation, so that the framework developed can be applied to tracking in different network topology settings. Finally the system was tested in both simulation and real-world environments. The simulations were performed on various network topologies, from regularly distributed networks to randomly distributed networks. The results show that the algorithm performs well in randomly distributed networks, and hence requires minimum deployment effort. The experiments were carried out in both corridor and open space settings. A in-home falling detection system was simulated with real-world settings, it was setup with 30 bumblebee radars and 30 ultrasonic sensors driven by TI EZ430-RF2500 boards scanning a typical 800 sqft apartment. Bumblebee radars are calibrated to detect the falling of human body, and the two-tier tracking algorithm is used on the ultrasonic sensors to track the location of the elderly people
    • …
    corecore