2,002 research outputs found

    Acoustic Echo Estimation using the model-based approach with Application to Spatial Map Construction in Robotics

    Get PDF

    Suivi Multi-Locuteurs avec des Informations Audio-Visuelles pour la Perception des Robots

    Get PDF
    Robot perception plays a crucial role in human-robot interaction (HRI). Perception system provides the robot information of the surroundings and enables the robot to give feedbacks. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where are the people, who are speaking, or what are they talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot’s perception system to achieve the goal. Like seeing and hearing for a human-being, audio and visual information are the critical cues for a robot in a conversational scenario. The advancement of computer vision and audio processing of the last decade has revolutionized the robot perception abilities. In this thesis, we have the following contributions: we first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form tractable problem solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking. Birth and death process are built jointly with the framework to deal with the varying number of the people in the scene. Furthermore, we exploit the complementarity of vision and robot motorinformation. On the one hand, the robot’s active motion can be integrated into the visual tracking system to stabilize the tracking. On the other hand, visual information can be used to perform motor servoing. Moreover, audio and visual information are then combined in the variational framework, to estimate the smooth trajectories of speaking people, and to infer the acoustic status of a person- speaking or silent. In addition, we employ the model to acoustic-only speaker localization and tracking. Online dereverberation techniques are first applied then followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on von-Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets according to applications.La perception des robots joue un rôle crucial dans l’interaction homme-robot (HRI). Le système de perception fournit les informations au robot sur l’environnement, ce qui permet au robot de réagir en consequence. Dans un scénario de conversation, un groupe de personnes peut discuter devant le robot et se déplacer librement. Dans de telles situations, les robots sont censés comprendre où sont les gens, ceux qui parlent et de quoi ils parlent. Cette thèse se concentre sur les deux premières questions, à savoir le suivi et la diarisation des locuteurs. Nous utilisons différentes modalités du système de perception du robot pour remplir cet objectif. Comme pour l’humain, l’ouie et la vue sont essentielles pour un robot dans un scénario de conversation. Les progrès de la vision par ordinateur et du traitement audio de la dernière décennie ont révolutionné les capacités de perception des robots. Dans cette thèse, nous développons les contributions suivantes : nous développons d’abord un cadre variationnel bayésien pour suivre plusieurs objets. Le cadre bayésien variationnel fournit des solutions explicites, rendant le processus de suivi très efficace. Cette approche est d’abord appliqué au suivi visuel de plusieurs personnes. Les processus de créations et de destructions sont en adéquation avecle modèle probabiliste proposé pour traiter un nombre variable de personnes. De plus, nous exploitons la complémentarité de la vision et des informations du moteur du robot : d’une part, le mouvement actif du robot peut être intégré au système de suivi visuel pour le stabiliser ; d’autre part, les informations visuelles peuvent être utilisées pour effectuer l’asservissement du moteur. Par la suite, les informations audio et visuelles sont combinées dans le modèle variationnel, pour lisser les trajectoires et déduire le statut acoustique d’une personne : parlant ou silencieux. Pour experimenter un scenario où l’informationvisuelle est absente, nous essayons le modèle pour la localisation et le suivi des locuteurs basé sur l’information acoustique uniquement. Les techniques de déréverbération sont d’abord appliquées, dont le résultat est fourni au système de suivi. Enfin, une variante du modèle de suivi des locuteurs basée sur la distribution de von-Mises est proposée, celle-ci étant plus adaptée aux données directionnelles. Toutes les méthodes proposées sont validées sur des bases de données specifiques à chaque application

    Trajectory Prediction with Event-Based Cameras for Robotics Applications

    Get PDF
    This thesis presents the study, analysis, and implementation of a framework to perform trajectory prediction using an event-based camera for robotics applications. Event-based perception represents a novel computation paradigm based on unconventional sensing technology that holds promise for data acquisition, transmission, and processing at very low latency and power consumption, crucial in the future of robotics. An event-based camera, in particular, is a sensor that responds to light changes in the scene, producing an asynchronous and sparse output over a wide illumination dynamic range. They only capture relevant spatio-temporal information - mostly driven by motion - at high rate, avoiding the inherent redundancy in static areas of the field of view. For such reasons, this device represents a potential key tool for robots that must function in highly dynamic and/or rapidly changing scenarios, or where the optimisation of the resources is fundamental, like robots with on-board systems. Prediction skills are something humans rely on daily - even unconsciously - for instance when driving, playing sports, or collaborating with other people. In the same way, predicting the trajectory or the end-point of a moving target allows a robot to plan for appropriate actions and their timing in advance, interacting with it in many different manners. Moreover, prediction is also helpful for compensating robot internal delays in the perception-action chain, due for instance to limited sensors and/or actuators. The question I addressed in this work is whether event-based cameras are advantageous or not in trajectory prediction for robotics. In particular, if classical deep learning architecture used for this task can accommodate for event-based data, working asynchronously, and which benefit they can bring with respect to standard cameras. The a priori hypothesis is that being the sampling of the scene driven by motion, such a device would allow for more meaningful information acquisition, improving the prediction accuracy and processing data only when needed - without any information loss or redundant acquisition. To test the hypothesis, experiments are mostly carried out using the neuromorphic iCub, a custom version of the iCub humanoid platform that mounts two event-based cameras in the eyeballs, along with standard RGB cameras. To further motivate the work on iCub, a preliminary step is the evaluation of the robot's internal delays, a value that should be compensated by the prediction to interact in real-time with the object perceived. The first part of this thesis sees the implementation of the event-based framework for prediction, to answer the question if Long Short-Term Memory neural networks, the architecture used in this work, can be combined with event-based cameras. The task considered is the handover Human-Robot Interaction, during which the trajectory of the object in the human's hand must be inferred. Results show that the proposed pipeline can predict both spatial and temporal coordinates of the incoming trajectory with higher accuracy than model-based regression methods. Moreover, fast recovery from failure cases and adaptive prediction horizon behavior are exhibited. Successively, I questioned how much the event-based sampling approach can be convenient with respect to the classical fixed-rate approach. The test case used is the trajectory prediction of a bouncing ball, implemented with the pipeline previously introduced. A comparison between the two sampling methods is analysed in terms of error for different working rates, showing how the spatial sampling of the event-based approach allows to achieve lower error and also to adapt the computational load dynamically, depending on the motion in the scene. Results from both works prove that the merging of event-based data and Long Short-Term Memory networks looks promising for spatio-temporal features prediction in highly dynamic tasks, and paves the way to further studies about the temporal aspect and to a wide range of applications, not only robotics-related. Ongoing work is now focusing on the robot control side, finding the best way to exploit the spatio-temporal information provided by the predictor and defining the optimal robot behavior. Future work will see the shift of the full pipeline - prediction and robot control - to a spiking implementation. First steps in this direction have been already made thanks to a collaboration with a group from the University of Zurich, with which I propose a closed-loop motor controller implemented on a mixed-signal analog/digital neuromorphic processor, emulating a classical PID controller by means of spiking neural networks

    Scenic: A Language for Scenario Specification and Scene Generation

    Full text link
    We propose a new probabilistic programming language for the design and analysis of perception systems, especially those based on machine learning. Specifically, we consider the problems of training a perception system to handle rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs and sampling these to generate specialized training and test sets. More generally, such languages can be used for cyber-physical systems and robotics to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment is a "scene", a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing "scenarios" that are distributions over scenes. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.Comment: 41 pages, 36 figures. Full version of a PLDI 2019 paper (extending UC Berkeley EECS Department Tech Report No. UCB/EECS-2018-8
    corecore