616 research outputs found
Anatomically-based skeleton kinetics and pose estimation in freely-moving rodents
Forming a complete picture of the relationship between neural activity and body kinetics requires quantification of skeletal joint biomechanics during behavior. However, without detailed knowledge of the underlying skeletal motion, inferring joint kinetics from surface tracking approaches is difficult, especially for animals where the relationship between surface anatomy and skeleton changes during motion. Here we developed a videography-based method enabling detailed three-dimensional kinetic quantification of an anatomically defined skeleton in untethered, freely behaving animals. The skeleton-based model is constrained by anatomical principles and joint motion limits and provides skeletal pose estimates for a range of rodent sizes, even when limbs are occluded. Model-inferred joint kinetics for both gait and gap-crossing behaviors were verified by direct measurement of limb placement, showing that complex decision-making behaviors can be accurately reconstructed at the level of skeletal kinetics using our anatomically constrained model.
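Anatomical joint-motion limits of the kind described above can be expressed as box constraints on joint angles, applied as a projection step inside an iterative pose optimizer. A minimal sketch of that idea; the joint names and limit values here are hypothetical, not the paper's actual model:

```python
import numpy as np

# Hypothetical anatomical limits (radians) for a hindlimb chain
JOINT_LIMITS = {
    "hip":   (-1.0, 1.5),
    "knee":  ( 0.0, 2.4),   # the knee cannot hyperextend
    "ankle": (-0.8, 0.8),
}

def project_to_limits(pose):
    """Clamp each joint angle to its anatomical range -- the kind of
    projection step a constrained pose optimizer would run per iteration."""
    return {j: float(np.clip(a, *JOINT_LIMITS[j])) for j, a in pose.items()}

raw = {"hip": 0.3, "knee": -0.5, "ankle": 1.2}  # unconstrained estimate
constrained = project_to_limits(raw)            # knee and ankle get clamped
```

In a full pose-estimation loop, such a projection would alternate with a data-fitting step that aligns the skeleton to the video observations.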
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.
Comment: IEEE Journal of Selected Topics in Signal Processing, 201
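The exponentiated-gradient update described above can be sketched as a multiplicative step on a weight vector over candidate source directions, which keeps the weights on the probability simplex. A minimal illustration; the weight vector, step size, and gradient values are hypothetical, not the paper's exact formulation:

```python
import numpy as np

def eg_update(w, grad, eta=0.5):
    """One exponentiated-gradient step: a multiplicative update followed
    by renormalization, so w remains a valid distribution."""
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

# Toy example: uniform weights over 4 candidate directions
w = np.full(4, 0.25)
grad = np.array([0.9, 0.1, 0.5, 0.3])   # hypothetical loss gradient
w = eg_update(w, grad)
assert np.isclose(w.sum(), 1.0)          # still sums to one
```

The direction with the smallest gradient receives the largest weight after the update, which is how estimates are pulled toward better-supported directions while staying normalized.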
Unbiased Learning of Deep Generative Models with Structured Discrete Representations
By composing graphical models with deep learning architectures, we learn
generative models with the strengths of both frameworks. The structured
variational autoencoder (SVAE) inherits structure and interpretability from
graphical models, and flexible likelihoods for high-dimensional data from deep
learning, but poses substantial optimization challenges. We propose novel
algorithms for learning SVAEs, and are the first to demonstrate the SVAE's
ability to handle multimodal uncertainty when data is missing by incorporating
discrete latent variables. Our memory-efficient implicit differentiation scheme
makes the SVAE tractable to learn via gradient descent, while demonstrating
robustness to incomplete optimization. To more rapidly learn accurate graphical
model parameters, we derive a method for computing natural gradients without
manual derivations, which avoids biases found in prior work. These optimization
innovations enable the first comparisons of the SVAE to state-of-the-art time
series models, where the SVAE performs competitively while learning
interpretable and structured discrete data representations.
Comment: 38 pages, 7 figures
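For conjugate exponential-family factors, the natural-gradient step alluded to above has a well-known closed form: move the natural parameters toward the prior plus the expected sufficient statistics. A minimal sketch under that assumption; the Dirichlet parameterization and the numbers are illustrative, not the paper's code:

```python
import numpy as np

def natural_gradient_step(alpha, alpha_prior, expected_counts, lr=0.1):
    """Natural-gradient ascent on a Dirichlet's parameters.
    For conjugate pairs the natural gradient of the ELBO is
    (prior + expected sufficient statistics) - current parameters,
    so a step of size lr interpolates toward that target."""
    target = alpha_prior + expected_counts
    return (1 - lr) * alpha + lr * target

alpha = np.array([1.0, 1.0, 1.0])
alpha_prior = np.array([0.5, 0.5, 0.5])
counts = np.array([30.0, 10.0, 5.0])   # hypothetical expected counts
alpha = natural_gradient_step(alpha, alpha_prior, counts)
```

With lr=1.0 this jumps directly to the coordinate-ascent fixed point, which is why natural gradients can converge much faster than ordinary gradients on these parameters.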
Probabilistic models for unmanned aerial vehicle localization tested on real data
The thesis addresses the dynamic state estimation problem in robotics, particularly for unmanned aerial vehicles (UAVs). Based on data collected from a UAV, we design several probabilistic models for estimating its state (mainly speed and rotation angles), including configurations where one of the sensors is not available. We use a Kalman filter and a particle filter and focus on learning the model parameters with the EM algorithm. The EM algorithm is then adjusted for the non-Gaussian error distributions of some sensors and extended with model-complexity penalization terms for better performance on unseen data. We implement these methods in MATLAB and evaluate them on separate datasets. We also analyze data from a ground robot and use our particle filter implementation to estimate its position.
Department of Theoretical Computer Science and Mathematical Logic, Faculty of Mathematics and Physics
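A linear Kalman filter of the kind the thesis builds on reduces to one predict/update cycle per measurement. A minimal sketch; the constant-velocity model, noise levels, and measurements below are illustrative assumptions, not the thesis's configuration:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # predict: propagate state and covariance through the dynamics
    x = F @ x
    P = F @ P @ F.T + Q
    # update: correct with measurement z via the Kalman gain
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity model: state = [position, velocity]
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])               # only position is measured
Q = 1e-3 * np.eye(2)                     # process noise
R = np.array([[0.05]])                   # measurement noise
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.21, 0.3, 0.42, 0.5]:    # noisy position readings
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
```

In an EM setting like the one described, Q and R would not be hand-set but learned from data, with the E-step running a smoother over the trajectory and the M-step re-estimating the noise parameters.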
Multi-Speaker Tracking with Audio-Visual Information for Robot Perception
Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables it to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Like sight and hearing for a human being, visual and audio information are the critical cues for a robot in a conversational scenario. The advancement of computer vision and audio processing over the last decade has revolutionized robot perception abilities. This thesis makes the following contributions: we first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form, tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking. Birth and death processes are built jointly with the framework to deal with the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information. On the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize the tracking. On the other hand, visual information can be used to perform motor servoing. Moreover, audio and visual information are then combined in the variational framework, to estimate the smooth trajectories of speaking people and to infer the acoustic status of a person: speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking, where online dereverberation techniques are first applied, followed by the tracking system.
Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets appropriate to each application.
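The von Mises distribution mentioned above is the standard Gaussian analogue on the circle, which is why it suits direction-of-arrival data: its density is periodic and peaked at a mean direction. A minimal sketch of the density in plain NumPy; the mean direction and concentration values are illustrative:

```python
import numpy as np

def vonmises_pdf(theta, mu, kappa):
    """Von Mises density on the circle: peaked at mean direction mu,
    with concentration kappa (larger kappa = more peaked).
    np.i0 is the modified Bessel function I0, the normalizer."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

mu = np.deg2rad(30.0)   # hypothetical mean direction of arrival
kappa = 4.0

# The density is 2*pi-periodic and maximal at the mean direction
assert np.isclose(vonmises_pdf(mu, mu, kappa),
                  vonmises_pdf(mu + 2 * np.pi, mu, kappa))
assert vonmises_pdf(mu, mu, kappa) > vonmises_pdf(mu + 1.0, mu, kappa)
```

Unlike a Gaussian on angles, this density wraps correctly around ±180°, so a tracker built on it never penalizes a source for crossing the angular discontinuity.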
- …