1,035 research outputs found

    A sticky HDP-HMM with application to speaker diarization

    Get PDF
    We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS395 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Probabilistic Models of Motor Production

    Get PDF
    N. Bernstein defined the ability of the central neural system (CNS) to control many degrees of freedom of a physical body with all its redundancy and flexibility as the main problem in motor control. He pointed at that man-made mechanisms usually have one, sometimes two degrees of freedom (DOF); when the number of DOF increases further, it becomes prohibitively hard to control them. The brain, however, seems to perform such control effortlessly. He suggested the way the brain might deal with it: when a motor skill is being acquired, the brain artificially limits the degrees of freedoms, leaving only one or two. As the skill level increases, the brain gradually "frees" the previously fixed DOF, applying control when needed and in directions which have to be corrected, eventually arriving to the control scheme where all the DOF are "free". This approach of reducing the dimensionality of motor control remains relevant even today. One the possibles solutions of the Bernstetin's problem is the hypothesis of motor primitives (MPs) - small building blocks that constitute complex movements and facilitite motor learnirng and task completion. Just like in the visual system, having a homogenious hierarchical architecture built of similar computational elements may be beneficial. Studying such a complicated object as brain, it is important to define at which level of details one works and which questions one aims to answer. David Marr suggested three levels of analysis: 1. computational, analysing which problem the system solves; 2. algorithmic, questioning which representation the system uses and which computations it performs; 3. implementational, finding how such computations are performed by neurons in the brain. In this thesis we stay at the first two levels, seeking for the basic representation of motor output. In this work we present a new model of motor primitives that comprises multiple interacting latent dynamical systems, and give it a full Bayesian treatment. Modelling within the Bayesian framework, in my opinion, must become the new standard in hypothesis testing in neuroscience. Only the Bayesian framework gives us guarantees when dealing with the inevitable plethora of hidden variables and uncertainty. The special type of coupling of dynamical systems we proposed, based on the Product of Experts, has many natural interpretations in the Bayesian framework. If the dynamical systems run in parallel, it yields Bayesian cue integration. If they are organized hierarchically due to serial coupling, we get hierarchical priors over the dynamics. If one of the dynamical systems represents sensory state, we arrive to the sensory-motor primitives. The compact representation that follows from the variational treatment allows learning of a motor primitives library. Learned separately, combined motion can be represented as a matrix of coupling values. We performed a set of experiments to compare different models of motor primitives. In a series of 2-alternative forced choice (2AFC) experiments participants were discriminating natural and synthesised movements, thus running a graphics Turing test. When available, Bayesian model score predicted the naturalness of the perceived movements. For simple movements, like walking, Bayesian model comparison and psychophysics tests indicate that one dynamical system is sufficient to describe the data. For more complex movements, like walking and waving, motion can be better represented as a set of coupled dynamical systems. We also experimentally confirmed that Bayesian treatment of model learning on motion data is superior to the simple point estimate of latent parameters. Experiments with non-periodic movements show that they do not benefit from more complex latent dynamics, despite having high kinematic complexity. By having a fully Bayesian models, we could quantitatively disentangle the influence of motion dynamics and pose on the perception of naturalness. We confirmed that rich and correct dynamics is more important than the kinematic representation. There are numerous further directions of research. In the models we devised, for multiple parts, even though the latent dynamics was factorized on a set of interacting systems, the kinematic parts were completely independent. Thus, interaction between the kinematic parts could be mediated only by the latent dynamics interactions. A more flexible model would allow a dense interaction on the kinematic level too. Another important problem relates to the representation of time in Markov chains. Discrete time Markov chains form an approximation to continuous dynamics. As time step is assumed to be fixed, we face with the problem of time step selection. Time is also not a explicit parameter in Markov chains. This also prohibits explicit optimization of time as parameter and reasoning (inference) about it. For example, in optimal control boundary conditions are usually set at exact time points, which is not an ecological scenario, where time is usually a parameter of optimization. Making time an explicit parameter in dynamics may alleviate this

    Dynamic Switching State Systems for Visual Tracking

    Get PDF
    This work addresses the problem of how to capture the dynamics of maneuvering objects for visual tracking. Towards this end, the perspective of recursive Bayesian filters and the perspective of deep learning approaches for state estimation are considered and their functional viewpoints are brought together

    Facial Expression Recognition in the Presence of Head Motion

    Get PDF

    The Future of Humanoid Robots

    Get PDF
    This book provides state of the art scientific and engineering research findings and developments in the field of humanoid robotics and its applications. It is expected that humanoids will change the way we interact with machines, and will have the ability to blend perfectly into an environment already designed for humans. The book contains chapters that aim to discover the future abilities of humanoid robots by presenting a variety of integrated research in various scientific and engineering fields, such as locomotion, perception, adaptive behavior, human-robot interaction, neuroscience and machine learning. The book is designed to be accessible and practical, with an emphasis on useful information to those working in the fields of robotics, cognitive science, artificial intelligence, computational methods and other fields of science directly or indirectly related to the development and usage of future humanoid robots. The editor of the book has extensive R&D experience, patents, and publications in the area of humanoid robotics, and his experience is reflected in editing the content of the book
    • …
    corecore