345 research outputs found

    Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

    We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201
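
The exponentiated-gradient idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `eg_update`, `weights`, `likelihoods`, and `eta`, as well as the toy likelihood values, are assumptions made for illustration. The sketch keeps a weight per candidate source direction and takes multiplicative steps that decrease the negative log-likelihood of a mixture while keeping the weights on the probability simplex.

```python
import math

def eg_update(weights, likelihoods, eta=0.1):
    """One exponentiated-gradient step on weights over candidate directions.

    weights      -- current non-negative weights summing to 1 (hypothetical)
    likelihoods  -- positive likelihood of the current observation under
                    each candidate direction (hypothetical toy values)
    eta          -- step size (assumed value)
    """
    mix = sum(w * l for w, l in zip(weights, likelihoods))
    # Gradient of the negative log-likelihood -log(sum_d w_d * l_d) w.r.t. w_d.
    grad = [-l / mix for l in likelihoods]
    # Multiplicative update, then renormalize so the weights stay on the simplex.
    unnorm = [w * math.exp(-eta * g) for w, g in zip(weights, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Start from the currently available estimate (uniform here) and refine it.
weights = [0.25, 0.25, 0.25, 0.25]
likelihoods = [0.1, 0.2, 1.5, 0.2]
for _ in range(50):
    weights = eg_update(weights, likelihoods)
```

Because each step starts from the previous weights, the update suits an online setting where estimates are refined frame by frame rather than recomputed from scratch.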

    Single camera pose estimation using Bayesian filtering and Kinect motion priors

    Traditional approaches to upper body pose estimation using monocular vision rely on complex body models and a large variety of geometric constraints. We argue that this is not ideal and somewhat inelegant as it results in large processing burdens, and instead attempt to incorporate these constraints through priors obtained directly from training data. A prior distribution covering the probability of a human pose occurring is used to incorporate likely human poses. This distribution is obtained offline, by fitting a Gaussian mixture model to a large dataset of recorded human body poses, tracked using a Kinect sensor. We combine this prior information with a random walk transition model to obtain an upper body model, suitable for use within a recursive Bayesian filtering framework. Our model can be viewed as a mixture of discrete Ornstein-Uhlenbeck processes, in that states behave as random walks, but drift towards a set of typically observed poses. This model is combined with measurements of the human head and hand positions, using recursive Bayesian estimation to incorporate temporal information. Measurements are obtained using face detection and a simple skin colour hand detector, trained using the detected face. The suggested model is designed with analytical tractability in mind and we show that the pose tracking can be Rao-Blackwellised using the mixture Kalman filter, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. In addition, the use of the proposed upper body model allows reliable three-dimensional pose estimates to be obtained indirectly for a number of joints that are often difficult to detect using traditional object recognition strategies. Comparisons with Kinect sensor results and the state of the art in 2D pose estimation highlight the efficacy of the proposed approach. Comment: 25 pages, Technical report, related to Burke and Lasenby, AMDO 2014 conference paper.
Code sample: https://github.com/mgb45/SignerBodyPose Video: https://www.youtube.com/watch?v=dJMTSo7-uF
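
The "random walk that drifts towards typical poses" behaviour described in this abstract can be sketched with a discrete Ornstein-Uhlenbeck step. This is only an illustration under stated assumptions: `ou_step`, `simulate`, `theta`, `sigma`, and the two-dimensional toy "pose" are hypothetical, and a single mean is used where the report describes a mixture with one mean per Gaussian component.

```python
import random

def ou_step(x, mean, theta, sigma, rng):
    """One discrete Ornstein-Uhlenbeck step: random walk plus drift to `mean`.

    theta -- drift strength in (0, 1); theta = 0 gives a pure random walk
    sigma -- standard deviation of the random-walk noise
    """
    return [xi + theta * (mi - xi) + rng.gauss(0.0, sigma)
            for xi, mi in zip(x, mean)]

def simulate(x0, mean, steps, theta=0.1, sigma=0.05, seed=0):
    """Run `steps` transitions from state `x0` with a seeded generator."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = ou_step(x, mean, theta, sigma, rng)
    return x

# Toy 2-D "pose" drifting towards a hypothetical typical pose (1.0, 2.0).
final_pose = simulate([0.0, 0.0], [1.0, 2.0], steps=200)
```

In a mixture version, each component would contribute its own `mean`, which is what keeps the transition model linear-Gaussian per component and therefore compatible with Kalman-style updates.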

    Detection and tracking of multiple moving objects in video.


    Employing a RGB-D Sensor for Real-Time Tracking of Humans across Multiple Re-Entries in a Smart Environment

    The term smart environment refers to physical spaces equipped with sensors feeding into adaptive algorithms that enable the environment to become sensitive and responsive to the presence and needs of its occupants. People with special needs, such as the elderly or disabled people, stand to benefit most from such environments as they offer sophisticated assistive functionalities supporting independent living and improved safety. In a smart environment, the key issue is to sense the location and identity of its users. In this paper, we intend to tackle the problems of detecting and tracking humans in a realistic home environment by exploiting the complementary nature of (synchronized) color and depth images produced by a low-cost consumer-level RGB-D camera. Our system selectively feeds the complementary data emanating from the two vision sensors to different algorithmic modules which together implement three sequential components: (1) object labeling based on depth data clustering, (2) human re-entry identification based on comparing visual signatures extracted from the color (RGB) information, and (3) human tracking based on the fusion of both depth and RGB data. Experimental results show that this division of labor improves the system’s efficiency and classification performance.
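
Component (2) above, re-entry identification by comparing color signatures, can be sketched as follows. The abstract does not say which signature or similarity measure the authors use, so the joint RGB histogram, the histogram-intersection score, the `threshold` value, and all function names here are assumptions chosen for illustration.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for an empty region
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def identify(signature, gallery, threshold=0.6):
    """Return the best-matching known identity, or None for a new person."""
    best_id, best_score = None, 0.0
    for person_id, reference in gallery.items():
        score = histogram_intersection(signature, reference)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None

# Toy gallery: one person seen in mostly red clothing, one in mostly blue.
gallery = {
    "person_a": color_histogram([(220, 30, 30)] * 90 + [(40, 40, 40)] * 10),
    "person_b": color_histogram([(30, 30, 220)] * 90 + [(40, 40, 40)] * 10),
}
reentry = color_histogram([(210, 20, 25)] * 80 + [(50, 50, 50)] * 20)
match = identify(reentry, gallery)
```

Returning `None` below the threshold lets the caller enroll an unseen person as a new identity instead of forcing a match.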

    Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking

    This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera. This topic has a wide range of potential applications in video surveillance, video games, and biomedicine. Exemplar-based techniques have been successful; however, their accuracy depends on how similar the camera viewpoint and the scene properties are between the training and test images. Given a training dataset captured with a small number of fixed cameras parallel to the ground, three possible scenarios of increasing difficulty have been identified and analyzed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.

    Detection of Motorcycles in Urban Traffic Using Video Analysis: A Review

    Motorcycles are Vulnerable Road Users (VRU) and, together with bicycles and pedestrians, they are the traffic actors most affected by accidents in urban areas. Automatic video processing for urban surveillance cameras has the potential to effectively detect and track these road users. The present review focuses on algorithms used for detection and tracking of motorcycles, using the surveillance infrastructure provided by CCTV cameras. Given the importance of results achieved by Deep Learning in the field of computer vision, the use of such techniques for detection and tracking of motorcycles is also reviewed. The paper ends by describing the performance measures generally used and the publicly available datasets (introducing the Urban Motorbike Dataset (UMD) with quantitative evaluation results for different detectors), discussing the challenges ahead, and presenting a set of conclusions with proposed future work in this evolving area.