Audiovisual head orientation estimation with particle filtering in multisensor scenarios
This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two different particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system combining information at the data level. Experimental results over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
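The first, decision-level fusion scheme mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the per-modality estimates are combined, so the weighted circular mean and the reliability weights below are assumptions of this sketch, not the authors' algorithm:

```python
import numpy as np

def fuse_orientations(theta_video, w_video, theta_audio, w_audio):
    """Decision-level fusion of two head-orientation estimates (radians).

    A weighted circular mean is one simple way to combine monomodal
    estimates; the weights w_* (per-modality confidences) are an
    assumption of this sketch. Working on the unit circle avoids the
    wrap-around problem of averaging raw angles near +/-pi.
    """
    s = w_video * np.sin(theta_video) + w_audio * np.sin(theta_audio)
    c = w_video * np.cos(theta_video) + w_audio * np.cos(theta_audio)
    return np.arctan2(s, c)

# Equal-confidence fusion of a 0 rad and a pi/2 rad estimate lands halfway.
print(fuse_orientations(0.0, 1.0, np.pi / 2, 1.0))  # ~pi/4
```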
3D AUDIO-VISUAL SPEAKER TRACKING WITH AN ADAPTIVE PARTICLE FILTER
We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audio-visual cues from the individual modalities, we fuse them adaptively, using their reliability, in a particle filter framework. The reliability of the audio signal is measured from the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching of detection results against a reference image in RGB space. Experiments on the AV16.3 dataset show that the proposed adaptive audio-visual tracker outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy.
Qian, Xinyuan; Brutti, Alessio; Omologo, Maurizio; Cavallaro, Andrea
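The adaptive weighting idea can be sketched as follows. The abstract names the cues (maximum GCF peak for audio, colour-histogram matching for video) but not the exact mapping to weights, so the Bhattacharyya similarity and the linear normalisation here are illustrative assumptions:

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Similarity between two normalised colour histograms (1 = identical)."""
    return float(np.sum(np.sqrt(h1 * h2)))

def adaptive_weights(gcf_peak, hist_obs, hist_ref, gcf_max=1.0):
    """Per-frame modality weights, loosely following the abstract.

    The audio weight grows with the maximum GCF peak and the visual
    weight with histogram similarity to the reference image. The
    linear normalisation is an assumption of this sketch, not the
    authors' exact rule.
    """
    r_audio = float(np.clip(gcf_peak / gcf_max, 0.0, 1.0))
    r_video = bhattacharyya(hist_obs, hist_ref)
    total = r_audio + r_video
    if total == 0.0:
        return 0.5, 0.5  # no evidence from either modality: split evenly
    return r_audio / total, r_video / total
```

In a particle filter, such weights would typically exponentiate or scale the per-modality likelihoods before resampling.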
3D angle-of-arrival positioning using von Mises-Fisher distribution
We propose modeling an angle-of-arrival (AOA) positioning measurement as a
von Mises-Fisher (VMF) distributed unit vector instead of the conventional
normally distributed azimuth and elevation measurements. Describing the
2-dimensional AOA measurement with three numbers removes discontinuities and
reduces nonlinearity at the poles of the azimuth-elevation coordinate system.
Our computer simulations show that the proposed VMF measurement noise model
based filters outperform the normal distribution based algorithms in accuracy
in a scenario where close-to-pole measurements occur frequently.
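The proposed representation can be sketched directly: map each (azimuth, elevation) measurement to a 3D unit vector and score it with an (unnormalised) von Mises-Fisher likelihood. The conversion below is the standard spherical-to-Cartesian one; the concentration parameter kappa stands in for the inverse measurement noise:

```python
import numpy as np

def aoa_to_unit_vector(azimuth, elevation):
    """Map an (azimuth, elevation) AOA measurement to a 3D unit vector.

    Describing the 2D direction with three numbers, as proposed above,
    removes the azimuth wrap-around at +/-pi and the singularity at
    elevation +/-pi/2.
    """
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])

def vmf_log_likelihood(x, mu, kappa):
    """Unnormalised von Mises-Fisher log-likelihood on the unit sphere.

    kappa is a concentration (inverse-noise) parameter; the normalising
    constant is omitted since filters typically only need likelihood
    ratios between particles.
    """
    return kappa * np.dot(mu, x)

# Near the pole, azimuths 180 degrees apart describe almost the same
# physical direction; the unit-vector form reflects that.
a = aoa_to_unit_vector(0.0, np.pi / 2 - 1e-3)
b = aoa_to_unit_vector(np.pi, np.pi / 2 - 1e-3)
print(np.linalg.norm(a - b))  # small, despite the large azimuth gap
```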
Spatial context-aware person-following for a domestic robot
Domestic robots are a focus of research as service providers in
households and even as robotic companions that share the living
space with humans. A major capability of mobile domestic robots
is the joint exploration of space. One challenge in this task is
how to let robots move through space in reasonable, socially
acceptable ways that support interaction and communication as
part of the joint exploration. As a step towards this challenge,
we have developed a context-aware following behavior that takes
these social aspects into account and applied it together with a
multi-modal person-tracking method to switch between three basic
following approaches, namely direction-following, path-following
and parallel-following. These are derived from observations of
human-human following schemes and are activated depending on the
current spatial context (e.g. free space) and the relative
position of the interacting human. A combination of the
elementary behaviors is performed in real time with our mobile
robot in different environments. First experimental results
demonstrate the practicability of the proposed approach.
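The switching between the three elementary behaviors can be sketched as a simple selector. The abstract only says the choice depends on spatial context (e.g. free space) and the human's relative position; the concrete predicates and the 1.2 m threshold below are illustrative assumptions:

```python
def select_following_behavior(corridor_width_m, side_slot_free,
                              narrow_limit_m=1.2):
    """Pick one of the three elementary behaviors named in the text.

    corridor_width_m: free space around the person (assumed cue).
    side_slot_free:   whether the robot could walk beside the person
                      (assumed cue). Thresholds are illustrative.
    """
    if corridor_width_m < narrow_limit_m:
        # Too narrow to deviate from the person's trajectory.
        return "path-following"
    if side_slot_free:
        # Enough free space at the person's side, as in human-human walking.
        return "parallel-following"
    # Open space but no side slot: head straight towards the person.
    return "direction-following"
```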
Audio-Visual Speaker Tracking
Target motion tracking has found application in interdisciplinary fields including, but not limited to, surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi-speaker environments, and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to widespread advances in devices and technologies and the need for seamless solutions for real-time tracking and localization of speakers. However, speaker tracking is a challenging task in real-life scenarios, as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcoming these issues is to use multi-modal information, as it conveys complementary information about the state of the speakers compared to single-modal tracking. Several approaches to using multi-modal information have been proposed, which can be classified into two categories, namely deterministic and stochastic. This chapter aims to provide multimedia researchers with a state-of-the-art overview of tracking methods used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of a FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.]
A multimodal approach to blind source separation of moving sources
A novel multimodal approach is proposed to solve the
problem of blind source separation (BSS) of moving sources. The
challenge of BSS for moving sources is that the mixing filters are
time varying; thus, the unmixing filters should also be time varying,
which are difficult to calculate in real time. In the proposed approach,
the visual modality is utilized to facilitate the separation for
both stationary and moving sources. The movement of the sources
is detected by a 3-D tracker based on video cameras. Positions
and velocities of the sources are obtained from the 3-D tracker
based on a Markov Chain Monte Carlo particle filter (MCMC-PF),
which results in high sampling efficiency. The full BSS solution
is formed by integrating a frequency domain blind source separation
algorithm and beamforming: if the sources are identified
as stationary for a certain minimum period, a frequency domain
BSS algorithm is implemented with an initialization derived from
the positions of the source signals. Once the sources are moving, a
beamforming algorithm which requires no prior statistical knowledge
is used to perform real time speech enhancement and provide
separation of the sources. Experimental results confirm that
by utilizing the visual modality, the proposed algorithm not only
improves the performance of the BSS algorithm and mitigates the
permutation problem for stationary sources, but also provides a
good BSS performance for moving sources in a low-reverberation
environment
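The stationary/moving switch and the position-driven beamformer described above can be sketched as follows. The delay computation is the standard free-field delay-and-sum construction, not the authors' exact beamformer, and the 50-frame stationarity threshold is an assumption of this sketch:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at room temperature

def steering_delays(source_pos, mic_positions):
    """Per-microphone delays (s) to steer a delay-and-sum beamformer
    towards a tracked source position, relative to the first mic.

    source_pos: (3,) array from the 3-D tracker; mic_positions: (M, 3).
    Free-field propagation is assumed.
    """
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    return (dists - dists[0]) / SPEED_OF_SOUND

def choose_separator(stationary_frames, min_stationary_frames=50):
    """Mirror the switching rule described above: run frequency-domain
    BSS once the tracker reports the sources stationary for a minimum
    period, fall back to beamforming while they move."""
    if stationary_frames >= min_stationary_frames:
        return "fd-bss"
    return "beamforming"

mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
print(steering_delays(np.array([1.0, 0.0, 0.0]), mics))
```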
Audio-visual tracking of concurrent speakers
Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map assisted by the image detection-derived 3D video observations. The 3D multimodal observations are either assigned to existing tracks for discriminative likelihood computation or used to initialize new tracks. The generative likelihoods rely on color distribution of the target and the de-emphasized acoustic map value. Experiments on AV16.3 and CAV3D datasets show that the proposed tracker outperforms the uni-modal trackers and the state-of-the-art approaches both in 3D and on the image plane
Evaluating indoor positioning systems in a shopping mall : the lessons learned from the IPIN 2018 competition
The Indoor Positioning and Indoor Navigation (IPIN) conference holds an annual competition in which indoor localization systems from different research groups worldwide are evaluated empirically. The objective of this competition is to establish a systematic evaluation methodology with rigorous metrics both for real-time (on-site) and post-processing (off-site) situations, in a realistic environment unfamiliar to the prototype developers. For the IPIN 2018 conference, this competition was held on September 22nd, 2018, in Atlantis, a large shopping mall in Nantes (France). Four competition tracks (two on-site and two off-site) were designed. They consisted of several 1 km routes traversing several floors of the mall. Along these paths, 180 points were topographically surveyed with a 10 cm accuracy, to serve as ground truth landmarks, combining theodolite measurements, differential global navigation satellite system (GNSS) and 3D scanner systems. 34 teams effectively competed. The accuracy score corresponds to the third quartile (75th percentile) of an error metric that combines the horizontal positioning error and the floor detection. The best results for the on-site tracks showed an accuracy score of 11.70 m (Track 1) and 5.50 m (Track 2), while the best results for the off-site tracks showed an accuracy score of 0.90 m (Track 3) and 1.30 m (Track 4). These results showed that it is possible to obtain high accuracy indoor positioning solutions in large, realistic environments using wearable light-weight sensors without deploying any beacon. This paper describes the organization work of the tracks, analyzes the methodology used to quantify the results, reviews the lessons learned from the competition and discusses its future
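The accuracy score described above (third quartile of an error metric combining horizontal positioning error and floor detection) can be reproduced in a few lines. The abstract does not state the per-floor penalty, so the 15 m value below is an assumption of this sketch:

```python
import numpy as np

FLOOR_PENALTY_M = 15.0  # penalty per missed floor; assumed value,
                        # not stated in the abstract above

def accuracy_score(horizontal_errors_m, floor_errors, q=75):
    """Third-quartile accuracy score: combine horizontal error and
    floor-detection error into one per-point error, then take the
    75th percentile over all evaluation points."""
    combined = (np.asarray(horizontal_errors_m, dtype=float)
                + FLOOR_PENALTY_M * np.abs(floor_errors))
    return float(np.percentile(combined, q))

# One missed floor dominates that point's error and pulls up the score.
print(accuracy_score([1.0, 2.0, 3.0, 4.0], [0, 0, 0, 1]))
```

The quartile (rather than mean) makes the score robust to a few catastrophic points while still penalising systematic floor misdetection.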