3 research outputs found

Robust indoor speaker recognition in a network of audio and video sensors

Authors: D'Arca Eleonora; Hopgood James R.; Robertson Neil M.
Publication date: 04/06/2016

Abstract: Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio-only system, 8% speaker diarisation error rate over an audio-only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.

Queen's University Belfast Research Portal

Heriot Watt Pure

Elsevier - Publisher Connector

Crossref

Edinburgh Research Explorer

ACCELERATED PEOPLE TRACKING USING TEXTURE IN A CAMERA NETWORK

Authors: Andrew Wallace; Greg Michaelson; Wasit Limprasert
Publication date: 01/02/2012

We present an approach to tracking multiple human subjects within a camera network. A particle filter framework is used in which we combine foreground-background subtraction with a novel approach to texture learning and likelihood computation based on an ellipsoid model. As there are inevitable problems with multiple subjects due to occlusion and crossing, we include a robust method to suppress distraction between subjects. To achieve real-time performance, we have also developed our code on a graphics processing unit to achieve a 10-fold reduction in processing time with an approximate frame rate of 10 frames per second.

CiteSeerX

Heriot Watt Pure

ACCELERATED PEOPLE TRACKING USING TEXTURE IN A CAMERA NETWORK

Publication venue: 'Scitepress'
Publication date: 01/01/2012

Crossref