83 research outputs found

A survey of face detection, extraction and recognition

Author: Lu Yongzhong
Yu Shengsheng
Zhou Jingli
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/02/2012

The goal of this paper is to present a critical survey of the existing literature on human face recognition over the last 4-5 years. Interest and research activity in face recognition have increased significantly over the past few years, especially after the American airliner tragedy of September 11, 2001. While this growth is largely driven by growing application demands, ranging from static matching of controlled photographs, as in mug-shot matching and credit card verification, to surveillance video images, as well as identification for law enforcement and authentication for banking and security system access, advances in signal analysis techniques, such as wavelets and neural networks, are also important catalysts. As the number of proposed techniques increases, survey and evaluation become important.

Sensor fusion for tangible acoustic interfaces for human computer interaction

Author: Al-Kutubi Mostafa

This thesis presents the development of tangible acoustic interfaces for human computer interaction. The method adopted was to position sensors on the surface of a solid object to detect acoustic waves generated during an interaction, process the sensor signals and estimate either the location of a discrete impact or the trajectory of a moving point of contact on the surface. Higher accuracy and reliability were achieved by employing sensor fusion to combine the information collected from redundant sensors selectively positioned on the solid object. Two different localisation approaches are proposed in the thesis. The learning-based approach is employed to detect discrete impact positions. With this approach, a signature vector representation of time-series patterns from a single sensor is matched with database signatures for known impact locations. For improved reliability, a criterion is proposed to extract the location signature from two vectors. The other approach is based on the Time Difference of Arrival (TDOA) of a source signal captured by a spatially distributed array of sensors. Enhanced positioning algorithms that consider near-field scenarios, dispersion, optimisation and filtration are proposed to tackle the problems of passive acoustic localisation in solid objects. A computationally efficient algorithm for tracking a continuously moving source is presented. Spatial filtering of the estimated trajectory has been performed using Kalman filtering with automated initialisation.
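As a rough illustration of the TDOA idea mentioned in this abstract, the sketch below estimates the arrival-time difference between two sensor signals by locating the peak of their cross-correlation. It is a minimal example and not the thesis's algorithm: it ignores dispersion and near-field effects, and the function names, the synthetic pulses and the 1 kHz sample rate are all assumptions made for illustration.

```python
def cross_correlation_lag(sig_a, sig_b):
    """Return the integer lag (in samples) at which sig_b best matches sig_a.

    A positive lag means sig_b looks like a delayed copy of sig_a.
    """
    n_a, n_b = len(sig_a), len(sig_b)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-(n_a - 1), n_b):
        # Correlation score for this candidate lag.
        score = 0.0
        for i in range(n_a):
            j = i + lag
            if 0 <= j < n_b:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_tdoa(sig_a, sig_b, sample_rate_hz):
    """Convert the best-matching lag into a time difference in seconds."""
    return cross_correlation_lag(sig_a, sig_b) / sample_rate_hz


def make_pulse(length, start, width=5):
    """Hypothetical test signal: a rectangular pulse beginning at `start`."""
    return [1.0 if start <= i < start + width else 0.0 for i in range(length)]


# The same pulse reaches sensor B 15 samples after sensor A.
sensor_a = make_pulse(200, 40)
sensor_b = make_pulse(200, 55)
tdoa_seconds = estimate_tdoa(sensor_a, sensor_b, sample_rate_hz=1000.0)
```

With several sensor pairs, such time differences could in principle be combined with the wave speed in the material to constrain the source position; that step is omitted here.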

On the applicability of models for outdoor sound (A)

Author: Rasmussen Karsten Bo
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/1999

Application of the PE method to up-slope sound propagation

Author: Arranz Marta Galindo
Rasmussen Karsten Bo
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/1995

Exploiting the bimodality of speech in the cocktail party problem

Author: Aubrey Andrew James
Publication date: 01/01/2008

The cocktail party problem is one of following a conversation in a crowded room where there are many competing sound sources, such as the voices of other speakers or music. To address this problem using computers, digital signal processing solutions commonly use blind source separation (BSS), which aims to separate all the original sources (voices) from the mixture simultaneously. Traditionally, BSS methods have relied on information derived from the mixture of sources to separate the mixture into its constituent elements. However, the human auditory system is well adapted to handle the cocktail party scenario, using both auditory and visual information to follow (or hold) a conversation in such an environment. This thesis focuses on using visual information of the speakers in a cocktail-party-like scenario to aid in improving the performance of BSS. There are several useful applications of such technology, for example: a pre-processing step for a speech recognition system, teleconferencing or security surveillance. The visual information used in this thesis is derived from the speaker's mouth region, as it is the most visible component of speech production. Initial research presented in this thesis considers a joint statistical model of audio and visual features, which is used to assist in controlling the convergence behaviour of a BSS algorithm. The results of using the statistical models are compared to using the raw audio information alone and it is shown that the inclusion of visual information greatly improves its convergence behaviour. Further research focuses on using the speaker's mouth region to identify periods of time when the speaker is silent through the development of a visual voice activity detector (V-VAD) (i.e. voice activity detection using visual information alone). This information can be used in many different ways to simplify the BSS process.
To this end, two novel V-VADs were developed and tested within a BSS framework, which resulted in significantly improved intelligibility of the separated source associated with the V-VAD output. Thus the research presented in this thesis confirms the viability of using visual information to improve solutions to the cocktail party problem.
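To make the notion of a visual voice activity detector more concrete, here is a deliberately simple sketch that labels video frames as speech or silence from mouth movement alone. It is not one of the two V-VADs developed in the thesis, whose details the abstract does not give. The per-frame lip-separation feature, the window length and the threshold are all hypothetical choices.

```python
def visual_vad(mouth_openings, window=3, threshold=0.5):
    """Label each frame True (speaking) or False (silent).

    mouth_openings: per-frame lip separation, e.g. in pixels (assumed feature).
    A frame counts as speech when the mean absolute frame-to-frame change
    over the trailing `window` frames exceeds `threshold`.
    """
    # Absolute change between consecutive frames; the first frame has none.
    deltas = [0.0] + [abs(b - a) for a, b in zip(mouth_openings, mouth_openings[1:])]
    labels = []
    for i in range(len(deltas)):
        recent = deltas[max(0, i - window + 1): i + 1]
        labels.append(sum(recent) / len(recent) > threshold)
    return labels


# Synthetic track: lips nearly still, then moving, then still again.
openings = [2.0, 2.0, 2.1, 2.0, 5.0, 1.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0]
activity = visual_vad(openings)
```

Frames labelled silent could then tell a separation algorithm when a given speaker contributes nothing to the mixture, which is the kind of simplification the abstract alludes to. The trailing window makes the detector release a few frames after movement stops.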

Tracking interacting targets in multi-modal sensors

Author: Taj Murtaza
Publication date: 01/01/2009

Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and limited field of observance of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities such as audio and video for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely, pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using a Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to low signal-to-noise ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking in the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets and through participation in evaluation campaigns.
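For readers unfamiliar with particle filtering, the sketch below shows one predict, weight and resample cycle of a basic bootstrap particle filter tracking a single target in one dimension. It is a generic textbook form, not the MT-TBD-PF or WPDA methods proposed in the thesis. The random-walk motion model, the Gaussian measurement model and every parameter value are assumptions chosen only for illustration.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std, meas_std, rng):
    """Run one predict / weight / resample cycle and return the new particles."""
    # Predict: move each particle with an assumed random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
true_position = 2.0
for _ in range(20):
    true_position += 0.3  # the simulated target drifts steadily
    measurement = true_position + rng.gauss(0.0, 0.2)  # noisy observation
    particles = particle_filter_step(
        particles, measurement, motion_std=0.5, meas_std=0.2, rng=rng
    )

# The particle mean serves as the position estimate.
estimate = sum(particles) / len(particles)
```

A track-before-detect variant would weight particles against the raw sensor signal rather than against a detected measurement, which is the distinction the abstract draws.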