66 research outputs found

    A multimodal approach to blind source separation of moving sources

    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; thus, the unmixing filters must also be time varying, and these are difficult to calculate in real time. In the proposed approach, the visual modality is utilized to facilitate the separation of both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker, which is based on a Markov Chain Monte Carlo particle filter (MCMC-PF) and therefore achieves high sampling efficiency. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary for a certain minimum period, a frequency domain BSS algorithm is implemented with an initialization derived from the positions of the source signals. Once the sources are moving, a beamforming algorithm which requires no prior statistical knowledge is used to perform real-time speech enhancement and provide separation of the sources. Experimental results confirm that, by utilizing the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides good BSS performance for moving sources in a low-reverberation environment.
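The switching rule described in this abstract (frequency domain BSS once the sources have been stationary for a minimum period, beamforming while they move) could be sketched roughly as below. This is only an illustration, not the authors' implementation: the function name `select_mode`, the speed threshold, and the frame-count parameter are all hypothetical, and the choice to keep beamforming until the minimum stationary period has elapsed is an assumption the abstract does not confirm.

```python
def select_mode(speeds, speed_threshold, min_stationary_frames):
    """Pick a separation mode for each video frame.

    speeds: list of frames, each a list of per-source speeds
            (assumed to come from the visual tracker).
    Returns a list of "bss" or "beamforming", one entry per frame.
    """
    modes = []
    still_frames = 0  # consecutive frames in which every source was still
    for frame_speeds in speeds:
        if max(frame_speeds) <= speed_threshold:
            still_frames += 1
        else:
            still_frames = 0  # any moving source resets the counter
        # Assumption: beamforming remains active until the sources have
        # been stationary for the required minimum number of frames.
        if still_frames >= min_stationary_frames:
            modes.append("bss")
        else:
            modes.append("beamforming")
    return modes


if __name__ == "__main__":
    tracked = [[0.0, 0.0], [0.0, 0.01], [0.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
    print(select_mode(tracked, speed_threshold=0.05, min_stationary_frames=2))
```

In a full system the "bss" branch would also use the tracked positions to initialize the unmixing filters, as the abstract states; that step is omitted here.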

    Multimodal methods for blind source separation of audio sources

    The focus of this thesis is enhancing the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.]

    Acoustic based safety emergency vehicle detection for intelligent transport systems

    A system has been investigated for detecting the incoming direction of an emergency vehicle. Acoustic detection methods based on a cross microphone array have been implemented. It is shown that source detection based on time delay estimation outperforms sound intensity techniques, although both techniques perform well for the application. The relaying of information to the driver as a warning signal has been investigated through the use of ambisonic technology and a 4-speaker array, which is ubiquitous in modern vehicles. Simulations show that accurate warning information may be relayed to the driver, allowing the driver to take correct action.
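As a rough illustration of direction finding by time delay estimation on a cross array, the sketch below estimates the delay between two microphone signals with a plain time-domain cross-correlation, then combines the delays from two orthogonal microphone pairs into an azimuth. It is not the method of the paper: the function names are hypothetical, and it assumes a far-field source, equal spacing on both pairs, and delays measured in whole samples.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the lag in samples at which y best matches x.

    A positive result means y is a delayed copy of x. The search covers
    lags in [-max_lag, max_lag] using a direct cross-correlation sum.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_from_cross_array(lag_x, lag_y):
    """Far-field azimuth in degrees from delays on two orthogonal pairs.

    Assumes both pairs have the same spacing, so the delays are
    proportional to cos(azimuth) and sin(azimuth) respectively.
    """
    return math.degrees(math.atan2(lag_y, lag_x))


if __name__ == "__main__":
    # Synthetic pulse, and a copy of it delayed by 3 samples.
    x = [0.0] * 20
    x[5], x[6] = 1.0, 0.5
    y = [0.0] * 20
    y[8], y[9] = 1.0, 0.5
    print(estimate_delay(x, y, max_lag=5))
    print(azimuth_from_cross_array(1, 1))
```

A practical system would typically use a frequency-domain correlation with sub-sample interpolation; the direct sum is used here only to keep the example dependency-free.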

    Acoustic event detection: SVM-based system and evaluation setup in CLEAR’07

    In this paper, the Acoustic Event Detection (AED) system developed at the UPC is described, and its results in the CLEAR evaluations carried out in March 2007 are reported. The system uses a set of features composed of frequency-filtered band energies and perceptual features, and it is based on SVM classifiers and multi-microphone decision fusion. Also, the current evaluation setup and, in particular, the two new metrics used in this evaluation are presented. Peer reviewed. Postprint (author’s final draft).

    Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

    Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on distant recording of spontaneous multiparty conversations, this thesis focuses on the use of microphone arrays to address the question "Who spoke where and when?". The speed, versatility and robustness of the proposed techniques are tested on a variety of real indoor recordings, including multiple moving speakers as well as seated speakers in meetings. Optimized implementations are provided in most cases. We propose to discretize the physical space into a few sectors and, for each time frame, to determine which sectors contain active acoustic sources (Where? When?). A topological interpretation of beamforming is proposed, which permits both evaluating the average acoustic energy in a sector at negligible cost and precisely locating a speaker within an active sector. One additional contribution that goes beyond the field of microphone arrays is a generic, automatic threshold selection method, which does not require any training data. On the speaker detection task, the new approach is dramatically superior to the more classical approach where a threshold is set on training data. We integrate the new approach into a system for multispeaker detection-localization. Another generic contribution is a principled, threshold-free framework for short-term clustering of multispeaker location estimates, which also permits detecting where and when multiple trajectories intersect. On multi-party meeting recordings, using distant microphones only, short-term clustering yields a speaker segmentation performance similar to that of close-talking microphones. The resulting short speech segments are then grouped into speaker clusters (Who?) through an extension of the Bayesian Information Criterion to merge multiple modalities. On meeting recordings, the speaker clustering performance is significantly improved by merging the classical mel-cepstrum information with the short-term speaker location information. Finally, a close analysis of the speaker clustering results suggests that future research should investigate the effect of human acoustic radiation characteristics on the overall transmission channel when a speaker is a few meters away from a microphone.

    Estimation of acoustic echoes using expectation-maximization methods

    Methods for evaluating audio localization systems for speakers in a meeting room

    Sound source localization methods make it possible to estimate the location and head orientation of a speaker in a room. Such systems are currently popular in the development of intelligent support systems for smart meeting rooms. In this paper, a set of metrics for evaluating the performance of sound source localization systems, as well as their integration with video monitoring systems, is analyzed. The accuracy of the estimated positions of speakers seated in 32 chairs was evaluated in the developed smart meeting room.

    Acoustic Echo Estimation using the model-based approach with Application to Spatial Map Construction in Robotics
