5 research outputs found

A geometric interpretation of non-target-normalized maximum cross-channel correlation for vocal activity detection in meetings

Author: Laskowski Kornel
Schultz Tanja
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007

Crossref

KITopen

A Geometric Interpretation of Non-Target-Normalized Maximum Cross-channel Correlation for Vocal Activity Detection in Meetings

Author: Laskowski Kornel
Schultz Tanja
Publication date: 30/06/2008

KITopen

Floor Holder Detection and End of Speaker Turn Prediction in Meetings

Author: Bourlard Hervé
Dielmann Alfred
Garau Giulia
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 26/08/2010

We propose a novel, fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the speech activity detection outputs. A supervised end-of-speaker-turn prediction framework, based on Dynamic Bayesian Networks and automatically extracted multimodal features (related to prosody, overlapping speech, and visual motion), was also investigated. These novel approaches resulted in good floor holder detection rates (13.2% Floor Error Rate), attaining state-of-the-art end-of-speaker-turn prediction performance.

Infoscience - École polytechnique fédérale de Lausanne

A Geometric Interpretation of Non-Target-Normalized Maximum Cross-channel Correlation for Vocal Activity Detection in Meetings

Author: Interact Universität Karlsruhe
Kornel Laskowski
Publication date: 01/01/2007

Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, standard vocal activity detection algorithms have been shown to be ineffective, because participants typically vocalize for only a fraction of the recorded time and because, while they are not vocalizing, their channels are frequently dominated by crosstalk from other participants. In the present work, we review a particular type of normalization of maximum cross-channel correlation, a feature recently introduced to address the crosstalk problem. We derive a plausible geometric interpretation and show how the frame size affects performance.

CiteSeerX