8 research outputs found

    A Sector-Based, Frequency-Domain Approach to Detection and Localization of Multiple Speakers

    Detection and localization of speakers with microphone arrays is a difficult task due to the wideband nature of speech signals, the large amount of overlap between speakers in spontaneous conversations, and the presence of noise sources. Many existing audio multi-source localization methods rely on prior knowledge of the sectors containing active sources and/or the number of active sources. This paper proposes sector-based, frequency-domain approaches that address both detection and localization problems by measuring relative phases between microphones. The first approach is similar to delay-sum beamforming. The second approach is novel: it relies on systematic optimization of a centroid in phase space, for each sector. It provides major, systematic improvement over the first approach as well as over previous work. Very good results are obtained on more than one hour of recordings in real meeting room conditions, including cases with up to 3 concurrent speakers.
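
To make the first, delay-sum-like approach concrete, here is a minimal sketch of a sector-based steered-power measure at a single frequency bin. It is an illustration under stated assumptions, not the paper's method: the far-field plane-wave model, the 8-microphone circular array of radius 10 cm, the 5-degree azimuth grid, the six 60-degree sectors, and all function names are hypothetical choices. The second approach (centroid optimization in phase space) is not reproduced.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_phases(mic_xy, azimuth, freq):
    """Far-field phase (radians) at each microphone for a plane wave arriving from `azimuth`."""
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    return 2.0 * np.pi * freq * (mic_xy @ direction) / SPEED_OF_SOUND


def delay_sum_power(spectrum, mic_xy, azimuths, freq):
    """Delay-sum output power at one frequency bin for each candidate azimuth."""
    powers = np.empty(len(azimuths))
    for i, az in enumerate(azimuths):
        # Undo the phase each microphone would see for this direction, then sum.
        aligned = spectrum * np.exp(-1j * steering_phases(mic_xy, az, freq))
        powers[i] = np.abs(aligned.sum()) ** 2
    return powers


def sector_activity(powers, n_sectors):
    """Average the steered power over contiguous, equal-width azimuth sectors."""
    return powers.reshape(n_sectors, -1).mean(axis=1)


# Hypothetical setup: circular array, one simulated source at 90 degrees, 1 kHz bin.
angles = 2.0 * np.pi * np.arange(8) / 8
mic_xy = 0.1 * np.column_stack([np.cos(angles), np.sin(angles)])
freq = 1000.0
azimuths = np.deg2rad(np.arange(0, 360, 5))
spectrum = np.exp(1j * steering_phases(mic_xy, np.deg2rad(90), freq))

powers = delay_sum_power(spectrum, mic_xy, azimuths, freq)
activity = sector_activity(powers, n_sectors=6)
best_azimuth_deg = int(np.rad2deg(azimuths[np.argmax(powers)]).round())
best_sector = int(np.argmax(activity))
```

In this sketch a sector would be declared active by comparing its entry in `activity` to a threshold, and the source would then be localized within the active sector.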

    Threshold Selection for Unsupervised Detection, with an Application to Microphone Arrays

    Detection is usually done by comparing some criterion to a threshold. It is often desirable to keep a performance metric such as False Alarm Rate (FAR) constant across conditions. Using training data to select the threshold may lead to suboptimal results on test data recorded in different conditions. This paper investigates unsupervised approaches, where no training data is used. A probabilistic model is fitted on the test data using the EM algorithm, and the threshold value is selected based on the model. The proposed approach (1) does not use training data, (2) uses the test data itself to compensate for simplifications inherent to the model, and (3) permits the use of more complex models in a straightforward manner. On a microphone array speech detection task, the proposed unsupervised approach achieves similar or better results than the "training" approach. The methodology is general and may be applied to contexts other than microphone arrays and to performance metrics other than FAR.
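
The idea of fitting a model on the test data with EM and reading the threshold off the model can be sketched as follows. This is only an illustration: the abstract does not specify the probabilistic model, so the two-component one-dimensional Gaussian mixture, the synthetic data, the 1% target FAR, and the function names are all assumptions.

```python
import numpy as np
from statistics import NormalDist


def fit_two_gaussians(x, n_iter=100):
    """Fit a 2-component 1-D Gaussian mixture with EM (no labels used)."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    low, high = x[x <= median], x[x > median]
    mu = np.array([low.mean(), high.mean()])
    sigma = np.array([low.std(), high.std()]) + 1e-6
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
        dens /= sigma * np.sqrt(2.0 * np.pi)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return weight, mu, sigma


def threshold_for_far(mu, sigma, target_far):
    """Threshold at which the modelled noise component exceeds it with probability target_far."""
    noise = int(np.argmin(mu))  # assumption: the lower-mean component models noise
    return NormalDist(float(mu[noise]), float(sigma[noise])).inv_cdf(1.0 - target_far)


# Synthetic "test data": the labels are used only to check the result afterwards.
rng = np.random.default_rng(0)
noise_values = rng.normal(0.0, 1.0, 4000)
speech_values = rng.normal(5.0, 1.0, 1000)
criterion = np.concatenate([noise_values, speech_values])

weight, mu, sigma = fit_two_gaussians(criterion)
threshold = threshold_for_far(mu, sigma, target_far=0.01)
empirical_far = float(np.mean(noise_values > threshold))
```

Because the threshold comes from the model fitted on the data being processed, it moves with the recording conditions instead of being fixed by a separate training set.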

    A Frequency-Domain Silence Noise Model

    This paper proposes a simple, computationally efficient 2-mixture model approach to discrimination between speech and background noise. It is directly derived from observations on real data, and can be used in a fully unsupervised manner, with the EM algorithm. A first application to sector-based, joint audio source localization and detection, using multiple microphones, confirms that the model can provide major enhancement. A second application to the single channel speech recognition task in a noisy environment yields major improvement on stationary noise and promising results on non-stationary noise

    Sector-Based Detection for Hands-Free Speech Enhancement in Cars

    Speech-based command interfaces are becoming more and more common in cars. Applications include automatic dialog systems for hands-free phone calls as well as more advanced features such as navigation systems. However, interference such as speech from the codriver can severely hamper the performance of the speech recognition component, which is crucial for those applications. This issue can be addressed with adaptive interference cancellation techniques such as the Generalized Sidelobe Canceller (GSC). In order to cancel the interference (codriver) while not cancelling the target (driver), adaptation must happen only when the interference is active and dominant. To that purpose, this paper proposes two efficient adaptation control methods called "implicit" and "explicit". While the "implicit" method is fully automatic, the "explicit" method relies on pre-estimation of target and interference energies. A major contribution of this paper is a direct, robust method for such pre-estimation, derived from sector-based detection and localization techniques. Experiments on real in-car data validate both adaptation methods, including a case with 100 km/h background road noise.
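
The requirement that adaptation happen only when the interference is dominant can be illustrated with a gated adaptive filter. The sketch below is a simplification under stated assumptions, not the paper's system: it uses a single-reference NLMS canceller instead of a full GSC, the target and interference energies are supplied as oracle values rather than pre-estimated from sector-based detection, and the dominance ratio of 2, the leakage path, and all names are hypothetical.

```python
import numpy as np


def gated_nlms(primary, reference, adapt_mask, n_taps=8, step=0.5, eps=1e-8):
    """NLMS interference canceller whose weights are updated only where adapt_mask is True."""
    weights = np.zeros(n_taps)
    output = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        window = reference[n - n_taps + 1:n + 1][::-1]  # newest sample first
        error = primary[n] - weights @ window
        output[n] = error
        if adapt_mask[n]:
            weights += step * error * window / (window @ window + eps)
    return output, weights


rng = np.random.default_rng(1)
n_samples = 4000
interference = rng.normal(size=n_samples)             # codriver reference signal
path = np.array([0.6, -0.3, 0.1])                     # assumed leakage path
leaked = np.convolve(interference, path)[:n_samples]  # interference as seen in the primary channel

target = np.zeros(n_samples)                          # driver speaks only in the second half
target[2000:] = rng.normal(size=2000)
primary = target + leaked

# "Explicit" control with oracle energies: adapt only where interference dominates.
target_energy = np.where(np.arange(n_samples) >= 2000, 1.0, 0.0)
interference_energy = np.full(n_samples, float(path @ path))
adapt_mask = interference_energy > 2.0 * target_energy

output, weights = gated_nlms(primary, interference, adapt_mask)
residual = float(np.mean((output[2000:] - target[2000:]) ** 2))
```

With the gate in place the weights converge to the leakage path while only the interference is present and are then frozen, so the target is passed through rather than cancelled.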

    Activity Report 2004

    HIGH WAVE VECTOR ACOUSTIC METAMATERIALS: FUNDAMENTAL STUDIES AND APPLICATIONS

    Acoustic metamaterials are artificially engineered structures with subwavelength unit cells that hold extraordinary acoustic properties. Their ability to manipulate acoustic waves in ways that are not readily possible in naturally occurring materials has garnered much attention from researchers in recent years. In this dissertation work, acoustic metamaterials that enable wave propagation with high wave vector values are studied. These materials render several key properties, including energy confinement and transport, wave control enhancement, and enhancement of acoustic radiation, which are exploited for enhancing acoustic wave emission and reception. The dissertation work is summarized as follows. First, to enable experimental studies of the deep subwavelength cavities in these metamaterials, a low dimensional fiber optic probe was developed, which allows direct characterization of the intrinsic properties of the metamaterials without seriously disrupting the acoustic fields. Second, low dimensional acoustic metamaterials for enhancing acoustic reception were realized and studied. These metamaterials were demonstrated to achieve both passive and active functionalities, including passive signal amplification and frequency filtering, as well as active tuning for switching and pulse retardation control. Third, a metamaterial emitter was realized and studied, which is capable of enhancing the radiative properties of an embedded emitter. Parametric studies enhanced the understanding of the effects of different geometric parameters on the radiation performance of the structure. Finally, the metamaterial emitter and receiver were combined to form a metamaterial-based sonar system. For the first time, the superior performance of the metamaterial-enhanced sonar system over conventional sonar systems was analytically and experimentally demonstrated. As a proof of concept, a robotic sonar platform equipped with the metamaterial system was shown to possess remarkably better tracking performance compared to the conventional system. Through this dissertation work, an enhanced understanding of high-k acoustic metamaterials has been achieved, and their applications in acoustic sensing, emission enhancement, and sonar systems have been demonstrated.