
    Binaural scene analysis: localization, detection and recognition of speakers in complex acoustic scenes

    The human auditory system has the striking ability to robustly localize and recognize a specific target source in complex acoustic environments while ignoring interfering sources. Surprisingly, this remarkable capability, which is referred to as auditory scene analysis, is achieved by only analyzing the waveforms reaching the two ears. Computers, however, are presently not able to compete with the performance achieved by the human auditory system, even in the restricted paradigm of confronting a computer algorithm based on binaural signals with a highly constrained version of auditory scene analysis, such as localizing a sound source in a reverberant environment or recognizing a speaker in the presence of interfering noise. In particular, the problem of focusing on an individual speech source in the presence of competing speakers, termed the cocktail party problem, has proven to be extremely challenging for computer algorithms. The primary objective of this thesis is the development of a binaural scene analyzer that is able to jointly localize, detect and recognize multiple speech sources in the presence of reverberation and interfering noise. The processing of the proposed system is divided into three main stages: localization, detection of speech sources, and recognition of speaker identities. The only information that is assumed to be known a priori is the number of target speech sources present in the acoustic mixture. Furthermore, the aim of this work is to reduce the performance gap between humans and machines by improving the performance of the individual building blocks of the binaural scene analyzer. First, a binaural front-end inspired by auditory processing is designed to robustly determine the azimuth of multiple, simultaneously active sound sources in the presence of reverberation. The localization model builds on the supervised learning of azimuth-dependent binaural cues, namely interaural time and level differences. Multi-conditional training is performed to incorporate the uncertainty of these binaural cues resulting from reverberation and the presence of competing sound sources. Second, a speech detection module that exploits the distinct spectral characteristics of speech and noise signals is developed to automatically select azimuthal positions that are likely to correspond to speech sources. Due to the link established between the localization stage and the recognition stage, which is realized by the speech detection module, the proposed binaural scene analyzer is able to selectively focus on a predefined number of speech sources positioned at unknown spatial locations, while ignoring interfering noise sources emerging from other spatial directions. Third, the speaker identities of all detected speech sources are recognized in the final stage of the model. To reduce the impact of environmental noise on speaker recognition performance, a missing data classifier is combined with the adaptation of speaker models using a universal background model. This combination is particularly beneficial in non-stationary background noise.
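    As a rough illustration of the binaural cues the localization stage builds on, the sketch below estimates an interaural time difference (ITD) from the lag of the cross-correlation peak and an interaural level difference (ILD) from the energy ratio of a two-channel frame. The function name, toy signal and sampling rate are illustrative assumptions; the thesis itself learns azimuth-dependent cue distributions with multi-conditional training, which is not reproduced here.

        import numpy as np

        def binaural_cues(left, right, fs):
            """Estimate ITD and ILD for one binaural frame (illustrative helper,
            not the thesis's front-end).  A positive ITD means the left channel
            is delayed relative to the right."""
            corr = np.correlate(left, right, mode="full")
            lag = np.argmax(corr) - (len(right) - 1)
            itd = lag / fs                      # seconds
            eps = 1e-12
            ild = 10.0 * np.log10((np.sum(left ** 2) + eps) /
                                  (np.sum(right ** 2) + eps))   # dB
            return itd, ild

        # Toy usage: a 500 Hz tone reaching the right ear later and quieter,
        # i.e. a source on the listener's left side.
        fs = 16000
        t = np.arange(0, 0.032, 1 / fs)
        src = np.sin(2 * np.pi * 500 * t)
        left = src
        right = 0.7 * np.roll(src, 8)           # ~0.5 ms later, ~3 dB quieter
        print(binaural_cues(left, right, fs))   # negative ITD, positive ILD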

    Utilising temporal signal features in adverse noise conditions: Detection, estimation, and the reassigned spectrogram

    Visual displays in passive sonar based on the Fourier spectrogram are underpinned by detection models that rely on signal and noise power statistics. Time-frequency representations specialised for sparse signals achieve a sharper signal representation, either by reassigning signal energy based on temporal structure or by conveying temporal structure directly. However, temporal representations involve nonlinear transformations that make it difficult to reason about how they respond to additive noise. This article analyses the effect of noise on temporal fine structure measurements such as zero crossings and instantaneous frequency. Detectors that rely on zero-crossing intervals, on intervals and peak amplitudes, and on instantaneous frequency measurements are developed and evaluated for the detection of a sinusoid in Gaussian noise, using the power detector as a baseline. Detectors that rely on fine structure outperform the power detector under certain circumstances, and detectors that rely on both fine structure and power measurements are superior. Reassigned spectrograms assume that the statistics used to reassign energy are reliable, but the derivation of the fine structure detectors indicates the opposite. The article closes by proposing and demonstrating the concept of a doubly reassigned spectrogram, wherein temporal measurements are reassigned according to a statistical model of the noise background.
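    As a minimal sketch of the two detector families compared above, the code below contrasts a baseline power statistic with a zero-crossing-interval statistic for a sinusoid in Gaussian noise. The statistics, SNR and separability measure are illustrative assumptions, not the article's exact formulations.

        import numpy as np

        rng = np.random.default_rng(0)
        fs, n = 8000, 4096

        def power_stat(x):
            """Baseline power-detector statistic: mean signal power."""
            return np.mean(x ** 2)

        def zc_interval_stat(x):
            """Fine-structure statistic: negated relative spread of the
            intervals between zero crossings (a tone gives nearly equal
            intervals, broadband noise gives erratic ones)."""
            signs = np.signbit(x).astype(np.int8)
            zc = np.nonzero(np.diff(signs))[0]
            intervals = np.diff(zc)
            return -np.std(intervals) / (np.mean(intervals) + 1e-12)

        # Toy experiment: 440 Hz sinusoid at 0 dB SNR in white Gaussian noise.
        t = np.arange(n) / fs
        tone = np.sqrt(2) * np.sin(2 * np.pi * 440 * t)   # unit power
        noise_std = 1.0                                    # 0 dB SNR

        for stat in (power_stat, zc_interval_stat):
            h1 = [stat(tone + noise_std * rng.standard_normal(n)) for _ in range(200)]
            h0 = [stat(noise_std * rng.standard_normal(n)) for _ in range(200)]
            # Crude separability of the two hypotheses (larger is better).
            d = (np.mean(h1) - np.mean(h0)) / np.sqrt(0.5 * (np.var(h1) + np.var(h0)))
            print(stat.__name__, round(float(d), 2))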

    An objective test tool for pitch extractors' response attributes

    We propose an objective measurement method for pitch extractors' responses to frequency-modulated signals. It enables us to evaluate different pitch extractors with unified criteria. The method uses extended time-stretched pulses combined by binary orthogonal sequences. It provides simultaneous measurements of the linear and non-linear time-invariant responses, as well as the random and time-varying responses. We tested representative pitch extractors using fundamental frequencies spanning 80 Hz to 400 Hz in 1/48-octave steps and produced more than 1000 modulation frequency response plots. We found that animating these plots as a scientific visualization makes it possible to grasp the behavior of different pitch extractors at a glance; such efficient and effortless inspection is impossible when examining each plot individually. The proposed measurement method and visualization also led to further performance improvements in one of the extractors mentioned above. In other words, our procedure turns that pitch extractor into reliable measuring equipment of the kind that is crucial for scientific research. We have open-sourced MATLAB code for the proposed objective measurement method and visualization procedure.
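    The sketch below illustrates the general idea of measuring a pitch extractor's modulation frequency response: synthesize a frequency-modulated tone, run a toy autocorrelation-based extractor on it, and compare the modulation recovered in the F0 track with the imposed modulation. The test signal, the stand-in extractor and the single-frequency gain estimate are illustrative assumptions; the paper's method additionally uses extended time-stretched pulses and binary orthogonal sequences to separate non-linear and time-varying components, which this sketch does not attempt.

        import numpy as np

        fs = 16000

        def acf_pitch(frame, fs, fmin=60.0, fmax=500.0):
            """Toy frame-level autocorrelation pitch extractor (a stand-in for
            the extractors under test, not one of the paper's methods)."""
            frame = frame - np.mean(frame)
            acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(fs / fmax), int(fs / fmin)
            lag = lo + np.argmax(acf[lo:hi])
            return fs / lag

        def modulation_gain(extractor, f0=200.0, fmod=4.0, depth_oct=0.05,
                            dur=2.0, frame=0.04, hop=0.01):
            """Gain at one modulation frequency: modulation depth recovered in
            the extractor's F0 track divided by the imposed depth."""
            t = np.arange(int(dur * fs)) / fs
            inst_f0 = f0 * 2.0 ** (depth_oct * np.sin(2 * np.pi * fmod * t))
            x = np.sin(2 * np.pi * np.cumsum(inst_f0) / fs)

            n_frames = int((dur - frame) / hop)
            times = np.arange(n_frames) * hop + frame / 2
            track = np.array([extractor(x[int(i * hop * fs):
                                          int(i * hop * fs) + int(frame * fs)], fs)
                              for i in range(n_frames)])

            # Project the log-F0 deviation onto the modulation frequency
            # (single-bin DFT) and compare it with the imposed depth.
            log_dev = np.log2(track / f0)
            basis = np.exp(-2j * np.pi * fmod * times)
            measured = 2 * np.abs(np.mean(log_dev * basis))
            return measured / depth_oct

        # A gain close to 1 indicates the FM is tracked faithfully at 4 Hz.
        print(modulation_gain(acf_pitch))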

    Engineering data compendium. Human perception and performance. User's guide

    The concept underlying the Engineering Data Compendium was the product of a research and development program (the Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design of military crew systems. The principal objective was to develop a workable strategy for (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability for system designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This first volume, the User's Guide, contains a description of the program and instructions for its use.

    Neuroplasticity in Young Age: Computer-Based Early Neurodevelopment Classifier

    Neurodevelopmental syndromes, a continuously growing concern, are impairments in the growth and development of the brain and central nervous system that manifest in a variety of emotional, cognitive, motor and social skills. Early assessment and detection of typical, clinically correlated early signs of developmental abnormalities are crucial for early and effective intervention, supporting the initiation of early treatment and minimizing neurological and functional deficits. Successful early interventions can then target the early time windows of higher neural plasticity. Various syndromes are reflected in early vocal and motor characteristics, making them suitable indicators of an infant's neural development. The computerized classifiers we developed achieve approximately 90% accuracy on a database of diagnosed babies. These results demonstrate the potential of vocal and motor analysis for computer-assisted early detection of neurodevelopmental insults.

    Automatic transcription of polyphonic music exploiting temporal evolution

    Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features that exploit temporal characteristics. Techniques for note onset and offset detection are also utilised to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modelling the temporal evolution of notes in the multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. The proposed systems have been evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks related to music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modelling acoustic scenes.
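    As a simplified, non-shift-invariant stand-in for the SI-PLCA models mentioned above, the sketch below factorises a toy magnitude spectrogram into spectral templates, time activations and component priors with EM updates. The variable names and toy data are illustrative assumptions; the thesis's models additionally handle shift invariance, multiple instruments and the temporal evolution of notes.

        import numpy as np

        def plca(V, n_components=2, n_iter=100, seed=0):
            """Minimal (non-shift-invariant) PLCA: decompose a magnitude
            spectrogram V (freq x time) into P(f|z), P(t|z) and P(z) via EM."""
            rng = np.random.default_rng(seed)
            F, T = V.shape
            Pf = rng.random((F, n_components)); Pf /= Pf.sum(axis=0)
            Pt = rng.random((n_components, T)); Pt /= Pt.sum(axis=1, keepdims=True)
            Pz = np.full(n_components, 1.0 / n_components)

            for _ in range(n_iter):
                # E-step: posterior over components for every (f, t) bin.
                joint = Pf[:, :, None] * (Pz[:, None] * Pt)[None, :, :]  # F x Z x T
                post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
                # M-step: reweight by the observed spectrogram and renormalise.
                acc = post * V[:, None, :]
                Pf = acc.sum(axis=2); Pf /= Pf.sum(axis=0) + 1e-12
                Pt = acc.sum(axis=0); Pt /= Pt.sum(axis=1, keepdims=True) + 1e-12
                Pz = acc.sum(axis=(0, 2)); Pz /= Pz.sum()
            return Pf, Pz, Pt

        # Toy usage: two overlapping "notes" with fixed harmonic templates.
        rng = np.random.default_rng(1)
        V = np.zeros((64, 40))
        V[[5, 10, 15], :25] += 1.0      # note 1 active in frames 0-24
        V[[7, 14, 21], 15:] += 1.0      # note 2 active in frames 15-39
        Pf, Pz, Pt = plca(V + 0.01 * rng.random(V.shape))
        print((Pz[:, None] * Pt).argmax(axis=0))   # dominant component per frame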

    On the applicability of models for outdoor sound (A)
