    Speaker Recognition: Advancements and Challenges

    Modelling, Simulation and Data Analysis in Acoustical Problems

    Modelling and simulation in acoustics is currently gaining importance. With the development and improvement of innovative computational techniques and the growing need for predictive models, an impressive boost has been observed in several research and application areas, such as noise control, indoor acoustics, and industrial applications. This led us to propose a special issue on “Modelling, Simulation and Data Analysis in Acoustical Problems”, as we believe these topics are important in modern acoustics studies. In total, 81 papers were submitted and 33 of them were published, with an acceptance rate of 37.5%. Judging by the number of papers submitted, this is a trending topic in the scientific and academic community, and this special issue aims to provide a future reference for the research that will be developed in the coming years.

    Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features

    Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually calculated from a single window (taper), which yields spectral estimates with large variance. This study investigates reducing that variance when classifying two voice qualities (normal voice and disordered voice) using multitaper MFCC features. We also compare their performance with that of newly proposed windowing techniques and the conventional single-taper technique. The results demonstrate that the adapted weighted Thomson multitaper method distinguishes between normal and disordered voice better than the conventional single-taper (Hamming window) technique and two newly proposed windowing methods. The multitaper MFCC features may be helpful in identifying voices at risk for a real pathology that has to be proven later.
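The variance argument in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes sine tapers with uniform weights in place of the adapted weighted Thomson (DPSS) tapers, uses white noise as a stand-in for a voice frame, and covers only the spectral-estimation stage that would precede the mel filterbank, logarithm, and DCT of an MFCC pipeline. All function names are hypothetical.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (assumed stand-in for DPSS tapers)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
            for j in range(k)]


def hamming(n):
    """Return a Hamming window of length n, normalised to unit energy."""
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)) for t in range(n)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def power_spectrum(frame, taper):
    """Power spectrum of one tapered frame via a direct DFT (non-negative frequencies)."""
    n = len(frame)
    x = [a * b for a, b in zip(frame, taper)]
    spec = []
    for f in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def multitaper_spectrum(frame, tapers, weights=None):
    """Weighted average of the per-taper power spectra (uniform weights by default)."""
    k = len(tapers)
    weights = weights or [1.0 / k] * k
    spectra = [power_spectrum(frame, tp) for tp in tapers]
    return [sum(w * s[f] for w, s in zip(weights, spectra))
            for f in range(len(spectra[0]))]


def relative_variance(values):
    """Variance across frequency bins divided by the squared mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values) / mean ** 2


random.seed(0)
frame = [random.gauss(0.0, 1.0) for _ in range(256)]  # white noise: flat true spectrum
single = power_spectrum(frame, hamming(len(frame)))
multi = multitaper_spectrum(frame, sine_tapers(len(frame), 8))
print("single-taper relative variance:", relative_variance(single))
print("multitaper relative variance:  ", relative_variance(multi))
```

Because the true spectrum of white noise is flat, the spread of the estimate across bins reflects estimator variance alone; averaging several orthogonal tapers should give a visibly smoother estimate than the single Hamming window, at the cost of some frequency resolution.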

    Individual differences in supra-threshold auditory perception - mechanisms and objective correlates

    Thesis (Ph.D.)--Boston University
    To extract content and meaning from a single source of sound in a quiet background, the auditory system can use a small subset of a very redundant set of spectral and temporal features. In stark contrast, communication in a complex, crowded scene places enormous demands on the auditory system. Spectrotemporal overlap between sounds reduces modulations in the signals at the ears and causes masking, with problems exacerbated by reverberation. Consistent with this idea, many patients seeking audiological treatment seek help precisely because they notice difficulties in environments requiring auditory selective attention. In the laboratory, even listeners with normal hearing thresholds exhibit vast differences in the ability to selectively attend to a target. Understanding the mechanisms causing these supra-threshold differences, the focus of this thesis, may enable research that leads to advances in treating communication disorders that affect an estimated one in five Americans. Converging evidence from human and animal studies points to one potential source of these individual differences: differences in the fidelity with which supra-threshold sound is encoded in the early portions of the auditory pathway. Electrophysiological measures of sound encoding by the auditory brainstem in humans and animals support the idea that the temporal precision of the early auditory neural representation can be poor even when hearing thresholds are normal. Concomitantly, animal studies show that noise exposure and early aging can cause a loss (cochlear neuropathy) of a large percentage of the afferent population of auditory nerve fibers innervating the cochlear hair cells without any significant change in measured audiograms.
    Using behavioral, otoacoustic and electrophysiological measures in conjunction with computational models of sound processing by the auditory periphery and brainstem, a detailed examination of temporal coding of supra-threshold sound is carried out, focusing on characterizing and understanding individual differences in listeners with normal hearing thresholds and normal cochlear mechanical function. Results support the hypothesis that cochlear neuropathy may reduce encoding precision of supra-threshold sound, and that this manifests as deficits both behaviorally and in subcortical electrophysiological measures in humans. Based on these results, electrophysiological measures are developed that may yield sensitive, fast, objective measures of supra-threshold coding deficits that arise as a result of cochlear neuropathy.

    Tensor Analysis and Fusion of Multimodal Brain Images

    Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions, posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network notation in order to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.
    Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE

    Automatic speech recognition: from study to practice

    Today, automatic speech recognition (ASR) is widely used for purposes such as robotics, multimedia, and medical and industrial applications. Although much research has been carried out in this field in past decades, there is still considerable room for further work. To start working in this area, a thorough knowledge of ASR systems, including their weak points and problems, is essential. In addition, practical experience reliably deepens the understanding of the theory. In view of these facts, in this master's thesis we first review the principal structure of standard HMM-based ASR systems from a technical point of view. This includes feature extraction, acoustic modeling, language modeling and decoding. Then, the most significant challenges in ASR systems are discussed. These challenges concern the characteristics of different internal components or external agents that affect the performance of ASR systems. Furthermore, we have implemented a Spanish language recognizer using the HTK toolkit. Finally, two open research lines, drawn from a study of different sources in the field of ASR, are suggested for future work.