
    An application of an auditory periphery model in speaker identification

    The number of applications of automatic Speaker Identification (SID) is growing due to advanced technologies for secure access and authentication in services and devices. In a 2016 study, the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model achieved the best performance among seven recent cochlear models at fitting a set of human auditory physiological data. Motivated by this result, I apply the CAR-FAC to an SID task for the first time, aiming to approach the performance of the human auditory system. This thesis investigates the potential of the CAR-FAC model in SID, examining its capability in both text-dependent and text-independent tasks. It also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC to SID accuracy. The performance of the CAR-FAC is compared with that of another recent cochlear model, the Auditory Nerve (AN) model. In addition, three FFT-based auditory features are included to compare with the cochlear features: Mel-Frequency Cepstral Coefficients (MFCC), Frequency-Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficients (GFCC). This comparison allows me to identify a better front-end for a noise-robust SID system. Three statistical classifiers were used to evaluate performance: a Gaussian Mixture Model with a Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an i-vector system. These classifiers allow me to investigate nonlinearities in the cochlear front-ends. Performance is evaluated under clean and noisy conditions across a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated. It was found that applying a cube root and a DCT to the cochlear output enhances SID accuracy substantially.
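    To make the reported post-processing concrete, here is a minimal sketch of the cube-root-plus-DCT step applied to a cochlear filterbank output. The function name and shape conventions are illustrative assumptions; the CAR-FAC model itself is not reproduced here, so `cochleagram` stands in for its framed, rectified output.

```python
import numpy as np
from scipy.fftpack import dct

def cochlear_features(cochleagram, n_coeffs=20):
    """Cube-root compression followed by a DCT across channels,
    mirroring the post-processing the thesis reports as beneficial.

    cochleagram: (n_channels, n_frames) array of non-negative channel
    energies from a cochlear model (assumed computed upstream).
    """
    compressed = np.cbrt(cochleagram)  # cube-root nonlinearity
    # DCT-II along the channel axis decorrelates the channels,
    # analogous to the final step of MFCC extraction.
    coeffs = dct(compressed, type=2, axis=0, norm="ortho")
    return coeffs[:n_coeffs, :]  # keep the low-order coefficients
```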

    Efficient audio signal processing for embedded systems

    We investigated two design strategies that allow audio signals to be processed efficiently on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management that suppresses signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field-programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store the classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end: a machine learning algorithm, AdaBoost, is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.
    PhD thesis. Committee Chair: Anderson, David; Committee Members: Hasler, Jennifer; Hunt, William; Lanterman, Aaron; Minch, Bradley.
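    As a rough illustration of the AdaBoost-driven feature selection idea, the sketch below ranks candidate features with scikit-learn's AdaBoost and keeps the top few, so an analog front-end would only need circuits for those. The function and parameter choices are assumptions, not the circuits or tooling used in the thesis.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def select_features(X, y, n_keep=8):
    """Rank candidate features for a sound detector with AdaBoost and
    keep the most relevant ones, so the analog front-end only needs
    circuits for a small subset of features.

    X: (n_examples, n_features) feature matrix; y: binary labels.
    """
    # The default weak learner is a depth-1 decision stump, so each
    # boosting round effectively picks the single most useful feature.
    ada = AdaBoostClassifier(n_estimators=50)
    ada.fit(X, y)
    ranked = np.argsort(ada.feature_importances_)[::-1]
    return ranked[:n_keep]
```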

    Neuromorphic audio processing through real-time embedded spiking neural networks.

    In this work, novel speech recognition and audio processing systems based on a spiking artificial cochlea and neural networks are proposed and implemented. First, the biological behavior of the animal auditory system is analyzed and studied, along with classical mechanisms of audio signal processing for sound classification, including Deep Learning techniques. Based on these studies, novel audio processing and automatic audio signal recognition systems are proposed that use a bio-inspired auditory sensor as input. A desktop software tool called NAVIS (Neuromorphic Auditory VIsualizer) was implemented for post-processing the information obtained from spiking cochleae, allowing these data to be analyzed for further research. Next, using a 4-chip SpiNNaker hardware platform and Spiking Neural Networks, a system is proposed for classifying different time-independent audio signals, making use of a Neuromorphic Auditory Sensor and frequency studies obtained with NAVIS. To prove the robustness and analyze the limitations of the system, the input audio was disturbed, simulating extremely noisy environments. Deep Learning mechanisms, particularly Convolutional Neural Networks, are trained and used to differentiate between healthy persons and pathological patients by detecting murmurs in heart recordings, after integrating the spike information from the signals using a neuromorphic auditory sensor. A similar approach is then used to train Spiking Convolutional Neural Networks for speech recognition tasks: a novel SCNN architecture for time-dependent signal classification is proposed, using a buffered layer that adapts information from a real-time input domain to a static domain. The system was deployed on a 48-chip SpiNNaker platform. Finally, the performance and efficiency of these systems were evaluated, drawing conclusions and proposing improvements for future work.
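    The spike-to-frequency-representation step can be pictured with a small sketch: binning address-event (AER) spikes from a silicon cochlea into a channel-by-time count matrix, the kind of frequency study a tool like NAVIS supports. The function name, units, and frame length are illustrative assumptions, not NAVIS's actual implementation.

```python
import numpy as np

def spikes_to_sonogram(timestamps, addresses, n_channels, frame_ms=20.0):
    """Bin AER spike events from a neuromorphic auditory sensor into a
    channel-by-time spike-count matrix (a 'sonogram')."""
    timestamps = np.asarray(timestamps, dtype=float)  # event times, microseconds
    addresses = np.asarray(addresses)                 # cochlea channel per event
    frame_us = frame_ms * 1000.0
    n_frames = int(np.ceil(timestamps.max() / frame_us))
    counts, _, _ = np.histogram2d(
        addresses, timestamps,
        bins=[n_channels, n_frames],
        range=[[0, n_channels], [0, n_frames * frame_us]],
    )
    return counts  # shape (n_channels, n_frames)
```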

    Robust Sound Event Classification using Deep Neural Networks

    The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing, and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to progress in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for sound event classification, although spectrogram-based and auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task at several levels of corrupting noise and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.
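    A simplified stand-in for a spectrogram image-based front end might look like the following: compute a log spectrogram, reduce it to a fixed-size "image", and flatten it into a vector an SVM can consume. The window sizes and output shape are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.svm import SVC

def spectrogram_image_features(x, fs, out_shape=(20, 40)):
    """Reduce a clip to a fixed-size log-spectrogram 'image' and
    flatten it into a feature vector for an SVM."""
    _, _, S = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
    img = np.log(S + 1e-10)  # log compression of power spectrogram
    # Fixed-size reduction by picking evenly spaced rows/columns.
    rows = np.linspace(0, img.shape[0] - 1, out_shape[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, out_shape[1]).astype(int)
    return img[np.ix_(rows, cols)].ravel()

# Usage sketch, assuming `clips` is a list of 1-D signals with labels:
# X = np.stack([spectrogram_image_features(c, fs=16000) for c in clips])
# clf = SVC(kernel="rbf").fit(X, labels)
```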

    Evaluation of MFCC estimation techniques for music similarity

    Publication in the conference proceedings of EUSIPCO, Florence, Italy, 2006.

    WASIS - Bioacoustic species identification based on multiple feature extraction and classification algorithms

    Advisor: Claudia Maria Bauzer Medeiros. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Automatic identification of animal species based on their sounds is one of the means to conduct research in bioacoustics. This research domain provides, for instance, ways to monitor rare and endangered species, to analyze changes in ecological communities, or to study the social meaning of animal calls in a behavioral context. Identification mechanisms are typically executed in two stages: feature extraction and classification. Both stages present challenges, in computer science and in bioacoustics. The choice of effective feature extraction and classification algorithms is a challenge in any audio recognition system, especially in bioacoustics. Considering the wide variety of animal groups studied, algorithms are tailored to specific groups. Classification techniques are also sensitive to the extracted features and to the conditions surrounding the recordings. As a result, most bioacoustics software is not extensible, limiting the kinds of recognition experiments that can be conducted. Given this scenario, this dissertation proposes a software architecture that accommodates multiple feature extraction, feature fusion, and classification algorithms to support scientists and the general public in identifying animal species through their recorded sounds. This architecture was implemented in the WASIS software, freely available on the Web. A number of algorithms were implemented, serving as the basis for a comparative study that recommends sets of feature extraction and classification algorithms for three animal groups.
    Master's degree in Computer Science. Grants 132849/2015-1 (CNPq) and 2013/02219-0 (FAPESP).
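    The extensible-architecture idea, interchangeable feature extractors plus fusion and classification, can be sketched as a small plugin registry. This is not WASIS's actual code; the names, the toy extractor, and the concatenation-based fusion are illustrative assumptions.

```python
import numpy as np

# Registry of interchangeable components: new extractors plug in
# without touching the rest of the pipeline.
FEATURE_EXTRACTORS = {}  # name -> callable(signal, fs) -> 1-D feature vector

def register_extractor(name):
    """Decorator that adds a feature extraction algorithm to the registry."""
    def deco(fn):
        FEATURE_EXTRACTORS[name] = fn
        return fn
    return deco

@register_extractor("rms_energy")
def rms_energy(signal, fs):
    # Toy extractor for demonstration; real ones would compute
    # MFCCs, LPC coefficients, and the like.
    frames = np.array_split(np.asarray(signal, dtype=float), 50)
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def identify(signal, fs, extractor_names, classifier):
    """Fuse features from several extractors (here by simple
    concatenation) and let a trained classifier name the species."""
    fused = np.concatenate(
        [FEATURE_EXTRACTORS[n](signal, fs) for n in extractor_names]
    )
    return classifier.predict(fused[np.newaxis, :])[0]
```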

    Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids

    Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. This can be disadvantageous for accurately estimating different noise types, which subsequently leads to suboptimal noise reduction. In this research, a relatively unexplored technique based on deep learning, a Recurrent Neural Network (RNN), is used to perform noise reduction and dereverberation for hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of open Master Hearing Aid (openMHA), a conventional signal-processing framework, and with a Deep Neural Network (DNN) based model. It was found that the RNN model can suppress noise and improve speech understanding better than the conventional hearing aid noise reduction algorithm and the DNN model. The same RNN model was shown to reduce reverberation components given proper training. A real-time implementation of the deep learning model is also discussed.
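    A minimal sketch of a mask-based recurrent enhancer of the kind described, written in PyTorch, illustrates the approach. The hyperparameters (layer sizes, spectral dimension) and mask formulation are assumptions, not the thesis's model.

```python
import torch
import torch.nn as nn

class MaskRNN(nn.Module):
    """Minimal recurrent enhancement model: a GRU maps noisy magnitude
    spectra to a [0, 1] time-frequency mask applied to the input."""
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):          # (batch, frames, n_freq)
        h, _ = self.rnn(noisy_mag)
        mask = torch.sigmoid(self.out(h))  # per-bin suppression gain
        return mask * noisy_mag            # enhanced magnitude estimate
```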

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches to noise-robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise-robust automatic speech recognition (course code T-61.6060) held at TKK.