2,679 research outputs found

    The voice activity detection (VAD) recorder and VAD network recorder : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

    The project provides a feasibility study for the AudioGraph tool, focusing on two application areas: the VAD (voice activity detector) recorder and the VAD network recorder. The first achieves low bit-rate speech recording on the fly, using a GSM compression coder with a simple VAD algorithm; the second provides two-way speech over IP, performing echo cancellation over a simplex channel. The latter is required for implementing a synchronous AudioGraph. In the first chapter we introduce the background of this project, specifically the VoIP technology, the AudioGraph tool, and VAD algorithms, and discuss the problems set for this project. The second chapter presents the relevant techniques in detail, including sound representation, speech-coding schemes, sound file formats, PowerPlant and Macintosh programming issues, and the simple VAD algorithm we have developed. The third chapter discusses implementation issues, including the systems' objectives and architecture, the problems encountered, and the solutions used. The fourth chapter presents the results of the two applications: user documentation for each application is given, after which we analyse the parameters based on the results and present default parameter settings that could be used in the AudioGraph system. The last chapter provides conclusions and future work
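    A simple energy-based VAD of the kind this thesis describes can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual algorithm: the frame length, threshold, and hangover values are assumptions, and the GSM coder stage is omitted.

```python
# Minimal sketch of an energy-based VAD (illustrative parameters only).
import numpy as np

def simple_vad(signal, frame_len=160, threshold_db=-35.0, hangover=5):
    """Return one boolean per frame: True where speech is detected."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Frame energy in dB; signal assumed normalised to [-1, 1].
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > threshold_db
    # Hangover keeps a few frames after speech ends to avoid clipping word tails.
    mask = np.zeros_like(active)
    counter = 0
    for i, a in enumerate(active):
        counter = hangover if a else max(counter - 1, 0)
        mask[i] = counter > 0
    return mask
```

    In a recorder like the one described above, only the frames flagged as speech would be passed to the GSM compression coder, which is what yields the low bit rate.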

    Real-time implementation of a dual microphone beamformer : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Computer Systems at Massey University, Albany, New Zealand

    The main objective of this project is to develop a microphone array system that captures the speech signal for speech-related applications. The system should allow the user to move freely and acquire speech in adverse acoustic environments. The key problem as the distance between the speaker and the microphone increases is that the quality of the speech signal is often degraded by background noise and reverberation; as a result, speech-related applications fail to perform well under these circumstances. The unwanted noise components present in the acquired signal have to be removed in order to improve the performance of these applications. This thesis describes the development of a dual microphone beamformer on a Digital Signal Processor (DSP). The development kit used in this project is the Texas Instruments TMS320C6711 DSP Starter Kit (DSK). The switched Griffiths-Jim beamformer was selected as the algorithm to be implemented on the DSK. Van Compernolle developed this algorithm in 1990 by modifying the Griffiths-Jim beamformer structure. The algorithm improves the quality of the desired speech signal by reducing the background noise, and it requires at least two input channels to obtain the spatial characteristics of the acquired signal; therefore, the PCM3003 audio daughter card is used to access the two microphone signals. The software implementation of the switched Griffiths-Jim beamformer algorithm has two main stages. The first stage identifies the presence of speech in the acquired signal: a simple Voice Activity Detector (VAD) based on the energy of the acquired signal distinguishes between the wanted speech signal and the unwanted noise signals. The second stage is the adaptive beamformer, which uses the VAD decisions to reduce the background noise. The adaptive beamformer consists of two adaptive filters based on the Normalised Least Mean Squares (NLMS) algorithm. The first filter behaves like a beam-steering filter and is updated only while speech is present; the second behaves like an Adaptive Noise Canceller (ANC) and is updated only during noise-only periods. The VAD algorithm controls the updating of these NLMS filters, and only one of them is updated at any given time. The algorithm was successfully implemented on the chosen DSK using the Code Composer Studio (CCS) software, and the implementation was tested in real time with a speech recognition system programmed in Visual Basic using the Microsoft Speech SDK components. The dual microphone system allows the user to move around freely while acquiring the desired speech signal. The results show a reasonable amount of enhancement in the output signal, and a significant improvement in the ease of using the speech recognition system is achieved
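    The switching logic described above can be illustrated compactly. The sketch below follows the structure of the switched Griffiths-Jim beamformer (a steering filter adapted during speech, an ANC adapted during noise-only periods, both gated by a VAD), but the filter lengths, step sizes, and the crude energy VAD are illustrative assumptions rather than the thesis's DSK implementation.

```python
# Minimal sketch of a switched Griffiths-Jim beamformer with NLMS filters.
import numpy as np

def nlms_step(w, x_buf, d, mu=0.1, eps=1e-8):
    """One NLMS update: filter x_buf to predict d; return (error, new weights)."""
    e = d - np.dot(w, x_buf)
    w = w + mu * e * x_buf / (np.dot(x_buf, x_buf) + eps)
    return e, w

def switched_gjbf(x1, x2, taps=32, vad_thresh=1e-4):
    w_steer = np.zeros(taps)   # beam-steering filter: adapted while speech is present
    w_anc = np.zeros(taps)     # adaptive noise canceller: adapted in noise-only periods
    buf1 = np.zeros(taps)      # delay line for microphone 1
    buf_ref = np.zeros(taps)   # delay line for the noise reference
    out = np.zeros(len(x1))
    for n in range(len(x1)):
        buf1 = np.roll(buf1, 1); buf1[0] = x1[n]
        speech = np.mean(buf1 ** 2) > vad_thresh        # crude energy VAD
        if speech:                                      # adapt steering filter only
            noise_ref, w_steer = nlms_step(w_steer, buf1, x2[n])
        else:
            noise_ref = x2[n] - np.dot(w_steer, buf1)
        fixed = 0.5 * (np.dot(w_steer, buf1) + x2[n])   # fixed (sum) beamformer path
        buf_ref = np.roll(buf_ref, 1); buf_ref[0] = noise_ref
        if speech:                                      # ANC filters but does not adapt
            out[n] = fixed - np.dot(w_anc, buf_ref)
        else:                                           # adapt ANC on noise alone
            out[n], w_anc = nlms_step(w_anc, buf_ref, fixed)
    return out
```

    Only one filter adapts at any given time, exactly as the abstract describes: the VAD decision selects which of the two NLMS updates runs for each sample.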

    Automotive three-microphone voice activity detector and noise-canceller

    This paper addresses issues in improving hands-free speech recognition performance in car environments. A three-microphone array is used to form a beamformer with least mean squares (LMS) adaptation to improve the Signal-to-Noise Ratio (SNR). The array operates in parallel with a Voice Activity Detector (VAD), which uses time-delay estimation together with the magnitude-squared coherence (MSC)
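    The MSC-based decision can be sketched as follows: speech from a coherent point source yields high coherence between a microphone pair, while diffuse car noise yields low coherence. The frame length, analysis band, and threshold below are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of an MSC-based VAD decision for a microphone pair.
import numpy as np
from scipy.signal import coherence

def msc_vad(x1, x2, fs=16000, frame=4096, band=(300, 3400), thresh=0.6):
    """Return one boolean per frame: True where the pair is coherent (speech)."""
    decisions = []
    for start in range(0, min(len(x1), len(x2)) - frame, frame):
        # Averaging over several segments (frame >> nperseg) is required,
        # since single-segment MSC is identically 1.
        f, cxy = coherence(x1[start:start + frame], x2[start:start + frame],
                           fs=fs, nperseg=512)
        in_band = (f >= band[0]) & (f <= band[1])   # restrict to the speech band
        decisions.append(np.mean(cxy[in_band]) > thresh)
    return np.array(decisions)
```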

    A Silence Removal and Endpoint Detection Approach for Speech Processing

    In this paper a brief overview of silence removal and voice activity detection is given and a new method for silence removal is proposed. The objective of the proposed method is to delete the silent and unvoiced segments from the speech signal, which increases the performance and accuracy of the overall system. Endpoint detection is then used to remove the DC offset from the signal after the silence removal process. Silence removal and endpoint detection are key components of many applications, such as speaker and speech recognition. The proposed method uses the Root Mean Square (RMS) of the signal to delete the unvoiced segments from the speech signal. This work showed better results for silence removal and endpoint detection than existing methods: the performance is evaluated in MATLAB, and an accuracy of 97.2% is achieved
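    A minimal sketch of RMS-based silence removal along these lines is shown below; the frame length and relative threshold are illustrative assumptions, not the paper's tuned values.

```python
# Minimal sketch of RMS-based silence removal (illustrative parameters only).
import numpy as np

def remove_silence(signal, frame_len=256, rel_thresh=0.1):
    """Keep only frames whose RMS exceeds a fraction of the overall RMS."""
    signal = signal - np.mean(signal)          # crude DC offset removal
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    keep = rms > rel_thresh * np.sqrt(np.mean(signal ** 2))
    return frames[keep].ravel()
```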

    Advanced sensors technology survey

    This project assesses the state of the art in advanced or 'smart' sensor technology for NASA Life Sciences research applications, with an emphasis on sensors with potential applications on Space Station Freedom (SSF). The objectives are: (1) to conduct literature reviews on relevant advanced sensor technology; (2) to interview scientists and engineers in industry, academia, and government who are knowledgeable on this topic; (3) to provide viewpoints and opinions regarding the potential applications of this technology on the SSF; and (4) to provide summary charts of relevant technologies and the centers where they are being developed

    Digital Signal Processing for The Development of Deep Learning-Based Speech Recognition Technology

    This research discusses digital signal processing in the context of developing deep learning-based speech recognition technology. Given the increasing demand for accurate and efficient speech recognition systems, digital signal processing techniques are essential. The research uses an experimental method with a quantitative approach, consisting of several stages: introduction, research design, data collection, data preprocessing, deep learning model development, training and performance evaluation, experiments and testing, and data analysis. The findings are expected to contribute to the development of more sophisticated speech recognition systems applicable in various fields. For example, in virtual assistants such as Siri and Google Assistant, improved speech recognition accuracy allows more natural interactions and faster responses, improving the user experience. The technology can also be used in security systems for safer and more reliable voice authentication, replacing or supplementing passwords and fingerprints. Additionally, in accessibility technology, more accurate voice recognition is particularly beneficial for individuals with visual or mobility impairments, allowing them to control devices and access information with voice commands alone. Other benefits include improvements in automated phone applications, automatic transcription of meetings and conferences, and the development of smart home devices that can be fully voice-operated
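    As a concrete illustration of the DSP side of such a pipeline, the sketch below computes log-mel spectrogram features, a common front end for deep speech models. All parameter values are common defaults assumed for illustration, not this paper's configuration.

```python
# Minimal sketch of a log-mel spectrogram front end (common default parameters).
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, fs=16000, n_fft=512, hop=160, n_mels=40):
    # Short-time Fourier transform magnitude.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    spec = np.abs(np.array([
        np.fft.rfft(window * x[i * hop:i * hop + n_fft])
        for i in range(n_frames)]))
    # Triangular mel filterbank spanning 0 .. fs/2.
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(spec @ fbank.T + 1e-10)   # shape: frames x mel bands
```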

    Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings

    Analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers increasingly use these recordings to understand the linguistic environment a child encounters in her routine interactions with the world. The recordings are captured by a microphone that the child wears throughout the day. Being naturalistic, they contain many unwanted everyday sounds that degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this end, several classical signal processing and modern machine learning-based SE methods were employed 1) as denoisers for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as front-ends for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation from speech. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks in comparison with the other SE methods and baseline results, while classical signal processing-based SE methods also achieve competitive performance. From the comparison of objective assessment and downstream task results, no predictive relationship between task-independent objective metrics and downstream task performance was found
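    As an illustration of the classical SE family evaluated in work like this, the sketch below implements basic magnitude spectral subtraction. The noise estimate from leading frames and the spectral floor are textbook assumptions, not the specific methods compared in the thesis.

```python
# Minimal sketch of magnitude spectral subtraction with overlap-add resynthesis.
import numpy as np

def spectral_subtraction(x, n_fft=512, hop=256, noise_frames=10, floor=0.01):
    window = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    frames = np.array([window * x[i * hop:i * hop + n_fft] for i in range(n)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)           # assume leading frames are noise
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # subtract, keep a spectral floor
    # Overlap-add resynthesis using the noisy phase.
    out = np.zeros(len(x))
    frames_out = np.fft.irfft(clean_mag * np.exp(1j * phase), n=n_fft, axis=1)
    for i in range(n):
        out[i * hop:i * hop + n_fft] += frames_out[i]
    return out
```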

    Development and Evaluation of a Real-Time Framework for a Portable Assistive Hearing Device

    Testing and verification of digital hearing aid devices and of their embedded software and algorithms can be challenging, especially when time-to-market considerations are taken into account. This thesis describes a PC-based, real-time, highly configurable framework for the evaluation of audio algorithms. Implementing audio processing algorithms on such a platform gives hearing aid designers and manufacturers the ability to test new and existing processing techniques and to collect data about their performance in real-life situations, without the need to develop a prototype device. The platform is based on the Eurotech Catalyst development kit and the Fedora Linux OS, and it utilizes the JACK audio engine to facilitate reliable real-time performance. Additionally, we demonstrate the capabilities of this platform by implementing an audio processing chain targeted at improving speech intelligibility for people suffering from auditory neuropathy. Evaluation is performed in both noisy and noise-free environments. Subjective evaluation of the results, using normal-hearing listeners and an auditory neuropathy simulator, demonstrates improvement in some conditions
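    The framework's central idea, a configurable block-based processing chain driven by a real-time audio engine's callback, can be sketched as follows. The stages shown (a DC-blocking filter and a soft limiter) are hypothetical placeholders, not the thesis's auditory neuropathy processing, and the JACK binding itself is omitted.

```python
# Minimal sketch of a configurable block-based audio processing chain.
import numpy as np

class ProcessingChain:
    """Run a configurable sequence of block-based audio stages."""
    def __init__(self, stages):
        self.stages = stages            # list of callables: block -> block

    def process(self, block):
        for stage in self.stages:
            block = stage(block)
        return block

# Placeholder stages for illustration only.
def dc_blocker(block, alpha=0.995, state=[0.0, 0.0]):
    # First-order DC-blocking filter; state persists across calls via the
    # mutable default (kept deliberately simple for this sketch).
    out = np.empty_like(block)
    for i, x in enumerate(block):
        out[i] = alpha * (state[1] + x - state[0])
        state[0], state[1] = x, out[i]
    return out

def soft_limiter(block):
    return np.tanh(block)               # keep the output within [-1, 1]

chain = ProcessingChain([dc_blocker, soft_limiter])
# The audio engine would invoke this once per period with the input block:
# out_block = chain.process(in_block)
```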

    Development of the Feature Extractor for Speech Recognition

    Final degree project carried out in collaboration with the University of Maribor. With this diploma work we have attempted to give continuity to previous work by other researchers, the Voice Operating Intelligent Wheelchair (VOIC) [1]. This work presents the development of a voice-controlled wheelchair designed for physically disabled people who cannot control their movements, and describes the basic components of the speech recognition and wheelchair control system. More specifically, a speech recognizer comprises two distinct blocks, a Feature Extractor and a Recognizer. The present work is targeted at the realization of an adequate Feature Extractor block. It uses a standard LPC Cepstrum coder, which translates the incoming speech into a trajectory in the LPC Cepstrum feature space, followed by a Self-Organizing Map, which classifies the outcome of the coder in order to produce optimal trajectory representations of words in reduced-dimension feature spaces. Experimental results indicate that trajectories in such reduced-dimension spaces can provide reliable representations of spoken words. The Recognizer block is left for future researchers. The main contributions of this work are the research into and adoption of a new technology for development, and the realization of applications such as a voice recorder and player and a complete Feature Extractor system
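    The coder stage of such a Feature Extractor can be sketched as follows: LPC coefficients obtained via the autocorrelation method and Levinson-Durbin recursion, then converted to LPC cepstral coefficients by the standard recursion. The model order and cepstrum length are illustrative assumptions, and the Self-Organizing Map stage is omitted.

```python
# Minimal sketch of an LPC-cepstrum coder for one speech frame.
import numpy as np

def lpc(frame, order=10):
    """Levinson-Durbin solution of the autocorrelation normal equations."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = np.zeros(order + 1); a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]   # reflection update
        err *= (1.0 - k * k)
    return a   # A(z) = 1 + a1*z^-1 + ... + ap*z^-p

def lpc_to_cepstrum(a, n_ceps=12):
    """Convert LPC coefficients to LPC cepstral coefficients."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]
```

    Applied frame by frame, this yields the trajectory in the LPC Cepstrum feature space that the Self-Organizing Map would then map to a reduced-dimension representation.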

    DeepEar: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning

    Microphones are remarkably powerful sensors of human behavior and context. However, audio sensing is highly susceptible to wild fluctuations in accuracy when used in the diverse acoustic environments (such as bedrooms, vehicles, or cafes) that users encounter on a daily basis. Towards addressing this challenge, we turn to the field of deep learning, an area of machine learning that has radically changed related audio modeling domains like speech recognition. In this paper, we present DeepEar, the first mobile audio sensing framework built from coupled Deep Neural Networks (DNNs) that simultaneously perform common audio sensing tasks. We train DeepEar with a large-scale dataset including unlabeled data from 168 place visits. The resulting learned model, involving 2.3M parameters, enables DeepEar to significantly increase inference robustness to background noise beyond conventional approaches present in mobile devices. Finally, we show DeepEar is feasible for smartphones by building a cloud-free DSP-based prototype that runs continuously, using only 6% of the smartphone's battery daily. This is the author accepted manuscript; the final version is available from ACM via http://dx.doi.org/10.1145/2750858.280426
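    The coupled-DNN idea can be sketched as a shared trunk feeding task-specific output heads, so a single forward pass serves all tasks at once. The layer sizes and feature dimension below are illustrative assumptions, and while the task names mirror the audio sensing tasks DeepEar targets, this is not the published architecture.

```python
# Minimal sketch of coupled DNNs: a shared trunk with per-task heads.
import torch
import torch.nn as nn

class CoupledAudioDNN(nn.Module):
    def __init__(self, n_features=390, tasks={'ambient_scene': 8,
                                              'speaker_id': 20,
                                              'emotion': 4,
                                              'stress': 2}):
        super().__init__()
        self.trunk = nn.Sequential(            # layers shared across all tasks
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU())
        self.heads = nn.ModuleDict(            # one classification head per task
            {name: nn.Linear(256, n_classes) for name, n_classes in tasks.items()})

    def forward(self, x):
        h = self.trunk(x)
        return {name: head(h) for name, head in self.heads.items()}

# One forward pass yields logits for every task simultaneously:
model = CoupledAudioDNN()
logits = model(torch.randn(1, 390))
```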