
    IMPACT OF MICROPHONE POSITIONAL ERRORS ON SPEECH INTELLIGIBILITY

    The speech of a person speaking in a noisy environment can be enhanced through electronic beamforming using spatially distributed microphones. Because this approach demands precise information about the microphone locations, its application is limited in places where microphones must be placed quickly or changed on a regular basis, and a highly precise calibration or measurement process can be tedious and time-consuming. To establish tolerable limits on the calibration process, the impact of microphone position error on intelligibility is examined. Analytical expressions are derived by modeling the microphone position errors as a zero-mean uniform distribution. Experiments and simulations were performed to show the relationship between the precision of the microphone location measurement and the loss in intelligibility. A variety of microphone array configurations and distracting sources (interfering speech and white noise) are considered. For speech near the threshold of intelligibility, the results show that microphone position errors with standard deviations below 1.5 cm can limit losses in intelligibility to within 10% of the maximum (perfect microphone placement) for all the microphone distributions examined. Of the array distributions tested, the linear array was the most vulnerable to positional errors, whereas the non-uniform 3D array was the most robust.
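The effect described above can be illustrated with a small delay-and-sum simulation: steering delays are computed from assumed microphone positions while the true positions carry zero-mean uniform errors. The array geometry, source position, and test frequency below are invented for illustration, not the configurations from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 343.0          # speed of sound (m/s)
f = 1000.0         # test frequency (Hz)
k = 2 * np.pi * f / c

# Hypothetical 8-element linear array, 10 cm spacing, source ~2 m away.
mics = np.column_stack([np.arange(8) * 0.10, np.zeros(8), np.zeros(8)])
src = np.array([0.35, 2.0, 0.0])

def array_gain(true_pos, assumed_pos):
    """Normalized delay-and-sum gain toward the source when steering
    delays come from assumed (possibly erroneous) mic positions."""
    d_true = np.linalg.norm(src - true_pos, axis=1)
    d_assumed = np.linalg.norm(src - assumed_pos, axis=1)
    phase = k * (d_true - d_assumed)   # residual phase error per channel
    return np.abs(np.exp(1j * phase).sum()) / len(true_pos)

perfect = array_gain(mics, mics)       # = 1.0 by construction
sigma = 0.015                          # ~1.5 cm position-error std
# Zero-mean uniform error with std sigma has half-width sigma*sqrt(3).
err = rng.uniform(-sigma * np.sqrt(3), sigma * np.sqrt(3), size=mics.shape)
degraded = array_gain(mics + err, mics)

print(f"perfect gain: {perfect:.3f}, with ~1.5 cm errors: {degraded:.3f}")
```

Position errors never help: the perturbed phases only de-cohere the sum, so the degraded gain is strictly at most the perfect one.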

    Towards An Intelligent Fuzzy Based Multimodal Two Stage Speech Enhancement System

    This thesis presents a novel two-stage multimodal speech enhancement system that uses both visual and audio information to filter speech, and explores the extension of this system with fuzzy logic to demonstrate proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system. The proposed cognitively inspired framework is scalable: the techniques used in individual parts of the system can be upgraded, and there is scope for the initial framework presented here to be expanded. In the proposed system, the concept of single-modality two-stage filtering is extended to include the visual modality. Noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, employing the novel use of the Gaussian Mixture Regression (GMR) technique on associated visual speech information extracted with a state-of-the-art Semi Adaptive Appearance Models (SAAM) based lip tracking approach. This pre-processed speech is then enhanced further by audio-only beamforming using a state-of-the-art Transfer Function Generalised Sidelobe Canceller (TFGSC) approach. The resulting system is designed to function in challenging noisy speech environments (using sentences from different speakers in the GRID corpus and a range of noise recordings), and both objective and subjective test results (employing the widely used Perceptual Evaluation of Speech Quality (PESQ) measure, a composite objective measure, and subjective listening tests) show that this initial system delivers very encouraging results when filtering speech mixtures in difficult reverberant environments. Some limitations of this initial framework are identified, and the extension of the multimodal system is explored through the development of a fuzzy logic based framework and a proof-of-concept demonstration. Results show that the proposed autonomous, adaptive, and context-aware multimodal framework delivers very positive results in difficult noisy speech environments, making cognitively inspired use of audio and visual information depending on environmental conditions. Finally, some concluding remarks are made, along with proposals for future work.
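The first (Wiener filtering) stage can be sketched in its simplest spectral form. In the thesis the speech power-spectral-density estimate is derived from visual features via GMR; in this stand-alone sketch it is simply passed in, and the toy PSD values are invented.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener filter gain H(f) = S(f) / (S(f) + N(f)).
    The gain floor limits over-suppression (musical-noise) artifacts."""
    gain = speech_psd / (speech_psd + noise_psd)
    return np.maximum(gain, floor)

# Toy PSDs: speech energy concentrated in the low bins, flat noise floor.
speech = np.array([4.0, 2.0, 1.0, 0.25, 0.1])
noise = np.full(5, 0.5)
H = wiener_gain(speech, noise)
print(np.round(H, 3))
```

Bins dominated by speech pass nearly unchanged (gain near 1), while noise-dominated bins are attenuated toward the floor, which is exactly what makes a good speech PSD estimate (here, the visually derived one) so important.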

    Six Noise Type Military Sound Classifier

    Blast noise from military installations often has a negative impact on the quality of life of residents living in nearby communities. This in turn restricts the military's testing and training capabilities through curfews, restrictions, or range closures enacted to address noise complaints. To manage noise around military installations more directly, accurate noise monitoring has become a necessity. Although most noise monitors are simple sound level meters, more recent ones are capable of discerning blasts from ambient noise with some success. Investigators at the University of Pittsburgh previously developed a more advanced noise classifier that can discern between wind, aircraft, and blast noise while simultaneously lowering the measurement threshold. Recent work will be presented from the development of a more advanced classifier that identifies additional classes of noise, such as machine gun fire, vehicles, and thunder. Additional signal metrics were explored given the increased complexity of the classifier. By broadening the types of noise the system can accurately classify and increasing the number of metrics, a new system was developed with increased blast-noise accuracy, fewer missed events, and significantly fewer false positives.
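A multi-class noise classifier of this kind can be sketched with a simple nearest-centroid rule over per-event signal metrics. The six class names follow the abstract, but the two features and all data below are synthetic and purely illustrative; the actual classifier and its metrics are more sophisticated.

```python
import numpy as np

# Six noise classes from the abstract; features and data are invented.
CLASSES = ["blast", "machine gun", "aircraft", "vehicle", "thunder", "wind"]

def train_centroids(X, y, n_classes):
    """Nearest-centroid training: one mean feature vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def classify(centroids, x):
    """Assign an event to the class whose centroid is closest."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

rng = np.random.default_rng(1)
y = np.repeat(np.arange(6), 20)
# Well-separated toy clusters: class c centered at (2c, 2c), std 0.3.
X = rng.normal(loc=y[:, None] * 2.0, scale=0.3, size=(120, 2))

cents = train_centroids(X, y, 6)
acc = np.mean([classify(cents, xi) == yi for xi, yi in zip(X, y)])
print(f"training accuracy on toy data: {acc:.2f}")
```

Real blast-monitoring features overlap far more than these toy clusters, which is why the work adds metrics as the number of classes grows.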

    MICROPHONE ARRAY OPTIMIZATION IN IMMERSIVE ENVIRONMENTS

    The complex relationship between array gain patterns and microphone distributions limits the application of traditional optimization algorithms to irregular arrays, which show enhanced beamforming performance for human speech capture in immersive environments. This work analyzes the relationship between irregular microphone geometries and spatial filtering performance with statistical methods. Novel geometry descriptors are developed to capture the properties of irregular microphone distributions and show their impact on array performance. General guidelines and optimization methods for regular and irregular array design in immersive (near-field) environments are proposed to obtain superior beamforming ability for speech applications. Optimization times are greatly reduced through objective-function rules that use performance-based geometric descriptions of microphone distributions, circumventing direct array gain computations over the space of interest. In addition, probabilistic descriptions of acoustic scenes are introduced to incorporate various levels of prior knowledge about the source distribution. To verify the effectiveness of the proposed optimization methods, simulated gain patterns and real SNR results of the optimized arrays are compared to those of corresponding traditional regular arrays and of arrays obtained by direct exhaustive search. Results show large SNR enhancements for the optimized arrays over arbitrary randomly generated arrays and regular arrays, especially at low microphone densities. The rapid convergence and acceptable processing times observed during the experiments establish the feasibility of the proposed optimization methods for array geometry design in immersive environments where rapid deployment is required with limited knowledge of the acoustic scene, such as in mobile platforms and audio surveillance applications.
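The key idea, scoring candidate geometries with cheap geometric descriptors instead of full array-gain computations, can be illustrated with a toy random search. The descriptor, its weighting, the room size, and the talker region below are all invented; the descriptors and optimizer in the actual work are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(2)

def descriptor_score(mics, region_center):
    """Cheap geometric objective standing in for a full gain computation:
    reward spatial spread (aperture) and penalize mean distance to the
    monitored region. Purely illustrative descriptor and weights."""
    spread = mics.std(axis=0).sum()
    dist = np.linalg.norm(mics - region_center, axis=1).mean()
    return spread - 0.5 * dist

center = np.array([2.0, 2.0, 1.5])    # hypothetical talker region
best, best_score = None, -np.inf
# Random search over 8-mic layouts in a hypothetical 4 x 4 x 3 m room.
for _ in range(500):
    cand = rng.uniform([0.0, 0.0, 0.0], [4.0, 4.0, 3.0], size=(8, 3))
    s = descriptor_score(cand, center)
    if s > best_score:
        best, best_score = cand, s

print(f"best descriptor score: {best_score:.3f}")
```

Because each candidate is scored in microseconds rather than via a gain integral over the whole region, many more geometries can be evaluated per unit time, which is the source of the reported speed-up.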

    Timbre Discrimination for Brief Instrument Sounds


    Speaker gender recognition system

    Automatic gender recognition through speech is one of the fundamental mechanisms in human-machine interaction. Typical application areas of this technology range from gender-targeted advertising to gender-specific IoT (Internet of Things) applications; it can also be used to narrow the scope of investigations in crime scenarios. There are many possible methods of recognizing the gender of a speaker. In machine learning applications, the first step is to acquire the natural human voice and convert it into a machine-understandable signal. Useful voice features can then be extracted, labelled with gender information, and used to train a model; new input voices can subsequently be captured and processed, with features extracted and matched through pattern modelling. In this thesis, a real-time speaker gender recognition system was designed in the MATLAB environment. The system automatically identifies the gender of a speaker by voice. The implementation uses voice processing and feature extraction techniques to handle input speech from a microphone or a recorded speech file; the extracted features are classified with a machine learning method (a Naïve Bayes classifier) to distinguish gender, and the recognition result is then displayed. The system was evaluated in an experiment with 40 participants (half male, half female) in a fairly small room. The experiment recorded 400 speech samples from speakers from 16 countries in 17 languages. These 400 samples were tested by the gender recognition system and showed considerably good performance, with only 29 recognition errors (92.75% accuracy). In comparison, most previous speaker gender recognition systems achieved no more than 90% accuracy, and the only one to reach 100% accuracy was evaluated with very few testers. We can therefore conclude that the performance of the speaker gender recognition system designed in this thesis is reliable.
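The classification step can be sketched with a minimal Gaussian Naive Bayes over a single pitch feature. This is a Python stand-in for the thesis's MATLAB classifier, which uses richer features; the fundamental-frequency statistics below are textbook ballpark values (male pitch around 120 Hz, female around 210 Hz) and the data are synthetic.

```python
import numpy as np

class GaussianNB1D:
    """Minimal one-feature Gaussian Naive Bayes classifier."""
    def fit(self, x, y):
        # Per-class mean and standard deviation of the feature.
        self.stats = {c: (x[y == c].mean(), x[y == c].std()) for c in set(y)}
        return self

    def predict(self, v):
        # Pick the class with the highest Gaussian log-likelihood
        # (equal priors assumed, so the prior term cancels).
        def loglik(c):
            m, s = self.stats[c]
            return -0.5 * ((v - m) / s) ** 2 - np.log(s)
        return max(self.stats, key=loglik)

# Synthetic fundamental-frequency (f0) training data in Hz.
rng = np.random.default_rng(3)
f0_male = rng.normal(120, 15, 50)
f0_female = rng.normal(210, 20, 50)
x = np.concatenate([f0_male, f0_female])
y = np.array(["male"] * 50 + ["female"] * 50)

clf = GaussianNB1D().fit(x, y)
print(clf.predict(130.0), clf.predict(200.0))
```

With overlapping real-world features (accent, language, recording conditions), per-class Gaussians fit less cleanly, which is one reason the reported accuracy sits below 100%.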

    Word And Speaker Recognition System

    This report describes a system that combines user-dependent word recognition and text-dependent speaker recognition. Word recognition is the process of converting an audio signal, captured by a microphone, into a word. Speaker identification is the ability to recognize a person's identity based on a specific word he or she uttered. A person's voice carries parameters that convey information such as gender, emotion, health, attitude, and identity, and speaker recognition identifies the speaker from the unique voiceprint in the speech data. Voice Activity Detection (VAD), Spectral Subtraction (SS), Mel-Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ), Dynamic Time Warping (DTW), and k-Nearest Neighbour (k-NN) are the methods used in the word recognition part of the project, implemented in MATLAB. For the speaker recognition part, vector quantization (VQ) is used. The system was successfully implemented, with a recognition rate of 84.44% for word recognition and 54.44% for speaker recognition.
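Of the listed methods, DTW is the template-matching core: it aligns two utterances of different lengths before comparing them. A minimal version over scalar sequences can be sketched as follows (in the actual project the sequences would be frames of MFCC vectors, and the implementation is in MATLAB; this Python sketch uses invented toy sequences).

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using the standard cumulative-cost recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of a template aligns perfectly (distance 0),
# while a different-shaped sequence does not.
template = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 3, 3, 2, 1, 0]
other = [3, 3, 2, 0, 0, 1, 3]
print(dtw_distance(template, stretched), dtw_distance(template, other))
```

This tolerance to speaking-rate variation is what makes DTW suitable for user-dependent word templates recorded at different speeds.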