1,387 research outputs found

    Faculty Publications and Creative Works 2005

    Get PDF
    Faculty Publications & Creative Works is an annual compendium of scholarly and creative activities of University of New Mexico faculty during the noted calendar year. Published by the Office of the Vice President for Research and Economic Development, it serves to illustrate the robust and active intellectual pursuits conducted by the faculty in support of teaching and research at UNM. In 2005, UNM faculty produced over 1,887 works, including 1,887 scholarly papers and articles, 57 books, 127 book chapters, 58 reviews, 68 creative works and 4 patented works. We are proud of the accomplishments of our faculty which are in part reflected in this book, which illustrates the diversity of intellectual pursuits in support of research and education at the University of New Mexico

    Implementation and evaluation of a low complexity microphone array for speaker recognition

    Get PDF
    Includes bibliographical references (leaves 83-86).This thesis discusses the application of a microphone array employing a noise canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field

    VenoMave: Targeted Poisoning Against Speech Recognition

    Full text link
    The wide adoption of Automatic Speech Recognition (ASR) remarkably enhanced human-machine interaction. Prior research has demonstrated that modern ASR systems are susceptible to adversarial examples, i.e., malicious audio inputs that lead to misclassification by the victim's model at run time. The research question of whether ASR systems are also vulnerable to data-poisoning attacks is still unanswered. In such an attack, a manipulation happens during the training phase: an adversary injects malicious inputs into the training set to compromise the neural network's integrity and performance. Prior work in the image domain demonstrated several types of data-poisoning attacks, but these results cannot directly be applied to the audio domain. In this paper, we present the first data-poisoning attack against ASR, called VenoMave. We evaluate our attack on an ASR system that detects sequences of digits. When poisoning only 0.17% of the dataset on average, we achieve an attack success rate of 86.67%. To demonstrate the practical feasibility of our attack, we also evaluate if the target audio waveform can be played over the air via simulated room transmissions. In this more realistic threat model, VenoMave still maintains a success rate up to 73.33%. We further extend our evaluation to the Speech Commands corpus and demonstrate the scalability of VenoMave to a larger vocabulary. During a transcription test with human listeners, we verify that more than 85% of the original text of poisons can be correctly transcribed. We conclude that data-poisoning attacks against ASR represent a real threat, and we are able to perform poisoning for arbitrary target input files while the crafted poison samples remain inconspicuous

    Reports to the President

    Get PDF
    A compilation of annual reports for the 1989-1990 academic year, including a report from the President of the Massachusetts Institute of Technology, as well as reports from the academic and administrative units of the Institute. The reports outline the year's goals, accomplishments, honors and awards, and future plans

    Reports to the President

    Get PDF
    A compilation of annual reports for the 1999-2000 academic year, including a report from the President of the Massachusetts Institute of Technology, as well as reports from the academic and administrative units of the Institute. The reports outline the year's goals, accomplishments, honors and awards, and future plans

    Speech enhancement using auditory filterbank.

    Get PDF
    This thesis presents a novel subband noise reduction technique for speech enhancement, termed as Adaptive Subband Wiener Filtering (ASWF), based on a critical-band gammatone filterbank. The ASWF is derived from a generalized Subband Wiener Filtering (SWF) equation and reduces noises according to the estimated signal-to-noise ratio (SNR) in each auditory channel and in each time frame. The design of a subband noise estimator, suitable for some real-life noise environments, is also presented. This denoising technique would be beneficial for some auditory-based speech and audio applications, e.g. to enhance the robustness of sound processing in cochlear implants. Comprehensive objective and subjective tests demonstrated the proposed technique is effective to improve the perceptual quality of enhanced speeches. This technique offers a time-domain noise reduction scheme using a linear filterbank structure and can be combined with other filterbank algorithms (such as for speech recognition and coding) as a front-end processing step immediately after the analysis filterbank, to increase the robustness of the respective application.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .G85. Source: Masters Abstracts International, Volume: 44-03, page: 1452. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005

    A novel lip geometry approach for audio-visual speech recognition

    Get PDF
    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various method have been studied by research group around the world to incorporate lip movements into speech recognition in recent years, however exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information specifically using lip geometry information due to its robustness to head rotation and the fewer number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using the combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following on evaluation of integration strategies, a novel method has been developed based on alternative decision fusion strategy, in which the outcome from the visual and speech modality is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion. Finally, the performance of the new methods introduced in this work are evaluated using the CUAVE and LUNA-V data corpora under a range of different signal to noise ratio conditions using the NOISEX-92 dataset
    • …
    corecore