
    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This is found to adversely affect the accuracy of speaker recognition. To address the above problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification and open-set identification), it is shown that the proposed approach can significantly improve recognition accuracy under mismatched noise conditions. In order to further improve the recognition accuracy in severe mismatch conditions, an enhancement of the above method is proposed. This enhancement, which adjusts the reference speaker models more closely to the noise condition in the test utterance, is shown to considerably increase the accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated.
The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
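
For orientation, the conventional form of T-Norm mentioned in the abstract can be sketched as follows. This is the standard textbook normalisation, not the new form proposed in the thesis, whose details the abstract does not give. The function name `t_norm`, its parameters and the example scores are hypothetical and chosen only for illustration. The raw score of a test utterance against the claimed speaker model is shifted and scaled by the mean and standard deviation of the scores the same utterance obtains against a cohort of impostor models.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores, eps=1e-12):
    """Conventional test normalisation (T-Norm) of a single matching score.

    raw_score     -- score of the test utterance against the claimed speaker model
    cohort_scores -- scores of the SAME test utterance against impostor (cohort) models
    eps           -- hypothetical floor that avoids division by zero when all
                     cohort scores are identical
    """
    if len(cohort_scores) < 2:
        raise ValueError("T-Norm needs at least two cohort scores")
    mu = mean(cohort_scores)
    # Population standard deviation is used here; some systems use the
    # sample standard deviation instead. Either is an assumption of this sketch.
    sigma = pstdev(cohort_scores)
    return (raw_score - mu) / max(sigma, eps)


# Illustrative values only, not taken from the thesis.
normalised = t_norm(4.0, [1.0, 3.0])
```

Because the cohort statistics are computed from the test utterance itself, noise in that utterance affects the raw score and the cohort scores alike, which is why this kind of normalisation is commonly used to stabilise decision thresholds across test conditions.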

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

    Autonomous Learning of Speaker Identity and WiFi Geofence From Noisy Sensor Data

    A fundamental building block towards intelligent environments is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit unique vocal characteristics as people interact with one another in common spaces. However, manually enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. Instead, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation, e.g., sniffed wireless Media Access Control (MAC) addresses, can we learn to associate a specific identity with a particular voiceprint? To address this problem, this paper advocates an Internet of Things (IoT) solution and proposes to use co-located WiFi as supervisory weak labels to automatically bootstrap the labelling process. In particular, a novel cross-modality labelling algorithm is proposed that jointly optimises the clustering and association process, which solves the inherent mismatching issues arising from heterogeneous sensor data. At the same time, we further propose to reuse the labelled data to iteratively update wireless geofence models and curate device-specific thresholds. Extensive experimental results from two different scenarios demonstrate that our proposed method is able to achieve a 2-fold improvement in labelling compared with conventional methods and can achieve reliable speaker recognition in the wild.

    Orthographic Quality in English as a Second Language

    Learning new vocabulary words in a second language is a challenge for the adult learner, especially when the second language writing system differs from the first language writing system. According to the lexical quality hypothesis (Perfetti & Hart, 2001), there are three constituents to word-level knowledge: orthographic, phonological, and semantic. A set of studies investigated the nature of orthographic knowledge in advanced learners of English as a second language. In a data mining study, students’ spelling errors were analyzed. Results showed that first language background and second language proficiency have an effect on the rates and types of spelling errors made. In two training interventions, students showed learning gains from two different types of spelling instruction: a form focus condition and a form-meaning integration condition (Norris & Ortega, 2000). In a separate audio dictation task, non-native English speakers were shown to be sensitive to word frequency and age of acquisition but not regularity. In a cross-modal matching task, the same students were most susceptible to transposition foils that preserved target letters but in an incorrect order, and least susceptible to phonological foils that preserved the phonological but not the orthographic form of the target word. In a spell checking task, students had more difficulty rejecting misspelled words that maintained the phonological form of the target word than misspelled words that did not preserve the phonology of the target. Overall, findings suggest that intermediate to advanced learners of English as a second language still show difficulty with the language’s deep orthography, but that they can benefit from minimal amounts of instruction. Furthermore, these students appear to be acquiring orthographic knowledge via exemplar-based rather than rule-based strategies. This research expands upon the lexical quality hypothesis and finds support for the arbitrary mapping hypothesis.