2,223 research outputs found

    SAS: A Speaker Verification Spoofing Database Containing Diverse Attacks

    Due to copyright restrictions, access to the full text of this article is available only via subscription. This paper presents the first version of a speaker verification spoofing and anti-spoofing database, named the SAS corpus. The corpus includes nine spoofing techniques: two based on speech synthesis and seven on voice conversion. We design two protocols, one for standard speaker verification evaluation and the other for producing spoofing materials. Hence, they allow the speech synthesis community to produce spoofing materials incrementally without knowledge of speaker verification spoofing and anti-spoofing. To provide a set of preliminary results, we conducted speaker verification experiments using two state-of-the-art systems. Without any anti-spoofing techniques, the two systems are extremely vulnerable to the spoofing attacks implemented in our SAS corpus. EPSRC ; CAF ; TÜBİTA

    Voice conversion versus speaker verification: an overview

    A speaker verification system automatically accepts or rejects a claimed identity of a speaker based on a speech sample. Recent major progress in speaker verification has led to mass-market adoption, for example for user authentication on smartphones and in online commerce. A major concern when deploying speaker verification technology is whether a system is robust against spoofing attacks. Speaker verification studies have provided good insight into speaker characterization, which has contributed to the progress of voice conversion technology. Unfortunately, voice conversion has become one of the most easily accessible techniques for carrying out spoofing attacks, and it therefore presents a threat to speaker verification systems. In this paper, we briefly introduce the fundamentals of voice conversion and speaker verification technologies. We then give an overview of recent spoofing attack studies under different conditions, with a focus on voice conversion spoofing attacks. We also discuss anti-spoofing measures for speaker verification.

    Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance

    Due to copyright restrictions, access to the full text of this article is available only via subscription. In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks. EPSRC ; TÜBİTA

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize audible speech output: EMG-to-speech.

    Nonuniform Power Changes and Spatial, Temporal and Spectral Diversity in High Gamma Band (>60 Hz) Signals in Human Electrocorticography

    High-gamma band (>60 Hz) power changes in cortical electrophysiology are a reliable indicator of focal, event-related cortical activity. In spite of discoveries of oscillatory subthreshold and synchronous suprathreshold activity at the cellular level, there is an increasingly popular view that high-gamma band amplitude changes recorded from cellular ensembles are the result of asynchronous firing activity that yields wideband and uniform power increases. Others have demonstrated independence of power changes in the low- and high-gamma bands, but to date, no studies have shown evidence of any such independence above 60 Hz. Based on non-uniformities in time-frequency analyses of electrocorticographic (ECoG) signals, we hypothesized that induced high-gamma band (60-500 Hz) power changes are more heterogeneous than currently understood. We quantified this spectral non-uniformity with two different approaches using single-word repetition tasks in human subjects. First, we showed that the functional responsiveness of different ECoG high-gamma sub-bands can discriminate cognitive tasks (e.g., hearing, reading, speaking) and cortical locations. Power changes in these sub-bands of the high-gamma range are consistently present within single trials and have statistically different time courses within the trial structure. Moreover, when consolidated across all subjects within three task-relevant anatomic regions (sensorimotor, Broca's area, and superior temporal gyrus), these behavior- and location-dependent power changes evidenced nonuniform trends across the population of subjects. Second, we studied the dynamics of multiple frequency bands in order to quantify the diversity present in the ECoG signals. Using a matched filter construct and receiver operating characteristic (ROC) analysis, we show that power modulations correlated with phonemic content in spoken and heard words are represented diffusely in space, time and frequency. Correlating power modulation in multiple frequency bands above 60 Hz over broad cortical areas with time-varying envelopes significantly improved area under the ROC curve scores in phoneme prediction experiments. Finally, we show preliminary evidence supporting our hypothesis in microarray ECoG data. Taken together, the nonuniformity of high-frequency power changes and the information content captured in the spatio-temporal dynamics of those frequencies suggest that a new approach to evaluating high-gamma band cortical activity is necessary. These findings show that in addition to time and location, frequency is another fundamental dimension of high-gamma dynamics.

    “Because It Sounds Right”: A Guiding Light of Speaker Knowledge

    Approaches to second language teaching have included continuous exposure, grammar lessons, and various combinations of these methods. Recent studies highlight the specific, detailed knowledge that speakers of a language have of the phonetic and structural information of many kinds of phrases. These include formulaic expressions (idioms, proverbs, conversational speech formulas, expletives), lexical bundles (sentence stems, conventional expressions, discourse organizers), and collocations (a range of other unitary, multiword expressions). These exemplars share the feature of familiarity: they are known and recognized by speakers of a language, and stored in mental representation with their concomitant features of structure, phonetic and prosodic shape, meaning, and use. In addition, the linguistic sciences currently advance the perspective that language competence is constituted by knowledge of constructions at various levels of abstraction, implying a larger role of memory in language competence than previously understood. Performance by persons with neurological disorders reveals specific effects on production of these kinds of phrases. Given the putatively extremely large repertory of known, stored expressions and constructions that have been shown to constitute language representation, a guiding principle of speaker use might be that the expression sounds right, implying special importance for listening exercises in second language learning.

    Prolegomena to a neurocomputational architecture for human grammatical encoding and decoding

    The study develops a neurocomputational architecture for grammatical processing in language production and language comprehension (grammatical encoding and decoding, respectively). It seeks to answer two questions. First, how is online syntactic structure formation of the complexity required by natural-language grammars possible in a fixed, preexisting neural network without the need for online creation of new connections or associations? Second, is it realistic to assume that the seemingly disparate instantiations of syntactic structure formation in grammatical encoding and grammatical decoding can run on the same neural infrastructure? This issue is prompted by accumulating experimental evidence for the hypothesis that the mechanisms for grammatical decoding overlap with those for grammatical encoding to a considerable extent, thus inviting the hypothesis of a single “grammatical coder.” The paper answers both questions by providing the blueprint for a syntactic structure formation mechanism that is entirely based on prewired circuitry (except for referential processing, which relies on the rapid learning capacity of the hippocampal complex), and can subserve decoding as well as encoding tasks. The model builds on the “Unification Space” model of syntactic parsing developed by Vosse & Kempen (2000, 2008, 2009). The design includes a neurocomputational mechanism for the treatment of an important class of grammatical movement phenomena.