53,653 research outputs found

    A silent speech system based on permanent magnet articulography and direct synthesis

    Get PDF
    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies

    Anti-spoofing Methods for Automatic SpeakerVerification System

    Full text link
    Growing interest in automatic speaker verification (ASV)systems has lead to significant quality improvement of spoofing attackson them. Many research works confirm that despite the low equal er-ror rate (EER) ASV systems are still vulnerable to spoofing attacks. Inthis work we overview different acoustic feature spaces and classifiersto determine reliable and robust countermeasures against spoofing at-tacks. We compared several spoofing detection systems, presented so far,on the development and evaluation datasets of the Automatic SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge 2015.Experimental results presented in this paper demonstrate that the useof magnitude and phase information combination provides a substantialinput into the efficiency of the spoofing detection systems. Also wavelet-based features show impressive results in terms of equal error rate. Inour overview we compare spoofing performance for systems based on dif-ferent classifiers. Comparison results demonstrate that the linear SVMclassifier outperforms the conventional GMM approach. However, manyresearchers inspired by the great success of deep neural networks (DNN)approaches in the automatic speech recognition, applied DNN in thespoofing detection task and obtained quite low EER for known and un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 66

    The evolution of a visual-to-auditory sensory substitution device using interactive genetic algorithms

    Get PDF
    Sensory Substitution is a promising technique for mitigating the loss of a sensory modality. Sensory Substitution Devices (SSDs) work by converting information from the impaired sense (e.g. vision) into another, intact sense (e.g. audition). However, there are a potentially infinite number of ways of converting images into sounds and it is important that the conversion takes into account the limits of human perception and other user-related factors (e.g. whether the sounds are pleasant to listen to). The device explored here is termed “polyglot” because it generates a very large set of solutions. Specifically, we adapt a procedure that has been in widespread use in the design of technology but has rarely been used as a tool to explore perception – namely Interactive Genetic Algorithms. In this procedure, a very large range of potential sensory substitution devices can be explored by creating a set of ‘genes’ with different allelic variants (e.g. different ways of translating luminance into loudness). The most successful devices are then ‘bred’ together and we statistically explore the characteristics of the selected-for traits after multiple generations. The aim of the present study is to produce design guidelines for a better SSD. In three experiments we vary the way that the fitness of the device is computed: by asking the user to rate the auditory aesthetics of different devices (Experiment 1), by measuring the ability of participants to match sounds to images (Experiment 2) and the ability to perceptually discriminate between two sounds derived from similar images (Experiment 3). In each case the traits selected for by the genetic algorithm represent the ideal SSD for that task. Taken together, these traits can guide the design of a better SSD
    corecore