
    Saudi Accented Arabic Voice Bank

    The aim of this paper is to present an Arabic speech database that represents native Arabic speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech were among the challenges the project team faced. The procedures used to meet these challenges are highlighted. SAAVB consists of 1033 speakers who speak Modern Standard Arabic with a Saudi accent. The SAAVB content is analyzed and the results are illustrated. The content was verified internally and externally by IBM Cairo and can be used to train speech engines such as automatic speech recognition and speaker verification systems.

    Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application

    Speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts are discussed

    Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems

    Hybrid speech recognizers, in which Artificial Neural Networks (ANNs) replace the Gaussian Mixture Models (GMMs) usually used to estimate the emission pdfs of the states of Hidden Markov Models (HMMs), have several advantages over classical systems. However, obtaining performance improvements heavily increases the computational requirements, because the ANN must be trained. Starting from the observation that speech data are remarkably skewed, this paper proposes sifting the training set and balancing the number of samples per class. With this method the training time was reduced by a factor of 18 while obtaining performance similar to, or even better than, that obtained with the whole database, especially in noisy environments. However, applying these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, the a posteriori probabilities obtained must be properly scaled and the context window resized, as demonstrated in the paper. This work was supported in part by the regional grant (Comunidad Autónoma de Madrid-UC3M) CCG06-UC3M/TIC-0812 and in part by a project funded by the Spanish Ministry of Science and Innovation (TEC 2008-06382).
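
    The two ideas in this abstract, capping the number of training samples per class and then correcting the network outputs for the changed class distribution, can be illustrated with a minimal sketch. It is not the paper's procedure: the function names, the random per-class cap, and the prior-ratio re-weighting (a standard correction for a shift in class priors) are all assumptions made for illustration, and the resizing of the context window is not covered.

```python
import random
from collections import Counter, defaultdict

def balance_by_class(samples, labels, cap, seed=0):
    """Keep at most `cap` randomly chosen samples per class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    kept_x, kept_y = [], []
    for y in sorted(by_class):
        items = by_class[y]
        chosen = items if len(items) <= cap else rng.sample(items, cap)
        kept_x.extend(chosen)
        kept_y.extend([y] * len(chosen))
    return kept_x, kept_y

def class_priors(labels):
    """Relative frequency of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {y: n / total for y, n in counts.items()}

def rescale_posteriors(posteriors, train_priors, target_priors):
    """Re-weight posteriors from a model trained under `train_priors`
    so that they reflect `target_priors`, then renormalise (assumed correction)."""
    weighted = {y: p * target_priors[y] / train_priors[y]
                for y, p in posteriors.items()}
    norm = sum(weighted.values())
    return {y: w / norm for y, w in weighted.items()}

# Toy, heavily skewed label set: silence dominates, as is common in speech data.
labels = ["sil"] * 80 + ["a"] * 15 + ["b"] * 5
samples = list(range(len(labels)))

balanced_x, balanced_y = balance_by_class(samples, labels, cap=5)
original_priors = class_priors(labels)
balanced_priors = class_priors(balanced_y)

# A network trained on the balanced set that is undecided about a frame...
undecided = {"sil": 1 / 3, "a": 1 / 3, "b": 1 / 3}
# ...should fall back to the original class frequencies after rescaling.
rescaled = rescale_posteriors(undecided, balanced_priors, original_priors)
```

    In this toy case the balanced set keeps 15 of the 100 samples, and an undecided output from the balanced model is mapped back to the original class frequencies, which is the kind of train/test mismatch the abstract says must be compensated.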

    Speech recognition through physical reservoir computing with neuromorphic nanowire networks

    The hardware implementation of the reservoir computing paradigm is key to taking advantage of neuromorphic data processing. In this context, self-organised nanonetworks represent a versatile and scalable computational substrate for multiple tasks, exploiting the emergent collective behaviour that arises from the complexity of the system. This emergent behaviour allows spatio-temporal processing of multiple input signals and relies on the nonlinear interaction between a multitude of nanoscale memristive elements. By means of physics-based grid-graph modeling, we report on the implementation of reservoir computing for a speech recognition task in a memristive nanonetwork based on nanowires (NWs) acting as a physical reservoir. Besides analysing the pre-processing step that transduces the audio samples into electrical stimuli to be applied to the physical reservoir, we analyse the effect of the network size and of the adoption of virtual nodes on computing performance. Results show that memristive nanonetworks allow in materia implementation of reservoir computing for the realisation of brain-inspired neuromorphic systems with reduced training cost.

    Max-Planck-Institute for Psycholinguistics: Annual Report 2001


    Prosody and speech perception

    The major concern of this thesis is with models of speech perception. Following Gibson's (1966) work on visual perception, it seeks to establish whether there are sources of information in the speech signal which can be responded to directly and which specify the units of information of speech. The treatment of intonation follows that of Halliday (1967) and the treatment of rhythm that of Abercrombie (1967). "Prosody" is taken to mean both the intonational and the rhythmic aspects of speech. Experiments one to four show the interdependence of prosody and grammar in the perception of speech, although they leave open the question of which sort of information is responded to first. Experiments five and six, employing a short-term memory paradigm and Morton's (1970) "suffix effect" explanation, demonstrate that prosody could well be responded to before grammar. Since the previous experiments suggested a close connection between the two, these results suggest that information about grammatical structures may well be given directly by prosody. In the final two experiments the amount of prosodic information in fluent speech that can be perceived independently of grammar and meaning is investigated. Although tone-group division seems to be given clearly enough by acoustic cues, there are problems of interpretation with the data on syllable stress assignments. In the concluding chapter, a three-stage model of speech perception is proposed, following Bever (1970), but incorporating prosodic analysis as an integral part of the processing. The obtained experimental results are integrated within this model.

    Neurocognitive Implications of Tangential Speech in Patients with Focal Brain Damage

    There are no studies on the neurocognitive implications of tangential speech (TS). This research aims to take a step forward in the study of narrative processing, by evaluating TS in a sample that helps to detect this deficit when it is neurogenic and recently manifested. The relationship between TS, secondary to focal brain injury, and neuropsychological and neuroanatomical variables was explored. A comprehensive neuropsychological battery was administered to 175 volunteers: 95 alert inpatients, without aphasia, without psychiatric history and without TS history, and 80 healthy participants, without TS. Results: TS (prevalence 16%) was independent of type or site of injury. An adverse effect of TS on global neuropsychological performance was observed. This effect was significantly related to attentional errors along with prolonged processing times but not to correct responses. Reliability and validity indices for the present TS screening scale were provided. Conclusion: Present results support the hypothesis that this neurogenic inability to spontaneously find, organize and communicate verbal information, beyond single words, depends on extended brain networks involving processes such as sustained attention, complex-syntax comprehension, the (implicit) interpretation and spontaneous recall of a narrative, and emotional and behavioral alterations. Early TS detection is advisable for prevention and treatment at any age

    The role of explicit memory in syntactic persistence : effects of lexical cueing and load on sentence memory and sentence production

    Speakers' memory of sentence structure can persist and modulate the syntactic choices of subsequent utterances (i.e., structural priming). Much research on structural priming posited a multifactorial account by which an implicit learning process and a process related to explicit memory jointly contribute to the priming effect. Here, we tested two predictions from that account: (1) that lexical repetition facilitates the retrieval of sentence structures from memory; (2) that priming is partly driven by a short-term explicit memory mechanism with limited resources. In two pairs of structural priming and sentence structure memory experiments, we examined the effects of structural priming and its modulation by lexical repetition as a function of cognitive load in native Dutch speakers. Cognitive load was manipulated by interspersing the prime and target trials with easy or difficult mathematical problems. Lexical repetition boosted both structural priming (Experiments 1a-2a) and memory for sentence structure (Experiments 1b-2b) and did so with a comparable magnitude. In Experiment 1, there were no load effects, but in Experiment 2, with a stronger manipulation of load, both the priming and memory effects were reduced with a larger cognitive load. The findings support an explicit memory mechanism in structural priming that is cue-dependent and attention-demanding, consistent with a multifactorial account of structural priming