Saudi Accented Arabic Voice Bank
The aim of this paper is to present an Arabic speech database that represents native Arabic speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech were some of the challenges that faced the project team. The procedures used to meet these challenges are highlighted. SAAVB comprises 1033 speakers who speak Modern Standard Arabic with a Saudi accent. The SAAVB content is analyzed and the results are illustrated. The content was verified internally and externally by IBM Cairo and can be used to train speech engines such as automatic speech recognition and speaker verification systems.
Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application
Speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts are discussed
Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
Hybrid speech recognizers, in which the estimation of the emission pdfs of the Hidden Markov Model (HMM) states, usually carried out with Gaussian Mixture Models (GMMs), is instead performed by Artificial Neural Networks (ANNs), have several advantages over classical systems. However, obtaining performance improvements greatly increases the computational requirements because the ANN must be trained. Starting from the observation that speech data are remarkably skewed, this paper proposes sifting the training set and balancing the number of samples per class. With this method the training time is reduced by a factor of 18 while performance is similar to or even better than that obtained with the whole database, especially in noisy environments. However, the application of these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, a proper scaling of the a posteriori probabilities obtained and a resizing of the context window need to be performed, as demonstrated in the paper. This work was supported in part by the regional grant (Comunidad Autónoma de Madrid-UC3M) CCG06-UC3M/TIC-0812 and in part by a project funded by the Spanish Ministry of Science and Innovation (TEC 2008-06382).
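The balancing and rescaling idea described in this abstract can be illustrated with a small sketch. It is not the paper's procedure: the per-class cap, the function names (`balance_by_class`, `class_priors`, `rescale_posteriors`) and the toy labels are all assumptions made for illustration, and the rescaling shown is the generic Bayes prior correction rather than the specific scaling the authors derive.

```python
import random
from collections import Counter, defaultdict


def balance_by_class(samples, labels, per_class, seed=0):
    """Randomly keep at most `per_class` samples for each label (hypothetical helper)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(samples, labels):
        by_label[y].append(x)
    kept = []
    for y, xs in by_label.items():
        rng.shuffle(xs)
        kept.extend((x, y) for x in xs[:per_class])
    return kept


def class_priors(labels):
    """Relative frequency of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {y: n / total for y, n in counts.items()}


def rescale_posteriors(posteriors, train_priors, target_priors):
    """Generic Bayes prior correction: reweight each posterior by
    target_prior / train_prior, then renormalise so the values sum to 1."""
    scaled = {y: p * target_priors[y] / train_priors[y]
              for y, p in posteriors.items()}
    z = sum(scaled.values())
    return {y: v / z for y, v in scaled.items()}


# Toy skewed data set: one class dominates, as silence often does in speech.
labels = ["sil"] * 80 + ["a"] * 15 + ["b"] * 5
samples = list(range(len(labels)))

balanced = balance_by_class(samples, labels, per_class=5)
original_priors = class_priors(labels)
balanced_priors = class_priors([y for _, y in balanced])

# A classifier trained on the balanced set that outputs uniform posteriors
# is mapped back onto the original class distribution.
uniform = {y: 1 / 3 for y in original_priors}
corrected = rescale_posteriors(uniform, balanced_priors, original_priors)
```

In this toy case the balanced set holds five samples per class, and a uniform posterior from a model trained on it is mapped back to the original class frequencies, which is the kind of train/test mismatch correction the abstract refers to.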
Speech recognition through physical reservoir computing with neuromorphic nanowire networks
The hardware implementation of the reservoir computing paradigm is a key step towards taking advantage of neuromorphic data processing. In this context, self-organised nanonetworks represent a versatile and scalable computational substrate for multiple tasks, exploiting the emergent collective behaviour that arises from the complexity of the system. This emergent behaviour allows spatio-temporal processing of multiple input signals and relies on the nonlinear interaction between a multitude of nanoscale memristive elements. By means of physics-based grid-graph modeling, we report on the implementation of reservoir computing for a speech recognition task in a memristive nanonetwork based on nanowires (NWs) acting as a physical reservoir. Besides analysing the pre-processing step for the transduction of the audio samples into electrical stimuli to be applied to the physical reservoir, we analyse the effect of the network size and of the adoption of virtual nodes on computing performance. Results show that memristive nanonetworks allow in materia implementation of reservoir computing for the realisation of brain-inspired neuromorphic systems with reduced training cost.
Prosody and speech perception
The major concern of this thesis is with models of speech perception. Following Gibson's (1966) work on visual perception, it seeks to establish whether there are sources of information in the speech signal which can be responded to directly and which specify the units of information of speech. The treatment of intonation follows that of Halliday (1967) and rhythm that of Abercrombie (1967). "Prosody" is taken to mean both the intonational and the rhythmic aspects of speech. Experiments one to four show the interdependence of prosody and grammar in the perception of speech, although they leave open the question of which sort of information is responded to first. Experiments five and six, employing a short-term memory paradigm and Morton's (1970) "suffix effect" explanation, demonstrate that prosody could well be responded to before grammar. Since the previous experiments suggested a close connection between the two, these results suggest that information about grammatical structures may well be given directly by prosody. In the final two experiments the amount of prosodic information in fluent speech that can be perceived independently of grammar and meaning is investigated. Although tone-group division seems to be given clearly enough by acoustic cues, there are problems of interpretation with the data on syllable stress assignments. In the concluding chapter, a three-stage model of speech perception is proposed, following Bever (1970), but incorporating prosodic analysis as an integral part of the processing. The experimental results obtained are integrated within this model.
Neurocognitive Implications of Tangential Speech in Patients with Focal Brain Damage
There are no studies on the neurocognitive implications of tangential speech (TS). This research aims to take a step forward in the study of narrative processing, by evaluating TS in a sample that helps to detect this deficit when it is neurogenic and recently manifested. The relationship between TS, secondary to focal brain injury, and neuropsychological and neuroanatomical variables was explored. A comprehensive neuropsychological battery was administered to 175 volunteers: 95 alert inpatients, without aphasia, without psychiatric history and without TS history, and 80 healthy participants, without TS. Results: TS (prevalence 16%) was independent of type or site of injury. An adverse effect of TS on global neuropsychological performance was observed. This effect was significantly related to attentional errors along with prolonged processing times but not to correct responses. Reliability and validity indices for the present TS screening scale were provided. Conclusion: Present results support the hypothesis that this neurogenic inability to spontaneously find, organize and communicate verbal information, beyond single words, depends on extended brain networks involving processes such as sustained attention, complex-syntax comprehension, the (implicit) interpretation and spontaneous recall of a narrative, and emotional and behavioral alterations. Early TS detection is advisable for prevention and treatment at any age
The role of explicit memory in syntactic persistence : effects of lexical cueing and load on sentence memory and sentence production
Speakers' memory of sentence structure can persist and modulate the syntactic choices of subsequent utterances (i.e., structural priming). Much research on structural priming posited a multifactorial account by which an implicit learning process and a process related to explicit memory jointly contribute to the priming effect. Here, we tested two predictions from that account: (1) that lexical repetition facilitates the retrieval of sentence structures from memory; (2) that priming is partly driven by a short-term explicit memory mechanism with limited resources. In two pairs of structural priming and sentence structure memory experiments, we examined the effects of structural priming and its modulation by lexical repetition as a function of cognitive load in native Dutch speakers. Cognitive load was manipulated by interspersing the prime and target trials with easy or difficult mathematical problems. Lexical repetition boosted both structural priming (Experiments 1a-2a) and memory for sentence structure (Experiments 1b-2b) and did so with a comparable magnitude. In Experiment 1, there were no load effects, but in Experiment 2, with a stronger manipulation of load, both the priming and memory effects were reduced with a larger cognitive load. The findings support an explicit memory mechanism in structural priming that is cue-dependent and attention-demanding, consistent with a multifactorial account of structural priming