2,329 research outputs found

    Progress in Speech Recognition for Romanian Language

    A Neural Model for Self Organizing Feature Detectors and Classifiers in a Network Hierarchy

    Many models of early cortical processing have shown how local learning rules can produce efficient, sparse-distributed codes in which nodes have responses that are statistically independent and low probability. However, it is not known how to develop a useful hierarchical representation, containing sparse-distributed codes at each level of the hierarchy, that incorporates predictive feedback from the environment. We take a step in that direction by proposing a biologically plausible neural network model that develops receptive fields, and learns to make class predictions, with or without the help of environmental feedback. The model is a new type of predictive adaptive resonance theory network called Receptive Field ARTMAP, or RAM. RAM self-organizes internal category nodes that are tuned to activity distributions in topographic input maps. Each receptive field is composed of multiple weight fields that are adapted via local, on-line learning to form smooth receptive fields that reflect the statistics of the activity distributions in the input maps. When RAM generates incorrect predictions, its vigilance is raised, amplifying subtractive inhibition and sharpening receptive fields until the error is corrected. Evaluation on several classification benchmarks shows that RAM outperforms a related (but neurally implausible) model called Gaussian ARTMAP, as well as several standard neural network and statistical classifiers. A topographic version of RAM is proposed, which is capable of self-organizing hierarchical representations. Topographic RAM is a model for receptive field development at any level of the cortical hierarchy, and provides explanations for a variety of perceptual learning data. Defense Advanced Research Projects Agency and Office of Naval Research (N00014-95-1-0409)
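    The match-tracking step described above (vigilance raised after a wrong prediction until the error is corrected) is shared by the ARTMAP family. The sketch below assumes a plain fuzzy ARTMAP with complement coding and fast learning, not RAM's Gaussian receptive fields or subtractive inhibition; the class name FuzzyARTMAP and the parameters alpha, epsilon, and baseline_vigilance are illustrative choices, not taken from the paper.

```python
import numpy as np


class FuzzyARTMAP:
    """Simplified fuzzy ARTMAP with match tracking (illustrative sketch, not RAM)."""

    def __init__(self, baseline_vigilance=0.0, alpha=0.001, epsilon=0.001):
        self.rho_bar = baseline_vigilance  # vigilance each input starts from
        self.alpha = alpha                 # choice parameter
        self.epsilon = epsilon             # match-tracking increment
        self.weights = []                  # one weight vector per category node
        self.labels = []                   # class predicted by each category node

    @staticmethod
    def _complement_code(a):
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    def _ranked_categories(self, I):
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), best first.
        scores = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.weights]
        return np.argsort(scores)[::-1]

    def train_one(self, a, label):
        I = self._complement_code(a)
        rho = self.rho_bar
        for j in self._ranked_categories(I):
            match = np.minimum(I, self.weights[j]).sum() / I.sum()
            if match < rho:
                continue  # fails the vigilance test: reset and try the next node
            if self.labels[j] == label:
                self.weights[j] = np.minimum(I, self.weights[j])  # fast learning
                return j
            # Wrong prediction: raise vigilance just above this node's match,
            # so the search must continue to a node that fits the input better.
            rho = match + self.epsilon
        # No existing node passes: recruit a new category for this input.
        self.weights.append(I.copy())
        self.labels.append(label)
        return len(self.weights) - 1

    def predict(self, a):
        I = self._complement_code(a)
        return self.labels[self._ranked_categories(I)[0]]
```

    Trained once on the four XOR corners, this sketch recruits three categories: each wrong prediction triggers a vigilance increase and, when no existing node survives the raised test, a new node.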

    A hybrid neural network based speech recognition system for pervasive environments

    One of the major drawbacks to using speech as the input to any pervasive environment is the requirement to balance accuracy with the high processing overheads involved. This paper presents an Arabic speech recognition system (called UbiqRec), which addresses this issue by providing a natural and intuitive way of communicating within ubiquitous environments, while balancing processing time, memory and recognition accuracy. A hybrid approach has been used which incorporates spectrographic information, singular value decomposition, concurrent self-organizing maps (CSOM) and pitch contours for Arabic phoneme recognition. The approach employs separate self-organizing maps (SOM) for each Arabic phoneme joined in parallel to form a CSOM. The performance results confirm that with suitable preprocessing of data, including extraction of distinct power spectral densities (PSD) and singular value decomposition, the training time for CSOM was reduced by 89%. The empirical results also showed that overall recognition accuracy did not fall below 91%.
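    A concurrent SOM, as summarized above, trains one map per class and runs the maps in parallel. A minimal sketch, assuming 1-D maps, a Gaussian neighborhood with linearly decaying rates, and a minimum-quantization-error decision rule (the abstract does not state the rule). The PSD and SVD preprocessing is omitted and inputs are assumed to be already-extracted feature vectors; train_som and ConcurrentSOM are hypothetical names.

```python
import numpy as np


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map on `data` (shape: samples x features)."""
    rng = np.random.default_rng(seed)
    # Initialize node weights from randomly chosen training samples.
    weights = data[rng.integers(0, len(data), n_nodes)].astype(float)
    positions = np.arange(n_nodes)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


class ConcurrentSOM:
    """One SOM per class; a sample goes to the class whose map quantizes it best."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.maps_ = [train_som(X[y == c]) for c in self.classes_]
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # Quantization error of every sample against every class's map (assumed rule).
        errors = np.stack([
            np.min(np.linalg.norm(X[:, None, :] - w[None, :, :], axis=2), axis=1)
            for w in self.maps_
        ], axis=1)
        return self.classes_[np.argmin(errors, axis=1)]
```

    Because each map sees only its own class, adding a new phoneme in this scheme means training one more map rather than retraining the whole network.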

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which serve as input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning

    Cortical plasticity is one of the main features that enable our ability to learn and adapt in our environment. Indeed, the cerebral cortex self-organizes through structural and synaptic plasticity mechanisms that are very likely at the basis of an extremely interesting characteristic of human brain development: multimodal association. In spite of the diversity of the sensory modalities, like sight, sound and touch, the brain arrives at the same concepts (convergence). Moreover, biological observations show that one modality can activate the internal representation of another modality when both are correlated (divergence). In this work, we propose the Reentrant Self-Organizing Map (ReSOM), a brain-inspired neural system based on the reentry theory using Self-Organizing Maps and Hebbian-like learning. We propose and compare different computational methods for unsupervised learning and inference, then quantify the gain of the ReSOM in a multimodal classification task. The divergence mechanism is used to label one modality based on the other, while the convergence mechanism is used to improve the overall accuracy of the system. We perform our experiments on a constructed written/spoken digits database and a DVS/EMG hand gestures database. The proposed model is implemented on a cellular neuromorphic architecture that enables distributed computing with local connectivity. We show the gain of the so-called hardware plasticity induced by the ReSOM, where the system's topology is not fixed by the user but learned along the system's experience through self-organization. Comment: Preprint

    Speaker Identification and Spoken word Recognition in Noisy Environment using Different Techniques

    In this work, an attempt is made to design ASR systems through software/computer programs that perform speaker identification, spoken word recognition, and a combination of both in a general noisy environment. The Automatic Speech Recognition system is designed for a limited vocabulary of Telugu words/control commands. The experiments are conducted to find the combination of feature extraction technique and classifier model that performs best in a general noisy environment (a home/office environment where noise is around 15-35 dB). A recently proposed feature extraction technique, Gammatone frequency coefficients, reported to be the best fit to the human auditory system, is chosen for the experiments along with the more common feature extraction techniques MFCC and PLP as part of the front-end process (i.e., speech feature extraction). Two artificial neural network classifiers, Learning Vector Quantization (LVQ) and Radial Basis Function (RBF) networks, along with Hidden Markov Models (HMMs), are chosen as part of the back-end process (i.e., training/modeling the ASR systems). The performance of the ASR systems designed from the 9 combinations (3 feature extraction techniques × 3 classifier models) is analyzed in terms of spoken word recognition and speaker identification accuracy, ASR design time, and recognition/identification response time. The test speech samples are recorded in general noisy conditions, i.e., in the presence of air-conditioning noise, fan noise, computer keyboard noise, and distant cross-talk noise. The ASR systems were designed and analyzed programmatically in the MATLAB R2013a environment.
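    Of the three back-end classifiers compared above, LVQ is the simplest to sketch. The example assumes the basic LVQ1 update rule, feature vectors already produced by a front end (MFCC, PLP, or Gammatone features), and illustrative values for the number of prototypes and the learning rate; train_lvq1 and predict_lvq are hypothetical names, not code from the paper, which used MATLAB.

```python
import numpy as np


def train_lvq1(X, y, prototypes_per_class=2, epochs=30, lr0=0.1, seed=0):
    """LVQ1: pull the winning prototype toward same-class samples, push it away otherwise."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y)
    protos, labels = [], []
    for c in np.unique(y):
        # Initialize prototypes from training samples of class c
        # (each class needs at least `prototypes_per_class` samples).
        idx = rng.choice(np.flatnonzero(y == c), prototypes_per_class, replace=False)
        protos.append(X[idx])
        labels.extend([c] * prototypes_per_class)
    W, L = np.vstack(protos), np.array(labels)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            k = np.argmin(np.linalg.norm(W - X[i], axis=1))  # nearest (winning) prototype
            sign = 1.0 if L[k] == y[i] else -1.0
            W[k] += sign * lr * (X[i] - W[k])
    return W, L


def predict_lvq(W, L, X):
    """Assign each sample the label of its nearest prototype."""
    X = np.asarray(X, float)
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return L[np.argmin(d, axis=1)]
```

    For speaker identification the labels would be speaker IDs and for word recognition the word classes; the same training routine serves both.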

    NASA JSC neural network survey results

    A survey of Artificial Neural Systems in support of NASA's (Johnson Space Center) Automatic Perception for Mission Planning and Flight Control Research Program was conducted. Several of the world's leading researchers contributed papers containing their most recent results on artificial neural systems. These papers were grouped into categories, and descriptive accounts of the results make up a large part of this report. Also included is material on sources of information on artificial neural systems, such as books, technical reports, and software tools.

    Developmental refinement of cortical systems for speech and voice processing

    Development typically leads to optimized and adaptive neural mechanisms for the processing of voice and speech. In this fMRI study we investigated how this adaptive processing reaches its mature efficiency by examining the effects of task, age and phonological skills on cortical responses to voice and speech in children (8-9 years), adolescents (14-15 years) and adults. Participants listened to vowels (/a/, /i/, /u/) spoken by different speakers (boy, girl, man) and performed delayed-match-to-sample tasks on vowel and speaker identity. Across age groups, similar behavioral accuracy and comparable sound-evoked auditory cortical fMRI responses were observed. Analysis of task-related modulations indicated a developmental enhancement of responses in the (right) superior temporal cortex during the processing of speaker information. This effect was most evident through an analysis based on individually determined voice-sensitive regions. Analysis of age effects indicated that the recruitment of regions in the temporal-parietal cortex and posterior cingulate/cingulate gyrus decreased with development. Beyond age-related changes, the strength of speech-evoked activity in left posterior and right middle superior temporal regions significantly scaled with individual differences in phonological skills. Together, these findings suggest a prolonged development of the cortical functional network for speech and voice processing. This development includes a progressive refinement of the neural mechanisms for the selection and analysis of auditory information relevant to the ongoing behavioral task.

    Learning to Behave: Internalising Knowledge
