    The Application of Echo State Networks to Atypical Speech Recognition

    Automatic speech recognition (ASR) techniques have improved extensively over the past few years with the rise of new deep learning architectures. Recent sequence-to-sequence models achieve high accuracy by using the attention mechanism, which learns and weighs the relationships between elements of a sequence. Despite being highly accurate, commercial ASR models have a weakness when it comes to accessibility: they have difficulty evaluating and transcribing speech from individuals with unique vocal features, such as those with dysarthria or heavy accents, as well as deaf and hard-of-hearing individuals. Current methodologies for processing vocal data revolve around convolutional feature extraction layers, which dull the sequential nature of the data. Alternatively, reservoir computing has gained popularity for its ability to translate input data into changing network states, preserving the overall feature complexity of the input. Echo state networks (ESNs), a type of reservoir computing mechanism that employs a random recurrent neural network, have shown promise in a number of time-series classification tasks. This work explores the integration of ESNs into deep learning ASR models. The Listen, Attend and Spell and Transformer models were used as baselines, and a novel approach that uses an echo state network as a feature extractor was evaluated on both baseline architectures. The models were trained on 960 hours of LibriSpeech audio data and tuned on various atypical speech data, including the Torgo dysarthric speech dataset and the University of Memphis SPAL dataset. The ESN-based Echo, Listen, Attend, and Spell model produced more accurate transcriptions on the LibriSpeech test set than the ESN-based Transformer, and the baseline Transformer model achieved a 43.4% word error rate on the Torgo test set after full network tuning. A prototype ASR system was developed that uses both the developed model and commercial smart-assistant language models; it runs on a Raspberry Pi 4 using the Assistant Relay framework.
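
    The abstract describes using an echo state network as a feature extractor in front of attention-based ASR encoders but gives no implementation details. Below is a minimal sketch of that idea under stated assumptions: the reservoir size, spectral radius, leak rate, and the log-mel front end are illustrative choices, not the authors' configuration.

```python
import numpy as np

class ESNFeatureExtractor:
    """Minimal echo state network: a fixed random recurrent reservoir that
    maps each acoustic frame to a high-dimensional state vector."""

    def __init__(self, n_inputs, n_reservoir=512, spectral_radius=0.9,
                 leak_rate=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale the recurrent weights so the echo state property holds.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.leak_rate = leak_rate
        self.n_reservoir = n_reservoir

    def transform(self, frames):
        """frames: (T, n_inputs) acoustic features, e.g. log-mel filterbanks.
        Returns (T, n_reservoir) reservoir states to feed the ASR encoder."""
        state = np.zeros(self.n_reservoir)
        states = np.empty((len(frames), self.n_reservoir))
        for t, x in enumerate(frames):
            pre = np.tanh(self.W_in @ x + self.W @ state)
            # Leaky integration keeps a fading memory of past frames.
            state = (1 - self.leak_rate) * state + self.leak_rate * pre
            states[t] = state
        return states

# Hypothetical usage (compute_log_mel is an assumed helper, not from the paper):
# mel = compute_log_mel(waveform)                    # (T, 80) log-mel frames
# encoder_input = ESNFeatureExtractor(80).transform(mel)
# encoder_input would replace the convolutional front end of the
# Listen, Attend and Spell or Transformer encoder.
```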

    Emotional Voice Areas: Anatomic Location, Functional Properties, and Structural Connections Revealed by Combined fMRI/DTI

    We determined the location, functional response profile, and structural fiber connections of auditory areas with voice- and emotion-sensitive activity using functional magnetic resonance imaging (fMRI) and diffusion tensor imaging. Bilateral regions responding to emotional voices were consistently found in the superior temporal gyrus, posterolateral to the primary auditory cortex. Event-related fMRI showed stronger responses in these areas to voices expressing anger, sadness, joy, and relief, relative to voices with neutral prosody. Their neural responses were primarily driven by prosodic arousal, irrespective of valence. Probabilistic fiber tracking revealed direct structural connections of these "emotional voice areas" (EVA) with the ipsilateral medial geniculate body, the major input source of early auditory cortex, as well as with the ipsilateral inferior frontal gyrus (IFG) and inferior parietal lobe (IPL). In addition, vocal emotions (compared with neutral prosody) increased the functional coupling of EVA with the ipsilateral IFG but not the IPL. These results provide new insights into the neural architecture of the human voice processing system and support a crucial involvement of the IFG in the recognition of vocal emotions, whereas the IPL may subserve distinct auditory spatial functions, consistent with distinct anatomical substrates for the processing of "how" and "where" information within the auditory pathway.

    Symbol Emergence in Robotics: A Survey

    Humans learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans form a symbol system and acquire semiotic skills through autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding the dynamics of symbol systems is crucially important both for understanding human social interactions and for developing a robot that can smoothly communicate with human users in the long term. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system, which is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensorimotor information, including visual, haptic, and auditory information and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.
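
    As a rough illustration of the unsupervised multimodal categorization the survey mentions, the sketch below clusters objects from concatenated visual, haptic, and auditory feature vectors without labels. The Gaussian-mixture clustering and the feature handling are assumptions chosen for brevity; the surveyed work typically relies on multimodal Bayesian models such as multimodal LDA.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def multimodal_categorize(visual, haptic, auditory, n_categories=10, seed=0):
    """Cluster objects into categories from unlabeled multimodal observations.

    visual, haptic, auditory: arrays of shape (n_objects, d_modality) holding
    per-object feature vectors (e.g. image, grasp, and sound descriptors).
    Returns one integer category index per object, learned without supervision.
    """
    def zscore(x):
        # Standardize each modality so no single one dominates the joint space.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    joint = np.hstack([zscore(visual), zscore(haptic), zscore(auditory)])
    gmm = GaussianMixture(n_components=n_categories, random_state=seed)
    return gmm.fit_predict(joint)
```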

    A Review on EEG Signals Based Emotion Recognition

    Emotion recognition has become an important and widely discussed topic in brain-computer interfaces (BCIs), and numerous studies have been conducted to recognize emotions. There are also several important definitions and theories of human emotion. In this paper we try to cover the main topics in the field of emotion recognition, reviewing studies that analyze electroencephalogram (EEG) signals as a biological marker of emotional change. Given its low cost and good temporal and spatial resolution, EEG has become very common and is widely used in most BCI applications and studies. First, we state some theories and basic definitions related to emotions. Then we describe the main steps of an emotion recognition system: the different kinds of biological measurements (EEG, electrocardiogram [ECG], respiration rate, etc.), offline versus online recognition methods, emotion stimulation types, and common emotion models. Finally, the most recent and important studies are reviewed.
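
    To make the pipeline stages mentioned in the review concrete, here is a minimal sketch of a common offline EEG emotion recognition setup: band-power features per channel followed by a standard classifier. The frequency bands, sampling rate, and SVM classifier are illustrative assumptions rather than choices prescribed by the review.

```python
import numpy as np
from scipy.signal import welch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Frequency bands commonly used in EEG emotion studies.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_power_features(epoch, fs=128):
    """epoch: (n_channels, n_samples) EEG segment.
    Returns one feature vector of mean spectral power per channel and band."""
    freqs, psd = welch(epoch, fs=fs, nperseg=min(256, epoch.shape[1]), axis=1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))
    return np.concatenate(feats)

def train_emotion_classifier(epochs, labels, fs=128):
    """epochs: (n_trials, n_channels, n_samples); labels: e.g. valence classes.
    Fits a scaler + RBF SVM on band-power features (an offline pipeline)."""
    X = np.array([band_power_features(e, fs) for e in epochs])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, labels)
```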

    A Review on Human-Computer Interaction and Intelligent Robots

    In the field of artificial intelligence, human–computer interaction (HCI) technology and its related intelligent robot technologies are essential and active areas of research. From the perspectives of software algorithms and hardware systems, these technologies aim to build a natural HCI environment. The purpose of this research is to provide an overview of HCI and intelligent robots. It highlights the existing technologies for listening, speaking, reading, writing, and other senses that are widely used in human interaction, and, based on these same technologies, introduces some intelligent robot systems and platforms. The paper also forecasts some vital challenges in researching HCI and intelligent robots. The authors hope that this work will help researchers in the field acquire the necessary information and technologies to conduct more advanced research.

    On the encoding of natural music in computational models and human brains

    This article discusses recent developments and advances in the neuroscience of music aimed at understanding the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses, testing their predictions against independent unseen data. These new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
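
    As a concrete illustration of the predictive (encoding) models the article describes, the sketch below fits a ridge regression from stimulus features to a measured response and tests its predictions on held-out data. The feature representation, regularization strength, and train/test split are illustrative assumptions, not the article's specific method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def fit_encoding_model(stimulus_features, neural_response, alpha=1.0):
    """Predict a neural or behavioral response from music features.

    stimulus_features: (n_timepoints, n_features), e.g. spectral or rhythmic
    descriptors of the music; neural_response: (n_timepoints,) measured signal.
    Returns the fitted model and its prediction accuracy (correlation) on
    held-out data, i.e. the test against independent unseen data that the
    abstract describes for predictive models.
    """
    # Hold out the final quarter of timepoints as unseen data (no shuffling,
    # to respect the temporal structure of the recording).
    X_train, X_test, y_train, y_test = train_test_split(
        stimulus_features, neural_response, test_size=0.25, shuffle=False)
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    r = np.corrcoef(model.predict(X_test), y_test)[0, 1]
    return model, r
```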