
    Learning temporal clusters using capsule routing for speech emotion recognition

    Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the emotion information encoded in the signal while being as insensitive as possible to other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory network (BLSTM), a CNN and Capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward/backward contextual information, while the CNN, together with the dynamic routing of the Capsule network, learns temporal clusters, which altogether provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.
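
    As a concrete but hedged illustration of the kind of pipeline described above, the sketch below combines a BLSTM front end, a 1-D convolution whose channels are grouped into primary capsules, and dynamic routing by agreement to emotion-class capsules whose lengths act as class scores. The layer sizes, the pooling to a fixed number of time steps and the 4-class output are assumptions for illustration, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    # Capsule non-linearity: preserves direction, bounds the vector length in [0, 1).
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)


class BLSTMCapsuleSER(nn.Module):
    def __init__(self, n_feats=40, n_classes=4, prim_caps=8, prim_dim=8,
                 pooled_t=16, out_dim=16, routing_iters=3):
        super().__init__()
        self.blstm = nn.LSTM(n_feats, 64, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(128, prim_caps * prim_dim, kernel_size=5, stride=2)
        self.pool = nn.AdaptiveAvgPool1d(pooled_t)  # fixed number of time steps (assumption)
        self.prim_dim, self.n_classes = prim_dim, n_classes
        self.routing_iters = routing_iters
        n_prim = prim_caps * pooled_t
        # Transformation matrices from primary capsules to class capsules.
        self.W = nn.Parameter(0.01 * torch.randn(n_prim, n_classes, out_dim, prim_dim))

    def forward(self, x):                              # x: (batch, time, n_feats)
        h, _ = self.blstm(x)                           # (batch, time, 128)
        c = self.pool(self.conv(h.transpose(1, 2)))    # (batch, prim_caps*prim_dim, pooled_t)
        b = c.shape[0]
        u = c.view(b, -1, self.prim_dim, c.shape[-1]).permute(0, 3, 1, 2)
        u = squash(u.reshape(b, -1, self.prim_dim))    # (batch, n_prim, prim_dim)
        u_hat = torch.einsum('ncop,bnp->bnco', self.W, u)
        logits = torch.zeros(b, u.shape[1], self.n_classes, device=x.device)
        for _ in range(self.routing_iters):            # dynamic routing by agreement
            coupling = F.softmax(logits, dim=2)
            v = squash((coupling.unsqueeze(-1) * u_hat).sum(dim=1))
            logits = logits + (u_hat * v.unsqueeze(1)).sum(-1)
        return v.norm(dim=-1)                          # capsule lengths act as class scores


if __name__ == "__main__":
    scores = BLSTMCapsuleSER()(torch.randn(2, 300, 40))  # 2 utterances, 300 frames, 40-dim features
    print(scores.shape)                                  # torch.Size([2, 4])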

    Empirical interpretation of speech emotion perception with attention-based model for speech emotion recognition

    Speech emotion recognition is essential for obtaining emotional intelligence which affects the understanding of context and meaning of speech. Harmonically structured vowel and consonant sounds add indexical and linguistic cues to spoken information. Previous research has debated whether vowel sound cues are more important in carrying the emotional context from a psychological and linguistic point of view. Other research has also claimed that emotion information can exist in small overlapping acoustic cues. However, these claims are not corroborated in computational speech emotion recognition systems. In this research, a convolution-based model and a long short-term memory-based model, both using attention, are applied to investigate these theories of speech emotion on computational models. The role of acoustic context and word importance is demonstrated for the task of speech emotion recognition. The proposed models are evaluated on the IEMOCAP corpus, and 80.1% unweighted accuracy is achieved on pure acoustic data, which is higher than that of current state-of-the-art models on this task. The phones and words are mapped to the attention vectors, and it is observed that vowel sounds are more important for defining emotion acoustic cues than consonants, and that the model can assign word importance based on acoustic context.
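
    As a rough illustration of the attention analysis described above, the sketch below pools BLSTM frame outputs with a learned attention weight per frame; the per-frame weights are the quantities that could subsequently be aligned with phone and word boundaries. Layer sizes and the 4-class output are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveSER(nn.Module):
    def __init__(self, n_feats=40, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)        # one attention score per frame
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                             # x: (batch, time, n_feats)
        h, _ = self.lstm(x)                           # (batch, time, 2*hidden)
        alpha = F.softmax(self.score(h), dim=1)       # (batch, time, 1) attention weights
        context = (alpha * h).sum(dim=1)              # attention-weighted sum over time
        return self.out(context), alpha.squeeze(-1)   # class logits and per-frame weights


utt = torch.randn(1, 500, 40)                         # one utterance, 500 frames
logits, weights = AttentiveSER()(utt)                 # weights can be aligned with phones/words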

    Population parameters of Rastrelliger kanagurta (Cuvier, 1816) in the Marudu Bay, Sabah, Malaysia

    An investigation of the population parameters of Indian mackerel, Rastrelliger kanagurta (Cuvier, 1816) in the Marudu Bay, Sabah, Malaysia was carried out from January to September 2013. The relationship between total length and body weight was estimated as W=0.006TL^3.215 or log W = 3.215 log TL − 2.22 (R^2=0.946). Monthly length frequency data of R. kanagurta were analyzed with the FiSAT software to evaluate the mortality rates and the exploitation level. Asymptotic length (L∞) and growth coefficient (K) were estimated at 27.83 cm and 1.50 yr^-1, respectively. The growth performance index (φ') was calculated as 3.07. Total mortality (Z), natural mortality (M) and fishing mortality (F) were calculated at 4.44 yr^-1, 2.46 yr^-1 and 1.98 yr^-1, respectively. The exploitation level (E) of R. kanagurta was found to be 0.45, which is below the optimum level of exploitation (E=0.50), indicating that the stock of R. kanagurta is still underexploited in Marudu Bay.
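
    The derived quantities reported above follow from standard length-based stock assessment relationships, as the short check below illustrates. The inputs are the values quoted in the abstract; the 20 cm length in the length-weight example is an arbitrary illustrative choice, and weight is assumed to be in grams with length in cm, as is typical for such fits.

import math

a, b = 0.006, 3.215            # length-weight relationship W = a * TL**b
L_inf, K = 27.83, 1.50         # asymptotic length (cm) and growth coefficient (yr^-1)
M, F = 2.46, 1.98              # natural and fishing mortality (yr^-1)

W_20cm = a * 20 ** b                          # predicted weight of a 20 cm fish (illustrative)
phi = math.log10(K) + 2 * math.log10(L_inf)   # growth performance index phi'
Z = M + F                                     # total mortality
E = F / Z                                     # exploitation level

print(f"W(20 cm) ~ {W_20cm:.0f} g")           # ~91 g under the stated unit assumption
print(f"phi' = {phi:.2f}, Z = {Z:.2f} yr^-1, E = {E:.2f}")   # 3.07, 4.44, 0.45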

    Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

    Speech emotion recognition is essential for obtaining emotional intelligence which affects the understanding of context and meaning of speech. The fundamental challenge of speech emotion recognition from a machine learning standpoint is to extract patterns which carry maximum correlation with the emotion information encoded in the signal while being as insensitive as possible to other types of information carried by speech. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework includes a mixture of multi-view attention smoothing and high-dimensional feature projection for context expansion and learning feature representations. The framework is designed to be robust to changes in speaker and other distortions, and it provides state-of-the-art results for speech emotion recognition. Performance of the proposed approach is compared with a wide range of current architectures in a standard 4-class classification task on the widely used IEMOCAP corpus. A significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors have been aligned with the input segments and plotted at two different attention levels to demonstrate their effectiveness.
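
    The sketch below illustrates one plausible reading of this idea: several attention "views" computed over a recurrent encoder with a residual connection, mixed by learned weights into a single context vector. The number of views, the mixing scheme and the layer sizes are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualMultiViewAttention(nn.Module):
    def __init__(self, n_feats=40, hidden=64, n_views=4, n_classes=4):
        super().__init__()
        self.proj = nn.Linear(n_feats, 2 * hidden)            # match residual dimensions
        self.rnn = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.scores = nn.Linear(2 * hidden, n_views)           # one score per frame per view
        self.mix = nn.Parameter(torch.zeros(n_views))          # learned mixture over views
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                                      # x: (batch, time, n_feats)
        p = self.proj(x)
        h, _ = self.rnn(p)
        h = h + p                                              # residual connection
        alpha = F.softmax(self.scores(h), dim=1)               # (batch, time, n_views)
        views = torch.einsum('btv,bth->bvh', alpha, h)         # one context vector per view
        w = F.softmax(self.mix, dim=0)                         # mixture weights over the views
        context = torch.einsum('v,bvh->bh', w, views)          # smoothed, mixed context
        return self.out(context)


print(ResidualMultiViewAttention()(torch.randn(2, 300, 40)).shape)   # torch.Size([2, 4])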

    American sign language posture understanding with deep neural networks

    Sign language is a visually oriented, natural, nonverbal communication medium. Sharing similar linguistic properties with its respective spoken language, it consists of a set of gestures, postures and facial expressions. Though sign language is a mode of communication for deaf people, most other people cannot interpret it. Therefore, it would be constructive if we could translate sign postures artificially. In this paper, a capsule-based deep neural network sign posture translator for American Sign Language (ASL) fingerspelling (postures) is presented. The performance validation shows that the approach can successfully identify sign language with an accuracy of about 99%. Unlike previous neural network approaches, which mainly used fine-tuning and transfer learning from pre-trained models, the developed capsule network architecture does not require a pre-trained model. The framework uses a capsule network with adaptive pooling, which is the key to its high accuracy. The framework is not limited to sign language understanding; it also has scope for non-verbal communication in Human-Robot Interaction (HRI).
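
    The adaptive-pooling idea mentioned above can be illustrated with a small convolutional front end whose output is pooled to a fixed spatial grid, so the downstream capsule layers always see the same number of primary capsules regardless of the input image resolution. The filter counts and grid size below are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

front_end = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((6, 6)),                # fixed 6x6 grid whatever the input size
)

for size in (64, 96, 128):                       # different hand-image resolutions
    out = front_end(torch.randn(1, 3, size, size))
    print(size, tuple(out.shape))                # always (1, 64, 6, 6)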

    Measurement of light output of NE213 and NE102A detectors for 2.7-14.5 MeV neutrons

    The light output of 125-mm-diameter NE213 and NE102A detectors has been measured for neutron energies ranging from 2.7 to 14.5 MeV. For neutron energies below 6.14 MeV, measurements were carried out using the neutron time-of-flight spectrum from an Am-Be neutron source, while for proton energies above 6.14 MeV, measurements were carried out using neutrons produced from the T(d,n) reaction. For the NE102A detector the measured light output is in good agreement with the data of R.A. Cecil et al. (1979), but for the NE213 detector the light output is 2-15% lower than that reported for a similar detector. The NE213 detector light output agrees with the data of V. Verbinski et al. (1968).
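
    The abstract refers to a neutron time-of-flight measurement. As a hedged illustration only (not the authors' analysis), the non-relativistic kinetic energy of a neutron can be recovered from its flight time over a known path length via E = 1/2 m v^2; the path length and flight time below are example numbers.

NEUTRON_MASS_MEV = 939.565          # neutron rest mass energy, MeV
C_M_PER_NS = 0.299792458            # speed of light, m/ns


def neutron_energy_mev(path_m: float, tof_ns: float) -> float:
    """Non-relativistic kinetic energy E = 1/2 m v^2, expressed in MeV."""
    beta = path_m / (tof_ns * C_M_PER_NS)   # v/c
    return 0.5 * NEUTRON_MASS_MEV * beta ** 2


print(round(neutron_energy_mev(2.0, 60.0), 2))   # ~5.81 MeV for these example values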