    Comparison of five classifiers for classification of syllables sound using time-frequency features

    In a speech recognition and classification system, determining a suitable and reliable classifier is essential for obtaining optimal classification results. This paper presents Indonesian syllable sound classification with a C4.5 decision tree, a Naive Bayes classifier, a Sequential Minimal Optimization (SMO) algorithm, a Random Forest, and a Multi-Layer Perceptron (MLP) for classifying twelve classes of syllables. The research applies five different feature sets: the combination of Discrete Wavelet Transform (DWT) and statistical features, denoted WS; the Renyi Entropy (RE) features; the combination of Autoregressive Power Spectral Density (AR-PSD) and statistical features, denoted PSDS; the combination of PSDS and the RE features selected by Correlation-Based Feature Selection (CFS), denoted RPSDS; and the combination of DWT, RE, and AR-PSD, denoted WRPSDS. The results show that the MLP classifier achieves the highest performance when combined with WRPSDS.
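
    As a rough, hedged illustration of the kind of pipeline this abstract describes (not the authors' code), the sketch below computes statistical summaries of DWT sub-band coefficients (WS-style features) and cross-validates five scikit-learn classifiers standing in for the five compared in the paper. The wavelet settings, statistics, and the variables `syllable_signals` and `labels` are assumptions.

```python
# Sketch: DWT + statistical features, compared across five classifiers.
# `syllable_signals` (list of 1-D waveforms) and `labels` (twelve syllable
# classes) are assumed to exist; all other choices are illustrative.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def dwt_statistical_features(signal, wavelet="db4", level=4):
    """Mean, standard deviation, and energy of each DWT sub-band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for band in coeffs:
        feats.extend([band.mean(), band.std(), np.sum(band ** 2)])
    return np.array(feats)

X = np.array([dwt_statistical_features(s) for s in syllable_signals])
y = np.array(labels)

classifiers = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
    "Naive Bayes": GaussianNB(),
    "SVM (SMO-style solver)": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000),
}
for name, clf in classifiers.items():
    print(f"{name}: {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```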

    Emotion recognition based on the energy distribution of plosive syllables

    Speech emotion recognition (SER) usually faces two problems, expression and perception, which vary considerably between speakers, languages, and sentence pronunciation. Finding an optimal system that characterizes emotions despite all these differences is therefore a promising prospect. With this in mind, we considered two emotional databases: the Moroccan Arabic dialect emotional database (MADED) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), which differ notably in type (natural/acted) and language (Arabic/English). We proposed a detection process based on 27 acoustic features extracted from the consonant-vowel (CV) syllabic units /ba/, /du/, /ki/, and /ta/, which are common to both databases. We tested two classification strategies: multiclass (all emotions combined: joy, sadness, neutral, anger) and binary (neutral vs. others; positive emotions (joy) vs. negative emotions (sadness, anger); sadness vs. anger). These strategies were tested three times: i) on MADED, ii) on RAVDESS, and iii) on MADED and RAVDESS combined. The proposed method gave better recognition accuracy for binary classification. The rates reach an average of 78% for multiclass classification, 100% for neutral vs. others, 100% for the negative emotions (i.e., anger vs. sadness), and 96% for positive vs. negative emotions.
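
    The binary strategies described above amount to relabeling (and sometimes filtering) the four-emotion label set. The snippet below is a minimal sketch of that remapping, not the authors' implementation; the helper name and label strings are illustrative.

```python
# Map four-class emotion labels onto one of the three binary tasks described
# in the abstract. Returns the kept row indices so the caller can subset the
# matching feature matrix. Names and conventions are illustrative assumptions.
def make_binary_task(labels, task):
    if task == "neutral_vs_others":
        keep = list(range(len(labels)))
        mapped = ["neutral" if labels[i] == "neutral" else "other" for i in keep]
    elif task == "positive_vs_negative":
        keep = [i for i, e in enumerate(labels) if e != "neutral"]
        mapped = ["positive" if labels[i] == "joy" else "negative" for i in keep]
    elif task == "sadness_vs_anger":
        keep = [i for i, e in enumerate(labels) if e in ("sadness", "anger")]
        mapped = [labels[i] for i in keep]
    else:
        raise ValueError(f"unknown task: {task}")
    return keep, mapped

# Usage sketch: idx, y_bin = make_binary_task(all_labels, "positive_vs_negative")
#               X_bin = X[idx]   # then train any binary classifier
```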

    Deep learning as a tool for neural data analysis: Speech classification and cross-frequency coupling in human sensorimotor cortex.

    A fundamental challenge in neuroscience is to understand what structure in the world is represented in spatially distributed patterns of neural activity from multiple single-trial measurements. This is often accomplished by learning simple, linear transformations between neural features and features of the sensory stimuli or motor task. While successful in some early sensory processing areas, linear mappings are unlikely to be ideal tools for elucidating the nonlinear, hierarchical representations of higher-order brain areas during complex tasks, such as the production of speech by humans. Here, we apply deep networks to predict produced speech syllables from a dataset of high-gamma cortical surface electric potentials recorded from human sensorimotor cortex. We find that deep networks had higher decoding prediction accuracy than baseline models. Having established that deep networks extract more task-relevant information from neural data than linear models (i.e., higher predictive accuracy), we next sought to demonstrate their utility as a data analysis tool for neuroscience. We first show that the deep networks' confusions revealed hierarchical latent structure in the neural data, which recapitulated the underlying articulatory nature of speech motor control. We next broadened the frequency features beyond high gamma and identified a novel high-gamma-to-beta coupling during speech production. Finally, we used deep networks to compare task-relevant information in different neural frequency bands and found that the high-gamma band contains the vast majority of information relevant to the speech prediction task, with little to no additional contribution from lower-frequency amplitudes. Together, these results demonstrate the utility of deep networks as a data analysis tool for basic and applied neuroscience.
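
    As a minimal sketch of the kind of nonlinear decoder contrasted with linear baselines here, assuming flattened electrode-by-time high-gamma features per trial (`X_train`) and integer syllable labels (`y_train`), the PyTorch snippet below trains a small fully connected classifier. Layer sizes, training settings, and variable names are assumptions, not the authors' architecture.

```python
# Small fully connected syllable decoder over flattened high-gamma features.
# `X_train` (float tensor, trials x features) and `y_train` (long tensor of
# class indices) are assumed to exist; hyperparameters are illustrative.
import torch
from torch import nn

n_features = X_train.shape[1]        # electrodes x time bins, flattened
n_syllables = int(y_train.max()) + 1

decoder = nn.Sequential(
    nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, n_syllables),
)

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(decoder(X_train), y_train)
    loss.backward()
    optimizer.step()

# The study's analysis relies on the decoder's confusion structure (which
# syllables it mixes up), not only its raw accuracy.
```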

    Analysis of momentous fragmentary formants in Talaqi-like neoteric assessment of Quran recitation using MFCC miniature features of Quranic syllables

    The use of speech recognition systems, with a variety of approaches and techniques, has grown rapidly in human-machine interaction applications. Building on this, a computerized assessment system that identifies errors in reading the Qur’an can be developed to exploit the advantages of today’s technology. Based on Quranic syllable utterances, which carry Tajweed rules generally consisting of Makhraj (articulation process), Sifaat (letter features or pronunciation), and Harakat (pronunciation extension), this paper presents the technological capabilities for realizing Quranic recitation assessment. The main focus of this paper is the transformation of the digital signal of the Quranic voice together with the identification of reading errors (based on the rules of Tajweed). This involves several processing stages related to the representation of the Quranic syllable-based Recitation Speech Signal (QRSS), feature extraction, the non-phonetic-transcription Quranic Recitation Acoustic Model (QRAM), and threshold classification. MFCC-Formant features are used in a miniature (reduced) form, hybridized with three frequency bands, to represent QRSS combined vowels and consonants. A human-guided threshold classification approach is used to assess recitation based on Quranic syllables, with threshold classification performance for the low, medium, and high band groups of 87.27%, 86.86%, and 86.33%, respectively.
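
    The sketch below is an assumption-laden illustration, not the authors' system: it extracts a compact (mean) MFCC summary of a syllable utterance with librosa and applies a simple human-set threshold rule of the kind the abstract describes. The reference template, distance measure, and threshold values are hypothetical.

```python
# Illustrative MFCC-based threshold check for one Quranic syllable utterance.
# The reference profile and threshold stand in for the human-guided values the
# paper describes; they are not taken from the study.
import librosa
import numpy as np

def mfcc_profile(path, n_mfcc=13):
    """Mean MFCC vector over the utterance (a 'miniature' summary)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def assess(utterance_path, reference_profile, threshold):
    """Flag a possible recitation error when the distance to the reference
    profile of the same syllable exceeds a human-set threshold."""
    dist = np.linalg.norm(mfcc_profile(utterance_path) - reference_profile)
    return "possible error" if dist > threshold else "acceptable"
```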

    Classification of stress based on speech features

    Contemporary life is filled with challenges, hassles, deadlines, disappointments, and endless demands, the consequence of which can be stress. Stress has become a global phenomenon experienced in modern daily life, and it can play a significant role in psychological and/or behavioural disorders such as anxiety or depression. Hence, early detection of the signs and symptoms of stress helps reduce its harmful effects and the high cost of stress management. This research work therefore presents an Automatic Speech Recognition (ASR) technique for stress detection as an alternative to approaches such as chemical analysis, skin conductance, and electrocardiograms, which are obtrusive, intrusive, and costly. Two sets of voice data were recorded from ten Arab students at Universiti Utara Malaysia (UUM) in neutral and stressed modes. Speech features of fundamental frequency (f0), formants (F1, F2, and F3), energy, and Mel-Frequency Cepstral Coefficients (MFCC) were extracted and classified by K-nearest neighbour, Linear Discriminant Analysis, and an Artificial Neural Network. Results from the average fundamental frequency reveal that stress is highly correlated with an increase in fundamental frequency. Of the three classifiers, K-nearest neighbour (KNN) performed best, followed by linear discriminant analysis (LDA), while the artificial neural network (ANN) showed the weakest performance. Stress level classification into low, medium, and high was done based on the KNN classification results. This research shows the viability of ASR as a better means of stress detection and classification.
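
    As a hedged sketch of the comparison this abstract reports (not the study's code), the snippet below extracts mean f0 with librosa's pYIN tracker and cross-validates the three classifiers on a precomputed feature matrix. The feature matrix `X`, binary labels `y` (neutral/stressed), and all hyperparameters are assumptions.

```python
# Illustrative f0 extraction and three-classifier comparison for stress
# detection. `X` (rows of f0/formant/energy/MFCC features) and `y`
# (neutral vs. stressed) are assumed to exist.
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

def mean_f0(path):
    """Average fundamental frequency of an utterance via the pYIN tracker."""
    y, sr = librosa.load(path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"), sr=sr)
    return np.nanmean(f0[voiced]) if np.any(voiced) else np.nan

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```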