
    Kernel SVM Classifiers based on Fractal Analysis for Estimation of Hearing Loss

    Hearing screening consists of analyzing an individual's hearing capacity, regardless of age. It identifies serious hearing problems; the degree, type, and cause of the hearing loss; and the person's needs, in order to propose a solution. Auditory evoked potentials (AEPs), detected over the auditory cortex area of the EEG, are very small signals produced in response to a sound (or electric) stimulus travelling from the inner ear to the primary auditory areas of the brain. AEPs are a noninvasive means of detecting hearing disorders and estimating hearing threshold levels. In this paper, owing to the nonlinear characteristics of the EEG, Detrended Fluctuation Analysis (DFA) is used to characterize the irregularity, or complexity, of EEG signals by calculating the Fractal Dimension (FD) from AEP signals recorded from hearing-impaired and normal subjects, in order to estimate their hearing thresholds. To classify the two groups, hearing-impaired and normal persons, a support vector machine (SVM) is used. To compare the performance of the SVM classifier, three kernel functions — no: linear, radial basis function (RBF), and polynomial — are employed to distinguish normal from abnormal hearing subjects. A grid-search technique is used to estimate the optimal kernel parameters. Our results indicate that the RBF-kernel SVM classifier is promising: it achieves high training as well as testing classification accuracy.
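    The classification stage described in the abstract — an RBF-kernel SVM whose parameters are chosen by grid search — can be sketched as follows. This is a minimal illustration, assuming hypothetical fractal-dimension feature vectors as stand-ins for the paper's real AEP-derived features; the class separation is synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical feature vectors: one fractal dimension per EEG channel.
normal = rng.normal(loc=1.4, scale=0.1, size=(60, 4))    # normal-hearing group
impaired = rng.normal(loc=1.7, scale=0.1, size=(60, 4))  # hearing-impaired group
X = np.vstack([normal, impaired])
y = np.array([0] * 60 + [1] * 60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Grid search over the RBF kernel parameters C and gamma, as in the paper.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X_tr, y_tr)
test_acc = grid.score(X_te, y_te)
```

    The same grid could include the linear and polynomial kernels for the comparison the paper describes, by adding `kernel` to `param_grid`.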

    An Acoustic Study of the Emphatic Occlusive [ṭ] in School-Going Children with Cleft Palate or Cleft Lip

    The aim of this acoustic study is to analyse the phoneme [ṭ] produced by schoolchildren surgically operated on for cleft palate or cleft lip, in order to examine their vocal characteristics, to provide speech therapists with numerous concrete analyses of voice and speech, to support them effectively, and to prevent serious consequences for the children's psychological and academic development. The motivation for this study stemmed mainly from the difficulties that Algerian schoolchildren with clefts encounter in pronouncing this phoneme. To carry out the study, several acoustic parameters were investigated: the fundamental frequency F0, the first three formants F1, F2, and F3, the energy E0, the Voice Onset Time (VOT), and the durations [CV] and [V] of the subsequent vowel [a]. For the analysis, further parameters important in the field of pathological speech were deployed, namely the degree of disturbance of F0 (jitter), the degree of disturbance of intensity (shimmer), and the Harmonics-to-Noise Ratio (HNR). Results revealed disturbance in the values of F1, F2, and F3 and stability in the values of F0. Another important reported finding is an increase in VOT, due to the difficulty of controlling the successive closure and release of the plosives.
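    Two of the perturbation measures named above, local jitter and local shimmer, can be sketched as simple cycle-to-cycle instability ratios. The period and amplitude values below are illustrative, not measurements from the study's corpus, and the function names are our own.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Example: pitch periods (seconds) and peak amplitudes for a short voiced stretch.
T = [0.0080, 0.0082, 0.0079, 0.0081, 0.0080]
A = [0.71, 0.69, 0.72, 0.70, 0.71]
jit = local_jitter(T)    # relative period instability (dimensionless)
shim = local_shimmer(A)  # relative amplitude instability (dimensionless)
```

    In practice, tools such as Praat report these measures (often as percentages) directly from the extracted pitch pulses.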

    Objective Evaluation of the Pathological Voice Based on Deep Learning Neural Networks in an Algerian Hospital Environment

    In this study, we propose a method based on Recurrent Neural Networks to objectively evaluate the rehabilitation of the pathological voice in an Algerian clinical environment. We chose Unilateral Laryngeal Paralysis as the voice pathology. We used a deep-learning pathological-voice detection system based on the Long Short-Term Memory (LSTM) neural model. As the dysphonia studied in our work essentially concerns laryngeal vibration, we chose acoustic parameters based on the instability of the frequency and amplitude of the laryngeal vibration — jitter and shimmer — together with noise parameters and Mel-Frequency Cepstral Coefficients (MFCC). A pathological-voice detection rate of 88.65% demonstrates the benefit brought by the rehabilitation technique adopted in the Algerian clinical setting. The exclusive and excessive reliance on hearing to evaluate the effect of speech rehabilitation in the Algerian hospital environment remains insufficient. It is important to correlate perceptual data with objective detection and classification methods, using relevant acoustic parameters, for effective and objective assessment of vocal pathology.
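    The LSTM model at the heart of the detector can be made concrete with a single cell step written out in NumPy, showing the gating that lets the network track frame-to-frame instability in jitter/shimmer/MFCC sequences. The weights here are random placeholders, not the trained model, and the dimensions (13 MFCCs, 8 hidden units) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell state
    c_new = f * c + i * g      # cell state mixes old memory and new input
    h_new = o * np.tanh(c_new) # hidden state exposed to the next layer
    return h_new, c_new

D, H = 13, 8                       # e.g. 13 MFCCs per frame, 8 hidden units
W = rng.normal(size=(4 * H, D)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
frames = rng.normal(size=(20, D))  # a 20-frame feature sequence
for x in frames:
    h, c = lstm_step(x, h, c, W, U, b)
# In a detector like the one described, a dense sigmoid layer on the final
# state h would produce the pathological/normal decision.
```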

    Conditional Random Fields Applied to Arabic Orthographic-Phonetic Transcription

    Orthographic-to-Phonetic (O2P) transcription is the process of learning the relationship between a written word and its phonetic transcription. It is a necessary part of Text-To-Speech (TTS) systems and plays an important role in handling Out-Of-Vocabulary (OOV) words in automatic speech recognition systems. O2P is a complex task because, for many languages, the correspondence between the orthography and its phonetic transcription is not completely consistent. Over time, the techniques used to tackle this problem have evolved from earlier rule-based systems to today's more sophisticated machine-learning approaches. In this paper, we propose an approach to Arabic O2P conversion based on a probabilistic method: Conditional Random Fields (CRF). We discuss the results of experiments with this method applied to a pronunciation dictionary of the most commonly used Arabic words, a database we call MCAW-Dic. MCAW-Dic contains over 35,000 words in Modern Standard Arabic (MSA) with their pronunciations; we developed it ourselves, assisted by phoneticians and linguists from the University of Tlemcen. The results achieved are very satisfactory and point the way towards future work: in all our tests, the Phoneme Error Rate was between 11 and 15%. We could improve this result by including a larger context, but in that case we encountered memory limitations and computational difficulties.
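    A linear-chain CRF for O2P labels each character of the written word with a phoneme, using features drawn from a context window around that character. The sketch below shows such a feature extractor; the feature names and the ±1 window size are illustrative assumptions, not the paper's exact feature set.

```python
def char_features(word, i):
    """Features for character i of `word`: identity plus a +/-1 context window."""
    return {
        "char": word[i],
        "prev": word[i - 1] if i > 0 else "<BOS>",   # beginning-of-sequence marker
        "next": word[i + 1] if i < len(word) - 1 else "<EOS>",
        "is_first": i == 0,
        "is_last": i == len(word) - 1,
    }

def word_features(word):
    """One feature dict per character, in sequence order."""
    return [char_features(word, i) for i in range(len(word))]

# Each character sequence would be paired with its phoneme label sequence
# from the pronunciation dictionary and fed to a CRF trainer (e.g. CRFsuite).
feats = word_features("kitab")
```

    Widening the context window (e.g. ±2 or ±3 characters) is the "larger context" trade-off the abstract mentions: richer features, but a much larger model.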

    Sound Recordings of Speakers with Cleft Lip and Palate

    Our corpus consists of six words containing the emphatic phoneme [ṭ] at the beginning of the word, followed by the short vowel [a] or the long vowel [ā]: [ṭabība] (طبيبة); [ṭarīq] (طريق); [ṭamāṭim] (طماطم); [ṭāwila] (طاولة); [ṭā’ira] (طائرة); [ṭāwūs] (طاووس). Each word of the corpus was repeated at least three times by each speaker during the recording. Twenty-eight (28) pathological speakers with different facial clefts and thirty-eight (38) control speakers participated in the recordings. Pathological speakers (Ps) were selected by speech therapists on the basis of their progress in rehabilitation. These Ps are divided into two groups: seventeen (17) speakers still undergoing speech therapy and eleven (11) who have completed it. Recall that the cleft classification we retained is that of V. Veau (Fezari et al., 2014). We studied three types of cleft palate: clefts of the soft and hard palate, up to the incisive foramen (09 Ps); clefts of the soft and hard palate extending unilaterally through the alveolus (03 Ps); and clefts of the soft and hard palate extending bilaterally through the alveolus (16 Ps). The recordings of the control speakers were made under two conditions: pupils aged 5 to 11, from "the 1st November, 1954" school in Hammamet, Algiers, were recorded in a small quiet room, with authorization from the school administration and the children's parents; other speakers of different ages, accompanied by their parents, were recorded in a quiet room at the University of Algiers 2, under the same recording conditions. For a good-quality recording — faithful and with little parasitic noise — we used the TASCAM DR-05, a portable recorder that captures audio sequences in 16-bit (.wav) format at a sampling frequency of 44,100 Hz (44.1 kHz).
We manually segmented the recorded sound files using the Praat analysis tool.

Usage Notes
Our audio sequences are recorded in 16-bit (.wav) format at a sampling frequency of 44,100 Hz (44.1 kHz). Each file name is coded on nine characters:
1: speech type; 2, 3, 4: speaker code; 5: gender code; 6, 7, 8: spoken-word code; 9: repetition number.
Speech type takes one of four values:
1: normal speech (healthy); 2: cleft type 2 (clefts of the soft and hard palate, up to the incisive foramen); 3: cleft type 3 (clefts of the soft and hard palate extending unilaterally through the alveolus); 4: cleft type 4 (clefts of the soft and hard palate extending bilaterally through the alveolus).
Spoken-word codes for the six corpus words:
[ṭabība]: 171; [ṭarīq]: 197; [ṭamāṭim]: 022; [ṭāwila]: 142; [ṭā’ira]: 004; [ṭāwūs]: 077.
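    The file-name code enumerated above (speech type, three-digit speaker code, gender code, three-digit word code, repetition number) can be decoded mechanically. The word and speech-type tables below are taken from the listing; the function name and the example file name are our own hypothetical choices.

```python
# Code tables from the dataset listing.
WORD_CODES = {"171": "[ṭabība]", "197": "[ṭarīq]", "022": "[ṭamāṭim]",
              "142": "[ṭāwila]", "004": "[ṭā’ira]", "077": "[ṭāwūs]"}
SPEECH_TYPES = {"1": "normal speech", "2": "cleft type 2",
                "3": "cleft type 3", "4": "cleft type 4"}

def decode_filename(code):
    """Split a nine-character code: type | speaker (3) | gender | word (3) | repetition."""
    assert len(code) == 9, "expected a nine-character code"
    return {
        "speech_type": SPEECH_TYPES[code[0]],  # position 1
        "speaker": code[1:4],                  # positions 2-4
        "gender": code[4],                     # position 5
        "word": WORD_CODES[code[5:8]],         # positions 6-8
        "repetition": int(code[8]),            # position 9
    }

info = decode_filename("201311713")  # hypothetical file name, for illustration
```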