33 research outputs found

    Acoustic Approaches to Gender and Accent Identification

    Get PDF
    There has been considerable research on the problems of speaker and language recognition from samples of speech. A less researched problem is that of accent recognition. Although this is similar to language identification, different accents of a language exhibit more fine-grained differences between classes than languages do, which presents a tougher problem for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions of state-of-the-art algorithms, and they result in enhanced performance on gender and accent recognition. The first part of the thesis focuses on the problem of gender identification and presents a technique that gives improved performance in situations where training and test conditions are mismatched. The bulk of the thesis is concerned with the application of the i-Vector technique to accent identification; i-Vectors are the most successful approach to acoustic classification to have emerged in recent years. We show that it is possible to achieve high-accuracy accent identification without reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis describes various stages in the development of i-Vector based accent classification that improve on the standard approaches usually applied for speaker or language identification, which are insufficient for this task. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, frontend parameters, i-Vector configuration parameters, and an optimised fusion of the i-Vector classifiers that can be obtained from the same data. We claim to have achieved the best accent identification performance on the test corpus for acoustic methods, with up to a 90% identification rate.
This performance is even better than that of previously reported acoustic-phonotactic systems on the same corpus, and is very close to the performance obtained via transcription-based accent identification. Finally, we demonstrate that applying our techniques to speech recognition leads to considerably lower word error rates. Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition
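The "optimised fusion of the resulting i-Vector classifiers" mentioned in the abstract is typically done at the score level. A minimal numpy sketch of linear score-level fusion (the score matrices and weights here are illustrative assumptions, not values from the thesis):

```python
import numpy as np

def fuse_scores(score_mats, weights):
    """Linear score-level fusion: weighted sum of per-system score matrices.

    score_mats : list of (n_utterances, n_accents) arrays, one per classifier
    weights    : one non-negative weight per classifier
    Returns the index of the top-scoring accent class for each utterance.
    """
    fused = sum(w * s for w, s in zip(weights, score_mats))
    return fused.argmax(axis=1)

# Toy example: two classifiers, 3 utterances, 4 accent classes.
sys_a = np.array([[0.9, 0.1, 0.0, 0.0],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.4, 0.4]])
sys_b = np.array([[0.6, 0.3, 0.1, 0.0],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.0, 0.1, 0.2, 0.7]])
pred = fuse_scores([sys_a, sys_b], weights=[0.4, 0.6])
print(pred)  # fused class decision per utterance
```

In practice the weights are tuned on a held-out development set rather than fixed by hand.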

    A Contrastive Study Between RP And GA Segmental Features

    Get PDF
    Aulianisa Netasya Salam, Faculty of Teacher Training and Education, Muhammadiyah University of Surakarta, [email protected]; Dr. Maryadi, M.A., Faculty of Teacher Training and Education, Muhammadiyah University of Surakarta, [email protected]. This research is a contrastive study aimed at describing the similarities and differences between RP (Received Pronunciation) and GA (General American) segmental features. The research used a descriptive-qualitative method, collecting data from YouTube videos. The study found that the sounds shared by RP and GA in initial, medial, and final positions are [ɪ], [ə], [eɪ], [ɔɪ], [p], [b], [t], [d], [tʃ], [θ], [g], [f], [v], [s], [z], [ʃ], [m], [n], [l]. Shared sounds found in initial and medial positions are [æ], [tʃ], [dʒ], [ð], [h], [w], [j]; in medial and final positions, [aɪ], [k], [ʒ], [ŋ]; in initial position, [r]; and in medial position, [ʊ], [ʌ], [ɛ]. The sounds that differ between RP and GA in initial and medial positions are [ɔ], [ɑː]; in medial and final positions, [ɪə], [əʊ]; in initial position, [ʌ], [eə]; and in medial position, [ɒ], [iː], [uː], [ɔː], [ʊə], [t]

    AUTOMATIC IDENTIFICATION OF VIETNAMESE DIALECTS

    Get PDF
    Dialect identification has been studied for many languages around the world; nevertheless, research on signal processing for Vietnamese dialects is still limited, and few works have been published. Vietnamese has many different dialects, and the influence of dialectal features on speech recognition systems is important: if dialect information is known during the recognition process, the performance of recognition systems improves, because the corpora of these systems are normally organized by dialect. This paper presents the combination of MFCC coefficients and fundamental frequency features for Vietnamese dialect identification based on GMMs. Experimental results on a Vietnamese dialect corpus show that identification performance increases from 59% when using only MFCC coefficients to 71% when using MFCC coefficients together with fundamental frequency information
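The GMM-based decision rule described above can be sketched in miniature: fit one Gaussian density per dialect over feature frames and classify an utterance by the higher average log-likelihood. This sketch uses a single diagonal-covariance Gaussian per class (a one-component "GMM") and synthetic frames standing in for MFCC + F0 features; the dialect names and distributions are illustrative assumptions:

```python
import numpy as np

def fit_diag_gaussian(X):
    # One diagonal-covariance Gaussian per dialect (a 1-component "GMM").
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6  # variance floor to avoid division by zero
    return mu, var

def avg_loglik(X, mu, var):
    # Mean per-frame log-likelihood of frames X under N(mu, diag(var)).
    return (-0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)).sum(axis=1).mean()

rng = np.random.default_rng(0)
# Toy "MFCC + F0" frames: 3 cepstral dims plus 1 pitch dim per frame.
north = rng.normal([0, 0, 0, 120], 1.0, size=(500, 4))   # lower F0 on average
south = rng.normal([0, 0, 0, 180], 1.0, size=(500, 4))   # higher F0 on average
models = {"north": fit_diag_gaussian(north), "south": fit_diag_gaussian(south)}

utt = rng.normal([0, 0, 0, 178], 1.0, size=(100, 4))     # unseen utterance
decision = max(models, key=lambda d: avg_loglik(utt, *models[d]))
print(decision)  # the appended F0 dimension drives the decision
```

The example mirrors the paper's finding in miniature: when the cepstral dimensions overlap, the appended fundamental-frequency dimension is what separates the dialect classes.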

    Acoustic Features Based Accent Classification of Kashmiri Language using Deep Learning

    Get PDF
    Automatic identification of accents is important in today’s world, where we are surrounded by ASR systems. Accent classification is the problem of inferring a speaker’s native region from the way he or she speaks the language under consideration. Accents are present in almost all languages and form an important part of a language; they arise from prosodic and articulation characteristics. In this research the aim is to classify accents of the Kashmiri language, using MFCCs and Mel spectrograms as features. A lot of research has been done in this field for languages such as English, and many machine learning and deep learning models have shown state-of-the-art results, but this problem is new for Kashmiri. Accents in Kashmir vary from area to area, and we chose six areas as our classes. We extracted features from the audio data, converted those features into images, and then used CNN architectures as our model. This research can serve as a baseline for further work on this language. Our custom model achieved a loss of 0.12 and an accuracy of 98.66% on test data using Mel spectrograms, the best result among our features
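The Mel spectrogram "images" fed to the CNN are produced by mapping an STFT power spectrum through triangular filters spaced evenly on the mel scale. A minimal numpy sketch of that filterbank construction (the sample rate, FFT size, and filter count are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=512, n_mels=40):
    """Triangular filters spaced evenly on the mel scale.

    Returns an (n_mels, n_fft//2 + 1) matrix that maps an STFT power
    spectrum to mel-band energies; the log of those energies, stacked
    over frames, forms the Mel spectrogram used as CNN input.
    """
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):           # rising edge of the triangle
            fb[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):           # falling edge of the triangle
            fb[i, k] = (hi - k) / max(hi - mid, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (40, 257)
```

Libraries such as librosa provide this filterbank ready-made; the sketch only shows the shape of the computation.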

    Speakers are more cooperative and less individual when interacting in larger group sizes

    Full text link
    Introduction: Cooperation, acoustically signaled through vocal convergence, is facilitated when group members are more similar. Excessive vocal convergence may, however, weaken individual recognizability. This study aimed to explore whether constraints on convergence can arise in circumstances where interlocutors need to enhance their vocal individuality. We therefore tested the effects of group size (3 and 5 interactants) on vocal convergence and individualization in a social communication scenario in which individual recognition by voice is at stake. Methods: In an interactive game, players had to recognize each other by voice while solving a cooperative task online. Vocal similarity was quantified through similarities in speaker i-vectors obtained through probabilistic linear discriminant analysis (PLDA). Speaker recognition performance was measured through the system equal error rate (EER). Results: Vocal similarity between speakers increased with larger group size, which indicates more cooperative vocal behavior. At the same time, the EER for the same speakers increased between the smaller and the larger group size, meaning a decrease in overall recognition performance. Discussion: The decrease in vocal individualization in the larger group size suggests that ingroup cooperation and social cohesion conveyed through acoustic convergence take priority over individualization in larger groups of unacquainted speakers
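The EER used above as the recognition metric is the operating point where the false-accept and false-reject rates of a verification system coincide. A minimal numpy sketch of computing it from trial scores (the genuine and impostor distributions are synthetic stand-ins for PLDA scores):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: threshold where false-accept and false-reject rates cross.

    genuine  : similarity scores for same-speaker trials
    impostor : similarity scores for different-speaker trials
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

rng = np.random.default_rng(1)
# Toy trial scores: genuine (same-speaker) trials score higher on average.
genuine = rng.normal(2.0, 1.0, 1000)
impostor = rng.normal(0.0, 1.0, 1000)
eer = equal_error_rate(genuine, impostor)
print(round(eer, 3))  # overlap between the two score distributions
```

A rise in EER, as reported for the larger groups, means the genuine and impostor score distributions overlap more, i.e. speakers became harder to tell apart.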