    A Hidden Conditional Random Field-Based Approach for Thai Tone Classification

    In Thai, tonal information is crucial for identifying the lexical meaning of a word, so accurate tone classification can improve the performance of Thai speech recognition systems. This article reports our study of Thai tone classification. Our survey found that most Thai tone classification studies relied on statistical machine learning, especially Artificial Neural Network (ANN)-based and Hidden Markov Model (HMM)-based approaches. Although both approaches performed reasonably well, they had limitations arising from their underlying mathematical models. We therefore introduced a novel Hidden Conditional Random Field (HCRF)-based approach to Thai tone classification. We also investigated tone configurations, covering tone features, frequency scaling and normalization techniques, in order to fine-tune classification performance. Experiments were conducted in both an isolated-word scenario and a continuous-speech scenario. Results showed that the HCRF-based approach with the feature F_dF_aF, ERB-rate scaling and z-score normalization yielded the highest performance and outperformed a baseline using the ANN-based approach, which had been reported as the best for Thai tone classification, in both scenarios. Compared with the best baseline result, the best HCRF-based configuration reduced the error rate by 10.58% in the isolated-word scenario and by 12.02% in the continuous-speech scenario.
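
The frequency scaling and normalization steps named in this abstract can be illustrated with a minimal sketch. It assumes the Glasberg and Moore (1990) ERB-rate approximation, which may differ from the formula the authors used, and it normalizes over a single made-up F0 contour; the function names `hz_to_erb_rate` and `z_score` and the sample values are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def hz_to_erb_rate(f_hz):
    """Convert a frequency in Hz to the ERB-rate scale.

    Assumption: Glasberg & Moore (1990) approximation. The article may
    have used a different ERB formula.
    """
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def z_score(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat contour carries no variation; map every point to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical F0 contour in Hz for one syllable (illustrative values only).
f0_hz = [180.0, 195.0, 210.0, 190.0, 170.0]
f0_erb = [hz_to_erb_rate(f) for f in f0_hz]
f0_normalized = z_score(f0_erb)
```

In practice the mean and standard deviation would more likely be estimated per speaker or per utterance rather than per syllable; that choice is left open here because the abstract does not specify it.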

    Standard Yorùbá context dependent tone identification using Multi-Class Support Vector Machine (MSVM)

    Most state-of-the-art large vocabulary continuous speech recognition systems employ context dependent (CD) phone units; however, CD phone units are not efficient at capturing the long-term spectral dependencies of tone in most tone languages. Standard Yorùbá (SY) is a language composed of syllables with tones and requires a different method of acoustic modeling. In this paper, a context dependent tone acoustic model was developed. The tone unit is assumed to be the syllable. The amplitude magnified difference function (AMDF) was used to derive the utterance-wide F0 contour, followed by automatic syllabification and tri-syllable forced alignment with the Speech Phonetization Alignment and Syllabification (SPPAS) tool. For classification of the CD tone, the slope and intercept of the F0 values were extracted from each segmented unit. A supervised clustering scheme was used to partition the CD tri-tones by category, and the features were normalized using summary statistics to derive the acoustic feature vectors. A multi-class support vector machine (MSVM) was used for tri-tone training. The experimental results showed that the word recognition accuracy obtained from the MSVM tri-tone system based on dynamic programming tone embedded features was comparable with that of phone features. The best parameter tuning was obtained with 10-fold cross validation, and the overall accuracy was 97.5678%. In terms of word error rate (WER), the MSVM CD tri-tone system outperforms the hidden Markov model tri-phone system with WER of 44.47%. Keywords: Syllabification, Standard Yorùbá, Context Dependent Tone, Tri-tone Recognition
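
The slope-and-intercept features described in this abstract can be sketched as an ordinary least-squares line fit over the F0 samples of each syllable. This is only an illustration under stated assumptions: frame indices 0, 1, 2, ... serve as the time axis, and the helper names `fit_line` and `tri_tone_features` and the sample contours are hypothetical, not taken from the paper.

```python
def fit_line(f0_values):
    """Least-squares fit of f0 = slope * frame_index + intercept.

    Assumption: frames are evenly spaced, so the frame index 0..n-1
    is used as the time axis.
    """
    n = len(f0_values)
    if n < 2:
        raise ValueError("need at least two F0 samples to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(f0_values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(f0_values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def tri_tone_features(syllable_contours):
    """Concatenate (slope, intercept) for each syllable in a tri-syllable unit."""
    features = []
    for contour in syllable_contours:
        features.extend(fit_line(contour))
    return features


# Hypothetical F0 contours (Hz) for three consecutive syllables.
unit = [
    [100.0, 110.0, 120.0],   # rising
    [150.0, 150.0, 150.0],   # level
    [140.0, 130.0, 120.0],   # falling
]
vector = tri_tone_features(unit)
```

A vector built this way could then be normalized and passed to a multi-class SVM; that step is omitted because the abstract does not give the kernel or parameter settings.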

    Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech

    Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph in which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, it is important to recover segments that the segment-based segmentation algorithm may have missed. One aspect of this research focuses on recovering segments that are missing because segment boundaries went undetected. Acoustic discontinuities, together with manner-distinctive features, are used to recover the missing segments. Another improvement to our segment-based framework tackles the limited amount of training speech data, which prevents the use of more complex covariance matrices in the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, because the segment-based approach allows the integration of phonetic knowledge, we incorporate into the scoring of the segment graphs the probability that each segment is a sound unit with a specific common manner of articulation. Our experiment shows that, with the proposed improvements, our segment-based framework increases the phoneme recognition accuracy by approximately 25% relative to the accuracy obtained from the baseline segment-based speech recognition.

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on finding approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

    StyloThai: A scalable framework for stylometric authorship identification of Thai documents

    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in January 2020, available online: https://doi.org/10.1145/3365832 The accepted version of the publication may differ from the final published version. © 2020 Association for Computing Machinery. All rights reserved. Authorship identification helps to identify the true author of a given anonymous document from a set of candidate authors. The task has applications in several domains, such as law enforcement and information retrieval. These application domains are not limited to a specific language, community, or ethnicity. However, most existing solutions are designed for English, and little attention has been paid to Thai. These solutions are not directly applicable to Thai because of the linguistic differences between the two languages. Moreover, the existing solution designed for Thai is unable to (i) handle outliers in the dataset, (ii) scale as the set of candidate authors grows, and (iii) perform well when the number of writing samples for each candidate author is low. We identify a stylometric feature space for the Thai authorship identification task. Based on this feature space, we present an authorship identification solution that uses the probabilistic k nearest neighbors classifier by transforming each document into a collection of point sets. Specifically, this document transformation allows us to (i) use set distance measures associated with an outlier handling mechanism, (ii) capture stylistic variations within a document, and (iii) produce multiple predictions for a query document. We create a new Thai authorship identification corpus containing 547 documents from 200 authors, which is significantly larger than the corpus used by the existing study (a 32-fold increase in the number of candidate authors).
The experimental results show that our solution overcomes the limitations of the existing solution and outperforms all competitors with an accuracy of 91.02%. Moreover, we investigate the effectiveness of each category of stylometric features with the help of an ablation study. We found that combining all categories of stylometric features outperforms the other combinations. Finally, we cross-compare the feature spaces and classification methods of all solutions. We found that (i) our solution can scale as the number of candidate authors increases, (ii) our method outperforms all the competitors, and (iii) our feature space provides better performance than the feature space used by the existing study. The research was partially supported by the Digital Economy Promotion Agency (project# MP-62-0003); and Thailand Research Fund and Office of the Higher Education Commission (MRG6180266).

    Hidden Markov Models in Dynamic System Modelling and Diagnosis

    Language Profiles Of Thai Children With Autism: Lexical, Grammatical, And Pragmatic Factors

    This dissertation is a linguistically-motivated investigation into different areas of language in children with autism spectrum disorders (ASD), compared to typically developing (TD) children. Fine distinctions between linguistic units were used in designing tasks on language production and comprehension in seven experiments. Each chapter of this dissertation focuses on one of three main hypotheses, namely (1) the Abstract Representation Difficulty Hypothesis that children with ASD (perhaps limited to the subgroup with co-morbid language impairments) have difficulties activating abstract lexical representations as effectively as TD children, due to their hyper-attention to phonetic details of speech, (2) the Pragmatic over Grammatical Deficit Hypothesis that pragmatics is particularly difficult for all the ASD children, while morphological and semantic aspects of language are relatively intact, and (3) the Cognitive Factor Hypothesis that cognitive factors such as nonverbal intelligence quotient (NVIQ) and nonverbal working memory play a greater role in the ASD than the TD performance on linguistic tasks. Chapter 2 investigates the morpho-phonological and semantic aspects of the lexical processing of Thai compound and simplex words. Results suggest that morphological facilitation effects can be obtained independently of phonological and semantic relatedness in the processing of Thai compounds. While children with ASD with lower task performance display hyper-attention to the acoustic differences between primes and targets, children with ASD in the higher performance group have enhanced morphological effects, compared to their TD peers, and the effects appear to be independent of the presence of phonological effects and enhanced semantic effects. The lack of phonological effects in the first set of experiments was explored further in the later experiments.
Children with ASD were found to be slower in processing natural-sounding surface phonological forms, suggesting deeper processing of neutralized forms than of full forms. The similar performance on the next task with the integration of visual information suggests that the slower processing may result from their slower lexical semantic processing. The Abstract Representation Difficulty Hypothesis, thus, holds for a subgroup of children with ASD, while other children with ASD display intact phonological representation, enhanced morphological processing compared to TD controls, and intact but slower lexical processing. Chapter 3 explores the Pragmatic over Grammatical Deficit Hypothesis. Using fine distinctions within the personal reference terms, consistently replicated results suggest that while grammatical person phi-features are intact in the representation of pronouns by children with ASD, these children are less sensitive to deictic information in their interpretation of pronouns and tend to avoid using the first-person pronoun, with high deictic level, when they have freedom to choose personal names to refer to themselves. Children with ASD also performed more poorly on the comprehension of unmarked pronouns, which requires implicated presupposition, suggesting that even with minimal comparisons among the pronouns, lexically-encoded core grammatical features and pragmatic ones are distinguished in children's language processing. Chapter 3 also adds to the literature on lexical presuppositions, scalar implicature, and implicated presuppositions by showing that not only adolescents but also children with ASD are age-appropriate in deriving scalar implicatures and that not all kinds of pragmatic inferences are equally challenging for children with ASD.
The most indicative difference between the children with ASD and the TD group lies in the heavier reliance of the children with ASD on literal, logical meaning when other semantically- and pragmatically-inferred meanings are violated. Chapter 4 partly contributes to the Cognitive Factor Hypothesis, suggesting a possibility that cognitive factors, as opposed to developmental factors, correlate more with the performance of children with ASD on linguistic tasks. Additionally, children in both groups displayed correlations in their performance across all of the experiments in the dissertation. Individual language profiles were compiled with the results from the previous chapters. Two subgroups of children with ASD were identified through k-means cluster analysis. The children with ASD in Cluster 1 have globally better performance across experiments than children with ASD in Cluster 2, supporting the view that children with ASD may be classified into subgroups based on their performance on linguistic tasks alone. Even with globally better linguistic task performance, the children with ASD in Cluster 1 still appear to be less sensitive to social-deictic information, confirming that certain types of pragmatics are indeed more challenging than the others. In sum, this dissertation advances our understanding of the morphological, semantic, and pragmatic abilities of children with autism through carefully-designed linguistically-motivated experiments.