5,264 research outputs found

    Tone classification of syllable -segmented Thai speech based on multilayer perceptron

    Get PDF
    Thai is a monosyllabic and tonal language. Thai makes use of tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system has not only to recognize a base syllable but also to correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system.;In this study, a tone classification of syllable-segmented Thai speech which incorporates the effects of tonal coarticulation, stress and intonation was developed. Automatic syllable segmentation, which performs the segmentation on the training and test utterances into syllable units, was also developed. The acoustical features including fundamental frequency (F0), duration, and energy extracted from the processing syllable and neighboring syllables were used as the main discriminating features. A multilayer perceptron (MLP) trained by backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also uttered the training speech. The proposed system achieved an average accuracy rate of 91.36%

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DSYARTHRIC SPEECH RECOGNITION

    Get PDF
    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic Speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech is required, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation has introduced a modified neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, so we are able to generate more dysarthric speech with a broader range. To evaluate their effectiveness for synthesis of training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decrease WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on the dysarthric ASR systems

    Computer-Assisted Regulation of Emotional and Social Processes

    Get PDF

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, of advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as it relates to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    Visual pattern recognition using neural networks

    Get PDF
    Neural networks have been widely studied in a number of fields, such as neural architectures, neurobiology, statistics of neural network and pattern classification. In the field of pattern classification, neural network models are applied on numerous applications, for instance, character recognition, speech recognition, and object recognition. Among these, character recognition is commonly used to illustrate the feature and classification characteristics of neural networks. In this dissertation, the theoretical foundations of artificial neural networks are first reviewed and existing neural models are studied. The Adaptive Resonance Theory (ART) model is improved to achieve more reasonable classification results. Experiments in applying the improved model to image enhancement and printed character recognition are discussed and analyzed. We also study the theoretical foundation of Neocognitron in terms of feature extraction, convergence in training, and shift invariance. We investigate the use of multilayered perceptrons with recurrent connections as the general purpose modules for image operations in parallel architectures. The networks are trained to carry out classification rules in image transformation. The training patterns can be derived from user-defmed transformations or from loading the pair of a sample image and its target image when the prior knowledge of transformations is unknown. Applications of our model include image smoothing, enhancement, edge detection, noise removal, morphological operations, image filtering, etc. With a number of stages stacked up together we are able to apply a series of operations on the image. That is, by providing various sets of training patterns the system can adapt itself to the concatenated transformation. We also discuss and experiment in applying existing neural models, such as multilayered perceptron, to realize morphological operations and other commonly used imaging operations. Some new neural architectures and training algorithms for the implementation of morphological operations are designed and analyzed. The algorithms are proven correct and efficient. The proposed morphological neural architectures are applied to construct the feature extraction module of a personal handwritten character recognition system. The system was trained and tested with scanned image of handwritten characters. The feasibility and efficiency are discussed along with the experimental results

    A survey of the application of soft computing to investment and financial trading

    Get PDF

    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    Get PDF
    The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks

    Affective Computing

    Get PDF
    This book provides an overview of state of the art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents a research on synthesis and recognition of facial expressions. Given that we not only use the face but also body movements to express ourselves, in the second section (Chapters 8 to 11) we present a research on perception and generation of emotional expressions by using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models on emotion, as well as findings from neuroscience research. In the last section of the book (Chapters 17 to 22) we present applications related to affective computing
    • …
    corecore