
    Exploiting 2-Dimensional Source Correlation in Channel Decoding with Parameter Estimation

    Traditionally, source coding is assumed to be perfect, so the redundancy of the source-encoded bit-stream is zero. In reality, this is not the case: existing source encoders are imperfect and leave residual redundancy at the output. This residual redundancy can be exploited by Joint Source Channel Coding (JSCC) with a Markov chain as the source model. Several studies have assumed that the statistical knowledge of the sources is perfectly available at the receiver. Although this yields better BER performance, in practice the source correlation knowledge is not always available at the receiver, which can affect the reliability of the outcome. The source correlation on all rows and columns of the 2D sources is exploited by a modified Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in the decoder, and a parameter estimation technique is used jointly with the decoder to estimate the source correlation. This research thus investigates parameter estimation for a 2D JSCC system, reflecting a practical scenario in which the source correlation knowledge is not always available. We compare the performance of the proposed joint decoding and estimation technique with that of the ideal 2D JSCC system, which has perfect knowledge of the source correlation. Simulation results reveal that the proposed coding scheme performs very close to the ideal 2D JSCC system.
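
As a rough illustration of what estimating the source correlation can mean for a 2D binary Markov source, the sketch below counts how often neighbouring bits agree along rows and along columns of a bit matrix. It is not the estimator from the paper, which works jointly with the modified BCJR decoder. The function names, the symmetric "stay" probability model, and the use of hard-decision bits are all assumptions made for this example.

```python
import random


def generate_markov_bits(n, p_stay, seed=0):
    """Generate n bits from a symmetric first-order binary Markov chain.

    p_stay is the (assumed) probability that a bit equals its predecessor.
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < p_stay else 1 - prev)
    return bits


def estimate_p_stay(bits):
    """Estimate p_stay as the fraction of adjacent bit pairs that agree."""
    if len(bits) < 2:
        raise ValueError("need at least two bits to estimate correlation")
    same = sum(1 for a, b in zip(bits, bits[1:]) if a == b)
    return same / (len(bits) - 1)


def estimate_2d_correlation(matrix):
    """Return (row_estimate, column_estimate) for a rectangular bit matrix.

    Adjacent pairs are pooled over all rows, and separately over all columns.
    """
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]

    def pooled(sequences):
        same = total = 0
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                same += (a == b)
                total += 1
        if total == 0:
            raise ValueError("matrix too small to estimate correlation")
        return same / total

    return pooled(rows), pooled(cols)


if __name__ == "__main__":
    decoded = generate_markov_bits(20000, p_stay=0.9, seed=1)
    print("estimated p_stay:", round(estimate_p_stay(decoded), 3))
```

In a joint scheme, an estimate of this kind would typically be refreshed from the decoder's output at each iteration and fed back as a priori source statistics. That feedback loop is omitted here.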

    Word alignment and smoothing methods in statistical machine translation: Noise, prior knowledge and overfitting

    This thesis discusses how to incorporate linguistic knowledge into an SMT system. Although one important category of linguistic knowledge is that obtained by a constituent / dependency parser, a POS / super tagger, and a morphological analyser, linguistic knowledge here covers a broader range: Multi-Word Expressions, Out-Of-Vocabulary words, paraphrases, lexical semantics (or non-literal translations), named entities, coreferences, and transliterations. The first discussion is about word alignment, where we propose an MWE-sensitive word aligner. The second discussion is about smoothing methods for a language model and a translation model, where we propose a hierarchical Pitman-Yor process-based smoothing method. The common ground for these discussions is the examination of three exceptional cases from real-world data: the presence of noise, the availability of prior knowledge, and the problem of overfitting. A notable characteristic of this design is the careful use of (Bayesian) priors so that it can capture both frequent and linguistically important phenomena. This can be seen as one example of how to address the problems of statistical models, which often aim to learn from frequent examples only and overlook less frequent but linguistically important phenomena.

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, are now widely used and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    TOWARD INTELLIGENT WELDING BY BUILDING ITS DIGITAL TWIN

    To meet the increasing requirements on production for individualization, efficiency and quality, traditional manufacturing processes are evolving into smart manufacturing with the support of advances in information technology, including cyber-physical systems (CPS), the Internet of Things (IoT), big industrial data, and artificial intelligence (AI). The prerequisite for integrating these advanced information technologies is to digitalize manufacturing processes so that they can be analyzed, controlled, and made to interact with other digitalized components. The digital twin has been developed as a general framework for this purpose, building digital replicas of physical entities. This work takes welding manufacturing as a case study to accelerate its transition to intelligent welding by building its digital twin, and contributes to digital twins in two respects: (1) increasing the information analysis and reasoning ability by integrating deep learning; (2) enhancing the human user's ability to operate physical welding manufacturing via digital twins by integrating human-robot interaction (HRI). Firstly, a digital twin of pulsed gas tungsten arc welding (GTAW-P) is developed by integrating deep learning to provide strong feature extraction and analysis ability. In this system, the direct information, including weld pool images, arc images, welding current and arc voltage, is collected by cameras and arc sensors. The indirect information determining the welding quality, i.e., weld joint top-side bead width (TSBW) and back-side bead width (BSBW), is computed by a traditional image processing method and a deep convolutional neural network (CNN), respectively. Based on that, the weld joint geometrical size is controlled to meet the quality requirement in various welding conditions. 
In the meantime, the developed digital twin is visualized through a graphical user interface (GUI) that gives human users an effective and intuitive perception of the physical welding processes. Secondly, in order to enhance the human ability to operate the physical welding processes via digital twins, HRI is integrated, with virtual reality (VR) as the interface that transmits information bidirectionally, i.e., transmitting the human commands to welding robots and visualizing the digital twin to human users. Six welders, skilled and unskilled, tested this system by completing the same welding job, but demonstrated different operation patterns and achieved different welding qualities. To differentiate their skill levels (skilled or unskilled) from their demonstrated operations, a data-driven approach, FFT-PCA-SVM, combining the fast Fourier transform (FFT), principal component analysis (PCA), and a support vector machine (SVM), is developed and achieves a classification accuracy of 94.44%. The robots can also work as assistants that help the human welders complete the welding tasks by recognizing and executing the intended welding operations. This is done by a human intention recognition algorithm based on a hidden Markov model (HMM), and the welding experiments show that the developed robot-assisted welding can help to improve welding quality. To take further advantage of the robots' movement accuracy and stability, the role of the robot is upgraded from assistant to collaborator that completes a subtask independently, i.e., torch weaving and automatic seam tracking in weaving GTAW. The other subtask, i.e., moving the welding torch along the weld seam, is completed by the human users, who can adjust the travel speed to control the heat input and ensure good welding quality. In this way, the advantages of humans (intelligence) and robots (accuracy and stability) are combined under this human-robot collaboration framework. 
The developed digital twin for welding manufacturing helps to promote next-generation intelligent welding and can be applied, after small modifications, to similar manufacturing processes such as painting, spraying and additive manufacturing.

    Application of Weighted Voting Taggers to Languages Described with Large Tagsets

    The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of 6 baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers. Special attention is paid to lexical voting methods and to the issues of ties and fallbacks. The TagPair and WPDV voting methods achieve the top accuracy among all considered methods. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.
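
The simplest form of weighted voting mentioned above can be sketched as follows: each baseline tagger proposes a tag for a token, votes are summed with a per-tagger weight, and a tie is resolved by falling back to one designated tagger. This is only a generic illustration, not the paper's TagPair or WPDV method. The tagger names, the weights, and the fallback rule are assumptions chosen for the example.

```python
from collections import defaultdict


def weighted_vote(proposals, weights, fallback):
    """Combine per-tagger tag proposals for a single token.

    proposals: dict mapping tagger name -> proposed tag
    weights:   dict mapping tagger name -> vote weight (e.g. held-out accuracy)
    fallback:  name of the tagger whose proposal is used when scores tie
    """
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]

    best_score = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best_score]

    if len(winners) == 1:
        return winners[0]
    # Tie between two or more tags: defer to the designated fallback tagger.
    return proposals[fallback]


if __name__ == "__main__":
    # Hypothetical taggers "a", "b", "c" with hypothetical weights.
    proposals = {"a": "subst", "b": "adj", "c": "adj"}
    weights = {"a": 0.9, "b": 0.5, "c": 0.5}
    print(weighted_vote(proposals, weights, fallback="a"))
```

With a large tagset, ties and sparse agreement are more likely than with a small one, which is one reason the choice of fallback matters in schemes of this kind.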

    Applications of broad class knowledge for noise robust speech recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 157-164). This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. These classes are formed by grouping together parts of the acoustic signal that have similar temporal and spectral characteristics, and therefore have much less variability than typical sub-word units used in speech recognition (i.e., phonemes, acoustic units). We explore broad classes formed along phonetic and acoustic dimensions. This thesis first introduces an instantaneous adaptation technique to robustly recognize broad classes in the input signal. Given an initial set of broad class models and input speech data, we explore a gradient steepness metric using the Extended Baum-Welch (EBW) transformations to explain how much these initial models must be adapted to fit the target data. We incorporate this gradient metric into a Hidden Markov Model (HMM) framework for broad class recognition and illustrate that this metric allows for a simple and effective adaptation technique which does not suffer from issues such as data scarcity and computational intensity that affect other adaptation methods such as Maximum a Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR) and feature-space Maximum Likelihood Linear Regression (fMLLR). Broad class recognition experiments indicate that the EBW gradient metric method outperforms the standard likelihood technique, both when initial models are adapted via MLLR and without adaptation. Next, we explore utilizing broad class knowledge as a pre-processor for segment-based speech recognition systems, which have been observed to be quite sensitive to noise. 
The experiments are conducted with the SUMMIT segment-based speech recognizer, which detects landmarks, representing possible transitions between phonemes, from large energy changes in the acoustic signal. These landmarks are often poorly detected in noisy conditions. We investigate using the transitions between broad classes, which typically occur at areas of large acoustic change in the audio signal, to aid in landmark detection. We also explore broad classes motivated along both acoustic and phonetic dimensions. Phonetic recognition experiments indicate that utilizing either phonetically or acoustically motivated broad classes offers significant recognition improvements compared to the baseline landmark method in both stationary and non-stationary noise conditions. Finally, this thesis investigates using broad class knowledge for island-driven search. Reliable regions of a speech signal, known as islands, carry most of the information in the signal, in contrast to unreliable regions, known as gaps. Most speech recognizers do not differentiate between island and gap regions during search, and as a result most of the search computation is spent in unreliable regions. Island-driven search addresses this problem by first identifying islands in the speech signal and directing the search outwards from these islands. In this thesis, we develop a technique to identify islands from broad classes which have been confidently identified from the input signal. We explore a technique to prune the search space given island/gap knowledge. Finally, to further limit the amount of computation in unreliable regions, we investigate scoring less detailed broad class models in gap regions and more detailed phonetic models in island regions. Experiments on both small and large scale vocabulary tasks indicate that the island-driven search strategy results in an improvement in recognition accuracy and computation time. By Tara N. Sainath. Ph.D.

    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8% and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.