422 research outputs found

    Multidialectal acoustic modeling: a comparative study

    In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. The approaches developed differ in the way they evaluate the similarity of sounds between dialects and in the decision tree structure applied. The proposed systems are tested with Spanish dialects across Spain and Latin America. All proposed multidialectal systems improve on monodialectal performance by using data from another dialect, but it is shown that the way the data is shared is critical. The best combination of similarity measure and tree structure achieves an improvement of 7% over the results obtained with monodialectal systems. Peer Reviewed. Postprint (published version).

    Context-Dependent Acoustic Modelling for Speech Recognition

    Ph.D, DOCTOR OF PHILOSOPHY

    Probabilistic Process Monitoring in Process-Aware Information Systems

    Complex information systems generate large volumes of event logs that represent the states of system dynamics. By monitoring these logs, we can learn the process models that describe the underlying business procedures, predict the future development of the systems, and check whether the process models match the expected ones. Most existing process monitoring techniques are derived from workflow management systems and are designed to cope with logs generated by systems with deterministic outcomes. In this dissertation, however, I consider novel techniques that handle event log data, monitor system deviations, and infer the development of systems based on probabilistic process models. In particular, I present a novel process monitoring approach based on maximizing the information divergences of the system state dynamics and demonstrate its efficiency in detecting abrupt changes as well as long-term system deviation. In addition, a new process modeling technique, the Classification Tree hidden (semi-) Markov Model (CTHMM), is proposed. I show that CTHMM, derived from the Classification and Regression Tree and the hidden semi-Markov model (HSMM), with hidden system states identified by a classification tree, can help discover and predict relevant system state sequences in a temporal-probabilistic manner. The main contributions of this dissertation can be summarized as follows: 1) a new process monitoring approach that helps detect anomalies of dynamic systems from the points of view of both system change-points and long-term system deviation; 2) a unique HMM/HSMM learning technique that solves the problem of hidden state splitting and estimates HMM/HSMM parameters simultaneously; 3) a novel temporal-probabilistic process model that generates human-comprehensible IF-THEN system state definitions used to help infer the evolution of discrete dynamic systems.

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, are now widely used and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    Optimizing Clinical Assessments in Parkinson's Disease Through the Use of Wearable Sensors and Data Driven Modeling

    The emergence of motion sensors as a tool that provides objective motor performance data on individuals afflicted with Parkinson's disease offers an opportunity to expand the horizon of clinical care for this neurodegenerative condition. Subjective clinical scales and patient-based motor diaries have limited clinimetric properties and produce a glimpse rather than a continuous real-time perspective into motor disability. Furthermore, the expansion of machine learning algorithms is yielding novel classification and probabilistic clinical models that stand to change existing treatment paradigms, refine the application of advanced therapeutics, and may facilitate the development and testing of disease-modifying agents for this disease. We review the use of inertial sensors and machine learning algorithms in Parkinson's disease.

    Data-Driven Enhancement of State Mapping-Based Cross-Lingual Speaker Adaptation

    The thesis work was motivated by the goal of developing personalized speech-to-speech translation and focused on one of its key component techniques – cross-lingual speaker adaptation for text-to-speech synthesis. A personalized speech-to-speech translator enables a person’s spoken input to be translated into spoken output in another language while maintaining his/her voice identity. Before addressing any technical issues, work in this thesis set out to understand human perception of speaker identity. Listening tests were conducted in order to determine whether people could differentiate between speakers when they spoke different languages. The results demonstrated that differentiating between speakers across languages was an achievable task. However, it was difficult for listeners to differentiate between speakers across both languages and speech types (original recordings versus synthesized samples). The underlying challenge in cross-lingual speaker adaptation is how to apply speaker adaptation techniques when the language of adaptation data is different from that of synthesis models. The main body of the thesis work was devoted to the analysis and improvement of HMM state mapping-based cross-lingual speaker adaptation. Firstly, the effect of unsupervised cross-lingual adaptation was investigated, as it relates to the application scenario of personalized speech-to-speech translation. The comparison of paired supervised and unsupervised systems shows that the performance of unsupervised cross-lingual speaker adaptation is comparable to that of supervised adaptation, even if the average phoneme error rate of the unsupervised systems is around 75%. Secondly, the effect of the language mismatch between synthesis models and adaptation data was investigated. The mismatch is found to transfer undesirable language information from adaptation data to synthesis models, thereby limiting the effectiveness of generating multiple regression class-specific transforms, using larger quantities of adaptation data and estimating adaptation transforms iteratively. Thirdly, in order to tackle the problems caused by the language mismatch, a data-driven adaptation framework using phonological knowledge is proposed. Its basic idea is to group HMM states according to phonological knowledge in a data-driven manner and then to map each state to a phonologically consistent counterpart in a different language. This framework is also applied to regression class tree construction for transform estimation. It is found that the proposed framework alleviates the negative effect of the language mismatch and gives consistent improvement compared to previous state-of-the-art approaches. Finally, a two-layer hierarchical transformation framework is developed, where one layer captures speaker characteristics and the other compensates for the language mismatch. The most appropriate means to construct the hierarchical arrangement of transforms was investigated in an initial study. While early results show some promise, further in-depth investigation is needed to confirm the validity of this hierarchy.

    Unsupervised learning for text-to-speech synthesis

    This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources exist. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.