3 research outputs found

    1991 to 2019: The rise of machine interpreting research

    Conceived as a scientometric study, this paper seeks to understand the state of research on machine interpreting in the IEEE (Institute of Electrical and Electronics Engineers) database from 1991 to 2019. Documents were analyzed along a series of measures, such as the most prominent academic institutions and countries investigating machine interpreting, citation, co-authorship, keyword co-occurrence, bibliographic coupling, and text-based analysis of the documents' titles and abstracts. Using the VOSviewer software and its tools for data collection and visualization, machine interpreting research in the analyzed corpus is shown to center on three main concerns: machine translation technologies, speech synthesis, and the Japanese language.
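    The keyword co-occurrence analysis mentioned above is the kind of count VOSviewer derives from bibliographic records. As a minimal sketch of the idea (the records and keyword lists below are illustrative, not data from the study):

    ```python
    # Minimal sketch of a keyword co-occurrence count, analogous to the map
    # VOSviewer builds from bibliographic records. Sample records are illustrative.
    from collections import Counter
    from itertools import combinations

    records = [
        {"keywords": ["machine translation", "speech synthesis", "japanese"]},
        {"keywords": ["speech synthesis", "prosody"]},
        {"keywords": ["machine translation", "japanese"]},
    ]

    cooccurrence = Counter()
    for record in records:
        # Each unordered keyword pair appearing in the same document counts once.
        for a, b in combinations(sorted(set(record["keywords"])), 2):
            cooccurrence[(a, b)] += 1

    for (a, b), count in cooccurrence.most_common():
        print(f"{a} -- {b}: {count}")
    ```

    Links whose counts exceed a chosen threshold then become the edges of the co-occurrence map that the software visualizes.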

    Intonation Modelling for Speech Synthesis and Emphasis Preservation

    Speech-to-speech translation is a framework which recognises speech in an input language, translates it into a target language, and synthesises speech in that target language. In such a system, variations in the speech signal that are inherent to natural human speech are lost as the information passes through the different building blocks of the translation process. The work presented in this thesis addresses aspects of speech synthesis which are lost in traditional speech-to-speech translation approaches. The main research axis of this thesis is the study of prosody for speech synthesis and emphasis preservation. A first investigation of regional accents of spoken French is carried out to understand the sensitivity of native listeners to accented speech synthesis. Listening tests show that standard adaptation methods for speech synthesis are not sufficient for listeners to perceive accentedness; combining adaptation with the original prosody, on the other hand, does allow perception of accents. Addressing the need for a more suitable prosody model, a physiologically plausible intonation model is proposed. Inspired by the command-response model, it has basic components that can be related to muscle responses to nerve impulses; these components are assumed to represent muscle control of the vocal folds. A motivation for such a model is its theoretical language independence, based on the fact that humans share the same vocal apparatus. An automatic parameter extraction method which integrates a perceptually relevant measure is proposed with the model. This approach is evaluated and compared with the standard command-response model. Two corpora including sentences with emphasised words are presented in the context of the SIWIS project: the first is a multilingual corpus with speech from multiple speakers; the second is a high-quality, speech-synthesis-oriented corpus from a professional speaker. Two broad uses of the model are evaluated. The first shows that it is difficult to predict model parameters; however, the second shows that parameters can be transferred in the context of emphasis synthesis. A relation between model parameters and linguistic features such as stress and accent is demonstrated, and similar observations are made between the parameters and emphasis. We then investigate the extraction of atoms in emphasised speech and their transfer to neutral speech, which turns out to elicit emphasis perception. Using clustering methods, this is extended to the emphasis of other words, using linguistic context. This approach is validated by listening tests in the case of English.
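    The command-response model referred to above decomposes the log-F0 contour into a base value plus the responses of second-order filters to phrase impulses and accent steps. A rough numerical sketch of that classic formulation follows; all command timings, amplitudes, and filter constants are illustrative, not values from the thesis.

    ```python
    # Numerical sketch of the classic command-response (Fujisaki) F0 model:
    # log-F0 = base value + phrase-filter impulse responses + accent-filter step responses.
    import numpy as np

    ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9  # typical filter constants from the literature

    def phrase_response(t, onset, amplitude):
        """Impulse response of the phrase-control filter."""
        tau = np.maximum(t - onset, 0.0)
        return amplitude * ALPHA**2 * tau * np.exp(-ALPHA * tau)

    def accent_response(t, start, end, amplitude):
        """On/off step response of the accent-control filter."""
        def step(tau):
            tau = np.maximum(tau, 0.0)
            return np.minimum(1.0 - (1.0 + BETA * tau) * np.exp(-BETA * tau), GAMMA)
        return amplitude * (step(t - start) - step(t - end))

    t = np.linspace(0.0, 2.0, 400)            # time axis in seconds
    log_f0 = np.log(120.0)                    # base frequency of 120 Hz
    log_f0 = log_f0 + phrase_response(t, 0.0, 0.5)
    log_f0 = log_f0 + accent_response(t, 0.3, 0.7, 0.4)
    log_f0 = log_f0 + accent_response(t, 1.1, 1.5, 0.3)

    f0 = np.exp(log_f0)                       # resulting F0 contour in Hz
    print(f"F0 range: {f0.min():.1f} to {f0.max():.1f} Hz")
    ```

    Emphasis transfer in the sense described above would amount to estimating accent-command amplitudes and timings from emphasised speech and re-applying them to a neutral contour; the parameter extraction method itself is not reproduced here.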