
    Transcription

    Transcription, the written representation of spoken human language, is vital to any language documentation and revitalization project. This hands-on workshop introduces participants to a variety of transcription methodologies for various purposes. The workshop focuses primarily on two types of transcription: (i) phonetic transcription using the International Phonetic Alphabet and (ii) discourse transcription, in which elements of well-known transcription systems (Conversation Analysis, developed at UC Santa Barbara) are introduced. The workshop emphasizes the nuts and bolts of these transcription systems and how to apply transcription practices to documentation and revitalization projects. (2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA, Alaska Native Language Center)

    Towards mixed language speech recognition systems

    Multilingual speech recognition involves numerous research challenges, including common phoneme sets, adaptation on limited amounts of training data, and mixed-language recognition (common in many countries, such as Switzerland). In the latter case, it is not even possible to assume that the language being spoken is known in advance. This is the context and motivation of the present work. We investigate how current state-of-the-art speech recognition systems can be exploited in multilingual environments where the language (from an assumed set of five possible languages, in our case) is not known a priori during recognition. We combine monolingual systems and extensively develop and compare different features and acoustic models. On SpeechDat(II) datasets, and in the context of isolated words, we show that it is possible to approach the performance of monolingual systems even when the identity of the spoken language is not known in advance.
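
    The combination strategy suggests a simple decoding scheme. Below is a minimal sketch, not the paper's system: a hypothetical Recognizer interface with a score() method stands in for a real monolingual engine; every system decodes the utterance, and the best-scoring hypothesis determines both the word and the language.

        # Minimal sketch: language-agnostic isolated-word recognition by
        # combining monolingual recognizers. The score() method is a
        # hypothetical stand-in for a real monolingual ASR engine.
        from dataclasses import dataclass

        @dataclass
        class Hypothesis:
            language: str
            word: str
            log_likelihood: float  # acoustic score of the best word

        def recognize(audio, recognizers):
            """Decode with every monolingual system and keep the single
            best hypothesis; language identity falls out as a by-product."""
            hypotheses = []
            for language, recognizer in recognizers.items():
                word, log_likelihood = recognizer.score(audio)  # assumed API
                hypotheses.append(Hypothesis(language, word, log_likelihood))
            # Scores must be comparable across systems, e.g. normalized
            # per frame, or one language's models will dominate spuriously.
            return max(hypotheses, key=lambda h: h.log_likelihood)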

    A Graphemic Analysis of an Old English Text: The Parker Manuscript, the Laws of Alfred & Ine

    This study may be considered an exercise in graphemic analysis. It proceeds from the point of view that writing is an independent manifestation of language. As such, the writing system of a language may be subjected to a descriptive analysis based upon methods similar to those used in the analysis of spoken language systems. The purpose of such a description is to determine the distinctive and non-distinctive elements of the system. Chapter V of this study is a graphemic analysis of one section of the Parker Manuscript. This analysis is based upon the principles discussed in Chapter II and follows the specific criteria presented in Chapter IV. Since the writing system of the text is an alphabetic one, Chapter VI indicates, to a limited extent, the relationship, or fit, of the writing system with the Late West Saxon dialect of Old English, of which the Parker Manuscript is a specimen.
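
    The distributional method described here is mechanical enough to sketch in code. The following is a minimal illustration, not the study's procedure: tally each grapheme's frequency and its immediate contexts, the raw material for deciding which symbols contrast and which are variants. The sample string is invented.

        # Distributional grapheme analysis: frequencies plus the
        # (left, right) neighbour contexts each grapheme occurs in.
        from collections import Counter, defaultdict

        def grapheme_distribution(text):
            frequencies = Counter(text)
            contexts = defaultdict(Counter)
            padded = f"#{text}#"  # '#' marks the text boundaries
            for i, grapheme in enumerate(padded[1:-1], start=1):
                contexts[grapheme][(padded[i - 1], padded[i + 1])] += 1
            return frequencies, contexts

        freq, ctx = grapheme_distribution("aelfred cyning")
        # Graphemes in complementary distribution (never sharing a context)
        # are candidates for non-distinctive variants; graphemes that
        # contrast in identical contexts are distinctive units.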

    Using question-specific vocabularies to support speech data collection with SALAAM

    There has been increasing use of small-vocabulary spoken dialogue systems in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in low-resource settings, in which low literacy is a huge hindrance to the adoption of Information Communication Technologies (ICTs). Since the languages spoken in these areas are computationally low-resourced, they rely on techniques such as cross-language phoneme mapping to facilitate fast development of small-vocabulary speech recognisers. Despite the success of this technique, there has been a lack of guidance on how to deploy such systems across a range of languages. This study presents a systematic exploration of the suitability and limitations of using cross-language phoneme mapping for the development of small-vocabulary speech recognisers for computationally low-resource languages, particularly Bantu languages. Five target languages and four source languages were used in the study. Speech-based Accent Learning And Articulation Mapping (SALAAM), a cross-language phoneme mapping algorithm, was used to aid the study, based on its implementation in the open-source tool Lex4All. The following research questions guided our investigation: (i) what impact does the choice of source language have on recognition accuracy; (ii) what impact does the gender composition of the training data set have on recognition accuracy; and (iii) what impact does varying the number of alternative pronunciations per word type have on recognition accuracy. Data for the target languages were collected from 104 university student volunteers, 58 female and 46 male. The results showed that the phonetic similarity between target and source languages, as well as the gender composition of the training datasets, affects the recognition accuracy of speech applications developed using cross-language phoneme mapping techniques. They also showed that increasing the number of alternative pronunciations per word in the vocabulary generally increases recognition accuracy, although with a slower system response time. This study provides evidence that careful selection of the source language, the gender composition of the training data, and the number of alternative pronunciations per word can improve the recognition accuracy of speech recognisers developed using cross-language phoneme mapping.
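
    The mapping technique lends itself to a compact sketch. The following is a minimal illustration in the spirit of SALAAM/Lex4All, not their actual implementation: each target-language word is decoded through a source-language phone recogniser, and the most frequent decodings are kept as alternative pronunciations. decode_phones() is a hypothetical wrapper around the source-language recogniser.

        # Build a pronunciation lexicon for target-language words using a
        # source-language phone recogniser (hypothetical decode_phones()).
        from collections import Counter

        def build_lexicon(word_recordings, decode_phones, n_alternatives=3):
            """word_recordings maps each target word to its audio samples;
            decode_phones(audio) returns a source-language phone string."""
            lexicon = {}
            for word, samples in word_recordings.items():
                decoded = Counter(decode_phones(audio) for audio in samples)
                # More alternatives tend to raise accuracy at the cost of a
                # slower response, as the study reports.
                lexicon[word] = [p for p, _ in decoded.most_common(n_alternatives)]
            return lexicon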

    Natural Language Generation for dialogue: system survey

    Many natural language dialogue systems make use of 'canned text' for output generation. This approach may be sufficient for dialogues in restricted domains where system utterances are short and simple and use fixed expressions (e.g., slot-filling dialogues in the ticket reservation or travel information domain); but for more sophisticated dialogues (e.g., tutoring dialogues) a more advanced generation method is required. In such dialogues, the system utterances should be produced in a context-sensitive fashion, for instance by pronominalising anaphoric references, and by using more or less elaborate wording depending on the state of the dialogue, the expertise of the user, etc. In the case of spoken dialogues, it is very useful if the natural language generation component can provide information that is relevant for determining the prosody of the speech output. Similarly, for use in embodied agents it is useful if the generation component can provide information about the facial and body movements that should accompany the language being produced by the agent. Clearly, it will be extremely difficult to achieve all this using simple string manipulation, so a more flexible and context-sensitive generation method is required. This report discusses some of the possibilities for the sophisticated generation of system utterances in a dialogue system. The basic assumption is that this task is performed by a separate language generation module, which takes as its input a message specification produced by a dialogue planner and transforms this message into an expression in natural language. Part I of this report provides a general discussion of different methods for performing this task, and outlines some requirements on language generation systems that might be used for this purpose. Part II gives an overview of publicly available language generation systems, and discusses to what extent they meet the previously stated requirements.
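
    To make the division of labour concrete, here is a minimal sketch of the assumed pipeline: a dialogue planner emits a message specification and a separate generation module renders it, pronominalising a reference when the entity is already salient in the discourse context. The message format and discourse model are invented for illustration.

        def generate(message, discourse_context):
            """Render a planner message, pronominalising the entity when
            it is already salient in the discourse context."""
            entity = message["entity"]
            subject = "It" if entity in discourse_context else f"The {entity}"
            discourse_context.add(entity)  # update salience for later turns
            return f"{subject} {message['predicate']}."

        context = set()
        print(generate({"entity": "train", "predicate": "leaves at 9:15"}, context))
        # -> The train leaves at 9:15.
        print(generate({"entity": "train", "predicate": "arrives at 11:40"}, context))
        # -> It arrives at 11:40.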

    PRESENCE: A human-inspired architecture for speech-based human-machine interaction

    Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture that is founded on a model of interaction in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.
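
    The recursive feedback-control idea can be caricatured in a few lines of code. This is a loose sketch of one layer of such a hierarchy, not the authors' implementation; all names, the numeric observations, and the error threshold are invented. Each layer acts, emulates the sensory consequence of its action, and escalates the prediction error to the layer above when it is surprised.

        class ControlLayer:
            """One level of a predictive hierarchy: act, emulate the
            sensory consequence of the action, and pass surprises upward."""

            def __init__(self, predict, act, parent=None, tolerance=1.0):
                self.predict = predict    # forward model: (observation, action) -> expected input
                self.act = act            # control policy: observation -> action
                self.parent = parent      # next layer up, or None at the top
                self.tolerance = tolerance
                self.expected = 0.0       # prediction carried over from the previous step

            def step(self, observation):
                error = observation - self.expected   # how surprising was this input?
                if self.parent is not None and abs(error) > self.tolerance:
                    self.parent.step(error)           # let the higher layer re-plan
                action = self.act(observation)
                self.expected = self.predict(observation, action)  # emulate the outcome
                return action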

    Spoken Language Intent Detection using Confusion2Vec

    Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in text transcriptions in real-life scenarios makes the task more challenging. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to their semantic and syntactic relations in human language. We hypothesize that ASR errors often involve acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for them. Through experiments on the ATIS benchmark dataset, we demonstrate the robustness of the proposed model, which achieves state-of-the-art results under noisy ASR conditions. Our system reduces the classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state of the art when going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
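
    As a rough illustration of how such embeddings are consumed downstream, here is a minimal sketch, not the paper's system: average confusion2vec-style word vectors over a transcript and train a linear intent classifier. The embeddings dict is assumed to be pre-trained, and the 300-dimension default is an arbitrary assumption.

        # Intent detection over (possibly noisy) ASR transcripts using
        # pre-trained confusion2vec-style word embeddings.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def sentence_vector(transcript, embeddings, dim=300):
            vectors = [embeddings[w] for w in transcript.split() if w in embeddings]
            return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

        def train_intent_classifier(transcripts, intents, embeddings):
            X = np.stack([sentence_vector(t, embeddings) for t in transcripts])
            # Acoustically confusable words ("fare"/"fair") sit close in this
            # space, so ASR substitutions shift the sentence vector only a little.
            return LogisticRegression(max_iter=1000).fit(X, intents)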

    Robust Spoken Language Understanding for House Service Robots

    Service robotics has been growing significantly in the last years, leading to several research results and to a number of consumer products. One of the essential features of these robotic platforms is the ability to interact with users through natural language. Spoken commands can be processed by a Spoken Language Understanding chain in order to obtain the desired behavior of the robot. The entry point of such a process is an Automatic Speech Recognition (ASR) module that provides a list of transcriptions for a given spoken utterance. Although several well-performing ASR engines are available off-the-shelf, they operate in a general-purpose setting. Hence, they may not be well suited to the recognition of utterances given to robots in specific domains. In this work, we propose a practical yet robust strategy to re-rank lists of transcriptions. This approach improves the quality of ASR systems in situated scenarios, i.e., the transcription of robotic commands. The proposed method relies upon evidence derived from a semantic grammar with semantic actions, designed to model typical commands expressed in scenarios that are specific to human service robotics. The outcomes obtained through an experimental evaluation show that the approach is able to effectively outperform the ASR baseline, obtained by selecting the first transcription suggested by the ASR.
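
    The re-ranking strategy can be sketched compactly. Below is a minimal illustration, not the paper's scoring model: each transcription in the n-best list keeps its ASR confidence and earns a bonus when the domain grammar accepts it. parses() and the weight are hypothetical stand-ins for the semantic grammar and its tuning.

        def rerank(nbest, parses, grammar_weight=0.5):
            """nbest: (transcription, asr_confidence) pairs, best-first as
            returned by the recognizer; parses(t) is True when the
            semantic grammar accepts transcription t."""
            def score(item):
                transcription, confidence = item
                bonus = grammar_weight if parses(transcription) else 0.0
                return confidence + bonus
            return sorted(nbest, key=score, reverse=True)

        nbest = [("goat of the fridge", 0.63), ("go to the fridge", 0.61)]
        best = rerank(nbest, lambda t: t.startswith("go to"))[0][0]
        # best == "go to the fridge": grammar evidence overrides the raw ASR rank.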