
    Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    This work investigates embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario in which the semantic information is extracted directly from the speech signal by a single end-to-end neural network model. We propose to integrate dialog history into an end-to-end signal-to-concept SLU system. The dialog history is represented as dialog history embedding vectors (so-called h-vectors) and is provided as additional information to end-to-end SLU models in order to improve system performance. The following three types of h-vectors are proposed and experimentally evaluated in this paper: (1) supervised-all embeddings, which predict the bag of concepts expected in the user's answer from the last dialog system response; (2) supervised-freq embeddings, which predict only a selected set of semantic concepts (those corresponding to the most frequent errors in our experiments); and (3) unsupervised embeddings. Experiments on the MEDIA corpus for the semantic slot filling task demonstrate that the proposed h-vectors improve model performance. Comment: Accepted for ICASSP 2020 (submitted October 21, 2019).

    Guidelines for annotating the LUNA corpus with frame information

    This document defines the annotation workflow for adding frame information to the LUNA corpus of conversational speech. In particular, it details both the corpus pre-processing steps and the annotation process itself, giving hints on how to choose the frame and frame element labels. It also describes 20 new domain-specific and language-specific frames. To our knowledge, this is the first attempt to adapt the frame paradigm to dialogs and, at the same time, to define new frames and frame elements for the specific domain of software/hardware assistance. The technical report is structured as follows: Section 2 gives an overview of the FrameNet project, while Section 3 introduces the LUNA project and the annotation framework for the Italian dialogs. Section 4 details the annotation workflow, including the format preparation of the dialog files and the annotation strategy. In Section 5 we discuss the main issues in annotating frame information in dialogs and describe how the standard annotation procedure was changed to address them. The 20 newly introduced frames are then reported in Section 6.

    A CEFR-Based Comparison of ELT Curriculum and Course Books used in Turkish and Portuguese Primary Schools

    This cross-cultural study explores to what extent a macro-level language policy, the Common European Framework of Reference for Languages (CEFR) (CoE, 2001), is implemented in micro-level contexts, specifically primary English classrooms in Turkey and Portugal. The study investigated the 3rd and 4th grade course books and the Turkish and Portuguese English language curricula through content analysis and cross-cultural comparison. The course book analysis was carried out with reference to the language skills suggested in the CEFR, the intercultural characteristics of the course books, and the A1 level descriptors. The results highlight similarities and differences between the two countries in the implementation of the CEFR and the representation of A1 level descriptors in course book activities in primary English classrooms. The implications concern the importance of teacher education, the preparation of age-appropriate and interculturally appropriate materials for primary levels, and the need for sustainable and consistent language policy and planning.

    The Many Functions of Discourse Particles: A Computational Model of Pragmatic Interpretation

    We present a connectionist model for the interpretation of discourse particles in real dialogues that is based on neuronal principles of categorization (categorical perception, prototype formation, contextual interpretation). It can be shown that discourse particles operate just like other morphological and lexical items with respect to interpretation processes. The description proposed locates discourse particles in an elaborate model of communication which incorporates many different aspects of the communicative situation. We therefore also attempt to explore the content of the category discourse particle. We present a detailed analysis of the meaning assignment problem and show that 80%–90% correctness for unseen discourse particles can be reached with the feature analysis provided. Furthermore, we show that ‘analogical transfer’ from one discourse particle to another is facilitated if prototypes are computed and used as the basis for generalization. We conclude that the interpretation processes which are a part of the human cognitive system are very similar with respect to different linguistic items. However, the analysis of discourse particles shows clearly that any explanatory theory of language needs to incorporate a theory of communication processes.

    Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora

    The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French, in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: Italian as an example of a new language, and ticket reservation as an example of a new domain. Finally, the work is complemented by the proposal of a new high-level semantic annotation scheme well suited to dialogue data.

    Restrictive highlighting in English: only, just and ALL clefts


    Annotation graphs as a framework for multidimensional linguistic data analysis

    In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These 'annotation graphs' offer a simple yet powerful method for representing complex annotation structures incorporating hierarchy and overlap. Here, we motivate and illustrate our approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain specialists, we have constructed a hybrid multi-level annotation for a fragment of the Boston University Radio Speech Corpus which includes the following levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named entity. We show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats. We also show how the approach facilitates substantive comparison of multiple annotations of a single signal based on different theoretical models. The discussion shows how annotation graphs open the door to wide-ranging integration of tools, formats and corpora. Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse Tagging, Proceedings of the Workshop, pp. 1-10. Association for Computational Linguistics.

    Rendering foreign-language inclusions in V. Bykov's works in translation from Belarusian into Russian

    This article presents the results of original research based on a semantic analysis of foreign-language inclusions in the narratives “Obelisk” and “Sign of Misfortune” (“Абяліск” and “Знак бяды”) by the Belarusian writer Vasil Bykov and of their rendering in the Russian translations (“Обелиск” and “Знак беды”). Foreign-language inclusions are considered a distinctive feature of V. Bykov's works, essential for recreating the cultural environment and for conveying the cultural components of the historical events described in the narratives. Translating them adequately into other languages poses a clear challenge for the translator, who must select and apply suitable strategies when carrying these inclusions over into the new versions in order to minimize translation losses.

    Discourse structure and information structure: interfaces and prosodic realization

    In this paper we review the current state of research on the interface between discourse structure (DS) and information structure (IS). This field has received a lot of attention from discourse semanticists and pragmatists, and has made substantial progress in recent years. We summarize the relevant studies. In addition, we look at DS/IS interaction at a different level, that of phonetics. It is known that both information structure and discourse structure can be realized prosodically, but the phonetic interaction between the prosodic devices they employ has hardly ever been discussed in this context. We think that a proper consideration of this aspect of DS/IS interaction would enrich our understanding of the phenomenon, and hence we formulate some related research-programmatic positions.