Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
This work investigates the embeddings for representing dialog history in
spoken language understanding (SLU) systems. We focus on the scenario when the
semantic information is extracted directly from the speech signal by means of a
single end-to-end neural network model. We propose to integrate dialogue
history into an end-to-end signal-to-concept SLU system. The dialog history is
represented in the form of dialog history embedding vectors (so-called
h-vectors) and is provided as additional information to end-to-end SLU
models in order to improve the system performance. The following three types of
h-vectors are proposed and experimentally evaluated in this paper: (1)
supervised-all embeddings predicting bag-of-concepts expected in the answer of
the user from the last dialog system response; (2) supervised-freq embeddings
focusing on predicting only a selected set of semantic concepts (corresponding
to the most frequent errors in our experiments); and (3) unsupervised
embeddings. Experiments on the MEDIA corpus for the semantic slot filling task
demonstrate that the proposed h-vectors improve the model performance.
Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
Guidelines for annotating the LUNA corpus with frame information
This document defines the annotation workflow for adding frame information to the LUNA corpus of conversational speech. In particular, it details both the corpus pre-processing steps and the annotation process proper, giving hints about how to choose the frame and the frame element labels. It also describes 20 new domain-specific and language-specific frames. To our knowledge, this is the first attempt to adapt the frame paradigm to dialogs and at the same time to define new frames and frame elements for the specific domain of software/hardware assistance. The technical report is structured as follows: Section 2 gives an overview of the FrameNet project, while Section 3 introduces the LUNA project and the annotation framework involving the Italian dialogs. Section 4 details the annotation workflow, including the format preparation of the dialog files and the annotation strategy. In Section 5 we discuss the main issues in annotating frame information in dialogs and describe how the standard annotation procedure was changed to address them. The 20 newly introduced frames are then reported in Section 6.
A CEFR-Based Comparison of ELT Curriculum and Course Books used in Turkish and Portuguese Primary Schools
This cross-cultural study explores to what extent a macro-level language policy, the Common European Framework of Reference for Languages (CEFR) (CoE, 2001), is implemented in micro-level contexts, specifically primary English classrooms in Turkey and Portugal. The study investigated the 3rd and 4th grade course books and the Turkish and Portuguese English language curricula through content analysis and cross-cultural comparison. The course book analysis was carried out with reference to language skills as suggested in the CEFR, intercultural characteristics of the course books, and A1 level descriptors. Results highlight similarities and differences between the two countries in the implementation of the CEFR and the representation of A1 level descriptors in course book activities in primary English classrooms. Implications concern the importance of teacher education, the preparation of age-appropriate and interculturally appropriate materials for primary levels, and the need for sustainable and consistent language policy and planning.
The Many Functions of Discourse Particles: A Computational Model of Pragmatic Interpretation
We present a connectionist model for the interpretation of discourse
particles in real dialogues that is based on neuronal
principles of categorization (categorical perception, prototype
formation, contextual interpretation). It can be shown that
discourse particles operate just like other morphological and
lexical items with respect to interpretation processes. The description
proposed locates discourse particles in an elaborate
model of communication which incorporates many different
aspects of the communicative situation. We therefore also
attempt to explore the content of the category discourse particle.
We present a detailed analysis of the meaning assignment
problem and show that 80%–90% correctness for unseen discourse
particles can be reached with the feature analysis provided.
Furthermore, we show that ‘analogical transfer’ from
one discourse particle to another is facilitated if prototypes
are computed and used as the basis for generalization. We
conclude that the interpretation processes which are a part of
the human cognitive system are very similar with respect to
different linguistic items. However, the analysis of discourse
particles shows clearly that any explanatory theory of language
needs to incorporate a theory of communication processes.
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French, in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: Italian as an example of a new language, and ticket reservation as an example of a new domain. Finally, the work is complemented by the proposal of a new high-level semantic annotation scheme well suited to dialogue data.
Annotation graphs as a framework for multidimensional linguistic data analysis
In recent work we have presented a formal framework for linguistic annotation
based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet
powerful method for representing complex annotation structures incorporating
hierarchy and overlap. Here, we motivate and illustrate our approach using
discourse-level annotations of text and speech data drawn from the CALLHOME,
COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain
specialists, we have constructed a hybrid multi-level annotation for a fragment
of the Boston University Radio Speech Corpus which includes the following
levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named
entity. We show how annotation graphs can represent hybrid multi-level
structures which derive from a diverse set of file formats. We also show how
the approach facilitates substantive comparison of multiple annotations of a
single signal based on different theoretical models. The discussion shows how
annotation graphs open the door to wide-ranging integration of tools, formats
and corpora.
Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse
Tagging, Proceedings of the Workshop. pp. 1-10. Association for Computational
Linguistics
Rendering Foreign-Language Inclusions in V. Bykov's Works in Translation from Belarusian into Russian
This article presents the results of original research based on the semantic analysis of foreign-language inclusions in the narratives “Obelisk” and “Sign of Misfortune” (“Абяліск” and “Знак бяды”) by the Belarusian writer Vasil Bykov and of their rendering in the Russian translations (“Обелиск” and “Знак беды”). Foreign-language inclusions are treated as a distinctive feature of V. Bykov’s works, essential for recreating the cultural environment and conveying the cultural components of the historical events described in the narratives. Translating them adequately into other languages can be a clear challenge for the translator, who must select and apply specific strategies when introducing these inclusions into the translated text in order to minimize translation losses.
Discourse structure and information structure: interfaces and prosodic realization
In this paper we review the current state of research on the interface between discourse structure (DS) and information structure (IS). This field has received a lot of attention from discourse semanticists and pragmatists, and has made substantial progress in recent years. We summarize the relevant studies. In addition, we look at DS/IS-interaction at a different level, that of phonetics. It is known that both information structure and discourse structure can be realized prosodically, but the phonetic interaction between the prosodic devices they employ has hardly ever been discussed in this context. We think that a proper consideration of this aspect of DS/IS-interaction would enrich our understanding of the phenomenon, and hence we formulate some related research-programmatic positions.