
    Discourse-aware neural machine translation

    Machine translation (MT) models usually translate a text by considering isolated sentences, based on the strict assumption that the sentences in a text are independent of one another. However, it is a truism that texts have properties of connectedness that go beyond those of their individual sentences, and disregarding dependencies across sentences harms translation quality, especially in terms of coherence, cohesion, and consistency. Some discourse-aware approaches have previously been investigated for conventional statistical machine translation (SMT), but incorporating discourse information remains a serious obstacle for state-of-the-art neural machine translation (NMT), which has recently surpassed the performance of SMT. In this thesis, we incorporate useful discourse information to enhance NMT models. More specifically, we conduct research on two main parts: 1) exploring a novel document-level NMT architecture; and 2) dealing with a specific discourse phenomenon in translation models. Firstly, we investigate the influence of historical contextual information on the performance of NMT models. A cross-sentence context-aware NMT model is proposed to consider the influence of previous sentences in the same document: this history is summarized by an additional hierarchical encoder, and the historical representations are then integrated into the standard NMT model using different strategies. Experimental results on a Chinese-English document-level translation task show that the approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points; our analysis and comparisons also yield insights and conclusions for this research direction. Secondly, we explore the impact of discourse phenomena on the performance of MT. We focus on pronoun-dropping (pro-drop): in pro-drop languages, pronouns can be omitted when the referent can be inferred from the context. As data for training a dropped pronoun (DP) generator are scarce, we propose to automatically annotate DPs using alignment information from a large parallel corpus. We then introduce a hybrid approach, building a neural DP generator and integrating it into the SMT model. Experimental results on both Chinese-English and Japanese-English translation tasks demonstrate that this approach achieves a significant improvement of up to +1.58 BLEU points, with a 66% F-score for DP generation accuracy. Motivated by this promising result, we further exploit the DP translation approach for advanced NMT models. A novel reconstruction-based model is proposed that reconstructs the DP-annotated source sentence from the hidden states of the encoder, the decoder, or both. Experimental results on the same translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline trained on DP-annotated parallel data. Finally, to avoid errors propagated from an external DP prediction model, we investigate an end-to-end DP translation model. Specifically, we improve the reconstruction-based model from three perspectives: we employ a shared reconstructor to better exploit encoder and decoder representations; we propose to jointly learn to translate and predict DPs; and, to capture discourse information for DP prediction, we combine the hierarchical encoder with the DP translation model.
    Experimental results on the same translation tasks show that our approach significantly improves both translation performance and DP prediction accuracy.
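    To make the cross-sentence idea concrete, below is a minimal sketch: a word-level RNN encodes each of the K previous source sentences, a sentence-level RNN summarizes them into a document context vector, and a gate fuses that vector into a translation state. This is an illustration under assumed GRU dimensions, not the thesis implementation; names such as `HierarchicalContextEncoder` and `ContextGate` are ours.

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Summarizes the K previous source sentences into one context vector.

    A word-level GRU encodes each sentence; a sentence-level GRU runs over
    the per-sentence summaries (a sketch of the idea, not the thesis code).
    """
    def __init__(self, emb_dim: int, hid_dim: int):
        super().__init__()
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.sent_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, prev_sents: torch.Tensor) -> torch.Tensor:
        # prev_sents: (batch, K, max_len, emb_dim), padded word embeddings
        b, k, t, e = prev_sents.shape
        _, h = self.word_rnn(prev_sents.view(b * k, t, e))  # h: (1, b*k, hid)
        sent_summaries = h.squeeze(0).view(b, k, -1)        # (b, K, hid)
        _, ctx = self.sent_rnn(sent_summaries)              # (1, b, hid)
        return ctx.squeeze(0)                               # (b, hid)

class ContextGate(nn.Module):
    """Gated fusion of the document context into an NMT hidden state."""
    def __init__(self, hid_dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, state: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # z decides, per dimension, how much history to let through
        z = torch.sigmoid(self.gate(torch.cat([state, ctx], dim=-1)))
        return z * state + (1.0 - z) * ctx
```

    The gating strategy shown here is only one of the integration strategies the abstract alludes to; initializing the decoder with the context vector or attending over per-sentence summaries are natural alternatives within the same sketch.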

    Context-Aware Self-Attention Networks

    Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, they calculate the dependencies between representations without considering contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks by capturing the richness of context. To maintain the simplicity and flexibility of self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage internal representations that embed both global and deep contexts, thus avoiding reliance on external resources. Experimental results on the WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conduct extensive analyses to quantify how the context vectors participate in the self-attention model. (Comment: AAAI 2019.)
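    The following sketch shows one reading of the global-context variant: the query and key projections receive an additive, gated contribution from a context vector taken as the mean of the layer input. The scalar gates `lam_q` and `lam_k` are a simplification we assume for illustration; the published model uses richer, position-wise gating and also a deep-context variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareSelfAttention(nn.Module):
    """Single-head self-attention whose query/key transformations are
    conditioned on a global context vector (mean over positions).
    A simplified sketch of the idea, not the published architecture.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_cq = nn.Linear(d_model, d_model, bias=False)
        self.w_ck = nn.Linear(d_model, d_model, bias=False)
        # scalar gates controlling how much context enters Q and K
        self.lam_q = nn.Parameter(torch.zeros(1))
        self.lam_k = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); global context = mean over positions
        c = x.mean(dim=1, keepdim=True)                       # (batch, 1, d)
        q = self.w_q(x) + torch.sigmoid(self.lam_q) * self.w_cq(c)
        k = self.w_k(x) + torch.sigmoid(self.lam_k) * self.w_ck(c)
        v = self.w_v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v                  # (batch, seq, d)
```

    Because the context term depends only on internal representations of the layer, the sketch preserves the paper's stated goal of avoiding external resources.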

    Formality Style Transfer Within and Across Languages with Limited Supervision

    While much natural language processing work focuses on analyzing language content, language style also conveys important information about the situational context and purpose of communication. When editing an article, professional editors take the target audience into account to select appropriate word choice and grammar. Similarly, professional translators translate documents for a specific audience and often ask what the expected tone of the content is when taking a translation job. Computational models of natural language should therefore consider both meaning and style. Controlling style is an emerging research area in text rewriting and is under-investigated in machine translation. In this dissertation, we present a new perspective that closely connects formality transfer and machine translation: we aim to control style in language generation, with a focus on rewriting English or translating French to English with a desired formality. These are challenging tasks because annotated examples of style transfer are available only in limited quantities. We first address this problem by inducing a lexical formality model based on word embeddings and a small number of representative formal and informal words. This enables us to assign sentential formality scores and to rerank translation hypotheses whose formality scores are closer to a user-provided formality level. To capture broader formality changes, we then turn to neural sequence-to-sequence models. Joint modeling of formality transfer and machine translation enables formality control in machine translation without dedicated training examples. Along the way, we also improve low-resource neural machine translation.
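    A minimal sketch of the lexical formality model as described: a formality direction is induced from the embeddings of a few formal and informal seed words, sentences are scored by the mean projection of their words onto that direction, and n-best MT hypotheses are reranked toward a user-provided target formality. The seed lists, the weight `alpha`, and the linear distance penalty are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def formality_axis(emb: dict, formal_seeds: list, informal_seeds: list) -> np.ndarray:
    """Direction in embedding space pointing from informal toward formal
    usage, induced from a handful of representative seed words."""
    f = np.mean([emb[w] for w in formal_seeds], axis=0)
    i = np.mean([emb[w] for w in informal_seeds], axis=0)
    axis = f - i
    return axis / np.linalg.norm(axis)

def sentence_formality(emb: dict, tokens: list, axis: np.ndarray) -> float:
    """Sentential formality score: mean projection of in-vocabulary words."""
    vecs = [emb[w] for w in tokens if w in emb]
    return float(np.mean([v @ axis for v in vecs])) if vecs else 0.0

def rerank(hypotheses, emb, axis, target: float, alpha: float = 1.0):
    """Rerank (mt_score, tokens) hypotheses toward a target formality level
    by penalizing the distance between each sentence score and the target."""
    def combined(h):
        mt_score, tokens = h
        return mt_score - alpha * abs(sentence_formality(emb, tokens, axis) - target)
    return sorted(hypotheses, key=combined, reverse=True)
```

    With pre-trained embeddings, for example, `rerank(nbest, emb, formality_axis(emb, ["purchase", "reside"], ["buy", "live"]), target=0.5)` would prefer hypotheses whose formality is closest to the requested level at a comparable MT score.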

    Transformer-based NMT: modeling, training and implementation

    International trade and industrial collaborations enable countries and regions to concentrate their development on specific industries while making the most of other countries' specializations, which significantly accelerates global development. However, globalization also increases the demand for cross-region communication, and language barriers between the world's many languages make deep collaboration between groups speaking different languages challenging, increasing the need for translation. Language technology, specifically Machine Translation (MT), holds the promise of enabling efficient, real-time communication between languages at minimal cost. Even though computers can nowadays perform computation in parallel very fast, providing machine translation users with very low-latency translations, and although the evolution from Statistical Machine Translation (SMT) to Neural Machine Translation (NMT) with advanced deep learning algorithms has significantly boosted translation quality, current machine translation algorithms are still far from translating all input accurately. How to further improve the performance of state-of-the-art NMT algorithms thus remains a valuable open research question that has received a wide range of attention. In the research presented in this thesis, we first investigate the long-distance relation modeling ability of the state-of-the-art NMT model, the Transformer. We propose to learn source phrase representations and incorporate them into the Transformer translation model, aiming to enhance its ability to capture long-distance dependencies. Second, though previous work (Bapna et al., 2018) suggests that deep Transformers have difficulty converging, we empirically find that the convergence of deep Transformers depends on the interaction between the layer normalization and the residual connections employed to stabilize training. We conduct a theoretical study of how to ensure the convergence of Transformers, especially deep Transformers, and propose to guarantee convergence by putting a Lipschitz constraint on parameter initialization. Finally, we investigate how to dynamically determine proper and efficient batch sizes during training of the Transformer model. We find that the gradient direction stabilizes with increasing batch size during gradient accumulation. We therefore propose to adjust batch sizes dynamically during training by monitoring the change in gradient direction within gradient accumulation, stopping the accumulation when the gradient direction starts to fluctuate, which yields a proper and efficient batch size. For the research in this thesis, we also implement our own NMT toolkit, Neutron, an implementation of the Transformer and its variants. In addition to providing the fundamental features underlying the approaches presented in this thesis, it supports many advanced features from recent cutting-edge research, and implementations of all our approaches are included and open-sourced in the toolkit. To compare with previous approaches, we mainly conduct our experiments on data from the WMT 14 English to German (En-De) and English to French (En-Fr) news translation tasks, except when studying the convergence of deep Transformers, where we replace the WMT 14 En-Fr task with the WMT 15 Czech to English (Cs-En) news translation task to compare with Bapna et al. (2018).
    The sizes of these datasets vary from medium (WMT 14 En-De, ~4.5M sentence pairs) to very large (WMT 14 En-Fr, ~36M sentence pairs), so we expect our approaches to help improve translation quality between popular language pairs that are widely used and have sufficient data.
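    A sketch of one plausible reading of the dynamic batching idea: gradients are accumulated mini-batch by mini-batch, and accumulation stops once the direction of the running gradient stops changing, measured as the cosine similarity of successive accumulated gradients crossing a threshold. The threshold, the cap on accumulation steps, and the flattening of gradients into a single vector are our illustrative choices, not the thesis code.

```python
import torch
import torch.nn.functional as F

def accumulate_until_stable(model, batches, loss_fn,
                            cos_threshold: float = 0.999, max_steps: int = 32):
    """Grow the effective batch via gradient accumulation, stopping once the
    accumulated gradient's direction has stabilized (a sketch of the idea)."""
    model.zero_grad()
    prev, step = None, 0
    for step, batch in enumerate(batches, start=1):
        loss_fn(model, batch).backward()       # gradients sum into p.grad
        flat = torch.cat([p.grad.detach().flatten()
                          for p in model.parameters() if p.grad is not None])
        if prev is not None:
            cos = F.cosine_similarity(flat, prev, dim=0)
            if cos > cos_threshold or step >= max_steps:
                break                          # direction stable: stop growing
        prev = flat
    return step  # caller applies optimizer.step() with this effective batch
```

    The design point is that early in training small batches already give a stable direction, while later stages benefit from larger effective batches; monitoring the direction change lets the schedule adapt instead of being fixed in advance.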

    Doctor of Philosophy

    This dissertation provides a detailed case study of endangerment-induced language change in the Shoshone community of Duck Valley, a Shoshone and Paiute reservation where the native languages have lost significant ground to English over the past decades. The analysis incorporates factors from individual speaker backgrounds and sociolinguistic histories in determining endangerment-induced language change and variation, and indicates how these variables show that extensive contact with English and decreasing use of Shoshone have led to structural simplification in the nominal morphological features examined in this study. The dissertation examines structural changes in Shoshone, a language whose community is undergoing language shift. Shoshone is an endangered Uto-Aztecan language spoken in the Great Basin area (Nevada, Utah, Idaho, and Wyoming) by approximately 2,000 speakers, who form a large dialect chain across this geographical area. Wick R. Miller conducted extensive fieldwork throughout the Shoshone-speaking community during the 1960s and 70s; his vast collection of texts and grammatical work, the Wick R. Miller Collection (WRMC), is housed at the University of Utah. The availability of this collection for comparison with present-day data that I collected during a three-week field trip to the reservation provides a picture of language change over a period in which the language has gone from relatively viable to extremely endangered. The analysis focuses on nominal morphology, motivated by the existence of crosslinguistically marked features, by anecdotal or preliminary observations of changes in progress, and by the aim of building on the findings of previous case studies of endangered-language nominal morphology. A descriptive analysis of each feature is presented, with examples from present-day speakers of various proficiency levels, and compared with the texts from the WRMC. The features discussed are accusative case allomorphy, number marking and agreement, and proximity indication in demonstrative stems. The analysis of the sociolinguistic data on language use, gender, age, and social network indicates that these factors interact, with age being the most relevant factor in retention or innovation in the features presented here.

    Ambiguity Resolution in Spoken Language Understanding

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2022. Advisor: Nam Soo Kim. Ambiguity in language is inevitable: although language is a means of communication, a concept in one person's mind can never be conveyed to another's in a perfectly identical manner. While inevitable, ambiguity in language understanding often leads to breakdown or failure of communication. There are various levels of language ambiguity; however, not all ambiguity needs to be resolved. Different aspects of ambiguity exist for each domain and task, and it is crucial to define the boundary after recognizing the ambiguity that can be well defined and resolved. In this dissertation, we investigate the types of ambiguity that appear in spoken language processing, especially in intention understanding, and conduct research to define and resolve them. Although this phenomenon occurs in various languages, its degree and aspects depend on the language investigated. We focus on cases where the ambiguity comes from the gap between the amount of information carried by spoken language and by text. Here, we study Korean, in which sentence structure and intention often depend on prosody. In Korean, a text is often read with multiple intentions due to multi-functional sentence enders, frequent pro-drop, wh-intervention, etc. We first define this type of ambiguity and construct a corpus that helps detect ambiguous sentences, given that such utterances can be problematic for intention understanding.
    In constructing a corpus for intention understanding, we consider the directivity and rhetoricalness of a sentence; these make up a criterion for classifying the intention of spoken language into statements, questions, commands, rhetorical questions, and rhetorical commands. Using a spoken-language corpus annotated with sufficiently high inter-annotator agreement (kappa = 0.85), we show which strategies and language models are effective in classifying ambiguous text given only textual data, and we qualitatively analyze the characteristics of the task. We do not handle ambiguity only at the text level: to find out whether actual disambiguation is possible given a speech input, we design an artificial spoken-language corpus composed only of ambiguous sentences and resolve the ambiguity with various attention-based neural network architectures. In this process, we observe that ambiguity resolution is most effective when the textual and acoustic inputs co-attend to each other's features, especially when the audio processing module conveys attention information to the text module in a multi-hop manner (see the sketch after the table of contents below), and we offer a perspective on how this relates to human language processing. Finally, assuming that the ambiguity of intention understanding is resolved by the proposed strategies, we present a brief roadmap of how the results can be utilized at the industry or research level. By integrating text-based ambiguity detection with a speech-based intention understanding module, we can build a system that handles ambiguity efficiently while reducing error propagation. Such a system can be integrated with a dialogue manager to make up a task-oriented dialogue system capable of chit-chat, or used for error reduction in multilingual settings such as speech translation, beyond merely monolingual conditions. Throughout the dissertation, we show that ambiguity resolution for intention understanding in a prosody-sensitive language is achievable and usable at the industry or research level. We hope that this study helps tackle chronic ambiguity issues in other languages or domains, and we share the resources, results, and code used in this research to contribute to the progress of the field.
    Contents: 1 Introduction (1.1 Motivation; 1.2 Research Goal; 1.3 Outline of the Dissertation); 2 Related Work (2.1 Spoken Language Understanding; 2.2 Speech Act and Intention: 2.2.1 Performatives and statements, 2.2.2 Illocutionary act and speech act, 2.2.3 Formal semantic approaches; 2.3 Ambiguity of Intention Understanding in Korean: 2.3.1 Ambiguities in language, 2.3.2 Speech act and intention understanding in Korean); 3 Ambiguity in Intention Understanding of Spoken Language (3.1 Intention Understanding and Ambiguity; 3.2 Annotation Protocol: 3.2.1 Fragments, 3.2.2 Clear-cut cases, 3.2.3 Intonation-dependent utterances; 3.3 Data Construction: 3.3.1 Source scripts, 3.3.2 Agreement, 3.3.3 Augmentation, 3.3.4 Train split; 3.4 Experiments and Results: 3.4.1 Models, 3.4.2 Implementation, 3.4.3 Results; 3.5 Findings and Summary: 3.5.1 Findings, 3.5.2 Summary); 4 Disambiguation of Speech Intention (4.1 Ambiguity Resolution: 4.1.1 Prosody and syntax, 4.1.2 Disambiguation with prosody, 4.1.3 Approaches in SLU; 4.2 Dataset Construction: 4.2.1 Script generation, 4.2.2 Label tagging, 4.2.3 Recording; 4.3 Experiments and Results: 4.3.1 Models, 4.3.2 Results; 4.4 Summary); 5 System Integration and Application (5.1 System Integration for Intention Identification: 5.1.1 Proof of concept, 5.1.2 Preliminary study; 5.2 Application to Spoken Dialogue System: 5.2.1 What is 'Free-running', 5.2.2 Omakase chatbot; 5.3 Beyond Monolingual Approaches: 5.3.1 Spoken language translation, 5.3.2 Dataset, 5.3.3 Analysis, 5.3.4 Discussion; 5.4 Summary); 6 Conclusion and Future Work; Bibliography; Abstract (In Korean); Acknowledgment
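    As a concrete illustration of the multi-hop co-attention finding above, the sketch below lets audio-side representations repeatedly attend over text tokens before a pooled fusion is classified into the five intention classes. The dimensions, the number of hops and heads, and the pooling are our assumptions; the dissertation's actual architectures differ in detail.

```python
import torch
import torch.nn as nn

class MultiHopCoAttention(nn.Module):
    """Audio representations attend over text tokens for several hops, then
    a pooled fusion is classified into five intention classes. A sketch of
    the co-attention idea; sizes and the label set are assumptions."""
    def __init__(self, d: int = 256, hops: int = 2, n_classes: int = 5):
        super().__init__()  # d must be divisible by the head count
        self.hops = nn.ModuleList([
            nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            for _ in range(hops)])
        self.cls = nn.Linear(2 * d, n_classes)

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, d) acoustic features; text: (batch, T_t, d) tokens
        query = audio
        for attn in self.hops:
            # each hop conveys audio-side attention information to the text side
            query, _ = attn(query, text, text)
        pooled = torch.cat([query.mean(dim=1), text.mean(dim=1)], dim=-1)
        # logits over statement / question / command / rhetorical question /
        # rhetorical command
        return self.cls(pooled)
```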

    Distant supervision for learning discourse structures in multi-party conversations

    The main objective of this thesis is to improve the automatic capture of semantic information, with the goal of modeling and understanding human communication. We advance the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations: the production of automatic summaries, recommendations, dialogue act detection, identification of decisions, planning, and the semantic relations between dialogue acts needed to understand dialogues. In multi-party conversations, it is important not only to understand the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads.
    An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed. Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we have nowhere near enough such data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long-distance relations, in which an utterance is semantically connected not to the immediately preceding utterance but to another utterance further back in the conversation, are particularly difficult and rare, though often central to comprehension. It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats; this is one goal this thesis achieves. In addition, we wanted not only to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.