
    Recurrent neural models and related problems in natural language processing

    The recurrent neural network (RNN) is one of the most powerful machine learning models for capturing temporal variations and dependencies in sequential data. Thanks to the resurgence of deep learning over the past decade, many novel RNN structures have been invented and applied to various practical problems, especially in natural language processing (NLP). This thesis follows a similar direction: we offer new insights into the structural properties of RNNs and into how recently proposed RNN models may stimulate the formation of new open problems in NLP.
The thesis is divided into two parts: model analysis and new open problems. In the first part, we explore two important aspects of RNNs: their connecting architectures and the basic operations in their transition functions. Specifically, in the first article, we define several rigorous measures for evaluating the architectural complexity of any given recurrent architecture with arbitrary network topology. Extensive experiments on these measures demonstrate both their theoretical validity and their utility in guiding RNN architecture design. In the second article, we propose a novel module that combines different information flows multiplicatively in the basic transition functions of RNNs. RNNs equipped with the new module are empirically shown to have better gradient properties and stronger generalization capacity without extra computational or memory cost.
The second part focuses on two open problems in NLP: how to perform advanced multi-hop reasoning in machine reading comprehension, and how to encode personalities into chitchat dialogue systems. We collect two large-scale datasets aimed at motivating methodological progress on these two problems. In the third article, we introduce the HotpotQA dataset, which contains over 113k Wikipedia-based question-answer pairs. Most of the questions in HotpotQA can be answered only through accurate multi-hop reasoning over multiple documents. The supporting facts required for reasoning are also provided to help models make explainable predictions. The fourth article tackles the lack of personality in chatbots. The proposed persona-chat dataset encourages more engaging and consistent conversations by conditioning dialogue partners on given personas. We show that baseline models trained on persona-chat are able to express consistent personalities and to respond in more captivating ways by attending to the personas of both themselves and their interlocutors.
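The multiplicative combination of information flows described for the second article can be illustrated with a minimal sketch. It assumes the simplest such form, h_t = tanh((W x_t) ⊙ (U h_{t-1}) + b), next to the usual additive transition h_t = tanh(W x_t + U h_{t-1} + b); the module in the thesis may include further terms, and all function names here are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def additive_step(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Vanilla RNN transition: tanh(W x + U h + b)."""
    wx, uh = matvec(W, x), matvec(U, h)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]


def multiplicative_step(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Multiplicative combination: tanh((W x) * (U h) + b), elementwise.
    Uses the same two matrix products and no extra parameters; only the
    operator joining the input flow and the recurrent flow changes."""
    wx, uh = matvec(W, x), matvec(U, h)
    return [math.tanh(a * c + d) for a, c, d in zip(wx, uh, b)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    U = [[2.0, 0.0], [0.0, 2.0]]
    b = [0.0, 0.0]
    x, h = [0.5, -1.0], [1.0, 0.5]
    print(additive_step(W, U, b, x, h))        # tanh([2.5, 0.0])
    print(multiplicative_step(W, U, b, x, h))  # tanh([1.0, -1.0])
```

In the multiplicative form, the derivative of the pre-activation with respect to U h is W x rather than 1, so the current input rescales the recurrent contribution unit by unit, which is one way such a module can change gradient behaviour at no extra parameter cost.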

    Neural Techniques for German Dependency Parsing

    Syntactic parsing is the task of analyzing the structure of a sentence based on a predefined grammar formalism. It is a key component in many natural language processing (NLP) pipelines and is of great benefit to natural language understanding (NLU) tasks such as information retrieval and sentiment analysis. Although neural network techniques have achieved very high accuracy, most syntactic parsing research focuses on only a few prominent languages (such as English or Chinese) or on language-agnostic settings. We therefore still lack studies that focus on a single language and design parsing strategies specific to its linguistic properties. In this thesis, we take German as the language of interest and develop more accurate methods for German dependency parsing by combining state-of-the-art neural network methods with techniques that address the challenges posed by the language-specific properties of German. Compared to English, German has richer morphology, semi-free word order, and case syncretism. It is the combination of these characteristics that makes parsing German an interesting and challenging task. Because syntactic parsing requires many levels of language understanding, we propose to study and improve the knowledge of parsing models at each level: the (sub)word level, the syntactic level, the semantic level, and the sentence level. At the (sub)word level, we investigate the surge of out-of-vocabulary words in German data caused by compounding. We propose a new type of embedding for compounds that composes the embeddings of their individual components. Our experiments show that character-based embeddings are superior to word and compound embeddings in dependency parsing, and that compound embeddings outperform word embeddings only when part-of-speech (POS) information is unavailable.
Thus, we conclude that it is the morpho-syntactic information of unknown compounds, not their semantic information, that is crucial for parsing German. At the syntactic level, we investigate the challenges that case syncretism poses for local grammatical function labelers. Specifically, we augment the grammatical function labeling component of a neural dependency parser, which labels each head-dependent pair independently, with a new labeler that uses Long Short-Term Memory networks (LSTMs) to include a decision history. All our proposed models significantly outperformed the baseline on three languages: English, German, and Czech. However, the new models do not have the same impact on all languages: the improvement for English is smaller than for the non-configurational languages (German and Czech). Our analysis suggests that the success of the history-based models is due not to better handling of long dependencies but to better handling of uncertainty in head direction. We study the interaction of syntactic parsing with the semantic level through the problem of PP attachment disambiguation. Our motivation is to provide a realistic evaluation of the task in which gold information is not available, and to compare the results of disambiguation systems against the output of a strong neural parser. To the best of our knowledge, this is the first time that PP attachment disambiguation has been evaluated on predicted information and compared against neural dependency parsing. In addition, we present a novel approach for PP attachment disambiguation that uses biaffine attention and pre-trained contextualized word embeddings as semantic knowledge. Our end-to-end system outperformed the previous pipeline approach on German by a large margin, simply by avoiding the error propagation caused by predicted information. Finally, we show that parsing systems (with the same semantic knowledge) are in general superior to systems specialized for PP attachment disambiguation.
Lastly, we improve dependency parsing at the sentence level using reranking techniques. Previous work on neural reranking had been evaluated only on English and Chinese, both languages with configurational word order and poor morphology. We re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variant of a discriminative reranker based on graph convolutional networks (GCNs). Our proposed reranker not only outperforms previous models on English but is the only model able to improve results over the baselines on German and Czech. Our analysis indicates that the failure of the other rerankers on German and Czech is due to the lower quality of the k-best lists, in which the gold tree ratio and the diversity of the list play an important role.
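The reranking analysis above attributes failures to the quality of the k-best lists. The sketch below computes two such diagnostics under assumed definitions that may differ from the thesis: the gold tree ratio is taken as the fraction of sentences whose k-best list contains the exact (unlabeled) gold tree, and diversity as the mean pairwise fraction of words whose heads differ between two candidates. Trees are lists of head indices; all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

# A tree is a list of heads: tree[i] is the head of word i+1 (words are 1-indexed, 0 = root).
Tree = Sequence[int]


def gold_tree_ratio(kbest_lists: List[List[Tree]], gold_trees: List[Tree]) -> float:
    """Fraction of sentences whose k-best list contains the exact gold tree."""
    hits = sum(
        any(list(t) == list(gold) for t in kbest)
        for kbest, gold in zip(kbest_lists, gold_trees)
    )
    return hits / len(gold_trees)


def head_disagreement(a: Tree, b: Tree) -> float:
    """Fraction of words attached to different heads in trees a and b."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def list_diversity(kbest: List[Tree]) -> float:
    """Mean pairwise head disagreement within one k-best list (0.0 if k < 2)."""
    pairs = list(combinations(kbest, 2))
    if not pairs:
        return 0.0
    return sum(head_disagreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    gold = [[2, 0, 2], [0, 1]]
    kbest = [[[2, 0, 2], [2, 0, 1]], [[2, 0]]]
    print(gold_tree_ratio(kbest, gold))  # 0.5: only the first list contains its gold tree
    print(list_diversity(kbest[0]))      # 1/3: the two candidates differ on one of three heads
```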

    Neural information extraction from natural language text

    Natural language processing (NLP) deals with building computational techniques that allow computers to automatically analyze and meaningfully represent human language. With the exponential growth of data in the digital era, NLP-based systems have made it easy to access relevant information through a wide range of applications, such as web search engines and voice assistants. To this end, research has for decades focused on techniques at the intersection of NLP and machine learning. In recent years, deep learning techniques have exploited the expressive power of Artificial Neural Networks (ANNs) and achieved state-of-the-art performance in a wide range of NLP tasks. A vital property of Deep Neural Networks (DNNs) is that they can automatically extract complex features from input data and thus provide an alternative to manual, handcrafted feature engineering. Besides ANNs, Probabilistic Graphical Models (PGMs), which couple graph theory with probabilistic methods, can describe the causal structure between the random variables of a system and capture a principled notion of uncertainty. Given these characteristics, DNNs and PGMs can be advantageously combined to build powerful neural models that capture the underlying complexity of data. Traditional machine learning-based NLP systems employed shallow computational methods (e.g., SVMs or logistic regression) and relied on handcrafted features, which are time-consuming to design, complex, and often incomplete. In contrast, deep learning and neural network-based methods have recently shown superior results on various NLP tasks, such as machine translation, text classification, named entity recognition, relation extraction, and textual similarity. These neural models can automatically extract an effective feature representation from training data. This dissertation focuses on two NLP tasks: relation extraction and topic modeling.
The former aims at identifying semantic relationships between entities or nominals within a sentence or document. Successfully extracting these relationships contributes greatly to building structured knowledge bases, which are useful in downstream NLP applications such as web search, question answering, and recommendation engines. On the other hand, topic modeling aims at understanding the thematic structures underlying a collection of documents. Topic modeling is a popular text-mining tool for automatically analyzing a large collection of documents and understanding its topical semantics without actually reading it. In doing so, it generates word clusters (i.e., topics) and document representations, which are useful in document understanding and information retrieval, respectively. Essentially, both relation extraction and topic modeling depend on the quality of the representations learned from text. In this dissertation, we have developed task-specific neural models for learning representations, coupled with relation extraction and topic modeling in the supervised and unsupervised machine learning paradigms, respectively. More specifically, we make the following contributions in developing neural models for NLP tasks:
1. Neural Relation Extraction: First, we have proposed a novel recurrent neural network-based architecture for table filling in order to jointly perform entity and relation extraction within sentences. Then, we have extended our scope to extracting relationships between entities across sentence boundaries and presented a novel dependency-based neural network architecture. These two contributions lie in the supervised machine learning paradigm. Moreover, we have contributed to building a relation extractor that is robust when labeled data is scarce, for which we have proposed a novel weakly-supervised bootstrapping technique.
Building on these contributions, we have further explored the interpretability of recurrent neural networks to explain their predictions for the relation extraction task.
2. Neural Topic Modeling: Besides the supervised neural architectures, we have also developed unsupervised neural models to learn meaningful document representations within topic modeling frameworks. First, we have proposed a novel dynamic topic model that captures topics over time. Next, we have contributed to building static topic models that do not consider temporal dependencies, presenting neural topic modeling architectures that also exploit external knowledge, i.e., word embeddings, to address data sparsity. Moreover, we have developed neural topic models that incorporate knowledge transfer from many sources using both word embeddings and latent topics. Finally, we have shown that neural topic modeling improves when language structures (e.g., word ordering and local syntactic and semantic information) are introduced to address the bag-of-words limitations of traditional topic models.
The proposed neural NLP models are based on techniques at the intersection of PGMs, deep learning, and ANNs. Neural relation extraction employs neural networks to learn representations, typically at the sentence level, without access to the broader document context. Topic models, in contrast, have access to statistical information across documents. We therefore combine the two complementary learning paradigms in a neural composite model consisting of a neural topic model and a neural language model, which enables us to jointly learn thematic structures in a document collection via the topic model and word relations within a sentence via the language model. Overall, our research contributions in this dissertation extend NLP-based systems for relation extraction and topic modeling with state-of-the-art performance.
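The table-filling formulation mentioned in the first contribution casts joint entity and relation extraction as labeling the cells of an n×n table over a sentence's tokens. The sketch below builds such a table from gold annotations under an assumed layout (BIO entity tags on the diagonal, relation labels in the upper-triangle cell indexed by the last tokens of the two entities); the dissertation's exact labeling scheme and the recurrent model that predicts the cells are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices of an entity mention


def build_table(n_tokens: int,
                entities: Dict[Span, str],
                relations: Dict[Tuple[Span, Span], str]) -> List[List[str]]:
    """Return an n x n label table for joint entity and relation extraction.

    Diagonal cell (k, k): BIO entity tag of token k ("O" outside entities).
    Cell (i, j), i < j: relation between the entities ending at tokens i and j,
    with "(rev)" appended when the relation points from the later entity to the
    earlier one. "-" means no relation; the lower triangle is unused.
    """
    table = [["O" if i == j else "-" for j in range(n_tokens)] for i in range(n_tokens)]
    for (start, end), etype in entities.items():
        for k in range(start, end + 1):
            table[k][k] = ("B-" if k == start else "I-") + etype
    for (head, tail), rel in relations.items():
        i, j = head[1], tail[1]  # index each entity by its last token
        if i < j:
            table[i][j] = rel
        else:
            table[j][i] = rel + "(rev)"
    return table


if __name__ == "__main__":
    tokens = ["Alice", "works", "at", "Acme", "Corp"]
    table = build_table(len(tokens),
                        entities={(0, 0): "PER", (3, 4): "ORG"},
                        relations={((0, 0), (3, 4)): "Work_For"})
    for token, row in zip(tokens, table):
        print(f"{token:>6} " + " ".join(f"{cell:>9}" for cell in row))
```

A model for this formulation then predicts one label per upper-triangle cell, which is what lets entity and relation decisions share a single output structure.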

    Code Structure Guided Transformer for Source Code Summarization

    Code summaries help developers comprehend programs and reduce the time needed to infer program functionality during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models to generate accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer is under-explored in this task domain. In this paper, we propose a novel approach named SG-Trans that incorporates code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., the data flow graph) into the self-attention module of the Transformer as an inductive bias. To further capture the hierarchical characteristics of code, the local information and the global structure are distributed across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows that SG-Trans outperforms state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans still improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on the two benchmark datasets, respectively.
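One common way to inject structure into self-attention as an inductive bias is to restrict which token pairs a head may attend to. The sketch below shows a single attention head whose scores are masked by a boolean matrix, so that, for example, a lower-layer head could be limited to tokens in the same statement and a higher-layer head to data-flow neighbours. This is an illustrative assumption: how SG-Trans actually builds its local and global matrices, and whether it masks or reweights scores, is not specified here, and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(q: Matrix, k: Matrix, v: Matrix,
                     allowed: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention in which token i may attend to
    token j only if allowed[i][j] is True. Each row of `allowed` must permit at
    least one position (typically the token itself)."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if allowed[i][j] else float("-inf")
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]  # masked entries become exp(-inf) = 0.0
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Hypothetical "local" mask: tokens 0-1 form one statement, token 2 another.
    local_mask = [[True, True, False],
                  [True, True, False],
                  [False, False, True]]
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    for row in masked_attention(q, k, v, local_mask):
        print(row)  # rows 0 and 1 never mix in v[2]; row 2 equals v[2]
```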

    A Linearization Framework for Dependency and Constituent Trees

    Parsing is a core natural language processing problem in which, given a raw input sentence, a model automatically produces a structured output that represents its syntactic structure. The most common formalisms in this field are constituent and dependency parsing. Although the two formalisms differ, they share limitations, in particular the limited speed of the models that produce the desired representation and the lack of a common representation that would allow general-purpose end-to-end neural systems to predict them. Casting both parsing tasks as sequence labeling solves both of these problems. Several tree linearizations have been proposed in the last few years; however, there is no common suite that facilitates their use within an integrated framework. In this work, we will develop such a system. On the one hand, the system will be able to (i) encode syntactic trees according to the desired syntactic formalism and linearization function, and (ii) decode linearized trees back into their original representation. On the other hand, (iii) we will also train several neural sequence labeling systems to perform parsing from those labels and compare the results.
Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/202
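To make steps (i) and (ii) concrete, the sketch below encodes a dependency tree as one label per word using a relative head-offset linearization (each label is the signed distance from the word to its head, plus the relation), which is one of several encodings used in parsing-as-sequence-labeling work, and decodes the labels back. It is a simplified illustration rather than the project's software: it covers dependency trees only and does not repair ill-formed predicted label sequences. Function names are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, relation); words are 1-indexed, head 0 = root


def encode(arcs: List[Arc]) -> List[str]:
    """Linearize a tree into one label per word: '<offset>_<relation>',
    where offset = head - position (so the root word gets -position)."""
    return [f"{head - position:+d}_{rel}"
            for position, (head, rel) in enumerate(arcs, start=1)]


def decode(labels: List[str]) -> List[Arc]:
    """Invert encode(): recover (head, relation) for each word."""
    arcs = []
    for position, label in enumerate(labels, start=1):
        offset, rel = label.split("_", 1)  # relations may themselves contain '_'
        arcs.append((position + int(offset), rel))
    return arcs


if __name__ == "__main__":
    # "She reads books": reads is the root; She and books depend on it.
    tree = [(2, "nsubj"), (0, "root"), (2, "obj")]
    labels = encode(tree)
    print(labels)                  # ['+1_nsubj', '-2_root', '-1_obj']
    print(decode(labels) == tree)  # True
```

Once trees are labels, any off-the-shelf sequence tagger can be trained to predict them, which is what makes step (iii) possible.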

    Enhancing Word Representation Learning with Linguistic Knowledge

    Representation learning, the process whereby representations are modelled from data, has recently become a central part of Natural Language Processing (NLP). Among the most widely used learned representations are word embeddings trained on large corpora of unannotated text, which are treated as general representations that can be used across multiple NLP tasks. Despite their empirical successes, word embeddings learned entirely from data can only capture patterns of language usage from the particular linguistic domain of the training data. Linguistic knowledge, which does not vary across linguistic domains, can potentially address this limitation. The vast sources of linguistic knowledge readily available today can help train more general word embeddings (i.e., less affected by the distance between linguistic domains) by providing information such as semantic relations, syntactic structure, and word morphology. In this research, I investigate the different ways in which word embedding models capture and encode words' semantic and contextual information. To this end, I propose two approaches to integrating linguistic knowledge into the statistical learning of word embeddings. The first approach augments the training data of the well-known Skip-gram word embedding model: synonym information is extracted from a lexical knowledge base and incorporated into the training data as additional training examples. This data augmentation approach seeks to enforce synonym relations in the learned embeddings. The second approach exploits structural information in text by transforming every sentence in the data into its dependency parse tree and training an autoencoder to recover the original sentence from the tree.
While learning a mapping from a dependency parse tree to its originating sentence, this novel Structure-to-Sequence (Struct2Seq) model produces word embeddings that contain information about a word's structural context. Because the effects of combining knowledge with statistical methods are often hard to predict, a central focus of this thesis is understanding the effects of incorporating linguistic knowledge into word representation learning. Using intrinsic (geometric characteristics) and extrinsic (performance on downstream tasks) evaluation metrics, I aim to measure the specific influence that the injected knowledge has on different aspects of the informational composition of word embeddings.
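The first approach can be sketched as a change to the training pairs fed to Skip-gram. Below, ordinary (center, context) pairs come from a sliding window and, as an assumed augmentation scheme, each distinct center word with synonyms in a lexical resource also yields (word, synonym) pairs. The toy synonym dictionary stands in for a real knowledge base, and the exact way the thesis injects examples may differ; function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def skipgram_pairs(tokens: List[str], window: int) -> List[Pair]:
    """Standard Skip-gram training pairs: (center, context) within +/- window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def augment_with_synonyms(pairs: List[Pair], synonyms: Dict[str, List[str]]) -> List[Pair]:
    """Append one extra (word, synonym) example per synonym of each distinct
    center word, so training also pushes synonyms towards each other."""
    extra: List[Pair] = []
    seen = set()
    for center, _ in pairs:
        if center in seen:
            continue
        seen.add(center)
        extra.extend((center, syn) for syn in synonyms.get(center, []))
    return pairs + extra


if __name__ == "__main__":
    pairs = skipgram_pairs(["the", "big", "dog"], window=1)
    print(pairs)  # [('the', 'big'), ('big', 'the'), ('big', 'dog'), ('dog', 'big')]
    print(augment_with_synonyms(pairs, {"big": ["large"]})[-1])  # ('big', 'large')
```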

    Natural language generation as neural sequence learning and beyond

    Natural Language Generation (NLG) is the task of generating natural language (e.g., English sentences) from machine-readable input. In the past few years, deep neural networks have received great attention from the natural language processing community due to their impressive performance across different tasks. This thesis addresses NLG problems with deep neural networks from two different modelling views. Under the first view, natural language sentences are modelled as sequences of words, which greatly simplifies their representation and allows us to apply classic sequence-modelling neural networks (i.e., recurrent neural networks) to various NLG tasks. Under the second view, natural language sentences are modelled as dependency trees, which are more expressive and capture linguistic generalisations, leading to neural models that operate on tree structures. Specifically, this thesis develops several novel neural models for natural language generation. Unlike many existing models, which aim to generate a single sentence, we propose a novel hierarchical recurrent neural network architecture to represent and generate multiple sentences. Beyond the hierarchical recurrent structure, we also propose a means of modelling context dynamically during generation. We apply this model to the task of Chinese poetry generation and show that it outperforms competitive poetry generation systems. Neural NLG models usually work well when there is a lot of training data. When training data is insufficient, prior knowledge about the task at hand becomes very important. To this end, we propose a deep reinforcement learning framework to inject prior knowledge into neural NLG models and apply it to sentence simplification. Experimental results show promising performance using our reinforcement learning framework.
Both poetry generation and sentence simplification are tackled with models following the sequence learning view, in which sentences are treated as word sequences. In this thesis, we also explore how to generate natural language sentences as tree structures. We propose a neural model that combines the advantages of syntactic structure and recurrent neural networks. More concretely, our model defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the subtree generated so far. We show experimentally that this model achieves good performance in language modelling and can also generate dependency trees.
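The tree-based view can be summarized as a factorization. Assuming that each sentence S corresponds to a single dependency tree T(S) whose n nodes w_1, …, w_n are generated in a fixed traversal order, and writing T_{<t} for the partial subtree generated before step t and h(·) for its learned representation, the sentence probability decomposes as below. The notation is ours, and the thesis's exact conditioning (for example on dependency direction or node position) may differ.

```latex
\begin{aligned}
P(S) &= P\bigl(T(S)\bigr) = \prod_{t=1}^{n} P\bigl(w_t \mid h(T_{<t})\bigr),\\
\log P(S) &= \sum_{t=1}^{n} \log P\bigl(w_t \mid h(T_{<t})\bigr).
\end{aligned}
```

The log form is what a language-modelling evaluation such as perplexity is computed from, so the same model can be scored as a language model while also producing a tree.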

    A Survey of Machine Learning for Big Code and Naturalness

    Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In this article, we survey this work. We contrast programming languages with natural languages and discuss how their similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
Comment: A website accompanying this survey paper can be found at https://ml4code.github.i
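The idea that code is repetitive and therefore predictable is often illustrated by training an n-gram language model on code tokens and measuring its cross-entropy. The sketch below, a toy add-one-smoothed bigram model over a whitespace-tokenized snippet, illustrates that general idea and is not code from the survey; real studies use proper lexers, larger n, and better smoothing. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def train_bigram(tokens: List[str]) -> Model:
    """Count contexts and bigrams over the token stream, prefixed by <s>."""
    seq = ["<s>"] + tokens
    contexts = Counter(seq[:-1])          # every token that precedes another
    bigrams = Counter(zip(seq, seq[1:]))
    return contexts, bigrams, set(seq)


def cross_entropy(tokens: List[str], model: Model) -> float:
    """Average bits per token under add-one (Laplace) smoothing."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen token types
    seq = ["<s>"] + tokens
    bits = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        bits -= math.log2(p)
    return bits / len(tokens)


if __name__ == "__main__":
    code = "for i in range ( n ) : total += i".split() * 3
    model = train_bigram(code)
    print(cross_entropy(code, model))        # repetitive, familiar order: fewer bits
    print(cross_entropy(code[::-1], model))  # same tokens, unfamiliar order: more bits
```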

    Learning and Analysis of Neural Network-Based Sentence Representations Utilizing Syntax

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. Taeuk Kim (김태욱).
Syntax is a theory in linguistics that deals with the principles underlying the composition of sentences. As this theoretical framework provides formal instructions for constructing a sentence from its constituents, it has been considered a valuable reference in sentence representation learning, whose objective is to find a way of transforming a sentence into a vector that captures its meaning in a computationally tractable manner. This dissertation provides two particular perspectives on harmonizing syntax with neural sentence representation models, focusing especially on constituency grammar. First, we propose two methods for enriching the quality of sentence embeddings by exploiting syntactic knowledge, either represented as explicit parse trees or implicitly stored in neural models. Second, we use syntactic formalism as a lens through which we reveal the inner workings of pre-trained language models, which are the state of the art in sentence representation learning. Through a series of demonstrations in practical scenarios, we show that syntax is useful even in the neural era, where models trained end-to-end on huge corpora are prevalent, functioning either as (i) a source of inductive biases that facilitate fast and effective learning of such models or (ii) an analytic tool that increases the interpretability of these black-box models.
Contents
Chapter 1 Introduction
  1.1 Dissertation Outline
  1.2 Related Publications
Chapter 2 Background
  2.1 Introduction to Syntax
  2.2 Neural Networks for Sentence Representations
    2.2.1 Recursive Neural Network
    2.2.2 Transformer
    2.2.3 Pre-trained Language Models
  2.3 Related Literature
    2.3.1 Sentence Representation Learning
    2.3.2 Probing Methods for Neural NLP Models
    2.3.3 Grammar Induction and Unsupervised Parsing
Chapter 3 Sentence Representation Learning with Explicit Syntactic Structure
  3.1 Introduction
  3.2 Related Work
  3.3 Method
    3.3.1 Tree-LSTM
    3.3.2 Structure-aware Tag Representation
    3.3.3 Leaf-LSTM
    3.3.4 SATA Tree-LSTM
  3.4 Experiments
    3.4.1 General Configurations
    3.4.2 Sentence Classification Tasks
    3.4.3 Natural Language Inference
  3.5 Analysis
    3.5.1 Ablation Study
    3.5.2 Representation Visualization
  3.6 Limitations and Future Work
  3.7 Summary
Chapter 4 Sentence Representation Learning with Implicit Syntactic Knowledge
  4.1 Introduction
  4.2 Related Work
  4.3 Method
    4.3.1 Contrastive Learning with Self-Guidance
    4.3.2 Learning Objective Optimization
  4.4 Experiments
    4.4.1 General Configurations
    4.4.2 Semantic Textual Similarity Tasks
    4.4.3 Multilingual STS Tasks
    4.4.4 SentEval Benchmark
  4.5 Analysis
    4.5.1 Ablation Study
    4.5.2 Robustness to Domain Shifts
    4.5.3 Computational Efficiency
    4.5.4 Representation Visualization
  4.6 Limitations and Future Work
  4.7 Summary
Chapter 5 Syntactic Analysis of Sentence Representation Models
  5.1 Introduction
  5.2 Related Work
  5.3 Motivation
  5.4 Method
    5.4.1 CPE-PLM
    5.4.2 Top-down CPE-PLM
    5.4.3 Pre-trained Language Models
    5.4.4 Distance Measure Functions
    5.4.5 Injecting Bias into Syntactic Distances
  5.5 Experiments
    5.5.1 General Configurations
    5.5.2 Experimental Results on PTB
    5.5.3 Experimental Results on MNLI
  5.6 Analysis
    5.6.1 Performance Comparison by Layer
    5.6.2 Estimating the Upper Limit of Distance Measure Functions
    5.6.3 Constituency Tree Examples
  5.7 Summary
Chapter 6 Multilingual Syntactic Analysis with Enhanced Techniques
  6.1 Introduction
  6.2 Related Work
  6.3 Method
    6.3.1 Chart-based CPE-PLM
    6.3.2 Top-K Ensemble for CPE-PLM
  6.4 Experiments
    6.4.1 General Configurations
    6.4.2 Experiments on Monolingual Settings
    6.4.3 Experiments on Multilingual Settings
  6.5 Analysis
    6.5.1 Factor Correlation Analysis
    6.5.2 Visualization of Attention Heads
    6.5.3 Recall Scores on Noun and Verb Phrases
  6.6 Limitations and Future Work
  6.7 Summary
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
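Chapter 5's top-down CPE-PLM relies on syntactic distances between adjacent words. A common way to turn such distances into an unlabeled binary constituency tree, used in earlier work on syntactic distances, is to split a span recursively at the gap with the largest distance. The sketch below assumes the n-1 distances have already been computed (for example, by a distance measure function over a pre-trained language model's representations, which is not shown) and is not the dissertation's implementation; the function name is hypothetical.

```python
from typing import List, Union

Tree = Union[str, tuple]


def build_tree(words: List[str], distances: List[float]) -> Tree:
    """Top-down greedy parsing: distances[i] scores the gap between words[i]
    and words[i+1]; the span is split at the largest gap, recursively."""
    assert len(distances) == len(words) - 1
    if len(words) == 1:
        return words[0]
    split = max(range(len(distances)), key=lambda i: distances[i])  # first max wins ties
    left = build_tree(words[:split + 1], distances[:split])
    right = build_tree(words[split + 1:], distances[split + 1:])
    return (left, right)


if __name__ == "__main__":
    words = ["the", "cat", "sat", "down"]
    print(build_tree(words, [0.1, 0.9, 0.3]))  # (('the', 'cat'), ('sat', 'down'))
```

Splitting at the largest distance first means that the most "syntactically distant" neighbours end up in different high-level constituents, which is the intuition behind distance-based tree extraction.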

    Beyond Extractive: Advancing Abstractive Automatic Text Summarization in Norwegian with Transformers

    Automatic summarization is a key area of natural language processing (NLP) and machine learning that attempts to generate informative summaries of articles and documents. Although the field has evolved since the 1950s, research on automatically summarizing Norwegian text has remained relatively underdeveloped. While some strides have been made in extractive systems, which generate summaries by selecting and condensing key phrases directly from the source material, abstractive summarization remains unexplored for Norwegian. Abstractive summarization is distinct in that it generates summaries incorporating new words and phrases not present in the original text. This Master's thesis revolves around one key question: is it possible to create a machine learning system capable of performing abstractive summarization in Norwegian? To answer this question, we generate and release the first two Norwegian datasets for creating and evaluating Norwegian summarization models. One is a web scrape of Store Norske Leksikon (SNL), and the other is a machine-translated version of CNN/Daily Mail. Using these datasets, we fine-tune two Norwegian T5 language models with 580M and 1.2B parameters to create summaries. To assess the quality of the models, we use both automatic ROUGE scores and human evaluation of the generated summaries. To better understand the models' behaviour, we measure how they generate summaries with various metrics, including a novel metric we call "Match Ratio", which measures sentence similarity between summaries and articles based on Levenshtein distance. The top-performing models achieved ROUGE-1 scores of 35.07 and 34.02 on SNL and CNN/DM, respectively. In human evaluation, the best model obtained average scores of 3.96/5.00 on SNL and 4.64/5.00 on CNN/Daily Mail across various criteria.
Based on these results, we conclude that it is possible to produce high-quality abstractive summaries of Norwegian text. With this research, we have laid a foundation that we hope will facilitate future research and enable others to build on our findings and further develop Norwegian summarization models.
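For reference, ROUGE-1, the metric reported above, measures unigram overlap between a generated summary and a reference. The sketch below computes ROUGE-1 precision, recall, and F1 with clipped counts on lowercased whitespace tokens; official ROUGE implementations apply their own tokenization and optional stemming, so this sketch will not exactly reproduce published scores. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count of each word
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}


if __name__ == "__main__":
    print(rouge1("Katten satt på matta", "katten lå på matta"))
    # 3 of 4 unigrams overlap in each direction: precision = recall = f1 = 0.75
```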