    Comparing Czech and English AMRs

    This paper compares Czech and English annotation using the Abstract Meaning Representation formalism.

    Bootstrapping Multilingual AMR with Contextual Word Alignments

    We develop high-performance multilingual Abstract Meaning Representation (AMR) systems by projecting English AMR annotations to other languages with weak supervision. We achieve this goal by bootstrapping transformer-based multilingual word embeddings, in particular those from cross-lingual RoBERTa (XLM-R large). We develop a novel technique for foreign-text-to-English AMR alignment, using the contextual word alignment between English and foreign-language tokens. This word alignment is weakly supervised and relies on the contextualized XLM-R word embeddings. We achieve a highly competitive performance that surpasses the best published results for German, Italian, Spanish and Chinese.
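
    The alignment idea at the heart of this approach, matching tokens across languages by the similarity of their contextual embeddings, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the model size (base rather than large), the greedy argmax matching, the subword-level granularity, and the example sentence pair are all assumptions.

```python
# Minimal sketch of contextual word alignment: embed an English sentence and
# a translation with a multilingual encoder, then map each foreign token to
# its most similar English token by cosine similarity. Model size, greedy
# argmax matching, and subword granularity are illustrative simplifications.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "xlm-roberta-base"  # the paper uses XLM-R large; base keeps the demo light
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def embed(sentence: str):
    """Return (subword tokens, contextual embeddings), special tokens removed."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]           # (seq_len, dim)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return tokens[1:-1], hidden[1:-1]                        # drop <s> and </s>

en_toks, en_vecs = embed("The boy wants to go")
de_toks, de_vecs = embed("Der Junge will gehen")

# Cosine similarity between every foreign/English token pair.
en_n = torch.nn.functional.normalize(en_vecs, dim=-1)
de_n = torch.nn.functional.normalize(de_vecs, dim=-1)
sim = de_n @ en_n.T                                          # (|de|, |en|)

# Greedy alignment: each foreign token points at its closest English token.
# A real system would pool subwords into words and threshold low similarities.
for i, d_tok in enumerate(de_toks):
    print(f"{d_tok:>10} -> {en_toks[int(sim[i].argmax())]}")
```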

    XL-AMR: Enabling Cross-Lingual AMR Parsing with Transfer Learning Techniques

    Abstract Meaning Representation (AMR) is a popular formalism of natural language that represents the meaning of a sentence as a semantic graph. It is agnostic about how to derive meanings from strings, and for this reason it lends itself well to the encoding of semantics across languages. However, cross-lingual AMR parsing is a hard task, because training data are scarce in languages other than English and the existing English AMR parsers are not directly suited to being used in a cross-lingual setting. In this work we tackle these two problems so as to enable cross-lingual AMR parsing: we explore different transfer learning techniques for producing automatic AMR annotations across languages and develop a cross-lingual AMR parser, XL-AMR. This parser can be trained on the produced data and does not rely on AMR aligners or source-copy mechanisms, as is commonly the case in English AMR parsing. The results of XL-AMR significantly surpass those previously reported for Chinese, German, Italian and Spanish. Finally, we provide a qualitative analysis which sheds light on the suitability of AMR across languages. We release XL-AMR at github.com/SapienzaNLP/xl-amr.
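
    To make the formalism concrete: an AMR represents a sentence as a rooted, labeled graph, conventionally written in PENMAN notation. Below is a minimal sketch using the penman Python library; the example sentence and graph are illustrative and not taken from the paper.

```python
# Illustrative AMR for "The boy wants to go", decoded with the `penman`
# library (pip install penman). Variables (w, b, g2) name graph nodes;
# :ARGn edges carry PropBank-style semantic roles.
import penman

amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g2 / go-02
             :ARG0 b))
"""
graph = penman.decode(amr)
print(graph.top)  # 'w', the root concept
for source, role, target in graph.triples:
    print(source, role, target)
# Note that variable b fills the ARG0 role of both want-01 and go-02:
# a single node with two incoming edges, i.e. a reentrancy.
```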

    A Study Towards Spanish Abstract Meaning Representation

    Taking into account the increasing attention that researchers in Natural Language Understanding (NLU) and Natural Language Generation (NLG) are paying to Computational Semantics, we analyze the feasibility of annotating Spanish Abstract Meaning Representations. The Abstract Meaning Representation (AMR) project aims to create a large-scale sembank of simple structures that represent unified, complete semantic information contained in English sentences. Although AMR is not intended to be an interlingua, one of its key features is its focus on events rather than on word forms, achieved, for instance, by abstracting away from morpho-syntactic idiosyncrasies. In this thesis, we investigate the requirements for annotating Spanish AMRs and put forward a proposal, based on the premise that many of these idiosyncrasies mark differences between languages. To our knowledge, this is the first work towards the development of Abstract Meaning Representation for Spanish.

    Understanding and generating language with abstract meaning representation

    Abstract Meaning Representation (AMR) is a semantic representation for natural language that encompasses annotations related to traditional tasks such as Named Entity Recognition (NER), Semantic Role Labeling (SRL), Word Sense Disambiguation (WSD), and Coreference Resolution. AMR represents sentences as graphs, where nodes represent concepts and edges represent semantic relations between them. Sentences are represented as graphs rather than trees because nodes can have multiple incoming edges, called reentrancies. This thesis investigates the impact of reentrancies on parsing (from text to AMR) and generation (from AMR to text). For the parsing task, we show that it is possible to adapt techniques from tree parsing to deal with reentrancies. To better analyze the quality of AMR parsers, we develop a set of fine-grained metrics and find that state-of-the-art parsers predict reentrancies poorly. Hence we provide a classification of the linguistic phenomena causing reentrancies, categorize the types of errors parsers make with respect to reentrancies, and show that correcting these errors can lead to significant improvements. For the generation task, we show that neural encoders that have access to reentrancies outperform those that do not, demonstrating the importance of reentrancies for generation as well. This thesis also discusses the problem of using AMR for languages other than English. Annotating new AMR datasets for other languages is an expensive process and requires defining annotation guidelines for each new language. It is therefore reasonable to ask whether AMR annotations can be shared across languages. We provide evidence that AMR datasets for English can be successfully transferred to other languages: we train parsers for Italian, Spanish, German, and Chinese to investigate the cross-linguality of AMR. We show cases where translational divergences between languages pose a problem and cases where they do not. In summary, this thesis demonstrates the impact of reentrancies in AMR and provides insights on AMR for languages that do not yet have AMR datasets.
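
    The reentrancies discussed above are simply nodes with more than one incoming edge, and they can be read off an AMR's triples directly. A minimal sketch follows; the helper function and example graph are illustrative, not the thesis's actual metric code.

```python
# Count reentrant nodes in an AMR: variables that are the target of more
# than one non-instance edge. A sketch only; the fine-grained metrics in
# the thesis compare such triples between gold and predicted graphs.
from collections import Counter
import penman

def reentrant_nodes(graph):
    variables = graph.variables()
    incoming = Counter(
        target
        for _, role, target in graph.triples
        if role != ":instance" and target in variables
    )
    return [v for v, n in incoming.items() if n > 1]

g = penman.decode("(w / want-01 :ARG0 (b / boy) :ARG1 (g2 / go-02 :ARG0 b))")
print(reentrant_nodes(g))  # ['b']: 'boy' is shared by want-01 and go-02
```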

    Graph-based Approaches to Text Generation

    Deep learning advances have enabled more fluent and flexible text generation. However, while these neural generative approaches were initially successful in tasks such as machine translation, they face problems such as unfaithfulness to the source, repetition, and incoherence when applied to generation tasks where the input is structured data, such as graphs. Generating text from graph-based data, including Abstract Meaning Representation (AMR) or Knowledge Graphs (KG), is a challenging task due to the inherent difficulty of properly encoding the input graph while maintaining its original semantic structure. Previous work requires linearizing the input graph, which makes it complicated to properly capture the graph structure, since the linearized representation weakens structural information by diluting the explicit connectivity, particularly when the graph structure is complex. This thesis tackles these issues by focusing on two major challenges: first, the creation and improvement of neural text generation systems that can better operate when consuming graph-based input data; second, the examination of text-to-text pretrained language models for graph-to-text generation, including multilingual generation, and of possible methods to adapt these models, pretrained on natural language, to graph-structured data. In the first part of this thesis, we investigate how to directly exploit graph structures for text generation. We develop novel graph-to-text methods capable of incorporating the input graph structure into the learned representations, enhancing the quality of the generated text. For AMR-to-text generation, we present a dual encoder, which incorporates different graph neural network methods, to capture complementary perspectives of the AMR graph. Next, we propose a new KG-to-text framework that learns richer contextualized node embeddings by combining global and local node contexts. We then introduce a parameter-efficient mechanism for injecting node connections into the Transformer architecture, operating on shortest path lengths between nodes, and show strong performance while using considerably fewer parameters. The second part of this thesis focuses on pretrained language models for text generation from graph-based input data. We first examine how encoder-decoder text-to-text pretrained language models perform in various graph-to-text tasks and propose different task-adaptive pretraining strategies for improving their downstream performance. We then propose a novel structure-aware adapter method that allows the input graph structure to be injected directly into pretrained models without updating their parameters, reducing their reliance on specific representations of the graph structure. Finally, we investigate multilingual text generation from AMR structures, developing approaches that can operate in languages beyond English.
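
    The parameter-efficient mechanism mentioned above keys attention on shortest path lengths between nodes rather than on a full encoding of the graph. Below is a minimal sketch of that general idea, pairwise distances bucketed into a learned additive attention bias; the bucket count, the undirected view of the graph, and the way the bias enters attention are illustrative assumptions, not the thesis architecture.

```python
# Sketch: turn pairwise shortest-path lengths between graph nodes into a
# learned additive attention bias, one parameter per distance bucket.
# Bucket count, the undirected view of the graph, and how the bias enters
# the attention logits are illustrative choices, not the thesis design.
import networkx as nx
import torch

nodes = ["want-01", "boy", "go-02"]
G = nx.Graph([("want-01", "boy"), ("want-01", "go-02"), ("go-02", "boy")])

MAX_DIST = 8  # distances beyond this are clipped into the last bucket
dist = torch.full((len(nodes), len(nodes)), MAX_DIST, dtype=torch.long)
for i, u in enumerate(nodes):
    lengths = nx.single_source_shortest_path_length(G, u)
    for j, v in enumerate(nodes):
        if v in lengths:
            dist[i, j] = min(lengths[v], MAX_DIST)

# One learnable scalar per distance bucket, added to the attention logits:
# attention_logits = (Q @ K.T) / sqrt(d) + attn_bias
bias_table = torch.nn.Embedding(MAX_DIST + 1, 1)
attn_bias = bias_table(dist).squeeze(-1)  # (num_nodes, num_nodes)
print(dist)
```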