    Neural Transition-based Parsing of Library Deprecations

    This paper tackles the challenging problem of automating code updates to fix deprecated API usages of open-source libraries by analyzing their release notes. Our system employs a three-tier architecture: first, a web crawler service retrieves deprecation documentation from the web; then a specially built parser processes those text documents into tree-structured representations; finally, a client IDE plugin locates and fixes identified deprecated usages of libraries in a given codebase. The focus of this paper is the parsing component. We introduce a novel transition-based parser in two variants: one based on a classical feature-engineered classifier and one based on a neural tree encoder. To confirm the effectiveness of our method, we gathered and labeled a set of 426 API deprecations from 7 well-known Python data science libraries, and demonstrated that our approach decisively outperforms a non-trivial neural machine translation baseline. Comment: 11 pages + references and appendix (14 total). This is an edited version of our rejected submission to ESEC/FSE 2022 to include a citation of our earlier short paper and remove all content pertaining to the demo paper submission currently under review for ICSE 202

    Transition-based Semantic Dependency Parsing with Pointer Networks

    [Abstract]: Transition-based parsers implemented with Pointer Networks have become the new state of the art in dependency parsing, excelling in producing labelled syntactic trees and outperforming graph-based models in this task. In order to further test the capabilities of these powerful neural networks on a harder NLP problem, we propose a transition system that, thanks to Pointer Networks, can straightforwardly produce labelled directed acyclic graphs and perform semantic dependency parsing. In addition, we enhance our approach with deep contextualized word embeddings extracted from BERT. The resulting system not only outperforms all existing transition-based models, but also matches the best fully-supervised accuracy to date on the SemEval 2015 Task 18 datasets among previous state-of-the-art graph-based parsers. This work has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150), from the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and from Xunta de Galicia (ED431B 2017/01, ED431G 2019/01).
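
    As a rough illustration of how a transition sequence can yield a labelled directed acyclic graph rather than a tree, the sketch below replays a hand-written action script. The `ATTACH`/`SHIFT` action names, the `apply_transitions` helper, the example sentence, and the arc labels are all hypothetical; they are not the transition system defined in the paper, and no Pointer Network is involved (in a real parser, a trained model would choose each head position). The sketch also does not check for cycles.

```python
def apply_transitions(n_words, actions):
    """Replay scripted actions and return arcs as (head, label, dependent).

    Words are numbered 1..n_words; position 0 is an artificial root.
    The word currently choosing its heads is called the focus word.
    """
    arcs = []
    focus = 1
    for action in actions:
        if action[0] == "ATTACH":
            # The focus word receives one more head; several ATTACH
            # actions in a row give it several heads (a graph, not a tree).
            _, head, label = action
            if head == focus:
                raise ValueError("a word cannot be its own head")
            arcs.append((head, label, focus))
        elif action[0] == "SHIFT":
            # Done with the focus word (possibly with zero heads).
            focus += 1
        else:
            raise ValueError(f"unknown action: {action!r}")
    if focus != n_words + 1:
        raise ValueError("action script did not process every word")
    return arcs


words = ["She", "wants", "to", "leave"]
script = [
    ("ATTACH", 2, "ARG1"), ("ATTACH", 4, "ARG1"), ("SHIFT",),  # "She": two heads
    ("ATTACH", 0, "ROOT"), ("SHIFT",),                         # "wants"
    ("SHIFT",),                                                # "to": no head
    ("ATTACH", 2, "ARG2"), ("SHIFT",),                         # "leave"
]
arcs = apply_transitions(len(words), script)
print(arcs)
```

    Here "She" ends up with two incoming arcs, which a tree-producing transition system could not express.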

    Discontinuous grammar as a foreign language

    [Abstract] In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank. We acknowledge the European Research Council (ERC), which has funded this research under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150) and the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615), ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), and Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01. Funding for open access charge: Universidade da Coruña/CISUG
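
    To illustrate what treating constituent parsing as a translation task means, the sketch below turns a small continuous constituent tree into a flat sequence of bracket tokens that a sequence-to-sequence model could be trained to emit. This is only a simple top-down bracketing for continuous trees; it is not one of the discontinuous linearizations proposed in the paper. The tuple-based tree encoding and the `linearize` helper are assumptions made for this example.

```python
def linearize(node):
    """Flatten a tree into bracket tokens.

    A node is either (POS_tag, word) for a preterminal or
    (label, [children]) for an internal constituent.
    Words are replaced by their POS tags in the output sequence.
    """
    label, body = node
    if isinstance(body, str):
        return [label]
    tokens = ["(" + label]
    for child in body:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens


tree = ("S", [
    ("NP", [("PRP", "She")]),
    ("VP", [("VBZ", "sleeps")]),
])
target_sequence = linearize(tree)
print(" ".join(target_sequence))
```

    A bracketing like this can only describe constituents whose words are adjacent, which is the coverage limitation the abstract refers to.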

    Complex Knowledge Base Question Answering: A Survey

    Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers have proposed a large number of novel methods that address the challenges of answering complex questions. In this survey, we review recent advances in KBQA with a focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin by introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for the complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explicate advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research. Comment: 20 pages, 4 tables, 7 figures. arXiv admin note: text overlap with arXiv:2105.1164

    Graph Neural Networks for Natural Language Processing: A Survey

    Deep learning has become the dominant approach in coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best expressed with a graph structure. As a result, there is a surge of interest in developing new deep learning techniques on graphs for a large number of NLP tasks. In this survey, we present a comprehensive overview of Graph Neural Networks (GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, which systematically organizes existing research on GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models. We further introduce a large number of NLP applications that exploit the power of GNNs and summarize the corresponding benchmark datasets, evaluation metrics, and open-source code. Finally, we discuss various outstanding challenges for making full use of GNNs for NLP as well as future research directions. To the best of our knowledge, this is the first comprehensive overview of Graph Neural Networks for Natural Language Processing. Comment: 127 page

    Understanding and generating language with abstract meaning representation

    Abstract Meaning Representation (AMR) is a semantic representation for natural language that encompasses annotations related to traditional tasks such as Named Entity Recognition (NER), Semantic Role Labeling (SRL), word sense disambiguation (WSD), and Coreference Resolution. AMR represents sentences as graphs, where nodes represent concepts and edges represent semantic relations between them. Sentences are represented as graphs and not trees because nodes can have multiple incoming edges, called reentrancies. This thesis investigates the impact of reentrancies for parsing (from text to AMR) and generation (from AMR to text). For the parsing task, we showed that it is possible to use techniques from tree parsing and adapt them to deal with reentrancies. To better analyze the quality of AMR parsers, we developed a set of fine-grained metrics and found that state-of-the-art parsers predict reentrancies poorly. Hence we provided a classification of linguistic phenomena causing reentrancies, categorized the types of errors parsers make with respect to reentrancies, and showed that correcting these errors can lead to significant improvements. For the generation task, we showed that neural encoders that have access to reentrancies outperform those that do not, demonstrating the importance of reentrancies also for generation. This thesis also discusses the problem of using AMR for languages other than English. Annotating new AMR datasets for other languages is an expensive process and requires defining annotation guidelines for each new language. It is therefore reasonable to ask whether we can share AMR annotations across languages. We provided evidence that AMR datasets for English can be successfully transferred to other languages: we trained parsers for Italian, Spanish, German, and Chinese to investigate the cross-linguality of AMR. We showed cases where translational divergences between languages pose a problem and cases where they do not.
    In summary, this thesis demonstrates the impact of reentrancies in AMR and provides insights on AMR for languages that do not yet have AMR datasets.
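
    To make the notion of a reentrancy concrete, the following minimal sketch stores an AMR-style graph as (source, relation, target) triples and reports every node with more than one incoming edge. The example sentence, the variable names, and the `find_reentrancies` helper are illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict

# "The boy wants to go": the boy (b) is both the one who wants (w)
# and the one who goes (g), so node b has two incoming edges.
triples = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
]


def find_reentrancies(edges):
    """Map each node with more than one incoming edge to its (parent, relation) pairs."""
    incoming = defaultdict(list)
    for source, relation, target in edges:
        incoming[target].append((source, relation))
    return {node: parents for node, parents in incoming.items() if len(parents) > 1}


reentrant = find_reentrancies(triples)
print(reentrant)
```

    If the second edge into `b` were dropped, the structure would be a tree, which is the simplification that tree-based parsing techniques have to work around.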