245 research outputs found

    RobertNLP at the IWPT 2021 shared task: simple enhanced UD parsing for 17 languages

    This paper presents our multilingual dependency parsing system as used in the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. Our system consists of an unfactorized biaffine classifier that operates directly on fine-tuned XLM-R embeddings and generates enhanced UD graphs by predicting the best dependency label (or absence of a dependency) for each pair of tokens. To avoid sparsity issues resulting from lexicalized dependency labels, we replace lexical items in relations with placeholders at training and prediction time, later retrieving them from the parse via a hybrid rule-based/machine-learning system. In addition, we utilize model ensembling at prediction time. Our system achieves high parsing accuracy on the blind test data, ranking 3rd out of 9 with an average ELAS F1 score of 86.97.
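
    The abstract describes an unfactorized biaffine classifier that scores every (head, dependent) token pair over fine-tuned contextual embeddings, treating "no dependency" as just another label. Below is a minimal PyTorch sketch of that idea; the module name, dimensions and the two projection MLPs are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BiaffinePairLabeler(nn.Module):
    """Sketch of an unfactorized biaffine pair classifier: every (head, dependent)
    token pair receives a score for each dependency label, including an explicit
    "no edge" label. Names and sizes are illustrative only."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_labels: int):
        super().__init__()
        # separate projections for a token acting as head vs. as dependent
        self.head_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # one (hidden+1) x (hidden+1) bilinear form per label; the extra
        # dimension acts as a bias feature
        self.bilinear = nn.Parameter(
            0.01 * torch.randn(num_labels, hidden_dim + 1, hidden_dim + 1)
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim), e.g. fine-tuned XLM-R outputs
        h = self.head_mlp(embeddings)
        d = self.dep_mlp(embeddings)
        ones = h.new_ones(h.size(0), h.size(1), 1)
        h = torch.cat([h, ones], dim=-1)          # (batch, seq, hidden+1)
        d = torch.cat([d, ones], dim=-1)
        # scores[b, l, i, j] = h[b, i]^T  W_l  d[b, j]
        return torch.einsum("bih,lhk,bjk->blij", h, self.bilinear, d)

# For each token pair, the argmax over the label dimension then yields either a
# dependency label for the enhanced UD graph or the "no edge" decision.
```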

    Lexicalizing DBpedia with Realization Enabled Ensemble Architecture: RealText-lex2 Approach

    DBpedia encodes massive amounts of open-domain knowledge and keeps growing, accumulating new triples at the same rate as Wikipedia. However, applications often require natural language formulations of these triples in order to present the information as natural text. The RealText-lex2 framework offers a scalable platform for transforming such triples into natural language sentences using lexicalization patterns. The framework has evolved from its previous version (RealText-lex) and comprises four lexicalization pattern mining modules that derive patterns from a training triple collection. These patterns can then be applied to new triples, provided that the triples satisfy a defined set of constraints.
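
    As a rough illustration of what applying a lexicalization pattern under constraints can look like, the sketch below treats a pattern as a sentence template mined for a specific DBpedia property, which only fires when the triple meets the pattern's constraints (here, a required object type). All names, the example pattern and the type lookup are hypothetical and are not part of the RealText-lex2 framework itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    object: str

@dataclass
class LexicalizationPattern:
    predicate: str      # DBpedia property the pattern was mined for
    template: str       # e.g. "{s} was born in {o}."
    object_type: str    # constraint: required ontology type of the object

def lexicalize(triple: Triple,
               patterns: List[LexicalizationPattern],
               object_type_of: Callable[[str], str]) -> Optional[str]:
    """Return a sentence for the triple using the first pattern whose
    constraints are satisfied, or None if no pattern applies."""
    for p in patterns:
        if p.predicate == triple.predicate and object_type_of(triple.object) == p.object_type:
            return p.template.format(s=triple.subject, o=triple.object)
    return None

# Illustrative use with an invented pattern and type lookup.
patterns = [LexicalizationPattern("dbo:birthPlace", "{s} was born in {o}.", "dbo:Place")]
triple = Triple("Ada Lovelace", "dbo:birthPlace", "London")
print(lexicalize(triple, patterns, lambda obj: "dbo:Place"))  # Ada Lovelace was born in London.
```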

    Extending the Inter-Lingual-Index with new concepts


    A Lexical Description of English for Architecture: A Corpus-based Approach

    Every knowledge community has a distinct type of discourse and a linguistic identity which brings together the ideas of that discipline. These are expressed through characteristic linguistic realizations which are of considerable interest in the study of English for Specific Purposes (ESP) from many different perspectives. Although ESP is a recent area of linguistic research, there is already a varied literature on academic and professional languages: English for law, business, computing and technology, advertising, marketing and engineering, to mention just a few. According to Dudley-Evans (1998:19), the development of ESP arose as a result of general improvements in the world economy in the 1960s, along with the expansion of science and technology. Other relevant factors were the growing use of English as the international language of science, technology and business, and the increasing flow of exchange students to and from the UK, US and Australia.

    Neural Greedy Constituent Parsing with Dynamic Oracles

    Dynamic oracle training has shown substantial improvements for dependency parsing in various settings, but has not been explored for constituent parsing. The present article introduces a dynamic oracle for transition-based constituent parsing. Experiments on the 9 languages of the SPMRL dataset show that a neural greedy parser with morphological features, trained with a dynamic oracle, reaches accuracies comparable to the best non-reranking and non-ensemble parsers.
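
    The central idea of a dynamic oracle is that it can name the least costly transitions from any configuration, including configurations reached through earlier mistakes, so training can occasionally follow the parser's own (possibly wrong) predictions and still receive a correct learning signal. The generic training loop below sketches this; the parser interface (initial_config, score_transitions, dynamic_oracle, update, apply) and the exploration schedule are assumptions for illustration, not the article's implementation.

```python
import random

def train_with_dynamic_oracle(parser, sentences, gold_trees, explore_prob=0.1):
    """Sketch of dynamic-oracle training for a greedy transition-based parser.
    All parser methods are assumed interfaces, not a concrete library API."""
    for sentence, gold in zip(sentences, gold_trees):
        config = parser.initial_config(sentence)
        while not config.is_terminal():
            scores = parser.score_transitions(config)        # dict: transition -> score
            # the dynamic oracle returns the set of transitions that lose the
            # fewest gold constituents from the *current* configuration
            optimal = parser.dynamic_oracle(config, gold)
            best_optimal = max(optimal, key=lambda t: scores[t])
            predicted = max(scores, key=scores.get)
            parser.update(config, predicted, best_optimal)   # loss between prediction and oracle
            # exploration: sometimes follow the (possibly wrong) prediction so the
            # configurations seen during training resemble those met at test time
            if predicted not in optimal and random.random() < explore_prob:
                config = parser.apply(config, predicted)
            else:
                config = parser.apply(config, best_optimal)
```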

    Delving into the uncharted territories of Word Sense Disambiguation

    The automatic disambiguation of word senses, i.e. Word Sense Disambiguation, is a long-standing task in the field of Natural Language Processing: an AI-complete problem that took its first steps more than half a century ago and which, to date, has apparently attained human-like performance on standard evaluation benchmarks. Unfortunately, the steady improvement the task has experienced over time in terms of sheer performance has not been accompanied by adequate theoretical support, nor by careful error analysis. Furthermore, we believe that the lack of an exhaustive bird’s-eye view accounting for the kind of high-end, unrealistic computational architectures that systems will soon need in order to further refine their performance could lead the field into a dead end within a few years. In essence, taking advantage of the current moment of great accomplishments and renewed interest in the task, we argue that Word Sense Disambiguation is mature enough for researchers to really take stock of the results obtained so far, evaluate what is actually missing, and answer the much-sought-after question: “are current state-of-the-art systems really able to effectively solve lexical ambiguity?”

    Driven by the desire to become both architects and participants in this period of reflection, we have identified a few macro-areas representative of the challenges of automatic disambiguation. From this point of view, in this thesis we propose experimental solutions and empirical tools so as to bring to the attention of the Word Sense Disambiguation community unusual and unexplored points of view. We hope these will offer a new perspective through which to observe the current state of disambiguation, as well as to foresee future paths along which the task may evolve. Specifically, 1q) prompted by the growing concern that rising performance is closely tied to the demand for ever more unrealistic computational architectures across all applications of Deep Learning techniques, we 1a) provide evidence for the undisclosed potential of knowledge-based approaches via the exploitation of syntagmatic information. Moreover, 2q) driven by dissatisfaction with the use of cognitively inaccurate, finite inventories of word senses in Word Sense Disambiguation, we 2a) introduce an approach based on Definition Modeling paradigms to generate contextual definitions for target words and phrases, thereby going beyond the limits set by specific lexical-semantic inventories. Finally, 3q) moved by the desire to analyze the real implications behind the idea of “machines performing disambiguation on par with their human counterparts”, we 3a) put forward a detailed analysis of the shared errors affecting current state-of-the-art systems based on diverse approaches to Word Sense Disambiguation, and show, by means of a novel evaluation dataset tailored to represent common and critical issues shared by all systems, performance well below that usually reported in the current literature.
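
    To make the notion of knowledge-based disambiguation concrete, the sketch below shows a classic simplified-Lesk baseline over WordNet: it picks the sense whose gloss and example sentences overlap most with the surrounding context. It is only a generic baseline for illustration and is not the approach proposed in the thesis.

```python
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") once

def simplified_lesk(target, context):
    """Return the WordNet synset of `target` whose gloss and examples share the
    most words with the context; a classic knowledge-based WSD baseline."""
    context_bag = {w.lower() for w in context}
    best, best_overlap = None, -1
    for synset in wn.synsets(target):
        signature = set(synset.definition().lower().split())
        for example in synset.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context_bag)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best

# e.g. disambiguating "bank" in a financial context
print(simplified_lesk("bank", "I deposited money at the bank yesterday".split()))
```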