
    Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

    Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study which demonstrates how the architectures for learning word embeddings can be applied to this challenging syntactic-semantic task. Our method uses cross-lingual translation pairs to tie each of the six target languages into a bilingual vector space with English, jointly specialising the representations to encode the relational information from English VerbNet. A standard clustering algorithm is then run on top of the VerbNet-specialised representations, using vector dimensions as features for learning verb classes. Our results show that the proposed cross-lingual transfer approach sets new state-of-the-art verb classification performance across all six target languages explored in this work. Comment: EMNLP 2017 (long paper)

    Linking flat predicate argument structures

    This report presents an approach to enriching flat and robust predicate argument structures with more fine-grained semantic information, extracted from underspecified semantic representations and encoded in Minimal Recursion Semantics (MRS). Such representations are provided by a hand-built HPSG grammar with a wide linguistic coverage. A specific semantic representation, called linked predicate argument structure (LPAS), has been worked out, which describes the explicit embedding relationships among predicate argument structures. LPAS can be used as a generic interface language for integrating semantic representations with different granularities. Some initial experiments have been conducted to convert MRS expressions into LPASs. A simple constraint solver is developed to resolve the underspecified dominance relations between the predicates and their arguments in MRS expressions. LPASs are useful for high-precision information extraction and question answering tasks because of their fine-grained semantic structures. In addition, I have attempted to extend the lexicon of the HPSG English Resource Grammar (ERG) exploiting WordNet and to disambiguate the readings of HPSG parsing with the help of a probabilistic parser, in order to process texts from application domains. Following the presented approach, the HPSG ERG grammar can be used for annotating some standard treebank, e.g., the Penn Treebank, with its fine-grained semantics. In this vein, I point out opportunities for a fruitful cooperation of the HPSG annotated Redwood Treebank and the Penn PropBank. In my current work, I exploit HPSG as an additional knowledge resource for the automatic learning of LPASs from dependency structures

    Using WordNet for Building WordNets

    This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks. Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL

    Predicate Matrix: an interoperable lexical knowledge base for predicates

    183 p. The Predicate Matrix is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves interoperability among the semantic resources mentioned above. Its creation is based on the integration of Semlink and new mappings obtained with automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages.

    Revisiting knowledge-based Semantic Role Labeling

    Semantic role labeling has seen tremendous progress in recent years, for both supervised and unsupervised approaches. Knowledge-based approaches have been neglected, even though they have been shown to bring the best results on the related word sense disambiguation task. We contribute a simple knowledge-based system with an easy-to-reproduce specification. We also present a novel approach to handling the passive voice in the context of semantic role labeling that reduces the error rate in F1 by 15.7%, showing that significant improvements can be achieved while retaining the key advantages of the approach: it is simple, which facilitates analysis of individual errors, it does not need any hand-annotated corpora, and it is not domain-specific.

    Automatic Semantic Role Annotation for Spanish

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 215-218. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs)

    Vers la création d'un Verbnet du français

    VerbNet is an English lexical resource that has proven useful for NLP due to its high coverage and coherent classification. Such a resource doesn't exist for French, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing lexical resources, namely LVF (Les Verbes Français) and LG (Lexique-Grammaire). Keywords: VerbNet, subcategorization frames, semantic roles

    Multiword expressions at length and in depth

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.