
    Lexical Semantic Recognition

    In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification/classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area. (11 pages, 3 figures; to appear at MWE 2021.)
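
    The tagger described above can be sketched in a few lines. The following is a minimal, illustrative BiLSTM-CRF in PyTorch; architecture, dimensions, and names are assumptions, not the authors' actual model, and training with the CRF log-likelihood is omitted for brevity. Viterbi decoding runs over a learned transition matrix.

        import torch
        import torch.nn as nn

        class BiLSTMCRFTagger(nn.Module):
            """BiLSTM emission scores plus a CRF transition matrix (illustrative)."""

            def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=200):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                                    batch_first=True)
                self.proj = nn.Linear(hidden, num_tags)
                # transitions[i, j] = score of moving from tag i to tag j
                self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))

            def emissions(self, token_ids):  # (batch, seq) -> (batch, seq, tags)
                out, _ = self.lstm(self.emb(token_ids))
                return self.proj(out)

            def viterbi_decode(self, emissions):  # (seq, tags), one sentence
                score = emissions[0]  # best path score ending in each tag
                backpointers = []
                for t in range(1, emissions.size(0)):
                    total = score.unsqueeze(1) + self.transitions + emissions[t]
                    score, idx = total.max(dim=0)
                    backpointers.append(idx)
                best = [int(score.argmax())]
                for idx in reversed(backpointers):
                    best.append(int(idx[best[-1]]))
                return best[::-1]  # most likely tag sequence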

    The Tanl Named Entity Recognizer at Evalita 2009

    We describe the tagger in the Tanl toolkit, a flexible and customizable tool for use in various tagging tasks, including POS tagging and SuperSense tagging. The tagger uses a variety of features, both local and global, which can be specified in a configuration file. It is based on a Maximum Entropy classifier and uses dynamic programming to select accurate sequences of tags. We applied it to the NER tagging task at Evalita 2009, customizing the set of features and generating a set of dictionaries from the training corpus, which provide additional features. The final accuracy is further improved by applying simple symbolic rules.
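
    As a rough illustration of this design (not Tanl's actual code), local and dictionary-based features can be collected per token and fed to a Maximum Entropy classifier; scikit-learn's LogisticRegression implements such a model, and a Viterbi-style dynamic-programming search over its per-token probabilities would then select the tag sequence. Feature names and the gazetteer structure are hypothetical.

        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        def token_features(tokens, i, gazetteers):
            """Local context features for token i plus dictionary lookups."""
            w = tokens[i]
            feats = {
                "word": w.lower(),
                "is_capitalized": w[:1].isupper(),
                "suffix3": w[-3:].lower(),
                "prev": tokens[i - 1].lower() if i > 0 else "<s>",
                "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
            }
            # Dictionaries generated from the training corpus add features.
            for name, entries in gazetteers.items():
                feats["in_" + name] = w.lower() in entries
            return feats

        # LogisticRegression is a Maximum Entropy model; a Viterbi-style
        # dynamic-programming search over its per-token probabilities would
        # then select an accurate sequence of tags.
        maxent = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))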

    Adapting the TANL tool suite to Universal Dependencies

    TANL is a suite of tools for text analytics based on the software architecture paradigm of data-driven pipelines. The strategies for upgrading TANL to Universal Dependencies range from a minimalistic approach, introducing pre/post-processing steps into the native pipeline, to revising the whole pipeline. We explore the issue in the context of the Italian Treebank, considering the effort involved, how to avoid losing linguistically relevant information, and the loss of accuracy in the process.
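
    A minimalistic pre/post-processing step of the kind mentioned above could be a simple label-mapping pass over CoNLL-style parser output. The mapping fragment below is hypothetical, for illustration only; it is not the actual TANL-to-UD conversion table.

        # Hypothetical fragment of a native-label-to-UD mapping (illustration only).
        NATIVE_TO_UD = {
            "sogg": "nsubj",  # subject
            "ogg": "obj",     # direct object
            "mod": "amod",    # adjectival modifier
        }

        def to_ud(conll_lines):
            """Post-process parsed output: rewrite the dependency-relation column."""
            for line in conll_lines:
                cols = line.rstrip("\n").split("\t")
                if len(cols) >= 8:  # column 8 holds the relation in CoNLL formats
                    cols[7] = NATIVE_TO_UD.get(cols[7], cols[7])
                yield "\t".join(cols)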

    Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks

    This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment with several cross-lingual annotation projection methods using Recurrent Neural Network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target language. More precisely, our method has the following characteristics: (a) it does not use word alignment information, (b) it does not assume any knowledge about foreign languages, which makes it applicable to a wide range of resource-poor languages, and (c) it provides truly multilingual taggers. We investigate both uni- and bi-directional RNN models and propose a method to include external information (for instance, low-level information from POS tags) in the RNN to train higher-level taggers (for instance, supersense taggers). We demonstrate the validity and genericity of our model using parallel corpora (obtained by manual or automatic translation). Our experiments are conducted to induce cross-lingual POS and supersense taggers.
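
    A minimal sketch, assuming PyTorch, of how external low-level information such as POS tags can be injected into an RNN tagger for a higher-level task like supersense tagging. In the paper's setting the word embeddings would be the multilingual representations learned from the parallel corpus; everything here is illustrative.

        import torch
        import torch.nn as nn

        class RNNSuperSenseTagger(nn.Module):
            """Bi-directional GRU over word + POS embeddings (illustrative)."""

            def __init__(self, vocab_size, n_pos, n_supersenses,
                         emb=100, pos_emb=16, hidden=128):
                super().__init__()
                # In the multilingual setting, these word embeddings would be
                # the shared representations learned from the parallel corpus.
                self.word_emb = nn.Embedding(vocab_size, emb)
                self.pos_emb = nn.Embedding(n_pos, pos_emb)  # external low-level info
                self.rnn = nn.GRU(emb + pos_emb, hidden // 2,
                                  bidirectional=True, batch_first=True)
                self.out = nn.Linear(hidden, n_supersenses)

            def forward(self, word_ids, pos_ids):
                x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
                h, _ = self.rnn(x)
                return self.out(h)  # per-token supersense scores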

    A Corpus and Model Integrating Multiword Expressions and Supersenses

    This paper introduces a task of identifying and semantically classifying lexical expressions in running text. We investigate the online reviews genre, adding semantic supersense annotations to a 55,000-word English corpus that was previously annotated for multiword expressions. The noun and verb supersenses apply to full lexical expressions, whether single- or multiword. We then present a sequence tagging model that jointly infers lexical expressions and their supersenses. Results show that even with our relatively small training corpus in a noisy domain, the joint task can be performed to attain 70% class labeling F1.
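
    One way to picture the joint representation: each token carries a BIO-style position tag combined with a supersense label, so decoding the tagger output amounts to grouping tokens into expressions. The toy decoder below assumes tags such as 'B-n.food'; the corpus's actual tagging scheme is richer.

        def decode(tokens, tags):
            """Group BIO-tagged tokens into (expression, supersense) pairs."""
            expressions, current, label = [], [], None
            for tok, tag in zip(tokens, tags):
                if tag.startswith("B-"):
                    if current:
                        expressions.append((" ".join(current), label))
                    current, label = [tok], tag[2:]
                elif tag.startswith("I-") and current:
                    current.append(tok)
                else:  # "O", or a stray "I-" with no open expression
                    if current:
                        expressions.append((" ".join(current), label))
                    current, label = [], None
            if current:
                expressions.append((" ".join(current), label))
            return expressions

        # decode(["I", "love", "hot", "dogs"],
        #        ["O", "B-v.emotion", "B-n.food", "I-n.food"])
        # -> [("love", "v.emotion"), ("hot dogs", "n.food")]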

    Word vs. Class-Based Word Sense Disambiguation

    As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have been used for WSD only at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and on more coarse-grained sense groupings. Third, we demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples. Finally, we demonstrate the robustness of our supervised semantic class-based WSD system when tested on an out-of-domain corpus. This work has been partially supported by the NewsReader project (ICT-2011-316404) and the Spanish project SKaTer (TIN2012-38584-C06-02).
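
    The idea of intermediate-abstraction classes can be sketched with NLTK's WordNet interface by cutting each synset's hypernym path at a fixed depth, somewhere between the supersense level (very shallow) and the sense level (the leaves). The fixed depth and the use of the first hypernym path are simplifications; the paper's derivation method is more involved.

        from nltk.corpus import wordnet as wn

        def semantic_class(synset, depth=4):
            """Map a synset to its ancestor at a fixed hypernym depth.

            depth=0 is the root; deep synsets are cut off at `depth`, shallow
            ones fall back to themselves, yielding classes between the
            supersense level and the sense level.
            """
            path = synset.hypernym_paths()[0]  # simplification: first path only
            return path[min(depth, len(path) - 1)]

        for s in wn.synsets("bank", pos=wn.NOUN)[:3]:
            print(s.name(), "->", semantic_class(s).name())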