A methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicative level following the PropBank-VerbNet model

Author: Aldezabal Roteta Izaskun
Aranzabe Urruzola María Jesús
Díaz de Ilarraza Sánchez Arantza
Estarrona Ibarloza Ainara
Publication date: 01/01/2013

In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model. The methodology is the product of a detailed theoretical study of the semantic nature of verbs in Basque and of their similarities to and differences from verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend, clear in work conducted for other languages, toward building lexicons from tagged corpora. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb entry in well-known resources such as PropBank, VerbNet, WordNet, Levin's classification and FrameNet. We have also implemented several automatic processes to aid in creating and completing the BVI and to facilitate the manual annotation of the EPEC-RolSem corpus.

Archivo Digital para la Docencia y la Investigación

VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling

Author: Andrea Di Fabio
Conia Simone
Roberto Navigli
Publication date: 01/01/2019

We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Predicate Matrix: an interoperable lexical knowledge base for predicates

Author: López de Lacalle Maddalen
Publication date: 10/07/2023

183 p. The Predicate Matrix is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves interoperability among the semantic resources mentioned above. Its creation is based on the integration of SemLink and of new mappings, obtained with automatic methods, that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages.

Archivo Digital para la Docencia y la Investigación

Proceedings

Author: Dickinson Markus
Müürisep Kaili
Passarotti Marco
Publication date: 01/12/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Rol semantikoen etiketatzeak testuetako espazio-denbora informazioaren prozesamenduan daukan eraginaz (On the effect of semantic role labeling on the processing of spatio-temporal information in texts)

Author: Salaverri Izco Haritz
Publication date: 27/07/2017

222 p. The main goal of this thesis is automatic Semantic Role Labeling (SRL) for Basque. Among other things, we have made it possible to perform SRL, or shallow semantic analysis, within the analysis chain for texts written in Basque. In addition, we have made advances in the automatic labeling of temporal and spatial information in Basque, a task related to SRL: in the thesis we have developed tools for temporal and spatial labeling that conform to current standards. We have also compared the results of all the systems we designed and implemented with the results of tools that process other languages. The other aim of our work, besides extending and completing the Basque analysis chain, has been to confirm the following two hypotheses: (1) that semantic roles have a positive effect on labeling the linguistic expression of time in Basque, as they do in English and Spanish; and (2) that the linguistic expression of space, like that of time, is a semantic phenomenon, so that semantics, and more precisely semantic roles, are of marked importance for the effective labeling of spatial information.

Archivo Digital para la Docencia y la Investigación

An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences

Author: Newsom Eric Tyner
Publication date: 12/11/2013

Indiana University-Purdue University Indianapolis (IUPUI). The amount of electronic free text produced in healthcare is growing beyond what humans can process to advance their professional practice. Information extraction (IE) is a subfield of natural language processing whose goal is data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create the logical expression needed to process the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none has addressed semantic similarity (or synonymy) as a means of data reduction. A successful methodology for data reduction depends on a framework that can represent the currently popular phrasal methods of IE but also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as that framework. The text from which PAS structures are formed is a convenience sample from a prior study: ten synsets of 100 unique sentences from radiology reports that domain experts deemed to mean the same thing.

IUPUIScholarWorks

KIDE4I: A Generic Semantics-Based Task-Oriented Dialogue System for Human-Machine Interaction in Industry 5.0

Author: Aceta Moreno Cristina
Fernández González Izaskun
Soroa Echave Aitor
Publication venue: MDPI AG
Publication date: 24/01/2022

In Industry 5.0, human workers and their wellbeing are placed at the centre of the production process. In this context, task-oriented dialogue systems allow workers to delegate simple tasks to industrial assets while working on other, more complex ones. Being able to interact naturally with these systems reduces the cognitive demand of using them and fosters acceptance. Most modern solutions, however, do not allow natural communication, and modern techniques for building such systems require large amounts of training data, which is scarce in these scenarios. To overcome these challenges, this paper presents KIDE4I (Knowledge-drIven Dialogue framEwork for Industry), a semantic-based task-oriented dialogue system framework for industry that allows workers to interact naturally with industrial systems, is easy to adapt to new scenarios and does not require large amounts of data to be constructed. This work also reports the process for adapting KIDE4I to new scenarios. To validate and evaluate KIDE4I, it has been adapted, following the described methodology, to four use cases that are relevant to industrial scenarios, and two of them have been evaluated through two user studies. The system was rated as accurate, useful, efficient, not cognitively demanding, flexible and fast. Furthermore, subjects view the system as a tool to improve their productivity and security while carrying out their tasks.

This research was partially funded by the Basque Government's Elkartek research and innovation program, projects EKIN (grant no KK-2020/00055) and DeepText (grant no KK-2020/00088).

Multidisciplinary Digital Publishing Institute

Archivo Digital para la Docencia y la Investigación

A Study Towards Spanish Abstract Meaning Representation

Author: Migueles Abraira Noelia
Publication date: 27/06/2017

Taking into account the increasing attention that researchers of Natural Language Understanding (NLU) and Natural Language Generation (NLG) are paying to Computational Semantics, we analyze the feasibility of annotating Spanish Abstract Meaning Representations. The Abstract Meaning Representation (AMR) project aims to create a large-scale sembank of simple structures that represent the unified, complete semantic information contained in English sentences. Although AMR is not intended to be an interlingua, one of its key features is the ability to focus on events rather than on word forms. It does this, for instance, by abstracting away from morpho-syntactic idiosyncrasies. In this thesis, we investigate the requirements for annotating Spanish AMRs and put forward a proposal for doing so, based on the premise that many of these idiosyncrasies mark differences between languages. To our knowledge, this is the first work towards the development of Abstract Meaning Representation for Spanish.

Archivo Digital para la Docencia y la Investigación

Probabilistic Modeling of Verbnet Clusters

Author: Peterson Daniel Wyde
Publication venue: University of Colorado Boulder
Publication date: 01/01/2019

The objective of this research is to build automated models that emulate VerbNet, a semantic resource for English verbs. VerbNet has been built and expanded by linguists, forming a hierarchical clustering of verbs with common semantic and syntactic expressions, and is useful in semantic tasks. A major drawback is the difficulty of extending a manually curated resource, which leads to gaps in coverage. After over a decade of development, VerbNet is missing verbs, senses of common verbs, and appropriate classes to contain at least some of them. Although there have been efforts to build VerbNet resources in other languages, none has received as much attention, so these coverage issues are often more glaring in resource-poor languages. Probabilistic models can emulate VerbNet by learning distributions from large corpora, addressing coverage by providing both a complete clustering of the observed data and a model to assign unseen sentences to clusters. The output of these models can aid the creation and expansion of VerbNet in English and other languages, especially if they align strongly with known VerbNet classes.

This work develops several improvements to the state-of-the-art system for verb sense induction and VerbNet-like clustering. The baseline is a two-step process for automatically inducing verb senses and producing a polysemy-aware clustering that matched VerbNet more closely than any previous method. First, we will see that a single-step process can produce better automatic senses and clusters. Second, we explore an alternative probabilistic model, which is successful on the verb clustering task. This model does not perform well on sense induction, so we analyze the limitations on its applicability. Third, we explore methods of supervising these probabilistic models with limited labeled data, which dramatically improves the recovery of correct clusters. Together these improvements suggest a line of research for practitioners to take advantage of probabilistic models in VerbNet annotation efforts.

CU Scholar Institutional Repository