50 research outputs found
A methodology for the semiautomatic annotation of EPEC-RolSem, a basque corpus labeled at predicative level following the PropBank-Verb Net model
In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model. The methodology presented is the product of detailed theoretical study of the semantic nature of verbs in Basque and of their similarities and differences with verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails the general trend toward building lexicons from tagged corpora that is clear in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb-entry in well-known resources like PropBank, VerbNet, WordNet, Levin’s Classification and FrameNet. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.Lan honetan, EPEC-RolSem corpusa etiketatzeko jarraitu dugun metodologia deskribatuko dugu. EPEC-RolSem corpusa PropBank-VerbNet ereduari jarraiki predikatu-mailan etiketatutako euskarazko corpusa da. Etiketatze-lana aurrera eramateko euskal aditzen izaera semantikoa aztertu eta ingeleseko aditzekin konparatu dugu, azterketa horren emaitza da lan honetan proposatzen dugun metodologia. Metodologiaren atal bat PropBank-VerbNet eredura sortutako euskal aditzen lexikoiaren osaketa izan da, lexikoi hau Basque Verb Index (BVI) deitu dugu. Gure lanak alor honetan beste hizkuntzetan dagoen joera nagusia jarraitzen du, hau da, etiketatutako corpusetatik lexikoiak sortzea. EPEC-RolSem eta BVI oso baliabide garrantzitsuak dira euskararen semantika konputazionalaren alorrean, izan ere, euskararako sortutako mota honetako lehen baliabideak dira. Honetaz guztiaz gain, BVIko sarrera bakoitza PropBank, VerbNet, WordNet, Levinen sailkapena eta FrameNet bezalako baliabide ezagunekin lotua dago. Hainbat prozesu automatiko inplementatu ditugu EPEC-RolSem corpusaren eskuzko etiketatzea laguntzeko eta baita BVI sortzeko eta osatzeko ere
VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments.
We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org
Predicate Matrix: an interoperable lexical knowledge base for predicates
183 p.La Matriz de Predicados (Predicate Matrix en inglés) es un nuevo recurso léxico-semántico resultado de la integración de múltiples fuentes de conocimiento, entre las cuales se encuentran FrameNet, VerbNet, PropBank y WordNet. La Matriz de Predicados proporciona un léxico extenso y robusto que permite mejorar la interoperabilidad entre los recursos semánticos mencionados anteriormente. La creación de la Matriz de Predicados se basa en la integración de Semlink y nuevos mappings obtenidos utilizando métodos automáticos que enlazan el conocimiento semántico a nivel léxico y de roles. Asimismo, hemos ampliado la Predicate Matrix para cubrir los predicados nominales (inglés, español) y predicados en otros idiomas (castellano, catalán y vasco). Como resultado, la Matriz de predicados proporciona un léxico multilingüe que permite el análisis semántico interoperable en múltiples idiomas
Proceedings
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 268 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
Rol semantikoen etiketatzeak testuetako espazio-denbora informazioaren prozesamenduan daukan ereaginaz
222 p.Tesi honen xede nagusia euskarazko rol semantikoen etiketatze automatikoa da ( Semantic RoleLabeling , SRL). Besteak beste, euskaraz idatzitako testuen analisi-kateanSRL edo azaleko analisi semantikoa egitea ahalbidetu dugu. Gainera, SRL atazarekin lotura daukateneuskarazko denbora eta espazio informazioaren etiketatze automatikorakoere aurrerapenak egin ditugu. Izan ere, gaur egungo estandarretara egokitutako denboraren etaespazioaren etiketatzeko tresnak garatu ditugu tesian. Orobat, diseinatu etainplementatutako sistema guztien emaitzak beste hizkuntza batzuk prozesatzen dituztentresnen emaitzekin alderatu ditugu.Gure lanaren beste helburua, euskararen analisi-katea hedatzeaz eta osatzeaz gainera,ondorengo bi hipotesiak baieztatzea izan da:Euskaraz denboraren adierazpen linguistikoa etiketatzeko orduan rol semantikoekdaukaten eragina positiboa dela, ingelesez eta gaztelaniaz bezala.Espazioaren adierazpen linguistikoa, denborarena bezala, fenomeno semantikoa dela,eta horregatik semantika eta, zehazkiago, rol semantikoek duten garrantzia nabarmenadela, informazio espazialaren etiketatze eraginkorra egin ahal izateko
An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences
Indiana University-Purdue University Indianapolis (IUPUI)The amount of information produced in the form of electronic free text in healthcare is increasing to levels incapable of being processed by humans for advancement of his/her professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics and none have addressed the issue of semantic similarity (or synonymy) to achieve data reduction. To achieve data reduction, a successful methodology for data reduction is dependent on a framework that can represent currently popular phrasal methods of IE but also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements to using the predicate-argument statement (PAS) as the framework. A convenient sample from a prior study with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing will be the text from which PAS structures are formed
KIDE4I: A Generic Semantics-Based Task-Oriented Dialogue System for Human-Machine Interaction in Industry 5.0
In Industry 5.0, human workers and their wellbeing are placed at the centre of the production process. In this context, task-oriented dialogue systems allow workers to delegate simple tasks to industrial assets while working on other, more complex ones. The possibility of naturally interacting with these systems reduces the cognitive demand to use them and triggers acceptation. Most modern solutions, however, do not allow a natural communication, and modern techniques to obtain such systems require large amounts of data to be trained, which is scarce in these scenarios. To overcome these challenges, this paper presents KIDE4I (Knowledge-drIven Dialogue framEwork for Industry), a semantic-based task-oriented dialogue system framework for industry that allows workers to naturally interact with industrial systems, is easy to adapt to new scenarios and does not require great amounts of data to be constructed. This work also reports the process to adapt KIDE4I to new scenarios. To validate and evaluate KIDE4I, it has been adapted to four use cases that are relevant to industrial scenarios following the described methodology, and two of them have been evaluated through two user studies. The system has been considered as accurate, useful, efficient, not demanding cognitively, flexible and fast. Furthermore, subjects view the system as a tool to improve their productivity and security while carrying out their tasks.This research was partially funded by the Basque Government’s Elkartek research and innovation program, projects EKIN (grant no KK-2020/00055) and DeepText (grant no KK-2020/00088)
A Study Towards Spanish Abstract Meaning Representation
Taking into account the increasing attention that researchers of Natural Language
Understanding (NLU) and Natural Language Generation (NLG) are paying to Computational
Semantics, we analyze the feasibility of annotating Spanish Abstract Meaning
Representations. The Abstract Meaning Representation (AMR) project aims to create a large-
scale sembank of simple structures that represent unified, complete semantic information
contained in English sentences. Although AMR is not destined to be an interlingua, one of its
key features is the ability to focus on events rather than on word forms. They do this, for
instance, by abstracting away from morpho-syntactic idiosyncrasies. In this thesis, we
investigate the requirements to – and we come up with a proposal to – annotate Spanish
AMRs, based on the premise that many of these idiosyncrasies mark differences between
languages. To our knowledge, this is the first work towards the development of Abstract
Meaning Representation for Spanish
Recommended from our members
Probabilistic Modeling of Verbnet Clusters
The objective of this research is to build automated models that emulate VerbNet, a semantic resource for English verbs. VerbNet has been built and expanded by linguists, forming a hierarchical clustering of verbs with common semantic and syntactic expressions, and is useful in semantic tasks. A major drawback is the difficulty of extending a manually-curated resource, which leads to gaps in coverage. After over a decade of development, VerbNet has missing verbs, missing senses of common verbs, and is missing appropriate classes to contain at least some of them. Although there have been efforts to build VerbNet resources in other languages, none have received as much attention, so these coverage issues are often more glaring in resource-poor languages. Probabilistic models can emulate VerbNet by learning distributions from large corpora, addressing coverage by providing both a complete clustering of the observed data, and a model to assign unseen sentences to clusters. The output of these models can aid the creation and expansion of VerbNet in English and other languages, especially if they align strongly with known VerbNet classes.This work develops several improvements to the state-of-the-art system for verb sense induction and VerbNet-like clustering. The baseline is two-step process for automatically inducing verb senses and producing a polysemy-aware clustering, that matched VerbNet more closely than any previous methods. First, we will see that a single-step process can produce better automatic senses and clusters. Second, we explore an alternative probabilistic model, which is successful on the verb clustering task. This model does not perform well on sense induction, so we analyze the limitations on its applicability. Third, we explore methods of supervising these probabilistic models with limited labeled data, which dramatically improves the recovery of correct clusters. Together these improvements suggest a line of research for practitioners to take advantage of probabilistic models in VerbNet annotation efforts