Filling Conversation Ellipsis for Better Social Dialog Understanding
The phenomenon of ellipsis is prevalent in social conversations. Ellipsis
increases the difficulty of a series of downstream language understanding
tasks, such as dialog act prediction and semantic role labeling. We propose to
resolve ellipsis through automatic sentence completion to improve language
understanding. However, automatic ellipsis completion can result in output
which does not accurately reflect user intent. To address this issue, we
propose a method which considers both the original utterance that has ellipsis
and the automatically completed utterance in dialog act and semantic role
labeling tasks. Specifically, we first complete user utterances to resolve
ellipsis using an end-to-end pointer network model. We then train a prediction
model using both utterances containing ellipsis and our automatically completed
utterances. Finally, we combine the prediction results from these two
utterances using a selection model that is guided by expert knowledge. Our
approach improves dialog act prediction and semantic role labeling by 1.3% and
2.5% in F1 score respectively in social conversations. We also present an
open-domain human-machine conversation dataset with manually completed user
utterances and annotated semantic role labeling after manual completion. Comment: Accepted to AAAI 202
Szemantikus szerepek automatikus címkézése természetes szövegekben = Semantic Role Labeling on Natural Texts
In this study we introduce a machine learning-based approach that automatically labels semantic roles in Hungarian texts using a dependency parser. We focus on two domains: company acquisitions and stock market news. For these tasks we apply binary classifiers based on rich feature sets, and we introduce new methods for this application area. Evaluated on test databases, our algorithms achieve results that are competitive with current results for English.
Szemantikus szerepek automatikus címkézése függőségi elemző alkalmazásával magyar nyelvű gazdasági szövegeken = Automatic Semantic Role Labeling of Hungarian Economic Texts Using a Dependency Parser
In this study we present our machine learning approach, based on a rich feature space, which automatically labels semantic roles in Hungarian texts using a dependency parser. We worked with the frame of company acquisitions and ownership changes. Our feature set uses surface features, morphological features, and features extracted from the dependency parse. We also supplemented these basic features with statistical ratios computed from them. We examined how the model performs on a single frequent target word on its own, and on a group of target words collected into frames.
Semantic role labeling for protein transport predicates
Background: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data in newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs, which are manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role.
Results: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones.
Conclusion: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
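The word-chunking setup described in this abstract can be illustrated with a minimal sketch: each word is tagged as beginning (B), inside (I), or outside (O) of a role span. The example sentence, the span indices, and the helper names `bio_encode` and `f_measure` are assumptions made for illustration and do not come from the paper. The second helper computes the standard balanced F-measure, which reproduces the reported 83.1 from the stated 87.6% precision and 79.0% recall.

```python
from typing import List, Tuple

# A role span is (start_index, end_index_exclusive, role_name).
Span = Tuple[int, int, str]


def bio_encode(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn role spans into one B-/I-/O tag per word (word-chunking view)."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        tags[start] = f"B-{role}"          # first word of the role
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"          # remaining words of the role
    return tags


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence with placeholder protein names X and Y.
tokens = ["X", "transports", "Y", "from", "the", "cytoplasm",
          "to", "the", "nucleus"]
spans = [(0, 1, "AGENT"), (2, 3, "PATIENT"),
         (4, 6, "ORIGIN"), (7, 9, "DESTINATION")]

tags = bio_encode(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")

# Precision and recall as reported for manually annotated protein boundaries.
print(round(100 * f_measure(0.876, 0.790), 1))
```

In this framing a per-word classifier (support vector machines in the paper) predicts one such tag for each word, so no syntactic parse is needed to recover role boundaries.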
Combination Strategies for Semantic Role Labeling
This paper introduces and analyzes a battery of inference models for the
problem of semantic role labeling: one based on constraint satisfaction, and
several strategies that model the inference as a meta-learning problem using
discriminative classifiers. These classifiers are developed with a rich set of
novel features that encode proposition and sentence-level information. To our
knowledge, this is the first work that: (a) performs a thorough analysis of
learning-based inference models for semantic role labeling, and (b) compares
several inference strategies in this context. We evaluate the proposed
inference strategies in the framework of the CoNLL-2005 shared task using only
automatically-generated syntactic information. The extensive experimental
evaluation and analysis indicate that all the proposed inference strategies
are successful (they all outperform the current best results reported in the
CoNLL-2005 evaluation exercise), but each of the proposed approaches has its
advantages and disadvantages. Several important traits of a state-of-the-art
SRL combination strategy emerge from this analysis: (i) individual models
should be combined at the granularity of candidate arguments rather than at the
granularity of complete solutions; (ii) the best combination strategy uses an
inference model based on learning; and (iii) the learning-based inference
benefits from max-margin classifiers and global feedback.
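To make trait (i) concrete, here is a minimal sketch of combining models at the granularity of candidate arguments. It is not the paper's method: simple vote counting stands in for the learned inference models, and the function `combine_candidates`, the `min_votes` parameter, and the toy model outputs are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

# A candidate argument is (start_index, end_index_exclusive, role_label).
Arg = Tuple[int, int, str]


def combine_candidates(model_outputs: List[List[Arg]],
                       min_votes: int = 1) -> List[Arg]:
    """Pool candidate arguments from all models and keep the best-supported
    ones that do not overlap. Voting is an illustrative stand-in for a
    learned scorer."""
    # Count how many models proposed each exact (span, role) candidate.
    votes = Counter(arg for output in model_outputs for arg in set(output))
    # Most votes first; ties are broken by position so the result is stable.
    ranked = sorted(votes, key=lambda a: (-votes[a], a))
    chosen: List[Arg] = []
    for start, end, role in ranked:
        if votes[(start, end, role)] < min_votes:
            continue
        # Arguments of one predicate may not overlap.
        if any(start < e and s < end for s, e, _ in chosen):
            continue
        chosen.append((start, end, role))
    return sorted(chosen)


# Toy outputs of three hypothetical SRL models for one predicate.
model_a = [(0, 2, "A0"), (3, 6, "A1")]
model_b = [(0, 1, "A0"), (3, 5, "A1")]
model_c = [(0, 2, "A0"), (3, 5, "A1"), (6, 8, "AM-TMP")]

combined = combine_candidates([model_a, model_b, model_c], min_votes=2)
print(combined)
```

With these toy inputs the combined result mixes arguments from different models and matches none of the three complete solutions, which is the kind of outcome that picking one whole solution cannot produce.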
Extracção de relações entre entidades mencionadas = Extraction of Relations between Named Entities
Currently there is a large amount of digital content (personal, academic, and
news, among others) available on the Internet. Obtaining structured information
from this content by hand has become virtually impossible. As a result, recent
years have seen an increase in research on systems for automatic analysis and
information extraction.
Classification of documents by theme or category is one way of relating content.
However, documents can likewise be related through the entities they mention,
whether people, places, or organizations. Moreover, extracting information about
the relations between entities makes the links between documents much richer: it
becomes possible, for example, to relate the documents stating that a particular
entity performed a specific action and which entities were affected by that action.
This work proposes a system for identifying and extracting relations between
entities present in a document. Relations are obtained from a semantic role
labeller used in conjunction with a named entity recognizer.
Because the system is applied to Portuguese, it was necessary to develop some
language-specific resources: a part-of-speech tagger and two corpora, one to be
used with the POS tagger and the other with syntactic information at the word,
phrase, and clause level, to be used by the semantic role labeller.
Although the semantic role labeller is preliminary, experiments show that the
system can achieve the proposed objective and identify relations between entities.
In addition, the resources created enrich the set of tools available for
Portuguese that can be used in future work.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements