12 research outputs found

    Filling Conversation Ellipsis for Better Social Dialog Understanding

    Ellipsis is prevalent in social conversations. It increases the difficulty of a series of downstream language understanding tasks, such as dialog act prediction and semantic role labeling. We propose to resolve ellipsis through automatic sentence completion to improve language understanding. However, automatic ellipsis completion can produce output that does not accurately reflect user intent. To address this issue, we propose a method that considers both the original utterance containing ellipsis and the automatically completed utterance in dialog act and semantic role labeling tasks. Specifically, we first complete user utterances to resolve ellipsis using an end-to-end pointer network model. We then train a prediction model using both the utterances containing ellipsis and our automatically completed utterances. Finally, we combine the prediction results from these two utterances using a selection model that is guided by expert knowledge. Our approach improves dialog act prediction and semantic role labeling by 1.3% and 2.5% in F1 score, respectively, in social conversations. We also present an open-domain human-machine conversation dataset with manually completed user utterances and semantic role labels annotated after manual completion. Comment: Accepted to AAAI 202

    Szemantikus szerepek automatikus címkézése természetes szövegekben = Semantic Role Labeling on Natural Texts

    In this study we introduce a machine learning-based approach that automatically labels semantic roles in Hungarian texts using a dependency parser. We worked with two domains: company acquisitions and stock market news. For these tasks we applied binary classifiers based on rich feature sets, and we introduce new methods for this application area. Evaluated on test databases, our algorithms achieve results that are competitive with current results for English.

    Szemantikus szerepek automatikus címkézése függőségi elemző alkalmazásával magyar nyelvű gazdasági szövegeken = Automatic Semantic Role Labeling of Hungarian Economic Texts Using a Dependency Parser

    In this study we present our machine learning approach, based on a rich feature space, which can automatically label semantic roles in Hungarian texts using a dependency parser. Our work addresses the frame of company acquisitions and ownership changes. Our feature set uses surface features, morphological features, and features extracted from the dependency parse. We supplemented these basic features with statistical ratios computed from them. We examined how the model performs on a single frequent target word on its own, and on a group of target words gathered into frames.

    Semantic role labeling for protein transport predicates

    Background: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data in newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs, which are manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role.
    Results: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones.
    Conclusion: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
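
The F-measure of 83.1 quoted in the protein transport abstract is consistent with its reported precision and recall for manually annotated protein boundaries. The sketch below checks that arithmetic, assuming the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which F-measure variant was used, and the function name `f_measure` is hypothetical. The value computed for automatically identified boundaries is derived here and is not a figure stated in the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, in percent.
manual = f_measure(87.6, 79.0)     # manually annotated protein boundaries
automatic = f_measure(87.0, 74.5)  # automatically identified boundaries

print(f"manual boundaries:    F1 = {manual:.1f}")
print(f"automatic boundaries: F1 = {automatic:.1f}")
```

Under this assumption the first value rounds to 83.1, matching the abstract's "as high as 83.1".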

    Combination Strategies for Semantic Role Labeling

    This paper introduces and analyzes a battery of inference models for the problem of semantic role labeling: one based on constraint satisfaction, and several strategies that model the inference as a meta-learning problem using discriminative classifiers. These classifiers are developed with a rich set of novel features that encode proposition and sentence-level information. To our knowledge, this is the first work that: (a) performs a thorough analysis of learning-based inference models for semantic role labeling, and (b) compares several inference strategies in this context. We evaluate the proposed inference strategies in the framework of the CoNLL-2005 shared task using only automatically generated syntactic information. The extensive experimental evaluation and analysis indicate that all the proposed inference strategies are successful: they all outperform the current best results reported in the CoNLL-2005 evaluation exercise, but each of the proposed approaches has its own advantages and disadvantages. Several important traits of a state-of-the-art SRL combination strategy emerge from this analysis: (i) individual models should be combined at the granularity of candidate arguments rather than at the granularity of complete solutions; (ii) the best combination strategy uses an inference model based on learning; and (iii) the learning-based inference benefits from max-margin classifiers and global feedback.

    Extracção de relações entre entidades mencionadas = Relation Extraction Between Named Entities

    A large amount of digital content (academic, personal, news, and more) is currently available on the Internet. Obtaining structured information from this content by hand has become virtually impossible, so recent years have seen growing research into systems for automatic analysis and information extraction. Classifying documents by theme or category is one way of relating content. However, documents can also be related through the entities they mention, whether people, places or organizations. Moreover, extracting information about the relations between entities makes the links between documents much richer: it becomes possible, for example, to relate the documents stating that a particular entity performed a specific action and to identify which entities were affected by that action. This work proposes a system for identifying and extracting relations between the entities present in a document. Relations are obtained from a semantic role labeller used in conjunction with a named entity recognizer. Because the system is applied to Portuguese, it was necessary to develop some language-specific resources: a part-of-speech tagger and two corpora, one to be used by the tagger and the other annotated with syntactic information at the word, phrase and clause level for use by the semantic role labeller. Although the semantic role labeller is preliminary, experiments show that the system achieves the proposed objective and identifies relations between entities. In addition, the resources created enrich the set of tools available for Portuguese that can be used in future work.

    Natural Language Processing (Almost) from Scratch

    We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.