28,406 research outputs found

    Filling Conversation Ellipsis for Better Social Dialog Understanding

    The phenomenon of ellipsis is prevalent in social conversations. Ellipsis increases the difficulty of a series of downstream language understanding tasks, such as dialog act prediction and semantic role labeling. We propose to resolve ellipsis through automatic sentence completion to improve language understanding. However, automatic ellipsis completion can result in output which does not accurately reflect user intent. To address this issue, we propose a method which considers both the original utterance that has ellipsis and the automatically completed utterance in dialog act and semantic role labeling tasks. Specifically, we first complete user utterances to resolve ellipsis using an end-to-end pointer network model. We then train a prediction model using both utterances containing ellipsis and our automatically completed utterances. Finally, we combine the prediction results from these two utterances using a selection model that is guided by expert knowledge. Our approach improves dialog act prediction and semantic role labeling by 1.3% and 2.5% in F1 score respectively in social conversations. We also present an open-domain human-machine conversation dataset with manually completed user utterances and annotated semantic role labeling after manual completion. Comment: Accepted to AAAI 202

    Mining Event-Based Commonsense Knowledge from Web using NLP Techniques

    Real-life intelligent applications such as agents, expert systems, dialog understanding systems, weather forecasting systems, and robotics rely mainly on commonsense knowledge, and they typically work from a knowledge base that contains a large amount of it. The main aim of this work is to create a commonsense knowledge base using an effective methodology for retrieving commonsense knowledge from large amounts of web data. To achieve the best results, it uses several natural language processing techniques, such as semantic role labeling and lexical and syntactic analysis. Keywords: Automatic statistical semantic role tagger (ASSERT), lexico-syntactic pattern matching, semantic role labeling (SRL)

    MEBCK from Web using NLP Techniques

    Real-life intelligent applications such as agents, expert systems, dialog understanding systems, weather forecasting systems, and robotics rely mainly on commonsense knowledge, and they typically work from a knowledge base that contains a large amount of it. The main aim of this work is to create a commonsense knowledge base using an effective methodology for retrieving commonsense knowledge from large amounts of web data. To achieve the best results, it uses several natural language processing techniques, such as semantic role labeling and lexical and syntactic analysis. Keywords: Automatic statistical semantic role tagger (ASSERT), lexico-syntactic pattern matching, semantic role labeling (SRL)

    Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing

    In this paper, we present the Universal Semantic Annotator (USeA), which offers the first unified API for high-quality automatic annotations of texts in 100 languages through state-of-the-art systems for Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing. Together, such annotations can be used to provide users with rich and diverse semantic information, help second-language learners, and allow researchers to integrate explicit semantic knowledge into downstream tasks and real-world applications.

    Automatic semantic role labeling for European Portuguese

    Master's dissertation, Language Sciences, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014.
    This thesis addresses the task of Semantic Role Labeling (SRL) in European Portuguese. SRL can be used in a number of NLP applications, such as Anaphora Resolution, Question Answering, and Summarization. A general-purpose, consensual set of 37 semantic roles was defined, based on a survey of the relevant related work and using highly reproducible properties. A set of annotation guidelines was also built to clarify how semantic roles should be assigned to verbal arguments in context. An SRL module was built and integrated into a fully fledged Natural Language Processing (NLP) chain, named STRING, developed at INESC-ID Lisboa. For this module, the information from a lexico-syntactic database, ViPEr, which contains the relevant linguistic information for more than 6,000 European Portuguese full (or lexical, or distributional) verbs, was used, and the database was manually enriched with the information pertaining to the semantic roles of all verbal arguments. The SRL module is composed of 183 pattern-matching rules for labeling the subject (N0) and the first (N1) and second (N2) essential complements of verbal constructions; it also allows semantic roles to be attributed to other syntactic slots, such as time, locative, manner, instrumental, comitative and other complements (both essential and circumstantial). This module was tested on a small corpus that was specifically annotated for this purpose. After this manual annotation, the corpus, containing 655 semantic roles, was used as a gold standard for automatic comparison with the system's output.
Considering that the SRL module operates at the last stages of the processing chain, a relatively high precision was achieved (69.9% in a strict evaluation and 77.7% when the evaluation included partial matches), though the recall was low (17.9%), which calls for future improvements.
Abstract (translated from Portuguese): This thesis addresses the task of Semantic Role Labeling (SRL; in Portuguese, Anotação de Papéis Semânticos, APS) in European Portuguese. SRL can be used in various Natural Language Processing (NLP) applications, such as Anaphora Resolution, Information Retrieval/Extraction, and Automatic Summarization. A consensual, general-purpose set of 37 semantic roles was defined, based on the relevant related work and drawing on sufficiently reproducible properties. A set of annotation guidelines was also drawn up to clarify how verbal arguments should be assigned their respective semantic roles in context. Based on these elements, an SRL module was built, which is integrated into the STRING Natural Language Processing chain, developed at INESC-ID Lisboa. For this module, information from a lexico-syntactic database, ViPEr, which contains the relevant linguistic information for more than 6,000 full (or lexical, or distributional) verbs of European Portuguese, was used, and the database was manually enriched with information on the semantic roles of all verbal arguments (subject and essential complements). The SRL module consists of 183 pattern-matching rules for marking the subject (N0) and the first (N1) and second (N2) essential complements of verbal constructions, and it also allows semantic roles to be assigned to other syntactic constituents, adverbial adjuncts, such as complements of time and manner, and locative, instrumental, and comitative complements, among others (both essential and circumstantial).
This module was tested on a corpus of real texts, of varied typological nature and covering various topics, which was manually annotated by two linguists specifically for this purpose. After this manual annotation process, the corpus, which contains 655 semantic roles, was used as a reference corpus (gold standard) for automatic comparison with the system's output. Considering that the SRL module operates in the last steps of the processing chain, a relatively high precision was achieved (69.9% in a strict evaluation and 77.7% when the evaluation includes partial results), although the recall was low (17.9%), which should be one of the goals of future work.
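
The strict and partial evaluation modes reported above can be illustrated with a minimal sketch. The abstract does not define what counts as a partial match, so the rule used here (same role, overlapping span) is an assumption, and the `(start, end, role)` annotation format and the role names are hypothetical; this is not the thesis's evaluation script.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end, role), with end exclusive.
Annotation = Tuple[int, int, str]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(gold: List[Annotation],
                     predicted: List[Annotation],
                     partial: bool = False) -> Tuple[float, float]:
    """Precision and recall of predicted semantic roles against a gold standard.

    Strict mode requires identical span and role. Partial mode (an assumed
    definition) requires the same role and any span overlap.
    """
    def matches(p: Annotation, g: Annotation) -> bool:
        if p[2] != g[2]:
            return False
        if partial:
            return overlaps(p, g)
        return (p[0], p[1]) == (g[0], g[1])

    correct_predictions = sum(1 for p in predicted
                              if any(matches(p, g) for g in gold))
    recovered_gold = sum(1 for g in gold
                         if any(matches(p, g) for p in predicted))
    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_gold / len(gold) if gold else 0.0
    return precision, recall


# Toy data: the system labels few arguments, so recall stays low.
gold = [(0, 2, "AGENT"), (3, 5, "PATIENT"), (6, 8, "LOCATIVE"), (9, 10, "TIME")]
predicted = [(0, 2, "AGENT"), (3, 4, "PATIENT")]
strict = precision_recall(gold, predicted)
lenient = precision_recall(gold, predicted, partial=True)
```

On this toy data the system annotates only two of four gold roles, which is the same qualitative pattern as in the abstract: precision well above recall, with both rising when partial matches are accepted.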

    Automatic Question Generation Using Semantic Role Labeling for Morphologically Rich Languages

    In this paper, a novel approach to automatic question generation (AQG) using semantic role labeling (SRL) for morphologically rich languages is presented. A model for AQG is developed for our native language, Croatian. Croatian is a highly inflected language that belongs to the Balto-Slavic family of languages. The article is divided into two stages. In the first stage, we present a novel approach to SRL of texts written in Croatian that uses Conditional Random Fields (CRF). SRL traditionally consists of predicate disambiguation, argument identification and argument classification. After these steps, most approaches use beam search to find the optimal sequence of arguments for a given predicate. We propose an architecture for predicate identification and argument classification in which finding the best sequence of arguments is handled by Viterbi decoding. We enrich the SRL features with custom attributes designed for this language. Our SRL system achieves an F1 score of 78% in the argument classification step on the Croatian hr500k corpus. In the second stage, the proposed SRL model is used to develop an AQG system for generating questions from texts written in Croatian. We propose custom templates for AQG, which were used to generate a total of 628 questions; experts scored every question on a Likert scale. The expert evaluation showed that 68% of the generated questions could be used for educational purposes. With these results, the proposed AQG system could be implemented inside educational systems such as Intelligent Tutoring Systems.
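
Viterbi decoding, which the abstract above names as the step that finds the best sequence of arguments, can be sketched as follows. This is a generic textbook version over additive scores, not the authors' system: the score matrices, the two-label setup, and all names are illustrative assumptions.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under additive scores.

    emissions[t][y]   : score of label y at position t
    transitions[p][y] : score of moving from label p to label y
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers: List[List[int]] = []  # best previous label, per position and label

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[t][y])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label.
    best = max(range(n_labels), key=lambda y: score[y])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Toy scores for two labels over three tokens (illustrative values only).
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
free = [[0.0, 0.0], [0.0, 0.0]]        # no transition preference
sticky = [[0.0, -5.0], [-5.0, 0.0]]    # switching labels is penalised
```

With `free` transitions the decoder follows the per-token emission maxima; with `sticky` transitions it prefers a single consistent label. Unlike beam search, this dynamic program is exact for first-order transition scores.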

    Automatic Multiple Choice Question Generation System for Semantic Attributes Using String Similarity Measures

    This research introduces an automatic multiple choice question generation system to evaluate the understanding of the semantic role labels and named entities in a text. The system selects the informative sentence and the keyword to be asked about based on the semantic labels and named entities that exist in the sentence; the distractors are chosen based on a similarity measure between sentences in the data set. The system is tested using a set of sentences extracted from the TREC 2007 dataset for question answering. From the experimental results, it can be inferred that semantic role labeling and named entity recognition can serve as a good keyword selection mechanism. The second conclusion is that string similarity measures proved to be a very good approach for generating the distractors for an automatic multiple choice question. Also, combining the similarity measures of different algorithms would lead to good distractors.
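
Ranking candidate sentences by a combined string similarity, as the abstract above describes for distractor selection, might look like the following sketch. The abstract does not say which similarity algorithms the system uses or how they are combined, so the two measures here (Python's `difflib` ratio and token-level Jaccard overlap), their equal weighting, and all function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_similarity(a: str, b: str) -> float:
    """Average of a character-level and a token-level similarity (assumed weighting)."""
    char_ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return (char_ratio + jaccard(a, b)) / 2


def rank_candidates(source: str, candidates: List[str], k: int = 3) -> List[str]:
    """Return the k candidate sentences most similar to the source sentence."""
    ranked = sorted(candidates,
                    key=lambda c: combined_similarity(source, c),
                    reverse=True)
    return ranked[:k]


source = "the cat sat on the mat"
candidates = ["stock prices fell sharply", "the cat sat on the rug"]
top = rank_candidates(source, candidates, k=1)
```

How the system turns similar sentences into distractor keywords is not described in the abstract, so this sketch stops at ranking the sentences.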