    Automatic Question Generation Using Semantic Role Labeling for Morphologically Rich Languages

    In this paper, a novel approach to automatic question generation (AQG) using semantic role labeling (SRL) for morphologically rich languages is presented. An AQG model is developed for our native language, Croatian, a highly inflected language that belongs to the Balto-Slavic family of languages. The work is divided into two stages. In the first stage, we present a novel approach to SRL of Croatian texts that uses Conditional Random Fields (CRF). SRL traditionally consists of predicate disambiguation, argument identification and argument classification, after which most approaches use beam search to find the optimal sequence of arguments for a given predicate. We propose an architecture for predicate identification and argument classification in which the best sequence of arguments is found by Viterbi decoding. We enrich the SRL feature set with attributes tailored to Croatian. Our SRL system achieves an F1 score of 78% on the argument classification step on the Croatian hr 500k corpus. In the second stage, the proposed SRL model is used to build an AQG system that generates questions from Croatian texts. We propose custom AQG templates, which were used to generate a total of 628 questions; experts scored every question on a Likert scale. The expert evaluation showed good results: 68% of the generated questions could be used for educational purposes. These results suggest that the proposed AQG system could be integrated into educational systems such as Intelligent Tutoring Systems.
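    The abstract replaces the usual beam search with Viterbi decoding when choosing argument labels for a predicate. Below is a minimal sketch of first-order Viterbi decoding, assuming the per-token (emission) and label-to-label (transition) scores are already available as additive log-potentials; in a trained CRF they would come from weighted features. The label set, the Croatian toy sentence and every score are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    labels: List[str],
    emission: Dict[Tuple[int, str], float],
    transition: Dict[Tuple[str, str], float],
    start: Dict[str, float],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its score.

    Scores are additive, as in a linear-chain CRF:
    score(y) = start[y0] + sum_i emission[i, y_i] + sum_i transition[y_{i-1}, y_i]
    """
    n = len(tokens)
    # best[i][y]: best score of any label prefix ending at position i with label y
    best: List[Dict[str, float]] = [{} for _ in range(n)]
    back: List[Dict[str, str]] = [{} for _ in range(n)]
    for y in labels:
        best[0][y] = start[y] + emission[(0, y)]
    for i in range(1, n):
        for y in labels:
            # choose the predecessor label that maximizes the running score
            prev, score = max(
                ((p, best[i - 1][p] + transition[(p, y)]) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emission[(i, y)]
            back[i][y] = prev
    # pick the best final label, then follow the back-pointers
    last = max(labels, key=lambda y: best[n - 1][y])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[n - 1][last]


# Hypothetical toy input: "Ivan čita knjigu" ("Ivan reads a book"), predicate "čita".
labels = ["O", "A0", "A1"]
tokens = ["Ivan", "čita", "knjigu"]
emit_rows = [
    {"O": 0.0, "A0": 2.0, "A1": 1.5},
    {"O": 3.0, "A0": 0.0, "A1": 0.0},
    {"O": 0.0, "A0": 1.3, "A1": 1.2},
]
emission = {(i, y): row[y] for i, row in enumerate(emit_rows) for y in labels}
transition = {(p, y): 0.0 for p in labels for y in labels}
transition[("O", "A0")] = -0.5  # hypothetical penalty: A0 rarely follows a non-argument here
start = {y: 0.0 for y in labels}

if __name__ == "__main__":
    print(viterbi(tokens, labels, emission, transition, start)[0])  # ['A0', 'O', 'A1']
```

    With these toy scores the transition penalty on O → A0 makes the decoder label "knjigu" as A1, even though its emission score alone slightly favours A0; a greedy per-token classifier would pick A0.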

    A comparative study of conversion aided methods for WordNet sentence textual similarity

    In this paper, we present a comparison of three methods for taxonomy-based sentence semantic relatedness, aided by part-of-speech (PoS) conversion of words. We use the WordNet ontology to determine word-level semantic similarity, while augmenting WordNet with two other lexicographical databases, namely the Categorial Variation Database (CatVar) and the Morphosemantic Database, to assist the word category conversion. On a human-annotated benchmark data set, all three approaches achieved a high positive correlation with human ratings, reaching up to r = 0.881647, and were compared with two other baselines evaluated on the same benchmark data set.
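    WordNet similarity measures mostly operate inside a single part-of-speech hierarchy, which is why the compared methods convert words to another category before scoring. Below is a minimal sketch of that general idea, assuming a hypothetical toy noun taxonomy, a hand-written CONVERT dictionary standing in for CatVar/Morphosemantic lookups, and a simple max-average aggregation of word scores; it does not reproduce any of the three methods evaluated in the paper.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical miniature noun taxonomy (child -> parent); not real WordNet data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "act": "entity",
    "travel": "act",
    "drive": "travel",
    "flight": "travel",
    "object": "entity",
    "vehicle": "object",
    "car": "vehicle",
    "plane": "vehicle",
}

# Hypothetical stand-in for CatVar/Morphosemantic lookups: verb form -> related noun.
CONVERT: Dict[str, str] = {"drove": "drive", "driving": "drive", "flew": "flight"}


def neighbours(node: str) -> List[str]:
    """Undirected taxonomy neighbours: all children plus the parent, if any."""
    out = [child for child, parent in PARENT.items() if parent == node]
    parent = PARENT[node]
    if parent is not None:
        out.append(parent)
    return out


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of edges on the shortest path), found by breadth-first search."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist)
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0  # unreachable in a single rooted tree, kept for safety


def to_noun(word: str, convert: bool) -> Optional[str]:
    """Return the taxonomy node for a word, optionally via category conversion."""
    if word in PARENT:
        return word
    return CONVERT.get(word) if convert else None


def directional(src: List[str], tgt: List[str]) -> float:
    """Mean over src words of the best similarity to any tgt word (0.0 if src is empty)."""
    if not src:
        return 0.0
    return sum(max((path_similarity(s, t) for t in tgt), default=0.0) for s in src) / len(src)


def sentence_similarity(s1: List[str], s2: List[str], convert: bool = True) -> float:
    """Symmetric score: average of both directional scores; unmapped words are dropped."""
    n1 = [n for n in (to_noun(w, convert) for w in s1) if n is not None]
    n2 = [n for n in (to_noun(w, convert) for w in s2) if n is not None]
    return (directional(n1, n2) + directional(n2, n1)) / 2


if __name__ == "__main__":
    print(sentence_similarity(["drove", "car"], ["flew"], convert=False))  # 0.0
    print(round(sentence_similarity(["drove", "car"], ["flew"]), 4))       # 0.2857
```

    Without conversion the verb forms "drove" and "flew" have no node in the noun taxonomy and contribute nothing; with conversion they are scored through "drive" and "flight".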

    Improvement of VerbNet-like resources by frame typing

    Verbenet is a French lexicon developed through a "translation" of its English counterpart, VerbNet (Kipper-Schuler, 2005), and an adaptation to the specificities of French syntax (Pradet et al., 2014; Danlos et al., 2016). One difficulty encountered in its development is that the list of (potentially numerous) frames has no internal organization. This paper proposes a type system for frames that shows whether two frames are variants of a given alternation. Frame typing facilitates coherence checking of the resource in a "virtuous circle". We present the principles underlying a program we developed and used to automatically type the frames in Verbenet. We also show that our system is portable to other languages.
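    The abstract does not detail the type system, so the following is only an illustration of the stated idea: two frames are typed as variants of an alternation when a known transformation maps one onto the other, and frames that cannot be typed point to places where the resource may be incoherent. The frame notation, the two alternation rules and the choice of the first frame as canonical are hypothetical simplifications, not Verbenet's actual inventory or the authors' program.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A frame is a sequence of slots such as "NP.Agent", "V", "NP.Patient" (hypothetical notation).
Frame = Tuple[str, ...]


def inchoative(frame: Frame) -> Optional[Frame]:
    """Causative/inchoative: 'Agent V Patient ...' -> 'Patient V ...'."""
    if frame[:3] == ("NP.Agent", "V", "NP.Patient"):
        return ("NP.Patient", "V") + frame[3:]
    return None


def unspecified_object(frame: Frame) -> Optional[Frame]:
    """Unspecified object: 'Agent V Patient ...' -> 'Agent V ...'."""
    if frame[:3] == ("NP.Agent", "V", "NP.Patient"):
        return ("NP.Agent", "V") + frame[3:]
    return None


# Hypothetical alternation inventory; a real resource would list many more.
ALTERNATIONS: Dict[str, Callable[[Frame], Optional[Frame]]] = {
    "causative/inchoative": inchoative,
    "unspecified object": unspecified_object,
}


def frame_type(base: Frame, other: Frame) -> Optional[str]:
    """Name of an alternation that maps one frame onto the other, else None."""
    for name, rule in ALTERNATIONS.items():
        if rule(base) == other or rule(other) == base:
            return name
    return None


def type_frames(frames: List[Frame]) -> Dict[Frame, str]:
    """Type every frame of a class relative to its first (canonical) frame.

    Frames left "untyped" are candidates for manual review: either an
    alternation is missing from the inventory or the frame itself is wrong.
    """
    canonical = frames[0]
    return {
        f: "canonical" if f == canonical else (frame_type(canonical, f) or "untyped")
        for f in frames
    }


if __name__ == "__main__":
    cls = [
        ("NP.Agent", "V", "NP.Patient"),
        ("NP.Patient", "V"),
        ("NP.Agent", "V"),
        ("NP.Patient", "V", "NP.Agent"),
    ]
    for frame, kind in type_frames(cls).items():
        print(" ".join(frame), "->", kind)
```

    Running it types the second and third frames as alternation variants and leaves the last one "untyped", which is the kind of signal that can feed back into correcting the resource.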

    Adapting VerbNet to French using existing resources

    VerbNet is an English lexical resource for verbs that has proven useful for English NLP due to its high coverage and coherent classification. Such a resource does not exist for other languages, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing resources designed for different purposes. This study focuses on French and uses two French resources: a semantic lexicon (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).

    A New Approach to Plagiarism Detection Using Cellular Learning Automatons and Semantic Role Labeling

    Plagiarism is taking the ideas or words of others and presenting them under one's own name. With the growth of the Internet and the proliferation of online articles, scientific theft has also become easier. Many systems have been developed to detect plagiarism, but most of them rely on lexical structure and string-matching algorithms, so they can hardly detect paraphrased (reworded) text or the substitution of synonyms. This paper presents a method for identifying plagiarism based on semantic role labeling and cellular learning automata. The cellular learning automata are used to locate the processed words, while semantic role labeling specifies the role of each word in the sentence. Comparison operations are performed for all sentences of the original and suspicious texts. Experiments on the PAN-PC-11 corpus show that the proposed method improves recall, precision and F-measure compared with previous approaches to plagiarism detection.
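    The abstract describes comparing all sentences of the original and suspicious texts and scoring the result with recall, precision and F-measure. Below is a minimal sketch of that outer loop, assuming sentences are already SRL-annotated as role → head-word dictionaries (no tagger is included) and using a hypothetical role-matching score and threshold. It does not model the cellular learning automata, and the official PAN evaluation measures are defined over characters rather than over sentence pairs as here.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Frame = Dict[str, str]  # semantic role -> head word (assumed output of an SRL tagger)
Pair = Tuple[int, int]  # (suspicious sentence index, source sentence index)


def frame_similarity(a: Frame, b: Frame) -> float:
    """Share of roles (present in either frame) filled by the same word in both."""
    roles = set(a) | set(b)
    if not roles:
        return 0.0
    same = sum(1 for r in roles if r in a and r in b and a[r] == b[r])
    return same / len(roles)


def detect(source: List[Frame], suspicious: List[Frame], threshold: float = 0.5) -> Set[Pair]:
    """Compare every suspicious sentence with every source sentence; keep pairs >= threshold."""
    return {
        (i, j)
        for (i, s), (j, o) in product(enumerate(suspicious), enumerate(source))
        if frame_similarity(s, o) >= threshold
    }


def precision_recall_f(detected: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Pair-level precision, recall and F-measure (harmonic mean of the two)."""
    tp = len(detected & gold)
    p = tp / len(detected) if detected else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical toy data, already SRL-annotated.
source = [
    {"A0": "researcher", "V": "publish", "A1": "result"},
    {"A0": "team", "V": "build", "A1": "system"},
]
suspicious = [
    {"A0": "researcher", "V": "publish", "A1": "finding"},  # one argument reworded
    {"A0": "cat", "V": "sleep"},                             # unrelated
    {"A1": "system", "V": "build", "A0": "team"},            # passive voice: same roles
    {"A0": "scientist", "V": "release", "A1": "result"},     # synonyms: missed here
]
gold = {(0, 0), (2, 1), (3, 0)}

if __name__ == "__main__":
    found = detect(source, suspicious)
    print(sorted(found))  # [(0, 0), (2, 1)]
    print(precision_recall_f(found, gold))
```

    The passive sentence is matched because roles, not word order, are compared; the synonym case shows why exact word matching inside roles is not enough on its own.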

    Knowledge-Based Sentence Semantic Similarity: Algebraical Properties

    Determining the extent to which two text snippets are semantically equivalent is a well-researched topic in natural language processing, information retrieval and text summarization. Sentence-to-sentence similarity scoring is extensively used in both generic and query-based document summarization as a significance or similarity indicator. Nevertheless, most of these applications use semantic similarity measures only as tools, without paying attention to the inherent properties of such tools, which ultimately restrict the scope and technical soundness of the underlying applications. This paper aims to help fill this gap. It investigates three popular WordNet hierarchical semantic similarity measures, namely path length, Wu and Palmer, and Leacock and Chodorow, in terms of both their algebraic and intuitive properties, highlighting their inherent limitations and theoretical constraints. We especially examine properties related to the range and scope of the semantic similarity score, incremental monotonicity evolution, monotonicity with respect to the hyponymy/hypernymy relationship, and a set of interactive properties. The extension from word semantic similarity to sentence similarity is also investigated using a pairwise canonical extension, and the properties of the resulting sentence-to-sentence similarity are examined in detail. Next, to overcome the inherent limitation of WordNet semantic similarity in accounting for different part-of-speech word categories, a WordNet “All word-To-Noun conversion” that makes use of the Categorial Variation Database (CatVar) is proposed and evaluated on a publicly available dataset against several state-of-the-art methods. The findings demonstrate the feasibility of the proposal and open up new opportunities in information retrieval and natural language processing tasks.
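    The range property mentioned in the abstract is easy to see on a toy example. Below is a sketch of the three measures using their commonly cited definitions (depth counted in nodes with the root at depth 1, and Leacock and Chodorow scaled by twice the maximum taxonomy depth), on a hypothetical miniature hierarchy rather than WordNet; library implementations such as NLTK's differ in details like handling multiple hypernym paths.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy noun hierarchy (child -> parent); not real WordNet data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "car": "artifact",
}


def ancestors(node: str) -> List[str]:
    """The node itself followed by its hypernyms up to the root."""
    chain = [node]
    parent = PARENT[node]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def depth(node: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lcs(a: str, b: str) -> str:
    """Lowest common subsumer: the first hypernym of a that also subsumes b."""
    above_b = set(ancestors(b))
    return next(x for x in ancestors(a) if x in above_b)


def edges(a: str, b: str) -> int:
    """Shortest path length in edges, passing through the lowest common subsumer."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))


MAX_DEPTH = max(depth(n) for n in PARENT)


def path_sim(a: str, b: str) -> float:
    """Path length measure: 1 / (1 + edges); range (0, 1]."""
    return 1.0 / (1 + edges(a, b))


def wup_sim(a: str, b: str) -> float:
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b)); range (0, 1]."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def lch_sim(a: str, b: str) -> float:
    """Leacock and Chodorow: -log(nodes on path / (2 * MAX_DEPTH)); can exceed 1."""
    return -math.log((edges(a, b) + 1) / (2.0 * MAX_DEPTH))


if __name__ == "__main__":
    for a, b in [("dog", "dog"), ("dog", "cat"), ("dog", "car")]:
        print(a, b, round(path_sim(a, b), 3), round(wup_sim(a, b), 3), round(lch_sim(a, b), 3))
```

    For identical concepts Leacock and Chodorow returns log(2 × MAX_DEPTH) = log 6 ≈ 1.79 on this toy tree, so unlike the other two measures its scores are not confined to [0, 1].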

    Plagiarism detection scheme based on semantic role labeling

    Nowadays, many documents are available on the Internet and are easy to access. Because of this wide availability, users can easily create a new document by copying and pasting. Plagiarism occurs when content is copied without permission or citation. This paper introduces a plagiarism detection technique based on Semantic Role Labeling (SRL). The technique analyses and compares texts based on the semantic role allocated to each term in a sentence, and SRL is well suited to identifying the semantic arguments of each sentence. Experimental results on the PAN-PC-09 data sets show that our method outperforms modern plagiarism detection methods in terms of recall, precision and F-measure.
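    The technique compares texts through the semantic role allocated to each term. Below is a minimal sketch of why role information matters, assuming sentences are already SRL-annotated as role → terms dictionaries with PropBank-style labels (A0, A1, V), and using Jaccard overlap as a hypothetical stand-in for the paper's actual scoring.

```python
from typing import Dict, List, Set, Tuple

SRLFrame = Dict[str, List[str]]  # semantic role -> terms filling it (assumed SRL output)


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def terms(frame: SRLFrame) -> Set[str]:
    """Bag-of-words view: the terms, ignoring their roles."""
    return {t for words in frame.values() for t in words}


def role_terms(frame: SRLFrame) -> Set[Tuple[str, str]]:
    """SRL view: each term paired with the role allocated to it."""
    return {(role, t) for role, words in frame.items() for t in words}


def compare(a: SRLFrame, b: SRLFrame) -> Tuple[float, float]:
    """(lexical overlap, role-aware overlap) for two SRL-annotated sentences."""
    return jaccard(terms(a), terms(b)), jaccard(role_terms(a), role_terms(b))


# Hypothetical annotations.
active = {"A0": ["dog"], "V": ["bite"], "A1": ["man"]}   # "The dog bit the man."
swapped = {"A0": ["man"], "V": ["bite"], "A1": ["dog"]}  # "The man bit the dog."
passive = {"A1": ["man"], "V": ["bite"], "A0": ["dog"]}  # "The man was bitten by the dog."

if __name__ == "__main__":
    print(compare(active, swapped))  # (1.0, 0.2)
    print(compare(active, passive))  # (1.0, 1.0)
```

    Plain word overlap cannot tell the first two sentences apart, whereas the role-aware view separates them and still matches the passive rewording of the same event.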