84 research outputs found

Revamping question answering with a semantic approach over world knowledge

Author: Cardoso Nuno
Dornescu Iustin
Hartrumpf Sven
Leveling Johannes
Publication date: 01/09/2010

Classic textual question answering (QA) approaches that rely on statistical keyword relevance scoring without exploiting semantic content are useful to a certain extent, but are limited to questions answered by a small text excerpt. With the maturation of Wikipedia and with upcoming projects like DBpedia, we feel that nowadays QA can adopt a deeper, semantic approach to the task, where answers can be inferred using knowledge bases to overcome the limitations of textual QA approaches. In GikiCLEF, a QA-flavoured evaluation task, the best performing systems followed a semantic approach. In this paper, we present our motivations for preferring semantic approaches to QA over textual approaches, with Wikipedia serving as a raw knowledge source

DCU Online Research Access Service

Predicting the Law Area and Decisions of French Supreme Court Cases

Author: Sulea Octavia-Maria
van Genabith Josef
Vela Mihaela
Zampieri Marcos
Publication date: 01/01/2017

In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made on the textual form of the case description, and the extent to which it is necessary to mask the judge's motivation for a ruling to emulate a real-world test scenario. We report results of 96% F1 score in predicting a case ruling, 90% F1 score in predicting the law area of a case, and 75.9% F1 score in estimating the time span in which a ruling was issued, using a linear Support Vector Machine (SVM) classifier trained on lexical features. Comment: RANLP 201
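
The F1 scores quoted in this abstract combine precision and recall as their harmonic mean. A minimal sketch of that computation follows; the function name `f1_score` and the example counts are illustrative assumptions and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when either is undefined."""
    predicted = true_positives + false_positives  # items the classifier labelled positive
    actual = true_positives + false_negatives     # items that really are positive
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```

With these made-up counts, precision and recall are both 0.8, so the harmonic mean is also 0.8.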

arXiv.org e-Print Archive

Crossref

Universaar

Acronym

Sentiment Analysis: State of the Art

Author: Chalothorn Tawunrat
Ellman Jeremy
Publication venue: Institute of Research Engineers and Doctors
Publication date: 01/08/2013

We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels of sentiment analysis, and the processes that can be used to measure polarity and classify labels. Brief details about some sentiment analysis resources are also included

Northumbria University Research Portal

SEMONTOQA: A Semantic Understanding-Based Ontological Framework for Factoid Question Answering

Author: Hoque Moinul
Quaresma Paulo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

This paper presents an outline of an Ontological and Semantic understanding-based model (SEMONTOQA) for an open-domain factoid Question Answering (QA) system. The outlined model analyses unstructured English natural language texts to a vast extent and represents the inherent contents in an ontological manner. The model locates and extracts useful information from the text for various question types and builds a semantically rich knowledge-base that is capable of answering different categories of factoid questions. The system model converts the unstructured texts into a minimalistic, labelled, directed graph that we call a Syntactic Sentence Graph (SSG). An Automatic Text Interpreter using a set of pre-learnt Text Interpretation Subgraphs and patterns tries to understand the contents of the SSG in a semantic way. The system proposes a new feature- and action-based Cognitive Entity-Relationship Network designed to extend the text understanding process to an in-depth level. Application of supervised learning allows the system to gradually grow its capability to understand the text in a more fruitful manner. The system incorporates an effective Text Inference Engine which takes the responsibility of inferring the text contents and isolating entities, their features, actions, objects, associated contexts and other properties required for answering questions. A similar understanding-based question processing module interprets the user’s need in a semantic way. An Ontological Mapping Module, with the help of a set of pre-defined strategies designed for different classes of questions, is able to perform a mapping between a question’s ontology and the set of ontologies stored in the background knowledge-base. Empirical verification is performed to show the usability of the proposed model. The results achieved show that this model can be used effectively as a semantic understanding-based alternative QA system

Crossref

Repositório Científico da Universidade de Évora

Plagiarism detection for Indonesian texts

Author: Krisnawati Lucia Dwi
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 18/05/2016

As plagiarism becomes an increasing concern for Indonesian universities and research centers, the need for an automatic plagiarism checker is becoming more pressing. However, research on Plagiarism Detection Systems (PDS) for Indonesian documents is not well developed: most existing work deals with detecting duplicate or near-duplicate documents, does not address the problem of retrieving source documents, or tends to measure document similarity globally. The resulting systems are therefore incapable of pointing to the exact locations of "similar passage" pairs. In addition, no public, standard corpus has been available for evaluating PDS on Indonesian texts. To address these weaknesses, this thesis develops a plagiarism detection system that executes various methods for the plagiarism detection stages in a workflow system. In the retrieval stage, a novel document feature termed phraseword is introduced and used along with word unigrams and character n-grams to address the problem of retrieving source documents whose contents are partially copied or obfuscated in a suspicious document. The detection stage, which uses a two-step paragraph-based comparison, aims to detect and locate source-obfuscated passage pairs. The seeds for matching source-obfuscated passage pairs are based on locally weighted significant terms, in order to capture paraphrased and summarized passages. In addition to this system, an evaluation corpus was created through simulation by human writers and through algorithmic random generation. Using this corpus, the proposed methods were evaluated in three scenarios. In the first scenario, which evaluated source retrieval performance, some methods using phraseword and token features achieved the optimum recall rate of 1. In the second scenario, which evaluated detection performance, our system was compared to Alvi's algorithm and evaluated at four levels of measures: character, passage, document, and case. The experimental results showed that methods using tokens as seeds score higher than Alvi's algorithm at all four levels, in both artificial and simulated plagiarism cases. In case detection, our system outperforms Alvi's algorithm in recognizing copied, shaked, and paraphrased passages. However, Alvi's recognition rate on summarized passages is insignificantly higher than that of our system. The third scenario showed the same tendency, except that the precision rates of Alvi's algorithm at the character and paragraph levels are higher than those of our system. The higher Plagdet scores of some methods in our system compared with Alvi's show that this study has fulfilled its objective of implementing a competitive state-of-the-art algorithm for detecting plagiarism in Indonesian texts. When run on our test document corpus, Alvi's highest scores for recall, precision, Plagdet, and detection rate on no-plagiarism cases correspond to its scores on the PAN'14 corpus. Thus, this study has contributed a standard evaluation corpus for assessing PDS for Indonesian documents. It also contributes a source retrieval algorithm that introduces phrasewords as document features, and a paragraph-based text alignment algorithm that relies on two different strategies. One of them applies local word weighting, as used in the text summarization field, to select seeds both for discriminating paragraph pair candidates and for the matching process. The proposed detection algorithm results in almost no multiple detections, which contributes to its strength
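
The abstract above names character n-grams as one of the features used to retrieve candidate source documents. A minimal sketch of that general idea follows, ranking candidate sources by the Jaccard overlap of their character trigram sets. This is not the thesis's algorithm: the Jaccard measure, the function names `char_ngrams`, `jaccard` and `rank_sources`, and the sample texts are all assumptions chosen for illustration, and the phraseword feature is not modelled.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sources(suspicious: str, sources: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank candidate source documents by n-gram overlap with the suspicious text."""
    query = char_ngrams(suspicious, n)
    scored = [(name, jaccard(query, char_ngrams(body, n))) for name, body in sources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, for illustration only.
ranking = rank_sources(
    "the quick brown fox",
    {"a": "the quick brown fox jumps", "b": "completely unrelated text"},
)
```

A global overlap score like this one only suggests which sources to inspect. It does not locate the matching passages, which is the limitation of global similarity measures that the abstract points out.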

Semantics-based Question Generation and Implementation

Author: Bouma Gosse
Yao Xuchen
Zhang Yi
Publication venue: University of Illinois at Chicago Library
Publication date: 01/01/2012

This paper presents a question generation system based on the approach of semantic rewriting. State-of-the-art deep linguistic parsing and generation tools are employed to convert (back and forth) between natural language sentences and their meaning representations in the form of Minimal Recursion Semantics (MRS). By carefully operating on the semantic structures, we show a principled way of generating questions without ad-hoc manipulation of the syntactic structures. Based on the (partial) understanding of the sentence meaning, the system generates questions which are semantically grounded and purposeful. With the support of deep linguistic grammars, the grammaticality of the generation results is warranted. Further, with a specialized ranking model, the linguistic realizations from the general-purpose generation model are refined for our question generation task. The evaluation results from QGSTEC2010 show promising prospects for the proposed approach

University of Illinois at Chicago: Journals@UIC

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dialogue & Discourse (E-Journal - Universität Bielefeld)

Dissertations of the University of Groningen

TAKSONOMIJA METODA AKADEMSKOG PLAGIRANJA (Taxonomy of Academic Plagiarism Methods)

Author: Ana Meštrović
Tedo Vrbanec
Publication venue: Polytechnic of Rijeka University
Publication date: 01/01/2021

The article gives an overview of the plagiarism domain, with a focus on academic plagiarism. The article defines plagiarism and explains the origin of the term, as well as plagiarism-related terms. It identifies the extent of the plagiarism domain and then focuses on the plagiarism subdomain of text documents, for which it gives an overview of current classifications and taxonomies and then proposes a more comprehensive classification according to several criteria: origin and purpose, technical implementation, consequence, complexity of detection, and the number of linguistic sources. The article suggests a new classification of academic plagiarism and describes sorts and methods of plagiarism, types and categories, approaches and phases of plagiarism detection, and the classification of methods and algorithms for plagiarism detection. The title of the article explicitly targets the academic community, but it is sufficiently general and interdisciplinary that it can be useful for many other professionals, such as software developers, linguists and librarians. The paper gives an overview of the domain of plagiarism of text documents. It describes the origin of the term plagiarism, presents definitions, and explains terms related to plagiarism. It points out the breadth of the plagiarism domain and, for text documents, reviews existing taxonomies and proposes a more comprehensive taxonomy according to several criteria: origin and purpose, technical implementation of plagiarism, consequences of plagiarism, complexity of detection, and (multi)lingual origin. The paper proposes a new classification of academic plagiarism and presents the kinds and methods of plagiarism, the types and categories of plagiarism, and the approaches to and phases of plagiarism detection. It then describes the classification of methods and algorithms for plagiarism detection. Although aimed at the academic reader, it can be useful in interdisciplinary fields and to software developers, linguists and librarians

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

'Truth is the same old story': truth, genre and the ethics of memoir in Maggie and me

Author: Barr Damian
Publication venue: Lancaster University
Publication date: 01/01/2020

This is a PhD by publication consisting of a published work and accompanying critical reflection. The published work is my memoir, Maggie & Me (2013), which depicts my chaotic and traumatic childhood in a post-industrial village outside Glasgow in the 1980s. The memoir juxtaposes the personal and political – ‘Maggie’ is Margaret Thatcher whose policies and persona were indelibly imposed on my community, my family and me. The critical reflection aims to contextualise Maggie & Me in the genre of memoir, to interrogate memoir as a genre and to deconstruct my process and practice in four chapters, each arranged around a single question. I draw on the work of Buckley (1974), Couser (2012) and Gornick (2002; 2009), with reference to specific memoirs, chiefly Galloway (2008; 2011), Sanghera (2009) and Winterson (2011). In Chapter 1 I ask Why not a novel? considering Maggie & Me as a bildungsroman while examining the origins, expectations and, ultimately, limitations of the coming-of-age novel. In Chapter 2 I ask Why write a memoir? while outlining expectations and tropes of the genre and reassessing my decision to write Maggie & Me as a memoir rather than as an autobiography or work of fiction or autofiction. In Chapter 3 I ask Is it all true? establishing the distinctive ethical and legal considerations involved in writing memoir, and the pact this genre forges between writer and reader. Chapter 4 concludes by investigating the nature of memoir as trauma relived and performed and considers the possibility of catharsis before finally asking Do I feel better now? – the answer to which may lie with the reader as a shared act of meaning making

Lancaster E-Prints