21 research outputs found

    Tweet Contextualization Based on Wikipedia and Dbpedia

    Limited to 140 characters, tweets are short and often ignore formal grammar and proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper addresses the tweet contextualization task, which aims to automatically provide a summary that explains a given tweet so that a reader can understand it. We propose several tweet expansion approaches based on Wikipedia and DBpedia as external knowledge sources. Each approach has two steps: the first generates candidate terms for a given tweet, and the second ranks and selects these candidate terms using a similarity measure. An experimental study on the INEX 2014 collection demonstrates the effectiveness of our methods.
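    The two steps named in the abstract (candidate generation, then ranking by similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which similarity measure is used or how Wikipedia and DBpedia are queried, so the sketch substitutes a tiny in-memory dictionary (`TOY_KB`) for the knowledge sources and bag-of-words cosine similarity for the measure. The function names are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia/DBpedia: entry title -> short description.
TOY_KB = {
    "INEX": "evaluation campaign for tweet contextualization",
    "Tweet": "short message posted on Twitter",
    "Paris": "capital city of France",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def generate_candidates(tweet, kb):
    """Step 1: collect terms from entries whose title shares a token with the tweet."""
    tweet_tokens = set(tokenize(tweet))
    candidates = set()
    for title, description in kb.items():
        if tweet_tokens & set(tokenize(title)):
            candidates.update(tokenize(description))
    # Terms already present in the tweet add nothing as expansions.
    return candidates - tweet_tokens

def rank_candidates(tweet, candidates, kb, top_k=3):
    """Step 2: score each term by the similarity between the tweet and the
    entries that mention the term, then keep the top_k terms."""
    tweet_vec = Counter(tokenize(tweet))
    scores = {}
    for term in candidates:
        context = Counter()
        for title, description in kb.items():
            tokens = tokenize(title + " " + description)
            if term in tokens:
                context.update(tokens)
        scores[term] = cosine(tweet_vec, context)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

tweet = "INEX tweet task results"
candidates = generate_candidates(tweet, TOY_KB)
expansion = rank_candidates(tweet, candidates, TOY_KB)
print(expansion)
```

    In this toy setting, terms drawn from the entry that overlaps most with the tweet are ranked first; a real system would replace the dictionary with lookups against the actual knowledge sources and would likely filter stop words before ranking.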

    Improving Synoptic Querying for Source Retrieval

    Source retrieval is a part of the plagiarism discovery process in which only a selected set of candidate documents is retrieved from a large corpus of potential source documents and passed on for detailed document comparison in order to highlight potential plagiarism. This paper describes the methodology and architecture of the source retrieval system developed for the PAN 2015 lab on uncovering plagiarism, authorship, and social software misuse. The system is based on our previous systems used at PAN since 2012. Most features were adopted from those systems, with some improvements described in this paper. The paper analyzes the methodology, discusses query performance, and explains many implementation settings in the source retrieval process. The source retrieval subsystem forms an integral part of a modern system for plagiarism discovery.