Deep Investigation of Cross-Language Plagiarism Detection Methods

Authors: Agnes Frederic, Besacier Laurent, Ferrero Jeremy, Schwab Didier
Publication date: 24/05/2017

This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs at 2 granularities of text units in order to draw robust conclusions about the best methods, while closely analyzing correlations across document styles and languages. Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) co-located with ACL 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Experiments to investigate the utility of nearest neighbour metrics based on linguistically informed features for detecting textual plagiarism

Authors: Almquist Per, Karlgren Jussi
Publication date: 01/01/2011

Plagiarism detection is a challenge for linguistic models; most currently implemented models use simple occurrence statistics for linguistic items. In this paper we report two experiments related to plagiarism detection in which we use a model of distributional semantics and of sentence stylistics to compare, sentence by sentence, the likelihood of a text being partly plagiarised. The results of the comparison are displayed for visual inspection by a plagiarism assessor.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

DSpace at Tartu University Library

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

Authors: Agnes Frederic, Besacier Laurent, Ferrero Jeremy, Schwab Didier
Publication date: 01/01/2017

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity with a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations.
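The evaluation idea in this abstract can be illustrated with a minimal sketch. It assumes that the reported correlation is a Pearson correlation between system scores and human scores, and that an unsupervised combination can be approximated by averaging the per-method scores. Neither assumption is stated in the abstract, and the function names `pearson` and `average_scores` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant list has no variance, so the correlation is undefined;
    # this sketch returns 0.0 in that case.
    return cov / denom if denom else 0.0


def average_scores(per_method_scores):
    """Assumed unsupervised combination: mean of the methods' scores per sentence pair."""
    return [sum(scores) / len(scores) for scores in zip(*per_method_scores)]


# Hypothetical scores on the 0-5 scale for four sentence pairs.
method_a = [4.5, 1.0, 3.0, 0.5]
method_b = [4.0, 2.0, 2.5, 1.0]
human = [5.0, 1.5, 3.0, 0.0]

combined = average_scores([method_a, method_b])
print(round(pearson(combined, human), 3))
```

A supervised combination would instead learn weights for the methods from annotated pairs; that step is omitted here.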

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Plagiarism detection using information retrieval and similarity measures based on image processing techniques

Authors: Banchs Rafael E., Codina Joan, Grivolla Jens, Ruiz Costa-Jussà Marta
Publication date: 01/01/2010

This paper describes the Barcelona Media Innovation Center's participation in the 2nd International Competition on Plagiarism Detection. In particular, our system focused on the external plagiarism detection task, which assumes the source documents are available. We present a two-step approach. In the first step of our method, we build an information retrieval system based on Solr/Lucene, segmenting both suspicious and source documents into smaller texts. We perform a bag-of-words search, which provides a first selection of potentially plagiarized texts. In the second step, each promising pair is further investigated. We implemented a sliding window approach that computes cosine distances between overlapping text segments from both the source and suspicious documents on a pairwise basis. As a result, a similarity matrix between text segments is obtained, which is smoothed by means of low-pass 2-D filtering. From the smoothed similarity matrix, plagiarized segments are identified by using image processing techniques. Our results placed in the middle of the official ranking, which considered two types of plagiarism together: intrinsic and external. Postprint (published version
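The second step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' implementation: the window size, the step, the 3x3 mean filter standing in for the low-pass 2-D filter, and the fixed threshold standing in for the image processing stage are all hypothetical choices. The sketch also computes cosine similarity where the abstract speaks of cosine distances.

```python
import math
from collections import Counter


def windows(text, size=4, step=2):
    """Split text into overlapping word windows, each as a bag-of-words Counter."""
    words = text.lower().split()
    last_start = max(len(words) - size, 0)
    return [Counter(words[i:i + size]) for i in range(0, last_start + 1, step)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(suspicious, source, size=4, step=2):
    """Rows index suspicious windows, columns index source windows."""
    src = windows(source, size, step)
    return [[cosine(s, t) for t in src] for s in windows(suspicious, size, step)]


def smooth(matrix):
    """3x3 mean filter (assumed low-pass filter); edge cells average the neighbours that exist."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cells = [matrix[i][j]
                     for i in range(max(r - 1, 0), min(r + 2, rows))
                     for j in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = sum(cells) / len(cells)
    return out


def candidates(matrix, threshold=0.3):
    """Assumed stand-in for the image processing stage: keep cells at or above a threshold."""
    return [(r, c) for r, row in enumerate(matrix)
            for c, value in enumerate(row) if value >= threshold]


source = "the quick brown fox jumps over the lazy dog near the river bank"
suspicious = "my essay says the quick brown fox jumps over the lazy dog today"
sim = similarity_matrix(suspicious, source)
print(candidates(smooth(sim)))
```

Copied passages tend to appear as diagonal streaks in such a matrix, which is why smoothing before thresholding helps suppress isolated high-similarity cells.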

UPCommons. Portal del coneixement obert de la UPC