2 research outputs found

    A Deep Learning Approach to Persian Plagiarism Detection

    Abstract. Plagiarism detection is the automatic identification of reused text. Widespread internet availability and easy access to textual information increase the need for automated plagiarism detection, and various algorithms have been proposed for detecting plagiarism in text documents. Because traditional methods have drawbacks and are inefficient, and because suitable algorithms for Persian plagiarism detection are lacking, we propose a deep-learning-based method to detect plagiarism. In the proposed method, words are represented as multi-dimensional vectors, and simple aggregation methods combine the word vectors into sentence representations. Representations of source and suspicious sentences are compared, and the sentence pairs with the highest similarity are treated as plagiarism candidates. The final decision on whether a candidate is plagiarism is made using a two-level evaluation method. Our method was used in the PAN 2016 Persian plagiarism detection contest and achieved 90.6% plagdet, 85.8% recall, and 95.9% precision on the provided data sets.
    CCS Concepts: • Information systems → Near-duplicate and plagiarism detection • Information systems → Evaluation of retrieval results
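The candidate-selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `EMBEDDINGS` table, the choice of averaging as the aggregation method, the use of cosine similarity, and the `threshold` value are all assumptions made for the example, and the paper's two-level evaluation step is not reproduced.

```python
import math

# Hypothetical 3-dimensional word vectors; a real system would load
# pretrained embeddings with far more dimensions.
EMBEDDINGS = {
    "students": [1.0, 0.0, 0.0],
    "pupils": [0.9, 0.1, 0.0],
    "copy": [0.0, 1.0, 0.0],
    "reuse": [0.1, 0.9, 0.0],
    "text": [0.0, 0.0, 1.0],
}


def sentence_vector(sentence, dim=3):
    """Average the vectors of known words (one simple aggregation method)."""
    vectors = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def candidate_pairs(source, suspicious, threshold=0.8):
    """For each suspicious sentence, keep its most similar source sentence
    if the similarity reaches the (assumed) threshold."""
    pairs = []
    source_vectors = [sentence_vector(s) for s in source]
    for i, sentence in enumerate(suspicious):
        vec = sentence_vector(sentence)
        scores = [cosine(vec, src_vec) for src_vec in source_vectors]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


source = ["students copy text", "rain falls today"]
suspicious = ["pupils reuse text"]
pairs = candidate_pairs(source, suspicious)
```

Here the paraphrased suspicious sentence is matched to the first source sentence even though the two share only one surface word, which is the kind of case embedding-based comparison is meant to catch.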

    G.: Dynamically Adjustable Approach through Obfuscation Type Recognition—Notebook for PAN at CLEF 2015. In: [8

    Abstract. The task of (monolingual) text alignment consists of finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2015. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme and the cosine and Dice similarity measures. We used and extended our previous clustering algorithm, introduced a new verbatim detection method, and extended the decision making regarding which approach or output to use. We significantly improve performance over our previous PAN 2014 approach, and hence our approach outperforms the best-performing system of PAN 2014. Our system is available as open source.
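As a rough illustration of the similarity measures this abstract names, the sketch below weights sentence terms with a tf-idf-like scheme and compares sentences with cosine and Dice similarity. The abstract does not give the exact weighting, so the smoothed idf formula, the whitespace tokenization, and the function names here are assumptions chosen for the example rather than the authors' method.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse tf-idf-like vectors, treating each sentence as a document."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption) keeps weights positive for common terms.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm > 0 else 0.0


def dice(u, v):
    """Dice coefficient on the term sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(u), set(v)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


sentences = ["the cat sat", "the cat ran", "a dog barked"]
vecs = tfidf_vectors(sentences)
cos_close = cosine(vecs[0], vecs[1])
dice_close = dice(vecs[0], vecs[1])
cos_far = cosine(vecs[0], vecs[2])
```

Cosine takes the term weights into account, while Dice looks only at which terms overlap, so a system can combine the two or apply them at different stages.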