6,021 research outputs found

    Prevention of Textual Plagiarism Application

    This project aims to detect and prevent plagiarism among UTP students. To achieve this, a system named the Prevention of Textual Plagiarism Application will be developed to detect similarities between documents submitted by students. The main focus of the project is therefore a study of how to detect textual plagiarism. Word-for-word plagiarism is the most noticeable and serious form of plagiarism; it can be categorized as direct stealing of another's work without proper acknowledgment or consent. The Rabin-Karp algorithm, a string-searching algorithm that uses hashing to compare strings, will be integrated into the project. Fact-finding has also been carried out to support the study of plagiarism. As a result, the project is able to compare each of the files submitted by students against the others to find similarities among them. The percentage of similarity is then generated in text format to make it easier for lecturers to detect plagiarism among students.
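The abstract does not say how the similarity percentage is computed. The sketch below is one plausible reading, not the project's implementation: it fingerprints each submission with a Rabin-Karp style rolling hash over k-character windows and reports the Dice coefficient of the two hash sets as a percentage. The names `kgram_hashes`, `similarity_percent`, and `report`, the window size `k = 5`, and the choice of the Dice coefficient are all assumptions made for illustration.

```python
from itertools import combinations

BASE = 256             # assumed radix for the polynomial hash
MOD = 1_000_000_007    # assumed large prime modulus


def kgram_hashes(text: str, k: int = 5) -> set[int]:
    """Hash every k-character window of the normalized text with a rolling hash."""
    s = "".join(ch.lower() for ch in text if ch.isalnum())
    if len(s) < k:
        return set()
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in s[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = {h}
    for i in range(k, len(s)):
        # Drop the oldest character, shift, and append the new one.
        h = ((h - ord(s[i - k]) * high) * BASE + ord(s[i])) % MOD
        hashes.add(h)
    return hashes


def similarity_percent(a: str, b: str, k: int = 5) -> float:
    """Dice coefficient of the two k-gram hash sets, as a percentage."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha or not hb:
        return 0.0
    return 100.0 * 2 * len(ha & hb) / (len(ha) + len(hb))


def report(submissions: dict[str, str], k: int = 5) -> str:
    """Compare every pair of submissions and return a plain-text report."""
    lines = []
    for (n1, t1), (n2, t2) in combinations(sorted(submissions.items()), 2):
        lines.append(f"{n1} vs {n2}: {similarity_percent(t1, t2, k):.2f}%")
    return "\n".join(lines)
```

Calling `report({"a.txt": text_a, "b.txt": text_b})` yields one line per pair of files. Because only hashes are compared, a rare hash collision could slightly inflate a score.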

    Implementasi Algoritma Rabin-Karp pada Pendeteksian Plagiarisme

    Implementation of Rabin-Karp Algorithm in Plagiarism Detection - Plagiarism is a crime and a scourge of science. To avoid plagiarism in scientific articles, as in the case of this research, string-matching methods can be used. This study aims to implement the Rabin-Karp algorithm to detect plagiarism in scientific writing based on the level of text similarity. The Rabin-Karp algorithm was chosen because previous studies describe its premise as comparing the hash value of the input string with the hash value of each text substring. If the two hashes are the same, a character-by-character check is performed; if not, the comparison moves on to the next substring. The core of the computation is calculating the hash of each substring. This research is quantitative, and its stages were carried out by testing an implementation of the Rabin-Karp algorithm. Based on the calculation, the percentage of similarity between Test Sentence 1 and Test Sentence 2 is 77.96%. According to previous studies, the Winnowing algorithm was found to be better at detecting text similarity than the Rabin-Karp algorithm. This is shown by a similarity detection test on 30 paper documents, in which the average percentage values were 41.41% for the Rabin-Karp algorithm and 35.15% for the Winnowing algorithm. This study shows that the Rabin-Karp algorithm does not work optimally in detecting text similarity, so further research needs additional methods for calculating the level of similarity in order to optimize the performance of the Rabin-Karp algorithm
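The compare-hash, verify, then shift loop described in this abstract can be sketched as follows. This is a minimal textbook version of Rabin-Karp, not the study's implementation; the function name `rabin_karp` and the `base` and `mod` values are assumptions.

```python
def rabin_karp(pattern: str, text: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; confirm character by character.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches
```

For example, `rabin_karp("abra", "abracadabra")` returns `[0, 7]`. The explicit string comparison is what keeps hash collisions from producing false matches.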

    Expert and Corpus-Based Evaluation of a 3-Space Model of Conceptual Blending

    This paper presents the 3-space model of conceptual blending, which estimates the figurative similarity between Input spaces 1 and 2 using both their analogical similarity and the interconnecting Generic Space. We describe how our Dr Inventor model is being evaluated as a model of lexically based figurative similarity. We describe distinct but related evaluation tasks focused on 1) identifying novel and quality analogies between computer graphics publications, 2) evaluating machine-generated translations of text documents, and 3) evaluating documents in a plagiarism corpus. Our results show that Dr Inventor is not only capable of generating novel comparisons between publications but also appears to be a useful tool for evaluating machine translation systems and for detecting and assessing the level of plagiarism between documents. We also outline another, more recent evaluation that uses a corpus of patent applications

    Determining and Characterizing the Reused Text for Plagiarism Detection

    An important task in plagiarism detection is determining and measuring similar text portions between a given pair of documents. One of the main difficulties of this task resides in the fact that reused text is commonly modified with the aim of covering or camouflaging the plagiarism. Another difficulty is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. In order to tackle these problems, we propose a novel method for detecting likely portions of reused text. This method is able to detect common actions performed by plagiarists, such as word deletion, insertion and transposition, making it possible to obtain plausible portions of reused text. We also propose representing the identified reused text by means of a set of features that denote its degree of plagiarism, relevance and fragmentation. This new representation aims to facilitate the recognition of plagiarism by considering diverse characteristics of the reused text during the classification phase. Experimental results employing a supervised classification strategy showed that the proposed method is able to outperform traditionally used approaches. © 2012 Elsevier Ltd. All rights reserved.
    This work was done under partial support of CONACyT project Grants: 134186, and Scholarships: 258345/224483. This work is the result of the collaboration in the framework of the WIQEI IRSES project (Grant No. 269180) within the FP 7 Marie Curie. The work of the last author was in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Sánchez-Vega, F.; Villatoro-Tello, E.; Montes-Y-Gómez, M.; Villaseñor-Pineda, L.; Rosso, P. (2013). Determining and Characterizing the Reused Text for Plagiarism Detection. Expert Systems with Applications. 40(5):1804-1813. https://doi.org/10.1016/j.eswa.2012.09.021

    Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

    Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas, is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism, primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines.
    Comment: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) 2019. The data and code of our study are openly available at https://purl.org/hybridP

    Plagiarism detection for Indonesian texts

    As plagiarism becomes an increasing concern for Indonesian universities and research centers, the need for automatic plagiarism checkers is becoming more pressing. However, research on Plagiarism Detection Systems (PDS) for Indonesian documents is not well developed: most studies deal with detecting duplicate or near-duplicate documents, do not address the problem of retrieving source documents, or tend to measure document similarity globally. Systems resulting from this research are therefore incapable of pointing to the exact locations of "similar passage" pairs. Moreover, no public, standard corpus has been available for evaluating PDS on Indonesian texts. To address the weaknesses of earlier research, this thesis develops a plagiarism detection system that executes various methods for each stage of plagiarism detection within a workflow system. In the retrieval stage, a novel document feature called the phraseword is introduced and used along with word unigrams and character n-grams to address the problem of retrieving source documents whose contents are partially copied or obfuscated in a suspicious document. The detection stage, which uses a two-step paragraph-based comparison, aims to detect and locate source-obfuscated passage pairs. The seeds for matching source-obfuscated passage pairs are based on locally weighted significant terms, in order to capture paraphrased and summarized passages. In addition to this system, an evaluation corpus was created through simulation by human writers and through algorithmic random generation. Using this corpus, the proposed methods were evaluated in three scenarios. In the first scenario, which evaluated source retrieval performance, some methods using phraseword and token features achieved the optimum recall rate of 1.
    In the second scenario, which evaluated detection performance, our system was compared with Alvi's algorithm and evaluated at four levels of measurement: character, passage, document, and case. The experimental results showed that methods using tokens as seeds score higher than Alvi's algorithm at all four levels, in both artificial and simulated plagiarism cases. In case detection, our system outperforms Alvi's algorithm in recognizing copied, shaken, and paraphrased passages. However, Alvi's recognition rate on summarized passages is higher than our system's, though not significantly. The third scenario showed the same tendency, except that the precision rates of Alvi's algorithm at the character and paragraph levels are higher than our system's. That some methods in our system produce higher Plagdet scores than Alvi's algorithm shows that this study has fulfilled its objective of implementing a competitive state-of-the-art algorithm for detecting plagiarism in Indonesian texts. When run on our test document corpus, Alvi's highest scores for recall, precision, Plagdet, and detection rate on no-plagiarism cases correspond to its scores on the PAN'14 corpus. This study has thus contributed a standard evaluation corpus for assessing PDS for Indonesian documents. It also contributes a source retrieval algorithm that introduces phrasewords as document features, and a paragraph-based text alignment algorithm that relies on two different strategies. One of them applies the local word weighting used in the text summarization field to select seeds both for discriminating paragraph-pair candidates and for the matching process. The proposed detection algorithm produces almost no multiple detections, which contributes to its strength
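The abstract reports Plagdet scores without defining the measure. In the PAN evaluation framework, Plagdet is usually defined as the F1 score of precision and recall divided by log2(1 + granularity), where granularity is at least 1 and grows when one plagiarized passage is reported as several detections. The sketch below assumes the thesis uses that same definition; the function name `plagdet` is hypothetical.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Plagdet under the assumed PAN definition: F1 / log2(1 + granularity)."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    # Granularity 1 (no multiple detections) leaves F1 unchanged.
    return f1 / math.log2(1 + granularity)
```

With a granularity of 1 the score equals F1, so an algorithm that produces almost no multiple detections is not penalized by the denominator.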

    Student’s Plagiarisms in Higher Learning Institutions in the Era of Improved Internet Access: Case Study of Developing Countries

    This study investigated students’ plagiarism practices in Tanzanian higher learning institutions, using two universities, one public and one private, as a case study. The universities involved have honour codes and policies for plagiarism detection; however, they do not employ software to check for student plagiarism. The study employed a qualitative research approach within the interpretive paradigm. The participants for the case study were purposively selected. Data were collected using focus group discussions and document analysis (assignments, dissertations, and proposals suspected of plagiarism). The findings indicated that plagiarism is a critical problem among students at the sampled universities, as assignments submitted during the course of study contain substantial text copied from other sources without acknowledging the original authors. Moreover, the findings show that most students understood that plagiarism is academic dishonesty; however, this has not stopped them from plagiarizing. Factors such as internet access, a shortage of books, student laziness, and poor academic writing skills played a key role in students’ plagiarism at the two universities. Based on these results, the study recommends that universities have adequate resources, in particular software for detecting plagiarism. In addition, lecturers and instructors should play their role effectively in educating students about the effects of plagiarism in academic work, which will to some extent minimize the problem of directly copying and pasting other people’s work without acknowledgment. Keywords: plagiarism, plagiarism software, information, materials, challenge

    RANCANG BANGUN APLIKASI PENDETEKSI PENJIPLAKAN DOKUMEN MENGGUNAKAN ALGORITMA BIWORD WINNOWING

    Plagiarism is taking the essays or opinions of others without acknowledging the source text and presenting them as one's own. Many algorithms now exist for detecting plagiarism in text documents, such as the Rabin-Karp algorithm, Winnowing, and edit distance. This research develops the Winnowing algorithm for plagiarism detection. Winnowing uses a k-gram approach to form a document fingerprint, and conventionally the fingerprint is formed with a character-based technique. This research tries a different fingerprinting technique, namely a phrase-based one, which splits a text document into biword tokens. Each token is hashed with MD5, so identical tokens have the same hash value and can serve as the fingerprint of a text document. By applying the biword approach to the Winnowing algorithm, each phrase in a document can be checked and then stored in an array. To display text that has matching values, the algorithm can show the array values in the form of biword tokens as the fingerprint of a document
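A minimal sketch of biword Winnowing as the abstract describes it: split the text into two-word tokens, hash each token with MD5, and keep the minimum hash of each sliding window as the fingerprint. The function names, the window size of 4, and the rightmost-minimum tie-break are assumptions; the abstract does not specify them.

```python
import hashlib
import re


def biword_tokens(text: str) -> list[str]:
    """Split text into overlapping two-word (biword) tokens."""
    words = re.findall(r"\w+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def md5_int(token: str) -> int:
    """MD5 digest of a token, as an integer so hashes can be ordered."""
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)


def biword_fingerprint(text: str, window: int = 4) -> dict[int, str]:
    """Map each selected hash to its biword token (the document fingerprint)."""
    tokens = biword_tokens(text)
    hashes = [md5_int(t) for t in tokens]
    if not hashes:
        return {}
    window = min(window, len(hashes))  # short texts get a single window
    fingerprint: dict[int, str] = {}
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # Assumed tie-break: take the rightmost minimum in the window.
        pos = start + window - 1 - win[::-1].index(smallest)
        fingerprint[smallest] = tokens[pos]
    return fingerprint


def shared_phrases(a: str, b: str, window: int = 4) -> list[str]:
    """Biword tokens whose hashes appear in both fingerprints."""
    fa, fb = biword_fingerprint(a, window), biword_fingerprint(b, window)
    return sorted(fa[h] for h in fa.keys() & fb.keys())
```

Keeping the token alongside its hash is what lets the matching phrases be displayed rather than only counted, which is the behaviour the abstract describes.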