517 research outputs found

    English-Persian Plagiarism Detection based on a Semantic Approach

    Get PDF
    Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-lingual plagiarism. In cross-lingual translation, writers meld a translation with their own words and ideas. Based on monolingual plagiarism detection methods, this paper ultimately intends to find a way to detect cross-lingual plagiarism. A framework called Multi-Lingual Plagiarism Detection (MLPD) has been presented for cross-lingual plagiarism analysis with ultimate objective of detection of plagiarism cases. English is the reference language and Persian materials are back translated using translation tools. The data for assessment of MLPD were obtained from English-Persian Mizan parallel corpus. Apache’s Solr was also applied to record the creep of the documents and their indexation. The accuracy mean of the proposed method revealed to be 98.82% when employing highly accurate translation tools which indicate the high accuracy of the proposed method. Also, Google translation service showed the accuracy mean to be 56.9%. These tests demonstrate that improved translation tools enhance the accuracy of the proposed method

    Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

    Get PDF
    With due respect to the authors’ rights, plagiarism detection, is one of the critical problems in the field of text-mining that many researchers are interested in. This issue is considered as a serious one in high academic institutions. There exist language-free tools which do not yield any reliable results since the special features of every language are ignored in them. Considering the paucity of works in the field of Persian language due to lack of reliable plagiarism checkers in Persian there is a need for a method to improve the accuracy of detecting plagiarized Persian phrases. Attempt is made in the article to present the PCP solution. This solution is a combinational method that in addition to meaning and stem of words, synonyms and pluralization is dealt with by applying the document tree representation based on manner fingerprinting the text in the 3-grams words. The obtained grams are eliminated from the text, hashed through the BKDR hash function, and stored as the fingerprint of a document in fingerprints of reference documents repository, for checking suspicious documents. The PCP proposed method here is evaluated by eight experiments on seven different sets, which include suspicions document and the reference document, from the Hamshahri newspaper website. The results indicate that accuracy of this proposed method in detection of similar texts in comparison with "Winnowing" localized method has 21.15 percent is improvement average. The accuracy of the PCP method in detecting the similarity in comparison with the language-free tool reveals 31.65 percent improvement average

    An improved extrinsic monolingual plagiarism detection approach of the Bengali text

    Get PDF
    Plagiarism is an act of literature fraud, which is presenting others’ work or ideas without giving credit to the original work. All published and unpublished written documents are under the cover of this definition. Plagiarism, which increased significantly over the last few years, is a concerning issue for students, academicians, and professionals. Due to this, there are several plagiarism detection tools or software available to detect plagiarism in different languages. Unfortunately, negligible work has been done and no plagiarism detection software available in the Bengali language where Bengali is one of the most spoken languages in the world. In this paper, we have proposed a plagiarism detection tool for the Bengali language that mainly focuses on the educational and newspaper domain. We have collected 82 textbooks from the National Curriculum of Textbooks (NCTB), Bangladesh, scrapped all articles from 12 reputed newspapers and compiled our corpus with more than 10 million sentences. The proposed method on Bengali text corpus shows an accuracy rate of 97.31

    Plagiarism Detection Techniques for Arabic Script Languages: A Literature Review

    Get PDF
    Plagiarism is generally defined as literary theft and academic dishonesty. This considered as the serious issue in an academic documents and texts. There are numerous of plagiarism detection techniques have been developed for various natural languages, mainly English. In this paper we investigate and review the plagiarism detection techniques and algorithms which have been developed for Arabic Script Languages (ASL), and providing a literature review of the utilized methods in terms of techniques and outcomes.  The result of this paper will help the researchers who are going to commence their development and extend their researches in ASL like Arabic, Persian, Urdu, and Kurdish

    What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

    Get PDF
    Purpose: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint. Design/methodology/approach: A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. Findings: The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Originality/value: Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective
    • …
    corecore