10 research outputs found

    DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources

    Plagiarism refers to the act of presenting external words, thoughts, or ideas as one’s own without providing references to the sources from which they were taken. The exponential growth of digital document sources available on the Web has facilitated the spread of this practice, making its accurate detection a crucial task for educational institutions. In this article, we present DOCODE 3.0, a Web system for educational institutions that performs automatic analysis of large quantities of digital documents with respect to their degree of originality. Since plagiarism is a complex problem that is frequently tackled at different levels, our system applies algorithms that perform an information fusion process over multiple data sources at each of these levels. These algorithms have been successfully tested by the scientific community on tasks such as identifying plagiarized passages and retrieving source candidates from the Web and other data sources such as digital libraries, and have proven to be very effective. We integrate these algorithms into a multi-tier, robust, and scalable JEE architecture, allowing many different types of clients with different requirements to consume our services. For users, DOCODE produces a number of visualizations and reports from the different outputs so that teachers and professors can gain insight into the originality of the documents they review, allowing them to discover, understand, and handle possible plagiarism cases and making it easier and much faster to analyze a vast number of documents. Our experience so far is focused on the Chilean situation and the Spanish language, offering solutions to Chilean educational institutions in any of their preferred Virtual Learning Environments. However, DOCODE can easily be adapted to increase language coverage.
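
    The multi-level fusion step described above can be pictured as combining per-source scores into one document-level verdict. The sketch below is a minimal illustration of such a weighted fusion, not DOCODE's actual algorithm; the source names, weights, and scores are invented.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-source plagiarism scores in [0, 1]."""
    total = sum(weights[src] for src in scores)
    return sum(scores[src] * weights[src] for src in scores) / total

# Hypothetical detector outputs for one document (higher = more likely copied).
weights = {"web": 0.5, "library": 0.3, "passage": 0.2}
scores = {"web": 0.8, "library": 0.1, "passage": 0.4}
print(fuse_scores(scores, weights))  # 0.51 (up to float rounding)
```

    A real system would calibrate the weights against labelled plagiarism cases rather than fix them by hand.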

    Detecting Machine-obfuscated Plagiarism

    A related dataset is available at https://doi.org/10.7302/bewj-qx93. Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best-performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems on these classification tasks. Third, we provide a Web application that uses the best-performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.
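
    The embedding-plus-classifier setup described above can be sketched in miniature: represent a text as the average of its word vectors, then classify it against labelled centroids. This is an illustrative toy, not the paper's pipeline; the 2-d "embeddings" and example texts are invented, and the paper evaluates far richer models and classifiers.

```python
# Toy 2-d "word embeddings"; real models would use e.g. 300-d vectors.
TOY_EMB = {
    "the": [0.1, 0.0], "cat": [0.9, 0.2], "feline": [0.8, 0.9],
    "sat": [0.2, 0.1], "rested": [0.1, 0.8],
}

def embed(text):
    """Average the vectors of the known words in the text."""
    vecs = [TOY_EMB[w] for w in text.split() if w in TOY_EMB]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def centroid(texts):
    vecs = [embed(t) for t in texts]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def classify(text, human_c, machine_c):
    """Nearest-centroid decision by squared Euclidean distance."""
    v = embed(text)
    d_h = sum((a - b) ** 2 for a, b in zip(v, human_c))
    d_m = sum((a - b) ** 2 for a, b in zip(v, machine_c))
    return "human" if d_h < d_m else "machine"

human_c = centroid(["the cat sat"])
machine_c = centroid(["the feline rested"])
print(classify("cat sat", human_c, machine_c))  # human
```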

    Enhancing computer-aided plagiarism detection


    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012

    Integrating State-of-the-art NLP Tools into Existing Methods to Address Current Challenges in Plagiarism Detection

    Paraphrase plagiarism occurs when text is deliberately obfuscated to evade detection; this deliberate alteration increases the complexity of plagiarism and the difficulty of detecting it. In paraphrase plagiarism, copied texts often contain little or no matching wording, and conventional plagiarism detectors, most of which are designed to detect matching strings, are ineffective under such conditions. The problem of plagiarism detection has been widely researched in recent years, with significant progress made particularly through the PAN@CLEF competition on plagiarism detection. However, further research is required, specifically in the area of paraphrase and translation (obfuscation) plagiarism detection, as studies show that the state of the art is unsatisfactory. A rational solution to the problem is to apply models that detect plagiarism using semantic features of texts rather than matching strings. Deep contextualised learning models (DCLMs) have the ability to learn deep textual features that can be used to compare texts for semantic similarity. They have been remarkably effective in many natural language processing (NLP) tasks but have not yet been tested in paraphrase plagiarism detection. The second problem facing conventional plagiarism detection is translation plagiarism, which occurs when copied text is translated into a different language, sometimes paraphrased, and used without acknowledging the original sources. The most common methods for detecting cross-lingual plagiarism (CLP) require internet translation services, which limits the detection process in many ways. A rational solution to the problem is to use detection models that do not rely on internet translation services. In this thesis, we address these ongoing challenges facing conventional plagiarism detection by applying some of the most advanced methods in NLP, including contextualised and non-contextualised deep learning models.
To address the problem of paraphrase plagiarism, we propose a novel paraphrase plagiarism detector that integrates deep contextualised learning (DCL) into a generic plagiarism detection framework. Evaluation results revealed that our proposed paraphrase detector outperformed a state-of-the-art model and a number of standard baselines on the task of paraphrase plagiarism detection. With respect to CLP detection, we propose a novel multilingual translation model (MTM) based on the Word2Vec (word embedding) model that can effectively translate text across a number of languages. It is independent of the internet and performs comparably to, and in many cases better than, a common cross-lingual plagiarism detection model that relies on an online machine translator. The MTM does not require parallel or comparable corpora; it is therefore well suited to addressing the problem of CLP detection in low-resource languages. The solutions provided in this research advance the state of the art, contribute to the existing body of knowledge in plagiarism detection, and should also have a positive impact on academic integrity, which has for some time been under threat from plagiarism.
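
    The core idea behind an embedding-based translation model of this kind can be sketched as nearest-neighbour lookup in a shared vector space: map a source-language word to its vector and return the target-language word whose vector is closest by cosine similarity. The tiny vectors and vocabularies below are invented for illustration; the thesis's actual MTM is not reproduced here.

```python
import math

# Toy bilingual vocabularies assumed to live in one aligned 2-d space.
EN = {"house": [0.9, 0.1], "dog": [0.1, 0.9]}
ES = {"casa": [0.88, 0.12], "perro": [0.12, 0.85]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def translate(word, src_vocab, tgt_vocab):
    """Nearest-neighbour 'translation' in the shared embedding space."""
    v = src_vocab[word]
    return max(tgt_vocab, key=lambda w: cosine(v, tgt_vocab[w]))

print(translate("house", EN, ES))  # casa
```

    Aligning monolingual embedding spaces (rather than training on parallel corpora) is what makes this style of model attractive for low-resource language pairs.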

    Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm

    Abstract— Online transportation has become a basic requirement of the general public, supporting activities such as commuting to work or school and travelling to tourist sights. Public transportation services compete to provide the best service so that consumers feel comfortable using them; one key aspect is finding the shortest route for picking up the customer or delivering them to the destination. The Node Combination method can minimize memory usage and is more optimal than A* and Ant Colony for shortest-route searches in the style of Dijkstra's algorithm, but it cannot store the history of nodes that have been passed. Consequently, the node combination algorithm is very good at finding the shortest distance but not the shortest route. This paper modifies the node combination algorithm to solve the problem of finding the shortest route at a dynamic location obtained from the transport fleet, displaying the nodes that form the shortest path, and implements it in a geographic information system in the form of a map to make the system easier to use. Keywords— Shortest Path, Dijkstra Algorithm, Node Combination, Dynamic Location
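
    For reference, the baseline that the Node Combination variant modifies is the standard Dijkstra algorithm with predecessor tracking, which does retain the route history the abstract mentions. A minimal sketch on an invented toy graph:

```python
import heapq

def dijkstra(graph, start, goal):
    """Dijkstra over an adjacency dict {node: [(neighbour, weight), ...]},
    returning (shortest distance, path) thanks to the predecessor map."""
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Walk the predecessor map backwards to recover the route.
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return dist[goal], path[::-1]

graph = {"A": [("B", 2), ("C", 5)], "B": [("C", 1), ("D", 4)], "C": [("D", 1)]}
print(dijkstra(graph, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```

    The `prev` map is exactly the "history of nodes" that the memory-saving Node Combination approach gives up.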

    Reconstructing Data Provenance from Log Files

    Data provenance describes the derivation history of data, capturing details such as the entities involved and the relationships between them. Knowledge of data provenance can be used to address issues such as data quality assurance, data auditing, and system security. However, current computer systems are usually not equipped with means to acquire data provenance, and modifying underlying systems or introducing new monitoring software for provenance logging may be too invasive for production systems. As a result, data provenance may not always be available. This thesis investigates the completeness and correctness of data provenance reconstructed from log files with respect to the actual derivation history. To accomplish this, we designed and tested a solution that first extracts and models information from log files into provenance relations, then reconstructs the data provenance from those relations. The reconstructed output is then evaluated against the ground-truth provenance. The thesis also details the methodology used to construct a dataset for provenance reconstruction research. Experimental results revealed that data provenance completely capturing the ground truth can be reconstructed from system-layer log files. However, the outputs are susceptible to errors generated during event logging and errors induced by program dependencies. Results also show that using log files of different granularities collected from the system can help resolve the logging errors described. Experiments with removing suspected program dependencies using approaches such as blacklisting and clustering have shown that the number of errors can be reduced by a factor of one hundred. Conclusions drawn from this research contribute towards the work on using reconstruction as an alternative approach for acquiring data provenance from computer systems.
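
    The first step described above, extracting provenance relations from log lines, can be sketched as follows. The log format, operation names, and relation names here are invented for illustration; real system-layer logs would need format-specific parsers, and the thesis's actual model is not reproduced.

```python
import re

# Hypothetical log format: "<process> <read|write> <file>"
LINE = re.compile(r"(?P<proc>\S+) (?P<op>read|write) (?P<file>\S+)")

def extract_relations(log_lines):
    """Split log lines into 'used' (process read file) and
    'generated' (process wrote file) provenance edges."""
    used, generated = [], []
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue  # unparseable lines are skipped, a source of incompleteness
        if m["op"] == "read":
            used.append((m["proc"], m["file"]))
        else:
            generated.append((m["proc"], m["file"]))
    return used, generated

log = ["convert read in.csv", "convert write out.json", "noise line"]
print(extract_relations(log))  # ([('convert', 'in.csv')], [('convert', 'out.json')])
```

    Chaining these edges (a file written by a process that read another file) is what yields the reconstructed derivation history.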

    Enhancing maritime defence and security through persistently autonomous operations and situation awareness systems

    This thesis is concerned with autonomous operations by Autonomous Underwater Vehicles (AUVs) and with maritime situation awareness, in the context of enhancing maritime defence and security. The problem of autonomous operations with AUVs is one of persistence: AUVs get stuck due to a lack of cognitive ability to deal with a situation and require intervention from a human operator. This thesis focuses on addressing vehicle subsystem failures and changes in high-level mission priorities in a manner that preserves autonomy during Mine Countermeasures (MCM) operations in unknown environments. This is not a trivial task. The approach followed utilizes ontologies for representing knowledge about the operational environment, the vehicle, and mission planning and execution. Reasoning about the vehicle's capabilities, and consequently the actions it can execute, is continuous and occurs in real time. Vehicle component faults are incorporated into the reasoning process as a means of driving adaptive planning and execution. Adaptive planning is based on a Planning Domain Definition Language (PDDL) planner. Adaptive execution is prioritized over adaptive planning, as mission planning can be very demanding in terms of computational resources. Changes in high-level mission priorities are also addressed as part of the adaptive planning behaviour of the system. The main contribution of this thesis regarding persistently autonomous operations is an ontological framework that drives adaptive behaviour to increase the persistent autonomy of AUVs in unexpected situations, that is, when vehicle component faults threaten to put the mission at risk and changes in high-level mission priorities must be incorporated into decision making. Building maritime situation awareness for maritime security is a very difficult task.
The most prominent challenges are the high volumes of information gathered from various sources, the need to fuse that information efficiently while taking any contradictions into consideration, and the requirement for reliable decision making and (re)action under potentially multiple interpretations of a situation. To address those challenges and help alleviate the burden on the humans who usually undertake such tasks, this thesis is concerned with maritime situation awareness built with Markov Logic Networks (MLNs) that support humans in their decision making. However, maritime situation awareness systems commonly rely on human experts to transfer their knowledge into the system before it can be deployed. In that respect, a promising alternative for training MLNs with data is presented. In addition, an in-depth evaluation of their performance is provided, during which the significance of interpreting an unfolding situation in context is demonstrated. To the best of the author's knowledge, this is the first time that MLNs have been trained with data and evaluated using cross-validation in the context of building maritime situation awareness for maritime security.
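
    For intuition, MLN inference assigns each possible world a weight exp(sum of the weights of the formulas that world satisfies), and conditional probabilities follow by enumeration. The toy maritime rules and weights below are invented, not the thesis's model, and real systems use dedicated MLN engines rather than brute-force enumeration over worlds.

```python
import math
from itertools import product

atoms = ["rendezvous", "small_boat", "suspicious"]

# Invented weighted rules, already grounded for a single vessel.
formulas = [
    (1.5, lambda w: (not w["rendezvous"]) or w["suspicious"]),  # rendezvous => suspicious
    (0.8, lambda w: (not w["small_boat"]) or w["suspicious"]),  # small_boat => suspicious
]

def world_weight(w):
    """exp of the total weight of the formulas satisfied in world w."""
    return math.exp(sum(wt for wt, f in formulas if f(w)))

worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]

# P(suspicious | rendezvous) by enumeration; the partition function cancels.
num = sum(world_weight(w) for w in worlds if w["rendezvous"] and w["suspicious"])
den = sum(world_weight(w) for w in worlds if w["rendezvous"])
print(round(num / den, 3))
```

    Raising a rule's weight pushes the conditional probability toward 1, which is how training on data (rather than expert-set weights) shapes the model's interpretations.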

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f