Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection
The rise of language models such as BERT allows for high-quality text
paraphrasing. This poses a problem for academic integrity, as it is difficult to
differentiate between original and machine-generated content. We propose a
benchmark consisting of articles paraphrased using recent language models
based on the Transformer architecture. Our contribution fosters future
research on paraphrase detection systems: it offers a large collection of
aligned original and paraphrased documents, a study of the collection's
structure, and classification experiments with state-of-the-art systems. We
make our findings publicly available.
Incentive Mechanisms in Peer-to-Peer Networks — A Systematic Literature Review
Centralized networks inevitably exhibit single points of failure that malicious actors regularly target. Decentralized networks are more resilient if numerous participants contribute to the network’s functionality. Most decentralized networks employ incentive mechanisms to coordinate the participation and cooperation of peers and thereby ensure the functionality and security of the network. This article systematically reviews incentive mechanisms for decentralized networks and networked systems, covering 165 prior literature reviews and 178 primary research papers published between 1993 and October 2022. Of the considered sources, we analyze 11 literature reviews and 105 primary research papers in detail by categorizing and comparing the distinctive properties of the presented incentive mechanisms. The reviewed incentive mechanisms establish fairness and reward participation and cooperative behavior. We review work that substitutes central authority through independent and subjective mechanisms run in isolation at each participating peer, as well as work that applies multiparty computation. We use monetary, reputation, and service rewards as categories to differentiate the implementations and evaluate each incentive mechanism’s data management, attack resistance, and contribution model. Further, we highlight research gaps and deficiencies in reproducibility and comparability. Finally, we summarize our assessments and provide recommendations for applying incentive mechanisms to decentralized networks that share computational resources.
Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations
Identifying academic plagiarism is a pressing task for educational and
research institutions, publishers, and funding agencies. Current plagiarism
detection systems reliably find instances of copied and moderately reworded
text. However, reliably detecting concealed plagiarism, such as strong
paraphrases, translations, and the reuse of nontextual content and ideas, is an
open research problem. In this paper, we extend our prior research on analyzing
mathematical content and academic citations. Both are promising approaches for
improving the detection of concealed academic plagiarism primarily in Science,
Technology, Engineering and Mathematics (STEM). We make the following
contributions: i) We present a two-stage detection process that combines
similarity assessments of mathematical content, academic citations, and text.
ii) We introduce new similarity measures that consider the order of
mathematical features and outperform the measures in our prior research. iii)
We compare the effectiveness of the math-based, citation-based, and text-based
detection approaches using confirmed cases of academic plagiarism. iv) We
demonstrate that the combined analysis of math-based and citation-based content
features allows identifying potentially suspicious cases in a collection of
102K STEM documents. Overall, we show that analyzing the similarity of
mathematical content and academic citations is a striking supplement for
conventional text-based detection approaches for academic literature in the
STEM disciplines.
Comment: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries
(JCDL) 2019. The data and code of our study are openly available at
https://purl.org/hybridP
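To make the notion of order-sensitive similarity concrete, the sketch below compares sequences of mathematical features rather than unordered sets. This is an illustrative stand-in, not the measures introduced in the paper; the feature tokens are invented for the example.

```python
from difflib import SequenceMatcher

def math_order_similarity(seq_a, seq_b):
    """Similarity of two sequences of mathematical features
    (e.g. identifiers and operators). Unlike set overlap, this
    rewards features appearing in the same order, using the
    longest matching subsequences found by SequenceMatcher."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()

# Identical feature sequences score 1.0; reordering lowers the score
# even though the feature *sets* are identical.
a = ["E", "=", "m", "c", "^2"]
b = ["E", "=", "m", "c", "^2"]
c = ["c", "^2", "m", "=", "E"]
print(math_order_similarity(a, b))  # 1.0
print(math_order_similarity(a, c))  # < 1.0
```

A set-based measure would score the pair (a, c) as identical, which illustrates why considering feature order can sharpen the comparison.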
CITREC: An Evaluation Framework for Citation-Based Similarity Measures based on TREC Genomics and PubMed Central
Citation-based similarity measures such as Bibliographic Coupling and Co-Citation are an integral component of many information retrieval systems. However, comparisons of the strengths and weaknesses of measures are challenging due to the lack of suitable test collections. This paper presents CITREC, an open evaluation framework for citation-based and text-based similarity measures. CITREC prepares the data from the PubMed Central Open Access Subset and the TREC Genomics collection for a citation-based analysis and provides the tools necessary for performing evaluations of similarity measures. To account for different evaluation purposes, CITREC implements 35 citation-based and text-based similarity measures and features two gold standards. The first gold standard uses the Medical Subject Headings (MeSH) thesaurus and the second uses the expert relevance feedback that is part of the TREC Genomics collection to gauge similarity. CITREC additionally offers a system that allows creating user-defined gold standards to adapt the evaluation framework to individual information needs and evaluation purposes.
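The two classic measures named above reduce to simple counts over reference lists: Bibliographic Coupling counts the references two documents share, while Co-Citation counts the documents that cite both. A minimal sketch (the document identifiers and the tiny corpus are invented for illustration, and CITREC's actual implementations may normalize these counts):

```python
def bibliographic_coupling(refs_a, refs_b):
    # Bibliographic Coupling: number of references shared by
    # two citing documents.
    return len(set(refs_a) & set(refs_b))

def co_citation(citing_lists, doc_a, doc_b):
    # Co-Citation: number of documents whose reference lists
    # contain both doc_a and doc_b.
    return sum(1 for refs in citing_lists if doc_a in refs and doc_b in refs)

# Toy corpus: each paper maps to its list of cited documents.
corpus = {
    "P1": ["A", "B", "C"],
    "P2": ["B", "C", "D"],
    "P3": ["A", "D"],
}
print(bibliographic_coupling(corpus["P1"], corpus["P2"]))  # 2 (shared refs B, C)
print(co_citation(corpus.values(), "B", "C"))              # 2 (P1 and P2 cite both)
```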
Identifying Machine-Paraphrased Plagiarism
Employing paraphrasing tools to conceal plagiarized text is a severe threat
to academic integrity. To enable the detection of machine-paraphrased text, we
evaluate the effectiveness of five pre-trained word embedding models combined
with machine learning classifiers and state-of-the-art neural language models.
We analyze preprints of research papers, graduation theses, and Wikipedia
articles, which we paraphrased using different configurations of the tools
SpinBot and SpinnerChief. The best-performing technique, Longformer, achieved
an average F1 score of 80.99% (F1=99.68% for SpinBot and F1=71.64% for
SpinnerChief cases), while human evaluators achieved F1=78.4% for SpinBot and
F1=65.6% for SpinnerChief cases. We show that the automated classification
alleviates shortcomings of widely-used text-matching systems, such as Turnitin
and PlagScan. To facilitate future research, all data, code, and two web
applications showcasing our contributions are openly available
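The F1 scores reported above follow the standard definition as the harmonic mean of precision and recall. As a quick reference, a minimal computation (not code from the study) looks like:

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall; it is high
    # only when both are high, which is why it is the usual single
    # number for comparing classifiers and human raters here.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Perfect precision but only half the paraphrases found:
print(round(f1_score(1.0, 0.5), 4))  # 0.6667
```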
Trusted Timestamping using the Crypto Currency Bitcoin
Trusted timestamping is a process for proving that certain information existed at a given point in time. This paper presents a trusted timestamping concept and its implementation in the form of a web-based service that uses the decentralized Bitcoin block chain to store anonymous, tamper-proof timestamps for digital content. The service allows users to hash files, such as text, photos or videos, and store the created hashes in the Bitcoin block chain. Users can then retrieve and verify the timestamps that have been committed to the block chain. The non-commercial service enables anyone, e.g., researchers, authors, journalists, students, or artists, to prove that they were in possession of certain information at a given point in time. Common use cases include proving that a contract has been signed, a photo taken, a video recorded, or a task completed prior to a certain date. All procedures maintain complete privacy of the user’s data.
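The hashing step described above is what preserves privacy: only a fixed-size digest of the file, not the file itself, is committed to the block chain. A minimal sketch of that step (the function names are illustrative and not the service's actual API; how the digest is embedded in a Bitcoin transaction is outside this sketch):

```python
import hashlib

def file_digest(path):
    """SHA-256 digest of a file, read in chunks so large videos or
    photos do not need to fit in memory. This digest, not the file,
    is what would be stored in the Bitcoin block chain."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, committed_digest):
    # Anyone holding the original file can recompute its hash and
    # compare it against the digest previously committed on-chain.
    return file_digest(path) == committed_digest
```

Because SHA-256 is a one-way function, publishing the digest reveals nothing about the file's content, yet any later change to the file produces a different digest and fails verification.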