3 research outputs found

    Systematic Characterizations of Text Similarity in Full Text Biomedical Publications

    Computational methods have been used to find duplicate biomedical publications in MEDLINE. Full text articles are becoming increasingly available, yet the similarities among them have not been systematically studied. Here, we quantitatively investigated the full text similarity of biomedical publications in PubMed Central. 72,011 full text articles from PubMed Central (PMC) were parsed to generate three different datasets: full texts, sections, and paragraphs. Text similarity comparisons were performed on these datasets using the text similarity algorithm eTBLAST. We measured the frequency of similar text pairs and compared it among different datasets. We found that high abstract similarity can be used to predict high full text similarity with a specificity of 20.1% (95% CI [17.3%, 23.1%]) and sensitivity of 99.999%. Abstract similarity and full text similarity have a moderate correlation (Pearson correlation coefficient: -0.423) when the similarity ratio is above 0.4. Among pairs of articles in PMC, method sections are found to be the most repetitive (frequency of similar pairs, methods: 0.029, introduction: 0.0076, results: 0.0043). In contrast, among a set of manually verified duplicate articles, results are the most repetitive sections (frequency of similar pairs, results: 0.94, methods: 0.89, introduction: 0.82). Repetition of introduction and methods sections is more likely to be committed by the same authors (odds of a highly similar pair having at least one shared author, introduction: 2.31, methods: 1.83, results: 1.03). There is also significantly more similarity in pairs of review articles than in pairs containing one review and one nonreview paper (frequency of similar pairs: 0.0167 and 0.0023, respectively). While quantifying abstract similarity is an effective approach for finding duplicate citations, a comprehensive full text analysis is necessary to uncover all potential duplicate citations in the scientific literature and is helpful when establishing ethical guidelines for scientific publications.
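    The sensitivity and specificity figures in the abstract above treat abstract similarity as a predictor of full text similarity. A minimal sketch of that kind of calculation follows. It is not the authors' code: the function name, the cutoffs, and the toy similarity scores are all hypothetical and chosen only for illustration, and the scores are not eTBLAST output.

```python
def sensitivity_specificity(pairs, abstract_cutoff, fulltext_cutoff):
    """Score abstract similarity as a predictor of full text similarity.

    pairs: iterable of (abstract_similarity, fulltext_similarity) tuples.
    Both cutoffs are hypothetical; the source does not state them here.
    """
    tp = fp = tn = fn = 0
    for abstract_sim, fulltext_sim in pairs:
        predicted = abstract_sim >= abstract_cutoff   # predicted "similar"
        actual = fulltext_sim >= fulltext_cutoff      # truly "similar"
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    # Sensitivity: share of truly similar pairs that the abstract flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of dissimilar pairs that the abstract leaves unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up scores for illustration only (not data from the study).
toy_pairs = [(0.9, 0.8), (0.7, 0.2), (0.1, 0.6), (0.2, 0.1), (0.3, 0.05)]
sens, spec = sensitivity_specificity(toy_pairs, 0.5, 0.5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

    With real data, each tuple would come from one article pair scored twice, once on abstracts and once on full texts.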

    Plagiarism, Cheating and Research Integrity: Case Studies from a Masters Program in Peru

    Plagiarism is a serious, yet widespread type of research misconduct, and is often neglected in developing countries. Despite its far-reaching implications, plagiarism is poorly acknowledged and discussed in the academic setting, and insufficient evidence exists in Latin America and developing countries to inform the development of preventive strategies. In this context, we present a longitudinal case study of seven instances of plagiarism and cheating arising in four consecutive classes (2011–2014) of an Epidemiology Masters’ program in Lima, Peru, and describe the implementation and outcomes of a multifaceted, “zero-tolerance” policy aimed at introducing research integrity. Two cases involved cheating in graded assignments, and five cases corresponded to plagiarism in the thesis protocol. Cases revealed poor awareness and high tolerance of plagiarism, poor academic performance, and widespread writing deficiencies, compensated with patchwriting and copy-pasting. Depending on the events’ severity, penalties included course failure (6/7) and separation from the program (3/7). Students at fault did not engage in further plagiarism. Between 2011 and 2013, the Masters’ program sequentially introduced a preventive policy consisting of: (i) intensified research integrity and scientific writing education; (ii) a stepwise, cumulative writing process; (iii) honor codes; (iv) active search for plagiarism in all academic products; and (v) a “zero tolerance” policy in response to documented cases. No cases were detected in 2014. In conclusion, plagiarism seems to be widespread in resource-limited settings and a greater response with educational and zero-tolerance components is needed to prevent it. This study was funded by the training Grant 2D43 TW007393-06 awarded to the U.S. Naval Medical Research Unit No. 6 (NAMRU-6) by the Fogarty International Center of the U.S. National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.