
    Identifying duplicate content using statistically improbable phrases

    Motivation: Document similarity metrics such as PubMed's ‘Find related articles’ feature, which have been used primarily to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of abstracts, which constitute only a small fraction of a publication's total text. Extending searches to include text archived by online search engines would drastically increase comparison ability. For large-scale studies, submitting short phrases enclosed in double quotes to search engines for exact matches would be optimal for both individual queries and programmatic interfaces. We have derived a method of analyzing statistically improbable phrases (SIPs) to assist in identifying duplicate content.
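    The abstract does not spell out how SIPs are scored. One common interpretation, assumed here and not necessarily the authors' method, is to rank a document's word n-grams by how few documents in a background corpus contain them, then wrap the top-ranked phrases in double quotes so a search engine looks for exact matches. The function names, the n-gram length `n`, and the tie-breaking rule are illustrative choices only.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Lowercase, tokenize on letters/digits, and return all n-word phrases in order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def improbable_phrases(document, background_docs, n=4, top_k=5):
    """Rank n-grams of `document` by how rarely they occur in a background corpus.

    Assumption (hypothetical scoring): a phrase is more 'improbable' the fewer
    background documents contain it; ties go to phrases repeated more often in
    the target document, then alphabetical order for determinism.
    """
    doc_counts = Counter(word_ngrams(document, n))
    # Document frequency: how many background documents contain each phrase.
    df = Counter()
    for bg in background_docs:
        df.update(set(word_ngrams(bg, n)))
    ranked = sorted(doc_counts, key=lambda p: (df[p], -doc_counts[p], p))
    return ranked[:top_k]


def quoted_queries(phrases):
    """Wrap each phrase in double quotes to request an exact-match search."""
    return [f'"{p}"' for p in phrases]
```

    For example, with the target text "the quick brown fox jumps over the lazy dog" and a background corpus containing "the quick brown fox is fast", the phrase "the quick brown" is ranked below trigrams that appear in no background document.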

    An IR-based Approach Utilising Query Expansion for Plagiarism Detection in MEDLINE

    The identification of duplicated and plagiarised passages of text has become an increasingly active area of research. In this paper we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval is used to perform candidate document selection (the identification of a subset of potential source documents given a suspicious text) from MEDLINE. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. The proposed IR-based approach outperforms a state-of-the-art baseline based on Kullback-Leibler distance.
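    For comparison, a Kullback-Leibler baseline for candidate selection can be sketched as below. This is a generic illustration, not the baseline used in the paper: it assumes unigram term distributions over a shared vocabulary, additive smoothing with a hypothetical parameter `alpha` (so no probability is zero), and ranks candidates by KL(P‖Q) from the suspicious text P to each candidate Q, lowest divergence first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_distribution(text, vocab, alpha=0.01):
    """Additively smoothed unigram distribution of `text` over a shared vocabulary."""
    counts = Counter(tokenize(text))
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """KL(P || Q) = sum over terms t of P(t) * log(P(t) / Q(t)); p and q share keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def rank_candidates(suspicious, candidates, alpha=0.01):
    """Return candidate indices ordered from most to least similar (lowest KL first)."""
    vocab = set(tokenize(suspicious)).union(*(tokenize(c) for c in candidates))
    p = term_distribution(suspicious, vocab, alpha)
    scores = [kl_divergence(p, term_distribution(c, vocab, alpha)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i])
```

    Because this baseline compares only surface terms, a source whose words have been replaced with synonyms drifts away from the suspicious text; that is the situation the paper's UMLS-based query expansion is meant to handle.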

    Data Fingerprinting with Similarity Digests


    Philosophy’s gender gap and argumentative arena: an empirical study

    While the empirical evidence pointing to a gender gap in professional, academic philosophy in the English-speaking world is widely accepted, explanations of this gap are less so. In this paper, we aim to make a modest contribution to the literature on the gender gap in academic philosophy by taking a quantitative, corpus-based empirical approach. Since some philosophers have suggested that it may be the argumentative, “logic-chopping,” and “paradox-mongering” nature of academic philosophy that explains the underrepresentation of women in the discipline, our research questions are the following: Do men and women philosophers make different types of arguments in their published works? If so, which ones and with what frequency? Using data mining and text analysis methods, we study a large corpus of philosophical texts mined from the JSTOR database in order to answer these questions empirically. Using indicator words to classify arguments by type, we search through our corpus to find patterns of argumentation. Overall, the results of our empirical study suggest that women philosophers make deductive, inductive, and abductive arguments in their published works just as often as male philosophers do, with no statistically significant differences in the proportions of those arguments relative to each philosopher’s body of work.
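    Indicator-word classification of the kind described above can be sketched as whole-word phrase counting. The indicator lists below are hypothetical placeholders, since the abstract does not give the study's actual words, and normalizing by total indicator hits is an assumed choice; the proportions could equally be computed against an author's total word count.

```python
import re
from collections import Counter

# Hypothetical indicator lists, for illustration only (not the study's lists).
INDICATORS = {
    "deductive": ["therefore", "it follows that", "necessarily"],
    "inductive": ["probably", "likely", "in most cases"],
    "abductive": ["best explanation", "best explains"],
}


def count_argument_types(text, indicators=INDICATORS):
    """Count indicator occurrences per argument type (whole-word, case-insensitive)."""
    lowered = text.lower()
    counts = Counter()
    for arg_type, phrases in indicators.items():
        for phrase in phrases:
            # \b boundaries stop "likely" from matching inside "unlikely".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[arg_type] += len(re.findall(pattern, lowered))
    return counts


def proportions(counts):
    """Share of each argument type among all indicator hits (0.0 if there are none)."""
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```

    Comparing `proportions` across authors, rather than raw counts, keeps prolific authors from dominating the comparison.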

    A refresher in research publication ethics


    Word Order in Epigraphic Gǝʿǝz

    The paper presents the results of an analysis of word order across the epigraphic corpus of Gǝʿǝz. This evidence mostly agrees with the data from Classical Gǝʿǝz and confirms that early Gǝʿǝz represents the classical Semitic type of a right-branching language: objects and prepositional phrases mostly follow the verbs, and relative clauses and genitive complements usually follow the head nouns. At the same time, some differences between the syntax of Classical Gǝʿǝz and Epigraphic Gǝʿǝz have been identified, notably in the behaviour of numerals.