11 research outputs found

    Software similarity and classification

    This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.

    DCU@FIRE-2014: an information retrieval approach for source code plagiarism detection

    This paper investigates an information retrieval (IR) based approach for source code plagiarism detection. Extensively checking pairwise similarities between documents does not scale to large collections of source code documents. To make source code plagiarism detection fast and scalable in practice, we propose an IR-based approach in which each document is treated as a pseudo-query in order to retrieve a list of potential candidate documents in decreasing order of their similarity values. A threshold is then applied to the relative similarity decrement ratios to report a set of documents as potential cases of source code reuse. Instead of treating source code as an unstructured text document, we explore term extraction from the annotated parse tree of the source code and also make use of a field-based language model for indexing and retrieval of source code documents. Results confirm that source code parsing plays a vital role in improving the plagiarism prediction accuracy.
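
The retrieval-then-threshold idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it swaps in plain TF-IDF cosine similarity over identifier-like tokens for the parse-tree term extraction and field-based language model, and the function names and the `max_drop` and `min_score` parameters are hypothetical. The abstract does not give the exact decrement rule, so the cut-off used here is one plausible reading.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into lowercase identifier-like tokens."""
    return re.findall(r"[a-z_]\w*", source.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse TF-IDF vector (a dict)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(docs)
    return {
        name: {
            # Smoothed IDF, always positive.
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in term_counts.items()
        }
        for name, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors):
    """Treat one document as a pseudo-query; rank the others by similarity."""
    scored = [
        (name, cosine(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def cut_by_decrement(ranked, max_drop=0.5, min_score=0.2):
    """Keep the head of a ranked list up to the first large relative drop.

    Assumed rule: stop when (previous - current) / previous exceeds max_drop.
    min_score (must be positive) is an extra assumption that keeps unrelated
    documents from being reported just because they rank first.
    """
    kept = []
    for name, score in ranked:
        if score < min_score:
            break
        if kept:
            previous = kept[-1][1]
            if (previous - score) / previous > max_drop:
                break
        kept.append((name, score))
    return kept


# Toy collection: a.java and b.java differ only in renamed identifiers.
docs = {
    "a.java": "int sum(int[] xs) { int total = 0; for (int x : xs) total += x; return total; }",
    "b.java": "int add(int[] xs) { int acc = 0; for (int x : xs) acc += x; return acc; }",
    "c.java": "void greet(String name) { System.out.println(name); }",
}
vectors = tfidf_vectors(docs)
suspects = cut_by_decrement(rank("a.java", vectors))
print([name for name, _ in suspects])  # prints ['b.java']
```

Each document issues one query instead of being compared against every other document by hand; with an inverted index in place of the linear scan in `rank`, this is what makes the approach scale.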

    Hunting for Pirated Software Using Metamorphic Analysis

    In this paper, we consider the problem of detecting software that has been pirated and modified. We analyze a variety of detection techniques that have been previously studied in the context of malware detection. For each technique, we empirically determine the detection rate as a function of the degree of modification of the original code. We show that the code must be greatly modified before we fail to reliably distinguish it, and we show that our results offer a significant improvement over previous related work. Our approach can be applied retroactively to any existing software and hence is both practical and effective.

    Detecting Camouflaged Applications on Mobile Application Markets

    Code similarity and clone search in large-scale source code data

    Software development benefits tremendously from the Internet: online code corpora enable instant sharing of source code, developer's guides, and documentation. Nowadays, duplicated code (i.e., code clones) exists not only within or across software projects but also between online code repositories and websites. We call these "online code clones." Like classic code clones between software systems, they can lead to license violations, bug propagation, and reuse of outdated code. Unfortunately, they are difficult to locate and fix because the search space in online code corpora is large and no longer confined to a local repository. This thesis presents a combined study of code similarity and online code clones. We empirically show that many code snippets on Stack Overflow are cloned from open source projects. Several of them have become outdated or violate their original license and are possibly harmful to reuse. To develop a solution for finding online code clones, we study various code similarity techniques to gain insights into their strengths and weaknesses. A framework called OCD, for evaluating code similarity and clone search tools, is introduced and used to compare 34 state-of-the-art techniques on pervasively modified code and boilerplate code. We also found that clone detection techniques can be enhanced by compilation and decompilation. Using the knowledge from the comparison of code similarity analysers, we create and evaluate Siamese, a scalable token-based clone search technique that uses multiple code representations. Our evaluation shows that Siamese scales to source code data of 365 million lines of code and offers high search precision and recall. Its clone search precision is comparable to seven state-of-the-art clone detection tools on the OCD framework. Finally, we demonstrate the usefulness of Siamese by applying the tool to find online code clones, automatically analyse clone licenses, and recommend tests for reuse.

    The Palgrave Handbook of Digital Russia Studies

    This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars and for advanced undergraduate and graduate students studying Russia today.