43 research outputs found

    JSBiRTH: Dynamic javascript birthmark based on the run-time heap

    Get PDF
    JavaScript is currently the dominating client-side scripting language in the web community. However, the source code of JavaScript can be easily copied through a browser. The intellectual property right of the developers lacks protection. In this paper, we consider using dynamic software birthmark for JavaScript. Instead of using control flow trace (which can be corrupted by code obfuscation) and API (which may not work if the software does not have many API calls), we exploit the run-time heap, which reflects substantially the dynamic behavior of a program, to extract birthmarks. We introduce JSBiRTH, a novel software birthmark system for JavaScript based on the comparison of run-time heaps. We evaluated our system using 20 JavaScript programs with most of them being large-scale. Our system gave no false positive or false negative. Moreover, it is robust against code obfuscation attack. We also show that our system is effective in detecting partial code theft. © 2011 IEEE.published_or_final_versionThe 35th IEEE Annual Computer Software and Applications Conference (COMPSAC 2011), Munich, Germany, 18-22 July 2011. In Proceedings of 35th COMPSAC, 2011, p. 407-41

    Heap graph based software theft detection

    Get PDF
    published_or_final_versio

    Exploiting JavaScript Birthmarking Techniques for Code Theft Detection

    Get PDF
    Este relatório visa a análise de técnicas de birthmarking para detetar o roubo de código. Um birthmark de um software, como o próprio nome indica, é um conjunto de características únicas que permitem identificar esse mesmo software. Para a deteção do roubo de código são extraídos birthmarks de dois programas, o original e um suspeito, e são comparados um com o outro, permitindo assim detetar o roubo caso sejam muito semelhantes ou iguais. Como hoje em dia a internet é cada vez mais utilizada e o código JavaScript é usado para grande parte das aplicações web, o roubo de código nesta área é um grande problema da atualidade. Tendo isto em conta, a solução final tem como objetivo esta mesma linguagem.São analisadas, cronologicamente e tendo em conta a relevância para o tema, algumas das técnicas de birthmarking existentes. As técnicas são analisadas individualmente e no final é feito um resumo e comparação de todas as técnicas. Como a maior parte das técnicas existentes não foram pensadas para JavaScript, a sua aplicabilidade à linguagem é também analisada e são tiradas conclusões acerca de bons candidatos à solução final. O objetivo final é construir uma ferramenta que, usando uma técnica de birthmarking, determine se dois programas JavaScript foram copiados, de modo a suportar alegações de roubo de código.The purpose of this dissertation is to analyse birthmarking techniques in order to detect code theft. A birthmark of a software is, as the name suggests, the set of unique characteristics that allow to identify that software. In order to detect the theft, the birthmarks of two programs are extracted, the suspect and the original, and compared to each other to check if they are too similar or identical. Nowadays the web applications are growing and JavaScript code is the most used in this field, therefore the theft in this area is a current problem. Because of that, the theft detection of programs developed in that language is the focus of this dissertation.Some techniques of birthmarking are analysed in chronological order and accordingly to the relevance for the theme. Each technique is analysed individually and in the end a comparison between them is made. Given that most of the techniques were not created for JavaScript, their applicability to the language is analysed. With those analysis, some conclusions about the best candidates to the final solution are drawn.The final goal is to develop a tool, that uses a birthmarking technique, to determine if two JavaScript programs were copied, in order to support code theft allegations

    Graphs Resemblance based Software Birthmarks through Data Mining for Piracy Control

    Get PDF
    The emergence of software artifacts greatly emphasizes the need for protecting intellectual property rights (IPR) hampered by software piracy requiring effective measures for software piracy control. Software birthmarking targets to counter ownership theft of software by identifying similarity of their origins. A novice birthmarking approach has been proposed in this paper that is based on hybrid of text-mining and graph-mining techniques. The code elements of a program and their relations with other elements have been identified through their properties (i.e code constructs) and transformed into Graph Manipulation Language (GML). The software birthmarks generated by exploiting the graph theoretic properties (through clustering coefficient) are used for the classifications of similarity or dissimilarity of two programs. The proposed technique has been evaluated over metrics of credibility, resilience, method theft, modified code detection and self-copy detection for programs asserting the effectiveness of proposed approach against software ownership theft. The comparative analysis of proposed approach with contemporary ones shows better results for having properties and relations of program nodes and for employing dynamic techniques of graph mining without adding any overhead (such as increased program size and processing cost)

    Software similarity and classification

    Full text link
    This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities

    Structural analysis of source code plagiarism using graphs

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg in fulfillment of the requirements for the degree of Master of Science. May 2017Plagiarism is a serious problem in academia. It is prevalent in the computing discipline where students are expected to submit source code assignments as part of their assessment; hence, there is every likelihood of copying. Ideally, students can collaborate with each other to perform a programming task, but it is expected that each student submit his/her own solution for the programming task. More so, one might conclude that the interaction would make them learn programming. Unfortunately, that may not always be the case. In undergraduate courses, especially in the computer sciences, if a given class is large, it would be unfeasible for an instructor to manually check each and every assignment for probable plagiarism. Even if the class size were smaller, it is still impractical to inspect every assignment for likely plagiarism because some potentially plagiarised content could still be missed by humans. Therefore, automatically checking the source code programs for likely plagiarism is essential. There have been many proposed methods that attempt to detect source code plagiarism in undergraduate source code assignments but, an ideal system should be able to differentiate actual cases of plagiarism from coincidental similarities that usually occur in source code plagiarism. Some of the existing source code plagiarism detection systems are either not scalable, or performed better when programs are modified with a number of insertions and deletions to obfuscate plagiarism. To address this issue, a graph-based model which considers structural similarities of programs is introduced to address cases of plagiarism in programming assignments. This research study proposes an approach to measuring cases of similarities in programming assignments using an existing plagiarism detection system to find similarities in programs, and a graph-based model to annotate the programs. We describe experiments with data sets of undergraduate Java programs to inspect the programs for plagiarism and evaluate the graph-model with good precision. An evaluation of the graph-based model reveals a high rate of plagiarism in the programs and resilience to many obfuscation techniques, while false detection (coincident similarity) rarely occurred. If this detection method is adopted into use, it will aid an instructor to carry out the detection process conscientiously.MT 201

    Heaps don't lie : countering unsoundness with heap snapshots

    Get PDF
    Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).peer-reviewe

    SEMEO: A SEMANTIC EQUIVALENCE ANALYSIS FRAMEWORK FOR OBFUSCATED ANDROID APPLICATIONS

    Get PDF
    Software repackaging is a common approach for creating malware. In this approach, malware authors inject malicious payloads into legitimate applications; then, to ren- der security analysis more difficult, they obfuscate most or all of the code. This forces analysts to spend a large amount of effort filtering out benign obfuscated methods in order to locate potentially malicious methods for further analysis. If an effective mechanism for filtering out benign obfuscated methods were available, the number of methods that must be analyzed could be reduced, allowing analysts to be more productive. In this thesis, we introduce SEMEO, a highly effective and efficient fil- tering approach that can determine whether an obfuscated and an original version of a method are semantically equivalent. Our approach handles seven common, com- plex types of obfuscation and can be effective even when all types are compositely applied. In an empirical evaluation, we applied SEMEO to nine Android apps of varying complexity, and the approach provided over 76% recall and 100% precision in identifying semantically equivalent methods. We then performed three additional studies, that showed that: (1) SEMEO is much more effective at identifying semantically equivalent methods than FSquaDRA, an existing technique; (2) SEMEO is also effective for identifying repackaged apps that have been previously obfuscated by ProGuard, a popular obfuscation tool; and (3) SEMEO is effective at identifying semantically equivalent methods in a repackaged, malicious version of Pokemon Go
    corecore