64 research outputs found

    Exploiting JavaScript Birthmarking Techniques for Code Theft Detection

    Get PDF
    Este relatório visa a análise de técnicas de birthmarking para detetar o roubo de código. Um birthmark de um software, como o próprio nome indica, é um conjunto de características únicas que permitem identificar esse mesmo software. Para a deteção do roubo de código são extraídos birthmarks de dois programas, o original e um suspeito, e são comparados um com o outro, permitindo assim detetar o roubo caso sejam muito semelhantes ou iguais. Como hoje em dia a internet é cada vez mais utilizada e o código JavaScript é usado para grande parte das aplicações web, o roubo de código nesta área é um grande problema da atualidade. Tendo isto em conta, a solução final tem como objetivo esta mesma linguagem.São analisadas, cronologicamente e tendo em conta a relevância para o tema, algumas das técnicas de birthmarking existentes. As técnicas são analisadas individualmente e no final é feito um resumo e comparação de todas as técnicas. Como a maior parte das técnicas existentes não foram pensadas para JavaScript, a sua aplicabilidade à linguagem é também analisada e são tiradas conclusões acerca de bons candidatos à solução final. O objetivo final é construir uma ferramenta que, usando uma técnica de birthmarking, determine se dois programas JavaScript foram copiados, de modo a suportar alegações de roubo de código.The purpose of this dissertation is to analyse birthmarking techniques in order to detect code theft. A birthmark of a software is, as the name suggests, the set of unique characteristics that allow to identify that software. In order to detect the theft, the birthmarks of two programs are extracted, the suspect and the original, and compared to each other to check if they are too similar or identical. Nowadays the web applications are growing and JavaScript code is the most used in this field, therefore the theft in this area is a current problem. Because of that, the theft detection of programs developed in that language is the focus of this dissertation.Some techniques of birthmarking are analysed in chronological order and accordingly to the relevance for the theme. Each technique is analysed individually and in the end a comparison between them is made. Given that most of the techniques were not created for JavaScript, their applicability to the language is analysed. With those analysis, some conclusions about the best candidates to the final solution are drawn.The final goal is to develop a tool, that uses a birthmarking technique, to determine if two JavaScript programs were copied, in order to support code theft allegations

    EXTRACTION AND PREDICTION OF SYSTEM PROPERTIES USING VARIABLE-N-GRAM MODELING AND COMPRESSIVE HASHING

    Get PDF
    In modern computer systems, memory accesses and power management are the two major performance limiting factors. Accesses to main memory are very slow when compared to operations within a processor chip. Hardware write buffers, caches, out-of-order execution, and prefetch logic, are commonly used to reduce the time spent waiting for main memory accesses. Compiler loop interchange and data layout transformations also can help. Unfortunately, large data structures often have access patterns for which none of the standard approaches are useful. Using smaller data structures can significantly improve performance by allowing the data to reside in higher levels of the memory hierarchy. This dissertation proposes using lossy data compression technology called ’Compressive Hashing’ to create “surrogates”, that can augment original large data structures to yield faster typical data access. One way to optimize system performance for power consumption is to provide a predictive control of system-level energy use. This dissertation creates a novel instruction-level cost model called the variable-n-gram model, which is closely related to N-Gram analysis commonly used in computational linguistics. This model does not require direct knowledge of complex architectural details, and is capable of determining performance relationships between instructions from an execution trace. Experimental measurements are used to derive a context-sensitive model for performance of each type of instruction in the context of an N-instruction sequence. Dynamic runtime power prediction mechanisms often suffer from high overhead costs. To reduce the overhead, this dissertation encodes the static instruction-level predictions into a data structure and uses compressive hashing to provide on-demand runtime access to those predictions. Genetic programming is used to evolve compressive hash functions and performance analysis of applications shows that, runtime access overhead can be reduced by a factor of ~3x-9x

    Dark Web Data Classification Using Neural Network

    Get PDF
    There are several issues associated with Dark Web Structural Patterns mining (including many redundant and irrelevant information), which increases the numerous types of cybercrime like illegal trade, forums, terrorist activity, and illegal online shopping. Understanding online criminal behavior is challenging because the data is available in a vast amount. To require an approach for learning the criminal behavior to check the recent request for improving the labeled data as a user profiling, Dark Web Structural Patterns mining in the case of multidimensional data sets gives uncertain results. Uncertain classification results cause a problem of not being able to predict user behavior. Since data of multidimensional nature has feature mixes, it has an adverse influence on classification. The data associated with Dark Web inundation has restricted us from giving the appropriate solution according to the need. In the research design, a Fusion NN (Neural network)-S3VM for Criminal Network activity prediction model is proposed based on the neural network; NN- S3VM can improve the prediction

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    Get PDF
    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated

    Towards Least Privilege Principle: Limiting Unintended Accesses in Software Systems.

    Full text link
    Adhering to the least privilege principle involves ensuring that only legitimate subjects have access rights to objects. Sometimes, this is hard because of permission irrevocability, changing security requirements, infeasibility of access control mechanisms, and permission creeps. If subjects turn rogue, the accesses can be abused. This thesis examines three scenarios where accesses are commonly abused and lead to security issues, and proposes three systems, SEAL, DeGap, and Expose to detect and, where practical, eliminate unintended accesses. Firstly, we examine abuse of email addresses, whose leakages are irreversible. Also, users can only hope that businesses requiring their email addresses for validating affiliations do not misuse them. SEAL uses semi-private aliases, which permits gradual and selective controls while providing privacy for affiliation validations. Secondly, access control mechanisms may be ineffective as subject roles change and administrative oversights lead to permission gaps, which should be removed expeditiously. Identifying permission gaps can be hard since another reference point besides granted permissions is often unavailable. DeGap uses access logs to estimate the gaps while using a common logic for various system services. DeGap also recommends configuration changes towards reducing the gaps. Lastly, unintended software code re-use can lead to intellectual property theft and license violations. Determining whether an application uses a library can be difficult. Compiler optimizations, function inlining, and lack of symbols make using syntactic methods a challenge, while pure semantic analysis is slow. Given a library and a set of applications, Expose combines syntactic and semantic analysis to efficiently help identify applications that re-use the library.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99976/1/bengheng_1.pd

    Malware variant detection

    Get PDF
    Malware programs (e.g., viruses, worms, Trojans, etc.) are a worldwide epidemic. Studies and statistics show that the impact of malware is getting worse. Malware detectors are the primary tools in the defence against malware. Most commercial anti-malware scanners maintain a database of malware patterns and heuristic signatures for detecting malicious programs within a computer system. Malware writers use semantic-preserving code transformation (obfuscation) techniques to produce new stealth variants of their malware programs. Malware variants are hard to detect with today's detection technologies as these tools rely mostly on syntactic properties and ignore the semantics of malicious executable programs. A robust malware detection technique is required to handle this emerging security threat. In this thesis, we propose a new methodology that overcomes the drawback of existing malware detection methods by analysing the semantics of known malicious code. The methodology consists of three major analysis techniques: the development of a semantic signature, slicing analysis and test data generation analysis. The core element in this approach is to specify an approximation for malware code semantics and to produce signatures for identifying, possibly obfuscated but semantically equivalent, variants of a sample of malware. A semantic signature consists of a program test input and semantic traces of a known malware code. The key challenge in developing our semantics-based approach to malware variant detection is to achieve a balance between improving the detection rate (i.e. matching semantic traces) and performance, with or without the e ects of obfuscation on malware variants. We develop slicing analysis to improve the construction of semantic signatures. We back our trace-slicing method with a theoretical result that shows the notion of correctness of the slicer. A proof-of-concept implementation of our malware detector demonstrates that the semantics-based analysis approach could improve current detection tools and make the task more di cult for malware authors. Another important part of this thesis is exploring program semantics for the selection of a suitable part of the semantic signature, for which we provide two new theoretical results. In particular, this dissertation includes a test data generation method that works for binary executables and the notion of correctness of the method

    Diversification and obfuscation techniques for software security: A systematic literature review

    Get PDF
    Context: Diversification and obfuscation are promising techniques for securing software and protecting computers from harmful malware. The goal of these techniques is not removing the security holes, but making it difficult for the attacker to exploit security vulnerabilities and perform successful attacks.Objective: There is an increasing body of research on the use of diversification and obfuscation techniques for improving software security; however, the overall view is scattered and the terminology is unstructured. Therefore, a coherent review gives a clear statement of state-of-the-art, normalizes the ongoing discussion and provides baselines for future research.Method: In this paper, systematic literature review is used as the method of the study to select the studies that discuss diversification/obfuscation techniques for improving software security. We present the process of data collection, analysis of data, and report the results.Results: As the result of the systematic search, we collected 357 articles relevant to the topic of our interest, published between the years 1993 and 2017. We studied the collected articles, analyzed the extracted data from them, presented classification of the data, and enlightened the research gaps.Conclusion: The two techniques have been extensively used for various security purposes and impeding various types of security attacks. There exist many different techniques to obfuscate/diversify programs, each of which targets different parts of the programs and is applied at different phases of software development life-cycle. Moreover, we pinpoint the research gaps in this field, for instance that there are still various execution environments that could benefit from these two techniques, including cloud computing, Internet of Things (IoT), and trusted computing. We also present some potential ideas on applying the techniques on the discussed environments.</p

    Forensic identification and detection of hidden and obfuscated malware

    Get PDF
    The revolution in online criminal activities and malicious software (malware) has posed a serious challenge in malware forensics. Malicious attacks have become more organized and purposefully directed. With cybercrimes escalating to great heights in quantity as well as in sophistication and stealth, the main challenge is to detect hidden and obfuscated malware. Malware authors use a variety of obfuscation methods and specialized stealth techniques of information hiding to embed malicious code, to infect systems and to thwart any attempt to detect them, specifically with the use of commercially available anti-malware engines. This has led to the situation of zero-day attacks, where malware inflict systems even with existing security measures. The aim of this thesis is to address this situation by proposing a variety of novel digital forensic and data mining techniques to automatically detect hidden and obfuscated malware. Anti-malware engines use signature matching to detect malware where signatures are generated by human experts by disassembling the file and selecting pieces of unique code. Such signature based detection works effectively with known malware but performs poorly with hidden or unknown malware. Code obfuscation techniques, such as packers, polymorphism and metamorphism, are able to fool current detection techniques by modifying the parent code to produce offspring copies resulting in malware that has the same functionality, but with a different structure. These evasion techniques exploit the drawbacks of traditional malware detection methods, which take current malware structure and create a signature for detecting this malware in the future. However, obfuscation techniques aim to reduce vulnerability to any kind of static analysis to the determent of any reverse engineering process. Furthermore, malware can be hidden in file system slack space, inherent in NTFS file system based partitions, resulting in malware detection that even more difficult.Doctor of Philosoph
    corecore