37 research outputs found

    Automated Fault Localization in Large Java Applications

    Get PDF
    Modern software systems evolve steadily. Software developers change the software codebase every day to add new features, to improve the performance, or to fix bugs. Despite extensive testing and code inspection processes before releasing a new software version, the chance of introducing new bugs is still high. A code that worked yesterday may not work today, or it can show a degraded performance causing software regression. The laws of software evolution state that the complexity increases as software evolves. Such increasing complexity makes software maintenance harder and more costly. In a typical software organization, the cost of debugging, testing, and verification can easily range from 50% to 75% of the total development costs. Given that human resources are the main cost factor in the software maintenance and the software codebase evolves continuously, this dissertation tries to answer the following question: How can we help developers to localize the software defects more effectively during software development? We answer this question in three aspects. First, we propose an approach to localize failure-inducing changes for crashing bugs. Assume the source code of a buggy version, a failing test, the stack trace of the crashing site, and a previous correct version of the application. We leverage program analysis to contrast the behavior of the two software versions under the failing test. The difference set is the code statements which contribute to the failure site with a high probability. Second, we extend the version comparison technique to detect the leak-inducing defects caused by software changes. Assume two versions of a software codebase (one previous non-leaky and the current leaky version) and the existing test suite of the application. First, we compare the memory footprint of the code locations between two versions. Then, we use a confidence score to rank the suspicious code statements, i.e., those statements which can be the potential root causes of memory leaks. The higher the score, the more likely the code statement is a potential leak. Third, our observation on the related work about debugging and fault localization reveals that there is no empirical study which characterizes the properties of the leak- inducing defects and their repairs. Understanding the characteristics of the real defects caused by resource and memory leaks can help both researchers and practitioners to improve the current techniques for leak detection and repair. To fill this gap, we conduct an empirical study on 491 reported resource and memory leak defects from 15 large Java applications. We use our findings to draw implications for leak avoidance, detection, localization, and repair

    An investigation into the unsoundness of static program analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    Static program analysis is widely used in many software applications such as in security analysis, compiler optimisation, program verification and code refactoring. In contrast to dynamic analysis, static analysis can perform a full program analysis without the need of running the program under analysis. While it provides full program coverage, one of the main issues with static analysis is imprecision -- i.e., the potential of reporting false positives due to overestimating actual program behaviours. For many years, research in static program analysis has focused on reducing such imprecision while improving scalability. However, static program analysis may also miss some critical parts of the program, resulting in program behaviours not being reported. A typical example of this is the case of dynamic language features, where certain behaviours are hard to model due to their dynamic nature. The term ``unsoundness'' has been used to describe those missed program behaviours. Compared to static analysis, dynamic analysis has the advantage of obtaining precise results, as it only captures what has been executed during run-time. However, dynamic analysis is also limited to the defined program executions. This thesis investigates the unsoundness issue in static program analysis. We first investigate causes of unsoundness in terms of Java dynamic language features and identify potential usage patterns of such features. We then report the results of a number of empirical experiments we conducted in order to identify and categorise the sources of unsoundness in state-of-the-art static analysis frameworks. Finally, we quantify and measure the level of unsoundness in static analysis in the presence of dynamic language features. The models developed in this thesis can be used by static analysis frameworks and tools to boost the soundness in those frameworks and tools

    Program Analysis Based Approaches to Ensure Security and Safety of Emerging Software Platforms

    Full text link
    Our smartphones, homes, hospitals, and automobiles are being enhanced with software that provide an unprecedentedly rich set of functionalities, which has created an enormous market for the development of software that run on almost every personal computing devices in a person's daily life, including security- and safety-critical ones. However, the software development support provided by the emerging platforms also raises security risks by allowing untrusted third-party code, which can potentially be buggy, vulnerable or even malicious to control user's device. Moreover, as the Internet-of-Things (IoT) technology is gaining vast adoptions by a wide range of industries, and is penetrating every aspects of people's life, safety risks brought by the open software development support of the emerging IoT platform (e.g., smart home) could bring more severe threat to the well-being of customers than what security vulnerabilities in mobile apps have done to a cell phone user. To address this challenge posed on the software security in emerging domains, my dissertation focuses on the flaws, vulnerabilities and malice in the software developed for platforms in these domains. Specifically, we demonstrate that systematic program analyses of software (1) Lead to an understanding of design and implementation flaws across different platforms that can be leveraged in miscellaneous attacks or causing safety problems; (2) Lead to the development of security mechanisms that limit the potential for these threats.We contribute static and dynamic program analysis techniques for three modern platforms in emerging domains -- smartphone, smart home, and autonomous vehicle. Our app analysis reveals various different vulnerabilities and design flaws on these platforms, and we propose (1) static analysis tool OPAnalyzer to automates the discovery of problems by searching for vulnerable code patterns; (2) dynamic testing tool AutoFuzzer to efficiently produce and capture domain specific issues that are previously undefined; and (3) propose new access control mechanism ContexIoT to strengthen the platform's immunity to the vulnerability and malice in third-party software. Concretely, we first study a vulnerability family caused by the open ports on mobile devices, which allows remote exploitation due to insufficient protection. We devise a tool called OPAnalyzer to perform the first systematic study of open port usage and their security implications on mobile platform, which effectively identify and characterize vulnerable open port usage at scale in popular Android apps. We further identify the lack of context-based access control as a main enabler for such attacks, and begin to seek for defense solution to strengthen the system security. We study the popular smart home platform, and find the existing access control mechanisms to be coarse-grand, insufficient, and undemanding. Taking lessons from previous permission systems, we propose the ContexIoT approach, a context-based permission system for IoT platform that supports third-party app development, which protects the user from vulnerability and malice in these apps through fine-grained identification of context. Finally, we design dynamic fuzzing tool, AutoFuzzer for the testing of self-driving functionalities, which demand very high code quality using improved testing practice combining the state-of-the-art fuzzing techniques with vehicular domain knowledge, and discover problems that lead to crashes in safety-critical software on emerging autonomous vehicle platform.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145845/1/jackjia_1.pd

    Analyzing the Unanalyzable: an Application to Android Apps

    Get PDF
    In general, software is unreliable. Its behavior can deviate from users’ expectations because of bugs, vulnerabilities, or even malicious code. Manually vetting software is a challenging, tedious, and highly-costly task that does not scale. To alleviate excessive costs and analysts’ burdens, automated static analysis techniques have been proposed by both the research and practitioner communities making static analysis a central topic in software engineering. In the meantime, mobile apps have considerably grown in importance. Today, most humans carry software in their pockets, with the Android operating system leading the market. Millions of apps have been proposed to the public so far, targeting a wide range of activities such as games, health, banking, GPS, etc. Hence, Android apps collect and manipulate a considerable amount of sensitive information, which puts users’ security and privacy at risk. Consequently, it is paramount to ensure that apps distributed through public channels (e.g., the Google Play) are free from malicious code. Hence, the research and practitioner communities have put much effort into devising new automated techniques to vet Android apps against malicious activities over the last decade. Analyzing Android apps is, however, challenging. On the one hand, the Android framework proposes constructs that can be used to evade dynamic analysis by triggering the malicious code only under certain circumstances, e.g., if the device is not an emulator and is currently connected to power. Hence, dynamic analyses can -easily- be fooled by malicious developers by making some code fragments difficult to reach. On the other hand, static analyses are challenged by Android-specific constructs that limit the coverage of off-the-shell static analyzers. The research community has already addressed some of these constructs, including inter-component communication or lifecycle methods. However, other constructs, such as implicit calls (i.e., when the Android framework asynchronously triggers a method in the app code), make some app code fragments unreachable to the static analyzers, while these fragments are executed when the app is run. Altogether, many apps’ code parts are unanalyzable: they are either not reachable by dynamic analyses or not covered by static analyzers. In this manuscript, we describe our contributions to the research effort from two angles: ① statically detecting malicious code that is difficult to access to dynamic analyzers because they are triggered under specific circumstances; and ② statically analyzing code not accessible to existing static analyzers to improve the comprehensiveness of app analyses. More precisely, in Part I, we first present a replication study of a state-of-the-art static logic bomb detector to better show its limitations. We then introduce a novel hybrid approach for detecting suspicious hidden sensitive operations towards triaging logic bombs. We finally detail the construction of a dataset of Android apps automatically infected with logic bombs. In Part II, we present our work to improve the comprehensiveness of Android apps’ static analysis. More specifically, we first show how we contributed to account for atypical inter-component communication in Android apps. Then, we present a novel approach to unify both the bytecode and native in Android apps to account for the multi-language trend in app development. Finally, we present our work to resolve conditional implicit calls in Android apps to improve static and dynamic analyzers

    DETECTION, DIAGNOSIS AND MITIGATION OF MALICIOUS JAVASCRIPT WITH ENRICHED JAVASCRIPT EXECUTIONS

    Get PDF
    Malicious JavaScript has become an important attack vector for software exploitation attacks and imposes a severe threat to computer security. In particular, three major class of problems, malware detection, exploit diagnosis, and exploits mitigation, bring considerable challenges to security researchers. Although a lot of research efforts have been made to address these threats, they have fundamental limitations and thus cannot solve the problems. Existing analysis techniques fall into two general categories: static analysis and dynamic analysis. Static analysis tends to produce inaccurate results (both false positive and false negative) and is vulnerable to a wide series of obfuscation techniques. Thus, dynamic analysis is constantly gaining popularity for exposing the typical features of malicious JavaScript. However, existing dynamic analysis techniques possess limitations such as limited code coverage and incomplete environment setup, leaving a broad attack surface for evading the detection. Once a zero-day exploit is captured, it is critical to quickly pinpoint the JavaScript statements that uniquely characterize the exploit and the payload location in the exploit. However, the current diagnosis techniques are inadequate because they approach the problem either from a JavaScript perspective and fail to account for “implicit” data flow invisible at JavaScript level, or from a binary execution perspective and fail to present the JavaScript level view of exploit. Although software vendors have deployed techniques like ASLR, sandbox, etc. to mitigate JavaScript exploits, hacking contests (e.g.,PWN2OWN, GeekPWN) have demonstrated that the latest software (e.g., Chrome, IE, Edge, Safari) can still be exploited. An ideal JavaScript exploit mitigation solution should be flexible and allow for deployment without requiring code changes. To combat malicious JavaScript, this dissertation addresses these problems through enriched executions, which explore arbitrary paths for detection, preserve JS-binary semantics for diagnosis, and perturbs memory with chaff code for mitigation. Firstly, JSForce, a forced execution engine for JavaScript, is proposed and developed to improve the detection results of current malicious JavaScript detection techniques. It drives an arbitrary JavaScript snippet to execute along different paths without any input or environment setup. While increasing code coverage, JSForce can tolerate invalid object accesses while introducing no runtime errors during execution. Secondly, JScalpel, a system that utilizes the JavaScript context information from the JavaScript level to perform context-aware binary analysis, is presented for JavaScript exploit diagnosis. In essence, it performs JS-Binary analysis to (1) generate a minimized exploit script, which in turn helps to generate a signature for the exploit, and (2) precisely locate the payload within the exploit. It replaces the malicious payload with a friendly payload and generates a PoV for the exploit. Thirdly, ChaffyScript, a vulnerability-agnostic mitigation system, is introduced to block JavaScript exploits via undermining the memory preparation stage. Specifically, given suspicious JavaScript, ChaffyScript rewrites the code to insert memory perturbation code, and then generates semantically-equivalent code. JavaScript exploits will fail as a result of unexpected memory states introduced by memory perturbation code, while the benign JavaScript still behaves as expected since the memory perturbation code does not change the JavaScript’s original semantics

    Design and implementation of robust systems for secure malware detection

    Get PDF
    Malicious software (malware) have significantly increased in terms of number and effectiveness during the past years. Until 2006, such software were mostly used to disrupt network infrastructures or to show coders’ skills. Nowadays, malware constitute a very important source of economical profit, and are very difficult to detect. Thousands of novel variants are released every day, and modern obfuscation techniques are used to ensure that signature-based anti-malware systems are not able to detect such threats. This tendency has also appeared on mobile devices, with Android being the most targeted platform. To counteract this phenomenon, a lot of approaches have been developed by the scientific community that attempt to increase the resilience of anti-malware systems. Most of these approaches rely on machine learning, and have become very popular also in commercial applications. However, attackers are now knowledgeable about these systems, and have started preparing their countermeasures. This has lead to an arms race between attackers and developers. Novel systems are progressively built to tackle the attacks that get more and more sophisticated. For this reason, a necessity grows for the developers to anticipate the attackers’ moves. This means that defense systems should be built proactively, i.e., by introducing some security design principles in their development. The main goal of this work is showing that such proactive approach can be employed on a number of case studies. To do so, I adopted a global methodology that can be divided in two steps. First, understanding what are the vulnerabilities of current state-of-the-art systems (this anticipates the attacker’s moves). Then, developing novel systems that are robust to these attacks, or suggesting research guidelines with which current systems can be improved. This work presents two main case studies, concerning the detection of PDF and Android malware. The idea is showing that a proactive approach can be applied both on the X86 and mobile world. The contributions provided on this two case studies are multifolded. With respect to PDF files, I first develop novel attacks that can empirically and optimally evade current state-of-the-art detectors. Then, I propose possible solutions with which it is possible to increase the robustness of such detectors against known and novel attacks. With respect to the Android case study, I first show how current signature-based tools and academically developed systems are weak against empirical obfuscation attacks, which can be easily employed without particular knowledge of the targeted systems. Then, I examine a possible strategy to build a machine learning detector that is robust against both empirical obfuscation and optimal attacks. Finally, I will show how proactive approaches can be also employed to develop systems that are not aimed at detecting malware, such as mobile fingerprinting systems. In particular, I propose a methodology to build a powerful mobile fingerprinting system, and examine possible attacks with which users might be able to evade it, thus preserving their privacy. To provide the aforementioned contributions, I co-developed (with the cooperation of the researchers at PRALab and Ruhr-Universität Bochum) various systems: a library to perform optimal attacks against machine learning systems (AdversariaLib), a framework for automatically obfuscating Android applications, a system to the robust detection of Javascript malware inside PDF files (LuxOR), a robust machine learning system to the detection of Android malware, and a system to fingerprint mobile devices. I also contributed to develop Android PRAGuard, a dataset containing a lot of empirical obfuscation attacks against the Android platform. Finally, I entirely developed Slayer NEO, an evolution of a previous system to the detection of PDF malware. The results attained by using the aforementioned tools show that it is possible to proactively build systems that predict possible evasion attacks. This suggests that a proactive approach is crucial to build systems that provide concrete security against general and evasion attacks

    Protecting applications using trusted execution environments

    Get PDF
    While cloud computing has been broadly adopted, companies that deal with sensitive data are still reluctant to do so due to privacy concerns or legal restrictions. Vulnerabilities in complex cloud infrastructures, resource sharing among tenants, and malicious insiders pose a real threat to the confidentiality and integrity of sensitive customer data. In recent years trusted execution environments (TEEs), hardware-enforced isolated regions that can protect code and data from the rest of the system, have become available as part of commodity CPUs. However, designing applications for the execution within TEEs requires careful consideration of the elevated threats that come with running in a fully untrusted environment. Interaction with the environment should be minimised, but some cooperation with the untrusted host is required, e.g. for disk and network I/O, via a host interface. Implementing this interface while maintaining the security of sensitive application code and data is a fundamental challenge. This thesis addresses this challenge and discusses how TEEs can be leveraged to secure existing applications efficiently and effectively in untrusted environments. We explore this in the context of three systems that deal with the protection of TEE applications and their host interfaces: SGX-LKL is a library operating system that can run full unmodified applications within TEEs with a minimal general-purpose host interface. By providing broad system support inside the TEE, the reliance on the untrusted host can be reduced to a minimal set of low-level operations that cannot be performed inside the enclave. SGX-LKL provides transparent protection of the host interface and for both disk and network I/O. Glamdring is a framework for the semi-automated partitioning of TEE applications into an untrusted and a trusted compartment. Based on source-level annotations, it uses either dynamic or static code analysis to identify sensitive parts of an application. Taking into account the objectives of a small TCB size and low host interface complexity, it defines an application-specific host interface and generates partitioned application code. EnclaveDB is a secure database using Intel SGX based on a partitioned in-memory database engine. The core of EnclaveDB is its logging and recovery protocol for transaction durability. For this, it relies on the database log managed and persisted by the untrusted database server. EnclaveDB protects against advanced host interface attacks and ensures the confidentiality, integrity, and freshness of sensitive data.Open Acces
    corecore