154 research outputs found

    Transparent and Precise Malware Analysis Using Virtualization: From Theory to Practice

    Get PDF
    Dynamic analysis is an important technique used in malware analysis and is complementary to static analysis. Thus far, virtualization has been widely adopted for building fine-grained dynamic analysis tools and this trend is expected to continue. Unlike User/Kernel space malware analysis platforms that essentially co-exist with malware, virtualization based platforms benefit from isolation and fine-grained instrumentation support. Isolation makes it more difficult for malware samples to disrupt analysis and fine-grained instrumentation provides analysts with low level details, such as those at the machine instruction level. This in turn supports the development of advanced analysis tools such as dynamic taint analysis and symbolic execution for automatic path exploration. The major disadvantage of virtualization based malware analysis is the loss of semantic information, also known as the semantic gap problem. To put it differently, since analysis takes place at the virtual machine monitor where only the raw system state (e.g., CPU and memory) is visible, higher level constructs such as processes and files must be reconstructed using the low level information. The collection of techniques used to bridge semantic gaps is known as Virtual Machine Introspection. Virtualization based analysis platforms can be further separated into emulation and hardware virtualization. Emulators have the advantages of flexibility of analysis tool development and efficiency for fine-grained analysis; however, emulators suffer from the transparency problem. That is, malware can employ methods to determine whether it is executing in an emulated environment versus real hardware and cease operations to disrupt analysis if the machine is emulated. In brief, emulation based dynamic analysis has advantages over User/Kernel space and hardware virtualization based techniques, but it suffers from semantic gap and transparency problems. These problems have been exacerbated by recent discoveries of anti-emulation malware that detects emulators and Android malware with two semantic gaps, Java and native. Also, it is foreseeable that malware authors will have a similar response to taint analysis. In other words, once taint analysis becomes widely used to understand how malware operates, the authors will create new malware that attacks the imprecisions in taint analysis implementations and induce false-positives and false-negatives in an effort to frustrate analysts. This dissertation addresses these problems by presenting concepts, methods and techniques that can be used to transparently and precisely analyze both desktop and mobile malware using virtualization. This is achieved in three parts. First, precise heterogeneous record and replay is presented as a means to help emulators benefit from the transparency characteristics of hardware virtualization. This technique is implemented in a tool called V2E that uses KVM for recording and TEMU for replaying and analysis. It was successfully used to analyze real-world anti-emulation malware that evaded analysis using TEMU alone. Second, the design of an emulation based Android malware analysis platform that uses virtual machine introspection to bridge both the Java and native level semantic gaps as well as seamlessly bind the two views together into a single view is presented. The core introspection and instrumentation techniques were implemented in a new analysis platform called DroidScope that is based on the Android emulator. It was successfully used to analyze two real-world Android malware samples that have cooperating Java and native level components. Taint analysis was also used to study their information ex-filtration behaviors. Third, formal methods for studying the sources of false-positives and false-negatives in dynamic taint analysis designs and for verifying the correctness of manually defined taint propagation rules are presented. These definitions and methods were successfully used to analyze and compare previously published taint analysis platforms in terms of false-positives and false-negatives

    Efficient runtime management for enabling sustainable performance in real-world mobile applications

    Full text link
    Mobile devices have become integral parts of our society. They handle our diverse computing needs from simple daily tasks (i.e., text messaging, e-mail) to complex graphics and media processing under a limited battery budget. Mobile system-on-chip (SoC) designs have become increasingly sophisticated to handle performance needs of diverse workloads and to improve user experience. Unfortunately, power and thermal constraints have also emerged as major concerns. Increased power densities and temperatures substantially impair user experience due to frequent throttling as well as diminishing device reliability and battery life. Addressing these concerns becomes increasingly challenging due to increased complexities at both hardware (e.g., heterogeneous CPUs, accelerators) and software (e.g., vast number of applications, multi-threading). Enabling sustained user experience in face of these challenges requires (1) practical runtime management solutions that can reason about the performance needs of users and applications while optimizing power and temperature; (2) tools for analyzing real-world mobile application behavior and performance. This thesis aims at improving sustained user experience under thermal limitations by incorporating insights from real-world mobile applications into runtime management. This thesis first proposes thermally-efficient and Quality-of-Service (QoS) aware runtime management techniques to enable sustained performance. Our work leverages inherent QoS tolerance of users in real-world applications and introduces QoS-temperature tradeoff as a viable control knob to improve user experience under thermal constraints. We present a runtime control framework, QScale, which manages CPU power and scheduling decisions to optimize temperature while strictly adhering to given QoS targets. We also design a framework, Maestro, which provides autonomous and application-aware management of QoS-temperature tradeoffs. Maestro uses our thermally-efficient QoS control framework, QScale, as its foundation. This thesis also presents tools to facilitate studies of real-world mobile applications. We design a practical record and replay system, RandR, to generate repeatable executions of mobile applications. RandR provides this capability by automatically reproducing non-deterministic input sources in mobile applications such as user inputs and network events. Finally, we focus on the non-deterministic executions in Android malware which seek to evade analysis environments. We propose the Proteus system to identify the instruction-level inputs that reveal analysis environments

    DISCOVERING ANOMALOUS BEHAVIORS BY ADVANCED PROGRAM ANALYSIS TECHNIQUES

    Get PDF
    As soon as a technology started to be used by the masses, ended up as a target of the investigation of bad guys that write malicious software with the only and explicit intent to damage users and take control of their systems to perform different types of fraud. Malicious programs, in fact, are a serious threat for the security and privacy of billions of users. The bad guys are the main characters of this unstoppable threat which improves as the time goes by. At the beginning it was pure computer vandalism, then turned into petty theft followed by cybercrime, cyber espionage, and finally gray market business. Cybercrime is a very dangerous threat which consists of, for instance, stealing credentials of bank accounts, sending SMS to premium number, stealing user sensitive information, using resources of infected computer to develop e.g., spam business, DoS, botnets, etc. The interest of the cybercrime is to intentionally create malicious programs for its own interest, mostly lucrative. Hence, due to the malicious activity, cybercriminals have all the interest in not being detected during the attack, and developing their programs to be always more resilient against anti-malware solution. As a proof that this is a dangerous threat, the FBI reported a decline in physical crime and an increase of cybercrime. In order to deal with the increasing number of exploits found in legacy code and to detect malicious code which leverages every subtle hardware and software detail to escape from malware analysis tools, the security research community started to develop and improve various code analysis techniques (static, dynamic or both), with the aim to detect the different forms of stealthy malware and to individuate security bugs in legacy code. Despite the improvement of the research solutions, yet the current ones are inadequate to face new stealthy and mobile malware. Following such a line of research, in this dissertation, we present new program analysis techniques that aim to improve the analysis environment and deal with mobile malware. To perform malware analysis, behavior analysis technique is the prominent: the actions that a program is performing during its real-time execution are collected to understand its behavior. Nevertheless, they suffer of some limitations. State-of-the-Art malware analysis solutions rely on emulated execution environment to prevent the host to get infected, quickly recover to a pristine state, and easily collect process information. A drawback of these solutions is the non-transparency, that is, the execution environment does not faithfully emulate the physical end-user environment, which could lead to end up with incomplete results. In fact, malicious programs could detect when they are monitored in such environment, and thus modifying their behavior to mislead the analysis and avoid detection. On the contrary, a faithful emulator would drastically reduce the chance of detection of the analysis environment from the analyzed malware. To this end, we present EmuFuzzer, a novel testing methodology specific for CPU emulators, based on fuzzing to verify whether the CPU is properly emulated or not. Another shortcoming regards the stimulation of the analyzed application. It is not uncommon that an application exhibit certain behaviors only when exercised with specific events (i.e., button click, insert text, socket connection, etc.). This flaw is even exacerbated when analyzing mobile application. At this aim, we introduce CopperDroid, a program analysis tool built on top of QEMU to automatically perform out-of-the-box dynamic behavior analysis of Android malware. To this end, CopperDroid presents a unified analysis to characterize low-level OS-specific and high-level Android-specific behaviors

    Evaluation infrastructure for mobile distributed applications

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 50-54).Sophisticated applications that run on mobile devices have become commonplace. Within the wide realm of mobile software applications there exists a significant number that make use of networking in some form. Unfortunately, such distributed mobile applications are inherently difficult to evaluate. Conventional evaluations of such distributed applications are limited to small, real-world deployments consisting of, perhaps, a handful of phones. Such tests often do not have the requisite number of users to produce the desired performance. Also, these experiments do not scale and are not repeatable. To address all these issues, we sought to evaluate distributed applications in a virtual environment. Besides being cheaper, such evaluations are reproducible and scale significantly better. This thesis documents our efforts in working towards this goal. We discuss the designs that we iterated through, along with the problems we faced in each of them. We hope these problems will inform future designs that can solve the challenges that we weren't able to solve efficiently.by Anirudh Sivaraman Kaushalram.S.M

    Applications of information sharing for code generation in process virtual machines

    Get PDF
    As the backbone of many computing environments today, it is important that process virtual machines be both performant and robust in mobile, personal desktop, and enterprise applications. This thesis focusses on code generation within these virtual machines, particularly addressing situations where redundant work is being performed. The goal is to exploit information sharing in order to improve the performance and robustness of virtual machines that are accelerated by native code generation. First, the thesis investigates the potential to share generated code between multiple threads in a dynamic binary translator used to perform instruction set simulation. This is done through a code generation design that allows native code to be executed by any simulated core and adding a mechanism to share native code regions between threads. This is shown to improve the average performance of multi-threaded benchmarks by 1.4x when simulating 128 cores on a quad-core host machine. Secondly, the ahead-of-time code generation system used for executing Android applications is improved through the use of profiling. The thesis investigates the potential for profiles produced by individual users of applications to be shared and merged together to produce a generic profile that still provides a lot of benefit for a new user who is then able to skip the expensive profiling phase. These profiles can not only be used for selective compilation to reduce code-size and installation time, but can also be used for focussed optimisation on vital code regions of an application in order to improve overall performance. With selective compilation applied to a set of popular Android applications, code-size can be reduced by 49.9% on average, while installation time can be reduced by 31.8%, with only an average 8.5% increase in the amount of sequential runtime required to execute the collected profiles. The thesis also shows that, among the tested users, the use of a crowd-sourced and merged profile does not significantly affect their estimated performance loss from selective compilation (0.90x-0.92x) in comparison to when they they perform selective compilation with their own unique profile (0.93x). Furthermore, by proposing a new, more powerful code generator for Android’s virtual machine, these same profiles can be used to perform focussed optimisation, which preliminary results show to increase runtime performance across a set of common Android benchmarks by 1.46x-10.83x. Finally, in such a situation where a new code generator is being added to a virtual machine, it is also important to test the code generator for correctness and robustness. The methods of execution of a virtual machine, such as interpreters and code generators, must share a set of semantics about how programs must be executed, and this can be exploited in order to improve testing. This is done through the application of domain-aware binary fuzzing and differential testing within Android’s virtual machine. The thesis highlights a series of actual code generation and verification bugs that were found in Android’s virtual machine using this testing methodology, as well as comparing the proposed approach to other state-of-the-art fuzzing techniques

    Selective Dynamic Analysis of Virtualized Whole-System Guest Environments

    Get PDF
    Dynamic binary analysis is a prevalent and indispensable technique in program analysis. While several dynamic binary analysis tools and frameworks have been proposed, all suffer from one or more of: prohibitive performance degradation, a semantic gap between the analysis code and the execution under analysis, architecture/OS specificity, being user-mode only, and lacking flexibility and extendability. This dissertation describes the design of the Dynamic Executable Code Analysis Framework (DECAF), a virtual machine-based, multi-target, whole-system dynamic binary analysis framework. In short, DECAF seeks to address the shortcomings of existing whole-system dynamic analysis tools and extend the state of the art by utilizing a combination of novel techniques to provide rich analysis functionality without crippling amounts of execution overhead. DECAF extends the mature QEMU whole-system emulator, a type-2 hypervisor capable of emulating every instruction that executes within a complete guest system environment. DECAF provides a novel, hardware event-based method of just-in-time virtual machine introspection (VMI) to address the semantic gap problem. It also implements a novel instruction-level taint tracking engine at bitwise level of granularity, ensuring that taint propagation is sound and highly precise throughout the guest environment. A formal analysis of the taint propagation rules is provided to verify that most instructions introduce neither false positives nor false negatives. DECAF’s design also provides a plugin architecture with a simple-to-use, event-driven programming interface that makes it both flexible and extendable for a variety of analysis tasks. The implementation of DECAF consists of 9550 lines of C++ code and 10270 lines of C code. Its performance is evaluated using CPU2006 SPEC benchmarks, which show an average overhead of 605% for system wide tainting and 12% for VMI. Three platformneutral DECAF plugins - Instruction Tracer, Keylogger Detector, and API Tracer - are described and evaluated in this dissertation to demonstrate the ease of use and effectiveness of DECAF in writing cross-platform and system-wide analysis tools. This dissertation also presents the Virtual Device Fuzzer (VDF), a scalable fuzz testing framework for discovering bugs within the virtual devices implemented as part of QEMU. Such bugs could be used by malicious software executing within a guest under analysis by DECAF, so the discovery, reproduction, and diagnosis of such bugs helps to protect DECAF against attack while improving QEMU and any analysis platforms built upon QEMU. VDF uses selective instrumentation to perform targeted fuzz testing, which explores only the branches of execution belonging to virtual devices under analysis. By leveraging record and replay of memory-mapped I/O activity, VDF quickly cycles virtual devices through an arbitrarily large number of states without requiring a guest OS to be booted or present. Once a test case is discovered that triggers a bug, VDF reduces the test case to the minimum number of reads/writes required to trigger the bug and generates source code suitable for reproducing the bug during debugging and analysis. VDF is evaluated by fuzz testing eighteen QEMU virtual devices, generating 1014 crash or hang test cases that reveal bugs in six of the tested devices. Over 80% of the crashes and hangs were discovered within the first day of testing. VDF covered an average of 62.32% of virtual device branches during testing, and the average test case was minimized to a reproduction test case only 18.57% of its original size

    Using Virtualisation to Protect Against Zero-Day Attacks

    Get PDF
    Bal, H.E. [Promotor]Bos, H.J. [Copromotor
    • …
    corecore