144 research outputs found

    Transparent and Precise Malware Analysis Using Virtualization: From Theory to Practice

    Get PDF
    Dynamic analysis is an important technique used in malware analysis and is complementary to static analysis. Thus far, virtualization has been widely adopted for building fine-grained dynamic analysis tools and this trend is expected to continue. Unlike User/Kernel space malware analysis platforms that essentially co-exist with malware, virtualization based platforms benefit from isolation and fine-grained instrumentation support. Isolation makes it more difficult for malware samples to disrupt analysis and fine-grained instrumentation provides analysts with low level details, such as those at the machine instruction level. This in turn supports the development of advanced analysis tools such as dynamic taint analysis and symbolic execution for automatic path exploration. The major disadvantage of virtualization based malware analysis is the loss of semantic information, also known as the semantic gap problem. To put it differently, since analysis takes place at the virtual machine monitor where only the raw system state (e.g., CPU and memory) is visible, higher level constructs such as processes and files must be reconstructed using the low level information. The collection of techniques used to bridge semantic gaps is known as Virtual Machine Introspection. Virtualization based analysis platforms can be further separated into emulation and hardware virtualization. Emulators have the advantages of flexibility of analysis tool development and efficiency for fine-grained analysis; however, emulators suffer from the transparency problem. That is, malware can employ methods to determine whether it is executing in an emulated environment versus real hardware and cease operations to disrupt analysis if the machine is emulated. In brief, emulation based dynamic analysis has advantages over User/Kernel space and hardware virtualization based techniques, but it suffers from semantic gap and transparency problems. These problems have been exacerbated by recent discoveries of anti-emulation malware that detects emulators and Android malware with two semantic gaps, Java and native. Also, it is foreseeable that malware authors will have a similar response to taint analysis. In other words, once taint analysis becomes widely used to understand how malware operates, the authors will create new malware that attacks the imprecisions in taint analysis implementations and induce false-positives and false-negatives in an effort to frustrate analysts. This dissertation addresses these problems by presenting concepts, methods and techniques that can be used to transparently and precisely analyze both desktop and mobile malware using virtualization. This is achieved in three parts. First, precise heterogeneous record and replay is presented as a means to help emulators benefit from the transparency characteristics of hardware virtualization. This technique is implemented in a tool called V2E that uses KVM for recording and TEMU for replaying and analysis. It was successfully used to analyze real-world anti-emulation malware that evaded analysis using TEMU alone. Second, the design of an emulation based Android malware analysis platform that uses virtual machine introspection to bridge both the Java and native level semantic gaps as well as seamlessly bind the two views together into a single view is presented. The core introspection and instrumentation techniques were implemented in a new analysis platform called DroidScope that is based on the Android emulator. It was successfully used to analyze two real-world Android malware samples that have cooperating Java and native level components. Taint analysis was also used to study their information ex-filtration behaviors. Third, formal methods for studying the sources of false-positives and false-negatives in dynamic taint analysis designs and for verifying the correctness of manually defined taint propagation rules are presented. These definitions and methods were successfully used to analyze and compare previously published taint analysis platforms in terms of false-positives and false-negatives

    Effective memory management for mobile environments

    Get PDF
    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VM). All these VMs run concurrently and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and the optimization techniques specific for mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), Just-in-time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all them (and not just locally), and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot, where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reaches well beyond the usual optimization techniques

    Applications of information sharing for code generation in process virtual machines

    Get PDF
    As the backbone of many computing environments today, it is important that process virtual machines be both performant and robust in mobile, personal desktop, and enterprise applications. This thesis focusses on code generation within these virtual machines, particularly addressing situations where redundant work is being performed. The goal is to exploit information sharing in order to improve the performance and robustness of virtual machines that are accelerated by native code generation. First, the thesis investigates the potential to share generated code between multiple threads in a dynamic binary translator used to perform instruction set simulation. This is done through a code generation design that allows native code to be executed by any simulated core and adding a mechanism to share native code regions between threads. This is shown to improve the average performance of multi-threaded benchmarks by 1.4x when simulating 128 cores on a quad-core host machine. Secondly, the ahead-of-time code generation system used for executing Android applications is improved through the use of profiling. The thesis investigates the potential for profiles produced by individual users of applications to be shared and merged together to produce a generic profile that still provides a lot of benefit for a new user who is then able to skip the expensive profiling phase. These profiles can not only be used for selective compilation to reduce code-size and installation time, but can also be used for focussed optimisation on vital code regions of an application in order to improve overall performance. With selective compilation applied to a set of popular Android applications, code-size can be reduced by 49.9% on average, while installation time can be reduced by 31.8%, with only an average 8.5% increase in the amount of sequential runtime required to execute the collected profiles. The thesis also shows that, among the tested users, the use of a crowd-sourced and merged profile does not significantly affect their estimated performance loss from selective compilation (0.90x-0.92x) in comparison to when they they perform selective compilation with their own unique profile (0.93x). Furthermore, by proposing a new, more powerful code generator for Android’s virtual machine, these same profiles can be used to perform focussed optimisation, which preliminary results show to increase runtime performance across a set of common Android benchmarks by 1.46x-10.83x. Finally, in such a situation where a new code generator is being added to a virtual machine, it is also important to test the code generator for correctness and robustness. The methods of execution of a virtual machine, such as interpreters and code generators, must share a set of semantics about how programs must be executed, and this can be exploited in order to improve testing. This is done through the application of domain-aware binary fuzzing and differential testing within Android’s virtual machine. The thesis highlights a series of actual code generation and verification bugs that were found in Android’s virtual machine using this testing methodology, as well as comparing the proposed approach to other state-of-the-art fuzzing techniques

    ShareJIT: JIT Code Cache Sharing across Processes and Its Practical Implementation

    Get PDF
    Just-in-time (JIT) compilation coupled with code caching are widely used to improve performance in dynamic programming language implementations. These code caches, along with the associated profiling data for the hot code, however, consume significant amounts of memory. Furthermore, they incur extra JIT compilation time for their creation. On Android, the current standard JIT compiler and its code caches are not shared among processes---that is, the runtime system maintains a private code cache, and its associated data, for each runtime process. However, applications running on the same platform tend to share multiple libraries in common. Sharing cached code across multiple applications and multiple processes can lead to a reduction in memory use. It can directly reduce compile time. It can also reduce the cumulative amount of time spent interpreting code. All three of these effects can improve actual runtime performance. In this paper, we describe ShareJIT, a global code cache for JITs that can share code across multiple applications and multiple processes. We implemented ShareJIT in the context of the Android Runtime (ART), a widely used, state-of-the-art system. To increase sharing, our implementation constrains the amount of context that the JIT compiler can use to optimize the code. This exposes a fundamental tradeoff: increased specialization to a single process' context decreases the extent to which the compiled code can be shared. In ShareJIT, we limit some optimization to increase shareability. To evaluate the ShareJIT, we tested 8 popular Android apps in a total of 30 experiments. ShareJIT improved overall performance by 9% on average, while decreasing memory consumption by 16% on average and JIT compilation time by 37% on average.Comment: OOPSLA 201

    Present but Unreachable: Reducing Persistentlatent Secrets in HotSpot JVM

    Get PDF
    Applications that manage \ sensitive secrets, including cryptographic keys, are typically \ engineered to overwrite the secrets in memory once they\u27re no longer \ necessary, offering an important defense against forensic attacks \ against the computer. In a modern garbage-collected memory system, \ however, live objects will be copied and compacted into new memory \ pages, with the user program being unable to reach and zero out \ obsolete copies in old memory pages that have not yet \ been reused. This paper considers this problem in the HotSpot JVM, \ the default JVM used by the Oracle and OpenJDK Java platforms. \ We analyze the SerialGC and Garbage First Garbage Collector (G1GC) \ implementations, showing that sensitive data such as TLS keys are \ easily extracted from the garbage. To mitigate this issue, we \ implemented techniques to sanitize older heap pages and we measure \ the performance impact--sometimes good, sometimes unacceptable. We \ also discuss how future garbage collectors might be designed from \ scratch with efficient heap sanitation in mind.
    • 

    corecore