32 research outputs found

    Transparent and Precise Malware Analysis Using Virtualization: From Theory to Practice

    Dynamic analysis is an important technique used in malware analysis and is complementary to static analysis. Thus far, virtualization has been widely adopted for building fine-grained dynamic analysis tools, and this trend is expected to continue. Unlike user/kernel-space malware analysis platforms that essentially co-exist with malware, virtualization-based platforms benefit from isolation and fine-grained instrumentation support. Isolation makes it more difficult for malware samples to disrupt analysis, and fine-grained instrumentation provides analysts with low-level details, such as those at the machine-instruction level. This in turn supports the development of advanced analysis tools such as dynamic taint analysis and symbolic execution for automatic path exploration. The major disadvantage of virtualization-based malware analysis is the loss of semantic information, also known as the semantic gap problem. To put it differently, since analysis takes place at the virtual machine monitor, where only the raw system state (e.g., CPU and memory) is visible, higher-level constructs such as processes and files must be reconstructed from the low-level information. The collection of techniques used to bridge semantic gaps is known as virtual machine introspection. Virtualization-based analysis platforms can be further separated into emulation and hardware virtualization. Emulators offer flexibility in analysis tool development and efficiency for fine-grained analysis; however, they suffer from the transparency problem. That is, malware can employ methods to determine whether it is executing in an emulated environment rather than on real hardware, and cease operation to disrupt analysis if the machine is emulated. In brief, emulation-based dynamic analysis has advantages over user/kernel-space and hardware-virtualization-based techniques, but it suffers from the semantic gap and transparency problems. These problems have been exacerbated by recent discoveries of anti-emulation malware that detects emulators and of Android malware with two semantic gaps, Java and native. It is also foreseeable that malware authors will respond similarly to taint analysis: once taint analysis becomes widely used to understand how malware operates, authors will create new malware that attacks the imprecisions in taint analysis implementations and induces false positives and false negatives in an effort to frustrate analysts. This dissertation addresses these problems by presenting concepts, methods, and techniques that can be used to transparently and precisely analyze both desktop and mobile malware using virtualization. This is achieved in three parts. First, precise heterogeneous record and replay is presented as a means to help emulators benefit from the transparency characteristics of hardware virtualization. This technique is implemented in a tool called V2E that uses KVM for recording and TEMU for replaying and analysis. It was successfully used to analyze real-world anti-emulation malware that evaded analysis using TEMU alone. Second, the design of an emulation-based Android malware analysis platform is presented that uses virtual machine introspection to bridge both the Java- and native-level semantic gaps and to seamlessly bind the two views together into a single view. The core introspection and instrumentation techniques were implemented in a new analysis platform called DroidScope that is based on the Android emulator.
    It was successfully used to analyze two real-world Android malware samples that have cooperating Java- and native-level components. Taint analysis was also used to study their information-exfiltration behaviors. Third, formal methods are presented for studying the sources of false positives and false negatives in dynamic taint analysis designs and for verifying the correctness of manually defined taint propagation rules. These definitions and methods were successfully used to analyze and compare previously published taint analysis platforms in terms of false positives and false negatives.
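
    To make the taint-propagation discussion concrete, the following is a minimal C++ sketch of byte-granularity taint tracking over a sparse shadow map. It is an illustration only, not V2E or TEMU code; the ShadowMemory class and the mov/add rules are simplified assumptions.

        // Byte-granularity shadow memory: one taint bit per guest address.
        #include <cstdint>
        #include <iostream>
        #include <unordered_map>

        class ShadowMemory {
            std::unordered_map<uint64_t, bool> taint_;  // sparse: absent means untainted
        public:
            void set(uint64_t addr, bool t) { taint_[addr] = t; }
            bool get(uint64_t addr) const {
                auto it = taint_.find(addr);
                return it != taint_.end() && it->second;
            }
            // mov dst, src: the destination inherits the source's taint.
            void propagate_mov(uint64_t dst, uint64_t src) { set(dst, get(src)); }
            // add dst, src: tainted if either input byte was tainted (union rule).
            void propagate_add(uint64_t dst, uint64_t src) { set(dst, get(dst) || get(src)); }
        };

        int main() {
            ShadowMemory shadow;
            shadow.set(0x1000, true);              // mark a byte of untrusted input
            shadow.propagate_mov(0x2000, 0x1000);  // mov [0x2000], [0x1000]
            shadow.propagate_add(0x2000, 0x3000);  // add [0x2000], [0x3000]
            std::cout << shadow.get(0x2000) << "\n";  // prints 1: taint reached 0x2000
        }

    Over-approximate rules like the union rule above are exactly where the false positives and false negatives discussed in the third part can creep in.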

    Doctor of Philosophy

    A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of an entire operating system with an overhead of a few percent, on a realistic workload, and with minimal installation costs. To provide an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that allows an analysis algorithm to be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
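
    As a sketch of what programming against the recorded system state through source-level names might look like, consider the following C++ fragment. Every name in it (RecordedImage, Symbol, read) is hypothetical and stands in for whatever the replay engine actually exposes.

        // Reading a guest variable from a recorded RAM image by source-level
        // name, using a symbol table extracted from debug information.
        #include <cstdint>
        #include <cstring>
        #include <stdexcept>
        #include <string>
        #include <unordered_map>
        #include <vector>

        struct Symbol { uint64_t addr; size_t size; };  // e.g., from DWARF

        class RecordedImage {
            std::vector<uint8_t> ram_;                         // guest RAM snapshot
            std::unordered_map<std::string, Symbol> symbols_;  // name -> location
        public:
            RecordedImage(std::vector<uint8_t> ram,
                          std::unordered_map<std::string, Symbol> syms)
                : ram_(std::move(ram)), symbols_(std::move(syms)) {}

            // Fetch a variable by name instead of by raw address.
            template <typename T>
            T read(const std::string& name) const {
                const Symbol& s = symbols_.at(name);
                if (s.size != sizeof(T) || s.addr + s.size > ram_.size())
                    throw std::runtime_error("bad symbol: " + name);
                T value;
                std::memcpy(&value, ram_.data() + s.addr, sizeof(T));
                return value;
            }
        };

        int main() {
            std::vector<uint8_t> ram(0x2000, 0);
            ram[0x1000] = 42;  // pretend the guest kernel stored a counter here
            RecordedImage img(std::move(ram), {{"jiffies", {0x1000, 4}}});
            return img.read<uint32_t>("jiffies") == 42 ? 0 : 1;  // little-endian host
        }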

    Causal reasoning about distributed programs

    We present an integrated approach to the specification, verification, and testing of distributed programs. We show how global properties defined by transition axiom specifications can be interpreted as definitions of causal relationships between process states. We explain why reasoning about causal rather than global relationships yields a clearer picture of distributed processing. We present a proof system for showing the partial correctness of CSP programs that places strict restrictions on assertions. It admits no global assertions. A process annotation may reference only local state. Glue predicates relate pairs of process states at points of interprocess communication. No assertion references auxiliary variables; appropriate use of control predicates and vector clock values eliminates the need for them. Our proof system emphasizes causality. We do not prove processes correct in isolation. We instead track causality as we write our annotations. When we come to a send or receive, we consider all the statements that could communicate with it, and use the semantics of CSP message passing to derive its postcondition. We show that our CSP proof system is sound and relatively complete, and that we need only recursive assertions to prove that any program in our fragment of CSP is partially correct. Our proof system is, therefore, as powerful as other proof systems for CSP. We extend our work to develop proof systems for asynchronous communication. For each proof system, our motivation is to be able to write proofs that show that code satisfies its specification, while making only assertions we can use to define the aspects of process state that we should trace during test runs and check during postmortem analysis. We can trace the assertions we make without having to modify program code or add synchronization or message passing. Why, if we verify correctness, would we want to test? We observe that a proof, like a program, is susceptible to error. By tracing and analyzing program state during testing, we can build our confidence that our proof is valid.
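
    Because the annotations above rely on vector clock values to track causality, a minimal sketch of vector clocks and the happened-before test may help; this is the textbook construction, not code from the dissertation.

        // Vector clocks: one counter per process; merge on message receipt.
        #include <algorithm>
        #include <cstddef>
        #include <vector>

        class VectorClock {
            std::vector<unsigned> v_;
            std::size_t self_;
        public:
            VectorClock(std::size_t nprocs, std::size_t self) : v_(nprocs, 0), self_(self) {}

            void tick() { ++v_[self_]; }  // local event, including a send

            // Receive: take the componentwise max, then count the receive itself.
            void merge(const VectorClock& sender) {
                for (std::size_t i = 0; i < v_.size(); ++i)
                    v_[i] = std::max(v_[i], sender.v_[i]);
                tick();
            }

            // a happened-before b iff a <= b componentwise and a != b.
            friend bool happenedBefore(const VectorClock& a, const VectorClock& b) {
                bool strict = false;
                for (std::size_t i = 0; i < a.v_.size(); ++i) {
                    if (a.v_[i] > b.v_[i]) return false;
                    if (a.v_[i] < b.v_[i]) strict = true;
                }
                return strict;
            }
        };

        int main() {
            VectorClock p0(2, 0), p1(2, 1);
            p0.tick();                              // send event on process 0
            p1.merge(p0);                           // matching receive on process 1
            return happenedBefore(p0, p1) ? 0 : 1;  // the send precedes the receive
        }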

    Selective Dynamic Analysis of Virtualized Whole-System Guest Environments

    Dynamic binary analysis is a prevalent and indispensable technique in program analysis. While several dynamic binary analysis tools and frameworks have been proposed, all suffer from one or more of: prohibitive performance degradation, a semantic gap between the analysis code and the execution under analysis, architecture/OS specificity, being user-mode only, and a lack of flexibility and extensibility. This dissertation describes the design of the Dynamic Executable Code Analysis Framework (DECAF), a virtual machine-based, multi-target, whole-system dynamic binary analysis framework. In short, DECAF seeks to address the shortcomings of existing whole-system dynamic analysis tools and extend the state of the art by utilizing a combination of novel techniques to provide rich analysis functionality without crippling amounts of execution overhead. DECAF extends the mature QEMU whole-system emulator, a type-2 hypervisor capable of emulating every instruction that executes within a complete guest system environment. DECAF provides a novel, hardware event-based method of just-in-time virtual machine introspection (VMI) to address the semantic gap problem. It also implements a novel instruction-level taint tracking engine at bit-level granularity, ensuring that taint propagation is sound and highly precise throughout the guest environment. A formal analysis of the taint propagation rules is provided to verify that most instructions introduce neither false positives nor false negatives. DECAF’s design also provides a plugin architecture with a simple-to-use, event-driven programming interface that makes it both flexible and extendable for a variety of analysis tasks. The implementation of DECAF consists of 9550 lines of C++ code and 10270 lines of C code. Its performance is evaluated using the SPEC CPU2006 benchmarks, which show an average overhead of 605% for system-wide tainting and 12% for VMI. Three platform-neutral DECAF plugins - Instruction Tracer, Keylogger Detector, and API Tracer - are described and evaluated in this dissertation to demonstrate the ease of use and effectiveness of DECAF in writing cross-platform and system-wide analysis tools. This dissertation also presents the Virtual Device Fuzzer (VDF), a scalable fuzz-testing framework for discovering bugs within the virtual devices implemented as part of QEMU. Such bugs could be exploited by malicious software executing within a guest under analysis by DECAF, so the discovery, reproduction, and diagnosis of such bugs helps to protect DECAF against attack while improving QEMU and any analysis platforms built upon it. VDF uses selective instrumentation to perform targeted fuzz testing, which explores only the branches of execution belonging to the virtual devices under analysis. By leveraging record and replay of memory-mapped I/O activity, VDF quickly cycles virtual devices through an arbitrarily large number of states without requiring a guest OS to be booted or present. Once a test case is discovered that triggers a bug, VDF reduces it to the minimum number of reads/writes required to trigger the bug and generates source code suitable for reproducing the bug during debugging and analysis. VDF is evaluated by fuzz testing eighteen QEMU virtual devices, generating 1014 crash or hang test cases that reveal bugs in six of the tested devices. Over 80% of the crashes and hangs were discovered within the first day of testing. VDF covered an average of 62.32% of virtual device branches during testing, and the average test case was minimized to a reproduction test case only 18.57% of its original size.
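
    As an illustration of taint tracking at bit-level granularity of the kind described above, the following C++ sketch gives sound and precise propagation rules for AND and XOR. The Tainted struct and the rules are a standard construction written for this summary, not DECAF source code.

        // One taint bit per value bit: taint marks which result bits an
        // attacker-controlled input could still influence.
        #include <cstdint>
        #include <cstdio>

        struct Tainted { uint32_t val; uint32_t taint; };

        // c = a & b: a tainted operand bit matters only where the other
        // operand bit is 1 or is itself tainted.
        Tainted taint_and(Tainted a, Tainted b) {
            uint32_t t = (a.taint & b.taint) | (a.taint & b.val) | (b.taint & a.val);
            return { a.val & b.val, t };
        }

        // c = a ^ b: flipping any tainted input bit flips the result bit.
        Tainted taint_xor(Tainted a, Tainted b) {
            return { a.val ^ b.val, a.taint | b.taint };
        }

        int main() {
            Tainted secret{0x0F, 0xFF};  // all eight low bits tainted
            Tainted mask{0x03, 0x00};    // untainted public mask
            Tainted r = taint_and(secret, mask);
            // Precise result: only the two bits the mask lets through stay
            // tainted (taint = 0x03), where a byte-level rule would say 0xFF.
            std::printf("val=%02x taint=%02x\n", (unsigned)r.val, (unsigned)r.taint);
        }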

    STATIC ENFORCEMENT OF TERMINATION-SENSITIVE NONINTERFERENCE USING THE C++ TEMPLATE TYPE SYSTEM

    A side channel is an observable attribute of program execution other than explicit communication, e.g., power usage, execution time, or page-fault patterns. A side-channel attack occurs when a malicious adversary observes program secrets through a side channel. This dissertation introduces Covert C++, a library that uses template metaprogramming to superimpose a security-type system on top of C++’s existing type system. Covert C++ enforces an information-flow policy that prevents secret data from influencing program control flow and memory access patterns, thus obviating side-channel leaks. Formally, Covert C++ can facilitate an extended definition of the classical noninterference property, broadened to also cover the dynamic execution property of memory-trace obliviousness. This solution does not require any modifications to the compiler, linker, or C++ standard. To verify that these security properties can be preserved by the compiler (i.e., by compiler optimizations), this dissertation introduces the Noninterference Verification Tool (NVT). The NVT employs a novel dynamic analysis technique which combines input fuzzing with dynamic memory tracing. Specifically, the NVT detects when secret data influences a program’s memory trace, i.e., the sequence of instruction fetches and data accesses. Moreover, the NVT signals when a program leaks secret data to a publicly observable storage channel. The Covert C++ library and the NVT are two components of the broader Covert C++ toolchain. The toolchain also provides a collection of refactoring tools to interactively transform legacy C or C++ code into Covert C++ code. Finally, the dissertation introduces libOblivious, a library to facilitate high-performance memory-trace-oblivious computation with Covert C++.
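
    The following toy C++ fragment sketches how a template wrapper can superimpose a security type in the manner described above. It illustrates the idea only; the actual Covert C++ types, operators, and guarantees differ.

        // A wrapper that keeps secret values inside the type system: code can
        // compute on secrets but cannot branch on them or index memory with them.
        #include <cstdint>

        template <typename T>
        class Secret {
            T v_;
        public:
            explicit Secret(T v) : v_(v) {}

            // Arithmetic on secrets yields secrets (the label propagates).
            friend Secret operator+(Secret a, Secret b) { return Secret(T(a.v_ + b.v_)); }
            friend Secret operator^(Secret a, Secret b) { return Secret(T(a.v_ ^ b.v_)); }

            // Deliberately no operator bool and no comparison returning plain
            // bool: `if (s == k)` fails to compile, so secret-dependent control
            // flow is rejected by the type checker itself.

            // Branchless replacement for `c ? a : b`; c must hold 0 or 1.
            static Secret select(Secret c, Secret a, Secret b) {
                T mask = T(0) - c.v_;  // 0 -> all zeros, 1 -> all ones
                return Secret(T((a.v_ & mask) | (b.v_ & ~mask)));
            }
        };

        int main() {
            Secret<uint32_t> a(42), b(7), bit(1);  // bit encodes a secret choice
            Secret<uint32_t> r = Secret<uint32_t>::select(bit, a, b);
            (void)r;  // no branch or memory index ever depended on `bit`
        }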

    Detection and Exploitation of Information Flow Leaks

    This thesis contributes to the field of language-based information flow analysis with a focus on the detection and exploitation of information flow leaks in programs. To achieve this goal, the thesis presents a number of precise semi-automatic approaches that allow one to detect, exploit, and judge the severity of information flow leaks in programs. The first part of the thesis develops an approach to detect and demonstrate information flow leaks in a program. This approach analyses a given program statically using symbolic execution and self-composition, with the aim of generating so-called insecurity formulas whose satisfying models (obtained by SMT solvers) give rise to pairs of initial states that demonstrate insecure information flows. Based on these models, small unit test cases, so-called leak demonstrators, are created that check for the detected information flow leaks and fail if they exist. The developed approach is able to deal with unbounded loops and recursive method invocation by using program specifications such as loop invariants or method contracts. This allows the approach to be fully precise (if needed) but also to abstract and allow for false positives in exchange for a higher degree of automation and simpler specifications. The approach supports several information flow security policies, namely noninterference, delimited information release, and information erasure. The second part of the thesis builds upon this approach to let the user judge the severity of an information flow leak by exploiting the detected leaks in order to infer the secret information. This is achieved with a hybrid analysis that conducts an adaptive attack by performing a series of experiments. An experiment constitutes a concrete program run which serves to accumulate knowledge about the secret. Each experiment is carried out with optimal low inputs deduced from the prior distribution and the accumulated knowledge of the secret, so that the potential leakage is maximized. We propose a novel approach to quantify information leakage as an explicit function of the low inputs, using symbolic execution and parametric model counting. Depending on the chosen security metric, general nonlinear optimization tools or Max-SMT solvers are used to find optimal low inputs, i.e., inputs that cause the program to leak a maximum of information. For the purpose of evaluation, both approaches have been fully implemented in the tool KEG, which is based on the state-of-the-art program verification system KeY. KEG supports a rich subset of sequential Java programs and generates executable JUnit tests as leak demonstrators. For secret inference, KEG produces executable Java programs and runs them to perform the adaptive attack. The thesis discusses the planning, execution, and results of the evaluation. The evaluation has been performed on a collection of micro-benchmarks as well as two case studies taken from the literature. The evaluation using the micro-benchmarks shows that KEG successfully detects all information flow leaks and is able to generate correct demonstrators provided the supplied specifications are correct and strong enough. With respect to secret inference, it shows that the approach presented in this thesis (which computes optimal low inputs) helps an attacker to learn the secret much more efficiently than approaches using arbitrary low inputs. KEG has also been evaluated in two case studies.
    The first case study is performed on e-voting software extracted in a simplified form from a real-world e-voting system. This case study focuses on the leak detection and demonstrator generation approach. The e-voting case study shows that KEG is able to deal with relatively complicated programs that include unbounded loops, objects, and arrays. Moreover, the case study demonstrates that KEG can be integrated with a specification generation tool to obtain both precision and full automation. The second case study is conducted on a PIN integrity checking program, adapted from a real-world ATM PIN verification system. This case study mainly demonstrates the secret inference feature of KEG. It shows that KEG can help an attacker to learn the secret more efficiently, given a sufficiently good assumption about the prior distribution of the secret.
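
    Schematically, an insecurity formula of the kind KEG generates can be written as a satisfiability query over two runs of a program p that share the low input (the notation below is ours, for illustration, not KEG's):

        \exists\, l,\ h_1,\ h_2.\;\;
        o_1 = p(l, h_1) \;\wedge\; o_2 = p(l, h_2) \;\wedge\; o_1 \neq o_2

    A satisfying model supplies two initial states that agree on the low input but yield distinguishable outputs; these are exactly the pairs of states from which the leak demonstrators are built.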

    Assured Android Execution Environments

    Current cybersecurity best practices, techniques, tactics, and procedures are insufficient to ensure the protection of Android systems. Software tools leveraging formal methods use mathematical means to assure both a design and an implementation for a system, and these methods can be used to provide security assurances. The goal of this research is to determine methods of assuring isolation when executing Android software in a contained environment. Specifically, this research demonstrates that security properties relevant to Android software containers can be formally captured and validated, and that an implementation can be formally verified to satisfy a corresponding specification. A three-stage methodology called the Formal Verification Cycle is presented. This cycle focuses on iterating over a set of security properties to validate each within a specification and verify it within a software implementation. A security property is validated when its functional-language prototype (e.g., a Haskell-coded version of the property) is converted and processed by a formal method (e.g., a proof assistant). This validation of the property enables its definition in a software specification, which can be implemented separately in an imperative programming language (e.g., the Go programming language). Once the implementation is complete, another formal method (e.g., symbolic execution) can be used to verify that the imperative implementation satisfies the validated specification. Successful completion of this cycle shows that a given implementation is equivalent to a functional-language prototype, and the cycle assures that a specification for the original desired security properties was properly implemented. This research shows an application of this cycle to develop Assured Android Execution Environments.

    Using embedded hardware monitor cores in critical computer systems

    The integration of FPGA devices in many different architectures and services makes monitoring and real-time detection of errors an important concern in FPGA system design. A monitor is a tool, or a set of tools, that facilitates analytic measurements in observing a given system. The goal of these observations is usually performance analysis and optimisation, or surveillance of the system. However, System-on-Chip (SoC) based designs leave few points at which to attach external tools such as logic analysers. Thus, an embedded error-detection core that allows observation of critical system nodes (such as processor cores and buses) should reinforce the operation of the FPGA-based system in order to prevent system failures. The core should not interfere with system performance and must ensure timely detection of errors. This thesis is an investigation into how a robust hardware-monitoring module can be efficiently integrated on a target PCI board (with FPGA-based application processing features) which is part of a critical computing system. [Continues.]

    Principles of Security and Trust: 7th International Conference, POST 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings

    authentication; computer science; computer software selection and evaluation; cryptography; data privacy; formal logic; formal methods; formal specification; internet; privacy; program compilers; programming languages; security analysis; security systems; semantics; separation logic; software engineering; specifications; verification; world wide web