101 research outputs found
HyperDbg: Reinventing Hardware-Assisted Debugging (Extended Version)
Software analysis, debugging, and reverse engineering have a crucial impact
in today's software industry. Efficient and stealthy debuggers are especially
relevant for malware analysis. However, existing debugging platforms fail to
address a transparent, effective, and high-performance low-level debugger due
to their detectable fingerprints, complexity, and implementation restrictions.
In this paper, we present HyperDbg, a new hypervisor-assisted debugger for
high-performance and stealthy debugging of user and kernel applications. To
accomplish this, HyperDbg relies on state-of-the-art hardware features
available in today's CPUs, such as VT-x and extended page tables. In contrast
to other widely used existing debuggers, we design HyperDbg using a custom
hypervisor, making it independent of OS functionality or API. We propose
hardware-based instruction-level emulation and OS-level API hooking via
extended page tables to increase the stealthiness. Our results of the dynamic
analysis of 10,853 malware samples show that HyperDbg's stealthiness allows
debugging on average 22% and 26% more samples than WinDbg and x64dbg,
respectively. Moreover, in contrast to existing debuggers, HyperDbg is not
detected by any of the 13 tested packers and protectors. We improve the
performance over other debuggers by deploying a VMX-compatible script engine,
eliminating unnecessary context switches. Our experiment on three concrete
debugging scenarios shows that compared to WinDbg as the only kernel debugger,
HyperDbg performs step-in, conditional breaks, and syscall recording, 2.98x,
1319x, and 2018x faster, respectively. We finally show real-world applications,
such as a 0-day analysis, structure reconstruction for reverse engineering,
software performance analysis, and code-coverage analysis
HyperDbg: Reinventing Hardware-Assisted Debugging
Software analysis, debugging, and reverse engineering have a crucial impact in today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to address a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions.
In this paper, we present StealthDbg, a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, StealthDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and extended page tables. In contrast to other widely used existing debuggers, we design StealthDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase the stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that StealthDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, StealthDbg is not detected by any of the 13 tested packers and protectors. We improve the performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that compared to WinDbg as the only kernel debugger, StealthDbg performs step-in, conditional breaks, and syscall recording, 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as a 0-day analysis, structure reconstruction for reverse engineering, software performance analysis, and code-coverage analysis
Recommended from our members
CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment
The CHERI architecture allows pointers to be implemented as capabilities (rather than integer virtual addresses) in a manner that is compatible with, and strengthens, the semantics of the C language. In addition to the spatial protections offered by conventional fat pointers, CHERI capabilities offer strong integrity, enforced provenance validity, and access monotonicity. The stronger guarantees of these architectural capabilities must be reconciled with the real-world behavior of operating systems, run-time environments, and applications. When the process model, user-kernel interactions, dynamic linking, and memory management are all considered, we observe that simple derivation of architectural capabilities is insufficient to describe appropriate access to memory. We bridge this conceptual gap with a notional \emph{abstract capability} that describes the accesses that should be allowed at a given point in execution, whether in the kernel or userspace. To investigate this notion at scale, we describe the first adaptation of a full C-language operating system (FreeBSD) with an enterprise database (PostgreSQL) for complete spatial and referential memory safety. We show that awareness of abstract capabilities, coupled with CHERI architectural capabilities, can provide more complete protection, strong compatibility, and acceptable performance overhead compared with the pre-CHERI baseline and software-only approaches. Our observations also have potentially significant implications for other mitigation techniques.This work was supported by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 (``CTSRD'') and HR0011-18-C-0016 (``ECATS''). The views, opinions, and/or findings contained in this report are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. We also acknowledge the EPSRC REMS Programme Grant (EP/K008528/1), the ERC ELVER Advanced Grant (789108), Arm Limited, HP Enterprise, and Google, Inc. Approved for Public Release, Distribution Unlimited
Low-Impact System Performance Analysis Using Hardware Assisted Tracing Techniques
RÉSUMÉ
Les applications modernes sont difficiles à diagnostiquer avec les outils de débogage et de profilage traditionnels. Dans les systèmes de production, la première priorité est de minimiser la perturbation sur l'application cible. Les outils de traçage sont très appropriés pour l'étude des performances de tels systèmes car les événements sont enregistrés et l'analyse se fait a posteriori. Une des principales exigences des systèmes de traçage est le faible surcoût. L'activation d'un nombre réduit d'événements aide à respecter cette exigence, mais au prix de la diminution de la granularité de la trace.
Dans cette thèse, nous présentons notre travail de recherche qui traite du problème de la granularité limitée des traces en maintenant un faible surcoût sur les applications cibles. Nous présentons de nouvelles techniques et algorithmes qui abordent le problème en se basant d'abord sur une approche de filtrage logiciel et de traçage coopératif, puis en explorant des mécanismes plus avancés de traçage matériel. Nous avons proposé une approche efficace de traçage conditionnel dans l'espace noyau et utilisateur qui se base sur des mécanismes de filtrages compilés en code natif.
Afin d'atteindre l'objectif d'avoir une trace détaillée du système, nous expliquons que les processeurs modernes contiennent des blocs de traçage matériel qui n'ont pas encore été entièrement exploités dans le domaine du traçage. Nous caractérisons leur performance et nous analysons les paquets de traces, leur relation avec l'exécution du logiciel, et les possibilités de les utiliser pour une trace détaillée.
Nous proposons des techniques à faible surcoût, assistées par le matériel, rendant possible une analyse détaillée permettant la détection des latences d'interruption et des appels systèmes. Nous présentons aussi une nouvelle technique qui se base sur les paquets de trace à bas niveau du processeur pour analyser efficacement les processus et les ressources utilisées dans une machine virtuelle. De plus, nous avons identifié et solutionné des problèmes reliés au traçage matériel en utilisant l'assistance logicielle du système d'exploitation, ouvrant ainsi la voie à des recherches plus approfondies sur les approches coopératives de traçage matériel-logiciel.
Comme nos techniques sont axées sur les exigences du traçage à haute vitesse dans les systèmes embarqués et de production traitant des transactions à haute fréquence, nous avons constaté que nos progrès dans le domaine du traçage matériel-logiciel se sont avérés très utiles pour détecter la contention des ressources et la latence dans les systèmes.----------ABSTRACT
Modern applications are becoming hard to diagnose using traditional debugging and profiling tools. For production systems the first priority is to have minimal disturbance on the target application. To analyze performance of such systems, tracing tools are imperative where events can be logged and analyzed post-execution. One of the key requirements for tracing solutions however, is low overhead. A generic solution can be to select only a few events to trace, but at the cost of trace granularity. In this thesis, we present our research work that deals with the problem of lack of high granularity in traces while maintaining a low-overhead on target applications. We present our new techniques and algorithms that approach the problem initially from a software filtering and co-operative tracing approach, and then explore more advanced hardware tracing mechanisms that can be used. We have proposed an efficient kernel and userspace conditional tracing approach, with an enhanced native compiled filtering mechanism. Continuing towards our goal to have a detailed trace of a system, we further discuss how modern processors contain new hardware tracing blocks that have not yet been fully explored and exploited in the tracing domain. We characterize their performance and analyze the trace packets, their relation with software executions and opportunities to utilize them for a detailed trace. We therefore propose low-overhead hardware assisted techniques that allow a fine grained instruction based interrupt and system call latency detection mechanism. We also present a new algorithm that shows how such low-level trace packets coming directly from the processor, can be effectively utilized to analyze even the processes or resources consumed inside a VM. We have also identified and improved upon issues related to hardware tracing itself using software assistance from operating systems thus laying out ground for further research in hardware-software co-operative tracing approaches. As our techniques are focused towards requirements of high speed tracing in embedded or production systems, catering high frequency transactions, we have found that our advancements in the hardware-software domain have proved to be invaluable in detecting resource contention and latency in systems
OS-level Attacks and Defenses: from Software to Hardware-based Exploits
Run-time attacks have plagued computer systems for more than three decades, with control-flow hijacking attacks such as return-oriented programming representing the long-standing state-of-the-art in memory-corruption based exploits. These attacks exploit memory-corruption vulnerabilities in widely deployed software, e.g., through malicious inputs, to gain full control over the platform remotely at run time, and many defenses have been proposed and thoroughly studied in the past. Among those defenses, control-flow integrity emerged as a powerful and effective protection against code-reuse attacks in practice. As a result, we now start to see attackers shifting their focus towards novel techniques through a number of increasingly sophisticated attacks that combine software and hardware vulnerabilities to construct successful exploits. These emerging attacks have a high impact on computer security, since they completely bypass existing defenses that assume either hardware or software adversaries. For instance, they leverage physical effects to provoke hardware faults or force the system into transient micro-architectural states. This enables adversaries to exploit hardware vulnerabilities from software without requiring physical presence or software bugs.
In this dissertation, we explore the real-world threat of hardware and software-based run-time attacks against operating systems. While memory-corruption-based exploits have been studied for more than three decades, we show that data-only attacks can completely bypass state-of-the-art defenses such as Control-Flow Integrity which are also deployed in practice. Additionally, hardware vulnerabilities such as Rowhammer, CLKScrew, and Meltdown enable sophisticated adversaries to exploit the system remotely at run time without requiring any memory-corruption vulnerabilities in the system’s software. We develop novel design strategies to defend the OS against hardware-based attacks such as Rowhammer and Meltdown to tackle the limitations of existing defenses. First, we present two novel data-only attacks that completely break current code-reuse defenses deployed in real-world software and propose a randomization-based defense against such data-only attacks in the kernel. Second, we introduce a compiler-based framework to automatically uncover memory-corruption vulnerabilities in real-world kernel code. Third, we demonstrate the threat of Rowhammer-based attacks in security-sensitive applications and how to enable a partitioning policy in the system’s physical memory allocator to effectively and efficiently defend against such attacks. We demonstrate feasibility and real-world performance through our prototype for the popular and widely used Linux kernel. Finally, we develop a side-channel defense to eliminate Meltdown-style cache attacks by strictly isolating the address space of kernel and user memory
ISA semantics for ARMV8-A, RISC-V, and ChERI-MIPs
Architecture specifications notionally define the fundamental interface between hardware and software: the envelope of allowed behaviour for processor implementations, and the basic assumptions for software development and verification. But in practice, they are typically prose and pseudocode documents, not rigorous or executable artifacts, leaving software and verification on shaky ground.
In this paper, we present rigorous semantic models for the sequential behaviour of large parts of the mainstream ARMv8-A, RISC-V, and MIPS architectures, and the research CHERI-MIPS architecture, that are complete enough to boot operating systems, variously Linux, FreeBSD, or seL4. Our ARMv8-A models are automatically translated from authoritative ARM-internal definitions, and (in one variant) tested against the ARM Architecture Validation Suite.
We do this using a custom language for ISA semantics, Sail, with a lightweight dependent type system, that supports automatic generation of emulator code in C and OCaml, and automatic generation of proof-assistant definitions for Isabelle, HOL4, and (currently only for MIPS) Coq. We use the former for validation, and to assess specification coverage. To demonstrate the usability of the latter, we prove (in Isabelle) correctness of a purely functional characterisation of ARMv8-A address translation. We moreover integrate the RISC-V model into the RMEM tool for (user-mode) relaxed-memory concurrency exploration. We prove (on paper) the soundness of the core Sail type system.
We thereby take a big step towards making the architectural abstraction actually well-defined, establishing foundations for verification and reasoning.</jats:p
TEDDI: Tamper Event Detection on Distributed Cyber-Physical Systems
Edge devices, or embedded devices installed along the periphery of a power grid SCADA network, pose a significant threat to the grid, as they give attackers a convenient entry point to access and cause damage to other essential equipment in substations and control centers. Grid defenders would like to protect these edge devices from being accessed and tampered with, but they are hindered by the grid defender\u27s dilemma; more specifically, the range and nature of tamper events faced by the grid (particularly distributed events), the prioritization of grid availability, the high costs of improper responses, and the resource constraints of both grid networks and the defenders that run them makes prior work in the tamper and intrusion protection fields infeasible to apply. In this thesis, we give a detailed description of the grid defender\u27s dilemma, and introduce TEDDI (Tamper Event Detection on Distributed Infrastructure), a distributed, sensor-based tamper protection system built to solve this dilemma. TEDDI\u27s distributed architecture and use of a factor graph fusion algorithm gives grid defenders the power to detect and differentiate between tamper events, and also gives defenders the flexibility to tailor specific responses for each event. We also propose the TEDDI Generation Tool, which allows us to capture the defender\u27s intuition about tamper events, and assists defenders in constructing a custom TEDDI system for their network. To evaluate TEDDI, we collected and constructed twelve different tamper scenarios, and show how TEDDI can detect all of these events and solve the grid defender\u27s dilemma. In our experiments, TEDDI demonstrated an event detection accuracy level of over 99% at both the information and decision point levels, and could process a 99-node factor graph in under 233 microseconds. We also analyzed the time and resources needed to use TEDDI, and show how it requires less up-front configuration effort than current tamper protection solutions
Identifying Code Injection and Reuse Payloads In Memory Error Exploits
Today's most widely exploited applications are the web browsers and document readers we use every day. The immediate goal of these attacks is to compromise target systems by executing a snippet of malicious code in the context of the exploited application. Technical tactics used to achieve this can be classified as either code injection - wherein malicious instructions are directly injected into the vulnerable program - or code reuse, where bits of existing program code are pieced together to form malicious logic. In this thesis, I present a new code reuse strategy that bypasses existing and up-and-coming mitigations, and two methods for detecting attacks by identifying the presence of code injection or reuse payloads. Fine-grained address space layout randomization efficiently scrambles program code, limiting one's ability to predict the location of useful instructions to construct a code reuse payload. To expose the inadequacy of this exploit mitigation, a technique for "just-in-time" exploitation is developed. This new technique maps memory on-the-fly and compiles a code reuse payload at runtime to ensure it works in a randomized application. The attack also works in face of all other widely deployed mitigations, as demonstrated with a proof-of-concept attack against Internet Explorer 10 in Windows 8. This motivates the need for detection of such exploits rather than solely relying on prevention. Two new techniques are presented for detecting attacks by identifying the presence of a payload. Code reuse payloads are identified by first taking a memory snapshot of the target application, then statically profiling the memory for chains of code pointers that reuse code to implement malicious logic. Code injection payloads are identified with runtime heuristics by leveraging hardware virtualization for efficient sandboxed execution of all buffers in memory. Employing both detection methods together to scan program memory takes about a second and produces negligible false positives and false negatives provided that the given exploit is functional and triggered in the target application version. Compared to other strategies, such as the use of signatures, this approach requires relatively little effort spent on maintenance over time and is capable of detecting never before seen attacks. Moving forward, one could use these contributions to form the basis of a unique and effective network intrusion detection system (NIDS) to augment existing systems.Doctor of Philosoph
Recommended from our members
Exploitation from Malicious PCI Express Peripherals
The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol.
Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection.
To prove this thesis, we designed and built a new PCIe attack platform.
We discovered that a simple platform was insufficient to carry out complex attacks, so created the first PCIe attack platform that runs a full, conventional OS.
To allows us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), by using a device model extracted from the QEMU emulator.
We discovered a number of vulnerabilities in the PCIe protocol itself, and with the way that the defence mechanisms it provides are used by modern OSs.
The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU).
The remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access.
We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware.
We discovered that use of the IOMMU is patchy even in modern operating systems.
Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used.
These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows.
We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled.
These represent the first use of the relevant exploits in each case.
In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.EPSRC and ARM iCASE Awar
- …