Virtual Breakpoints for x86/64 by Price, Gregory Michael
Gregory Price1∗†
August 22, 2019
Abstract
Efficient, reliable trapping of execution in a program
at the desired location is a linchpin technique
for dynamic malware analysis. The progression
of debuggers and malware is akin to a
game of cat and mouse: each constantly tries
to thwart the other. At the
core of most efficient debuggers today is a combination
of virtual machines and traditional binary
modification breakpoints (int3). In this paper,
we present a design for Virtual Breakpoints,
a modification to the x86 MMU which brings
breakpoint management into hardware alongside
page tables. We demonstrate the
fundamental abstraction failures of current trapping
methods, and design a new mechanism from
the hardware up. This design incorporates lessons
learned from 50 years of virtualization and debugger
design to deliver fast, reliable trapping
without the pitfalls of traditional binary modification.
1 Introduction
In the modern age of malware analysis and “fuzzing
for dollars”, security researchers still seek a ro-
bust, transparent, efficient trapping system. The
field heavily leverages virtualization technologies
[2, 5, 6, 7, 8, 9, 14, 10]. Unfortunately, these tech-
nologies inherit the failings of traditional trap-
ping methods while adding the complexity of
architecture-specific implementations.
∗ This work was supported by Raytheon CSI.
† Gregory Price is price.gr at husky.neu.edu
Debuggers [1] and other instrumentation tools
[3][4][7][11] are becoming increasingly complex as
malware deploys equally complex anti-analysis
techniques. Many techniques require special mem-
ory allocators, compilers, or complex systems to
control the “view” of memory [10, 2]. In some
cases, the efficiency or robustness of these techniques
relies on architecture-level quirks that are
more of a happy accident than intentional
design. For example, SPIDER breakpoints [2]
rely on a caching quirk of Intel CPUs to remain
efficient, and many dynamic binary instrumentation
systems [10] are a mess of trampolines and
runtime hooking.
Despite the growing complexity, all of these
systems must trade between efficiency, reliabil-
ity, and transparency [4]. If a trapping mech-
anism introduces significant overhead, analysis
becomes tedious. Traps that can be bypassed
via detection or eviction are inherently unreliable.
Traps that modify a target program’s memory
can cause corruption. These trade-offs create
a game of whack-a-mole, which suggests we are
simply treating symptoms, rather than address-
ing the disease.
Trapping solutions such as single-stepping,
debug registers, and full system emulation have
been tried, but all fail to meet the flexibility
and efficiency requirements to reach widespread
adoption. Single-stepping a large, complex program
(like an OS) is extremely inefficient [1][2].
Usage of debug registers can be detected [2]. Finally,
pure emulation is simply no longer sufficient
to run and debug modern operating systems.
arXiv:1801.09250v3 [cs.OS] 20 Aug 2019
Researchers have made significant headway
in solving reliability and efficiency problems by
leveraging virtualization[4, 2, 3, 8], but most still
rely on some form of binary modification. The
difficulty with binary modification for debugging
hostile programs is the lack of assurance that the
modifications will not be a) detected, b) bypassed, or
c) a source of undefined behavior. This is a
result of these trapping mechanisms being de-
signed prior to the advent of modern security
and reverse engineering requirements.
“Cutting edge” x86/64 debugging systems such
as SPIDER [2] may accomplish (to an extent)
the goal of stealth and efficiency, but still fail
to mitigate the corruption and reliability problems
of binary modification breakpoints. Binary
modification systems presume well-behaved programs
and the absence of user error.
For example, if a 5-byte instruction is located at
memory address 0x10, and a trap instruction is
inserted at memory address 0x12, the result
could be any of a range of unintended consequences
(illegal instructions, jumps to random memory,
memory corruption, etc.). SPIDER cannot handle
this modification or user-error case directly; it handles
the degenerate case by simply evicting the trap and
falling back to emulation (trading reliability for efficiency).
In this paper, we propose a memory man-
agement unit extension for virtual machine de-
bugging that addresses each of these core issues.
We claim that the corruption issue is the criti-
cal piece of the puzzle that, if solved, will lead
to robust, efficient debuggers at both the system
and virtual machine level. The current field of
debuggers attempts to fix this fundamentally unsolvable
problem by building “a better
mousetrap”, but the problem they are attempting
to solve reduces to the halting problem.
Our proposed solution introduces a “break-
point buddy-frame” and adds a “breakpoint bit”
to page table entries. When a byte on a page
with this bit set is accessed, the breakpoint frame
is referenced (in hardware) to determine if that
address has a breakpoint set. This breakpoint in-
formation is stored on a byte-per-byte basis with
the data frame.
For each byte read from a breakpointed page,
an 8-bit value is retrieved from the breakpoint
buddy-frame and used to determine if an inter-
rupt is generated. This byte implements stan-
dard read/write/execute breakpoint settings, and
generates a debug break prior to executing the
target instruction if the conditions are met. The
remaining bits are reserved for future development.
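As a minimal sketch of how such a breakpoint byte might be interpreted, consider the following. The bit assignments (BP_READ, BP_WRITE, BP_EXEC) and the function name should_trap are illustrative assumptions; the design only states that read/write/execute settings occupy part of the 8-bit value and that the rest is left open.

```python
# Hypothetical bit layout for one breakpoint byte; the encoding is an assumption.
BP_READ = 0x01
BP_WRITE = 0x02
BP_EXEC = 0x04
BP_RESERVED_MASK = 0xF8  # remaining bits, left open for future use


def should_trap(bp_byte, access):
    """Return True if an access of kind 'r', 'w' or 'x' matches the breakpoint byte."""
    wanted = {"r": BP_READ, "w": BP_WRITE, "x": BP_EXEC}[access]
    return bool(bp_byte & wanted)
```

A byte holding BP_READ | BP_WRITE would behave like a conventional watchpoint, while BP_EXEC alone would behave like a conventional execution breakpoint.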
By doing away with binary modification as
the de facto standard of breakpointing, we gain
reliability, transparency, and guaranteed correct-
ness of target program execution — all without
sacrificing flexibility or efficiency.
2 Background
Debuggers and dynamic analysis tools traditionally
implement breakpoints in one of three ways:
single-stepping, debug registers, and binary modification
[1]. None of these mechanisms is without
flaws, and much research [3, 4, 5, 6, 7, 8, 11]
has been done on mitigating the long
list of issues.
In a review of debugging technology published
in 1990 [16], Vern Paxson lays out a list of
techniques for debugging software that is eerily
familiar. Despite almost 30 years of research
since then, few if any new hardware-supported
debugging capabilities have been developed by
major hardware manufacturers. Even the “new”
ARMv8.5 Memory Tagging Extensions (MTE)
are not truly new, with Paxson discussing fully
tagged architectures existing as far back as 1982
(possibly earlier).
Despite modern tools trying to mitigate the
flaws of dated trapping mechanisms, each subse-
quent system increases complexity, reduces effi-
ciency, or narrows in applicability. All still fall
prey to anti-debugging techniques such as timing
attacks, code integrity checks, and just-in-time
compilation or code relocation.
2.1 Traditional Trapping Methods
Each established method of trapping exhibits its
own unique failures. Each struggles to
achieve flexibility, efficiency, transparency, and
reliability, but none achieves all four.
Single-stepping approaches make executing
large, complex programs unbearably slow. Single-stepping
is typically implemented via the trap flag in the
eflags/rflags register, and requires a full context
switch between debugger and debuggee on each
instruction. This can push execution time to be orders
of magnitude longer than the original program
[2]. Even emulation approaches, which can be
viewed as a form of binary interpreter, are simply
too slow for general use.
Debug registers are an efficient but finite resource
that is neither practical nor easily virtualized.
On x86 there are only 4 debug address registers
(DR0-DR3), limiting a debugger to 4 total
watch/break points. Further, because this is
a physical register limitation, the debuggee can
typically detect whether these registers are being
used [2][10].
Finally, while binary modification meets the
requirement of efficient execution, it is easily de-
tected by a debuggee which monitors the integrity
of its own code [2, 10, 5, 6]. On x86, these traps
are implemented by placing an “int3” (debug
break) instruction at the given address. To see
the problem with this technique, imagine a de-
buggee that periodically hashes its entire read-
only codebase. It would be able to detect this
change, and modify its behavior.
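The self-hashing scenario can be sketched in a few lines. This is an illustrative toy rather than any particular malware's check: the six code bytes and the helper name code_digest are assumptions chosen for the example.

```python
import hashlib

INT3 = 0xCC


def code_digest(code):
    """Hash a code region the way a self-checking debuggee might."""
    return hashlib.sha256(bytes(code)).hexdigest()


# push rbp; mov rbp, rsp; pop rbp; ret
original = bytearray([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])
baseline = code_digest(original)

# The debugger plants a binary modification breakpoint on the first instruction.
patched = bytearray(original)
patched[0] = INT3

# The debuggee re-hashes its code and compares against the stored baseline.
tampered = code_digest(patched) != baseline
```

Any single-byte change to the hashed region alters the digest, so the debuggee needs no knowledge of where the trap was placed.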
It is still not apparent whether these traditional
breakpointing methods can meet the requirements
of an “optimal” solution. In fact, we
claim binary modification produces an abstraction
that may guarantee it can never provide an
“optimal” solution.
2.2 “Stealthy” VM Breakpoints
The first virtual machine breakpointing systems
utilized binary modification as a primary trap-
ping mechanism. This carried with it all the typ-
ical headaches, namely that it is easily detected
and may cause corruption. Much research has
been done to hide these breakpoints, but only
recently has an efficient solution emerged [2].
When Intel and AMD released virtualized Page
Table support (known as Extended or Nested
Page Tables [12, 13]), the idea of transparent bi-
nary modification came to fruition with SPIDER
[2]. SPIDER introduced a binary modification
mechanism that made use of extended page ta-
bles to split the “read-write” view of guest data
from the “execute” view of data. By doing so, a
guest cannot view a trap set via binary modifi-
cation, because the guest may only read from a
sanitized view of memory.
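The split-view idea can be modelled as two copies of the same page. This is a toy model under stated assumptions, not SPIDER's implementation, which achieves the split through extended page table permissions; the class and method names are hypothetical.

```python
INT3 = 0xCC


class SplitViewPage:
    """Toy model: one page with separate read/write and execute views."""

    def __init__(self, code):
        self.read_view = bytearray(code)  # what guest loads and stores see
        self.exec_view = bytearray(code)  # what instruction fetch sees

    def set_breakpoint(self, offset):
        # The trap is written only into the execute view.
        self.exec_view[offset] = INT3

    def guest_read(self, offset):
        return self.read_view[offset]

    def fetch(self, offset):
        return self.exec_view[offset]


page = SplitViewPage(b"\x90\x90\xc3")  # nop; nop; ret
page.set_breakpoint(1)
```

Note that the execute view still mixes debugger bytes with debuggee bytes, which is the property the following sections take issue with.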
Unfortunately, this transparent breakpoint-
ing system still fails to be sufficiently flexible
and reliable. First, it falls victim to the “Crit-
ical Byte Problem” which will be described in
the next section. Second, because the host must
maintain consistency between data and execution
views, the guest still has a mechanism (writing
to its code pages) with which to force a trap
eviction.
Moving forward, we will demonstrate that
systems relying on binary modification cannot
achieve perfect flexibility and reliability. If we
hope to accomplish truly transparent and effi-
cient breakpointing, then we must design it from
the hardware up.
3 Overview
In this section we discuss the goals of an “opti-
mal” trapping system. Next, we will construct
an abstraction of how present-day trapping sys-
tems are implemented and identify violations of
our goals. What we find is that current systems
are attempting to solve an impossible problem.
3.1 Goals
To start, we borrow the requirements laid out by
SPIDER[2], with one modification made to the
requirement of Flexibility.
1. Flexibility : Ability to set a trap at any
memory address.
2. Efficiency : Maintain high performance
3. Transparency : The target program should
not be able to detect breakpoints
4. Reliability : The target program should
not be able to bypass or tamper with break-
points.
When we modify flexibility to state any mem-
ory address instead of any instruction, we find
that no binary modification solution can accom-
plish all four requirements. Even the best sys-
tem, given a degenerate case, falls back on al-
ternate trapping methods which violate at least
one requirement. For example, when a debuggee
overwrites instrumented code under SPIDER, bi-
nary modification traps must be evicted from the
modified page to avoid introducing undefined behavior.
We have dubbed this issue the “Critical
Byte Problem”.
3.2 The “Critical Byte Problem”
Binary modification breakpoints depend on the
execution of an instruction to trigger a breakpoint
interrupt (Figure 1). On x86, this is accomplished
by replacing the first byte of an instruction
with an int3 instruction (opcode 0xCC)
(Figure 2). This dependence on execution of debuggee
code to determine debugger behavior is a
form of implicit trust. The debugger trusts the
binary will behave nicely, while the debuggee
trusts that the modification will not introduce
undefined behavior.
This causes what we have dubbed the “Criti-
cal Byte Problem”. If a binary modification trap
is set on any byte other than the first of an in-
struction, it introduces undefined behavior. Fig-
ure 2 shows an example of how a jmp instruction
can be overwritten with an int3 instruction, or
modified to jump to the incorrect place.
The processor reads and evaluates the first
byte of an instruction to determine the general
behavior and the full length of the instruction.
When a trap is aligned correctly, the instruction
is effectively replaced in its entirety. When a
trap is misaligned, the 0xCC byte is decoded as
part of the jump address.
Figure 1: Standard x86 processor Read-Eval-Execute
loop. Breakpoints fire on execute.
Figure 2: Misaligned breakpoints cause undefined
behavior
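The aligned and misaligned cases can be worked through with a toy decoder. The sketch below assumes a 5-byte jmp rel32 (opcode 0xE9) located at address 0x10 with an arbitrary displacement of 0x100; the decoder models only the two opcodes needed for the example and is not a general x86 decoder.

```python
import struct

INT3 = 0xCC
JMP_REL32 = 0xE9  # x86 near jump with a 32-bit relative displacement


def decode(code, base, addr):
    """Decode the one instruction at addr; only int3 and jmp rel32 are modelled."""
    off = addr - base
    opcode = code[off]
    if opcode == INT3:
        return ("int3", None)
    if opcode == JMP_REL32:
        (rel,) = struct.unpack_from("<i", code, off + 1)
        return ("jmp", addr + 5 + rel)  # target is relative to the next instruction
    raise ValueError("opcode not modelled")


BASE = 0x10
original = bytearray([JMP_REL32, 0x00, 0x01, 0x00, 0x00])  # jmp 0x115

aligned = bytearray(original)
aligned[0x10 - BASE] = INT3  # trap on the first byte: the jump becomes int3

misaligned = bytearray(original)
misaligned[0x12 - BASE] = INT3  # trap on the third byte: the displacement changes
```

With the trap at 0x12 the instruction still decodes as a jump, but its displacement becomes 0xCC00, so control transfers to 0xCC15 instead of 0x115 and no breakpoint ever fires.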
What Figure 2 shows is that our “optimal
system” should not depend on the execution of
debuggee instructions, nor should it modify de-
buggee data. We should avoid trusting the de-
buggee to provide debugger functionality, and we
should maintain the integrity of debuggee data
to guarantee correctness.
Systems such as SPIDER attempt to build
complex solutions to this problem, but only end
up trading efficiency for reliability. Their sys-
tem does not correct for user-error, and when
instrumented code pages are modified, a “Code
Modification Handler” is invoked — usually to
evict the trap and fall back to emulation. Fur-
ther, no binary modification breakpoint system
can programmatically re-set a breakpoint over
data that has been overwritten by the debuggee,
as determining where in an arbitrary program is
“safe” to instrument maps directly to the halt-
ing problem (the problem depends entirely on
the behavior of the program).
What we find is that the critical byte problem
is a symptom of a larger abstraction failure. The
next section will de-construct this failure, and
show how providing a proper virtualization layer
can mitigate the problem.
3.3 Memory Abstractions
It’s useful to step back and look at how data
in a virtualized system is segregated by owner-
ship. First, in Figure 3, we examine virtual mem-
ory implementation on x86. The page tables, in
software or in hardware, can be viewed as being
owned by the Operating System. Conversely, the
contents of the data frame can be viewed as being
owned by the software. Most Operating Systems
do not allow programs to directly modify page
table contents for security reasons. This concept
of virtual memory will be useful in designing our
optimal breakpoint solution later.
Figure 3: Virtual Memory on x86
Virtualization extensions for x86 on both In-
tel [12] and AMD [13] processors provide a sec-
ond layer of memory virtualization for virtual
machines. Extended (Intel) or Nested (AMD)
Page Tables, as in figure 4, create two layers of
page tables, where each guest memory reference
must go through a second layer of translation to
reach the physical memory frame. These page
tables are appropriately named the “Guest Page
Table” and the “Host Page Table”.
Figure 4: Extended Page Tables still share a
single frame for data and breakpoints
When we build a similar abstraction model
for SPIDER (Figure 5), we see a pattern emerge
when we focus on “ownership” of resources.
Figure 5: SPIDER also uses a frame containing
data and breakpoints
In all three abstractions, we can see that de-
bugger and debuggee data are contained within
the same frame when binary modification traps
are introduced. A userland debugger, such as
GDB, places int3 instructions directly into the
program’s data frame. The same mechanism is
used in naive virtual machine debugging, as seen
in Figure 4. Likewise under SPIDER, despite the
read/write view of program data being sanitized
of debugger data, the execution view still holds
both debugger and debuggee data.
Given this type of trap and debugger design,
an instrumented program can trivially
read and/or interact with breakpoints — or cause
a change in debugger behavior (via eviction). A
program debugged under SPIDER, despite not
being able to see the breakpoints, can still force
an eviction by blindly overwriting its own code
pages (a simple code-migration technique may
be sufficient). There exists no mechanism with
which to prevent this, as the program is trusted
with complete control over its data.
In summary, even the most effective mech-
anism today fails to fully mitigate breakpoint
eviction (either outright, or efficiently) due to
placing debugger and debuggee data in the same
data frame. When a breakpoint overwrite oc-
curs, current solutions must either fall back to
another execution method, use a more complex
trapping mechanism, or simply evict the breakpoint
altogether.
In the following section, we will propose a
system which entirely separates debuggee data
from debugger data (breakpoints). This simpli-
fies the handling of degenerate cases, mitigates
the “Critical Byte Problem”, and inherently pro-
vides transparency.
4 Design
When new hardware-supported trap mechanisms
stopped appearing, malware analysis was not a
global industry, and the challenges and requirements
related to the work were hardly dreamt of. Given
this, it is worth exploring extensions which may
require new hardware support. Our focus will
be on solving the “Critical-Byte Problem” with
a new virtualization extension to the x86 Mem-
ory Management Unit (MMU).
When modifying the memory management
unit of a processor, some care must be taken
to extend it in such a way that is reasonably
friendly to existing operating systems. Our design
is no exception, so we must identify a solution
that leverages existing operating system
structures. Likewise, we must retain backwards
compatibility with all existing trapping mecha-
nisms.
4.1 Breakpoint Virtualization Layer
Unlike software breakpoint systems,
a hardware-based breakpoint virtualization
system faces extremely strict limitations and
requirements. We claim any MMU modification
that supports a separate view of data and break-
points must adhere to (at a minimum) the fol-
lowing three requirements to have a chance at
being adopted:
1. TLB Compatibility
2. Hardware-Friendly Translation Mechanism
3. Flexible Allocation and Use
To be TLB compatible, any design must be
limited to using only the machine physical ad-
dress and the page size. When using a virtual
machine, the use of host virtual addresses is pro-
hibited because the TLB stores only a direct
translation from guest virtual to machine physical
address [12]. The page size is available because
the operating system sets up the MMU to walk a
predetermined structure; however, this could also be
configurable via a new Model Specific Register.
Next, the mechanism of translation from a
data address to a breakpoint address must be
simple and fast. Requiring multiple additional
memory dereferences during an instruction or
data fetch may not be feasible.
Finally, the mechanism must be optional. Re-
quiring constant use on all pages would effec-
tively halve the size of usable memory and in-
troduce significant overhead. While this may be
considered a fair trade off for a specialized sys-
tem, a general purpose system must allow op-
tional use.
Figure 6: Individually contiguous buddy frames
avoid the need for large contiguous areas of mem-
ory
When applying the first two requirements strictly,
we find a design that uses physically contigu-
ous page frames to be the obvious solution. We
call this a Buddy Frame. Implementing buddy
frames on a per-frame basis, rather than in bulk,
allows us to avoid allocating large contiguous
areas of memory (Figure 6). This frame could be
global (1 per physical frame) or per-cpu (1 buddy
frame per physical or virtual cpu).
A per-cpu buddy frame system would require
the use of the cpu id (0-n) during address transla-
tion to determine the offset from the data frame
to buddy frame. It may require additional TLB
and Cache invalidation as well. There are some
complexities with a per-cpu option that are not
considered within this work. We leave this de-
sign extension up to an implementer to experi-
ment with. For the remainder of this work, we
will work with a single buddy frame design.
The translation from data address to breakpoint
address is then a simple addition of the
page-size (Figure 7). This also makes our solution
extensible to operating systems configured
to use pages larger than the standard 4KB as
long as this information is retrievable in hardware
by the MMU.
Figure 7: Virtualization layer to translate physical
address to buddy frame addresses
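The translation amounts to a single addition. The sketch below assumes the single buddy frame design, with the buddy frame placed immediately after its data frame in machine-physical memory; the function name and the 2MB constant are illustrative.

```python
PAGE_4K = 4096
PAGE_2M = 2 * 1024 * 1024


def buddy_address(data_phys_addr, page_size=PAGE_4K):
    """Machine-physical address of the breakpoint byte paired with a data byte."""
    return data_phys_addr + page_size
```

For a data byte at physical address 0x5123 on a 4KB page, the paired breakpoint byte sits at 0x6123; only the page size has to be known to the hardware.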
The last requirement dictates an agreement
between the operating system and the MMU about
how page frames will be allocated and managed.
This is accomplished through the use of a page
table entry which has software-defined bits.
Figure 8: Virtual Breakpoints. Data is
“debuggee-only”, Breakpoints are “debugger-
only”.
We propose the addition of a “Breakpoint
Bit” to the page table entry (figure 8). Much
like the “present bit” determines whether the
MMU produces a page-fault, the breakpoint-bit
would determine whether the MMU would do a
subsequent breakpoint-frame lookup for an ac-
cessed address. Breakpoint info would be stored
in byte-for-byte parity with the data frame. Fig-
ure 8 also shows that our new “ownership” ab-
straction has resolved the issue of debugger and
debuggee data living within the same data frame.
This design can be seen as applying the idea
of Virtual Memory (Figure 3) to the concept of
breakpoints. This is the provenance of the name
Virtual Breakpoint.
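The lookup path can be modelled in software as follows. This is a sketch under several assumptions: a single-level page table held in a dictionary, a breakpoint bit placed at PTE bit 9 (one of the bits x86 leaves available to software), and an R/W/X encoding of the breakpoint byte. None of these choices is fixed by the design.

```python
PAGE_SIZE = 4096
PTE_PRESENT = 1 << 0
PTE_BREAKPOINT = 1 << 9  # assumed position of the breakpoint bit
BP_READ, BP_WRITE, BP_EXEC = 0x01, 0x02, 0x04  # assumed encoding
BP_MASK = {"r": BP_READ, "w": BP_WRITE, "x": BP_EXEC}


class PageFault(Exception):
    pass


class DebugTrap(Exception):
    pass


def access(memory, page_table, vaddr, kind):
    """Translate vaddr, consult the buddy frame if needed, return the data byte touched."""
    pte = page_table.get(vaddr // PAGE_SIZE, 0)
    if not pte & PTE_PRESENT:
        raise PageFault(hex(vaddr))
    paddr = (pte & ~(PAGE_SIZE - 1)) + vaddr % PAGE_SIZE
    if pte & PTE_BREAKPOINT:
        bp_byte = memory[paddr + PAGE_SIZE]  # buddy frame follows the data frame
        if bp_byte & BP_MASK[kind]:
            raise DebugTrap(hex(vaddr))  # raised before the access completes
    return memory[paddr]


# One data frame at 0x1000 with its buddy frame at 0x2000.
memory = bytearray(4 * PAGE_SIZE)
page_table = {0: 0x1000 | PTE_PRESENT | PTE_BREAKPOINT}
memory[0x1000 + 0x10] = 0x90     # debuggee data: a nop at virtual address 0x10
memory[0x2000 + 0x10] = BP_EXEC  # debugger data: break on execute
```

Reading virtual address 0x10 returns the unmodified 0x90, while executing it raises the trap; the data frame never contains a 0xCC byte.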
4.2 OS Modifications
Given the above hardware implementation, the
OS modifications to support virtual breakpoints
are straightforward. An operating system must
provide four new mechanisms:
1. Buddy Frame Allocation
2. Page Table Entry “Breakpoint Bit”
3. Breakpoint Manipulation API
4. Buddy Frame Swap Compatibility
The first two mechanisms are solved by im-
plementing a new kernel allocator option. This
option would allocate two physically contiguous
frames, and set the breakpoint bit on the data
frame’s page table entry. Buddy Frame Alloca-
tion may be flexibly applied if Copy-On-Write
is required for an instrumented page. The in-
heriting task may either automatically allocate
its own buddy page during Copy-On-Write, or it
may leave it behind. The author notes that this
is an example of how such a hardware extension
may open new opportunities for debugger design.
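A minimal sketch of such an allocator option is shown below, assuming free frames are tracked as a set of frame numbers and that the breakpoint bit lives at PTE bit 9. The function name alloc_with_buddy and the bit position are hypothetical and do not correspond to any existing kernel interface.

```python
PAGE_SIZE = 4096
PTE_PRESENT = 1 << 0
PTE_BREAKPOINT = 1 << 9  # assumed position of the breakpoint bit


def alloc_with_buddy(free_frames):
    """Take two adjacent frame numbers from the free set; return (data_frame, pte)."""
    for frame in sorted(free_frames):
        if frame + 1 in free_frames:
            free_frames.discard(frame)      # data frame
            free_frames.discard(frame + 1)  # buddy frame, physically contiguous
            pte = frame * PAGE_SIZE | PTE_PRESENT | PTE_BREAKPOINT
            return frame, pte
    raise MemoryError("no two physically contiguous free frames")
```

Because only two adjacent frames are needed per instrumented page, fragmentation of the free list matters far less than it would for a bulk contiguous allocation.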
The third mechanism is a new kernel API
which allows for the breakpoint frame to be ac-
cessed in a pre-defined manner by the debugger.
A per-byte breakpoint API could be used for pre-
cision, and a per-page optimization could be pro-
vided if large breakpoints are needed. This API
gives debugger and instrumentation developers a
standardized way to set breakpoints.
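One possible shape for this API is sketched below, operating on a buddy frame modelled as a bytearray. The function names and the R/W/X flag encoding are assumptions made for illustration; a real kernel interface would also need to validate that the caller is the debugger.

```python
PAGE_SIZE = 4096
BP_READ, BP_WRITE, BP_EXEC = 0x01, 0x02, 0x04  # assumed encoding


def set_breakpoint(buddy_frame, offset, flags):
    """Per-byte precision: arm the given conditions on one address."""
    buddy_frame[offset] |= flags


def clear_breakpoint(buddy_frame, offset, flags):
    """Disarm the given conditions on one address, leaving others intact."""
    buddy_frame[offset] &= ~flags & 0xFF


def set_page_breakpoint(buddy_frame, flags):
    """Per-page optimization: arm the given conditions on every byte of the page."""
    for offset in range(PAGE_SIZE):
        buddy_frame[offset] |= flags
```

Setting a breakpoint at offset 0x12 of a 5-byte instruction that starts at 0x10 is harmless here, since only the buddy frame is written.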
This system places no limitation on the accessibility
of debuggee data by the debugger, so traditional
binary modification breakpoints can still
be used. This retains backward compatibility for
all existing debugging platforms.
Finally, in a naive system, care must be taken
not to swap out buddy frames when their data-
frame counterparts are still in memory (and vice-
versa). The operating system’s swapping algo-
rithm could account for this by pinning the frame
or by utilizing an additional bit in the Page Ta-
ble Entry that denotes whether a given frame is
itself a breakpoint frame. We leave this
as future work for OS and debugger designers.
4.3 Analysis of Design
Using the above design, we re-examine the orig-
inal design goals we set out to accomplish.
• Flexibility: Breakpoints can be set on any
address, and the allocation of buddy-frames
is limited to just those pages where break-
points are set.
At worst the number of frames increases
linearly with the number of breakpoints.
This can double memory usage. However,
we note that SPIDER breakpoints suffer
from the same problem [2].
• Efficiency: Debuggee execution now occurs
purely in bare-metal without requiring ad-
ditional instrumentation for misbehaved pro-
grams.
A debuggee now executes and performs I/O
on the same data frame without incurring
a context switch. A context switch on sys-
tems such as SPIDER can take thousands
of instructions [2], and occurs twice per
fault (guest exit and entry). Instead, it is
possible this solution may reduce this cost
by distributing a small number of cycles
per instruction on an instrumented page.
When the number of breakpoints is small
and concentrated to a small portion of code
(a typical use-case), performance suffers only
when utilizing an instrumented page. In
fact, performance should be no worse than
SPIDER’s stealthy breakpoints for self-modifying
code given the similar design. As the number
of breakpoint frames grows, the number
of memory references grows by a factor of
at most 2 (one for data, one for breakpoint
information).
• Transparency: Complete segregation of de-
buggee and debugger data ensures the de-
buggee has no avenue with which to see
breakpoint information.
• Reliability: Complete segregation of debuggee
and debugger data ensures the debuggee
has no avenue with which to affect break-
point information.
In the worst case scenario, a breakpoint is set
on every page of a target. This doubles memory
usage and causes an additional memory refer-
ence to occur per memory reference. In both
cases, existing systems exhibit at least as bad
performance, if not worse. SPIDER breakpoints
double memory usage per utilized page and can
cause thousands of instructions of overhead per
memory access on an instrumented page.
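The worst-case bounds for this design can be checked with simple arithmetic. The two helpers below are a back-of-the-envelope model with hypothetical names; they count frames and memory references only and ignore caching and TLB effects.

```python
def frames_in_use(data_pages, instrumented_pages):
    """Data frames plus one buddy frame per instrumented page."""
    return data_pages + instrumented_pages


def memory_references(plain_accesses, instrumented_accesses):
    """Each access to an instrumented page costs one extra buddy-frame reference."""
    return plain_accesses + 2 * instrumented_accesses
```

With 1000 data pages and 10 of them instrumented, 1010 frames are in use; only when all 1000 are instrumented does usage reach the 2000-frame, factor-of-two worst case.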
5 Implementation
This section provides a roadmap for implement-
ing a prototype of a virtual breakpoint system.
At first, we propose implementing the solution
in an emulator to prove feasibility, and then ex-
tending an existing hardware platform (poten-
tially via FPGA) to prove efficiency. However,
notably, neither implementation would be useful
beyond further academic exploration of the prob-
lem, since extending an emulator would lead to
severe inefficiencies and existing open-architectures
tend not to exhibit the critical byte problem.
5.1 Modify QEMU i386 MMU
• tlb pte breakpoint bit checking
• page table walking bit checking
• buddy frame lookups when breakpoint bit
is set
• raise an exception when breakpoint terms
are met
5.2 Modified Linux Kernel
• page table entry breakpoint-bit modifica-
tion
• breakpoint page allocation option/interface
for buddy frames
• breakpoint management interface (set break-
point on given address)
• virtual breakpoint interrupt handler
• add memory mapping modifier (modify non-
breakpointed page to add virtual break-
point page)
• memory swap algorithm modification to avoid
swapping buddy-pages
5.3 Modified Debugger
• add/remove virtual breakpoint page com-
mand
• add/remove virtual breakpoint command
• add/remove virtual watchpoint command
• virtual breakpoint signal handler
6 Future Work
By solving the critical-byte problem, we open up
a number of possibilities - including new types
of breakpoints, and implementing more efficient
code coverage systems. This section discusses
these potential extensions.
6.1 New breakpoint types
• Break on Instruction Fetch
Presently, instruction-fetch breakpoints are
not possible (except for workarounds like
debug registers and page table permission
twiddling). A bit in the 8-bit breakpoint
field on a buddy-page could be utilized to
cause a breakpoint interrupt to fire on in-
struction fetch.
• Nesting
The virtualization layer could be implemented such
that a guest can set up another layer of
virtual breakpointing. This would allow for
introspection of a hostile virtual machine
within another virtual machine.
• Coverage/Taint
A taint-propagation debugger can be de-
veloped that copies breakpoint-bits when-
ever data is transferred from one area of
memory to another. For example, when
external data is injected into memory (via
inbound network traffic), the storage lo-
cation has coverage bits set in its buddy
frame. As this data propagates to other
areas of memory, the coverage bits propa-
gate with them.
• “Hook Points”
A specialized interrupt handler could be
registered with the operating system that
is called whenever a “hook point” interrupt
is fired. The “Hook Point” is set on an ad-
dress, and the address is entered into a map
accessed by the interrupt handler. Upon
executing / accessing the hooked address,
the debuggee exits directly to the interrupt
handler for dispatching. This is exception-
ally useful for virtual machines, where dis-
patching could be handled directly upon
VMExit (similar to a vmcall).
This could be useful for recording inbound
or outbound non-deterministic events for
checkpoint/restart and replay systems (such
as clock reads via RDTSC).
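The coverage/taint extension described above can be sketched with a flat memory model in which data and buddy bytes are held in parallel arrays. The TAINT bit value and the function name are assumptions; the design only says that one of the open bits could be used this way.

```python
TAINT = 0x08  # assumed: one of the open bits in the breakpoint byte


def tainted_copy(data, buddy, dst, src, length):
    """Copy bytes within a flat memory model and carry their buddy bytes along."""
    data[dst:dst + length] = data[src:src + length]
    buddy[dst:dst + length] = buddy[src:src + length]


data = bytearray(64)
buddy = bytearray(64)

# External input lands at offset 0 and is marked in the buddy frame.
data[0:4] = b"EVIL"
buddy[0:4] = bytes([TAINT]) * 4

# The debuggee moves the input elsewhere; the marks follow it.
tainted_copy(data, buddy, 32, 0, 4)
```

After the copy, offsets 32 through 35 carry the taint bit while neighbouring bytes remain clean, without any change to the debuggee's own data.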
7 Related Work
The idea of segregating “views” of data is not
necessarily novel. A number of technologies[2] [5]
[6] [14] employ software to modify shadow page
tables, extended page tables, and interrupt han-
dlers to mask certain data from guest machines.
Overshadow[14] provides a mechanism with
which to hide (encrypt) the contents of a
guest-process’s memory from the guest kernel. The
researchers’ goals were to provide a mechanism
with which a guest process could execute securely,
even if the guest operating system was actively
hostile. Overshadow accomplished this by ex-
tending the original form of page table virtual-
ization released by Intel and AMD - Shadow page
tables.
Overshadow implemented a shadow page ta-
ble containing multiple mappings (encrypted and
unencrypted) of a guest’s physical memory, and
actively tracks the “identity” of the guest pro-
cess attempting to read a page. When the ac-
cessing process does not have the correct iden-
tity, an encrypted page is presented (on read) or
the machine is terminated (on write). When the
accessing process does have the correct identity,
an unencrypted page is presented with the per-
missions originally granted to the page. Unfor-
tunately, its dependence on shadow page tables
means it’s likely to be too inefficient [15] for mod-
ern operating systems and heavy load systems.
VAMPiRE [5] takes advantage of virtual memory
page permissions to force instrumented pages to
be executed via alternate
methods. In particular, the developers chose to
implement a single-step handler which executes
any instruction (or accesses any data) falling on
a page which contains a breakpoint. This is
done to ensure any reads and overwrites of break-
points are not possible, and so the integrity of
guest data is preserved. Unfortunately, relying
on single-stepping and what are effectively page-
length breakpoints limits the flexibility of the
breakpoint system. For example, if a program
is contained entirely within a single page, or a
breakpointed page contains “hot” code, then speed
will suffer dramatically.
Finally, Spider[2] comes the closest to im-
plementing a solution that maximizes efficiency,
while retaining the flexibility of traditional bi-
nary modification breakpoints. Similar to VAM-
PiRE, it leverages virtual page permissions to de-
termine what “view” of memory should be pro-
vided to the debuggee. On read/write, a “sani-
tized” view of memory (sans breakpoints) is pro-
vided to hardware to prevent detection. On exe-
cute, the modified page (containing int3 instruc-
tions) is provided for execution.
Spider maintains efficiency by leveraging a
quirk in caching mechanisms, which allows both
views to be cached (one in the instruction cache,
one in the data cache), limiting the number of
VM Exits required to expose the correct data on
a given access. Unfortunately, as previously dis-
cussed, Spider still falls prey to some forms of
classic anti-debugging techniques (such as over-
writing) due to its reliance on binary modifica-
tion.
Together, these works show a trend of researchers
attempting to split the view of data
based on whether the debuggee is trusted. In
Overshadow, some parts of the debuggee are trusted
but not others. In VAMPiRE and Spider, the
debuggee is not trusted to read or write instru-
mented memory. Recognizing the trust violation
was the primary inspiration for producing fully
segregated data and breakpoint frames.
8 Conclusion
In this paper we demonstrated the problems with
traditional and modern introspection systems to
show a clear need for a fully virtualized break-
point solution. As we broke down various break-
pointing techniques, we identified two fundamen-
tal issues with current solutions. Either these so-
lutions were inefficient, or they traded integrity
and correctness of execution for efficiency.
We proposed a virtualization extension for
the memory management unit. We built this
from the ground up with the goals of an optimal
system in mind. Our solution is novel in that it
treats debuggee execution as a fundamentally
untrusted action, and segregates all debug-related
information into a separate buddy-frame
accessible only by the debugger. Unlike other so-
lutions which make heavy use of binary modifica-
tion, our system retains integrity of breakpoints
and correctness of execution by not requiring the
modification of debuggee data.
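The segregation property can be illustrated with a minimal software model. This is only a sketch under stated assumptions: the class `Frame`, the exception `Trap`, and the one-flag-per-byte buddy layout are hypothetical choices made here, not the hardware encoding proposed in this paper.

```python
class Trap(Exception):
    """Stands in for a breakpoint exit to the debugger."""


class Frame:
    """Toy data frame paired with a debugger-only buddy frame."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)        # debuggee-visible, never patched
        self.buddy = bytearray(len(data))  # assumed layout: nonzero = breakpoint

    # Debugger side
    def set_breakpoint(self, offset: int) -> None:
        self.buddy[offset] = 1

    # Debuggee side: none of these methods can reach the buddy frame
    def read(self, offset: int) -> int:
        return self.data[offset]

    def write(self, offset: int, value: int) -> None:
        self.data[offset] = value

    def fetch(self, offset: int) -> int:
        if self.buddy[offset]:
            raise Trap(offset)
        return self.data[offset]


frame = Frame(bytes([0x55, 0x48, 0x89, 0xE5]))
frame.set_breakpoint(0)
original_byte = frame.read(0)  # the breakpoint left debuggee data untouched
frame.write(0, 0x90)           # debuggee overwrites the breakpointed byte
```

In this model the debuggee's overwrite changes only its own data; a later fetch at offset 0 still raises `Trap`, because the breakpoint lives in the buddy frame.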
We believe that this Virtual Breakpointing
design is the solution that virtual machine
introspection and interposition systems have been
looking for. It presents opportunities for the
development of brand new types of breakpoints, and
is efficient enough to run commodity operating
systems under test.
References
[1] Gdb. http://www.gnu.org/software/gdb/.
[2] Deng, Z., Zhang, X., & Xu, D. (2013, De-
cember). Spider: Stealthy binary program
instrumentation and debugging via hard-
ware virtualization. In Proceedings of the
29th Annual Computer Security Applica-
tions Conference (pp. 289-298). ACM.
[3] U. Bayer, C. Kruegel, and E. Kirda. TTAnalyze:
A tool for analyzing malware. In
EICAR06.
[4] D. Bruening. Efficient, transparent, and
comprehensive runtime code manipulation.
PhD thesis, 2004.
[5] Vasudevan, A., & Yerraballi, R. (2005, De-
cember). Stealth breakpoints. In Computer
security applications conference, 21st An-
nual (pp. 10-pp). IEEE.
[6] Vasudevan, A. (2009, October). Re-inforced
stealth breakpoints. In Risks and Security
of Internet and Systems (CRiSIS), 2009
Fourth International Conference on (pp. 59-
66). IEEE.
[7] Dinaburg, A., Royal, P., Sharif, M., & Lee,
W. (2008, October). Ether: malware anal-
ysis via hardware virtualization extensions.
In Proceedings of the 15th ACM conference
on Computer and communications security
(pp. 51-62). ACM.
[8] Willems, C., Hund, R., Fobian, A., Felsch,
D., Holz, T., & Vasudevan, A. (2012, De-
cember). Down to the bare metal: Using
processor features for binary analysis. In
Proceedings of the 28th Annual Computer
Security Applications Conference (pp. 189-
198). ACM.
[9] Vogl, S., & Eckert, C. (2012, April).
Using hardware performance events for
instruction-level monitoring on the x86 ar-
chitecture. In Proceedings of the 2012 Euro-
pean Workshop on System Security EuroSec
(Vol. 12).
[10] R. R. Branco, G. N. Barbosa, and
P. D. Neto. Scientific but not academ-
ical overview of malware anti-debugging,
anti-disassembly and anti-vm technologies.
Blackhat USA12.
[11] P. Feiner, A. D. Brown, and A. Goel. Com-
prehensive kernel instrumentation via dy-
namic binary translation. In ASPLOS12
[12] Intel. "Intel(R) 64 and IA-32 Architectures
Software Developer's Manual." Volume
3A: System Programming Guide, Part 1.
[13] AMD. "Developer Guides,
Manuals, & ISA Documents."
https://developer.amd.com/resources/developer-guides-manuals/.
[14] Chen, X., Garfinkel, T., Lewis, E. C., Sub-
rahmanyam, P., Waldspurger, C. A., Boneh,
D., ... & Ports, D. R. (2008, March). Over-
shadow: a virtualization-based approach to
retrofitting protection in commodity oper-
ating systems. In ACM SIGARCH Com-
puter Architecture News (Vol. 36, No. 1, pp.
2-13). ACM.
[15] Adams, K., & Agesen, O. (2006). A com-
parison of software and hardware techniques
for x86 virtualization. ACM SIGOPS Oper-
ating Systems Review, 40(5), 2-13.
[16] Paxson, V. (1990). A Survey of Support for
Implementing Debuggers.
[17] Jeffk.
[18] Albert.
