An Analysis of x86-64 Inline Assembly in C Programs by Rigger, Manuel et al.
Citation for published version
Rigger, Manuel and Marr, Stefan and Kell, Stephen and Leopoldseder, David and Mössenböck,
Hanspeter  (2018) An Analysis of x86-64 Inline Assembly in C Programs.    In: 14th ACM SIGPLAN/SIGOPS








An Analysis of x86-64 Inline Assembly in C Programs
Manuel Rigger




















C codebases frequently embed nonportable and unstandard-
ized elements such as inline assembly code. Such elements are
not well understood, which poses a problem to tool develop-
ers who aspire to support C code. This paper investigates the
use of x86-64 inline assembly in 1264 C projects from GitHub
and combines qualitative and quantitative analyses to answer
questions that tool authors may have. We found that 28.1%
of the most popular projects contain inline assembly code,
although the majority contain only a few fragments with
just one or two instructions. The most popular instructions
constitute a small subset concerned largely with multicore
semantics, performance optimization, and hardware control.
Our findings are intended to help developers of C-focused
tools, those testing compilers, and language designers seeking
to reduce the reliance on inline assembly. They may also
aid the design of tools focused on inline assembly itself.
CCS Concepts • General and reference → Empirical
studies; • Software and its engineering → Assembly
languages; Language features; • Computer systems
organization → Complex instruction set computing;
Keywords Inline Assembly, C, Empirical Survey, GitHub
ACM Reference Format:
Manuel Rigger, Stefan Marr, Stephen Kell, David Leopoldseder,
and Hanspeter Mössenböck. 2018. An Analysis of x86-64 Inline
Assembly in C Programs. In VEE ’18: 14th ACM SIGPLAN/SIGOPS
International Conference on Virtual Execution Environments, March
25, 2018, Williamsburg, VA, USA. ACM, New York, NY, USA, 16 pages.
https://doi.org/10.1145/3186411.3186418
VEE ’18, March 25, 2018, Williamsburg, VA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed
to the Association for Computing Machinery.
This is the author's version of the work. It is posted here for your personal
use. Not for redistribution. The definitive Version of Record was published
in VEE ’18: 14th ACM SIGPLAN/SIGOPS International Conference on Virtual
Execution Environments, March 25, 2018, Williamsburg, VA, USA,
https://doi.org/10.1145/3186411.3186418.
1 Introduction
Inline assembly refers to assembly instructions embedded in
C code in a way that allows direct interaction; for example,
they can directly access C variables. Such code is inherently
platform-dependent; it uses instructions from the target ma-
chine's Instruction Set Architecture (ISA). For example, the
following C function uses the rdtsc instruction to read a
timer on x86-64. Its two output operands tickh and tickl
store the higher and lower parts of the result. The
platform-specific constraints =a and =d request particular
registers (here, eax and edx).
#include <stdint.h>

uint64_t rdtsc(void) {
    uint32_t tickl, tickh;  /* low and high 32 bits of the counter */
    asm volatile ("rdtsc" : "=a"(tickl), "=d"(tickh));
    return ((uint64_t)tickh << 32) | tickl;
} /* see §2.5 for detailed syntax */
This kind of platform dependency adds to the complexity of
C programs. A single complex ISA, such as x86, can contain
about a thousand instructions [25]. Furthermore, inline
assembly fragments may contain not only instructions, but also
assembler directives (such as .hidden, controlling symbol
visibility) that are specific to the host system's assembler.
It is not surprising that many tools that process C code or
associated intermediate languages (such as LLVM IR [38] and
CIL [45]) partially or entirely lack support for inline assembly.
For example, many bug-finding tools (e.g., the Clang Static
Analyzer [70], splint [18, 19, 63], Frama-C [69], uno [26],
and the LLVM sanitizers [55, 58]), tools for source translation
(e.g., c2go [44]), semantic models for C [36, 43], and
alternative execution environments such as Sulong [51–53]
and Klee [12] still lack support for inline assembly, provide
only partial support, or overapproximate it (e.g., by analyzing
only the side effects specified as part of the fragment),
which can lead to imprecise analyses or missed optimization
opportunities. How to provide better support depends on
the tool. For example, in Sulong, adding support for assembly
instructions requires emulating their behavior in Java,
while support in a formal model would require specifying
the instructions in a language such as Coq.
Literature on processing C code seldom mentions inline
assembly except for stating that it is rare [31]. Tool writers
would benefit from a thorough characterization of the
occurrence of inline assembly in practice, as it would enable
them to make well-informed decisions on what support to
add. Hence, we analyzed 1264 C projects that we collected
from GitHub. We manually analyzed the inline assembly
fragments in them (i.e., inline assembly instructions that
are part of a single asm statement). From these fragments,
we created a database with fragments specific to the x86-64
architecture to quantitatively analyze their usage.
We found that:
• Out of the most popular projects, 28.1% contain inline
assembly fragments.
• Most inline assembly fragments consist of a single
instruction, and most projects contain only a few inline
assembly fragments.
• Since many projects use the same subset of inline
assembly fragments, tool writers could support as much
as 64.5% of these projects by implementing just 5% of
x86-64 instructions.
• Inline assembly is used mostly for specific purposes:
to ensure semantics on multiple cores, to optimize
performance, to access functionality that is unavailable
in C, and to implement arithmetic operations.
Our findings suggest that tool writers might want to consider
adding support for inline assembly in their C tools, as
it is used surprisingly often. We also found that inline
assembly is not specific to a small set of domains, but appears
in applications in which one might not expect it (e.g., text
processing). Since most applications use the same subset of
inline assembly instructions, a large proportion of projects
could be supported with just a moderate implementation
effort. Another finding, however, is that instructions are not
all that matters. Rather, assembly instructions are only one of
many non-C notations used in C codebases, all of which
generally suffer from the same lack of tool support. For example,
some uses of asm contain no instructions, consisting only of
assembler directives and constraints. Others are interchangeable
with non-portable compiler intrinsics or pragmas. Yet
others gain meaning in conjunction with linker command-line
options or scripts. This paper is therefore a first step
towards characterizing this larger “soup” of notations that
tools must support in order to fully comprehend C codebases.
2 Methodology
To guide tool developers in supporting inline assembly, we
posed six research questions. We detail how we scoped the
survey, selected and obtained C applications, and finally
analyzed their inline assembly fragments.
2.1 Research Questions
To characterize the usage of inline assembly in C projects,
we investigated the following research questions (RQs):
RQ1: How common is inline assembly in C programs?
Knowing how commonly inline assembly is used indicates
to C tool writers whether it needs to be supported.
RQ2: How long is the average inline assembly fragment?
Characterizing the length of the average inline assembly
fragment gives further implementation guidance. If inline
assembly fragments typically contain only a single instruction,
simple pattern-matching approaches might be sufficient
to support them. If inline assembly fragments are large,
numerous, or “hidden” behind macro meta-programming [57],
it might be more difficult to add support for them.
RQ3: In which domains is inline assembly used?
Answering this question helps if a tool targets only specific
domains. It seemed likely that the usage of inline assembly
differs across domains. We expected inline assembly in
cryptographic libraries because instruction set extensions
such as AES-NI explicitly serve cryptographic code [6]. This
was supported by a preliminary literature search, as inline
assembly is, for example, often mentioned in the context of
cryptographic libraries [23, 37, 40, 61]. We also expected it
to implement related security techniques, preventing timing
side channels [7] and compiler interference [66, 67]. It was
less clear what other domains make frequent use of inline
assembly.
RQ4: What is inline assembly used for? Knowing the typical
use cases of inline assembly helps tool writers to assign
meaningful semantics to inline assembly instructions. It also
helps to determine whether alternative implementations in
C could be considered. We hypothesized that inline assembly
is used (aside from cryptographic use cases) mainly to
improve performance and to access functionality that is not
exposed by the C language.
RQ5: Do projects use the same subset of inline assembly?
Answering this question determines how much inline
assembly support needs to be implemented to cope with
the majority of C projects. Currently, C tool writers have to
assume that the whole ISA needs to be supported. However,
one of our assumptions was that most projects, if they use
inline assembly, rely on a common subset of instructions.
By adding support for this subset, C tool writers could cope
with most of the projects that use inline assembly.
2.2 Scope of the Study
Our focus was to quantitatively and qualitatively analyze
inline assembly code. For our quantitative analysis, we built
a database (using SQLite3) of inline assembly occurrences
in code written for x86-64, as it is one of the most widely
used architectures. The database contains information about
each project, inline assembly fragment, and assembly instruction
analyzed. We used this database to perform aggregate
queries, for example, to determine the most common instructions.
The database and aggregation scripts are available
at https://github.com/jku-ssw/inline-assembly to facilitate
further research. Additionally, we qualitatively analyzed all
instructions to summarize them in a meaningful way.
2.3 Obtaining the Projects
In our survey, we selected C applications from GitHub,
a project hosting website. To gather a diverse corpus of
projects, we used two strategies:
We selected all projects with at least 850 GitHub stars
(an arbitrary cut-off that gave us a manageable yet sufficiently
large sample), which resulted in 327 projects being
selected. Stars indicate the popularity of a project and are
given by GitHub users [10]. We assumed that the most popular
projects reflect those applications that are most likely
processed by C tools.
We selected another 937 projects by searching for certain
keywords (listed in footnote 1) and by taking all matching
projects that had at least 10 stars. The goal was to select
projects of a certain domain with different degrees of
popularity to account for the long tail of the distribution.
In order to avoid personal forks, experiments, duplicate
projects and the like [32, 41], we did not consider projects
that had fewer than 10 stars.
2.4 Filtering the Projects
Our primary goal was to analyze C application-level code
which we consider to be of general interest. Consequently,
we ignored projects if they were operating systems, device
drivers, firmware, or other code that is typically considered
part of an operating system. Such code directly interacts
with hardware and thus comes with its own special set of
issues and usage patterns of inline assembly. Further, to keep
the scope manageable, we focused on code for x86-64 Linux
systems. Therefore, we excluded projects that worked only
for other architectures or other operating systems. Further,
we did not consider uncommon x86 extensions such as VIA's
Padlock extensions [65].
We restricted our analysis to C code, excluding C++ code.
Projects that mixed C/C++ code were also excluded if the
C++ LOC were greater in number than the C LOC. We also
excluded C/C++ header files (ending with .h) when they
contained C++ code. A number of projects used C code to
implement native extensions for PHP, Ruby, Lua, and other
languages; we included such code in our analysis. In a few
cases, inline assembly was part of the build process; for example,
some configure scripts checked the availability of CPU
features by using cpuid. We discarded these cases because
inline assembly was not part of the application; however, we
checked whether build scripts generated source files with
inline assembly, which we then incorporated in our analysis.
1 Our keywords were: crc, argon, checksum, md5, base64, dna, web server,
compression, math, fft, string, aes, simulation, editor, single header library,
parser, debugger, ascii, xml, markdown, smtp, sqlite, mp3, sort, json, bitcoin,
udp, random, prng, metrics, misc, tree, parser generator, hash, font, gc, i18,
and javascript.
34 projects used inline assembly in fairly large program
fragments, notably featuring SIMD instructions and using
preprocessor-based metaprogramming. Although written using
inline assembly constructs, these fragments have more in
common with separate (macro) assembly source files. In particular,
supporting these would require a close-to-complete
implementation of an ISA. We excluded these fragments
from our quantitative analysis.
We performed our analysis on unpreprocessed source
code to include all inline assembly fragments independent of
compile-time configuration factors [62]. This is significant
because inclusion of inline assembly is often only conditional,
achieved by #ifdefs that not only check for various
platforms and operating systems, but also for configuration
flags, various compilers, compiler versions, and availability
of GNU C intrinsics [17]; examining only preprocessed code
would have left out many fragments.
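The kind of guard described above can be sketched as follows. This is a minimal illustration, not code from any surveyed project: the function name read_cycle_counter and the fallback to clock() are assumptions. When the file is preprocessed for a non-GNU compiler or a non-x86-64 target, the asm statement disappears entirely, which is why only an analysis of unpreprocessed code sees it.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns a tick count from the best available source. */
static uint64_t read_cycle_counter(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* Only compiled for GNU-compatible compilers targeting x86-64. */
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Portable fallback: coarser, but needs no inline assembly. */
    return (uint64_t)clock();
#endif
}
```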
2.5 Inline Assembly Constructs
Since inline assembly is not part of the C language standard,
compilers differ in the syntax and features provided. In this
study, we assume use of the GNU C inline assembly syntax,
which is the de facto standard on Unix platforms. It recognizes
the asm or __asm__ keywords to specify an inline assembly
fragment, and has both “basic” and “extended” flavors. Using
basic asm, a programmer can specify only the assembler fragment
or directive. Use cases for basic assembly are limited;
however, in contrast to extended asm, basic inline assembly
can be used outside of functions. For example,
asm(".symver memcpy,memcpy@GLIBC_2.2.5");
uses basic inline assembly for a symbol versioning directive
(see Section 5).
The more commonly used form is extended asm, which
also allows specifying output and input operands as well as
side effects (e.g., memory modifications). It is specified using
asm ( AssemblerTemplate : OutputOperands
[ : InputOperands [ : Clobbers ] ]).
Adding the volatile keyword restricts the compiler in its
optimization; for example, it prevents reachable fragments
from being optimized (e.g., by register reallocation).
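As a minimal sketch of the extended form (not taken from any surveyed project), the hypothetical function asm_add below fills in all three operand sections. It assumes GCC or Clang on x86-64 with the default AT&T dialect and falls back to plain C elsewhere.

```c
#include <stdint.h>

/* Hypothetical example: adds b to a using the x86-64 add instruction. */
static uint64_t asm_add(uint64_t a, uint64_t b) {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("addq %1, %0"
             : "+r"(a)   /* output operand, both read and written */
             : "r"(b)    /* input operand in any general-purpose register */
             : "cc");    /* clobber: the instruction modifies the flags */
    return a;
#else
    return a + b;        /* portable fallback */
#endif
}
```

The asm statement here is not marked volatile, so the compiler remains free to remove it if the result is unused.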
2.6 Analyzing the Instructions
Our analysis focused on inline assembly fragments found
with grep in the source code. We searched for strings containing
“asm”, which made it unlikely that we missed inline
assembly instructions. For the quantitative analysis, we
judged whether an inline assembly fragment was used for
an x86-64 Linux machine. If so, we manually extracted the
fragment and preprocessed it (see the criteria below) using a
script created for this purpose.
We assumed that tools would support all addressing modes
(e.g., register addressing, immediate addressing, and direct
memory addressing) for a certain instruction. Consequently,
we did not gather statistics for different addressing modes.
Inline assembly can contain assembler directives that instruct
the assembler to perform certain actions, for example, to allocate
a global variable. We ignored such assembler directives
in our quantitative analysis, but discuss them qualitatively.
An exception is the .byte directive, which is sometimes used
to specify instructions using their byte representation (and
similar cases, see Section 5), for which we assumed their
mnemonic (i.e., their textual) representation.
By default, GCC assumes use of the AT&T dialect [8,
9.15.3.1, 9.15.4.2]; however, some projects enabled the Intel
syntax instead. Using the AT&T syntax, a size suffix is
typically appended to denote the bitwidth of an instruction.
An add instruction can, for example, operate on a byte (8
bit), long (32 bit), or quad (64 bit) using addb, addl, and
addq, respectively. Using Intel syntax, the size suffix is typically
omitted. For consistency, we stripped size suffixes and
recorded only the instruction itself (e.g., add). We also applied
other criteria to group instructions (see footnote 2).
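The suffix stripping described above can be sketched as follows. This is an illustrative reconstruction, not the script used in the study: the function names and the tiny table of base mnemonics are assumptions. The table is needed because a naive rule would also shorten mnemonics such as call, whose final letter is not a size suffix.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of base mnemonics; a real table would be far larger. */
static const char *const KNOWN_BASES[] = {
    "add", "mov", "xadd", "cmpxchg", "bswap"
};

/* Returns 1 if the first len characters of s form a known base mnemonic. */
static int is_known_base(const char *s, size_t len) {
    for (size_t i = 0; i < sizeof KNOWN_BASES / sizeof KNOWN_BASES[0]; i++) {
        if (strlen(KNOWN_BASES[i]) == len &&
            strncmp(KNOWN_BASES[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/* Copies mnemonic into out, dropping a trailing b/w/l/q size suffix
   only when the remainder is a known base mnemonic. */
static void strip_size_suffix(const char *mnemonic, char *out, size_t out_size) {
    size_t len = strlen(mnemonic);
    if (out_size == 0)
        return;
    if (len > 1 && strchr("bwlq", mnemonic[len - 1]) != NULL &&
        is_known_base(mnemonic, len - 1)) {
        len--;  /* e.g. "addq" becomes "add" */
    }
    if (len >= out_size)
        len = out_size - 1;  /* truncate rather than overflow */
    memcpy(out, mnemonic, len);
    out[len] = '\0';
}
```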
3 Quantitative Results
Based on our quantitative analysis, we can answer the first
three research questions on the use of inline assembly in C
projects, the length of fragments used, and the domains in
which they occur.
Projects using inline assembly. Our corpus contained
1264 projects, of which 197 projects (15.6%) contained inline
assembly for x86-64. The distribution differed between
the popular projects and those selected by keywords. Among
the most popular 327 projects, 28.1% contained inline assembly,
while of the 937 other projects only 11.2% used inline
assembly. One possible explanation for this difference is that
the popular projects were larger (69 KLOC on average) than
the projects selected by keywords (13 KLOC on average).
Density of inline assembly fragments. The percentage of
projects with inline assembly is high, which is surprising
because many C tools are based on the assumption that inline
assembly is rarely used. Nevertheless, in terms of density,
inline assembly is rare, with one fragment per 40 KLOC of C
code on average. The density of inline assembly is lower for
the popular projects (one fragment per 50 KLOC) than for
those selected by keywords (one fragment per 31 KLOC).
2 The x86 architecture allows adjusting the semantics of an instruction with
a prefix. This includes the lock prefix for exclusive access to shared memory,
and rep to repeat an instruction a certain number of times. In inline
assembly, these prefixes are denoted as individual instructions (e.g., lock;
cmpxchg). In our survey, we merged the prefix and its instruction and handled
them as a single instruction (e.g., lock cmpxchg). The xchg instruction
has an implicit lock prefix when used with a memory operand. For
jump-if-condition-is-met instructions and set-on-condition instructions, several
mnemonics exist for the same instruction. We grouped such mnemonics and
counted them as the same instruction. We also considered different software
interrupts as distinct instructions, since their purposes differ markedly.
RQ1.1: 28.1% of the most popular and 11.2% of the
keyword-selected projects contained inline assembly.
Number of fragments per project. To measure the number
of inline assembly fragments in a project, we considered
only unique fragments because duplicates do not increase
the implementation effort (see Figure 2). 36.2% of the projects
with inline assembly contained only one unique inline
assembly fragment. 93.3% of them contained up to ten unique
inline assembly fragments. On average, projects analyzed in
detail contained 3.7 unique inline assembly fragments (with
a median of 2).
RQ1.2: C projects with inline assembly which were
analyzed in detail contained on average 3.7 unique
inline assembly fragments (median of 2)
Overview of the fragments. In total, we analyzed 1026 fragments,
of which 607 were unique per project. Projects that
used inline assembly tended to bundle instructions for several
operand sizes in the same source file; consequently, we
found 715 fragments that were unique within a single file.
Overall, we found 197 unique inline assembly fragments.
Analysis of the fragments. Of the 197 projects with inline
assembly, we analyzed the inline assembly in 163 projects
(82.7%) in detail. To this end, we extracted each fragment
and added it together with metadata about the project to
our database, which we then queried for aggregate statistics
(e.g., the frequency of instructions). The 34 projects that
we did not analyze used complicated macro metaprogramming
and/or contained an excessive number of large inline
assembly fragments, which made our manual analysis approach
infeasible. We call these “big-fragment” codebases.
They consisted mostly of mature software projects (such as
video players) that used inline assembly for SIMD operations,
for which they provided several alternative implementations
(e.g., AVX, SSE, SSE2). We assumed that tools need to provide
close-to-complete SIMD inline assembly support for these
projects, and thus omitted them from the detailed analysis.
RQ2.1: 17.3% of all C projects with inline assembly
contained macro-metaprogramming and many large
inline assembly fragments that were omitted from
our detailed analysis.
Instructions in a fragment. When analyzing instructions
in inline assembly fragments, we again considered those fragments
that were unique to a project. Typically, they were
very short (see Figure 1). 390 (64.3%) of them had only one
instruction. 73.3% had up to two instructions. However, we also
found inline assembly fragments with up to 438 instructions.
The average number of instructions in an inline assembly
fragment was 9.9; the median was 1. In total, we found only
167 unique instructions, which contrasts with the approximately
1000 instructions that x86-64 provides [25].

Table 1. The 10 most common file names that contained
inline assembly and their average numbers of instructions

file name      projects  instr.    file name    projects  instr.
sqlite3.c      10        1.0       inffas86.c   4         1.0
atomic.h       8         3.4       mb.h         4         1.0
SDL_endian.h   4         2.0       timing.c     4         1.0
atomic-ops.h   4         2.0       util.h       4         1.0
configure.ac   4         1.0       utils.h      4         2.2
RQ2.2: Inline assembly fragments contained on av-
erage 9.9 instructions (median of 1) per fragment.
Duplicate fragments. It has been shown that file duplication
among GitHub projects (mainly targeting popular
libraries copied into many projects) is a common phenomenon
[41], which we also observed for the projects we analyzed
(see Table 1). For example, many projects contained
sqlite3.c, which corresponds to the database with the same
name (which uses the rdtsc instruction), SDL_endian.h for
the SDL library (which uses inline assembly for endianness
conversions), and inffas86.c (which implements a compression
algorithm using inline assembly). We did not try
to eliminate such duplicate files in the analysis, because the
duplication is significant: tool authors have a stronger incentive
to implement those inline assembly instructions that are
used by many projects.
Project domains. Table 2 classifies the projects into domains
and shows how many projects per domain contained
inline assembly. We created this table by manually labelling
the projects using an ad-hoc vocabulary of seventeen domain
labels. Note that the domains differ in extent and intersect
in some cases. As expected, the largest share of projects were
crypto libraries (with SSL/TLS libraries as a subdomain).
However, in general, the domains were relatively diverse.
In addition to the eleven domains in the table, we also used
six other domain labels (see footnote 3), which had fewer
than 7 projects each and were omitted for brevity.
RQ3: Inline assembly is used in many domains, most
commonly in projects for crypto, networking, media,
databases, language implementations, concurrency,
ssl, string and math libraries.
3 These were: games, general-purpose libraries, reverse engineering, garbage
collection, monitoring, and virtualization.
Table 2. Domains of projects that used inline assembly (each
domain containing at least 7 projects)

domain                   projects (#)  projects (%)  description
networking               20            10.2          protocols, email systems, chat clients, port scanners
media                    17            8.6           video and music players and encoders, audio processing software, image libraries
database                 16            8.1           databases, key/value storages, other in-memory data structures
language implementation  15            7.6           compilers, interpreters, virtual machines
misc                     13            6.6           projects not assigned to any domain
concurrency              9             4.6           concurrency libraries, concurrent data structures
ssl                      8             4.1           SSL/TLS libraries
string library           8             4.1           string algorithms, converters between different formats, parsers
math library             7             3.6           scientific applications, math libraries
web server               7             3.6
4 Use Cases of Inline Assembly Instructions
We identified four typical use cases for inline assembly. One
was to prevent instruction reorderings, either in the compiler
(prevented by “compiler barriers”) or in the processor,
both in single-core execution and between multiple cores
(prevented by memory barriers and atomic instructions; see
Section 4.1). The second use case was performance optimization,
for example, for efficient endianness conversions, hash
functions, and bitscans (see Section 4.2). The third use case
was to interact with the hardware, for example, to detect
CPU features, to obtain precise timing information and random
numbers, and to manage caches (see Section 4.3). The fourth
use case was for more general “management” instructions,
for example, moving values, pushing and popping from the
stack, and arithmetic instructions (see Section 4.4).
Note that there might be more than one reason for using
assembly code: for example, programmers might read the
elapsed clock cycles using the rdtsc instruction because
similar C timing functions might not provide the same accuracy;
however, they might also use it for efficiency because
it has a lower overhead than those functions.

Figure 1. Inline assembly fragment lengths

Figure 2. Number of fragments per project (axis label: # unique
inline assembly fragments / project with inline assembly)
RQ4: Inline assembly is used to ensure correct se-
mantics on multiple cores, for performance optimiza-
tion, to access functionality that is unavailable in C,
and to implement arithmetic operations.
For each use case, we denoted in parentheses the percentage
of projects that relied on at least one instruction. Some
instructions were counted for several use cases; for example,
xchg can be used to exchange bytes to convert the endianness
of a 16-bit value and has an implicit lock prefix when
applied to a memory operand, which is why it can also be
used to implement an atomic operation.
We found that most inline assembly instructions can also
be issued using compiler intrinsics instead of inline assembly
(compiler barriers being the only exception). Although
compiler intrinsics are specific to a compiler, they are easier
to support in tools because they follow the same conventions
as C functions, both syntactically and semantically. For
example, unlike inline assembly, compiler intrinsics cannot
modify local variables.
4.1 Instruction Reordering and Multicore Programming
For threading and concurrency control, most C programs
rely on libraries (such as pthreads [9]), compiler intrinsics,
and inline assembly instructions. Intrinsics and assembly
instructions are used mainly for historical reasons, since
atomic operations became standardized only in 2011 [29].
In this section, we describe how inline assembly was used
to perform atomic operations and to control the ordering of
instructions at the compiler and processor levels.
Atomic instructions (24.0%). In 24.0% of the projects,
instructions were prefixed with lock to execute atomically to
prevent races when data is accessed by multiple threads (see
Table 3). More recent code uses C11 atomic operations as an
alternative; for example, the fetch-and-add operation, which is
equivalent to lock xaddq, can also be implemented using
the C11 atomic_fetch_add function.
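The correspondence between the two variants can be sketched as follows. The function names fetch_add_c11 and fetch_add_asm are hypothetical, and the assembly variant assumes GCC or Clang on x86-64, so it is only compiled there. Both return the value the counter held before the addition.

```c
#include <stdatomic.h>
#include <stdint.h>

/* C11 variant: portable, returns the previous value of the counter. */
static int64_t fetch_add_c11(_Atomic int64_t *counter, int64_t delta) {
    return atomic_fetch_add(counter, delta);
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Inline assembly variant, in the style of pre-C11 code. */
static int64_t fetch_add_asm(int64_t *counter, int64_t delta) {
    __asm__ volatile ("lock; xaddq %0, %1"
                      : "+r"(delta), "+m"(*counter)
                      :
                      : "memory", "cc");
    return delta;  /* xadd leaves the previous value in the register */
}
#endif
```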
Compiler barriers (24.0%). C compilers are permitted to
reorder instructions that access memory. Unless specially
directed, these reorderings are allowed to assume single-threaded
execution. A common use case of inline assembly
is to implement such a special directive, called a “compiler
barrier”, telling the compiler to assume an arbitrary side
effect to the memory, hence preventing reorderings around
the barrier. This is expressed as follows (a memory clobber):
asm volatile("" : : : "memory");
Such barriers are often necessary in lock-free concurrent
programming.
Additionally, compiler barriers were used to prevent the
compiler from optimizing away instructions. For example, in
Listing 1, the compiler is prevented from removing memset
to implement a secure_clear function that can be used to
clear sensitive data from memory. A compiler could remove
the memset call, for example, if the call is inlined and the
memory freed, because the compiler can assume that it is no
longer accessible [16]. If so, attackers could exploit a buffer
overflow at another location in the program to read the data.
Note that the C11 standard specifies the function memset_s
(in the optional Annex K), which provides the same guarantees
as the secure_clear implementation.
Listing 1. Implementing a secure memory-zeroing function
#include <string.h>

void secure_clear(void *ptr, size_t len) {
    memset(ptr, 0, len);
    /* Compiler barrier: the memory clobber keeps memset from being elided. */
    asm volatile("" : : "r"(ptr) : "memory");
}
Memory barriers (11.2%). Not only compilers, but also processors
reorder instructions. Memory barriers are used to prevent
reorderings by the processor (see Table 4). On x86, they
are mostly needed for special cases (e.g., write-combining
memory or non-temporal stores), as all memory accesses
except store-load are ordered, so a compiler barrier is often
sufficient to ensure the desired ordering [1].
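The difference between the two kinds of barrier can be sketched as follows. The function names are hypothetical, and mfence is used here as one example of an x86 memory barrier instruction; the non-x86 branches fall back to the C11 fences, which is an assumption about what a portable port would choose rather than something the surveyed projects necessarily do.

```c
#include <stdatomic.h>

/* Compiler barrier: emits no instruction, only constrains the compiler. */
static inline void compiler_barrier(void) {
#if defined(__GNUC__)
    __asm__ volatile ("" : : : "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Full memory barrier: additionally orders store-load at the processor. */
static inline void memory_barrier(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile ("mfence" : : : "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```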
Spin loop hints (15.1%). We found that 27 projects used
the pause instruction as a processor hint in busy-waiting
loops. Busy waiting refers to a tight loop in which a check
is performed repeatedly. For example, in spinlocks, a thread
repeatedly tries to acquire a lock that has potentially already
been acquired by another thread. To remedy the costs of
busy waiting in terms of performance and energy, pause
causes a short delay and controls speculative execution [48].
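A busy-waiting loop with such a hint can be sketched as follows. The names cpu_relax, demo_spinlock, spin_lock, and spin_unlock are hypothetical, and the lock itself is built on the C11 atomic_flag type rather than on any particular project's implementation. On targets other than x86, the hint simply compiles to nothing.

```c
#include <stdatomic.h>

/* Spin loop hint: pause on x86, no-op elsewhere. */
static inline void cpu_relax(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("pause" : : : "memory");
#endif
}

typedef struct { atomic_flag locked; } demo_spinlock;

/* Busy-waits until the flag could be set by this thread. */
static void spin_lock(demo_spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static void spin_unlock(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```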
4.2 Performance Optimizations
Several inline assembly instruction categories were used to
optimize performance, even when the code could have been
written in pure C.
SIMD instructions (6.1% + 34 projects). In the quantitative
analysis, only a few SIMD instructions ranked among the
most common instructions, for example, pxor and movdqa
(used in 9 and 8 projects, respectively). However, the actual
number of projects using SIMD instructions was higher because
the 34 “big-fragment” projects that we did not analyze
mostly targeted various SIMD instruction sets (e.g., MMX,
SSE, and AVX).
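To give an impression of what a small SIMD fragment looks like, the following sketch XORs two 16-byte blocks with pxor. It assumes GCC or Clang on x86-64 (where SSE2 is always available); the function name xor16 is hypothetical, and the unaligned movdqu is used instead of movdqa so that the buffers need no 16-byte alignment.

```c
#include <stdint.h>

/* dst = a XOR b, for 16-byte blocks. */
static void xor16(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16]) {
    __asm__ volatile(
        "movdqu (%1), %%xmm0\n\t"   /* load a */
        "movdqu (%2), %%xmm1\n\t"   /* load b */
        "pxor   %%xmm1, %%xmm0\n\t" /* xmm0 ^= xmm1 */
        "movdqu %%xmm0, (%0)"       /* store result */
        :
        : "r"(dst), "r"(a), "r"(b)
        : "xmm0", "xmm1", "memory");
}
```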
Endianness conversion (25.7%). A common use case of in-
line assembly is to change the byte order of a value (see
Table 5), for example, when the format of a file being read and
the processor differ in their endianness. On x86, the xchg
instruction can be used to swap the bytes of 16-bit integers,
because x86 allows both the higher and lower byte of a 16-bit
register to be addressed. A less common alternative is to use
rotation left or right (rol or ror) by eight places. For 32-bit
and 64-bit values, the bswap instruction is used instead. Half
of the projects with instructions for endianness conversions
included the SDL library [54] as source files in the repository
tree. Using inline assembly to implement endianness conversion
is most likely a performance optimization that is no
longer needed, because state-of-the-art compilers produce
equally efficient code [22].
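The byte-swapping idioms described above can be sketched as follows, next to a plain C version that compilers typically recognize. The sketch assumes GCC or Clang on x86-64; the function names are hypothetical, and the 16-bit variant uses the rotation alternative rather than xchg.

```c
#include <stdint.h>

/* 16-bit swap by rotating the value by eight bit positions. */
static inline uint16_t swap16_asm(uint16_t x) {
    __asm__("rorw $8, %0" : "+r"(x) : : "cc");
    return x;
}

/* 32-bit swap using bswap. */
static inline uint32_t swap32_asm(uint32_t x) {
    __asm__("bswapl %0" : "+r"(x));
    return x;
}

/* Portable C equivalent of swap32_asm. */
static inline uint32_t swap32_c(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```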
Hash functions (15.6%). A number of projects used inline
assembly to implement hash functions. This included the
crc32 instruction as well as the rol, ror, and shl instruc-
tions to compute the CRC32 and SHA hashsums (see Table 6).
The shift and rotate instructions could also simply be implemented
in C, and current C compilers produce efficient
machine code for them [49].
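As an illustration of the last point, a rotate-left written with inline assembly can be compared with a plain C formulation. This is a sketch assuming GCC or Clang on x86-64; the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate left using the rol instruction; the count is passed in %cl. */
static inline uint32_t rotl32_asm(uint32_t x, uint32_t n) {
    __asm__("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
}

/* Plain C rotate; masking the counts avoids an undefined shift by 32. */
static inline uint32_t rotl32_c(uint32_t x, uint32_t n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```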
Bit scan (7.8%). Several projects used bit-scan instructions
to determine the most significant one-bit using bsr (in 12
projects) or the least significant one-bit using bsf (in 7
projects). Both instructions have many applications [68, Sec-
tions 5.3 and 5.4]. As bsr corresponds to a log2 function
that rounds the result down to the next lower integer, the
instruction was often used for this purpose. For an input
value, it is also possible to round the result up by providing
the input (value<<1)-1. Bitscan instructions were mostly
used by memory allocators such as jemalloc [20] (which was
included in four projects) or dlmalloc as well as by compres-
sion and math libraries.
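The log2 use of bsr, including the rounding-up trick, can be sketched as follows. The sketch assumes GCC or Clang on x86-64; the function names are hypothetical. The result of bsr is undefined for an input of zero, so both functions require a nonzero argument, and ceil_log2 additionally requires that (v << 1) does not overflow.

```c
#include <stdint.h>

/* Index of the most significant one-bit, i.e. floor(log2(v)); v != 0. */
static inline unsigned floor_log2(uint64_t v) {
    uint64_t idx;
    __asm__("bsrq %1, %0" : "=r"(idx) : "rm"(v) : "cc");
    return (unsigned)idx;
}

/* ceil(log2(v)) obtained by feeding (v << 1) - 1 to bsr; 0 < v < 2^63. */
static inline unsigned ceil_log2(uint64_t v) {
    return floor_log2((v << 1) - 1);
}
```

For a power of two such as 8, the input becomes 15 and both functions return 3; for 9, the input becomes 17, so ceil_log2 returns 4 while floor_log2 returns 3.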
Advanced Encryption Standard (AES) instructions
(2.2%). We found that 2.2% of the projects used inline as-
sembly to speed up AES using AES-NI instructions.
4.3 Functionality Unavailable in C
Feature detection (28.5%). The cpuid instruction allows
programs to request information about the processor. It was
often used to check cache size, facilities for random-number
generation, or support for SIMD instructions such as SSE and
AVX. Also, perhaps surprisingly, cpuid is defined as a “serializing
instruction” in the processor's out-of-order execution
semantics, guaranteeing that all instructions preceding it
have been executed and none is moved above it.
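A feature check based on cpuid might be written as follows. This is a sketch assuming GCC or Clang on x86-64; the function names are hypothetical. The bit positions used (leaf 1, EDX bit 26 for SSE2 and ECX bit 28 for AVX) follow the processor manuals; note that actually using AVX also requires operating-system support, which this sketch does not check.

```c
#include <stdint.h>

/* Execute cpuid for the given leaf/subleaf; regs = {eax, ebx, ecx, edx}. */
static inline void cpuid_query(uint32_t leaf, uint32_t subleaf,
                               uint32_t regs[4]) {
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]),
                       "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf), "c"(subleaf));
}

static inline int cpu_has_sse2(void) {
    uint32_t r[4];
    cpuid_query(1, 0, r);
    return (int)((r[3] >> 26) & 1);
}

static inline int cpu_has_avx(void) {
    uint32_t r[4];
    cpuid_query(1, 0, r);
    return (int)((r[2] >> 28) & 1);
}
```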
Clock cycle counter (60.9%). Inline assembly was most
commonly used for accurate time measurement using the
rdtsc instruction. The rdtsc instruction reads the time-
stamp counter provided by the CPU. This instruction is
both efficient and accurate for measuring the elapsed cycles,
which makes it suitable for benchmarking [27]. As the
CPU's out-of-order execution could move the code-to-be-
benchmarked before the rdtsc instruction, it is typically
used together with a serializing instruction (such as cpuid)
when measuring the elapsed clock cycles. To minimize the
overhead when measuring the end time, the rdtscp instruc-
tion can be used, which also reads the timestamp counter
but has serializing properties; to prevent subsequent code
from being executed between the start- and end-measuring
instructions, another cpuid instruction is needed.
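The measurement pattern described above (cpuid followed by rdtsc at the start, rdtscp followed by cpuid at the end) can be sketched as follows. It assumes GCC or Clang on x86-64 and a processor that supports rdtscp; the function names tsc_begin and tsc_end are hypothetical.

```c
#include <stdint.h>

/* Start of measurement: serialize with cpuid, then read the counter. */
static inline uint64_t tsc_begin(void) {
    uint32_t lo, hi;
    __asm__ volatile(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* End of measurement: rdtscp waits for earlier instructions; the
   trailing cpuid keeps later code from moving into the measured region. */
static inline uint64_t tsc_end(void) {
    uint32_t lo, hi;
    __asm__ volatile(
        "rdtscp\n\t"
        "movl %%eax, %0\n\t"
        "movl %%edx, %1\n\t"
        "xorl %%eax, %%eax\n\t"
        "cpuid"
        : "=r"(lo), "=r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

The code to be benchmarked is placed between a call to tsc_begin and a call to tsc_end; the difference of the two return values approximates the elapsed cycles, including the overhead of the measurement itself.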
Debug interrupts (3.9%). Some projects used an interrupt
to programmatically set a breakpoint in the program. If a
debugger, such as GDB, is attached to the program, execut-
ing a breakpoint causes the program to pause execution. A
definition of a breakpoint, for example,
#define BREAKPOINT asm("int $0x03")
is often selectively enabled through ifdefs, depending on
whether the debugging mode in the project is enabled.
Prefetching data (3.9%). The prefetch instruction was
used in 7 projects. It is a hint to the processor that the memory
specified by the operand will be accessed soon, which typically
causes it to be moved to the cache. Issuing a prefetch
instruction at the right time can improve performance, because the
latency of fetching data can be bridged. However, as processors
also provide prefetching mechanisms in hardware, using
software prefetches correctly requires a thorough understanding of cache
mechanisms [39]. For example, software prefetches that are
issued too early can reduce the effectiveness of hardware
prefetching by evicting data that is still being used.
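A software prefetch inside a loop might look as follows. This is a sketch assuming GCC or Clang on x86-64; the function names are hypothetical, and the prefetch distance of 16 elements is an arbitrary assumption that would have to be tuned for a real workload.

```c
#include <stddef.h>

/* Hint that the cache line containing *p will be read soon. */
static inline void prefetch_read(const void *p) {
    __asm__ volatile("prefetcht0 %0" : : "m"(*(const char *)p));
}

/* Sum an array while prefetching a fixed distance ahead. */
static long sum_with_prefetch(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            prefetch_read(&a[i + 16]);
        sum += a[i];
    }
    return sum;
}
```

Because the hint has no effect on program semantics, a tool that executes or analyzes the code can ignore it; for a simple sequential scan such as this one, the hardware prefetcher would likely make it redundant anyway.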
Random numbers (3.4%). The rdrand instruction was
used in 6 projects. It computes a secure random number
with an on-chip random-number generator that uses statistical
tests to check the quality of the generated numbers [28].
Programmers can verify successful random-number generation
by checking the carry flag (CF), for example, by writing
its value to a variable (using the setc instruction).

[Table 3. Instructions for atomics (caption truncated; table body missing)]
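Reading a random number and checking the carry flag with setc can be sketched as follows. The sketch assumes GCC or Clang on x86-64 and an assembler that knows the rdrand mnemonic; the function names and the retry count chosen by the caller are hypothetical. Since rdrand is not available on every processor, support is first checked via cpuid (leaf 1, ECX bit 30).

```c
#include <stdint.h>

/* Returns 1 if the processor advertises rdrand. */
static inline int cpu_has_rdrand(void) {
    uint32_t eax = 1, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    (void)ebx; (void)edx;
    return (int)((ecx >> 30) & 1);
}

/* One rdrand attempt; returns 1 on success (CF=1), 0 otherwise. */
static inline int rdrand64(uint64_t *out) {
    unsigned char ok;
    __asm__ volatile("rdrand %0\n\t"
                     "setc %1"
                     : "=r"(*out), "=q"(ok)
                     :
                     : "cc");
    return ok;
}

/* Retry a bounded number of times, as the generator may be busy. */
static inline int rdrand64_retry(uint64_t *out, int attempts) {
    while (attempts-- > 0)
        if (rdrand64(out))
            return 1;
    return 0;
}
```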
4.4 Supporting Instructions
Some instructions were most commonly used together with
other instructions, and we therefore classify them as “supporting
instructions”.
Moving and copying data (30.2%). Some inline assembly
fragments, mainly those larger in size, contained instruc-
tions to copy data to a register before some other instruction
accessed this register (see Table 9). While the mov instruc-
tion was also used in smaller fragments for that purpose, the
instruction could in many cases have been omitted entirely,
simply by correctly specifying the input and output con-
straints and letting the compiler generate the data-movement
code. In rarer cases, mov was also used to build a stack trace
by retrieving the value of %rbp. Additionally, the push and
pop instructions were used to save register values on the
stack and restore them. The pushf and popf instructions
were used to save and restore processor flags.
Arithmetic operations (21.2%). Arithmetic instructions
(see Table 10) were used in larger inline assembly fragments,
for example, in vector-reduction arithmetic (e.g., vector sum-
mation, inner product, and vector chain product) [47] in
crypto and math libraries. Additionally, they were used to
implement operations that are not available in standard C.
An example is the mulq instruction, which can be used to
obtain a 128-bit result when multiplying two 64-bit integers.
Another example is use of the add instruction for implementing
signed integer addition with wraparound semantics,
because signed integer overflow has undefined behavior in
C [15]. Inline assembly was also used to implement operations
on large integer types; for example, adc was used for
multi-word additions, because it adds the value of the carry
flag to the addition result (e.g., see [13]).

[Table 11. Instructions for control flow (caption truncated; table body missing)]
[Table 12 (caption truncated; table body missing)]
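The mulq and adc uses mentioned in this paragraph can be sketched as follows. This is an illustration assuming GCC or Clang on x86-64; the function names and the little-endian two-word layout of the 128-bit values are assumptions made for the example.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiplication using mulq.
   mulq multiplies %rax by its operand and leaves the result in %rdx:%rax. */
static inline void mul64x64(uint64_t a, uint64_t b,
                            uint64_t *hi, uint64_t *lo) {
    uint64_t h, l;
    __asm__("mulq %3" : "=a"(l), "=d"(h) : "0"(a), "rm"(b) : "cc");
    *hi = h;
    *lo = l;
}

/* 128-bit addition a += b, with a[0] and b[0] as the low words.
   addq sets the carry flag, which adcq adds into the high word.
   The early-clobber (&) keeps the low word's register separate from
   the inputs that are still read by the second instruction. */
static inline void add128(uint64_t a[2], const uint64_t b[2]) {
    __asm__("addq %2, %0\n\t"
            "adcq %3, %1"
            : "+&r"(a[0]), "+r"(a[1])
            : "rm"(b[0]), "rm"(b[1])
            : "cc");
}
```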
Control-flow instructions (13.4%). Control-flow-related
instructions were mostly confined to larger inline assembly
fragments (see Table 11). Some of these instructions compute
condition values (test and cmp), while others transfer control
flow (e.g., jmp). However, they were also used for indirect
calls, for example, when implementing setjmp and longjmp
for coroutines. Another example was retrying the rdrand
instruction using jnc, because it sets CF=0 if unsuccessful.
Set-byte-on-condition (10.6%). Several projects used instructions
that extract a value from the flags register (see Table 12).
They were typically used together with instructions
that indicate their success via a flag. For example, rdrand
sets CF=1 on success, and the flag's value can be used from
C by loading it into a variable using setc. As another example,
cmpxchg sets ZF=1 if the accumulator (%rax) and the
destination operand hold equal values, which can be checked using setz.
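The cmpxchg case can be sketched as a compare-and-swap helper whose success is reported through setz. The sketch assumes GCC or Clang on x86-64; the function name cas64 is hypothetical.

```c
#include <stdint.h>

/* Atomically replace *ptr with desired if *ptr == expected.
   cmpxchg compares %rax with the destination and sets ZF=1 on a match;
   setz copies ZF into a byte that C code can test. */
static inline int cas64(volatile uint64_t *ptr,
                        uint64_t expected, uint64_t desired) {
    unsigned char ok;
    __asm__ volatile("lock cmpxchgq %3, %1\n\t"
                     "setz %0"
                     : "=q"(ok), "+m"(*ptr), "+a"(expected)
                     : "r"(desired)
                     : "memory", "cc");
    return ok;
}
```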
No-ops (3.9%). The nop operation was used in 7 projects and
does not have any semantic effects. Normally, it is used for
instruction alignment to improve performance.
Rep instructions (3.4%). Instructions with a rep prefix
were used to implement string operations (see Table 13).
The rep prefix specifies that an instruction should be repeated
a specified number of times. To control the direction
of repetition, cld was used to clear the direction flag.
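A byte copy built from cld and rep movsb might look as follows. This is a sketch assuming GCC or Clang on x86-64; the function name copy_bytes is hypothetical. The instruction takes its destination in %rdi, its source in %rsi, and the repeat count in %rcx, and it modifies all three, which is why they are declared as read-write operands.

```c
#include <stddef.h>

/* Copy n bytes from src to dst in ascending address order. */
static inline void *copy_bytes(void *dst, const void *src, size_t n) {
    void *ret = dst;
    __asm__ volatile("cld\n\t"
                     "rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory", "cc");
    return ret;
}
```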
4.5 Implementing Inline Assembly
One goal was to determine the “low-hanging fruits” when
implementing inline assembly. Therefore, the question was
how many projects could be supported by implementing
only 5% of all x86-64 instructions (i.e., 50 instructions). The
result is shown in Table 14. It groups similar instructions
that can be easily implemented together in an order that
maximizes the number of supported projects with each new
group. Note that the order of the implementation makes a
difference because a project is considered to be supported
only if all the instructions it uses are supported.
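One way to compute such an order is a greedy selection, sketched below. The exact procedure used to produce the table is not spelled out here, so the greedy strategy, the bitmask encoding (bit g of a project's mask is set if the project needs instruction group g), and all names are assumptions made for illustration; at most 32 groups are supported.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of projects whose required groups are all contained in `have`. */
static size_t count_supported(const uint32_t *needs, size_t n, uint32_t have) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((needs[i] & ~have) == 0)
            count++;
    return count;
}

/* Greedily pick, at each step, the group that maximizes the number of
   fully supported projects; order[] receives the chosen group indices. */
static void greedy_order(const uint32_t *needs, size_t n,
                         unsigned num_groups, unsigned *order) {
    uint32_t have = 0;
    for (unsigned step = 0; step < num_groups; step++) {
        unsigned best = 0;
        size_t best_count = 0;
        int found = 0;
        for (unsigned g = 0; g < num_groups; g++) {
            uint32_t bit = UINT32_C(1) << g;
            if (have & bit)
                continue;               /* group already implemented */
            size_t c = count_supported(needs, n, have | bit);
            if (!found || c > best_count) {
                best = g;
                best_count = c;
                found = 1;
            }
        }
        have |= UINT32_C(1) << best;
        order[step] = best;
    }
}
```

With five toy projects whose masks are 1, 1, 2, 5, and 5, group 0 is chosen first (two projects supported), then group 2 (four projects), and finally group 1. The example also shows why the order matters: group 2 alone supports no project, because the projects that need it also need group 0.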
First, the timing instructions should be implemented; al-
though rdtscp is seldom used, it is similar to rdtsc and
could be implemented together with it. Next would be the
feature detection instructions. For tools that execute C code,
the feature detection instructions could also be used to indi-
cate that certain features are missing (e.g., SIMD support),
which could then guide the program not to use inline assembly
for these features. Some instructions could be implemented
as “no-ops”, as they either have no semantic effect
(e.g., prefetch) or are important only when multithreaded
execution needs to be modeled or analyzed (e.g., memory
fences). Implementing bit operations and atomics would
allow half of the projects to be supported. Finally, by implementing
the other instructions in the table (50 in total), tool
writers could support 77.9% of the projects that we analyzed
in detail and 64.5% when counting also the projects that we
did not analyze in detail. An alternative to implementing rep
movsb would be int $0x80; however, we thought that this
instruction is difficult to implement because it is used for
system calls, and thus preferred rep movsb. In general, we
believe that the semantics of most instructions in the table
are relatively straightforward to support in comparison with
some other portions of the instruction set, such as extensions
for hardware transactional memory [72].

Table 14. Instruction groups and the percentage of projects
covered by them

Instruction group   Instructions                                  Supported projects
Timing              rdtsc, rdtscp                                 11.0%
Feature detection   cpuid, xgetbv                                 18.4%
“No-ops”            <compiler barrier>, mfence, sfence, lfence,   28.2%
                    prefetch, nop, int $0x03, pause, ud2
Bit operations      bsr, bsf, or, xor, neg, bswap, shl, rol, ror  41.1%
Atomics             lock xchg, lock cmpxchg, lock xadd,           51.5%
                    lock add, lock dec, lock inc
Moving data         mov, push, pop                                58.9%
Checksum            crc32                                         62.0%
Flag operations     sete/setz, setc/setb, setne/setnz, stc        67.5%
Arithmetics         add, sub, mul, adc, lea, div, imul, sbb,      70.6%
                    inc, dec
Random numbers      rdrand                                        72.4%
Control flow        jmp, jnb/jae/jnc                              76.7%
String operations   rep movsb                                     77.9%
RQ5: By implementing 50 instructions (5% of x86-
64's total number of instructions) tool writers could
support 64.5% of all projects that contain inline as-
sembly.
Note that, depending on the tool, another order could
be more suitable; tool writers can consult the database to
determine the order that best suits their project.
5 Declarative Use Cases of Inline Assembly
Our syntactic analysis naturally turned up uses of the asm
keyword, but, perhaps surprisingly, not all of these were for
inserting instructions. A small number of projects used it
instead for declarative means, for example, to control the
behavior of the linker. While many tools can ignore or work
around these usages of inline assembly, we discuss some of
the examples found as a first step towards characterizing the
remaining “soup” of non-C notations used in C codebases, as
noted in the Introduction. Additionally, we discuss examples
in which a mix of instruction representations was used to
encode certain instructions.
Specifying assembler names. Some projects use the inline
assembly asm keyword to specify the names of symbols, thus
preventing name mangling. For example,
AES_ECB_encrypt(...) asm("AES_ECB_encrypt");
is a function declaration with an inline assembly label that
specifies its symbol name in the machine code. In this example,
the function was implemented in macro assembly, so, in
order to guarantee binary compatibility, the name must not
be mangled. Labels are also used when the symbol cannot be
written in plain C (e.g., because it contains special characters
that are forbidden in C), and when symbol names need to be
accessible by a native function interface.
Linker warnings. A few projects used inline assembler di-
rectives to emit linker warnings when incompatible or dep-
recated functions of a library were included. C library imple-
mentations often make use of this; for example, using
asm(".section .gnu.warning.gets; .ascii
\"Please do not use gets!\"; .text");
at global scope causes the linker to emit a warning when
the insecure gets function is linked.
Symbol versioning. Several libraries used symbol version-
ing to refer to older libc functions in order for code compiled
on a recent platform to work also on older platforms. A com-
mon example is memcpy, where current Linux versions link
to the relatively new glibc function memcpy@@GLIBC_2.14.
Most other standard library functions link to older glibc ver-
sions; for example, memset links to memset@@GLIBC_2.2.5.
If the most recent memcpy is not needed, and older platforms
should be supported, one can directly bind memcpy to the
older 2.2.5 version, for example, using
asm(".symver memcpy,memcpy@GLIBC_2.2.5").
Register variables. Programmers can use inline assembly
to associate local or global variables with a specific register [24].
For example, one could store an interpreter's program
counter in the %rsi register:
register unsigned char *pc asm("%rsi");
Such code was used for performance optimization.
Instruction representations. Inline assembly instructions
are normally written using mnemonics, which are textual
representations of the assembly instructions. However, 16.6%
of the projects with inline assembly (27 of 163) deviated from
this, either by avoiding instruction mnemonics entirely or
by combining mnemonics to surprising effect.
A number of projects denoted the pause instruction as
rep; nop. Even though the rep (0xF3) prefix is unspecified
for nop (0x90), the resulting opcode corresponds to
that of the pause (0xF390) instruction. This works because
portions of the prefix-opcode space are, in effect, aliased,
and the assembler will accept an aliased combination in
place of the more direct encoding. In other cases, instructions
were directly specified by their opcode, for example,
.byte 0x0f, 0x01, 0xd0 to represent the xgetbv instruction.
Some projects even mixed both representations within
an instruction; for example, .byte 0x66; clflush %0 was
used to specify clflushopt, because prepending 0x66 to
the opcode of clflush (0x0FAE) yields the opcode for
clflushopt (0x660FAE).
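The three notations for the same instruction can be placed side by side, using pause as the example. This is a sketch assuming GCC or Clang with a GNU-compatible assembler on x86-64; the function names are hypothetical. All three fragments are expected to assemble to the byte sequence F3 90.

```c
/* Direct mnemonic. */
static inline void pause_mnemonic(void) {
    __asm__ volatile("pause");
}

/* Prefix plus nop, accepted by assemblers that predate the mnemonic. */
static inline void pause_rep_nop(void) {
    __asm__ volatile("rep; nop");
}

/* Raw opcode bytes emitted through an assembler directive. */
static inline void pause_raw_bytes(void) {
    __asm__ volatile(".byte 0xf3, 0x90");
}
```

A tool that parses the assembly text sees three different strings here, which is why mapping them to one canonical form before further processing is attractive.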
Programmers resort to such notations to allow use of older
assemblers which fail to recognize mnemonics, but can pro-
cess opcodes or simpler instructions. Such notations were
also used for less common architectures, for example, for
VIA's Padlock extensions [65]. While tool writers could treat
common patterns not specified by their mnemonics as special
cases, canonicalizing them would be more comprehensive,
as rare or unknown combinations of instruction
representations could then also be supported.
6 Threats to Validity
We used a standard methodology [21] to identify validity
threats, which we mitigated where possible. We considered
internal validity (i.e., whether we controlled all confounding
variables), construct validity (i.e., whether the experiment
measured what we wanted to measure), and external validity
(i.e., whether our results are generalizable).
6.1 Internal Validity
The greatest threat to internal validity is posed by errors in
the analysis. We used a manual best-effort approach to analyze
x86-64 inline assembly fragments detected by our string
search. It cannot be ruled out that we incorrectly included
inline assembly that works only for other architectures (e.g.,
x86-32), or, conversely, that we rejected some erroneously. To
address this, we carefully analyzed inline assembly fragments
and repeated analyses when we had doubts or when we
found a single inline assembly fragment in several projects,
so we believe that errors in the analysis have little impact
on the result. A threat in the qualitative analysis is that biases
in our judgements influenced the outcome of the study;
however, since we also used a quantitative approach, gross
distortions or misinterpretations are unlikely.
6.2 Construct Validity
The main threat to construct validity is that we used a source-
code-based search to determine the usage of inline assembly.
This approach enabled us to analyze the usage of inline as-
sembly independent of conditions such as operating system,
compiler and its version, platform, and availability of func-
tions and intrinsics. However, while conducting this survey,
we found that some system library headers (which are not
part of the project repository) contained inline assembly
in macros. For example, GCC provides a cpuinfo.h header
file that wraps the cpuid instruction. We ignored such system
header libraries, and inspected only the source code of
the projects. While we recognize that this could have had
a minor impact on the quantitative analysis, we would expect
the qualitative analysis to remain unaffected, as the
macros were used for the same purposes as inline assembly
fragments. Note that, if the goal had been to analyze what
inline assembly instructions are actually executed on a given
system, a binary-level approach would have been more ap-
propriate. Similarly, to analyze which instructions appear in
built binaries in any particular configuration, analysis after
C preprocessing would have been more useful.
6.3 External Validity
There are several threats to external validity, which are given
by the scope of our work.
Sample Set. One problem could be that the set of projects
is not representative of user-level C. To mitigate this and
increase the variety of projects, we employed two different
strategies to collect samples for analysis, one based on
GitHub stars as a proxy for popularity and one based on
keywords. Nevertheless, the number of stars of a project
might not reflect its popularity, and our search keyword
could also bias the results. While inline assembly could differ
in domains not represented in the survey, we believe that
the overall results would differ only marginally, given the
large body of source code that we examined (1264 projects
and 56 million LOC).
OS software. We excluded projects with software that typically
forms part of an operating system, which we would
expect to use more inline assembly than typical user appli-
cations, for example, in order to implement interrupt logic,
context switches, clearing pages, and for virtualization exten-
sions [5, 42]. The usage of inline assembly in such projects
would best be analyzed separately (especially when consid-
ering the size of operating systems), which we will consider
as part of future work. The findings of our survey are thus
not generalizable to such software.
Macro assembly code. We analyzed only inline assembly
in detail and not macro assembly, which is stored in separate
files. Macro assembly is used to implement larger program
parts. This is reflected in the high average number of 888.3
LOC of macro assembly in the 7.8% of projects that used
macro assembler. Note that projects with inline assembly
were likely to also contain macro assembly (33.5% of them did).
While inline assembly is syntactically and semantically
embedded into C code (e.g., C code can access registers,
and inline assembly can access local C variables), macro as-
sembler communicates only via the calling convention of
the platform. As macro assembly can be called via native
function interfaces by C execution environments and allows
modular reasoning by analysis tools, we generally ignored
it. Our findings are not generalizable to macro assembly.
Architectures. In our study, we focused on x86 inline assem-
bly. However, when inline assembly was used for a particular
use case, it was typically implemented for several common
architectures (e.g., x86, ARM, and PowerPC). Most projects
provided both x86-32 and x86-64 implementations, which
were either the same or only slightly different (also see [30]).
In rare cases, x86 lacked an inline assembly implementation
that other architectures provided; for example, reversing the
individual bits in an integer is available on ARM using the in-
struction rbit, with no equivalent x86 instruction. However,
in general, we believe that we would have come to similar
conclusions regarding the usage of inline assembly for other
mainstream architectures.
GitHub. We performed the survey on open-source GitHub
projects, and our findings might not apply to proprietary
projects. Additionally, our findings might not be generalizable
to older code, where inline assembly may be more
frequent, since 89.1% of the projects we analyzed had their
first commit in 2008 or later (the year GitHub was launched).
7 Related Work
To the best of our knowledge, inline assembly has to date
attracted little research attention, and consequently we con-
sider a wider context of related work.
Linux API usage. Our methodology was inspired by a study
of Linux API usage which analyzed the frequency of system
calls at the binary level to recommend an implementation
order [64]. While we adopted a similar perspective, we ana-
lyzed the usage of inline assembly in C projects. Additionally,
we directly analyzed the source code because we were inter-
ested in inline assembly usage independent of, for example,
compilers and compiler versions.
Inline assembly and teaching. Anguita et al. discussed
student motivation when learning about assembly-level ma-
chine organization in computer architecture classes [2]. In
these classes, students were taught instructions that high-
level languages lack (e.g., cpuid and rdtsc) and those that
can improve the performance of a program (e.g., by prefetch-
ing data or using SIMD instructions). We found strong simi-
larities between those instructions and the most frequently
used inline assembly instructions, which further supports
the validity of both studies.
Linker. Kell et al. studied the semantic role of linkers in
C [34]. As with inline assembly, linker features are used in
C programs, but transcend the language. Furthermore, some
linker-relevant functionality, such as symbol versioning, is
expressed in inline assembly.
C preprocessor. Ernst et al. explored the role of the C pre-
processor [17]. As with linker features, the C preprocessor
is also relied upon by C programs, but is not part of the
language. They found that the preprocessor served, among
other purposes, to include inline assembly.
Formal verification. Some formal verification approaches
support inline assembly and/or macro assembly [5]. For example,
Vx86 translates macro assembly to C code by abstracting
its functionality [42]. Manual approaches assume that
such inline assembly portions need to be converted to C
functions [31]. Note that it is more straightforward to trans-
late macro assembler, because C code mixed with assembler
typically exchanges values between registers and variables.
Binary analysis. Tools that analyze or process binaries are
widely established [3, 4, 11, 35, 46, 56] and could analyze
C projects after they have been compiled to machine code.
However, they are not always applicable, for example, when
analyzing the high-level semantics of a program or when
converting between different source languages.
8 Conclusion
We analyzed 1264 GitHub projects to determine the usage
of inline assembly in C projects using both quantitative and
qualitative analyses.
Our results demonstrate that inline assembly is relatively
common in C projects. 28.1% of the most popular C projects
contain inline assembly fragments, even when operating-
system-level software, which might be more likely to con-
tain inline assembly, is excluded. Inline assembly fragments
typically consist of a single instruction, and most projects
with inline assembly contain fewer than ten fragments. We
found that the majority of projects use the same subset of
instructions: by implementing 50 instructions, tool writers
could support as much as 64.5% of all projects that contain
inline assembly. 17.3% of the remaining projects use
macro-metaprogramming techniques and/or many inline assembly
fragments, for example, to benefit from SIMD instruction
set extensions. By implementing the remainder of the total
of 167 instructions and the SIMD instruction set extensions,
tool writers could support the majority of projects “in the
wild”. Another challenge to implementing inline assembly is
that invalid combinations of mnemonics are used that form
valid opcodes when converted to machine code.
We found that inline assembly is often used in crypto-
graphic applications. However, networking applications, me-
dia applications, databases, language implementations, con-
currency libraries, math libraries, text processing and web
servers also contain inline assembly. It is therefore likely
that tools have to deal with inline assembly, even if they are
intended for a specific domain. Inline assembly is used for
multicore programming, for example, to implement compiler
barriers, memory barriers, and atomics. It is employed for
performance optimization, namely for SIMD instructions,
endianness conversions, hash functions, and bitscan opera-
tions. Further, it is used when a functionality is unavailable
in C, for example, for determining the elapsed clock cycles,
for feature detection, debug interrupts, data prefetching, and
generating secure random numbers. Finally, larger inline
assembly fragments use moves, arithmetic instructions, and
control flow instructions as “filler” instructions. Interestingly,
the inline assembly syntax of compilers is not only used to
insert instructions but also to control symbol names, linker
warnings, symbol versioning, and register variables.
We believe that the results of our study are important to
tool writers who consider adding support for inline assembly.
Our study gives guidance on the need for such support and
helps to plan and prioritize the implementation of instruc-
tions. Additionally, this study could be useful to language
designers, as it reveals where plain C is inadequate to a task
and where developers fall back on assembler instructions.
Finally, compiler writers could obtain feedback on which in-
structions are frequently used, for example, to handle them
specifically in compiler warnings [59] (e.g., by analyzing
whether constraints and side effects are specified correctly).
9 Future Work
Our study opened up several directions for future work. One
question is how inline assembly influences program correctness,
since its use is error-prone; for example, undeclared
side effects are not detected by state-of-the-art compilers and
might remain as undetected faults or hard-to-debug errors in
the source code. This question might be addressed by novel
bug-finding tools that specifically target inline assembly. Similarly,
an open question is whether compilers handle inline
assembly correctly in every case. In recent years, random
program generators for testing compilers [50, 60, 71] and
other tools [14, 33] have been successful in identifying bugs.
Future work could investigate whether generating programs
with inline assembly could expose additional compiler bugs.
While investigating inline assembly, we found that many
programs use compiler intrinsics as an alternative to inline
assembly. However, we did not investigate the usage of com-
piler intrinsics, which could be done as part of a future study.
Finally, we believe that our study could be extended, for ex-
ample, by investigating inline assembly in software (I) that is
close to the machine (e.g., in operating systems), (II) in other
languages (e.g., in C++), and (III) for other architectures (e.g.,
for ARM), and by investigating macro assembly.
Acknowledgments
We thank the anonymous reviewers for their valuable com-
ments and suggestions, and Ingrid Abfalter for her proof-
reading and editorial assistance. We thank all members of
the Virtual Machine Research Group at Oracle Labs and the
Institute of System Software at Johannes Kepler University
Linz for their support and contributions. The authors from
Johannes Kepler University Linz are funded in part by a
research grant from Oracle. Kell was supported by EPSRC
Programme Grant “REMS: Rigorous Engineering for Mainstream
Systems”, EP/K008528/1.
References
[1] Sarita V. Adve and Kourosh Gharachorloo. 1996. Shared Memory
Consistency Models: A Tutorial. Computer 29, 12 (Dec. 1996), 66±76.
htps://doi.org/10.1109/2.546611
[2] Mancia Anguita and F. Javier Fernández-Baldomero. 2007. Software
Optimization for Improving Student Motivation in a Computer Ar-
chitecture Course. IEEE Transactions on Education 50, 4 (Nov 2007),
373±378. htps://doi.org/10.1109/TE.2007.906603
[3] Gogul Balakrishnan, Radu Gruian, Thomas Reps, and Tim Teitel-
baum. 2005. CodeSurfer⁄x86ÐA Platform for Analyzing x86 Executa-
bles. In Proceedings of the 14th International Conference on Compiler
Construction (CC’05). Springer-Verlag, Berlin, Heidelberg, 250±254.
htps://doi.org/10.1007/978-3-540-31985-6_19
[4] Gogul Balakrishnan and Thomas Reps. 2010. WYSINWYX: What
You See is Not What You eXecute. ACM Trans. Program. Lang.
Syst. 32, 6, Article 23 (Aug. 2010), 84 pages. htps://doi.org/10.1145/
1749608.1749612
[5] Christoph Baumann, Bernhard Beckert, Holger Blasum, and Thorsten
Bormer. 2009. Better avionics software reliability by code veriication.
In Proceedings, embedded world Conference, Nuremberg, Germany.
[6] Ryad Benadjila, Olivier Billet, Shay Gueron, and Matt J. Robshaw.
2009. The Intel AES Instructions Set and the SHA-3 Candidates. In
Proceedings of the 15th International Conference on the Theory and
Application of Cryptology and Information Security: Advances in Cryp-
tology (ASIACRYPT ’09). Springer-Verlag, Berlin, Heidelberg, 162±178.
htps://doi.org/10.1007/978-3-642-10366-7_10
[7] Daniel J. Bernstein. 2005. Cache-timing attacks on AES. (2005).
[8] binutils. 2017. Using as. (2017). htps://sourceware.org/binutils/docs/
as/index.html (Accessed October 2017).
[9] Hans-J. Boehm. 2005. Threads Cannot Be Implemented As a Library.
In Proceedings of the 2005 ACM SIGPLAN Conference on Programming
Language Design and Implementation (PLDI ’05). ACM, New York, NY,
USA, 261±268. htps://doi.org/10.1145/1065010.1065042
[10] Hudson Borges, André C. Hora, and Marco Tulio Valente. 2016. Under-
standing the Factors That Impact the Popularity of GitHub Repositories.
In 2016 IEEE International Conference on Software Maintenance and
Evolution, ICSME 2016, Raleigh, NC, USA, October 2-7, 2016. 334±344.
htps://doi.org/10.1109/ICSME.2016.31
[11] Derek Bruening and Qin Zhao. 2011. Practical Memory Checking with
Dr. Memory. In Proceedings of the 9th Annual IEEE/ACM International
Symposium on Code Generation and Optimization (CGO ’11). IEEE
Computer Society, Washington, DC, USA, 213–223.
[12] Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE:
unassisted and automatic generation of high-coverage tests for complex
systems programs. In Proceedings of the 8th USENIX conference on
Operating systems design and implementation (OSDI’08). USENIX
Association, Berkeley, CA, USA, 209–224.
[13] Yu-Fang Chen, Chang-Hong Hsu, Hsin-Hung Lin, Peter Schwabe,
Ming-Hsien Tsai, Bow-Yaw Wang, Bo-Yin Yang, and Shang-Yi Yang.
2014. Verifying Curve25519 Software. In Proceedings of the 2014 ACM
SIGSAC Conference on Computer and Communications Security (CCS
’14). ACM, New York, NY, USA, 299–309.
https://doi.org/10.1145/2660267.2660370
[14] Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John
Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing Static
Analyzers with Randomly Generated Programs. In Proceedings of the 4th
International Conference on NASA Formal Methods (NFM’12).
Springer-Verlag, Berlin, Heidelberg, 120–125.
https://doi.org/10.1007/978-3-642-28891-3_12
[15] Will Dietz, Peng Li, John Regehr, and Vikram Adve. 2012.
Understanding Integer Overflow in C/C++. In Proceedings of the 34th International
Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway,
NJ, USA, 760–770.
[16] Vijay D'Silva, Mathias Payer, and Dawn Song. 2015. The
Correctness-Security Gap in Compiler Optimization. In Proceedings of the 2015 IEEE
Security and Privacy Workshops (SPW ’15). IEEE Computer Society,
Washington, DC, USA, 73–87. https://doi.org/10.1109/SPW.2015.33
[17] Michael D. Ernst, Greg J. Badros, and David Notkin. 2002. An Empirical
Analysis of C Preprocessor Use. IEEE Trans. Softw. Eng. 28, 12 (Dec.
2002), 1146–1170. https://doi.org/10.1109/TSE.2002.1158288
[18] David Evans, John Guttag, James Horning, and Yang Meng Tan. 1994.
LCLint: A Tool for Using Specifications to Check Code. (1994), 87–96.
https://doi.org/10.1145/193173.195297
[19] David Evans and David Larochelle. 2002. Improving Security Using
Extensible Lightweight Static Analysis. IEEE Softw. 19, 1 (Jan. 2002),
42–51. https://doi.org/10.1109/52.976940
[20] Jason Evans. 2006. A Scalable Concurrent malloc(3) Implementation
for FreeBSD. (2006).
https://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
(Accessed October 2017).
[21] Robert Feldt and Ana Magazinius. 2010. Validity Threats in Empirical
Software Engineering Research - An Initial Survey. In Proceedings of
the 22nd International Conference on Software Engineering & Knowledge
Engineering (SEKE’2010), Redwood City, San Francisco Bay, CA, USA,
July 1 - July 3, 2010. 374–379.
[22] Mike Frysinger. 2015. Amd64 [un]fixes in SDL_endian.h. (2015).
https://discourse.libsdl.org/t/amd64-un-fixes-in-sdl-endian-h/11792
(Accessed October 2017).
[23] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian
Mendel, Christian Rechberger, Martin Schläffer, and Thomsen Søren S.
2009. Grøstl - a SHA-3 candidate. In Symmetric Cryptography (Dagstuhl
Seminar Proceedings), Helena Handschuh, Stefan Lucks, Bart Preneel,
and Phillip Rogaway (Eds.). Schloss Dagstuhl - Leibniz-Zentrum fuer
Informatik, Germany, Dagstuhl, Germany.
http://drops.dagstuhl.de/opus/volltexte/2009/1955
[24] GCC Manual. 2017. Variables in Specified Registers. (2017).
https://gcc.gnu.org/onlinedocs/gcc/Explicit-Register-Variables.html
(Accessed October 2017).
[25] Stefan Heule, Eric Schkufza, Rahul Sharma, and Alex Aiken. 2016.
Stratified Synthesis: Automatically Learning the x86-64 Instruction Set.
In Proceedings of the 37th ACM SIGPLAN Conference on Programming
Language Design and Implementation (PLDI ’16). ACM, New York, NY,
USA, 237–250. https://doi.org/10.1145/2908080.2908121
[26] Gerard J Holzmann. 2002. UNO: Static source code checking for
user-defined properties. In Proc. IDPT, Vol. 2.
[27] Intel. 2010. How To Benchmark Code Execution Times on Intel




[28] Intel. 2014. Intel® Digital Random Number Generator
(DRNG) Software Implementation Guide. (2014).
https://software.intel.com/sites/default/files/managed/4d/91/DRNG_Software_Implementation_Guide_2.0.pdf
(Accessed October 2017).
[29] International Organization for Standardization. 2011. ISO/IEC
9899:2011. (2011).
[30] Andreas Jaeger. 2003. Porting to 64-bit GNU/Linux Systems. In
Proceedings of the GCC Developers Summit. 107–121.
[31] Rob Johnson and David Wagner. 2004. Finding User/Kernel Pointer
Bugs with Type Inference. In Proceedings of the 13th Conference on
USENIX Security Symposium - Volume 13 (SSYM’04). USENIX
Association, Berkeley, CA, USA, 9–9.
[32] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer,
Daniel M. German, and Daniela Damian. 2014. The Promises and
Perils of Mining GitHub. In Proceedings of the 11th Working Conference
on Mining Software Repositories (MSR 2014). ACM, New York, NY, USA,
92–101. https://doi.org/10.1145/2597073.2597074
[33] Timotej Kapus and Cristian Cadar. 2017. Automatic Testing of
Symbolic Execution Engines via Program Generation and Differential
Testing. In Proceedings of the 32nd IEEE/ACM International Conference on
Automated Software Engineering (ASE 2017). IEEE Press, Piscataway,
NJ, USA, 590–600.
[34] Stephen Kell, Dominic P. Mulligan, and Peter Sewell. 2016. The
Missing Link: Explaining ELF Static Linking, Semantically. In
Proceedings of the 2016 ACM SIGPLAN International Conference on
Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA
2016). ACM, New York, NY, USA, 607–623.
https://doi.org/10.1145/2983990.2983996
[35] Vladimir Kiriansky, Derek Bruening, and Saman P. Amarasinghe. 2002.
Secure Execution via Program Shepherding. In Proceedings of the 11th
USENIX Security Symposium. USENIX Association, Berkeley, CA, USA,
191–206.
[36] Robbert Krebbers and Freek Wiedijk. 2015. A Typed C11 Semantics
for Interactive Theorem Proving. In Proceedings of the 2015 Conference
on Certified Programs and Proofs (CPP ’15). ACM, New York, NY, USA,
15–27. https://doi.org/10.1145/2676724.2693571
[37] John B. Lacy. 1993. CryptoLib: Cryptography in Software. In
Proceedings of the 4th USENIX Security Symposium, Santa Clara, CA, USA,
October 4-6, 1993.
[38] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation
Framework for Lifelong Program Analysis & Transformation. In Proceedings
of the International Symposium on Code Generation and Optimization:
Feedback-directed and Runtime Optimization (CGO ’04). IEEE Computer
Society, Washington, DC, USA, 75–88.
[39] Jaekyu Lee, Hyesoon Kim, and Richard Vuduc. 2012. When
Prefetching Works, When It Doesn't, and Why. ACM Trans. Archit. Code
Optim. 9, 1, Article 2 (March 2012), 29 pages.
https://doi.org/10.1145/2133382.2133384
[40] A. Liu and P. Ning. 2008. TinyECC: A Configurable Library for Elliptic
Curve Cryptography in Wireless Sensor Networks. In 2008 International
Conference on Information Processing in Sensor Networks (ipsn
2008). 245–256. https://doi.org/10.1109/IPSN.2008.47
[41] Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang,
Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. DéJàVu: A Map of
Code Duplicates on GitHub. Proc. ACM Program. Lang. 1, OOPSLA,
Article 84 (Oct. 2017), 28 pages. https://doi.org/10.1145/3133908
[42] Stefan Maus, Michal Moskal, and Wolfram Schulte. 2008. Vx86: X86
Assembler Simulated in C Powered by Automated Theorem Proving.
(2008), 284–298. https://doi.org/10.1007/978-3-540-79980-1_22
[43] Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan
Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell. 2016.
Into the Depths of C: Elaborating the De Facto Standards. In
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language
Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 1–15.
https://doi.org/10.1145/2908080.2908081
[44] mrigger. 2017. Inline Assembler. (2017).
https://github.com/elliotchance/c2go/issues/228 (Accessed October 2017).
[45] George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley
Weimer. 2002. CIL: Intermediate Language and Tools for Analysis
and Transformation of C Programs. In Compiler Construction, 11th
International Conference, CC 2002, Held as Part of the Joint European
Conferences on Theory and Practice of Software, ETAPS 2002, Grenoble,
France, April 8-12, 2002, Proceedings. 213–228.
https://doi.org/10.1007/3-540-45937-5_16
[46] Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework
for heavyweight dynamic binary instrumentation. In Proceedings of the
ACM SIGPLAN 2007 Conference on Programming Language Design and
Implementation, San Diego, California, USA, June 10-13, 2007. 89–100.
https://doi.org/10.1145/1250734.1250746
[47] Lionel M. Ni and Kai Hwang. 1985. Vector-Reduction Techniques for
Arithmetic Pipelines. IEEE Trans. Comput. C-34, 5 (May 1985), 404–411.
https://doi.org/10.1109/TC.1985.1676580
[48] Joe Olivas, Mike Chynoweth, and Tom Propst. 2015. Benefitting Power
and Performance Sleep Loops. (2015).
https://software.intel.com/en-us/articles/benefiting-power-and-performance-sleep-loops
(Accessed October 2017).
[49] John Regehr. 2013. Safe, Efficient, and Portable Rotate in C/C++. (2013).
https://blog.regehr.org/archives/1063 (Accessed October 2017).
[50] John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and
Xuejun Yang. 2012. Test-case Reduction for C Compiler Bugs. In
Proceedings of the 33rd ACM SIGPLAN Conference on Programming
Language Design and Implementation (PLDI ’12). ACM, New York, NY,
USA, 335–346. https://doi.org/10.1145/2254064.2254104
[51] Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas
Würthinger, and Hanspeter Mössenböck. 2016. Bringing Low-level
Languages to the JVM: Efficient Execution of LLVM IR on Truffle. In
Proceedings of the 8th International Workshop on Virtual Machines and
Intermediate Languages (VMIL 2016). ACM, New York, NY, USA, 6–15.
https://doi.org/10.1145/2998415.2998416
[52] Manuel Rigger, Roland Schatz, Matthias Grimmer, and Hanspeter
Mössenböck. 2017. Lenient Execution of C on a Java Virtual
Machine: Or: How I Learned to Stop Worrying and Run the Code. In
Proceedings of the 14th International Conference on Managed Languages
and Runtimes (ManLang 2017). ACM, New York, NY, USA, 35–47.
https://doi.org/10.1145/3132190.3132204
[53] Manuel Rigger, Roland Schatz, Rene Mayrhofer, Matthias Grimmer,
and Hanspeter Mössenböck. Sulong, and Thanks For All the Bugs:
Finding Errors in C Programs by Abstracting from the Native
Execution Model. In Proceedings of the Twenty-Third International Conference
on Architectural Support for Programming Languages and Operating
Systems (ASPLOS 2018). https://doi.org/10.1145/3173162.3173174
[54] SDL. 2017. Simple DirectMedia Layer. (2017). https://www.libsdl.org/
(Accessed October 2017).
[55] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and
Dmitriy Vyukov. 2012. AddressSanitizer: A Fast Address Sanity
Checker. In 2012 USENIX Annual Technical Conference, Boston, MA,
USA, June 13-15, 2012. 309–318.
[56] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan
Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin
Poosankam, and Prateek Saxena. 2008. BitBlaze: A New Approach
to Computer Security via Binary Analysis. In Proceedings of the 4th
International Conference on Information Systems Security (ICISS ’08).
Springer-Verlag, Berlin, Heidelberg, 1–25.
https://doi.org/10.1007/978-3-540-89862-7_1
[57] Henry Spencer and Geoff Collyer. 1992. #ifdef Considered Harmful,
or Portability Experience with C News. In USENIX Summer
1992 Technical Conference, San Antonio, TX, USA, June 8-12, 1992.
https://www.usenix.org/conference/usenix-summer-1992-technical-conference/ifdef-considered-harmful-or-portability
[58] Evgeniy Stepanov and Konstantin Serebryany. 2015. MemorySanitizer:
fast detector of uninitialized memory use in C++. In Proceedings of the
13th Annual IEEE/ACM International Symposium on Code Generation
and Optimization, CGO 2015, San Francisco, CA, USA, February 07 - 11,
2015. 46–55. https://doi.org/10.1109/CGO.2015.7054186
[59] Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding and Analyzing
Compiler Warning Defects. In Proceedings of the 38th International
Conference on Software Engineering (ICSE ’16). ACM, New York, NY,
USA, 203–213. https://doi.org/10.1145/2884781.2884879
[60] Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding Compiler
Bugs via Live Code Mutation. In Proceedings of the 2016 ACM SIGPLAN
International Conference on Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA 2016). ACM, New York, NY,
USA, 849–863. https://doi.org/10.1145/2983990.2984038
[61] Piotr Szczechowiak, Leonardo B. Oliveira, Michael Scott, Martin Collier,
and Ricardo Dahab. 2008. NanoECC: Testing the Limits of Elliptic
Curve Cryptography in Sensor Networks. (2008), 305–320.
https://doi.org/10.1007/978-3-540-77690-1_19
[62] Reinhard Tartler, Daniel Lohmann, Christian Dietrich, Christoph Egger,
and Julio Sincero. 2012. Configuration Coverage in the Analysis of
Large-scale System Software. SIGOPS Oper. Syst. Rev. 45, 3 (Jan. 2012),
10–14. https://doi.org/10.1145/2094091.2094095
[63] Lucas Torri, Guilherme Fachini, Leonardo Steinfeld, Vesmar Camara,
Luigi Carro, and Érika Cota. 2010. An evaluation of free/open
source static analysis tools applied to embedded software. In 2010
11th Latin American Test Workshop. 1–6.
https://doi.org/10.1109/LATW.2010.5550368
[64] Chia-Che Tsai, Bhushan Jain, Nafees Ahmed Abdul, and Donald E.
Porter. 2016. A Study of Modern Linux API Usage and Compatibility:
What to Support when You're Supporting. In Proceedings of the
Eleventh European Conference on Computer Systems (EuroSys ’16). ACM,
New York, NY, USA, Article 16, 16 pages.
https://doi.org/10.1145/2901318.2901341
[65] VIA. 2005. New VIA PadLock SDK Extends Security Support in VIA
C7®/C7®-M Processors for Windows and Linux Software Developers.
(2005).
https://www.viatech.com/en/2005/11/new-via-padlock-sdk-extends-security-support-in-via-c7c7-m-processors-for-windows-and-linux-software-developers/
(Accessed October 2017).
[66] Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai
Zeldovich, and M. Frans Kaashoek. 2012. Undefined Behavior: What
Happened to My Code?. In Proceedings of the Asia-Pacific Workshop
on Systems (APSYS ’12). ACM, New York, NY, USA, Article 9, 7 pages.
https://doi.org/10.1145/2349896.2349905
[67] Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando
Solar-Lezama. 2013. Towards Optimization-safe Systems: Analyzing the
Impact of Undefined Behavior. In Proceedings of the Twenty-Fourth
ACM Symposium on Operating Systems Principles (SOSP ’13). ACM, New
York, NY, USA, 260–275. https://doi.org/10.1145/2517349.2522728
[68] Henry S Warren. 2013. Hacker’s delight. Pearson Education.
[69] Deng Xu. 2011. [Frama-c-discuss] inline assembly code. (2011).
https://lists.gforge.inria.fr/pipermail/frama-c-discuss/2011-March/002589.html
(Accessed October 2017).
[70] Zhongxing Xu, Ted Kremenek, and Jian Zhang. 2010. A Memory Model
for Static Analysis of C Programs. In Leveraging Applications of Formal
Methods, Verification, and Validation - 4th International Symposium on
Leveraging Applications, ISoLA 2010, Heraklion, Crete, Greece, October
18-21, 2010, Proceedings, Part I. 535–548.
https://doi.org/10.1007/978-3-642-16558-0_44
[71] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011.
Finding and Understanding Bugs in C Compilers. In Proceedings of the
32nd ACM SIGPLAN Conference on Programming Language Design
and Implementation (PLDI ’11). ACM, New York, NY, USA, 283–294.
https://doi.org/10.1145/1993498.1993532
[72] Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar.
2013. Performance Evaluation of Intel Transactional Synchronization
Extensions for High-performance Computing. In Proceedings of the
International Conference on High Performance Computing, Networking,
Storage and Analysis (SC ’13). ACM, New York, NY, USA, Article 19,
11 pages. https://doi.org/10.1145/2503210.2503232
Appendix
Table 15 shows the instructions sorted by their frequency.
Table 15. Instructions that were contained in at least two projects

instruction   # projects % projects | instruction   # projects % projects | instruction   # projects % projects
rdtsc                 54       27.4 | test                   9        4.6 | aeskeygena             4        2.0
cpuid                 50       25.4 | jc                     8        4.1 | cld                    4        2.0
mov                   49       24.9 | movdqa                 8        4.1 | ja                     4        2.0
                      43       21.8 | shr                    8        4.1 | jbe                    4        2.0
lock xchg             28       14.2 | xgetbv                 8        4.1 | lock bts               4        2.0
pause                 27       13.7 | bsf                    7        3.6 | lods                   4        2.0
lock cmpxchg          26       13.2 | call                   7        3.6 | pclmulqdq              4        2.0
xor                   25       12.7 | inc                    7        3.6 | pslldq                 4        2.0
add                   21       10.7 | int $0x03              7        3.6 | psllq                  4        2.0
bswap                 18        9.1 | jnc                    7        3.6 | psrldq                 4        2.0
jmp                   18        9.1 | nop                    7        3.6 | rep stos               4        2.0
lock xadd             17        8.6 | por                    7        3.6 | sar                    4        2.0
pop                   14        7.1 | prefetch               7        3.6 | setnz                  4        2.0
push                  14        7.1 | setc                   7        3.6 | stos                   4        2.0
cmp                   13        6.6 | dec                    6        3.0 | imul                   3        1.5
mfence                13        6.6 | lock add               6        3.0 | lock or                3        1.5
mul                   13        6.6 | neg                    6        3.0 | lock sub               3        1.5
sfence                13        6.6 | rdrand                 6        3.0 | movzb                  3        1.5
sub                   13        6.6 | rep movs               6        3.0 | pand                   3        1.5
adc                   12        6.1 | crc32                  5        2.5 | pushf                  3        1.5
bsr                   12        6.1 | lock dec               5        2.5 | shrd                   3        1.5
shl                   12        6.1 | lock inc               5        2.5 | div                    2        1.0
jz                    11        5.6 | movdqu                 5        2.5 | emms                   2        1.0
lea                   11        5.6 | pshufd                 5        2.5 | fldcw                  2        1.0
lfence                11        5.6 | psrlq                  5        2.5 | int $0x80              2        1.0
or                    11        5.6 | rdtscp                 5        2.5 | jl                     2        1.0
ror                   11        5.6 | ret                    5        2.5 | ldmxcsr                2        1.0
jnz                   10        5.1 | aesdec                 4        2.0 | lock and               2        1.0
setz                  10        5.1 | aesdeclast             4        2.0 | popf                   2        1.0
and                    9        4.6 | aesenc                 4        2.0 | punpcklb               2        1.0
pxor                   9        4.6 | aesenclast             4        2.0 | punpckldq              2        1.0
rol                    9        4.6 | aesimc                 4        2.0 | stmxcsr                2        1.0
