GuardNN: Secure DNN Accelerator for Privacy-Preserving Deep Learning by Hua, Weizhe et al.
Weizhe Hua, Muhammad Umar, Zhiru Zhang, and G. Edward Suh
School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
{wh399, mu94, zhiruz, gs272}@cornell.edu
Abstract—This paper proposes GuardNN, a secure deep neural
network (DNN) accelerator, which provides strong hardware-
based protection for user data and model parameters even in
an untrusted environment. GuardNN shows that the architecture
and protection can be customized for a specific application to pro-
vide strong confidentiality and integrity protection with negligible
overhead. The design of the GuardNN instruction set reduces
the TCB to just the accelerator and enables confidentiality
protection without the overhead of integrity protection. GuardNN
also introduces a new application-specific memory protection
scheme to minimize the overhead of memory encryption and
integrity verification. The scheme shows that most of the off-
chip meta-data in today’s state-of-the-art memory protection can
be removed by exploiting the known memory access patterns
of a DNN accelerator. GuardNN is implemented as an FPGA
prototype, which demonstrates effective protection with less than
2% performance overhead for inference over a variety of modern
DNN models.
I. INTRODUCTION
Modern machine learning (ML) models such as deep neural
networks (DNNs) are compute and memory intensive, and are
often executed on hardware accelerators in the cloud [1], [2],
[3]. At the same time, the data-intensive nature of DNNs raises
a series of concerns for security and privacy. For example, ML
algorithms typically require collecting, storing, and processing
a large amount of personal and potentially private data which
can be exposed or misused by a remote server if it is either
compromised or malicious. This paper proposes a secure
DNN accelerator architecture, named GuardNN, which enables
privacy-preserving ML under untrusted environments.
A promising approach to providing strong confidentiality
and integrity guarantees under untrusted environments is to
create a hardware-protected trusted execution environment
(TEE), also called an enclave as in Intel SGX [4]. So far this
approach has primarily been studied in the context of general-
purpose processors. This paper extends the approach to DNN
accelerators as shown in Figure 1. To protect sensitive data,
the secure DNN accelerator keeps all confidential information
including inputs, outputs, training data, and network parameters
(weights) in an encrypted form outside of the trusted hardware
boundary, such as a dedicated ASIC/FPGA accelerator chip or
an accelerator IP in an SoC. Each accelerator contains a unique
private key that can only be used by the accelerator hardware
itself. Users can authenticate the accelerator remotely using the
corresponding public key and the certificate, and send private
inputs and weights encrypted to the accelerator for processing.
In this way, the secure accelerator can ensure that private
user data and model parameters cannot be accessed by an
adversary even if they control software or physically access the
accelerator. The secure accelerator can also protect the integrity
of ML computation by incorporating remote attestation and
off-chip integrity verification.
Fig. 1: Secure ML accelerator overview.
While GuardNN adopts the high-level approach of secure
enclaves such as Intel SGX, it addresses the new challenges
raised by ML acceleration, namely protecting a large amount of
data in a heterogeneous system, and improves both security
and performance over general-purpose enclaves. Unlike
processors that are allowed to perform arbitrary operations and
memory accesses, custom accelerators only need to support
a relatively small set of operations and often have a memory
access pattern that is specific to the target application. This
application-specific nature of accelerators enables GuardNN
to customize both its security architecture and protection
mechanisms for DNN computations, and provide strong security
with almost no performance overhead.
The following summarizes the key benefits and insights
that GuardNN provides compared to directly applying today’s
enclave approach to DNN accelerators. 1) GuardNN carefully
designs its architecture and instructions to enable confidentiality
and integrity protection without trusting a host CPU that
controls scheduling and resource allocation. This design reduces
the trusted computing base (TCB) to just the accelerator.
2) The GuardNN instruction set enables confidentiality-only
protection, which is sufficient for privacy-preserving ML,
without the complexity and overhead of integrity protection.
3) GuardNN introduces a new application-specific memory
protection (ASMP) scheme, which enables memory encryption
and integrity verification with almost zero overhead by cus-
tomizing protection for the DNN memory access patterns. 4)
We observe that the memory access pattern and
the timing of a DNN accelerator without dynamic pruning are
independent of input and weight values, which enables strong
side-channel protection not available in today’s TEEs.
arXiv:2008.11632v1 [cs.CR] 26 Aug 2020
To validate and evaluate the GuardNN design, we imple-
mented a prototype system based on CHaiDNN [5], an open-
source DNN accelerator from Xilinx. This FPGA prototype
implements the customized memory encryption scheme for
confidentiality protection. The experimental results on a Xilinx
Zynq UltraScale+ FPGA board demonstrate functional cor-
rectness and show that the overhead of memory encryption is
negligible. For more detailed analyses, we performed additional
experiments using a combination of RTL simulation for
CHaiDNN and cycle-level simulation for a memory system.
The simulation results show that GuardNN can provide both
memory encryption and integrity verification with almost no
overhead. The results suggest that today’s general-purpose
memory protection schemes lead to 30% performance overhead
as memory bandwidth is often a bottleneck in DNN accelerators.
The prototype and the analysis also show that the area overhead
of GuardNN mainly comes from AES encryption engines and
is small compared to today’s DNN accelerators.
This paper makes the following major contributions:
• We present a secure DNN accelerator architecture and
interface, which enables secure DNN computation in un-
trusted environments with a minimal TCB and decoupled
protection for confidentiality and integrity.
• We propose an application-specific memory protection
scheme, which minimizes the performance overhead
of memory protection by customizing protection for
application-specific memory access patterns.
• We demonstrate the low performance overhead of
GuardNN through both a functional FPGA prototype and
detailed simulation studies. The overhead of GuardNN is
less than 2% for DNN inference on the state-of-the-art
models.
II. SECURE DNN ACCELERATOR ARCHITECTURE
A. Threat Model
We assume that a DNN accelerator is capable of running
both DNN inference and training. A scheduler runs on a host
CPU and coordinates computation and data movement by
communicating with a remote user and issuing commands
to the DNN accelerator. The remote user sends inputs and a
DNN model and receives final results.
The goal of a secure DNN accelerator is to protect the
confidentiality and optionally the integrity of DNN data and
computation in an environment where only the accelerator itself
can be trusted. For privacy, the secure DNN accelerator aims
to protect inputs, outputs (prediction results), training data,
network parameters (weights), and all intermediate results as
secrets. On the other hand, we consider the DNN network
architecture as public information and do not hide the network
structure. For integrity, the secure DNN accelerator aims to
detect any unauthorized changes to its state and execution so
that a user can verify that the output represents the outcome
of the given DNN model/input.
The DNN accelerator itself is trusted and authenticated
by the remote user using a unique private key that is only
known by the accelerator hardware. The accelerator hardware
needs to be designed and fabricated by a trusted manufacturer.
The manufacturer also needs to securely embed a private key
specific to each accelerator instance, and provide a certificate.
We assume that the internal operations and state of the DNN
accelerator cannot be directly observed or changed by an
adversary, whereas anything outside of the accelerator, including
off-chip memory and a host processor, is assumed untrusted.
Fig. 2: GuardNN architecture overview. The green and red
boxes represent trusted and untrusted components, respectively. The
grey and blue boxes stand for original and newly added hardware.
We aim to prevent information leakage through memory and
timing side-channels by ensuring external memory accesses
and timing are independent of secret data. Other physical side-
channels, such as the power and EM side-channels, are not
considered. We also do not consider adversarial examples that
exploit weaknesses in a model itself.
B. Key Insights and Features
Today’s secure enclaves for processors aim to provide
general-purpose protection and support arbitrary code inside
an enclave. On the other hand, the interface and protection
mechanisms can be customized for the target application
scenario and workload when protecting an application-specific
accelerator. GuardNN leverages this high-level insight to reduce
its trusted computing base (TCB), reduce complexity and
overhead, and provide stronger security. Here, we summarize
the key insights and features in GuardNN for application-
specific protection for privacy-preserving deep learning.
Insight 1. Small TCB: The accelerator can allow an untrusted
host to manage scheduling and resource allocation if no
instruction can leak secrets.
DNN accelerators typically rely on scheduling and optimization
algorithms on a host processor to determine which
operations to run. Directly extending today’s secure enclaves
will lead to a large TCB with complex mechanisms, requiring
protection mechanisms for both the host processor and the
accelerator, and a secure communication channel between them.
Instead, GuardNN ensures confidentiality without trusting a
host processor by designing its accelerator instruction set so
that sensitive information is always encrypted no matter which
instruction is executed. The outputs are encrypted so that they
can be decrypted only by the remote user who initiates a
session and sends an encrypted model, weights, and inputs.
The untrusted host processor chooses which DNN operations
are performed, but cannot make the accelerator produce outputs
in plaintext. This design significantly reduces the size of the
TCB and the cost of adding security protection.
TABLE I: The security features provided by GuardNN.
Security Function | Mechanism | Threat
Key Generation | True random number generator | Replay/key guessing
Key Exchange | DHE key-exchange protocol | Untrusted host/network
Off-chip Mem. Protection | Application-specific memory encryption and integrity verification | Untrusted host/physical attacks
Restricted Instruction Set | No instruction allows outputting secrets in plaintext | Untrusted host
Remote Attestation | Hashes of input, output, weights, and instructions; sign the hashes | Untrusted host
Side-channel Protection | The memory access pattern and the timing are independent of secrets | Memory and timing side-channels
Insight 2. Confidentiality-Only Protection: By always en-
crypting outputs and making the external behavior independent
of data, confidentiality can be provided without integrity.
The custom instruction set enables GuardNN to decou-
ple confidentiality and integrity protection, and protect the
confidentiality of private data without paying the cost of
integrity protection. Regardless of the instruction sequence,
private data are always encrypted outside the accelerator. In
addition, the memory access patterns and execution times
of DNN accelerators without dynamic pruning [6], [7], [8],
[9], [10], [11] are independent of input data values. Hence,
the confidentiality guarantees of GuardNN do not depend on
the integrity of the instruction sequences and data values. In
contrast, the secure enclaves require integrity protection even
for confidentiality; since the trusted software inside the enclave
is allowed to output confidential information unencrypted, the
integrity of the program must be protected even when only
confidentiality is needed.
Insight 3. Application-Specific Memory Protection: For
accelerators, the overhead of memory protection can be greatly
reduced by customizing protection for a specific application.
The cryptographic protection of off-chip memory typically
represents the main source of overhead in secure processors [4],
[12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22],
[23], [24], [25]. Traditional memory protection requires
additional meta-data such as version numbers and Message
Authentication Codes (MACs) stored in memory. Meta-data
accesses add delays and also consume memory bandwidth,
which is often a scarce resource in DNN accelerators. Modern
ML models require increasingly large amounts
of memory, and memory protection can lead to nontrivial
performance overhead for bandwidth-limited ML models. For
example, recommendation networks may need tens to hundreds
of GBs for embedding tables with poor temporal locality [26].
The overhead of memory protection can be significantly
reduced by customizing the protection for DNN computation.
The key observation is that application-specific accelerators
move data between on-chip and off-chip memory following a
relatively simple pattern that is tailored to an application and
largely known at the design time. For example, the memory
accesses in a DNN accelerator follow a simple and static pattern
specific to a given DNN structure. By exploiting the
application-specific memory pattern for DNNs, GuardNN calculates version
numbers from its state without storing them in off-chip
memory; when integrity protection is needed, it maintains
MACs at a coarse granularity that matches the accelerator’s
access granularity. This application-specific memory protection
(ASMP) enables confidentiality and integrity protection with
almost no performance overhead. Moreover, for confidentiality-only
protection, ASMP requires no off-chip meta-data and can
be added to an existing DNN accelerator with no impact on
scheduling and data placement.
TABLE II: GuardNN accelerator state.
Group | Name | Description
Key | SKAccel | Private key of the accelerator
Key | [PKAccel, Cert] | Public key of the accelerator and certificate
Key | KSession | AES session key
Key | KMEnc | AES key for memory (DRAM) protection
CTR | CTRW, CTRIN | Counters for weights and inputs
CTR | CTRF,W, IDF,R | Counter/ID for features
Hash | HW, HIN, HOUT | Hashes for weights, inputs, and outputs
Hash | HInst | Hash of instructions and parameters
Insight 4. Side Channel Resilience: Accelerators tend to
have more regular memory access patterns and provide an
opportunity for stronger side-channel resilience at low costs.
For a DNN accelerator without dynamic pruning techniques,
the memory access pattern and the timing of a given DNN
model are agnostic to inputs and weights. In that sense, DNN
accelerators provide strong protection against software-visible
side-channels such as timing and memory access patterns,
which is expensive in general-purpose processors.
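As an illustration of Insight 4, the following minimal sketch models a tiled dense layer that records the addresses it touches. The function name, the tile size, and the address arithmetic are hypothetical and are not taken from the GuardNN design. With static scheduling the trace depends only on tensor shapes; enabling a toy form of dynamic pruning makes the trace depend on the data.

```python
def forward_layer_trace(features, weights, tile=4, skip_zero_tiles=False):
    """Toy dense layer that returns (outputs, memory access trace).

    Addresses are computed from shapes and the tile size only. The optional
    skip_zero_tiles flag mimics dynamic pruning (hypothetical model).
    """
    trace = []
    n = len(features)
    out = [0.0] * len(weights)
    for row, w_row in enumerate(weights):
        for start in range(0, n, tile):
            chunk = features[start:start + tile]
            trace.append(("read_f", start))
            if skip_zero_tiles and all(v == 0 for v in chunk):
                continue  # skipped weight read reveals that the tile is zero
            trace.append(("read_w", row * n + start))
            out[row] += sum(w * f for w, f in zip(w_row[start:start + tile], chunk))
        trace.append(("write_f", row))
    return out, trace


weights = [[1.0] * 8, [2.0] * 8]
zeros, threes = [0.0] * 8, [3.0] * 8

# Static schedule: identical traces for different secret inputs.
same = forward_layer_trace(zeros, weights)[1] == forward_layer_trace(threes, weights)[1]
# Dynamic pruning: traces differ, so the access pattern leaks information.
leaky = (forward_layer_trace(zeros, weights, skip_zero_tiles=True)[1]
         != forward_layer_trace(threes, weights, skip_zero_tiles=True)[1])
print(same, leaky)
```

The same reasoning applies to timing in this model, since the number of recorded operations is fixed by the shapes when pruning is disabled.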
C. GuardNN Architecture
Here we introduce the GuardNN architecture and the
protection mechanisms. Figure 2 shows the high-level block
diagram, and Table I summarizes the protection mechanisms.
The accelerator needs to be able to establish a secure
communication channel with a remote user. For this purpose,
GuardNN requires a unique private key (SKAccel) and a true
random number generator (TRNG) in each secure accelerator,
and introduces an instruction that allows the accelerator to
securely exchange a symmetric key (KSession) with the remote
user. We assume that the user obtains the corresponding public
key using a public key infrastructure (PKI) as in Intel SGX or
Trusted Platform Modules (TPMs). DNN model parameters,
inputs, and outputs are sent through the secure communication
channel. GuardNN provides instructions to import encrypted
inputs and parameters, and produce an encrypted output.
During the execution, the GuardNN accelerator receives
instructions from its host CPU to perform DNN computations.
The instruction set is carefully designed to ensure that confi-
dential information is always encrypted outside the accelerator
no matter which instruction runs. For side-channel protection,
GuardNN requires that the timing and memory access pattern
of the accelerator are independent of secret data such as user
data and model parameters. This ensures that confidentiality is
protected without trusting the host.
TABLE III: Instruction set of the GuardNN accelerator.
Group | Mnemonic | Description
Base | Forward | Compute the forward propagation of a conv/dense layer
Base | Backward | Compute the backpropagation of a conv/dense layer
Base | Aggregate | Update the weights using the gradients
GuardNN | GetPK | Return the accelerator public key and the certificate
GuardNN | InitSession | Establish a secure comm. channel, reset state and KMEnc, enable ME (and IV, enable hashes for integrity protection)
GuardNN | SetWeight | Decrypt the weights with KSession and encrypt with KMEnc (compute the hash of the weights if IV protection is enabled)
GuardNN | SetInput | Decrypt the input with KSession and encrypt with KMEnc (compute the hash of the input if IV protection is enabled)
GuardNN | ExportOutput | Decrypt the output with KMEnc and encrypt with KSession
GuardNN | SignOutput | Sign the hashes (HW, HIN, HOUT, and HInst) with SKAccel
GuardNN | SetReadFID | Set IDF,R for an address range
To protect data in external memory (DRAM), GuardNN
includes a memory encryption (Enc) engine that encrypts data
in DRAM, and an integrity verification (IV) engine that detects
unauthorized changes on a read from external memory. To
minimize the performance overhead of memory protection,
GuardNN introduces a new memory protection scheme as
detailed in Section IV. The memory protection scheme is based
on the counter-mode encryption, and GuardNN maintains a set
of counters which ensure the same version number is never
re-used for encryption. Table II summarizes the cryptographic
keys and the accelerator state in GuardNN.
For integrity protection, at runtime, GuardNN computes the
hashes of inputs and weights when they are imported, and the
hash of each running instruction and its input arguments. Then,
GuardNN provides an instruction that signs the hash of the
output and other hashes using the accelerator’s private key so
that the user can check the initial state and the execution. As
shown in Figure 2, a microcontroller is used to perform these
instructions.
III. GUARDNN INSTRUCTION SET
This section describes the GuardNN instruction set and
shows how the instructions can be used for DNN inference.
A. GuardNN Instructions
The GuardNN instruction set is designed to be an extension
that can be added to a DNN accelerator without changing
the base instructions. Table III shows three instructions for
common DNN operations and the GuardNN extension for
security functions. A user can choose if integrity protection is
needed when initiating a session. If it is enabled, the operations
inside the parentheses will be performed.
GetPK: Returns the public key (PK) and the certificate (Cert).
InitSession: Given a public key from a remote user,
the accelerator runs a key exchange protocol to agree on a
symmetric session key and establish a secure communication
channel with the user. In this work, we use the DHE key-
exchange protocol and the standard AES counter mode (AES-
CM) for encrypted communications for inputs, weights, and
outputs. We will not elaborate on the details of our particular
implementation here, since this is a well-known problem that
can be solved by standard protocols. Other protocols such
as SSL can also be used for encrypted and authenticated
communication.
Fig. 3: DNN inference using the GuardNN instructions.
The accelerator also clears all states (keys, data, and hashes),
sets a new memory encryption key (KMEnc), resets all counters
to zero, and enables memory protection. If integrity protection
is enabled, memory integrity verification (IV) and hashing of
instructions and their operands are also enabled.
SetWeight and SetInput: On SetWeight, the acceler-
ator imports encrypted weights; these weights are decrypted
with the session key (KSession) and protected by the accelera-
tor’s memory encryption and optionally integrity verification
schemes. Then, the weight counter (CTRW) is incremented
(see Section IV). Similarly, on SetInput, the accelerator
imports the encrypted input from the user into its protected
memory, and increments the input counter (CTRIN). If integrity
protection is enabled, the accelerator also computes the hash
of the input/weights for remote attestation.
ExportOutput: The accelerator reads the output of DNN
computation from its protected memory, and re-encrypts the
output with KSession so that the output can be sent to the user.
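The data path of SetWeight, SetInput, and ExportOutput can be sketched as a toy model in which every value that leaves the accelerator object is ciphertext. This is a sketch under stated assumptions, not the GuardNN implementation: HMAC-SHA256 is used as a stand-in keystream generator in place of AES-CM, the class and method names are hypothetical, the labels used as nonces are invented for the example, and the "layer" is a placeholder XOR rather than real DNN arithmetic.

```python
import hashlib
import hmac
import secrets


def stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in stream cipher (HMAC-SHA256 keystream), NOT real AES-CM."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


class ToyAccelerator:
    """Hypothetical model: no method returns or stores plaintext off-chip."""

    def __init__(self, k_session: bytes):
        self._k_session = k_session
        self._k_menc = secrets.token_bytes(16)  # fresh memory key per session
        self.ctr_w = 0
        self.ctr_in = 0
        self.dram = {}  # untrusted memory, ciphertext only

    def _mem(self, label: bytes, ctr: int, data: bytes) -> bytes:
        return stream_xor(self._k_menc, label + ctr.to_bytes(8, "big"), data)

    def set_weight(self, enc_weights: bytes) -> None:
        plain = stream_xor(self._k_session, b"W", enc_weights)
        self.dram["w"] = self._mem(b"W", self.ctr_w, plain)
        self.ctr_w += 1  # incremented after the import, as in the text

    def set_input(self, enc_input: bytes) -> None:
        plain = stream_xor(self._k_session, b"IN", enc_input)
        self.dram["in"] = self._mem(b"IN", self.ctr_in, plain)
        self.ctr_in += 1

    def forward(self) -> None:
        w = self._mem(b"W", self.ctr_w - 1, self.dram["w"])
        x = self._mem(b"IN", self.ctr_in - 1, self.dram["in"])
        y = bytes(a ^ b for a, b in zip(w, x))  # placeholder for layer math
        self.dram["out"] = self._mem(b"OUT", self.ctr_in - 1, y)

    def export_output(self) -> bytes:
        y = self._mem(b"OUT", self.ctr_in - 1, self.dram["out"])
        return stream_xor(self._k_session, b"OUT", y)  # re-encrypt for the user
```

A user holding the same session key would encrypt the weights and input with `stream_xor`, let the untrusted host call the methods in any order it likes, and decrypt the value returned by `export_output`. The host only ever handles ciphertext.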
SignOutput: The accelerator computes a digital signature
of the hashes of the input, output, weights, and the sequence
of instructions/operands using its private key (SKAccel). By
verifying this signature using the corresponding public key,
the user can verify that the output was produced by the
particular accelerator using the correct initial state and the
correct sequence of instructions.
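The attestation state behind SignOutput can be sketched as a running hash over the instruction stream plus hashes of the weights, input, and output. The chaining rule, the field encoding, and all names below are assumptions made for illustration; the text does not specify them. The final signature with SKAccel is omitted, so the sketch only produces the digest that would be signed.

```python
import hashlib


def digest(*fields: bytes) -> bytes:
    """SHA-256 over length-prefixed fields (encoding chosen for this sketch)."""
    h = hashlib.sha256()
    for f in fields:
        h.update(len(f).to_bytes(4, "big"))
        h.update(f)
    return h.digest()


class AttestationState:
    """Hypothetical model of H_Inst and the report that SignOutput would sign."""

    def __init__(self):
        self.h_inst = bytes(32)  # cleared by InitSession

    def record(self, mnemonic: str, operands: bytes = b"") -> None:
        # Chain each executed instruction and its operands into H_Inst.
        self.h_inst = digest(self.h_inst, mnemonic.encode(), operands)

    def report(self, weights: bytes, inputs: bytes, output: bytes) -> bytes:
        # Combine H_W, H_IN, H_OUT, and H_Inst into one value to be signed.
        return digest(digest(weights), digest(inputs), digest(output), self.h_inst)
```

A user who knows the model, the input, and the expected instruction sequence can recompute the same report and compare it with the signed value; any reordered or injected instruction changes H_Inst and therefore the report.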
SetReadFID: The application-specific memory protec-
tion scheme (detailed in Section IV) uses the feature map ID
(i.e., IDF,R) to determine the version number when decrypting
feature values. This number is pre-determined based on the
network structure and scheduling, and does not need to be
trusted for confidentiality as it only affects decryption. To
reduce hardware overhead, GuardNN lets the host CPU set
the feature ID for an address range (base and bound addresses).
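A minimal sketch of the address-range lookup implied by SetReadFID is given below. The class name, the exclusive upper bound, and the rule that the most recent matching range wins are assumptions for illustration only.

```python
class ReadFidTable:
    """Hypothetical table mapping address ranges to feature IDs (ID_F,R)."""

    def __init__(self):
        self._ranges = []  # list of (base, bound, fid), bound is exclusive

    def set_read_fid(self, base: int, bound: int, fid: int) -> None:
        if base >= bound:
            raise ValueError("base must be below bound")
        self._ranges.append((base, bound, fid))

    def lookup(self, addr: int):
        # The most recently set range that covers addr wins (assumption).
        for base, bound, fid in reversed(self._ranges):
            if base <= addr < bound:
                return fid
        return None
```

Because the ID only selects the version number used for decryption, a wrong value supplied by the host produces garbage inside the accelerator rather than exposing plaintext.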
B. DNN Execution using GuardNN Instructions
Figure 3 illustrates how DNN inference is performed using
GuardNN instructions. (1) The host uses GetPK to obtain the
PK and the certificate of the accelerator, and sends them to
the remote user. The user uses a PKI to verify that the PK
belongs to a trustworthy accelerator.
(2) The user initiates a secure session with the accelerator
using InitSession. Our implementation uses the DHE key-exchange
protocol; the user sends the public part of a randomly
generated ephemeral key-pair (PKUE), and the accelerator also
generates an ephemeral key-pair and sends the public part
(PKXE) signed by its private key SKAccel. The user verifies that
the received ephemeral key belongs to the trusted accelerator
and computes a symmetric key, as does the accelerator. The
host CPU passes messages between the two.
Fig. 4: Memory encryption and integrity verification.
(a) Traditional memory encryption and integrity verification:
plaintext (U) is encrypted with CTR, which consists of
the address (PA) and a VN. The MACs and VNs associated with
the encrypted data (V) are stored in DRAM. A Merkle
tree protects the integrity of the VNs.
(b) Proposed memory encryption and integrity verification:
the VN is generated on-chip, which eliminates the off-chip
storage for VNs and the Merkle tree. The MAC is calculated
at the granularity that matches the accelerator’s memory accesses.
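The ephemeral Diffie-Hellman step of InitSession described above can be sketched as follows. This is a toy illustration: the 127-bit modulus and generator are chosen only to keep the example short and are far too small for real use, the key derivation via SHA-256 is an assumption, and the signature on PKXE with SKAccel is omitted.

```python
import hashlib
import secrets

# Toy group parameters (assumption for the example, NOT secure sizes).
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """Return (secret exponent, public value) for one side of the exchange."""
    sk = secrets.randbelow(P - 3) + 2
    return sk, pow(G, sk, P)


def session_key(own_sk: int, peer_pk: int) -> bytes:
    """Derive a 128-bit symmetric key from the shared DH secret."""
    shared = pow(peer_pk, own_sk, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:16]


# User side generates PK_UE; accelerator side generates PK_XE.
user_sk, pk_ue = ephemeral_keypair()
accel_sk, pk_xe = ephemeral_keypair()

# The untrusted host only relays the public values.
k_user = session_key(user_sk, pk_xe)
k_accel = session_key(accel_sk, pk_ue)
print(k_user == k_accel)
```

The host sees only the two public values, so in this model it cannot compute the session key even though it relays every message.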
(3) The user sends a plaintext DNN definition (see footnote 1) and encrypted
weights. The host CPU parses the model, loads the encrypted
weights into DRAM, and calls SetWeight. The accelerator
decrypts the weights and protects them with the
application-specific memory encryption scheme. (4) Similarly, the user
sends an encrypted input that is imported by the accelerator
on the SetInput instruction. At this point, the accelerator is
ready to execute the DNN inference.
(5) For each layer, the host CPU uses SetReadFID to set
IDF,R for the input features, and (6) issues Forward to start the
computation. The accelerator performs the computation while
keeping both input and output features encrypted in memory.
For training, the host CPU also uses the base instructions for
training (Backward and Aggregate) to update the weights.
(7) After all the layers are finished, the host CPU uses
ExportOutput to obtain an encrypted output, and sends the
encrypted output to the user. (8) If integrity protection is needed,
the host also sends the digital signature from the SignOutput
instruction. The user receives the output and digital signature,
decrypts the output, and verifies the hash values and the digital
signature to check the integrity of the output.
IV. APPLICATION-SPECIFIC MEMORY PROTECTION
This section presents the application-specific memory pro-
tection (ASMP) scheme in GuardNN, which provides secure
yet low-overhead memory protection leveraging regular and
mostly static memory access patterns of DNN accelerators.
A. Memory Protection Basics
Memory Encryption – As shown in Figure 4a, existing
techniques [28], [29], [14] typically use the counter-mode
encryption (AES-CM) to hide AES latency. AES-CM requires
a non-repeating counter value for each encryption under the
same AES key. In a secure processor, the counter value often
consists of the physical memory address (PA) of the data
block that will be encrypted and a (per-block) version number
that is incremented on each memory write. When a data
block is written, the encryption engine increments the VN
and then encrypts the data. When a data block is read, the
encryption engine retrieves the VN used for encryption and
then decrypts the block. Let $K_{Enc}$, $U$, and $V$ be the AES encryption
key, plaintext, and ciphertext, respectively. The AES encryption
can be formulated as $V = U \oplus \mathrm{AES}_{K_{Enc}}(PA \,\|\, VN)$, where $\|$ and
$\oplus$ represent bit-field concatenation and XOR, respectively.
Footnote 1: The network definition (e.g., prototxt file in Caffe [27]) can be viewed as
the DFG of a DNN model.
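The counter-mode formulation above can be sketched in a few lines. To stay within the Python standard library, the sketch substitutes a truncated HMAC-SHA256 for the AES block cipher; this is a stand-in chosen for the example and not what a secure processor uses. The 64-bit widths for PA and VN and the function names are likewise assumptions.

```python
import hashlib
import hmac


def pad_block(key: bytes, pa: int, vn: int) -> bytes:
    """Stand-in for AES_K(PA || VN): a 16-byte pad from HMAC-SHA256."""
    counter = pa.to_bytes(8, "big") + vn.to_bytes(8, "big")
    return hmac.new(key, counter, hashlib.sha256).digest()[:16]


def xor_block(key: bytes, pa: int, vn: int, block: bytes) -> bytes:
    """Encrypt or decrypt one 16-byte block: V = U xor pad(PA || VN)."""
    if len(block) != 16:
        raise ValueError("block must be 16 bytes")
    return bytes(a ^ b for a, b in zip(block, pad_block(key, pa, vn)))


key = b"0123456789abcdef"
u = b"plaintext-block!"
v = xor_block(key, pa=0x1000, vn=7, block=u)

# Decryption is the same XOR with the same (PA, VN).
recovered = xor_block(key, pa=0x1000, vn=7, block=v)
print(recovered == u)
```

Because the pad depends only on the key, the address, and the VN, it can be computed while the data is still in flight from DRAM, which is the latency-hiding property the text refers to. It also shows why a (PA, VN) pair must never repeat under one key: two ciphertexts with the same pad XOR to the XOR of their plaintexts.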
As general-purpose processors can have an arbitrary memory
access pattern, the VN for each cache block can be any value
at a given time. In order to determine the VN for a later read,
a secure processor needs to store the VNs in DRAM. To avoid
re-using the same counter value, the AES key needs to change
once the VN reaches its maximum, which implies that the
size of the VN needs to be large enough to avoid frequent
re-encryption. For example, Intel SGX [29] uses a 56-bit VN
for each 64-byte data block, which introduces 11% storage and
bandwidth overhead. In general, the VNs cannot fit on-chip
and are stored in DRAM. As the VNs are stored in the off-chip
memory, the integrity and freshness of VNs also need to be
protected with MACs to ensure confidentiality.
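The 11% figure quoted above follows directly from the ratio of VN bits to data bits. A short check, with a hypothetical helper name:

```python
def vn_overhead(vn_bits: int, block_bytes: int) -> float:
    """Fraction of extra storage needed to keep one VN per data block."""
    return vn_bits / (block_bytes * 8)


# 56-bit VN per 64-byte (512-bit) block, as in the Intel SGX example.
sgx = vn_overhead(56, 64)
print(f"{sgx:.1%}")
```

Every block read or written also moves its VN, so in this simple model the same ratio applies to memory bandwidth, before counting the MACs that protect the VNs.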
Integrity Verification – To prevent off-chip data from being
altered by an attacker, integrity verification cryptographically
checks if the value from DRAM is the most recent value
written by the processor. For this purpose, a MAC of the
data value, the memory address, and the VN is computed
and stored for each data block on a write, and checked
on a read from DRAM. However, only checking the MAC
cannot guarantee the freshness of the data; a replay attack
can replace the data and the corresponding VN and MAC in
memory with stale values without being detected. To defeat
the replay attack, a Merkle tree (i.e., hash tree) [30] is used
to verify the MACs hierarchically in a way that the root of
the tree is stored on-chip. As shown in Figure 4a, a state-
of-the-art method [31] uses a Merkle tree to protect the
integrity of the VNs in memory, and includes a VN in a
MAC to ensure the freshness of data. Previous designs use
Carter-Wegman MAC [29] and AES-GCM [32] as the hash
function. Let us denote the key, plaintext, and ciphertext as
$K_{IV}$, $U$, and $V$, respectively. The MAC of an encrypted data block
can be calculated as $\mathrm{MAC} = H_{K_{IV}}(V \,\|\, PA \,\|\, VN)$. The overhead
of integrity verification is nontrivial as it requires traversing
the tree stored in DRAM. To mitigate this overhead, recently
verified MACs are stored in a cache.
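The replay argument above can be made concrete with a small sketch. HMAC-SHA256 truncated to 8 bytes stands in for the Carter-Wegman or AES-GCM construction, and the function names and field widths are assumptions for the example.

```python
import hashlib
import hmac


def mac(key_iv: bytes, v: bytes, pa: int, vn: int) -> bytes:
    """MAC = H_K(V || PA || VN), with HMAC-SHA256 as a stand-in for H."""
    msg = v + pa.to_bytes(8, "big") + vn.to_bytes(8, "big")
    return hmac.new(key_iv, msg, hashlib.sha256).digest()[:8]


def verify(key_iv: bytes, v: bytes, pa: int, vn: int, tag: bytes) -> bool:
    return hmac.compare_digest(mac(key_iv, v, pa, vn), tag)


key_iv = b"integrity-key-16"
old_v, new_v = b"A" * 16, b"B" * 16
old_tag = mac(key_iv, old_v, 0x40, 1)  # first write, VN = 1
new_tag = mac(key_iv, new_v, 0x40, 2)  # second write, VN = 2

# The attacker replaces the current block and MAC with the stale pair.
rejected_with_trusted_vn = not verify(key_iv, old_v, 0x40, 2, old_tag)
# If the VN also lives in untrusted memory, the attacker replays it too.
accepted_with_replayed_vn = verify(key_iv, old_v, 0x40, 1, old_tag)
print(rejected_with_trusted_vn, accepted_with_replayed_vn)
```

The second case is why a conventional design needs a Merkle tree over the VNs, and why generating the VN on-chip removes that need.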
B. Application-Specific Version Number Generation
The main overhead of memory protection comes from storing
and accessing VNs and MACs for VNs in the off-chip mem-
ory. Because DNN accelerators are often memory-intensive,
these additional meta-data accesses can lead to non-trivial
performance overhead. We propose to significantly reduce the
memory protection overhead by generating VNs without storing
them in memory and customizing the protection granularity
based on the application-specific memory access pattern.
Fig. 5: Counter construction for reading and writing features,
inputs, outputs, and weights. Each 128-bit counter combines a
64-bit address (16-byte aligned) with a 64-bit version number (VN);
the VN is built from CTRF,W and CTRIN for feature writes,
IDF,R and CTRIN for feature reads, and CTRW for weights.
As their memory accesses are customized for a particu-
lar application, specialized accelerators tend to have more
predictable and regular memory access patterns compared to
general-purpose CPUs. In particular, both DNN inference and
training can be scheduled statically based on the network
structure. For example, most popular ML frameworks such
as Caffe [27], TensorFlow [33], and MXNet [34] adopt
declarative programming, where the frameworks optimize the
static data-flow graph (DFG) before execution. Given a DFG,
the operations and the corresponding memory accesses in that
DFG are scheduled statically. The scheduler can assign a VN
for each memory access without storing the VNs in memory.
Moreover, DNNs have the same access pattern to a large
chunk of memory. For example, DNNs without dynamic
pruning write the output feature maps of a layer to DRAM the
same number of times. As VNs reflect the maximum number of
writes to the corresponding memory block, this regular memory
access pattern means that we only need one VN for all the
output features of a layer. For example, if DNN accelerators
only write the output features to DRAM once per layer, we
can simply use the layer number as part of the VN.
Based on the observations, we propose to generate VNs
from the accelerator state instead of storing them in memory.
When the DNN scheduler is trusted, the scheduler can assign
VNs for both memory reads and writes based on the DFG and
the schedule of DNN operations. In GuardNN, the accelerator
generates VNs for memory writes using on-chip counters and
ensures that the same VN value is never reused for encrypting
one memory block even with an untrusted scheduler. For
memory reads, GuardNN receives VNs from the scheduler
on the host CPU. The scheduler can easily determine the VNs
for reads as it owns the DFG and controls the scheduling of
DNN. Note that VNs for reads do not affect confidentiality
because they are only used for decryption. Once the VN is
determined, the encryption (Enc) engine can decrypt/encrypt
each 128-bit data block using the same equations in AES-
CM. As the VNs no longer need to be stored in DRAM, the
integrity protection for VNs (e.g., Merkle tree) also becomes
unnecessary.
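The split between on-chip write VNs and host-supplied read VNs can be sketched as follows. The class and method names are hypothetical, a single write counter replaces the CTRIN/CTRF,W pair for brevity, each address is assumed to be written at most once per compute step, and HMAC-SHA256 again stands in for AES.

```python
import hashlib
import hmac


def xor_pad(key: bytes, addr: int, vn: int, block: bytes) -> bytes:
    """Counter-mode XOR with a stand-in pad derived from (addr || vn)."""
    ctr = addr.to_bytes(8, "big") + vn.to_bytes(8, "big")
    pad = hmac.new(key, ctr, hashlib.sha256).digest()[:16]
    return bytes(a ^ b for a, b in zip(block, pad))


class MemoryProtectionUnit:
    """Toy model: write VNs come from an on-chip counter, read VNs from the host."""

    def __init__(self, key: bytes):
        self._key = key
        self._write_vn = 0  # on-chip state, never set by the host
        self.dram = {}      # untrusted memory, no VNs stored here

    def begin_compute(self) -> int:
        # Advance the write VN once per compute instruction.
        self._write_vn += 1
        return self._write_vn

    def write(self, addr: int, block: bytes) -> None:
        self.dram[addr] = xor_pad(self._key, addr, self._write_vn, block)

    def read(self, addr: int, host_vn: int) -> bytes:
        # The result stays inside the accelerator; a wrong VN yields garbage.
        return xor_pad(self._key, addr, host_vn, self.dram[addr])
```

Since the host can influence only decryption, the worst it can do with a wrong read VN in this model is feed garbage into the computation, which matters for integrity but not for confidentiality.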
For integrity protection, MACs still need to be stored in
memory. We propose to reduce the overhead of integrity
protection by customizing the size of a memory block that
each MAC protects to match the data movement granularity of
the accelerator. For example, the DNN accelerator that we use
for a prototype reads a 512-B chunk from memory at a time.
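As a rough illustration of why a coarser MAC granularity helps, the sketch below counts the MACs needed to cover a buffer when one MAC protects 64 bytes versus 512 bytes. The 7-byte (56-bit) MAC size and the 1-MiB buffer are assumptions chosen for illustration, not parameters of the GuardNN design, and the function names are hypothetical.

```python
def mac_count(buffer_bytes: int, block_bytes: int) -> int:
    """Number of MACs needed when one MAC protects `block_bytes` of data."""
    return -(-buffer_bytes // block_bytes)  # ceiling division


def mac_storage_overhead(block_bytes: int, mac_bytes: int = 7) -> float:
    """Fraction of extra storage spent on MACs (mac_bytes is an assumed size)."""
    return mac_bytes / block_bytes


buffer_bytes = 1 << 20                  # hypothetical 1-MiB feature map
fine = mac_count(buffer_bytes, 64)      # one MAC per 64-byte block
coarse = mac_count(buffer_bytes, 512)   # one MAC per 512-byte chunk
print(fine, coarse, fine // coarse)
print(f"{mac_storage_overhead(64):.2%} vs {mac_storage_overhead(512):.2%}")
```

Under these assumptions the number of MACs drops by a factor of eight; the actual traffic saving also depends on how the accelerator schedules its reads and writes.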
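[Figure 6, panel (a) Inference: feature edges f1, f2, f3 carry the subscripts va, vb, vc; weight edges wa, wb, wc. Panel (b) Training: gradient edges g1, g2, g3 carry the subscripts va, vb, vc; vertex inputs (wa, f1), (wb, f2), (wc, f3); vertex outputs wa*, wb*, wc*.]
Fig. 6: DNN data-flow subgraphs.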
C. Version Number Generation for DNNs
This subsection describes how application-specific memory
protection is applied to DNN inference and training. Figure 5
shows the counter values used for memory protection in
GuardNN. Each counter value includes the address of the
128-bit memory block being encrypted/decrypted and a 64-bit
VN, and is used as the input to AES-CM encryption. Note that
using one VN for multiple memory locations does not sacrifice
security because the counter value already includes a memory
address. The VNs are constructed differently for the three types
of memory accesses: feature reads, feature writes, and weight
accesses to ensure that the counter values are unique for each
encryption and never re-used.
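The counter construction can be sketched in a few lines. The example below is a minimal illustration, not the GuardNN implementation: it assumes a 64-bit address field next to the 64-bit VN, and it swaps in HMAC-SHA256 truncated to 128 bits as a stand-in for the AES block cipher so that only the Python standard library is needed. All function names are hypothetical.

```python
import hashlib
import hmac


def counter_value(addr: int, vn: int) -> bytes:
    """128-bit counter: block address (assumed 64-bit field) || 64-bit VN."""
    assert 0 <= addr < 1 << 64 and 0 <= vn < 1 << 64
    return addr.to_bytes(8, "big") + vn.to_bytes(8, "big")


def pad(key: bytes, addr: int, vn: int) -> bytes:
    """One-time pad for a 16-byte block.

    Stand-in PRF: HMAC-SHA256 truncated to 128 bits replaces AES here.
    """
    return hmac.new(key, counter_value(addr, vn), hashlib.sha256).digest()[:16]


def xor_block(key: bytes, addr: int, vn: int, block: bytes) -> bytes:
    """Counter-mode encrypt or decrypt (the operation is its own inverse)."""
    assert len(block) == 16
    return bytes(b ^ p for b, p in zip(block, pad(key, addr, vn)))


key = bytes(range(16))                 # hypothetical memory encryption key
plain = b"feature-block-00"            # one 16-byte data block
ct = xor_block(key, addr=0x1000, vn=7, block=plain)
assert xor_block(key, 0x1000, 7, ct) == plain       # matching VN decrypts
assert xor_block(key, 0x1000, 8, ct) != plain       # wrong VN yields garbage
assert pad(key, 0x1000, 7) != pad(key, 0x1010, 7)   # shared VN, distinct pads
```

The last assertion shows why one VN can cover many blocks: the address in the counter keeps the pads distinct.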
DNN Inference – The inference of a specific DNN model
can be represented as a DFG, where a vertex represents a
layer and edges represent input/output features and weights
for the layer. Figure 6a shows a representative subgraph for
feedforward networks such as AlexNet [35] and VGG [36].
For writing new features to memory, we introduce CTRIN
and CTRF,W in the accelerator state to keep track of the
number of inputs and the number of times that feature maps
are written for the current input. CTRIN is incremented for
each new input (SetInput instruction). CTRF,W is reset on
a new input and incremented after each DNN computation
instruction (Forward) that writes output features. For example,
in Figure 6a, the subscript of each edge indicates its CTRF,W
value. If the feature maps are written once at the end of each
layer, CTRF,W corresponds to the layer number. VNs for feature
writes include both CTRIN and CTRF,W.
For reading the input features written by the previous layer,
GuardNN uses IDF,R from the scheduler on the host CPU
to form the VN, and thus avoids tracking the status of the
inference and supports more complicated scheduling. IDF,R
corresponds to the value of CTRF,W used to write the features.
As the IDF,R is only used in decryption, the confidentiality is
not compromised even if the IDF,R value is incorrect.
The weights are read-only during inference. Therefore, we
can use a constant as the VN for the weights until they are
updated. To allow updating weights, GuardNN adds CTRW in
the accelerator state and keeps track of the number of updates
to the weights (SetWeight instruction).
Algorithm 1 shows the VN generation for the key instructions
used in DNN inference. Note that CTRIN, CTRW, and CTRF,W
are all kept in on-chip registers, and there is no VN stored
in external memory. The on-chip counters can be made large
enough to avoid overflows. For the 53-bit CTRIN in our design,
an accelerator with a throughput of 1000 inputs per second
can run for 0.28 million years before an overflow.
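The overflow estimate can be checked with a short calculation. The sketch below assumes one counter increment per input and a 365.25-day year; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(counter_bits: int, increments_per_second: float) -> float:
    """Years until a counter of the given width wraps at a constant rate."""
    return (2 ** counter_bits) / increments_per_second / SECONDS_PER_YEAR


# 53-bit CTRIN at 1000 inputs per second, as in the text.
print(f"{years_until_overflow(53, 1000):,.0f} years")
```

The result falls between 0.28 and 0.29 million years, consistent with the figure quoted above.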
Algorithm 1: The pseudo-code for GuardNN instructions.
data.Addr returns the address of each 16-byte data block. ASMP.ST_K(data,
Addr, VN) encrypts data with the key K and VN and stores data at Addr.
ASMP.LD_K(Addr, VN) reads data from Addr and decrypts it with key K
and VN. ST_K(data, Addr) and LD_K(Addr) encrypt and decrypt data with
the key K using the encryption scheme used in the secure communication
channel.
Input: input data in, input features x_l, and weights w_l of layer l
Output: output feature x_(l+1) of layer l, output prediction out
SetInput { /* Re-encrypt the input using ASMP */
    in = LD_KSession(in.Addr);
    CTRIN++, CTRF,W = 0;
    ASMP.ST_KMEnc(in, x_0.Addr, {0 || CTRIN || CTRF,W});
    CTRF,W++; }
Forward { /* Forward propagation of a layer */
    w_l = ASMP.LD_KMEnc(w_l.Addr, {1 || CTRW});
    x_l = ASMP.LD_KMEnc(x_l.Addr, {0 || CTRIN || IDF,R});
    x_(l+1) = ReLU(x_l ∗ w_l);
    ASMP.ST_KMEnc(x_(l+1), x_(l+1).Addr, {0 || CTRIN || CTRF,W});
    CTRF,W++; }
ExportOutput { /* Re-encrypt the output */
    out = ASMP.LD_KMEnc(x_L.Addr, {0 || CTRIN || IDF,R});
    ST_KSession(out, out.Addr); }
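The counter updates in Algorithm 1 can be modeled in software as follows. This is a toy sketch under stated assumptions: only the 53-bit CTRIN width comes from the text, while the 10-bit CTRF,W and 63-bit CTRW widths are assumed so that each VN fits in 64 bits with a 1-bit tag. The class and method names are hypothetical, and no encryption is performed.

```python
class VNGenerator:
    """Toy model of the on-chip counters that form 64-bit VNs."""

    IN_BITS = 53   # CTRIN width stated in the text
    FW_BITS = 10   # assumed CTRF,W width (1 + 53 + 10 = 64 bits)
    W_BITS = 63    # assumed CTRW width (1 tag bit + 63 bits)

    def __init__(self) -> None:
        self.ctr_in = 0
        self.ctr_fw = 0
        self.ctr_w = 0

    @staticmethod
    def _bump(value: int, bits: int) -> int:
        # Counters only move forward; on exhaustion the session must end.
        if value + 1 >= 1 << bits:
            raise OverflowError("counter exhausted; a new session is required")
        return value + 1

    def feature_vn(self, fw: int) -> int:
        """{0 || CTRIN || fw}: tag bit 0 marks features."""
        assert 0 <= fw < 1 << self.FW_BITS
        return (self.ctr_in << self.FW_BITS) | fw

    def weight_vn(self) -> int:
        """{1 || CTRW}: tag bit 1 marks weights."""
        return (1 << 63) | self.ctr_w

    def set_weight(self) -> int:
        self.ctr_w = self._bump(self.ctr_w, self.W_BITS)
        return self.weight_vn()

    def set_input(self) -> int:
        self.ctr_in = self._bump(self.ctr_in, self.IN_BITS)
        self.ctr_fw = 0
        write_vn = self.feature_vn(self.ctr_fw)
        self.ctr_fw = self._bump(self.ctr_fw, self.FW_BITS)
        return write_vn

    def forward(self, id_fr: int):
        """Return (read VN built from scheduler-supplied IDF,R, write VN)."""
        read_vn = self.feature_vn(id_fr)
        write_vn = self.feature_vn(self.ctr_fw)
        self.ctr_fw = self._bump(self.ctr_fw, self.FW_BITS)
        return read_vn, write_vn


gen = VNGenerator()
gen.set_weight()
writes = []
for _ in range(2):                      # two inputs
    last = gen.set_input()
    writes.append(last)
    for layer in range(3):              # IDF,R equals the writer's CTRF,W
        read_vn, new_vn = gen.forward(id_fr=layer)
        assert read_vn == last          # read VN matches the earlier write VN
        last = new_vn
        writes.append(last)
assert len(set(writes)) == len(writes)  # no write VN is reused
```

Because the write VN comes only from the on-chip counters, a wrong IDF,R from the scheduler can break decryption but cannot force a write VN to repeat.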
DNN Training – One iteration of training consists of a
forward propagation and a backpropagation. The forward
propagation is the same as inference except that all features are
saved, and can use the VN generation strategy for inference.
Here, we describe the VN generation for the backpropagation.
Figure 6b shows the DFG of the backpropagation. Each vertex
first computes the gradients flowing to the previous vertex
using the gradients flowing to the current vertex and the associated
weights (e.g., g1 = g2 ∗ wb). Then, the layer’s weights are
updated using the incoming gradients and the saved features
(e.g., wb += −η · g2 ∗ f2).
The VNs are constructed the same way as shown in Figure 5.
The backpropagation only adds additional reads to the features
and does not affect the VN generation for features. The
VNs for weights still use CTRW as all weights are updated
the same number of times. However, CTRIN and CTRW are
incremented more frequently; CTRIN is incremented on each
training iteration even if there is no new input. CTRW tracks
the number of updates to the weights. Each gradient edge in
the DFG has a corresponding feature edge. As the gradients
and the features are stored in different memory locations, the
gradients can use the VN for the corresponding features.
V. SECURITY ANALYSIS
As our threat model assumes that the internal operations and
state of an accelerator cannot be directly observed or changed,
an adversary can only attack external interfaces or components:
1) communication with a user, 2) off-chip memory, 3) host
interface, and 4) side channels.
Communication channel – GuardNN provides mechanisms
to establish a secure communication channel with a remote
user that protects confidentiality, integrity, and freshness. More
specifically, the current implementation supports the standard
DHE key-exchange protocol, and a true random number
generator for secure key generation. The user can also use a PKI
to ensure that they are interacting with a trustworthy accelerator.
As secure network communication is a well-established area
with standard solutions such as SSL, secure accelerators can
adopt an existing solution for this attack surface.
Off-chip memory – GuardNN includes memory encryption
and integrity verification to protect confidentiality and integrity
of data stored in external memory. The memory protection
unit is enabled on the InitSession instruction before
any sensitive data are placed in memory, and all off-chip
memory accesses are protected with no exception. The memory
encryption key is also changed (newly generated using the
TRNG) for every session. The proposed memory protection
scheme uses the standard AES counter-mode encryption and
the standard MAC construction used in today’s secure processor
designs, with the only difference that the version numbers
(VNs) are not stored in off-chip memory. Therefore, if the
VN is unique for each write to a given memory location,
the proposed memory protection scheme is equivalent to the
standard AES counter-mode encryption and MAC. As discussed
in Section IV-B, GuardNN ensures that the VN is different for
each write by increasing the counters inside the accelerator
after each write to features/weights.
The accelerator’s host interface – The host CPU can
arbitrarily change the instructions and input operands to the
accelerator. However, GuardNN’s instruction set does not
include any instruction that outputs confidential information
in plaintext. In fact, memory accesses and outputs from the
GuardNN accelerators are always encrypted. Although the on-
chip counter values (CTRIN, CTRW, and CTRF,W) for VNs
can be affected by some GuardNN instructions, these counters
can only be incremented. If one of the counters overflows, the
accelerator stalls and a user needs to initiate a new session.
Hence, no instruction can be used to make the encryption
engine reuse the same counter value.
For integrity, the accelerator computes the hashes of weights,
inputs, outputs, and a sequence of instructions and supports
remote attestation. While a malicious host can perform a
DoS attack and also make the GuardNN accelerator generate
incorrect results, such changes in the initial state or the
instruction sequence can be detected by the remote user.
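One way such a measurement could be accumulated is a running hash over every instruction and its operands, sketched below. The record format (length-prefixed kind and payload) and the class name are assumptions made for illustration; the text above does not specify the attestation format.

```python
import hashlib


class AttestationLog:
    """Running SHA-256 over the instructions and data seen by the accelerator."""

    def __init__(self) -> None:
        self._h = hashlib.sha256()

    def record(self, kind: str, payload: bytes) -> None:
        # Length-prefix both fields so that record boundaries are unambiguous.
        tag = kind.encode()
        self._h.update(len(tag).to_bytes(4, "big") + tag)
        self._h.update(len(payload).to_bytes(8, "big") + payload)

    def digest(self) -> bytes:
        return self._h.copy().digest()


def run(instructions):
    log = AttestationLog()
    for kind, payload in instructions:
        log.record(kind, payload)
    return log.digest()


honest = [("SetWeight", b"w"), ("SetInput", b"x"),
          ("Forward", b""), ("ExportOutput", b"y")]
reordered = [("SetWeight", b"w"), ("SetInput", b"x"),
             ("ExportOutput", b"y"), ("Forward", b"")]
assert run(honest) == run(honest)      # the digest is deterministic
assert run(honest) != run(reordered)   # a changed sequence changes the digest
```

A remote user who knows the expected instruction sequence can recompute the digest and compare it with the attested value.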
Side channels – DNN accelerators without dynamic pruning
have memory access patterns and timing that are independent
of private values such as inputs, outputs, and weights. In that
sense, they are secure against memory and timing side channels.
GuardNN does not protect against physical side channels.
VI. EXPERIMENTAL RESULTS
A. Methodology
DNN Accelerator – We use CHaiDNN [5], an open-source
accelerator from Xilinx, for our experimental evaluation.
CHaiDNN consists of an array of processing elements to
perform multiply–accumulate operations, and is built for 8-
bit quantized operands. To hide the memory access latency,
CHaiDNN exploits double buffering as shown in Figure 7.
At a high level, the accelerator overlaps loading weights for
the next block with the computation for the current block.
Fig. 7: Timing diagram of a conv layer in CHaiDNN, showing
double buffering for features and weights.
Fig. 8: The block diagram of the cycle-level simulator for the
secure DNN accelerator.
CHaiDNN is synthesized using Vivado HLS 2019.1, targeting
Zynq UltraScale+ MPSoC (XCZU9EG).
Benchmarks – We evaluate GuardNN on a variety of DNN
architectures — LeNet, AlexNet, VGG, GoogleNet, and ResNet.
For inference, we run DNN networks using CHaiDNN. For
training, we approximate the backpropagation based on the
inference in CHaiDNN because CHaiDNN does not support
DNN training and we could not find an open-source training
accelerator. More specifically, computing the gradients in the
backpropagation (gi−1 = wi ∗ gi) is approximated with the
inference. The weight update is not emulated as no similar
operation is available in CHaiDNN.
FPGA Prototype – We implemented a prototype of the
GuardNN accelerator on the Xilinx ZCU102 by adding the VN
generator and encryption engines (AES-128) to the CHaiDNN
accelerator. The prototype uses a CHaiDNN configuration with
512 DSP blocks and 8-bit weights/activations. The AES engines
are fully-pipelined with a 12-cycle latency. The FPGA prototype
is used to validate our instruction set design and is functional
for memory encryption protection. We also obtained estimates
of the area and performance of the microcontroller used to
implement the GuardNN extension.
Cycle-level Simulation – We use cycle-level simulations
to (1) compare the overhead of multiple memory protection
schemes, (2) study the overhead of integrity protection, and (3)
evaluate the overhead for DNN training. As shown in Figure 8,
the simulator has three main components — accelerator,
memory protection, and off-chip memory. The CHaiDNN
accelerator is simulated in a cycle-accurate RTL simulator
to generate a trace of computation and memory events. Then,
a memory protection simulator (in Python) uses the event trace
to calculate the total execution time and the bandwidth usage
by simulating protection mechanisms and DRAM accesses.
The memory accesses are simulated using DRAMSim2 [37].
TABLE IV: Throughput (fps) for CHaiDNN and GuardNN.
Method     AlexNet   GoogleNet   ResNet   VGG
CHaiDNN    164.07    65.03       24.13    9.04
GuardNN    163.62    64.69       23.68    8.98
Overhead   0.27%     0.53%       1.83%    0.60%
TABLE V: Baseline execution time (ms).
Mode     # of DDR Channels   LeNet   AlexNet   GoogleNet   ResNet
Inf.     1                   0.208   33.723    95.184      227.151
Inf.     2                   0.159   16.542    51.621      131.883
Inf.     4                   0.147   14.161    46.103      118.128
Train.   1                   0.820   135.964   379.418     907.651
We simulate 64-bit DDR3 channels at 666 MHz, where each
channel contains one rank of eight banks. The total capacity of
the simulated DRAM is 8GB. Most experiments use one DDR
channel except for the one that studies the impact of increasing
the number of DDR channels. The DRAM parameters are
verified against Verilog timing models from Micron.
For the baseline memory encryption, we implement the
recent memory encryption engine (MEE) design from Intel [29]
as the state-of-the-art. This baseline uses a multi-level 8-ary
Merkle tree with 56-bit VNs and MACs, and works at a
64-byte granularity. Similarly, for integrity verification, we
implemented the baseline that uses one MAC for each 64-byte
block. Because the DNN accelerator has a largely streaming
memory access pattern, increasing the VN/MAC cache does
not help unless it is big enough to capture temporal locality
across layers. Therefore, we include a 4-KB on-chip cache,
which achieves near-optimal performance, for VNs and
MACs in the baseline scheme. The VN/MAC cache uses the
LRU replacement policy with write-back and write-allocate
policies. To study the ideal case for the baseline protection,
we simulate a fully-associative cache for VNs and MACs.
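The limited benefit of a larger VN/MAC cache under streaming accesses can be seen in a small model. The sketch below simulates a fully-associative LRU cache of VN lines; the 64-byte line holding eight VNs and the access trace (a linear sweep, optionally repeated) are simplifying assumptions, and write-backs are not modeled.

```python
from collections import OrderedDict


def vn_cache_misses(num_blocks, cache_bytes, passes=1,
                    line_bytes=64, vns_per_line=8):
    """Count misses of a fully-associative LRU cache of VN lines."""
    capacity = cache_bytes // line_bytes
    cache = OrderedDict()
    misses = 0
    for _ in range(passes):
        for block in range(num_blocks):      # streaming sweep over data blocks
            line = block // vns_per_line     # VN line covering this block
            if line in cache:
                cache.move_to_end(line)      # refresh LRU position
                continue
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)    # evict the least recently used
            cache[line] = True
    return misses


blocks = 8192                                    # 1024 distinct VN lines
print(vn_cache_misses(blocks, 4 * 1024))         # single sweep, 4-KB cache
print(vn_cache_misses(blocks, 32 * 1024, 2))     # two sweeps, 32-KB cache
print(vn_cache_misses(blocks, 64 * 1024, 2))     # two sweeps, 64-KB cache
```

In this model a single sweep misses once per VN line regardless of cache size, and a repeated sweep only starts to hit once the cache holds the whole working set.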
GuardNN has no on-chip caches for VNs and MACs,
and protects the integrity using a MAC per 512-byte block,
matching CHaiDNN’s granularity for writing output features.
B. FPGA Prototype Results
Table IV shows the throughput for various DNNs on our
FPGA prototype. The performance overhead is less than 2%
on all networks for the ImageNet dataset. The relatively higher
overhead for ResNet is not due to our scheme, but a result
of inefficient scheduling by HLS for memory accesses in the
element-wise addition layer. The results show that we can get
near-zero memory encryption overhead in real systems.
In our FPGA prototype, we use an open-source AES-128
IP core [38] that uses 9.0K LUTs and 3.0K FFs. The area
overhead of one AES core is 8.2% and 2.6% in LUTs and FFs,
respectively. Because the FPGA clock (200 MHz) is much
slower than the memory bus clock, three AES engines are
needed to match the memory bandwidth used by CHaiDNN.
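The engine count follows from a simple throughput ratio. The sketch below assumes that a fully-pipelined AES engine accepts one 16-byte block per cycle; the 9.0 GB/s bandwidth in the example is a hypothetical value, since the bandwidth used by CHaiDNN is not stated here.

```python
import math


def aes_engines_needed(mem_bw_bytes_per_s: float, clock_hz: float,
                       block_bytes: int = 16) -> int:
    """Engines required so that aggregate AES throughput covers the bandwidth."""
    per_engine = clock_hz * block_bytes   # one block per cycle when pipelined
    return math.ceil(mem_bw_bytes_per_s / per_engine)


# At 200 MHz each engine sustains 3.2 GB/s under the assumption above.
print(aes_engines_needed(9.0e9, 200e6))
```

Under this assumption, three engines cover any bandwidth above 6.4 GB/s and up to 9.6 GB/s.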
Fig. 9: The memory traffic increase of inference and training. (Panels: (a) Inference, (b) Training; x-axis: LeNet, AlexNet, GoogleNet, ResNet, geomean; y-axis: Memory Traffic Increase; series: NP, ASMPEncIV, BPEnc, BPEncIV.)
We also implemented the microcontroller (see Figure 2) as a
Xilinx MicroBlaze [39], for which the controller program can
fit within 256KB local memory. The soft processor’s resource
usage (overhead) was 3.7K LUTs (3.4%), 2.4K FFs (2.1%),
64 BRAMs (11.0%) & 6 DSPs (0.9%).
Here, we present the latency of various GuardNN instructions
using the compute and memory intensive VGG network as an
example. GuardNN needs to perform a key exchange and load
weights once per session. On the MicroBlaze, the GetPK and
InitSession (specifically, ECDHE–ECDSA and SHA-256
for deriving KSession) take 25ms. The key-exchange latency
is independent of the network. Importing (decrypting and
re-encrypting) weights on SetWeight takes 43ms for the 138M
parameters using AES engines. The CHaiDNN latency of
inference for a single input is 111ms for VGG. For each input,
GuardNN adds overhead to import an input, and export/sign
an output. SetInput for a single input image only takes
0.05 ms. For the 1000-class output, the ExportOutput and
SignOutput take 7 ms. This recurring per-output latency
can be hidden by pipelining it with the inference of the next
input. Thus the GuardNN extension instructions incur negligible
overhead when processing multiple inputs.
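The amortization argument can be made concrete with the VGG latencies quoted above. The model below is a simplification: it assumes that the one-time costs are paid once per session and that, with pipelining, only the final export/sign step is exposed. The function name is hypothetical.

```python
KEY_EXCHANGE_MS = 25.0   # GetPK + InitSession
SET_WEIGHT_MS = 43.0     # importing the VGG weights
INFERENCE_MS = 111.0     # CHaiDNN inference latency per input
SET_INPUT_MS = 0.05      # SetInput per input image
EXPORT_SIGN_MS = 7.0     # ExportOutput + SignOutput per output


def session_overhead(num_inputs: int, pipelined: bool = True) -> float:
    """Extra session time relative to unprotected inference time."""
    baseline = num_inputs * INFERENCE_MS
    extra = KEY_EXCHANGE_MS + SET_WEIGHT_MS + num_inputs * SET_INPUT_MS
    # Assumed model: with pipelining only the last export is exposed.
    extra += EXPORT_SIGN_MS if pipelined else num_inputs * EXPORT_SIGN_MS
    return extra / baseline


for n in (1, 100, 10_000):
    print(n, f"{session_overhead(n):.3%}", f"{session_overhead(n, False):.3%}")
```

In this model the one-time session cost dominates for a single input and becomes small once the session processes many inputs.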
C. Simulation Results
We compare the accelerator performance with three different
protection schemes: no protection (NP), today’s baseline
memory protection (BP), and the application-specific memory
protection (ASMP). For BP and ASMP, we also study the
encryption-only protection (Enc), and protection involving
both encryption and integrity verification (EncIV): BPEnc and
BPEncIV for BP and ASMPEnc and ASMPEncIV for ASMP.
Memory Traffic Increase – As the throughput of a DNN
accelerator is often limited by the memory bandwidth, we
first compare the memory traffic increase. The memory traffic
increase is defined as the ratio between the total number of
memory accesses with and without memory protection.
Fig. 10: The normalized execution time of the DNN inference
and training on different network models. (Panels: (a) Inference, (b) Training; x-axis: LeNet, AlexNet, GoogleNet, ResNet, geomean; y-axis: Normalized Exec. Time; series: NP, ASMPEnc, ASMPEncIV, BPEnc, BPEncIV.)
Fig. 11: The normalized execution time of DNN inference
with 1, 2, and 4 DDR channels. (x-axis: LeNet, AlexNet, GoogleNet, ResNet; y-axis: Normalized Execution Time; series: ASMPEnc, ASMPEncIV, BPEnc, and BPEncIV, each with DDR1, DDR2, and DDR4.)
ASMPEnc has no impact on the memory traffic because
it does not require any meta-data (i.e., VNs) to be stored
in off-chip memory. Figure 9 compares the memory traffic
increase of ASMPEncIV, BPEnc, and BPEncIV. BPEnc introduces
15.8% and 17.6% more memory accesses on average for
inference and training, respectively. BPEncIV issues 29.0%
and 33.9% more memory requests on average for inference
and training, respectively. The memory traffic increase is larger
for training because the training process accesses more data
and has more frequent cache evictions in the VN/MAC cache.
ASMPEncIV increases the memory traffic by 0.8% and 0.2% for
inference and training, respectively. The results demonstrate the
advantage of the ASMP, which uses a MAC per 512-byte data
block to match the accelerator’s data movement granularity.
Performance Overhead – Table V shows the simulated ex-
ecution time of CHaiDNN with no protection. Figure 10 shows
the performance of the baseline protection and the application-
specific protection. The execution time is normalized to the
one with no protection (NP). BPEnc is 1.14× slower than no
protection. BPEncIV is 1.29× and 1.30× slower for inference
and training, respectively. ASMPEnc and ASMPEncIV achieve
near-zero performance overhead; the overhead is less than 2%
for inference and 1% for training on all networks.
DRAM Channels – As TPU-v1 [1] has 64 times as many
multipliers as CHaiDNN, scaling the memory bandwidth
of TPU-v1 down by a factor of 64 gives a memory bandwidth
of 4.25 Gbps for CHaiDNN. Therefore, the experiments so far used
one DDR channel. Figure 11 shows the normalized execution
time with 1, 2, and 4 DDR channels. The performance of
GuardNN with the application-specific memory protection
(ASMPEnc and ASMPEncIV) is almost the same as the one with
no protection for all configurations. For the baseline protection
scheme (BP), the performance overhead is noticeably higher
with only one DDR channel as the accelerator becomes more
memory-bound. The overhead is reduced with more DDR
channels as more DNN layers become compute-bound.
D. ASIC Area Overhead Estimate
The application-specific memory protection scheme does not
require an on-chip cache for version numbers and MACs, and
only uses a small number of registers for keys and counters.
In that sense, the area overhead mainly comes from the AES
engines, used for both encryption and integrity verification.
The area and power overhead will be low for an ASIC design.
TPU-v1 runs at 700 MHz in 28nm and has a peak memory
bandwidth of 272 Gbps. The area and power consumption
of TPU-v1 is 331 mm2 and 75 W, respectively. An ASIC
low-power AES engine [40] achieves a 991 Mbps throughput
at 875 MHz in 28 nm. The area and power consumption of
the AES engine is 0.0031 mm2 and 3.85 mW, respectively.
Assuming the AES engine runs at 700 MHz with the same
power consumption, 344 AES engines are required to match
the memory bandwidth of TPU, which leads to 0.3% area and
1.8% power overhead. We can also use a smaller number of
high-performance AES engines with similar overall overhead.
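The estimate can be reproduced from the numbers quoted above. The sketch below assumes that AES throughput scales linearly with clock frequency and that area and power per engine stay unchanged at 700 MHz.

```python
import math

TPU_BW_MBPS = 272_000          # 272 Gbps peak memory bandwidth
TPU_AREA_MM2 = 331.0
TPU_POWER_W = 75.0
AES_MBPS_AT_875MHZ = 991.0
AES_AREA_MM2 = 0.0031
AES_POWER_W = 3.85e-3

# Assumed linear frequency scaling from 875 MHz down to 700 MHz.
aes_mbps = AES_MBPS_AT_875MHZ * 700 / 875
engines = math.ceil(TPU_BW_MBPS / aes_mbps)
area_overhead = engines * AES_AREA_MM2 / TPU_AREA_MM2
power_overhead = engines * AES_POWER_W / TPU_POWER_W

print(engines, f"{area_overhead:.1%}", f"{power_overhead:.1%}")
```

The computed values match the 344 engines, 0.3% area, and 1.8% power figures given in the text.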
VII. DISCUSSION
Static and Dynamic Pruning – Static pruning still results
in a static network model that can be executed by GuardNN.
At a glance, it may appear that application-specific memory
protection does not work with dynamic pruning, which skips
memory accesses for some features and weights at run time.
However, skipping VNs does not affect the security of memory
protection as long as the VNs are not reused. The decryption
and integrity verification will also be functional as long
as a write and the corresponding reads use the same VN.
We implemented multiple dynamic pruning schemes such as
Compressed Sparse Row [41], Compressed Sparse Column [42],
[43], and Run-Length Compression [6], [44] in PyTorch and
emulated the proposed memory protection scheme in software.
The study shows that the ASMP is still applicable to DNNs with
dynamic pruning; pruning removes both writes and following
reads to the pruned features and weights, and the VNs from the
proposed scheme can still be used for unpruned features and
weights. In that sense, we believe that the GuardNN architecture
can be extended to support dynamic pruning. However, dynamic
pruning will introduce new side channels through memory
accesses and timing, and may require additional protection
against side-channel attacks.
Application to Other Accelerators – While this paper
focuses on building a secure DNN accelerator, we believe that
the key insights from this study are also applicable to other
application-specific accelerators. For example, GuardNN shows
how the size of the TCB can be reduced and confidentiality-
only protection can be provided without the overhead of
integrity protection by leveraging the application-specific
instruction set of an accelerator. GuardNN also shows that
both security and performance can be improved by customizing
protection based on the application-specific characteristics,
such as memory access patterns, of an accelerator. As case
studies, we studied applying the application-specific memory
protection scheme to an H.264 video decoder [45], [46],
the Darwin genome assembly accelerator [47], and vertex-
centric graph accelerators [48], [49], [50], [51] based on open-
source RTL implementations. We found that the proposed
VN generation approach can be applied to all three types of
accelerators to minimize performance overhead of memory
protection.
VIII. RELATED WORK
Privacy-Preserving Deep Learning – GuardNN provides
hardware-based protection for DNN inference and training
in an untrusted environment. Alternatively, fully homomor-
phic encryption (FHE) can provide stronger protection by
performing all computations in an encrypted format. While FHE
algorithms provide strong cryptographic guarantees without
trusting any remote hardware or software, they come with
significant overhead [52], [53]. Recent studies [54], [55] reduce
the overhead for DNN inference by optimizing linear operations.
However, cryptographic solutions are still multiple orders
of magnitude slower than the baseline with no protection.
GuardNN provides a design point that provides hardware-based
security with much lower performance overhead.
TEEs [4], [12], [13], [14], [15], [16], [17], [18], [19],
[20], [21], [22], [23], [24], [25] provide hardware-protected
execution environments where confidentiality and integrity are
ensured even under an untrusted OS or physical attacks. Recent
proposals use the secure enclave directly [56], [57] or extend
the enclave with a secure GPU accelerator [58] to enable
remote DNN computations with strong privacy and integrity
guarantees. GuardNN extends the high-level TEE approach to
DNN accelerators and shows that secure accelerators have the
potential to provide both higher performance and higher security
than general-purpose platforms by customizing their
architecture and protection for a specific application.
Memory Encryption and Integrity Verification – Recent
designs for memory encryption [59], [60] use the counter-mode
with smaller VNs to optimize memory encryption. For integrity
verification, recent efforts [31], [61], [62], [63] propose counter-based
integrity-tree designs to reduce the performance overhead.
Morphable counters [64] further reduce the overhead by
compressing the counters. Another line of research attempts to
optimize the integrity tree traversal. Prior works propose to store
the VNs in the last-level cache to exploit the locality [30], [65]
and reduce the latency of integrity verification by predicting
VNs or using unverified VNs speculatively [66], [67], [68].
The application-specific memory protection in GuardNN is
built on the previous memory protection schemes but cus-
tomized for DNN accelerators. The application-specific memory
protection leverages the characteristics of DNN accelerators to
remove off-chip VNs, and significantly reduces the performance
overhead compared to the state-of-the-art.
Side-channel Attacks and Protection – A variety of side-
channel attacks have been shown to work against DNN
accelerators. Memory and timing side-channels have been used
to infer the underlying network structure of an accelerator with
encrypted weights [69], [70]. GuardNN has a fixed memory
access pattern and execution time, and is secure against
memory and timing side-channels. GuardNN needs additional
countermeasures when dynamic pruning is used, or when other
physical side-channels are considered. ORAM [71], [72], [73]
offers a strong security guarantee for memory accesses, but with
high overhead. Physical side-channel attacks on DNNs have
also been demonstrated recently. A power side-channel attack has
been used to retrieve the input image from a DNN accelerator
[74]. Electromagnetic side-channel emanations have been used
to recover the entire network topology including weights, albeit
on a microcontroller-based inference engine [75].
IX. CONCLUSION
In this paper, we propose a secure DNN accelerator,
named GuardNN, with a particular focus on enabling privacy-
preserving ML. We discuss the architecture, interface, and
implementation of GuardNN in detail. Our FPGA prototype
shows that the GuardNN accelerator adds less than 2%
performance overhead on multiple DNN models.
REFERENCES
[1] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav
Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden,
Al Borchers, et al. In-Datacenter Performance Analysis of a Tensor
Processing Unit. Int’l Symp. on Computer Architecture (ISCA), 2017.
[2] Eric Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael,
Adrian Caulfield, Todd Massengill, Ming Liu, et al. Serving DNNs in
Real Time at Datacenter Scale with Project Brainwave. IEEE Micro,
38(2):8–20, 2018.
[3] Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai,
Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, and Yida Wang.
Optimizing memory-access patterns for deep learning accelerators, 2020.
[4] Frank McKeen, Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon
Johnson, Rebekah Leslie-Hurd, and Carlos Rozas. Intel Software Guard
Extensions (Intel SGX) Support for Dynamic Memory Management
Inside an Enclave. Hardware and Architectural Support for Security and
Privacy (HASP), 2016.
[5] Xilinx. CHaiDNN-v2: HLS based Deep Neural Network Accelerator
Library for Xilinx Ultrascale+ MPSoCs. https://github.com/Xilinx/
CHaiDNN, Jun 2018.
[6] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli,
Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W.
Keckler, and William J. Dally. SCNN: An Accelerator for Compressed-
sparse Convolutional Neural Networks. In ISCA, 2017.
[7] Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Na-
talie Enright Jerger, and Andreas Moshovos. Cnvlutin: Ineffectual-neuron-
free Deep Neural Network Computing. In ISCA, 2016.
[8] B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M.
Hernández-Lobato, G. Y. Wei, and D. Brooks. Minerva: Enabling
Low-Power, Highly-Accurate Deep Neural Network Accelerators. In ISCA,
2016.
[9] V. Akhlaghi, A. Yazdanbakhsh, K. Samadi, R. K. Gupta, and H. Es-
maeilzadeh. SnaPEA: Predictive Early Activation for Reducing Com-
putation in Deep Convolutional Neural Networks. In Int’l Symp. on
Computer Architecture (ISCA), Jun 2018.
[10] Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, and G. Edward
Suh. Boosting the Performance of CNN Accelerators with Dynamic
Fine-Grained Channel Gating. In Proceedings of the 52nd Annual
IEEE/ACM International Symposium on Microarchitecture, MICRO ’52,
pages 139–150, New York, NY, USA, 2019. ACM.
[11] Weizhe Hua, Yuan Zhou, Christopher M De Sa, Zhiru Zhang, and
G. Edward Suh. Channel Gating Neural Networks. In H. Wallach,
H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett,
editors, Advances in Neural Information Processing Systems 32, pages
1884–1894. Curran Associates, Inc., 2019.
[12] Thaynara Alves and D. Felton. Trustzone: Integrated hardware and
software security. 01 2004.
[13] David Lie, Chandramohan Thekkath, Mark Mitchell, Patrick Lincoln,
Dan Boneh, John Mitchell, and Mark Horowitz. Architectural support
for copy and tamper resistant software. In Proceedings of the Ninth
International Conference on Architectural Support for Programming
Languages and Operating Systems, ASPLOS IX, pages 168–177, New
York, NY, USA, 2000. ACM.
[14] G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, and
Srinivas Devadas. AEGIS: Architecture for Tamper-evident and Tamper-
resistant Processing. In Proceedings of the 17th Annual International
Conference on Supercomputing, ICS ’03, pages 160–171, New York, NY,
USA, 2003. ACM.
[15] Jun Yang, Youtao Zhang, and Lan Gao. Fast secure processor for
inhibiting software piracy and tampering. In Proceedings of the
36th Annual IEEE/ACM International Symposium on Microarchitecture,
MICRO 36, page 351, USA, 2003. IEEE Computer Society.
[16] R. B. Lee, P. C. S. Kwan, J. P. McGregor, J. Dwoskin, and Zhenghong
Wang. Architecture for protecting critical secrets in microprocessors.
In 32nd International Symposium on Computer Architecture (ISCA’05),
pages 2–13, 2005.
[17] Christopher W. Fletcher, Marten van Dijk, and Srinivas Devadas. A
secure processor architecture for encrypted computation on untrusted
programs. In Proceedings of the Seventh ACM Workshop on Scalable
Trusted Computing, STC ’12, pages 3–8, New York, NY, USA, 2012.
ACM.
[18] Siddhartha Chhabra, Brian Rogers, Yan Solihin, and Milos Prvulovic.
Secureme: A hardware-software approach to full system security. In
Proceedings of the International Conference on Supercomputing, ICS ’11,
pages 108–119, New York, NY, USA, 2011. Association for Computing
Machinery.
[19] Jakub Szefer and Ruby B. Lee. Architectural support for hypervisor-
secure virtualization. In Proceedings of the Seventeenth International
Conference on Architectural Support for Programming Languages and
Operating Systems, ASPLOS XVII, pages 437–450, New York, NY, USA,
2012. Association for Computing Machinery.
[20] R. N. M. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson,
D. Chisnall, N. Dave, B. Davis, K. Gudka, B. Laurie, S. J. Murdoch,
R. Norton, M. Roe, S. Son, and M. Vadera. Cheri: A hybrid capability-
system architecture for scalable software compartmentalization. In 2015
IEEE Symposium on Security and Privacy, pages 20–37, 2015.
[21] D. Evtyushkin, J. Elwell, M. Ozsoy, D. Ponomarev, N. A. Ghazaleh, and
R. Riley. Iso-x: A flexible architecture for hardware-managed isolated
execution. In 2014 47th Annual IEEE/ACM International Symposium on
Microarchitecture, pages 190–202, 2014.
[22] D. Champagne and R. B. Lee. Scalable architectural support for trusted
software. In HPCA - 16 2010 The Sixteenth International Symposium
on High-Performance Computer Architecture, pages 1–12, 2010.
[23] Victor Costan, Ilia Lebedev, and Srinivas Devadas. Sanctum: Minimal
hardware extensions for strong software isolation. In 25th USENIX
Security Symposium (USENIX Security 16), pages 857–874, Austin, TX,
Aug 2016. USENIX Association.
[24] Thomas Bourgeat, Ilia Lebedev, Andrew Wright, Sizhuo Zhang, Arvind,
and Srinivas Devadas. Mi6: Secure enclaves in a speculative out-of-order
processor. In Proceedings of the 52nd Annual IEEE/ACM International
Symposium on Microarchitecture, MICRO ’52, pages 42–56, New York,
NY, USA, 2019. Association for Computing Machinery.
[25] Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanovic, and
Dawn Song. Keystone: An open framework for architecting trusted
execution environments. In Proceedings of the Fifteenth European
Conference on Computer Systems, EuroSys ’20, 2020.
[26] U. Gupta, C. Wu, X. Wang, M. Naumov, B. Reagen, D. Brooks,
B. Cottel, K. Hazelwood, M. Hempstead, B. Jia, H. S. Lee, A. Malevich,
D. Mudigere, M. Smelyanskiy, L. Xiong, and X. Zhang. The architectural
implications of facebook’s dnn-based personalized recommendation. In
2020 IEEE International Symposium on High Performance Computer
Architecture (HPCA), pages 488–501, 2020.
[27] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan
Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe:
Convolutional architecture for fast feature embedding. In Proceedings
of the 22nd ACM International Conference on Multimedia, MM ’14,
pages 675–678, New York, NY, USA, 2014. Association for Computing
Machinery.
[28] Michael Henson and Stephen Taylor. Memory encryption: A survey of
existing techniques. ACM Comput. Surv., 46(4):53:1–53:26, Mar 2014.
[29] S. Gueron. Memory encryption for general-purpose processors. IEEE
Security Privacy, 14(6):54–62, Nov 2016.
[30] B. Gassend, G. E. Suh, D. Clarke, M. van Dijk, and S. Devadas. Caches
and hash trees for efficient memory integrity verification. In The Ninth
International Symposium on High-Performance Computer Architecture,
2003. HPCA-9 2003. Proceedings., pages 295–306, Feb 2003.
[31] Brian Rogers, Siddhartha Chhabra, Milos Prvulovic, and Yan Solihin.
Using Address Independent Seed Encryption and Bonsai Merkle Trees to
Make Secure Processors OS- and Performance-Friendly. In Proceedings
of the 40th Annual IEEE/ACM International Symposium on Microarchitecture,
MICRO 40, pages 183–196, Washington, DC, USA, 2007. IEEE
Computer Society.
[32] Morris J. Dworkin. Sp 800-38d. recommendation for block cipher modes
of operation: Galois/counter mode (gcm) and gmac. Technical report,
Gaithersburg, MD, USA, 2007.
[33] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy
Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey
Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga,
Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay
Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.
Tensorflow: A system for large-scale machine learning. In Proceedings
of the 12th USENIX Conference on Operating Systems Design and
Implementation, OSDI’16, pages 265–283, Berkeley, CA, USA, 2016.
USENIX Association.
[34] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang,
Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet:
A Flexible and Efficient Machine Learning Library for Heterogeneous
Distributed Systems. CoRR, abs/1512.01274, 2015.
[35] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet
Classification with Deep Convolutional Neural Networks. In Proceedings
of the 25th International Conference on Neural Information Processing
Systems - Volume 1, NIPS’12, pages 1097–1105, USA, 2012. Curran
Associates Inc.
[36] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional
Networks for Large-Scale Image Recognition. arXiv e-print,
arXiv:1409.1556, Apr 2015.
[37] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2: A Cycle
Accurate Memory System Simulator. IEEE Computer Architecture Letters,
2011.
[38] Michael Muehlberghuber. AES-128. https://github.com/mbgh/aes128-hdl.
[39] Xilinx. MicroBlaze Soft Processor Core. https://www.xilinx.com/
microblaze.
[40] W. Shan, A. Fan, J. Xu, J. Yang, and M. Seok. A 923 gbps/w, 113-
cycle, 2-sbox energy-efficient aes accelerator in 28nm cmos. In 2019
Symposium on VLSI Circuits, pages C236–C237, 2019.
[41] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and
A. Moshovos. Cnvlutin: Ineffectual-neuron-free deep neural network
computing. In 2016 ACM/IEEE 43rd Annual International Symposium
on Computer Architecture (ISCA), pages 1–13, 2016.
[42] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J.
Dally. EIE: Efficient Inference Engine on Compressed Deep Neural
Network. In Int’l Symp. on Computer Architecture (ISCA), Jun 2016.
[43] Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. Efficient
processing of deep neural networks: A tutorial and survey. CoRR,
abs/1703.09039, 2017.
[44] Y. Chen, T. Krishna, J. S. Emer, and V. Sze. Eyeriss: An energy-efficient
reconfigurable accelerator for deep convolutional neural networks. IEEE
Journal of Solid-State Circuits, 52(1):127–138, 2017.
[45] Information technology - Coding of audio-visual objects - Part 10:
Advanced Video Coding. Standard ISO/IEC 14496-10:2014, International
Organization for Standardization, Geneva, CH, 2014.
[46] Xinheng Liu, Yao Chen, Tan Nguyen, Swathi Gurumani, Kyle Rupnow,
and Deming Chen. High level synthesis of complex applications:
An h.264 video decoder. In Proceedings of the 2016 ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, FPGA
’16, pages 224–233, New York, NY, USA, 2016. Association for Computing
Machinery.
[47] Yatish Turakhia, Gill Bejerano, and William J. Dally. Darwin: A
genomics co-processor provides up to 15,000x acceleration on long
read assembly. In Proceedings of the Twenty-Third International
Conference on Architectural Support for Programming Languages and
Operating Systems, ASPLOS ’18, pages 199–213, New York, NY, USA,
2018. Association for Computing Machinery.
[48] Tayo Oguntebi and Kunle Olukotun. Graphops: A dataflow library for
graph analytics acceleration. In Proceedings of the 2016 ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, FPGA
’16, pages 111–117, New York, NY, USA, 2016. Association for Computing
Machinery.
[49] Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and
Huazhong Yang. Foregraph: Exploring large-scale graph processing
on multi-fpga architecture. In Proceedings of the 2017 ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, FPGA
’17, pages 217–226, New York, NY, USA, 2017. Association for Computing
Machinery.
[50] E. Nurvitadhi, G. Weisz, Y. Wang, S. Hurkat, M. Nguyen, J. C. Hoe,
J. F. Martínez, and C. Guestrin. Graphgen: An fpga framework for
vertex-centric graph computation. In 2014 IEEE 22nd Annual International
Symposium on Field-Programmable Custom Computing Machines, pages
25–28, 2014.
[51] A. Mukkara, N. Beckmann, M. Abeydeera, X. Ma, and D. Sanchez.
Exploiting locality in graph analytics through hardware-accelerated
traversal scheduling. In 2018 51st Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), pages 1–14, 2018.
[52] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael
Naehrig, and John Wernsing. Cryptonets: Applying neural networks
to encrypted data with high throughput and accuracy. Int’l Conf. on
Machine Learning (ICML), pages 201–210, 2016.
[53] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious Neural Network
Predictions via MiniONN Transformations. ACM Conf. on Computer
and Communications Security (CCS), pages 619–631, 2017.
[54] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan.
Gazelle: A low latency framework for secure neural network inference.
In Proceedings of the 27th USENIX Conference on Security Symposium,
SEC ’18, pages 1651–1668, USA, 2018. USENIX Association.
[55] Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan, Wenting
Zheng, and Raluca Ada Popa. Delphi: A cryptographic inference service
for neural networks. In 29th USENIX Security Symposium (USENIX
Security 20), Boston, MA, Aug 2020. USENIX Association.
[56] Florian Tramer and Dan Boneh. Slalom: Fast, verifiable and private
execution of neural networks in trusted hardware. In International
Conference on Learning Representations, 2019.
[57] Shruti Tople, Karan Grover, Shweta Shinde, Ranjita Bhagwan, and
Ramachandran Ramjee. Privado: Practical and secure DNN inference.
CoRR, abs/1810.00602, 2018.
[58] Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. Graviton: Trusted
execution environments on gpus. In 13th USENIX Symposium on
Operating Systems Design and Implementation (OSDI 18), pages 681–
696, Carlsbad, CA, Oct 2018. USENIX Association.
[59] G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, and
Srinivas Devadas. Efficient memory integrity verification and encryption
for secure processors. In Proceedings of the 36th Annual IEEE/ACM
International Symposium on Microarchitecture, MICRO 36, pages 339–,
Washington, DC, USA, 2003. IEEE Computer Society.
[60] Chenyu Yan, Daniel Englender, Milos Prvulovic, Brian Rogers, and Yan
Solihin. Improving cost, performance, and security of memory encryption
and authentication. SIGARCH Comput. Archit. News, 34(2):179–190,
May 2006.
[61] Reouven Elbaz, David Champagne, Ruby B. Lee, Lionel Torres, Gilles
Sassatelli, and Pierre Guillemin. Tec-tree: A low-cost, parallelizable tree
for efficient defense against memory replay attacks. In Pascal Paillier and
Ingrid Verbauwhede, editors, Cryptographic Hardware and Embedded
Systems - CHES 2007, pages 289–302, Berlin, Heidelberg, 2007. Springer
Berlin Heidelberg.
[62] W. Eric Hall and Charanjit S. Jutla. Parallelizable authentication trees. In
Proceedings of the 12th International Conference on Selected Areas in
Cryptography, SAC’05, pages 95–109, Berlin, Heidelberg, 2006. Springer-
Verlag.
[63] Meysam Taassori, Ali Shafiee, and Rajeev Balasubramonian. Vault:
Reducing paging overheads in sgx with efficient integrity verification
structures. In Proceedings of the Twenty-Third International Conference
on Architectural Support for Programming Languages and Operating
Systems, ASPLOS ’18, pages 665–678, New York, NY, USA, 2018.
ACM.
[64] G. Saileshwar, P. Nair, P. Ramrakhyani, W. Elsasser, J. Joao, and
M. Qureshi. Morphable counters: Enabling compact integrity trees
for low-overhead secure memories. In 2018 51st Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO), pages 416–427,
Oct 2018.
[65] J. Lee, T. Kim, and J. Huh. Reducing the memory bandwidth overheads of
hardware security support for multi-core processors. IEEE Transactions
on Computers, 65(11):3384–3397, Nov 2016.
[66] Weidong Shi, H. S. Lee, M. Ghosh, Chenghuai Lu, and A. Boldyreva.
High efficiency counter mode security architecture via prediction and
precomputation. In 32nd International Symposium on Computer
Architecture (ISCA’05), pages 14–24, June 2005.
[67] Weidong Shi and Hsien-Hsin S. Lee. Authentication control point and
its implications for secure processor design. In Proceedings of the
39th Annual IEEE/ACM International Symposium on Microarchitecture,
MICRO 39, pages 103–112, Washington, DC, USA, 2006. IEEE
Computer Society.
[68] T. S. Lehman, A. D. Hilton, and B. C. Lee. Poisonivy: Safe speculation
for secure memory. In 2016 49th Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), pages 1–13, Oct 2016.
[69] Weizhe Hua, Zhiru Zhang, and G. Edward Suh. Reverse engineering
convolutional neural networks through side-channel information leaks.
In Proceedings of the 55th Annual Design Automation Conference, DAC
’18, pages 4:1–4:6, New York, NY, USA, 2018. ACM.
[70] Mengjia Yan, Christopher W. Fletcher, and Josep Torrellas. Cache
telepathy: Leveraging shared resource attacks to learn DNN architectures.
CoRR, abs/1808.04761, 2018.
[71] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation
on oblivious rams. J. ACM, 43(3):431–473, May 1996.
[72] Emil Stefanov, Marten Van Dijk, Elaine Shi, T.-H. Hubert Chan,
Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path
oram: An extremely simple oblivious ram protocol. J. ACM, 65(4):18:1–
18:26, Apr 2018.
[73] C. W. Fletcher, L. Ren, A. Kwon, M. v. Dijk, E. Stefanov, D. Serpanos,
and S. Devadas. A low-latency, low-area hardware oblivious ram
controller. In 2015 IEEE 23rd Annual International Symposium on
Field-Programmable Custom Computing Machines, pages 215–222, May
2015.
[74] Lingxiao Wei, Bo Luo, Yu Li, Yannan Liu, and Qiang Xu. I know what
you see: Power side-channel attack on convolutional neural network
accelerators. In Proceedings of the 34th Annual Computer Security
Applications Conference, ACSAC ’18, pages 393–406, New York, NY,
USA, 2018. ACM.
[75] Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. CSI
NN: Reverse engineering of neural network architectures through
electromagnetic side channel. In 28th USENIX Security Symposium
(USENIX Security 19), pages 515–532, Santa Clara, CA, Aug 2019.
USENIX Association.
