Characterizing Parallel Multipliers for Detecting Hardware Trojans by Akira Ito et al.
Creative Commons: Attribution - NonCommercial - NoDerivatives
http://creativecommons.org/licenses/by-nc-nd/3.0/deed.ja
This paper presents a new analysis method for estimating the detectability
of a hardware Trojan (HT) that causes a path delay fault (PDF) in parallel
multipliers. The proposed method characterizes a parallel multiplier by the
average delay of all paths in the multiplier. We show that the average delay,
which is determined by the multiplier structure, is related to the HT
detectability. The validity of our method is evaluated by an experiment using
Monte Carlo tests that measure the detection probabilities of HTs inserted into
typical multipliers, and by multiple regression analysis. In addition, we
demonstrate how the amount of inserted delay affects the HT detectability. The
results show that, given an inserted delay amount and a multiplier structure,
our analysis is useful for estimating the detectability.
Keywords: Hardware trojans, Arithmetic algorithms, Multipliers, Path delay faults
1 Introduction
Hardware Trojan (HT) threats have been a topic of significant interest in hardware
security research. An HT is a hardware-oriented backdoor that can be inserted into
cryptographic hardware to retrieve secret information. Modern IC chips including
cryptographic hardware are manufactured through a division of labor among many
parties (e.g., fabless companies, design houses, IP vendors, and semiconductor
foundries), which might not always be trustworthy. In other words, a malicious
party may insert an HT into cryptographic hardware to retrieve secret information
from the chip users and/or their clients.
There are many previous works on HT insertion and detection (i.e., countermeasures).
In earlier related works, many HTs that employ circuit function modification
have been investigated. For example, one HT modifies the cryptographic datapath
Journal of Applied Logics — IFCoLog Journal of Logics and their Applications
to output the secret key when the HT is triggered. Another HT described in [1]
modifies the datapath to cause one or more faults in the cryptographic operations
to extract the secret key by using a type of fault-based cryptanalysis called
differential fault analysis [2]. The abovementioned HTs consist of trigger and
payload units. The trigger unit activates the payload unit only when the
cryptographic hardware has specific input values. Since the trigger values are
very limited and unknown to chip testers, it is difficult to detect them during
Monte-Carlo-based chip tests. Furthermore, it is impractical to perform an
exhaustive test for cryptographic hardware because the primary inputs are usually
longer than 128 bits. However, as these HTs modify the chip geometry and/or
explicitly add extra specific blocks/paths, many countermeasures against them
exploit the differences between the malicious chips and the Trojan-free golden
models. For example, manufactured chips can be inspected using scanning electron
microscope (SEM) images, or the side-channel information of chips can be compared
with that of the golden models. Similarly, an HT that modifies the dopant
polarities of cells to cause stuck-at faults intentionally [3] can even be
detected by SEM imaging with feasible additional procedures.
In contrast, a new type of HT called a Path Delay HT (PDHT), presented
in [4], causes faults (i.e., bugs) in multipliers seemingly without modifying the
circuit functions at the logic or cell level (i.e., without additional
trigger/payload units or stuck-at faults). Since integer multiplication is one of
the major operations in public key cryptography (e.g., RSA [5] and elliptic curve
cryptography [6]), bugs in multipliers in public key cryptographic hardware can be
exploited by attackers to retrieve secret keys [7]. The approach used by a PDHT
involves finding a rarely sensitized path called a rare path (RP) in a multiplier
(footnote 1) and replacing gates along the RP with functionally identical gates
that have larger delays such that the RP delay becomes larger than the critical
delay. The output of a PDHT-inserted multiplier is buggy due to the setup time
violation only when the inputs are the specific values required to sensitize the
RP. Note that it is quite difficult to detect PDHTs by performing Monte Carlo
tests because RPs are sensitized by few inputs. It is also difficult to detect
PDHTs even with SEM images because PDHTs are inserted without additional units
or faulty cells. Thus, PDHTs are considered to pose a serious threat to
information system security.
On the other hand, while there are many hardware algorithms for parallel
multiplication [8], the generality and applicability of PDHTs to such various
multipliers are unclear. Although RPs are fully sensitized only by specific
inputs, the extra delay added to an RP also influences other paths. If this
influence is non-negligible, a PDHT-inserted multiplier can generate faulty output
values even when the RP is not fully sensitized. In other words, PDHTs can be
detected during Monte Carlo tests in such cases. The influence depends on the
characteristics of the RP in the multiplier, which are related to its hardware
algorithm. While a method of suppressing the detectability (i.e., influence) of
PDHT insertion was proposed in [4], the detectability of the inserted PDHT was
evaluated for only one multiplication algorithm. Accordingly, the characteristics
of RPs in multipliers, namely, the extent to which hardware algorithms for
multiplication impact the insertion of PDHTs with low detection probabilities in
Monte Carlo tests, should be studied to develop PDHT countermeasures.
Footnote 1: A path is sensitized if all of the gates on the path switch in a clock
cycle. Note that gate switching due to glitch effects (or dynamic hazards) is not
discussed in this paper, as in [4].
In this paper, we present analyses of some typical multipliers from the viewpoints
of RP characteristics and PDHT insertion/detection probability. We discuss how the
delay added to an RP affects other paths. In particular, we present analyses of the
statistical properties of the switching probability and number of gates along an RP.
As a result, we demonstrate that the detectability is closely related to the difference
between the critical delay and average delay of all of the paths in the multiplier in
addition to the first-order statistical moment of the switching probability and the
number of gates along the RP. We validate our argument by presenting the results of
experimental PDHT insertion into some typical multipliers of different bit lengths.
Here, we vary the amount of inserted delay in order to clarify the effect of
delay insertion. Consequently, we demonstrate that multipliers based on redundant
binary trees provide greater detectability than other multipliers.
2 Path Delay HT
The basic approach used by a PDHT involves modifying a path in a multiplier
and letting the multiplier output faulty values when specific values that sensitize
the path are input. The faulty outputs of such PDHT-inserted multipliers enable
the attackers who inserted the PDHT to retrieve the secret key based on the bug
attack [7]. While the conventional HTs are based on circuit function modification
or stuck-at faults caused by added/modified blocks, paths, and gates, PDHTs are
based on path delay faults [9]. In the path delay fault model, the path delay becomes
longer than the clock period due to the long delays of gates along the path. The
PDHT intentionally causes such path delay faults by replacing the gates along the
RP with gates with the same function but longer delays.
An RP is a path sensitized with an extremely low probability. The RP can be
sensitized only by attackers who know specific values, while it cannot be sensitized
and detected by Monte Carlo tests. In addition, as PDHTs employ only valid and
correct cells (i.e., gates), it is difficult to detect PDHTs via conventional HT
detection methods, which employ reverse-engineering techniques to find
added/modified suspicious blocks, paths, and cells (even at the dopant level).
Figure 1: Flowchart of PDHT insertion.
Figure 1 shows the flowchart of PDHT insertion presented in [4], which consists
of two phases: RP selection and delay distribution. The RP selection phase finds
an RP. Since the detectability of a PDHT depends on the probability of activating
the RP (i.e., the total switching probability of the gates throughout the RP), the
RP selection exploits two values related to switching probability: controllability
and observability [10]. Hence, in the RP selection phase, the controllabilities and
observabilities are first calculated.
Controllability is the probability of 0 or 1 on a wire, which connects two gates,
if the primary inputs of the circuit are uniformly distributed. Observability is the
probability that a value on a wire affects the primary output of the circuit.
Let C0(m) and C1(m) be the controllabilities of 0 and 1 on wire m, respectively,
and let B0(m) and B1(m) be the observabilities of 0 and 1 on wire m, respectively.
For example, let us consider a two-input AND gate. Let i and j be the input wires
to the AND gate and the value of each wire be independent. Let k be the output
signal. In this case, some controllabilities and observabilities are as follows:
$$ C_0(k) = 1 - C_1(i)\,C_1(j), \tag{1} $$
$$ C_1(k) = C_1(i)\,C_1(j), \tag{2} $$
where C0(k) and C1(k) are represented by the controllabilities of inputs i and
j. The above equations indicate that the path activation probabilities (i.e., PDHT
detectability) can be roughly calculated based on the controllabilities of the
primary inputs. For RP selection, the controllabilities and observabilities are
estimated using a Monte Carlo method with logic simulation due to the difficulty
of calculating their exact values for large multipliers deductively.
Figure 2: Example circuit.
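The Monte Carlo estimation of controllabilities mentioned above can be illustrated on the two-input AND gate of Eq. (1). The following is only a minimal sketch: the function names, the trial count, and the seed are assumptions made for illustration and do not come from [4] or [10].

```python
import random

def estimate_controllability(gate, n_inputs, trials=100_000, seed=0):
    """Estimate (C0, C1) of a gate's output wire by random logic simulation.

    Inputs are drawn independently and uniformly, as in the definition of
    controllability used in the text.
    """
    rng = random.Random(seed)
    ones = 0
    for _ in range(trials):
        bits = [rng.getrandbits(1) for _ in range(n_inputs)]
        ones += gate(*bits)
    c1 = ones / trials
    return 1.0 - c1, c1

def and2(i, j):
    """Two-input AND gate with output wire k."""
    return i & j

c0_k, c1_k = estimate_controllability(and2, 2)
# With C1(i) = C1(j) = 0.5, Eq. (1) gives C0(k) = 1 - 0.5 * 0.5 = 0.75,
# so the estimate should be close to that value.
print(f"C0(k) ~ {c0_k:.3f}, C1(k) ~ {c1_k:.3f}")
```

For a full multiplier, the same idea would be applied to every wire of the netlist during one logic simulation run rather than to a single gate.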
An RP is then selected according to the calculated controllabilities and
observabilities. When an output wire of a gate has low controllability, it can be
said to have low switching probability. A wire with low observability has a small
influence on the primary output. Therefore, the RP is selected by identifying a
series of wires (i.e., a path) with the lowest possible controllability and
observability. More precisely, the wire with the lowest controllability is
selected and is then extended to a primary input and an output with the lowest
possible controllabilities and observabilities. However, since the selected path
is not always sensitizable, a SAT solver is used to check whether the path can be
sensitized. Thus, an RP with the lowest possible probability of activation can be
selected.
In the delay distribution phase, a delay is added to each gate along the selected
RP to minimize the probability of setup time violations (i.e., PDHT detection)
during Monte Carlo testing. Since the number of paths in the multiplier increases
exponentially as the gate depth increases, it is difficult to determine how an
added delay affects other paths. Therefore, in [4], a genetic algorithm was used
to determine how to add a delay to each gate along an RP.
3 Analysis of RP characteristics
In this section, we present analyses of RP characteristics and discuss their relations
to PDHT detectability. We assume here that a PDHT can be detected if a setup time
violation (i.e., an erroneous output) occurs during a Monte Carlo test with a clock
period equal to the critical delay. Our analysis method focuses on the controllability
and number of gates along the RP. For a setup time violation to occur, a primary
input vector must at least partially sensitize the RP. In other words, it is
important to analyze the number of switched gates along the RP when the path delay
is longer than the critical delay. To clarify the importance of the signal
controllability and number of gates along the RP, we first consider an example
circuit into which a PDHT has been inserted. This circuit is depicted in Fig. 2,
where α is the original gate delay and β (= α + γ) is the modified gate delay
obtained by adding an extra delay γ. For simplicity, we assume that α is roughly
the same regardless of the gate function. In Fig. 2, the RP is denoted by the red
and green lines, and each gate along the RP has a delay of β. A path including
some of the gates along the RP is denoted by the blue and green lines, and this
path can be sensitized by Monte Carlo testing. The two paths share the partial
path denoted by the green line.
In this case, when the path is sensitized, a setup time violation occurs if 2α +
2β > dCP , where dCP (≈ 5α) is the critical delay. More generally, let x and y
be the numbers of switched gates with delays of α and β, respectively, along a
sensitized path. Note that x + y should not exceed the number of gates along the
critical path (i.e., x + y ≤ 5 in this case). Since the delay of the path is given by
xα + yβ = (x + y)α + yγ, the added delay yγ mainly determines whether or not
a setup time violation occurs. For a given total inserted delay, the added unit
delay γ is smaller if the number of gates along the RP is larger. In addition, y
is smaller if the controllability of signals along the RP is smaller. Thus, the
controllability and number of gates along the RP have essential roles in
determining the possibility of PDHT insertion/detection.
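The condition xα + yβ > dCP for the example circuit can be checked with a few lines of code. This is a minimal sketch; the function names and the numeric values of α and γ are hypothetical and chosen only to show both outcomes.

```python
def path_delay(x, y, alpha, gamma):
    """Delay of a sensitized path with x unmodified gates (delay alpha)
    and y modified gates (delay beta = alpha + gamma)."""
    return (x + y) * alpha + y * gamma

def violates_setup(x, y, alpha, gamma, d_cp):
    """True if the path delay exceeds the critical delay d_cp."""
    return path_delay(x, y, alpha, gamma) > d_cp

alpha = 1.0          # hypothetical unit gate delay
d_cp = 5 * alpha     # critical delay of the example circuit (about 5 * alpha)

# Blue/green path of the example: x = 2 unmodified and y = 2 modified gates.
print(violates_setup(2, 2, alpha, 0.6, d_cp))  # 4 + 1.2 = 5.2 > 5 -> True
print(violates_setup(2, 2, alpha, 0.4, d_cp))  # 4 + 0.8 = 4.8 < 5 -> False
```

With x = y = 2, the violation occurs exactly when γ > α/2, which shows how the added unit delay and the number of switched RP gates together decide detectability.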
Let us then consider the general case. Let PRP and P be an RP and a path that
can be sensitized by Monte Carlo testing, respectively. In addition, let dg and d′g
denote the delays of gate g before and after delay insertion, respectively
(footnote 2). A path is defined as a set of gates. For example, gate g in path P
is denoted as g ∈ P and the delay of P is denoted as dP . The delay of P after
delay insertion is
$$ d'_P = d_P + \sum_{g \in P \cap P_{RP}} (d'_g - d_g), \tag{5} $$
where P ∩ PRP denotes the set of switched gates on both P and PRP (i.e., sensitized
gates on the RP). The condition that causes a setup time violation by activating P
is d′P > dCP , where dCP denotes the critical path delay. For simplicity, the added
delay is assumed to be uniformly distributed over the gates along the RP
(footnote 3). Then, Eq. (5) can be rewritten as
$$ d'_P = d_P + \frac{|P \cap P_{RP}|}{|P_{RP}|} \, (d'_{RP} - d_{RP}), \tag{6} $$
where |PRP | and |P ∩ PRP | denote the numbers of gates along PRP and of switched
gates, respectively. In addition, dRP and d′RP are the RP delays before and after
delay insertion, respectively. Thus, the condition on |P ∩ PRP | that represents
PDHT detection by activating P is
$$ |P \cap P_{RP}| > |P_{RP}| \, \frac{d_{CP} - d_P}{d'_{RP} - d_{RP}}. \tag{7} $$
Since P in Eq. (7) represents an arbitrary path in a multiplier, we analyze the
statistical properties of the right-hand side (RHS) of Eq. (7). Let E[f(P )] be the
first-order statistical moment of a function f(P ) (i.e., the average value of f(P )).
Then,
$$ E\left[ |P_{RP}| \, \frac{d_{CP} - d_P}{d'_{RP} - d_{RP}} \right] = |P_{RP}| \, \frac{d_{CP} - E[d_P]}{d'_{RP} - d_{RP}}. \tag{8} $$
Thus, if the value obtained from Eq. (8) is smaller, the PDHT detection
probability is larger. The derived equation indicates that the detection
probability is larger (or smaller) if
1. The total delay added to the RP is larger (or smaller).
2. The number of gates in the RP is smaller (or larger).
3. The difference between the critical delay and the average delay of paths (i.e.,
dCP − E[dP ]) is smaller (or larger).
Footnote 2: In general, each gate delay differs depending on whether a rising or
falling transition happens in the gate and which input port is active. The
following discussion can also be applied to such a more precise model.
Footnote 3: This assumption was made because it enables the effects of switched
gates along the RP to be estimated. In [4], it was demonstrated that an optimally
distributed delay based on the genetic algorithm can reduce the detectability to
at most one-fourth of its original value in comparison with a uniformly
distributed delay. However, it is too difficult to analyze the effects of such
non-uniformly distributed delays.
In the following, we describe the relations between the above conditions and
hardware algorithms for parallel multiplication. In general, a multiplier consists of
three parts: a partial product generator, a partial product accumulator (PPA), and
a final stage adder (FSA). In this study, we focus on the PPA and FSA because there
are various algorithms for them and the latency and circuit area of a multiplier
heavily depend on the algorithms. Here, we consider two typical types of PPAs:
Wallace trees [11] and redundant binary addition (RBA) trees [12]. Wallace trees are
among the fastest PPAs and are the optimal trees of three-input, two-output
carry-save adders (CSAs) in terms of gate depth (i.e., delay). RBA trees are binary
trees of four-input, two-output redundant CSAs based on a redundant binary
representation. In an RBA tree, each digit of an integer is represented redundantly
using 0, 1, and -1, which makes it possible to construct a PPA by using a
symmetrical tree of four-input, two-output CSAs. Thus, Wallace trees have short
delays while RBA trees have high efficiencies in terms of area-delay.
The path delay difference in a Wallace tree is large due to the asymmetrical tree
structure. Thus, it would be difficult to detect PDHTs in multipliers with Wallace
trees according to the above conditions. On the other hand, the path delay difference
in an RBA tree is small owing to the symmetrical structure of binary trees, which
implies that multipliers with RBA trees are more resistant to PDHT insertion than
those with Wallace trees.
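The three conditions of Section 3 can be explored numerically. The sketch below assumes the uniform-delay form of the detection condition, i.e., that a path P causes a violation when |P ∩ PRP| exceeds |PRP| (dCP − dP)/(d′RP − dRP), and averages this threshold over paths as in Eq. (8). The function names and all numeric values are hypothetical.

```python
def min_switched_gates(n_rp, d_cp, d_p, d_rp, d_rp_new):
    """Threshold on the number of switched RP gates for one path P.

    n_rp: number of gates on the RP, d_cp: critical delay, d_p: delay of P,
    d_rp / d_rp_new: RP delay before / after delay insertion.
    """
    return n_rp * (d_cp - d_p) / (d_rp_new - d_rp)

def expected_threshold(n_rp, d_cp, path_delays, d_rp, d_rp_new):
    """Average threshold over a sample of path delays (Eq. (8) style)."""
    mean_d_p = sum(path_delays) / len(path_delays)
    return min_switched_gates(n_rp, d_cp, mean_d_p, d_rp, d_rp_new)

base = min_switched_gates(n_rp=20, d_cp=1.0, d_p=0.2, d_rp=0.5, d_rp_new=1.5)
more_delay = min_switched_gates(20, 1.0, 0.2, 0.5, 2.5)   # condition 1
fewer_gates = min_switched_gates(10, 1.0, 0.2, 0.5, 1.5)  # condition 2
less_slack = min_switched_gates(20, 1.0, 0.6, 0.5, 1.5)   # condition 3
print(base, more_delay, fewer_gates, less_slack)
```

A smaller threshold means fewer RP gates must switch before an error appears, so each of the three variations makes detection by random vectors more likely than in the base case.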
As FSAs, we focus on three typical carry-propagation adders (CPAs): ripple-carry
adders (RCAs), block carry-lookahead adders (BCLAs) [13], and Kogge-Stone
adders (KSAs) [14]. RCAs, BCLAs, and KSAs are among the optimal CPAs for
circuit area, area-delay efficiency, and delay, respectively. RCAs are the simplest
and most compact two-input CPAs. An n-bit RCA consists of n full adders, and
the i-th (0 ≤ i ≤ n − 1) full adder computes the i-th sum bit si and (i + 1)-th carry
bit ci+1 from the i-th input bits and i-th carry bit ci. The critical delays of RCAs
are the largest among the common CPAs due to the long carry propagation path.
BCLAs have smaller critical delays than RCAs due to the use of the carry-lookahead
technique. A BCLA consists of several small RCAs and a carry-lookahead unit. It
generates some carry bits directly from the inputs using the carry-lookahead unit to
make its critical delay smaller than that of an RCA. The main drawback of BCLAs is
that their carry-lookahead units require gates with large fan-in and fan-out, which
diminish the circuit performance. KSAs are the fastest CPAs based on parallel
prefix operations, which define how to implement each carry bit generation block.
KSAs perform addition with a minimal gate depth (i.e., delay) of O(log(n)) and only
gates with a fan-out of two. Thus, KSAs usually achieve the smallest delays at the
expense of circuit area. Since the lengths of most paths in RCAs are short, the
average delay of an RCA is small relative to the critical delay. On the other hand,
since BCLAs and KSAs calculate carries in parallel, many paths would be related
to carry propagation. Therefore, KSAs and BCLAs have more long paths (whose
delays are close to the critical delay) than RCAs. Thus, the differences between
the critical and average delays for BCLAs and KSAs are small, and, consequently,
PDHTs in multipliers with BCLAs and KSAs should be more easily detected than
those in multipliers with RCAs.
4 Experimental evaluation
Our argument in Section 3 was validated by experimentally inserting PDHTs into
the six above-mentioned types of multipliers, which combine two PPAs and three
FSAs, with 16-, 24-, and 32-bit operands. In total, we evaluated 18 multipliers in
the experiment. Note that such multipliers are frequently used as building blocks
for public key cryptography, which requires multiplication of more than 100 bits.
4.1 Evaluation of PDHT detection probability
To evaluate the probability of PDHT detection during Monte Carlo testing, we
applied 10^7 random input vectors to PDHT-inserted multipliers and then counted
the number of setup time violations by performing gate-level timing simulations. In
addition, to analyze the dependencies of the setup time violations on the switching
probabilities and gate counts of the delay-added paths, we randomly selected a path
in each multiplier, added a delay to the selected path like a PDHT, and applied
10^6 random input vectors to the multiplier. We uniformly added a unit delay to
each gate along the randomly selected path such that the resulting path delay was
from 1.2 to 2.0 times the critical delay. For each multiplier, we repeated the above
evaluation process 1,000 times.
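The counting procedure above can be mimicked with a toy model. The sketch below is not the gate-level timing simulation used in the experiment: it assumes, purely for illustration, that each gate on the delay-added path switches independently with the same probability and that a violation occurs when the number of switched gates exceeds a given threshold. All names and parameter values are hypothetical.

```python
import random

def detection_probability(n_gates, switch_prob, threshold,
                          trials=10_000, seed=0):
    """Fraction of random vectors that trigger a setup time violation.

    Toy model: each of the n_gates gates on the delay-added path switches
    independently with probability switch_prob; a violation is counted when
    more than `threshold` of them switch for the same vector.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        switched = sum(rng.random() < switch_prob for _ in range(n_gates))
        if switched > threshold:
            detected += 1
    return detected / trials

# Hypothetical path of 10 gates: a lower threshold (larger inserted delay)
# can only raise the estimated detection probability.
p_small_delay = detection_probability(10, 0.5, threshold=8)
p_large_delay = detection_probability(10, 0.5, threshold=5)
print(p_small_delay, p_large_delay)
```

In a real evaluation the per-vector outcome would come from timing simulation of the netlist, since gate switching events along a path are correlated rather than independent.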
Figure 3 presents the histograms of the numbers of such multipliers with randomly
selected delay-added paths, where the horizontal axes indicate the detection
probabilities for 10^7 random inputs, and the red, green, blue, cyan, and purple
bars denote the detection probabilities where the total path delays are 1.2, 1.4,
1.6, 1.8, and 2.0 times the critical delay, respectively. Figures 4, 5, and 6 show
the PDHT detection probabilities for all 18 multipliers, where the values of zero
indicate that the corresponding RPs could not be detected in our experiment. From
Figs. 3 to 6, we can confirm that the average detection probabilities of the RCAs
are smaller than those of the BCLAs and KSAs for any operand length and that all
of the average detection probabilities tend to be smaller if the operand length is
larger.
We discuss these results in detail in the following subsections.
Figure 3: Detection probabilities of 1,000 randomly chosen paths.
Figure 4: Detection probabilities of RPs inserted into 16-bit multipliers. (Axis ticks: 1.2 to 2.0 in steps of 0.1.)
Figure 5: Detection probabilities of RPs inserted into 24-bit multipliers. (Axis ticks: 1.2 to 2.0 in steps of 0.1.)
Figure 6: Detection probabilities of RPs inserted into 32-bit multipliers. (Axis ticks: 1.2 to 2.0 in steps of 0.1.)
4.2 Evaluation of path delays
We then evaluated the average path delay in each multiplier. As described in Section
3, Eq. (8) is essential for evaluating PDHT detectability (i.e., the number of setup
time violations), which is closely related to the number of gates switched along the
RP to cause setup time violation. Table 1 shows the critical delay, average delay of
a randomly selected path, and difference between them for each multiplier. Table 1
reveals that the critical and average delays of the multipliers with Wallace trees are
smaller than those of the multipliers with RBA trees, although both logic depths
are given as O(log(n)), where n is the number of input bits of the multiplier. This
tendency occurred because extra logic for converting redundant binary numbers into
ordinary binary numbers exists in multipliers with RBA trees due to the redundant
binary representation.
Regarding the FSAs, the critical delays of the RCA-based multipliers are larger
than those of the BCLA- and KSA-based multipliers, and the magnitude relation
of the average delays is opposite to that of the critical delays. Note again that the
detection probability is higher when the value obtained from Eq. (8) is smaller. The
result indicates that it is easier to detect PDHTs in multipliers with KSAs than in
those with BCLAs or RCAs. In contrast, the difference between the critical and
average delays in a PPA would be negligible.
Based on the above results, we discuss the detection probabilities of randomly
inserted PDHTs where the insertion delays vary from 1.2 to 2.0 times the critical
path delay in Fig. 3. From the figure, we can first confirm that the detection
probabilities are lower when the insertion delay is smaller. This corresponds to
the first condition described in Section 3. In addition, the result indicates that
the detection probabilities of PDHTs inserted into the RCA-based multipliers vary
greatly depending on the insertion delay value, compared to those of PDHTs
inserted into the KSA-based multipliers. We can explain the difference using the
analysis method described in Section 3 as follows.
The basic idea of our method is to estimate the minimum number of gates that must
switch to cause a setup time violation. For example, let us consider the case where
the original delays of all gates are zero and only the gates on the RP have the
delays inserted by introducing the PDHT. Half of the gates on the RP need to be
switched in order to cause an error when the insertion delay is twice the critical
delay. However, since all gates in the circuit have delays in practice, the number
of gates required to cause an error would be smaller than half. In addition,
the number of required gates depends on the ratio of each gate delay in the circuit
to the critical delay. If each gate delay is much smaller than the critical delay,
almost half of the gates on the RP are needed to detect the PDHT. On the other
hand, if each gate delay is large, the number of gates required for detection is
smaller. In our analysis described in Section 3, we substitute the average path
delay for the individual gate delays.
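The idealized case above can be written out as a short derivation. As a sketch, assume (as in the text) that the original gate delays are zero and that the inserted delay is spread uniformly over the |PRP| gates of the RP. The symbols r (the ratio of the RP delay after insertion to the critical delay) and y (the number of switched RP gates) are introduced here only for illustration.

$$ d'_{RP} = r \, d_{CP}, \qquad d'_P = \frac{y}{|P_{RP}|} \, r \, d_{CP} > d_{CP} \iff \frac{y}{|P_{RP}|} > \frac{1}{r}. $$

For r = 2.0 the required fraction is 1/2, and for r = 1.2 it is 1/1.2 ≈ 0.83. With non-zero original gate delays the left-hand side gains an extra positive term, so the required fraction can only be smaller than 1/r.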
From Table 1, we confirm that the difference in the average delay of randomly
selected paths between the multipliers based on RCA and KSA is small. On the
other hand, the difference in the critical delays between them is large. The number
of gates needed to cause an error is therefore larger for RCA-based multipliers
than for KSA-based ones. This effect would be larger when the insertion
delay is smaller. In our experiments, at the smallest insertion delay (i.e., 1.2 times
the critical path delay), an error must occur when 83% of the gates are activated. In
the case of a KSA-based multiplier, it is highly likely that the number of gates
required to cause an error is actually smaller than the above number, but in the
case of RCA, the number of gates needed to cause an error is thought to be
inversely proportional to the ratio of the insertion delay to the critical delay.
In fact, the ratio appears in the denominator of Eq. (8) when we divide its
numerator and denominator by the critical delay. For the above reasons, it becomes
more difficult to detect PDHTs in RCA-based multipliers than in KSA-based ones,
and the detection becomes harder as the insertion delay decreases.

PPA           FSA    Bits  Critical delay  Average delay  Difference
RBA tree      RCA    16    0.663           0.127          0.536
RBA tree      RCA    24    0.975           0.144          0.830
RBA tree      RCA    32    1.290           0.154          1.136
RBA tree      BCLA   16    0.332           0.132          0.200
RBA tree      BCLA   24    0.347           0.149          0.198
RBA tree      BCLA   32    0.400           0.162          0.238
RBA tree      KSA    16    0.257           0.142          0.115
RBA tree      KSA    24    0.268           0.162          0.106
RBA tree      KSA    32    0.310           0.171          0.139
Wallace tree  RCA    16    0.58            0.100          0.480
Wallace tree  RCA    24    0.885           0.116          0.769
Wallace tree  RCA    32    1.190           0.129          1.061
Wallace tree  BCLA   16    0.246           0.104          0.142
Wallace tree  BCLA   24    0.290           0.123          0.167
Wallace tree  BCLA   32    0.319           0.136          0.183
Wallace tree  KSA    16    0.197           0.111          0.085
Wallace tree  KSA    24    0.224           0.131          0.093
Wallace tree  KSA    32    0.247           0.142          0.105
Table 1: Critical and average delays of randomly chosen paths of various multipliers
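The effect of dividing by the critical delay can be made concrete with the Table 1 values. The sketch below computes the normalized difference (dCP − E[dP])/dCP for each multiplier. The name relative_slack is hypothetical, and the first block of Table 1, whose header is missing in this copy, is assumed to be the RBA tree with RCA.

```python
# (PPA, FSA, operand bits) -> (critical delay, average delay), from Table 1.
TABLE1 = {
    ("RBA", "RCA", 16): (0.663, 0.127), ("RBA", "RCA", 24): (0.975, 0.144),
    ("RBA", "RCA", 32): (1.290, 0.154), ("RBA", "BCLA", 16): (0.332, 0.132),
    ("RBA", "BCLA", 24): (0.347, 0.149), ("RBA", "BCLA", 32): (0.400, 0.162),
    ("RBA", "KSA", 16): (0.257, 0.142), ("RBA", "KSA", 24): (0.268, 0.162),
    ("RBA", "KSA", 32): (0.310, 0.171),
    ("Wallace", "RCA", 16): (0.58, 0.100), ("Wallace", "RCA", 24): (0.885, 0.116),
    ("Wallace", "RCA", 32): (1.190, 0.129), ("Wallace", "BCLA", 16): (0.246, 0.104),
    ("Wallace", "BCLA", 24): (0.290, 0.123), ("Wallace", "BCLA", 32): (0.319, 0.136),
    ("Wallace", "KSA", 16): (0.197, 0.111), ("Wallace", "KSA", 24): (0.224, 0.131),
    ("Wallace", "KSA", 32): (0.247, 0.142),
}

def relative_slack(d_cp, mean_d_p):
    """Difference between critical and average delay, relative to d_cp."""
    return (d_cp - mean_d_p) / d_cp

slack = {key: relative_slack(*delays) for key, delays in TABLE1.items()}
for (ppa, fsa, bits), value in sorted(slack.items()):
    print(f"{ppa:8s} {fsa:5s} {bits:2d}-bit  {value:.3f}")
```

For every PPA and operand length in Table 1, this normalized difference is largest for the RCA and smallest for the KSA, which matches the ordering of detection difficulty discussed above.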
4.3 Evaluation of switching probability
The controllability of signals along the path significantly influences the detection
probability because a random vector can easily sensitize a path consisting of gates
with high switching probabilities. Figure 7 shows the histograms of the gate
switching probabilities for 1,000 randomly selected paths in each multiplier, where
the horizontal axes indicate the logarithmic mean switching probabilities. Table 2
also lists the corresponding average switching probabilities of the gates along the
RPs.

PPA           FSA    16-bit  24-bit  32-bit
RBA tree      RCA    0.1589  0.1505  0.1523
RBA tree      BCLA   0.1099  0.0641  0.0455
RBA tree      KSA    0.1013  0.0459  0.0234
Wallace tree  RCA    0.0378  0.0317  0.0322
Wallace tree  BCLA   0.0640  0.0600  0.0517
Wallace tree  KSA    0.0641  0.0449  0.0374
Table 2: Switching probabilities of RPs

Figure 7 confirms that the logarithmic means differ considerably depending on
the PPA algorithm. The multipliers with Wallace trees have smaller logarithmic
mean switching probabilities than those with RBA trees. In other words, there
are many paths with high activation probabilities in RBA trees, basically because
Wallace trees have asymmetric structures including many gates with switching
probabilities close to 0 or 1 while RBA trees have symmetric binary tree structures.
The critical delays of the above FSAs and PPAs are given by O(log(n)), except for
the RCAs. Since the logic depths of the paths in such FSAs and PPAs increase
gradually with increasing operand length, the switching probability of each path
does not strongly depend on the operand length. Similarly, as mentioned in Section
3, many paths in an RCA have far smaller logic depths than the critical path, which
indicates that the switching probability also does not strongly depend on the
operand length for an RCA.
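A per-path logarithmic mean of switching probabilities can be computed as follows. This is a sketch under an assumption: the text does not define the logarithmic mean precisely, so it is interpreted here as the arithmetic mean of the base-10 logarithms of the gate switching probabilities along a path. The function names and sample values are hypothetical.

```python
import math

def log_mean_switching(probs):
    """Mean of log10 of the switching probabilities of gates along one path
    (assumed interpretation of the 'logarithmic mean')."""
    return sum(math.log10(p) for p in probs) / len(probs)

def path_activation_estimate(probs):
    """Rough activation probability of the whole path, assuming the gates
    switch independently (a simplification)."""
    return math.prod(probs)

path = [0.1, 0.001]                     # hypothetical switching probabilities
print(log_mean_switching(path))         # about -2.0
print(path_activation_estimate(path))   # about 1e-4
```

Under the independence assumption, the activation estimate equals 10 raised to the power of (number of gates × logarithmic mean), so paths with a lower logarithmic mean or more gates are rarer.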
4.4 Evaluation of our method
To conduct a theoretical validation of our argument, we performed multiple regression
analysis between the detection probabilities of 1,000 paths randomly selected from
each multiplier and predictor variables: (i) the controllabilities, (ii) the numbers of
gates along randomly selected paths, and (iii) M , given by (dCP − E[dP ])/(d′RP −
dRP ). Note that each variable is normalized to unit variance in order to make
the influence of the variable on the multiple regression model meaningful and
understandable. In this evaluation, we focus on 32-bit multipliers, which are more
frequently used in practical cryptographic hardware than 16- and 24-bit ones. Table 3
shows the multiple regression analysis coefficients and t-stats. The R-squared value
obtained from the multiple regression was 0.699. Table 3 confirms that |t-stat| is
large for M and the switching probability, which indicates their significant
influence on the detection probability.
coefficient -530.1 34.88 163.1
t-stat -255.6 17.21 59.86
Table 3: Results of multiple regression analysis
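The multiple regression with unit-variance predictors described in this section can be sketched in plain Python. The code below is an illustration only: it fits ordinary least squares through the normal equations, uses made-up data rather than the experimental data, and does not compute t-stats or R-squared. All names are hypothetical.

```python
import statistics

def standardize(column):
    """Scale a predictor to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(predictors, y):
    """Least-squares fit on standardized predictors plus an intercept.

    Returns [intercept, coef_1, ..., coef_k].
    """
    cols = [standardize(c) for c in predictors]
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

# Made-up data: y depends linearly on two predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0 + 2.0 * a - 1.0 * b for a, b in zip(x1, x2)]
coefs = fit([x1, x2], y)
print(coefs)
```

Because each predictor is scaled to unit variance, the magnitudes of the fitted coefficients are directly comparable, which is the reason the normalization is applied before the regression.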
5 Conclusion
This paper presented analyses of parallel multiplication hardware algorithms from
the viewpoints of RP characteristics and PDHT detectability. We discussed the
theoretical aspects of RPs in multipliers and their relations to PDHT detectability.
Our argument was validated through the experimental insertion of PDHTs into
various multipliers. In addition, we confirmed the effectiveness of our method
through experimental insertion with different amounts of inserted delay. The
results of multiple regression analysis confirmed that the proposed evaluation
method explains most of the PDHT detectability. The multiplier combining an RBA
tree and a KSA yielded the highest detectability among the evaluated multipliers.
The development of dedicated multiplication hardware algorithms that impede PDHT
insertion remains a topic for future work.
Acknowledgment
This research has been supported by JSPS KAKENHI Grants No. 17H00729, No.
16K12436 and No. 16J05711.
References
[1] Shivam Bhasin, Jean-Luc Danger, Sylvain Guilley, Xuan Thuy Ngo, and Laurent
Sauvage. Hardware trojan horses in cryptographic IP cores. In Fault Diagnosis and
Tolerance in Cryptography (FDTC), 2013 Workshop on, pages 15–29. IEEE, 2013.
[2] Eli Biham and Adi Shamir. Differential fault analysis of secret key cryptosystems.
Advances in Cryptology—CRYPTO, pages 513–525, 1997.
[3] Georg T Becker, Francesco Regazzoni, Christof Paar, and Wayne P Burleson. Stealthy
dopant-level hardware trojans. In International Workshop on Cryptographic Hardware
and Embedded Systems, pages 197–214. Springer, 2013.
[4] Samaneh Ghandali, Georg T Becker, Daniel Holcomb, and Christof Paar. A design
methodology for stealthy parametric trojans and its application to bug attacks. In
International Conference on Cryptographic Hardware and Embedded Systems, pages
625–647. Springer, 2016.
[5] Ronald L Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital
signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126,
1978.
[6] Neal Koblitz. Elliptic curve cryptosystems. Mathematics of computation, 48(177):203–
209, 1987.
[7] Eli Biham, Yaniv Carmeli, and Adi Shamir. Bug attacks. In Annual International
Cryptology Conference, pages 221–240. Springer, 2008.
[8] Israel Koren. Computer arithmetic algorithms. AK Peters, Ltd, 2002.
[9] Gordon L Smith. Model for delay faults based upon paths. In ITC, pages 342–351,
1985.
[10] Sunil K Jain and Vishwani D Agrawal. STAFAN: An alternative to fault simulation. In
Proceedings of the 21st Design Automation Conference, pages 18–23. IEEE Press, 1984.
[11] Christopher S Wallace. A suggestion for a fast multiplier. IEEE Transactions on
Electronic Computers, (1):14–17, 1964.
[12] Naofumi Takagi, Hiroto Yasuura, and Shuzo Yajima. High-speed VLSI multiplication
algorithm with a redundant binary addition tree. IEEE Transactions on Computers,
(9):789–796, 1985.
[13] Amos R Omondi. Computer arithmetic systems: algorithms, architecture and
implementation. 1994.
[14] Peter M Kogge and Harold S Stone. A parallel algorithm for the efficient solution of
a general class of recurrence equations. IEEE Transactions on Computers, C-22(8):786–
793, 1973.
