Sequence-aware watermark design for soft IP embedded processors by Kufel, J. et al.
        
Citation for published version:
Kufel, J, Wilson, PR, Hill, S, Al-Hashimi, BM & Whatmough, PN 2015, 'Sequence-aware watermark design for soft IP embedded processors', IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1.
https://doi.org/10.1109/TVLSI.2015.2399457
DOI: 10.1109/TVLSI.2015.2399457
Publication date: 2015
Document Version: Early version, also known as pre-print
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
University of Bath
Download date: 13. May. 2019
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1
Sequence-Aware Watermark Design for Soft IP
Embedded Processors
Jedrzej Kufel, Peter Wilson, Stephen Hill, Bashir M. Al-Hashimi, and Paul N. Whatmough
Abstract—This paper describes a design approach for incorporating sequence-aware watermarks in soft IP embedded processors. The influence of watermark sequence parameters on detection, area and power overheads is examined, and a sequence-aware method for incorporating watermarks in soft IP embedded processors is proposed. Intrinsic sequence parameters, such as the activity factor and the overlapping factor, are introduced, and their impact on correlation results is demonstrated. Experimental measurements on FPGA and ASIC validate the design approach and demonstrate the resulting IP protection and its costs for constrained embedded processors. The results show a tradeoff between watermark robustness against third-party IP attacks and hardware implementation costs; this tradeoff is analyzed and an application-specific watermark implementation is proposed.
Index Terms—Watermarking, IP Protection, Embedded Processors, Correlation Power Analysis.
I. INTRODUCTION
TECHNOLOGY scaling and innovations in modern processes are allowing increasingly complex systems to be implemented on a single die [1]. To support this design complexity, it is desirable to source sub-systems, such as CPUs, from external Intellectual Property (IP) suppliers. IP blocks are usually delivered either as hard macros (full circuit layouts) or as soft macros (typically register-transfer level (RTL) descriptions). The Virtual Socket Interface (VSI) Alliance [2] proposes three approaches to securing IP, namely deterrent, protection and detection. The deterrent approach discourages infringement through patents, copyrights, contracts or lawsuits [2]; however, it does not provide physical protection. The protection approach prevents unauthorized use of IP through encryption. Nonetheless, support for encryption and rights management in EDA tools is far from universal and pain-free [3]. Therefore, IP blocks are often supplied as unprotected design files that System-on-Chip (SoC) integrators can use without complicating their design flow. As a result, auditing the presence of IP in finished products is an important challenge for IP providers. De-encapsulation and die-level reverse engineering can be used to prove the presence of IP, but the process is slow and costly [3], [4]. It is therefore desirable to identify IP candidates to be short-listed for more thorough investigation.

J. Kufel is with ARM Ltd., Cambridge, CB1 9NJ, U.K. (e-mail: Andrew.Kufel@arm.com).
P. Wilson and B. M. Al-Hashimi are with the School of Electronics and Computer Science, University of Southampton, SO17 1BJ, U.K. (e-mail: prw@ecs.soton.ac.uk; bmah@ecs.soton.ac.uk).
S. Hill is with ARM Ltd., Austin, TX 78746, U.S.A. (e-mail: Stephen.Hill@arm.com).
P. N. Whatmough is with ARM Ltd., Cambridge, CB1 9NJ, U.K. (e-mail: Paul.Whatmough@arm.com).
The VSI Alliance proposes digital watermarking as one of the detection methods for physical IP protection at various design levels [2]. In hard IP, a digital watermark is represented as physical modifications to the IC layout. Such techniques alter the placement of technology library cells through parity modification [5] or scattering [6], [7], modify interconnects in digital or analog devices [8], [9], or utilize intrinsic features of the physical IC layout obtained with EDA tools [10]. In firm IP, a digital watermark is embedded by applying additional constraints during optimization steps such as partitioning [11], [12], graph coloring [13], [14], template matching or operation scheduling [15]. Hard and firm IP protection techniques generate highly tamper-resistant watermarks with negligible area overheads. However, they require access to the watermarked design, such as a microphotograph, a GDSII file, or a fully placed-and-routed or partial netlist. Hence, such techniques are not in the scope of this paper. Soft IP is more desirable because it offers the end user the highest level of flexibility [2]. Therefore, this paper focuses on digital watermarks embedded in soft IP.
Embedded processors are constrained in terms of circuit area and power consumption. Therefore, minimizing the area and power overheads of the watermark circuit must be addressed for IP protection of embedded processors. The primary motivation of this work is the analysis of current power watermark circuit design, which enables non-invasive detection of a watermark in a fabricated device. Furthermore, this work investigates the reduction of area and power overheads, which is necessary for highly constrained embedded processors. This investigation is performed by analyzing the intrinsic parameters of watermarking sequences and their impact on hardware implementation costs and detection performance. None of the previous power watermark publications [16]–[18] has compared watermark sequences in this way. The sequence commonly used for power watermarks in the literature [16]–[18] is a 32-bit maximum length sequence (m-sequence). In this work, sequences not previously discussed in the area of IP power watermarking, such as Barker codes, are compared with the m-sequence [18]. Although Barker codes are well established in communication technology [19] and radar technology [20], their use in the context of IP power watermarking is novel. The sequences have been chosen to demonstrate various combinations of intrinsic parameters; nevertheless, the theoretical analysis applies to any other sequence. The theoretical analysis is validated with measurements using FPGA and two
ASIC designs. Furthermore, a strong relationship between the choice of watermark sequence and both hardware implementation costs and detection performance is shown, and the best sequence for digital power watermarks is determined.
The paper is organized as follows. In Section II, previous work in the area of soft IP protection is analyzed. Section III describes the architecture of a power watermark circuit. Section IV provides an in-depth analysis of the Pearson correlation coefficient and introduces the intrinsic parameters of watermark sequences. The sequences for power watermarks are discussed in Section V. The simulation results of the sequences are given in Section VI. Section VIII validates the simulation results on FPGA and ASICs and discusses detection performance and hardware implementation costs. In Section IX, a secure digital signature generation methodology is discussed for short sequences. Section X analyzes the application-specific watermark implementation, and Section XI compares non-triggering and trigger-based watermark implementations with regard to third-party IP attacks. Section XII concludes the paper.
II. RELATED WORK
The methods for the protection of soft IP can be divided into two groups: invasive and non-invasive. Invasive detection techniques require access to device internals, such as input and output (I/O) ports and memories, and full or partial knowledge of the system. Finite State Machines (FSMs) have been successfully used for watermarking by adding extra states [21], [22] or by partially [23], [24] or completely [25] reusing existing states, with state reuse allowing a significant reduction of hardware implementation costs. To extract the embedded digital watermarks, the output ports of a device are observed while dedicated activation vectors, integrated as part of a test kernel [26], are applied to the input ports. Other techniques embed digital watermarks in FPGA look-up tables (LUTs) [27], but require a dedicated co-processor to scan the data fetched from memory or to execute special watermark instructions [28]. Upon detection of a unique input sequence or instruction, the watermark is copied to a specific memory location. In a typical invasive watermarking approach, such as FSM-based watermarking [21]–[25], the watermark is interwoven with the original IP, and removing the watermarked logic compromises the design. To perform post-fabrication detection, an IP provider requires architectural knowledge, such as the I/O ports or memory. In the case of soft IP, the IP provider knows the architecture of the IP; however, the provider may lack in-depth knowledge of how it will be integrated into a system.
In non-invasive detection techniques, much less system knowledge is needed and access to device internals is not required. Sources of information known as side-channel parameters, such as electromagnetic (EM) field radiation and power consumption, can be used to detect embedded watermarks. Techniques based on analysis of the EM field [29], [30] offer a high degree of detectability by placing an EM sensor close to a device and characterizing the EM field with a spectrum analyzer. Nevertheless, this paper focuses on non-invasive power analysis techniques, in which watermark detection is achieved by placing a current sensor and measuring the power consumption of the device. Since the embedded power watermark causes a deterministic power overhead, threshold-based [16], [17] or statistical [18] detection techniques can be applied. In threshold-based techniques, the device must be held in a reset state during the power measurement. Moreover, due to the nature of the algorithm, deeply embedded watermark power signals cannot be detected. Therefore, statistical power analysis techniques, such as Correlation Power Analysis (CPA) [31], are used to detect deeply embedded watermark power signals using the dynamic or static current variations on the supply voltage rail. A power watermark consists of two circuits: a watermark generation circuit (WGC) and a watermark power pattern generator (WPPG) [16]–[18].
The architecture of the WGC depends on the watermark sequence. Nonetheless, it is kept relatively small: WGCs with 32 registers have been reported in [16]–[18]. The WPPG consists of shift registers and determines the power consumed by the watermark circuit; its size is closely related to the size of the system. In [16], 92 out of 1332 FPGA look-up tables (LUTs), i.e., 6.9% of the system area, were used to watermark a simple arithmetic coder core, with each LUT configured as a 16-bit shift register. Similarly, in [18], 16 LUTs were used to watermark the Advanced Encryption Standard (AES) cryptographic core. As can be seen, the majority of the area overhead in the current state-of-the-art power watermark architecture is caused by the large WPPG circuit. Although the WPPG area overhead can be minimized by reusing existing LUTs [16], [17], such an approach is specific to the FPGA architecture. Moreover, with this approach the device must be held in a reset state to perform watermark detection successfully, whereas this paper focuses on detecting an embedded watermark while the processor is active. Many techniques have also been demonstrated that allow detection of negligibly sized circuits, through integration of ring oscillator networks [32], power measurements of multiple supply pads [33] or the combination of numerous side-channel parameters [34]. Nevertheless, they require very fine-grained design knowledge, and destructive IC tests are often necessary. Therefore, CPA [18] remains the current state of the art for power watermark detection and is used in this paper.
III. WATERMARK ARCHITECTURE
A power watermark is a redundant circuit added to an existing IP block with the aim of superimposing a weak but deterministic signal on a supply voltage rail. Fig. 1(a) shows a typical embedded system with multiple IP blocks sourced from various IP vendors. The watermark is embedded in one of the IP blocks and consists of two circuits: a watermark generation circuit (WGC) and a watermark power pattern generator (WPPG) [18]. The WGC generates the watermark sequence (WMARK), which controls the WPPG load circuit. Hence, the WPPG consumes power in clock cycles where WMARK is '1'. Simulation results in Fig. 1(b) demonstrate the effect of an additional watermark circuit on the total device power (in relative terms). The watermark power signal (middle) is added to the power consumed by the
Fig. 1. (a) Architecture of a power watermark circuit. (b) Simulation results of the effect of a watermark power signal on device total power. [Panels in (b), amplitude vs. samples: Embedded System Power, Watermark Power, Total Device Power.]
embedded system (top) to generate the total device power (bottom). Since the watermark power signal has a much lower amplitude, it is deeply embedded in the overall device power signal. An analytical technique is therefore required to determine whether a watermark is present and how reliably it can be identified. Correlation Power Analysis (CPA) is such a technique and is used in this work as the fundamental power watermark detection technique.
IV. PEARSON CORRELATION COEFFICIENT ANALYSIS
CPA uses a statistical correlation technique to detect a deeply embedded power watermark signal. The Pearson correlation coefficient is computed as in Eq. (1). In this section, modifications to Eq. (1) are proposed to include the intrinsic parameters of watermark sequences.

CPA [31] uses information extracted from the measured power consumption of a device, recorded with an oscilloscope. The sampling frequency of the oscilloscope, $f_s$, is much greater than the system clock frequency, $f_{clk}$, i.e., $f_s \gg f_{clk}$. The power vector, $Y$, is found by averaging all the samples within a single clock cycle, as in [18]. The watermark model vector, $X$, represents a watermark sequence. As both vectors must be of equal length, the watermark sequence is repeated many times within $X$, so that $X$ consists of multiple periods of the watermark sequence. A single Pearson correlation coefficient, $\rho$, is then given by
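$$\rho = \frac{N\sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{\sqrt{N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}\,\sqrt{N\sum_{i=1}^{N} Y_i^2 - \left(\sum_{i=1}^{N} Y_i\right)^2}} \tag{1}$$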
where $X$ is a binary sequence and $Y$ consists of the sampled power signal. $N$ is the length of both vectors and spans only full periods of the watermark sequence, whose period is $M$ ($N \equiv 0 \pmod{M}$). Since the model and power vectors may be out of phase, $X$ is repeatedly rotated by a single clock cycle and the correlation is recomputed [18]. The number of rotations is $M$. Once all $M$ correlation values have been found, they can be represented by a spread spectrum graph (see Fig. 2) [18]. The watermark is only regarded as detected if a single significant correlation coefficient can be resolved, as demonstrated in Fig. 2.
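The detection flow just described (per-cycle averaging, tiling the sequence into $X$, and $M$ rotations evaluated with Eq. (1)) can be sketched as follows. This is a minimal standard-library Python illustration under stated assumptions, not the implementation of [18]: an integer number of oscilloscope samples per clock cycle (`samples_per_cycle`, i.e., $f_s/f_{clk}$) is assumed, and the function names and the synthetic trace are hypothetical.

```python
import math
import random


def per_cycle_power(samples, samples_per_cycle):
    """Average the oscilloscope samples inside each clock cycle to form Y."""
    n_cycles = len(samples) // samples_per_cycle
    return [
        sum(samples[c * samples_per_cycle:(c + 1) * samples_per_cycle]) / samples_per_cycle
        for c in range(n_cycles)
    ]


def pearson(x, y):
    """Pearson correlation coefficient, written term by term as in Eq. (1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    )


def spread_spectrum(sequence, y):
    """Correlate Y with all M cyclic rotations of the tiled watermark model."""
    m = len(sequence)
    n = len(y) - len(y) % m              # keep full periods only: N = 0 (mod M)
    y = y[:n]
    rhos = []
    for r in range(m):                   # one rotation = one clock cycle
        rotated = sequence[r:] + sequence[:r]
        x_rot = rotated * (n // m)       # X' holds N/M periods of the sequence
        rhos.append(pearson(x_rot, y))
    return rhos


if __name__ == "__main__":
    # Hypothetical trace: 7-bit Barker watermark (unit amplitude) buried in
    # Gaussian noise with sigma = 2, with 4 oscilloscope samples per clock cycle.
    random.seed(1)
    wm = [1, 1, 1, 0, 0, 1, 0]
    trace = []
    for c in range(7 * 500):
        level = wm[c % 7] + random.gauss(0.0, 2.0)
        trace.extend([level] * 4)
    rhos = spread_spectrum(wm, per_cycle_power(trace, 4))
    print([round(r, 3) for r in rhos])   # the largest value should be at rotation 0
```

Picking the rotation with the largest coefficient only locates the candidate peak; whether it counts as a detection still depends on a significance criterion such as the one discussed in Section VI-C.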
Fig. 2. Spread spectrum of correlation coefficients. [Plot: $\rho$ (about −0.01 to 0.015) vs. watermark model rotation (0 to 4000).]
The dynamic power of a canonical static CMOS gate is linearly proportional to the switching activity, $\alpha$ [35]. In digital power watermarking, the activity factor is intrinsic to a watermark sequence, since dynamic power is consumed in clock cycles when the watermark sequence is '1'. Therefore, Eq. (1) can be modified to incorporate the activity factor, $\alpha$, of a watermark sequence into the Pearson correlation. In Section VI, Table II, various watermark sequences are considered and it is shown that $\alpha$ differs between sequences; to compare the detection performance of potential watermark sequences, it is therefore crucial to consider $\alpha$. Since vectors $X$ and $Y$ can be out of phase, the watermark model $X$ is rotated $M$ times and the correlation computation is repeated [18]. In Eq. (1), vector $X$ is substituted with $X'$, which represents the rotated vector $X$; if both vectors are in phase, then $X' = X$. Additionally, vector $Y$ can be represented as $X + \beta$, where $X$ is the original watermark model vector and $\beta$ is the noise present in the system, such as global switching noise of digital IP blocks, environmental noise and measurement noise. Therefore, $\rho$ becomes
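$$\rho = \frac{N\sum_{i=1}^{N} X'_i (X_i+\beta_i) - \sum_{i=1}^{N} X'_i \sum_{i=1}^{N} (X_i+\beta_i)}{\sqrt{N\sum_{i=1}^{N} {X'_i}^2 - \left(\sum_{i=1}^{N} X'_i\right)^2}\,\sqrt{N\sum_{i=1}^{N} (X_i+\beta_i)^2 - \left(\sum_{i=1}^{N} (X_i+\beta_i)\right)^2}} \tag{2}$$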
Since both vectors $X$ and $X'$ represent a binary sequence, the sum of all terms in a vector ($\sum X_i$ and $\sum X'_i$) is the Hamming Weight, $H$, of the sequence. Moreover, as $X$ and $X'$ represent the same binary sequence, merely rotated, their Hamming Weight is the same. Furthermore, if $X'$ and $X$ are in phase, we have
$$\sum_{i=1}^{N} X'_i X_i = \sum_{i=1}^{N} X_i = H \tag{3}$$
However, since vectors $X'$ and $X$ can be out of phase, the overlapping factor, $\theta$, is introduced, such that
$$\sum_{i=1}^{N} X'_i X_i = \theta H \tag{4}$$
To illustrate this point, consider two watermark model vectors, where one is a cyclically rotated version of the other:
$$X:\ 1111000000\ 1111000000 \tag{5}$$
$$X':\ 0111100000\ 0111100000 \tag{6}$$
$$\sum_{i=1}^{N} X_i X'_i = 6 = \theta H, \qquad \theta = 6/8 = 0.75 \tag{7}$$
The overlapping factor, $\theta$, is 1 when both vectors are in phase and $\theta < 1$ for all other rotations of vector $X'$. Table II in Section VI lists $\theta_{MAX}$, the highest overlapping factor, $\theta$, under the assumption that $X$ and $X'$ are not in phase. As shown in Table II, $\theta_{MAX}$ varies significantly between sequences.
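The overlapping factor of Eqs. (4) to (7), and the $\theta_{MAX}$ used in Table II, can be computed directly from a binary sequence. The helper names below are hypothetical; $\theta_{MAX}$ is taken as the maximum over the $M-1$ out-of-phase cyclic rotations of one period, which is how we read the definition above.

```python
def overlap_factor(x, x_rot):
    """theta from Eq. (4): sum(X'_i * X_i) / H for equal-length 0/1 lists."""
    h = sum(x)                                   # Hamming Weight H of X (= that of X')
    return sum(a * b for a, b in zip(x, x_rot)) / h


def theta_max(period):
    """Largest theta over all out-of-phase cyclic rotations of one period."""
    m = len(period)
    return max(overlap_factor(period, period[r:] + period[:r]) for r in range(1, m))


x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] * 2           # X  of Eq. (5)
x_rot = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0] * 2       # X' of Eq. (6)
print(overlap_factor(x, x_rot))                  # 0.75, as in Eq. (7)
print(theta_max([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])) # 0.75 for this example sequence
```

Computing $\theta$ over one period gives the same ratio as over all $N/M$ periods, since both the overlap and $H$ scale by the number of periods.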
Furthermore, the Hamming Weight, $H$, can be substituted by the product of the activity factor, $\alpha$, and the vector length, $N$, since $\alpha = H/N$. Finally, the Pearson correlation coefficient, $\rho$, can be described as a function of the activity factor, the overlapping factor, and the vectors $X$ and $X'$ as follows:
$$\rho(\alpha,\theta,X_i,X'_i) = \frac{N\alpha(\theta-\alpha) + \sum_{i=1}^{N} X'_i\beta_i - \alpha\sum_{i=1}^{N}\beta_i}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N^2\alpha(1-\alpha) + N\left(2\sum_{i=1}^{N} X_i\beta_i + \sum_{i=1}^{N}\beta_i^2\right) - \sum_{i=1}^{N}\beta_i\left(2\alpha N + \sum_{i=1}^{N}\beta_i\right)}} \tag{8}$$
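The terms $\sum_{i=1}^{N} X'_i\beta_i$ and $\sum_{i=1}^{N} X_i\beta_i$ in Eq. (8) depend on the positions of the '1's in the watermark sequence. However, since $\beta$ is zero-mean noise (see below) and $N \gg 1$, the terms containing unsquared sums of $\beta_i$ grow more slowly with $N$ than the remaining terms, and Eq. (8) can be simplified to
$$\rho(\alpha,\theta) = \frac{N\alpha(\theta-\alpha)}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N^2\alpha(1-\alpha) + N\sum_{i=1}^{N}\beta_i^2}}, \quad N \gg 1 \tag{9}$$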
By definition, a single correlation peak must be distinguishable in a spread spectrum for a watermark to be considered detected [31]. The maximum correlation coefficient, $\rho_{PEAK}$, is
$$\rho_{PEAK} = \rho(\alpha,1) = \frac{N\alpha(1-\alpha)}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N^2\alpha(1-\alpha) + N\sum_{i=1}^{N}\beta_i^2}}, \quad N \gg 1 \tag{10}$$

Fig. 3. The influence of the activity factor, $\alpha$, and the maximum overlapping factor, $\theta_{MAX}$, on (a) maximum correlation coefficient, $\rho_{PEAK}$, and (b) correlation coefficient difference, $\rho_{DIFF}$; MATLAB simulations.
As expected (Fig. 2), the highest $\rho_{PEAK}$ occurs when vectors $X$ and $X'$ are in phase, i.e., $\theta = 1$. In a noiseless environment, $\rho_{PEAK}$ is 1 for all sequences, since
$$\rho_{PEAK} = \rho(\alpha,1) = \frac{N\alpha(1-\alpha)}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N^2\alpha(1-\alpha)}} = 1 \tag{11}$$
Fig. 3(a) shows a MATLAB simulation of Eq. (10) for various watermark sequences and noise levels. The noise vector, $\beta$, consists of normally distributed random values; its frequency spectrum, shown in Section VI, Fig. 6(a), approximates white noise, representing the global switching noise of digital IP blocks, environmental noise and measurement noise [36]. Since $\beta$ has zero mean, its power equals its variance, $\sigma^2$. To increase the power of $\beta$ (Fig. 3(a)), the standard deviation, $\sigma$, of the generated random values is increased. From Eq. (10), $\rho_{PEAK}$ is principally influenced by the activity factor, $\alpha$. Due to the parabolic shape of the graph, watermark sequences with $\alpha \approx 50\%$ produce the highest $\rho_{PEAK}$. As the noise increases, the term $N\sum_{i=1}^{N}\beta_i^2$ becomes dominant and the graph flattens: $\rho_{PEAK}$ decreases and all watermark sequences produce similar results.
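If $\beta$ is zero-mean with variance $\sigma^2$, then $\sum_{i=1}^{N}\beta_i^2 \approx N\sigma^2$ (its expected value), and Eq. (10) reduces to $\rho_{PEAK} \approx \sqrt{\alpha(1-\alpha)/(\alpha(1-\alpha)+\sigma^2)}$, independent of $N$. The sketch below evaluates this closed form to show the parabolic dependence on $\alpha$ and the flattening with noise. Expressing noise power as $10\log_{10}\sigma^2$ dB relative to a unit watermark amplitude is our assumption, not a convention stated in the paper.

```python
import math


def rho_peak(alpha, sigma2):
    """Eq. (10) with sum(beta_i^2) replaced by N * sigma2 (zero-mean noise, N >> 1).

    N cancels, leaving sqrt(a / (a + sigma2)) with a = alpha * (1 - alpha).
    """
    a = alpha * (1.0 - alpha)
    return math.sqrt(a / (a + sigma2))


def db_to_variance(noise_db):
    """Assumed convention: noise power [dB] = 10*log10(sigma^2), unit watermark amplitude."""
    return 10.0 ** (noise_db / 10.0)


for noise_db in (0.0, 6.0, 20.0, 40.0):
    s2 = db_to_variance(noise_db)
    print(noise_db, [round(rho_peak(al, s2), 4) for al in (0.2, 0.35, 0.5, 0.65, 0.8)])
```

Under these assumptions, $\alpha = 0.5$ at 6 dB gives $\rho_{PEAK} \approx 0.24$, of the same order as the peak of about 0.25 reported for Fig. 7(a); the agreement is only indicative, given the assumed dB convention.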
Fig. 4. (a) Block diagram of the 12-bit LFSR. (b) States of the LFSR in consecutive clock cycles.
In practice, power supply noise and measurement error give rise to undesirable spurious correlation coefficients. If these spurious values are regarded as the system noise floor, the correlation coefficient difference, $\rho_{DIFF}$, can be defined as the distance from $\rho_{PEAK}$ to the noise floor, as in Eq. (12).
$$\rho_{DIFF} = \rho_{PEAK} - \rho(\alpha,\theta,X_i,X'_i) = \frac{N\alpha(1-\theta)}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N^2\alpha(1-\alpha) + N\sum_{i=1}^{N}\beta_i^2}}, \quad \theta < 1,\ N \gg 1 \tag{12}$$
The simulation of Eq. (12) for various watermark sequences is shown in Fig. 3(b). As expected, $\rho_{DIFF}$ is influenced by both $\alpha$ and $\theta_{MAX}$. As the noise standard deviation, $\sigma$, increases, $\rho_{DIFF}$ approaches 0. This means that there is no distinctive correlation peak in the spread spectrum graph, and the watermark cannot be found.
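With the same substitution $\sum_{i=1}^{N}\beta_i^2 \approx N\sigma^2$, Eq. (12) reduces to $\rho_{DIFF} \approx \alpha(1-\theta_{MAX})/(\sqrt{\alpha(1-\alpha)}\sqrt{\alpha(1-\alpha)+\sigma^2})$. The sketch below ranks the Table II sequences by this expression. It captures only the $\alpha$ and $\theta_{MAX}$ dependence of Eq. (12), not the additional noise-floor effects of long sequences discussed in Section VI-B; the dB convention is again an assumption, and $\alpha$ is written as the exact fraction $H/M$ behind each Table II percentage.

```python
import math

# (alpha, theta_MAX) for the sequences of Table II; alpha = H / M as exact fractions.
TABLE_II = {
    "2-bit Barker": (1 / 2, 0.0),
    "3-bit Barker": (2 / 3, 0.5),
    "4-bit Barker": (3 / 4, 2 / 3),
    "5-bit Barker": (4 / 5, 0.75),
    "7-bit Barker": (4 / 7, 0.5),
    "11-bit Barker": (5 / 11, 0.4),
    "6-bit m-sequence": (32 / 63, 0.5),
    "8-bit m-sequence": (128 / 255, 0.5),
    "12-bit m-sequence": (2048 / 4095, 0.5),
}


def rho_diff(alpha, theta_max, sigma2):
    """Eq. (12) with sum(beta_i^2) ~= N * sigma2; N cancels as in Eq. (10)."""
    a = alpha * (1.0 - alpha)
    return alpha * (1.0 - theta_max) / (math.sqrt(a) * math.sqrt(a + sigma2))


sigma2 = 10.0 ** (6.0 / 10.0)        # 6 dB, assuming dB = 10*log10(sigma^2)
ranked = sorted(TABLE_II.items(), key=lambda kv: -rho_diff(*kv[1], sigma2))
for name, (alpha, theta) in ranked:
    print(f"{name:18s} rho_DIFF = {rho_diff(alpha, theta, sigma2):.3f}")
```

For the 11-bit Barker code this gives $\rho_{DIFF} \approx 0.27$ at 6 dB, above its Eq. (10) value of $\rho_{PEAK} \approx 0.24$, because $\theta_{MAX} < \alpha$ makes the off-peak coefficients negative.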
In this section, the intrinsic parameters of sequences, namely the activity factor, $\alpha$, and the overlapping factor, $\theta$, have been introduced, and their impact on correlation coefficient values has been demonstrated. Eq. (10) and Eq. (12) are significant because they allow embedded power watermarks to be designed with low overheads. The following sections present sequences for power watermarks, discuss the differences between $\rho_{DIFF}$ and $\rho_{PEAK}$ with supporting simulation results, and establish the detection performance of each watermark sequence.
V. SEQUENCES FOR POWER WATERMARKS
In this section, two types of binary sequences are discussed: sequences generated with a Linear Feedback Shift Register (LFSR), as used in the current state-of-the-art power watermark architecture [16]–[18], and Barker codes. These sequences have been chosen to demonstrate the impact of various intrinsic parameters and lengths on detection performance, hardware implementation costs and robustness against third-party IP attacks, which are analyzed in the following sections.
A. Linear Feedback Shift Register
The binary sequence generated with the LFSR is also known as the maximum length sequence (m-sequence). The block diagram of the 12-bit LFSR is shown in Fig. 4(a); it can be described by the following polynomial [37]:
$$1x^{12}+1x^{11}+1x^{10}+0x^{9}+0x^{8}+0x^{7}+0x^{6}+0x^{5}+1x^{4}+0x^{3}+0x^{2}+0x \tag{13}$$
The degree of the polynomial equals the length of the LFSR, and its coefficients are '0' or '1', with '1' corresponding to the taps of registers connected to XOR gates (shown in green in Fig. 4(b)), which form a feedback path. The last register in the LFSR is used as the output and generates the m-sequence (shown in red in Fig. 4(b)). For $M$ registers, the length of the m-sequence is $2^M - 1$.
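A behavioural sketch of the 12-bit Fibonacci LFSR of Eq. (13) is given below in Python. The bit-ordering convention (the register at tap $t$ is read from bit $n-t$ of the state, and bit 0 is the output register) and the seed are our assumptions. The printout checks that the register cycles through $2^{12}-1$ states and that one period contains 2048 ones, i.e., the $\alpha \approx 50.01\%$ listed in Table II.

```python
def lfsr_step(state, nbits, taps):
    """Clock a Fibonacci LFSR once.

    `taps` are the exponents whose coefficient is '1' in Eq. (13); the register
    at tap t is read from bit (nbits - t) of `state`, and bit 0 is the output.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (nbits - t)) & 1   # XOR of the tapped registers
    return (state >> 1) | (feedback << (nbits - 1))


def m_sequence(nbits, taps, seed=1):
    """One full period (2**nbits - 1 bits) of the output register."""
    out, state = [], seed
    for _ in range((1 << nbits) - 1):
        out.append(state & 1)
        state = lfsr_step(state, nbits, taps)
    return out


def period(nbits, taps, seed=1):
    """Clock cycles until the register returns to its (non-zero) seed state."""
    state, steps = lfsr_step(seed, nbits, taps), 1
    while state != seed:
        state, steps = lfsr_step(state, nbits, taps), steps + 1
    return steps


TAPS_12 = (12, 11, 10, 4)        # x^12 + x^11 + x^10 + x^4 from Eq. (13)
seq = m_sequence(12, TAPS_12)
print(period(12, TAPS_12), len(seq), sum(seq), round(100 * sum(seq) / len(seq), 2))
```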
B. Barker Codes
The m-sequence is a unipolar sequence represented by '1' and '0', whereas Barker codes are bipolar sequences represented by '1' and '−1'. Therefore, Barker codes must be transformed into their unipolar representation by substituting every '−1' with '0'. The commonly used Barker codes [38] are shown in Table I and are used throughout this paper.
TABLE I
BARKER CODES
Length | Bipolar Sequence                   | Unipolar Sequence
2      | +1 -1                              | 1 0
3      | +1 +1 -1                           | 1 1 0
4      | +1 +1 -1 +1                        | 1 1 0 1
5      | +1 +1 +1 -1 +1                     | 1 1 1 0 1
7      | +1 +1 +1 -1 -1 +1 -1               | 1 1 1 0 0 1 0
11     | +1 +1 +1 -1 -1 -1 +1 -1 -1 +1 -1   | 1 1 1 0 0 0 1 0 0 1 0
Barker codes can be generated with simple circular shift registers. Since no feedback network like that of the LFSR is required, an $M$-bit Barker code requires $M$ registers. For comparison, a 12-bit LFSR generates an m-sequence with a length of 4,095 clock cycles using only 12 registers, whereas the 11-bit Barker code generates a sequence with a length of 11 clock cycles and requires 11 registers. Therefore, the m-sequence architecture allows much longer sequences to be generated with fewer registers. This has a direct impact on the security of embedded power watermarks, as discussed in Section XI.
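The circular shift register can be modelled behaviourally as below (Python, not RTL; the class name is hypothetical). The registers are preloaded with the unipolar code of Table I, and each clock the output bit is rotated back into the register chain without any XOR logic, so the $M$-bit code repeats every $M$ clock cycles.

```python
class CircularShiftRegister:
    """M registers preloaded with a unipolar Barker code (Table I).

    Each clock the first register drives WMARK and its bit is rotated back into
    the last register, so the code repeats every M cycles without XOR logic.
    """

    def __init__(self, bipolar_code):
        # Bipolar-to-unipolar mapping from Section V-B: +1 -> 1, -1 -> 0
        self.regs = [1 if b > 0 else 0 for b in bipolar_code]

    def clock(self):
        out = self.regs[0]
        self.regs = self.regs[1:] + [out]     # rotate by one register
        return out


BARKER_11 = [+1, +1, +1, -1, -1, -1, +1, -1, -1, +1, -1]
csr = CircularShiftRegister(BARKER_11)
wmark = [csr.clock() for _ in range(3 * len(BARKER_11))]   # three periods
print(len(csr.regs), wmark[:11])
```

The register count equals the code length, which is the cost contrasted above with the 12 registers of the 4,095-cycle LFSR.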
VI. SIMULATION RESULTS
The sequences discussed in Section V and their parameters are summarized in Table II. As can be seen, $\alpha$ and $\theta_{MAX}$ vary across the Barker codes. For m-sequences, however, $\alpha$ decreases towards 50% and $\theta_{MAX}$ remains constant at 0.5 as the length of the sequence increases.
Fig. 5. Maximum correlation coefficient, $\rho_{PEAK}$ (a, b), and correlation coefficient difference, $\rho_{DIFF}$ (c, d), simulation results of watermark sequences; MATLAB simulations.
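TABLE II
PARAMETERS OF WATERMARK SEQUENCES
Watermark Sequence | Bit Length | Period (clock cycles) | Activity Factor $\alpha$ [%] | Maximum Overlapping Factor $\theta_{MAX}$
Barker             | 2-bit      | 2                     | 50                           | 0
Barker             | 3-bit      | 3                     | 66.67                        | 0.5
Barker             | 4-bit      | 4                     | 75                           | 0.667
Barker             | 5-bit      | 5                     | 80                           | 0.75
Barker             | 7-bit      | 7                     | 57.14                        | 0.5
Barker             | 11-bit     | 11                    | 45.45                        | 0.4
m-sequence         | 6-bit      | 63                    | 50.8                         | 0.5
m-sequence         | 8-bit      | 255                   | 50.2                         | 0.5
m-sequence         | 12-bit     | 4095                  | 50.01                        | 0.5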
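The $\alpha$ and $\theta_{MAX}$ columns of Table II can be recomputed from the unipolar codes in Table I. For the m-sequences, the sketch uses the standard balance property (an $n$-bit m-sequence contains $2^{n-1}$ ones per period) rather than generating them; all helper names are illustrative.

```python
# Unipolar Barker codes from Table I
BARKER = {
    2: [1, 0],
    3: [1, 1, 0],
    4: [1, 1, 0, 1],
    5: [1, 1, 1, 0, 1],
    7: [1, 1, 1, 0, 0, 1, 0],
    11: [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}


def activity(bits):
    """alpha = H / M for one period."""
    return sum(bits) / len(bits)


def theta_max(bits):
    """Largest overlap sum(X'_i X_i) / H over the M - 1 out-of-phase rotations."""
    h, m = sum(bits), len(bits)
    return max(
        sum(a * b for a, b in zip(bits, bits[r:] + bits[:r])) / h for r in range(1, m)
    )


def m_sequence_activity(nbits):
    """Balance property: an n-bit m-sequence has 2**(n-1) ones in 2**n - 1 cycles."""
    return 2 ** (nbits - 1) / (2 ** nbits - 1)


for m, bits in BARKER.items():
    print(f"{m}-bit Barker: alpha = {100 * activity(bits):.2f}%, "
          f"theta_MAX = {theta_max(bits):.3f}")
for n in (6, 8, 12):
    print(f"{n}-bit m-sequence: alpha = {100 * m_sequence_activity(n):.2f}%")
```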
A. Maximum Correlation Coefficient
The maximum correlation coefficient, $\rho_{PEAK}$, of the watermark sequences at various noise levels is shown in Fig. 5(a) and Fig. 5(b). For noise signals with relatively low power (Fig. 5(a)), watermark sequences with $\alpha \approx 50\%$ produce the highest $\rho_{PEAK}$, as expected from the results in Section IV, Fig. 3(a). As the system noise level increases, the watermark signal-to-noise ratio (SNR) decreases, until the watermark power signal is too low to be reliably detected. In this limiting case, $\rho_{PEAK}$ is dictated by how strongly the watermark sequence correlates with the noise present in the system, so the results can be regarded as the noise-to-sequence correlation, $\rho_{NOISE}$. Fig. 5(b) shows that when the noise power approaches 38 dB, $\rho_{PEAK}$ of the m-sequences is higher than that of the other sequences. As the noise level is increased further, the length of a sequence determines the noise-to-sequence correlation, with longer sequences producing higher $\rho_{PEAK}$.
To understand this behaviour, consider $\rho$ in Eq. (1) when no watermark is present. Vector $Y$, which originally represents the measured power signal containing the watermark model $X$ and the system noise $\beta$, is replaced with $\beta$, since no watermark exists. Following the Hamming Weight substitution of Section IV, $\rho_{NOISE}$ can be represented as
$$\rho_{NOISE} = \frac{\sum_{i=1}^{N} X'_i\beta_i - \alpha\sum_{i=1}^{N}\beta_i}{\sqrt{\alpha(1-\alpha)}\,\sqrt{N\sum_{i=1}^{N}\beta_i^2 - \left(\sum_{i=1}^{N}\beta_i\right)^2}} \tag{14}$$
We simulated Eq. (14) with sequences of various lengths and activity factors and found that $\alpha$ had no effect on $\rho_{NOISE}$ at very low SNR. This explains the diminishing effect of $\alpha$ on the correlation coefficients as the noise power is increased.
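Eq. (14) is algebraically the Pearson coefficient of Eq. (1) with $Y = \beta$ and the Hamming Weight substitution $\sum X'_i = \sum {X'_i}^2 = \alpha N$. The sketch below checks that identity numerically and shows that $\rho_{NOISE}$ stays small (of order $1/\sqrt{N}$) for two sequences with different $\alpha$. The noise samples and sequence choices are ours; this is not a reproduction of the paper's MATLAB simulation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient as in Eq. (1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = (math.sqrt(n * sum(a * a for a in x) - sx * sx)
           * math.sqrt(n * sum(b * b for b in y) - sy * sy))
    return num / den


def rho_noise(x_rot, beta):
    """Eq. (14): correlation of a binary model X' with a noise-only power vector."""
    n = len(beta)
    alpha = sum(x_rot) / n                      # alpha = H / N
    sb = sum(beta)
    num = sum(a * b for a, b in zip(x_rot, beta)) - alpha * sb
    den = math.sqrt(alpha * (1.0 - alpha)) * math.sqrt(n * sum(b * b for b in beta) - sb * sb)
    return num / den


random.seed(7)
barker7 = [1, 1, 1, 0, 0, 1, 0] * 600           # alpha = 4/7
barker2 = [1, 0] * 2100                         # alpha = 1/2
beta = [random.gauss(0.0, 1.0) for _ in range(4200)]
for x in (barker7, barker2):
    print(round(rho_noise(x, beta), 4), round(pearson(x, beta), 4))
```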
The period of a watermark sequence ($M$) determines the number of frequency components in the watermark model. Short sequences contain only a few frequency components (Fig. 6(b)). As the length of a sequence increases, more frequency components appear in the frequency spectrum of the watermark model (Fig. 6(c)). Fig. 6(a) shows the frequency spectrum of the noise signal obtained from simulations. If the convolution of the watermark model and the noise signal is considered (Fig. 6(d) and Fig. 6(e)), the overlapping area between the two signals increases with the length of the sequence. Therefore, more of the information contained within the noise signal is retained (Fig. 6(f) and Fig. 6(g)). At the same time, the correlation between the two signals increases, which causes $\rho_{PEAK}$ to be higher for longer m-sequences.
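The link between period and spectral content can be checked with a naive DFT: a period-$M$ sequence tiled $k$ times can only have non-zero bins at multiples of $k$, so it has at most $M$ frequency components. The standard-library sketch below counts the non-negligible bins for the 2-bit Barker code and a 6-bit m-sequence, the two sequences of Fig. 6(b) and (c). The 6-bit polynomial $x^6 + x^5 + 1$ and the LFSR bit ordering are our assumptions, since the paper does not state the polynomial behind its 6-bit m-sequence.

```python
import cmath


def lfsr_step(state, nbits, taps):
    """Fibonacci LFSR step (tap t read from bit nbits - t; bit 0 is the output)."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (nbits - t)) & 1
    return (state >> 1) | (feedback << (nbits - 1))


def m_sequence(nbits, taps, seed=1):
    """One period of the LFSR output register."""
    out, state = [], seed
    for _ in range((1 << nbits) - 1):
        out.append(state & 1)
        state = lfsr_step(state, nbits, taps)
    return out


def count_components(period, repeats=4, tol=1e-6):
    """Count DFT bins with non-negligible magnitude for `period` tiled `repeats` times.

    Uses a naive O(N^2) DFT; only bins that are multiples of `repeats` can be
    non-zero, so the count is at most len(period).
    """
    x = period * repeats
    n = len(x)
    count = 0
    for k in range(n):
        mag = abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x)))
        if mag > tol * n:
            count += 1
    return count


barker2 = [1, 0]
mseq6 = m_sequence(6, (6, 5))      # assumed 6-bit polynomial x^6 + x^5 + 1
print(count_components(barker2), count_components(mseq6))
```

The 2-bit Barker code has only a DC and a Nyquist component, while every one of the 63 possible bins of the m-sequence is occupied, which is the qualitative contrast between Fig. 6(b) and Fig. 6(c).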
B. Correlation Coefficient Difference
Equation (12) demonstrates that $\rho_{DIFF}$ is determined by the $\alpha$ and $\theta$ ($\theta_{MAX}$) parameters. It should be noted that the highest $\rho_{DIFF}$ occurs for watermark sequences with the highest $\alpha$ and lowest $\theta_{MAX}$ (see Fig. 3(b)). In Fig. 5(c), the 2-bit Barker code produces the highest $\rho_{DIFF}$, since $\alpha = 50\%$ and $\theta_{MAX} = 0$. Maximum length sequences (m-sequences) produce much lower $\rho_{DIFF}$ than most of the Barker codes, since $\alpha$ decreases towards 50% as $M$ increases, while $\theta_{MAX}$ remains at 0.5. The difference between successive m-sequences is minimal due to the same $\theta_{MAX}$ and similar $\alpha$. However, since longer watermark sequences contain more frequency components that correspond with the noise signal, there are multiple correlation coefficients with values close to $\rho_{PEAK}$. This causes $\rho_{DIFF}$ to be much lower as the watermark sequence length increases (see Fig. 5(d)).

In Fig. 5, the relationship between $\rho_{DIFF}$ and $\rho_{PEAK}$ varies with the power of the generated noise signal and with the watermark sequence. Fig. 7(a) and Fig. 7(b) show the 12-bit m-sequence and the 11-bit Barker code for a 6 dB noise signal. For the 12-bit m-sequence (Fig. 7(a)), $\rho_{DIFF}$ and $\rho_{PEAK}$ have similar values (0.25), since the noise floor in the spread spectrum is close to 0. However, for the 11-bit Barker code (Fig. 7(b)), $\rho_{DIFF}$ (0.27) is higher than $\rho_{PEAK}$ (0.25), because its off-peak correlation coefficients are negative ($\theta_{MAX} = 0.4 < \alpha$, so $\theta - \alpha < 0$ in Eq. (9)). Increasing the noise power to 40 dB (Fig. 7(c) and Fig. 7(d)) causes $\rho_{DIFF}$ to be much lower than $\rho_{PEAK}$, since the noise floor rises towards $\rho_{PEAK}$. In Fig. 7(c), the noise floor is of similar amplitude to $\rho_{PEAK}$ in Fig. 7(d). This is as expected, since $\rho_{PEAK}$ is higher for longer sequences at very low SNR (Fig. 5(b)). However, these are separate test cases and, as shown in Section VI-C, the threshold levels differ between sequences. Therefore, as shown in Fig. 7(d), the 11-bit Barker code is clearly detectable at 40 dB, whereas in Fig. 7(c) the noise floor in the spread spectrum is too high for the 12-bit m-sequence to be detected.

Fig. 6. Frequency spectra of (a) noise $\beta$, (b) 2-bit Barker code, (c) 6-bit m-sequence. Convolutions of (d) 2-bit Barker code, (e) 6-bit m-sequence, with noise $\beta$. Frequency spectra of convolved (f) 2-bit Barker code, (g) 6-bit m-sequence, with noise $\beta$. [Spectra: amplitude vs. frequency, 0 to 5 MHz; convolutions: vs. clock cycles (×10^5).]

Fig. 7. Spread spectra of (a, c) 12-bit m-sequence and (b, d) 11-bit Barker code for noise with power of 6 dB and 40 dB, respectively. [Plots: $\rho$ vs. watermark model rotation.]
C. Null Hypothesis Significance Test
In Section VI-A and Section VI-B, the influence of watermark sequence length on noise-to-sequence correlations was demonstrated. In Fig. 5(b), $\rho_{PEAK}$ is higher for longer m-sequences as the noise level increases. However, the $\rho_{DIFF}$ results in Fig. 5(d) show that, at high background noise levels, other correlation coefficients make the spread spectrum more even and no significant peak can be distinguished. To compare the detection performance of the watermark sequences, a Null Hypothesis Significance Test (NHST) [36] was performed for each sequence, where the null hypothesis states that the watermark does not exist. The percentage of rejected null hypotheses was found by applying a 5% threshold to the results. $\rho_{DIFF}$ was chosen to describe the detection performance of a watermark sequence, since it considers multiple correlation values in a spread spectrum. If $\rho_{DIFF}$ is above the threshold, the null hypothesis can be rejected with a 5% probability of false alarm, i.e., a 5% probability of detecting a watermark that does not exist. To reduce the probability of false alarms, lower percentages, and hence higher threshold levels, can be used. To determine the threshold for each watermark sequence separately, the simulation was repeated 100 times with no watermark present in the system. The null distribution of $\rho_{DIFF}$ was plotted, and the 5% threshold level was determined from it [36]. The process was repeated 10 times and the average threshold level was found
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 8
32 36 38 40 42 43 44 45 460
20
40
60
80
100
σ2 [dB]
N
ul
l H
yp
ot
he
sis
 R
eje
cti
on
 [%
]
 
 
2−bit Barker code
3−bit Barker code
4−bit Barker code
5−bit Barker code
7−bit Barker code
11−bit Barker code
6−bit m−sequence
8−bit m−sequence
12−bit m−sequence
Fig. 8. NHST of simulated watermark sequences with 5% threshold level;
MATLAB simulations.
for each watermark sequence. Next, the watermark was added
and the simulations were repeated in the same way. The
threshold levels were applied to each sequence, to determine
the detection performance, as in Fig. 8. The difference between
Barker codes is clearly distinguishable, however it is not as
strong as in Fig. 5(d). This demonstrates the higher noise-
to-sequence correlations of shorter sequences. Nevertheless,
as the length of sequences increases, the null hypothesis
rejection ratio decreases most quickly for longer m-sequences.
When noise approaches 46푑퐵, most sequences reach the 5%
threshold.
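The threshold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 5% threshold is taken as the 95th percentile of each null run of $\rho_{DIFF}$ values, and the per-run thresholds are averaged over the repetitions; [36] may prescribe a different estimator.

```python
import numpy as np


def nhst_threshold(null_runs, alpha=0.05):
    """Average (1 - alpha) quantile of rho_DIFF over repeated null runs.

    null_runs: 2-D array with one row per repetition (e.g. 10 x 100 values)
    of rho_DIFF simulated with no watermark present.
    """
    null_runs = np.atleast_2d(np.asarray(null_runs, dtype=float))
    per_run = np.quantile(null_runs, 1.0 - alpha, axis=1)
    return float(per_run.mean())


def rejection_ratio(scores, threshold):
    """Fraction of trials whose rho_DIFF exceeds the threshold, i.e. trials
    in which the null hypothesis 'no watermark present' is rejected."""
    return float(np.mean(np.asarray(scores, dtype=float) > threshold))


# Toy usage with synthetic scores (not data from the paper).
rng = np.random.default_rng(0)
null_runs = rng.normal(0.0, 0.01, size=(10, 100))   # no watermark
watermarked = rng.normal(0.05, 0.01, size=100)      # watermark present
thr = nhst_threshold(null_runs)
print(f"threshold = {thr:.4f}, rejection ratio = {rejection_ratio(watermarked, thr):.0%}")
```

Lowering `alpha` raises the threshold, which trades a lower false-alarm probability for a lower rejection ratio, as noted in the text.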
VII. DESIGN APPROACH
To verify the theory and simulation results of Section IV
and Section VI, the watermark circuit (see Fig. 1(a)) was
embedded in an ARM Cortex-M0 microcontroller IP core
implemented on a Xilinx Virtex-II Pro XC2VP30 FPGA, along
with an on-chip bus and both program and data memories. An
FPGA was used for illustration purposes to demonstrate the re-
lationship between the WPPG size and detection performance,
as in Section VIII-B.
The architecture of the watermark generation circuit (WGC) depends on the watermark sequence. LFSRs (Section V-A) were used for m-sequences, and simple circular shift registers were used for Barker codes (Section V-B). The output of the last register ('WMARK') serves as the clock enable signal for the watermark power pattern generator (WPPG). When enabled by the WMARK signal, the WPPG dissipates power by shifting data through its flip-flops. In the Xilinx FPGA, a single LUT can be configured as a 16-bit shift register (SRL16). To increase the SNR between the watermark power signal and the system noise signal, the number of SRL16 blocks is increased. To generate the maximum power in clock cycles when the watermark sequence is '1', each SRL16 block is pre-initialized with a '1010...' sequence. Table III shows the size of the watermark circuit implemented on the FPGA for the deterministic sequences discussed in Section VIII-A and for various WPPG sizes.
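A behavioural sketch of the two WGC variants follows: a Fibonacci LFSR for m-sequences and a circular shift register for Barker codes. The tap sets are standard maximal-length choices (e.g. $x^{12} + x^6 + x^4 + x + 1$) and are assumptions, since the polynomials used are not listed here; "M-bit m-sequence" is taken to mean the output of an M-bit LFSR, and $\alpha$ is assumed to be the fraction of '1' chips, i.e. the fraction of cycles in which WMARK enables the WPPG. All names are hypothetical.

```python
# Hypothetical tap sets for maximal-length LFSRs (standard primitive
# polynomials, e.g. x^12 + x^6 + x^4 + x + 1); the paper does not list them.
MLS_TAPS = {6: (6, 5), 8: (8, 6, 5, 4), 12: (12, 6, 4, 1)}

# Barker codes as bit strings ('1' = +1 chip, '0' = -1 chip).
BARKER = {
    2: "10", 3: "110", 4: "1101", 5: "11101",
    7: "1110010", 11: "11100010010",
}


def lfsr_msequence(n, taps, seed=1):
    """Return one period (2**n - 1 bits) of a Fibonacci LFSR output.

    Bit 0 of `state` is the oldest bit and is taken as WMARK.
    """
    state, mask = seed, (1 << n) - 1
    out = []
    for _ in range((1 << n) - 1):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> (n - t)) & 1
        state = ((state >> 1) | (feedback << (n - 1))) & mask
    return out


def circular_shift_register(code, cycles):
    """Rotate a preloaded register; reg[0] models the last register (WMARK)."""
    reg = [int(b) for b in code]
    out = []
    for _ in range(cycles):
        out.append(reg[0])
        reg = reg[1:] + reg[:1]
    return out


def activity_factor(bits):
    """Fraction of cycles with WMARK = '1' (assumed meaning of alpha)."""
    return sum(bits) / len(bits)


for n, taps in MLS_TAPS.items():
    a = activity_factor(lfsr_msequence(n, taps))
    print(f"{n}-bit m-sequence: period {2 ** n - 1}, alpha = {a:.3f}")
for length, code in BARKER.items():
    a = activity_factor(circular_shift_register(code, length))
    print(f"{length}-bit Barker code: alpha = {a:.3f}")
```

Under these assumptions, the m-sequence activity factor is $2^{M-1}/(2^M - 1)$, which approaches 50% from above as $M$ grows.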
Additionally, to support the experimental results and to investigate the impact of process variation (PV) on watermark sequence detection, two ASIC designs were fabricated in TSMC 65 nm low leakage CMOS technology, with a nominal operating voltage of 1.2 V. The designs were completed using industry-standard EDA tools. In the first design (chip I), the watermark circuit ('W') was embedded as a hard macro block on a separate power domain (Fig. 9(a)). This SoC consists of the ARM Cortex-M0 microcontroller IP core, along with an on-chip bus and numerous commercial IP blocks. In the second design (chip II), the watermark circuit was embedded from an RTL description (Fig. 9(b)). The watermark circuit was therefore propagated through the entire design flow, which is closer to the intended usage scenario of embedding watermarked soft IP. Chip II contains a dual-core ARM Cortex-A5 microprocessor IP core and caches; the SoC, shown as the unmarked circuitry, consists of the ARM Cortex-M0 with an on-chip bus, numerous commercial IP blocks and the watermark circuit. The watermark circuit architecture is the same in both chips (Fig. 10). To allow various watermark sequences to be generated, the watermark circuit contains two sequence generators, which can be configured either as 32-bit LFSRs or as simple 32-bit circular shift registers. The WPPG contains 1,024 registers, divided into 32 words. On each watermark sequence bit '1', all 32 words are rotated word-wise. Therefore, to generate the maximum power, the words are initialized alternately to 0xFFFFFFFF and 0x00000000. For details of the ASIC implementation of the watermark circuit, see [39].

Fig. 9. Layout and die photo of test chips. (a) chip I, (b) chip II.

TABLE III
AREA OF WATERMARK CIRCUIT IMPLEMENTED ON FPGA
Watermark Sequence | WPPG Registers (SRL16) | FPGA Slices | Area Overhead
ARM Cortex-M0 IP core | - | 2,696 | -
7-bit Barker code | - | 4 | -
7-bit Barker code | 8 | 12 | 0.45%
7-bit Barker code | 16 | 20 | 0.74%
7-bit Barker code | 32 | 36 | 1.34%
11-bit Barker code | - | 6 | -
11-bit Barker code | 8 | 14 | 0.52%
11-bit Barker code | 16 | 22 | 0.82%
11-bit Barker code | 32 | 38 | 1.41%
6-bit m-sequence | - | 5 | -
6-bit m-sequence | 8 | 13 | 0.48%
6-bit m-sequence | 16 | 21 | 0.78%
6-bit m-sequence | 32 | 37 | 1.37%
8-bit m-sequence | - | 5 | -
8-bit m-sequence | 8 | 13 | 0.48%
8-bit m-sequence | 16 | 21 | 0.78%
8-bit m-sequence | 32 | 37 | 1.37%
12-bit m-sequence | - | 8 | -
12-bit m-sequence | 8 | 16 | 0.59%
12-bit m-sequence | 16 | 24 | 0.89%
12-bit m-sequence | 32 | 40 | 1.48%
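The overhead column of Table III can be reproduced by dividing the watermark slices by the 2,696 slices of the Cortex-M0 core. The sketch below assumes this is how the percentages were obtained (it matches every row when rounded to two decimals). The observation that each SRL16 block adds exactly one slice to the WGC-only count is read off Table III, not a general FPGA rule, and the names are hypothetical.

```python
# Figures taken from Table III.
CORTEX_M0_SLICES = 2696
WGC_SLICES = {  # slices used by the watermark generation circuit alone
    "7-bit Barker code": 4,
    "11-bit Barker code": 6,
    "6-bit m-sequence": 5,
    "8-bit m-sequence": 5,
    "12-bit m-sequence": 8,
}


def watermark_slices(sequence, n_srl16):
    """In Table III each SRL16 block of the WPPG adds exactly one slice."""
    return WGC_SLICES[sequence] + n_srl16


def area_overhead_pct(slices, base=CORTEX_M0_SLICES):
    """Watermark slices relative to the Cortex-M0 core, in percent."""
    return 100.0 * slices / base


for seq in WGC_SLICES:
    row = [round(area_overhead_pct(watermark_slices(seq, k)), 2) for k in (8, 16, 32)]
    print(f"{seq:20s} {row}")
```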
Fig. 10. Schematic diagram of the watermark circuit embedded in test chips.
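The following behavioural sketch models the word-wise rotation of the 1,024-register WPPG described in Section VII, with hypothetical function names. It uses the number of toggling flip-flops per enabled cycle as a rough proxy for dynamic power; clock-gating details and glitch power are ignored. It shows why alternating 0xFFFFFFFF/0x00000000 words maximize switching: every rotation flips all 1,024 bits.

```python
def rotate_words(words):
    """One enabled cycle: every word moves to the next position."""
    return words[-1:] + words[:-1]


def toggled_bits(before, after):
    """Number of register bits that change value (dynamic power proxy)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(before, after))


def wppg_toggles(wmark_bits, n_words=32):
    """Toggles per cycle for a WPPG whose 32-bit words are initialized
    alternately to 0xFFFFFFFF and 0x00000000."""
    words = [0xFFFFFFFF if i % 2 == 0 else 0x00000000 for i in range(n_words)]
    toggles = []
    for bit in wmark_bits:
        if bit:  # WMARK = '1' enables the clock, so the words rotate
            new = rotate_words(words)
            toggles.append(toggled_bits(words, new))
            words = new
        else:    # clock gated: registers hold their values
            toggles.append(0)
    return toggles


# One period of the 7-bit Barker code driving WMARK.
print(wppg_toggles([1, 1, 1, 0, 0, 1, 0]))
```

By contrast, if all words held the same value, rotation would toggle nothing, so the initialization pattern matters as much as the register count.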
VIII. EXPERIMENTAL RESULTS
In the FPGA implementation, the core voltage was measured close to the package. In the ASIC implementation, the power domains were connected off-chip, and the total current consumed by the chip was measured using a 270 mΩ shunt resistor. The operating frequency of both the FPGA and the ASICs was 10 MHz, which was sufficient to demonstrate the effect of an embedded power watermark and to compare the detection performance of the watermarking sequences. The voltage and current signals were measured using an Agilent MSO6032A oscilloscope with an Agilent 1130A active differential probe, at a sampling frequency of 500 MHz. Hence, the 50 samples acquired in each clock cycle were averaged to obtain the power vector, $Y$. Both the $X$ and $Y$ vectors were approximately 300,000 clock cycles long. We attempted to detect the watermark while running the Dhrystone benchmark. This is one of the most common benchmarks used in industry to measure processor performance, and it reflects the activities of an integer processor IP core, such as integer arithmetic, string operations, logic decisions and memory accesses [40].
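The per-cycle averaging that produces $Y$ can be sketched as below. It assumes that the capture starts on a clock edge and that a trailing partial cycle is discarded; the function names are hypothetical. The shunt helper simply applies Ohm's law to the 270 mΩ resistor used for the ASIC current measurement.

```python
import numpy as np

F_SAMPLE = 500e6  # oscilloscope sampling rate [Hz]
F_CLK = 10e6      # FPGA/ASIC clock frequency [Hz]
SAMPLES_PER_CYCLE = int(round(F_SAMPLE / F_CLK))  # 50


def power_vector(samples, samples_per_cycle=SAMPLES_PER_CYCLE):
    """Average each group of consecutive samples into one value per clock
    cycle. Assumes the capture is aligned to a clock edge; a trailing
    partial cycle is dropped."""
    samples = np.asarray(samples, dtype=float)
    n_cycles = len(samples) // samples_per_cycle
    trimmed = samples[: n_cycles * samples_per_cycle]
    return trimmed.reshape(n_cycles, samples_per_cycle).mean(axis=1)


def shunt_current(v_shunt, r_shunt=0.270):
    """Chip current from the voltage across the shunt: I = V / R."""
    return np.asarray(v_shunt, dtype=float) / r_shunt


y = power_vector(np.repeat([1.0, 2.0, 3.0], SAMPLES_PER_CYCLE))
print(y)
```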
A. Repeatability of Results
Fig. 11 shows the FPGA measurements of $\rho_{PEAK}$. The results match the simulation results of Section VI (Fig. 5(b)). When the SNR between the watermark power signal and the system noise signal is high (128 to 16 SRL16 blocks), all watermark sequences generate similar $\rho_{PEAK}$. As the SNR decreases, the impact of $\alpha$ diminishes and the length of the watermark sequence becomes the major factor. Therefore, longer sequences produce higher $\rho_{PEAK}$.
Fig. 11. FPGA experimental results of $\rho_{PEAK}$. [Plot: $\rho_{PEAK}$ (0 to 0.07) versus the number of SRL16 blocks (128, 64, 32, 16, 8) for the 7- and 11-bit Barker codes and the 6-, 8- and 12-bit m-sequences.]

The simulation results in Section VI show a clear differentiation between short and long watermark sequences, and no significant variations were found when the simulations were repeated. However, the experimental FPGA results indicate that some watermark sequences are less repeatable than others: when the experiment is repeated multiple times, the distributions of $\rho_{PEAK}$ or $\rho_{DIFF}$ vary from test to test. In Fig. 12, this variance is shown as box plots for various sizes of the WPPG circuit. Each box represents the combined distributions of $\rho_{DIFF}$ obtained from multiple tests. As in Section VI-C, $\rho_{DIFF}$ is used since it considers the other correlation coefficients in the spread spectrum; the same variance is also observed for $\rho_{PEAK}$. The test was repeated 3 times for each watermark sequence, and a 100-point distribution was obtained for each test. The FPGA was re-configured between tests, and the delay between the start of the program and the start of the watermark circuit was modified. This causes the noise characteristics to vary between consecutive tests and to correlate differently with some sequences. As can be seen in Fig. 12, short watermark sequences show much higher variance for most WPPG circuit sizes. Additionally, in Fig. 12(d), the medians of the very short watermark sequences do not match the simulation results of Section VI (Fig. 5): according to the simulations, shorter watermark sequences produce higher $\rho_{DIFF}$, which is not observed for the 2-, 3-, 4- and 5-bit Barker codes. As the period of a watermark sequence increases, the variance of the results becomes significantly lower; the results become deterministic and the expected behaviour can be predicted.

Fig. 12. Box plots of $\rho_{DIFF}$ at various sizes of WPPG circuit on FPGA: (a) 64 SRL16, (b) 32 SRL16, (c) 16 SRL16, (d) 8 SRL16. [Each panel: $\rho_{DIFF}$ (0 to 0.08) for the 2-, 3-, 4-, 5-, 7- and 11-bit Barker codes and the 6-, 8- and 12-bit m-sequences.]
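To illustrate how each box in Fig. 12 combines repeated tests, the sketch below pools several 100-point $\rho_{DIFF}$ distributions and reports the median and interquartile range. As a simple run-to-run repeatability indicator, it also reports the spread of the per-test medians. That indicator and the function name are illustrative choices, not metrics defined in the paper.

```python
import numpy as np


def box_summary(tests):
    """tests: list of 1-D arrays of rho_DIFF, one array per repeated test."""
    pooled = np.concatenate([np.asarray(t, dtype=float) for t in tests])
    q1, median, q3 = np.percentile(pooled, [25, 50, 75])
    per_test_medians = [float(np.median(t)) for t in tests]
    return {
        "median": float(median),
        "iqr": float(q3 - q1),  # box height in the box plot
        "median_spread": max(per_test_medians) - min(per_test_medians),
    }


# Toy usage: three perfectly repeatable tests versus two inconsistent ones.
print(box_summary([np.full(100, 0.05)] * 3))
print(box_summary([np.zeros(100), np.ones(100)]))
```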
The above process was repeated on both test chips. The number of test repetitions was increased to 5 to test the susceptibility of watermark sequences to run-to-run noise variations. Additionally, 30 chips were characterized and three corners were chosen: fast, slow, and typical. This allows us to investigate the impact of PV, which is introduced in the foundry during chip fabrication. It should be noted that the current consumption measurement included the system noise caused by the SoC and RAM on both chips, and by the clock tree of the dual-core Cortex-A5 on chip II.

Fig. 13. Box plots of $\rho_{DIFF}$ on test chips: (a) chip I, (b) chip II. [Each panel: $\rho_{DIFF}$ (0 to 0.08) for the 2-, 3-, 4-, 5-, 7- and 11-bit Barker codes and the 6-, 8- and 12-bit m-sequences, each at the SLOW, TYP and FAST corners.]
The $\rho_{DIFF}$ results obtained from both test chips are shown in Fig. 13. First, consider the impact of run-to-run variations (the size of the box plots) when the test is repeated multiple times and the chip is re-configured between tests. The results confirm the FPGA conclusions: short period sequences cause much higher variance than longer period sequences. Next, consider the impact of PV on $\rho_{DIFF}$, which is represented by the variation between the boxes for the same chip and the same sequence length. For example, in Fig. 13(b), the size of the boxes for the 2-bit Barker code varies between corners. Moreover, the medians of these boxes differ significantly between the slow and fast corners and the typical corner. It should be noted that the size of the box plots is similar for most sequences; however, the medians differ considerably for short period sequences on both test chips.
Experimental results demonstrate that short period sequences are not suitable for embedded power watermarking, owing to the high variance of their results and their strong sensitivity to PV. Their results are therefore non-deterministic, and the expected detection performance cannot be estimated.
B. Detectability
To determine the detection performance, the Null Hypothesis Significance Test [36] was performed on the FPGA measurements. The 5% threshold levels were found for each watermark sequence when no watermark signal was present. The thresholds were then applied to the results in Fig. 12, and the average null hypothesis rejection ratio was found (Fig. 14). We focused on the deterministic watermark sequences identified in Section VIII-A. The results in Fig. 14 match the simulation results of Section VI (Fig. 8): as the number of SRL16 blocks is reduced, longer period watermark sequences, such as the 12-bit m-sequence, approach the threshold level much faster than shorter sequences, such as the 7- and 11-bit Barker codes. Therefore, it is possible to reduce the area and power overheads with shorter watermark sequences by reducing the number of WPPG registers.

Fig. 14. Null Hypothesis Significance Test of watermark sequences implemented on FPGA, with 5% threshold level. [Plot: null hypothesis rejection (%) versus the number of SRL16 blocks (128, 64, 32, 16, 8) for the 7- and 11-bit Barker codes and the 6-, 8- and 12-bit m-sequences.]
C. Area and Power Overheads
Minimization of area and power overheads is a major requirement for all power watermarks implemented on embedded processors. In Section VI-C, various watermark sequences were simulated, and it was shown that shorter sequences produce a higher null hypothesis rejection ratio than longer sequences for the same noise power. The theory in Section IV and the simulation results of Section VI have been validated on FPGA and test chips. The FPGA results in Section VIII-B demonstrated that shorter period watermark sequences, such as the 7- and 11-bit Barker codes, achieve a null hypothesis rejection ratio close to 95% when the WPPG uses 8 SRL16 blocks. To achieve similar detectability with the 12-bit m-sequence, 32 SRL16 blocks must be used. Therefore, the shorter Barker codes enable an area overhead reduction of approximately 75% by reducing the number of WPPG registers. To estimate the power reduction, the watermark circuits were synthesized using a 65 nm¹ technology library. The fully placed and routed netlist of the watermark circuit embedded in chip I was simulated using Synopsys VCS, and a value change dump (VCD) file was created from the switching activity of the circuit. The power consumption was then estimated with Synopsys PrimeTime PX using this VCD file. The results are shown in Table IV. The size of the WPPG circuit was varied while keeping the 75% ratio between sequences. As the size of the WPPG is reduced, the 7-bit Barker code enables greater area and static power minimization than the 11-bit Barker code, since the 7-bit Barker code requires 7 registers while the 11-bit Barker code requires 11 registers. However, since the activity factor, $\alpha$, of the 11-bit Barker code is lower by 12% (Table II), it consumes less total power for all WPPG sizes. Furthermore, a total power reduction of approximately 73% or more is achieved when short watermark sequences, such as the 7- or 11-bit Barker codes, are used instead of longer m-sequences, owing to the lower implementation requirements of the WPPG circuit. This follows from Eq. (10) and Eq. (12) in Section IV.

¹ TSMC 65 nm low leakage technology library.

TABLE IV
AREA AND POWER REDUCTION IN ASIC (65 nm)
Watermark Sequence | WPPG Registers | Area Reduction | $P_{DYN}$ Reduction | $P_{STATIC}$ Reduction | $P_{TOTAL}$ Reduction
7-bit Barker code | 512 | 74.9% | 73.2% | 74.8% | 73.3%
11-bit Barker code | 512 | 74.8% | 75.3% | 74.8% | 75.2%
12-bit m-sequence | 2048 | - | - | - | -
7-bit Barker code | 256 | 74.8% | 74.5% | 75.7% | 74.5%
11-bit Barker code | 256 | 74.5% | 75.5% | 73.9% | 75.5%
12-bit m-sequence | 1024 | - | - | - | -
7-bit Barker code | 128 | 74.7% | 75.8% | 75.7% | 75.8%
11-bit Barker code | 128 | 74.3% | 77.7% | 75.3% | 77.6%
12-bit m-sequence | 512 | - | - | - | -
7-bit Barker code | 64 | 74.3% | 76% | 75.8% | 75.9%
11-bit Barker code | 64 | 73.4% | 77.4% | 75% | 77.3%
12-bit m-sequence | 256 | - | - | - | -
7-bit Barker code | 32 | 73.4% | 72.7% | 73.6% | 72.7%
11-bit Barker code | 32 | 71.7% | 73.1% | 71.8% | 73.1%
12-bit m-sequence | 128 | - | - | - | -
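Two numbers quoted in this subsection can be checked with exact arithmetic, assuming that $\alpha$ is the fraction of '1' chips in one period of a code. Under that assumption, the quoted 12% difference corresponds to about 11.7 percentage points, and the fixed WPPG register ratio between the Barker codes and the 12-bit m-sequence gives the 75% register reduction.

```python
from fractions import Fraction

# Assumed definition: alpha = fraction of '1' chips in one period.
alpha_7 = Fraction("1110010".count("1"), 7)        # 7-bit Barker code: 4/7
alpha_11 = Fraction("11100010010".count("1"), 11)  # 11-bit Barker code: 5/11
alpha_gap = alpha_7 - alpha_11
print(f"alpha difference: {float(alpha_gap):.3f}")  # about 0.117

# WPPG register reduction when a Barker code replaces the 12-bit m-sequence.
for short, long_ in [(8, 32), (512, 2048), (32, 128)]:
    print(f"{short} vs {long_} registers: {float(1 - Fraction(short, long_)):.0%} fewer")
```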
IX. SECURE DIGITAL SIGNATURE
The watermarks discussed in the previous sections transmit a single bit of information, which indicates the presence of an IP. The watermark implementation followed [16]–[18], in order to establish the influence of sequence parameters on hardware implementation costs and detection performance. However, since such a watermark can only be regarded as found or not found, it only allows IP candidates to be shortlisted for a more thorough investigation. A digital signature, such as the author of a core, a serial number or a license agreement, is not conveyed, and anyone who detects the watermark in a system can claim ownership [41]. Furthermore, as discussed in Section XI, short period sequences, including both Barker codes and short period m-sequences, are more vulnerable to various types of attacks.
To overcome the limitations of such short sequences and to generate a digital signature, private/public key encryption and cryptographic hash functions, such as MD5 [42], can be used, as in [5], [13], [21]–[23], [27]. However, when encryption and cryptographic hash functions are used, the encoded signature varies with the conveyed message. Hence, the power pattern parameters, such as $\alpha$ and $\theta_{MAX}$, change, and so do the implementation costs needed to provide high detection performance (Fig. 8). To ensure that the most cost-efficient parameters are used, the encoded signature can be generated as in [41].
Fig. 15(a) shows the implementation algorithm. The digital signature ("Cortex-M0") is encrypted with a private key known only to the IP vendor. To reduce the length of the output bitstream, the encrypted message is then encoded using a cryptographic hash function (MD5). The hashed bit sequence is used to modulate the cost-efficient sequence; in Fig. 15(a), the 7-bit Barker code is used for illustration. To generate bit '1', a full period of the 7-bit Barker code is used; to generate bit '0', the inverse of the sequence is used. The inverted sequence has different $\alpha$ and $\theta_{MAX}$ parameters. However, the parameters complement each other (Fig. 3(b)), and $\rho_{PEAK}$ and $\rho_{DIFF}$ results similar to those of the non-inverted sequence are expected. In this way, a highly robust digital signature is generated. The detection algorithm is shown in Fig. 15(b). The device power signal (trace) is measured with an oscilloscope. A power matrix is created by dividing the power trace such that each signature bit corresponds to a specific trace. Correlation Power Analysis is applied to each trace separately, and the correlation spectra, $\rho_{PEAK}$ and $\rho_{DIFF}$, are found. To demonstrate this algorithm, we simulated the digital signature of Fig. 15(a) and introduced normally distributed noise of 32 dB, as in Section IV. The size of the resulting power matrix was 128 × 300,000 clock cycles. When the watermark model for a particular signature bit is correct, a high positive correlation peak can be seen (Fig. 15(c)). When the model is incorrect and represents the inverted sequence, a high negative correlation peak is seen (Fig. 15(d)). If a model of another sequence were used, or the data were not properly arranged, the correlation value would be close to 0. Finally, the $\rho_{DIFF}$ values corresponding to all signature bits are plotted (Fig. 15(e)). If all $\rho_{DIFF}$ values are similar and positive, the correct hash-encoded sequence is considered found and can be further decoded and decrypted using the public key. In Fig. 15(f), an error was introduced at the 4th bit of the hashed sequence. As can be seen, a negative $\rho_{DIFF}$ peak occurs and the error bit is detected. This means that the detected signature does not match the expected signature, and the IP differs from the expected IP.

Fig. 15. Implementation (a) and detection (b) diagrams of secure digital signature. Detection of a correct (c) and an incorrect (d) signature bit. Correct (e) and an incorrect (f) detection of a digital signature. [Panels (c) and (d): $\rho$ (−0.02 to 0.02) versus watermark model rotation (0 to 6); panels (e) and (f): $\rho_{DIFF}$ (−0.02 to 0.02) versus digital signature bits (1 to 128).]
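The following sketch illustrates the modulation and detection scheme of Fig. 15 with a 7-bit Barker code. It rests on several assumptions. The private-key encryption step is omitted and only the MD5 hash is taken. Each trace is assumed to be aligned to the start of a pattern period. A single aligned-model correlation per bit stands in for the full $\rho_{DIFF}$ spectrum. The noise level and trace length are far smaller than in the 128 × 300,000 experiment, and all function names are hypothetical.

```python
import hashlib

import numpy as np

BARKER7 = np.array([1, 1, 1, 0, 0, 1, 0])


def signature_bits(message: bytes):
    """128 bits of an MD5 digest (the encryption step is omitted here)."""
    digest = hashlib.md5(message).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def chip_pattern(bit):
    """Bit '1' -> one period of the code; bit '0' -> the inverted period."""
    return BARKER7 if bit else 1 - BARKER7


def power_matrix(bits, cycles_per_bit, noise_std, rng):
    """One row (trace) per signature bit: repeated pattern plus noise."""
    return np.stack([
        np.resize(chip_pattern(b), cycles_per_bit)
        + rng.normal(0.0, noise_std, cycles_per_bit)
        for b in bits
    ])


def bit_correlations(matrix, expected_bits):
    """Correlate each trace with the model of the expected bit.
    Positive -> the bit matches; negative -> error bit."""
    rho = []
    for row, b in zip(matrix, expected_bits):
        model = np.resize(chip_pattern(b), row.shape[0]).astype(float)
        rho.append(np.corrcoef(model, row)[0, 1])
    return np.array(rho)


rng = np.random.default_rng(1)
expected = signature_bits(b"Cortex-M0")
observed = list(expected)
observed[3] ^= 1  # flip the 4th bit, as in Fig. 15(f)
traces = power_matrix(observed, cycles_per_bit=7000, noise_std=1.0, rng=rng)
rho = bit_correlations(traces, expected)
print("error bits at indices:", np.flatnonzero(rho < 0))
```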
The use of private/public key encryption and cryptographic hash functions makes the embedded power signature highly robust. Although an attacker may not know the generation scheme of the digital signature, the number of signature combinations is limited when short sequences such as Barker codes are used, so it is feasible for an attacker to record all power data and use a brute-force attack to reverse engineer the hashed message. Even if this occurs, the IP vendor's private key remains uncompromised [5] and can still be used to prove IP infringement. Additionally, an attacker who obtains the RTL can perform a digital simulation of the design to check whether any registers follow a Barker code. Since the Barker codes are public and only a few of them can be used, this approach is also feasible. However, as all Barker codes are short and require only a few clock cycles to generate, the number of false alarms caused by other registers switching in a similar pattern increases with the system size, and the attacker would have to inspect most of these occurrences, which may become time-consuming for large designs. The methodology of Fig. 15 demonstrates that it is computationally infeasible for an attacker to forge an owner's signature, and it provides stronger evidence of IP ownership. However, as discussed in Section XI-A, an attacker can tamper with an owner's signature and change the sign of one or more correlation results. To prevent such an attack, a new trigger-based watermark generation methodology is proposed in Section X.
The methodology of Fig. 15 is, however, impractical for longer m-sequences. Although it is possible to hash encode the signature with a longer m-sequence, as in [18], the $\alpha$ and $\theta_{MAX}$ parameters would change and a larger WPPG circuit would be required. In Fig. 15, because the watermark sequence is modulated with the hash-encoded bit sequence, the original parameters of the watermark sequence are retained. Nevertheless, the approach requires additional circuitry to implement the key-encrypted and hash-reduced sequence. In a typical implementation, an extra 128-bit shift register would be required to hold the hash value, and a state machine would have to be implemented to achieve the desired modulation. In an FPGA, such a shift register requires 8 LUTs configured as 16-bit shift registers (SRL16), since 128/16 = 8. Although a basic state machine with only a few states would be sufficient, the final hardware implementation would approach the cost of the m-sequence approach (Table III). Furthermore, an ASIC implementation would require all 128 registers to be implemented.
In this section, a secure digital signature approach was proposed for short period watermark sequences. As discussed further in Section XI, longer period m-sequences are robust against various types of attacks but require a larger area to implement a WPPG circuit of significant size. Shorter sequences offer reduced area and power overheads but are not as robust as longer period m-sequences. The robustness of shorter sequences can be improved with the approach demonstrated in Fig. 15, but the area overhead gains then vanish. There is therefore a tradeoff between longer m-sequences and shorter sequences, and the watermark implementation must be reconsidered for different types and sizes of systems.
X. APPLICATION SPECIFIC WATERMARK
IMPLEMENTATION
In small processors, such as microcontrollers (e.g. ARM Cortex-M0), the area overhead of the secure short sequence implementation (Fig. 15) may be excessive. Since a small WPPG is sufficient to generate a strong enough watermark power consumption, the longer m-sequence approach [18] may be a better solution. In bigger processors, such as application processors (e.g. ARM Cortex-A9), the WGC has a negligible impact, and the WPPG size is the main factor. Since the WPPG size increases approximately linearly with the system size, the area overhead of the WPPG circuit for a longer m-sequence would be larger than the area overhead of the secure short sequence implementation. Therefore, encoded short sequences are expected to be more suitable, since they allow both area and power overheads to be minimized through a smaller WPPG circuit.
Furthermore, in embedded systems, area and power overheads are often prioritized, and it is not viable to generate the watermark power signal at all times. In such systems, the watermark is required to be active non-deterministically and for short periods of time. To ensure such operation, the activation time of the watermark circuit can be modulated with a specific system instruction, increasing the attacker's effort and the computational time of the simulation. Since an attacker must know when a watermark sequence is active, finding the activation time without full knowledge of the system architecture is infeasible. If an attacker obtains a power signal, the watermark signal may be too weak to be found, or erroneous correlation peaks may be generated owing to incorrect assumptions about the implemented architecture. When the IP owner tries to extract the embedded watermark pattern, they use a special trigger to combine multiple power acquisitions into a single trace in which the watermark pattern is continuous. Such a trigger is not known to an attacker, which significantly increases the effort required for a successful attack. Additionally, the secure implementation of short sequences can be performed as in [39] to significantly reduce the area and power overheads. The visibility of the clock enable signal overridden by the watermark circuit can be kept to a minimum, since a simple XOR gate ensures that the clock gate is modulated according to the watermark sequence. This also means that if an attacker embeds their own "always-ON" watermark, they may violate the original area and power specification, which is easily detectable. Furthermore, the watermark embedded by an attacker can be of much lower amplitude, since the attacker would use a WPPG circuit to achieve the desired watermark power consumption, whereas the IP owner would use the original processor to emulate the WPPG circuit. This minimizes the occurrence of error bits in the implementation of Fig. 15. An attacker wishing to understand the watermark implementation would need to re-simulate the entire RTL to uncover the watermark activation scheme, which is not a trivial task.
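The benefit of the owner's trigger can be illustrated with a small simulation. The watermark is active only in short bursts at random offsets, and the sequence generator advances only while active, so the bursts carry consecutive chips of one continuous pattern. An attacker correlating the whole trace sees a diluted, phase-scrambled signal, whereas the owner, who knows the trigger positions, stitches the bursts into a continuous pattern. The burst scheme, noise level and function names are hypothetical assumptions for illustration, not the mechanism of [39].

```python
import numpy as np

BARKER7 = np.array([1, 1, 1, 0, 0, 1, 0], dtype=float)


def best_correlation(trace, period):
    """Highest Pearson coefficient over all cyclic rotations of the model."""
    n = len(trace)
    return max(np.corrcoef(np.resize(np.roll(period, r), n), trace)[0, 1]
               for r in range(len(period)))


def simulate_bursts(rng, n_bursts=20, burst_len=70, slot_len=1000, noise_std=0.3):
    """Watermark active only in short bursts, one at a random offset per slot.

    The generator advances only while active, so consecutive bursts carry
    consecutive chips of one continuous pattern.
    """
    pattern = np.resize(BARKER7, n_bursts * burst_len)
    trace = rng.normal(0.0, noise_std, n_bursts * slot_len)
    starts = []
    for k in range(n_bursts):
        start = k * slot_len + int(rng.integers(0, slot_len - burst_len))
        trace[start:start + burst_len] += pattern[k * burst_len:(k + 1) * burst_len]
        starts.append(start)
    return trace, starts


def stitch(trace, starts, burst_len):
    """Owner side: use the secret trigger positions to join the bursts."""
    return np.concatenate([trace[s:s + burst_len] for s in starts])


rng = np.random.default_rng(2)
trace, starts = simulate_bursts(rng)
full_rho = best_correlation(trace, BARKER7)                         # no trigger
stitched_rho = best_correlation(stitch(trace, starts, 70), BARKER7)  # with trigger
print(f"whole trace: {full_rho:.3f}, stitched trace: {stitched_rho:.3f}")
```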
XI. IP ATTACKS AND ROBUSTNESS
In this section, the security of watermarks in the non-triggered and trigger-based implementations is compared against various types of third party IP attacks. Security, commonly referred to as robustness, is determined by the attacks a watermark is able to withstand and by the effort required of an attacker. As it is difficult to quantify the robustness of side-channel watermarks [41], this section discusses it in relative terms [18]. In the classical cryptographic scenario, an attacker aims to retrieve a secret key. If this succeeds, the security of the system is breached and the attacker can extract sensitive information. In the case of IP watermarks, the target of an attack is not the system's security but the legal rights to an IP. This section focuses on the most prominent attacks against watermarks, namely tampering, finding ghosts and forging. These are illustrated using the "Alice and Bob" scenario commonly used in cryptography, where "Alice" and "Bob" denote two individuals at either end of a communications channel, with cryptographic techniques applied to ensure that their conversation is secret.
A. Tampering
Bob (attacker) can tamper with Alice's (IP owner) solution by removing Alice's signature (watermark) and adding his own. Because the RTL description is transparent and the design files provided to SoC integrators are unprotected, Bob has virtually unlimited access to the design. It is therefore not possible to prevent Bob from adding his own watermark. However, it is crucial that Alice's watermark circuit is hidden, such that Bob cannot easily find it.
In the non-triggered implementation, longer m-sequences are the more appropriate solution. Shorter sequences, such as Barker codes and short m-sequences, require additional circuitry to improve their security. Nevertheless, the algorithm of Fig. 15 does not provide complete protection against tampering attacks, since an attacker only needs to change the sign of the correlation rather than remove the entire correlation. Moreover, this approach is only feasible in bigger application processors, because in smaller processors the area overhead reduction offered by shorter sequences vanishes.
If, however, the secure short sequence approach is used in the trigger-based implementation (Section X), the robustness against tampering attacks can be significantly improved.
The shortfalls of the secure algorithm identified earlier are
overcome by the trigger-based watermark generation. Further-
more, in bigger application processors the area overhead is
negligible due to removal of the WPPG circuit.
B. Finding Ghosts and Forging
Bob can attempt to find a ghost signature, such as a specific power
pattern, and claim that an IP contains his own watermark.
Furthermore, Bob can forge Alice’s implementation and watermark
other solutions that do not belong to Alice. In such a case, Bob
demonstrates that Alice’s signature is not genuine, since it can be
found in another IP.
In the non-triggered implementation, the robustness of sequences
against such attacks is limited by their intrinsic characteristics.
For the commonly used m-sequence, robustness increases with the
number of registers used for the WGC. For example, a 32-bit
m-sequence is more robust than a 12-bit m-sequence, since it
contains more frequency components (Fig. 6(c)). This increases the
number of possible watermark combinations and the amount of
transmitted information. Nevertheless, the detection performance of
longer m-sequences must be complemented by increasing the number of
WPPG registers, which directly affects the robustness of the
watermark against tampering attacks.
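The dependence on register count can be made concrete with a small model. The sketch below assumes a Fibonacci LFSR whose taps are given as 1-indexed register positions (the convention of tap tables such as [37]); it clocks the register until its state repeats, confirming the period 2^n − 1 and the balance property of an m-sequence. It also counts the primitive feedback polynomials of degree n, φ(2^n − 1)/n, as one assumed measure of how many distinct m-sequence generators an n-bit register admits. The function names are hypothetical, and small 4- and 5-bit registers with standard maximal-length taps are simulated so the periods can be checked exhaustively.

```python
from typing import List, Sequence, Tuple


def m_sequence(n_bits: int, taps: Sequence[int]) -> Tuple[List[int], int]:
    """Clock a Fibonacci LFSR from the all-ones state until the state repeats.

    taps are 1-indexed register positions; the highest tap must equal n_bits.
    Returns the output bits over one period and the period length.
    """
    if not taps or max(taps) != n_bits or min(taps) < 1:
        raise ValueError("taps must lie in 1..n_bits and include n_bits")
    state = [1] * n_bits                    # any non-zero seed works
    start = list(state)
    bits: List[int] = []
    while True:
        bits.append(state[-1])              # output taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]        # XOR of the tapped stages
        state = [feedback] + state[:-1]     # shift one stage towards the output
        if state == start:
            return bits, len(bits)


def totient(m: int) -> int:
    """Euler's totient by trial division (adequate for m up to about 2**32)."""
    result, p = m, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def primitive_polynomial_count(n_bits: int) -> int:
    """Number of primitive polynomials of degree n over GF(2): phi(2**n - 1) / n."""
    return totient(2 ** n_bits - 1) // n_bits


for n, taps in [(4, (4, 3)), (5, (5, 3))]:
    bits, period = m_sequence(n, taps)
    print(n, period, sum(bits))             # period 2**n - 1, with 2**(n-1) ones

for n in (12, 32):
    print(n, 2 ** n - 1, primitive_polynomial_count(n))
```

The first loop prints `4 15 8` and `5 31 16`. The second reports a period of 4,095 with 144 primitive polynomials for a 12-bit register, against 4,294,967,295 and 67,108,864 for a 32-bit register, which illustrates how quickly the space of possible sequences grows with the number of WGC registers.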
In the trigger-based implementation, the secure short sequences are
more suitable because of the watermark generation algorithm. It is
very hard for an attacker to detect the watermark without knowing
the architecture of the processor core (Section X), and because the
entire processor core (or most of it) is modulated, an attacker
cannot generate a stronger signature.
XII. CONCLUSIONS
The goal of this work was to develop a design method for
incorporating sequence-aware watermarks in soft IP Embedded
Processors. Using a new theoretical definition, the relationship
between the watermark sequence parameters and detection performance
has been illustrated and validated with simulations and with
experimental results from FPGA and ASIC designs of embedded
processors. It has been shown that tradeoffs exist between shorter
and longer sequences in terms of hardware implementation cost and
robustness against third-party attacks. The analysis of these
tradeoffs concludes that for smaller systems the commonly used long
m-sequence approach is the better solution due to its robustness
against third-party attacks. However, in bigger systems the
trigger-based secure short sequences achieve lower hardware
implementation costs without sacrificing robustness.
ACKNOWLEDGMENT
The authors would like to thank EuroPractice Mini-ASIC
program for silicon fabrication and packaging, Prof. Steve
Gunn for his insightful comments, Dr. Jatin Mistry, James
Myers, Prof. David Flynn and Anand Savanth for their help
in fabricating and testing the silicon test chips, and Dr. Sheng
Yang for constructive discussions.
REFERENCES
[1] G.E. Moore. Cramming More Components onto Integrated Circuits.
Electronics, 38(8):114–117, Apr 1965.
[2] VSI Alliance. VSI Alliance Architecture Document: Version 1.0, 1997.
[3] VSI Alliance. Intellectual Property Protection: Schemes, Alternatives
and Discussion, Aug 2001.
[4] R. Torrance et al. The State-of-the-Art in IC Reverse Engineering. In
CHES, volume 5747 of Lecture Notes in Computer Science, pages 363–
381. 2009.
[5] A.B. Kahng et al. Constraint-Based Watermarking Techniques for
Design IP Protection. TCAD, 20(10):1236–1252, Oct 2001.
[6] M. Ni et al. Constraint-Based Watermarking Technique for Hard IP
Core Protection in Physical Layout Design Level. In ICSICT, pages
1360–1363, 2004.
[7] X. Cai et al. A Watermarking Technique for Hard IP Protection in
Post-Layout Design Level. In ASICON, pages 1317–1320, Oct 2007.
[8] N. Narayan et al. IP Protection for VLSI Designs via Watermarking of
Routes. In ASIC/SOC, pages 406–410, Sep 2001.
[9] T. Nie et al. A Post Layout Watermarking Method for IP Protection. In
ISCAS, pages 6206–6209, 2005.
[10] Y. Du et al. IP protection platform based on watermarking technique.
In ISQED, pages 287–290, Mar 2009.
[11] G. Qu. Publicly Detectable Techniques for the Protection of Virtual
Components. In DAC, pages 474–479, 2001.
[12] A.E. Caldwell et al. Effective Iterative Techniques for Fingerprinting
Design IP. TCAD, 23(2):208–215, Feb 2004.
[13] G. Qu. Publicly Detectable Watermarking for Intellectual Property
Authentication in VLSI Design. TCAD, 21(11):1363–1368, Nov 2002.
[14] F. Koushanfar et al. Behavioral Synthesis Techniques for Intellectual
Property Protection. TODAES, 10(3):523–545, Jul 2005.
[15] D. Kirovski et al. Local Watermarks: Methodology and Application to
Behavioral Synthesis. TCAD, 22(9):1277–1283, Nov 2003.
[16] D. Ziener et al. FPGA Core Watermarking Based on Power Signature
Analysis. In FPT, pages 205–212, Dec 2006.
[17] D. Ziener et al. Power Signature Watermarking of IP Cores for FPGAs.
Journal of Signal Processing Systems, 51:123–136, Apr 2008.
[18] G. Becker et al. Side-Channel Based Watermarks for Integrated Circuits.
In HOST, pages 30–35, Jun 2010.
[19] J. Mikulka et al. CCK and Barker Coding Implementation in IEEE
802.11b Standard. In Radioelektronika, pages 1–4, 2007.
[20] X. Chen et al. A New Algorithm to Optimize Barker Code Sidelobe
Suppression Filters. TAES, 26(4):673–677, 1990.
[21] I. Torunoglu et al. Watermarking-Based Copyright Protection of Se-
quential Functions. JSSC, 35(3):434–440, Mar 2000.
[22] E. Charbon et al. Watermarking Techniques for Electronic Circuit
Design. volume 2613 of Lecture Notes in Computer Science, pages
147–169. 2003.
[23] A. Abdel-Hamid et al. A Public-Key Watermarking Technique for IP
Designs. In DATE, volume 1, pages 330–335, Mar 2005.
[24] A. Cui et al. A Hybrid Watermarking Scheme for Sequential Functions.
In ISCAS, pages 2333–2336, May 2011.
[25] A. Abdel-Hamid et al. Fragile IP Watermarking Techniques. In AHS,
pages 513–519, Jun 2008.
[26] A. Cui et al. A Robust FSM Watermarking Scheme for IP Protection
of Sequential Circuit Design. TCAD, 30(5):678–690, May 2011.
[27] E. Castillo et al. IPP@HDL: Efficient Intellectual Property Protection
Scheme for IP Cores. TVLSI, 15(5):578–591, May 2007.
[28] L. Parrilla et al. Protection of Microprocessor-Based Cores for FPL
Devices. In SPL, pages 15–20, Mar 2010.
[29] J.J. Quisquater et al. ElectroMagnetic Analysis (EMA): Measures and
Counter-measures for Smart Cards. In E-smart, volume 2140 of Lecture
Notes in Computer Science, pages 200–210. 2001.
[30] L. Sauvage et al. Electromagnetic Radiations of FPGAs: High Spatial
Resolution Cartography and Attack on a Cryptographic Module. TRETS,
2(1):1–24, Mar 2009.
[31] E. Brier et al. Correlation Power Analysis With a Leakage Model. In
CHES, volume 3156 of Lecture Notes in Computer Science, pages 135–
152. 2004.
[32] X. Zhang et al. RON: An On-Chip Ring Oscillator Network For
Hardware Trojan Detection. In DATE, pages 1–6, Mar 2011.
[33] J. Aarestad et al. Detecting Trojans Through Leakage Current Analysis
Using Multiple Supply Pad IDDQs. TIFS, 5(4):893–904, Dec 2010.
[34] S. Narasimhan et al. Hardware Trojan Detection by Multiple-Parameter
Side-Channel Analysis. IEEE Transactions on Computers, 62(11):2183–
2195, Aug 2012.
[35] M. Keating et al. Low Power Methodology Manual: For System-on-Chip
Design. Springer, 2007.
[36] J. Goodwin et al. Power analysis detectable watermarks for protecting
intellectual property. In ISCAS, pages 2342–2345, Mar 2010.
[37] P. Alfke. Efficient Shift Registers, LFSR Counters, and Long Pseudo-
Random Sequence Generators. Tech. rep., Xilinx, Jul 1996. XAPP052.
[38] R.H. Barker. Group Synchronizing of Binary Digital Sequences. In
Communication Theory, pages 273–287. 1953.
[39] J. Kufel, P. Wilson, S. Hill, B.M. Al-Hashimi, P.N. Whatmough,
and J. Myers. Clock-modulation based watermark for protection of
embedded processors. In DATE, pages 1–6, Mar 2014.
[40] R. York. Benchmarking in Context: Dhrystone. ARM, White Paper,
Mar 2002.
[41] G.T. Becker et al. Detecting Software Theft in Embedded Systems: A
Side-Channel Approach. TIFS, 7(4):1144–1154, 2012.
[42] R.L. Rivest. RFC 1321: The MD5 Message-Digest Algorithm. Internet
Activities Board, Apr 1992.
Jedrzej Kufel received the M.Eng. degree (first
class Hons.) in Mechatronics and Robotic Systems
from the University of Liverpool, in 2010. He is
currently pursuing the Ph.D. degree with the School
of Electronics and Computer Science, University
of Southampton. In 2014, he joined the IoTBU
department at ARM Ltd., Cambridge, U.K.
Peter R. Wilson (M’99, SM’06) was born in Edin-
burgh, Scotland, and received the B.Eng. (Hons.) in
Electrical and Electronic Engineering from Heriot-Watt
University, Edinburgh, Scotland, in 1988, an M.B.A.
from the Edinburgh Business School, Scotland, in
1999, and a Ph.D. from the University of
Southampton, England, in 2002.
Dr Wilson is currently an Associate Professor in
Electronic and Electrical Engineering at the School
of Electronics and Computer Science, University of
Southampton, UK. His current research interests in-
clude modeling of magnetic components in electric circuits, power electronics,
renewable energy systems, integrated circuit design, VHDL-AMS modeling
and simulation, and the development of electronic design tools.
Stephen Hill is currently ARM’s Director of CPU
Engineering in the US. Previously he led ARM
CPU Core R&D, and before that he was a
microarchitect and logic designer working on multiple
processor generations. He studied Physics at the
University of Bristol, UK and microelectronics at
the University of Southampton, UK.
Bashir M. Al-Hashimi (M’99-SM’01-F’09) is a
Professor of Computer Engineering and Director
of the Pervasive Systems Center at the University of
Southampton, UK. He is ARM Professor of Com-
puter Engineering, and Co-Director of the ARM-
ECS research center. His research interests include
methods, algorithms and design automation tools for
low-power design and test of embedded computing
systems.
Paul N. Whatmough received the B.Eng. degree
(first class Hons.) in Electronic Communications
Engineering from the University of Lancaster, in
2003, the M.Sc. degree (with distinction) in Com-
munications Systems and Signal Processing from the
University of Bristol, in 2004, and the Doctorate
degree from University College London, in 2012,
all in the U.K.
From 2005 to 2008, he held the position of
Research Scientist at Philips Research Labs, Redhill,
U.K. (which became NXP Semiconductors Research
in 2006), focussing on digital radio approaches for multi-standard cellular
systems. In 2008, he joined the R&D department at ARM Ltd., Cambridge,
U.K., where he is currently Staff Research Engineer. His research interests
are in low-power circuits, algorithms and architectures relating to wireless,
DSP and embedded computing.
