Abstract: Lattice-based cryptography is one of the leading candidates for NIST's post-quantum standardisation effort, providing efficient key encapsulation and signature schemes. Most of these schemes base their hardness on variants of LWE, and thus rely heavily on error samplers to provide the necessary uncertainty by obfuscating computations on secret information. Because of this, the error sampler is an obvious target for side-channel analysis, and numerous attacks have targeted this component to gain secret-key information. In order to bring potential lattice-based cryptographic standards to practical realisation, it is important to protect these modules from past and future fault and side-channel attacks. This paper proposes countermeasures that exploit the distributions expected from these error samplers, either Gaussian or binomial, by using statistical tests to verify that the samplers are operating properly. The novel countermeasures are designed to protect against all previous fault attacks on error samplers. We optimize the hardware implementation of the proposed tests to avoid division and square root calculations; however, the countermeasures we propose are sufficiently generic to be suitable for software as well. We measure the impact of these countermeasures on performance and area consumption on a Xilinx Artix-7 FPGA. Our countermeasures achieve promising performance while incurring minimal overhead.
I. INTRODUCTION
Post-quantum (or quantum-safe) cryptography has expanded substantially in recent years, in part due to the NIST call for quantum-safe algorithms [NIS16a]. This call is essential to the future of secure communications that use public-key cryptography, since the schemes currently in use, based on the hardness of factoring large integers (RSA) or of the discrete logarithm problem (ECC/ECDSA), can be broken in polynomial time by a scalable quantum computer.
Amongst the submissions to the NIST call, schemes based on the hardness of lattice problems appear to be the leading candidates and make up the largest proportion of submissions. Lattice-based cryptography is a very appealing candidate due to its security, offering worst-case to average-case hardness reductions; its efficiency, outperforming many other candidates in software and hardware; and its versatility, with schemes for advanced cryptographic services such as identity-based encryption as well as standard primitives such as encryption, signatures, and key encapsulation [HPO+15].
This research was supported in part by EPSRC via grant EP/N011635/1 and by the European Union Horizon 2020 SAFEcrypto project (grant no. 644729).
Error sampling is one of the main modules in lattice-based cryptography: error samplers are critical components because they hide computations on secret information and make the underlying problems computationally hard. Of the 26 lattice-based candidates submitted to NIST for post-quantum standardisation, this research is applicable to at least 18, which use either Gaussian or binomial samplers. However, many of the side-channel attacks on lattice-based cryptographic schemes have exploited vulnerabilities in error samplers [Pes16], [EFGT17], [EFGT16], [BHLY16], [BBK16], [PBY17]. This research proposes countermeasures to all previous physical attacks on error samplers.
The main contribution of this paper is to propose countermeasures, via statistical tests, for the normally distributed error samplers used in lattice-based cryptography. The tests are grouped by computational complexity into three categories, low cost, standard, and expensive, which can be selected depending on the application, device, or security level. The tests are applicable to Gaussian and binomial error samplers, which cover most lattice-based schemes, especially those submitted to NIST for post-quantum standardisation. The proposed tests are then demonstrated in conjunction with the most efficient error sampler in hardware [HKR+18], a lookup-based technique, to evaluate their latency and area costs. The results show that error samplers remain practical, with minimal overhead in speed and area, whilst operating in a sound and genuine fashion.
II. PRELIMINARIES
We denote the sample size by $n$ and the standard deviation by $\sigma$. Error samplers are typically parameterised by the standard deviation, but to simplify the calculations we propose in Section IV, we instead use the variance, $s = \sigma^2$. Working with the variance does not change the test results and means we can avoid explicitly calculating square roots.
We use standard statistical notation for the target mean, $\mu$, the target variance, $s$, the sample mean, $\bar{x}$, and the sample variance, $\bar{s}$. We also require pre-stored critical values for the t-tests, $t_{\alpha/2}$, and the chi-squared tests, $\chi^2_{\alpha}$, for $n-1$ degrees of freedom and significance level $\alpha$ (here we use a 99% confidence level, i.e. $\alpha = 0.01$), against which our calculated statistics are compared. These pre-stored critical values can be changed at the implementer's discretion, with no impact on performance.
Definitions of the equations we use to calculate the test statistics can be found in Table I.
Most lattice-based schemes base their security on the learning with errors (LWE) problem [Reg05]: given integers $n$ and $q$, uniformly distributed vectors $\mathbf{a}_i \in \mathbb{Z}_q^n$, and $b_i \equiv \langle \mathbf{a}_i, \mathbf{s} \rangle + e_i \bmod q$, where the secret key $\mathbf{s}$ is chosen uniformly at random from $\mathbb{Z}_q^n$ and each $e_i$ follows some small error distribution, find $\mathbf{s}$ given access to pairs $(\mathbf{a}_i, b_i)$. In matrix form, the adversary must either find $\mathbf{s} \in \mathbb{Z}_q^n$ given $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ and $\mathbf{b} \equiv \mathbf{A}^T \mathbf{s} + \mathbf{e} \bmod q$ (search-LWE), or distinguish pairs $(\mathbf{a}_i, b_i)$ from pairs $(\mathbf{a}_i, u_i)$, where $u_i$ is chosen uniformly at random from $\mathbb{Z}_q$ (decision-LWE).
In other words, solving a system of linear equations is usually easy, but once an error ($e_i$) is added to each equation it becomes a hard mathematical problem for which no known quantum or classical algorithm runs in polynomial time. Thus, schemes based on LWE with sufficiently large parameters are considered quantum-secure.
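As a concrete illustration of the LWE definition, the Python sketch below generates toy LWE pairs and checks that each $b_i$ differs from $\langle \mathbf{a}_i, \mathbf{s} \rangle$ only by a small error. The parameters ($n = 8$, $q = 3329$, errors uniform in $[-2, 2]$) and the helper names `lwe_samples` and `residual_error` are hypothetical choices for illustration only; they are far too small to be secure, and the uniform error is a stand-in for the Gaussian or binomial samplers discussed in this paper.

```python
import random

def lwe_samples(n, m, q, secret, error_bound, rng):
    """Return m toy LWE pairs (a_i, b_i) with b_i = <a_i, s> + e_i mod q.

    Errors are uniform in [-error_bound, error_bound] purely for illustration;
    real schemes draw e_i from a Gaussian or binomial sampler.
    """
    pairs = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-error_bound, error_bound)
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % q
        pairs.append((a, b))
    return pairs

def residual_error(a, b, secret, q):
    """Recover e_i = b_i - <a_i, s> mod q, centred into the range (-q/2, q/2]."""
    e = (b - sum(ai * si for ai, si in zip(a, secret))) % q
    return e - q if e > q // 2 else e

# Usage with toy, insecure parameters (hypothetical).
rng = random.Random(2024)
q = 3329
s = [rng.randrange(q) for _ in range(8)]
pairs = lwe_samples(8, 16, q, s, 2, rng)
errors = [residual_error(a, b, s, q) for a, b in pairs]  # every entry lies in [-2, 2]
```

Setting `error_bound=0` models a zeroed error sampler: every $b_i$ then becomes an exact linear equation in $\mathbf{s}$, which Gaussian elimination modulo $q$ can solve. This is the weakness that zeroing fault attacks exploit.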
The error generated to hide these secret-key operations has a statistically 'normal' shape, typically either Gaussian or binomial. From Table ??, it can be seen that half of the lattice-based key encapsulation mechanism (KEM) schemes submitted to the NIST post-quantum standardisation use binomial sampling and the other half use Gaussian sampling. Lattice-based signature schemes typically require Gaussian sampling. A number of side-channel attacks have targeted these modules [BHLY16], [EFGT16], [PBY17], [EFGT17], [EFGT18] in order to gain secret-key information or to break the cryptographic scheme. Moreover, there has been little work on countermeasures for these attacks beyond shuffling [RVV13], [Saa17], which may not be sufficient [Pes16].
NIST have also stressed the need for efficient side-channel-protected schemes [NIS16b]: "Schemes that can be made resistant to side-channel attack at minimal cost are more desirable than those whose performance is severely hampered by any attempt to resist side-channel attacks."
Binomial samplers are generally realized by uniformly sampling two $k$-bit vectors and computing their respective Hamming weights; the binomial variable is obtained by subtracting the Hamming weight of one vector from that of the other. Gaussian samplers, on the other hand, are much more complex, with techniques ranging from arithmetic-based to lookup-based. In any case, irrespective of the method used to derive the Gaussian or binomial variables, it is important that they are correct and secure against any type of side-channel analysis, such as fault attacks, which is the goal of this paper. It is also important simply to ensure that these error samplers operate correctly, following the requirements of their strict theoretical foundations. Providing assurance of this is an important attribute in itself and is essential if lattice-based cryptographic schemes are to be deployed in real-world applications.
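The binomial construction described above can be sketched in a few lines of Python. This is a minimal illustration assuming $k = 4$ (a hypothetical choice) and Python's `secrets` module as the randomness source; it is not constant-time and offers no side-channel protection. Each bit pair contributes a difference in $\{-1, 0, 1\}$ with variance $1/2$, so the output has mean 0 and variance $k/2$, which are exactly the quantities checked by the standard-level test in Section IV.

```python
import secrets

def hamming_weight(v: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")

def binomial_sample(k: int) -> int:
    """Centred binomial sample in [-k, k]: HW(a) - HW(b) for uniform k-bit a and b.

    The result has mean 0 and variance k/2.
    """
    a = secrets.randbits(k)
    b = secrets.randbits(k)
    return hamming_weight(a) - hamming_weight(b)

# Empirical check of the mean and (divide-by-n) variance for k = 4.
samples = [binomial_sample(4) for _ in range(4096)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)  # should be close to k/2 = 2
```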
III. RELATED WORK
Implementations of lattice-based cryptographic schemes have been investigated in the past using side-channel analysis. Many of these have targeted error sampling modules, such as Gaussian samplers, in order to gain secret-key information.
Fault analysis of lattice-based signature schemes is described in depth by Bindel et al. [BBK16]. From their analysis they conclude that zeroing the error samplers is applicable to all the signature schemes they consider, with a small number of faults (either 1 or 2). This attack is shown in practice by Espitau et al. [EFGT16], who report fault attacks on the error sampling components of a number of lattice-based signature schemes. Some countermeasures have been suggested for error samplers, the most common of which is shuffling [RVV13], [Saa17]. However, shuffling would do little against a fault attack, and, as Pessl [Pes16] shows, it is not sufficient even against side-channel analysis, since secret-key information can still be recovered in a practical real-world setting.
Many other side-channel attacks on lattice-based cryptographic schemes are specific to software implementations [BHLY16], [Pes16], [EFGT17], exploiting information leakage via cache memory, power leakage, template attacks, or branch tracing. The error sampler is not always the attack target, and other components can also be exploited by an adversary, as illustrated in the analysis by Bindel et al. [BBK16].
IV. PROPOSED COUNTERMEASURES
In this section we discuss the proposed countermeasures for error samplers. The countermeasures are grouped into levels, where each level deploys increasingly powerful statistical analysis and hence aims to thwart more powerful adversaries, but at a higher cost in performance and hardware area. The discussion of these tests is generic enough to apply to hardware or software, but the optimizations are specific to hardware. A summary of the equations we use for these countermeasures is shown in Table I.
A. Low Cost
This test counts the number of repetitions in the sampler output and raises an alarm flag if the number of repetitions exceeds an improbable value (user-defined, for example 10). Fault attacks such as zeroing attacks and early loop abort result in a constant stream of the same value(s), leaving secret-key information visible or predictable to an adversary.
B. Standard
This countermeasure calculates the sample mean ($\bar{x}$) and sample variance ($\bar{s}$), while also checking for repetitions. Checking the sample variance is particularly important, as this parameter is linked to the hardness of the scheme's underlying LWE problem: reducing the variance of the error sampler has the potential to make the LWE instances easier to solve. These countermeasures would also detect errors and bugs in the sampler, as well as malicious implementations. In hardware, this test is computed after a power-of-two number of samples so that the division need not be computed explicitly. This is convenient for schemes like Kyber [SAB+17] and Dilithium(-G) [LDK+17], which require $n = 256$ samples for each key encapsulation or signature, respectively.
For the sample mean calculation, we require one accumulation of all $n$ outputs, followed by a power-of-two division: $(\sum_{i=1}^{n} x_i)/n$. For the sample variance calculation, we reuse the register and the power-of-two division from the mean calculation; the only extra element needed is a second accumulator for the sum of the squared outputs, to calculate $\bar{s} = \left(\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n\right)/n$.
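A software model of this two-accumulator approach is sketched below in Python, assuming $n = 256 = 2^8$ integer samples per batch; the name `batch_mean_variance` is hypothetical. Only additions, multiplications and arithmetic right shifts are used, and because each shift floors its result the values are truncated, as they would be in a fixed-point datapath with no fractional bits.

```python
LOG2_N = 8          # n = 2^8 = 256 samples per batch
N = 1 << LOG2_N

def batch_mean_variance(samples):
    """Integer model of the two-accumulator mean/variance datapath.

    Uses only additions, multiplications and arithmetic right shifts; every
    shift floors its result, as a fixed-point datapath without fractional bits would.
    """
    if len(samples) != N:
        raise ValueError("expected exactly N samples")
    acc = 0       # accumulator for sum(x_i)
    acc_sq = 0    # accumulator for sum(x_i^2)
    for x in samples:
        acc += x
        acc_sq += x * x
    mean = acc >> LOG2_N                          # floor(sum(x_i) / n)
    n_var = acc_sq - ((acc * acc) >> LOG2_N)      # sum(x_i^2) - floor(sum(x_i)^2 / n)
    var = n_var >> LOG2_N                         # divide by n with a shift
    return mean, var

print(batch_mean_variance([2, -2] * 128))  # (0, 4)
print(batch_mean_variance([3] * 256))      # (3, 0): a stuck sampler has zero variance
```

Truncating $\bar{x}$ to an integer discards its fractional part, which matters when the target mean is 0; a practical design might instead compare the unshifted accumulator (i.e. $n\bar{x}$) against a correspondingly scaled bound.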
These tests are much more effective than the low cost variant at detecting bugs and errors in the sampler, as well as more sophisticated physical attacks. For example, an attack that narrows the range of values output by an error sampler would make the LWE instances much easier to solve; the calculations here would detect an attack of this type and report an error. These tests also check that the distribution obtained in practice has the mean and variance required theoretically.
There are online alternatives to these equations, proposed by Welford [Wel62]. However, these require an explicit division for every update (and hence fractional or floating-point arithmetic), which would be inefficient in hardware. For software, they would be a better option, also providing continuously updated test results rather than testing only after a fixed number of trials.
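For software, the online alternative can be sketched with Welford's algorithm as below. The class name `RunningStats` is hypothetical, and the variance divides by $n$ to match the $\bar{s}$ definition used above (dividing by $n-1$ instead gives the unbiased sample variance). Note the division by `self.n` in every update, which is what makes this approach unattractive in hardware.

```python
class RunningStats:
    """Welford's online mean/variance; variance divides by n, matching s_bar above."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # the per-sample division that is costly in hardware
        self.m2 += delta * (x - self.mean)   # uses both the old and the updated mean

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

# For this data the mean is 5 and the divide-by-n variance is 4.
stats = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(x)
```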
C. Expensive
This test includes those in the previous categories, as well as a chi-squared test comparing observed and expected values, essentially ensuring that the output distribution is exactly what we require from a theoretical standpoint. It will also detect poor pseudo-randomness, as well as programming errors, malicious error sampler designs, or erroneous activity caused by damage to the device. For this we require a lookup table storing a histogram of counts for each output of the error sampler, i.e. the observed values $O_i$. These are compared against the expected values $E_i$ derived from the CDT table, to verify that the frequencies of the outputs are valid. The observed and expected values are then used in a chi-squared goodness-of-fit test via the calculation $\chi^2 = \sum_{i} (O_i - E_i)^2 / E_i$. If this test statistic lies within the pre-determined bounds, we fail to reject the null hypothesis that the samples follow the target distribution. Like the previous tests, these bounds are known in advance and are hence pre-stored on the device. Observed values generated from a poor source of pseudo-randomness would be detected here, because the expected values have been calculated assuming theoretically sufficient randomness.

[Table I: summary of the proposed tests, with columns Test Level, Test Description, and Test Formula; the Low Cost level checks for repetitions using a counter.]
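A floating-point reference for the goodness-of-fit statistic is sketched below in Python, assuming a discrete Gaussian target truncated to $[-15, 15]$ with $\sigma = 3.33$; the truncation point and the function names are hypothetical. Expected counts are $E_z = n p_z$, and any output outside the support is treated as an immediate failure.

```python
import math

def discrete_gaussian_probs(sigma: float, tail: int) -> dict:
    """Probabilities of a discrete Gaussian on the integers in [-tail, tail]."""
    weights = {z: math.exp(-z * z / (2 * sigma * sigma)) for z in range(-tail, tail + 1)}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}

def chi_squared_statistic(samples, probs) -> float:
    """Pearson statistic: sum over outputs z of (O_z - E_z)^2 / E_z, with E_z = n * p_z."""
    n = len(samples)
    observed = dict.fromkeys(probs, 0)
    for x in samples:
        if x not in observed:
            return math.inf   # an output outside the support fails immediately
        observed[x] += 1
    return sum((observed[z] - n * p) ** 2 / (n * p) for z, p in probs.items())

probs = discrete_gaussian_probs(3.33, 15)
zeroed_stat = chi_squared_statistic([0] * 256, probs)  # a zeroing fault gives a huge statistic
```

The resulting statistic would then be compared with a pre-stored $\chi^2_\alpha$ bound; for a goodness-of-fit test the degrees of freedom are the number of bins minus one. Because the tail bins have very small expected counts for $n = 256$, a common rule of thumb is to merge bins until each expected count is at least about 5; otherwise a single rare but legitimate output can inflate the statistic.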
V. HARDWARE DESIGN
The hardware designs of the fault attack countermeasures have been implemented as separate synthesizable HDL modules for the three categories discussed above. The user may choose any of these in the design as the security level or resource budget of the system allows. The hardware results are shown in Table II, and a brief overview follows.
The low cost countermeasure is the simplest of the three and requires a pipeline register to hold the previous valid sampler output ($x_i$). A comparator compares the $i$-th and $(i+1)$-th valid sampler outputs and, in case of a match, a counter (ctr) is incremented. An alarm flag is raised when the counter exceeds a user-defined legal bound $c$.
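The low cost datapath can be modelled behaviourally in Python as below. This sketch relies on two hypothetical assumptions that the design description does not state: the counter resets whenever two consecutive outputs differ (so it measures the length of the current run), and the alarm is sticky once raised. The class name `RepetitionMonitor` is also hypothetical.

```python
class RepetitionMonitor:
    """Behavioural model: previous-value register, comparator, counter, bound c.

    Assumptions (hypothetical): ctr resets on a mismatch, and the alarm is sticky.
    """

    def __init__(self, bound_c: int):
        self.bound_c = bound_c
        self.prev = None     # pipeline register holding the previous output x_i
        self.ctr = 0
        self.alarm = False

    def push(self, x: int) -> bool:
        """Feed one valid sampler output; return the current alarm flag."""
        if self.prev is not None and x == self.prev:
            self.ctr += 1    # comparator match: x_{i+1} == x_i
        else:
            self.ctr = 0
        self.prev = x
        if self.ctr > self.bound_c:
            self.alarm = True
        return self.alarm

monitor = RepetitionMonitor(bound_c=10)
flags = [monitor.push(0) for _ in range(12)]   # zeroing fault: constant output
print(flags.index(True))                       # 11
```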
The standard-level countermeasure calculates $\bar{x}$ and $\bar{s}$, which requires accumulating the sampler outputs and their squares in two separate accumulation registers. For $\bar{x}$, every $n = 256$ samples a power-of-two division (by 256) is carried out by simply right-shifting the accumulator value by 8, to evaluate $(\sum_{i=1}^{n} x_i)/n$. Similarly, for the $\bar{s}$ calculation, we square the value of the first accumulator, divide it by $n$, and subtract the result from the sum of the squared samples to calculate $\bar{s} = \left(\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n\right)/n$. By keeping $n$ a power of two, the actual division is avoided, which is preferable since division is slow and expensive in hardware. Hence, the division by $n$ (to obtain approximations of $\bar{x}$ and $\bar{s}$) and by $\sqrt{n}$ (in the approximation of the standard error $SE_{\bar{x}}$) is simply a right shift by 8 and 4, respectively. The values of $t_{\alpha/2}$ and $n/s$ are pre-calculated. To keep the hardware generic, DSP multipliers are avoided. The standard test can raise either of its two alarms if the corresponding hypothesis bound is exceeded.
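One way to turn the two accumulators into the mean and variance tests without any square root or run-time division is to square both sides of the comparisons, as sketched below in Python. This is our own rearrangement, assuming the known target variance $s$ is used in the standard error (so the mean test becomes $(\sum x_i - n\mu)^2 > t_{\alpha/2}^2 \, s \, n$) and that the variance statistic $n\bar{s}/s$ is compared against lower and upper $\chi^2$ bounds; it is not necessarily the exact datapath summarised in Table I. All names, and the numeric thresholds in the usage example, are placeholders rather than real critical values.

```python
LOG2_N = 8
N = 1 << LOG2_N   # n = 256

def precompute_thresholds(mu, target_var, t_crit, chi2_lo, chi2_hi):
    """Design-time constants, stored on the device (names are hypothetical)."""
    return {
        "n_mu": N * mu,
        "mean_sq_bound": t_crit * t_crit * target_var * N,  # bound on (S - n*mu)^2
        "var_lo": chi2_lo * target_var,                      # bounds on n * s_bar
        "var_hi": chi2_hi * target_var,
    }

def standard_test(acc, acc_sq, th):
    """Return (mean_alarm, var_alarm) using only the accumulators, a shift and constants."""
    dev = acc - th["n_mu"]
    mean_alarm = dev * dev > th["mean_sq_bound"]             # squared t-style comparison
    n_var = acc_sq - ((acc * acc) >> LOG2_N)                 # n * s_bar (truncated)
    var_alarm = not (th["var_lo"] <= n_var <= th["var_hi"])  # chi-squared variance test
    return mean_alarm, var_alarm

# Placeholder thresholds for a target mean of 0 and target variance of 4.
th = precompute_thresholds(mu=0, target_var=4, t_crit=2.6, chi2_lo=200.0, chi2_hi=315.0)

def check_batch(xs):
    return standard_test(sum(xs), sum(x * x for x in xs), th)

print(check_batch([2, -2] * 128))  # (False, False)
print(check_batch([0] * 256))      # (False, True): zeroing caught by the variance test
print(check_batch([3] * 256))      # (True, True)
```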
The expensive countermeasure is so named because of the hardware overhead it incurs. It requires an array of registers whose depth equals the number of possible error sampler outputs (32 for $\sigma = 3.33$). This register array holds the sample counts and is initialised with zeros when the system resets. For each sampler output generated, the register in the ctr array corresponding to the observed output value is incremented, keeping count of the observed values. As soon as the $n$-th sampler output has been used to update the ctr array, the observed values are copied to a shadow array for the chi-squared calculations, while the original observed-values array is reset for the next batch of samples. For each of the 32 observed values, the pre-stored expected value is subtracted, the difference is squared, and the division is carried out by multiplying with pre-calculated reciprocals of the expected values. After 32 cycles, the accumulated value gives the test statistic; if it does not lie within the legal bound, an alarm is raised.
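The histogram and the division-free chi-squared accumulation can be modelled in Python as below. The sketch assumes positive integer expected counts and reciprocals stored with a hypothetical 16 fractional bits; the function names are illustrative. Multiplying by the stored reciprocal replaces the division by $E_i$, so the statistic comes out scaled by $2^{16}$ and must be compared with a bound scaled the same way.

```python
FRAC_BITS = 16   # hypothetical precision of the stored reciprocals

def histogram(samples, support_min, num_bins):
    """Count occurrences of each output; one counter register per possible value."""
    ctr = [0] * num_bins
    for x in samples:
        idx = x - support_min
        if not 0 <= idx < num_bins:
            raise ValueError(f"sampler output {x} outside the expected support")
        ctr[idx] += 1
    return ctr

def reciprocal_table(expected):
    """Design-time constants round(2^FRAC_BITS / E_i); expected counts must be positive."""
    return [round((1 << FRAC_BITS) / e) for e in expected]

def chi2_fixed_point(observed, expected, recips):
    """Sum of (O_i - E_i)^2 * recip_i: one multiply per bin instead of a division.

    The result is the chi-squared statistic scaled by 2^FRAC_BITS.
    """
    acc = 0
    for o, e, r in zip(observed, expected, recips):
        d = o - e
        acc += d * d * r
    return acc

expected = [4, 8, 4]                       # toy integer expected counts for n = 16
recips = reciprocal_table(expected)
obs = histogram([0] * 16, support_min=-1, num_bins=3)
print(obs, chi2_fixed_point(obs, expected, recips) >> FRAC_BITS)  # [0, 16, 0] 16
```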
VI. RESULTS
The hardware results for the proposed countermeasures, implemented on a Xilinx Artix-7 FPGA, are shown in Table II. To fully evaluate the performance of the proposed countermeasures, we integrate each test with a constant-time CDT sampler. The CDT sampler design has previously been shown to be the most efficient method in hardware [HKR+18], using minimal hardware resources while maintaining high throughput, requiring 6 clock cycles per sample.
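For readers unfamiliar with CDT sampling, a generic cumulative distribution table sampler is sketched below in Python. It is not the [HKR+18] design: the 32-bit table precision, the truncation at $\pm 15$ and the function names are hypothetical, the table is built with floating point, and Python gives no constant-time guarantees. It does show the idea of scanning the full table without an early exit and then applying a random sign.

```python
import math
import secrets

PREC = 32   # hypothetical table precision in bits

def build_cdt(sigma: float, tail: int) -> list:
    """Cumulative table for |Z|, Z a discrete Gaussian truncated to [-tail, tail]."""
    rho = [math.exp(-k * k / (2 * sigma * sigma)) for k in range(tail + 1)]
    weights = [rho[0]] + [2 * r for r in rho[1:]]   # |Z| = k > 0 arises from both +k and -k
    total = sum(weights)
    table, cum = [], 0.0
    for w in weights[:-1]:                          # final entry is implicitly 2^PREC
        cum += w
        table.append(int(cum / total * (1 << PREC)))
    return table

def cdt_sample(table) -> int:
    """Scan the whole table (no early exit), then apply a uniformly random sign."""
    r = secrets.randbits(PREC)
    mag = 0
    for t in table:
        mag += int(r >= t)
    return -mag if secrets.randbits(1) else mag

table = build_cdt(3.33, 15)
samples = [cdt_sample(table) for _ in range(5000)]   # mean near 0, variance near 3.33^2
```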
The low cost countermeasure can be implemented in only 3 slices on the FPGA. Moreover, it has no impact on the throughput of the error sampler. With almost zero degradation in area or performance, this countermeasure is therefore highly relevant to hardware implementations of any lattice-based scheme, while protecting against many potential attacks, faults, and errors.
The standard countermeasure is realised using slightly more hardware resources, but still remains relatively inexpensive. On its own, it requires 24 slices, which accounts for around 44% of the area of the CDT sampler with the integrated countermeasure. However, this countermeasure, like the low cost one, has a fixed cost and would not require additional hardware resources if the error sampler were slower and/or larger. The standard countermeasure also has minimal impact on performance, as most of the calculations are computed in parallel with the error sampler. Overall, it requires only one additional clock cycle to complete its calculations after the error sampler has finished generating its $n$ samples.
The expensive countermeasure requires significantly more hardware resources than the ones previously proposed, essentially because it must keep a histogram of all the observed values output from the error sampler. On its own, the countermeasure uses 126 slices, which is nearly 4x larger than the CDT sampler alone. Once integrated with the CDT sampler, the design requires either 129 or 149 slices, depending on whether block RAM is used, with the countermeasure taking around 85% of the overall area. The expensive countermeasure has only a small impact on performance: the calculations needed at the end of every $n$ error samples take just 32 clock cycles.
VII. APPLICATIONS
Typically, a side-channel countermeasure flags malicious or erroneous activity by invalidating the output, returning logical false (⊥) instead. This symbol is also typically used in cryptography to signify incorrect decryption or verification. In hardware, the countermeasures could also be linked to the reset of the design to ensure that the device withstands the attack.
Alternatively, many lattice-based cryptographic schemes include conditionals, such as an if-statement, in the final stages of their protocol, typically for security purposes. This is especially true for signature schemes, which require rejection conditions so that no information about the secret key is leaked by the signature. Among the NIST post-quantum candidates, this can be seen in Kyber [Fig. 4, Line 22]. Extending this conditional would have little impact on the performance of the scheme and would be a suitable place to include the results of our proposed countermeasures. Specifically, the ciphertext or signature should only be output if the countermeasures output 'accept', i.e. the error sampler is operating correctly, in combination with the conditionals that are already in place.
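The gating logic described above might look like the following Python sketch, where `None` stands in for ⊥. The `SamplerStatus` fields and the `release` function are hypothetical names, and a real implementation would also need this final check to be constant-time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SamplerStatus:
    """Alarm flags raised by the three countermeasure levels (hypothetical fields)."""
    repetition_alarm: bool = False
    mean_alarm: bool = False
    var_alarm: bool = False
    chi2_alarm: bool = False

    def accept(self) -> bool:
        return not (self.repetition_alarm or self.mean_alarm
                    or self.var_alarm or self.chi2_alarm)

def release(output: bytes, scheme_check_ok: bool, status: SamplerStatus) -> Optional[bytes]:
    """Output the ciphertext/signature only if the scheme's own check and every
    sampler test pass; otherwise return None, standing in for the failure symbol."""
    if scheme_check_ok and status.accept():
        return output
    return None

print(release(b"ct", True, SamplerStatus()))                # b'ct'
print(release(b"ct", True, SamplerStatus(var_alarm=True)))  # None
```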
Another application of these countermeasures is to cryptographic outputs that follow a normal distribution. For example, the lattice-based signature scheme BLISS [DDLL13] has outputs that follow a Gaussian distribution. The proposed tests could therefore also be applied to validate the signature outputs of BLISS, and of any other lattice-based cryptographic scheme whose outputs have a normal shape, which might be more efficient than other countermeasures such as verify-after-sign.
VIII. CONCLUSIONS
The aim of this paper is to propose cheap and efficient fault attack countermeasures for error samplers in lattice-based cryptography. These tests are important not only to protect against fault attacks and side-channel analysis, but also to provide confidence that the error distributions produced in real-world applications are correct. The results for the proposed designs show that it is possible to protect against attacks on error samplers with little or no effect on the efficiency or area consumption of the module. The proposed novel countermeasures not only thwart fault attacks but also allow other errors (such as bugs, environmental damage, and programming errors) to be detected, which is particularly important for IoT applications.
