    Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults

    In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles' heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay.
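    The paper's concrete code construction is not reproduced in the abstract, but the core idea of EDC-protected control logic can be sketched: assign each FSM state a codeword from a code with large minimum distance, so that any fault flipping fewer than d_min state-register bits lands outside the code and raises an alarm. The Python sketch below is a minimal illustration under assumed parameters (8-bit state register, minimum distance 4, greedy codeword selection); it is not the paper's design.

```python
# Illustrative sketch (assumed parameters, not the paper's construction):
# FSM states get codewords of a code with minimum distance d_min, so a fault
# flipping fewer than d_min state-register bits yields an invalid word.

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def pick_codewords(n_states: int, width: int, d_min: int) -> list[int]:
    """Greedily pick n_states codewords with pairwise distance >= d_min."""
    chosen: list[int] = []
    for cand in range(1 << width):
        if all(hamming(cand, c) >= d_min for c in chosen):
            chosen.append(cand)
            if len(chosen) == n_states:
                return chosen
    raise ValueError("state register too narrow for requested distance")

STATES = pick_codewords(n_states=4, width=8, d_min=4)
VALID = set(STATES)

def state_check(reg: int) -> bool:
    """Concurrent checker: True iff the state register holds a valid codeword."""
    return reg in VALID

# A double-bit fault cannot reach another valid codeword when d_min = 4:
faulty = STATES[0] ^ 0b0000_0011
assert not state_check(faulty)
```

    With d_min = 4, an attacker must flip at least four state bits within one cycle to reach another valid codeword undetected, which is the kind of effort-versus-area trade-off the paper's metric captures.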

    A mobile data acquisition system

    A mobile data acquisition system (MobiDAQ) was developed for the ATLAS central hadronic calorimeter (TileCal). MobiDAQ was designed to test the functionality of the TileCal front-end electronics and to acquire calibration data before the final back-end electronics were built and tested. MobiDAQ was also used to record the first cosmic ray events acquired by an ATLAS subdetector in the underground experimental area.

    On the Improving of Approximate Computing Quality Assurance

    Approximate computing (AC) has been predominantly recommended for implementation in error-tolerant applications, as it offers reduced resource usage, e.g., area and power, in exchange for a trade-off in output quality. However, AC has not yet been adopted in commercial designs, as it still falls short of providing good enough quality. Thus, continued research in the field of improving the quality of AC designs is indispensable. In this direction, a recent study exploited the use of machine learning (ML) to improve output quality. Nonetheless, the idea of quality assurance in AC designs could be improved in many aspects. In the work we present in this thesis, we propose a few practical methods to improve an ML-based quality assurance methodology, which consists of an ML model that selects the most suitable design from a library of AC circuits. For instance, we extend the library of AC designs used for the ML-based approach with larger data path circuits. Larger designs, however, result in an exponential growth of complexity. Thus, we propose the use of data pre-processing to reduce this hurdle by prioritizing designs based on their physical properties. Another direction for improving AC circuit designs in general, and the ML-based model in particular, is design space exploration (DSE). We therefore propose a novel DSE that drastically reduces the design space based on the targets set for area, latency and power of the AC circuit. Moreover, even with a narrowed design space, the number of AC designs to be assessed for their quality could be enormous. Thus, as part of this thesis, we propose a DSE that uses intricate mathematical modeling of designs to assess their quality. In another effort to improve quality assurance for AC designs, we introduce a highly reliable model with minimal overhead. This is achieved by using redundant AC modules to form an approximate quadruple modular redundancy (AQMR) design. The proposed AQMR is superior to exact triple modular redundancy (TMR), offering better reliability on top of the resource savings resulting from the implementation of AC.
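    As an illustration of the AQMR idea, here is a hedged behavioural sketch: four diverse approximate replicas are voted with a median-of-four rule, so a single faulty replica is outvoted while the surviving outputs stay within the approximation error. The truncation-based adders and the median voter are assumptions for illustration; the thesis's actual modules and voter may differ.

```python
# Behavioural sketch of approximate quadruple modular redundancy (AQMR).
# The replica design (bit truncation) and voter (median-of-four) are assumed.

def approx_add(a: int, b: int, lsb_cut: int) -> int:
    """Toy approximate adder: truncate the low `lsb_cut` bits (area/power proxy)."""
    mask = ~((1 << lsb_cut) - 1)
    return (a & mask) + (b & mask)

def aqmr_vote(a: int, b: int, faulty_replica: int | None = None) -> int:
    """Four diverse approximate replicas; a single fault is outvoted."""
    outs = []
    for i, cut in enumerate((0, 1, 1, 2)):      # diverse truncation per replica
        o = 0 if i == faulty_replica else approx_add(a, b, cut)
        outs.append(o)                          # faulty replica modelled as stuck-at-0
    outs.sort()
    return (outs[1] + outs[2]) // 2             # median of four

print(aqmr_vote(100, 23))                       # close to the exact sum 123
print(aqmr_vote(100, 23, faulty_replica=0))     # fault masked, still close to 123
```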

    architect: Arbitrary-precision Constant-hardware Iterative Compute

    Many algorithms feature an iterative loop that converges to the result of interest. The numerical operations in such algorithms are generally implemented using finite-precision arithmetic, either fixed or floating point, most of which operate least-significant digit first. This results in a fundamental problem: if, after some time, the result has not converged, is this because we have not run the algorithm for enough iterations or because the arithmetic in some iterations was insufficiently precise? There is no easy way to answer this question, so users will often over-budget precision in the hope that the answer will always be to run for a few more iterations. We propose a fundamentally new approach: armed with the appropriate arithmetic able to generate results from most-significant digit first, we show that fixed compute-area hardware can be used to calculate an arbitrary number of algorithmic iterations to arbitrary precision, with both precision and iteration index increasing in lockstep. Thus, datapaths constructed following our principles demonstrate efficiency over their traditional arithmetic equivalents where the latter's precisions are either under- or over-budgeted for the computation of a result to a particular accuracy. For the execution of 100 iterations of the Jacobi method, we obtain a 1.60x increase in frequency and 15.7x LUT and 50.2x flip-flop reductions over a 2048-bit parallel-in, serial-out traditional arithmetic equivalent, along with 46.2x LUT and 83.3x flip-flop decreases versus the state-of-the-art online arithmetic implementation.
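    The dilemma the paper targets can be mimicked in software: if working precision grows in lockstep with the iteration index, no up-front precision budget is needed. The sketch below uses Python's decimal module as a stand-in for the paper's MSD-first (online) arithmetic hardware; the 2x2 system and the one-digit-per-iteration schedule are illustrative assumptions.

```python
# Software analogue (decimal digits stand in for MSD-first hardware digits):
# working precision grows with the iteration index, so neither iteration
# count nor precision has to be budgeted in advance.

from decimal import Decimal, getcontext

A = [[Decimal(4), Decimal(1)],     # diagonally dominant 2x2 system,
     [Decimal(1), Decimal(3)]]     # exact solution x = (1, 2)
b = [Decimal(6), Decimal(7)]

def jacobi_lockstep(n_iters: int, base_digits: int = 2) -> list:
    x = [Decimal(0), Decimal(0)]
    for k in range(n_iters):
        getcontext().prec = base_digits + k   # precision grows per iteration
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(2) if j != i)) / A[i][i]
             for i in range(2)]
    return x

print(jacobi_lockstep(50))   # refines toward [1, 2] with no precision budget
```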

    Demonstration of Inexact Computing Implemented in the JPEG Compression Algorithm using Probabilistic Boolean Logic applied to CMOS Components

    Probabilistic computing offers potential improvements in energy, performance, and area compared with traditional digital design. This dissertation quantifies energy and energy-delay tradeoffs in digital adders, multipliers, and the JPEG image compression algorithm. This research shows that energy demand can be cut in half with noise-susceptible 16-bit Kogge-Stone adders that deviate from the correct value by an average of 3% in 14 nanometer CMOS FinFET technology, while the energy-delay product (EDP) is reduced by 38%. This is achieved by reducing the power supply voltage that drives the noisy transistors. If a 19% average error is allowed, the adders are 13 times more energy-efficient and the EDP is reduced by 35%. This research demonstrates that 92% of the color space transform and discrete cosine transform circuits within the JPEG algorithm can be built from inexact components and still produce readable images. Given the case in which each binary logic gate has a 1% error probability, the color space transformation has an average pixel error of 5.4% and a 55% energy reduction compared to the error-free circuit, and the discrete cosine transformation has a 55% energy reduction with an average pixel error of 20%.
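    A hedged Monte-Carlo sketch of the underlying model: each gate output in a small ripple-carry adder flips with probability p, and the average deviation from the exact sum is measured. This illustrates probabilistic Boolean logic only in the abstract; the dissertation's 16-bit Kogge-Stone adders, supply-voltage scaling and 14 nm energy figures are not modelled here.

```python
# Monte-Carlo model of probabilistic gates: every sum/carry gate output in a
# ripple-carry adder flips with probability p (an assumed noise model).

import random

def noisy(bit: int, p: float) -> int:
    return bit ^ (random.random() < p)

def noisy_ripple_add(a: int, b: int, width: int = 8, p: float = 0.01) -> int:
    s, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= noisy(ai ^ bi ^ carry, p) << i                 # noisy sum gate
        carry = noisy((ai & bi) | (carry & (ai | bi)), p)   # noisy carry gate
    return s

random.seed(0)
trials, total_err = 10_000, 0
for _ in range(trials):
    a, b = random.randrange(256), random.randrange(256)
    total_err += abs(noisy_ripple_add(a, b) - ((a + b) & 0xFF))
print(f"mean absolute error at p = 1%: {total_err / trials:.2f}")
```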

    Theory of reliable systems

    An attempt was made to refine the current notion of system reliability by identifying and investigating attributes of a system which are important to reliability considerations. Techniques which facilitate analysis of system reliability are included. Special attention was given to the fault tolerance, diagnosability, and reconfigurability characteristics of systems.

    Design of Soft Error Robust High Speed 64-bit Logarithmic Adder

    Continuous scaling of the transistor size and reduction of the operating voltage have led to a significant performance improvement of integrated circuits. However, the vulnerability of the scaled circuits to transient data upsets, or soft errors, which are caused by alpha particles and cosmic neutrons, has emerged as a major reliability concern. In this thesis, we have investigated the effects of soft errors in combinational circuits and proposed soft error detection techniques for high speed adders. In particular, we have proposed an area-efficient 64-bit soft error robust logarithmic adder (SRA). The adder employs the carry-merge Sklansky adder architecture, in which carries are generated every 4 bits. Since the particle-induced transient, often referred to as a single event transient (SET), typically lasts for 100-200 ps, the adder uses time redundancy by sampling the sum outputs twice. The sampling instants are set 110 ps apart. In contrast to traditional time redundancy, which requires two clock cycles to generate a given output, the SRA generates an output in a single clock cycle. The sampled sum outputs are compared using a 64-bit XOR tree to detect any possible error. An energy-efficient 4-input transmission-gate-based XOR logic is implemented to reduce the delay and the power in this case. The pseudo-static logic (PSL), which has the ability to recover from a particle-induced transient, is used in the adder implementation. In comparison with the space-redundant approach, which requires hardware duplication for error detection, the SRA is 50% more area efficient. The proposed SRA is simulated for different operands with errors inserted at different nodes at the inputs, the carry merge tree, and the sum generation circuit. The simulation vectors are carefully chosen such that the SET is not masked by the error masking mechanisms inherently present in combinational circuits. Simulation results show that the proposed SRA is capable of detecting 77% of the errors. The undetected errors primarily result when the SET causes an even number of errors and when errors occur outside the sampling window.
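    The detection mechanism can be summarised behaviourally: sample the sum twice, farther apart than an SET can persist, and reduce the bitwise XOR of the two samples to a single error flag. The sketch below abstracts all timing (the 110 ps spacing becomes "glitch present in one sample only") and is an illustration, not the PSL circuit.

```python
# Behavioural sketch of the SRA's time-redundant detection (timing abstracted):
# the same sum node is sampled twice; a transient (SET) corrupts at most one
# sample, and a wide XOR reduced to one flag reveals the mismatch.

MASK64 = (1 << 64) - 1

def xor_tree_flag(sample1: int, sample2: int) -> bool:
    """64-bit XOR tree reduced (ORed) down to a single error flag."""
    return (sample1 ^ sample2) != 0

def sra_add(a: int, b: int, set_glitch: int = 0):
    exact = (a + b) & MASK64
    sample_early = exact ^ set_glitch   # transient may still be live here
    sample_late = exact                 # sampled after the SET has died out
    return sample_late, xor_tree_flag(sample_early, sample_late)

print(sra_add(1 << 40, 12345))                      # (sum, False): clean run
print(sra_add(1 << 40, 12345, set_glitch=1 << 7))   # (sum, True): SET caught
```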

    Low-overhead fault-tolerant logic for field-programmable gate arrays

    While allowing for the fabrication of increasingly complex and efficient circuitry, transistor shrinkage and count-per-device expansion have major downsides: chiefly increased variation, degradation and fault susceptibility. For this reason, design-time consideration of faults will have to be given to increasing numbers of electronic systems in the future to ensure yields, reliabilities and lifetimes remain acceptably high. Many mathematical operators commonly accelerated in hardware are suited to modification resulting in datapath error detection and correction capabilities with far lower area, performance and/or power consumption overheads than those incurred through the utilisation of more established, general-purpose fault tolerance methods such as modular redundancy. Field-programmable gate arrays are uniquely placed to allow further area savings to be made thanks to their dynamic reconfigurability. The majority of the technical work presented within this thesis is based upon a benchmark hardware accelerator (a matrix multiplier) that underwent several evolutions in order to detect and correct faults manifesting along its datapath at runtime. In the first instance, fault detectability in excess of 99% was achieved in return for 7.87% additional area and 45.5% extra latency. In the second, the ability to correct errors caused by those faults was added at the cost of 4.20% more area, while 50.7% of this (and 46.2% of the previously incurred latency overhead) was removed through the introduction of partial reconfiguration in the third. The fourth demonstrates further reductions in both area and performance overheads, of 16.7% and 8.27% respectively, through systematic data width reduction by allowing errors of less than ±0.5% of the maximum output value to propagate.
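    The abstract does not name the modification applied to the matrix multiplier, so as an illustration of low-overhead datapath error detection for this operator, here is a sketch of the classic checksum-based (ABFT-style) approach: verify the product against column checksums at O(n^2) cost instead of recomputing it at O(n^3). Whether this matches the thesis's exact scheme is an assumption.

```python
# Checksum-based (ABFT-style) verification of C = A x B -- a well-known
# low-overhead datapath check for matrix multipliers (assumed, for illustration).

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def abft_check(A, B, C) -> bool:
    """Column sums of C must equal (column sums of A) x B: an O(n^2) check."""
    n = len(A)
    col_sums_A = [sum(A[i][k] for i in range(n)) for k in range(n)]
    expect = [sum(col_sums_A[k] * B[k][j] for k in range(n)) for j in range(n)]
    return all(sum(C[i][j] for i in range(n)) == expect[j] for j in range(n))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
print(abft_check(A, B, C))   # True: product verified
C[0][0] += 1                 # inject a single datapath error
print(abft_check(A, B, C))   # False: error detected
```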

    Design and Implementation of Area and Power Efficient Low Power VLSI Circuits through Simple Byte Compression with Encoding Technique

    Transition activity is one of the major factors in power dissipation in low-power VLSI circuits, due to the charging and discharging of internal node capacitances. Power dissipation is reduced by minimizing the transition activity through proper coding techniques. In this paper, a multi-coding technique is implemented to reduce the transition activity by up to 58.26%. The speed of data transmission basically depends on the number of bits transmitted over the bus. When handling data for large applications, huge storage space is required for processing, storing, and transferring information. Data compression is an algorithm to reduce the number of bits required to represent information in a compact form. Here, a simple byte compression technique is implemented to achieve lossless data compression. This compression algorithm also reduces the encoder's computational complexity when handling huge amounts of information. The simple byte compression technique improves the compression ratio by up to 62.5%. The cumulative effect of simple byte compression combined with the multi-coding technique minimizes area and power dissipation in low-power VLSI circuits.
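    The paper's multi-coding scheme is not specified in the abstract; as a standard stand-in, the sketch below demonstrates how a coding technique cuts transition activity, using bus-invert coding: when more than half of the bus lines would toggle, the inverted word is sent together with one polarity bit.

```python
# Bus-invert coding as a stand-in illustration of transition-activity
# reduction (the paper's own multi-coding scheme is not modelled here).

def transitions(prev: int, word: int) -> int:
    return bin(prev ^ word).count("1")

def bus_invert(stream, width=8):
    """Encode a word stream; count transitions including the polarity line."""
    prev, inv_prev, total = 0, 0, 0
    for w in stream:
        if transitions(prev, w) > width // 2:    # majority of lines would toggle:
            w, inv = w ^ ((1 << width) - 1), 1   # send the inverted word instead
        else:
            inv = 0
        total += transitions(prev, w) + (inv ^ inv_prev)
        prev, inv_prev = w, inv
    return total

stream = [0x00, 0xFF, 0x0F, 0xF0]
raw = sum(transitions(p, w) for p, w in zip([0] + stream, stream))
print(raw, bus_invert(stream))   # 20 raw transitions vs 7 encoded
```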

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation, namely Constant Matrix Multiplication (CMM). Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early-exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms with future potential for even greater savings.
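    To see why CMM operators invite dedicated optimisation, consider a hand-optimised toy example: partial products shared across matrix rows replace per-coefficient multipliers with shifts and adds. The matrix and the sharing below are illustrative; the thesis searches for such structures automatically with genetic programming.

```python
# Toy CMM: y = M.x for a constant matrix M = [[5, 3], [10, 6]]. Because
# row 2 = 2 x row 1, one shared shift-add chain replaces four multipliers.
# The matrix and the sharing are hand-picked for illustration only.

def cmm_naive(x):
    return [5 * x[0] + 3 * x[1], 10 * x[0] + 6 * x[1]]   # four multiplies

def cmm_shared(x):
    t = (x[0] << 2) + x[0] + (x[1] << 1) + x[1]   # 5*x0 + 3*x1, shifts + adds
    return [t, t << 1]                            # row 2 reuses row 1

x = [7, 9]
assert cmm_naive(x) == cmm_shared(x)
print(cmm_shared(x))   # [62, 124]
```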