31 research outputs found

    Very-Large-Scale-Integration Circuit Techniques in Internet-of-Things Applications

    Get PDF
    Heading towards the era of Internet-of-things (IoT) means both opportunity and challenge for the circuit-design community. In a system where billions of devices are equipped with the ability to sense, compute, communicate with each other and perform tasks in a coordinated manner, security and power management are among the most critical challenges. Physically unclonable function (PUF) emerges as an important security primitive in hardware-security applications; it provides an object-specific physical identifier hidden within the intrinsic device variations, which is hard to expose and reproduce by adversaries. Yet, designing a compact PUF robust to noise, temperature and voltage remains a challenge. This thesis presents a novel PUF design approach based on a pair of ultra-compact analog circuits whose output is proportional to absolute temperature. The proposed approach is demonstrated through two works: (1) an ultra-compact and robust PUF based on voltage-compensated proportional-to-absolute-temperature voltage generators that occupies 8.3× less area than the previous work with the similar robustness and twice the robustness of the previously most compact PUF design and (2) a technique to transform a 6T-SRAM array into a robust analog PUF with minimal overhead. In this work, similar circuit topology is used to transform a preexisting on-chip SRAM into a PUF, which further reduces the area in (1) with no robustness penalty. In this thesis, we also explore techniques for power management circuit design. Energy harvesting is an essential functionality in an IoT sensor node, where battery replacement is cost-prohibitive or impractical. Yet, existing energy-harvesting power management units (EH PMU) suffer from efficiency loss in the two-step voltage conversion: harvester-to-battery and battery-to-load. We propose an EH PMU architecture with hybrid energy storage, where a capacitor is introduced in addition to the battery to serve as an intermediate energy buffer to minimize the battery involvement in the system energy flow. Test-case measurements show as much as a 2.2× improvement in the end-to-end energy efficiency. In contrast, with the drastically reduced power consumption of IoT nodes that operates in the sub-threshold regime, adaptive dynamic voltage scaling (DVS) for supply-voltage margin removal, fully on-chip integration and high power conversion efficiency (PCE) are required in PMU designs. We present a PMU–load co-design based on a fully integrated switched-capacitor DC-DC converter (SC-DC) and hybrid error/replica-based regulation for a fully digital PMU control. The PMU is integrated with a neural spike processor (NSP) that achieves a record-low power consumption of 0.61 ”W for 96 channels. A tunable replica circuit is added to assist the error regulation and prevent loss of regulation. With automatic energy-robustness co-optimization, the PMU can set the SC-DC’s optimal conversion ratio and switching frequency. The PMU achieves a PCE of 77.7% (72.2%) at VIN = 0.6 V (1 V) and at the NSP’s margin-free operating point

    Design of variation-tolerant synchronizers for multiple clock and voltage domains

    Get PDF
    PhD ThesisParametric variability increasingly affects the performance of electronic circuits as the fabrication technology has reached the level of 32nm and beyond. These parameters may include transistor Process parameters (such as threshold voltage), supply Voltage and Temperature (PVT), all of which could have a significant impact on the speed and power consumption of the circuit, particularly if the variations exceed the design margins. As systems are designed with more asynchronous protocols, there is a need for highly robust synchronizers and arbiters. These components are often used as interfaces between communication links of different timing domains as well as sampling devices for asynchronous inputs coming from external components. These applications have created a need for new robust designs of synchronizers and arbiters that can tolerate process, voltage and temperature variations. The aim of this study was to investigate how synchronizers and arbiters should be designed to tolerate parametric variations. All investigations focused mainly on circuit-level and transistor level designs and were modeled and simulated in the UMC90nm CMOS technology process. Analog simulations were used to measure timing parameters and power consumption along with a “Monte Carlo” statistical analysis to account for process variations. Two main components of synchronizers and arbiters were primarily investigated: flip-flop and mutual-exclusion element (MUTEX). Both components can violate the input timing conditions, setup and hold window times, which could cause metastability inside their bistable elements and possibly end in failures. The mean-time between failures is an important reliability feature of any synchronizer delay through the synchronizer. The MUTEX study focused on the classical circuit, in addition to a number of tolerance, based on increasing internal gain by adding current sources, reducing the capacitive loading, boosting the transconductance of the latch, compensating the existing Miller capacitance, and adding asymmetry to maneuver the metastable point. The results showed that some circuits had little or almost no improvements, while five techniques showed significant improvements by reducing τ and maintaining high tolerance. Three design approaches are proposed to provide variation-tolerant synchronizers. wagging synchronizer proposed to First, the is significantly increase reliability over that of the conventional two flip-flop synchronizer. The robustness of the wagging technique can be enhanced by using robust τ latches or adding one more cycle of synchronization. The second approach is the Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly detecting a metastable event and correcting it by enforcing the previously stored logic value. This technique significantly reduces the resolution time down from uncertain synchronization technique is proposed to transfer signals between Multiple- Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional level-shifters between the domains or multiple power supplies within each domain. This interface circuit uses a synchronous set and feedback reset protocol which provides level-shifting and synchronization of all signals between the domains, from a wide range of voltage-supplies and clock frequencies. Overall, synchronizer circuits can tolerate variations to a greater extent by employing the wagging technique or using a MADAC latch, while MUTEX tolerance can suffice with small circuit modifications. Communication between MVD/MCD can be achieved by an asynchronous handshake without a need for adding level-shifters.The Saudi Arabian Embassy in London, Umm Al-Qura University, Saudi Arabi

    Fast All-Digital Clock Frequency Adaptation Circuit for Voltage Droop Tolerance

    Get PDF
    International audienceNaive handling of supply voltage droops in synchronous circuits results in conservative bounds on clock speeds, resulting in poor performance even if droops are rare. Adaptive strategies detect such potentially hazardous events and either initiate a rollback to a previous state or proactively reduce clock speed in order to prevent timing violations. The performance of such solutions critically depends on a very fast response to droops. However, state-of-the-art solutions incur synchronization delay to avoid that the clock signal is affected by metastability. Addressing the challenges discussed by Keith Bowman in his ASYNC 2017 keynote talk, we present an all-digital circuit that can respond to droops within a fraction of a clock cycle. This is achieved by delaying clock signals based on measurement values while they undergo synchronization simultaneously. We verify our solution by formally proving correctness, complemented by VHDL and Spice simulations of a 65 nm ASIC design confirming the theoretically obtained results

    Threat Analysis, Countermeaures and Design Strategies for Secure Computation in Nanometer CMOS Regime

    Get PDF
    Advancements in CMOS technologies have led to an era of Internet Of Things (IOT), where the devices have the ability to communicate with each other apart from their computational power. As more and more sensitive data is processed by embedded devices, the trend towards lightweight and efficient cryptographic primitives has gained significant momentum. Achieving a perfect security in silicon is extremely difficult, as the traditional cryptographic implementations are vulnerable to various active and passive attacks. There is also a threat in the form of hardware Trojans inserted into the supply chain by the untrusted third-party manufacturers for economic incentives. Apart from the threats in various forms, some of the embedded security applications such as random number generators (RNGs) suffer from the impacts of process variations and noise in nanometer CMOS. Despite their disadvantages, the random and unique nature of process variations can be exploited for generating unique identifiers and can be of tremendous use in embedded security. In this dissertation, we explore techniques for precise fault-injection in cryptographic hardware based on voltage/temperature manipulation and hardware Trojan insertion. We demonstrate the effectiveness of these techniques by mounting fault attacks on state-of-the-art ciphers. Physically Unclonable Functions (PUFs) are novel cryptographic primitives for extracting secret keys from complex manufacturing variations in integrated circuits (ICs). We explore the vulnerabilities of some of the popular strong PUF architectures to modeling attacks using Machine Learning (ML) algorithms. The attacks use silicon data from a test chip manufactured in IBM 32nm silicon-on-insulator (SOI) technology. Attack results demonstrate that the majority of strong PUF architectures can be predicted to very high accuracies using limited training data. We also explore the techniques to exploit unreliable data from strong PUF architectures and effectively use them to improve the prediction accuracies of modeling attacks. Motivated by the vulnerabilities of existing PUF architectures, we present a novel modeling attack resistant PUF architecture based on non-linear computing elements. Post-silicon validation results are used to demonstrate the effectiveness of the non-linear PUF architecture against modeling and fault-injection attacks. Apart from the techniques to improve the security of PUF circuits, we also present novel solutions to improve the performance of PUF circuits from the perspectives of IC fabrication and system/protocol design. Finally, we present a statistical benchmark suite to evaluate PUFs in conceptualization phase and also to enable fine-grained security assessments for varying PUF parameters. Data compressibility analyses for validating the statistical benchmark suite are also presented

    Area- and Energy- Efficient Modular Circuit Architecture for 1,024-Channel Parallel Neural Recording Microsystem.

    Full text link
    This research focuses to develop system architectures and associated electronic circuits for a next generation neuroscience research tool, a massive-parallel neural recording system capable of recording 1,024 channels simultaneously. Three interdependent prototypes have been developed to address major challenges in realization of the massive-parallel neural recording microsystems: minimization of energy and area consumption while preserving high quality in recordings. First, a modular 128-channel Δ-ΔΣ AFE using the spectrum shaping has been designed and fabricated to propose an area-and energy efficient solution for neural recording AFEs. The AFE achieved 4.84 fJ/C−s·mm2 figure of merit that is the smallest the area-energy product among the state-of-the-art multichannel neural recording systems. It also features power and area consumption of 3.05 ”W and 0.05 mm2 per channel, respectively while exhibiting 63.3 dB signal-to-noise ratio with 3.02 ”Vrms input referred noise. Second, an on-chip mixed signal neural signal compressor was built to reduce the energy consumption in handling and transmission of the recorded data since this occupies a large portion of the total energy consumption as the number of parallel recording increases. The compressor reduces the data rates of two distinct groups of neural signals that are essential for neuroscience research: LFP and AP without loss of informative signals. As a result, the power consumptions for the data handling and transmissions of the LFP and AP were reduced to about 1/5.35 and 1/10.54 of the uncompressed cases, respectively. In the total data handling and transmission, the measured power consumption per channel is 11.98 ”W that is about 1/9 of 107.5 ”W without the compression. Third, a compact on-chip dc-to-dc converter with constant 1 MHz switching frequency has been developed to provide reliable power supplies and enhance energy delivery efficiency to the massive-parallel neural recording systems. The dc-to-dc converter has only predictable tones at the output and it exhibits > 80% power conversion efficiency at ultra-light loads, < 100 ”W that is relevant power most of the multi-channel neural recording systems consume. The dc-to-dc converter occupies 0.375 mm2 of area which is less than 1/20 of the area the first prototype consumes (8.64 mm2).PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/133244/1/sungyun_1.pd

    Energy Efficient Hardware Design for Securing the Internet-of-Things

    Full text link
    The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security. First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency. Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work. Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd

    Générateur distribué d'horloge pour puces globalement et localement synchrones de grande taille

    Get PDF
    This thesis addresses the problem of global synchronization of large system on chip (SoC). It focuses on the study of an alternative clock generation technique to conventional clock distribution and asynchronous communication. It allows implementation of highly reliable synchronous circuit. My PhD project aims to study and implement a large network (10x10) of all digital phase-locked loop (ADPLL), containing 100 nodes generating a clock for each local digital circuitry. The prototype was implemented on silicon generating clocks in the range 903-1161 MHz. It highlights a maximum phase error of less than 40 ps between two clocks in any neighboring zones. Another important result is the analysis of phase error between two non-neighboring oscillators in distance. By studying an FPGA prototype of the network, we obtained that maximum phase error at steady state between any clock signal and the reference signal is less than three steps of the PFD quantification steps. In order to validate the performance of synchronization in ASIC, we designed an on-chip clocking error measurement circuit. This circuit has a low rate for the off-chip readout (several MHz), and a high resolution (+-2.5 ps). Reconfigurability is another attractive feature. We have explored this feature and proposed a novel topology with different configurations for nodes on the border and in the kernel of the network. This topology has an advantage in prohibiting phase error propagation and reflection.Cette thĂšse aborde le problĂšme de la synchronisation globale de grand systĂšme sur puce (SoC). Il est centrĂ© sur l'Ă©tude d'une technique de remplacement de la distribution d'horloge classique et d'une communication asynchrone. Il permet la mise en Ɠuvre de circuit synchrone trĂšs fiable. Mon projet de thĂšse vise Ă  Ă©tudier et mettre en Ɠuvre un vaste rĂ©seau (10x10) de boucle Ă  verrouillage de phase tous numĂ©rique (ADPLL), contenant 100 nƓuds gĂ©nĂ©rant une horloge pour chaque circuit numĂ©rique local. Le prototype a Ă©tĂ© rĂ©alisĂ© sur les horloges de gĂ©nĂ©ration de silicium dans la gamme de 903-1161 MHz. Elle met en Ă©vidence une erreur de phase maximale de moins de 40 ps entre deux horloges dans toutes les zones voisines. Un autre rĂ©sultat important est l'analyse de l'erreur de phase entre les deux oscillateurs non-voisins dans la distance. En Ă©tudiant un prototype FPGA du rĂ©seau, on a obtenu que l'erreur de phase maximale Ă  l'Ă©tat d'Ă©quilibre entre un signal d'horloge et le signal de rĂ©fĂ©rence est infĂ©rieur Ă  trois Ă©tapes des Ă©tapes de quantification PFD. Afin de valider les performances de la synchronisation dans ASIC, nous avons conçu un circuit d'une erreur de mesure sur la puce d'horloge. Ce circuit a un taux faible de la lecture hors puce (quelques MHz), et une rĂ©solution Ă©levĂ©e (+ -2,5 ps). Reconfiguration constitue une autre caractĂ©ristique intĂ©ressante. Nous avons explorĂ© cette fonction et a proposĂ© une nouvelle topologie avec des configurations diffĂ©rentes pour les nƓuds sur la frontiĂšre et dans le noyau du rĂ©seau. Cette topologie prĂ©sente un avantage en interdisant la propagation des erreurs de phase et de rĂ©flexion

    Online Timing Slack Measurement and its Application in Field-Programmable Gate Arrays

    Get PDF
    Reliability, power consumption and timing performance are key concerns for today's integrated circuits. Measurement techniques capable of quantifying the timing characteristics of a circuit, while it is operating, facilitate a range of benefits. Delay variation due to environmental and operational conditions, and degradation can be monitored by tracking changes in timing performance. Using the measurements in a closed-loop to control power supply voltage or clock frequency allows for the reduction of timing safety margins, leading to improvements in power consumption or throughput performance through the exploitation of better-than worst-case operation. This thesis describes a novel online timing slack measurement method which can directly measure the timing performance of a circuit, accurately and with minimal overhead. Enhancements allow for the improvement of absolute accuracy and resolution. A compilation flow is reported that can automatically instrument arbitrary circuits on FPGAs with the measurement circuitry. On its own this measurement method is able to track the "health" of an integrated circuit, from commissioning through its lifetime, warning of impending failure or instigating pre-emptive degradation mitigation techniques. The use of the measurement method in a closed-loop dynamic voltage and frequency scaling scheme has been demonstrated, achieving significant improvements in power consumption and throughput performance.Open Acces

    Realization and Formal Analysis of Asynchronous Pulse Communication Circuits

    Get PDF
    This work presents an approach to constructing asynchronous pulsed communication circuits. These circuits use small delay elements to introduce a gate level sense of time, removing the need for either a clock or handshaking signal to be part of a high-speed communication link. This construction method allows the creation of links with better than normal jitter tolerance, allowing for simple circuit architectures that can easily be made robust to radiation induced soft error. A 5Gbps radiation-hardened link, targeted at use in detector modules at the LHC, will be presented. This application presents a special challenge due to both very high radiation levels (1+MGy life time dose) and the demand for minimum resource (area, power, cable cost) use. The presented link, realized in 130nm technology, is unique in that it has low power (~50mW end to end) and very low area 0.12mm^2 including electrostatic discharge protection, and I/O amplifiers. Due to its asynchronous construction and the gate design style, the link has essentially zero power dissipation when idle, and enters and exits its idle state with no delay. In addition to the construction of the link, this presentation covers the design and analysis methodology that can be used to create other asynchronous communication circuits. The methodology achieves higher performance than conventional static technology but needs only a reasonable design effort using tools and strategies that are only mildly extended versions of those familiar to digital static designers. It is used to construct the serializer, deserializer, and self-test circuitry for the presented link. In this case, a 5Gbps SER/DES and a 2GHz parallel pseudo-random number generator are implemented in 130nm CMOS technology using a gate design style that does not dissipate static power

    Effect of Clock and Power Gating on Power Distribution Network Noise in 2D and 3D Integrated Circuits

    Get PDF
    In this work, power supply noise contribution, at a particular node on the power grid, from clock/power gated blocks is maximized at particular time and the synthetic gating patterns of the blocks that result in the maximum noise is obtained for the interval 0 to target time. We utilize wavelet based analysis as wavelets are a natural way of characterizing the time-frequency behavior of the power grid. The gating patterns for the blocks and the maximum supply noise at the Point of Interest at the specified target time obtained via a Linear Programming (LP) formulation (clock gating) and Genetic Algorithm based problem formulation (Power Gating)
    corecore