11 research outputs found

    Emerging physical unclonable functions with nanotechnology

    Get PDF
    Physical unclonable functions (PUFs) are increasingly used for authentication and identification applications as well as cryptographic key generation. An important feature of a PUF is its reliance on minute random variations in the fabricated hardware to derive a trusted random key. Currently, most PUF designs focus on exploiting process variations intrinsic to CMOS technology. In recent years, progress in emerging nanoelectronic devices has demonstrated an increase in variation as a consequence of scaling down to the nanometer regime. To date, nanotechnology-based PUFs have not been fully established, but they are expected to emerge. Initial research in this area aims to provide security primitives for emerging integrated circuits built with nanotechnology. In this paper, we review emerging nanotechnology-based PUFs.
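    To make the challenge-response behavior described above concrete, the sketch below models a PUF in the common linear additive-delay abstraction (an arbiter-style PUF): per-device random "delay" weights stand in for fabrication variation, and enrollment/verification simply replays stored challenge-response pairs. This is an illustrative model, not a design from the paper; all names and parameters are hypothetical.

```python
# Illustrative arbiter-style PUF model (hypothetical, not from the paper).
import numpy as np

class ArbiterPUFModel:
    def __init__(self, n_stages=64, seed=None):
        rng = np.random.default_rng(seed)
        # Per-device random stage delays stand in for fabrication variation.
        self.delay_weights = rng.normal(0.0, 1.0, n_stages + 1)

    def response(self, challenge):
        # Standard parity ("phi") transform of the challenge bits.
        c = np.asarray(challenge)
        phi = np.append(np.cumprod((1 - 2 * c)[::-1])[::-1], 1)
        return int(self.delay_weights @ phi > 0)

# Enrollment stores challenge-response pairs; verification replays and compares.
rng = np.random.default_rng(1)
device = ArbiterPUFModel(seed=42)                 # unique per chip
challenges = rng.integers(0, 2, size=(5, 64))
crp_table = [(c, device.response(c)) for c in challenges]
assert all(device.response(c) == r for c, r in crp_table)
print("enrolled responses:", [r for _, r in crp_table])
```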

    Overcoming nanoscale variations through statistical error compensation

    Get PDF
    Increasingly severe parameter variations observed in advanced nanoscale technologies create great obstacles to designing high-performance, next-generation digital integrated circuits (ICs). Conventional design principles impose increased design margins in power supply, device sizing, and operating frequency, leading to overly conservative designs that prevent the realization of the potential benefits of nanotechnology advances. In response, robust digital circuit design techniques have been developed to overcome processing non-idealities. Statistical error compensation (SEC) is a class of system-level, communication-inspired techniques for designing energy-efficient and robust systems. In this thesis, stochastic sensor network on chip (SSNOC), a known SEC technique, is applied to a computational kernel implemented with carbon nanotube field-effect transistors (CNFETs). With the aid of a well-developed CNFET delay distribution modeling method, circuit simulations show up to a 90× improvement in circuit yield for the SSNOC-based design over the conventional design. The results verify the robustness of an SEC-based design under CNFET-specific variations. The error resiliency of SEC allows CNFET circuits to operate with reduced design margins under relaxed processing requirements, while maintaining the desired application-level performance.
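    As a rough illustration of the statistical error compensation idea described above, the sketch below (under assumed error statistics, not the thesis' SSNOC circuits) splits a kernel into several statistically similar "sensors" and fuses their outputs so that rare, large timing-induced errors are rejected.

```python
# Hedged sketch of the SEC/SSNOC idea: fuse several noisy "sensor" outputs so
# that rare large errors are rejected. Kernel, error model, and fusion rule are
# illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
x, w = rng.normal(size=256), rng.normal(size=256)
exact = float(x @ w)

def noisy_sensor(x, w, p_err=0.05):
    """Dot product whose result suffers a rare, large-magnitude error."""
    y = float(x @ w)
    if rng.random() < p_err:                 # e.g. a timing violation under variation
        y += rng.choice([-1, 1]) * rng.uniform(10, 100)
    return y

n_sensors = 5
outputs = [noisy_sensor(x, w) for _ in range(n_sensors)]
fused = float(np.median(outputs))            # median fusion rejects outlier errors

print(f"exact={exact:+.3f}  fused={fused:+.3f}  worst sensor error="
      f"{max(abs(o - exact) for o in outputs):.3f}")
```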

    Appropriateness of Imperfect CNFET Based Circuits for Error Resilient Computing Systems

    Get PDF
    With superior device performance consistently reported at extremely scaled dimensions, low-dimensional materials (LDMs), including Carbon Nanotube Field Effect Transistor (CNFET) based technology, have shown the potential to outperform silicon for future transistors in advanced technology nodes. Studies have also demonstrated orders-of-magnitude improvements in energy efficiency possible with LDMs in comparison to silicon at competing technology nodes. However, the current fabrication processes for these materials suffer from process imperfections and still appear inadequate to compete with silicon in mainstream high-volume manufacturing. Among the LDMs, CNFETs are the most widely studied and the closest to high-volume manufacturing. Recent work has shown a significant increase in the complexity of CNFET-based systems, including the demonstration of a 16-bit microprocessor. However, the design of such systems has required significantly wider-than-usual transistors and the avoidance of certain logic combinations. The resulting complexity of several thousand transistors in such systems is still far from the requirements of high-performance general-purpose computing systems with billions of transistors. Given the current progress of CNFET fabrication processes, their introduction into mainstream manufacturing is expected to take several more years. For earlier technology adoption, CNFETs appear to be suited to error-resilient computing systems, where errors during computation can be tolerated to a certain degree. Such systems relax the need for precise circuits and a perfect process while leveraging the potential energy benefits of CNFET technology in comparison to conventional Si technology. In this thesis, we explore potential applications of an imperfect CNFET process for error-resilient computing systems, including the impact of process imperfections at the system level and methods to mitigate it. The currently most widely adopted fabrication process for CNFETs (separation and placement of solution-based CNTs) still suffers from process imperfections, mainly open CNTs due to missing CNTs in the trenches connecting the source and drain of a CNFET. A fair evaluation of the performance of CNFET-based circuits should therefore take into consideration the effect of open CNTs, which reduce drive currents. At the circuit level, this leads to failures in meeting 1) the minimum frequency requirement (due to an increase in critical path delay), and 2) the noise suppression requirement. We present a methodology to accurately capture the effect of open CNT imperfection in the state-of-the-art CNFET model, for circuit-level performance evaluation (both delay and glitch vulnerability) of CNFET-based circuits using SPICE. A Monte Carlo simulation framework is also provided to investigate the statistical effect of open CNT imperfection on circuit-level performance. We introduce essential metrics to evaluate glitch vulnerability and also provide an effective link between glitch vulnerability and circuit topology. The past few years have seen significant growth of interest in approximate computing for a wide range of applications, including signal processing, data mining, machine learning, and image and video processing. In such applications, the result quality is not compromised appreciably, even in the presence of a few errors during computation. The ability to tolerate a few errors during computation relaxes the need for precise circuits. Thus approximate circuits can be designed with fewer nodes, fewer stages, and reduced capacitance at a few nodes. Consequently, approximate circuits can reduce critical path delays and enhance noise suppression in comparison to precise circuits. We present a systematic methodology utilizing Reduced Ordered Binary Decision Diagrams (ROBDDs) for generating approximate circuits, taking a 16-bit parallel prefix CNFET adder as an example. The approximate adder generated using the proposed algorithm has a ~5x reduction in the average number of nodes failing the glitch criteria (along paths to primary outputs) and a 43.4% lower Energy Delay Product (EDP), even at high open CNT imperfection, in comparison to the ideal case of no open CNT imperfection, at a mean relative error of 3.3%. The recent boom of deep learning has been made possible by VLSI technology advancement resulting in hardware systems that can support deep learning algorithms. These hardware systems aim to satisfy the high energy-efficiency requirements of such algorithms, adopting neuromorphic computing architectures that consume significantly less energy than traditional Von Neumann architectures. Deep Neural Networks (DNNs), belonging to the deep learning domain, find use in a wide range of applications such as image classification and speech recognition. Recent hardware systems have demonstrated the implementation of complex neural networks at significantly lower power. However, the complexity of applications and the depth of DNNs are expected to increase drastically in the future, imposing demanding requirements on the scalability and energy efficiency of the hardware technology. CNFET technology can be an excellent alternative to meet the aggressive energy-efficiency requirements of future DNNs. However, degradation in circuit-level performance due to open CNT imperfection can result in timing failures, distorting the shape of the non-linear activation function and leading to a significant degradation in classification accuracy. We present a framework to obtain the sigmoid activation function considering the effect of open CNT imperfection. A digital neuron is explored to generate the sigmoid activation function, which deviates from the ideal case under an imperfect process and a reduced time period (increased clock frequency). The inherent error resilience of DNNs, on the other hand, can be utilized to mitigate the impact of the imperfect process and maintain the shape of the activation function. We use pruning of synaptic weights, which, combined with the proposed approximate neuron, significantly reduces the chance of timing failures and helps maintain the activation-function shape even at high process imperfection and higher clock frequencies. We also provide a framework to obtain the classification accuracy of Deep Belief Networks (a class of DNNs based on unsupervised learning) using the activation functions obtained from SPICE simulations. By using both approximate neurons and pruning of synaptic weights, we achieve excellent system accuracy (< 0.5% accuracy drop) with a 25% improvement in speed and a significant EDP advantage (56.7% less), even at high process imperfection, in comparison to a base configuration of a precise neuron and no pruning with the ideal process, with no area penalty. In conclusion, this thesis provides directions for the potential applicability of CNFET-based technology to error-resilient computing systems. For this purpose, we present methodologies that provide approaches to assess and design CNFET-based circuits, considering process imperfections. We develop a DBN framework for digit recognition, considering activation functions from SPICE simulations incorporating process imperfections. We demonstrate the effectiveness of using approximate neurons and synaptic weight pruning to mitigate the impact of high process imperfection on system accuracy.
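    The following sketch illustrates, under purely assumed numbers, the kind of Monte Carlo analysis of open-CNT imperfection described above: each CNFET loses tubes with some probability, drive current scales with the surviving tube count, and the resulting critical-path delay distribution determines timing yield. It is not the thesis' SPICE-based framework.

```python
# Illustrative Monte Carlo sketch of open-CNT imperfection (assumed parameters,
# not the thesis framework): missing tubes reduce drive current, which
# lengthens gate delay along a critical path.
import numpy as np

rng = np.random.default_rng(7)

def path_delay(n_gates=20, tubes_per_fet=8, p_open=0.1, t_nominal=1.0, trials=10000):
    """Return critical-path delay samples, normalized to the defect-free path."""
    # Surviving tubes per gate (at least one, else the gate is a hard failure).
    alive = rng.binomial(tubes_per_fet, 1.0 - p_open, size=(trials, n_gates))
    alive = np.maximum(alive, 1)
    # First-order model: gate delay ~ C*V/I, and I scales with surviving tubes.
    gate_delay = t_nominal * tubes_per_fet / alive
    return gate_delay.sum(axis=1) / (n_gates * t_nominal)

samples = path_delay(p_open=0.1)
t_clk = 1.15                                  # assumed 15% timing margin over nominal
print(f"mean slowdown {samples.mean():.3f}, timing yield "
      f"{(samples <= t_clk).mean():.1%} at {t_clk:.2f}x nominal clock period")
```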

    Hardware security design from circuits to systems

    Get PDF
    The security of hardware implementations is of considerable importance, as even the most secure and carefully analyzed algorithms and protocols can be vulnerable in their hardware realization. For instance, numerous successful attacks have been presented against the Advanced Encryption Standard, which is approved for top secret information by the National Security Agency. There are numerous challenges for hardware security, ranging from critical power and resource constraints in sensor networks to scalability and automation for large Internet of Things (IoT) applications. The physically unclonable function (PUF) is a promising building block for hardware security, as it exposes a device-unique challenge-response behavior which depends on process variations in fabrication. It can be used in a variety of applications including random number generation, authentication, fingerprinting, and encryption. The primary concerns for PUFs are reliability in the presence of environmental variations, area and power overhead, and process-dependent randomness of the challenge-response behavior. Carbon nanotube field-effect transistors (CNFETs) have been shown to have excellent electrical and unique physical characteristics. They are a promising candidate to replace silicon transistors in future very large scale integration (VLSI) designs. We present the Carbon Nanotube PUF (CNPUF), which is the first PUF design that takes advantage of unique CNFET characteristics. CNPUF achieves higher reliability against environmental variations and increases the resistance against modeling attacks. Furthermore, CNPUF achieves considerable power and energy reductions of 89.6% and 98%, respectively, in comparison to previous ultra-low-power PUF designs. Moreover, CNPUF allows a power-security tradeoff in an extended design, which can greatly increase the resilience against modeling attacks. Despite increasing focus on defenses against physical attacks, consistent security-oriented design of embedded systems remains a challenge, as most formalizations and security models are concerned with isolated physical components or a high-level concept. Therefore, we build on existing work on hardware security and provide four contributions to system-oriented physical defense: (i) A system-level security model to overcome the chasm between secure components and the requirements of high-level protocols; this enables synergy between component-oriented security formalizations and theoretically proven protocols. (ii) An analysis of current practices in PUF protocols using the proposed system-level security model; we identify significant issues and expose assumptions that require costly security techniques. (iii) A System-of-PUF (SoP) that utilizes the large PUF design space to achieve security requirements with minimal resource utilization; SoP requires 64% fewer gate-equivalent units than recently published schemes. (iv) A multilevel authentication protocol based on SoP which is validated using our system-level security model and which overcomes current vulnerabilities. Furthermore, this protocol offers breach recognition and recovery. Unpredictability and reliability are core requirements of PUFs: unpredictability implies that an adversary cannot sufficiently predict future responses from previous observations. Reliability is important as it increases the reproducibility of PUF responses and hence allows validation of expected responses. However, advanced machine-learning algorithms have been shown to be a significant threat to the practical validity of PUFs, as they can accurately model PUF behavior. The most effective countermeasure was shown to be the XOR-based combination of multiple PUFs, but as this approach drastically reduces reliability, it does not scale well against software-based machine-learning attacks. We analyze threats to PUF security and propose PolyPUF, a scalable and secure architecture that introduces polymorphic PUF behavior. This architecture significantly increases model-building resistance while maintaining reliability. An extensive experimental evaluation and comparison demonstrate that the PolyPUF architecture can secure various PUF configurations and is the only evaluated approach to withstand highly complex neural network machine-learning attacks. Furthermore, we show that PolyPUF consumes less energy and has less implementation overhead in comparison to lightweight reference architectures. Emerging technologies such as the Internet of Things (IoT) heavily rely on hardware security for data and privacy protection. The outsourcing of integrated circuit (IC) fabrication introduces diverse threat vectors with different characteristics, such that the security of each device has unique focal points. Hardware Trojan horses (HTHs) are a significant threat for IoT devices, as these devices process security-critical information with limited resources. HTHs for information leakage are particularly difficult to detect, as they have a minimal footprint. Moreover, constantly increasing integration complexity requires automatic synthesis to maintain the pace of innovation. We introduce the first high-level synthesis (HLS) flow that produces a threat-targeted and security-enhanced hardware design to prevent HTH injection by a malicious foundry. Through analysis of entropy loss and criticality decay, the presented algorithms implement highly resource-efficient targeted information dispersion. An obfuscation flow is introduced to camouflage the effects of dispersion and reduce the effectiveness of reverse engineering. A new metric for the combined security of the device is proposed, and dispersion and obfuscation are co-optimized to target user-supplied threat parameters under resource constraints. The flow is evaluated on existing HLS benchmarks and a new IoT-specific benchmark, and shows significant resource savings as well as adaptability. The IoT and cloud computing rely on strong confidence in the security of confidential or highly privacy-sensitive data. As (differential) power attacks can take advantage of side-channel leakage to expose device-internal secrets, side-channel leakage is a major concern and an ongoing research focus. However, countermeasures typically require expert-level security knowledge for efficient application, which limits their adoption in the highly competitive and time-constrained IoT field. We address this need by presenting the first HLS flow with a primary focus on side-channel leakage reduction. Minimal security annotation of the high-level C code is sufficient to perform automatic analysis of security-critical operations with corresponding insertion of countermeasures. Additionally, imbalanced branches are detected and corrected. For practicality, the flow can meet both resource and information leakage constraints. The presented flow is extensively evaluated on established HLS benchmarks and a general IoT benchmark. Under identical resource constraints, leakage is reduced by between 32% and 72% compared to the baseline. Under a leakage target, the constraints are achieved with 31% to 81% less resource overhead.
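    The reliability cost of XOR-combining PUFs, mentioned above, follows from a short calculation: if each underlying response flips with probability e under noise, the XOR of k independent responses flips whenever an odd number of them flip, with probability 0.5(1 - (1 - 2e)^k). The sketch below evaluates this closed form for an assumed noise rate.

```python
# Worked sketch of the XOR reliability argument. The noise rate e and the k
# values below are illustrative assumptions.

def xor_flip_probability(e: float, k: int) -> float:
    """Probability that an odd number of the k XOR-ed responses flips."""
    return 0.5 * (1.0 - (1.0 - 2.0 * e) ** k)

for k in (1, 2, 4, 8):
    print(f"k={k}: response error rate = {xor_flip_probability(0.05, k):.3f}")
# k=1: 0.050, k=2: 0.095, k=4: 0.172, k=8: 0.285 -> reliability drops quickly
```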

    Design and Simulation of an Efficient Quaternary Full-Adder Based on Carbon Nanotube Field Effect Transistor

    Get PDF
    The essential reason for implementing multilevel processing systems is to reduce the number of semiconductor elements and hence the complexity of the system. Multilevel processing systems are realized much more easily with carbon nanotube field-effect transistors (CNTFETs) than with MOSFETs, owing to the adjustable threshold voltage of CNTFETs. In this paper, an efficient quaternary full-adder based on CNTFET technology is presented, which consists of two half-adder blocks, a quaternary decoder, and a carry generator circuit. In the proposed architecture, base-two and base-four circuit design techniques are combined to take full advantage of both, namely the simple implementation and the low chip-area occupation of the entire proposed quaternary full-adder. The proposed structure is evaluated using the Stanford 32 nm CNTFET library in HSPICE. The simulation results for the proposed full-adder structure at a supply voltage of 0.9 V show a power consumption, propagation delay, and energy index of 2.67 μW, 40 ps, and 10.68 aJ, respectively.
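    For reference, the behavioral model below captures the arithmetic a quaternary full adder must implement (sum and carry of two base-4 digits plus a carry-in); it is a software sketch only and does not reflect the paper's transistor-level CNTFET realization.

```python
# Behavioral sketch of a radix-4 (quaternary) full adder, for reference only.

def quaternary_full_adder(a: int, b: int, cin: int):
    """a, b in {0..3}, cin in {0, 1}; returns (sum_digit, carry_out)."""
    assert 0 <= a < 4 and 0 <= b < 4 and cin in (0, 1)
    total = a + b + cin
    return total % 4, total // 4

def add_base4(x_digits, y_digits):
    """Ripple-add two base-4 numbers given as least-significant-digit-first lists."""
    carry, out = 0, []
    for a, b in zip(x_digits, y_digits):
        s, carry = quaternary_full_adder(a, b, carry)
        out.append(s)
    return out, carry

print(add_base4([3, 2], [1, 3]))   # 11 + 13 = 24 -> digits [0, 2] with carry 1
```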

    Multiple-Independent-Gate Field-Effect Transistors for High Computational Density and Low Power Consumption

    Get PDF
    Transistors are the fundamental elements of Integrated Circuits (ICs). The development of transistors significantly improves circuit performance. Numerous technology innovations have been adopted to maintain the continuous scaling down of transistors. Even with these innovations and efforts, the transistor size will approach the natural limitations of materials in the near future, and circuits are expected to compute in a more efficient way. From this perspective, new device concepts are desirable to exploit additional functionality. On the other hand, with the continuously increasing device density on chips, reducing power consumption has become a key concern in IC design. To overcome the limitations of Complementary Metal-Oxide-Semiconductor (CMOS) technology in computing efficiency and power reduction, this thesis introduces multiple-independent-gate Field-Effect Transistors (FETs) with silicon nanowire and FinFET structures. The device not only has the capability of polarity control, but also provides dual-threshold-voltage and steep-subthreshold-slope operation for power reduction in circuit design. By independently modulating the Schottky junctions between the metallic source/drain and the semiconductor channel, dual-threshold-voltage characteristics with controllable polarity are achieved in a single device. This property is demonstrated in both experiments and simulations. Thanks to the compact implementation of logic functions, circuit-level benchmarking shows promising performance with a configurable dual-threshold-voltage physical design, which is suitable for low-power applications. This thesis also experimentally demonstrates steep-subthreshold-slope operation in the multiple-independent-gate FETs. Based on a positive feedback induced by weak impact ionization, the measured characteristics of the device achieve a steep subthreshold slope of 6 mV/dec over 5 decades of current. A high Ion/Ioff ratio and low leakage current are also obtained simultaneously, with good reliability. Based on a physical analysis of the device operation, feasible improvements are suggested to further enhance the performance. A physics-based surface potential and drain current model is also derived for the polarity-controllable Silicon Nanowire FETs (SiNWFETs). By solving for the carrier transport at the Schottky junctions and in the channel, the core model captures the device operation with independent gate control. It can serve as the core framework for developing a complete compact model by integrating advanced physical effects. To summarize, multiple-independent-gate SiNWFETs and FinFETs are extensively studied in terms of fabrication, modeling, and simulation. The proposed device concept expands the family of polarity-controllable FETs. In addition to the enhanced logic functionality, the polarity-controllable SiNWFETs and FinFETs with dual-threshold-voltage and steep-subthreshold-slope operation are promising candidates for future IC design towards low-power applications.
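    As a small aside on the reported 6 mV/dec figure, the sketch below extracts the subthreshold swing SS = dV_G / d(log10 I_D) from an I_D-V_G sweep; the device characteristic used here is an assumed ideal exponential, not measured data.

```python
# Subthreshold-swing extraction from a synthetic (assumed) I_D-V_G sweep.
import numpy as np

vg = np.linspace(0.0, 0.03, 31)              # gate voltage sweep [V]
ss_assumed = 0.006                           # assumed ideal 6 mV/dec characteristic
i_d = 1e-12 * 10.0 ** (vg / ss_assumed)      # 1 pA off-current, exponential turn-on

# Point-wise swing dV_G / d(log10 I_D), reported in mV/dec.
swing = np.gradient(vg, np.log10(i_d)) * 1e3
decades = np.log10(i_d.max() / i_d.min())
print(f"extracted swing ≈ {swing.mean():.2f} mV/dec over {decades:.1f} decades")
```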

    Designing energy-efficient computing systems using equalization and machine learning

    Full text link
    As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches, such as near-threshold computing (NTC) and aggressive voltage scaling with shadow latches, have been proposed to get the most out of limited battery life, there is still no “silver bullet” for the increasing power-performance demands of mobile systems. Moreover, given that a mobile system may operate in a variety of environmental conditions (e.g., different temperatures) and have varying performance requirements, there is a growing need for tunable/reconfigurable systems in order to achieve energy-efficient operation. In this work we propose to address the energy-efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms. Inspired by communication systems, we developed feedback-equalization-based digital logic that changes the threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enables up to a 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved a 30% reduction in energy dissipation for pass-transistor logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near-optimal operation of static complementary CMOS logic blocks over the entire voltage range from near-threshold supply voltage to nominal supply voltage. Using energy-delay product (EDP) as a metric, we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold voltage operation with equalization, the operating frequency can be improved by up to 30% while the energy increases by less than 15%, for an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across the entire above-threshold voltage range. On the distributed adaptive algorithm front, we explored energy-efficient hardware implementations of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification operations for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, our adaptive classifier is ≈100× more energy efficient but has a ≈1% higher error rate than a complex radial basis function classifier, and is ≈10× less energy efficient but has a ≈40% lower error rate than a simple linear classifier, across a wide range of classification data sets. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves an accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) might be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (groves) and conditionally executing the rest of the forest, FoG is able to achieve much higher energy efficiency for comparable error rates. We also take advantage of the distributed nature of the FoG to achieve a high level of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification compared to conventional RF, SVM-RBF, Multi-Layer Perceptron Network (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR, but achieves 18% higher accuracy on average across all considered datasets.
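    The conditional-execution idea behind FoG can be sketched in software: evaluate a small grove of trees first and consult further groves only when the running vote is not yet confident. The grove size, confidence margin, dataset, and use of scikit-learn below are illustrative assumptions, not the thesis' hardware architecture.

```python
# Hedged software sketch of grove-wise conditional evaluation of a random forest.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=64, random_state=0).fit(Xtr, ytr)

def grove_predict(x, grove_size=8, margin=0.3):
    """Evaluate groves of trees until the top-1 vs top-2 vote margin is large enough."""
    votes = np.zeros(len(forest.classes_))
    trees_used = 0
    for start in range(0, len(forest.estimators_), grove_size):
        for tree in forest.estimators_[start:start + grove_size]:
            votes += tree.predict_proba(x.reshape(1, -1))[0]
            trees_used += 1
        top = np.sort(votes)[::-1]
        if (top[0] - top[1]) / votes.sum() >= margin:   # confident enough: stop early
            break
    return forest.classes_[votes.argmax()], trees_used

preds, used = zip(*(grove_predict(x) for x in Xte))
print(f"accuracy {np.mean(np.array(preds) == yte):.3f}, "
      f"avg trees evaluated {np.mean(used):.1f} of {len(forest.estimators_)}")
```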

    Design of robust ultra-low power platform for in-silicon machine learning

    Get PDF
    The rapid development of machine learning plays a key role in enabling next-generation computing systems with enhanced intelligence. Present-day machine learning systems adopt an "intelligence in the cloud" paradigm, resulting in heavy energy costs despite state-of-the-art performance. It is therefore of great interest to design embedded ultra-low power (ULP) platforms with in-silicon machine learning capability. A self-contained ULP platform consists of energy delivery, sensing, and information processing subsystems. This dissertation proposes techniques to design and optimize ULP platforms for in-silicon machine learning by exploring a trade-off between energy efficiency and robustness. This trade-off arises when the information processing functionality is integrated into the energy delivery or sensing subsystems, or into emerging stochastic fabrics (e.g., CMOS operating at near-threshold voltage or under voltage overscaling, and beyond-CMOS devices). This dissertation presents the Compute VRM (C-VRM) to embed information processing into the energy delivery subsystem. The C-VRM employs multiple voltage domain stacking and core swapping to achieve high total system energy efficiency in the near/sub-threshold region. A prototype IC of the C-VRM is implemented in a 1.2 V, 130 nm CMOS process. Measured results indicate that the C-VRM achieves up to 44.8% savings in system-level energy per operation compared to the conventional system, and an efficiency ranging from 79% to 83% over an output voltage range of 0.52 V to 0.6 V. This dissertation further proposes the Compute Sensor approach to embed information processing into the sensing subsystem. The Compute Sensor eliminates both the traditional sensor-processor interface and the high-SNR/high-energy digital processing by moving feature extraction and classification functions into the analog domain. Simulation results in 65 nm CMOS show that the proposed Compute Sensor can achieve a detection accuracy greater than 94.7% on the Caltech101 dataset, within 0.5% of that achieved by an ideal digital implementation. This performance is achieved with 7x to 17x lower energy than the conventional architecture for the same level of accuracy. To further explore the energy-efficiency vs. robustness trade-off, this dissertation explores the use of highly energy-efficient but unreliable stochastic fabrics to implement in-silicon machine learning kernels. In order to perform reliable computation on stochastic fabrics, this dissertation proposes to employ statistical error compensation (SEC) as an effective error compensation technique. This dissertation contributes to the portfolio of SEC by proposing embedded algorithmic noise tolerance (E-ANT) for low-overhead error compensation. E-ANT operates by reusing part of the main block as an estimator, thus embedding the estimator into the main block. System-level simulation results in a commercial 45 nm CMOS process show that E-ANT achieves up to 38% error tolerance and up to 51% energy savings compared with an uncompensated system. This dissertation also contributes to the theoretical understanding of stochastic fabrics by proposing a class of probabilistic error models that can accurately model the hardware errors on stochastic fabrics. The models are validated in a commercial 45 nm CMOS process and employed to evaluate the performance of machine learning kernels in the presence of hardware errors. Performance prediction of a support vector machine (SVM) based classifier using these models indicates that the probability of detection P_{det} estimated using the proposed model is within 3% for timing errors due to voltage overscaling when the error rate p_η ≤ 80%, within 5% for timing errors due to process variation in the near-threshold-voltage (NTV) region (0.3 V to 0.7 V), and within 2% for defect errors when the defect rate p_{saf} is between 10^{-3} and 20%, compared with HDL simulation results. Employing the proposed error model and evaluation methodology, this dissertation explores the use of distributed machine learning architectures, named classifier ensembles, to enhance the robustness of in-silicon machine learning kernels. A comparative study of distributed architectures (i.e., random forest (RF)) and centralized architectures (i.e., SVM) is performed in a commercial 45 nm CMOS process. Employing the UCI machine learning repository as input, it is determined that RF-based architectures are significantly more robust than SVM architectures in the presence of timing errors in the NTV region (0.3 V to 0.7 V). Additionally, an error-weighted voting technique that incorporates the timing error statistics of the NTV circuit fabric is proposed to further enhance the robustness of RF architectures. Simulation results confirm that the error-weighted voting technique achieves a P_{det} that varies by only 1.4%, which is 12x lower compared to centralized architectures.
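    The error-weighted voting idea can be illustrated with a standard weighted-majority rule in which each ensemble member's vote is weighted by the log-odds of it being correct; the per-member error rates below are assumptions, not measured NTV timing-error statistics.

```python
# Illustrative error-weighted (log-odds) majority voting across an ensemble.
import numpy as np

rng = np.random.default_rng(3)

def vote(true_label, error_rates, n_classes=2):
    """Each member outputs the true label, flipped with its own error probability."""
    decisions, weights = [], []
    for p in error_rates:
        wrong = rng.random() < p
        decisions.append((true_label + rng.integers(1, n_classes)) % n_classes
                         if wrong else true_label)
        weights.append(np.log((1 - p) / p))          # more reliable -> larger weight
    tally = np.zeros(n_classes)
    for d, w in zip(decisions, weights):
        tally[d] += w
    return int(tally.argmax())

error_rates = [0.05, 0.10, 0.30, 0.45]               # assumed per-member error rates
trials = 20000
correct = sum(vote(1, error_rates) == 1 for _ in range(trials))
print(f"error-weighted ensemble accuracy: {correct / trials:.3f}")
```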

    Reliability-aware memory design using advanced reconfiguration mechanisms

    Get PDF
    Fast and complex data memory systems have become a necessity in the modern computational units of today's integrated circuits. These memory systems are integrated in the form of large embedded memories for data manipulation and storage. This goal has been achieved by the aggressive scaling of transistor dimensions to few-nanometer (nm) sizes; such progress, however, comes with a drawback, making it difficult to obtain high chip yields. Process variability, due to manufacturing imperfections, along with temporal aging, mainly induced by higher electric fields and temperatures, are two of the most significant threats that can no longer be ignored in nanoscale embedded memory circuits and can have a high impact on their robustness. Static Random Access Memory (SRAM) is one of the most widely used embedded memories; it is generally implemented with the smallest device dimensions, and therefore its robustness is highly important in the nanometer-domain design paradigm. Reliable operation needs to be considered and achieved both at the cell level and in the architectural design of the SRAM array. Recently, with the approach of near/below-10 nm design generations, novel non-FET devices such as memristors are attracting attention as possible candidates to replace conventional memory technologies. Despite favorable characteristics such as low power and high scalability, they also suffer from reliability challenges, such as process variability and endurance degradation, which need to be mitigated at the device and architectural levels. This thesis tackles these reliability concerns in memories by utilizing advanced reconfiguration techniques. For both SRAM arrays and memristive crossbar memories, novel reconfiguration strategies are considered and analyzed, which can extend the memory lifetime. These techniques include monitoring circuits to check the reliability status of the memory units, and architectural mechanisms to reconfigure the memory system into a more reliable configuration before a failure happens.
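    A minimal sketch of the monitor-and-reconfigure strategy, under an assumed aging model: a monitor tracks a degradation metric per memory block, and the controller remaps a block to a spare before a predicted failure. Thresholds, rates, and spare counts are illustrative.

```python
# Minimal monitor-and-reconfigure sketch with an assumed, simplistic aging model.
import numpy as np

rng = np.random.default_rng(11)

n_blocks, n_spares, threshold = 16, 2, 0.8
degradation = np.zeros(n_blocks)        # 0 = fresh, 1 = predicted failure
remapped = {}                           # logical block -> spare index

for cycle in range(1000):
    # Unequal aging across blocks (e.g., frequently accessed rows age faster).
    degradation += rng.uniform(0.0, 0.002, n_blocks) * (1 + (np.arange(n_blocks) % 4))
    for blk in np.where(degradation > threshold)[0]:
        if blk not in remapped and len(remapped) < n_spares:
            remapped[blk] = len(remapped)          # reconfigure ahead of failure
            degradation[blk] = 0.0                 # traffic now served by the spare
        # If no spares remain, the block is left as-is (would eventually fail).

print(f"blocks remapped to spares: {remapped}")
```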

    Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures

    Get PDF
    A reconfigurable logic fabric built from heterogeneous device technologies is proposed, which leverages the complementary advantages of magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA), using magnetic tunnel junction (MTJ) devices within this topology, demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSC-FPGA, starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, the HSC-FPGA enables new advantages of technology co-design, which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. The HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime-adaptive hardware. An orthogonal dimension of fabric heterogeneity is non-determinism, enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, the probabilistic spin logic p-bit device is considered as a fabric element comprising a crossbar-structured weighted array. Programmability of the resistive network interconnecting the p-bit devices is achieved by modifying the resistive states of the array's weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks, enabling fluid mappings of probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n × m1 × m2 × ... × mi weighted array as a configurable hardware circuit with an n-input layer followed by i ≥ 1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate.
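    For reference, the p-bit abstraction mentioned above is commonly written as m_i = sgn(tanh(I_i) + r), with r drawn uniformly from (-1, 1); the sketch below samples one visible-to-hidden pass through a small weighted array. Layer sizes and weights are arbitrary illustrations, not the fabricated fabric.

```python
# Hedged sketch of p-bit sampling through a small crossbar-style weighted array.
import numpy as np

rng = np.random.default_rng(5)

def p_bit(I):
    """Stochastic sign neuron: P(m = +1) increases with input current I."""
    return np.sign(np.tanh(I) + rng.uniform(-1.0, 1.0, size=np.shape(I)))

# One visible -> hidden pass through an n x m weighted array (RBM-style layer).
n, m = 8, 4
W = rng.normal(0.0, 0.5, size=(n, m))     # programmable resistive weights (assumed)
b = np.zeros(m)
visible = np.sign(rng.uniform(-1, 1, n))
hidden = p_bit(visible @ W + b)
print("visible:", visible.astype(int), "\nhidden: ", hidden.astype(int))
```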