104 research outputs found

    GROK-FPGA: Generating Real on-Chip Knowledge for FPGA Fine-Grain Delays Using Timing Extraction

    Get PDF
    Circuit variation is one of the biggest problems to overcome if Moore\u27s Law is to continue. It is no longer possible to maintain an abstraction of identical devices without huge yield losses, performance penalties, and energy costs. Current techniques such as margining and grade binning are used to deal with this problem. However, they tend to be conservative, offering limited solutions that will not scale as variation increases. Conventional circuits use limited tests and statistical models to determine the margining and binning required to counteract variation. If the limited tests fail, the whole chip is discarded. On the other hand, reconfigurable circuits, such as FPGAs, can use more fine-grained, aggressive techniques that carefully choose which resources to use in order to mitigate variation. Knowing which resources to use and avoid, however, requires measurement of underlying variation. We present Timing Extraction, a methodology that allows measurement of process variation without expensive testers nor highly invasive techniques, rather, relying only on resources already available on conventional FPGAs. It takes advantage of the fact that we can measure the delay of logic paths between any two registers. Measuring enough paths, provides the information necessary to decompose the delay of each path into individual components-essentially, forming a system of linear equations. Determining which paths to measure requires simple graph transformation algorithms applied to a representation of the FPGA circuit. Ultimately, this process decomposes the FPGA into individual components and identifies which paths to measure for computing the delay of individual components. We apply Timing Extraction to 18 commercially available Altera Cyclone III (65 nm) FPGAs. We measure 22×28 logic clusters and the interconnect within and between cluster. Timing Extraction decomposes this region into 1,356,182 components, classified into 10 categories, requiring 2,736,556 path measurements. With an accuracy of ±3.2 ps, our measurements reveal regional variation on the order of 50 ps, systematic variation from 30 ps to 70 ps, and random variation in the clusters with σ=15 ps and in the interconnect with σ=62 ps

    Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems

    Get PDF
    NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the design principles to efficiently implement interconnection networks in the resource-constrained on-chip setting have stabilized. On the other hand, the requirements on embedded system design are far from stabilizing. Embedded systems are composed by assembling together heterogeneous components featuring differentiated operating speeds and ad-hoc counter measures must be adopted to bridge frequency domains. Moreover, an unmistakable trend toward enhanced reconfigurability is clearly underway due to the increasing complexity of applications. At the same time, the technology effect is manyfold since it provides unprecedented levels of system integration but it also brings new severe constraints to the forefront: power budget restrictions, overheating concerns, circuit delay and power variability, permanent fault, increased probability of transient faults. Supporting different degrees of reconfigurability and flexibility in the parallel hardware platform cannot be however achieved with the incremental evolution of current design techniques, but requires a disruptive approach and a major increase in complexity. In addition, new reliability challenges cannot be solved by using traditional fault tolerance techniques alone but the reliability approach must be also part of the overall reconfiguration methodology. In this thesis we take on the challenge of engineering a NoC architectures for the next generation systems and we provide design methods able to overcome the conventional way of implementing multi-synchronous, reliable and reconfigurable NoC. Our analysis is not only limited to research novel approaches to the specific challenges of the NoC architecture but we also co-design the solutions in a single integrated framework. Interdependencies between different NoC features are detected ahead of time and we finally avoid the engineering of highly optimized solutions to specific problems that however coexist inefficiently together in the final NoC architecture. To conclude, a silicon implementation by means of a testchip tape-out and a prototype on a FPGA board validate the feasibility and effectivenes

    Demonstration of monolithically integrated graphene interconnects for low-power CMOS applications

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 129-141).In recent years, interconnects have become an increasingly difficult design challenge as their relative performance has not improved at the same pace with transistor scaling. The specifications for complex features, clock frequency, supply current, and number of I/O resources have added even greater demands for interconnect performance. Furthermore, the resistivity of copper begins to degrade at smaller line widths due to increased scattering effects. Graphene has gathered much interest as an interconnect material due to its high mobility, high current carrying capacity, and high thermal conductivity. DC characterization of sub-50 nm graphene interconnects has been reported but very few studies exist on evaluating their performance when integrated with CMOS. Integrating graphene with CMOS is a critical step in establishing a path for graphene electronics. In this thesis, we characterize the performance of integrated graphene interconnects and demonstrate two prototype CMOS chips. A 0.35 prm CMOS chip implements an array of transmitter/receivers to analyze end-to-end data communication on graphene wires. Graphene sheets are synthesized by chemical vapor deposition, which are then subsequently transferred and patterned into narrow wires up to 1 mm in length. A low-swing signaling technique is applied, which results in a transmitter energy of 0.3-0.7 pJ/bit/mm, and a total energy of 2.4-5.2 pJ/bit/mm. We demonstrate a minimum voltage swing of 100 mV and bit error rates below 2x10-10. Despite the high sheet resistivity of graphene, integrated graphene links run at speeds up to 50 Mbps. Finally, a subthreshold FPGA was implemented in 0.18 pm CMOS. We demonstrate reliable signal routing on 4-layer graphene wires which replaces parts of the interconnect fabric. The FPGA test chip includes a 5x5 logic array and a TDC-based tester to monitor the delay of graphene wires. The graphene wires have 2.8x lower capacitance than the reference metal wires, resulting in up to 2.11x faster speeds and 1.54x lower interconnect energy when driven by a low-swing voltage of 0.4 V. This work presents the first graphene-based system application and demonstrates the potential of using low capacitance graphene wires for ultra-low power electronics.by Kyeong-Jae Lee.Ph.D

    Methodology and Ecosystem for the Design of a Complex Network ASIC

    Full text link
    Performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier has been breached in the year 2011 the next large step into the exascale era is expected sometime between the years 2018 and 2020. The EXTOLL project will be an integral part in this venture. Originally designed as a research project on FPGA basis it will make the transition to an ASIC to improve its already excelling performance even further. This transition poses many challenges that will be presented in this thesis. Nowadays, it is not enough to look only at single components in a system. EXTOLL is part of complex ecosystem which must be optimized overall since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail. This thesis examines four different aspects in the design hierarchy and proposes efficient solutions or improvements for each of them. At first it takes a look at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically without the user’s input and in a transparent way so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on physical and logical constraints of the EXTOLL design. Besides, a work flow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed that is seen as the optimal solution. The last part develops a cost model for wafer and package level test and raises technological concerns that will affect the testing methodology. In order to run testing internally it proposes the development of a stand-alone test platform that is able to test packaged EXTOLL chips in every aspect

    High Performance Network Evaluation and Testing

    Get PDF

    A multi-layer approach to designing secure systems: from circuit to software

    Get PDF
    In the last few years, security has become one of the key challenges in computing systems. Failures in the secure operations of these systems have led to massive information leaks and cyber-attacks. Case in point, the identity leaks from Equifax in 2016, Spectre and Meltdown attacks to Intel and AMD processors in 2017, Cyber-attacks on Facebook in 2018. These recent attacks have shown that the intruders attack different layers of the systems, from low-level hardware to software as a service(SaaS). To protect the systems, the defense mechanisms should confront the attacks in the different layers of the systems. In this work, we propose four security mechanisms for computing systems: (i ) using backside imaging to detect Hardware Trojans (HTs) in Application Specific Integrated Circuits (ASICs) chips, (ii ) developing energy-efficient reconfigurable cryptographic engines, (iii) examining the feasibility of malware detection using Hardware Performance Counters (HPC). Most of the threat models assume that the root of trust is the hardware running beneath the software stack. However, attackers can insert malicious hardware blocks, i.e. HTs, into the Integrated Circuits (ICs) that provide back-doors to the attackers or leak confidential information. HTs inserted during fabrication are extremely hard to detect since their overheads in performance and power are below the variations in the performance and power caused by manufacturing. In our work, we have developed an optical method that identifies modified or replaced gates in the ICs. We use the near-infrared light to image the ICs because silicon is transparent to near-infrared light and metal reflects infrared light. We leverage the near-infrared imaging to identify the locations of each gate, based on the signatures of metal structures reflected by the lowest metal layer. By comparing the imaged results to the pre-fabrication design, we can identify any modifications, shifts or replacements in the circuits to detect HTs. With the trust of the silicon, the computing system must use secure communication channels for its applications. The low-energy cost devices, such as the Internet of Things (IoT), leverage strong cryptographic algorithms (e.g. AES, RSA, and SHA) during communications. The cryptographic operations cause the IoT devices a significant amount of power. As a result, the power budget limits their applications. To mitigate the high power consumption, modern processors embed these cryptographic operations into hardware primitives. This also improves system performance. The hardware unit embedded into the processor provides high energy-efficiency, low energy cost. However, hardware implementations limit flexibility. The longevity of theIoTs can exceed the lifetime of the cryptographic algorithms. The replacement of the IoT devices is costly and sometimes prohibitive, e.g., monitors in nuclear reactors.In order to reconfigure cryptographic algorithms into hardware, we have developed a system with a reconfigurable encryption engine on the Zedboard platform. The hardware implementation of the engine ensures fast, energy-efficient cryptographic operations. With reliable hardware and secure communication channels in place, the computing systems should detect any malicious behaviors in the processes. We have explored the use of the Hardware Performance Counters (HPCs) in malware detection. HPCs are hardware units that count micro-architectural events, such as cache hits/misses and floating point operations. Anti-virus software is commonly used to detect malware but it also introduces performance overhead. To reduce anti-virus performance overhead, many researchers propose to use HPCs with machine learning models in malware detection. However, it is counter-intuitive that the high-level program behaviors can manifest themselves in low-level statics. We perform experiments using 2 ∼ 3 × larger program counts than the previous works and perform a rigorous analysis to determine whether HPCs can be used to detect malware. Our results show that the False Discovery Rate of malware detection can reach 20%. If we deploy this detection system on a fresh installed Windows 7 systems, among 1,323 binaries, 198 binaries would be flagged as malware

    A Hardware Security Solution against Scan-Based Attacks

    Get PDF
    Scan based Design for Test (DfT) schemes have been widely used to achieve high fault coverage for integrated circuits. The scan technique provides full access to the internal nodes of the device-under-test to control them or observe their response to input test vectors. While such comprehensive access is highly desirable for testing, it is not acceptable for secure chips as it is subject to exploitation by various attacks. In this work, new methods are presented to protect the security of critical information against scan-based attacks. In the proposed methods, access to the circuit containing secret information via the scan chain has been severely limited in order to reduce the risk of a security breach. To ensure the testability of the circuit, a built-in self-test which utilizes an LFSR as the test pattern generator (TPG) is proposed. The proposed schemes can be used as a countermeasure against side channel attacks with a low area overhead as compared to the existing solutions in literature

    Signaling in 3-D integrated circuits, benefits and challenges

    Get PDF
    Three-dimensional (3-D) or vertical integration is a design and packaging paradigm that can mitigate many of the increasing challenges related to the design of modern integrated systems. 3-D circuits have recently been at the spotlight, since these circuits provide a potent approach to enhance the performance and integrate diverse functions within amulti-plane stack. Clock networks consume a great portion of the power dissipated in a circuit. Therefore, designing a low-power clock network in synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to the increased power densities. Synchronization issues can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes. In this research, a design method to apply resonant clocking to synthesized clock trees is proposed. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes which makes 3-D circuits more susceptible to manufacturing defects and lowers the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Pre-bond testability, in turn, presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. A design methodology of resonant 3-D clock networks that support wireless pre-bond testing is introduced. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the LC tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test. Recent FPGAs are quite complex circuits which provide reconfigurablity at the cost of lower performance and higher power consumption as compared to ASIC circuits. Exploiting a large number of programmable switches, routing structures are mainly responsible for performance degradation in FPAGs. Employing 3-D technology can providemore efficient switches which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGAs routing structure. Different characteristics of RRAM switches change the properties of signal paths in RRAM-based FPGAs. The on resistance of RRAMswitches is considerably lower than CMOS pass gate switches which results in lower RC delay for RRAM-based routing paths. This different nature in critical path and signal delay in turn affect the need for intermediate buffers. Thus the buffer allocation should be reconsidered. In the last part of this research, the effect of intermediate buffers on signal propagation delay is studied and a modified buffer allocation scheme for RRAM-based FPGA routing path is proposed

    Novel Computational Methods for Integrated Circuit Reverse Engineering

    Get PDF
    Production of Integrated Circuits (ICs) has been largely strengthened by globalization. System-on-chip providers are capable of utilizing many different providers which can be responsible for a single task. This horizontal structure drastically improves to time-to-market and reduces manufacturing cost. However, untrust of oversea foundries threatens to dismantle the complex economic model currently in place. Many Intellectual Property (IP) consumers become concerned over what potentially malicious or unspecified logic might reside within their application. This logic which is inserted with the intention of causing harm to a consumer has been referred to as a Hardware Trojan (HT). To help IP consumers, researchers have looked into methods for finding HTs. Such methods tend to rely on high-level information relating to the circuit, which might not be accessible. There is a high possibility that IP is delivered in the gate or layout level. Some services and image processing methods can be leveraged to convert layout level information to gate-level, but such formats are incompatible with detection schemes that require hardware description language. By leveraging standard graph and dynamic programming algorithms a set of tools is developed that can help bridge the gap between gate-level netlist access and HT detection. To help in this endeavor this dissertation focuses on several problems associated with reverse engineering ICs. Logic signal identification is used to find malicious signals, and logic desynthesis is used to extract high level details. Each of the proposed method have their results analyzed for accuracy and runtime. It is found that method for finding logic tends to be the most difficult task, in part due to the degree of heuristic\u27s inaccuracy. With minor improvements moderate sized ICs could have their high-level function recovered within minutes, which would allow for a trained eye or automated methods to more easily detect discrepancies within a circuit\u27s design
    • …
    corecore