258 research outputs found
H-SIMD machine : configurable parallel computing for data-intensive applications
This dissertation presents a hierarchical single-instruction multiple-data (H-SLMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations.
The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPLs), respectively. A distinction between communication and computation instructions is intended for all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks in the H-SLMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SLMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), 2-dimensional discrete cosine transform (2D DCT) and 2-dimensional fast Fourier transform (2D FFT) which are used widely in science and engineering.
In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high-performance without significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed
Network Interface Design for Network-on-Chip
In the culture of globalized integrated circuit (IC, a.k.a chip) production, the use of Intellectual Property (IP) cores, computer aided design tools (CAD) and testing services from un-trusted vendors are prevalent to reduce the time to market. Unfortunately, the globalized business model potentially creates opportunities for hardware tampering and modification from adversary, and this tampering is known as hardware Trojan (HT). Network-on-chip (NoC) has emerged as an efficient on-chip communication infrastructure. In this work, the security aspects of NoC network interface (NI), one of the most critical components in NoC will be investigated and presented. Particularly, the NI design, hardware attack models and countermeasures for NI in a NoC system are explored.
An OCP compatible NI is implemented in an IBM0.18ìm CMOS technology. The synthesis results are presented and compared with existing literature. Second, comprehensive hardware attack models targeted for NI are presented from system level to circuit level. The impact of hardware Trojans on NoC functionality and performance are evaluated. Finally, a countermeasure method is proposed to address the hardware attacks in NIs
Null Convention Logic applications of asynchronous design in nanotechnology and cryptographic security
This dissertation presents two Null Convention Logic (NCL) applications of asynchronous logic circuit design in nanotechnology and cryptographic security. The first application is the Asynchronous Nanowire Reconfigurable Crossbar Architecture (ANRCA); the second one is an asynchronous S-Box design for cryptographic system against Side-Channel Attacks (SCA). The following are the contributions of the first application: 1) Proposed a diode- and resistor-based ANRCA (DR-ANRCA). Three configurable logic block (CLB) structures were designed to efficiently reconfigure a given DR-PGMB as one of the 27 arbitrary NCL threshold gates. A hierarchical architecture was also proposed to implement the higher level logic that requires a large number of DR-PGMBs, such as multiple-bit NCL registers. 2) Proposed a memristor look-up-table based ANRCA (MLUT-ANRCA). An equivalent circuit simulation model has been presented in VHDL and simulated in Quartus II. Meanwhile, the comparison between these two ANRCAs have been analyzed numerically. 3) Presented the defect-tolerance and repair strategies for both DR-ANRCA and MLUT-ANRCA. The following are the contributions of the second application: 1) Designed an NCL based S-Box for Advanced Encryption Standard (AES). Functional verification has been done using Modelsim and Field-Programmable Gate Array (FPGA). 2) Implemented two different power analysis attacks on both NCL S-Box and conventional synchronous S-Box. 3) Developed a novel approach based on stochastic logics to enhance the resistance against DPA and CPA attacks. The functionality of the proposed design has been verified using an 8-bit AES S-box design. The effects of decision weight, bitstream length, and input repetition times on error rates have been also studied. Experimental results shows that the proposed approach enhances the resistance to against the CPA attack by successfully protecting the hidden key --Abstract, page iii
inSense: A Variation and Fault Tolerant Architecture for Nanoscale Devices
Transistor technology scaling has been the driving force in improving the size, speed, and power consumption of digital systems. As devices approach atomic size, however, their reliability and performance are increasingly compromised due to reduced noise margins, difficulties in fabrication, and emergent nano-scale phenomena. Scaled CMOS devices, in particular, suffer from process variations such as random dopant fluctuation (RDF) and line edge roughness (LER), transistor degradation mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI), and increased sensitivity to single event upsets (SEUs). Consequently, future devices may exhibit reduced performance, diminished lifetimes, and poor reliability.
This research proposes a variation and fault tolerant architecture, the inSense architecture, as a circuit-level solution to the problems induced by the aforementioned phenomena. The inSense architecture entails augmenting circuits with introspective and sensory capabilities which are able to dynamically detect and compensate for process variations, transistor degradation, and soft errors. This approach creates ``smart\u27\u27 circuits able to function despite the use of unreliable devices and is applicable to current CMOS technology as well as next-generation devices using new materials and structures. Furthermore, this work presents an automated prototype implementation of the inSense architecture targeted to CMOS devices and is evaluated via implementation in ISCAS \u2785 benchmark circuits. The automated prototype implementation is functionally verified and characterized: it is found that error detection capability (with error windows from 30-400ps) can be added for less than 2\% area overhead for circuits of non-trivial complexity. Single event transient (SET) detection capability (configurable with target set-points) is found to be functional, although it generally tracks the standard DMR implementation with respect to overheads
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions
ReDO: Cross-Layer Multi-Objective Design-Exploration Framework for Efficient Soft Error Resilient Systems
Designing soft errors resilient systems is a complex engineering task, which nowadays follows a cross-layer approach. It requires a careful planning for different fault-tolerance mechanisms at different system's layers: starting from the technology up to the software domain. While these design decisions have a positive effect on the reliability of the system, they usually have a detrimental effect on its size, power consumption, performance and cost. Design space exploration for cross-layer reliability is therefore a multi-objective search problem in which reliability must be traded-off with other design dimensions. This paper proposes a cross-layer multi-objective design space exploration algorithm developed to help designers when building soft error resilient electronic systems. The algorithm exploits a system-level Bayesian reliability estimation model to analyze the effect of different cross-layer combinations of protection mechanisms on the reliability of the full system. A new heuristic based on the extremal optimization theory is used to efficiently explore the design space. An extended set of simulations shows the capability of this framework when applied both to benchmark applications and realistic systems, providing optimized systems that outperform those obtained by applying state-of-the-art cross-layer reliability techniques
Recommended from our members
Preventing Code Reuse Attacks On Modern Operating Systems
Modern operating systems are often the target of attacks that exploit vulnerabilities to escalate their privilege level. Recently introduced hardening features prevent attackers from using traditional kernel exploitation methodologies and force them to employ techniques that were originally designed for user space exploitation —such as code reuse— to execute arbitrary code with elevated privileges. In this dissertation, we present novel protection mechanisms that render such methodologies ineffective and improve the security of today’s operating systems. Specifically, we present solutions that prevent the leakage and corruption of kernel code pointers without employing entities that execute on super-privileged mode (e.g., hypervisors). The leakage of code pointers is an essential step for the construction of reliable code reuse exploits and their corruption is typically necessary for mounting the attack. More concretely, we present the design and implementation of two systems: kR^X and kSplitStack.
kR^X is a system that diversifies the code layout to thwart attackers from constructing code reuse exploits statically. It also prevents the leakage of return addresses through XOR- based encryption or by hiding them among decoys (fake pointers to instructions that trap the kernel when executed). Finally, it couples the above with a self-protection mechanism that prevents attackers from leaking the diversified code layout, either by instrumenting every memory read instruction with range checks on x86-64 systems or by imposing limits through the segmentation unit on x86 systems. Evaluation results show that it imposes small runtime overhead on real-world applications when measured on legacy x86-64 systems (~3.63%) and significantly lower on x86 systems (~1.32%) and newer x86-64 CPUs that provide hardware assistance (~2.32%).
kSplitStack, on the other hand, provides stronger protection against leaks of return addresses and guarantees both their secrecy and their integrity by augmenting the isolation mechanism of kR^X on x86-64 systems. This is achieved through a split stack scheme: functions use an unprotected stack for their local variables but switch to a protected one when pushing or poping return addresses. Moreover, kSplitStack protects the secrecy and integrity of control data (e.g., the value of the instruction pointer) in interrupt contexts by redirecting them to protected stacks, thus thwarting attackers from leaking or corrupting code pointers by inducing interrupts or other hardware events. Finally, the evaluation of kSplitStack shows that it imposes a small runtime overhead, comparable to the one of kR^X, both on legacy x86-64 systems (~3.66%) and on newer CPUs with hardware assistance (~2.50%)
The detector control system of the ATLAS experiment
The ATLAS experiment is one of the experiments at the Large Hadron Collider, constructed to study elementary particle interactions in collisions of high-energy proton beams. The individual detector components as well as the common experimental infrastructure are supervised by the Detector Control System (DCS). The DCS enables equipment supervision using operator commands, reads, processes and archives the operational parameters of the detector, allows for error recognition and handling, manages the communication with external control systems, and provides a synchronization mechanism with the physics data acquisition system. Given the enormous size and complexity of ATLAS, special emphasis was put on the use of standardized hardware and software components enabling efficient development and long-term maintainability of the DCS over the lifetime of the experiment. Currently, the DCS is being used successfully during the experiment commissioning phase
Verification of the StarT-voyager bus interface units
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 64).by Wing-Chung Ho.M.S
- …