1,122 research outputs found
A Micro Power Hardware Fabric for Embedded Computing
Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor
Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective
On metrics of density and power efficiency, neuromorphic technologies have
the potential to surpass mainstream computing technologies in tasks where
real-time functionality, adaptability, and autonomy are essential. While
algorithmic advances in neuromorphic computing are proceeding successfully, the
potential of memristors to improve neuromorphic computing have not yet born
fruit, primarily because they are often used as a drop-in replacement to
conventional memory. However, interdisciplinary approaches anchored in machine
learning theory suggest that multifactor plasticity rules matching neural and
synaptic dynamics to the device capabilities can take better advantage of
memristor dynamics and its stochasticity. Furthermore, such plasticity rules
generally show much higher performance than that of classical Spike Time
Dependent Plasticity (STDP) rules. This chapter reviews the recent development
in learning with spiking neural network models and their possible
implementation with memristor-based hardware
Hybrid FPGA: Architecture and Interface
Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources
with different granularities, together with domain-specific coarse-grained units. This thesis proposes
a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to
improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture,
we examine three aspects to optimise the speed and area for domain-specific applications.
First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained
elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs,
(2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of
EBs, and (5) location of additional embedded elements such as memory.
Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity
EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications.
We then propose three routing optimisation methods to meet the additional routing demand introduced
by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on
EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We
study and compare the trade-offs in delay, area and routability of these three optimisation methods.
Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors,
multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed
point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs
in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking
into account both architectural and system-level issues. Furthermore, we investigate the trade-offs
between granularities and performance by composing small FPUs into a large FPU.
The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements,
by optimising for speed, area or a combination of speed and area
On Finding a Defect-free Component in Nanoscale Crossbar Circuits
AbstractWe propose a technique for the analysis of manufacturing yield of nano-crossbar architectures for different values of defect percentage and crossbar-size. We provide an estimate of the minimum-size crossbar to be fabricated wherein a defect-free crossbar of a given size can always be found with a guaranteed yield. Our technique is based on logical merging of two defective rows (or two columns) that emulate a defect-free row (or column). Experimental results show that the proposed method provides higher defect-tolerance compared to that of previous techniques
Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices
This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results
Template-based embedded reconfigurable computing
XIV+212hlm.;24c
- …