Search CORE

1,122 research outputs found

Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

Author: Khawam Sami
Publication venue: The University of Edinburgh
Publication date: 01/01/2006
Field of study

Edinburgh Research Archive

Nanoelectronic Design Based on a CNT Nano-Architecture

Author: Bao Liu
Publication venue: 'IntechOpen'
Publication date: 01/02/2010
Field of study

IntechOpen

Domain specific high performance reconfigurable architecture for a communication platform

Author: Ahmed Imran
Publication venue: The University of Edinburgh
Publication date: 01/01/2007
Field of study

Edinburgh Research Archive

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication venue
Publication date: 25/09/2009
Field of study

Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

D-Scholarship@Pitt

Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective

Author: Abbott
Bartolozzi
Bartolozzi
Benjamin
Bi
Bohte
Detorakis
Faisal
Fouda
Fouda
Furber
Gerstner
Gerstner
Goldberg
Gütig
Hinton
Hopfield
Huayaney
Hyvärinen
Ielmini
Indiveri
Isomura
Jolivet
Katz
Kim
Levy
Li
Lillicrap
Maass
Mead
Merolla
Moreno-Bote
Naous
Neftci
Neftci
Neftci
Paille
Park
Pfister
Prezioso
Puglisi
Qiao
Querlioz
Savin
Saïghi
Schultz
Shouval
Tiago
Urbanczik
Wan
Wang
Williams
Woo
Xie
Yarom
Yu
Yu
Publication venue: eScholarship, University of California
Publication date: 04/09/2019
Field of study

On metrics of density and power efficiency, neuromorphic technologies have the potential to surpass mainstream computing technologies in tasks where real-time functionality, adaptability, and autonomy are essential. While algorithmic advances in neuromorphic computing are proceeding successfully, the potential of memristors to improve neuromorphic computing have not yet born fruit, primarily because they are often used as a drop-in replacement to conventional memory. However, interdisciplinary approaches anchored in machine learning theory suggest that multifactor plasticity rules matching neural and synaptic dynamics to the device capabilities can take better advantage of memristor dynamics and its stochasticity. Furthermore, such plasticity rules generally show much higher performance than that of classical Spike Time Dependent Plasticity (STDP) rules. This chapter reviews the recent development in learning with spiking neural network models and their possible implementation with memristor-based hardware

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Hybrid FPGA: Architecture and Interface

Author: Yu Chi Wai
Yu Chi Wai
Publication venue: Computing, Imperial College London
Publication date: 01/07/2010
Field of study

Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources with different granularities, together with domain-specific coarse-grained units. This thesis proposes a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture, we examine three aspects to optimise the speed and area for domain-specific applications. First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs, (2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of EBs, and (5) location of additional embedded elements such as memory. Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications. We then propose three routing optimisation methods to meet the additional routing demand introduced by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We study and compare the trade-offs in delay, area and routability of these three optimisation methods. Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors, multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking into account both architectural and system-level issues. Furthermore, we investigate the trade-offs between granularities and performance by composing small FPUs into a large FPU. The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements, by optimising for speed, area or a combination of speed and area

Spiral - Imperial College Digital Repository

Programmable architectures for the automated design of digital FIR filters using evolvable hardware

Author: Hounsell Benjamin Iain
Publication venue: The University of Edinburgh
Publication date: 01/01/2001
Field of study

Edinburgh Research Archive

On Finding a Defect-free Component in Nanoscale Crossbar Circuits

Author: Bhattacharya Bhargab B.
Kule Malay
Rahaman Hafizur
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 31/12/2015
Field of study

AbstractWe propose a technique for the analysis of manufacturing yield of nano-crossbar architectures for different values of defect percentage and crossbar-size. We provide an estimate of the minimum-size crossbar to be fabricated wherein a defect-free crossbar of a given size can always be found with a guaranteed yield. Our technique is based on logical merging of two defective rows (or two columns) that emulate a defect-free row (or column). Experimental results show that the proposed method provides higher defect-tolerance compared to that of previous techniques

Elsevier - Publisher Connector

Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

Author: Roelke George R., IV
Publication venue: AFIT Scholar
Publication date: 01/01/2006
Field of study

This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

AFTI Scholar (Air Force Institute of Technology)

CiteSeerX

Template-based embedded reconfigurable computing

Author: Leijten-Nowak K.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

XIV+212hlm.;24c

Repository TU/e

Pure OAI Repository

uilis.unsyiah.ac.id