858 research outputs found
Recommended from our members
Investigation into the wafer-scale integration of fine-grain parallel processing computer systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.This thesis investigates the potential of wafer-scale integration (WSI) for the implementation of low-cost fine-grain parallel processing computer systems. As WSI is a relatively new subject, there was little work on which to base investigations. Indeed, most WSI architectures existed only as untried and sometimes vague proposals. Accordingly, the research strategy approached this problem by identifying a representative WSI structure and architecture on which to base investigations. An analysis of architectural proposals identified associative memory to be general purpose parallel processing component used in a wide range of WSI architectures. Furthermore, this analysis provided a set of WSI-level design requirements to evaluate the sustainability of different architectures as research vehicles. The WSI-ASP (WASP) device, which has a large associative memory as its main component is shown to meet these requirements and hence was chosen as the research vehicle. Consequently, this thesis addresses WSI potential through an in-depth investigation into the feasibility of implementing a large associative memory for the WASP device that meets the demanding technological constraints of WSI. Overall, the thesis concludes that WSI offers significant potential for the implementation of low-cost fine-grain parallel processing computer systems. However, due to the dual constraints of thermal management and the area required for the power distribution network, power density is a major design constraint in WSI. Indeed, it is shown that WSI power densities need to be an order of magnitude lower than VLSI power densities. The thesis demonstrates that for associative memories at least, VLSI designs are unsuited to implementation in WSI. Rather, it is shown that WSI circuits must be closely matched to the operational environment to assure suitable power densities. These circuits are significantly larger than their VLSI equivalents. Nonetheless, the thesis demonstrates that by concentrating on the most power intensive circuits, it is possible to achieve acceptable power densities with only a modest increase in area overheads.SER
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Deep neural networks have achieved impressive results in computer vision and
machine learning. Unfortunately, state-of-the-art networks are extremely
compute and memory intensive which makes them unsuitable for mW-devices such as
IoT end-nodes. Aggressive quantization of these networks dramatically reduces
the computation and memory footprint. Binary-weight neural networks (BWNs)
follow this trend, pushing weight quantization to the limit. Hardware
accelerators for BWNs presented up to now have focused on core efficiency,
disregarding I/O bandwidth and system-level efficiency that are crucial for
deployment of accelerators in ultra-low power devices. We present Hyperdrive: a
BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel
binary-weight streaming approach, which can be used for arbitrarily sized
convolutional neural network architecture and input resolution by exploiting
the natural scalability of the compute units both at chip-level and
system-level by arranging Hyperdrive chips systolically in a 2D mesh while
processing the entire feature map together in parallel. Hyperdrive achieves 4.3
TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than
state-of-the-art BWN accelerators, even if its core uses resource-intensive
FP16 arithmetic for increased robustness
A characterization of parallel systems
technical reporta taxonomy for parallel processing systems is presented which has some advantages over previous taxonomies. The taxonomy characterizes parallel processing systems using four parameters: topology, communication, granularity, and operation. These parameters and used repetitively in a hierarchical fashion to produce a taxonomic structure which is extensible to the level of detail desired. Topology describes the structure of the priniciple interconnections. Communication describes the flow of data and programs through the system. Granularity describes the size of the largest repeated element, or grain. Operation describes the important functional properties of each grain, especially the ratio of storage to logic circuitry. Granularity and topology are structural parameters, while operation and communication are functional parameters which describe the behavior of the system components. A final section of this paper includes examples of the application of the taxonomy to several parallel processing systems
A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM
In this article, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for the application in the OFDM based IEEE 802.11a Wireless LAN (WLAN) baseband processor. The 64-point FFT is realized by decomposing it into a 2-D structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use any 2-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25 ?m BiCMOS technology. The core area of this chip is 6.8 mm2. The average dynamic power consumption is 41 mW @ 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i. e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption
Optical computing: introduction by the guest editors to the feature in the 1 May 1988 issue
The feature in the 1 May 1988 issue of Applied Optics includes a collection of papers originally presented at the 1987 Lake Tahoe Topical Meeting on Optical Computing. These papers emphasize digital optical computing systems, optical interconnects, and devices for optical computing, but analog optical processing is considered as well
Two-level pipelined systolic array graphics engine
The authors report a VLSI design of an advanced systolic array graphics (SAG) engine built from pipelined functional units which can generate realistic images interactively for high-resolution displays. They introduce a structured frame store system as an environment for the advanced SAG engine and present the principles and architecture of the advanced SAG engine. They introduce pipelined functional units into this SAG engine to meet the performance requirements. This is done by a formal approach where the original systolic array is represented at bit level by a finite, vertex-weighted, edge-weighted, directed graph. Two architectures built from pipelined functional units are described. A prototype containing nine processing elements was fabricated in a 1.6-¿m CMOS technolog
- …