496 research outputs found
Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in details. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on triangular array and another based on rectangular array, are presented for the multiphase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-pase operations. Performance issues are also considered
Recommended from our members
Investigation into the wafer-scale integration of fine-grain parallel processing computer systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.This thesis investigates the potential of wafer-scale integration (WSI) for the implementation of low-cost fine-grain parallel processing computer systems. As WSI is a relatively new subject, there was little work on which to base investigations. Indeed, most WSI architectures existed only as untried and sometimes vague proposals. Accordingly, the research strategy approached this problem by identifying a representative WSI structure and architecture on which to base investigations. An analysis of architectural proposals identified associative memory to be general purpose parallel processing component used in a wide range of WSI architectures. Furthermore, this analysis provided a set of WSI-level design requirements to evaluate the sustainability of different architectures as research vehicles. The WSI-ASP (WASP) device, which has a large associative memory as its main component is shown to meet these requirements and hence was chosen as the research vehicle. Consequently, this thesis addresses WSI potential through an in-depth investigation into the feasibility of implementing a large associative memory for the WASP device that meets the demanding technological constraints of WSI. Overall, the thesis concludes that WSI offers significant potential for the implementation of low-cost fine-grain parallel processing computer systems. However, due to the dual constraints of thermal management and the area required for the power distribution network, power density is a major design constraint in WSI. Indeed, it is shown that WSI power densities need to be an order of magnitude lower than VLSI power densities. The thesis demonstrates that for associative memories at least, VLSI designs are unsuited to implementation in WSI. Rather, it is shown that WSI circuits must be closely matched to the operational environment to assure suitable power densities. These circuits are significantly larger than their VLSI equivalents. Nonetheless, the thesis demonstrates that by concentrating on the most power intensive circuits, it is possible to achieve acceptable power densities with only a modest increase in area overheads.SER
Two-level pipelined systolic array graphics engine
The authors report a VLSI design of an advanced systolic array graphics (SAG) engine built from pipelined functional units which can generate realistic images interactively for high-resolution displays. They introduce a structured frame store system as an environment for the advanced SAG engine and present the principles and architecture of the advanced SAG engine. They introduce pipelined functional units into this SAG engine to meet the performance requirements. This is done by a formal approach where the original systolic array is represented at bit level by a finite, vertex-weighted, edge-weighted, directed graph. Two architectures built from pipelined functional units are described. A prototype containing nine processing elements was fabricated in a 1.6-¿m CMOS technolog
Efficient algorithms for reconfiguration in VLSI/WSI arrays
The issue of developing efficient algorithms for reconfiguring processor arrays in the presence of faulty processors and fixed hardware resources is discussed. The models discussed consist of a set of identical processors embedded in a flexible interconnection structure that is configured in the form of a rectangular grid. An array grid model based on single-track switches is considered. An efficient polynomial time algorithm is proposed for determining feasible reconfigurations for an array with a given distribution of faulty processors. In the process, it is shown that the set of conditions in the reconfigurability theorem is not necessary. A polynomial time algorithm is developed for finding feasible reconfigurations in an augmented single-track model and in array grid models with multiple-track switche
Periodic Application of Concurrent Error Detection in Processor Array Architectures
Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance
Fault tolerance issues in nanoelectronics
The astonishing success story of microelectronics cannot go on indefinitely. In fact, once
devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected
to impair their behaviour. Fault tolerant techniques will then be required. The aim of this
thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient
error rates for a selection of nanoelectronic gates, based upon quantum cellular automata
and single electron devices, in which the electrostatic interaction between electrons is used
to create Boolean circuits, are estimated. On the bases of such results, various fault tolerant
solutions are proposed, for both logic and memory nanochips. As for logic chips, traditional
techniques are found to be unsuitable. A new technique, in which the voting approach of
triple modular redundancy (TMR) is extended by cascading TMR units composed of
nanogate clusters, is proposed and generalised to other voting approaches. For memory
chips, an error correcting code approach is found to be suitable. Various codes are
considered and a lookup table approach is proposed for encoding and decoding. We are
then able to give estimations for the redundancy level to be provided on nanochips, so as to
make their mean time between failures acceptable. It is found that, for logic chips, space
redundancies up to a few tens are required, if mean times between failures have to be of the
order of a few years. Space redundancy can also be traded for time redundancy. As for
memory chips, mean times between failures of the order of a few years are found to imply
both space and time redundancies of the order of ten
An efficient asynchronous multiplier
An efficient asynchronous serial-parallel multiplier architecture is presented. If offers significant advantages over conventional clocked versions, without some of the drawbacks normally associated with similar asynchronous techniques, such as excessive area. It is shown how a general asynchronous communication element can be designed and illustrated with the CMOS multiplier chip implementation. It is also shown how the multiplier could form the basis for a faster and more robust implementation of the Rivest-Sharmir-Adleman (RSA) public-key cryptosystem
Effective network grid synthesis and optimization for high performance very large scale integration system design
制度:新 ; 文部省報告番号:甲2642号 ; 学位の種類:博士(工学) ; 授与年月日:2008/3/15 ; 早大学位記番号:新480
- …