104 research outputs found

    A block algorithm for the algebraic path problem and its execution on a systolic array

    Get PDF
    The solution of the algebraic path problem (APP) for arbitrarily sized graphs by a fixed-size systolic array processor (SAP) is addressed. The APP is decomposed into two subproblems, and SAP is designed for each one. Both SAPs combined produce a highly implementable versatile SAP. The proposed SAP has p*p processing elements (PEs) solving the APP of an N-vertex graph in N/sup 3//p/sup 2/+N/sup 2//p+3p-2 cycles. With slight modifications in the operations performed by the PEs, the problem is optimally solved in N/sup 3//p/sup 2/+3p-2 cycles.Peer ReviewedPostprint (published version

    A Comprehensive Methodology for Algorithm Characterization, Regularization and Mapping Into Optimal VLSI Arrays.

    Get PDF
    This dissertation provides a fairly comprehensive treatment of a broad class of algorithms as it pertains to systolic implementation. We describe some formal algorithmic transformations that can be utilized to map regular and some irregular compute-bound algorithms into the best fit time-optimal systolic architectures. The resulted architectures can be one-dimensional, two-dimensional, three-dimensional or nonplanar. The methodology detailed in the dissertation employs, like other methods, the concept of dependence vector to order, in space and time, the index points representing the algorithm. However, by differentiating between two types of dependence vectors, the ordering procedure is allowed to be flexible and time optimal. Furthermore, unlike other methodologies, the approach reported here does not put constraints on the topology or dimensionality of the target architecture. The ordered index points are represented by nodes in a diagram called Systolic Precedence Diagram (SPD). The SPD is a form of precedence graph that takes into account the systolic operation requirements of strictly local communications and regular data flow. Therefore, any algorithm with variable dependence vectors has to be transformed into a regular indexed set of computations with local dependencies. This can be done by replacing variable dependence vectors with sets of fixed dependence vectors. The SPD is transformed into an acyclic, labeled, directed graph called the Systolic Directed Graph (SDG). The SDG models the data flow as well as the timing for the execution of the given algorithm on a time-optimal array. The target architectures are obtained by projecting the SDG along defined directions. If more than one valid projection direction exists, different designs are obtained. The resulting architectures are then evaluated to determine if an improvement in the performance can be achieved by increasing PE fan-out. If so, the methodology provides the corresponding systolic implementation. By employing a new graph transformation, the SDG is manipulated so that it can be mapped into fixed-size and fixed-depth multi-linear arrays. The latter is a new concept of systolic arrays that is adaptable to changes in the state of technology. It promises a bonded clock skew, higher throughput and better performance than the linear implementation

    A Computational Model for Tensor Core Units

    Full text link
    To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is a hardware circuit for efficiently computing a dense matrix multiplication of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including matrix operations (dense and sparse multiplication, Gaussian Elimination), graph algorithms (transitive closure, all pairs shortest distances), Discrete Fourier Transform, stencil computations, integer multiplication, and polynomial evaluation. We finally highlight a relation between the TCU model and the external memory model

    Applications and implementation of neuro-connectionist architectures.

    Get PDF
    by H.S. Ng.Thesis (M.Phil.)--Chinese University of Hong Kong, 1996.Includes bibliographical references (leaves 91-97).Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Introduction --- p.1Chapter 1.2 --- Neuro-connectionist Network --- p.2Chapter 2 --- Related Works --- p.5Chapter 2.1 --- Introduction --- p.5Chapter 2.1.1 --- Kruskal's Algorithm --- p.5Chapter 2.1.2 --- Prim's algorithm --- p.6Chapter 2.1.3 --- Sollin's algorithm --- p.7Chapter 2.1.4 --- Bellman-Ford algorithm --- p.8Chapter 2.1.5 --- Floyd-Warshall algorithm --- p.9Chapter 3 --- Binary Relation Inference Network and Path Problems --- p.11Chapter 3.1 --- Introduction --- p.11Chapter 3.2 --- Topology --- p.12Chapter 3.3 --- Network structure --- p.13Chapter 3.3.1 --- Single-destination BRIN architecture --- p.14Chapter 3.3.2 --- Comparison between all-pair BRIN and single-destination BRIN --- p.18Chapter 3.4 --- Path Problems and BRIN Solution --- p.18Chapter 3.4.1 --- Minimax path problems --- p.18Chapter 3.4.2 --- BRIN solution --- p.19Chapter 4 --- Analog and Voltage-mode Approach --- p.22Chapter 4.1 --- Introduction --- p.22Chapter 4.2 --- Analog implementation --- p.24Chapter 4.3 --- Voltage-mode approach --- p.26Chapter 4.3.1 --- The site function --- p.26Chapter 4.3.2 --- The unit function --- p.28Chapter 4.3.3 --- The computational unit --- p.28Chapter 4.4 --- Conclusion --- p.29Chapter 5 --- Current-mode Approach --- p.32Chapter 5.1 --- Introduction --- p.32Chapter 5.2 --- Current-mode approach for analog VLSI Implementation --- p.33Chapter 5.2.1 --- Site and Unit output function --- p.33Chapter 5.2.2 --- Computational unit --- p.34Chapter 5.2.3 --- A complete network --- p.35Chapter 5.3 --- Conclusion --- p.37Chapter 6 --- Neural Network Compensation for Optimization Circuit --- p.40Chapter 6.1 --- Introduction --- p.40Chapter 6.2 --- A Neuro-connectionist Architecture for error correction --- p.41Chapter 6.2.1 --- Linear Relationship --- p.42Chapter 6.2.2 --- Output Deviation of Computational Unit --- p.44Chapter 6.3 --- Experimental Results --- p.46Chapter 6.3.1 --- Training Phase --- p.46Chapter 6.3.2 --- Generalization Phase --- p.48Chapter 6.4 --- Conclusion --- p.50Chapter 7 --- Precision-limited Analog Neural Network Compensation --- p.51Chapter 7.1 --- Introduction --- p.51Chapter 7.2 --- Analog Neural Network hardware --- p.53Chapter 7.3 --- Integration of analog neural network compensation of connectionist net- work for general path problems --- p.54Chapter 7.4 --- Experimental Results --- p.55Chapter 7.4.1 --- Convergence time --- p.56Chapter 7.4.2 --- The accuracy of the system --- p.57Chapter 7.5 --- Conclusion --- p.58Chapter 8 --- Transitive Closure Problems --- p.60Chapter 8.1 --- Introduction --- p.60Chapter 8.2 --- Different ways of implementation of BRIN for transitive closure --- p.61Chapter 8.2.1 --- Digital Implementation --- p.61Chapter 8.2.2 --- Analog Implementation --- p.61Chapter 8.3 --- Transitive Closure Problem --- p.63Chapter 8.3.1 --- A special case of maximum spanning tree problem --- p.64Chapter 8.3.2 --- Analog approach solution for transitive closure problem --- p.65Chapter 8.3.3 --- Current-mode approach solution for transitive closure problem --- p.67Chapter 8.4 --- Comparisons between the different forms of implementation of BRIN for transitive closure --- p.71Chapter 8.4.1 --- Convergence Time --- p.71Chapter 8.4.2 --- Circuit complexity --- p.72Chapter 8.5 --- Discussion --- p.73Chapter 9 --- Critical path problems --- p.74Chapter 9.1 --- Introduction --- p.74Chapter 9.2 --- Problem statement and single-destination BRIN solution --- p.75Chapter 9.3 --- Analog implementation --- p.76Chapter 9.3.1 --- Separated building block --- p.78Chapter 9.3.2 --- Combined building block --- p.79Chapter 9.4 --- Current-mode approach --- p.80Chapter 9.4.1 --- "Site function, unit output function and a completed network" --- p.80Chapter 9.5 --- Conclusion --- p.83Chapter 10 --- Conclusions --- p.85Chapter 10.1 --- Summary of Achievements --- p.85Chapter 10.2 --- Future development --- p.88Chapter 10.2.1 --- Application for financial problems --- p.88Chapter 10.2.2 --- Fabrication of VLSI Implementation --- p.88Chapter 10.2.3 --- Actual prototyping of Analog Integrated Circuits for critical path and transitive closure problems --- p.89Chapter 10.2.4 --- Other implementation platform --- p.89Chapter 10.2.5 --- On-line update of routing table inside the router for network com- munication using BRIN --- p.89Chapter 10.2.6 --- Other BRIN's applications --- p.90Bibliography --- p.9

    Applications and implementation of neuro-connectionist architectures.

    Get PDF
    by H.S. Ng.Thesis (M.Phil.)--Chinese University of Hong Kong, 1996.Includes bibliographical references (leaves 91-97).Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Introduction --- p.1Chapter 1.2 --- Neuro-connectionist Network --- p.2Chapter 2 --- Related Works --- p.5Chapter 2.1 --- Introduction --- p.5Chapter 2.1.1 --- Kruskal's Algorithm --- p.5Chapter 2.1.2 --- Prim's algorithm --- p.6Chapter 2.1.3 --- Sollin's algorithm --- p.7Chapter 2.1.4 --- Bellman-Ford algorithm --- p.8Chapter 2.1.5 --- Floyd-Warshall algorithm --- p.9Chapter 3 --- Binary Relation Inference Network and Path Problems --- p.11Chapter 3.1 --- Introduction --- p.11Chapter 3.2 --- Topology --- p.12Chapter 3.3 --- Network structure --- p.13Chapter 3.3.1 --- Single-destination BRIN architecture --- p.14Chapter 3.3.2 --- Comparison between all-pair BRIN and single-destination BRIN --- p.18Chapter 3.4 --- Path Problems and BRIN Solution --- p.18Chapter 3.4.1 --- Minimax path problems --- p.18Chapter 3.4.2 --- BRIN solution --- p.19Chapter 4 --- Analog and Voltage-mode Approach --- p.22Chapter 4.1 --- Introduction --- p.22Chapter 4.2 --- Analog implementation --- p.24Chapter 4.3 --- Voltage-mode approach --- p.26Chapter 4.3.1 --- The site function --- p.26Chapter 4.3.2 --- The unit function --- p.28Chapter 4.3.3 --- The computational unit --- p.28Chapter 4.4 --- Conclusion --- p.29Chapter 5 --- Current-mode Approach --- p.32Chapter 5.1 --- Introduction --- p.32Chapter 5.2 --- Current-mode approach for analog VLSI Implementation --- p.33Chapter 5.2.1 --- Site and Unit output function --- p.33Chapter 5.2.2 --- Computational unit --- p.34Chapter 5.2.3 --- A complete network --- p.35Chapter 5.3 --- Conclusion --- p.37Chapter 6 --- Neural Network Compensation for Optimization Circuit --- p.40Chapter 6.1 --- Introduction --- p.40Chapter 6.2 --- A Neuro-connectionist Architecture for error correction --- p.41Chapter 6.2.1 --- Linear Relationship --- p.42Chapter 6.2.2 --- Output Deviation of Computational Unit --- p.44Chapter 6.3 --- Experimental Results --- p.46Chapter 6.3.1 --- Training Phase --- p.46Chapter 6.3.2 --- Generalization Phase --- p.48Chapter 6.4 --- Conclusion --- p.50Chapter 7 --- Precision-limited Analog Neural Network Compensation --- p.51Chapter 7.1 --- Introduction --- p.51Chapter 7.2 --- Analog Neural Network hardware --- p.53Chapter 7.3 --- Integration of analog neural network compensation of connectionist net- work for general path problems --- p.54Chapter 7.4 --- Experimental Results --- p.55Chapter 7.4.1 --- Convergence time --- p.56Chapter 7.4.2 --- The accuracy of the system --- p.57Chapter 7.5 --- Conclusion --- p.58Chapter 8 --- Transitive Closure Problems --- p.60Chapter 8.1 --- Introduction --- p.60Chapter 8.2 --- Different ways of implementation of BRIN for transitive closure --- p.61Chapter 8.2.1 --- Digital Implementation --- p.61Chapter 8.2.2 --- Analog Implementation --- p.61Chapter 8.3 --- Transitive Closure Problem --- p.63Chapter 8.3.1 --- A special case of maximum spanning tree problem --- p.64Chapter 8.3.2 --- Analog approach solution for transitive closure problem --- p.65Chapter 8.3.3 --- Current-mode approach solution for transitive closure problem --- p.67Chapter 8.4 --- Comparisons between the different forms of implementation of BRIN for transitive closure --- p.71Chapter 8.4.1 --- Convergence Time --- p.71Chapter 8.4.2 --- Circuit complexity --- p.72Chapter 8.5 --- Discussion --- p.73Chapter 9 --- Critical path problems --- p.74Chapter 9.1 --- Introduction --- p.74Chapter 9.2 --- Problem statement and single-destination BRIN solution --- p.75Chapter 9.3 --- Analog implementation --- p.76Chapter 9.3.1 --- Separated building block --- p.78Chapter 9.3.2 --- Combined building block --- p.79Chapter 9.4 --- Current-mode approach --- p.80Chapter 9.4.1 --- "Site function, unit output function and a completed network" --- p.80Chapter 9.5 --- Conclusion --- p.83Chapter 10 --- Conclusions --- p.85Chapter 10.1 --- Summary of Achievements --- p.85Chapter 10.2 --- Future development --- p.88Chapter 10.2.1 --- Application for financial problems --- p.88Chapter 10.2.2 --- Fabrication of VLSI Implementation --- p.88Chapter 10.2.3 --- Actual prototyping of Analog Integrated Circuits for critical path and transitive closure problems --- p.89Chapter 10.2.4 --- Other implementation platform --- p.89Chapter 10.2.5 --- On-line update of routing table inside the router for network com- munication using BRIN --- p.89Chapter 10.2.6 --- Other BRIN's applications --- p.90Bibliography --- p.9

    Approaches to the implementation of binary relation inference network.

    Get PDF
    by C.W. Tong.Thesis (M.Phil.)--Chinese University of Hong Kong, 1994.Includes bibliographical references (leaves 96-98).Chapter 1 --- Introduction --- p.1Chapter 1.1 --- The Availability of Parallel Processing Machines --- p.2Chapter 1.1.1 --- Neural Networks --- p.5Chapter 1.2 --- Parallel Processing in the Continuous-Time Domain --- p.6Chapter 1.3 --- Binary Relation Inference Network --- p.10Chapter 2 --- Binary Relation Inference Network --- p.12Chapter 2.1 --- Binary Relation Inference Network --- p.12Chapter 2.1.1 --- Network Structure --- p.14Chapter 2.2 --- Shortest Path Problem --- p.17Chapter 2.2.1 --- Problem Statement --- p.17Chapter 2.2.2 --- A Binary Relation Inference Network Solution --- p.18Chapter 3 --- A Binary Relation Inference Network Prototype --- p.21Chapter 3.1 --- The Prototype --- p.22Chapter 3.1.1 --- The Network --- p.22Chapter 3.1.2 --- Computational Element --- p.22Chapter 3.1.3 --- Network Response Time --- p.27Chapter 3.2 --- Improving Response --- p.29Chapter 3.2.1 --- Removing Feedback --- p.29Chapter 3.2.2 --- Selecting Minimum with Diodes --- p.30Chapter 3.3 --- Speeding Up the Network Response --- p.33Chapter 3.4 --- Conclusion --- p.35Chapter 4 --- VLSI Building Blocks --- p.36Chapter 4.1 --- The Site --- p.37Chapter 4.2 --- The Unit --- p.40Chapter 4.2.1 --- A Minimum Finding Circuit --- p.40Chapter 4.2.2 --- A Tri-state Comparator --- p.44Chapter 4.3 --- The Computational Element --- p.45Chapter 4.3.1 --- Network Performances --- p.46Chapter 4.4 --- Discussion --- p.47Chapter 5 --- A VLSI Chip --- p.48Chapter 5.1 --- Spatial Configuration --- p.49Chapter 5.2 --- Layout --- p.50Chapter 5.2.1 --- Computational Elements --- p.50Chapter 5.2.2 --- The Network --- p.52Chapter 5.2.3 --- I/O Requirements --- p.53Chapter 5.2.4 --- Optional Modules --- p.53Chapter 5.3 --- A Scalable Design --- p.54Chapter 6 --- The Inverse Shortest Paths Problem --- p.57Chapter 6.1 --- Problem Statement --- p.59Chapter 6.2 --- The Embedded Approach --- p.63Chapter 6.2.1 --- The Formulation --- p.63Chapter 6.2.2 --- The Algorithm --- p.65Chapter 6.3 --- Implementation Results --- p.66Chapter 6.4 --- Other Implementations --- p.67Chapter 6.4.1 --- Sequential Machine --- p.67Chapter 6.4.2 --- Parallel Machine --- p.68Chapter 6.5 --- Discussion --- p.68Chapter 7 --- Closed Semiring Optimization Circuits --- p.71Chapter 7.1 --- Transitive Closure Problem --- p.72Chapter 7.1.1 --- Problem Statement --- p.72Chapter 7.1.2 --- Inference Network Solutions --- p.73Chapter 7.2 --- Closed Semirings --- p.76Chapter 7.3 --- Closed Semirings and the Binary Relation Inference Network --- p.79Chapter 7.3.1 --- Minimum Spanning Tree --- p.80Chapter 7.3.2 --- VLSI Implementation --- p.84Chapter 7.4 --- Conclusion --- p.86Chapter 8 --- Conclusions --- p.87Chapter 8.1 --- Summary of Achievements --- p.87Chapter 8.2 --- Future Work --- p.89Chapter 8.2.1 --- VLSI Fabrication --- p.89Chapter 8.2.2 --- Network Robustness --- p.90Chapter 8.2.3 --- Inference Network Applications --- p.91Chapter 8.2.4 --- Architecture for the Bellman-Ford Algorithm --- p.91Bibliography --- p.92Appendices --- p.99Chapter A --- Detailed Schematic --- p.99Chapter A.1 --- Schematic of the Inference Network Structures --- p.99Chapter A.1.1 --- Unit with Self-Feedback --- p.99Chapter A.1.2 --- Unit with Self-Feedback Removed --- p.100Chapter A.1.3 --- Unit with a Compact Minimizer --- p.100Chapter A.1.4 --- Network Modules --- p.100Chapter A.2 --- Inference Network Interface Circuits --- p.100Chapter B --- Circuit Simulation and Layout Tools --- p.107Chapter B.1 --- Circuit Simulation --- p.107Chapter B.2 --- VLSI Circuit Design --- p.110Chapter B.3 --- VLSI Circuit Layout --- p.111Chapter C --- The Conjugate-Gradient Descent Algorithm --- p.113Chapter D --- Shortest Path Problem on MasPar --- p.11
    • …
    corecore