41 research outputs found

    Shape reconstruction using instruction systolic array

    This paper describes a novel 2D mesh architecture prototype based on the Instruction Systolic Array (ISA) paradigm for distributed computing on fabrics. We discuss a real-time shape sensing and reconstruction application executing on this architecture and demonstrate a physical design for a wearable system based on the ISA concept, constructed from off-the-shelf microcontrollers and sensors. Results demonstrate that the application executes in 39 ms on our prototype ISA implementation, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
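    The paper's reconstruction code is not given in the abstract, but the essential step of such an application can be illustrated with a minimal sketch: assuming each node in the fabric mesh reports a local surface orientation (for example from an inertial sensor), the shape of one row of fabric can be recovered by chaining fixed-length segments along that row. The names, the single pitch angle per node, and the segment-chaining scheme below are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical per-node readings: pitch angle (radians) reported by each
# sensor node along one row of the fabric mesh.
row_pitch = [0.00, 0.10, 0.25, 0.40, 0.35, 0.20]

SEGMENT_LENGTH = 0.05  # metres between adjacent nodes (assumed constant)

def reconstruct_row(pitches, seg_len):
    """Chain fixed-length segments using each node's reported pitch to
    obtain (x, z) coordinates of the nodes along one row of fabric."""
    points = [(0.0, 0.0)]               # the first node is the reference origin
    x, z = 0.0, 0.0
    for theta in pitches:
        x += seg_len * math.cos(theta)  # advance along the surface
        z += seg_len * math.sin(theta)  # accumulate height change
        points.append((x, z))
    return points

if __name__ == "__main__":
    for i, (x, z) in enumerate(reconstruct_row(row_pitch, SEGMENT_LENGTH)):
        print(f"node {i}: x={x:.3f} m  z={z:.3f} m")
```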

    An instruction systolic array architecture for multiple neural network types

    Modern electronic systems, especially sensor and imaging systems, are beginning to incorporate their own neural network subsystems. In order for these neural systems to learn in real time they must be implemented using VLSI technology, with as much of the learning process incorporated on-chip as possible. The majority of current VLSI implementations literally implement a series of neural processing cells, which can be connected together in an arbitrary fashion. Many do not perform the entire neural learning process on-chip, instead relying on other external systems to carry out part of the computational requirements of the algorithm. The work presented here utilises two-dimensional instruction systolic arrays in an attempt to define a general neural architecture which is closer to the biological basis of neural networks: it is the synapses themselves, rather than the neurons, that have dedicated processing units. A unified architecture is described which can be programmed at the microcode level in order to facilitate the processing of multiple neural network types. An essential part of neural network processing is the neuron activation function, which can range from a sequential algorithm to a discrete mathematical expression. The architecture presented can easily carry out the sequential functions, and introduces a fast method of mathematical approximation for the more complex functions. This can be evaluated on-chip, thus implementing the entire neural process within a single system. VHDL circuit descriptions for the chip have been generated, and the systolic processing algorithms and associated microcode instruction set for three different neural paradigms have been designed. A software simulator of the architecture has been written, giving results for several common applications in the field.
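    The abstract does not state which approximation scheme is used, so the sketch below only illustrates the general idea of replacing an expensive activation function with a cheap, table-driven approximation that could be evaluated on-chip. The piecewise-linear form, the break-point spacing, and the choice of the logistic sigmoid are assumptions for illustration, not the thesis's actual method.

```python
import math

# Illustrative piecewise-linear approximation of the logistic sigmoid, the
# kind of cheap evaluation a neuron/synapse processing cell could perform
# instead of computing exp() directly.
def sigmoid_pwl(x):
    """Piecewise-linear sigmoid: saturated limits outside |x| >= 4, straight
    segments between tabulated break-points inside."""
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    # Break-points every 1.0 over [-4, 4]; in hardware these values would be
    # precomputed once and stored, not evaluated with exp() at run time.
    xs = [-4 + i for i in range(9)]
    ys = [1.0 / (1.0 + math.exp(-v)) for v in xs]
    i = int(x + 4.0)                           # index of the segment containing x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])      # position within that segment
    return ys[i] + t * (ys[i + 1] - ys[i])

if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1.2, 3.7):
        exact = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.1f}  approx={sigmoid_pwl(x):.4f}  exact={exact:.4f}")
```

    With a precomputed break-point table, each evaluation reduces to a lookup, a subtraction, a multiplication and an addition, which is what makes this style of approximation attractive for on-chip use.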

    System on fabrics utilising distributed computing

    The main vision of wearable computing is to make electronic systems an important part of everyday clothing in the future, serving as intelligent personal assistants. Wearable devices have the potential to be wearable computers and not mere input/output devices for the human body. The present thesis focuses on introducing a new wearable computing paradigm, where the processing elements are closely coupled with distributed sensors using the Instruction Systolic Array (ISA) architecture. The thesis describes a novel multiple-sensor, multiple-processor system architecture prototype based on the ISA paradigm for distributed computing on fabrics, and introduces a new programming model for implementing the distributed computer on fabrics. The implementation of the concept has been validated using parallel algorithms. A real-time shape sensing and reconstruction application has been implemented on this architecture, and a physical design for a wearable system based on the ISA concept, constructed from off-the-shelf microcontrollers and sensors, has been demonstrated. Results demonstrate that the real-time application executes on the prototype ISA implementation, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
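    As one illustration of the kind of parallel algorithm such a fabric-resident computer can run, the sketch below sums the sensor readings of one row of the mesh using only neighbour-to-neighbour data movement, in the systolic spirit of the architecture. The data layout and step structure are assumptions made for illustration and are not taken from the thesis.

```python
def systolic_row_sum(readings):
    """Sum one row of sensor readings using only local moves: every value
    shifts right by one cell per step and the rightmost cell accumulates."""
    n = len(readings)
    cells = list(readings[:-1])        # values still travelling rightwards
    total = readings[-1]               # accumulator held by the rightmost cell
    for _ in range(n - 1):
        total += cells[-1]             # rightmost cell absorbs its left neighbour's value
        cells = [0] + cells[:-1]       # every other value moves one cell to the right
    return total

if __name__ == "__main__":
    readings = [3, 1, 4, 1, 5, 9]      # e.g. one row of sensor samples
    print(systolic_row_sum(readings))  # 23, the same as sum(readings)
```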

    System on fabrics architecture using distributed computing

    This paper describes a novel, distributed sensor network with parallel processing capability based on the Instruction Systolic Array (ISA). A new computing paradigm is introduced where spatially distributed sensors are closely coupled to processing elements and the whole array forms a parallel computer. This may find applications in wearable devices for sensing the position and other metrics of a human body and rapidly processing that data. A new programming model to implement the distributed computer on fabrics is described. The fabric-based distributed computing concept has been validated using a number of parallel applications, including a real-time shape sensing and reconstruction application. The exemplar wearable system based on the ISA concept has been realized using off-the-shelf microcontrollers and sensors. Results show that the application executes on the prototype ISA implementation in real time, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
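    A minimal software rendering of the ISA execution model referred to above: instruction rows enter the mesh from the top and move down one row per step, selector-bit columns enter from the left and move right one column per step, and a cell executes the instruction currently passing through it only when the accompanying selector bit is 1. The mesh size, instruction set and program below are illustrative assumptions rather than the paper's actual programming model.

```python
# Toy simulation of Instruction Systolic Array execution on a 3x3 mesh.
N = 3                                   # mesh of N x N cells
regs = [[0] * N for _ in range(N)]      # one accumulator register per cell

# Illustrative program: every instruction is "add k to the local register",
# with k varying per instruction row; selectors mask out whole rows.
instr_rows = [                          # instr_rows[k][j] travels down column j
    [1, 1, 1],
    [10, 10, 10],
    [100, 100, 100],
]
sel_cols = [                            # sel_cols[k][i] travels right along row i
    [1, 1, 1],                          # first selector column enables every row
    [1, 0, 1],                          # second disables the middle row
    [0, 0, 1],                          # third enables the bottom row only
]

steps = len(instr_rows) + len(sel_cols) + N   # enough steps to drain the array
for t in range(steps):
    for i in range(N):
        for j in range(N):
            k_i, k_s = t - i, t - j     # which instruction row / selector column
            if 0 <= k_i < len(instr_rows) and 0 <= k_s < len(sel_cols):
                if sel_cols[k_s][i]:    # execute only where the selector bit is 1
                    regs[i][j] += instr_rows[k_i][j]

# The skewed instruction and selector wavefronts give each cell a different total.
for row in regs:
    print(row)
```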

    The Instruction Systolic Array (ISA) and simulation of parallel algorithms

    Systolic arrays have proved to be well suited to Very Large Scale Integration (VLSI) technology since they consist of a regular network of simple processing cells, use only local communication between the processing cells, and exploit a maximal degree of parallelism. However, systolic arrays have one main disadvantage compared with other parallel computer architectures: they are special-purpose architectures capable of executing only one algorithm; e.g., a systolic array designed for sorting cannot be used to perform matrix multiplication. Several approaches have been made to make systolic arrays more flexible, so that different problems can be handled on a single systolic array. In this thesis an alternative concept to a VLSI architecture, the Soft-Systolic Simulation System (SSSS), is introduced and developed as a working model of a virtual machine with the power to simulate hard systolic arrays and more general forms of concurrency such as the SIMD and MIMD models of computation. The virtual machine includes a processing element consisting of a soft-systolic processor implemented in the virtual machine language. The processing element considered here is a very general element which allows the choice of a wide range of arithmetic and logical operators and hence the simulation of a wide class of algorithms; in principle, extra processing cells can be added to form a library that can be tailored to individual needs. The virtual machine chosen for this implementation is the Instruction Systolic Array (ISA). The ISA has a number of interesting features: it has been used to simulate all SIMD algorithms and many MIMD algorithms by a simple program transformation technique, and it can also simulate the so-called wavefront processor algorithms as well as many hard systolic algorithms. The ISA removes the need for the broadcasting of data which is a feature of SIMD algorithms (limiting the size of the machine and its cycle time), and it also presents a fairly simple communication structure for MIMD algorithms. In the model of systolic computation developed from the VLSI approach to systolic arrays, the processing surface is fixed, as are the processing elements or cells, by virtue of their being embedded in the processing surface. The VLSI approach therefore freezes instructions and hardware relative to the movement of data, whereas the virtual machine and soft-systolic programming retain the VLSI array design features of regularity, simplicity and local communication while allowing the movement of instructions with respect to data. Data can be frozen into the structure with instructions moving systolically; alternatively, both the data and the instructions can move systolically around the virtual processors, which are deemed fixed relative to the underlying architecture. The ISA is implemented in OCCAM programs whose execution and output implicitly confirm the correctness of the design. The soft-systolic environment comprises the usual operating system facilities for the creation and modification of files during the development of new programs and ISA processor elements. We allow any concurrent high-level language to be used to model the soft-systolic program. Consequently, the Replicating Instruction Systolic Array Language (RISAL) was devised to provide a programming environment for the ISA that is very primitive but adequate for testing. RISAL accepts instructions in an assembler-like form, but is fairly permissive about the format of statements, subject of course to syntax. The RISAL compiler transforms the soft-systolic program description into a form suitable for the virtual machine to run when simulating the algorithm. Finally, we conclude that the principles described here can form the basis for a soft-systolic simulator using an orthogonally connected mesh of processors; the wide range of algorithms which the ISA can simulate makes it well suited to serve as a virtual simulating grid.
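    As an example of the 'hard' systolic algorithms such a simulator is intended to reproduce, the sketch below models a classic output-stationary systolic matrix multiplication on an orthogonal mesh: rows of A stream in from the left and columns of B from the top, each suitably skewed, and every cell accumulates one element of the product using only local data movement. This is a textbook systolic scheme written in Python purely for illustration, not code from the thesis or from RISAL.

```python
# Output-stationary systolic matrix multiply on an N x N mesh: A's rows
# stream in from the left, B's columns from the top, each skewed by one
# step per row/column, and cell (i, j) accumulates C[i][j] from the pairs
# of values passing through it.
N = 3
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[9, 8, 7],
     [6, 5, 4],
     [3, 2, 1]]
C = [[0] * N for _ in range(N)]

def a_in(i, t):
    """Value of A entering row i from the left at time t (0 outside the skewed window)."""
    k = t - i                      # row i's input stream is delayed by i steps
    return A[i][k] if 0 <= k < N else 0

def b_in(j, t):
    """Value of B entering column j from the top at time t."""
    k = t - j                      # column j's input stream is delayed by j steps
    return B[k][j] if 0 <= k < N else 0

for t in range(3 * N - 2):         # enough steps for every pair of values to meet
    for i in range(N):
        for j in range(N):
            # The value reaching cell (i, j) at time t entered the boundary
            # j (resp. i) steps earlier, hopping one cell per step.
            C[i][j] += a_in(i, t - j) * b_in(j, t - i)

for row in C:
    print(row)                     # equals the ordinary matrix product A x B
```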

    A comparative study of synchronous and self-timed systolic array architectures

    This thesis examines systolic array architectures and their methods of control and communication synchronisation. Systolic array processors suffer from synchronisation problems associated with the clocking mechanism, which restricts their scalability. To overcome this problem, both return-to-zero (RTZ) and non-return-to-zero (NRTZ) delay-insensitive self-timed (ST) techniques can be used to realise architectures that operate correctly in the presence of arbitrary delays at all levels in their design. As a consequence, RTZ and NRTZ versions of an existing systolic array architecture, namely the Single Instruction Systolic Array (SISA), have been developed in order to investigate the potential for realising architecturally scalable systolic arrays. The new architectures, called the RTZ and NRTZ ST-SISAs, have been compared with each other and against their synchronous counterpart to establish their relative trade-offs. The new designs exhibit several novel features including: variable-length bit-serial data words, average-case processing speeds dependent on data word length as well as computational complexity, a novel autonomous inter-processor data communication mechanism, and architectural scalability independent of fabrication technology. This thesis introduces an implementation of the RTZ and NRTZ ST-SISA architectures, along with their performance and area characteristics. Guidelines have been developed from the resulting RTZ and NRTZ architectures allowing novel self-timed systolic architectures to be derived.
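    The thesis's circuits are hardware designs, but the return-to-zero signalling discipline it builds on can be illustrated at a behavioural level: in RTZ (four-phase) dual-rail encoding each bit travels on two wires, a value is signalled by raising exactly one of them, and the pair must return to the all-zero 'spacer' state before the next value is sent, which is what lets a receiver detect data arrival without a clock. The encode/decode sketch below is a generic illustration of that discipline, not the ST-SISA implementation.

```python
# Behavioural sketch of RTZ (four-phase) dual-rail signalling: each bit uses
# two wires (false-rail, true-rail); (0, 1) means 1, (1, 0) means 0, (0, 0)
# is the spacer separating successive values, and (1, 1) is illegal.
SPACER = (0, 0)

def encode_rtz(bits):
    """Turn a bit sequence into the wire states seen on a dual-rail channel,
    inserting the return-to-zero spacer between successive data values."""
    wire_states = []
    for b in bits:
        wire_states.append((0, 1) if b else (1, 0))  # data phase
        wire_states.append(SPACER)                   # return-to-zero phase
    return wire_states

def decode_rtz(wire_states):
    """Receiver view: a new value is detected whenever the wires leave the
    spacer state; no clock is needed, only the state change itself."""
    bits = []
    for f, t in wire_states:
        if (f, t) == SPACER:
            continue                     # spacer phase: nothing to latch
        if f == t == 1:
            raise ValueError("illegal dual-rail state (1, 1)")
        bits.append(1 if t else 0)       # exactly one rail is high
    return bits

if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0]      # an 8-bit bit-serial data word
    channel = encode_rtz(word)
    print(channel)
    print(decode_rtz(channel) == word)   # True: round trip through the channel
```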

    Centre for Information Science Research Annual Report, 1987-1991

    Annual reports from various departments of the AN

    High performance dense linear algebra on a spatially distributed processor

    As technology trends have limited the performance scaling of conventional processors, industry and academic research have turned to parallel architectures on a single chip, including distributed uniprocessors and multicore chips. This paper examines how to extend the archetypal operation of dense linear algebra, matrix multiply, to an emerging class of uniprocessor architectures characterized by a large number of independent functional units, register banks, and cache banks connected by a 2-D on-chip network. We extend Goto's well-known algorithm for matrix multiplication to this spatially distributed class of uniprocessors and describe the optimizations of the innermost kernel, a systolic-like algorithm running on a general-purpose uniprocessor. The resulting implementation yields the first demonstration of high performance in an application executing on the TRIPS processor hardware, a next-generation distributed processor core. We show that such processors are indeed capable of substantial improvements in single-threaded performance provided their spatial topography is taken into account.
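    Goto's algorithm itself is not reproduced in the abstract, but its key idea, blocking the matrices so that a packed panel of one operand stays resident in a fast level of the memory hierarchy while an inner kernel streams through it, can be sketched as follows. The tiny block sizes and the plain three-loop inner kernel are illustrative stand-ins for the paper's TRIPS-specific kernel.

```python
# Minimal Goto-style blocked matrix multiply, C = A x B: panels of B are
# packed into contiguous storage (intended to stay cache-resident) while an
# inner kernel streams blocks of A through them. Block sizes are tiny here,
# purely for illustration.
MC, KC, NC = 4, 4, 4

def pack_panel(B, k0, kc, j0, nc):
    """Copy a kc x nc panel of B into contiguous storage (the packing step)."""
    return [[B[k0 + k][j0 + j] for j in range(nc)] for k in range(kc)]

def gemm_blocked(A, B):
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for j0 in range(0, n, NC):                    # panels of B's columns
        nc = min(NC, n - j0)
        for k0 in range(0, k, KC):                # panels of B's rows
            kc = min(KC, k - k0)
            Bp = pack_panel(B, k0, kc, j0, nc)    # packed panel, reused many times
            for i0 in range(0, m, MC):            # blocks of A's rows
                mc = min(MC, m - i0)
                # Inner kernel: small mc x nc update using the packed panel.
                for i in range(mc):
                    for p in range(kc):
                        a = A[i0 + i][k0 + p]
                        for j in range(nc):
                            C[i0 + i][j0 + j] += a * Bp[p][j]
    return C

if __name__ == "__main__":
    import random
    A = [[random.random() for _ in range(7)] for _ in range(6)]
    B = [[random.random() for _ in range(5)] for _ in range(7)]
    C = gemm_blocked(A, B)
    # Check against a straightforward triple-loop reference.
    ref = [[sum(A[i][p] * B[p][j] for p in range(7)) for j in range(5)] for i in range(6)]
    print(max(abs(C[i][j] - ref[i][j]) for i in range(6) for j in range(5)) < 1e-9)
```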

    Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability
