41 research outputs found

    Shape reconstruction using instruction systolic array

    This paper describes a novel 2D mesh architecture prototype based on the Instruction Systolic Array (ISA) paradigm for distributed computing on fabrics. We discuss a real-time shape sensing and reconstruction application executing on this architecture and demonstrate a physical design for a wearable system based on the ISA concept, constructed from off-the-shelf microcontrollers and sensors. Results demonstrate that the application executes in 39 ms on our prototype ISA implementation, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
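    The paper's reconstruction code is not given in the abstract, but the essential step of such an application can be illustrated with a minimal sketch: assuming each node in the fabric mesh reports a local surface orientation (for example from an inertial sensor), the shape of one row of fabric can be recovered by chaining fixed-length segments along that row. The names, the single pitch angle per node, and the segment-chaining scheme below are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical per-node readings: pitch angle (radians) reported by each
# sensor node along one row of the fabric mesh.
row_pitch = [0.00, 0.10, 0.25, 0.40, 0.35, 0.20]

SEGMENT_LENGTH = 0.05  # metres between adjacent nodes (assumed constant)

def reconstruct_row(pitches, seg_len):
    """Chain fixed-length segments using each node's reported pitch to
    obtain (x, z) coordinates of the nodes along one row of fabric."""
    points = [(0.0, 0.0)]               # the first node is the reference origin
    x, z = 0.0, 0.0
    for theta in pitches:
        x += seg_len * math.cos(theta)  # advance along the surface
        z += seg_len * math.sin(theta)  # accumulate height change
        points.append((x, z))
    return points

if __name__ == "__main__":
    for i, (x, z) in enumerate(reconstruct_row(row_pitch, SEGMENT_LENGTH)):
        print(f"node {i}: x={x:.3f} m  z={z:.3f} m")
```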

    An instruction systolic array architecture for multiple neural network types

    Modern electronic systems, especially sensor and imaging systems, are beginning to incorporate their own neural network subsystems. In order for these neural systems to learn in real time they must be implemented using VLSI technology, with as much of the learning process incorporated on-chip as possible. The majority of current VLSI implementations literally implement a series of neural processing cells, which can be connected together in an arbitrary fashion. Many do not perform the entire neural learning process on-chip, instead relying on other external systems to carry out part of the computational requirements of the algorithm. The work presented here utilises two-dimensional instruction systolic arrays in an attempt to define a general neural architecture which is closer to the biological basis of neural networks: it is the synapses themselves, rather than the neurons, that have dedicated processing units. A unified architecture is described which can be programmed at the microcode level in order to facilitate the processing of multiple neural network types. An essential part of neural network processing is the neuron activation function, which can range from a sequential algorithm to a discrete mathematical expression. The architecture presented can easily carry out the sequential functions, and introduces a fast method of mathematical approximation for the more complex functions. This can be evaluated on-chip, thus implementing the entire neural process within a single system. VHDL circuit descriptions for the chip have been generated, and the systolic processing algorithms and associated microcode instruction set for three different neural paradigms have been designed. A software simulator of the architecture has been written, giving results for several common applications in the field.
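    The abstract does not state which approximation scheme is used, so the sketch below only illustrates the general idea of replacing an expensive activation function with a cheap, table-driven approximation that could be evaluated on-chip. The piecewise-linear form, the break-point spacing, and the choice of the logistic sigmoid are assumptions for illustration, not the thesis's actual method.

```python
import math

# Illustrative piecewise-linear approximation of the logistic sigmoid, the
# kind of cheap evaluation a neuron/synapse processing cell could perform
# instead of computing exp() directly.
def sigmoid_pwl(x):
    """Piecewise-linear sigmoid: saturated limits outside |x| >= 4, straight
    segments between tabulated break-points inside."""
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    # Break-points every 1.0 over [-4, 4]; in hardware these values would be
    # precomputed once and stored, not evaluated with exp() at run time.
    xs = [-4 + i for i in range(9)]
    ys = [1.0 / (1.0 + math.exp(-v)) for v in xs]
    i = int(x + 4.0)                           # index of the segment containing x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])      # position within that segment
    return ys[i] + t * (ys[i + 1] - ys[i])

if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1.2, 3.7):
        exact = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.1f}  approx={sigmoid_pwl(x):.4f}  exact={exact:.4f}")
```

    With a precomputed break-point table, each evaluation reduces to a lookup, a subtraction, a multiplication and an addition, which is what makes this style of approximation attractive for on-chip use.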

    System on fabrics utilising distributed computing

    The main vision of wearable computing is to make electronic systems an important part of everyday clothing in the future, serving as intelligent personal assistants. Wearable devices have the potential to be wearable computers and not mere input/output devices for the human body. The present thesis focuses on introducing a new wearable computing paradigm, where the processing elements are closely coupled with distributed sensors using the Instruction Systolic Array (ISA) architecture. The thesis describes a novel multiple-sensor, multiple-processor system architecture prototype based on the ISA paradigm for distributed computing on fabrics, and introduces a new programming model for implementing the distributed computer on fabrics. The implementation of the concept has been validated using parallel algorithms. A real-time shape sensing and reconstruction application has been implemented on this architecture, and a physical design for a wearable system based on the ISA concept, constructed from off-the-shelf microcontrollers and sensors, has been demonstrated. Results demonstrate that the real-time application executes on the prototype ISA implementation, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
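    As one illustration of the kind of parallel algorithm such a fabric-resident computer can run, the sketch below sums the sensor readings of one row of the mesh using only neighbour-to-neighbour data movement, in the systolic spirit of the architecture. The data layout and step structure are assumptions made for illustration and are not taken from the thesis.

```python
def systolic_row_sum(readings):
    """Sum one row of sensor readings using only local moves: every value
    shifts right by one cell per step and the rightmost cell accumulates."""
    n = len(readings)
    cells = list(readings[:-1])        # values still travelling rightwards
    total = readings[-1]               # accumulator held by the rightmost cell
    for _ in range(n - 1):
        total += cells[-1]             # rightmost cell absorbs its left neighbour's value
        cells = [0] + cells[:-1]       # every other value moves one cell to the right
    return total

if __name__ == "__main__":
    readings = [3, 1, 4, 1, 5, 9]      # e.g. one row of sensor samples
    print(systolic_row_sum(readings))  # 23, the same as sum(readings)
```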

    System on fabrics architecture using distributed computing

    This paper describes a novel, distributed sensor network with parallel processing capability based on the Instruction Systolic Array (ISA). A new computing paradigm is introduced where spatially distributed sensors are closely coupled to processing elements and the whole array forms a parallel computer. This may find applications in wearable devices for sensing the position and other metrics of a human body and rapidly processing that data. A new programming model to implement the distributed computer on fabrics is described. The fabric-based distributed computing concept has been validated using a number of parallel applications, including a real-time shape sensing and reconstruction application. The exemplar wearable system based on the ISA concept has been realized using off-the-shelf microcontrollers and sensors. Results show that the application executes on the prototype ISA implementation in real time, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
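    A minimal software rendering of the ISA execution model referred to above: instruction rows enter the mesh from the top and move down one row per step, selector-bit columns enter from the left and move right one column per step, and a cell executes the instruction currently passing through it only when the accompanying selector bit is 1. The mesh size, instruction set and program below are illustrative assumptions rather than the paper's actual programming model.

```python
# Toy simulation of Instruction Systolic Array execution on a 3x3 mesh.
N = 3                                   # mesh of N x N cells
regs = [[0] * N for _ in range(N)]      # one accumulator register per cell

# Illustrative program: every instruction is "add k to the local register",
# with k varying per instruction row; selectors mask out whole rows.
instr_rows = [                          # instr_rows[k][j] travels down column j
    [1, 1, 1],
    [10, 10, 10],
    [100, 100, 100],
]
sel_cols = [                            # sel_cols[k][i] travels right along row i
    [1, 1, 1],                          # first selector column enables every row
    [1, 0, 1],                          # second disables the middle row
    [0, 0, 1],                          # third enables the bottom row only
]

steps = len(instr_rows) + len(sel_cols) + N   # enough steps to drain the array
for t in range(steps):
    for i in range(N):
        for j in range(N):
            k_i, k_s = t - i, t - j     # which instruction row / selector column
            if 0 <= k_i < len(instr_rows) and 0 <= k_s < len(sel_cols):
                if sel_cols[k_s][i]:    # execute only where the selector bit is 1
                    regs[i][j] += instr_rows[k_i][j]

# The skewed instruction and selector wavefronts give each cell a different total.
for row in regs:
    print(row)
```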

    The Instruction Systolic Array (ISA) and simulation of parallel algorithms

    Systolic arrays have proved to be well suited to Very Large Scale Integration (VLSI) technology since they consist of a regular network of simple processing cells, use only local communication between the processing cells, and exploit a maximal degree of parallelism. However, systolic arrays have one main disadvantage compared with other parallel computer architectures: they are special-purpose architectures capable of executing only one algorithm; e.g., a systolic array designed for sorting cannot be used to perform matrix multiplication. Several approaches have been made to make systolic arrays more flexible, so that different problems can be handled on a single systolic array. In this thesis an alternative concept to a VLSI architecture, the Soft-Systolic Simulation System (SSSS), is introduced and developed as a working model of a virtual machine with the power to simulate hard systolic arrays and more general forms of concurrency such as the SIMD and MIMD models of computation. The virtual machine includes a processing element consisting of a soft-systolic processor implemented in the virtual machine language. The processing element considered here is a very general element which allows the choice of a wide range of arithmetic and logical operators and hence the simulation of a wide class of algorithms; in principle, extra processing cells can be added to form a library that can be tailored to individual needs. The virtual machine chosen for this implementation is the Instruction Systolic Array (ISA). The ISA has a number of interesting features: it has been used to simulate all SIMD algorithms and many MIMD algorithms by a simple program transformation technique, and it can also simulate the so-called wavefront processor algorithms as well as many hard systolic algorithms. The ISA removes the need for the broadcasting of data which is a feature of SIMD algorithms (limiting the size of the machine and its cycle time), and it also presents a fairly simple communication structure for MIMD algorithms. In the model of systolic computation developed from the VLSI approach to systolic arrays, the processing surface is fixed, as are the processing elements or cells, by virtue of their being embedded in the processing surface. The VLSI approach therefore freezes instructions and hardware relative to the movement of data, whereas the virtual machine and soft-systolic programming retain the VLSI array design features of regularity, simplicity and local communication while allowing the movement of instructions with respect to data. Data can be frozen into the structure with instructions moving systolically; alternatively, both the data and the instructions can move systolically around the virtual processors, which are deemed fixed relative to the underlying architecture. The ISA is implemented in OCCAM programs whose execution and output implicitly confirm the correctness of the design. The soft-systolic environment comprises the usual operating system facilities for the creation and modification of files during the development of new programs and ISA processor elements. We allow any concurrent high-level language to be used to model the soft-systolic program. Consequently, the Replicating Instruction Systolic Array Language (RISAL) was devised to provide a programming environment for the ISA that is very primitive but adequate for testing. RISAL accepts instructions in an assembler-like form, but is fairly permissive about the format of statements, subject of course to syntax. The RISAL compiler transforms the soft-systolic program description into a form suitable for the virtual machine to run when simulating the algorithm. Finally, we conclude that the principles described here can form the basis for a soft-systolic simulator using an orthogonally connected mesh of processors; the wide range of algorithms which the ISA can simulate makes it well suited to serve as a virtual simulating grid.
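    As an example of the 'hard' systolic algorithms such a simulator is intended to reproduce, the sketch below models a classic output-stationary systolic matrix multiplication on an orthogonal mesh: rows of A stream in from the left and columns of B from the top, each suitably skewed, and every cell accumulates one element of the product using only local data movement. This is a textbook systolic scheme written in Python purely for illustration, not code from the thesis or from RISAL.

```python
# Output-stationary systolic matrix multiply on an N x N mesh: A's rows
# stream in from the left, B's columns from the top, each skewed by one
# step per row/column, and cell (i, j) accumulates C[i][j] from the pairs
# of values passing through it.
N = 3
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[9, 8, 7],
     [6, 5, 4],
     [3, 2, 1]]
C = [[0] * N for _ in range(N)]

def a_in(i, t):
    """Value of A entering row i from the left at time t (0 outside the skewed window)."""
    k = t - i                      # row i's input stream is delayed by i steps
    return A[i][k] if 0 <= k < N else 0

def b_in(j, t):
    """Value of B entering column j from the top at time t."""
    k = t - j                      # column j's input stream is delayed by j steps
    return B[k][j] if 0 <= k < N else 0

for t in range(3 * N - 2):         # enough steps for every pair of values to meet
    for i in range(N):
        for j in range(N):
            # The value reaching cell (i, j) at time t entered the boundary
            # j (resp. i) steps earlier, hopping one cell per step.
            C[i][j] += a_in(i, t - j) * b_in(j, t - i)

for row in C:
    print(row)                     # equals the ordinary matrix product A x B
```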

    A comparative study of synchronous and self-timed systolic array architectures

    This thesis examines systolic array architectures and their methods of control and communication synchronisation. Systolic array processors suffer from synchronisation problems associated with the clocking mechanism, which restricts their scalability. To overcome this problem, both return-to-zero (RTZ) and non-return-to-zero (NRTZ) delay-insensitive self-timed (ST) techniques can be used to realise architectures that operate correctly in the presence of arbitrary delays at all levels in their design. As a consequence, RTZ and NRTZ versions of an existing systolic array architecture, namely the Single Instruction Systolic Array (SISA), have been developed in order to investigate the potential for realising architecturally scalable systolic arrays. The new architectures, called the RTZ and NRTZ ST-SISAs, have been compared with each other and against their synchronous counterpart to establish their relative trade-offs. The new designs exhibit several novel features including: variable-length bit-serial data words, average-case processing speeds dependent on data word length as well as computational complexity, a novel autonomous inter-processor data communication mechanism, and architectural scalability independent of fabrication technology. This thesis introduces an implementation of the RTZ and NRTZ ST-SISA architectures, along with their performance and area characteristics. Guidelines have been developed from the resulting RTZ and NRTZ architectures allowing novel self-timed systolic architectures to be derived.
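    The thesis's circuits are hardware designs, but the return-to-zero signalling discipline it builds on can be illustrated at a behavioural level: in RTZ (four-phase) dual-rail encoding each bit travels on two wires, a value is signalled by raising exactly one of them, and the pair must return to the all-zero 'spacer' state before the next value is sent, which is what lets a receiver detect data arrival without a clock. The encode/decode sketch below is a generic illustration of that discipline, not the ST-SISA implementation.

```python
# Behavioural sketch of RTZ (four-phase) dual-rail signalling: each bit uses
# two wires (false-rail, true-rail); (0, 1) means 1, (1, 0) means 0, (0, 0)
# is the spacer separating successive values, and (1, 1) is illegal.
SPACER = (0, 0)

def encode_rtz(bits):
    """Turn a bit sequence into the wire states seen on a dual-rail channel,
    inserting the return-to-zero spacer between successive data values."""
    wire_states = []
    for b in bits:
        wire_states.append((0, 1) if b else (1, 0))  # data phase
        wire_states.append(SPACER)                   # return-to-zero phase
    return wire_states

def decode_rtz(wire_states):
    """Receiver view: a new value is detected whenever the wires leave the
    spacer state; no clock is needed, only the state change itself."""
    bits = []
    for f, t in wire_states:
        if (f, t) == SPACER:
            continue                     # spacer phase: nothing to latch
        if f == t == 1:
            raise ValueError("illegal dual-rail state (1, 1)")
        bits.append(1 if t else 0)       # exactly one rail is high
    return bits

if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0]      # an 8-bit bit-serial data word
    channel = encode_rtz(word)
    print(channel)
    print(decode_rtz(channel) == word)   # True: round trip through the channel
```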

    Centre for Information Science Research Annual Report, 1987-1991

    Annual reports from various departments of the AN

    High performance dense linear algebra on a spatially distributed processor

    As technology trends have limited the performance scaling of conventional processors, industry and academic research have turned to parallel architectures on a single chip, including distributed uniprocessors and multicore chips. This paper examines how to extend the archetypal operation of dense linear algebra, matrix multiply, to an emerging class of uniprocessor architectures characterized by a large number of independent functional units, register banks, and cache banks connected by a 2-D on-chip network. We extend Goto's well-known algorithm for matrix multiplication to this spatially distributed class of uniprocessors and describe the optimizations of the innermost kernel, a systolic-like algorithm running on a general-purpose uniprocessor. The resulting implementation yields the first demonstration of high performance in an application executing on the TRIPS processor hardware, a next-generation distributed processor core. We show that such processors are indeed capable of substantial improvements in single-threaded performance provided their spatial topography is taken into account.
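    Goto's algorithm itself is not reproduced in the abstract, but its key idea, blocking the matrices so that a packed panel of one operand stays resident in a fast level of the memory hierarchy while an inner kernel streams through it, can be sketched as follows. The tiny block sizes and the plain three-loop inner kernel are illustrative stand-ins for the paper's TRIPS-specific kernel.

```python
# Minimal Goto-style blocked matrix multiply, C = A x B: panels of B are
# packed into contiguous storage (intended to stay cache-resident) while an
# inner kernel streams blocks of A through them. Block sizes are tiny here,
# purely for illustration.
MC, KC, NC = 4, 4, 4

def pack_panel(B, k0, kc, j0, nc):
    """Copy a kc x nc panel of B into contiguous storage (the packing step)."""
    return [[B[k0 + k][j0 + j] for j in range(nc)] for k in range(kc)]

def gemm_blocked(A, B):
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for j0 in range(0, n, NC):                    # panels of B's columns
        nc = min(NC, n - j0)
        for k0 in range(0, k, KC):                # panels of B's rows
            kc = min(KC, k - k0)
            Bp = pack_panel(B, k0, kc, j0, nc)    # packed panel, reused many times
            for i0 in range(0, m, MC):            # blocks of A's rows
                mc = min(MC, m - i0)
                # Inner kernel: small mc x nc update using the packed panel.
                for i in range(mc):
                    for p in range(kc):
                        a = A[i0 + i][k0 + p]
                        for j in range(nc):
                            C[i0 + i][j0 + j] += a * Bp[p][j]
    return C

if __name__ == "__main__":
    import random
    A = [[random.random() for _ in range(7)] for _ in range(6)]
    B = [[random.random() for _ in range(5)] for _ in range(7)]
    C = gemm_blocked(A, B)
    # Check against a straightforward triple-loop reference.
    ref = [[sum(A[i][p] * B[p][j] for p in range(7)) for j in range(5)] for i in range(6)]
    print(max(abs(C[i][j] - ref[i][j]) for i in range(6) for j in range(5)) < 1e-9)
```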

    Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability
