86,243 research outputs found

    Finding Connected Components on a Scan Line Array Processor

    Get PDF
    This paper provides a new approach to labeling the connected components of an n x n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaranteed to complete in o(n lg n) time as well as algorithms likely to approach O(n) time for all or most images. The best previous solutions require using a more complicated architecture or require Omega(n lg n) time. We also show that on a restricted version of the architecture, any algorithm requires Omega(n lg n) time in the worst case

    OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

    Full text link
    There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance.Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologie

    Configurable 3D-integrated focal-plane sensor-processor array architecture

    Get PDF
    A mixed-signal Cellular Visual Microprocessor architecture with digital processors is described. An ASIC implementation is also demonstrated. The architecture is composed of a regular sensor readout circuit array, prepared for 3D face-to-face type integration, and one or several cascaded array of mainly identical (SIMD) processing elements. The individual array elements derived from the same general HDL description and could be of different in size, aspect ratio, and computing resources

    Analog hardware for learning neural networks

    Get PDF
    This is a recurrent or feedforward analog neural network processor having a multi-level neuron array and a synaptic matrix for storing weighted analog values of synaptic connection strengths which is characterized by temporarily changing one connection strength at a time to determine its effect on system output relative to the desired target. That connection strength is then adjusted based on the effect, whereby the processor is taught the correct response to training examples connection by connection

    A Tutorial Introduction to Mosaic Pascal

    Get PDF
    In this report we describe a Pascal system that has been developed for programming Mosaic multi- computers. The system that we discuss runs on our Sun workstations, and we assume some familiarity with the use thereof. We assume the reader to be also familiar with programming in Pascal, and with message-passing programs. We describe how the Pascal language has been extended to perform message passing. We discuss a few implementation aspects that are relevant only to those users who have a need (or desire) to control some machine-specific aspects. The latter requires some detailed knowledge of the Mosaic system

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained ReconïŹgurable Array (CGRA) architectures accelerate the same inner loops that beneïŹt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efïŹciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ïŹ‚exibility, performance, and power-efïŹciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ïŹne-tuning of source code

    Electrically reconfigurable logic array

    Get PDF
    To compose the complicated systems using algorithmically specialized logic circuits or processors, one solution is to perform relational computations such as union, division and intersection directly on hardware. These relations can be pipelined efficiently on a network of processors having an array configuration. These processors can be designed and implemented with a few simple cells. In order to determine the state-of-the-art in Electrically Reconfigurable Logic Array (ERLA), a survey of the available programmable logic array (PLA) and the logic circuit elements used in such arrays was conducted. Based on this survey some recommendations are made for ERLA devices

    A framework for FPGA functional units in high performance computing

    Get PDF
    FPGAs make it practical to speed up a program by defining hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make available the massive parallelism of the components. The FPGA operates as a coprocessor controlled by a conventional computer. An application that combines software with hardware in this way needs an interface between a communications port to the processor and the signals connected to the functional units. We present a framework that supports the design of such systems. The framework consists of a generic controller circuit defined in VHDL that can be configured by the user according to the needs of the functional units and the I/O channel. The controller contains a register file and a pipelined programmable register transfer machine, and it supports the design of both stateless and stateful functional units. Two examples are described: the implementation of a set of basic stateless arithmetic functional units, and the implementation of a stateful algorithm that exploits circuit parallelism
    • 

    corecore