1,178 research outputs found

    Broadcast with mask on a Massively Parallel Processing on a Chip

    Workshop DRNoC 2012. The delay of instruction broadcast has a significant impact on the performance of Single Instruction Multiple Data (SIMD) architectures. This is especially true for massively parallel processing Systems-on-Chip (mppSoC), where the processing stage and the setup of the communication mechanism take several clock periods. Subnetting is the strategy of partitioning a single physical network into several smaller logical sub-networks (subnets). This technique gives better control over the domain of broadcast instructions and over the data traffic between network nodes. Furthermore, it separates synchronous communications from asynchronous processing, which maintains reliable communications and rapid processing across the parallel processors. This paper describes the design of a communication model called broadcast with mask. This model is dedicated to mppSoC architectures with a very large number of processor elements because it maintains performance even as the number of processors increases. Simulation results and an FPGA implementation validate our approach.
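
The abstract does not spell out how the mask is encoded, so the following is only an illustrative sketch of the general idea, assuming a subnet is selected the way an IP subnet mask selects addresses: a processing element accepts a broadcast when its ID agrees with a pattern on the masked bits. The names `ProcessingElement`, `broadcast_with_mask`, `mask`, and `pattern` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    """Hypothetical PE model: an integer ID plus a log of received instructions."""
    pe_id: int
    received: list = field(default_factory=list)


def broadcast_with_mask(pes, instruction, mask, pattern):
    """Deliver `instruction` to every PE whose ID matches `pattern` on the bits set in `mask`.

    Assumption: subnets are defined by ID bits, as in IP subnetting.
    Returns the IDs of the PEs that accepted the instruction.
    """
    targets = [pe for pe in pes if (pe.pe_id & mask) == (pattern & mask)]
    for pe in targets:
        pe.received.append(instruction)
    return [pe.pe_id for pe in targets]


pes = [ProcessingElement(i) for i in range(16)]

# Mask 0b1100 compares only the two high bits; pattern 0b0100 selects IDs 4..7.
subnet = broadcast_with_mask(pes, "ADD", mask=0b1100, pattern=0b0100)

# An all-zero mask compares no bits, so every PE accepts (a plain broadcast).
everyone = broadcast_with_mask(pes, "NOP", mask=0b0000, pattern=0)
```

In this sketch the cost of selecting a subnet is one bitwise comparison per PE, independent of how many PEs exist, which is one plausible reading of why a masked broadcast scales with the processor count.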

    Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture

    We show that the latest version of massively parallel processing associative string processing architecture (System-V) is applicable to fast Monte Carlo simulation if an effective on-processor random number generator is implemented. Our lagged Fibonacci generator can produce 10^8 random numbers on a processor string of 12K PEs. The time-dependent Monte Carlo algorithm of the one-dimensional non-equilibrium kinetic Ising model performs 80 times faster than the corresponding serial algorithm on a 300 MHz UltraSparc. Comment: 8 pages, 9 color ps figures embedde
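
For readers unfamiliar with the generator family named above, here is a minimal serial sketch of an additive lagged Fibonacci generator, x_n = (x_{n-j} + x_{n-k}) mod m. The lags (24, 55), the modulus 2^32, and the seeding scheme are assumptions chosen for illustration; the abstract does not state which parameters the System-V implementation uses, and this sketch says nothing about how the generator is mapped onto the processor string.

```python
import random
from collections import deque


def lagged_fibonacci(seed, short_lag=24, long_lag=55, modulus=2**32):
    """Yield x_n = (x_{n-short_lag} + x_{n-long_lag}) mod modulus.

    The lags and modulus are illustrative assumptions, not the paper's values.
    """
    # Fill the initial state from Python's built-in generator (an assumption;
    # any source of long_lag seed words would do).
    rng = random.Random(seed)
    state = deque((rng.randrange(modulus) for _ in range(long_lag)), maxlen=long_lag)
    # An additive generator needs at least one odd seed word to avoid a
    # degenerate all-even sequence.
    state[0] |= 1
    while True:
        value = (state[-short_lag] + state[-long_lag]) % modulus
        state.append(value)  # maxlen drops the oldest word automatically
        yield value


gen = lagged_fibonacci(seed=1)
xs = [next(gen) for _ in range(200)]
```

Each output needs only one addition and two reads from a fixed-size window, which is why this family is a common choice when the generator must run on simple processing elements.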

    G-MPSoC: Generic Massively Parallel Architecture on FPGA

    Intensive signal processing applications are evolving and are characterized by the diversity of their algorithms (filtering, correlation, etc.) and their numerous parameters. A flexible and programmable system that adapts to the changing and varied characteristics of these applications reduces the design cost. In this context, we propose a Generic Massively Parallel architecture (G-MPSoC). G-MPSoC is a System-on-Chip based on a grid of clusters of Hardware and Software Computation Elements of different size, performance, and complexity. It is composed of parametric, reusable IP modules (processor, controller, accelerator, memory, interconnection network, etc.) that can be combined to build different architecture configurations. The generic structure of G-MPSoC facilitates its adaptation to the requirements of intensive signal processing applications. This paper presents the G-MPSoC architecture and details its different components. The FPGA-based implementation and the experimental results validate the choice of architectural model and show the effectiveness of this design.

    Parallel image compression

    A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described, and the way they are combined to form a new strategy for dynamic on-line lossy compression is discussed. Finally, the massively parallel implementation of this algorithm on the MPP is discussed.
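
As background for the vector quantization component mentioned above, the sketch below shows only the basic lossy step: each image block is replaced by the index of its nearest codeword. It assumes a fixed, hand-written codebook and squared Euclidean distance; the dynamic on-line learning of the codebook, which the abstract identifies as the new part of the strategy, is not modeled, and all names here are hypothetical.

```python
def nearest_codeword(block, codebook):
    """Return the index of the codeword closest to `block` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((b - c) ** 2 for b, c in zip(block, codebook[i])),
    )


def vq_encode(blocks, codebook):
    """Lossy encoding: one codebook index per block."""
    return [nearest_codeword(block, codebook) for block in blocks]


def vq_decode(indices, codebook):
    """Reconstruction: look each index up in the shared codebook."""
    return [codebook[i] for i in indices]


# Illustrative 2x2 pixel blocks flattened to 4-tuples of grey levels.
codebook = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255)]
blocks = [(3, 1, 0, 2), (120, 130, 125, 131), (250, 255, 249, 251)]

indices = vq_encode(blocks, codebook)
restored = vq_decode(indices, codebook)
```

Because every block is encoded independently of the others, the encoding step parallelizes naturally with one block per processor, which fits the massively parallel setting the abstract describes.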

    Master-Slave Control structure for massively parallel System on Chip

    16th Euromicro Conference on Digital System Design. The performance of a massively parallel processing system depends mostly on the control configuration that is inherently part of the system. In particular, a centralized control configuration is rigid and limits system scalability, while a distributed control configuration makes the interaction between processing elements (PEs) difficult to control. Maintaining flexible autonomous computation coupled with regular synchronous communication can ensure efficient parallel processing. The master-slave control structure is specified in such a way that these features of the massively parallel System-on-Chip (mpSoC) are preserved and performance is improved. In this paper, we describe the prototyping of a master-slave control structure for mpSoC on an FPGA-based platform. The implementation of the structure in VHDL and the related experiments on a Xilinx Virtex-6 ML605 board are described.

    Computer vision algorithms on reconfigurable logic arrays

    Content addressable memory project

    A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.