1,178 research outputs found
Broadcast with mask on a Massively Parallel Processing on a Chip
workshop drnoc2012. The delay of instruction broadcast has a significant impact on the performance of Single Instruction Multiple Data (SIMD) architectures. This is especially true for massively parallel processing Systems-on-Chip (mppSoC), where the processing stage and the setup of the communication mechanism each need several clock periods. Subnetting is the strategy used to partition a single physical network into several smaller logical sub-networks (subnets). This technique gives better control over the domain of broadcast instructions and over the data traffic between network nodes. Furthermore, it separates synchronous communications from asynchronous processing, which maintains reliable communications and rapid processing across parallel processors. This paper describes the design of a communication model called broadcast with mask. This model is dedicated to mppSoC architectures with a huge number of processor elements because it maintains performance even when the number of processors increases. Simulation results and an FPGA implementation validate our approach.
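The abstract does not spell out how the mask is encoded, so the following is only a minimal Python sketch of the general idea, assuming a subnet-style match: a processing element executes the broadcast instruction when its index, ANDed with a mask, equals a subnet identifier. The function name `broadcast_with_mask` and the parameters `subnet_id` and `mask` are hypothetical and are not taken from the paper.

```python
def broadcast_with_mask(pe_values, instruction, subnet_id, mask):
    """Apply `instruction` only to the PEs selected by (index & mask) == subnet_id.

    Assumption: PEs are addressed by their list index, and the mask works
    like an IP-style subnet mask. The paper's actual scheme may differ.
    """
    return [
        instruction(value) if (index & mask) == subnet_id else value
        for index, value in enumerate(pe_values)
    ]


# Eight PEs holding the values 0..7; select the PEs whose index has bit 2 set.
pes = list(range(8))
result = broadcast_with_mask(pes, lambda x: x + 100, subnet_id=0b100, mask=0b100)
print(result)
```

Unselected PEs keep their state, which is what lets one broadcast reach a single logical subnet without disturbing the rest of the array.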
Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture
We show that the latest version of massively parallel processing associative
string processing architecture (System-V) is applicable for fast Monte Carlo
simulation if an effective on-processor random number generator is implemented.
Our lagged Fibonacci generator can produce random numbers on a processor
string of 12K PEs. The time-dependent Monte Carlo algorithm of the
one-dimensional non-equilibrium kinetic Ising model performs 80 times faster than the
corresponding serial algorithm on a 300 MHz UltraSparc.
Comment: 8 pages, 9 color ps figures embedde
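For readers unfamiliar with the generator family named in this abstract, an additive lagged Fibonacci generator follows the recurrence x_n = (x_{n-j} + x_{n-k}) mod m with lags j < k. The sketch below is a serial Python illustration only. The lags (24, 55) and the modulus 2^32 are common textbook choices assumed here, not values given in the abstract, and the class name `LaggedFibonacci` is hypothetical.

```python
import random


class LaggedFibonacci:
    """Additive lagged Fibonacci generator: x_n = (x_{n-j} + x_{n-k}) mod m."""

    def __init__(self, seed_values, short_lag=24, long_lag=55, modulus=2**32):
        if not 0 < short_lag < long_lag:
            raise ValueError("lags must satisfy 0 < short_lag < long_lag")
        if len(seed_values) != long_lag:
            raise ValueError("need exactly long_lag seed values")
        self.state = [v % modulus for v in seed_values]
        self.j = short_lag
        self.k = long_lag
        self.m = modulus
        self.pos = 0  # index of the oldest stored value, x_{n-k}

    def next(self):
        # state[pos] holds x_{n-k}; x_{n-j} sits (k - j) slots further on.
        new = (self.state[self.pos]
               + self.state[(self.pos + self.k - self.j) % self.k]) % self.m
        self.state[self.pos] = new  # overwrite the oldest value
        self.pos = (self.pos + 1) % self.k
        return new


# Seed the 55-word state from Python's own generator (illustrative only).
seeder = random.Random(1234)
gen = LaggedFibonacci([seeder.getrandbits(32) for _ in range(55)])
samples = [gen.next() for _ in range(5)]
```

With lags (1, 2) and seeds [1, 1] the recurrence reduces to the ordinary Fibonacci sequence, which gives a quick sanity check of the indexing.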
G-MPSoC: Generic Massively Parallel Architecture on FPGA
International audience. Nowadays, intensive signal processing applications are evolving and are characterized by the diversity of their algorithms (filtering, correlation, etc.) and their numerous parameters. Having a flexible and programmable system that adapts to the changing and varied characteristics of these applications reduces the design cost. In this context, we propose in this paper a Generic Massively Parallel architecture (G-MPSoC). G-MPSoC is a System-on-Chip based on a grid of clusters of Hardware and Software Computation Elements of different size, performance, and complexity. It is composed of parametric, reusable IP modules (processor, controller, accelerator, memory, interconnection network, etc.) used to build different architecture configurations. The generic structure of G-MPSoC facilitates its adaptation to the requirements of intensive signal processing applications. This paper presents the G-MPSoC architecture and details its different components. The FPGA-based implementation and the experimental results validate the architectural model choice and show the effectiveness of this design.
Parallel image compression
A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Master-Slave Control structure for massively parallel System on Chip
16th Euromicro Conference on Digital System Design. International audience. The performance of a massively parallel processing system depends mostly on the control configuration that is inherently part of the system. In particular, a centralized control configuration is rigid and limits system scalability, while a distributed control configuration makes interaction among processing elements (PEs) difficult to control. Maintaining flexible autonomous computation coupled with regular synchronous communication can ensure efficient parallel processing. The master-slave control structure is specified in such a way that the previous features of the massively parallel System-on-Chip (mpSoC) are preserved and performance is improved. In this paper, we define the prototyping of a master-slave control structure for mpSoC on an FPGA-based platform. The implementation of the structure and related experiments, written in VHDL and run on a Xilinx Virtex-6 ML605 board, are described.
Content addressable memory project
A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.