14,434 research outputs found

    A case study for NoC based homogeneous MPSoC architectures

    Get PDF
    The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, after prototyping with field-programmable gate array (FPGA), has been laid out in 90-nm technology. Post-layout results show very low power, area, as well as 500 MHz of clock frequency. Results show that an array of small and simple processors outperform a single high-end general purpose processo

    Tribes Is Hard in the Message Passing Model

    Get PDF
    We consider the point-to-point message passing model of communication in which there are kk processors with individual private inputs, each nn-bit long. Each processor is located at the node of an underlying undirected graph and has access to private random coins. An edge of the graph is a private channel of communication between its endpoints. The processors have to compute a given function of all their inputs by communicating along these channels. While this model has been widely used in distributed computing, strong lower bounds on the amount of communication needed to compute simple functions have just begun to appear. In this work, we prove a tight lower bound of Ω(kn)\Omega(kn) on the communication needed for computing the Tribes function, when the underlying graph is a star of k+1k+1 nodes that has kk leaves with inputs and a center with no input. Lower bound on this topology easily implies comparable bounds for others. Our lower bounds are obtained by building upon the recent information theoretic techniques of Braverman et.al (FOCS'13) and combining it with the earlier work of Jayram, Kumar and Sivakumar (STOC'03). This approach yields information complexity bounds that is of independent interest

    Representing Conversations for Scalable Overhearing

    Full text link
    Open distributed multi-agent systems are gaining interest in the academic community and in industry. In such open settings, agents are often coordinated using standardized agent conversation protocols. The representation of such protocols (for analysis, validation, monitoring, etc) is an important aspect of multi-agent applications. Recently, Petri nets have been shown to be an interesting approach to such representation, and radically different approaches using Petri nets have been proposed. However, their relative strengths and weaknesses have not been examined. Moreover, their scalability and suitability for different tasks have not been addressed. This paper addresses both these challenges. First, we analyze existing Petri net representations in terms of their scalability and appropriateness for overhearing, an important task in monitoring open multi-agent systems. Then, building on the insights gained, we introduce a novel representation using Colored Petri nets that explicitly represent legal joint conversation states and messages. This representation approach offers significant improvements in scalability and is particularly suitable for overhearing. Furthermore, we show that this new representation offers a comprehensive coverage of all conversation features of FIPA conversation standards. We also present a procedure for transforming AUML conversation protocol diagrams (a standard human-readable representation), to our Colored Petri net representation

    Turbomachinery CFD on parallel computers

    Get PDF
    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations

    Scalable data abstractions for distributed parallel computations

    Get PDF
    The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many current parallel programming models use a shared memory model to provide data abstraction but this doesn't scale well with large numbers of cores due to non-determinism and access latency. This paper proposes a simple programming model that allows scalable parallel programs to be expressed with distributed representations of data and it provides the programmer with the flexibility to employ shared or distributed styles of data-parallelism where applicable. It is capable of an efficient implementation, and with the provision of a small set of primitive capabilities in the hardware, it can be compiled to operate directly on the hardware, in the same way stack-based allocation operates for subroutines in sequential machines
    corecore