18,453 research outputs found

    Scalability of broadcast performance in wireless network-on-chip

    Get PDF
    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

    Improving the scalability of parallel N-body applications with an event driven constraint based execution model

    Full text link
    The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.Comment: 11 figure

    Programming agent-based demographic models with cross-state and message-exchange dependencies: A study with speculative PDES and automatic load-sharing

    Get PDF
    Agent-based modeling and simulation is a versatile and promising methodology to capture complex interactions among entities and their surrounding environment. A great advantage is its ability to model phenomena at a macro scale by exploiting simpler descriptions at a micro level. It has been proven effective in many fields, and it is rapidly becoming a de-facto standard in the study of population dynamics. In this article we study programmability and performance aspects of the last-generation ROOT-Sim speculative PDES environment for multi/many-core shared-memory architectures. ROOT-Sim transparently offers a programming model where interactions can be based on both explicit message passing and in-place state accesses. We introduce programming guidelines for systematic exploitation of these facilities in agent-based simulations, and we study the effects on performance of an innovative load-sharing policy targeting these types of dependencies. An experimental assessment with synthetic and real-world applications is provided, to assess the validity of our proposal

    Advanced manned space flight simulation and training: An investigation of simulation host computer system concepts

    Get PDF
    The findings of a preliminary investigation by Southwest Research Institute (SwRI) in simulation host computer concepts is presented. It is designed to aid NASA in evaluating simulation technologies for use in spaceflight training. The focus of the investigation is on the next generation of space simulation systems that will be utilized in training personnel for Space Station Freedom operations. SwRI concludes that NASA should pursue a distributed simulation host computer system architecture for the Space Station Training Facility (SSTF) rather than a centralized mainframe based arrangement. A distributed system offers many advantages and is seen by SwRI as the only architecture that will allow NASA to achieve established functional goals and operational objectives over the life of the Space Station Freedom program. Several distributed, parallel computing systems are available today that offer real-time capabilities for time critical, man-in-the-loop simulation. These systems are flexible in terms of connectivity and configurability, and are easily scaled to meet increasing demands for more computing power

    MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

    Full text link
    Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations.Comment: 20 pages, 8 figure
    • …
    corecore