
    On the performance of broadcast algorithms in interconnection networks

    Broadcast communication is among the most primitive collective capabilities of any message-passing network. Broadcast algorithms for the mesh have been widely reported in the literature. However, most existing algorithms have been studied only under limited conditions, such as light traffic load and fixed network sizes. In other words, most of these algorithms have not been studied at different Quality of Service (QoS) levels. In contrast, this study examines the broadcast operation while taking into account scalability, parallelism, and a wide range of traffic loads through the propagation of broadcast messages. To the best of our knowledge, this study is the first to consider the issue of broadcast latency at both the network and node levels across different traffic loads. Results from a comparative analysis confirm that the coded-path based broadcast algorithms exhibit superior performance characteristics over some existing algorithms.

    OrthoNoC: a broadcast-oriented dual-plane wireless network-on-chip architecture

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    On-chip communication remains a key research issue at the gates of the manycore era. In response, novel interconnect technologies have opened the door to new Network-on-Chip (NoC) solutions offering greater scalability and architectural flexibility. In particular, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. This work presents ORTHONOC, a wired-wireless architecture that differs from existing proposals in that both network planes are decoupled and driven by traffic steering policies enforced at the network interfaces. With these and other design decisions, ORTHONOC seeks to emphasize the ordered broadcast advantage offered by the wireless technology. The performance and cost of ORTHONOC are first explored using synthetic traffic, showing substantial improvements with respect to other wired-wireless designs with a similar number of antennas. Then, the applicability of ORTHONOC in the multiprocessor scenario is demonstrated through the evaluation of a simple architecture that implements fast synchronization via ordered broadcast transmissions. Simulations reveal significant execution time speedups and communication energy savings for 64-threaded benchmarks, proving that the value of ORTHONOC goes beyond simply improving the performance of the on-chip interconnect.
    Peer reviewed. Postprint (author's final draft).

    Parallelization of Stochastic Evolution for Cell Placement

    VLSI physical design and its related problems, such as placement and channel routing, carry inherent complexities that are best dealt with using iterative heuristics. However, the major drawback of these iterative heuristics has been the large runtime needed to reach acceptable solutions, especially when optimizing for multiple objectives. Among the acceleration techniques proposed, parallelization is one promising method. Distributed memory multiprocessor systems and shared memory multiprocessor systems have gained considerable attention in recent years of research. Parallel computing has attracted both researchers and manufacturers who aim to reduce time to market. Our objective is to exploit the benefits of parallel computing for a time-consuming placement problem in VLSI. Finding the best solution for the placement of n modules is a hard problem. Thus, enumerative search techniques, especially those that employ brute force, are unacceptable for circuits in which n (the number of modules) is large. Constructive and iterative heuristics play the key role in this scenario and hence are frequently used. We will use Stochastic Evolution to find the optimal solution to the above-mentioned placement problem. The major task will be the parallelization of Stochastic Evolution using different parallelization techniques, followed by a comparison of the parallelized versions based on the results achieved. The parallelization will be carried out using MPI (Message Passing Interface) on a distributed memory multiprocessor system, and the conclusion will be based on the results achieved, which are expected to show near-linear speedup as the number of processors increases.