54 research outputs found

    Fast Forwarding with Network Processors

    Get PDF
    Forwarding is a mechanism found in many network operations. Although a regular workstation is able to perform forwarding operations, it still suffers from poor performance compared to dedicated hardware machines. In this paper we study the possibility of using Network Processors (NPs) to improve the capability of regular workstations to forward data. We present a simple model and an experimental study demonstrating that, even though NPs are less powerful than Host Processors (HPs), they can forward data more efficiently than HPs in some specific cases.

    The All-Data-Based Evolutionary Hypothesis of Ciliated Protists with a Revised Classification of the Phylum Ciliophora (Eukaryota, Alveolata)

    Get PDF
    This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The file attached is the published version of the article

    Basic Routines for the Rank-2k Update: 2D Torus vs Reconfigurable Network

    No full text
    Our aim is to provide the Rank-2k update on different parallel machines. In this paper, we compare the performance obtained on a fixed 2D torus topology and on a reconfigurable system. This results in the development of two basic communication subroutines, namely scattering and matrix transposition, and two basic computation subroutines, namely matrix product and Rank-2k update (both belong to the level 3 BLAS). The preceding generation of distributed-memory machines used fixed networks such as grids, multidimensional tori or hypercubes. Today, vendors offer machines with networks that can be reconfigured during program execution. A large number of possibilities are therefore available to the programmer, who can adapt the configuration at runtime to suit both the best algorithm and the data distribution. This dynamic reconfiguration obviously introduces an overhead through the setting of the network switch(es). This overhead must be taken into account in the cost of the whol..

    Adaptive Data Rate for Multiple Gateways LoRaWAN Networks

    No full text
    We propose to optimize the LoRaWAN Adaptive Data Rate algorithm in case an inter-packet error correction scheme is available. We adjust its parameters based on the analysis of the LoRa channel with multiple reception gateways, supported by real-world traffic traces. The resulting protocol provides very high reliability even over low quality channels, with comparable Time on Air and similar downlink usage as the currently deployed mechanism. Simulations corroborate the analysis, both over a synthetic random wireless link and over replayed real-world packet transmission traces.

    LOCCS: Low Overhead Communication and Computation Subroutines

    No full text
    Our aim is to provide one set of efficient basic subroutines for scientific computing which include both communications and computations. The overlap of communications and computations is done using asynchronous pipelining to minimize the overhead due to communications. With this set of routines, we provide users of parallel machines with an easy and efficient SPMD-style way of programming. The main purpose of these routines is to be used in linear algebra applications, but also in other fields such as image processing or neural networks. This work was partially supported by ARCHIPEL S.A. under contract 820542, by the CNRS and the DRET. 1 Introduction. Libraries of routines have been proven to be the only way for efficient and secure programming. In scientific parallel computing, the most commonly used libraries are the BLAS, BLACS, PICL and the ones provided by vendors. These building blocks allow the portability of codes and an efficient implementation on different machines. The devel..

    Optimization of the ScaLAPACK LU Factorization Routine Using Communication/Computation Overlap

    Get PDF
    This paper presents work on the ScaLAPACK LU factorization. First, a complexity analysis is given. It makes it possible to compute the optimal block size for the block scattered distribution used in ScaLAPACK LU. It also identifies the communication phases that are worth overlapping. Second, two optimizations based on computation/communication overlap are given, with experimental results on an Intel Paragon system.

    Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors

    No full text
    In this paper, we make efficient use of asynchronous communications in the LU decomposition algorithm with pivoting and a column-scattered data decomposition to derive precise computational complexities. We then compare these results with experiments on the Intel iPSC/860 and Paragon machines and show that very good performance can be obtained on a ring with asynchronous communications.

    Efficient Communication Operations in Reconfigurable Parallel Computers

    No full text
    Reconfiguration is a largely unexplored property in the context of parallel models of computation. However, it is a powerful concept as far as massively parallel architectures are concerned, because it overcomes the constraints due to the bisection width arising in most distributed-memory machines. In this paper, we show how to use reconfiguration in order to improve communication operations that are widely used in parallel applications. We propose quasi-optimal algorithms for broadcasting, scattering, gossiping and multi-scattering. Keywords: Reconfiguration, broadcast, scattering, gossiping, communications, distributed memory parallel computers. 1 Introduction. For massively parallel architectures, the hardware complexity of the interconnection network is much higher than that of the processing units: "the interconnection network employs 99% of the hardware involved" [JMM92]. Moreover, due to the communication-intensive nature of most computational tasks, their performance depen..