526 research outputs found

    Performance evaluation of distributed crossbar switch hypermesh

    Get PDF
    The interconnection network is one of the most crucial components in any multicomputer as it greatly influences the overall system performance. Several recent studies have suggested that hypergraph networks, such as the Distributed Crossbar Switch Hypermesh (DCSH), exhibit superior topological and performance characteristics over many traditional graph networks, e.g. k-ary n-cubes. Previous work on the DCSH has focused on issues related to implementation and performance comparisons with existing networks. These comparisons have so far been confined to deterministic routing and unicast (one-to-one) communication. Using analytical models validated through simulation experiments, this thesis extends that analysis to include adaptive routing and broadcast communication. The study concentrates on wormhole switching, which has been widely adopted in practical multicomputers, thanks to its low buffering requirement and the reduced dependence of latency on distance under low traffic. Adaptive routing has recently been proposed as a means of improving network performance, but while the comparative evaluation of adaptive and deterministic routing has been widely reported in the literature, the focus has been on graph networks. The first part of this thesis deals with adaptive routing, developing an analytical model to measure latency in the DCSH, and which is used throughout the rest of the work for performance comparisons. Also, an investigation of different routing algorithms in this network is presented. Conventional k-ary n-cubes have been the underlying topology of contemporary multicomputers, but it is only recently that adaptive routing has been incorporated into such systems. The thesis studies the relative performance merits of the DCSH and k-ary n-cubes under adaptive routing strategy. The analysis takes into consideration real-world factors, such as router complexity and bandwidth constraints imposed by implementation technology. However, in any network, the routing of unicast messages is not the only factor in traffic control. In many situations (for example, parallel iterative algorithms, memory update and invalidation procedures in shared memory systems, global notification of network errors), there is a significant requirement for broadcast traffic. The DCSH, by virtue of its use of hypergraph links, can implement broadcast operations particularly efficiently. The second part of the thesis examines how the DCSH and k-ary n-cube performance is affected by the presence of a broadcast traffic component. In general, these studies demonstrate that because of their relatively high diameter, k-ary n-cubes perform poorly when message lengths are short. This is consistent with earlier more simplistic analyses which led to the proposal for the express-cube, an enhancement of the basic k-ary n-cube structure, which provides additional express channels, allowing messages to bypass groups of nodes along their paths. The final part of the thesis investigates whether this "partial bypassing" can compete with the "total bypassing" capability provided inherently by the DCSH topology

    Enabling parallel computing in CRASH

    Get PDF
    We present the new parallel version (pCRASH2) of the cosmological radiative transfer code CRASH2 for distributed memory supercomputing facilities. The code is based on a static domain decomposition strategy inspired by geometric dilution of photons in the optical thin case that ensures a favourable performance speed-up with increasing number of computational cores. Linear speed-up is ensured as long as the number of radiation sources is equal to the number of computational cores or larger. The propagation of rays is segmented and rays are only propagated through one sub-domain per time step to guarantee an optimal balance between communication and computation. We have extensively checked pCRASH2 with a standardised set of test cases to validate the parallelisation scheme. The parallel version of CRASH2 can easily handle the propagation of radiation from a large number of sources and is ready for the extension of the ionisation network to species other than hydrogen and helium.Comment: Accepted by Monthly Notices of the Royal Astronomical Society. 18 pages, 20 figure

    Solveur Parallèle pour l'Equation de Poisson sur Mailles Superposées et Hiérarchiques, dans le Cadre du Langage Python

    Get PDF
    Adaptive discretizations are important in compressible/incompressible flow problems since it is often necessary to resolve details on multiple levels, allowing large regions of space to be modeled using a reduced number of degrees of freedom (reducing the computational time). There are a wide variety of methods for adaptively discretizing space, but Cartesian grids have often outperformed them even at high resolutions due to their simple and accurate numerical stencils and their superior parallel performances. Such performance and simplicity are in general obtained applying a finite-difference scheme for the resolution of the problems involved, but this discretization approach does not present, by contrast, an easy adapting path. In a finite-volume scheme, instead, we can incorporate different types of grids, more suitable for adaptive refinements, increasing the complexity on the stencils and getting a greater flexibility. The Laplace operator is an essential building block of the Navier-Stokes equations, a model that governs fluid flows, but it occurs also in differential equations that describe many other physical phenomena, such as electric and gravitational potentials, and quantum mechanics. So, it is a very important differential operator, and all the studies carried out on it, prove its relevance. In this work will be presented 2D finite-difference and finite-volume approaches to solve the Laplacian operator, applying patches of overlapping grids where a more fined level is needed, leaving coarsermeshes in the rest of the computational domain. These overlapping grids will have generic quadrilateral shapes. Specifically, the topics covered will be: 1) introduction to the finite difference method, finite volume method, domain partitioning, solution approximation; 2)overview of different types of meshes to represent in a discrete way the geometry involved in a problem, with a focus on the octree data structure, presenting PABLO and PABLitO. The first one is an external library used to manage each single grid’s creation, load balancing andinternal communications, while the second one is the Python API of that library written ad hoc for the current project; 3) presentation of the algorithm used to communicate data between meshes (being all of them unaware of each other’s existence) using MPI inter-communicators and clarification of the monolithic approach applied building the final matrix for the system to solve, taking into account diagonal, restriction and prolongation blocks; 4) presentation of some results; conclusions, references. It is important to underline that everything is done under Python as programming framework, using Cython for the writing of PABLitO, MPI4Py for the communications between grids, PETSc4py for the assembling and resolution parts of the system of unknowns, NumPy for contiguous memory buffer objects. The choice of this programming language has been made because Python, easy to learn and understand, is today a significant contender for the numerical computing and HPC ecosystem, thanks to its clean style, its packages, its compilers and, why not, its specific architecture optimized versions.Les discrétisations adaptatives sont importantes dans les problèmes de flux compressible/incompressible puisqu'il est souvent nécessaire de résoudre des détails sur plusieurs niveaux, en permettant de modéliser de grandes régions d'espace en utilisant un nombre réduit de degrés de liberté (et en réduisant le temps de calcul). Il existe une grande variété de méthodes de discrétisation adaptative, mais les grilles cartésiennes sont les plus efficaces, grâce à leurs stencils numériques simples et précis et à leurs performances parallèles supérieures. Et telles performance et simplicité sont généralement obtenues en appliquant un schéma de différences finies pour la résolution des problèmes, mais cette approche de discrétisation ne présente pas, au contraire, un chemin facile d'adaptation.Dans un schéma de volumes finis, en revanche, nous pouvons incorporer différents types de maillages, plus appropriées aux raffinements adaptatifs, en augmentant la complexité sur les stencils et en obtenant une plus grande flexibilité. L'opérateur de Laplace est un élémentessentiel des équations de Navier-Stokes, un modèle qui gouverne les écoulements de fluides, mais il se produit également dans des équations différentielles qui décrivent de nombreux autres phénomènes physiques, tels que les potentiels électriques et gravitationnels. Il s'agit donc d'un opérateur différentiel très important, et toutes les études qui ont été effectuées sur celui-ci, prouvent sa pertinence. Dans ce travail seront présentés des approches de différences finies et de volumes finis 2D pour résoudre l'opérateur laplacien, en appliquant des patchs de grilles superposées où un niveau plus fin est nécessaire, en laissant des maillages plus grossiers dans le reste du domaine de calcul. Ces grilles superposées auront des formes quadrilatérales génériques. Plus précisément, les sujets abordés seront les suivants: 1) introduction à la méthode des différences finies, méthode des volumes finis, partitionnement des domaines, approximation de la solution; 2)récapitulatif des différents types de maillages pour représenter de façon discrète la géométrie impliquée dans un problème, avec un focus sur la structure de données octree, présentant PABLO et PABLitO. Le premier est une bibliothèque externe utilisée pour gérer la création de chaque grille, l'équilibrage de charge et les communications internes, tandis que la seconde est l'API Python de cette bibliothèque, écrite ad hoc pour le projet en cours; 3) la présentation de l'algorithme utilisé pour communiquer les données entre les maillages (en ignorant chacune l'existence de l'autre) en utilisant les intercommunicateurs MPI et la clarification de l'approche monolithique appliquée à la construction finale de la matrice pour résoudre le système, en tenant compte des blocs diagonaux, de restriction et de prolongement; 4) la présentation de certains résultats; conclusions, références. Il est important de souligner que tout est fait sous Python comme framework de programmation, en utilisant Cython pour l'écriture de PABLitO, MPI4Py pour les communications entre grilles, PETSc4py pour les parties assemblage et résolution du système d'inconnues, NumPy pour les objets à mémoire continue. Le choix de ce langage de programmation a été fait car Python, facile à apprendre et à comprendre, est aujourd'hui un concurrent significatif pour l'informatique numérique et l'écosystème HPC, grâce à son style épuré, ses packages, ses compilateurs et pourquoi pas ses versions optimisées pour des architectures spécifiques
    • …
    corecore