115 research outputs found

    Recent Advances in Graph Partitioning

    Full text link
    We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions

    Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

    Get PDF
    Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM

    Automating Topology Aware Mapping for Supercomputers

    Get PDF
    Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap and communication on the network is becoming the bottleneck for scaling of parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention. Most parallel applications have a certain communication topology. Mapping of tasks in a parallel application based on their communication graph, to the physical processors on a machine can potentially lead to performance improvements. Mapping of the communication graph for an application on to the interconnect topology of a machine while trying to localize communication is the research problem under consideration. The farther different messages travel on the network, greater is the chance of resource sharing between messages. This can create contention on the network for networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this fact, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention. Placement of communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology aware placement for applications such as NAMD and OpenAtom are used to motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications to relieve the application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms. The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum dilation metric. The main focus of this dissertation is on developing topology aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with capabilities to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques which will be imperative for machines of the future.published or submitted for publicationnot peer reviewe

    Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

    Get PDF
    Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. Investigation and proposal of new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks are main objectives of this thesis. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommend that one of the algorithms – which fulfils all of the criteria – is used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications. Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system’s computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, a high resource utilization as well as isolation of traffic between different partitions is achieved

    Adaptive Finite Elements for Systems of PDEs: Software Concepts, Multi-level Techniques and Parallelization

    Get PDF
    In the recent past, the field of scientific computing has become of more and more importance for scientific as well as for industrial research, playing a comparable role as experiment and theory do. This success of computational methods in scientific and engineering research is next to the enormous improvement of computer hardware to a large extend due to contributions from applied mathematicians, who have developed algorithms which make real life applications feasible. Examples are adaptive methods, high order discretization, fast linear and non-linear solvers and multi-level methods. The application of these methods in a large class of problems demands for suitable and robust tools for a flexible and efficient implementation. In order to play a crucial role in scientific and engineering research, besides efficiency in the numerical solution, also efficiency in problem setup and interpretation of simulation results is of utmost importance. As modeling and computing comes closer together, efficient computational methods need to be applied to new sets of equations. The problems to be addressed by simulation methods become more and more complicated, ranging over different scales, interacting on different dimensions and combining different physics. Such problems need to be implemented in a short period of time, solved on complicated domains and visualized with respect to the demand of the user. %Only a modular abstract simulation environment will fulfill these requirements and allow to setup, solve and visualize real-world problems appropriately. In this work, the concepts and the design of the C++ finite element toolbox AMDiS (adaptive multidimensional simulations) are described. It is shown, how abstract data structures and modern software concepts can help to design user-friendly finite element software, which provides large flexibility in problem definition while on the other hand efficiently solves these problems. Also systems of coupled problems can be solved in an intuitive way. In order to demonstrate its possibilities, AMDiS has been applied to several non-standard problems. The most time-consuming part in most simulations is the solution of linear systems of equations. Multi-level methods use discretization hierarchies to solve these systems in a very efficient way. In AMDiS, such multi-level techniques are implemented in the context of adaptive finite elements. Several numerical results are given which compare this multigrid solver with classical iterative methods. Besides the development of more efficient algorithms also the growing hardware capabilities lead to an improvement of simulation possibilities. Modern computing clusters contain more and more processors and also personal computers today are often equipped with multi-core processors. In this work, a new parallelization approach has been developed which allows the parallelization of sequential code in a very easy way and reduces the communication overhead compared to classical parallelization concepts

    Parallel mesh adaptive techniques for complex flow simulation

    Get PDF
    Dynamic mesh adaptation on unstructured grids, by localised refinement and derefinement, is a very efficient tool for enhancing solution accuracy and optimise computational time. One of the major drawbacks however resides in the projection of the new nodes created, during the refinement process, onto the boundary surfaces. This can be addressed by the introduction of a library capable of handling geometric properties given by a CAD (Computer Aided Design) description. This is of particular interest also to enhance the adaptation module when the mesh is being smoothed, and hence moved, to then re-project it onto the surface of the exact geometry. However, the above procedure is not always possibly due to either faulty or too complex designs, which require a higher level of complexity in the CAD library. It is therefore paramount to have a built-in algorithm able to place the new nodes, belonging to the boundary, closer to the geometric definition of it. Such a procedure is proposed in this work, based on the idea of interpolating subdivision. In order to efficiently and effectively adapt a mesh to a solution field, the criteria used for the adaptation process needs to be as accurate as possible. Due to the nature of the solution, which is obtained by discretisation of a continuum model, numerical error is intrinsic in the calculation. A posteriori error estimation allows us to somewhat assess the accuracy by using the computed solution itself. In particular, an a posteriori error estimator based on the Zienkievicz Zhu model is introduced. This can be used in the adaptation procedure to refine the mesh in those areas where the local error exceeds a set tolerance, hence further increasing the accuracy of the solution in those regions during the next computational step. Variants of this error estimator have also been studied and implemented. One of the important aspects of this project is the fact that the algorithmic concepts are developed thinking parallel, i.e. the algorithms take into account the possibility of multiprocessor implementation. Indeed these concepts require complex programming if one tries to parallelise them, once they have been devised serially. Another important and innovative aspect of this work is the consistency of the algorithms with parallel processor execution

    Development of a Navier-Stokes algorithm for parallel-processing supercomputers

    Get PDF
    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2

    Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters

    Full text link
    Actualmente, los clusters de PC son un alternativa rentable a los computadores paralelos. En estos sistemas, miles de componentes (procesadores y/o discos duros) se conectan a través de redes de interconexión de altas prestaciones. Entre las tecnologías de red actualmente disponibles para construir clusters, InfiniBand (IBA) ha emergido como un nuevo estándar de interconexión para clusters. De hecho, ha sido adoptado por muchos de los sistemas más potentes construidos actualmente (lista top500). A medida que el número de nodos aumenta en estos sistemas, la red de interconexión también crece. Junto con el aumento del número de componentes la probabilidad de averías aumenta dramáticamente, y así, la tolerancia a fallos en el sistema en general, y de la red de interconexión en particular, se convierte en una necesidad. Desafortunadamente, la mayor parte de las estrategias de encaminamiento tolerantes a fallos propuestas para los computadores masivamente paralelos no pueden ser aplicadas porque el encaminamiento y las transiciones de canal virtual son deterministas en IBA, lo que impide que los paquetes eviten los fallos. Por lo tanto, son necesarias nuevas estrategias para tolerar fallos. Por ello, esta tesis se centra en proporcionar los niveles adecuados de tolerancia a fallos a los clusters de PC, y en particular a las redes IBA. En esta tesis proponemos y evaluamos varios mecanismos adecuados para las redes de interconexión para clusters. El primer mecanismo para proporcionar tolerancia a fallos en IBA (al que nos referimos como encaminamiento tolerante a fallos basado en transiciones; TFTR) consiste en usar varias rutas disjuntas entre cada par de nodos origen-destino y seleccionar la ruta apropiada en el nodo fuente usando el mecanismo APM proporcionado por IBA. Consiste en migrar las rutas afectadas por el fallo a las rutas alternativas sin fallos. Sin embargo, con este fin, es necesario un algoritmo eficiente de encaminamiento capaz de proporcionar suficientesMontañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603Palanci

    IST Austria Thesis

    Get PDF
    This thesis describes a brittle fracture simulation method for visual effects applications. Building upon a symmetric Galerkin boundary element method, we first compute stress intensity factors following the theory of linear elastic fracture mechanics. We then use these stress intensities to simulate the motion of a propagating crack front at a significantly higher resolution than the overall deformation of the breaking object. Allowing for spatial variations of the material's toughness during crack propagation produces visually realistic, highly-detailed fracture surfaces. Furthermore, we introduce approximations for stress intensities and crack opening displacements, resulting in both practical speed-up and theoretically superior runtime complexity compared to previous methods. While we choose a quasi-static approach to fracture mechanics, ignoring dynamic deformations, we also couple our fracture simulation framework to a standard rigid-body dynamics solver, enabling visual effects artists to simulate both large scale motion, as well as fracturing due to collision forces in a combined system. As fractures inside of an object grow, their geometry must be represented both in the coarse boundary element mesh, as well as at the desired fine output resolution. Using a boundary element method, we avoid complicated volumetric meshing operations. Instead we describe a simple set of surface meshing operations that allow us to progressively add cracks to the mesh of an object and still re-use all previously computed entries of the linear boundary element system matrix. On the high resolution level, we opt for an implicit surface representation. We then describe how to capture fracture surfaces during crack propagation, as well as separate the individual fragments resulting from the fracture process, based on this implicit representation. We show results obtained with our method, either solving the full boundary element system in every time step, or alternatively using our fast approximations. These results demonstrate that both of these methods perform well in basic test cases and produce realistic fracture surfaces. Furthermore we show that our fast approximations substantially out-perform the standard approach in more demanding scenarios. Finally, these two methods naturally combine, using the full solution while the problem size is manageably small and switching to the fast approximations later on. The resulting hybrid method gives the user a direct way to choose between speed and accuracy of the simulation
    • …
    corecore