48 research outputs found

    Mapping large-scale FEM-graphs to highly parallel computers with grid-like topology by self-organization

    We consider the problem of mapping large-scale FEM graphs for the solution of partial differential equations to highly parallel distributed-memory computers. Typically, these programs show a low-dimensional grid-like communication structure. We argue that the conventional domain decomposition methods usually employed today are not well suited for future highly parallel computers, as they do not take into account the interconnection structure of the parallel computer, resulting in a large communication overhead. We therefore propose a new mapping heuristic that performs both partitioning of the solution domain and processor allocation in one integrated step. Our procedure is based on the ability of Kohonen neural networks to exploit topological similarities between an input space and a grid-like structured network to compute a neighborhood-preserving mapping between the set of discretization points and the parallel computer. We report results of mapping up to 44,000-node FEM graphs to a 4096-processor parallel computer and demonstrate the capability of the proposed scheme for dynamic remapping under adaptive refinement of the discretization graph.
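To illustrate the underlying idea, here is a minimal Kohonen self-organizing map in Python that maps 2-D discretization points onto a processor grid; the decay schedules, grid indexing, and parameters are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def som_map(points, grid_shape, iters=2000, seed=0):
    """Map 2-D discretization points onto a processor grid via a Kohonen SOM.
    Sketch only: neuron (i, j) stands for processor (i, j); learning-rate and
    neighborhood schedules are illustrative."""
    rng = np.random.default_rng(seed)
    gy, gx = grid_shape
    # One prototype vector per processor, initialised over the point cloud's bounding box.
    lo, hi = points.min(axis=0), points.max(axis=0)
    w = rng.uniform(lo, hi, size=(gy, gx, 2))
    ii, jj = np.meshgrid(np.arange(gy), np.arange(gx), indexing="ij")
    for t in range(iters):
        lr = 0.5 * (1 - t / iters)                          # decaying learning rate
        sigma = max(1.0, (gy + gx) / 4 * (1 - t / iters))   # shrinking neighborhood radius
        p = points[rng.integers(len(points))]               # random discretization point
        d = ((w - p) ** 2).sum(axis=2)
        bi, bj = np.unravel_index(d.argmin(), d.shape)      # best-matching processor
        h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
        w += lr * h[..., None] * (p - w)                    # pull neighborhood toward the point
    # Final assignment: each point goes to the processor with the closest prototype,
    # which yields a neighborhood-preserving partition of the domain.
    d = ((points[:, None, None, :] - w[None]) ** 2).sum(axis=3)
    flat = d.reshape(len(points), -1).argmin(axis=1)
    return np.unravel_index(flat, (gy, gx))
```

Because nearby points tend to win nearby neurons, communicating FEM nodes land on neighboring processors, which is exactly the property the mapping heuristic exploits.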

    Fano resonances and decoherence in transport through quantum dots

    A tunable microwave scattering device is presented which allows the controlled variation of Fano line shape parameters in transmission through quantum billiards. We observe a non-monotonic evolution of resonance parameters that is explained in terms of interacting resonances. The dissipation of radiation in the cavity walls leads to decoherence and thus to a modification of the Fano profile. We show that the imaginary part of the complex Fano q-parameter allows to determine the absorption constant of the cavity. Our theoretical results demonstrate further that the two decohering mechanisms, dephasing and dissipation, are equivalent in terms of their effect on the evolution of Fano resonance lineshapes.Comment: 9 pages, 7 figures, submitted to Physica E (conference proceedings
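The effect of a complex q-parameter can be seen directly in the standard Fano formula. The sketch below evaluates the textbook lineshape |ε + q|² / (ε² + 1) with ε the reduced detuning; the specific normalization is an assumption, not the paper's fitted model.

```python
import numpy as np

def fano_profile(eps, q):
    """Fano lineshape |eps + q|^2 / (eps^2 + 1).
    eps: reduced detuning (E - E_res) / (Gamma / 2).
    q:   Fano asymmetry parameter; may be complex, in which case the
         imaginary part (modeling absorption/decoherence) lifts the
         transmission zero."""
    q = complex(q)
    return np.abs(eps + q) ** 2 / (eps ** 2 + 1)
```

For real q the profile vanishes exactly at ε = −q; a nonzero Im(q) removes that zero, which is why the imaginary part carries information about dissipation in the cavity walls.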

    Towards Cloud-based Asynchronous Elasticity for Iterative HPC Applications

    Elasticity is one of the key features of cloud computing. It allows applications to dynamically scale computing and storage resources, avoiding over- and under-provisioning. In high-performance computing (HPC), elasticity initiatives are normally modeled to handle bag-of-tasks or key-value applications through a load balancer and a loosely coupled set of virtual machine (VM) instances. In the joint field of the Message Passing Interface (MPI) and tightly coupled HPC applications, we observe the need for source-code rewriting, prior knowledge of the application, and/or stop-reconfigure-and-go approaches to address cloud elasticity. Besides, there are problems related to how to profit from this new feature in the HPC scope, since in MPI 2.0 applications the programmers need to handle communicators by themselves, and the sudden consolidation of a VM, together with a process, can compromise the entire execution. To address these issues, we propose a PaaS-based elasticity model named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning in cloud infrastructures without any major modification. AutoElastic provides a new concept, denoted here as asynchronous elasticity: a framework that allows applications to either increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. The results showed a saving of about 3 min at each scaling-out operation, emphasizing the contribution of the new concept in contexts where seconds are precious.
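The essence of asynchronous elasticity can be sketched with background threads: slow scaling actions run off the critical path so the iterative computation never blocks. The class and threshold values below are illustrative assumptions, not AutoElastic's actual API.

```python
import threading
import time

class AsyncElasticController:
    """Sketch of asynchronous elasticity (names and thresholds are
    illustrative): scaling actions run in background threads, so calling
    observe() from the application's iteration loop returns immediately."""

    def __init__(self, upper=0.8, lower=0.3):
        self.upper, self.lower = upper, lower
        self.vms = 2                      # current VM count (starts at 2 for the demo)
        self._lock = threading.Lock()
        self._pending = []                # launched scaling threads

    def _scale(self, delta):
        time.sleep(0.01)                  # stands in for slow VM boot / consolidation
        with self._lock:
            self.vms = max(1, self.vms + delta)

    def observe(self, load):
        """Called once per application iteration; never blocks on scaling."""
        if load > self.upper:
            t = threading.Thread(target=self._scale, args=(+1,))   # scale out
        elif load < self.lower:
            t = threading.Thread(target=self._scale, args=(-1,))   # scale in / consolidate
        else:
            return
        t.start()
        self._pending.append(t)           # kept only so callers may wait if they choose
```

The application keeps iterating while `_scale` runs; in a real deployment the sleep would be a cloud-middleware provisioning call (e.g. via OpenNebula), and consolidation would have to coordinate with the MPI communicator.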

    Processor Management in Highly Parallel Systems with 2D-Grid Architectures: Buddy Schemes

    Programming for parallel systems and, in particular, multicomputers is still uncomfortable and inefficient. We often observe monoprogramming operation, which inevitably leads to poor utilization and uneconomic machine usage. When multiprogramming is available, the machine is usually partitioned manually and in a rather static way, without the ability to adjust the partitioning to the dynamic requests of the parallel programs. The reason for this situation is a lack of operating-system software support. We therefore claim that operating systems for these machines have to provide a dynamic processor management facility comparable to storage management. Mesh-connected multicomputer (MIMD message-passing) systems are becoming more and more popular for several reasons. Firstly, for many problems to be solved, the 2D grid is the natural and appropriate communication topology. Secondly, even if the problem structure is not exactly the 2D grid, many parallel programs based on data partitioning need only local information exchange with a few partners and can still be mapped sufficiently well to 2D-grid architectures. Thirdly, unlike the hypercube, the 2D grid has a constant node degree, which makes the topology highly scalable. Fourthly, there are already powerful processor chips available that are specifically designed for 2D topologies (Transputer). We assume a homogeneous multicomputer system with N = 2^n1 × 2^n2 identical processor nodes. Each node
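A 2-D buddy scheme can be sketched as recursive halving of free power-of-two blocks. The allocator below is a toy illustration of the idea, not the paper's exact scheme; blocks are (row, col, height, width) tuples on the 2^n1 × 2^n2 grid.

```python
def allocate(free, h, w):
    """2-D buddy-style allocation sketch (illustrative): find a free block
    at least h x w, then repeatedly split it in half, returning one buddy to
    the free list, until the remaining block matches the request."""
    for i, (r, c, bh, bw) in enumerate(free):
        if bh >= h and bw >= w:
            free.pop(i)
            while bh > h or bw > w:
                # Split the dimension with excess, preferring the longer side.
                if bh > h and (bh >= bw or bw == w):
                    bh //= 2
                    free.append((r + bh, c, bh, bw))   # lower buddy stays free
                else:
                    bw //= 2
                    free.append((r, c + bw, bh, bw))   # right buddy stays free
            return (r, c, h, w)
    return None   # no sufficiently large free block: request must wait
```

Releasing a partition would merge it back with its buddy when both are free, mirroring classical buddy storage management; that step is omitted here for brevity.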

    Mapping tasks to processors at run-time

    We consider the dynamic task allocation problem in multicomputer systems with multiprogramming. Programs are given as task interaction graphs that must be mapped onto the processors at run-time. We propose a fast two-phase heuristic algorithm in which phase 1 performs a hierarchical clustering of the tasks, which the second phase then uses to map clusters of suitable size onto free partitions of the processor graph.
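The two phases can be sketched as follows: greedily merge the most heavily communicating tasks until one cluster per free partition remains, then place each cluster on a partition. This is an illustrative reconstruction of the scheme's shape, not the paper's actual algorithm.

```python
def two_phase_map(tasks, edges, partitions):
    """Phase 1: hierarchical clustering along the heaviest communication
    edges (union-find merge). Phase 2: assign each resulting cluster to a
    free partition (round-robin placement as a stand-in for size matching)."""
    cluster = {t: t for t in tasks}          # task -> cluster representative

    def find(t):
        while cluster[t] != t:
            t = cluster[t]
        return t

    # Phase 1: merge clusters, heaviest edge first, until enough clusters remain.
    n = len(tasks)
    for a, b, _w in sorted(edges, key=lambda e: -e[2]):
        if n <= len(partitions):
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            cluster[rb] = ra
            n -= 1

    # Phase 2: map clusters onto free partitions of the processor graph.
    roots = sorted({find(t) for t in tasks})
    place = {root: partitions[i % len(partitions)] for i, root in enumerate(roots)}
    return {t: place[find(t)] for t in tasks}
```

Heavily communicating tasks end up co-located, which keeps inter-partition traffic low; a real run-time system would additionally match cluster sizes to partition sizes.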

    The Prism Bridge: Maximizing Inter-Chip AXI Throughput in the High-Speed Serial Era

    In this paper, we present the Prism Bridge, a soft IP core developed to bridge FPGA-MPSoC systems using high-speed serial links. Given the current trend of ubiquitous serial transceivers with rapidly increasing line rates, minimizing overhead and maximizing data throughput become paramount. Hence, our main design goal is to maximize bandwidth utilization for AXI data, which we realize through an advanced packetization mechanism. We give an overview of the Prism Bridge’s design and analyze its half-duplex bandwidth utilization. Additionally, we discuss the results of the experiments we conducted to assess its real-world performance, including measurements of throughput and latency for various combinations of line rates, link-layer cores, and bridge cores. Using a serial link with a 16.375 Gbit/s line rate, the Prism Bridge with the advanced packetizing mechanism achieved an AXI write throughput of 1368.81 MiB/s and an AXI read throughput of 1376.61 MiB/s, an increase of 46.19% and 45.85%, respectively, compared with the de facto industry-standard core. The advanced packetization mechanism had negligible impact on latency but required 69.14%–73.91% more LUTs and 33.62%–36.19% more flip-flops. We conclude that for most designs that support inter-chip AXI transactions and are not limited to short transaction lengths, the higher data throughput of the Prism Bridge with an advanced packetization mechanism is worth its cost in additional logic resources.
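The reported figures can be sanity-checked against the raw line rate. The helper below converts an AXI payload rate in MiB/s to a fraction of the serial line rate; it deliberately ignores line coding (e.g. 64b/66b) and link-layer framing, so the result is a lower bound on protocol efficiency rather than the paper's exact utilization metric.

```python
def axi_utilization(mib_per_s, line_rate_gbps):
    """Fraction of the raw serial line rate carried as AXI payload.
    mib_per_s:      measured AXI throughput in MiB/s (2^20 bytes/s)
    line_rate_gbps: raw transceiver line rate in Gbit/s (10^9 bit/s)
    Ignores line coding and framing overhead, so this underestimates
    the efficiency relative to the usable link bandwidth."""
    payload_bits_per_s = mib_per_s * 1024 ** 2 * 8
    return payload_bits_per_s / (line_rate_gbps * 1e9)
```

On the 16.375 Gbit/s link, the quoted 1368.81 MiB/s write throughput corresponds to roughly 70% of the raw line rate, which is consistent with the claim of high bandwidth utilization after coding and packet overheads.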