48 research outputs found

    Mapping large-scale FEM-graphs to highly parallel computers with grid-like topology by self-organization

    We consider the problem of mapping large-scale FEM graphs for the solution of partial differential equations to highly parallel distributed-memory computers. Typically, these programs show a low-dimensional grid-like communication structure. We argue that the conventional domain decomposition methods usually employed today are not well suited for future highly parallel computers, as they do not take into account the interconnection structure of the parallel computer, resulting in a large communication overhead. We therefore propose a new mapping heuristic that performs both partitioning of the solution domain and processor allocation in one integrated step. Our procedure is based on the ability of Kohonen neural networks to exploit topological similarities between an input space and a grid-like structured network to compute a neighborhood-preserving mapping between the set of discretization points and the parallel computer. We report results of mapping up to 44,000-node FEM graphs to a 4096-processor parallel computer and demonstrate the capability of the proposed scheme for dynamic remapping under adaptive refinement of the discretization graph.
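To illustrate the underlying idea, here is a minimal Kohonen self-organizing map in Python that maps 2-D discretization points onto a processor grid; the decay schedules, grid indexing, and parameters are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def som_map(points, grid_shape, iters=2000, seed=0):
    """Map 2-D discretization points onto a processor grid via a Kohonen SOM.
    Sketch only: neuron (i, j) stands for processor (i, j); learning-rate and
    neighborhood schedules are illustrative."""
    rng = np.random.default_rng(seed)
    gy, gx = grid_shape
    # One prototype vector per processor, initialised over the point cloud's bounding box.
    lo, hi = points.min(axis=0), points.max(axis=0)
    w = rng.uniform(lo, hi, size=(gy, gx, 2))
    ii, jj = np.meshgrid(np.arange(gy), np.arange(gx), indexing="ij")
    for t in range(iters):
        lr = 0.5 * (1 - t / iters)                          # decaying learning rate
        sigma = max(1.0, (gy + gx) / 4 * (1 - t / iters))   # shrinking neighborhood radius
        p = points[rng.integers(len(points))]               # random discretization point
        d = ((w - p) ** 2).sum(axis=2)
        bi, bj = np.unravel_index(d.argmin(), d.shape)      # best-matching processor
        h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
        w += lr * h[..., None] * (p - w)                    # pull neighborhood toward the point
    # Final assignment: each point goes to the processor with the closest prototype,
    # which yields a neighborhood-preserving partition of the domain.
    d = ((points[:, None, None, :] - w[None]) ** 2).sum(axis=3)
    flat = d.reshape(len(points), -1).argmin(axis=1)
    return np.unravel_index(flat, (gy, gx))
```

Because nearby points tend to win nearby neurons, communicating FEM nodes land on neighboring processors, which is exactly the property the mapping heuristic exploits.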

    Fano resonances and decoherence in transport through quantum dots

    A tunable microwave scattering device is presented which allows the controlled variation of Fano line shape parameters in transmission through quantum billiards. We observe a non-monotonic evolution of resonance parameters that is explained in terms of interacting resonances. The dissipation of radiation in the cavity walls leads to decoherence and thus to a modification of the Fano profile. We show that the imaginary part of the complex Fano q-parameter allows to determine the absorption constant of the cavity. Our theoretical results demonstrate further that the two decohering mechanisms, dephasing and dissipation, are equivalent in terms of their effect on the evolution of Fano resonance lineshapes.Comment: 9 pages, 7 figures, submitted to Physica E (conference proceedings
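The effect of a complex q-parameter can be seen directly in the standard Fano formula. The sketch below evaluates the textbook lineshape |ε + q|² / (ε² + 1) with ε the reduced detuning; the specific normalization is an assumption, not the paper's fitted model.

```python
import numpy as np

def fano_profile(eps, q):
    """Fano lineshape |eps + q|^2 / (eps^2 + 1).
    eps: reduced detuning (E - E_res) / (Gamma / 2).
    q:   Fano asymmetry parameter; may be complex, in which case the
         imaginary part (modeling absorption/decoherence) lifts the
         transmission zero."""
    q = complex(q)
    return np.abs(eps + q) ** 2 / (eps ** 2 + 1)
```

For real q the profile vanishes exactly at ε = −q; a nonzero Im(q) removes that zero, which is why the imaginary part carries information about dissipation in the cavity walls.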

    Towards Cloud-based Asynchronous Elasticity for Iterative HPC Applications

    Elasticity is one of the key features of cloud computing. It allows applications to dynamically scale computing and storage resources, avoiding over- and under-provisioning. In high-performance computing (HPC), elasticity initiatives are normally modeled to handle bag-of-tasks or key-value applications through a load balancer and a loosely coupled set of virtual machine (VM) instances. In the joint field of the Message Passing Interface (MPI) and tightly coupled HPC applications, we observe the need for source-code rewriting, prior knowledge of the application, and/or stop-reconfigure-and-go approaches to address cloud elasticity. Besides, there are problems related to how to profit from this new feature in the HPC scope, since in MPI 2.0 applications the programmers need to handle communicators by themselves, and the sudden consolidation of a VM, together with a process, can compromise the entire execution. To address these issues, we propose a PaaS-based elasticity model named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning in cloud infrastructures without any major modification. AutoElastic provides a new concept, denoted here as asynchronous elasticity: a framework that allows applications to either increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. The results showed a saving of about 3 min at each scaling-out operation, emphasizing the contribution of the new concept in contexts where seconds are precious.
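The essence of asynchronous elasticity can be sketched with background threads: slow scaling actions run off the critical path so the iterative computation never blocks. The class and threshold values below are illustrative assumptions, not AutoElastic's actual API.

```python
import threading
import time

class AsyncElasticController:
    """Sketch of asynchronous elasticity (names and thresholds are
    illustrative): scaling actions run in background threads, so calling
    observe() from the application's iteration loop returns immediately."""

    def __init__(self, upper=0.8, lower=0.3):
        self.upper, self.lower = upper, lower
        self.vms = 2                      # current VM count (starts at 2 for the demo)
        self._lock = threading.Lock()
        self._pending = []                # launched scaling threads

    def _scale(self, delta):
        time.sleep(0.01)                  # stands in for slow VM boot / consolidation
        with self._lock:
            self.vms = max(1, self.vms + delta)

    def observe(self, load):
        """Called once per application iteration; never blocks on scaling."""
        if load > self.upper:
            t = threading.Thread(target=self._scale, args=(+1,))   # scale out
        elif load < self.lower:
            t = threading.Thread(target=self._scale, args=(-1,))   # scale in / consolidate
        else:
            return
        t.start()
        self._pending.append(t)           # kept only so callers may wait if they choose
```

The application keeps iterating while `_scale` runs; in a real deployment the sleep would be a cloud-middleware provisioning call (e.g. via OpenNebula), and consolidation would have to coordinate with the MPI communicator.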

    Processor Management in Highly Parallel Systems with 2D-Grid Architectures: Buddy Schemes

    Programming for parallel systems and, in particular, multicomputers is still uncomfortable and inefficient. We often observe monoprogramming operation, which inevitably leads to poor utilization and uneconomic machine usage. When multiprogramming is available, the machine is usually partitioned manually and in a rather static way, without the ability to adjust the partitioning to the dynamic requests of the parallel programs. The reason for this situation is a lack of operating-system software support. We therefore claim that operating systems for these machines have to provide a dynamic processor management facility comparable to storage management. Mesh-connected multicomputer (MIMD message-passing) systems are becoming more and more popular for several reasons. Firstly, for many problems to be solved, the 2D grid is the natural and appropriate communication topology. Secondly, even if the problem structure is not exactly the 2D grid, many parallel programs based on data partitioning need only local information exchange with a few partners and can still be mapped sufficiently well to 2D-grid architectures. Thirdly, unlike the hypercube, the 2D grid has a constant node degree, which makes the topology highly scalable. Fourthly, there are already powerful processor chips available that are specifically designed for 2D topologies (Transputer). We assume a homogeneous multicomputer system with N = 2^n1 × 2^n2 identical processor nodes. Each node
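A 2-D buddy scheme can be sketched as recursive halving of free power-of-two blocks. The allocator below is a toy illustration of the idea, not the paper's exact scheme; blocks are (row, col, height, width) tuples on the 2^n1 × 2^n2 grid.

```python
def allocate(free, h, w):
    """2-D buddy-style allocation sketch (illustrative): find a free block
    at least h x w, then repeatedly split it in half, returning one buddy to
    the free list, until the remaining block matches the request."""
    for i, (r, c, bh, bw) in enumerate(free):
        if bh >= h and bw >= w:
            free.pop(i)
            while bh > h or bw > w:
                # Split the dimension with excess, preferring the longer side.
                if bh > h and (bh >= bw or bw == w):
                    bh //= 2
                    free.append((r + bh, c, bh, bw))   # lower buddy stays free
                else:
                    bw //= 2
                    free.append((r, c + bw, bh, bw))   # right buddy stays free
            return (r, c, h, w)
    return None   # no sufficiently large free block: request must wait
```

Releasing a partition would merge it back with its buddy when both are free, mirroring classical buddy storage management; that step is omitted here for brevity.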

    Mapping tasks to processors at run-time

    We consider the dynamic task allocation problem in multicomputer systems with multiprogramming. Programs are given as task interaction graphs that must be mapped onto the processors at run-time. We propose a fast two-phase heuristic algorithm in which phase 1 performs a hierarchical clustering of the tasks, which the second phase then uses to map clusters of suitable size onto free partitions of the processor graph.
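The two phases can be sketched as follows: greedily merge the most heavily communicating tasks until one cluster per free partition remains, then place each cluster on a partition. This is an illustrative reconstruction of the scheme's shape, not the paper's actual algorithm.

```python
def two_phase_map(tasks, edges, partitions):
    """Phase 1: hierarchical clustering along the heaviest communication
    edges (union-find merge). Phase 2: assign each resulting cluster to a
    free partition (round-robin placement as a stand-in for size matching)."""
    cluster = {t: t for t in tasks}          # task -> cluster representative

    def find(t):
        while cluster[t] != t:
            t = cluster[t]
        return t

    # Phase 1: merge clusters, heaviest edge first, until enough clusters remain.
    n = len(tasks)
    for a, b, _w in sorted(edges, key=lambda e: -e[2]):
        if n <= len(partitions):
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            cluster[rb] = ra
            n -= 1

    # Phase 2: map clusters onto free partitions of the processor graph.
    roots = sorted({find(t) for t in tasks})
    place = {root: partitions[i % len(partitions)] for i, root in enumerate(roots)}
    return {t: place[find(t)] for t in tasks}
```

Heavily communicating tasks end up co-located, which keeps inter-partition traffic low; a real run-time system would additionally match cluster sizes to partition sizes.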

    The Prism Bridge: Maximizing Inter-Chip AXI Throughput in the High-Speed Serial Era

    In this paper, we present the Prism Bridge, a soft IP core developed to bridge FPGA-MPSoC systems using high-speed serial links. Given the current trend of ubiquitous serial transceivers with rapidly increasing line rates, minimizing overhead and maximizing data throughput become paramount. Hence, our main design goal is to maximize bandwidth utilization for AXI data, which we realize through an advanced packetization mechanism. We give an overview of the Prism Bridge’s design and analyze its half-duplex bandwidth utilization. Additionally, we discuss the results of the experiments we conducted to assess its real-world performance, including measurements of throughput and latency for various combinations of line rates, link-layer cores, and bridge cores. Using a serial link with a 16.375 Gbit/s line rate, the Prism Bridge with the advanced packetizing mechanism achieved an AXI write throughput of 1368.81 MiB/s and an AXI read throughput of 1376.61 MiB/s, an increase of 46.19% and 45.85%, respectively, compared with the de facto industry-standard core. The advanced packetization mechanism had negligible impact on latency but required 69.14%–73.91% more LUTs and 33.62%–36.19% more flip-flops. We conclude that for most designs that support inter-chip AXI transactions and are not limited to short transaction lengths, the higher data throughput of the Prism Bridge with an advanced packetization mechanism is worth its cost in additional logic resources.
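The reported figures can be sanity-checked against the raw line rate. The helper below converts an AXI payload rate in MiB/s to a fraction of the serial line rate; it deliberately ignores line coding (e.g. 64b/66b) and link-layer framing, so the result is a lower bound on protocol efficiency rather than the paper's exact utilization metric.

```python
def axi_utilization(mib_per_s, line_rate_gbps):
    """Fraction of the raw serial line rate carried as AXI payload.
    mib_per_s:      measured AXI throughput in MiB/s (2^20 bytes/s)
    line_rate_gbps: raw transceiver line rate in Gbit/s (10^9 bit/s)
    Ignores line coding and framing overhead, so this underestimates
    the efficiency relative to the usable link bandwidth."""
    payload_bits_per_s = mib_per_s * 1024 ** 2 * 8
    return payload_bits_per_s / (line_rate_gbps * 1e9)
```

On the 16.375 Gbit/s link, the quoted 1368.81 MiB/s write throughput corresponds to roughly 70% of the raw line rate, which is consistent with the claim of high bandwidth utilization after coding and packet overheads.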