88 research outputs found

    Extending HyperTransport Protocol for Improved Scalability

    HyperTransport 3.10 is the leading open-standard communication technology for chip-to-chip interconnects. However, its strengths are of little help when building mid- and large-scale systems, because it cannot natively scale beyond 8 computing nodes and must therefore be complemented by other interconnect technologies. The HyperTransport Consortium has stimulated intensive discussion among its members to overcome this shortcoming. The result is the High Node Count HyperTransport Specification, which defines protocol extensions to the HyperTransport I/O Link Specification Rev. 3.10 that enable HyperTransport to natively support the large node counts typical of large-scale server clustering and High Performance Computing (HPC) applications. The extension has been carefully designed so that it raises the maximum number of connected devices to a level that meets current and future scalability requirements, while preserving the features that made HyperTransport successful and retaining full backward compatibility.

    Trilinos I/O Support (Trios)


    The Performance Effect of Multi-core on Scientific Applications

    The historical trend of increasing single-CPU performance has given way to a roadmap of increasing core count. The challenge of effectively utilizing these multi-core chips is just starting to be explored by vendors and application developers alike. In this study, we present performance measurements of several complete scientific applications on single- and dual-core Cray XT3 and XT4 systems, with a view to characterizing the effects of switching to multi-core chips. We consider effects within a node by using applications run at low concurrencies, and also effects on node-interconnect interaction using higher-concurrency results. Finally, we construct a simple performance model based on the principal on-chip shared resource, memory bandwidth, and use this to predict the performance of the forthcoming quad-core system.
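    The bandwidth-sharing idea behind such a performance model can be illustrated with a small sketch. The code below is only an assumption-laden illustration (the function name, parameter values, and the max(compute, memory) form are ours, not taken from the paper): it treats memory bandwidth as a resource divided evenly among the active cores of a chip and bounds runtime by whichever of computation or memory traffic dominates.

    ```python
    # Rough sketch of a bandwidth-sharing performance model (illustrative only;
    # names and numbers are assumed, not taken from the paper).

    def predicted_time(compute_time_s, bytes_per_task, peak_bw_gbs, active_cores):
        """Estimate per-task runtime when all cores on a chip share one memory interface.

        compute_time_s : time the task would need if memory were infinitely fast
        bytes_per_task : bytes streamed to/from main memory by one task
        peak_bw_gbs    : peak memory bandwidth of the socket, in GB/s
        active_cores   : number of cores contending for that bandwidth
        """
        per_core_bw_bs = peak_bw_gbs * 1e9 / active_cores   # even split of the shared bandwidth
        memory_time_s = bytes_per_task / per_core_bw_bs     # time spent waiting on memory
        return max(compute_time_s, memory_time_s)           # whichever resource dominates

    # Toy prediction: how a memory-bound task slows down as the cores per socket double.
    dual = predicted_time(2.0, bytes_per_task=20e9, peak_bw_gbs=10.0, active_cores=2)
    quad = predicted_time(2.0, bytes_per_task=20e9, peak_bw_gbs=10.0, active_cores=4)
    print(f"dual-core estimate: {dual:.1f} s, quad-core estimate: {quad:.1f} s")
    ```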

    HPCC Update and Analysis

    The last year has seen significant updates to the programming environment and operating systems on the Cray X1E and Cray XT3, as well as the much-anticipated release of version 1.0 of the HPCC Benchmark. This paper provides an update and analysis of HPCC Benchmark results for the Cray XT3 and X1E, as well as a comparison against historical results.

    Automating Topology Aware Mapping for Supercomputers

    Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap, and communication on the network is becoming the bottleneck for scaling parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention. Most parallel applications have a characteristic communication topology. Mapping the tasks of a parallel application, based on their communication graph, to the physical processors of a machine can therefore lead to performance improvements. The research problem under consideration is mapping the communication graph of an application onto the interconnect topology of a machine while trying to localize communication. The farther messages travel on the network, the greater the chance of resource sharing between messages, which can create contention on the networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention. Placing communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology-aware placement for applications such as NAMD and OpenAtom motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications, relieving application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms. The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum-dilation metric. The main focus of this dissertation is on developing topology-aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with the capability to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques, which will be imperative for machines of the future.
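    Since the hop-bytes metric is central to the evaluation described above, a minimal sketch of how it can be computed may be helpful. The code below is an illustration under our own assumptions (the 3D-torus distance function, the dictionary-based graph representation, and all names are ours, not the dissertation's framework): hop-bytes is taken as the sum, over all messages, of message size multiplied by the number of network hops between the processors the endpoints are mapped to.

    ```python
    # Minimal sketch of the hop-bytes metric for a task-to-processor mapping
    # on a 3D torus (names and example values are illustrative assumptions).

    def torus_hops(a, b, dims):
        """Shortest-path hop count between coordinates a and b on a wrap-around 3D torus."""
        return sum(min(abs(ai - bi), d - abs(ai - bi)) for ai, bi, d in zip(a, b, dims))

    def hop_bytes(comm_graph, mapping, dims):
        """comm_graph: {(task_u, task_v): bytes exchanged between the two tasks}
        mapping:    {task: (x, y, z) coordinates of the processor it is placed on}
        Returns the sum over all messages of bytes * hops traveled."""
        return sum(vol * torus_hops(mapping[u], mapping[v], dims)
                   for (u, v), vol in comm_graph.items())

    # Toy example: a 4-task communication ring mapped onto a 4x4x1 torus in two ways.
    dims = (4, 4, 1)
    ring = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}
    compact = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 0), 3: (0, 1, 0)}
    spread  = {0: (0, 0, 0), 1: (2, 0, 0), 2: (0, 2, 0), 3: (2, 2, 0)}
    print(hop_bytes(ring, compact, dims), hop_bytes(ring, spread, dims))  # 400 vs 1200
    ```

    A lower hop-bytes value indicates that communication stays more local, which is the property a topology-aware mapping algorithm tries to optimize.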

    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009)

    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009), held on February 12, 2009 in Mannheim, Germany. The 1st International Workshop for Research on HyperTransport is an international, high-quality forum for scientists, researchers, and developers working in the area of HyperTransport. This includes not only developments and research on HyperTransport itself, but also work that is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnection technology typically used as the system interconnect in modern computer systems, connecting the CPUs with each other and with the I/O bridges. Primarily designed as an interconnect between high-performance CPUs, it provides extremely low latency, high bandwidth, and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. In contrast to other peripheral interconnect technologies such as PCI-Express, no protocol conversion or intermediate bridging is necessary: HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache-coherent devices. Because of these properties, HT is of high interest for high-performance I/O such as networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. Acceleration in particular sees a resurgence of interest today; one reason is the possibility of reducing power consumption through the use of accelerators. In the area of parallel computing, the low-latency communication allows for fine-grained communication schemes and is perfectly suited for scalable systems. Summing up, HT technology offers key advantages and great performance for any research related to or based on interconnects. For more information, please consult the workshop website (http://whtra.uni-hd.de).