
    Recent development and perspectives of machines for lattice QCD

    I highlight recent progress in cluster computer technology and assess the status and prospects of cluster computers for lattice QCD with respect to the development of QCDOC and apeNEXT. Taking the LatFor test case, I specify a 512-processor QCD cluster at better than 1 $/Mflops. Comment: 14 pages, 17 figures, Lattice 2003 (plenary).
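
    To put the price/performance target in concrete terms (with illustrative numbers, not figures from the talk): a 512-processor cluster sustaining 0.7 Gflops per processor delivers 512 x 0.7 = 358.4 Gflops, so meeting the 1 $/Mflops target would bound the total system cost below roughly $360,000.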

    Design and Implementation of Open-MX: High-Performance Message Passing over generic Ethernet hardware

    Open-MX is a new message passing layer implemented on top of the generic Ethernet stack of the Linux kernel. It provides high-performance communication over any Ethernet hardware while exhibiting the Myrinet Express (MX) application interface. Open-MX also enables wire-interoperability with Myricom's MXoE hosts. This article presents the design of the Open-MX stack, which reproduces the MX firmware in a Linux driver. The MPICH-MX and PVFS2 layers already work flawlessly on Open-MX. The first performance evaluation shows interesting latency and bandwidth results on 1 and 10 gigabit hardware.
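
    Because Open-MX exhibits the MX application interface, an ordinary MPI program built against MPICH-MX should run over it unchanged on plain Ethernet NICs. A minimal sketch of such a program (the message size is an arbitrary choice, not a value from the article):

    /* Minimal MPI ping-pong between ranks 0 and 1; with MPICH-MX layered
       over Open-MX it requires no Open-MX-specific code. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[4096];                     /* arbitrary message size */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2 && rank == 0) {
            memset(buf, 0xa5, sizeof(buf));
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("ping-pong of %zu bytes completed\n", sizeof(buf));
        } else if (size >= 2 && rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }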

    Advanced Message Routing for Scalable Distributed Simulations

    The Joint Forces Command (JFCOM) Experimentation Directorate (J9)'s recent Joint Urban Operations (JUO) experiments have demonstrated the viability of Forces Modeling and Simulation in a distributed environment. The JSAF application suite, combined with the RTI-s communications system, provides the ability to run distributed simulations with sites located across the United States, from Norfolk, Virginia to Maui, Hawaii. Interest-aware routers are essential for communications in these large, distributed environments, and the current RTI-s framework provides such routers connected in a straightforward tree topology. This approach is successful for small- to medium-sized simulations, but faces a number of significant limitations for very large simulations over high-latency, wide-area networks. In particular, traffic is forced through a single site, drastically increasing the distances messages must travel to sites not near the top of the tree. Aggregate bandwidth is limited to the bandwidth of the site hosting the top router, and failures in the upper levels of the router tree can result in widespread communications losses throughout the system. To resolve these issues, this work extends the RTI-s software router infrastructure to accommodate more sophisticated, general router topologies, including both the existing tree framework and a new generalization of the fully connected mesh topologies used in the SF Express ModSAF simulations of 100K fully interacting vehicles. The new software router objects incorporate the scalable features of the SF Express design, while optionally using low-level RTI-s objects to perform the actual site-to-site communications. The (substantial) limitations of the original mesh router formalism have been eliminated, allowing fully dynamic operations. The mesh topology capabilities allow aggregate bandwidth and site-to-site latencies to match actual network performance, and the heavy resource load at the root node can now be distributed across routers at the participating sites.
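
    The latency argument can be made concrete with a back-of-the-envelope comparison (a sketch for illustration, not RTI-s code): in a tree overlay of n site routers, traffic between two leaves may cross on the order of 2 log2(n) router hops via the root, while a fully connected mesh needs a single site-to-site hop.

    /* Illustrative only: worst-case router hop counts for a binary-tree
       overlay versus a fully connected mesh of n sites. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        for (int n = 4; n <= 64; n *= 2) {
            int tree_hops = 2 * (int)ceil(log2(n)); /* up to the root, back down */
            int mesh_hops = 1;                      /* direct site-to-site link */
            printf("%2d sites: tree worst case ~%d hops, mesh %d hop\n",
                   n, tree_hops, mesh_hops);
        }
        return 0;
    }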

    High-Performance Message Passing over generic Ethernet Hardware with Open-MX

    In the last decade, cluster computing has become the most popular high-performance computing architecture. Although numerous technological innovations have been proposed to improve the interconnection of nodes, many clusters still rely on commodity Ethernet hardware to implement message passing within parallel applications. We present Open-MX, an open-source message passing stack over generic Ethernet. It offers the same abilities as the specialized Myrinet Express stack, without requiring dedicated support from the networking hardware. Open-MX works transparently with the most popular MPI implementations through its MX interface compatibility. It also enables interoperability between hosts running the specialized MX stack and generic Ethernet hosts. We detail how Open-MX copes with the inherent limitations of Ethernet hardware to satisfy the requirements of message passing by applying an innovative copy offload model. Combined with careful tuning of the fabric and of the MX wire protocol, Open-MX achieves better performance than TCP implementations, especially on 10 gigabit/s hardware.
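
    One way to see why the receive-side copy matters at 10 gigabit/s rates (a sketch; Open-MX's actual copy offload happens inside the kernel, and the buffer size here is arbitrary): if host memcpy bandwidth is only a few GB/s, an extra in-memory copy of every incoming message consumes a large fraction of the link rate. The micro-benchmark below measures that copy bandwidth:

    /* Measure plain memcpy bandwidth, the per-message cost that a copy
       offload engine would hide from the host CPU. Sizes are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define SZ (64 * 1024 * 1024)   /* 64 MiB, arbitrary */

    int main(void)
    {
        char *src = malloc(SZ), *dst = malloc(SZ);
        if (!src || !dst)
            return 1;
        memset(src, 1, SZ);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < 10; i++)
            memcpy(dst, src, SZ);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("memcpy bandwidth: %.2f GB/s\n", 10.0 * SZ / secs / 1e9);
        free(src);
        free(dst);
        return 0;
    }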

    NIC-assisted cache-efficient receive stack for message passing over Ethernet

    High-speed networking in clusters usually relies on advanced hardware features in the NICs, such as zero-copy capability. Open-MX is a high-performance message passing stack tailored for regular Ethernet hardware without such capabilities. We present the addition of multiqueue support to the Open-MX receive stack, so that all incoming packets for the same process are handled on the same core. We then introduce the idea of binding the target end process near its dedicated receive queue. This model leads to a more cache-efficient receive stack for Open-MX. It also proves that very simple and stateless hardware features can have a significant impact on message passing performance over Ethernet. An implementation of this model in firmware reveals that it may not match some manually tuned micro-benchmarks, but our multiqueue receive stack generally performs better than the original single-queue stack, especially for large communication patterns where multiple processes are involved and manual binding is difficult.
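
    On Linux, binding the consuming process to the core that services its receive queue can be expressed with sched_setaffinity. A minimal sketch; the queue-to-core mapping below is an assumed example, since real mappings depend on the NIC's interrupt affinity:

    /* Pin the calling process to the (assumed) core handling its RX queue. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        int queue_core = 2;           /* assumption: core 2 services our queue */
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(queue_core, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("bound to core %d, near its receive queue\n", queue_core);
        return 0;
    }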

    Configuring large high-performance clusters at lightspeed: A case study

    Over a decade ago, the TOP500 list was started as a way to rank supercomputers by their sustained performance on a particular linear algebra benchmark. Once reserved for exotic machines and extremely well-funded centers and laboratories, commodity clusters now make it possible for smaller groups to deploy and use high-performance machines in their own laboratories. This paper describes a weekend activity in which two existing 128-node commodity clusters were fused into a single 256-node cluster for the specific purpose of running the benchmark used to rank the machines in the TOP500 supercomputer list. The resulting metacluster sits on the November 2002 list at position 233. A key differentiator for this cluster is that its software was assembled from the NPACI Rocks open-source cluster toolkit as downloaded from the public website. The toolkit allows users who are not cluster experts to deploy and run supercomputer-class machines in a matter of hours instead of weeks or months. With the exception of recompiling the University of Tennessee's Automatically Tuned Linear Algebra Subroutines (ATLAS) library with a recommended version of the GNU C compiler, this metacluster ran a "stock" Rocks distribution. Successful first-time deployment of the fused cluster was completed in a scant 6 hours, and partitioning of the metacluster and restoration of the two 128-node clusters to their original configuration took just over 40 minutes. This paper describes early (pre-weekend) benchmark activities to empirically determine reasonably good parameters for the High-Performance Linpack (HPL) code on both Ethernet and Myrinet interconnects. It fully describes the physical layout of the machine and the description-based installation methods used in Rocks to re-deploy two independent clusters as a single cluster, and gives the benchmark results gathered over the 40-hour period allotted for the complete experiment. In addition, we describe some of the online monitoring and measurement techniques employed during the experiment. Finally, we point out the issues uncovered with a commodity cluster of this size. The techniques presented in this paper truly bring supercomputers into the hands of the masses of computational scientists.
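
    For context, a common rule of thumb for choosing the HPL problem size N (a sketch of the usual heuristic, not necessarily the procedure the authors followed) is to size the double-precision matrix to fill roughly 80% of aggregate memory:

    /* Rule-of-thumb HPL problem size: N ~ sqrt(0.8 * total_mem / 8),
       since the benchmark factors an N x N matrix of 8-byte doubles.
       The per-node memory figure is an assumption, not from the paper. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int nodes = 256;
        double mem_per_node = 1.0 * (1 << 30);        /* assumed 1 GiB/node */
        double usable = 0.80 * nodes * mem_per_node;  /* leave room for OS/MPI */
        long n = (long)sqrt(usable / 8.0);            /* 8 bytes per element */
        printf("suggested HPL N for %d nodes: ~%ld\n", nodes, n);
        return 0;
    }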

    Commodity Computing Clusters at Goddard Space Flight Center

    The purpose of commodity cluster computing is to use large numbers of readily available computing components for parallel computing, obtaining the greatest amount of useful computation for the least cost. The cost of a computational resource is as key to computational science and data processing at GSFC as it is at most other places, the difference being that the need at GSFC far exceeds any expectation of meeting it. Goddard scientists therefore need as much computing as the available funds can provide. This is exemplified in the following brief history of low-cost high-performance computing at GSFC.