Janus II: a new generation application-driven computer for spin-system simulations
This paper describes the architecture, the development and the implementation
of Janus II, a new generation application-driven number cruncher optimized for
Monte Carlo simulations of spin systems (mainly spin glasses). This domain of
computational physics is a recognized grand challenge of high-performance
computing: the resources necessary to study in detail theoretical models that
can make contact with experimental data are by far beyond those available using
commodity computer systems. On the other hand, several specific features of the
associated algorithms suggest that unconventional computer architectures, which
can be implemented with available electronics technologies, may lead to
order-of-magnitude increases in performance, reducing to humanly acceptable
values the time needed to carry out simulation campaigns that would take
centuries on commercially available machines. Janus II is one such machine,
recently developed and commissioned, that builds upon and improves on the
successful JANUS machine, which has been used for physics since 2008 and is
still in operation today. This paper describes in detail the motivations behind
the project, the computational requirements, the architecture and the
implementation of this new machine, and compares its expected performance with
those of currently available commercial systems.
Comment: 28 pages, 6 figures
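The Monte Carlo kernel such a machine accelerates is, at its heart, the Metropolis single-spin update of an Edwards-Anderson spin glass. Below is a minimal plain-Python sketch of that update rule, an illustration of the algorithm only, not the Janus implementation; all names and conventions are hypothetical:

```python
import math
import random

def metropolis_sweep(spins, Jh, Jv, beta, rng):
    """One Metropolis sweep of a 2D Edwards-Anderson spin glass with
    periodic boundaries. spins is an L x L grid of +/-1 values;
    Jh[i][j] couples site (i,j) to (i,j+1), Jv[i][j] couples (i,j)
    to (i+1,j)."""
    L = len(spins)
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            # local field from the four nearest neighbours
            h = (Jh[i][j] * spins[i][(j + 1) % L]
                 + Jh[i][(j - 1) % L] * spins[i][(j - 1) % L]
                 + Jv[i][j] * spins[(i + 1) % L][j]
                 + Jv[(i - 1) % L][j] * spins[(i - 1) % L][j])
            dE = 2.0 * s * h  # energy cost of flipping s
            # accept the flip if it lowers the energy, or with
            # probability exp(-beta * dE) otherwise
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                spins[i][j] = -s
    return spins
```

The inner loop is tiny and entirely integer/bit-level apart from the acceptance test, which is why custom hardware can run enormous numbers of such updates per clock cycle.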
JANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes JANUS, a modular massively parallel and reconfigurable
FPGA-based computing system. Each JANUS module has a computational core and a
host. The computational core is a 4x4 array of FPGA-based processing elements
with nearest-neighbor data links. Processors are also directly connected to an
I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for,
but not limited to, the requirements of a class of hard scientific applications
characterized by regular code structure, unconventional data-manipulation
instructions and moderately sized databases. We discuss the architecture of
this configurable machine, and focus on its use on Monte Carlo simulations of
statistical mechanics. On this class of applications JANUS achieves impressive
performance: in some cases one JANUS processing element outperforms high-end
PCs by a factor ~ 1000. We also discuss the role of JANUS on other classes of
scientific applications.
Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineering
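The "regular code structure" that makes these simulations FPGA-friendly is visible in the checkerboard decomposition of a nearest-neighbour lattice: every neighbour of a "black" site is "white" and vice versa, so an entire sublattice can be updated concurrently. A small illustrative Python sketch of this property (not JANUS code):

```python
def checkerboard_sites(L, parity):
    """Return the sites of one sublattice of an L x L periodic grid
    (parity 0 = 'black', 1 = 'white'). Because nearest neighbours
    always have opposite parity, all sites of one colour can be
    updated simultaneously without data races -- the locality that an
    array of processing elements with nearest-neighbour links exploits."""
    return [(i, j) for i in range(L) for j in range(L)
            if (i + j) % 2 == parity]
```

A parallel sweep then alternates: update all black sites at once, then all white sites.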
GRAPEVINE: Grids about anything by Poisson's equation in a visually interactive networking environment
A proven 3-D multiple-block elliptic grid generator, designed to run in 'batch mode' on a supercomputer, is improved by the creation of a modern graphical user interface (GUI) running on a workstation. The two parts are connected in real time by a network. The resultant system offers a significant speedup in the process of preparing and formatting input data and the ability to watch the grid solution converge by replotting the grid at each iteration step. The result is a reduction in user time and CPU time required to generate the grid and an enhanced understanding of the elliptic solution process. This software system, called GRAPEVINE, is described, and certain observations are made concerning the creation of such software.
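In its simplest form (no control functions), the elliptic smoothing at the core of such a grid generator reduces to Jacobi relaxation of Laplace's equation on the interior grid points, with each iterate available for replotting. A toy Python sketch of that principle, not GRAPEVINE itself:

```python
def laplace_smooth(x, iters=1):
    """Jacobi relaxation of one coordinate of a 2D grid: every
    interior node moves to the average of its four neighbours, the
    control-function-free (Laplace) special case of elliptic grid
    generation. Boundary rows and columns are held fixed. Each
    returned iterate is what a GUI could replot to watch convergence."""
    n, m = len(x), len(x[0])
    for _ in range(iters):
        new = [row[:] for row in x]  # keep boundaries, copy then update
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                new[i][j] = 0.25 * (x[i - 1][j] + x[i + 1][j]
                                    + x[i][j - 1] + x[i][j + 1])
        x = new
    return x
```

Repeated application drives the interior toward a smooth (harmonic) distribution consistent with the fixed boundary.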
Longitudinal Phase Space Tomography with Space Charge
Tomography is now a very broad topic with a wealth of algorithms for the reconstruction of both qualitative and quantitative images. In an extension in the domain of particle accelerators, one of the simplest algorithms has been modified to take into account the non-linearity of large-amplitude synchrotron motion. This permits the accurate reconstruction of longitudinal phase space density from one-dimensional bunch profile data. The method is a hybrid one which incorporates particle tracking. Hitherto, a very simple tracking algorithm has been employed because only a brief span of measured profile data is required to build a snapshot of phase space. This is one of the strengths of the method, as tracking for relatively few turns relaxes the precision to which input machine parameters need to be known. The recent addition of longitudinal space charge considerations as an optional refinement of the code is described. Simplicity suggested an approach based on the derivative of bunch shape, with the properties of the vacuum chamber parametrized by a single value of distributed reactive impedance and by a geometrical coupling coefficient. This is sufficient to model the dominant collective effects in machines of low to moderate energy. In contrast to simulation codes, binning is not an issue since the profiles to be differentiated are measured ones. The program is written in Fortran 90 with High-Performance Fortran (HPF) extensions for parallel processing. A major effort has been made to identify and remove execution bottlenecks, for example by reducing floating-point calculations and re-coding slow intrinsic functions. A pointer-like mechanism which avoids the problems associated with pointers and parallel processing has been implemented. This is required to handle the large, sparse matrices that the algorithm employs. Results obtained with and without the inclusion of space charge are presented and compared for proton beams in the CERN PS Booster.
Comparisons of execution times on different platforms are presented and the chosen solution for our application program, which uses a dual-processor PC for the number crunching, is described.
Tomographic Measurements of Longitudinal Phase Space Density
Tomography, the reconstruction of a two-dimensional image from a series of its one-dimensional projections, is now a very broad topic with a wealth of algorithms for the reconstruction of both qualitative and quantitative images. One of the simplest algorithms has been modified to take into account the non-linearity of large-amplitude synchrotron motion in a particle accelerator. This permits the accurate reconstruction of longitudinal phase space density from one-dimensional bunch profile data. The algorithm was developed in Mathematica in order to exploit its extensive built-in functions and graphics. Subsequently, it has been recoded in Fortran 90 with the aim of reducing the execution time by at least a factor of one hundred. The choice of Fortran 90 was governed by the desire ultimately to exploit parallel architectures, but sequential compilation and execution have already largely yielded the required gain in speed. The use of the method to produce longitudinal phase space plots, animated sequences of the evolution of phase space density, and to estimate accelerator parameters is presented. More generally, the new algorithm constitutes an extension of computerized tomography which caters for non-rigid bodies whose projections cannot be measured simultaneously.
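The unfiltered back-projection idea underlying the simplest reconstruction algorithms can be shown with a toy two-angle example, row and column projections only. A real reconstruction uses many projection angles and, in the method above, particle tracking between them; this sketch is illustrative only:

```python
def back_project(proj_rows, proj_cols):
    """Unfiltered back-projection from two orthogonal 1-D projections
    of an n x n image: each pixel receives the average of the row-sum
    and column-sum bins it lies in. Smearing each projection back
    across the image is the elementary step that fuller algorithms
    (more angles, iterative correction) refine."""
    n = len(proj_rows)
    return [[(proj_rows[i] + proj_cols[j]) / (2.0 * n)
             for j in range(n)]
            for i in range(n)]
```

Note that even this crude estimate conserves the total intensity of the image, since each projection individually sums to it.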
GPGPU Processing in CUDA Architecture
The future of computation lies with the Graphics Processing Unit (GPU). Given the promise graphics cards have shown in image processing and the accelerated rendering of 3D scenes, and the raw computational capability these GPUs possess, they are developing into powerful general-purpose parallel computing units. It is quite simple to program a graphics processor to perform general parallel tasks, and once its various architectural aspects are understood, it can be applied to other demanding tasks as well. In this paper, we show how CUDA can fully utilize the tremendous power of these GPUs. CUDA is NVIDIA's parallel computing architecture; it enables dramatic increases in computing performance by harnessing the power of the GPU. The paper discusses CUDA and its architecture, compares CUDA C/C++ with other parallel programming frameworks such as OpenCL and DirectCompute, addresses common myths about CUDA, and explains why the future seems promising for CUDA.
Comment: 16 pages, 5 figures, Advanced Computing: an International Journal (ACIJ) 201
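CUDA's execution model assigns each data element to one thread, identified by its block and thread indices within a launch grid. The following pure-Python model of a SAXPY kernel launch makes that mapping explicit; it is a sequential stand-in for what the GPU runs concurrently, not CUDA code itself:

```python
def launch_saxpy(a, x, y, threads_per_block=4):
    """Model of a CUDA-style SAXPY launch: every (block, thread)
    pair computes one global index i = blockIdx * blockDim + threadIdx
    and, if i is in range, writes a*x[i] + y[i]. On a GPU the two
    inner loops execute concurrently across many cores."""
    n = len(x)
    out = [0.0] * n
    # ceil(n / threads_per_block): enough blocks to cover the array
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds guard, exactly as in a real kernel body
                out[i] = a * x[i] + y[i]
    return out
```

The bounds guard is needed because the grid is sized in whole blocks, so the last block may contain threads with no element to process.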
Elliptic Curve Cryptography on Modern Processor Architectures
Abstract
Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite B as part of its Cryptographic Modernisation Program. Additionally, it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity-based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philips' next-generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, through modular operations on a finite field, point addition and doubling, and elliptic curve scalar multiplication, up to application-level protocols. In this thesis we examine an implementation of these components for ECC, focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through the adoption of EC
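The middle layers of that pyramid, point addition/doubling and scalar multiplication, can be sketched over a small textbook curve. This is a plain-Python illustration of standard double-and-add, not the optimised Cell SPU or MIPS smart card implementations the thesis studies:

```python
def ec_add(P, Q, a, p):
    """Affine point addition on y^2 = x^3 + a*x + b over GF(p).
    None represents the point at infinity (the group identity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P, a, p):
    """Left-to-right double-and-add: the scalar-multiplication layer
    of the pyramid, built directly on point addition and doubling."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)        # double for every bit
        if bit == '1':
            R = ec_add(R, P, a, p)    # add when the bit is set
    return R
```

Each level here maps onto the optimisation targets the thesis describes: the modular inversions and multiplications sit on the core's instruction set, while the double-and-add loop determines how many of them a scalar multiplication costs.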