Critical Behavior of the Three-Dimensional Ising Spin Glass
We have simulated, using parallel tempering, the three-dimensional Ising spin
glass model with binary couplings in a helicoidal geometry. The largest lattice
(L=20) has been studied using a dedicated computer (the SUE machine). By
measuring the correlation length in the critical region, we have obtained strong
evidence for a second-order finite-temperature phase transition, ruling out
other possible scenarios such as a Kosterlitz-Thouless phase transition. Precise
values for the ν and η critical exponents are also presented.
Comment: RevTex; 12 pages plus 5 ps figures. Final version to be published in
PR
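As a rough illustration of the parallel tempering method named in the abstract above, the replica-exchange step can be sketched as follows. This is the textbook Metropolis swap rule, not the authors' SUE implementation; the function names and the bookkeeping (swapping energies as a stand-in for swapping whole spin configurations) are hypothetical simplifications.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Textbook acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Cap the exponent at 0 so the probability never exceeds 1.
    return math.exp(min(0.0, delta))


def attempt_swaps(betas, energies, rng=random):
    """Try to exchange each pair of neighbouring temperatures once.

    energies[k] is the energy of the configuration currently held at
    inverse temperature betas[k]; the list is updated in place.
    A real simulation would exchange the spin configurations as well.
    Returns the number of accepted swaps.
    """
    accepted = 0
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted
```

A swap that moves the lower-energy configuration to the colder temperature is always accepted; the reverse move is accepted with an exponentially small probability.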
apeNEXT: A multi-TFlops Computer for Simulations in Lattice Gauge Theory
We present the APE (Array Processor Experiment) project for the development
of dedicated parallel computers for numerical simulations in lattice gauge
theories. While APEmille is a production machine in today's physics simulations
at various sites in Europe, a new machine, apeNEXT, is currently being
developed to provide multi-Tflops computing performance. Like previous APE
machines, the new supercomputer is largely custom designed and specifically
optimized for simulations of Lattice QCD.
Comment: Poster at the XXIII Physics in Collisions Conference (PIC03),
Zeuthen, Germany, June 2003, 3 pages, Latex. PSN FRAP15. Replaced for adding
forgotten autho
Measuring NUMA effects with the STREAM benchmark
Modern high-end machines feature multiple processor packages, each of which
contains multiple independent cores and integrated memory controllers connected
directly to dedicated physical RAM. These packages are connected via a shared
bus, creating a system with a heterogeneous memory hierarchy. Since this shared
bus has less bandwidth than the sum of the links to memory, aggregate memory
bandwidth is higher when parallel threads all access memory local to their
processor package than when they access memory attached to a remote package.
However, the impact of this heterogeneous memory architecture is not easily
understood from vendor benchmarks. Even where these measurements are available,
they provide only best-case memory throughput. This work presents a series of
modifications to the well-known STREAM benchmark to measure the effects of NUMA
on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.
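For context, STREAM reports sustained memory bandwidth for a handful of simple vector kernels, of which triad is the most commonly quoted. The sketch below shows only how a triad bandwidth figure is derived. It is written in pure Python for readability, so the absolute numbers it produces are meaningless, and it contains none of the NUMA-aware modifications the paper describes. The function names are hypothetical.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_size=8):
    """Bytes counted per triad pass: two reads and one write per element."""
    return 3 * n * word_size


def measure_triad(n=100_000, scalar=3.0):
    """Return (bytes_moved, seconds, MB/s) for one triad pass."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    moved = triad_bytes(n)
    # STREAM reports megabytes as 10**6 bytes.
    return moved, elapsed, moved / elapsed / 1e6
```

Measuring NUMA effects would additionally require controlling where the arrays are allocated and which cores the threads run on, which this sketch does not attempt.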
C-NNAP - A parallel processing architecture for binary neural networks
This paper describes the C-NNAP machine, a MIMD implementation of an array of ADAM binary neural networks, primarily designed for image processing. C-NNAP comprises an array of VME cards, each containing a DSP, a SCSI controller, and a new design of the SAT peripheral processor. The SAT processor is a dedicated hardware implementation that performs binary neural network computations. It yields a potential speed-up of between 108 and 182 times over the current DSP with its dedicated coprocessor. Together with the SAT, C-NNAP provides a fast, parallel environment for performing binary neural network operations.
Ianus: an Adaptive FPGA Computer
Dedicated machines designed for specific computational algorithms can
outperform conventional computers by several orders of magnitude. In this note
we describe Ianus, a new generation FPGA based machine and its basic
features: hardware integration and wide reprogrammability. Our goal is to build
a machine that can fully exploit the performance potential of new generation
FPGA devices. We also plan a software platform which simplifies its
programming, in order to extend its intended range of application to a wide
class of interesting and computationally demanding problems. The decision to
develop a dedicated processor is a complex one, involving careful assessment of
its performance lead, during its expected lifetime, over traditional computers,
taking into account their performance increase, as predicted by Moore's law. We
discuss this point in detail.
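The trade-off described here can be made concrete with a back-of-the-envelope estimate. Assuming, purely for illustration, that commodity performance doubles every fixed period and that the dedicated machine's performance stays constant after launch, an initial speed-up S is eroded after log2(S) doubling periods. The 18-month doubling period and the function name below are illustrative assumptions, not figures from the note.

```python
import math


def lead_lifetime_months(initial_speedup, doubling_months=18.0):
    """Months until commodity hardware catches up with a dedicated machine.

    Assumes commodity performance doubles every `doubling_months` and the
    dedicated machine's performance is fixed at launch.
    """
    if initial_speedup <= 1.0:
        return 0.0  # no lead to begin with
    return doubling_months * math.log2(initial_speedup)
```

Under these assumptions a 1000-fold lead lasts roughly 15 years, while a 10-fold lead is gone in about 5, which is why the size of the initial advantage matters so much to the decision.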
A low-cost parallel implementation of direct numerical simulation of wall turbulence
A numerical method for the direct numerical simulation of incompressible wall
turbulence in rectangular and cylindrical geometries is presented. The
distinctive feature resides in its design being targeted towards efficient
distributed-memory parallel computing on commodity hardware. The adopted
discretization is spectral in the two homogeneous directions; fourth-order
accurate, compact finite-difference schemes over a variable-spacing mesh in the
wall-normal direction are key to our parallel implementation. The parallel
algorithm is designed in such a way as to minimize data exchange among the
computing machines, and in particular to avoid taking a global transpose of the
data during the pseudo-spectral evaluation of the non-linear terms. The
computing machines can then be connected to each other through low-cost network
devices. The code is optimized for memory requirements, which can moreover be
subdivided among the computing nodes. The layout of a simple, dedicated and
optimized computing system based on commodity hardware is described. The
performance of the numerical method on this computing system is evaluated and
compared with that of other codes described in the literature, as well as with
that of the same code implementing a commonly employed strategy for the
pseudo-spectral calculation.
Comment: To be published in J. Comp. Physic
Janus II: a new generation application-driven computer for spin-system simulations
This paper describes the architecture, the development and the implementation
of Janus II, a new generation application-driven number cruncher optimized for
Monte Carlo simulations of spin systems (mainly spin glasses). This domain of
computational physics is a recognized grand challenge of high-performance
computing: the resources necessary to study in detail theoretical models that
can make contact with experimental data are by far beyond those available using
commodity computer systems. On the other hand, several specific features of the
associated algorithms suggest that unconventional computer architectures, which
can be implemented with available electronics technologies, may lead to
order-of-magnitude increases in performance, reducing to acceptable values on human
scales the time needed to carry out simulation campaigns that would take
centuries on commercially available machines. Janus II is one such machine,
recently developed and commissioned, that builds upon and improves on the
successful JANUS machine, which has been used for physics since 2008 and is
still in operation today. This paper describes in detail the motivations behind
the project, the computational requirements, the architecture and the
implementation of this new machine and compares its expected performance with
that of currently available commercial systems.
Comment: 28 pages, 6 figure
