High volume colour image processing with massively parallel embedded processors
Océ currently uses FPGA technology to implement colour image processing in its high-volume colour printers. Although FPGA technology provides enough performance, its development process is rather tedious. This paper describes research on an alternative implementation technology: software-defined massively parallel processing. It is shown that this technology not only reduces development time but also adds flexibility to the design.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
The present panorama of HPC architectures is extremely heterogeneous, ranging
from traditional multi-core CPU processors, supporting a wide class of
applications but delivering moderate computing performance, to many-core GPUs,
exploiting aggressive data-parallelism and delivering higher performance for
streaming computing applications. In this scenario, code portability (and
performance portability) becomes necessary for easy maintainability of
applications; this is very relevant in scientific computing where code changes
are very frequent, making it tedious and prone to error to keep different code
versions aligned. In this work we present the design and optimization of a
state-of-the-art production-level LQCD Monte Carlo application, using the
directive-based OpenACC programming model. OpenACC abstracts parallel
programming to a descriptive level, relieving programmers from specifying how
codes should be mapped onto the target architecture. We describe the
implementation of a code fully written in OpenACC, and show that we are able to
target several different architectures, including state-of-the-art traditional
CPUs and GPUs, with the same code. We also measure performance, evaluating the
computing efficiency of our OpenACC code on several architectures, comparing
with GPU-specific implementations and showing that a good level of
performance-portability can be reached.
Comment: 26 pages, 2 png figures, preprint of an article submitted for
consideration in International Journal of Modern Physics
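As a small illustration of the directive-based, descriptive style the abstract refers to, the sketch below annotates a plain C loop with an OpenACC directive. It is not taken from the paper's LQCD code: the function name `axpy`, its arguments, and the chosen data clauses are assumptions made purely for illustration. The directive only states that the loop iterations are independent and which arrays are read or updated; how the loop is mapped onto a CPU or GPU is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): y[i] = a * x[i] + y[i].
 * The OpenACC directive describes the loop as parallel and declares the
 * data movement; it does not say how to map the work onto the hardware. */
void axpy(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the pragma and builds the same source as ordinary serial C, which is one reason a single code base can target several architectures.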
Janus II: a new generation application-driven computer for spin-system simulations
This paper describes the architecture, the development and the implementation
of Janus II, a new generation application-driven number cruncher optimized for
Monte Carlo simulations of spin systems (mainly spin glasses). This domain of
computational physics is a recognized grand challenge of high-performance
computing: the resources necessary to study in detail theoretical models that
can make contact with experimental data are by far beyond those available using
commodity computer systems. On the other hand, several specific features of the
associated algorithms suggest that unconventional computer architectures, which
can be implemented with available electronics technologies, may lead to order
of magnitude increases in performance, reducing to acceptable values on human
scales the time needed to carry out simulation campaigns that would take
centuries on commercially available machines. Janus II is one such machine,
recently developed and commissioned, that builds upon and improves on the
successful JANUS machine, which has been used for physics since 2008 and is
still in operation today. This paper describes in detail the motivations behind
the project, the computational requirements, the architecture and the
implementation of this new machine and compares its expected performances with
those of currently available commercial systems.
Comment: 28 pages, 6 figures
JANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes JANUS, a modular massively parallel and reconfigurable
FPGA-based computing system. Each JANUS module has a computational core and a
host. The computational core is a 4x4 array of FPGA-based processing elements
with nearest-neighbor data links. Processors are also directly connected to an
I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for,
but not limited to, the requirements of a class of hard scientific applications
characterized by regular code structure, unconventional data manipulation
instructions and not too large data-base size. We discuss the architecture of
this configurable machine, and focus on its use on Monte Carlo simulations of
statistical mechanics. On this class of application JANUS achieves impressive
performance: in some cases one JANUS processing element outperforms high-end
PCs by a factor of ~1000. We also discuss the role of JANUS on other classes of
scientific applications.
Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted
to Computing in Science & Engineering