Towards a Holistic CAD Platform for Nanotechnologies
Silicon-based CMOS technologies are predicted to reach their ultimate limits
by the middle of the next decade. Research on nanotechnologies is being actively
conducted in a worldwide effort to develop new technologies able to sustain
Moore's law. They promise to revolutionize computing systems by
integrating tremendous numbers of devices at low cost. These trends will have a
profound impact on the architectures of computing systems and will require a
new CAD paradigm. This paper presents work in progress in this direction,
aimed at meeting the requirements and constraints of nanotechnologies in order
to make efficient use of the huge computing power they promise. To achieve this
goal, we are developing CAD tools able to exploit these computing capabilities
efficiently for the simulation of complex systems composed of huge numbers of
relatively simple elements. Comment: Submitted on behalf of TIMA Editions
(http://irevues.inist.fr/tima-editions)
Magnetic Cellular Nonlinear Network with Spin Wave Bus for Image Processing
We describe and analyze a cellular nonlinear network based on magnetic
nanostructures for image processing. The network consists of magneto-electric
cells integrated onto a common ferromagnetic film, the spin wave bus. Each
magneto-electric cell is an artificial two-phase multiferroic structure
comprising piezoelectric and ferromagnetic materials. A bit of information is
assigned to the cell's magnetic polarization, which can be controlled by an
applied voltage. The cells exchange information via spin waves
propagating in the spin wave bus. Each cell changes its state under the combined
effect of two factors: the magneto-electric coupling and the interaction with the spin
waves. The distinctive feature of a network with a spin wave bus is the ability to
control inter-cell communication through an external global parameter, the magnetic
field. This makes it possible to realize different image processing
functions on the same template without rewiring or reconfiguration. We present
the results of numerical simulations illustrating image filtering, erosion,
dilation, horizontal and vertical line detection, inversion and edge detection,
all accomplished on one template by the proper choice of the strength and direction
of the external magnetic field. We also present numerical estimates of the major
network parameters, such as cell density, power dissipation and functional
throughput, and compare them with the parameters projected for other
nano-architectures such as CMOL-CrossNet, Quantum Dot Cellular Automata, and
the Quantum Dot Image Processor. Potentially, the utilization of spin wave
phenomena at the nanometer scale may provide a route to low-power
functional logic circuits for special-task data processing.
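The idea of running different image operations on one fixed template, selected only by a global parameter, can be illustrated with a toy binary cellular network. This is a minimal sketch under loud assumptions: a plain thresholded neighbour sum with a global `bias` stands in for the external magnetic field, and nothing here models spin waves, magneto-electric coupling, or the paper's actual cell dynamics. The names `network_step`, `dilate`, `erode` and the bias values are hypothetical.

```python
from typing import List

Grid = List[List[int]]

# The cell itself plus its 4-connected neighbours: the fixed "template".
NEIGHBOURHOOD = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def network_step(image: Grid, bias: float) -> Grid:
    """One synchronous update of a toy binary cellular network.

    Every cell sums its own state and its four neighbours' states (cells
    outside the image count as 0) and becomes 1 if sum + bias > 0.
    The template is identical for every call; only the global `bias`
    (a hypothetical stand-in for the external field) selects the operation.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr, dc in NEIGHBOURHOOD:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += image[rr][cc]
            out[r][c] = 1 if total + bias > 0 else 0
    return out


def dilate(image: Grid) -> Grid:
    # Output 1 if at least one of the five cells is set.
    return network_step(image, bias=-0.5)


def erode(image: Grid) -> Grid:
    # Output 1 only if all five cells are set.
    return network_step(image, bias=-4.5)
```

For example, dilating a 5×5 image containing a single set pixel yields a five-pixel cross, and eroding that cross restores the single pixel; the same `network_step` performs both operations.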
On the Way to Future's High Energy Particle Physics Transport Code
High Energy Physics (HEP) needs a huge amount of computing resources. In
addition, data acquisition, transfer, and analysis require a well-developed
infrastructure. To probe new physics, the luminosity of the accelerator
facilities must be increased, which produces
more and more data in the experimental detectors. Both testing new theories and
detector R&D are based on complex simulations. We have already reached the
point where Monte Carlo detector simulation takes much more time than real data
collection. This is why speeding up calculations and simulations has become
important in the HEP community. The Geant Vector Prototype (GeantV) project
aims to optimize the most widely used particle transport code by applying parallel
computing and exploiting the capabilities of modern CPU and GPU
architectures. By maximizing concurrency at multiple levels,
GeantV is intended to be the successor of the Geant4 particle transport code,
which has been used successfully for two decades. Here we present our latest
results from GeantV performance tests, comparing the CPU/GPU-based vectorized
GeantV geometry code to the Geant4 version.
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator; instead, the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the performance obtained. We present some
preliminary results regarding OpenCL and multi-GPU extensions of our code and
discuss future perspectives. Comment: 22 pages, 14 eps figures, final version to be published in Computer
Physics Communications
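In HMC-type algorithms such as RHMC, the Molecular Dynamics trajectory must be integrated with a time-reversible, volume-preserving scheme so that the final Metropolis test keeps detailed balance; leapfrog is the simplest such scheme. The Python sketch below shows one HMC update for a hypothetical toy Gaussian action. It is not the authors' code: it omits the staggered-fermion determinant and the rational approximation that gives RHMC its "R", runs on the CPU, and all names (`leapfrog`, `hmc_update`, `action`) are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def leapfrog(q: Vec, p: Vec, grad_s: Callable[[Vec], Vec],
             eps: float, n_steps: int) -> Tuple[Vec, Vec]:
    """Integrate Hamilton's equations for H = p.p/2 + S(q) with leapfrog."""
    q, p = list(q), list(p)
    g = grad_s(q)
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]   # opening half step in momentum
    for step in range(n_steps):
        q = [qi + eps * pi for qi, pi in zip(q, p)]       # full step in position
        g = grad_s(q)
        h = eps if step < n_steps - 1 else 0.5 * eps       # closing momentum step is a half step
        p = [pi - h * gi for pi, gi in zip(p, g)]
    return q, p


# Hypothetical toy action S(q) = q.q/2: a free Gaussian field, no fermions.
def action(q: Vec) -> float:
    return 0.5 * sum(x * x for x in q)


def grad_action(q: Vec) -> Vec:
    return list(q)


def hamiltonian(q: Vec, p: Vec) -> float:
    return 0.5 * sum(x * x for x in p) + action(q)


def hmc_update(q: Vec, eps: float, n_steps: int,
               rng: random.Random) -> Tuple[Vec, bool]:
    """One HMC update: refresh momenta, run one MD trajectory, Metropolis test."""
    p = [rng.gauss(0.0, 1.0) for _ in q]
    q_new, p_new = leapfrog(q, p, grad_action, eps, n_steps)
    delta_h = hamiltonian(q_new, p_new) - hamiltonian(q, p)
    if delta_h <= 0.0 or rng.random() < math.exp(-delta_h):
        return q_new, True
    return q, False
```

Running `leapfrog` forward, negating the momenta, and running it again returns to the starting configuration up to rounding error; this reversibility is the property the accept/reject step relies on.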
Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows users to
develop platform-independent code that runs on a variety of parallel
computer architectures, ranging from laptop CPUs to multi-GPU distributed-memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses,
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour carries over to the numerical representation, so pointwise
convergence becomes impossible, even in principle, for long simulation times.
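The long-accumulator dot product can be sketched in Python, whose arbitrary-precision integers play the role of the wide fixed-point register. This is an illustrative assumption, not Feltor's implementation (a production long accumulator is typically a fixed-width array of machine words updated in parallel), and `exact_dot` and `reference_dot` are hypothetical names. The key fact is that every finite double is an integer multiple of 2^-1074, so every product of two doubles is an integer multiple of 2^-2148 and can be added to the accumulator without any rounding.

```python
from fractions import Fraction
from typing import Sequence

# Every finite double is an integer multiple of 2**-1074, so the product of
# two doubles is an integer multiple of 2**-2148. A fixed-point register with
# this many fractional bits holds every product, and every sum of products,
# exactly.
FRAC_BITS = 2 * 1074


def exact_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exactly rounded dot product using a (Python big-int) long accumulator.

    The accumulation is exact, so the result does not depend on the order of
    the terms; the only rounding happens once, at the very end.
    Non-finite inputs raise an exception from float.as_integer_ratio.
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0  # the long accumulator, scaled by 2**FRAC_BITS
    for x, y in zip(xs, ys):
        nx, dx = x.as_integer_ratio()  # exact: x == nx / dx, dx a power of two
        ny, dy = y.as_integer_ratio()
        # dx == 2**kx and dy == 2**ky; move the exact product onto the grid
        shift = FRAC_BITS - (dx.bit_length() - 1) - (dy.bit_length() - 1)
        acc += (nx * ny) << shift
    # Python's int / int true division is correctly rounded: one final rounding.
    return acc / (1 << FRAC_BITS)


def reference_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slow reference: exact rational arithmetic, rounded once at the end."""
    return float(sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0)))
```

For instance, with `xs = [1e16, 1.0, -1e16]` and `ys = [1.0, 1.0, 1.0]`, naive floating-point summation returns `0.0` because the `1.0` is absorbed by `1e16`, while `exact_dot` returns `1.0`. Since no intermediate rounding occurs, any summation order or thread partition gives a bitwise-identical result.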
In the second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
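The abstract does not state the model's formula. The simplest form consistent with it, assumed here, charges each operation a fixed latency T_lat plus size/B to stream the data at memory bandwidth B. If efficiency is taken as the streaming share of the total time, it is at least 50% exactly when size/B ≥ T_lat, that is when size ≥ T_lat·B, which matches the latency-bandwidth product stated above. The function names and the hardware numbers below are hypothetical.

```python
def predicted_time(size_bytes: float, latency_s: float,
                   bandwidth_bytes_per_s: float) -> float:
    """Assumed model: a fixed latency plus the time to stream the data."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def efficiency(size_bytes: float, latency_s: float,
               bandwidth_bytes_per_s: float) -> float:
    """Share of the predicted time spent moving data rather than waiting on latency."""
    streaming = size_bytes / bandwidth_bytes_per_s
    return streaming / predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def min_size_for_half_efficiency(latency_s: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """efficiency >= 1/2  <=>  size / B >= latency  <=>  size >= latency * B."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical compute node: 10 microseconds latency, 100 GB/s memory bandwidth.
LATENCY_S = 10e-6
BANDWIDTH = 100e9
```

For this hypothetical node, `min_size_for_half_efficiency(LATENCY_S, BANDWIDTH)` gives about 1e6 bytes (1 MB) per node. Splitting a fixed problem across more nodes (strong scaling) shrinks the per-node array size, and once it falls below this bound, latency dominates the run time.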
- …