
    Towards a Holistic CAD Platform for Nanotechnologies

    Silicon-based CMOS technologies are predicted to reach their ultimate limits by the middle of the next decade. Research on nanotechnologies is being actively conducted in a world-wide effort to develop new technologies able to sustain Moore's law. These technologies promise to revolutionize computing systems by integrating tremendous numbers of devices at low cost. Such trends will have a profound impact on the architectures of computing systems and will require a new CAD paradigm. This paper presents work in progress in this direction, aimed at meeting the requirements and constraints of nanotechnologies so as to make efficient use of the huge computing power they promise. To achieve this goal, we are developing CAD tools able to exploit these computing capabilities efficiently in the simulation of complex systems composed of huge numbers of relatively simple elements.
    Comment: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)

    Magnetic Cellular Nonlinear Network with Spin Wave Bus for Image Processing

    We describe and analyze a cellular nonlinear network based on magnetic nanostructures for image processing. The network consists of magneto-electric cells integrated onto a common ferromagnetic film - the spin wave bus. The magneto-electric cell is an artificial two-phase multiferroic structure comprising piezoelectric and ferromagnetic materials. A bit of information is assigned to the cell's magnetic polarization, which can be controlled by the applied voltage. Information is exchanged among the cells via spin waves propagating in the spin wave bus. Each cell changes its state as the combined effect of two factors: the magneto-electric coupling and the interaction with the spin waves. The distinct feature of a network with a spin wave bus is the ability to control the inter-cell communication by an external global parameter - the magnetic field. This makes it possible to realize different image processing functions on the same template without rewiring or reconfiguration. We present the results of numerical simulations illustrating image filtering, erosion, dilation, horizontal and vertical line detection, inversion, and edge detection, all accomplished on one template by the proper choice of the strength and direction of the external magnetic field. We also present numerical estimates of the major network parameters such as cell density, power dissipation, and functional throughput, and compare them with the parameters projected for other nano-architectures such as CMOL-CrossNet, Quantum Dot Cellular Automata, and the Quantum Dot Image Processor. Potentially, the utilization of spin wave phenomena at the nanometer scale may provide a route to low-power, functional logic circuits for special-task data processing.
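    To make the template-switching idea concrete, the following is a minimal Python sketch of a cellular array whose processing function is selected by a single global field parameter rather than by rewiring. It is only an illustration under assumed conventions (binary +/-1 cell states, nearest-neighbour coupling, a sign-threshold update); it is not the authors' magneto-electric model.

```python
import numpy as np

def cnn_step(state, h_ext, coupling=0.25):
    """One synchronous update of a binary cellular array.

    state    : 2D array of +/-1 cell polarizations (assumed encoding)
    h_ext    : global external field; its sign and magnitude select which
               function the fixed template performs
    coupling : strength of the nearest-neighbour (spin-wave-like) interaction
    """
    # Sum of the four nearest neighbours acts as the inter-cell signal.
    nbr = (np.roll(state, 1, 0) + np.roll(state, -1, 0) +
           np.roll(state, 1, 1) + np.roll(state, -1, 1))
    # Each cell relaxes to the sign of its combined local field.
    return np.sign(coupling * nbr + h_ext)

# A positive field biases cells toward +1 (dilation-like growth of bright
# regions), a negative field toward -1 (erosion-like shrinking), while the
# template itself stays unchanged.
img = np.where(np.random.rand(8, 8) > 0.5, 1, -1)
dilated_like = cnn_step(img, h_ext=+0.3)
eroded_like = cnn_step(img, h_ext=-0.3)
```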

    On the Way to Future's High Energy Particle Physics Transport Code

    High Energy Physics (HEP) needs a huge amount of computing resources. In addition, data acquisition, transfer, and analysis require a well-developed infrastructure. Probing new physics requires increasing the luminosity of the accelerator facilities, which produce more and more data in the experimental detectors. Both testing new theories and detector R&D are based on complex simulations. We have already reached the point where Monte Carlo detector simulation takes much more time than real data collection. This is why speeding up the calculations and simulations has become important in the HEP community. The Geant Vector Prototype (GeantV) project aims to optimize the most widely used particle transport code by applying parallel computing and by exploiting the capabilities of modern CPU and GPU architectures. With concurrency maximized at multiple levels, GeantV is intended to be the successor of the Geant4 particle transport code, which has been used successfully for two decades. Here we present our latest results on GeantV test performance, comparing the CPU/GPU-based vectorized GeantV geometry code to the Geant4 version.
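    As a rough illustration of what vectorizing the transport means, here is a small Python sketch, assuming nothing about the actual GeantV or Geant4 APIs: the same straight-line geometry step is applied either one track at a time (scalar) or to a whole batch of tracks at once, which is the form that maps onto SIMD and GPU hardware.

```python
import numpy as np

# Hypothetical illustration only: a straight-line propagation step applied
# per particle (scalar) versus to a whole batch of tracks at once (vector).

def propagate_scalar(positions, directions, step):
    out = positions.copy()
    for i in range(len(positions)):              # one particle at a time
        out[i] = positions[i] + step * directions[i]
    return out

def propagate_vector(positions, directions, step):
    # the same geometry step for all tracks at once (SIMD/GPU friendly)
    return positions + step * directions

positions = np.random.rand(1024, 3)
directions = np.random.rand(1024, 3) - 0.5
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

assert np.allclose(propagate_scalar(positions, directions, 0.1),
                   propagate_vector(positions, directions, 0.1))
```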

    QCD simulations with staggered fermions on GPUs

    We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used merely as an accelerator; instead, the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the performance obtained. We present some preliminary results regarding OpenCL and multi-GPU extensions of our code and discuss future perspectives.
    Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communications
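    For readers unfamiliar with the structure being offloaded, the sketch below shows a generic leapfrog Molecular Dynamics trajectory, the inner loop of HMC/RHMC-type algorithms. It is a toy example (harmonic force, plain NumPy arrays standing in for device-resident gauge and momentum fields), not the authors' CUDA code; the point is that the entire trajectory loop, not just individual kernels, is what is kept on the GPU.

```python
import numpy as np

def leapfrog(q, p, force, eps, n_steps):
    """Generic leapfrog integrator for a Molecular Dynamics trajectory."""
    p = p + 0.5 * eps * force(q)          # initial half-step for the momenta
    for _ in range(n_steps - 1):
        q = q + eps * p                   # full step for the fields
        p = p + eps * force(q)            # full step for the momenta
    q = q + eps * p                       # last full step for the fields
    p = p + 0.5 * eps * force(q)          # final half-step for the momenta
    return q, p

# Toy example with a harmonic action, force(q) = -q.
q0 = np.random.randn(16)
p0 = np.random.randn(16)
q1, p1 = leapfrog(q0, p0, lambda q: -q, eps=0.05, n_steps=20)
```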

    Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed-memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behaviour. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behaviour translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. In a second part, we explore important performance tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node required to achieve a scaling efficiency above 50% (both strong and weak).
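    The latency-bandwidth reasoning can be written down in a few lines. The sketch below is an assumed, simplified form of such a model (the numbers are placeholders and the exact functional form is not necessarily Feltor's): a memory-bound routine is modelled as a fixed latency plus the time to stream its data, and the minimum array size for a given efficiency then follows directly from latency times bandwidth.

```python
def predicted_time(bytes_moved, latency_s, bandwidth_bps):
    """Simplified model: fixed latency plus streaming time for the data."""
    return latency_s + bytes_moved / bandwidth_bps

def min_size_for_efficiency(latency_s, bandwidth_bps, efficiency=0.5):
    # Efficiency e = streaming / total = (B/W) / (L + B/W); solving for the
    # data volume B gives B >= e / (1 - e) * L * W.  For e = 0.5 this is
    # exactly the product of latency and bandwidth.
    e = efficiency
    return e / (1.0 - e) * latency_s * bandwidth_bps

latency = 5e-6        # 5 microsecond launch/communication latency (assumed)
bandwidth = 500e9     # 500 GB/s memory bandwidth (assumed)

print(predicted_time(100e6, latency, bandwidth))    # ~2.05e-4 s for 100 MB
print(min_size_for_efficiency(latency, bandwidth))  # 2.5e6 bytes per node
```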