
    Plasma Physics Computations on Emerging Hardware Architectures

    This thesis explores the potential of emerging hardware architectures to increase the impact of high performance computing in fusion plasma physics research. For next generation tokamaks like ITER, realistic simulations and data-processing tasks will become significantly more demanding of computational resources than they are for current facilities. It is therefore essential to investigate how emerging hardware such as the graphics processing unit (GPU) and field-programmable gate array (FPGA) can provide the required computing power for large data-processing tasks and large-scale simulations in plasma-physics-specific computations. The use of emerging technology is investigated in three areas relevant to nuclear fusion: (i) a GPU is used to process the large amount of raw data produced by the synthetic aperture microwave imaging (SAMI) plasma diagnostic, (ii) a GPU is used to accelerate the solution of the Bateman equations, which model the evolution of nuclide number densities under neutron irradiation in tokamaks, and (iii) an FPGA-based dataflow engine is applied to compute massive matrix multiplications, a feature of many computational problems in fusion and more generally in scientific computing. The GPU data-processing code for SAMI provides a 60x acceleration over the previous IDL-based code, enabling inter-shot analysis in future campaigns and the data-mining (and therefore analysis) of stored raw data from previous MAST campaigns. The feasibility of porting the whole Bateman solver to a GPU system is demonstrated and verified against the industry-standard FISPACT code. Finally, a dataflow approach to matrix multiplication is shown to provide a substantial acceleration compared to CPU-based approaches and, whilst not performing as well as a GPU for this particular problem, is shown to be much more energy efficient. Emerging hardware technologies will no doubt continue to improve performance in many areas of fusion research, and several exciting new developments are on the horizon with tighter integration of GPUs and FPGAs with their host central processing units. This should not only improve performance and reduce data transfer bottlenecks, but also allow more user-friendly programming tools to be developed. All of this has implications for ITER and beyond, where emerging hardware technologies will provide the key to delivering the computing power required to handle the large amounts of data and more realistic simulations demanded by these complex systems.
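    For context, the Bateman equations referred to above couple the number density of each nuclide to radioactive decay and neutron-induced transmutation from every other nuclide. A general form (the notation below is the conventional one for inventory calculations of the kind performed by FISPACT, and is assumed here rather than quoted from the thesis) is

    \[
    \frac{\mathrm{d}N_i}{\mathrm{d}t} \;=\; \sum_{j \neq i}\bigl(\lambda_{j\to i} + \sigma_{j\to i}\,\phi\bigr)N_j \;-\; \bigl(\lambda_i + \sigma_i\,\phi\bigr)N_i,
    \]

    where N_i is the number density of nuclide i, the λ terms are decay constants, the σ terms are one-group reaction cross sections and φ is the neutron flux. Written as dN/dt = AN with a large, sparse transition matrix A, the problem reduces to repeated matrix-vector (or, for matrix-exponential methods, matrix-matrix) arithmetic, which is the kind of kernel that both GPUs and dataflow engines accelerate well.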

    Hardware implementation of non-bonded forces in molecular dynamics simulations

    Molecular dynamics (MD) is a computational method based on classical mechanics that describes the behavior of a molecular system. This method is used in biomolecular simulations, which are intended to contribute to the study and advancement of nanotechnology, medicine, chemistry and biology. Software implementations of MD simulations can spend most of their time computing the non-bonded interactions. This work presents the design and implementation of an FPGA-based coprocessor that accelerates MD simulations by computing the non-bonded interactions in parallel, specifically the van der Waals and electrostatic interactions. These interactions are modeled as the Lennard-Jones 6-12 potential and the direct-space Ewald summation, respectively. In addition, this work introduces a novel variable transformation of the potential energy functions and a novel interpolation method with pseudo-floating-point representation to compute the short-range forces. It also uses a combination of fixed-point and floating-point arithmetic to obtain the best of both representations. The FPGA coprocessor is a memory-mapped system connected to a host by PCI Express, and is provided with interrupt capabilities to improve parallelization. Its main block is based on a single functional pipeline, and is connected via the Avalon bus to other peripherals such as the PCIe Hard-IP and the SG-DMA. It is implemented on an Altera EP2AGX125EF35C4 device, can process 16k particles, and is configured to store up to 16 different particle types. Simulations in a custom C application for MD that computes only non-bonded forces run up to 12.5x faster with the FPGA coprocessor for a 12,500-atom system.
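    For reference, the two pairwise terms the coprocessor evaluates can be stated in a few lines of ordinary software. The sketch below is a plain NumPy restatement of the Lennard-Jones 6-12 potential and the direct-space (real-space) part of the Ewald summation under the minimum-image convention; it is not a model of the FPGA pipeline, and the single particle type, parameter names, and cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.special import erfc

def nonbonded_energy(pos, charges, sigma, epsilon, alpha, r_cut, box):
    """Pairwise LJ 6-12 + real-space Ewald energy, minimum-image convention.

    pos     : (N, 3) particle coordinates in a cubic box
    charges : (N,)   charges (units where Coulomb's constant is 1)
    sigma, epsilon : LJ parameters (single particle type, for simplicity)
    alpha   : Ewald splitting parameter
    r_cut   : short-range cutoff (at most box/2)
    box     : cubic box edge length
    """
    e_lj, e_coul = 0.0, 0.0
    n = len(pos)
    for i in range(n - 1):
        # Minimum-image displacement vectors from particle i to every j > i.
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)
        r = np.linalg.norm(d, axis=1)
        mask = r < r_cut
        r = r[mask]
        # Lennard-Jones 6-12: 4*eps*((sigma/r)^12 - (sigma/r)^6)
        sr6 = (sigma / r) ** 6
        e_lj += np.sum(4.0 * epsilon * (sr6 * sr6 - sr6))
        # Direct-space Ewald term: q_i * q_j * erfc(alpha * r) / r
        e_coul += np.sum(charges[i] * charges[i + 1:][mask] * erfc(alpha * r) / r)
    return e_lj, e_coul

# Example: 64 randomly placed particles with alternating unit charges.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(64, 3))
charges = np.tile([1.0, -1.0], 32)
print(nonbonded_energy(pos, charges, sigma=1.0, epsilon=1.0,
                       alpha=0.3, r_cut=5.0, box=10.0))
```

    A production MD engine, like the coprocessor described above, would also accumulate forces, use neighbor lists, and handle multiple particle types; the sketch only shows the functional forms being accelerated.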

    Bibliography of Lewis Research Center technical publications announced in 1984

    This compilation of abstracts describes and indexes the technical reporting that resulted from the scientific and engineering work performed and managed by the Lewis Research Center in 1984. All the publications were announced in the 1984 issues of STAR (Scientific and Technical Aerospace Reports) and/or IAA (International Aerospace Abstracts). Included are research reports, journal articles, conference presentations, patents and patent applications, and theses.

    2022 roadmap on neuromorphic computing and engineering

    Modern computation based on the von Neumann architecture is now a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale, with $10^{18}$ calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann-type architectures they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems, which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges for each research area. We hope that this roadmap will be a useful resource, providing a concise yet comprehensive introduction for readers outside this field and those just entering it, as well as future perspectives for those who are well established in the neuromorphic computing community.
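    As a concrete illustration of the neuromorphic primitives the roadmap surveys, the sketch below implements a leaky integrate-and-fire neuron, in which state is integrated locally and information is exchanged as spikes rather than shuttled between separate processing and memory blocks. The parameter values are illustrative assumptions and are not taken from the roadmap.

```python
import numpy as np

def lif_simulate(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_reset=0.0, v_threshold=1.0):
    """Leaky integrate-and-fire neuron: membrane state is kept and updated locally.

    input_current : 1-D array of injected drive per time step
    Returns the membrane trace and the indices of emitted spikes.
    """
    v = v_rest
    trace, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Leaky integration: dv/dt = (v_rest - v)/tau + i_in
        v += dt * ((v_rest - v) / tau + i_in)
        if v >= v_threshold:          # threshold crossing -> emit a spike
            spikes.append(t)
            v = v_reset               # reset the membrane potential
        trace.append(v)
    return np.array(trace), spikes

# Example: a constant drive above threshold produces a regular spike train.
trace, spikes = lif_simulate(np.full(1000, 60.0))
```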

    Synthèse et description de circuits numériques au niveau des transferts synchronisés par les données

    Beyond modern multi/many-core instruction processors, the world of high-performance computing is also characterized by the use of domain-specific circuits implemented on field-programmable gate arrays (FPGAs). Modern FPGAs have become interesting targets for high-performance computing for several reasons. On one hand, the considerable number of hard IP blocks they integrate (processors, memories, DSPs) has helped reduce the resource and performance gap with application-specific integrated circuits (ASICs); this gap is largely explained by the high level of reconfigurability these devices provide, a feature to which a considerable amount of resources (transistors) must be dedicated. Nevertheless, in a context where more transistors are often available than can be used, the cost of this configurability is correspondingly reduced. The ability to reconfigure modern FPGAs completely or partially further offers the flexibility required to support many different applications over time, much as instruction processors do. However, whereas instruction processors can be programmed with various high-level languages (Java, C#, C/C++, MPI, OpenMP, OpenCL), programming an FPGA requires the specification of a hardware design, which is a major obstacle to their widespread use. Hardware designs are generally described at the register-transfer level (RTL) using hardware description languages (HDLs) such as VHDL and Verilog, and for a given application, designing and verifying a dedicated circuit typically requires significantly more effort than a software implementation. Numerous commercial and academic tools now allow the high-level synthesis of hardware designs from software descriptions in languages such as C/C++/SystemC and, more recently, OpenCL. Nevertheless, depending on the application, these tools do not always achieve performance comparable to hand-written RTL designs. This work considers an intermediate-level synthesis methodology offering a compromise between the performance and design times achievable with RTL and high-level synthesis methodologies. The input hardware description language allows the description of algorithmic state machines (ASMs) that connect sources and sinks through predefined streaming interfaces. These interfaces are similar to the AXI4-Stream and Avalon-ST interfaces, featuring ready-to-send/ready-to-receive synchronization signals.
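    For readers unfamiliar with these streaming interfaces, the essential behaviour they share is a per-cycle handshake: a data word moves from source to sink only on cycles in which the source asserts valid (ready-to-send) and the sink asserts ready (ready-to-receive). The sketch below is a minimal cycle-level model of that rule in Python; it illustrates the handshake semantics only and is not the thesis's synthesis tool or generated hardware.

```python
from collections import deque
import random

def simulate_stream(words, fifo_depth=4, cycles=64, seed=0):
    """Cycle-by-cycle model of a valid/ready streaming handshake.

    A word is transferred only on cycles where the source has data
    (valid) and the sink FIFO has space to accept it (ready).
    """
    rng = random.Random(seed)
    pending = deque(words)      # words the source still has to send
    fifo = deque()              # sink-side buffer of limited depth
    received = []               # words drained by the downstream consumer
    for cycle in range(cycles):
        valid = bool(pending)                  # source asserts valid when it has data
        ready = len(fifo) < fifo_depth         # sink asserts ready when it has space
        if valid and ready:                    # handshake: transfer exactly one word
            fifo.append(pending.popleft())
        if fifo and rng.random() < 0.5:        # consumer drains the FIFO irregularly,
            received.append(fifo.popleft())    # creating back-pressure on the source
    return received

print(simulate_stream(list(range(10))))
```

    The back-pressure that arises when the FIFO is full simply stalls the source, which is what allows independently paced blocks to be composed safely.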

    College of Engineering

    Cornell University Courses of Study, Vol. 102, 2010/2011