1,230 research outputs found

    Efficient Molecular Dynamics Simulation on Reconfigurable Models with MultiGrid Method

    Get PDF
In biology, molecular dynamics (MD) simulations are widely used to investigate biological systems. An MD system is defined by the positions and momenta of its particles and their interactions. The dynamics of the system are evaluated as an N-body problem, and the simulation is run until the energy reaches equilibrium. Solving the dynamics numerically and evaluating the interactions is thus computationally expensive even for a small number of particles. We focus on long-range interactions, since their calculation time is O(N^2) for an N-particle system. In this dissertation, we propose two research directions for MD simulation. First, we design a new variation of the Multigrid (MG) algorithm, called Multi-level Charge Assignment (MCA), that requires O(N) time for accurate and efficient calculation of the electrostatic forces. We apply MCA and back interpolation based on the structure of molecules to enhance the accuracy of the simulation. Our second direction exploits reconfigurable models to achieve fast calculation times; we work with two such models. We design an FPGA-based MD simulator implementing the MCA method on a Xilinx Virtex-4; it performs about 10 to 100 times faster than a software implementation, depending on the simulation accuracy desired. We also design fast and scalable Reconfigurable Mesh (R-Mesh) algorithms for MD simulations, demonstrating that large-scale biological systems can be simulated in close to real time. These R-Mesh algorithms highlight the feasibility of such models for evaluating potentials with faster calculation times. Specifically, we develop R-Mesh algorithms for both the Direct method and the Multigrid method. The Direct method evaluates exact potentials and forces but requires O(N^2) time for electrostatic forces on a general-purpose processor; the MG method uses interpolation to reduce the calculation time to O(N) for a given accuracy. Our R-Mesh algorithms require only O(N) time for the Direct method on an N-processor linear R-Mesh and O(log N) time on an N×N R-Mesh, and O(r) + O(log M) time for the Multigrid method on an X×Y×Z R-Mesh, where r = N/M and M = X×Y×Z is the number of finest-grid points.
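    To ground the complexity claims, the sketch below implements the Direct method's O(N^2) pairwise electrostatic evaluation in Python with NumPy; the function names and reduced-unit convention are illustrative, not taken from the dissertation.

```python
import numpy as np

def direct_coulomb_forces(pos, q, k=1.0):
    """O(N^2) pairwise electrostatic forces via Coulomb's law.

    pos : (N, 3) array of particle positions
    q   : (N,) array of charges
    k   : Coulomb constant (1.0 in reduced units)
    """
    n = len(q)
    forces = np.zeros_like(pos)
    for i in range(n):
        r = pos[i] - pos            # vectors from every particle j to i
        d = np.linalg.norm(r, axis=1)
        d[i] = np.inf               # exclude self-interaction
        # F_i = k * q_i * sum_j q_j * r_ij / |r_ij|^3
        forces[i] = k * q[i] * np.sum((q / d**3)[:, None] * r, axis=0)
    return forces

# Example: three charges in reduced units
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
q = np.array([1.0, -1.0, 1.0])
print(direct_coulomb_forces(pos, q))
```

    Each particle's force requires a pass over all N others, which is exactly the quadratic cost that the MCA/Multigrid approach and the R-Mesh algorithms are designed to avoid.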

    The future of computing beyond Moore's Law.

    Get PDF
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every two years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics but, as transistors reach the atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and of the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continued scaling in successors to the first exascale machine. Lastly, the article covers the many opportunities and strategies available to continue improving computing performance in the absence of the historical technology drivers. This article is part of the discussion meeting issue 'Numerical algorithms for high-performance computational science'.

    A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

    Full text link
This paper proposes a special-purpose system for high-accuracy, high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field-programmable gate array (FPGA) and an application-specific integrated circuit (ASIC) working in heterogeneous parallel. Specifically, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN) ASIC (SilTerra 180 nm process) to evaluate atomic forces, the most computationally expensive part of MD; all other MD calculations are done on the FPGA (Xilinx XC7Z100). It is shown that, at similar accuracy, the proposed NvN-based system, built on a low-end fabrication technology (180 nm), is 1.6x faster and 10^2-10^3x more energy efficient than state-of-the-art von Neumann MLMD running on graphics processing units (GPUs) built on much more advanced technology (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture.
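    The paper's network design is not reproduced here, but one common way to make a neural network multiplication-less is to quantize weights to signed powers of two, so that every multiply becomes a bit shift. The following is a minimal, hedged sketch of that idea in Python; all names and parameter ranges are assumptions for illustration only.

```python
import numpy as np

def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Quantize weights to signed powers of two: w ~ sign * 2**exp."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)),
                  min_exp, max_exp).astype(int)
    return sign, exp

def shift_matvec(x_int, sign, exp):
    """Matrix-vector product using only shifts and adds (no multiplies).

    x_int     : (n,) integer activations
    sign, exp : (m, n) power-of-two weight representation
    Output neuron i accumulates sum_j sign[i,j] * (x_int[j] shifted by exp[i,j]).
    """
    m, n = sign.shape
    out = np.zeros(m, dtype=np.int64)
    for i in range(m):
        for j in range(n):
            if exp[i, j] >= 0:
                term = x_int[j] << exp[i, j]   # multiply by 2**exp
            else:
                term = x_int[j] >> -exp[i, j]  # divide by 2**(-exp)
            out[i] += int(sign[i, j]) * term
    return out

# Example: weights that are exact powers of two
w = np.array([[0.5, -0.25], [0.125, 1.0]])
sign, exp = quantize_pow2(w)
print(shift_matvec(np.array([8, 16]), sign, exp))  # [0, 17]
```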

    Real-time simulation of dynamic vehicle models using high performance reconfigurable computing platforms

    Get PDF
A software-based approach to Real-Time Simulation (RTS) may have difficulty meeting real-time constraints for complex models. In this thesis, we present a methodology for the design and implementation of RTS algorithms, based on the use of Field-Programmable Gate Array (FPGA) technology, to improve the response time of these models. Our methodology uses traditional hardware/software co-design approaches to generate a heterogeneous architecture for an FPGA-based simulator. We optimize the hardware design so that it efficiently exploits the parallel nature of FPGAs and pipelines the independent operations. Further improvement comes from custom accelerators for common non-linear functions. Since the systems we examine have relatively relaxed response-time requirements, our approach greatly simplifies the software components by porting the computationally complex regions to hardware. We illustrate the partitioning of a hardware-based simulator design across dual FPGAs, initiate RTS using a system input from a Hardware-in-the-Loop (HIL) framework, and use the simulation results from our FPGA-based platform to perform response analysis. The total simulation time, which includes the time required to receive the system input over a socket (without HIL), software initialization, hardware computation, and the transfer of simulation results back over a socket, shows a 2x speedup compared to a similar setup with no hardware acceleration. The correctness of the simulation output from the hardware has also been validated against the simulated results of the software-only design.
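    As a rough illustration of the kind of fixed-step computation such a simulator pipelines, here is a generic quarter-car suspension model stepped with explicit Euler in Python. The model and every parameter value are textbook stand-ins, not taken from the thesis.

```python
import numpy as np

# Generic quarter-car parameters (illustrative, not from the thesis)
ms, mu = 300.0, 40.0        # sprung / unsprung mass [kg]
ks, cs = 20000.0, 1500.0    # suspension stiffness [N/m], damping [N*s/m]
kt = 180000.0               # tire stiffness [N/m]

def quarter_car_step(state, zr, dt):
    """One explicit-Euler step of the quarter-car model.

    state = [zs, vs, zu, vu]: sprung/unsprung positions and velocities
    zr    : road height input for this step
    """
    zs, vs, zu, vu = state
    f_susp = ks * (zs - zu) + cs * (vs - vu)   # suspension force
    az = -f_susp / ms                          # sprung-mass acceleration
    au = (f_susp - kt * (zu - zr)) / mu        # unsprung-mass acceleration
    return np.array([zs + dt * vs, vs + dt * az,
                     zu + dt * vu, vu + dt * au])

# Drive over a 5 cm step at t = 0.1 s with a 1 ms fixed step
dt, state = 1e-3, np.zeros(4)
for k in range(1000):
    state = quarter_car_step(state, 0.05 if k * dt > 0.1 else 0.0, dt)
print(state)
```

    Every step is a short chain of multiply-adds with no data-dependent branches, which is why this class of model maps naturally onto a pipelined FPGA datapath.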

    Best practices for building hardware designs for living computational science applications

    Get PDF
Scientific computing, or computational science, is a field of study in which engineers and scientists use computer simulations to solve equations that model the physical world, in some cases derived from the first principles of physics. In the past, these simulations were run on single-processor machines, but for various technological reasons the performance of such machines is unlikely to keep improving at its historical rate. To improve the performance per watt of these simulations, special-purpose hardware accelerators can be used; this work focuses on FPGA-based hardware accelerators. To run a simulation on an FPGA accelerator, the application code must be refactored into software and hardware sections. The resulting faster simulations have motivated scientists to capture more behavior of the physical world, and each time additional behavior is captured, the application code must be refactored again, with significant effort required to rebuild the design. These repeated redesign cycles reduce the overall productivity of scientists and engineers. This work proposes a set of hardware design guidelines for changing, or 'living', computational science codes. The guidelines co-evolve the hardware with the software, reducing the overall redesign effort and improving productivity. They are evaluated for effectiveness, communicability, and broad applicability. Experimental results show that the overall redesign effort is reduced and that the guidelines apply to a wide variety of scientific computing applications.

    New strategies for the aerodynamic design optimization of aeronautical configurations through soft-computing techniques

    Get PDF
Extraordinary Doctoral Award of the UAH, 2013. Co-directed by Carlos Lozano Rodríguez. This thesis deals with improving the optimization process in the aerodynamic design of aeronautical configurations. This topic is currently of great importance in enabling the European aeronautical industry to reduce development and operational costs, decrease the time-to-market for new aircraft, improve the quality of its products, and thereby maintain its competitiveness. The thesis surveys the state of the art in aerodynamic optimization tools and proposes contributions at several levels:
- One of the main obstacles to industrial adoption of aerodynamic optimization tools is their huge demand for computational resources; for complex optimization problems, current methodological approaches would need more than a year to produce an optimized aircraft. One contribution of this work therefore focuses on reducing the computational cost through techniques such as surrogate modelling and control theory, as well as more software-related techniques such as code optimization and proper domain parallelization, all with the goal of decreasing the cost of the aerodynamic design process.
- Another contribution treats the design process as a global optimization problem and, more specifically, uses evolutionary algorithms (EAs) for a preliminary broad exploration of the design space, exploiting their ability to find global optima. EAs are hybridized with metamodels (surrogate models) to substitute for expensive CFD simulations. The thesis proposes an innovative approach for the global aerodynamic optimization of aeronautical configurations: an Evolutionary Programming algorithm hybridized with a Support Vector regression algorithm (SVMr) as a metamodel, as sketched after this list. Specific issues such as precision, training-set size, sensitivity to the geometry parameterization, and design-of-experiments techniques are discussed, and the potential of the approach to reach innovative shapes that traditional methods would not achieve is assessed.
- After the broad exploration of the design space, the optimization continues with local gradient-based techniques for a finer improvement of the geometry. An automated optimization framework is presented for aerodynamic shape design problems. Key aspects of this framework include the adjoint methodology, which makes the computational requirements independent of the number of design variables, and Computer Aided Design (CAD)-based shape parameterization, which uses the flexibility of Non-Uniform Rational B-Splines (NURBS) to handle complex configurations. This approach is applied to several test cases, and the improvements delivered by the proposed strategy and its ability to achieve efficient shapes complete the study.
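    A minimal sketch of the EA-plus-metamodel loop described above, in Python with scikit-learn's SVR as the surrogate; the objective function, Gaussian-mutation scheme, and retraining schedule are illustrative stand-ins for the thesis's CFD objective and Evolutionary Programming algorithm.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def expensive_objective(x):
    """Stand-in for a costly CFD evaluation (e.g., a drag coefficient)."""
    return np.sum(x**2 - 0.5 * np.cos(3 * x), axis=-1)

# Seed the surrogate with a small design of experiments
X = rng.uniform(-2, 2, size=(40, 4))
y = expensive_objective(X)
surrogate = SVR(kernel="rbf", C=10.0).fit(X, y)

pop = rng.uniform(-2, 2, size=(30, 4))
for gen in range(20):
    # Gaussian mutation (Evolutionary Programming style: no crossover)
    offspring = pop + rng.normal(0.0, 0.2, size=pop.shape)
    # Rank parents + offspring on the cheap surrogate, keep the best 30
    both = np.vstack([pop, offspring])
    pop = both[np.argsort(surrogate.predict(both))[:30]]
    # Periodically re-evaluate top designs with the true model
    # and retrain the surrogate on the enlarged dataset
    if gen % 5 == 4:
        X = np.vstack([X, pop[:5]])
        y = np.concatenate([y, expensive_objective(pop[:5])])
        surrogate = SVR(kernel="rbf", C=10.0).fit(X, y)

print("best (true objective):", expensive_objective(pop[0]))
```

    The design point is that nearly all fitness evaluations hit the cheap surrogate, while the expensive solver is called only occasionally to keep the surrogate honest.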

    High performance communication on reconfigurable clusters

    Get PDF
High Performance Computing (HPC) has matured to the point where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor limiting HPC performance, but it can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interaction, even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation: the large number of Multi-Gigabit Transceivers (MGTs) on most high-end FPGAs provides high-bandwidth, low-latency inter-FPGA connections, and the reconfigurable fabric enables tight coupling between computation kernel and network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress toward solving the HPC communication bottleneck. This dissertation provides an application-aware communication infrastructure for FPGA-centric clusters, demonstrating application-awareness across multiple levels of the network stack: low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results show that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA cluster, which is the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations; our results show that adding application-awareness to the router configuration substantially improves the network performance of FPGA clusters. For the second (VOQ-based) router, we propose a novel offline collective-routing algorithm that shows a significant advantage over a state-of-the-art collective-routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPU and GPU implementations by at least an order of magnitude, achieving strong scaling for the target applications; surprisingly, the FPGA-cluster performance is similar to that of an ASIC cluster. We also implement the 3D FFT on another multi-FPGA platform, the Microsoft Catapult II cloud, where performance is likewise comparable or superior to that of CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics simulation (MD). We model MD on both FPGA clouds and clusters and find that combining processing and general communication in the same device leads to extremely promising performance, with the prospect of MD simulations well into the µs/day range on a commodity cloud.
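    The dissertation's routers and offline collective-routing algorithm are not reproduced here; as a generic illustration of the inputs a static, offline scheduler works from, the Python sketch below computes dimension-ordered (X-then-Y) paths on a 2D mesh and tallies per-link load. All names are hypothetical.

```python
from collections import Counter

def xy_route(src, dst):
    """Dimension-ordered (X then Y) path on a 2D mesh: list of hops."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def link_loads(flows):
    """Count how many flows traverse each directed mesh link."""
    loads = Counter()
    for src, dst in flows:
        p = xy_route(src, dst)
        for a, b in zip(p, p[1:]):
            loads[(a, b)] += 1
    return loads

# All-to-one flows toward node (0, 0) on a 3x3 mesh (a collective pattern)
flows = [((i, j), (0, 0)) for i in range(3) for j in range(3) if (i, j) != (0, 0)]
for link, n in sorted(link_loads(flows).items(), key=lambda kv: -kv[1])[:3]:
    print(link, n)   # the hottest links an offline scheduler would spread out
```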

    Modeling and Experimental Techniques to Demonstrate Nanomanipulation With Optical Tweezers

    Get PDF
The development of truly three-dimensional nanodevices is currently impeded by the absence of effective prototyping tools at the nanoscale. Optical trapping is well established for flexible three-dimensional manipulation of components at the microscale, but it has so far not been demonstrated to confine nanoparticles for long enough to be useful in nanoassembly applications. As part of this work, we therefore demonstrate new techniques that successfully extend optical trapping to nanoscale manipulation. Extending optical trapping to the nanoscale poses certain challenges. For the same incident beam power, the optical forces acting on a nanoparticle within an optical trap are very weak compared with the forces acting on microscale particles; consequently, due to Brownian motion, the nanoparticle often exits the trap in a very short time. We improve the performance of optical traps at the nanoscale by using closed-loop control. Laboratory experiments show that, using control systems, we can localize nanoparticles to the trap for long enough to be useful in nanoassembly, under conditions where a static trap set to the same power as the controller is unable to confine a particle of the same size. Before controlled optical trapping can be demonstrated in the laboratory, key tools must first be developed. We implement Langevin dynamics simulations to model the interaction of nanoparticles with an optical trap. Physically accurate simulations provide a robust platform for testing new methods to characterize and improve the performance of optical tweezers at the nanoscale, but they depend on accurate trapping-force models. We therefore also develop two new laboratory-based force-measurement techniques that overcome the drawbacks of conventional force measurements, which do not accurately account for the weak interaction of nanoparticles with an optical trap. Finally, we use numerical simulations to develop new control algorithms that demonstrate significantly enhanced trapping of nanoparticles, and we implement these techniques in the laboratory. The algorithms and characterization tools developed in this work will allow the development of optical trapping instruments that can confine nanoparticles for longer than is currently possible at a given beam power. Furthermore, the low average power achieved by the controller makes the technique especially suitable for manipulating biological specimens, and it is also generally beneficial to nanoscale prototyping applications. The capabilities developed in this work, and the technology that results from it, may enable the prototyping of the three-dimensional nanodevices critically required in many applications.
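    To make the modelling concrete, here is a minimal overdamped Langevin simulation in Python of a nanoparticle in a harmonic approximation of an optical trap, integrated with the Euler-Maruyama scheme. The bead size, trap stiffness, and fluid parameters are typical literature values, not the ones used in this work.

```python
import numpy as np

kB, T = 1.380649e-23, 300.0      # Boltzmann constant [J/K], temperature [K]
a = 50e-9                        # bead radius [m] (a nanoparticle)
eta = 8.9e-4                     # water viscosity [Pa*s]
gamma = 6 * np.pi * eta * a      # Stokes drag coefficient
kappa = 1e-6                     # trap stiffness [N/m] (illustrative)

def simulate(n_steps=100_000, dt=1e-6, rng=np.random.default_rng(1)):
    """Euler-Maruyama integration of dx = -(kappa/gamma) x dt + sqrt(2D) dW."""
    D = kB * T / gamma                       # diffusion coefficient
    x = np.zeros((n_steps, 3))
    for k in range(1, n_steps):
        noise = rng.normal(size=3) * np.sqrt(2 * D * dt)
        x[k] = x[k-1] - (kappa / gamma) * x[k-1] * dt + noise
    return x

x = simulate()
# Equipartition check: kappa * <x^2> should approach kB * T per axis
print("kappa*<x^2> =", kappa * x[:, 0].var(), " kB*T =", kB * T)
```

    The equipartition check at the end (kappa·⟨x²⟩ ≈ kB·T per axis) is a standard way to validate such a simulation against theory before layering a feedback controller on top of it.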