1,272 research outputs found
Fast, Scalable, and Interactive Software for Landau-de Gennes Numerical Modeling of Nematic Topological Defects
Numerical modeling of nematic liquid crystals using the tensorial Landau-de
Gennes (LdG) theory provides detailed insights into the structure and
energetics of the enormous variety of possible topological defect
configurations that may arise when the liquid crystal is in contact with
colloidal inclusions or structured boundaries. However, these methods can be
computationally expensive, making it challenging to predict (meta)stable
configurations involving several colloidal particles, and they are often
restricted to system sizes well below the experimental scale. Here we present
an open-source software package that exploits the embarrassingly parallel
structure of the lattice discretization of the LdG approach. Our
implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate
simulations using both CPU and GPU resources in either single- or multiple-core
configurations. We make use of an efficient minimization algorithm, the Fast
Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale
parallelization, requiring little additional memory or computational cost while
offering performance competitive with other commonly used methods. In
multi-core operation we are able to scale simulations up to supra-micron length
scales of experimental relevance, and in single-core operation the simulation
package includes a user-friendly GUI environment for rapid prototyping of
interfacial features and the multifarious defect states they can promote. To
demonstrate this software package, we examine in detail the competition between
curvilinear disclinations and point-like hedgehog defects as size scale,
material properties, and geometric features are varied. We also study the
effects of an interface patterned with an array of topological point-defects. Comment: 16 pages, 6 figures, 1 youtube link. The full catastroph
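The FIRE minimizer named in the abstract above can be sketched in a few lines. This is a minimal illustration and not the package's implementation: the function name `fire_minimize`, the semi-implicit Euler integrator, the parameter defaults (taken from commonly quoted FIRE values), and the quadratic test energy are all assumptions made for the example.

```python
import math

def fire_minimize(grad, x0, dt=0.1, dt_max=0.3, n_min=5, f_inc=1.1,
                  f_dec=0.5, alpha_start=0.1, f_alpha=0.99,
                  force_tol=1e-8, max_steps=10000):
    """Minimize an energy given its gradient, using a FIRE-style update.

    Hypothetical helper for illustration; all defaults are assumptions.
    Returns the final coordinates and the number of steps taken.
    """
    x = list(x0)
    v = [0.0] * len(x)
    alpha = alpha_start
    steps_since_negative = 0
    for step in range(max_steps):
        f = [-g for g in grad(x)]                      # force = -gradient
        f_norm = math.sqrt(sum(fi * fi for fi in f))
        if f_norm < force_tol:
            return x, step
        # Semi-implicit Euler: advance the velocity first.
        v = [vi + dt * fi for vi, fi in zip(v, f)]
        power = sum(fi * vi for fi, vi in zip(f, v))   # P = F . v
        if power > 0.0:
            # Moving downhill: steer the velocity toward the force direction.
            v_norm = math.sqrt(sum(vi * vi for vi in v))
            v = [(1.0 - alpha) * vi + alpha * v_norm * fi / f_norm
                 for vi, fi in zip(v, f)]
            steps_since_negative += 1
            if steps_since_negative > n_min:
                dt = min(dt * f_inc, dt_max)           # speed up
                alpha *= f_alpha                       # reduce steering
        else:
            # Moving uphill: stop, shrink the time step, reset the mixing.
            v = [0.0] * len(x)
            dt *= f_dec
            alpha = alpha_start
            steps_since_negative = 0
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, max_steps

# Usage: an anisotropic quadratic bowl E = 0.5 * (x^2 + 10 y^2).
minimum, steps = fire_minimize(lambda p: [p[0], 10.0 * p[1]], [1.0, 1.0])
```

Besides the coordinates and forces, the only per-site state is the velocity, and the only global quantities are a few dot products, which is consistent with the abstract's remark that the method needs little additional memory.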
Hardware acceleration of reaction-diffusion systems: a guide to optimisation of pattern formation algorithms using OpenACC
Reaction Diffusion Systems (RDS) have widespread applications in computational ecology, biology, computer graphics and the visual arts. For the former applications a major barrier to the development of effective simulation models is their computational complexity - it takes a great deal of processing power to simulate enough replicates such that reliable conclusions can be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with fewer resources. Existing optimizations of RDS tend to be low-level and GPGPU based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of speed-up and portability
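To make concrete why reaction-diffusion updates respond well to simple parallelization directives, here is a minimal sketch of one explicit time step on a periodic grid. The abstract does not name its models, so the Gray-Scott system, the function names, and the parameter values below are assumptions chosen for illustration.

```python
def laplacian(grid, i, j):
    """Five-point Laplacian with periodic boundaries (unit grid spacing)."""
    n, m = len(grid), len(grid[0])
    return (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
            + grid[i][(j - 1) % m] + grid[i][(j + 1) % m]
            - 4.0 * grid[i][j])

def gray_scott_step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.065, dt=1.0):
    """Advance the (assumed) Gray-Scott model by one explicit Euler step.

    Every cell reads only the old grids and writes only its own entry in
    the new grids, so the two loops carry no dependencies between cells.
    """
    n, m = len(u), len(u[0])
    new_u = [[0.0] * m for _ in range(n)]
    new_v = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            uvv = u[i][j] * v[i][j] ** 2               # reaction term u * v^2
            new_u[i][j] = u[i][j] + dt * (
                du * laplacian(u, i, j) - uvv + feed * (1.0 - u[i][j]))
            new_v[i][j] = v[i][j] + dt * (
                dv * laplacian(v, i, j) + uvv - (feed + kill) * v[i][j])
    return new_u, new_v

# Usage: seed a small square of v in a uniform background and take one step.
size = 16
u0 = [[1.0] * size for _ in range(size)]
v0 = [[0.0] * size for _ in range(size)]
for i in range(6, 10):
    for j in range(6, 10):
        v0[i][j] = 0.25
u1, v1 = gray_scott_step(u0, v0)
```

The independent per-cell loop nest is the kind of loop a directive-based framework can offload, and keeping both grids resident on the device between steps is one way to keep data transfer minimal.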
Direct N-body code on low-power embedded ARM GPUs
This work arises in the context of the ExaNeSt project, which aims at the design
and development of an exascale-ready supercomputer with a low energy-consumption
profile that is still able to support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely-packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architectures in which computing power is supplied by both CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct N-body code
suitable for astrophysical simulations has been re-engineered in order to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that they are also promising because of their
energy efficiency, which is a crucial design consideration for future exascale platforms. Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceeding
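For readers unfamiliar with the term, "direct" N-body means that every pairwise gravitational interaction is summed explicitly, at O(N^2) cost per force evaluation. The sketch below shows that kernel only. It is not the re-engineered code from the paper: the function name, the softening parameter, and the unit choice G = 1 are assumptions for illustration.

```python
import math

def direct_accelerations(positions, masses, G=1.0, softening=0.0):
    """Gravitational accelerations by direct pairwise summation.

    Hypothetical helper: `positions` is a list of [x, y, z] triples and
    `masses` a list of floats. `softening` (an assumption) regularizes
    close encounters.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Equal and opposite forces, scaled by the other body's mass.
                acc[i][k] += G * masses[j] * d[k] * inv_r3
                acc[j][k] -= G * masses[i] * d[k] * inv_r3
    return acc

# Usage: two bodies one unit apart on the x axis.
acc = direct_accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 2.0])
```

This version visits each pair once to halve the arithmetic. A GPU implementation would more commonly have each thread accumulate the full sum for one body, so that no two threads write to the same entry.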
Developing Efficient Discrete Simulations on Multicore and GPU Architectures
In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores, and also an NVIDIA Tesla V100 GPU, both running on Amazon Web Server (AWS) Cloud; and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that degrade performance on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases. Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported
Computing for Perturbative QCD - A Snowmass White Paper
We present a study on high-performance computing and large-scale distributed
computing for perturbative QCD calculations. Comment: 21 pages, 5 table
A GPU-Accelerated Approach to Static Stability Assessments for Pallet Loading in Air Cargo
The static stability constraint is one of the most important constraints in pallet loading and plays a substantial role when assembling safe and loadable palletizing layouts. Current approaches reach their limits as soon as additional complexity is added, which is a given in the practice of air cargo logistics, or when performance becomes important. As our central objective, we explore a new approach to calculate static stability more performantly and to cover more complexity by relaxing several simplifying assumptions. The approach is implemented in a prototype and builds on the emerging technology of graphical processing unit acceleration in combination with physics engines. We propose a new artifact design and summarize the how-to knowledge in the form of abstracted design principles. Our results demonstrate an improvement in terms of performance depending on the underlying hardware. We develop a conceptual model to assist future research in choosing a solution technology