1,272 research outputs found

Fast, Scalable, and Interactive Software for Landau-de Gennes Numerical Modeling of Nematic Topological Defects

Author: Beller Daniel A
Sussman Daniel M
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Numerical modeling of nematic liquid crystals using the tensorial Landau-de Gennes (LdG) theory provides detailed insights into the structure and energetics of the enormous variety of possible topological defect configurations that may arise when the liquid crystal is in contact with colloidal inclusions or structured boundaries. However, these methods can be computationally expensive, making it challenging to predict (meta)stable configurations involving several colloidal particles, and they are often restricted to system sizes well below the experimental scale. Here we present an open-source software package that exploits the embarrassingly parallel structure of the lattice discretization of the LdG approach. Our implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate simulations using both CPU and GPU resources in either single- or multiple-core configurations. We make use of an efficient minimization algorithm, the Fast Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale parallelization, requiring little additional memory or computational cost while offering performance competitive with other commonly used methods. In multi-core operation we are able to scale simulations up to supra-micron length scales of experimental relevance, and in single-core operation the simulation package includes a user-friendly GUI environment for rapid prototyping of interfacial features and the multifarious defect states they can promote. To demonstrate this software package, we examine in detail the competition between curvilinear disclinations and point-like hedgehog defects as size scale, material properties, and geometric features are varied. We also study the effects of an interface patterned with an array of topological point-defects. Comment: 16 pages, 6 figures, 1 YouTube link. The full catastroph

arXiv.org e-Print Archive

eScholarship - University of California

Hardware acceleration of reaction-diffusion systems: a guide to optimisation of pattern formation algorithms using OpenACC

Author: Falconer Ruth E.
Houston Alasdair N.
Otten Wilfred
Portell Xavier
Publication date: 10/06/2019
Reaction Diffusion Systems (RDS) have widespread applications in computational ecology, biology, computer graphics and the visual arts. For the former applications a major barrier to the development of effective simulation models is their computational complexity: it takes a great deal of processing power to simulate enough replicates such that reliable conclusions can be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with fewer resources. Existing optimizations of RDS tend to be low-level and GPGPU based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of speed-up and portability

Abertay Research Portal

Crossref

Direct $N$-body code on low-power embedded ARM GPUs

Author: AR Brodtkorb
E Bortolas
F Perez
J Hunter
K Nitadori
M Katevenis
M Spera
R Capuzzo-Dolcetta
S Harfst
S Konstantinidis
S Walt van der
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/01/2019
This work arises on the environment of the ExaNeSt project aiming at design and development of an exascale ready supercomputer with low energy consumption profile but able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely-packed low-power 64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architecture where computing power is supplied both by CPUs and GPUs, and are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct $N$-body code suitable for astrophysical simulations has been re-engineered in order to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be effectively used to accelerate real-life scientific calculations, and that are promising also because of their energy efficiency, which is a crucial design in future exascale platforms. Comment: 16 pages, 7 figures, 1 table, accepted for publication in the Computing Conference 2019 proceeding

arXiv.org e-Print Archive

Crossref

Developing Efficient Discrete Simulations on Multicore and GPU Architectures

Author: Cagigas Muñiz Daniel
Díaz del Río Fernando
Guisado Lízar José Luís
Jiménez-Morales Francisco de Paula
López-Torres Manuel Ramón
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores, and also an NVIDIA Tesla V100 GPU, both running on Amazon Web Server (AWS) Cloud; and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained in both platforms, and we extract some important issues that imply a performance degradation for them. We also found that current multicore CPUs with large core numbers can bring a performance very near to that of GPUs, and even identical in some cases. Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842

idUS. Depósito de Investigación Universidad de Sevilla

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

Author: Abraham Mark James
Hess Berk
Lindahl Erik
Murtola Teemu
Páll Szilárd
Schulz Roland
Smith Jeremy C.
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 01/01/2015
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported

Publikationer från KTH

Elsevier - Publisher Connector

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Computing for Perturbative QCD - A Snowmass White Paper

Author: /Argonne
/Fermilab
/LBNL Berkeley
/SLAC
/UCLA
Bauer Christian
Bern Zvi
Boughezal Radja
Campbell John
Christensen Neil
Dixon Lance
Gehrmann Thomas
Hoeche Stefan
Kanzaki Junichi
Mitov Alexander
Nadolsky Pavel
Olness Fredrick
Peskin Michael
Petriello Frank
Pittsburgh /U.
Pozzorini Stefano
Reina Laura
Siegert Frank
Wackeroth Doreen
Walsh Jonathan
Williams Ciaran
Wobisch Markus
Zurich /U.
Publication date: 13/09/2013
We present a study on high-performance computing and large-scale distributed computing for perturbative QCD calculations. Comment: 21 pages, 5 table

arXiv.org e-Print Archive

UNT Digital Library

CERN Document Server

A GPU-Accelerated Approach to Static Stability Assessments for Pallet Loading in Air Cargo

Author: Lee No-San
Mazur Philipp Gabriel
Schoder Detlef
Publication venue: 'HICSS Conference Office'
Publication date: 03/01/2022
The static stability constraint is one of the most important constraints in pallet loading and plays a substantial role when assembling safe and loadable palletizing layouts. Current approaches reach their limits as soon as additional complexity is added, which is a given in the practice of air cargo logistics, or when performance becomes important. As our central objective, we explore a new approach to calculate static stability more performantly and to cover more complexity by relaxing several simplifying assumptions. The approach is implemented in a prototype and builds on the emerging technology of graphical processing unit acceleration in combination with physics engines. We propose a new artifact design and summarize the how-to knowledge in the form of abstracted design principles. Our results demonstrate an improvement in terms of performance depending on the underlying hardware. We develop a conceptual model to assist future research in choosing a solution technology

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)