    Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

    We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE-solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods, and other building blocks are described in abstract, modular, re-usable, and combinable forms, which lets us generate versatile solvers from a small amount of Paraiso source code. We demonstrate Paraiso by implementing a compressible hydrodynamics solver. A single source file of less than 500 lines can be used to generate solvers of arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both manual annotation-based tuning and evolutionary-computing-based automated tuning of the program. Comment: 52 pages, 14 figures, accepted for publication in Computational Science and Discovery.
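
    To make the idea of an explicit solver concrete, here is a minimal NumPy sketch of the kind of stencil update such generated code performs: one explicit upwind step for the 1D advection equation. It is a generic illustration, not Paraiso code, and the grid, speed, and time step are placeholder values.

        import numpy as np

        def upwind_step(u, c, dx, dt):
            """One explicit upwind update for du/dt + c du/dx = 0 (c > 0, periodic grid).

            u  : 1D array of cell values
            c  : constant advection speed
            dx : grid spacing, dt : time step (needs c*dt/dx <= 1 for stability)
            """
            return u - c * dt / dx * (u - np.roll(u, 1))

        # Example: advect a Gaussian pulse for 100 steps on a unit-length periodic grid.
        x = np.linspace(0.0, 1.0, 200, endpoint=False)
        u = np.exp(-((x - 0.3) / 0.05) ** 2)
        for _ in range(100):
            u = upwind_step(u, c=1.0, dx=x[1] - x[0], dt=0.004)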

    Phases of polymer systems in solution studied via molecular dynamics

    Polymers are versatile molecules that can self-assemble into a variety of phases in solution. The phases that form can be controlled by varying the concentration, temperature, or pH of the solution. Inorganic particles added to a solution of functionalized polymers also self-assemble into novel polymer nanocomposite materials. The determination of phase diagrams of these systems, as well as detailed calculations of their properties, is accomplished using Molecular Dynamics (MD) simulations. Additionally, algorithms are developed that implement MD on recent Graphics Processing Unit (GPU) hardware capable of astounding levels of performance. A single inexpensive GPU runs an MD simulation at the same performance as 63 CPU cores in a distributed-memory cluster.
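
    As a minimal sketch of the MD integration loop that such GPU kernels accelerate, the snippet below runs velocity-Verlet steps for a small Lennard-Jones system in plain NumPy. The particle layout and parameters are arbitrary placeholders; real polymer simulations add bonded interactions, thermostats, neighbor lists, and periodic boundaries.

        import numpy as np

        def lj_forces(pos, eps=1.0, sigma=1.0):
            """Pairwise Lennard-Jones forces (no cutoff, no periodic boundaries)."""
            n = len(pos)
            forces = np.zeros_like(pos)
            for i in range(n):
                for j in range(i + 1, n):
                    rij = pos[i] - pos[j]
                    r2 = np.dot(rij, rij)
                    sr6 = (sigma * sigma / r2) ** 3
                    fij = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2 * rij
                    forces[i] += fij
                    forces[j] -= fij
            return forces

        def velocity_verlet(pos, vel, dt=0.005, steps=200, mass=1.0):
            """Integrate Newton's equations of motion with the velocity-Verlet scheme."""
            f = lj_forces(pos)
            for _ in range(steps):
                vel += 0.5 * dt * f / mass
                pos += dt * vel
                f = lj_forces(pos)
                vel += 0.5 * dt * f / mass
            return pos, vel

        # Example: 16 beads started on a planar grid with zero initial velocity.
        pos = 1.2 * np.array([[i, j, 0.0] for i in range(4) for j in range(4)])
        vel = np.zeros_like(pos)
        pos, vel = velocity_verlet(pos, vel)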

    Power and Performance Studies of the Explicit Multi-Threading (XMT) Architecture

    Power and thermal constraints gained critical importance in the design of microprocessors over the past decade. Chipmakers failed to keep power at bay while sustaining the performance growth of serial computers at the rate expected by consumers. As an alternative, they turned to fitting an increasing number of simpler cores on a single die. While this is a step forward for relaxing the constraints, the issue of power is far from resolved, and it is joined by new challenges, which we explain next. As we move into the era of many-cores, processors consisting of hundreds or even thousands of cores, single-task parallelism is the natural path for building faster general-purpose computers. Alas, the introduction of parallelism to the mainstream general-purpose domain brings another long-elusive problem into focus: ease of parallel programming. The result is a dual challenge, where power efficiency and ease of programming are both vital for the prevalence of up-and-coming many-core architectures.

    These observations led to the lead goal of this dissertation: a first-order validation of the claim that, even under power/thermal constraints, ease of programming and competitive performance need not be conflicting objectives for a massively parallel general-purpose processor. As our platform, we choose the eXplicit Multi-Threading (XMT) many-core architecture for fine-grained parallel programs, developed at the University of Maryland. We hope that our findings will be a trailblazer for future commercial products. XMT scales up to a thousand or more lightweight cores and aims at improving single-task execution time while making the programmer's task as easy as possible. The performance advantages and ease of programming of XMT have been shown in a number of publications, including a study that we present in this dissertation. The feasibility of the hardware concept has been exhibited via FPGA and ASIC (per our partial involvement) prototypes.

    Our contributions target the study of the power and thermal envelopes of an envisioned 1024-core XMT chip (XMT1024) under programs that exist in popular parallel benchmark suites. First, we compare XMT against an area- and power-equivalent commercial high-end many-core GPU. We demonstrate that XMT can provide an average speedup of 8.8x in irregular parallel programs that are common and important in general-purpose computing. Even under the worst-case power-estimation assumptions for XMT, the average speedup is only reduced by half. We further this study by experimentally evaluating the performance advantages of Dynamic Thermal Management (DTM) when applied to XMT1024. DTM techniques are frequently used in current single- and multi-core processors; however, until now their effects on single-tasked many-cores have not been examined in detail. Our purpose is to explore how existing techniques can be tailored for XMT to improve performance. Performance improvements of up to 46% over a generic global management technique have been demonstrated. The insights we provide can guide designers of other similar many-core architectures.

    A significant infrastructure contribution of this dissertation is a highly configurable cycle-accurate simulator, XMTSim. To our knowledge, XMTSim is currently the only publicly available shared-memory many-core simulator with extensive capabilities for estimating power and temperature, as well as evaluating dynamic power and thermal management algorithms. As a major component of the XMT programming toolchain, it is not only used as the infrastructure in this work but has also contributed to other publications and dissertations.
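
    As a rough, hypothetical illustration of threshold-based dynamic thermal management, the sketch below lowers a core clock while a simulated temperature reading is above a hot threshold and raises it again once the chip has cooled. The thresholds, frequency steps, and toy thermal model are invented for the example and are not the policies evaluated in the dissertation.

        def dtm_step(temp_c, freq_mhz, *, t_hot=85.0, t_safe=75.0,
                     f_min=300.0, f_max=1200.0, step=100.0):
            """One decision of a simple threshold-based DTM policy.

            Lower the clock while the sensor reads above t_hot, raise it again
            once the temperature has fallen below t_safe; otherwise keep it.
            """
            if temp_c >= t_hot:
                return max(f_min, freq_mhz - step)
            if temp_c <= t_safe:
                return min(f_max, freq_mhz + step)
            return freq_mhz

        # Toy closed loop: heating grows with frequency, cooling pulls toward ambient.
        temp, freq = 60.0, 1200.0
        for cycle in range(50):
            temp += freq / 400.0 - 0.05 * (temp - 45.0)   # crude thermal model
            freq = dtm_step(temp, freq)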

    Computational Wave Field Modeling in Anisotropic Media

    In this thesis, a meshless semi-analytical computational method is presented to compute the ultrasonic wave field in generalized anisotropic materials while understanding the physics of wave propagation in detail. To understand wave-damage interaction in an anisotropic material, it is neither feasible nor cost-effective to perform multiple experiments in the laboratory. Hence, computational nondestructive evaluation (CNDE) has recently received much attention as a way to perform NDE experiments in a virtual environment. In this thesis, a fundamental framework is constructed to perform CNDE experiments on a thick composite specimen in pulse-echo (PE) and through-transmission modes. To achieve this target, the following steps were carried out. The solution of the elastodynamic Green's function at a spatial point in an anisotropic medium was first obtained by solving the fundamental elastodynamic equation using the Radon transform and the Fourier transform. Next, the basic concepts of wave propagation behavior in a generalized material and the visualization of the anisotropic bulk wave modes were accomplished by solving the Christoffel equation in 3D. Moreover, the displacement and stress Green's functions in a generalized anisotropic material were calculated in the frequency domain. The frequency-domain Green's functions were obtained by superposing the effects of the propagating eigen wave modes obtained from the Christoffel solution and integrating over all possible directions of wave propagation by discretizing a sphere. MATLAB and C++ codes were developed to compute the displacement and stress Green's functions numerically. The generated Green's functions were verified against existing methodologies reported in the literature. Further, the numerically calculated Green's functions were implemented and integrated with the meshless Distributed Point Source Method (DPSM). The DPSM technique was used to virtually simulate NDE experiments on half-space, single-layer plate, and multilayered plate anisotropic materials, for both pristine and damaged states, inspected by a circular transducer immersed in fluid. The ultrasonic wave fields were calculated using DPSM after applying the boundary conditions and solving for the unknown source strengths. A method named sequential mapping of the poly-crepitus Green's function was introduced and executed, along with discretization-angle optimization, for time-efficient computation of the wave fields. The full displacement and stress wave fields in transversely isotropic, fully orthotropic, and monoclinic materials are presented in this thesis on different planes of the material. The time-domain signal was generated for a 1-ply plate at any given point for transversely isotropic, fully orthotropic, and monoclinic materials. Finally, the wave field is presented for structures with damage scenarios such as material degradation and delamination, and compared with pristine counterparts to visualize and understand the effect of damage/defects on the material state.
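
    To illustrate the bulk-wave computation described above, the sketch below assembles the Christoffel matrix Γ_ik = C_ijkl n_j n_l for a chosen propagation direction n and obtains the three phase velocities and polarizations from its eigenvalues. The stiffness values and density are placeholder numbers for a transversely isotropic composite, not data from the thesis.

        import numpy as np

        VOIGT = {(0, 0): 0, (1, 1): 1, (2, 2): 2,
                 (1, 2): 3, (2, 1): 3, (0, 2): 4, (2, 0): 4, (0, 1): 5, (1, 0): 5}

        def stiffness_tensor(C_voigt):
            """Expand a 6x6 Voigt stiffness matrix into the full C_ijkl tensor."""
            C = np.empty((3, 3, 3, 3))
            for i in range(3):
                for j in range(3):
                    for k in range(3):
                        for l in range(3):
                            C[i, j, k, l] = C_voigt[VOIGT[(i, j)], VOIGT[(k, l)]]
            return C

        def phase_velocities(C_voigt, rho, n):
            """Solve the Christoffel equation (Gamma - rho v^2 I) p = 0 for direction n."""
            n = np.asarray(n, dtype=float)
            n /= np.linalg.norm(n)
            C = stiffness_tensor(C_voigt)
            gamma = np.einsum('ijkl,j,l->ik', C, n, n)   # Christoffel matrix
            eigvals, eigvecs = np.linalg.eigh(gamma)     # rho * v^2 and polarizations
            return np.sqrt(eigvals / rho), eigvecs

        # Placeholder transversely isotropic stiffness (GPa) and density (kg/m^3).
        C = np.array([[143.8,  6.2,  6.2, 0.0, 0.0, 0.0],
                      [  6.2, 13.3,  6.5, 0.0, 0.0, 0.0],
                      [  6.2,  6.5, 13.3, 0.0, 0.0, 0.0],
                      [  0.0,  0.0,  0.0, 3.4, 0.0, 0.0],
                      [  0.0,  0.0,  0.0, 0.0, 5.7, 0.0],
                      [  0.0,  0.0,  0.0, 0.0, 0.0, 5.7]]) * 1e9
        v, p = phase_velocities(C, rho=1560.0, n=[1.0, 0.0, 0.0])   # velocities in m/s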

    DeePMD-kit v2: A software package for Deep Potential models

    DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and materials science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, Deep Potential - Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments. Comment: 51 pages, 2 figures.
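
    As a brief, hedged sketch of how a trained Deep Potential model is typically queried from Python, the snippet below loads a frozen model and evaluates energy, forces, and virial for one configuration. The model file name, coordinates, and type indices are placeholders, and the exact call signature should be checked against the DeePMD-kit documentation for the installed version.

        import numpy as np
        from deepmd.infer import DeepPot   # DeePMD-kit Python inference interface

        # Placeholder inputs: a frozen model file and a single water-like configuration.
        dp = DeepPot("graph.pb")                      # trained Deep Potential model
        coords = np.array([[0.00, 0.00, 0.00,         # O
                            0.96, 0.00, 0.00,         # H
                           -0.24, 0.93, 0.00]])       # H; shape (nframes, natoms * 3), Angstrom
        cells = None                                  # None: no periodic boundary conditions
        atom_types = [0, 1, 1]                        # indices into the model's type map

        energy, force, virial = dp.eval(coords, cells, atom_types)
        print(energy.shape, force.shape)              # per-frame energy, per-atom forces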

    Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs

    The abstract is provided in the attachment.

    Image-based Control and Automation of High-speed X-ray Imaging Experiments

    Modern X-ray imaging reveals the inner structure of objects made of a wide variety of materials. The success of such measurements depends critically on a suitable choice of acquisition conditions, on the mechanical instrumentation, and on the properties of the sample or of the process under study. So far, there has been no known approach to autonomous data acquisition that allows very different X-ray imaging experiments to be controlled via image-based feedback. This thesis aims to close that gap by addressing and solving the problems that arise: the selection of the initial experimental parameters, fast processing of the acquired data, and automatic feedback to correct the running measurement procedure. To determine the most suitable experimental conditions, we start from the fundamentals of image formation and develop a framework for its simulation. This allows us to conduct a wide range of virtual X-ray imaging experiments while taking into account the decisive physical processes along the path of the X-rays from the source to the detector. In addition, we consider various sample shapes and motions, which enables the simulation of experiments such as 4D (time-resolved) tomography. We further develop an autonomous data acquisition procedure that readjusts the initial conditions of the experiment during the already running measurement, based on fast image analysis, and can also control other parameters of the experiment. We pay particular attention to high-speed experiments, which place high demands on the speed of data processing, especially when the control is based on computationally intensive algorithms such as tomographic 3D reconstruction of the sample. To implement an efficient algorithm for this purpose, we use a highly parallelized framework. Its output can then be used to compute various image metrics that provide quantitative information about the acquired data. These metrics form the basis for decision making in a closed control loop in which the data acquisition hardware is operated. We demonstrate the accuracy of the developed simulation framework by comparing virtual and real experiments based on grating interferometry, which employs special optical elements for contrast formation. We also investigate in detail how the imaging conditions influence the accuracy of the implemented filtered back-projection algorithm, and to what extent the experimental conditions can be optimized by taking this into account. We demonstrate the capabilities of our autonomous data acquisition system with an in-situ tomography experiment in which it optimizes the camera frame rate based on 3D reconstruction, thereby ensuring that the recorded data sets can be reconstructed without artifacts.
    We also use our system to carry out a high-throughput tomography experiment in which many similar biological samples are scanned: for each of them, the tomographic rotation axis is determined automatically and, to ensure quality, a complete 3D volume is reconstructed already during the measurement. Furthermore, we perform an in-situ laminography experiment investigating crack formation in a material sample. Here, our system carries out the data acquisition and reconstructs a centrally located cross-section through the sample to ensure its correct alignment and the quality of the data. Our work enables, based on highly accurate simulations, the choice of the most suitable initial conditions of an experiment, their fine-tuning during a real experiment, and finally its automatic control based on fast analysis of the data just acquired. Such an approach to data acquisition enables novel in-vivo and in-situ high-speed experiments that, due to their high data rates, could no longer be handled by a human operator.
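
    Because the control decisions above hinge on fast tomographic reconstruction, the following is a minimal parallel-beam filtered back-projection sketch in NumPy/SciPy. It is a plain CPU illustration of the algorithm only, not the highly parallelized implementation described in the thesis, and the sinogram layout is an assumption.

        import numpy as np
        from scipy.ndimage import rotate

        def fbp(sinogram, angles_deg):
            """Minimal parallel-beam filtered back-projection.

            sinogram   : array of shape (n_angles, n_detector_pixels)
            angles_deg : projection angles in degrees, one per sinogram row
            """
            n_angles, n_det = sinogram.shape
            # Ramp filter applied in the Fourier domain along the detector axis.
            ramp = np.abs(np.fft.fftfreq(n_det))
            filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
            # Back-project: smear each filtered projection across the image, then rotate.
            recon = np.zeros((n_det, n_det))
            for row, theta in zip(filtered, angles_deg):
                recon += rotate(np.tile(row, (n_det, 1)), theta, reshape=False, order=1)
            return recon * np.pi / (2.0 * n_angles)

        # Example: reconstruct from a toy sinogram of 180 projections.
        angles = np.linspace(0.0, 180.0, 180, endpoint=False)
        sino = np.random.default_rng(0).random((180, 256))   # stand-in for measured data
        image = fbp(sino, angles)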

    Designing advanced multistatic imaging systems with optimal 2D sparse arrays

    This study introduces an innovative optimization method to identify the optimal configuration of a sparse symmetric 2D array for applications in security, particularly multistatic imaging. Utilizing genetic algorithms (GAs) in a sophisticated optimization process, the research focuses on achieving the most favorable antenna distribution while mitigating the common issue of secondary lobes in sparse arrays. The main objective is to determine the ideal configuration from specific design parameters, including hardware specifications such as the number of radiating elements, minimum spacing, operating frequency range, and image separation distance. The study employed a cost function based on the point spread function (PSF), the system response to a point source, with the goal of minimizing the secondary lobe levels and maximizing their separation from the main lobe. Advanced simulation algorithms based on physical optics (PO) were used to validate the presented methodology and results.
    Funding: Agencia Estatal de Investigación | Ref. PID2020-113979RB-C21; Agencia Estatal de Investigación | Ref. PID2020-113979RB-C22; Agencia Estatal de Investigación | Ref. RYC2021-033593-
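
    To make the PSF-based cost function concrete, the sketch below evaluates the array factor of a candidate planar layout on a grid of direction cosines and scores it by its peak sidelobe level; a GA would minimize this score over candidate element positions. The element positions, operating frequency, and main-lobe mask radius are placeholder assumptions, not the paper's design parameters.

        import numpy as np

        def psf_sidelobe_level(xy, wavelength, n_grid=201, mainlobe_radius=0.1):
            """Peak sidelobe level (dB) of the power pattern |AF|^2 of a planar array.

            xy : (n_elements, 2) element positions in metres
            The pattern is sampled on the direction-cosine grid u, v in [-1, 1].
            """
            k = 2.0 * np.pi / wavelength
            u = np.linspace(-1.0, 1.0, n_grid)
            uu, vv = np.meshgrid(u, u)
            # Array factor: sum of the phase contributions of all elements (uniform weights).
            phase = k * (xy[:, 0, None, None] * uu + xy[:, 1, None, None] * vv)
            psf = np.abs(np.exp(1j * phase).sum(axis=0)) ** 2
            psf /= psf.max()
            # Mask out the main lobe around broadside (u = v = 0) before taking the peak.
            sidelobes = psf[np.hypot(uu, vv) > mainlobe_radius]
            return 10.0 * np.log10(sidelobes.max())

        # Example: score a random symmetric candidate layout at 77 GHz (placeholder band).
        rng = np.random.default_rng(1)
        half = rng.uniform(-0.05, 0.05, size=(32, 2))   # one symmetric half of the aperture
        layout = np.vstack([half, -half])               # enforce symmetry about the origin
        print(psf_sidelobe_level(layout, wavelength=3e8 / 77e9))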