
    Porting a Lattice Boltzmann Simulation to FPGAs Using OmpSs

    Reconfigurable computing, exploiting Field Programmable Gate Arrays (FPGAs), has become of great interest to both academia and industry thanks to the possibility of greatly accelerating a variety of applications. The interest has been further boosted by recent developments in FPGA programming frameworks which allow applications to be designed at a higher level of abstraction, for example using directive-based approaches. In this work we describe our first experiences in porting to FPGAs an HPC application used to simulate the Rayleigh-Taylor instability of fluids with different density and temperature using Lattice Boltzmann Methods. This activity is done in the context of the FET HPC H2020 EuroEXA project, which is developing an energy-efficient HPC system, at exa-scale level, based on Arm processors and FPGAs. In this work we use the OmpSs directive-based programming model, one of the models available within the EuroEXA project. OmpSs is developed by the Barcelona Supercomputing Center (BSC) and allows developers to target FPGA devices as accelerators, but also commodity CPUs and GPUs, enabling code portability across different architectures. In particular, we describe the initial porting of this application, evaluating the programming effort required and assessing the preliminary performance on a Trenz development board hosting a Xilinx Zynq UltraScale+ MPSoC embedding 16nm FinFET+ programmable logic and a multi-core Arm CPU.
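    As a rough illustration of the kind of kernel such a port offloads, the following is a minimal D2Q9 BGK collide-and-stream step in Python. All names, grid sizes, and parameters here are illustrative assumptions, not taken from the EuroEXA code; the actual port would express such a kernel in C and annotate it with OmpSs directives for FPGA offload.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities with the standard weights.
# Hypothetical minimal kernel, not the application's actual code.
C = np.array([(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)])
W = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    # Second-order Maxwell-Boltzmann expansion: w*rho*(1 + 3cu + 4.5cu^2 - 1.5u^2)
    cu = C[:, 0, None, None]*ux + C[:, 1, None, None]*uy
    usq = ux**2 + uy**2
    return rho * W[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def collide_and_stream(f, tau=0.6):
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    f_post = f - (f - equilibrium(rho, ux, uy)) / tau   # BGK collision
    # Periodic streaming: shift each population along its lattice velocity.
    return np.stack([np.roll(np.roll(f_post[i], cx, axis=0), cy, axis=1)
                     for i, (cx, cy) in enumerate(C)])

# Start from a uniform fluid at rest on an 8x8 periodic grid.
f = equilibrium(np.ones((8, 8)), np.zeros((8, 8)), np.zeros((8, 8)))
f = collide_and_stream(f)
```

    A directive-based port would mark `collide_and_stream` (in its C form) as an accelerator task, leaving the host code unchanged, which is the portability property the abstract highlights.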

    Hybrid High Performance Computing (HPC) + Cloud for Scientific Computing

    The HPC+Cloud framework has been built to enable on-premise HPC jobs to use resources from cloud computing nodes. As part of designing the software framework, public cloud providers, namely Amazon AWS, Microsoft Azure and NeCTAR, were benchmarked against one another, and Microsoft Azure was determined to be the most suitable cloud component in the proposed HPC+Cloud software framework. Finally, an HPC+Cloud cluster was built using the HPC+Cloud software framework and then validated by conducting HPC processing benchmarks.

    A model-based design flow for embedded vision applications on heterogeneous architectures

    The ability to gather information from images is straightforward for humans, and it is one of the principal inputs for understanding the external world. Computer vision (CV) is the process of extracting such knowledge from the visual domain in an algorithmic fashion. The computational power required to process this information is very high. Until recently, the only feasible way to meet non-functional requirements like performance was to develop custom hardware, which is costly, time-consuming and cannot be reused for general purposes. The recent introduction of low-power and low-cost heterogeneous embedded boards, in which CPUs are combined with heterogeneous accelerators like GPUs, DSPs and FPGAs, can combine the hardware efficiency needed for non-functional requirements with the flexibility of software development. Embedded vision is the term used to identify the application of the aforementioned CV algorithms in the embedded field, which usually requires satisfying not only functional requirements but also non-functional requirements such as real-time performance, power, and energy efficiency. Rapid prototyping, early algorithm parametrization, testing, and validation of complex embedded video applications for such heterogeneous architectures is a very challenging task. This thesis presents a comprehensive framework that: 1) Is based on a model-based paradigm. Differently from the standard approaches at the state of the art, which require designers to manually model the algorithm in a programming language, the proposed approach allows for rapid prototyping, algorithm validation and parametrization in a model-based design environment (i.e., Matlab/Simulink). The framework relies on a multi-level design and verification flow by which the high-level model is semi-automatically refined towards the final automatic synthesis into the target hardware device. 2) Relies on a polyglot parallel programming model. The proposed model combines different programming languages and environments such as C/C++, OpenMP, PThreads, OpenVX, OpenCV, and CUDA to best exploit different levels of parallelism while guaranteeing a semi-automatic customization. 3) Optimizes the application performance and energy efficiency through a novel algorithm for the mapping and scheduling of the application tasks on the heterogeneous computing elements of the device. This algorithm, called exclusive earliest finish time (XEFT), takes into consideration the possible multiple implementations of tasks for different computing elements (e.g., a task primitive for CPU and an equivalent parallel implementation for GPU). It introduces and takes advantage of the notion of exclusive overlap between primitives to improve the load balancing. This thesis is the result of three years of research activity, during which all the incremental steps made to compose the framework have been tested on real case studies.
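    To make the mapping-and-scheduling idea concrete, here is a generic earliest-finish-time list scheduler over tasks with multiple per-device implementations. The task graph, device names, and costs are invented for illustration; this is only the baseline idea that XEFT builds on, not the dissertation's algorithm, which additionally exploits exclusive overlap between primitives.

```python
# Hypothetical task graph: task -> (dependencies, {device: execution time}).
GRAPH = {
    "load":   ([],                  {"cpu": 2.0, "gpu": 3.0}),
    "filter": (["load"],            {"cpu": 8.0, "gpu": 2.5}),
    "edges":  (["load"],            {"cpu": 6.0, "gpu": 2.0}),
    "merge":  (["filter", "edges"], {"cpu": 1.0, "gpu": 4.0}),
}

def schedule(graph):
    """Greedy list scheduling: each ready task takes the (device,
    implementation) pair that yields the earliest finish time."""
    device_free = {"cpu": 0.0, "gpu": 0.0}
    finish, placement, done = {}, {}, set()
    while len(done) < len(graph):
        for task, (deps, costs) in graph.items():
            if task in done or any(d not in done for d in deps):
                continue
            ready = max((finish[d] for d in deps), default=0.0)
            # Choose among the task's alternative implementations.
            dev = min(costs, key=lambda d: max(ready, device_free[d]) + costs[d])
            start = max(ready, device_free[dev])
            finish[task] = start + costs[dev]
            placement[task] = dev
            device_free[dev] = finish[task]
            done.add(task)
    return placement, finish

placement, finish = schedule(GRAPH)
```

    On this toy graph the scheduler places the two heavy vision stages on the GPU and the cheap merge on the CPU, finishing at t = 7.5; a load-balance-aware variant like XEFT can improve on such greedy choices when implementations overlap exclusively.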

    Study of Condensed Matter Systems with Monte Carlo Simulation on Heterogeneous Computing Systems

    We study the Edwards-Anderson model on a simple cubic lattice with a finite constant external field. We employ an indicator composed of a ratio of susceptibilities at finite momenta, which was recently proposed to avoid the difficulties of a zero-momentum quantity, for capturing the spin glass phase transition. Unfortunately, this new indicator is fairly noisy, so a large pool of samples at low temperature and small external field is needed to generate results with a sufficiently small statistical error for analysis. We thus implement the Monte Carlo method using graphics processing units to drastically speed up the simulation. We confirm previous findings that conventional indicators for the spin glass transition, including the Binder ratio and the correlation length, do not show any indication of a transition at rather low temperatures. However, the ratio of spin glass susceptibilities does show crossing behavior, although a systematic analysis is beyond the reach of the present data. This reveals the difficulty, with current numerical methods and computing capability, of studying this problem. One of the fundamental challenges of theoretical condensed matter physics is the accurate solution of quantum impurity models. By taking an expansion in the hybridization about an exactly solved local limit, one can formulate a quantum impurity solver. We implement the hybridization expansion quantum impurity solver on Intel Xeon Phi accelerators, and aim to apply this approach to dynamic Hubbard models.
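    A minimal sketch of the single-spin-flip Metropolis update being parallelized may help: below is a toy CPU version for the Edwards-Anderson model with ±1 couplings on a small periodic cubic lattice in an external field. Lattice size, temperature, field, and all variable names are illustrative assumptions, not values from the dissertation; the GPU code would apply this same update to many sites and disorder samples concurrently.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T, h = 4, 1.5, 0.1                       # illustrative parameters
spins = rng.choice([-1, 1], size=(L, L, L))
# Quenched +-1 couplings: J[axis, x, y, z] is the forward bond of a site.
J = rng.choice([-1, 1], size=(3, L, L, L))

def local_field(s, x, y, z):
    """Field h + sum of coupling*neighbor over the 6 neighbors."""
    f = h
    for axis in range(3):
        fwd = [x, y, z]; fwd[axis] = (fwd[axis] + 1) % L
        bwd = [x, y, z]; bwd[axis] = (bwd[axis] - 1) % L
        f += (J[(axis, x, y, z)] * s[tuple(fwd)]
              + J[(axis,) + tuple(bwd)] * s[tuple(bwd)])
    return f

def sweep(s):
    """One Metropolis sweep: flip each spin with prob min(1, e^(-dE/T))."""
    for x in range(L):
        for y in range(L):
            for z in range(L):
                dE = 2 * s[x, y, z] * local_field(s, x, y, z)
                if dE <= 0 or rng.random() < np.exp(-dE / T):
                    s[x, y, z] *= -1
    return s

for _ in range(10):
    sweep(spins)
```

    On a GPU, a checkerboard decomposition lets all spins of one sublattice update in parallel, since their local fields depend only on the other sublattice.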

    Domain engineering and generic programming for parallel scientific computing

    The development of software for scientific applications based on dynamic or irregular grids involves many problems, since goals as diverse as high performance and flexibility must be reconciled. This dissertation addresses these problems as follows: First, the ideas of domain engineering are applied to the field of data-parallel applications in order to design reusable software products whose use accelerates the development of concrete software systems in this field. A comprehensive analysis of data-parallel applications is carried out, and general requirements for the components to be developed are formulated. In a second step, based on the knowledge gained and using the ideas of generic programming, the Janus software architecture is designed and implemented. The resulting conceptual framework and the C++ template library Janus constitute a flexible and extensible collection of efficient data structures and algorithms for a broad class of data-parallel applications. In particular, finite difference and finite element methods as well as data-parallel graph algorithms are supported. An outstanding advantage of a generic C++ library such as Janus is that its application-oriented abstractions deliver high performance while depending neither on language extensions nor on compilation techniques that are not generally available. The use of C++ templates in the implementation of Janus makes it very easy to integrate user-defined data types into the components of Janus without sacrificing efficiency. A further advantage of Janus is that it can cooperate very easily with existing software packages.
    This dissertation describes a portable implementation of Janus for distributed-memory architectures based on the standardized communication library MPI. The expressiveness of Janus is demonstrated through the implementation of typical applications from the field of data-parallel scientific computing. The performance of the Janus components is evaluated by comparing Janus applications with comparable implementations based on other approaches. Scalability studies of Janus applications on a Linux cluster system show that Janus meets high demands in this respect as well.
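    The key property claimed above, that user-defined data types plug into generic kernels unchanged, can be mirrored loosely in Python. Janus itself achieves this with C++ templates at zero abstraction cost; the sketch below, with an invented `Dual` value type, only illustrates the principle of writing one grid kernel that works for any type supporting the needed operations.

```python
def jacobi_step(grid):
    """One relaxation step on a 1-D grid; generic over the value type,
    requiring only addition and division by a scalar."""
    return ([grid[0]]
            + [(grid[i - 1] + grid[i + 1]) / 2
               for i in range(1, len(grid) - 1)]
            + [grid[-1]])

class Dual:
    """Hypothetical user-defined type: a value carrying a derivative."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, other):
        return Dual(self.v + other.v, self.d + other.d)
    def __truediv__(self, k):
        return Dual(self.v / k, self.d / k)

# The same kernel runs on plain floats and on the user-defined type.
floats = jacobi_step([0.0, 4.0, 0.0, 4.0, 0.0])
duals = jacobi_step([Dual(0.0), Dual(4.0, 1.0), Dual(0.0),
                     Dual(4.0, 1.0), Dual(0.0)])
```

    In C++, the analogous kernel would be a function template, and the compiler specializes it per type, which is why the genericity costs nothing at run time.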

    Efficient coarse-grained Brownian dynamics simulations for DNA and lipid bilayer membranes with hydrodynamic interactions

    The coarse-grained molecular dynamics (CGMD) or Brownian dynamics (BD) simulation is a particle-based approach that has been applied to a wide range of biological problems involving interactions with surrounding fluid molecules, the so-called hydrodynamic interactions (HIs). From simple biological systems such as a single DNA macromolecule to large and complicated systems, for instance vesicles and red blood cells (RBCs), the numerical results have shown outstanding agreement with experiments and continuum modeling by adopting Stokesian dynamics and explicit solvent models. When combined with fast algorithms such as the fast multipole method (FMM), which has nearly optimal complexity in the total number of CG particles, the resulting method is parallelizable, scalable to large systems, and stable for large time step sizes, thus bringing long-time, large-scale BD simulation within practical reach. This is useful for the study of a large collection of molecules or cells immersed in fluids. This dissertation can be divided into three main subjects. (1) An efficient algorithm is proposed to simulate the motion of a single DNA molecule in linear flows. The algorithm utilizes the integrating factor method to cope with the effect of the linear flow of the surrounding fluid and applies the Metropolis method (MM) of [N. Bou-Rabee, A. Donev, and E. Vanden-Eijnden, Multiscale Model. Simul. 12, 781 (2014)] to achieve more efficient BD simulation. More importantly, the proposed method permits a much larger time step size than methods in the previous literature while still maintaining the stability of the BD simulation, which is advantageous for long-time BD simulation. The numerical results on λ-DNA agree very well with both experimental data and previous simulation results. (2) Lipid bilayer membranes have been extensively studied by CGMD simulations. Numerical efficiencies have been reported in cases of aggressive coarse-graining, where several lipids are coarse-grained into a particle of size 4-6 nm so that there is only one particle in the thickness direction. In [H. Yuan et al., Phys. Rev. E 82, 011905 (2010)], Yuan et al. proposed a pair potential between these one-particle-thick coarse-grained lipid particles to capture the mechanical properties of a lipid bilayer membrane, such as the gel-fluid-gas phase transitions of lipids, diffusion, and bending rigidity. This dissertation provides a detailed implementation of this interaction potential in LAMMPS to simulate large-scale lipid systems such as a giant unilamellar vesicle (GUV) and RBCs. Moreover, this work also considers the effect of the cytoskeleton on the lipid membrane dynamics as a model for RBC dynamics, and incorporates coarse-grained water molecules to account for hydrodynamic interactions. (3) An action field method for the lipid bilayer membrane model is introduced, where several lipid molecules are represented by a Janus particle whose orientation points from lipid head to lipid tail. At this level of coarse-grained modeling, as the preliminary setup, the lipid tails occupy one half of the sphere and the lipid heads the other half. An action field is induced from lipid-lipid interactions and exists everywhere in the computational domain. A hydrophobic attraction energy can therefore be described using the variational approach, and its minimization with respect to the action field yields the so-called screened Laplace equation. For the numerical method, the well-known integral equation method (IEM) is well suited to solving the exterior screened Laplace equation with Dirichlet boundary conditions. Finally, one can then obtain the lipid dynamics to validate the self-assembly property and other physical properties of the lipid bilayer membrane. This approach combines continuum modeling with CGMD and gives a different perspective on the membrane energy model from the traditional Helfrich membrane free energy.
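    The Metropolis-corrected BD idea cited in subject (1) can be sketched on a toy problem: an overdamped particle in a 1-D harmonic potential, where each BD proposal is accepted or rejected so that the exact Boltzmann distribution is preserved, allowing large time steps without instability. The potential, parameters, and names below are illustrative; the dissertation's scheme additionally handles hydrodynamic interactions and the linear-flow integrating factor.

```python
import math
import random

# Toy Metropolis-adjusted Brownian dynamics (MALA-style) for
# U(x) = k x^2 / 2. Illustrative parameters only.
random.seed(1)
k, dt, kT = 1.0, 0.5, 1.0

def U(x):
    return 0.5 * k * x * x

def force(x):
    return -k * x

def mala_step(x):
    # Brownian dynamics proposal: drift + Gaussian noise.
    y = x + dt * force(x) + random.gauss(0.0, math.sqrt(2 * kT * dt))
    # Metropolis correction with forward/backward proposal densities,
    # which makes the chain sample exp(-U/kT) exactly at any dt.
    log_fwd = -((y - x - dt * force(x)) ** 2) / (4 * kT * dt)
    log_bwd = -((x - y - dt * force(y)) ** 2) / (4 * kT * dt)
    log_alpha = (U(x) - U(y)) / kT + log_bwd - log_fwd
    return y if random.random() < math.exp(min(0.0, log_alpha)) else x

x, samples = 0.0, []
for _ in range(20000):
    x = mala_step(x)
    samples.append(x)
var = sum(s * s for s in samples) / len(samples)  # should approach kT/k = 1
```

    The rejection step is what buys the large-time-step stability: an uncorrected BD step at this dt would still be biased, whereas the Metropolized chain targets the exact equilibrium distribution.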

    High Performance Embedded Computing

    Nowadays, the prevalence of computing systems in our lives is so ubiquitous that we live in a cyber-physical world dominated by computer systems, from pacemakers to cars and airplanes. These systems demand more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside of the required timing bounds may cause the failure of the system, which is critical for systems like planes, cars, business monitoring, e-trading, etc. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices which are able to deliver high performance whilst guaranteeing the application's required timing bounds. Technical topics discussed in the book include: parallel embedded platforms; programming models; mapping and scheduling of parallel computations; timing and schedulability analysis; runtimes and operating systems. The work reflected in this book was done in the scope of the European project P-SOCRATES, funded under the FP7 framework program of the European Commission. High-performance and time-predictable embedded computing is ideal for personnel in computer/communication/embedded industries as well as academic staff and master/research students in computer science, embedded systems, cyber-physical systems and internet-of-things.