57 research outputs found

    Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)

    Full text link
    Multi-component polymer systems are important for the development of new materials because of their ability to phase-separate or self-assemble into nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction with a soft, coarse-grained polymer model is an established technique to investigate these soft-matter systems. Here we present an im- plementation of this method: SOft coarse grained Monte-carlo Accelera- tion (SOMA). It is suitable to simulate large system sizes with up to billions of particles, yet versatile enough to study properties of different kinds of molecular architectures and interactions. We achieve efficiency of the simulations commissioning accelerators like GPUs on both workstations as well as supercomputers. The implementa- tion remains flexible and maintainable because of the implementation of the scientific programming language enhanced by OpenACC pragmas for the accelerators. We present implementation details and features of the program package, investigate the scalability of our implementation SOMA, and discuss two applications, which cover system sizes that are difficult to reach with other, common particle-based simulation methods

    A Fully-Pipelined Hardware Design for Gaussian Mixture Models

    No full text
    Gaussian Mixture Models (GMMs) are widely used in many applications such as data mining, signal processing and computer vision, for probability density modeling and soft clustering. However, the parameters of a GMM need to be estimated from data by, for example, the Expectation-Maximization algorithm for Gaussian Mixture Models (EM-GMM), which is computationally demanding. This paper presents a novel design for the EM-GMM algorithm targeting reconfigurable platforms, with five main contributions. First, a pipeline-friendly EM-GMM with diagonal covariance matrices that can easily be mapped to hardware architectures. Second, a function evaluation unit for Gaussian probability density based on fixed-point arithmetic. Third, our approach is extended to support a wide range of dimensions or/and components by fitting multiple pieces of smaller dimensions onto an FPGA chip. Fourth, we derive a cost and performance model that estimates logic resources. Fifth, our dataflow design targeting the Maxeler MPCX2000 with a Stratix-5SGSD8 FPGA can run over 200 times faster than a 6-core Xeon E5645 processor, and over 39 times faster than a Pascal TITAN-X GPU. Our design provides a practical solution to applications for training and explores better parameters for GMMs with hundreds of millions of high dimensional input instances, for low-latency and high-performance applications

    Coupled Kinetic-Fluid Simulations of Ganymede's Magnetosphere and Hybrid Parallelization of the Magnetohydrodynamics Model

    Full text link
    The largest moon in the solar system, Ganymede, is the only moon known to possess a strong intrinsic magnetic field. The interaction between the Jovian plasma and Ganymede's magnetic field creates a mini-magnetosphere with periodically varying upstream conditions, which creates a perfect laboratory in nature for studying magnetic reconnection and magnetospheric physics. Using the latest version of Space Weather Modeling Framework (SWMF), we study the upstream plasma interactions and dynamics in this subsonic, sub-Alfvénic system. We have developed a coupled fluid-kinetic Hall Magnetohydrodynamics with embedded Particle-in-Cell (MHD-EPIC) model for Ganymede's magnetosphere, with a self-consistently coupled resistive body representing the electrical properties of the moon's interior, improved inner boundary conditions, and high resolution charge and energy conserved PIC scheme. I reimplemented the boundary condition setup in SWMF for more versatile control and functionalities, and developed a new user module for Ganymede's simulation. Results from the models are validated with Galileo magnetometer data of all close encounters and compared with Plasma Subsystem (PLS) data. The energy fluxes associated with the upstream reconnection in the model is estimated to be about 10^-7 W/cm^2, which accounts for about 40% to the total peak auroral emissions observed by the Hubble Space Telescope. We find that under steady upstream conditions, magnetopause reconnection in our fluid-kinetic simulations occurs in a non-steady manner. Flux ropes with length of Ganymede's radius form on the magnetopause at a rate about 3/minute and create spatiotemporal variations in plasma and field properties. Upon reaching proper grid resolutions, the MHD-EPIC model can resolve both electron and ion kinetics at the magnetopause and show localized crescent shape distribution in both ion and electron phase space, non-gyrotropic and non-isotropic behavior inside the diffusion regions. The estimated global reconnection rate from the models is about 80 kV with 60% efficiency. There is weak evidence of sim1sim 1 minute periodicity in the temporal variations of the reconnection rate due to the dynamic reconnection process. The requirement of high fidelity results promotes the development of hybrid parallelized numerical model strategy and faster data processing techniques. The state-of-the-art finite volume/difference MHD code Block Adaptive Tree Solarwind Roe Upwind Scheme (BATS-R-US) was originally designed with pure MPI parallelization. The maximum problem size achievable was limited by the storage requirements of the block tree structure. To mitigate this limitation, we have added multithreaded OpenMP parallelization to the previous pure MPI implementation. We opt to use a coarse-grained approach by making the loops over grid blocks multithreaded and have succeeded in making BATS-R-US an efficient hybrid parallel code with modest changes in the source code while preserving the performance. Good weak scalings up to 50,0000 and 25,0000 of cores are achieved for the explicit and implicit time stepping schemes, respectively. This parallelization strategy greatly extends the possible simulation scale by an order of magnitude, and paves the way for future GPU-portable code development. To improve visualization and data processing, I have developed a whole new data processing workflow with the Julia programming language for efficient data analysis and visualization. As a summary, 1. I build a single fluid Hall MHD-EPIC model of Ganymede's magnetosphere; 2. I did detailed analysis of the upstream reconnection; 3. I developed a MPI+OpenMP parallel MHD model with BATSRUS; 4. I wrote a package for data analysis and visualization.PHDClimate and Space Sciences and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163032/1/hyzhou_1.pd

    Parallelization Strategies for Modern Computing Platforms: Application to Illustrative Image Processing and Computer Vision Applications

    Get PDF
    RÉSUMÉ L’évolution spectaculaire des technologies dans le domaine du matériel et du logiciel a permis l’émergence des nouvelles plateformes parallèles très performantes. Ces plateformes ont marqué le début d’une nouvelle ère de la computation et il est préconisé qu’elles vont rester dans le domaine pour une bonne période de temps. Elles sont présentes déjà dans le domaine du calcul de haute performance (en anglais HPC, High Performance Computer) ainsi que dans le domaine des systèmes embarqués. Récemment, dans ces domaines le concept de calcul hétérogène a été adopté pour atteindre des performances élevées. Ainsi, plusieurs types de processeurs sont utilisés, dont les plus populaires sont les unités centrales de traitement ou CPU (de l’anglais Central Processing Unit) et les processeurs graphiques ou GPU (de l’anglais Graphics Processing Units). La programmation efficace pour ces nouvelles plateformes parallèles amène actuellement non seulement des opportunités mais aussi des défis importants pour les concepteurs. Par conséquent, l’industrie a besoin de l’appui de la communauté de recherche pour assurer le succès de ce nouveau changement de paradigme vers le calcul parallèle. Trois défis principaux présents pour les processeurs GPU massivement parallèles (ou “many-cores”) ainsi que pour les processeurs CPU multi-coeurs sont: (1) la sélection de la meilleure plateforme parallèle pour une application donnée, (2) la sélection de la meilleure stratégie de parallèlisation et (3) le réglage minutieux des performances (ou en anglais performance tuning) pour mieux exploiter les plateformes existantes. Dans ce contexte, l’objectif global de notre projet de recherche est de définir de nouvelles solutions pour aider à la programmation efficace des applications complexes sur les plateformes parallèles modernes. Les principales contributions à la recherche sont: 1. L’évaluation de l’efficacité d’accélération pour plusieurs plateformes parallèles, dans le cas des applications de calcul intensif. 2. Une analyse quantitative des stratégies de parallèlisation et implantation sur les plateformes à base de processeurs CPU multi-cœur ainsi que pour les plateformes à base de processeurs GPU massivement parallèles. 3. La définition et la mise en place d’une approche de réglage de performances (en Anglais performance tuning) pour les plateformes parallèles. Les contributions proposées ont été validées en utilisant des applications réelles illustratives et un ensemble varié de plateformes parallèles modernes.----------ABSTRACT With the technology improvement for both hardware and software, parallel platforms started a new computing era and they are here to stay. Parallel platforms may be found in High Performance Computers (HPC) or embedded computers. Recently, both HPC and embedded computers are moving toward heterogeneous computing platforms. They are employing both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) to achieve the highest performance. Programming efficiently for parallel platforms brings new opportunities but also several challenges. Therefore, industry needs help from the research community to succeed in its recent dramatic shift to parallel computing. Parallel programing presents several major challenges. These challenges are equally present whether one programs on a many-core GPU or on a multi-core CPU. Three of the main challenges are: (1) Finding the best platform providing the required acceleration (2) Select the best parallelization strategy (3) Performance tuning to efficiently leverage the parallel platforms. In this context, the overall objective of our research is to propose a new solution helping designers to efficiently program complex applications on modern parallel architectures. The contributions of this thesis are: 1. The evaluation of the efficiency of several target parallel platforms to speedup compute-intensive applications. 2. The quantitative analysis for parallelization and implementation strategies on multicore CPUs and many-core GPUs. 3. The definition and implementation of a new performance tuning framework for heterogeneous parallel platforms. The contributions were validated using real computation intensive applications and modern parallel platform based on multi-core CPU and many-core GPU

    Online Modeling and Tuning of Parallel Stream Processing Systems

    Get PDF
    Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and re-factored for months specifically for the hardware for which it is to run. Consumer application code doesn\u27t get the benefit of endless massaging that benefits high performance code, even though heterogeneous processor environments are beginning to resemble those in more performance oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where otherwise it would be cost prohibitive. This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without necessitating manual end-user management of non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high throughput processing, necessitating usage of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mis-matched with reality (i.e., the model is incorrectly applied). Often it is unclear if the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions. This, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive and unfortunately transient (with workload and hardware). In an ideal world, a parallel run-time will re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low overhead instrumentation, modeling, and optimization. Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate. Online modeling and optimization of a stream processing system first requires the existence of a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables usage of the stream processing paradigm for C++ applications (it is the run-time which is the basis of almost all the work within this dissertation). An application topology is specified by the user, however almost everything else is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer-reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains through our approaches and compare performance to other leading stream processing frameworks. Information is essential to any modeling task, to that end a low-overhead instrumentation framework has been developed which is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap the process under certain conditions. Any modeling effort, requires that a model be selected; often a highly manual task, involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches are incorporated into the open source RaftLib framework
    • …
    corecore