19 research outputs found

    Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

    This is a post-peer-review, pre-copyedit version of an article published in Computer Physics Communications. The final authenticated version is available online at: https://doi.org/10.1016/j.cpc.2012.12.015 [Abstract] The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to efficiently compute the forces that act on each particle, as well as the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) traditional distributed-memory message-passing programming with MPI, and (2) the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both the MPI and UPC codes. Ministerio de Ciencia e Innovación; TIN2010-16735. Xunta de Galicia; ref. 2010/
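    As an illustration of the kind of update such a Brownian dynamics code performs, the following is a minimal sketch (not the paper's code): each MPI rank owns a block of particles and advances it with the overdamped Langevin step x(t+dt) = x(t) + (D/kBT) F dt + sqrt(2 D dt) N(0,1). The force law, particle counts, and parameters are illustrative assumptions, and the halo exchange a real domain-decomposed code needs is only indicated by a comment.

```c
/* Minimal sketch (not the paper's code): each MPI rank advances its own
 * block of particles with the overdamped Brownian update
 *   x(t+dt) = x(t) + (D/kBT) * F * dt + sqrt(2*D*dt) * N(0,1)
 * Compile with something like: mpicc -O2 bd_sketch.c -lm */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N_GLOBAL 1024   /* total particles (illustrative) */
#define DT  1e-4        /* time step */
#define DIF 1.0         /* diffusion coefficient D */
#define KBT 1.0         /* thermal energy kB*T */

/* One standard normal deviate via the Box-Muller transform. */
static double gaussian(void)
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int nlocal = N_GLOBAL / size;          /* block distribution of particles */
    double *x = calloc(nlocal, sizeof *x); /* 1D positions, for brevity */
    srand(1234 + rank);                    /* independent stream per rank */

    for (int step = 0; step < 1000; ++step) {
        for (int i = 0; i < nlocal; ++i) {
            double f = -x[i];              /* placeholder force (harmonic trap) */
            x[i] += (DIF / KBT) * f * DT + sqrt(2.0 * DIF * DT) * gaussian();
        }
        /* A real domain-decomposed code would exchange boundary particles
         * with neighbouring subdomains here (e.g. MPI_Sendrecv) before
         * evaluating inter-particle interactions. */
    }

    free(x);
    MPI_Finalize();
    return 0;
}
```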

    Extended collectives library for unified parallel C

    [Abstract] Current multicore processors mitigate single-core processor problems (e.g., the power, memory, and instruction-level parallelism walls), but they have raised the programmability wall. In this scenario, the use of a suitable parallel programming model is key to facilitating a paradigm shift from sequential application development while maximizing the productivity of code developers. At this point, the PGAS (Partitioned Global Address Space) paradigm represents a relevant research advance for its application to multicore systems, as its memory model, which offers a shared-memory view while providing private memory for taking advantage of data locality, mimics the memory structure provided by these architectures. Unified Parallel C (UPC), a PGAS-based extension of ANSI C, has been attracting the attention of developers in recent years. Nevertheless, the focus on improving the performance of current UPC compilers/runtimes has relegated the goal of providing higher programmability, and the available constructs have not always guaranteed good performance. Therefore, this Thesis focuses on making original contributions to the state of the art of UPC programmability by means of two main tasks: (1) presenting an analytical and empirical study of the features of the language, and (2) providing new functionalities that favor programmability while not hampering performance. Thus, the main contribution of this Thesis is the development of a library of extended collective functions, which complements and improves the existing UPC standard library with programmable constructs based on efficient algorithms. A UPC MapReduce framework (UPC-MR) has also been implemented to support this highly scalable computing model for UPC applications. Finally, the analysis and development of relevant kernels and applications (e.g., a large parallel particle simulation based on Brownian dynamics) confirm the usability of these libraries, concluding that UPC can provide high performance and scalability, especially for environments with a large number of threads, at a competitive development cost.
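    The extended collectives themselves are UPC constructs; as a hedged, message-passing analogue of what such a collective encapsulates, the sketch below combines per-process partial results with a single MPI_Allreduce call, which is the kind of hand-written communication a programmable collective is meant to replace. All names and values here are illustrative and not taken from the Thesis's library.

```c
/* Hedged sketch: a message-passing analogue of a reduction collective.
 * The Thesis's library targets UPC; this MPI version only illustrates the
 * operation such a collective encapsulates (combining per-process partial
 * results), replacing hand-written send/receive loops with one call. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process produces a partial result (illustrative: its own rank). */
    double partial = (double)rank;
    double total = 0.0;

    /* Every process ends up with the global sum. */
    MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %g\n", size, total);

    MPI_Finalize();
    return 0;
}
```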

    Parallel Lagrangian particle transport : application to respiratory system airways

    This thesis focuses on particle transport in the context of high-performance computing (HPC) in its widest range, from the numerical modeling to the physics involved, including its parallelization and post-processing. The main goal is to obtain a general framework that enables understanding all the requirements and characteristics of particle transport using the Lagrangian frame of reference. Although the idea is to provide a suitable model for any engineering application that involves particle transport simulation, this thesis uses the respiratory system as its framework. This means that all the simulations are focused on this topic, including the benchmarks for testing, verifying, and optimizing the results. Other applications, such as combustion, ocean residuals, or automotive, have also been simulated by other researchers using the same numerical model proposed here. However, they have not been included here in the interest of allowing the project to advance in a specific direction and of facilitating the structure and comprehension of this work. Human airway and respiratory system simulations are of special interest for medical purposes. Indeed, human airways can be significantly different in every individual. This complicates the study of drug delivery efficiency, deposition of pollutant particles, etc., using classic in-vivo or in-vitro techniques. In other words, flow and deposition results may vary depending on the geometry of the patient, and simulations allow customized studies using specific geometries. With the help of new computational techniques, in the near future it may be possible to optimize nasal drug delivery, surgery, or other medical studies for each individual patient through more personalized medicine. In summary, this thesis prioritizes numerical modeling, wide usability, performance, parallelization, and the study of the physics that affects particle transport. In addition, the simulation of the respiratory system should yield interesting biological and medical results. However, the interpretation of these results will only be made from a purely numerical point of view.
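    A minimal sketch of the per-particle update behind Lagrangian transport with Stokes drag follows; the carrier-flow field, particle properties, and time step are illustrative assumptions, and a real solver would instead interpolate the CFD velocity field at each particle position.

```c
/* Minimal sketch of Lagrangian point-particle transport with Stokes drag.
 * The carrier velocity field is a placeholder analytic profile; a real
 * solver would interpolate it from the CFD mesh at the particle position.
 * tau_p = rho_p * d^2 / (18 * mu) is the particle response time. */
#include <math.h>
#include <stdio.h>

typedef struct { double x, y, vx, vy; } Particle;

/* Placeholder carrier-phase velocity (illustrative parabolic channel profile). */
static void fluid_velocity(double x, double y, double *ux, double *uy)
{
    (void)x;
    *ux = 1.0 - y * y;
    *uy = 0.0;
}

int main(void)
{
    const double rho_p = 1000.0, d = 5e-6, mu = 1.8e-5; /* assumed droplet in air */
    const double tau_p = rho_p * d * d / (18.0 * mu);   /* Stokes response time */
    const double dt = 1e-5;

    Particle p = { 0.0, 0.5, 0.0, 0.0 };

    for (int step = 0; step < 10000; ++step) {
        double ux, uy;
        fluid_velocity(p.x, p.y, &ux, &uy);
        /* Drag relaxes the particle velocity toward the local fluid velocity. */
        p.vx += (ux - p.vx) / tau_p * dt;
        p.vy += (uy - p.vy) / tau_p * dt;
        p.x  += p.vx * dt;
        p.y  += p.vy * dt;
    }
    printf("final position: (%g, %g)\n", p.x, p.y);
    return 0;
}
```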

    HIGH PERFORMANCE AGENT-BASED MODELS WITH REAL-TIME IN SITU VISUALIZATION OF INFLAMMATORY AND HEALING RESPONSES IN INJURED VOCAL FOLDS

    The introduction of clusters of multi-core and many-core processors has played a major role in recent advances in tackling a wide range of new challenging applications and in enabling new frontiers in Big Data. However, as computing power increases, the programming complexity required to take optimal advantage of the machine's resources has increased significantly. High-performance computing (HPC) techniques are crucial in realizing the full potential of parallel computing. This research is an interdisciplinary effort focusing on two major directions. The first involves the introduction of HPC techniques to substantially improve the performance of complex biological agent-based model (ABM) simulations, more specifically simulations related to the inflammatory and healing responses of vocal folds at the physiological scale in mammals. The second direction involves improvements and extensions of the existing state-of-the-art vocal fold repair models. These improvements and extensions include comprehensive visualization of large data sets generated by the model and a significant increase in user-simulation interactivity. We developed a highly interactive remote simulation and visualization framework for vocal fold (VF) agent-based modeling (ABM). The 3D VF ABM was verified through comparisons with empirical vocal fold data. Representative trends of biomarker predictions in surgically injured vocal folds were observed. The physiologically representative human VF ABM consisted of more than 15 million mobile biological cells. The model maintained and generated 1.7 billion signaling and extracellular matrix (ECM) protein data points in each iteration. The VF ABM employed HPC techniques to optimize its performance by concurrently utilizing the power of a multi-core CPU and multiple GPUs. The optimization techniques included the minimization of data transfer between the CPU host and the rendering GPU. These transfer minimization techniques also reduced transfers between peer GPUs in multi-GPU setups. The data transfer minimization techniques were executed with a scheduling scheme that aimed to achieve load balancing, maximum overlap of computation and communication, and a high degree of interactivity. This scheduling scheme achieved optimal interactivity by hyper-tasking the available GPUs (GHT). In comparison to the original serial implementation on a popular ABM framework, NetLogo, these schemes have shown substantial performance improvements of 400x and 800x for the 2D and 3D models, respectively. Furthermore, the combination of data footprint and data transfer reduction techniques with GHT achieved highly interactive visualization with an average framerate of 42.8 fps. This performance enabled users to perform real-time data exploration of large simulated outputs and steer the course of their simulation as needed.
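    As a generic illustration of the per-iteration work behind the signaling and ECM data points mentioned above, the sketch below performs one kind of ABM field update, explicit diffusion of a signaling chemical on a double-buffered 2D grid, in plain C. It is not the thesis's multi-GPU implementation; the grid size and coefficients are assumed.

```c
/* Generic sketch (not the thesis's GPU code): one kind of per-iteration
 * ABM field update, explicit diffusion of a signaling chemical on a
 * double-buffered 2D grid, so a step never overwrites values it still reads. */
#include <stdio.h>
#include <stdlib.h>

#define NX 512
#define NY 512

static void diffuse_step(const float *in, float *out, float k)
{
    for (int j = 1; j < NY - 1; ++j)
        for (int i = 1; i < NX - 1; ++i) {
            int c = j * NX + i;
            /* 5-point Laplacian of the chemical concentration */
            float lap = in[c - 1] + in[c + 1] + in[c - NX] + in[c + NX] - 4.0f * in[c];
            out[c] = in[c] + k * lap;
        }
}

int main(void)
{
    float *a = calloc(NX * NY, sizeof *a);
    float *b = calloc(NX * NY, sizeof *b);
    int center = (NY / 2) * NX + NX / 2;
    a[center] = 1.0f;                        /* a single secreting source cell */

    for (int it = 0; it < 100; ++it) {
        diffuse_step(a, b, 0.1f);
        float *tmp = a; a = b; b = tmp;      /* swap read/write buffers */
    }

    printf("concentration at source after 100 steps: %g\n", a[center]);
    free(a);
    free(b);
    return 0;
}
```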

    Low-Impact Profiling of Streaming, Heterogeneous Applications

    Computer engineers are continually faced with the task of translating improvements in fabrication process technology (i.e., Moore's Law) into architectures that allow computer scientists to accelerate application performance. As feature size continues to shrink, architects of commodity processors are designing increasingly more cores on a chip. While additional cores can operate independently on some tasks (e.g., the OS and user tasks), many applications see little to no improvement from adding more processor cores alone. For many applications, heterogeneous systems offer a path toward higher performance. Significant performance and power gains have been realized by combining specialized processors (e.g., Field-Programmable Gate Arrays, Graphics Processing Units) with general-purpose multi-core processors. Heterogeneous applications need to be programmed differently than traditional software. One approach, stream processing, fits these systems particularly well because of the segmented memories and explicit expression of parallelism. Unfortunately, debugging and performance tools that support streaming, heterogeneous applications do not exist. This dissertation presents TimeTrial, a performance measurement system that enables performance optimization of streaming applications by profiling the application deployed on a heterogeneous system. TimeTrial performs low-impact measurements by dedicating computing resources to monitoring and by aggressively compressing performance traces into statistical summaries guided by user specification of the performance queries of interest.
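    A minimal sketch of the trace-compression idea, reducing a stream of measurements to a constant-size statistical summary (count, mean, variance via Welford's online update, min, and max), is shown below. The structure and sample values are illustrative; TimeTrial's actual instrumentation and query specification are not reproduced here.

```c
/* Minimal sketch of compressing a performance trace into a running summary.
 * Instead of storing every observation (e.g. a per-item queue latency),
 * only O(1) state is kept, using Welford's online mean/variance update. */
#include <float.h>
#include <math.h>
#include <stdio.h>

typedef struct {
    long   n;
    double mean, m2;      /* running mean and sum of squared deviations */
    double min, max;
} Summary;

static void summary_init(Summary *s)
{
    s->n = 0; s->mean = 0.0; s->m2 = 0.0;
    s->min = DBL_MAX; s->max = -DBL_MAX;
}

static void summary_add(Summary *s, double x)
{
    s->n++;
    double delta = x - s->mean;
    s->mean += delta / s->n;
    s->m2   += delta * (x - s->mean);
    if (x < s->min) s->min = x;
    if (x > s->max) s->max = x;
}

int main(void)
{
    Summary lat;
    summary_init(&lat);

    /* Stand-in for instrumented measurements streaming past the monitor. */
    double samples[] = { 1.2, 0.9, 1.5, 1.1, 2.3, 0.8 };
    for (int i = 0; i < 6; ++i)
        summary_add(&lat, samples[i]);

    printf("n=%ld mean=%.3f stddev=%.3f min=%.3f max=%.3f\n",
           lat.n, lat.mean, sqrt(lat.m2 / (lat.n - 1)), lat.min, lat.max);
    return 0;
}
```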

    Multi-scale modeling of particle-laden flows

    Particle-laden flows occur in a wide range of engineering applications such as combustors, gasifiers, fluidized beds, and pollution control systems. Particle-flow interactions are complex, especially in turbulent and confined flows. A proper understanding of these interactions is critical in designing devices with better performance characteristics. In this work, particle-laden flows in channels are numerically investigated with the lattice-Boltzmann method (LBM). A three-dimensional parallelized lattice-Boltzmann code is developed to carry out these studies. The code resolves the particle surface and the boundary layer surrounding it to gain fundamental insights into particle-flow interactions. The lattice-Boltzmann method is assessed for its accuracy in solving several standard single-phase and multi-phase, laminar and turbulent flows. Direct numerical simulations (DNS) of particle-laden channel flows are then performed. When the particle diameter is smaller than the Kolmogorov length scale, direct numerical simulations with the point-particle approximation show that the Stokes number (St), the mass loading of particles (i.e., the ratio of the mass of the dispersed phase to that of the carrier phase), and the particle diameter are important parameters that determine the distribution of the particles across the channel cross-section and the impact of the particles on the flow field. When St is infinitesimally small, the particles are uniformly distributed across the cross-section of the channel. As St is increased, the particle concentration near the wall increases. At even higher St, the particle concentration near the wall decreases, but it increases at the center of the channel. These changes in concentration are attributed to turbophoresis, which causes preferential movement of the particles. The impact of the turbophoretic force is affected by St and the particle diameter. The parameter that primarily influences the mean flow field of the carrier phase is the mass loading. To further improve the understanding of the physics of the flow, particle-resolved direct numerical simulations (PR-DNS) are carried out. Particle motion in a laminar channel flow is initially studied. The trajectory of a single particle is examined. It is shown that the mean equilibrium position of the particle in the channel depends on St. Particles with low St reach an equilibrium position that lies between the wall and the center of the channel (Segre-Silberberg effect), while those with high St begin to oscillate about the center of the channel as they are transported by the fluid. The particle location and motion are determined by the interplay of three forces acting on the particle in the wall-normal direction: the Saffman lift, the Magnus lift, and wall repulsion. The Saffman and Magnus lift forces act to move the particle towards the wall, while wall repulsion opposes this motion. Direct numerical simulations of turbulent flow past stationary particles in a channel are then carried out. These simulations provide information about particle-flow interactions when the particle is near the wall and at the center. Multiple particles fixed in a cross-sectional plane are also considered. The position of the particles in the channel, the particle size, the Reynolds number, and the number of particles are varied. The details of the flow field are analyzed to provide insight into the factors that control the distance of influence of the fixed particle on the flow field.
    In the single-particle case, the effect of the particle is felt for about 20 diameters downstream. When multiple particles are present, interaction between the vortices shed by the particles lengthens this distance to about 40 diameters downstream. The results suggest that in a particle-laden flow, if particles are separated by an average distance greater than 40 diameters, particle-fluid-particle interactions can be neglected. At shorter distances, these interactions become important. Next, particle-resolved direct numerical simulations (PR-DNS) in a turbulent channel flow are carried out to study the particle motion when the particle diameter is larger than the Kolmogorov length scale. It is shown that in a turbulent channel flow, the dominant forces are the Saffman lift and turbophoresis. When the particle is larger than the Kolmogorov length scale, turbophoresis can act in a local sense, whereby the more intense momentum exchange with eddies on the side of the particle with higher turbulent kinetic energy moves the particle toward the region of lower turbulent kinetic energy, or in a global sense, whereby particles tend to diffuse down gradients of turbulent kinetic energy even when they do not directly feel the effect of individual eddies. The simulations show that particles with relatively lower St move preferentially toward the wall, while those with higher St exhibit a relatively uniform concentration. This is consistent with the conclusion from the point-particle simulations. As the particle size is increased, the St at which a uniform distribution is reached increases. The likely reason is that the effect of local turbophoresis and Saffman lift increases for larger particles, and these forces tend to concentrate particles near the wall; a higher St, i.e., higher inertia, is needed to overcome these forces.
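    For reference, the Stokes number used throughout is the ratio of the particle response time to a fluid time scale, St = tau_p / tau_f with tau_p = rho_p d^2 / (18 mu); the small worked computation below uses the Kolmogorov time as the fluid scale, with all numerical values assumed rather than taken from the simulations.

```c
/* Worked illustration of the Stokes number:
 *   St = tau_p / tau_f,  with  tau_p = rho_p * d^2 / (18 * mu)
 * and the Kolmogorov time  tau_eta = sqrt(nu / eps)  as the fluid scale.
 * All numerical values below are assumed, not taken from the simulations. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double rho_p = 2500.0;  /* particle density        [kg/m^3]   */
    const double d     = 50e-6;   /* particle diameter       [m]        */
    const double mu    = 1.0e-3;  /* fluid dynamic viscosity [Pa s]     */
    const double rho_f = 1000.0;  /* fluid density           [kg/m^3]   */
    const double eps   = 1.0e-3;  /* dissipation rate        [m^2/s^3]  */

    double nu      = mu / rho_f;                  /* kinematic viscosity     */
    double tau_p   = rho_p * d * d / (18.0 * mu); /* particle response time  */
    double tau_eta = sqrt(nu / eps);              /* Kolmogorov time scale   */

    printf("tau_p   = %.3e s\n", tau_p);
    printf("tau_eta = %.3e s\n", tau_eta);
    printf("St      = %.3f\n", tau_p / tau_eta);
    return 0;
}
```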

    Research and Technology, 1994

    This report selectively summarizes the NASA Lewis Research Center's research and technology accomplishments for the fiscal year 1994. It comprises approximately 200 short articles submitted by the staff members of the technical directorates. The report is organized into six major sections: Aeronautics, Aerospace Technology, Space Flight Systems, Engineering and Computational Support, Lewis Research Academy, and Technology Transfer. A table of contents and author index have been developed to assist the reader in finding articles of special interest. This report is not intended to be a comprehensive summary of all research and technology work done over the past fiscal year. Most of the work is reported in Lewis-published technical reports, journal articles, and presentations prepared by Lewis staff members and contractors. In addition, university grants have enabled faculty members and graduate students to engage in sponsored research that is reported at technical meetings or in journal articles. For each article in this report a Lewis contact person has been identified, and where possible, reference documents are listed so that additional information can be easily obtained. The diversity of topics attests to the breadth of research and technology being pursued and to the skill mix of the staff that makes it possible.