5,208 research outputs found

    Distributed-memory parallelization of an explicit time-domain volume integral equation solver on Blue Gene/P

    Two distributed-memory schemes for efficiently parallelizing the explicit marching-on-in-time based solution of the time-domain volume integral equation on the IBM Blue Gene/P platform are presented. In the first scheme, each processor stores the time history of all source fields and only the computationally dominant step of the tested field computations is distributed among processors. This scheme requires all-to-all global communications to update the time history of the source fields from the tested fields. In the second scheme, the source fields as well as all steps of the tested field computations are distributed among processors. This scheme requires sequential global communications to update the time history of the distributed source fields from the tested fields. Numerical results demonstrate that both schemes scale well on the IBM Blue Gene/P platform and that the memory-efficient second scheme allows for the characterization of transient wave interactions on composite structures discretized using three million spatial elements without an acceleration algorithm.

    The parallel computation of Morse-Smale complexes

    Topology-based techniques are useful for multi-scale exploration of the feature space of scalar-valued functions, such as those derived from the output of large-scale simulations. The Morse-Smale (MS) complex, in particular, allows robust identification of gradient-based features, and therefore is suitable for analysis tasks in a wide range of application domains. In this paper, we develop a two-stage algorithm to construct the Morse-Smale complex in parallel, the first stage independently computing local features per block and the second stage merging to resolve global features. Our implementation is based on MPI and a distributed-memory architecture. Through a set of scalability studies on the IBM Blue Gene/P supercomputer, we characterize the performance of the algorithm as block sizes, process counts, merging strategy, and levels of topological simplification are varied, for datasets that vary in feature composition and size. We conclude with a strong scaling study using scientific datasets computed by combustion and hydrodynamics simulations.

    ViSUS: Visualization Streams for Ultimate Scalability


    Integration of continuous-time dynamics in a spiking neural network simulator

    Contemporary modeling approaches to the dynamics of neural networks consider two main classes of models: biologically grounded spiking neurons and functionally inspired rate-based units. The unified simulation framework presented here supports the combination of the two for multi-scale modeling approaches, the quantitative validation of mean-field approaches by spiking network simulations, and an increase in reliability through the use of the same simulation code and the same network model specifications for both model classes. While most efficient spiking simulations rely on the communication of discrete events, rate models require time-continuous interactions between neurons. Exploiting the conceptual similarity to the inclusion of gap junctions in spiking network simulations, we arrive at a reference implementation of instantaneous and delayed interactions between rate-based models in a spiking network simulator. The separation of rate dynamics from the general connection and communication infrastructure ensures flexibility of the framework. We further demonstrate the broad applicability of the framework by considering various examples from the literature, ranging from random networks to neural field models. The study provides the prerequisite for interactions between rate-based and spiking models in a joint simulation.

    Computational Physics on Graphics Processing Units

    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a variety of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using density functional theory, wave function techniques, and quantum field theory. Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

    Lattice Boltzmann simulations of anisotropic particles at liquid interfaces

    Complex colloidal fluids, such as emulsions stabilized by particles with complex shapes, play an important role in many industrial applications. However, understanding their physics requires a study at sufficiently large length scales while still resolving the microscopic structure of a large number of particles and of the local hydrodynamics. Due to its high degree of locality, the lattice Boltzmann method, when combined with a molecular dynamics solver and parallelized on modern supercomputers, provides a tool that allows such studies. Still, running simulations on hundreds of thousands of cores is not trivial. We report on our practical experiences when employing large fractions of an IBM Blue Gene/P system for our simulations. Then, we extend our model for spherical particles in multicomponent flows to anisotropic ellipsoidal objects that approximate the shape of, e.g., clay particles. The model is applied to a number of test cases, including the adsorption of single particles at fluid interfaces and the formation and stabilization of Pickering emulsions or bijels.

    Parallel volume rendering for large scientific data

    Data sets of immense size are regularly generated by large scale computing resources. Even among more traditional methods for acquisition of volume data, such as MRI and CT scanners, data which is too large to be effectively visualized on standard workstations is now commonplace. One solution to this problem is to employ a 'visualization cluster,' a small to medium scale cluster dedicated to performing visualization and analysis of massive data sets generated on larger scale supercomputers. These clusters are designed to fulfill a different need than traditional supercomputers, and therefore their design mandates different hardware choices, such as increased memory, and more recently, graphics processing units (GPUs). While there has been much previous work on distributed memory visualization as well as GPU visualization, there is a relative dearth of algorithms which effectively use GPUs at a large scale in a distributed memory environment. In this work, we study a common visualization technique in a GPU-accelerated, distributed memory setting, and present performance characteristics when scaling to extremely large data sets.

    Doctor of Philosophy

    The increase in computational power of supercomputers is enabling complex scientific phenomena to be simulated at ever-increasing resolution and fidelity. With these simulations routinely producing large volumes of data, performing efficient I/O at this scale has become a very difficult task. Large-scale parallel writes are challenging due to the complex interdependencies between I/O middleware and hardware. Analytics-appropriate reads are traditionally hindered by bottlenecks in I/O access. Moreover, the two components of I/O, data generation from simulations (writes) and data exploration for analysis and visualization (reads), have substantially different data access requirements. Parallel writes, performed on supercomputers, often deploy aggregation strategies to permit large-sized contiguous access. Analysis and visualization tasks, usually performed on computationally modest resources, require fast access to localized subsets or multiresolution representations of the data. This dissertation tackles the problem of parallel I/O while bridging the gap between large-scale writes and analytics-appropriate reads. The focus of this work is to develop an end-to-end adaptive-resolution data movement framework that provides efficient I/O, while supporting the full spectrum of modern HPC hardware. This is achieved by developing technology for highly scalable and tunable parallel I/O, applicable to both traditional parallel data formats and multiresolution data formats, which are directly appropriate for analysis and visualization. To demonstrate the efficacy of the approach, a novel library (PIDX) is developed that is highly tunable and capable of adaptive-resolution parallel I/O to a multiresolution data format. Adaptive-resolution storage and I/O, which allows subsets of a simulation to be accessed at varying spatial resolutions, can yield significant improvements to both the storage performance and I/O time. The library provides a set of parameters that controls the storage format and the nature of data aggregation across the network; further, a machine learning-based model is constructed that tunes these parameters for maximum throughput. This work is empirically demonstrated by showing parallel I/O scaling up to 768K cores within a framework flexible enough to handle adaptive-resolution I/O.

    Effective elimination of Staphylococcal contamination from hospital surfaces by a bacteriophage-probiotic sanitation strategy: a monocentric study.

    Persistent contamination of hospital surfaces and antimicrobial resistance (AMR) are recognized as major causes of healthcare-associated infections (HAIs). We recently showed that a probiotic-based sanitation (PCHS) can stably decrease surface pathogens and reduce AMR and HAIs. However, PCHS action is slow and non-specific. By contrast, bacteriophages have been proposed as a decontamination method as they can rapidly attack specific targets, but their routine application has never been tested. Here we analyzed the feasibility and effectiveness of phage addition to PCHS sanitation, aiming to obtain a rapid and stable abatement of specific pathogens in the hospital environment. Staphylococcal contamination in the bathrooms of General Medicine wards was analyzed, as these areas are the most contaminated and Staphylococci are the most prevalent bacteria in such settings. Results showed that a daily phage application by nebulization induced a rapid and significant decrease of Staphylococcus spp. load on treated surfaces, up to 97% more than PCHS alone (p<0.001), suggesting that such a system might be considered as part of prevention and control strategies, to counteract outbreaks of specific pathogens and prevent associated infections.