2,863 research outputs found
Recommended from our members
Improving Performance of M-to-N Processing and Data Redistribution in In Transit Analysis and Visualization
In an in transit setting, a parallel data producer, such as a numerical simulation, runs on one set of ranks M, while a data consumer, such as a parallel visualization application, runs on a different set of ranks N. One of the central challenges in this in transit setting is to determine the mapping of data from the set of M producer ranks to the set of N consumer ranks. This is a challenging problem for several reasons, such as the producer and consumer codes potentially having different scaling characteristics and different data models. The resulting mapping from M to N ranks can have a significant impact on aggregate application performance. In this work, we present an approach for performing this M-to-N mapping in a way that has broad applicability across a diversity of data producer and consumer applications. We evaluate its design and performance with
a study that runs at high concurrency on a modern HPC platform. By leveraging design characteristics, which facilitate an “intelligent” mapping from M-to-N, we observe significant performance gains are possible in terms of several different metrics, including time-to-solution and amount of data moved
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many
high-performance graph algorithms as well as for some linear solvers, such as
algebraic multigrid. The scaling of existing parallel implementations of SpGEMM
is heavily bound by communication. Even though 3D (or 2.5D) algorithms have
been proposed and theoretically analyzed in the flat MPI model on Erdos-Renyi
matrices, those algorithms had not been implemented in practice and their
complexities had not been analyzed for the general case. In this work, we
present the first ever implementation of the 3D SpGEMM formulation that also
exploits multiple (intra-node and inter-node) levels of parallelism, achieving
significant speedups over the state-of-the-art publicly available codes at all
levels of concurrencies. We extensively evaluate our implementation and
identify bottlenecks that should be subject to further research
HIV and Concurrent Sexual Partnerships: Modelling the Role of Coital Dilution
Background: The concurrency hypothesis asserts that high prevalence of overlapping sexual partnerships explains extraordinarily high HIV levels in sub-Saharan Africa. Earlier simulation models show that the network effect of concurrency can increase HIV incidence, but those models do not account for the coital dilution effect (nonprimary partnerships have lower coital frequency than primary partnerships).
Methods: We modify the model of Eaton et al (AIDS and Behavior, September 2010) to incorporate coital dilution by assigning lower coital frequencies to non-primary partnerships. We parameterize coital dilution based on the empirical work of Morris et al (PLoS ONE, December 2010) and others. Following Eaton et al, we simulate the daily transmission of HIV over 250 years for 10 levels of concurrency.
Results: At every level of concurrency, our focal coital-dilution simulation produces epidemic extinction. Our sensitivity analysis shows that this result is quite robust; even modestly lower coital frequencies in non-primary partnerships lead to epidemic extinction.
Conclusions: In order to contribute usefully to the investigation of HIV prevalence, simulation models of concurrent partnering and HIV epidemics must incorporate realistic degrees of coital dilution. Doing so dramatically reduces the role that concurrency can play in accelerating the spread of HIV and suggests that concurrency cannot be an important driver of HIV epidemics in sub-Saharan Africa. Alternative explanations for HIV epidemics in sub- Saharan Africa are needed
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Actors vs Shared Memory: two models at work on Big Data application frameworks
This work aims at analyzing how two different concurrency models, namely the
shared memory model and the actor model, can influence the development of
applications that manage huge masses of data, distinctive of Big Data
applications. The paper compares the two models by analyzing a couple of
concrete projects based on the MapReduce and Bulk Synchronous Parallel
algorithmic schemes. Both projects are doubly implemented on two concrete
platforms: Akka Cluster and Managed X10. The result is both a conceptual
comparison of models in the Big Data Analytics scenario, and an experimental
analysis based on concrete executions on a cluster platform
Distributed Environment for Efficient Virtual Machine Image Management in Federated Cloud Architectures
The use of Virtual Machines (VM) in Cloud computing provides various benefits in the overall software engineering lifecycle. These include efficient elasticity mechanisms resulting in higher resource utilization and lower operational costs. VM as software artifacts are created using provider-specific templates, called VM images (VMI), and are stored in proprietary or public repositories for further use. However, some technology specific choices can limit the interoperability among various Cloud providers and bundle the VMIs with nonessential or redundant software packages, leading to increased storage size, prolonged VMI delivery, stagnant VMI instantiation and ultimately vendor lock-in. To address these challenges, we present a set of novel functionalities and design approaches for efficient operation of distributed VMI repositories, specifically tailored for enabling: (i) simplified creation of lightweight and size optimized VMIs tuned for specific application requirements; (ii) multi-objective VMI repository optimization; and (iii) efficient reasoning mechanism to help optimizing complex VMI operations. The evaluation results confirm that the presented approaches can enable VMI size reduction by up to 55%, while trimming the image creation time by 66%. Furthermore, the repository optimization algorithms, can reduce the VMI delivery time by up to 51% and cut down the storage expenses by 3%. Moreover, by implementing replication strategies, the optimization algorithms can increase the system reliability by 74%
- …