19,859 research outputs found
From Quantity to Quality: Massive Molecular Dynamics Simulation of Nanostructures under Plastic Deformation in Desktop and Service Grid Distributed Computing Infrastructure
The distributed computing infrastructure (DCI) on the basis of BOINC and
EDGeS-bridge technologies for high-performance distributed computing is used
for porting the sequential molecular dynamics (MD) application to its parallel
version for DCI with Desktop Grids (DGs) and Service Grids (SGs). The actual
metrics of the working DG-SG DCI were measured, and the normal distribution of
host performances, and signs of log-normal distributions of other
characteristics (CPUs, RAM, and HDD per host) were found. The practical
feasibility and high efficiency of the MD simulations on the basis of DG-SG DCI
were demonstrated during the experiment with the massive MD simulations for the
large quantity of aluminum nanocrystals (-). Statistical
analysis (Kolmogorov-Smirnov test, moment analysis, and bootstrapping analysis)
of the defect density distribution over the ensemble of nanocrystals showed
that a change of plastic deformation mode is followed by a qualitative change
in the type of defect density distribution over the ensemble of nanocrystals. Some
limitations (fluctuating performance, unpredictable availability of resources,
etc.) of the typical DG-SG DCI were outlined, and some advantages (high
efficiency, high speedup, and low cost) were demonstrated. Deploying on DG DCI
makes it possible to obtain new scientific quality from the simulated quantity
of numerous configurations by harnessing sufficient computational power to
undertake MD simulations in a wider range of physical parameters
(configurations) in a much shorter timeframe. Comment: 13 pages, 11 pages (http://journals.agh.edu.pl/csci/article/view/106
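The abstract above names the Kolmogorov-Smirnov test as one of the tools used to tell distribution types apart. As a minimal sketch only, assuming a one-sample test against a fully specified reference distribution, the KS statistic can be computed as below. The function names `normal_cdf` and `ks_statistic` are hypothetical and are not taken from the paper, whose actual procedure is not described in this listing.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDF of `sample`
    and the reference `cdf`, checked just before and just after each step
    of the empirical CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Illustrative, made-up data (not from the paper).
data = [0.8, 1.1, 1.3, 2.0, 4.5]
d_normal = ks_statistic(data, lambda x: normal_cdf(x, 0.0, 1.0))
```

To check a sample for signs of a log-normal distribution with the same sketch, one could apply `ks_statistic` to the logarithms of the (positive) values and compare against a normal reference; a smaller statistic indicates a closer fit.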
A methodology for exploiting parallelism in the finite element process
A methodology is described for developing a parallel system using a top-down approach that takes the requirements of the user into account. Substructuring, a popular technique in structural analysis, is used to illustrate this approach.
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net. Comment: 10 pages, 6 figures
A Library for Pattern-based Sparse Matrix Vector Multiply
Pattern-based Representation (PBR) is a novel approach to improving the performance of Sparse Matrix-Vector Multiply (SMVM) numerical kernels. Motivated by our observation that many matrices can be divided into blocks that share a small number of distinct patterns, we generate custom multiplication kernels for frequently recurring block patterns.
The resulting reduction in index overhead significantly reduces memory bandwidth requirements and improves performance. Unlike existing methods, PBR requires neither detection of dense blocks nor zero filling, making it particularly advantageous for matrices that lack dense nonzero concentrations. SMVM kernels for PBR can benefit from explicit prefetching and vectorization, and are amenable to parallelization. The analysis and format conversion to PBR is implemented as a library, making it suitable for applications that generate matrices dynamically at runtime. We present sequential and parallel performance results for PBR on two current multicore architectures, which show that PBR outperforms available alternatives for the matrices to which it is applicable, and that the analysis and conversion overhead is amortized in realistic application scenarios.
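The idea of grouping blocks by a shared nonzero pattern can be illustrated with a small sketch. This is not the PBR library or its storage format: the 2x2 block size, the bitmask encoding of a pattern, and the names `to_pattern_blocks`, `offsets_for`, and `spmv` are all assumptions made for illustration. The real library generates custom multiplication kernels per pattern, which this sketch only imitates by decoding each pattern once and reusing it for every block that shares it.

```python
from collections import defaultdict

B = 2  # block edge length (an assumption for this sketch)


def to_pattern_blocks(dense):
    """Group the nonzero BxB blocks of a dense matrix by nonzero pattern.

    Returns {pattern_bitmask: [(block_row, block_col, values), ...]}, where
    `values` holds only the nonzeros of the block in row-major order, so no
    per-element column index is stored.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    groups = defaultdict(list)
    for bi in range(0, n_rows, B):
        for bj in range(0, n_cols, B):
            mask, values = 0, []
            for dr in range(B):
                for dc in range(B):
                    r, c = bi + dr, bj + dc
                    if r < n_rows and c < n_cols and dense[r][c] != 0:
                        mask |= 1 << (dr * B + dc)
                        values.append(dense[r][c])
            if mask:
                groups[mask].append((bi, bj, values))
    return dict(groups)


def offsets_for(mask):
    """Decode a pattern bitmask into (row, col) offsets in row-major order."""
    return [(bit // B, bit % B) for bit in range(B * B) if (mask >> bit) & 1]


def spmv(groups, x, n_rows):
    """Compute y = A @ x from the pattern-grouped blocks."""
    y = [0.0] * n_rows
    for mask, blocks in groups.items():
        offs = offsets_for(mask)  # decoded once per pattern, not per block
        for bi, bj, values in blocks:
            for (dr, dc), v in zip(offs, values):
                y[bi + dr] += v * x[bj + dc]
    return y


# Illustrative matrix: two of its four nonzero blocks share the diagonal pattern.
A = [
    [1, 0, 0, 2],
    [0, 3, 0, 0],
    [0, 0, 4, 0],
    [5, 0, 0, 6],
]
groups = to_pattern_blocks(A)
y = spmv(groups, [1, 2, 3, 4], n_rows=4)
```

Because blocks with the same pattern share one set of offsets, the index information is stored per pattern rather than per nonzero, which is the source of the reduced index overhead that the abstract describes.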