Multigrid Method versus Staging Algorithm for PIMC Simulations
We present a comparison of the performance of two non-local update algorithms
for path integral Monte Carlo (PIMC) simulations, the multigrid Monte Carlo
method and the staging algorithm. Looking at autocorrelation times for the
internal energy, we show that both refined algorithms overcome the slowing down
encountered with standard local update schemes in the continuum limit.
We investigate the conditions under which the staging algorithm performs
optimally and give a brief discussion of the mutual merits of the two
algorithms.
Comment: 11 pp. LaTeX, 4 Postscript figures
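The staging update referenced above resamples a whole path segment between two fixed beads directly from the exact free-particle (Brownian-bridge) distribution, which is what makes it a non-local move. A minimal sketch for a 1-D path stored as a NumPy array (the function name, interface, and units are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def staging_move(path, start, length, tau, mass=1.0, hbar=1.0):
    """Resample the `length - 1` beads between path[start] and
    path[start + length] from the exact free-particle (Brownian-bridge)
    distribution, as in the staging algorithm.

    `path` is a 1-D array of bead positions, `tau` the imaginary-time step.
    """
    new = path.copy()
    end = start + length
    for k in range(1, length):
        links_left = length - k          # links from bead k to the fixed end
        prev = new[start + k - 1]
        # Conditional Gaussian for bead k given its left neighbor and the
        # fixed right endpoint (bridge mean and variance).
        mean = (links_left * prev + new[end]) / (links_left + 1)
        sigma2 = (links_left / (links_left + 1)) * hbar**2 * tau / mass
        new[start + k] = np.random.normal(mean, np.sqrt(sigma2))
    return new
```

For an interacting system this move would be followed by a Metropolis accept/reject step on the potential part of the action; for a free particle it is rejection-free.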
Optimal Energy Estimation in Path-Integral Monte Carlo Simulations
We investigate the properties of two standard energy estimators used in
path-integral Monte Carlo simulations. By disentangling the variance of the
estimators and their autocorrelation times we analyse the dependence of the
performance on the update algorithm and present a detailed comparison of
refined update schemes such as multigrid and staging techniques. We show that a
proper combination of the two estimators reduces the statistical error of the
estimated energy below that of the better single estimator, at no extra cost.
Comment: 45 pp. LaTeX, 22 Postscript figures
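The idea of combining two unbiased estimators of the same quantity can be illustrated with the standard variance-minimizing linear weighting. This is a generic sketch; the paper's actual weighting, which also accounts for autocorrelation times, may differ:

```python
import numpy as np

def combine_estimators(e1, e2):
    """Minimal-variance combination w*E1 + (1-w)*E2 of two unbiased,
    generally correlated estimators (e.g. two energy estimators), given
    time series e1 and e2 of their measurements."""
    cov = np.cov(e1, e2)                      # 2x2 sample covariance matrix
    v1, v2, c = cov[0, 0], cov[1, 1], cov[0, 1]
    # Weight that minimizes Var(w*E1 + (1-w)*E2).
    w = (v2 - c) / (v1 + v2 - 2.0 * c)
    mean = w * np.mean(e1) + (1.0 - w) * np.mean(e2)
    var = w**2 * v1 + (1.0 - w)**2 * v2 + 2.0 * w * (1.0 - w) * c
    return mean, var, w
```

By construction the resulting variance is never larger than that of the better of the two input estimators.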
Supervised machine learning based multi-task artificial intelligence classification of retinopathies
Artificial intelligence (AI) classification holds promise as a novel and
affordable screening tool for clinical management of ocular diseases. Rural and
underserved areas, which suffer from a lack of access to experienced
ophthalmologists, may particularly benefit from this technology. Quantitative
optical coherence tomography angiography (OCTA) imaging provides excellent
capability to identify subtle vascular distortions, which are useful for
classifying retinovascular diseases. However, application of AI for
differentiation and classification of multiple eye diseases is not yet
established. In this study, we demonstrate supervised machine learning based
multi-task OCTA classification. We sought 1) to differentiate normal from
diseased ocular conditions, 2) to differentiate different ocular disease
conditions from each other, and 3) to stage the severity of each ocular
condition. Quantitative OCTA features, including blood vessel tortuosity (BVT),
blood vascular caliber (BVC), vessel perimeter index (VPI), blood vessel
density (BVD), foveal avascular zone (FAZ) area (FAZ-A), and FAZ contour
irregularity (FAZ-CI) were fully automatically extracted from the OCTA images.
A stepwise backward elimination approach was employed to identify sensitive
OCTA features and optimal-feature-combinations for the multi-task
classification. For proof-of-concept demonstration, diabetic retinopathy (DR)
and sickle cell retinopathy (SCR) were used to validate the supervised machine
learning classifier. The presented AI classification methodology can be readily
extended to other ocular diseases, holding promise to enable a mass-screening
platform for clinical deployment and telemedicine.
Comment: Supplemental material attached at the end
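A stepwise backward elimination of the kind mentioned above starts from the full feature set and repeatedly drops the feature whose removal yields the best score, stopping when every removal hurts. A hedged sketch (the interface, stopping rule, and scoring function are illustrative assumptions, not the paper's exact protocol):

```python
import numpy as np

def backward_elimination(X, y, feature_names, score_fn, min_features=1):
    """Stepwise backward elimination over the columns of X.

    `score_fn(X_sub, y)` returns a classification score (higher is better),
    e.g. cross-validated accuracy. Returns the surviving feature names and
    the final score."""
    selected = list(range(X.shape[1]))
    best = score_fn(X[:, selected], y)
    while len(selected) > min_features:
        # Score every candidate one-feature removal.
        trials = [(score_fn(X[:, [g for g in selected if g != f]], y), f)
                  for f in selected]
        score, worst = max(trials)
        if score < best:
            break                         # every removal hurts: stop
        best = score
        selected = [g for g in selected if g != worst]
    return [feature_names[i] for i in selected], best
```

In the paper's setting, `X` would hold the extracted OCTA features (BVT, BVC, VPI, BVD, FAZ-A, FAZ-CI) and `score_fn` a classifier's validation accuracy.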
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Dense Multi-GPU systems have recently gained a lot of attention in the HPC
arena. Traditionally, MPI runtimes have been primarily designed for clusters
with a large number of nodes. However, with the advent of MPI+CUDA applications
and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important
to address efficient communication schemes for such dense Multi-GPU nodes. This,
coupled with new application workloads brought forward by Deep Learning
frameworks like Caffe and Microsoft CNTK, poses additional design constraints due
to the very large GPU-buffer messages communicated during the training phase.
In this context, special-purpose libraries like NVIDIA NCCL have been proposed
for GPU-based collective communication on dense GPU systems. In this paper, we
propose a pipelined chain (ring) design for the MPI_Bcast collective operation
along with an enhanced collective tuning framework in MVAPICH2-GDR that enables
efficient intra-/inter-node multi-GPU communication. We present an in-depth
performance landscape for the proposed MPI_Bcast schemes along with a
comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The
proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement,
compared to NCCL-based solutions, for intra- and inter-node broadcast latency,
respectively. In addition, the proposed designs provide up to 7% improvement
over NCCL-based solutions for data parallel training of the VGG network on 128
GPUs using Microsoft CNTK.
Comment: 8 pages, 3 figures
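The benefit of a pipelined chain (ring) broadcast comes from splitting the buffer into chunks so that every link is busy at once: the last rank finishes after about n_chunks + n_ranks - 2 steps, rather than n_chunks * (n_ranks - 1) for an unpipelined chain. A small scheduling sketch of this idea (a simplified cost model, not the MVAPICH2-GDR implementation):

```python
def ring_broadcast_schedule(n_ranks, n_chunks):
    """Transfer schedule for a pipelined chain broadcast rooted at rank 0.

    The root sends chunk c to rank 1 at step c; every other rank relays a
    chunk one step after receiving it. Returns, for each step, the list of
    (sender, receiver, chunk) transfers."""
    total_steps = n_chunks + n_ranks - 2      # pipeline fill + drain
    steps = []
    for t in range(total_steps):
        # Rank r forwards chunk (t - r) to rank r + 1, if that chunk exists.
        transfers = [(r, r + 1, t - r)
                     for r in range(min(t + 1, n_ranks - 1))
                     if 0 <= t - r < n_chunks]
        steps.append(transfers)
    return steps
```

With 4 ranks and 3 chunks this completes in 5 steps, versus 9 sequential chunk-hops for the unpipelined chain; the real design additionally tunes chunk sizes for intra- versus inter-node GPU links.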