
    Enhanced molecular dynamics performance with a programmable graphics processor

    Design considerations for molecular dynamics algorithms capable of taking advantage of the computational power of a graphics processing unit (GPU) are described. Accommodating the constraints of scalable streaming-multiprocessor hardware necessitates a reformulation of the underlying algorithm. Performance measurements demonstrate the considerable benefit and cost-effectiveness of such an approach, which produces a factor of 2.5 speed improvement over previous work for the case of the soft-sphere potential. Comment: 20 pages (v2: minor additions and changes; v3: corrected typos)

    Cortical Surface Area Differentiates Familial High Risk Individuals Who Go on to Develop Schizophrenia

    BACKGROUND: Schizophrenia is associated with structural brain abnormalities that may be present before disease onset. It remains unclear whether these represent general vulnerability indicators or are associated with the clinical state itself. METHODS: To investigate this, structural brain scans were acquired at two time points (mean scan interval 1.87 years) in a cohort of individuals at high familial risk of schizophrenia (n = 142) and control subjects (n = 36). Cortical reconstructions were generated using FreeSurfer. The high-risk cohort was subdivided into individuals that remained well during the study, individuals that had transient psychotic symptoms, and individuals that subsequently became ill. Baseline measures and longitudinal change in global estimates of thickness and surface area and lobar values were compared, focusing on overall differences between high-risk individuals and control subjects and then on group differences within the high-risk cohort. RESULTS: Longitudinally, control subjects showed a significantly greater reduction in cortical surface area compared with the high-risk group. Within the high-risk group, differences in surface area at baseline predicted clinical course, with individuals that subsequently became ill having significantly larger surface area than individuals that remained well during the study. For thickness, longitudinal reductions were most prominent in the frontal, cingulate, and occipital lobes in all high-risk individuals compared with control subjects. CONCLUSIONS: Our results suggest that larger surface areas at baseline may be associated with mechanisms that go above and beyond a general familial disposition. A relative preservation over time of surface area, coupled with a thinning of the cortex compared with control subjects, may serve as vulnerability markers of schizophrenia.

    An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

    Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute-node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.

    A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters

    Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods while focusing on the scalability of the overall solver. In the present study we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA’s recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters are related to the software choices made to implement the multigrid method. We propose a new coarse grid solution method of embedded multigrid with amalgamation and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs.

    Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

    High performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown using computationally large computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant advantage in performance over the dual-level implementation; however, further research is needed to justify our conclusion for a cluster with a high GPU per node density or when using software that can utilize OpenMP’s fine-grain parallelism more effectively.

    Measurement of the Isolated Photon Cross Section in p-pbar Collisions at sqrt{s}=1.96 TeV

    The cross section for the inclusive production of isolated photons has been measured in p anti-p collisions at sqrt{s}=1.96 TeV with the D0 detector at the Fermilab Tevatron Collider. The photons span transverse momenta 23 to 300 GeV and have pseudorapidity |eta|<0.9. The cross section is compared with the results from two next-to-leading order perturbative QCD calculations. The theoretical predictions agree with the measurement within uncertainties. Comment: 7 pages, 5 figures, submitted to Phys. Lett.