69,118 research outputs found
Development of modularity in the neural activity of children's brains
We study how modularity of the human brain changes as children develop into
adults. Theory suggests that modularity can enhance the response function of a
networked system subject to changing external stimuli. Thus, greater cognitive
performance might be achieved for more modular neural activity, and modularity
might likely increase as children develop. The value of modularity calculated
from fMRI data is observed to increase during childhood development and peak in
young adulthood. Head motion is deconvolved from the fMRI data, and it is shown
that the dependence of modularity on age is independent of the magnitude of
head motion. A model is presented to illustrate how modularity can provide
greater cognitive performance at short times, i.e.\ task switching. A fitness
function is extracted from the model. Quasispecies theory is used to predict
how the average modularity evolves with age, illustrating the increase of
modularity during development from children to adults that arises from
selection for rapid cognitive function in young adults. Experiments exploring
the effect of modularity on cognitive performance are suggested. Modularity may
be a potential biomarker for injury, rehabilitation, or disease.Comment: 29 pages, 11 figure
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconïŹgurable computing has grown from a concept to a mature ïŹeld of science and technology. The cornerstone of this evolution is the ïŹeld programmable gate array, a building block enabling the conïŹguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reïŹnements
Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Graphics Processing Units (GPUs) are having a transformational effect on
numerical lattice quantum chromodynamics (LQCD) calculations of importance in
nuclear and particle physics. The QUDA library provides a package of mixed
precision sparse matrix linear solvers for LQCD applications, supporting single
GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This
library, interfaced to the QDP++/Chroma framework for LQCD calculations, is
currently in production use on the "9g" cluster at the Jefferson Laboratory,
enabling unprecedented price/performance for a range of problems in LQCD.
Nevertheless, memory constraints on current GPU devices limit the problem sizes
that can be tackled. In this contribution we describe the parallelization of
the QUDA library onto multiple GPUs using MPI, including strategies for the
overlapping of communication and computation. We report on both weak and strong
scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in
excess of 4 Tflops.Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing
2010 (submitted April 12, 2010
Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures
Coarse grain (CG) molecular models have been proposed to simulate complex sys- tems with lower computational overheads and longer timescales with respect to atom- istic level models. However, their acceleration on parallel architectures such as Graphic Processing Units (GPU) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specic optimizations for CG models, such as dedicated data structures to handle dierent bead type interac- tions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three dierent GPU architectures as case studie
Large-scale linear regression: Development of high-performance routines
In statistics, series of ordinary least squares problems (OLS) are used to
study the linear correlation among sets of variables of interest; in many
studies, the number of such variables is at least in the millions, and the
corresponding datasets occupy terabytes of disk space. As the availability of
large-scale datasets increases regularly, so does the challenge in dealing with
them. Indeed, traditional solvers---which rely on the use of black-box"
routines optimized for one single OLS---are highly inefficient and fail to
provide a viable solution for big-data analyses. As a case study, in this paper
we consider a linear regression consisting of two-dimensional grids of related
OLS problems that arise in the context of genome-wide association analyses, and
give a careful walkthrough for the development of {\sc ols-grid}, a
high-performance routine for shared-memory architectures; analogous steps are
relevant for tailoring OLS solvers to other applications. In particular, we
first illustrate the design of efficient algorithms that exploit the structure
of the OLS problems and eliminate redundant computations; then, we show how to
effectively deal with datasets that do not fit in main memory; finally, we
discuss how to cast the computation in terms of efficient kernels and how to
achieve scalability. Importantly, each design decision along the way is
justified by simple performance models. {\sc ols-grid} enables the solution of
correlated OLS problems operating on terabytes of data in a matter of
hours
Structural dynamics branch research and accomplishments for fiscal year 1987
This publication contains a collection of fiscal year 1987 research highlights from the Structural Dynamics Branch at NASA Lewis Research Center. Highlights from the branch's four major work areas, Aeroelasticity, Vibration Control, Dynamic Systems, and Computational Structural Methods, are included in the report as well as a complete listing of the FY87 branch publications
Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond
Various strategies to implement efficiently QMC simulations for large
chemical systems are presented. These include: i.) the introduction of an
efficient algorithm to calculate the computationally expensive Slater matrices.
This novel scheme is based on the use of the highly localized character of
atomic Gaussian basis functions (not the molecular orbitals as usually done),
ii.) the possibility of keeping the memory footprint minimal, iii.) the
important enhancement of single-core performance when efficient optimization
tools are employed, and iv.) the definition of a universal, dynamic,
fault-tolerant, and load-balanced computational framework adapted to all kinds
of computational platforms (massively parallel machines, clusters, or
distributed grids). These strategies have been implemented in the QMC=Chem code
developed at Toulouse and illustrated with numerical applications on small
peptides of increasing sizes (158, 434, 1056 and 1731 electrons). Using 10k-80k
computing cores of the Curie machine (GENCI-TGCC-CEA, France) QMC=Chem has been
shown to be capable of running at the petascale level, thus demonstrating that
for this machine a large part of the peak performance can be achieved.
Implementation of large-scale QMC simulations for future exascale platforms
with a comparable level of efficiency is expected to be feasible
A review of High Performance Computing foundations for scientists
The increase of existing computational capabilities has made simulation
emerge as a third discipline of Science, lying midway between experimental and
purely theoretical branches [1, 2]. Simulation enables the evaluation of
quantities which otherwise would not be accessible, helps to improve
experiments and provides new insights on systems which are analysed [3-6].
Knowing the fundamentals of computation can be very useful for scientists, for
it can help them to improve the performance of their theoretical models and
simulations. This review includes some technical essentials that can be useful
to this end, and it is devised as a complement for researchers whose education
is focused on scientific issues and not on technological respects. In this
document we attempt to discuss the fundamentals of High Performance Computing
(HPC) [7] in a way which is easy to understand without much previous
background. We sketch the way standard computers and supercomputers work, as
well as discuss distributed computing and discuss essential aspects to take
into account when running scientific calculations in computers.Comment: 33 page
- âŠ