User-Friendly Parallel Computations with Econometric Examples
This paper shows how a high-level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. Parallelization is implemented in such a way that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. A detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.

Keywords: parallel computing, Monte Carlo, bootstrapping, maximum likelihood, GMM, kernel regression
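The embarrassingly parallel structure this paper exploits can be sketched in a few lines. The example below is an illustration only, written in Python rather than the paper's Octave, and all function names are invented: independent bootstrap replications are farmed out to worker processes so that the caller never touches any parallel machinery.

```python
import multiprocessing as mp
import random
import statistics

def bootstrap_mean(args):
    """One bootstrap replication: resample the data with
    replacement and return the mean of the resample."""
    data, seed = args
    rng = random.Random(seed)
    resample = [rng.choice(data) for _ in range(len(data))]
    return statistics.mean(resample)

def parallel_bootstrap(data, replications, workers=2):
    """Run independent replications across worker processes.
    The caller sees an ordinary function call; the parallel
    machinery is hidden inside."""
    tasks = [(data, seed) for seed in range(replications)]
    with mp.Pool(processes=workers) as pool:
        return pool.map(bootstrap_mean, tasks)

if __name__ == "__main__":
    estimates = parallel_bootstrap([1.0, 2.0, 3.0, 4.0, 5.0],
                                   replications=100)
    print(len(estimates))  # one mean estimate per replication
```

Because the replications are mutually independent, the same pattern carries over to Monte Carlo experiments and grid-based kernel regression with only the per-task function changing.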
Parallel Astronomical Data Processing with Python: Recipes for multicore machines
High performance computing has been used in various fields of astrophysical
research, but most of it is implemented on massively parallel systems
(supercomputers) or graphical processing unit clusters. With the advent of
multicore processors in the last decade, many serial software codes have been
re-implemented in parallel mode to utilize the full potential of these
processors. In this paper, we propose parallel processing recipes for multicore
machines for astronomical data processing. The target audience is astronomers
who are using Python as their preferred scripting language and who may be using
PyRAF/IRAF for data processing. Three problems of varied complexity were
benchmarked on three different types of multicore processors to demonstrate the
benefits, in terms of execution time, of parallelizing data processing tasks.
The native multiprocessing module available in Python makes it a relatively
trivial task to implement the parallel code. We have also compared the three
multiprocessing approaches: Pool/Map, Process/Queue, and Parallel Python. Our
test codes are freely available and can be downloaded from our website.

Comment: 15 pages, 7 figures, 1 table; for associated test code, see
http://astro.nuigalway.ie/staff/navtejs. Accepted for publication in
Astronomy and Computing.
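The first two approaches the paper compares can be contrasted on a toy task. The sketch below is illustrative only (the row-averaging "task" and all names are invented, and the third-party Parallel Python package is omitted): the same work is done once in the Pool/Map style and once with explicit Process/Queue plumbing.

```python
import multiprocessing as mp

def brightness(pixel_row):
    """Toy stand-in for an astronomical processing step:
    average the values in one row of an image."""
    return sum(pixel_row) / len(pixel_row)

def with_pool(rows):
    """Pool/Map style: hand rows to a worker pool and get the
    results back in input order."""
    with mp.Pool(processes=2) as pool:
        return pool.map(brightness, rows)

def worker(in_q, out_q):
    """Process/Queue style: each worker pulls indexed rows from a
    queue until it sees the None sentinel."""
    while True:
        item = in_q.get()
        if item is None:
            break
        idx, row = item
        out_q.put((idx, brightness(row)))

def with_queues(rows, nproc=2):
    in_q, out_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(in_q, out_q))
             for _ in range(nproc)]
    for p in procs:
        p.start()
    for item in enumerate(rows):
        in_q.put(item)
    for _ in procs:
        in_q.put(None)  # one sentinel per worker
    results = [out_q.get() for _ in rows]
    for p in procs:
        p.join()
    # Queue results arrive out of order; restore input order by index.
    return [val for _, val in sorted(results)]
```

Pool/Map needs far less code when tasks are uniform; explicit queues give finer control when task sizes vary widely, at the cost of bookkeeping like the sentinel and result reordering above.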
State of the Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.
The ALPS project release 2.0: Open source software for strongly correlated systems
We present release 2.0 of the ALPS (Algorithms and Libraries for Physics
Simulations) project, an open source software project to develop libraries and
application programs for the simulation of strongly correlated quantum lattice
models such as quantum magnets, lattice bosons, and strongly correlated fermion
systems. The code development is centered on common XML and HDF5 data formats,
libraries to simplify and speed up code development, common evaluation and
plotting tools, and simulation programs. The programs enable non-experts to
start carrying out serial or parallel numerical simulations by providing basic
implementations of the important algorithms for quantum lattice models:
classical and quantum Monte Carlo (QMC) using non-local updates, extended
ensemble simulations, exact and full diagonalization (ED), the density matrix
renormalization group (DMRG) both in a static version and a dynamic
time-evolving block decimation (TEBD) code, and quantum Monte Carlo solvers for
dynamical mean field theory (DMFT). The ALPS libraries provide a powerful
framework for programmers to develop their own applications; for instance, it
greatly simplifies the steps of porting a serial code onto a parallel,
distributed-memory machine. Major changes in release 2.0 include the use of
HDF5 for binary data, evaluation tools in Python, support for the Windows
operating system, the use of CMake as build system and binary installation
packages for Mac OS X and Windows, and integration with the VisTrails workflow
provenance tool. The software is available from our web server at
http://alps.comp-phys.org/.

Comment: 18 pages + 4 appendices, 7 figures, 12 code examples, 2 tables
Computing for Perturbative QCD - A Snowmass White Paper
We present a study on high-performance computing and large-scale distributed
computing for perturbative QCD calculations.

Comment: 21 pages, 5 tables
Scientific Workflows for Metabolic Flux Analysis
Metabolic engineering is a highly interdisciplinary research domain that interfaces biology, mathematics, computer science, and engineering. Metabolic flux analysis with carbon tracer experiments (13C-MFA) is a particularly challenging metabolic engineering application that consists of several tightly interwoven building blocks such as modeling, simulation, and experimental design. While several general-purpose workflow solutions have emerged in recent years to support the realization of complex scientific applications, these approaches are only partially transferable to 13C-MFA workflows. While problems in other research fields (e.g., bioinformatics) are primarily centered around scientific data processing, 13C-MFA workflows have more in common with business workflows. For instance, many bioinformatics workflows are designed to identify, compare, and annotate genomic sequences by "pipelining" them through standard tools like BLAST. Typically, the next workflow task in the pipeline can be determined automatically from the outcome of the previous step. Five computational challenges have been identified in the endeavor of conducting 13C-MFA studies: organization of heterogeneous data, standardization of processes and the unification of tools and data, interactive workflow steering, distributed computing, and service orientation. The outcome of this thesis is a scientific workflow framework (SWF) that is custom-tailored to the specific requirements of 13C-MFA applications. The proposed approach – namely, designing the SWF as a collection of loosely-coupled modules that are glued together with web services – eases the realization of 13C-MFA workflows by offering several features. By design, existing tools are integrated into the SWF using web service interfaces and foreign programming language bindings (e.g., Java or Python).
Although the attributes "easy-to-use" and "general-purpose" are rarely associated with distributed computing software, the presented use cases show that the proposed Hadoop MapReduce framework eases the deployment of computationally demanding simulations on cloud and cluster computing resources. An important building block for enabling interactive, researcher-driven workflows is the ability to track all data that is needed to understand and reproduce a workflow. The standardization of 13C-MFA studies using a folder structure template and the corresponding services and web interfaces improves the exchange of information within a group of researchers. Finally, several auxiliary tools are developed in the course of this work to complement the SWF modules, ranging from simple helper scripts to visualization and data conversion programs. This solution distinguishes itself from other scientific workflow approaches by offering a system of loosely-coupled components that are flexibly arranged to match the typical requirements of the metabolic engineering domain. As a modern, service-oriented software framework, it allows new applications to be easily composed by reusing existing components.
Towards Molecular Simulations that are Transparent, Reproducible, Usable By Others, and Extensible (TRUE)
Systems composed of soft matter (e.g., liquids, polymers, foams, gels,
colloids, and most biological materials) are ubiquitous in science and
engineering, but molecular simulations of such systems pose particular
computational challenges, requiring time and/or ensemble-averaged data to be
collected over long simulation trajectories for property evaluation. Performing
a molecular simulation of a soft matter system involves multiple steps, which
have traditionally been performed by researchers in a "bespoke" fashion,
resulting in many published soft matter simulations not being reproducible
based on the information provided in the publications. To address the issue of
reproducibility and to provide tools for computational screening, we have been
developing the open-source Molecular Simulation and Design Framework (MoSDeF)
software suite. In this paper, we propose a set of principles to create
Transparent, Reproducible, Usable by others, and Extensible (TRUE) molecular
simulations. MoSDeF facilitates the publication and dissemination of TRUE
simulations by automating many of the critical steps in molecular simulation,
thus enhancing their reproducibility. We provide several examples of TRUE
molecular simulations: All of the steps involved in creating, running and
extracting properties from the simulations are distributed on open-source
platforms (within MoSDeF and on GitHub), thus meeting the definition of TRUE
simulations.