C programs for solving the time-dependent Gross-Pitaevskii equation in a fully anisotropic trap
We present C programming language versions of earlier published Fortran
programs (Muruganandam and Adhikari, Comput. Phys. Commun. 180 (2009) 1888) for
calculating both stationary and non-stationary solutions of the time-dependent
Gross-Pitaevskii (GP) equation. The GP equation describes the properties of
dilute Bose-Einstein condensates at ultra-cold temperatures. The C versions of
the programs use the same algorithms as the Fortran ones, involving real- and
imaginary-time propagation based on a split-step Crank-Nicolson method. In a
one-space-variable form of the GP equation, we consider the one-dimensional,
two-dimensional, circularly-symmetric, and the three-dimensional
spherically-symmetric harmonic-oscillator traps. In the two-space-variable
form, we consider the GP equation in two-dimensional anisotropic and
three-dimensional axially-symmetric traps. The fully-anisotropic
three-dimensional GP equation is also considered. In addition to these twelve
programs, for six algorithms that involve two and three space variables, we
have also developed threaded (OpenMP parallelized) programs, which allow
numerical simulations to use all available CPU cores on a computer. All 18
programs are optimized and accompanied by makefiles for several popular C
compilers. We present typical results for scalability of threaded codes and
demonstrate almost linear speedup obtained with the new programs, allowing a
decrease in execution times by an order of magnitude on modern multi-core
computers.
Comment: 8 pages, 1 figure; 18 C programs included
Simple and Effective Type Check Removal through Lazy Basic Block Versioning
Dynamically typed programming languages such as JavaScript and Python defer
type checking to run time. In order to maximize performance, dynamic language
VM implementations must attempt to eliminate redundant dynamic type checks.
However, type inference analyses are often costly and involve tradeoffs between
compilation time and resulting precision. This has led to the creation of
increasingly complex multi-tiered VM architectures.
This paper introduces lazy basic block versioning, a simple JIT compilation
technique which effectively removes redundant type checks from critical code
paths. This novel approach lazily generates type-specialized versions of basic
blocks on-the-fly while propagating context-dependent type information. This
does not require the use of costly program analyses, is not restricted by the
precision limitations of traditional type analyses and avoids the
implementation complexity of speculative optimization techniques.
We have implemented intraprocedural lazy basic block versioning in a
JavaScript JIT compiler. This approach is compared with a classical flow-based
type analysis. Lazy basic block versioning performs as well or better on all
benchmarks. On average, 71% of type tests are eliminated, yielding speedups of
up to 50%. We also show that our implementation generates more efficient
machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on
several benchmarks. The combination of implementation simplicity, low
algorithmic complexity and good run time performance makes basic block
versioning attractive for baseline JIT compilers.
Importance of Explicit Vectorization for CPU and GPU Software Performance
Much of the current focus in high-performance computing is on
multi-threading, multi-computing, and graphics processing unit (GPU) computing.
However, vectorization and non-parallel optimization techniques, which can
often be employed additionally, are less frequently discussed. In this paper,
we present an analysis of several optimizations done on both central processing
unit (CPU) and GPU implementations of a particular computationally intensive
Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the
equivalent, explicit memory coalescing, on the GPU are found to be critical to
achieving good performance of this algorithm in both environments. The
fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU
version, in addition to speedup from multi-threading. This is 2x faster than
the fully-optimized GPU version.
Comment: 17 pages, 17 figures