
    Introducing Molly: Distributed Memory Parallelization with LLVM

    Programming for distributed memory machines has always been a tedious task, but a necessary one, because compilers have not been able to optimize sufficiently for such machines themselves. Molly is an extension to the LLVM compiler toolchain that can distribute and reorganize workload and data if the program is organized in statically determined loop control flows. These are represented as polyhedral integer-point sets, to which program transformations can be applied. Memory distribution and layout can be declared by the programmer as needed, and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet complete, this paper shows the capabilities on Conway's Game of Life.
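
To make "statically determined loop control flow" concrete, the following is a minimal sketch of one Game of Life generation written as a plain loop nest. It is not Molly's code (Molly operates on programs compiled through LLVM); the function name `life_step` and the fixed dead border are assumptions chosen for illustration. The loop bounds depend only on the grid size and every array access is an affine function of the loop indices, so the iteration domain is the integer-point set {(i, j) : 1 <= i <= rows-2, 1 <= j <= cols-2}.

```python
def life_step(grid):
    """Compute one Game of Life generation on a grid of 0/1 cells.

    Assumption for this sketch: the outermost rows and columns form a
    permanently dead border, so only interior cells are updated.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    # Loop bounds are known from the grid size alone, and every access
    # grid[i + di][j + dj] is an affine function of (i, j).
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (0, 0):
                        neighbours += grid[i + di][j + dj]
            alive = grid[i][j] == 1
            # Standard rules: birth on 3 neighbours, survival on 2 or 3.
            if neighbours == 3 or (alive and neighbours == 2):
                new[i][j] = 1
    return new


if __name__ == "__main__":
    # A horizontal "blinker" in a 5x5 grid flips to vertical each step.
    g = [[0] * 5 for _ in range(5)]
    g[2][1] = g[2][2] = g[2][3] = 1
    for row in life_step(g):
        print("".join("#" if c else "." for c in row))
```

Each cell reads only its eight neighbours, which is why a distributed version needs to exchange only the boundary rows and columns of each node's block of the grid.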

    Interest rate convergence in the EMS prior to European Monetary Union

    In this paper we analyze the convergence of interest rates in the European Monetary System (EMS) in a framework of changing persistence. This allows us to estimate the exact date of full convergence from the data. A change in persistence means that a time series switches from stationarity to non-stationarity, or vice versa. It is often argued that, due to the specific historical situation in the EMS, the interest rate differential was non-stationary before the full convergence of interest rates was achieved and stationary afterwards. Our empirical results suggest that the convergence date has been very different for Belgium, France, the Netherlands and Italy, and they are in line with the conclusions one would draw from a narrative approach. We compare three different estimators for the convergence date and find that the results are quite robust. Our results therefore stress the importance of credibility for monetary policy.

    Lattice QCD estimate of the $\eta_c(2S)\to J/\psi\gamma$ decay rate

    We compute the hadronic matrix element relevant to the physical radiative decay $\eta_c(2S)\to J/\psi\gamma$ by means of lattice QCD. We use the (maximally) twisted mass QCD action with $N_f=2$ light dynamical quarks and from the computations made at four lattice spacings we were able to take the continuum limit. The value of the mass ratio $m_{\eta_c(2S)}/m_{\eta_c(1S)}$ we obtain is consistent with the experimental value, and our prediction for the form factor is $V^{\eta_c(2S)\to J/\psi\gamma}(0)\equiv V_{12}(0)=0.32(6)(2)$, leading to $\Gamma(\eta_c(2S)\to J/\psi\gamma)=(15.7\pm 5.7)$ keV, which is much larger than $\Gamma(\psi(2S)\to\eta_c\gamma)$ and within reach of modern experiments. Comment: 19 pages, 4 fig

    Perfrewrite -- Program Complexity Analysis via Source Code Instrumentation

    ACACES 2012 summer school. Most program profiling methods output the execution time of one specific program execution, but not its computational complexity class in terms of big-O notation. Perfrewrite is a tool based on LLVM's Clang compiler that rewrites a program so that it tracks semantic information while the program executes and uses it to guess memory usage, communication and computational complexity. While source code instrumentation is a standard technique for profiling, using it to derive formulas is an uncommon approach.
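
The idea of guessing a complexity class from instrumented runs can be sketched as follows. This is not Perfrewrite's method, which the abstract does not describe in detail. It is a swapped-in illustration that hand-instruments a bubble sort with an operation counter and then fits the slope of log(count) against log(n) by ordinary least squares. The names `count_ops_bubble_sort` and `fit_exponent` are hypothetical.

```python
import math


def count_ops_bubble_sort(n):
    """Run a hand-instrumented bubble sort and return its comparison count."""
    data = list(range(n, 0, -1))  # reversed input of length n
    ops = 0
    for i in range(n):
        for j in range(n - 1 - i):
            ops += 1  # instrumentation: count one comparison
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return ops


def fit_exponent(sizes, counts):
    """Least-squares slope of log(count) versus log(size).

    A slope near k suggests growth of roughly n**k over the sampled range.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    sizes = [64, 128, 256, 512]
    counts = [count_ops_bubble_sort(n) for n in sizes]
    print(f"estimated exponent: {fit_exponent(sizes, counts):.2f}")
```

A fitted slope close to 2 is consistent with quadratic growth over the sampled sizes. A log-log fit of this kind only recovers polynomial exponents, so logarithmic factors would need a different model.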

    Generalizing Hierarchical Parallelism

    Since the days of OpenMP 1.0, computer hardware has become more complex, typically by specializing compute units for coarse- and fine-grained parallelism in incrementally deeper hierarchies of parallelism. Newer versions of OpenMP reacted by introducing new mechanisms for querying or controlling the individual levels, each time adding another concept such as places, teams, and progress groups. In this paper we propose going back to the roots of OpenMP in the form of nested parallelism for a simpler model and more flexible handling of arbitrarily deep hardware hierarchies. Comment: IWOMP'23 preprin