    Automated and accurate cache behavior analysis for codes with irregular access patterns

    This is the peer-reviewed version of the following article: Andrade, D., Arenaz, M., Fraguela, B. B., Touriño, J. and Doallo, R. (2007), Automated and accurate cache behavior analysis for codes with irregular access patterns. Concurrency Computat.: Pract. Exper., 19: 2407-2423. doi:10.1002/cpe.1173, which has been published in final form at https://doi.org/10.1002/cpe.1173. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

    [Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal basis for such tools if its traditional limitations in accuracy and scope of application can be overcome. While there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns because they are harder to analyze. Nevertheless, many important applications exhibit this kind of pattern, and their lack of locality makes them more cache-demanding, which makes their study all the more relevant. The focus of this paper is the automation of the Probabilistic Miss Equations (PME) model, an analytical model of cache behavior that provides fast and accurate predictions for codes with irregular access patterns. The information requirements of the PME model are defined and its integration in the XARK compiler, a research compiler oriented to automatic kernel recognition in scientific codes, is described. We show how to exploit the powerful information-gathering capabilities provided by this compiler to allow the automated modeling of loop-oriented scientific codes. Experimental results that validate the correctness of the automated PME model are also presented.

    Ministerio de Educación y Ciencia; TIN2004-07797-C02
    Xunta de Galicia; PGIDIT03TIC10502PR
    Xunta de Galicia; PGIDT05PXIC10504P
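
    To make the flavor of such analytical modeling concrete, here is a minimal sketch (not the actual PME model) that estimates the miss probability of a reused line in a set-associative LRU cache, assuming the interfering accesses touch distinct lines whose sets are uniformly distributed; the function name and the example numbers are hypothetical:

```python
from math import comb

def miss_probability(n_interfering, num_sets, associativity):
    """Probability that a cached line is evicted before reuse, assuming
    each interfering access touches a distinct line mapped to a set
    chosen uniformly at random (an illustrative simplification, not
    the full PME model)."""
    p = 1.0 / num_sets
    # Under LRU the line survives iff fewer than `associativity`
    # interfering lines land in its set (binomial tail).
    survive = sum(
        comb(n_interfering, j) * p**j * (1 - p)**(n_interfering - j)
        for j in range(associativity)
    )
    return 1.0 - survive

# Example: 4-way cache with 128 sets, 300 interfering accesses.
print(miss_probability(300, 128, 4))  # roughly 0.2
```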

    XARK: an extensible framework for automatic recognition of computational kernels

    This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Programming Languages and Systems. The final authenticated version is available online at: http://dx.doi.org/10.1145/1391956.1391959

    [Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area in compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented. The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework to run other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK on a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.

    Ministerio de Educación y Ciencia; TIN2004-07797-C02
    Ministerio de Educación y Ciencia; TIN2007-67537-C03
    Xunta de Galicia; PGIDIT05PXIC10504PN
    Xunta de Galicia; PGIDIT06PXIB105228P
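
    As a toy illustration of what kernel recognition means (XARK itself works on the GSA form and on SCCs of the data-dependence graph, which this sketch does not model), the following hypothetical classifier pattern-matches single statements against two of the kernel classes mentioned above:

```python
def classify(stmt, loop_index):
    """Classify a statement (lhs, op, arg1, arg2) as one of two kernel
    classes; a purely illustrative stand-in for XARK's SCC analysis."""
    lhs, op, a, b = stmt
    if op == '+' and lhs in (a, b):
        other = b if a == lhs else a
        if isinstance(other, int):
            return 'induction'          # i = i + c with constant c
        if isinstance(other, str) and other.endswith(f'[{loop_index}]'):
            return 'scalar reduction'   # s = s + a[i]
    return 'unrecognized'

print(classify(('i', '+', 'i', 1), 'i'))       # induction
print(classify(('s', '+', 's', 'a[i]'), 'i'))  # scalar reduction
```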

    An exactly solvable quantum-lattice model with a tunable degree of nonlocality

    An array of N consecutive Laguerre polynomials is interpreted as an eigenvector of a non-Hermitian tridiagonal Hamiltonian H with real spectrum or, better said, of an exactly solvable N-site-lattice cryptohermitian Hamiltonian whose spectrum equals the set of zeros of the N-th Laguerre polynomial. The two key problems (viz., the ambiguity and the closed-form construction of all of the eligible inner products that make H Hermitian in the respective ad hoc Hilbert spaces) are discussed. Then, for illustration, the four simplest k-parametric definitions of inner products, with k = 0, 1, 2 and 3, are explicitly displayed. In mathematical terms these alternative inner products may be perceived as alternative Hermitian conjugations of the initial N-plet of Laguerre polynomials. In physical terms the parameter k may be interpreted as a measure of the "smearing of the lattice coordinates" in the model.

    Comment: 35 pages
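
    The "spectrum equals the Laguerre zeros" statement is easy to check numerically for the Hermitian (symmetrized Jacobi) cousin of such a Hamiltonian; the sketch below uses the standard monic recurrence coefficients of the Laguerre polynomials (diagonal 2n+1, off-diagonal n) and is an illustration, not the paper's non-Hermitian H:

```python
import numpy as np
from numpy.polynomial import laguerre

# Symmetrized Jacobi matrix for Laguerre polynomials (alpha = 0):
# diagonal entries 2n+1, off-diagonal entries n. Its eigenvalues
# coincide with the zeros of the N-th Laguerre polynomial L_N.
N = 8
diag = 2 * np.arange(N) + 1.0            # 1, 3, 5, ...
off = np.arange(1, N, dtype=float)       # 1, 2, ..., N-1
J = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

eigs = np.sort(np.linalg.eigvalsh(J))
zeros = np.sort(laguerre.lagroots([0] * N + [1]))  # zeros of L_N
print(np.allclose(eigs, zeros))  # True
```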

    Metastates in mean-field models with random external fields generated by Markov chains

    We extend the construction by Kuelske and Iacobelli of metastates in finite-state mean-field models with independent disorder to situations where the local disorder terms are a sample of an external ergodic Markov chain in equilibrium. We show that, for non-degenerate Markov chains, the structure of the theorems is analogous to the case of i.i.d. variables when the limiting weights in the metastate are expressed with the aid of a CLT for the occupation-time measure of the chain. As a new phenomenon we also show, in a Potts example, that for a degenerate non-reversible chain this CLT approximation is not enough, and the metastate can have less symmetry than the symmetry of the interaction and a Gaussian approximation of the disorder fluctuations would suggest.

    Comment: 20 pages, 2 figures
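
    The occupation-time CLT ingredient is easy to visualize in a simulation; this toy sketch (assumed parameters, and only the occupation-time CLT, not the metastate construction itself) checks that the rescaled occupation-time fluctuations of an ergodic two-state chain look Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state chain: state 0 flips with prob p, state 1 with prob q.
# Stationary mass of state 0 is q / (p + q); the CLT says
# sqrt(n) * (occupation fraction - stationary mass) -> Gaussian.
p, q, n, trials = 0.3, 0.2, 5_000, 500
pi0 = q / (p + q)

fluct = np.empty(trials)
for t in range(trials):
    x, occ0 = 0, 0
    for _ in range(n):
        occ0 += (x == 0)
        if rng.random() < (p if x == 0 else q):
            x = 1 - x
    fluct[t] = np.sqrt(n) * (occ0 / n - pi0)

print(fluct.mean(), fluct.std())  # mean near 0, roughly Gaussian spread
```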

    Introducing Molly: Distributed Memory Parallelization with LLVM

    Programming for distributed-memory machines has always been a tedious task, but it remains necessary because compilers have not been able to optimize for such machines well enough on their own. Molly is an extension to the LLVM compiler toolchain that can distribute and reorganize workload and data when the program is organized in statically determined loop control flows. These are represented as polyhedral integer-point sets, which allows program transformations to be applied to them. Memory distribution and layout can be declared by the programmer as needed, and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet complete, this paper demonstrates the capabilities on Conway's Game of Life.
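
    For reference, a single-node version of the showcased example is tiny; the sketch below implements one Game of Life step in plain NumPy, the kind of fully static stencil loop nest that polyhedral tools such as Molly model as integer-point sets and distribute (the MPI code generation itself is not reproduced here):

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a periodic grid; plain single-node
    NumPy, shown only to illustrate the static control flow."""
    # Sum of the 8 neighbours via periodic shifts.
    nbrs = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(grid.dtype)

glider = np.zeros((8, 8), dtype=np.int8)
glider[[0, 1, 2, 2, 2], [1, 2, 0, 1, 2]] = 1
print(life_step(glider))
```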

    Tensor network simulation of multi-environmental open quantum dynamics via machine learning and entanglement renormalisation

    The simulation of open quantum dynamics is a critical tool for understanding how the non-classical properties of matter might be functionalised in future devices. However, unlocking the enormous potential of molecular quantum processes is highly challenging due to the very strong and non-Markovian coupling of ‘environmental’ molecular vibrations to the electronic ‘system’ degrees of freedom. Here, we present an advanced but general computational strategy that allows tensor network methods to effectively compute the non-perturbative, real-time dynamics of exponentially large vibronic wave functions of real molecules. We demonstrate how ab initio modelling, machine learning and entanglement analysis can enable simulations which provide real-time insight and direct visualisation of dissipative photophysics, and illustrate this with an example based on the ultrafast process known as singlet fission.
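
    A core primitive behind such tensor-network compression is Schmidt-decomposing a bipartite state and truncating to the largest singular values; the following generic sketch (random test state, hypothetical bond dimension chi, not the paper's algorithm) shows the truncation together with the entanglement entropy used to decide how much to keep:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random normalized bipartite state as a 64 x 64 coefficient matrix.
psi = rng.normal(size=(64, 64))
psi /= np.linalg.norm(psi)

# Schmidt decomposition via SVD, truncated to chi singular values.
u, s, vh = np.linalg.svd(psi, full_matrices=False)
chi = 8                                   # hypothetical bond dimension
truncated = u[:, :chi] * s[:chi] @ vh[:chi]

p = s**2 / np.sum(s**2)                   # Schmidt spectrum
entropy = -np.sum(p * np.log(p))          # von Neumann entropy
error = np.linalg.norm(psi - truncated)   # truncation error
print(entropy, error)
```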

    Faster FPTASes for counting and random generation of Knapsack solutions

    In the #P-complete problem of counting 0/1 Knapsack solutions, the input consists of a sequence of n nonnegative integer weights w1, …, wn and an integer C, and we have to find the number of subsequences (subsets of indices) with total weight at most C. We give faster and simpler fully polynomial-time approximation schemes (FPTASes) for this problem, and for its random-generation counterpart. Our method is based on dynamic programming and on the discretization of large numbers through floating-point arithmetic. We improve both deterministic counting FPTASes from Gopalan et al. (2011) [9] and Štefankovič et al. (2012) [6], as well as the randomized counting and random generation algorithms in Dyer (2003) [5]. Our method is general, and it can be applied directly on top of combinatorial decompositions (such as dynamic programming solutions) of various problems. For example, we also improve the complexity of the problem of counting 0/1 Knapsack solutions in an arc-weighted DAG.

    Peer reviewed
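
    For contrast with the FPTAS, the exact pseudo-polynomial counting DP that such schemes approximate is short; the sketch below counts subsets of weight at most C exactly, whereas the paper's approach keeps the same DP shape but stores the exponentially large counts in rounded floating-point form:

```python
def count_knapsack(weights, C):
    """Exact count of subsets with total weight <= C via the classic
    O(n*C) dynamic program over capacities."""
    dp = [1] + [0] * C          # dp[c] = number of subsets of weight c
    for w in weights:
        for c in range(C, w - 1, -1):
            dp[c] += dp[c - w]  # extend subsets by the item of weight w
    return sum(dp)

print(count_knapsack([2, 3, 5, 7], 9))  # 9
```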

    Some advances in the polyhedral model

    Department Head: L. Darrell Whitley. 2010 Summer. Includes bibliographical references.

    The polyhedral model is a mathematical formalism and a framework for the analysis and transformation of regular computations. It provides a unified approach to the optimization of computations from different application domains, and it is now gaining wide use in optimizing compilers and automatic parallelization. In its purest form, it is based on a declarative model where computations are specified as equations over domains defined by "polyhedral sets". This dissertation presents two results. The first is an analysis and optimization technique that enables us to simplify, i.e. reduce the asymptotic complexity of, such equations. The second is an extension of the model to richer domains called Ƶ-Polyhedra. Many equational specifications in the polyhedral model have reductions: applications of an associative and commutative operator to collections of values to produce a collection of answers. Moreover, expressions in such equations may also exhibit reuse, where intermediate values that are computed or used at different index points are identical. We develop compiler transformations to automatically exploit this reuse and simplify the computational complexity of the specification. In general, there is an infinite set of applicable simplification transformations; unfortunately, different choices may result in equivalent specifications with different asymptotic complexity. We present an algorithm for the optimal application of simplification transformations, resulting in a final specification with minimum complexity. This dissertation also presents the Ƶ-Polyhedral model, an extension of the polyhedral model to more general sets, thereby providing a transformation framework for a larger class of regular computations. For this, we present a novel representation and interpretation of Ƶ-Polyhedra and prove a number of properties of the family of unions of Ƶ-Polyhedra that are required to extend the polyhedral model. Finally, we present value-based dependence analysis and scheduling analysis for specifications in the Ƶ-Polyhedral model. These are direct extensions of the corresponding analyses of specifications in the polyhedral model. One benefit of our results in the Ƶ-Polyhedral model is that our abstraction allows the reuse of previously developed tools in the polyhedral model with straightforward pre- and post-processing.
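
    The classic instance of this simplification is the prefix-sum reduction: written directly, Y[i] = sum over j <= i of X[j] costs O(N^2), but reuse between adjacent answers yields the O(N) form Y[i] = Y[i-1] + X[i]. Below is a hand-written sketch of both forms; the dissertation's contribution is deriving such rewrites automatically and optimally:

```python
# Naive form: each answer recomputes its whole prefix, O(N^2) total.
def prefix_naive(X):
    return [sum(X[: i + 1]) for i in range(len(X))]

# Simplified form: reuse the previous answer, O(N) total.
def prefix_simplified(X):
    Y, acc = [], 0
    for x in X:
        acc += x
        Y.append(acc)
    return Y

X = [3, 1, 4, 1, 5]
assert prefix_naive(X) == prefix_simplified(X) == [3, 4, 8, 9, 14]
```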