53 research outputs found

    C++ Templates as Partial Evaluation

    This paper explores the relationship between C++ templates and partial evaluation. Templates were designed to support generic programming, but unintentionally provided the ability to perform compile-time computations and code generation. These features are completely accidental, and as a result their syntax is awkward. By recasting these features in terms of partial evaluation, a much simpler syntax can be achieved. C++ may be regarded as a two-level language in which types are first-class values. Template instantiation resembles an offline partial evaluator. This paper describes preliminary work toward a single mechanism based on partial evaluation which unifies generic programming, compile-time computation and code generation. The language Catat is introduced to illustrate these ideas. Comment: 13 pages
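As a rough illustration of the compile-time computation this abstract refers to, the sketch below uses ordinary C++ templates, not the Catat language the paper introduces. The names `Power`, `Factorial` and `cube` are hypothetical and chosen only for this example. The template argument plays the role of the static (compile-time) input and the function argument the dynamic (run-time) input, which is roughly how a partial evaluator would specialize a power function on a known exponent.

```cpp
#include <cassert>

// Hypothetical example: N is the static input, x is the dynamic input.
// Instantiating Power<3> unrolls the recursion into x * x * x * 1.0.
template <unsigned N>
struct Power {
    static constexpr double apply(double x) {
        return x * Power<N - 1>::apply(x);
    }
};

// Base case: the explicit specialization stops the recursive instantiation.
template <>
struct Power<0> {
    static constexpr double apply(double) { return 1.0; }
};

// A computation with no dynamic input at all is evaluated entirely
// during template instantiation.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Checked by the compiler, not at run time.
static_assert(Factorial<5>::value == 120, "5! is computed at compile time");

// Residual program: a cube function specialized for the exponent 3.
inline double cube(double x) { return Power<3>::apply(x); }
```

Calling `cube(2.0)` multiplies 2.0 by itself three times with no run-time loop over the exponent. The struct-and-specialization boilerplate needed for this small task is an example of the awkward syntax the abstract mentions.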

    Exploiting Locality and Parallelism with Hierarchically Tiled Arrays

    The importance of tiles or blocks in mathematics and thus computer science cannot be overstated. From a high-level point of view, they are the natural way to express many algorithms, both in iterative and recursive forms. Tiles or sub-tiles are used as basic units in the algorithm description. From a low-level point of view, tiling, either as the unit maintained by the algorithm or as a class of data layouts, is one of the most effective ways to exploit locality, which is essential for good performance on current computers given the growing gap between memory and processor speeds. Finally, tiles and operations on them are also fundamental to expressing data distribution and parallelism. Despite the importance of this concept, which makes its widespread use inevitable, most languages do not support it directly. Programmers must understand and manage the low-level details that come with introducing tiling. This gives rise to bloated, potentially error-prone programs in which opportunities for performance are lost. Moreover, the gap between the algorithm and its actual implementation widens. This thesis illustrates the power of Hierarchically Tiled Arrays (HTAs), a data type which enables the easy manipulation of tiles in object-oriented languages. The objective is to evolve this data type in order to make the representation of all classes of algorithms with a high degree of parallelism and/or locality as natural as possible. We present a set of tile operations that leads to natural and easy implementations of different algorithms, both parallel and sequential, with greater clarity and smaller code size. In particular, two new language constructs, dynamic partitioning and overlapped tiling, are discussed in detail. They extend the HTA data type to improve its ability to express algorithms at a high level of abstraction and to free programmers from tedious low-level tasks.
    To support these claims, two popular languages, C++ and MATLAB, are extended with our HTA data type. In addition, several important dense linear algebra kernels, stencil computation kernels, and some benchmarks from the NAS benchmark suite were implemented. We show that the HTA codes need less programming effort with a negligible effect on performance.
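For context, the kind of hand-written tiling that this abstract says programmers must otherwise manage themselves might look like the sketch below. It is plain C++ loop tiling of a matrix multiplication, not the HTA data type or its API. The function name `tiled_matmul` and its parameters are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: computes C += A * B for n x n row-major matrices,
// visiting the data one tile at a time so that each tile stays in cache
// while it is being reused.
void tiled_matmul(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& c, std::size_t n, std::size_t tile) {
    assert(tile > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t kk = 0; kk < n; kk += tile) {
            for (std::size_t jj = 0; jj < n; jj += tile) {
                // Clamp tile bounds so n need not be a multiple of tile.
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a_ik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The three outer loops and the bound clamping exist only to express the tiling. This is the sort of index bookkeeping that, according to the abstract, a tile-aware data type is meant to hide.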

    The Design & Implementation of an Abstract Semantic Graph for Statement-Level Dynamic Analysis of C++ Applications

    In this thesis, we describe our system, Hylian, for statement-level analysis, both static and dynamic, of a C++ application. We begin by extending the GNU gcc parser to generate parse trees in XML format for each of the compilation units in a C++ application. We then verify that the generated parse trees are structurally equivalent to the code in the original C++ application. We use the generated parse trees, together with an augmented version of the gcc test suite, to recover a grammar for the C++ dialect that we parse. We use the recovered grammar to generate a schema for further verification of the parse trees and to evaluate the coverage provided by our C++ test suite. We then extend the parse tree for each compilation unit with semantic information to form an abstract semantic graph (ASG), and link the ASGs for all of the compilation units into a unified ASG for the entire application under study. In addition, to relieve the cognitive burden of information that may inundate a developer, we describe our extensions to Hylian that build abbreviated abstract semantic graphs, which incorporate information about user code but not about compiler-provided library code. Finally, we describe the various approaches that we adopted to assure the developer that the ASGs that Hylian builds correctly represent the program under study.

    Probabilistic analysis of defect tolerance in asynchronous nano crossbar architecture

    Among recent advancements in technology, nanotechnology is particularly promising. Most researchers have begun to focus their efforts on developing nanoscale circuits. Nanoscale devices such as carbon nanotubes (CNTs) and silicon nanowires (SiNWs) form the primitive building blocks of many nanoscale logic devices and recently developed computing architectures. One of the most promising nanotechnologies is the crossbar-based architecture, a two-dimensional nanoarray formed by the intersection of two orthogonal sets of parallel and uniformly spaced CNTs or SiNWs. Nanowire crossbars offer the potential for ultra-high density, which has never been achieved by photolithography. In an effort to improve these circuits, our research group proposed a new Null Convention Logic (NCL) based clock-less crossbar architecture. By eliminating the clock, this architecture makes possible a still higher density in reconfigurable systems. Defect density, however, is directly proportional to the density of nanowires in the architecture. Future work, therefore, must improve the defect tolerance of these asynchronous structures. The thesis comprises two papers. The first introduces the asynchronous crossbar architecture and concludes with the validation of mapping a 1-bit adder onto it. It also discusses various advantages of the asynchronous crossbar architecture over clock-based nanostructures. The second paper concentrates on the probabilistic analysis of the asynchronous nano crossbar architecture to address the high defect rates in these structures. It analyzes the probability distribution of mapping functions over the structure for varying numbers of defects and proposes a method to increase the probability of successful mapping. --Abstract, page iv


    A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code the domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement, overlapping local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique based on C++ template metaprogramming for developing embedded DSLs in C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code.
    The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and was an order of magnitude better than existing production DSLs. Doctor of Philosophy

    Algorithm engineering for parallel computation
