5 research outputs found

    Search-based Model-driven Loop Optimizations for Tensor Contractions

    Get PDF
    Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the coupled cluster method. The Tensor Contraction Engine (TCE) is a high-level program synthesis system that facilitates the generation of high-performance parallel programs from tensor contraction equations. We are developing a new software infrastructure for the TCE that is designed to allow experimentation with optimization algorithms for modern computing platforms, including for heterogeneous architectures employing general-purpose graphics processing units (GPGPUs). In this dissertation, we present improvements and extensions to the loop fusion optimization algorithm, which can be used with cost models, e.g., for minimizing memory usage or for minimizing data movement costs under a memory constraint. We show that our data structure and pruning improvements to the loop fusion algorithm result in significant performance improvements that enable complex cost models being use for large input equations. We also present an algorithm for optimizing the fused loop structure of handwritten code. It determines the regions in handwritten code that are safe to be optimized and then runs the loop fusion algorithm on the dependency graph of the code. Finally, we develop an optimization framework for generating GPGPU code consisting of loop fusion optimization with a novel cost model, tiling optimization, and layout optimization. Depending on the memory available on the GPGPU and the sizes of the tensors, our framework decides which processor (CPU or GPGPU) should perform an operation and where the result should be moved. We present extensive measurements for tuning the loop fusion algorithm, for validating our optimization framework, and for measuring the performance characteristics of GPGPUs. Our measurements demonstrate that our optimization framework outperforms existing general-purpose optimization approaches both on multi-core CPUs and on GPGPUs

    Data Centric Cache Measurement Using Hardware and Software Instrumentation

    Get PDF
    The speed at which microprocessors can perform computations is increasing faster than the speed of access to main memory, making efficient use of memory caches ever more important. Because of this, information about the cache behavior of applications is valuable for performance tuning. To be most useful to a programmer, this information should be presented in a way that relates it to data structures at the source code level; we will refer to this as data centric cache information. This disser-tation examines the problem of how to collect such information. We describe tech-niques for accomplishing this using hardware performance monitors and software in-strumentation. We discuss both performance monitoring features that are present in existing processors and a proposed feature for future designs. The first technique we describe uses sampling of cache miss addresses, relat-ing them to data structures. We present the results of experiments using an imple-mentation of this technique inside a simulator, which show that it can collect the de-sired information accurately and with low overhead. We then discuss a tool called Cache Scope that implements this on actual hardware, the Intel Itanium 2 processor. Experiments with this tool validate that perturbation and overhead can be kept low in a real-world setting. We present examples of tuning the performance of two applica-tions based on data from this tool. By changing only the layout of data structures, we achieved approximately 24% and 19% reductions in running time. We also describe a technique that uses a proposed hardware feature that pro-vides information about cache evictions to sample eviction addresses. We present results from an implementation of this technique inside a simulator, showing that even though this requires storing considerably more data than sampling cache misses, we are still able to collect information accurate enough to be useful while keeping overhead low. We discuss an example of performance tuning in which we were able to reduce the running time of an application by 8% using information gained from this tool

    An Experimental Evaluation of Tiling and Shackling for Memory Hierarchy Management

    No full text
    On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring compiler community has developed localityenhancing program transformations, the most well-known of which is loop tiling. Tiling is restricted to perfectly nested loops, but many imperfectly nested loops can be transformed into perfectly nested loops that can then be tiled. Recently, we proposed an alternative approach to locality enhancement called data shackling. Data shackling reasons about data traversals rather than iteration space traversals, and can be applied directly to imperfectly nested loops. We have implemented shackling in the SGI MIPSPro compiler which already has a sophisticated implementation of tiling. Our experiments on the SGI Octane workstation with dense numerical linear algebra programs show that shackled code obtains double the performance of tiled code for most of these programs, and ob..

    Andalusi Christianity: the survival of indigenous Christian communities

    Get PDF
    This thesis comprises an attempt to re-evaluate the experience and the survival of the indigenous Christian population of al-Andalus. It is a response to two problematic aspects of the historiography, whose authority has only recently begun to be questioned: first, the inordinate focus upon the polemical and problematic mid-ninth-century Cordoban hagiography and apologetic of Eulogius and Paul Albar, whose prejudiced vision has not only been accepted as a source of social history, but also projected onto all Andalusī Christianity to support the second – the assertion that conversion happened early and en masse, and led to their eradication in the early twelfth century. Eulogius and Albar’s account of a Córdoba oppressed and Christians persecuted (a trope herein dubbed the ecclesia destituta) has dominated thinking about the indigenous Christians of al-Andalus, due to its championing by Catholic historians since the texts’ rediscovery and publication in 1574, and by nineteenth-century Spanish nationalists to whose ideological and patriotic purposes it was amenable. The Cordobans’ account is here re-evaluated as regards its value as a historical artefact and its internal problems are outlined. The discrepancies between the picture created by Eulogius and Albar and that of other contemporary reports, and the problematic hagiography, are then explained to some degree by the literary models Eulogius had at his disposal – of primary interest are the classical pagan poetics of Vergil, Horace and Juvenal and the late antique theology of Augustine. Albar’s famous despair at the Arabisation of the Christian youth has, in conjunction with Eulogius’ ecclesia destituta and the relative scarcity of documentary evidence for the Christians of Andalusī territory, formed the crux of assumptions regarding the speed and extent of Arabisation and conversion. In reassessing Richard Bulliet’s ‘curve of conversion’, which seemed on a faulty reading to prove these assumptions, the second part of the thesis seeks to argue that profound Arabisation did not impact until a century later than is thought and resulted not in assimilative decline but in a late cultural flowering, and show the long, and in many places unbroken, survival of indigenous Christian communities in al-Andalus to the early fifteenth century

    Maritime expressions:a corpus based exploration of maritime metaphors

    Get PDF
    This study uses a purpose-built corpus to explore the linguistic legacy of Britain’s maritime history found in the form of hundreds of specialised ‘Maritime Expressions’ (MEs), such as TAKEN ABACK, ANCHOR and ALOOF, that permeate modern English. Selecting just those expressions commencing with ’A’, it analyses 61 MEs in detail and describes the processes by which these technical expressions, from a highly specialised occupational discourse community, have made their way into modern English. The Maritime Text Corpus (MTC) comprises 8.8 million words, encompassing a range of text types and registers, selected to provide a cross-section of ‘maritime’ writing. It is analysed using WordSmith analytical software (Scott, 2010), with the 100 million-word British National Corpus (BNC) as a reference corpus. Using the MTC, a list of keywords of specific salience within the maritime discourse has been compiled and, using frequency data, concordances and collocations, these MEs are described in detail and their use and form in the MTC and the BNC is compared. The study examines the transformation from ME to figurative use in the general discourse, in terms of form and metaphoricity. MEs are classified according to their metaphorical strength and their transference from maritime usage into new registers and domains such as those of business, politics, sports and reportage etc. A revised model of metaphoricity is developed and a new category of figurative expression, the ‘resonator’, is proposed. Additionally, developing the work of Lakov and Johnson, Kovesces and others on Conceptual Metaphor Theory (CMT), a number of Maritime Conceptual Metaphors are identified and their cultural significance is discussed
    corecore