Search CORE

517 research outputs found

Prefetching in functional languages

Author: Baer Jean-Loup
Cahoon Brendon
Salvucci Jérémie
Publication venue: Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management
Publication date: 16/06/2020
Field of study

Functional programming languages contain a number of runtime and language features, such as garbage collection, indirect memory accesses, linked data structures and immutability, that interact with a processor’s memory system. These conspire to cause a variety of unintuitive memory performance effects. For example, it is slower to traverse through linked lists and arrays of data that have been sorted than to traverse the same data accessed in the order it was allocated. We seek to understand these issues and mitigate them in a manner consistent with functional languages, taking advantage of the features themselves where possible. For example, immutability and garbage collection force linked lists to be allocated roughly sequentially in memory, even when the data pointed to within each node is not. We add language primitives for software-prefetching to the OCaml language to exploit this, and observe significant performance improvements a variety of micro- and macro-benchmarks, resulting in speedups of up to 2× on the out-of-order superscalar Intel Haswell and Xeon Phi Knights Landing systems, and up to 3× on the in-order Arm Cortex-A53.Arm Limite

Crossref

Apollo (Cambridge)

Recommended from our members

Software prefetching for indirect memory accesses: A microarchitectural perspective

Author: Ainsworth S
Jones TM
Publication venue: ACM Transactions on Computer Systems
Publication date: 01/01/2019
Field of study

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. This article develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special class of irregular memory accesses often seen in high-performance workloads. We evaluate this across a wide set of systems, all of which gain benefit from the technique. We then evaluate the extent to which good prefetch instructions are architecture dependent and the class of programs that are particularly amenable. Across a set of memory-bound benchmarks, our automated pass achieves average speedups of 1.3× for an Intel Haswell processor, 1.1× for both an ARM Cortex-A57 and Qualcomm Kryo, 1.2× for a Cortex-72 and an Intel Kaby Lake, and 1.35× for an Intel Xeon Phi Knight’s Landing, each of which is an out-of-order core, and performance improvements of 2.1× and 2.7× for the in-order ARM Cortex-A53 and first generation Intel Xeon Phi.EPSRC [EP/K026399/1, EP/M506485/1], ARM Ltd

Apollo (Cambridge)

Predicting access to persistent objects through static code analysis

Author: A Ibrahim
A Stoutchinin
B Cahoon
J Mart
KM Curewitz
N Knafla
W Han
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

In this paper, we present a fully-automatic, high-accuracy approach to predict access to persistent objects through static code analysis of object-oriented applications. The most widely-used previous technique uses a simple heuristic to make the predictions while approaches that offer higher accuracy are based on monitoring application execution. These approaches add a non-negligible overhead to the application’s execution time and/or consume a considerable amount of memory. By contrast, we demonstrate in our experimental study that our proposed approach offers better accuracy than the most common technique used to predict access to persistent objects, and makes the predictions farther in advance, without performing any analysis during application executionThis work has been supported by the European Union’s Horizon 2020 research and innovation program (grant H2020-MSCA-ITN-2014-642963), the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Science and Innovation (contract TIN2015-65316) and Generalitat de Catalunya (contract 2014-SGR-1051). The authors would also like to thank Alex Barceló for his feedback on the formalization included in this paper.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Archivo Digital UPM

Computer-language based data prefetching techniques

Author: Touma Rizkallah
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

Data prefetching has long been used as a technique to improve access times to persistent data. It is based on retrieving data records from persistent storage to main memory before the records are needed. Data prefetching has been applied to a wide variety of persistent storage systems, from file systems to Relational Database Management Systems and NoSQL databases, with the aim of reducing access times to the data maintained by the system and thus improve the execution times of the applications using this data. However, most existing solutions to data prefetching have been based on information that can be retrieved from the storage system itself, whether in the form of heuristics based on the data schema or data access patterns detected by monitoring access to the system. There are multiple disadvantages of these approaches in terms of the rigidity of the heuristics they use, the accuracy of the predictions they make and / or the time they need to make these predictions, a process often performed while the applications are accessing the data and causing considerable overhead. In light of the above, this thesis proposes two novel approaches to data prefetching based on predictions made by analyzing the instructions and statements of the computer languages used to access persistent data. The proposed approaches take into consideration how the data is accessed by the higher-level applications, make accurate predictions and are performed without causing any additional overhead. The first of the proposed approaches aims at analyzing instructions of applications written in object-oriented languages in order to prefetch data from Persistent Object Stores. The approach is based on static code analysis that is done prior to the application execution and hence does not add any overhead. It also includes various strategies to deal with cases that require runtime information unavailable prior to the execution of the application. We integrate this analysis approach into an existing Persistent Object Store and run a series of extensive experiments to measure the improvement obtained by prefetching the objects predicted by the approach. The second approach analyzes statements and historic logs of the declarative query language SPARQL in order to prefetch data from RDF Triplestores. The approach measures two types of similarity between SPARQL queries in order to detect recurring query patterns in the historic logs. Afterwards, it uses the detected patterns to predict subsequent queries and launch them before they are requested to prefetch the data needed by them. Our evaluation of the proposed approach shows that it high-accuracy prediction and can achieve a high cache hit rate when caching the results of the predicted queries.Precargar datos ha sido una de las técnicas más comunes para mejorar los tiempos de acceso a datos persistentes. Esta técnica se basa en predecir los registros de datos que se van a acceder en el futuro y cargarlos del almacenimiento persistente a la memoria con antelación a su uso. Precargar datos ha sido aplicado en multitud de sistemas de almacenimiento persistente, desde sistemas de ficheros a bases de datos relacionales y NoSQL, con el objetivo de reducir los tiempos de acceso a los datos y por lo tanto mejorar los tiempos de ejecución de las aplicaciones que usan estos datos. Sin embargo, la mayoría de los enfoques existentes utilizan predicciones basadas en información que se encuentra dentro del mismo sistema de almacenimiento, ya sea en forma de heurísticas basadas en el esquema de los datos o patrones de acceso a los datos generados mediante la monitorización del acceso al sistema. Estos enfoques presentan varias desventajas en cuanto a la rigidez de las heurísticas usadas, la precisión de las predicciones generadas y el tiempo que necesitan para generar estas predicciones, un proceso que se realiza con frecuencia mientras las aplicaciones acceden a los datos y que puede tener efectos negativos en el tiempo de ejecución de estas aplicaciones. En vista de lo anterior, esta tesis presenta dos enfoques novedosos para precargar datos basados en predicciones generadas por el análisis de las instrucciones y sentencias del lenguaje informático usado para acceder a los datos persistentes. Los enfoques propuestos toman en consideración cómo las aplicaciones acceden a los datos, generan predicciones precisas y mejoran el rendimiento de las aplicaciones sin causar ningún efecto negativo. El primer enfoque analiza las instrucciones de applicaciones escritas en lenguajes de programación orientados a objetos con el fin de precargar datos de almacenes de objetos persistentes. El enfoque emplea análisis estático de código hecho antes de la ejecución de las aplicaciones, y por lo tanto no afecta negativamente el rendimiento de las mismas. El enfoque también incluye varias estrategias para tratar casos que requieren información de runtime no disponible antes de ejecutar las aplicaciones. Además, integramos este enfoque en un almacén de objetos persistentes y ejecutamos una serie extensa de experimentos para medir la mejora de rendimiento que se puede obtener utilizando el enfoque. Por otro lado, el segundo enfoque analiza las sentencias y logs del lenguaje declarativo de consultas SPARQL para precargar datos de triplestores de RDF. Este enfoque aplica dos medidas para calcular la similtud entre las consultas del lenguaje SPARQL con el objetivo de detectar patrones recurrentes en los logs históricos. Posteriormente, el enfoque utiliza los patrones detectados para predecir las consultas siguientes y precargar con antelación los datos que necesitan. Nuestra evaluación muestra que este enfoque produce predicciones de alta precisión y puede lograr un alto índice de aciertos cuando los resultados de las consultas predichas se guardan en el caché.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Effective Compile-Time Analysis for Data Prefetching In Java

Author: Brendon D. Cahoon
Publication venue
Publication date: 01/01/2002
Field of study

The memory hierarchy in modern architectures continues to be a major performance bottleneck. Many existing techniques for improving memory performance focus on Fortran and C programs, but memory latency is also a barrier to achieving high performance in object-oriented languages. Existing software techniques are inadequate for exposing optimization opportunities in object-oriented programs. One key problem is the use of high-level programming abstractions which make analysis difficult. Another challenge is that programmers use a variety of data structures, including arrays and linked structures, so optimizations must work on a broad range of programs. We develop a new unified data-flow analysis for identifying accesses to arrays and linked structures called recurrence analysis. Prior approaches that identify these access patterns are ad hoc, or treat arrays and linked structures independently. The data-flow analysis is intra- and inter-procedural, which is important in Java programs that use encapsulation to hide implementation details. We sho

CiteSeerX

ScholarWorks@UMass Amherst

Optimization of Storage-Referencing Gestures

Author: Cytron Ron K.
Fox Lucas M.
Hill Christopher R.
Kavi Krishna
Publication venue: Washington University Open Scholarship
Publication date: 03/10/2003
Field of study

We describe techniques for identifying and optimizing memory-accessing instruction sequences. We capture a sequence of such instructions, with the goal of sending the sequence as a single instruction from the CPU to a smart memory subsystem (IRAM or PIM). With a software/hardware codesign, the memory-accessing gestures can be rewritten as succinct superoperator instructions, and the gestures themselves could vary at runtime. As a result, the CPU executes fewer instructions and the CPU-memory bus is charged less often, resulting in lower power consumption. Reduction in power can be crucial for constrained, embedded systems. We discover gestures using a static and a dynamic approach, and we present data showing the presence of such gestures in real benchmarks (Java and C). We have shown the gesture-minimization problem to be NP-Complete, so we offer in this paper a heuristic approach the effectiveness of which we evaluate with experiments

Washington University St. Louis: Open Scholarship

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
freescale semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
Jos'e M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010
Field of study

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

Ubiquitous Memory Introspection (Preliminary Manuscript)

Author: Amarasinghe Saman
Rabbah Rodric
Rudolph Larry
Wong Weng-Fai
Zhao Qin
Publication venue
Publication date: 25/09/2006
Field of study

Modern memory systems play a critical role in the performance ofapplications, but a detailed understanding of the application behaviorin the memory system is not trivial to attain. It requires timeconsuming simulations of the memory hierarchy using long traces, andoften using detailed modeling. It is increasingly possible to accesshardware performance counters to measure events in the memory system,but the measurements remain coarse grained, better suited forperformance summaries than providing instruction level feedback. Theavailability of a low cost, online, and accurate methodology forderiving fine-grained memory behavior profiles can prove extremelyuseful for runtime analysis and optimization of programs.This paper presents a new methodology for Ubiquitous MemoryIntrospection (UMI). It is an online and lightweight mini-simulationmethodology that focuses on simulating short memory access tracesrecorded from frequently executed code regions. The simulations arefast and can provide profiling results at varying granularities, downto that of a single instruction or address. UMI naturally complementsruntime optimizations techniques and enables new opportunities formemory specific optimizations.In this paper, we present a prototype implementation of a runtimesystem implementing UMI. The prototype is readily deployed oncommodity processors, requires no user intervention, and can operatewith stripped binaries and legacy software. The prototype operateswith an average runtime overhead of 20% but this slowdown is only 6%slower than a state of the art binary instrumentation tool. We used32 benchmarks, including the full suite of SPEC2000 benchmarks, forour evaluation. We show that the mini-simulation results accuratelyreflect the cache performance of two existing memory systems, anIntel Pentium~4 and an AMD Athlon MP (K7) processor. We alsodemonstrate that low level profiling information from the onlinesimulation can serve to identify high-miss rate load instructions with a77% rate of accuracy compared to full offline simulations thatrequired days to complete. The online profiling results are used atruntime to implement a simple software prefetching strategy thatachieves a speedup greater than 60% in the best case

DSpace@MIT

Sandboxes for Grid Computing

Author: Andersen Rasmus
Publication venue
Publication date: 01/01/2009
Field of study

Copenhagen University Research Information System