DSM64: A Distributed Shared Memory System in User-Space

Author: Holsapple Stephen Alan
Publication venue: DigitalCommons@CalPoly
Publication date: 01/05/2012

This paper presents DSM64: a lazy release consistent software distributed shared memory (SDSM) system built entirely in user-space. The DSM64 system is capable of executing threaded applications implemented with pthreads on a cluster of networked machines without any modifications to the target application. The DSM64 system features a centralized memory manager [1] built atop Hoard [2, 3]: a fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. In my presentation, I present an SDSM system written in C++ for Linux operating systems. I discuss a straightforward approach to implementing SDSM systems in a Linux environment using system-provided tools and concepts available entirely in user-space. I show that the SDSM system presented in this paper is capable of resolving page faults over a local area network in as little as 2 milliseconds. In my analysis, I present the following. I compare the performance characteristics of a matrix multiplication benchmark using various memory coherency models. I demonstrate that a matrix multiplication benchmark using an LRC model performs orders of magnitude faster than the same application using a stricter coherency model. I show the effect of the coherency model on memory access patterns and memory contention. I compare the effects of different locking strategies on execution speed and memory access patterns. Lastly, I provide a comparison of the DSM64 system to a non-networked version using a system-provided allocator.

DigitalCommons@CalPoly

Compiling Tree Transforms to Operate on Packed Representations

Author: Chamith Buddhika
Koparkar Chaitanya
Kulkarni Milind
Newton Ryan R.
Sakka Laith
Spall Sarah
Tobin-Hochstadt Sam
Vollmer Michael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st European Conference on Object-Oriented Programming (ECOOP 2017)
Publication date: 01/01/2017

When written idiomatically in most programming languages, programs that traverse and construct trees operate over pointer-based data structures, using one heap object per leaf and per node. This representation is efficient for random access and shape-changing modifications, but for traversals, such as compiler passes, that process most or all of a tree in bulk, it can be inefficient. In this work we instead compile tree traversals to operate on pointer-free pre-order serializations of trees. On modern architectures such programs often run significantly faster than their pointer-based counterparts, and additionally are directly suited to storage and transmission without requiring marshaling. We present a prototype compiler, Gibbon, that compiles a small first-order, purely functional language sufficient for tree traversals. The compiler transforms this language into an intermediate representation with explicit pointers into input and output buffers for packed data. The key compiler technologies include an effect system for capturing traversal behavior, combined with an algorithm to insert destination cursors. We evaluate our compiler on tree transformations over a real-world dataset of source-code syntax trees. For traversals touching the whole tree, such as maps and folds, packed data allows speedups of over 2x compared to a highly-optimized pointer-based baseline.
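To make the idea of a pointer-free pre-order serialization concrete, here is a minimal sketch in Python. It is not Gibbon's actual data format or generated code: the one-byte tags, the 8-byte leaf encoding, and the function names `pack`, `sum_packed`, and `add1_packed` are all assumptions chosen for illustration. A fold walks the buffer with a single input cursor, and a map reads from an input cursor while appending to an output buffer.

```python
import struct

# Hypothetical one-byte tags; not Gibbon's real encoding.
LEAF, NODE = 0, 1


def pack(tree, out=None):
    """Serialize a tree in pre-order.

    A tree is either an int (leaf) or a (left, right) tuple (inner node).
    """
    if out is None:
        out = bytearray()
    if isinstance(tree, int):
        out.append(LEAF)
        out += struct.pack("<q", tree)  # 8-byte little-endian payload
    else:
        left, right = tree
        out.append(NODE)
        pack(left, out)   # left subtree follows the tag directly
        pack(right, out)  # right subtree follows the left subtree
    return out


def sum_packed(buf, pos=0):
    """Fold: sum all leaves. Returns (total, cursor just past this subtree)."""
    tag = buf[pos]
    pos += 1
    if tag == LEAF:
        (value,) = struct.unpack_from("<q", buf, pos)
        return value, pos + 8
    left_total, pos = sum_packed(buf, pos)
    right_total, pos = sum_packed(buf, pos)
    return left_total + right_total, pos


def add1_packed(buf):
    """Map: emit a new packed tree with every leaf incremented by one.

    The shape is unchanged, so a single linear scan with an input cursor
    (pos) and an output buffer (out) is enough.
    """
    out = bytearray()
    pos = 0
    while pos < len(buf):
        tag = buf[pos]
        out.append(tag)
        pos += 1
        if tag == LEAF:
            (value,) = struct.unpack_from("<q", buf, pos)
            out += struct.pack("<q", value + 1)
            pos += 8
    return out


packed = pack(((1, 2), (3, (4, 5))))
total, end = sum_packed(packed)
```

Because no node stores a pointer to its children, the traversal never chases references: the position of the right subtree is only known once the left subtree has been consumed, which is why each recursive call returns the updated cursor.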

Dagstuhl Research Online Publication Server

Kent Academic Repository

HALO: Post-Link Heap-Layout Optimisation

Author: Berger Emery D.
Calder Brad
Chilimbi Trishul M.
David
Evans Jason
Leijen Daan
Matthew
Nevill-Manning C. G.
Newman M. E. J.
Powers Bobby
Standard Performance Evaluation Corporation
Trishul
Truong D. N.
Publication venue: CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
Publication date: 01/02/2020

Today, general-purpose memory allocators dominate the landscape of dynamic memory management. While these solutions can provide reasonably good behaviour across a wide range of workloads, it is an unfortunate reality that their behaviour for any particular workload can be highly suboptimal. By catering primarily to average and worst-case usage patterns, these allocators deny programs the advantages of domain-specific optimisations, and thus may inadvertently place data in a manner that hinders performance, generating unnecessary cache misses and load stalls. To help alleviate these issues, we propose HALO: a post-link profile-guided optimisation tool that can improve the layout of heap data to reduce cache misses automatically. Profiling the target binary to understand how allocations made in different contexts are related, we specialise memory-management routines to allocate groups of related objects from separate pools to increase their spatial locality. Unlike other solutions of its kind, HALO employs novel grouping and identification algorithms which allow it to create tight-knit allocation groups using the entire call stack and to identify these efficiently at runtime. Evaluation of HALO on contemporary out-of-order hardware demonstrates speedups of up to 28% over jemalloc, out-performing a state-of-the-art data placement technique from the literature.
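The effect of allocating related objects from separate pools can be illustrated with a toy address simulation. This is only a sketch of the general pooling idea, not HALO's grouping or identification algorithms: the `BumpPool` class, the `POOL_SPAN` constant, the group labels, and the object sizes are all hypothetical. It compares how far apart objects from one allocation context end up when every request shares one pool versus when each group has its own.

```python
from collections import defaultdict

POOL_SPAN = 1 << 20  # hypothetical address range reserved per pool


class BumpPool:
    """Toy bump allocator: hands out consecutive addresses from a base."""

    def __init__(self, base):
        self.next_addr = base

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        return addr


def allocate(requests, grouped):
    """Place (group, size) requests and return {group: [addresses]}.

    With grouped=False every request shares one pool, as a general-purpose
    allocator might; with grouped=True each group gets its own pool.
    """
    pools = {}
    placed = defaultdict(list)
    for group, size in requests:
        key = group if grouped else "shared"
        if key not in pools:
            pools[key] = BumpPool(base=len(pools) * POOL_SPAN)
        placed[group].append(pools[key].alloc(size))
    return placed


def span(addresses, size):
    """Bytes between the start of the first and the end of the last object."""
    return max(addresses) + size - min(addresses)


# Interleaved allocations from two hypothetical contexts.
requests = [("list_node", 32), ("string_buf", 256)] * 4

shared_span = span(allocate(requests, grouped=False)["list_node"], 32)
grouped_span = span(allocate(requests, grouped=True)["list_node"], 32)
```

In this toy run the four 32-byte `list_node` objects are spread over 896 bytes when they share a pool with the interleaved 256-byte buffers, but occupy a contiguous 128 bytes when pooled by group, so a traversal over them would touch fewer cache lines.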

Crossref

Apollo (Cambridge)

Automated Object Layout Optimization in a Portable Microkernel

Author: Dannowski Uwe
Publication date: 26/03/2012

KITopen

Reducing Library Overheads through Source-to-Source Translation

Author: Baden Scott
King Alden
Publication venue: Published by Elsevier B.V.
Publication date: 31/12/2012

Object-oriented application libraries targeted to a specific application domain are an attractive means of reducing the software development time for sophisticated high performance applications. However, libraries can have the drawback of high abstraction penalties. We describe a domain-specific, source-to-source translator that eliminates abstraction penalties in an array class library used to analyze turbulent flow simulation data. Our translator effectively flattens the abstractions, yielding performance within 75% of C code that uses primitive C arrays and no user-defined abstractions.

Elsevier - Publisher Connector

Software/Hardware Co-Design and Co-Specialisation: Novel Simulation Techniques and Optimisations

Author: Rodchenko Andrey
Publication date: 01/08/2018

The University of Manchester - Institutional Repository

Scopes Describe Frames: A Uniform Model for Memory Layout in Dynamic Semantics

Author: Bach Poulsen Casper
Tolmach Andrew
Visser Eelco
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th European Conference on Object-Oriented Programming (ECOOP 2016)
Publication date: 01/01/2016

Semantic specifications do not make a systematic connection between the names and scopes in the static structure of a program and memory layout, and access during its execution. In this paper, we introduce a systematic approach to the alignment of names in static semantics and memory in dynamic semantics, building on the scope graph framework for name resolution. We develop a uniform memory model consisting of frames that instantiate the scopes in the scope graph of a program. This provides a language-independent correspondence between static scopes and run-time memory layout, and between static resolution paths and run-time memory access paths. The approach scales to a range of binding features, supports straightforward type soundness proofs, and provides the basis for a language-independent specification of sound reachability-based garbage collection

Dagstuhl Research Online Publication Server

Software-Architecture Recovery from Machine Code

Author: Reps Thomas
Srinivasan Venkatesh Karthik
Publication date: 13/03/2013

In this paper, we present a tool, called Lego, which recovers object-oriented software architecture from stripped binaries. Lego takes a stripped binary as input, and uses information obtained from dynamic analysis to (i) group the functions in the binary into classes, and (ii) identify inheritance and composition relationships between the inferred classes. The information obtained by Lego can be used for reengineering legacy software, and for understanding the architecture of software systems that lack documentation and source code. Our experiments show that the class hierarchies recovered by Lego have a high degree of agreement---measured in terms of precision and recall---with the hierarchy defined in the source code

CiteSeerX

Minds@University of Wisconsin