Search CORE

343 research outputs found

Dynamic Memory Optimization using Pool Allocation and Prefetching

Author: Rabbah Rodric
Wong Weng Fai
Zhao Qin
Publication venue
Publication date: 01/01/2006
Field of study

Heap memory allocation plays an important role in modern applications. Conventional heap allocators, however, generally ignore the underlying memory hierarchy of the system, favoring instead a low runtime overhead and fast response times. Unfortunately, with little concern for the memory hierarchy, the data layout may exhibit poor spatial locality, and degrade cache performance. In this paper, we describe a dynamic heap allocation scheme called pool allocation. The strategy aims to improve cache performance by inspecting memory allocation requests, and allocating memory from appropriate heap pools as dictated by the requesting context. The advantages are two fold. First, by pooling together data with a common context, we expect to improve spatial locality, as data fetched to the caches will contain fewer items from different contexts. If the allocation patterns are closely matched to the traversal patterns, the end result is faster memory performance. Second, by pooling heap objects, we expect access patterns to exhibit more regularity, thus creating more opportunities for data prefetching. Our dynamic memory optimizer exploits the increased regularity to insert prefetch instructions at runtime. The optimizations are implemented in DynamoRIO, a dynamic optimization framework. We evaluate the work using various benchmarks, and measure a 17% speedup over gcc -O3 on an Athlon MP, and a 13% speedup on a Pentium 4.Singapore-MIT Alliance (SMA

DSpace@MIT

Autonomic State Management for Optimistic Simulation Platforms

Author: Pellegrini Alessandro
Quaglia Francesco
Vitali Roberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

We present the design and implementation of an autonomic state manager (ASM) tailored for integration within optimistic parallel discrete event simulation (PDES) environments based on the C programming language and the executable and linkable format (ELF), and developed for execution on x8664 architectures. With ASM, the state of any logical process (LP), namely the individual (concurrent) simulation unit being part of the simulation model, is allowed to be scattered on dynamically allocated memory chunks managed via standard API (e.g., malloc/free). Also, the application programmer is not required to provide any serialization/deserialization module in order to take a checkpoint of the LP state, or to restore it in case a causality error occurs during the optimistic run, or to provide indications on which portions of the state are updated by event processing, so to allow incremental checkpointing. All these tasks are handled by ASM in a fully transparent manner via (A) runtime identification (with chunk-level granularity) of the memory map associated with the LP state, and (B) runtime tracking of the memory updates occurring within chunks belonging to the dynamic memory map. The co-existence of the incremental and non-incremental log/restore modes is achieved via dual versions of the same application code, transparently generated by ASM via compile/link time facilities. Also, the dynamic selection of the best suited log/restore mode is actuated by ASM on the basis of an innovative modeling/optimization approach which takes into account stability of each operating mode with respect to variations of the model/environmental execution parameters

ART

Archivio della ricerca- Università di Roma La Sapienza

Performance Test Selection Using Machine Learning and a Study of Binning Effect in Memory Allocators

Author: Oliveira Sousa Anderson
Publication venue: 'University of Waterloo'
Publication date: 30/04/2019
Field of study

Performance testing is an essential part of the development life cycle that must be done in a timely fashion. However, checking for performance regressions in software can be time-consuming, especially for complex systems containing multiple lengthy tests cases. The first part of this thesis presents a technique to performance test selection using machine learning. In our approach, we build features using information extracted from the previous software versions to train classifiers that assist developers in deciding whether or not to execute a performance test on a new version. Our results show that the classifiers can be used as a mechanism that aids test selection and consequently avoids unnecessary testing. The second part of this work investigates the binning effect on user-space memory allocators. First, we examine how binning events can be a source of performance outliers in Redis and CPython object allocators. Second, we implement a \textit{Pintool} to detect the occurrence of binning on Python programs. The tool performs dynamic binary instrumentation on the interpreter and outputs information that helps developers in performing code optimizations. Finally, we use our tool to investigate the presence of binning in various widely used Python libraries

University of Waterloo's Institutional Repository

Precise garbage collection for C

Author: Rafkind Jon
Regehr John
Publication venue: University of Utah
Publication date: 01/01/2009
Field of study

Journal ArticleMagpie is a source-to-source transformation for C programs that enables precise garbage collection, where precise means that integers are not confused with pointers, and the liveness of a pointer is apparent at the source level. Precise GC is primarily useful for long-running programs and programs that interact with untrusted components. In particular, we have successfully deployed precise GC in the C implementation of a language run-time system that was originally designed to use conservative GC. We also report on our experience in transforming parts of the Linux kernel to use precise GC instead of manual memory management

The University of Utah: J. Willard Marriott Digital Library

Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

Author: STAZI GIULIA
Publication venue
Publication date: 18/02/2020
Field of study

The increasing demand on requirements for high performance and energy efficiency in modern digital systems has led to the research of new design approaches that are able to go beyond the established energy-performance tradeoff. Looking at scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domain of signal processing, multimedia, computer vision, machine learning are known to be particularly resilient to errors occurring on their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures, algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs. The main contributions of this work are: -the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory; - the development of an emulation environment for platforms with approximate memory units, where faults are injected during the simulation based on models that reproduce the effects on memory cells of circuital and architectural techniques for approximate memories; -the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filter) using approximate memory for input/output buffers and tap registers; -the development of a fully reconfigurable and combinatorial floating point unit, which can work with reduced precision formats

Archivio della ricerca- Università di Roma La Sapienza