
    Language and compiler support for dynamic code generation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 131-135). By Massimiliano A. Poletto. Ph.D.

    Slot-based Calling Context Encoding

    Calling context is widely used in software engineering areas such as profiling, debugging, and event logging. It can also enhance dynamic analyses such as data race detection. To obtain the calling context at runtime, current approaches either perform expensive stack walking to recover contexts or instrument the application to dynamically encode the context into an integer. Existing encoding schemes are either not fully precise or have high instrumentation and detection overhead, and they have scalability issues for large, highly recursive applications. We propose slot-based calling context encoding (SCCE), which consists of a scalable encoding for acyclic contexts and an efficient encoding for cyclic contexts. Evaluating with the SPEC CPU2006 benchmark suite, we show that our acyclic encoding is scalable, has very low instrumentation overhead, and incurs acceptable detection overhead. We also show that our cyclic encoding has lower instrumentation and detection overhead than the state-of-the-art approach by significantly reducing the number of bytes pushed and checked for cyclic contexts.
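
    To make the encoding idea concrete, below is a minimal C++ sketch of integer calling-context encoding of the general kind the abstract describes: each instrumented call site adjusts a thread-local context identifier by a precomputed constant, so a tool reads a single integer instead of walking the stack. The constants and function names are illustrative only; this is not SCCE's actual scheme.

    #include <cstdio>

    // Current calling-context identifier, maintained by instrumented call sites.
    static thread_local unsigned ctx = 0;

    void leaf() {
        // A profiler or data race detector can record 'ctx' here directly.
        std::printf("leaf reached with context id %u\n", ctx);
    }

    void b() { leaf(); }

    void a() {
        ctx += 1;   // call-site-specific increment, chosen so context ids stay distinct
        b();
        ctx -= 1;   // restore on return
    }

    void c() {
        ctx += 2;   // a different increment for this call site
        b();
        ctx -= 2;
    }

    int main() {
        a();   // leaf reports context id 1  (main -> a -> b -> leaf)
        c();   // leaf reports context id 2  (main -> c -> b -> leaf)
    }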

    Building software tools for combat modeling and analysis

    The focus of this thesis is to leverage the strengths of dynamic program analysis methodologies used in software testing and debugging, such as program behavior modeling and event grammars, to automate the building and analysis of combat simulations. An original high-level language, METALS (Meta-Language for Combat Simulations), and its associated parser and C++ code generator were designed to reduce the time and development effort needed to build sophisticated real-world combat simulations. As a demonstration, a C++ simulation of the Navy's current mine avoidance problem in littoral waters was generated from a high-level METALS description in the thesis. The software tools that were developed allow users to focus their attention and effort on the problem domain while sparing them, to a considerable extent, the rigors of detailed implementation. http://archive.org/details/buildingsoftware109451282 Major, Republic of Singapore Navy. Approved for public release; distribution is unlimited.

    ISA-independent workload characterization and its implications for specialized architectures


    Run-time optimization of adaptive irregular applications

    Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of the input data, the program's dynamic behavior, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on the program's input data and its dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of a program's dynamic memory access patterns with the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework that automatically and adaptively selects at run time the best-performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on prediction models generated automatically off-line and on characteristics (collected and analyzed dynamically) of the algorithm's input data. In this dissertation, we specialize this framework for automatic selection of reduction algorithms. We identified a small set of machine-independent, high-level characterization parameters and then deployed an off-line, systematic experimental process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Specifically, for reduction algorithm selection, the selected algorithm's performance is within 2% of optimal and on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
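
    As an illustration of the adaptive algorithm selection idea, here is a minimal C++ sketch under assumed names: input characteristics gathered at run time feed a placeholder decision rule (standing in for the off-line generated prediction models), which dispatches to one of two functionally equivalent reduction strategies. None of the thresholds or function names below come from the dissertation.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Characteristics collected (or sampled) at run time for one execution instance.
    struct InputProfile {
        std::size_t num_elements;   // size of the reduction array
        double      contention;     // fraction of updates hitting a few "hot" elements
    };

    // Two functionally equivalent reduction strategies; real versions would be
    // parallel (e.g., OpenMP). These stand-ins only report which path was taken.
    void reduce_replicated_buffer(std::vector<double>& a) {
        std::printf("replicated-buffer reduction on %zu elements\n", a.size());
    }
    void reduce_atomic_updates(std::vector<double>& a) {
        std::printf("atomic-update reduction on %zu elements\n", a.size());
    }

    // Placeholder decision rule standing in for the prediction model: small,
    // contended reductions favor private copies that are merged at the end.
    bool prefer_replicated(const InputProfile& p) {
        return p.contention > 0.1 && p.num_elements < 100000;
    }

    void adaptive_reduce(std::vector<double>& a, const InputProfile& p) {
        if (prefer_replicated(p)) reduce_replicated_buffer(a);
        else                      reduce_atomic_updates(a);
    }

    int main() {
        std::vector<double> hist(4096, 0.0);
        adaptive_reduce(hist, {hist.size(), 0.4});    // contended updates -> replicated buffers
        adaptive_reduce(hist, {hist.size(), 0.01});   // sparse updates -> atomic updates
    }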

    Parallelizing Heavyweight Debugging Tools with MPIecho

    Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to scale out to tens of millions of cores. In order to allow development of more scalable debugging idioms, we introduce MPIecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can be useful in isolating the source of hardware-based nondeterministic behavior, and we provide a case study based on a recent processor bug at LLNL. While total overhead will depend on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how MPIecho can lead to a near-linear reduction in overhead when combined with Maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reductions from 1,466% to 53% and from 740% to 14% for cg.D.64 and lu.D.64, respectively, using only an additional 64 cores.
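
    A hedged sketch of the rank-cloning idea (not MPIecho's implementation): each logical application rank is backed by several clone processes, grouped with MPI_Comm_split so a heavyweight tool can split its work across the clones while one representative per logical rank carries the application's own communication. The constant clones_per_rank and the variable names are illustrative.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int world_rank, world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        const int clones_per_rank = 4;                     // assumed to divide world_size
        int logical_rank = world_rank / clones_per_rank;   // which application rank this clone plays
        int clone_id     = world_rank % clones_per_rank;   // which share of tool work it takes

        // Clones of the same logical rank share a communicator; a tool could use
        // clone_id to partition instrumentation work (e.g., address ranges to track)
        // and compare clones when isolating nondeterministic hardware behavior.
        MPI_Comm clone_comm;
        MPI_Comm_split(MPI_COMM_WORLD, logical_rank, clone_id, &clone_comm);

        // One representative clone per logical rank joins the communicator that the
        // application's own MPI traffic would use; the other clones echo its messages.
        MPI_Comm app_comm;
        MPI_Comm_split(MPI_COMM_WORLD, clone_id == 0 ? 0 : MPI_UNDEFINED,
                       logical_rank, &app_comm);

        std::printf("world rank %d of %d -> logical rank %d, clone %d\n",
                    world_rank, world_size, logical_rank, clone_id);

        if (app_comm != MPI_COMM_NULL) MPI_Comm_free(&app_comm);
        MPI_Comm_free(&clone_comm);
        MPI_Finalize();
    }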

    Compilation techniques for short-vector instructions

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 127-133). Multimedia extensions are nearly ubiquitous in today's general-purpose processors. These extensions consist primarily of a set of short-vector instructions that apply the same opcode to a vector of operands. This design introduces a data-parallel component to processors that exploit instruction-level parallelism, and presents an opportunity for increased performance. In fact, ignoring a processor's vector opcodes can leave a significant portion of the available resources unused. In order for software developers to find short-vector instructions generally useful, the compiler must target these extensions with complete transparency and consistent performance. This thesis develops compiler techniques to target short-vector instructions automatically and efficiently. One important aspect of compilation is the effective management of memory alignment. As with scalar loads and stores, vector references are typically more efficient when accessing aligned regions. In many cases, the compiler can glean no alignment information and must emit conservative code sequences. In response, I introduce a range of compiler techniques for detecting and enforcing aligned references. In my benchmark suite, the most practical method ensures alignment for roughly 75% of dynamic memory references. This thesis also introduces selective vectorization, a technique for balancing computation across a processor's scalar and vector resources. Current approaches for targeting short-vector instructions directly adopt vectorizing technology first developed for supercomputers. Traditional vectorization, however, can lead to a performance degradation since it fails to account for a processor's scalar execution resources. I formulate selective vectorization in the context of software pipelining. My approach creates software pipelines with shorter initiation intervals, and therefore, higher performance. In contrast to conventional methods, selective vectorization operates on a low-level intermediate representation. This technique allows the algorithm to accurately measure the performance trade-offs of code selection alternatives. A key aspect of selective vectorization is its ability to manage communication of operands between vector and scalar instructions. Even when operand transfer is expensive, the technique is sufficiently sophisticated to achieve significant performance gains. I evaluate selective vectorization on a set of SPEC FP benchmarks. On a realistic VLIW processor model, the approach achieves whole-program speedups of up to 1.35x over existing approaches. For individual loops, it provides speedups of up to 1.75x. By Samuel Larsen. Ph.D.
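
    To illustrate the alignment issue the thesis addresses, here is a small hand-written C++/SSE sketch of the kind of code a vectorizing compiler might emit when it can enforce alignment: a scalar prologue peels iterations until the pointer reaches a 16-byte boundary, the main loop then uses aligned vector loads and stores, and a scalar epilogue handles the remainder. The function and its structure are illustrative and not taken from the thesis.

    #include <cstddef>
    #include <cstdint>
    #include <xmmintrin.h>   // SSE intrinsics

    void scale(float* a, std::size_t n, float s) {
        std::size_t i = 0;

        // Prologue: peel scalar iterations until 'a + i' is 16-byte aligned.
        while (i < n && (reinterpret_cast<std::uintptr_t>(a + i) & 0xF) != 0) {
            a[i] *= s;
            ++i;
        }

        // Main loop: four floats per iteration with aligned loads and stores.
        __m128 vs = _mm_set1_ps(s);
        for (; i + 4 <= n; i += 4) {
            __m128 v = _mm_load_ps(a + i);            // aligned load
            _mm_store_ps(a + i, _mm_mul_ps(v, vs));   // aligned store
        }

        // Epilogue: remaining elements in scalar code.
        for (; i < n; ++i)
            a[i] *= s;
    }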
