8 research outputs found

    Identifying and exploiting concurrency in object-based real-time systems

    The use of object-based mechanisms, i.e., abstract data types (ADTs), for constructing software systems can help to decrease development costs and to increase understandability and maintainability. However, execution efficiency may be sacrificed due to the large number of procedure calls, and due to contention for shared ADTs in concurrent systems. Such inefficiencies are a concern in real-time applications with stringent timing requirements. To address these issues, the potentially inefficient procedure calls are turned into a source of concurrency via asynchronous remote procedure calls (ARPCs), and contention for shared ADTs is reduced via ADT cloning. A framework for concurrency analysis in object-based systems is developed, and compiler techniques for identifying potential concurrency via ARPCs and cloning are introduced. Exploitation of the parallelizing compiler techniques is illustrated in the context of an incremental schedule construction algorithm that enhances concurrency incrementally so that feasible real-time schedules can be constructed. Experimental results show large speedup gains with these techniques. Additionally, experiments show that the concurrency enhancement techniques are often useful in constructing feasible schedules for hard real-time systems.
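The two transformations named in this abstract can be illustrated with a minimal Python sketch. All names here (`Counter`, `add_range`) are hypothetical, and a thread pool stands in for the thesis's compiler-generated ARPC machinery: calls on an ADT are issued asynchronously, and per-task clones of the ADT avoid contention on a single shared instance.

```python
# Sketch of ARPC-style asynchrony plus ADT cloning (hypothetical names).
from concurrent.futures import ThreadPoolExecutor

class Counter:
    """A tiny shared ADT: accumulates a sum."""
    def __init__(self):
        self.total = 0

    def add_range(self, lo, hi):
        for i in range(lo, hi):
            self.total += i
        return self.total

# Synchronous baseline: both calls serialize on one shared instance.
shared = Counter()
shared.add_range(0, 500)
shared.add_range(500, 1000)

# ARPC-style: issue the calls asynchronously, each against its own
# clone of the ADT, then merge the partial results afterwards
# (safe here because integer addition commutes and associates).
with ThreadPoolExecutor(max_workers=2) as pool:
    clones = [Counter(), Counter()]
    futures = [pool.submit(c.add_range, lo, hi)
               for c, (lo, hi) in zip(clones, [(0, 500), (500, 1000)])]
    merged = sum(f.result() for f in futures)

assert merged == shared.total  # same answer, computed concurrently
```

The merge step is the cost of cloning: it is only valid when the ADT's operations can be recombined, which is the kind of condition the thesis's concurrency analysis must establish.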

    Identifying reusable functions in code using specification driven techniques

    The work described in this thesis addresses the field of software reuse. Software reuse is widely considered a way to increase productivity and to improve the quality and reliability of new software systems. Identifying, extracting and reengineering software components which implement abstractions within existing systems is a promising, cost-effective way to create reusable assets. Such a process is referred to as reuse reengineering. A reference paradigm defined within the RE(^2) project decomposes a reuse reengineering process into five sequential phases. In particular, the first phase of the reference paradigm, called the Candidature phase, is concerned with the analysis of source code to identify software components that implement abstractions and are therefore candidates for reuse. Different candidature criteria exist for the identification of reuse-candidate software components. They can be classified into structural methods (based on structural properties of the software) and specification driven methods (which search for software components implementing a given specification). In this thesis a new specification driven candidature criterion for the identification and extraction of code fragments implementing functional abstractions is presented. The method is driven by a formal specification of the function to be isolated (given in terms of a precondition and a postcondition) and is based on the theoretical frameworks of program slicing and symbolic execution. Symbolic execution and theorem proving techniques are used to map the specification of the functional abstraction onto a slicing criterion. Once the slicing criterion has been identified, the slice is isolated using algorithms based on dependence graphs. The method has been specialised for programs written in the C language. Both symbolic execution and program slicing are performed by exploiting the Combined C Graph (CCG), a fine-grained dependence-based program representation that can be used for several software maintenance tasks.
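The core of slice isolation is a backward walk over def/use dependences. The following toy sketch (hypothetical statement encoding, far simpler than the thesis's CCG, and ignoring control dependences) shows the idea: starting from a slicing criterion, collect every statement whose definitions transitively reach the criterion's variables.

```python
# Toy backward program slicing over statement-level def/use sets.
# Each statement: (id, variables defined, variables used).
stmts = [
    (1, {"a"}, set()),   # a = input()
    (2, {"b"}, set()),   # b = input()
    (3, {"s"}, {"a"}),   # s = a * a
    (4, {"t"}, {"b"}),   # t = b + 1
    (5, {"r"}, {"s"}),   # r = s - 2   <- slicing criterion: r at stmt 5
]

def backward_slice(stmts, criterion_id):
    """Return ids of statements the criterion transitively depends on."""
    relevant = set()    # variables currently relevant to the criterion
    slice_ids = set()
    for sid, defs, uses in reversed(stmts):
        if sid == criterion_id or (defs & relevant):
            slice_ids.add(sid)
            # A definition satisfies the demand for its variables;
            # the statement's own uses become newly relevant.
            relevant = (relevant - defs) | uses
    return sorted(slice_ids)

print(backward_slice(stmts, 5))  # statements 2 and 4 are sliced away
```

In the thesis's setting, the slicing criterion itself is not given by hand but derived from the precondition/postcondition specification via symbolic execution and theorem proving; this sketch covers only the final dependence-graph step.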

    Array optimizations for high productivity programming languages

    While the HPCS languages (Chapel, Fortress and X10) have introduced improvements in programmer productivity, several challenges still remain in delivering high performance. In the absence of optimization, the high-level language constructs that improve productivity can result in order-of-magnitude runtime performance degradations. This dissertation addresses the problem of efficient code generation for high-level array accesses in the X10 language. The X10 language supports rank-independent specification of loop and array computations using regions and points. Three aspects of high-level array accesses in X10 are important for productivity but also pose significant performance challenges: high-level accesses are performed through Point objects rather than integer indices, variables containing references to arrays are rank-independent, and array subscripts are verified as legal array indices during runtime program execution. Our solution to the first challenge is to introduce new analyses and transformations that enable automatic inlining and scalar replacement of Point objects. Our solution to the second challenge is a hybrid approach. We use an interprocedural rank analysis algorithm to automatically infer ranks of arrays in X10. We use rank analysis information to enable storage transformations on arrays. If rank-independent array references still remain after compiler analysis, the programmer can use X10's dependent type system to safely annotate array variable declarations with additional information for the rank and region of the variable, and to enable the compiler to generate efficient code in cases where the dependent type information is available. Our solution to the third challenge is to use a new interprocedural array bounds analysis approach using regions to automatically determine when runtime bounds checks are not needed. 
Our performance results show that our optimizations deliver performance that rivals the performance of hand-tuned code with explicit rank-specific loops and lower-level array accesses, and is up to two orders of magnitude faster than unoptimized, high-level X10 programs. These optimizations also result in scalability improvements of X10 programs as we increase the number of CPUs. While we perform the optimizations primarily in X10, these techniques are applicable to other high-productivity languages such as Chapel and Fortress.
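The first transformation described above, inlining and scalar replacement of Point objects, can be mimicked in a short Python sketch. The `Point` class here is a hypothetical stand-in for X10's point values, not its actual API; the point is the before/after shape of the loop.

```python
# Before/after sketch of Point inlining + scalar replacement.
class Point:
    """Hypothetical stand-in for an X10 point: a boxed (i, j) index."""
    def __init__(self, i, j):
        self.i, self.j = i, j

def sum_high_level(a):
    # High-level style: a fresh Point object is materialized for
    # every element access, and subscripts go through its fields.
    total = 0
    for i in range(len(a)):
        for j in range(len(a[0])):
            p = Point(i, j)
            total += a[p.i][p.j]
    return total

def sum_scalar_replaced(a):
    # After scalar replacement: the Point's fields have been replaced
    # by the integer induction variables themselves, so no per-element
    # object allocation or field load remains.
    total = 0
    for i in range(len(a)):
        for j in range(len(a[0])):
            total += a[i][j]
    return total

grid = [[1, 2, 3], [4, 5, 6]]
assert sum_high_level(grid) == sum_scalar_replaced(grid) == 21
```

In X10 the payoff is larger than this sketch suggests, since the abstract also relies on rank analysis and region-based bounds analysis to remove rank-generic dispatch and runtime subscript checks from the optimized form.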

    Compiling for parallel multithreaded computation on symmetric multiprocessors

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. By Andrew Shaw. Includes bibliographical references (p. 145-149).

    Multipurpose short-term memory structures

    Thesis (M.Phil.) -- Chinese University of Hong Kong, 1995. By Yung, Chan. Includes bibliographical references (leaves 107-110).

    Table of contents:
        Abstract (p.i)
        Acknowledgement (p.iii)
        Chapter 1  Introduction (p.1)
            1.1  Cache (p.1)
                1.1.1  Introduction (p.1)
                1.1.2  Data Prefetching (p.2)
            1.2  Register (p.2)
            1.3  Problems and Challenges (p.3)
                1.3.1  Overhead of registers (p.3)
                1.3.2  EReg (p.5)
            1.4  Organization of the Thesis (p.6)
        Chapter 2  Previous Studies (p.8)
            2.1  Introduction (p.8)
            2.2  Data aliasing (p.9)
            2.3  Data prefetching (p.12)
                2.3.1  Introduction (p.12)
                2.3.2  Hardware Prefetching (p.12)
                2.3.3  Prefetching with Software Support (p.13)
                2.3.4  Reducing Cache Pollution (p.14)
        Chapter 3  BASIC and ADM Models (p.15)
            3.1  Introduction of Basic Model (p.15)
            3.2  Architectural and Operational Detail of Basic Model (p.18)
            3.3  Discussion (p.19)
                3.3.1  Implicit Storing (p.19)
                3.3.2  Associative Logic (p.22)
            3.4  Example for Basic Model (p.22)
            3.5  Simulation Results (p.23)
            3.6  Temporary Storage Problem in Basic Model (p.29)
                3.6.1  Introduction (p.29)
                3.6.2  Discussion on the Solutions (p.31)
            3.7  Introduction of ADM Model (p.35)
            3.8  Architectural and Operational Detail of ADM Model (p.37)
            3.9  Discussion (p.39)
                3.9.1  File Partition (p.39)
                3.9.2  STORE Instruction (p.39)
            3.10  Example for ADM Model (p.40)
            3.11  Simulation Results (p.40)
            3.12  Temporary Storage Problem of ADM Model (p.46)
                3.12.1  Introduction (p.46)
                3.12.2  Discussion on the Solutions (p.46)
        Chapter 4  ADS Model and ADSM Model (p.49)
            4.1  Introduction of ADS Model (p.49)
            4.2  Architectural and Operational Detail of ADS Model (p.50)
            4.3  Discussion (p.52)
                4.3.1  Prefetching Priority (p.52)
                4.3.2  Data Prefetching (p.53)
                4.3.3  EReg File Splitting (p.53)
                4.3.4  Compiling Procedure (p.53)
            4.4  Example for ADS Model (p.54)
            4.5  Simulation Results (p.55)
            4.6  Discussion on the Architectural and Operational Variations for ADS Model (p.62)
                4.6.1  Temporary Storage Problem (p.62)
                4.6.2  Operational Variation for Data Prefetching (p.63)
            4.7  Introduction of ADSM Model (p.64)
            4.8  Architectural and Operational Detail of ADSM Model (p.65)
            4.9  Discussion (p.67)
            4.10  Example for ADSM Model (p.67)
            4.11  Simulation Results (p.68)
            4.12  Discussion on the Architectural and Operational Variations for ADSM Model (p.71)
                4.12.1  Temporary Storage Problem (p.71)
                4.12.2  Operational Variation for Data Prefetching (p.73)
        Chapter 5  IADSM Model and IADSMC&IDLC Model (p.75)
            5.1  Introduction of IADSM Model (p.75)
            5.2  Architectural and Operational Detail of IADSM Model (p.76)
            5.3  Discussion (p.79)
                5.3.1  Implicit Loading (p.79)
                5.3.2  Compiling Procedure (p.81)
            5.4  Example for IADSM Model (p.81)
            5.5  Simulation Results (p.84)
            5.6  Temporary Storage Problem of IADSM Model (p.87)
            5.7  Introduction of IADSMC&IDLC Model (p.88)
            5.8  Architectural and Operational Detail of IADSMC&IDLC Model (p.89)
            5.9  Discussion (p.90)
                5.9.1  Additional Operations (p.90)
                5.9.2  Compiling Procedure (p.93)
            5.10  Example for IADSMC&IDLC Model (p.93)
            5.11  Simulation Results (p.94)
            5.12  Temporary Storage Problem of IADSMC&IDLC Model (p.96)
        Chapter 6  Compiler and Memory System Support for EReg (p.99)
            6.1  Impact on Compiler (p.99)
                6.1.1  Register Usage (p.99)
                6.1.2  Effect of Unrolling (p.100)
                6.1.3  Code Scheduling Algorithm (p.101)
            6.2  Impact on Memory System (p.102)
                6.2.1  Memory Bottleneck (p.102)
                6.2.2  Size of EReg Files (p.103)
        Chapter 7  Conclusions (p.104)
            7.1  Summary (p.104)
            7.2  Future Research (p.105)
        Bibliography (p.107)
        Appendix A  Source code of the Kernels (p.111)
        Appendix B  Program Analysis (p.126)
            B.1  Program analysed by Basic Model (p.126)
            B.2  Program analysed by ADM Model (p.133)
            B.3  Program analysed by ADS Model (p.140)
            B.4  Program analysed by ADSM Model (p.148)
            B.5  Program analysed by IADSM Model (p.156)
            B.6  Program analysed by IADSMC&IDLC Model (p.163)
        Appendix C  Cache Simulation on Prefetching of ADS model (p.17)