9 research outputs found

Context Switching with Multiple Register Windows: A RISC Performance Study

Author: Konsek Marion B.
Reed Daniel A.
Watcharawittayakul Wittaya

Although previous studies have shown that a large file of overlapping register windows can greatly reduce procedure call/return overhead, the effects of register windows in a multiprogramming environment are poorly understood. This paper investigates the performance of multiprogrammed, reduced instruction set computers (RISCs) as a function of window management strategy. Using an analytic model that reflects context switch and procedure call overheads, we analyze the performance of simple, linearly self-recursive programs. For more complex programs, we present the results of a simulation study. These studies show that a simple strategy that saves all windows prior to a context switch, but restores only a single window following a context switch, performs near optimally.

NASA Technical Reports Server

The Susceptibility of Programs to Context Switching

Author: Conte Thomas M.
Hwu Wen-mei W.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/04/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / MIP-8809478; NCR Corp.; AMD Corp. 29K Advanced Processor Development Division; National Aeronautics and Space Administration / NASA NAG 1-613; Office of Naval Research / N00014-88-K-0656; Hewlett-Packard Co.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Profile-Guided Automatic Inline Expansion for C Programs

Author: Chang Pohua P.
Hwu Wen-mei W.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/04/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / MIP-8809478; NCR; AMD 29K Advanced Processor Development Division; National Aeronautics and Space Administration / NASA NAG 1-61

Illinois Digital Environment for Access to Learning and Scholarship Repository

Efficient Instruction Sequencing with Inline Target Insertion

Author: Chang Pohua P.
Hwu Wen-mei W.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/05/1990

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / MIP-8809478; NCR; National Aeronautics and Space Administration / NASA NAG 1-613; Office of Naval Research / N00014-88-K-065

Illinois Digital Environment for Access to Learning and Scholarship Repository

Software and hardware methods for memory access latency reduction on ILP processors

Author: Zhang Zhao
Publication venue: W&M ScholarWorks
Publication date: 01/01/2002

While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. To address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be accessed in the DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit the data locality and to significantly reduce memory access latency. We first present a case study at the application level that reconstructs memory-intensive programs by utilizing program-specific knowledge. The problem of bit-reversals, a set of data reordering operations extensively used in scientific computing programs such as FFT, and an application with a special data access pattern that can cause severe cache conflicts, is identified in this study. We have proposed several software methods, including padding and blocking, to restructure the program to reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems. The access latency to the DRAM core has become increasingly long relative to CPU speed, causing memory accesses to be an execution bottleneck. In order to reduce the frequency of DRAM core accesses and so shorten the overall memory access latency, we have conducted three studies at this level of the memory hierarchy. First, motivated by our evaluation of the DRAM row buffer's performance roles and our findings of the reasons for its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives associated with ILP processors. We then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-level L3 cache for memory-intensive applications. Memory access latency has become a major performance bottleneck for memory-intensive applications. As long as DRAM technology retains its most cost-effective position for making main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation attempt to address this important issue. Our proposed software and hardware schemes are effective and applicable, and can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies.

College of William & Mary: W&M Publish

Overcoming the Intuition Wall: Measurement and Analysis in Computer Architecture

Author: Demme John David
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2014

These are exciting times for computer architecture research. Today there is significant demand to improve the performance and energy-efficiency of emerging, transformative applications which are being hammered out by the hundreds for new computing platforms and usage models. This booming growth of applications and the variety of programming languages used to create them is challenging our ability as architects to rapidly and rigorously characterize these applications. Concurrently, hardware has become more complex with the emergence of accelerators, multicore systems, and heterogeneity caused by further divergence between processor market segments. No one architect can now understand all the complexities of many systems and reason about the full impact of changes or new applications. To that end, this dissertation presents four case studies in quantitative methods. Each case study attacks a different application and proposes a new measurement or analytical technique. In each case study we find at least one surprising or unintuitive result which would likely not have been found without the application of our method

Columbia University Academic Commons

A Characterization of Processor Performance in the VAX-11/780

Author: Joel S. Emer
Publication venue: IEEE Computer Society Press
Publication date: 01/01/1984

This paper reports the results of a study of VAX-11/780 processor performance using a novel hardware monitoring technique. A micro-PC histogram monitor was built for these measurements. It keeps a count of the number of microcode cycles executed at each microcode location. Measurement experiments were performed on live timesharing workloads as well as on synthetic workloads of several types. The histogram counts allow the calculation of the frequency of various architectural events, such as the frequency of different types of opcodes and operand specifiers, as well as the frequency of some implementation-specific events, such as translation buffer misses. The measurement technique also yields the amount of processing time spent in various activities, such as ordinary microcode computation, memory management, and processor stalls of different kinds. This paper reports in detail the amount of time the 'average' VAX instruction spends in these activities.

CiteSeerX

A Characterization of Processor Performance in the VAX-11/780

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/1984

Crossref

A Characterization of Processor Performance in the VAX-11/780

Author: Alpert D.
Douglas W. Clark
Huck J.C.
Joel S. Emer
Levy H.M.
Strecker W.D.
Publication venue: Association for Computing Machinery (ACM)

Crossref