14 research outputs found

    A Flexible Multi-port Caching Scheme for Reconfigurable Platforms

    Reducing the complexity of the register file in dynamic superscalar processors

    Journal Article
    Dynamic superscalar processors execute multiple instructions out-of-order by looking for independent operations within a large window. The number of physical registers within the processor has a direct impact on the size of this window, as most in-flight instructions require a new physical register at dispatch. A large multi-ported register file helps improve instruction-level parallelism (ILP), but may have a detrimental effect on clock speed, especially in future wire-limited technologies. In this paper, we propose a register file organization that reduces register file size and port requirements for a given amount of ILP. We use a two-level register file organization to reduce register file size requirements, and a banked organization to reduce port requirements. We demonstrate empirically that the resulting register file organizations have reduced latency and (in the case of the banked organization) energy requirements for similar instructions per cycle (IPC) performance and improved instructions per second (IPS) performance in comparison to a conventional monolithic register file. The choice of organization is dependent on design goals.
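    The banked organization described in this abstract can be sketched in miniature: register reads that map to the same bank in one cycle must arbitrate for that bank's limited ports, and the losers stall. Below is a minimal Python sketch, assuming a simple modulo bank mapping and one read port per bank; both are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of one read cycle in a banked register file.
# NUM_BANKS and the modulo mapping are assumptions for illustration.

NUM_BANKS = 4
READ_PORTS_PER_BANK = 1

def bank_of(phys_reg: int) -> int:
    """Map a physical register to a bank (assumed modulo interleave)."""
    return phys_reg % NUM_BANKS

def schedule_reads(read_regs):
    """Greedily issue register reads this cycle; conflicting reads stall.

    Returns (issued, stalled) lists of physical register numbers.
    """
    ports_used = {b: 0 for b in range(NUM_BANKS)}
    issued, stalled = [], []
    for r in read_regs:
        b = bank_of(r)
        if ports_used[b] < READ_PORTS_PER_BANK:
            ports_used[b] += 1
            issued.append(r)
        else:
            stalled.append(r)  # bank port conflict: retry next cycle
    return issued, stalled

# Registers 3 and 7 both map to bank 3; 4 and 8 both map to bank 0,
# so one read from each pair stalls this cycle.
issued, stalled = schedule_reads([3, 7, 4, 8])
```

    The point of the paper's banking is exactly this trade: fewer ports per bank shrink the structure, at the cost of occasional conflict stalls that the evaluation shows are rare enough to preserve IPC.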

    Dynamic data memory partitioning for access region caches

    For wide-issue processors, the data cache needs to be heavily multi-ported with extremely wide data paths. A recent multi-ported cache design proposal divides memory streams into multiple independent sub-streams, with the help of a prediction mechanism, before they enter the reservation stations. Partitioned memory-reference instructions are then fed into separate memory pipelines, each of which is connected to a small data cache called an access region cache (ARC). The selection function that maps memory references to each ARC can affect data memory bandwidth, as conflicts and load balance at each ARC may differ. In this thesis, we study various static and dynamic memory partitioning methods to see the effects of distributing memory references among the ARCs by exposing the memory traffic of those designs. Six different approaches to distributing memory references, including two randomization methods and two dynamic methods, are considered. The potential effects on memory performance with ARCs are measured and compared with an existing multi-porting solution as well as an ideal multi-ported data cache. This study concludes that scattering access conflicts dynamically, i.e., redirecting conflicting references to different ARCs at each cycle, can increase memory bandwidth. However, increasing data bandwidth alone does not always result in performance improvement. Keeping the cache miss rate low is as important as sufficient memory bandwidth for achieving higher performance in wide-issue processors.
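    The contrast the thesis studies, static selection versus dynamic conflict scattering, can be sketched as follows. The address-interleaved mapping, the block size, and the "spill to least-loaded ARC" rule are assumptions made for illustration, not the thesis's actual selection functions.

```python
# Sketch: static address-interleaved ARC selection vs. a dynamic policy
# that redirects conflicting references each cycle. NUM_ARCS and BLOCK
# are illustrative assumptions.

NUM_ARCS = 2
BLOCK = 32  # assumed cache block size in bytes

def static_select(addr: int) -> int:
    """Static: choose an ARC by address interleaving at block granularity."""
    return (addr // BLOCK) % NUM_ARCS

def dynamic_select(addrs):
    """Dynamic: scatter one cycle's references, redirecting on conflict.

    Each ARC accepts one reference per cycle; a reference whose preferred
    ARC is busy is redirected to the least-loaded ARC instead of stalling.
    Returns the ARC assigned to each address, in order.
    """
    load = [0] * NUM_ARCS
    assignment = []
    for a in addrs:
        arc = static_select(a)
        if load[arc] > 0:                # conflict at the preferred ARC
            arc = load.index(min(load))  # redirect to least-loaded ARC
        load[arc] += 1
        assignment.append(arc)
    return assignment

# Addresses 0 and 64 both prefer ARC 0 statically; the dynamic policy
# redirects one of them, keeping both ARCs busy in the same cycle.
assignment = dynamic_select([0, 64, 32])
```

    Note that redirection trades bandwidth for locality: a redirected reference may miss in its new ARC, which is why the abstract stresses that bandwidth gains alone do not guarantee speedup.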

    Dead-block prediction & dead-block correlating prefetchers

    Effective data prefetching requires accurate mechanisms to predict both “which” cache blocks to prefetch and “when” to prefetch them. This paper proposes Dead-Block Predictors (DBPs), trace-based predictors that accurately identify “when” an L1 data cache block becomes evictable, or “dead”. Predicting a dead block significantly enhances prefetching lookahead and opportunity, and enables placing data directly into L1, obviating the need for auxiliary prefetch buffers. This paper also proposes Dead-Block Correlating Prefetchers (DBCPs), which use address correlation to predict “which” block to prefetch next when a block becomes evictable. A DBCP enables effective data prefetching in a wide spectrum of pointer-intensive, integer, and floating-point applications. We use cycle-accurate simulation of an out-of-order superscalar processor and memory-intensive benchmarks to show that: (1) dead-block prediction enhances prefetching lookahead by at least an order of magnitude compared to previous techniques, (2) a DBP can predict dead blocks with an average coverage of 90%, mispredicting only 4% of the time, (3) a DBCP offers an address prediction coverage of 86%, mispredicting only 3% of the time, and (4) DBCPs improve performance by 62% on average and 282% at best on the benchmarks we studied.
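    The trace-based idea behind a DBP can be illustrated with a toy model: for each live block, accumulate the trace of PCs that touch it; on eviction, remember that trace as one that ends a block's lifetime; on later accesses, predict "dead" the moment a block's trace matches a remembered one. Real DBPs compress the trace into a fixed-width signature; using the raw PC tuple here is a simplifying assumption, not the paper's encoding.

```python
class DeadBlockPredictor:
    """Toy trace-based dead-block predictor (illustrative only)."""

    def __init__(self):
        self.trace = {}           # block address -> tuple of PCs that touched it
        self.dead_traces = set()  # traces after which a block was evicted

    def access(self, block, pc):
        """Record an access; return True if the block is now predicted dead."""
        t = self.trace.get(block, ()) + (pc,)
        self.trace[block] = t
        return t in self.dead_traces

    def evict(self, block):
        """On eviction, learn that this block's trace ended its lifetime."""
        t = self.trace.pop(block, None)
        if t is not None:
            self.dead_traces.add(t)

# Train on one lifetime of block 0x100, touched by PCs 1 then 2.
dbp = DeadBlockPredictor()
dbp.access(0x100, 1)
dbp.access(0x100, 2)
dbp.evict(0x100)
```

    After training, replaying the same access sequence flags the block dead immediately after its last touch, long before the eviction itself, which is the source of the lookahead the paper exploits.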

    Approach to applying the contour model to the design of a hypothetical multiple-register-window architecture for the block-structured process

    The concepts described in this thesis are towards the implementation of the basic functions of a pipelined, load/store, multiple-register-window, scalar-oriented uniprocessor architecture. During the formation phase of these concepts, I was glad to have the opportunity to investigate the interrelation of computer architectures, data structures, and systems programming, which are the fundamentals underlying virtually every software design. I also took pleasure in learning the AWK and C++ programming languages (only the elementary aspects of the latter, however) for the simulation conducted in this thesis, and the UNIX document formatting/typesetting tools for the preparation of the text and figures presented in this thesis on the UNIX-based PerkinElmer 3230 computer system of the Computer Science Department.

    Task Activity Vectors: A Novel Metric for Temperature-Aware and Energy-Efficient Scheduling

    This thesis introduces the abstraction of the task activity vector to characterize applications by the processor resources they utilize. Based on activity vectors, the thesis introduces scheduling policies for improving the temperature distribution on the processor chip and for increasing energy efficiency by reducing contention for the shared resources of multicore and multithreaded processors.
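    One way to picture activity-vector scheduling: if each task carries a vector of per-resource utilizations, tasks that overlap little can share hardware with less contention. The sketch below pairs tasks by minimizing the dot product of their vectors; the dot-product contention proxy, the greedy pairing, and the example tasks are all assumptions for illustration, not the thesis's actual policies.

```python
# Illustrative vector-based co-scheduling: greedily pair tasks whose
# activity vectors overlap least. All names and numbers are assumed.

from itertools import combinations

def contention(u, v):
    """Proxy for shared-resource contention: overlap of activity vectors."""
    return sum(a * b for a, b in zip(u, v))

def pair_tasks(vectors):
    """Greedily pair tasks so co-scheduled pairs have minimal overlap."""
    remaining = dict(vectors)
    pairs = []
    while len(remaining) >= 2:
        a, b = min(combinations(remaining, 2),
                   key=lambda p: contention(remaining[p[0]], remaining[p[1]]))
        pairs.append((a, b))
        del remaining[a], remaining[b]
    return pairs

# Assumed activity vectors over (integer ALU, FPU, memory) utilization.
tasks = {"int_heavy": (0.9, 0.1, 0.2),
         "fp_heavy":  (0.1, 0.9, 0.2),
         "mem_heavy": (0.2, 0.1, 0.9),
         "mixed":     (0.5, 0.5, 0.5)}
```

    On this example the integer-bound and FP-bound tasks end up paired, since they stress disjoint units; the same vectors could instead drive thermally aware placement by spreading tasks that heat the same chip region.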

    High-Bandwidth Data Memory Systems for Superscalar Processors
