39 research outputs found

Trace-level speculative multithreaded architecture

Author: González Colás Antonio María
Molina Clemente Carlos
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant penalties are introduced in the presence of a misspeculation and (b) any type of trace predictor can work together with this proposal. In this way, aggressive trace predictors can be incorporated since misspeculations do not introduce significant penalties. We describe in detail TSMA (trace-level speculative multithreaded architecture) and present initial results to show the benefits of this proposal. We show how simple trace predictors achieve significant speed-up in the majority of cases. Results of a simple trace speculation mechanism show an average speed-up of 16%.

Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Compiler analysis for trace-level speculative multithreaded architectures

Author: González Colás Antonio María
Molina Clemente Carlos
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Trace-level speculative multithreaded processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.

Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Thread-spawning schemes for speculative multithreading

Author: González Colás Antonio María
Marcuello Pascual Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

Speculative multithreading has recently been proposed to boost performance by exploiting thread-level parallelism in applications that are difficult to parallelize. The performance of these processors depends heavily on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and an 8-cycle thread initialization penalty are considered, the performance difference between them is maintained. The speed-up over a single-thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.

Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Software-Controlled Instruction Prefetch Buffering for Low-End Processors

Author: Fleury Martin
McDonald-Maier Klaus
Qadri Muhammad Yasir
Qadri Nadia N
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 28/08/2015

This paper proposes a method of buffering instructions by software-based prefetching. The method allows low-end processors to improve their instruction throughput with a minimum of additional logic and power consumption. Low-end embedded processors do not employ caches for two main reasons. The first reason is that the overhead of cache implementation in terms of energy and area is considerable. The second reason is that, because a cache's performance primarily depends on the number of hits, an increasing number of misses could cause a processor to remain in stall mode for a longer duration. As a result, a cache may become more of a liability than an advantage. In contrast, the benchmarked results for the proposed software-based prefetch buffering without a cache show a 5-10% improvement in execution time. They also show a 4% or more reduction in the energy-delay-square-product (ED2P), with a maximum reduction of 40%. The results additionally demonstrate that the performance and efficiency of the proposed architecture scale with the number of multicycle instructions. The benchmarked routines tested to arrive at these results are widely deployed components of embedded applications.
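
The ED2P figure quoted in the abstract above is energy multiplied by the square of delay, so an execution-time gain counts twice. The following is a minimal sketch of that arithmetic only. The function names and the input numbers are hypothetical and are not taken from the paper; the example assumes energy is unchanged, which the paper does not claim.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical inputs (not from the paper): identical energy, 5% less time.
baseline = ed2p(energy_joules=1.0, delay_seconds=1.00)
with_prefetch = ed2p(energy_joules=1.0, delay_seconds=0.95)

reduction = relative_reduction(baseline, with_prefetch)
print(f"ED2P reduction: {reduction:.2%}")
```

Because delay is squared, a metric of this form rewards run-time improvements more strongly than a plain energy-delay product would.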

University of Essex Research Repository

Crossref

Compiler Assisted Cache Prefetch Using Procedure Call Hierarchy

Author: Doshi Sheela A
Publication venue: LSU Digital Commons
Publication date: 01/01/2006

Microprocessor performance has been increasing at an exponential rate while memory system performance has improved at a linear rate. This widening performance gap is increasingly rendering advances in computer architecture less useful, as more instructions spend more time waiting for data to be fetched from memory after a cache miss. Data prefetching is a technique that avoids some cache misses by bringing data into the cache before it is actually needed. Different approaches to data prefetching have been developed; however, existing prefetch schemes do not eliminate all cache misses, and even with a smaller cache miss ratio, miss latency remains an important performance limiter. In this thesis, we propose a technique called Compiler Assisted Cache Prefetch Using Procedure Call Hierarchy (CAPPH). It is a hardware-software prefetch technique that uses a compiler to provide information pertaining to the data structure layout, data flow and procedure-call hierarchy of the program to a mechanism that prefetches linked data structures (LDS). By using this statically generated information, it can prefetch data for procedures even before they are called. It is also capable of issuing prefetches for recursive functions that access LDS and for arbitrary access sequences, which are otherwise difficult to prefetch. The scheme is simulated using RSIML, a SPARC v8 simulator. The benchmarks em3d, health and mst from the Olden suite were used. The scheme was compared with an otherwise identical system with no prefetch and with one using sequential prefetch. Simulations were performed to measure CAPPH performance and the decrease in the miss ratio of loads accessing LDS. Statistics of individual loads were collected, and accuracy, coverage and timeliness were measured against varying cache size and latency. Results from individual loads accessing linked data structures show a considerable decrease in their miss ratios and average access times. CAPPH is found to be more accurate than sequential prefetch, while its coverage and timeliness are lower than those of sequential prefetch. We suggest heuristics to further enhance the effectiveness of the prefetch technique.
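
The three metrics named in the abstract above can be illustrated with a small sketch. The definitions below are common textbook formulations and are an assumption here; the thesis may define them differently. The class, its field names, and the counter values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrefetchCounts:
    """Hypothetical per-run counters; names are illustrative only."""
    issued: int           # prefetches sent to the memory system
    useful: int           # prefetched lines later referenced by a load
    timely: int           # useful prefetches that arrived before the load
    baseline_misses: int  # misses of the same run with prefetching disabled


def accuracy(c: PrefetchCounts) -> float:
    """Share of issued prefetches that turned out to be useful."""
    return c.useful / c.issued if c.issued else 0.0


def coverage(c: PrefetchCounts) -> float:
    """Share of baseline misses that a useful prefetch targeted."""
    return c.useful / c.baseline_misses if c.baseline_misses else 0.0


def timeliness(c: PrefetchCounts) -> float:
    """Share of useful prefetches that completed before they were needed."""
    return c.timely / c.useful if c.useful else 0.0


# Made-up counters, not results from the thesis.
counts = PrefetchCounts(issued=200, useful=150, timely=120, baseline_misses=500)
print(accuracy(counts), coverage(counts), timeliness(counts))
```

Under these definitions a scheme can be accurate yet have low coverage: it issues few wasted prefetches but also leaves many misses untouched, which matches the kind of trade-off the abstract reports between CAPPH and sequential prefetch.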

Louisiana State University

Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Author: Alejandro Martinez
Antonio Gonzalez
Carlos Madriles
Enric Gibert
Fernando Latorre
Josep M. Codina
Kahle J. A.
Kernighan B.
Marcuello P.
Pedro López
Raúl Martinez
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date

Crossref