11 research outputs found

    Virtualizing Transactional Memory

    Scalable, reliable, power-efficient communication for hardware transactional memory

    Journal Article
    In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true for a large-scale distributed memory system where multiple transactions may attempt to commit simultaneously and coordination is required before allowing commits to proceed in parallel. In this paper, we propose novel algorithms to implement commit that are more scalable (in terms of delay and energy) and are free of deadlocks/livelocks. We show that these algorithms have similarities with the token cache coherence concept and leverage these similarities to extend the algorithms to handle message loss and starvation scenarios. The proposed algorithms improve upon the state of the art by yielding up to a 7X reduction in commit delay and up to a 48X reduction in network messages. These translate into overall performance improvements of up to 66% (for synthetic workloads with an average transaction length of 200 cycles), 35% (for an average transaction length of 1000 cycles), 8% (for an average transaction length of 4000 cycles), and 41% (for a collection of SPLASH-2 programs).
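
    The arbitration problem the abstract refers to, deciding which of several simultaneously committing transactions may proceed in parallel, can be sketched with a toy model. The Python sketch below is illustrative only and is not the commit algorithm proposed in the paper: under lazy conflict detection, transactions with overlapping read/write sets must be serialized, while disjoint ones may commit in the same round.

        # Illustrative sketch only: a toy model of commit arbitration under lazy
        # conflict detection, not the commit algorithm proposed in the paper.
        from dataclasses import dataclass, field

        @dataclass
        class Txn:
            tid: int
            read_set: set = field(default_factory=set)
            write_set: set = field(default_factory=set)

        def conflicts(a: Txn, b: Txn) -> bool:
            # A conflict exists if one transaction wrote an address the other
            # read or wrote (detected lazily, at commit time).
            return bool(a.write_set & (b.read_set | b.write_set)) or \
                   bool(b.write_set & a.read_set)

        def select_parallel_commits(pending):
            # Greedily admit transactions whose commits do not conflict with any
            # transaction already admitted this round; the rest retry later.
            admitted = []
            for t in pending:
                if all(not conflicts(t, other) for other in admitted):
                    admitted.append(t)
            return admitted

        if __name__ == "__main__":
            t1 = Txn(1, read_set={"x"}, write_set={"y"})
            t2 = Txn(2, read_set={"z"}, write_set={"x"})  # writes x, which t1 reads
            t3 = Txn(3, read_set={"a"}, write_set={"b"})  # independent of both
            print([t.tid for t in select_parallel_commits([t1, t2, t3])])  # [1, 3]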

    System Support for Implicitly Parallel Programming

    Coordinated Science Laboratory was formerly known as the Control Systems Laboratory.

    Removing Architectural Bottlenecks to the Scalability of Speculative Parallelization

    Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stalls due to speculative buffer overflow, and the exploitation of high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
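
    The "commit in constant time" idea can be illustrated abstractly. The Python sketch below is a toy model, not the proposed hardware mechanism: it contrasts a commit whose cost grows with the amount of speculative data against one that only republishes ownership of the speculative buffer, which is the property a constant-time task commit relies on.

        # Illustrative sketch only, not the proposed hardware design: a copy-based
        # task commit costs time proportional to the speculative state, while a
        # constant-time commit only changes which layer owns the data.
        class VersionedMemory:
            def __init__(self):
                self.committed = [{}]          # newest committed layer is last

            def read(self, addr):
                # Search committed layers from newest to oldest.
                for layer in reversed(self.committed):
                    if addr in layer:
                        return layer[addr]
                return 0

            def commit_by_copy(self, spec_buffer):
                # Cost grows with the number of speculatively written lines.
                self.committed[-1].update(spec_buffer)

            def commit_constant_time(self, spec_buffer):
                # The speculative buffer is adopted as a new committed layer;
                # no data is copied, so the commit step itself is O(1).
                self.committed.append(spec_buffer)

        if __name__ == "__main__":
            mem = VersionedMemory()
            mem.commit_constant_time({"a": 1, "b": 2})
            mem.commit_by_copy({"b": 3})
            print(mem.read("a"), mem.read("b"))   # prints: 1 3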

    Mitosis based speculative multithreaded architectures

    In the last decade, industry made a major shift towards multi-core processor designs, also known as Chip Multiprocessors (CMPs), in order to provide further performance improvements under a reasonable power budget, design complexity, and validation cost. Over the years, several processor vendors have come out with multi-core chips in their product lines and they have become mainstream, with the number of cores increasing in each processor generation. Multi-core processors improve the performance of applications by exploiting Thread-Level Parallelism (TLP), while the Instruction-Level Parallelism (ILP) exploited by each individual core is limited. These architectures are very efficient when multiple threads are available for execution. However, single-threaded sections of code (single-threaded applications and serial sections of parallel applications) place important constraints on the benefits achieved by parallel execution, as pointed out by Amdahl's law. Parallel programming, even with the help of recently proposed techniques like transactional memory, has proven to be a very challenging task. On the other hand, automatically partitioning applications into threads may be a straightforward task for regular applications, but becomes much harder for irregular programs, where compilers usually fail to discover sufficient TLP. In this scenario, two main directions have been followed in the research community to exploit multi-core platforms: Speculative Multithreading (SpMT) and non-speculative clustered architectures. The former splits a sequential application into speculative threads, while the latter partitions the instructions among the cores based on data dependences while avoiding a large degree of speculation. Despite the large amount of research on both of these approaches, the techniques proposed so far have shown only marginal performance improvements. In this thesis we propose novel schemes to speed up sequential or lightly threaded applications on multi-core processors that effectively address the main unresolved challenges of previous approaches. In particular, we propose an SpMT architecture, called Mitosis, that leverages a powerful software value prediction technique to manage inter-thread dependences, based on pre-computation slices (p-slices). Thanks to the accuracy and low cost of this technique, Mitosis is able to effectively parallelize applications even in the presence of frequent dependences among threads. We also propose a novel architecture, called Anaphase, that combines the best of SpMT schemes and clustered architectures. Anaphase effectively exploits ILP, TLP, and Memory-Level Parallelism (MLP), thanks to its unique fine-grain thread decomposition algorithm that adapts to the available parallelism in the application.
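
    The role of a pre-computation slice can be sketched in software. The Python toy below assumes, as the abstract suggests, that a p-slice is an abbreviated piece of code that predicts a speculative thread's live-in values, which are later validated against the non-speculative execution; the function names here are hypothetical and this is not the Mitosis implementation.

        # Illustrative sketch only, assuming a p-slice predicts a speculative
        # thread's live-in values; names and structure are hypothetical and do
        # not reflect the Mitosis compiler or hardware.
        from concurrent.futures import ThreadPoolExecutor

        def p_slice(x):
            # Cheap pre-computation slice: predicts the live-in of the speculative
            # thread without executing the whole preceding region.
            return x * 2

        def full_region(x):
            # The complete (slower) non-speculative computation of that live-in.
            return x * 2

        def speculative_thread(live_in):
            return live_in + 1

        def run_with_p_slice(x):
            predicted = p_slice(x)                                 # predict live-in
            with ThreadPoolExecutor(max_workers=1) as pool:
                spec = pool.submit(speculative_thread, predicted)  # run ahead speculatively
                actual = full_region(x)                            # non-speculative execution
                if predicted == actual:
                    return spec.result()           # prediction validated: keep speculative work
                return speculative_thread(actual)  # misprediction: squash and re-execute

        print(run_with_p_slice(5))   # prints 11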
