1,464 research outputs found

Thread-spawning schemes for speculative multithreading

Author: González Colás Antonio María
Marcuello Pascual Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

Speculative multithreading has recently been proposed to boost performance by exploiting thread-level parallelism in applications that are difficult to parallelize. The performance of these processors depends heavily on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and an 8-cycle thread initialization penalty are considered, the performance difference between them is maintained. The speed-up over single-thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Thread partitioning and value prediction for exploiting speculative thread-level parallelism

Author: González Colás Antonio María
Marcuello Pedro
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

Speculative thread-level parallelism has recently been proposed as a source of parallelism to improve performance in applications where parallel threads are hard to find. However, the efficiency of this execution model strongly depends on the performance of the control and data speculation techniques. Several hardware-based schemes for partitioning the program into speculative threads are analyzed and evaluated. In general, we find that spawning threads associated with loop iterations is the most effective technique. We also show that value prediction is critical for the performance of all of the spawning policies. Thus, a new value predictor, the increment predictor, is proposed. This predictor is specifically oriented to this kind of architecture and clearly outperforms the adapted versions of conventional value predictors such as the last-value, the stride, and the context-based predictors, especially for small history tables.

Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A cost-effective clustered architecture

Author: Canal Corretger Ramon
González Colás Antonio María
Parcerisa Bundó Joan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With minor hardware modifications to a conventional superscalar processor, the issue width can potentially be doubled without increasing the hardware complexity. In fact, the result is a clustered architecture with two heterogeneous clusters. We propose to extend this architecture with a dynamic steering logic that sends the instructions to either cluster. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimise the trade-off between these figures. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int+4 fp) machine and that it outperforms the previously proposed one.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures

Author: Ayguadé Parra Eduard
Krishnan Venkata
Martel Pérez Iván
Ortega Fernández Daniel
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

We exploit the existence of distant parallelism that future compilers could detect, and characterise its performance under simultaneous multithreading architectures. By distant parallelism we mean parallelism that cannot be captured by the processor instruction window and that can produce threads suitable for parallel execution in a multithreaded processor. We show that distant parallelism can make wider-issue processors feasible by providing more instructions from the distant threads, thus better exploiting the processor's resources when speeding up single integer applications. We also investigate the necessity of out-of-order processors in the presence of multiple threads of the same program. It is important to note that the benefits described are totally orthogonal to any other architectural techniques targeting a single thread.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Object oriented execution model (OOM)

Author: Cristal Kestelman Adrián
González Blanco Ruben
Markovic Nikola
Nemirovsky Daniel
Unsal Osman Sabri
Valero Cortés Mateo
Publication venue: INRIA
Publication date: 01/01/2011

This paper considers implementing the Object Oriented Programming Model directly in the hardware to serve as a base to exploit object-level parallelism, speculation and heterogeneous computing. Towards this goal, we present a new execution model called Object Oriented execution Model (OOM) that implements the OO Programming Models. All OOM hardware structures are objects and the OOM Instruction Set directly utilizes objects while hiding other complex hardware structures. OOM maintains all high-level programming language information until execution time. This enables efficient extraction of available parallelism in OO serial code at execution time with minimal compiler support. Our results show that OOM utilizes the available parallelism better than the OoO (Out-of-Order) model.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

A Survey on Thread-Level Speculation Techniques

Author: Estébanez López Álvaro
González Escribano Arturo
Llanos Ferraris Diego Rafael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Thread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time-dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.

MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)

Repositorio Documental de la Universidad de Valladolid

Data speculative multithreaded architecture

Author: González Colás Antonio María
Marcuello Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

We present a novel processor microarchitecture that relieves three of the most important bottlenecks of superscalar processors: the serialization imposed by true dependences, the relatively small window size and the instruction fetch bandwidth. The new architecture simultaneously executes multiple threads of control obtained from a single program by means of control speculation techniques that do not require any compiler/user support or any special feature in the instruction set architecture. The multiple simultaneous threads execute different iterations of the same loop, which require the same fetch bandwidth as a single thread since they share the same code. Inter-thread dependences as well as the values that flow through them are speculated by means of data prediction techniques. The preliminary evaluation results show a significant speed-up when compared with a superscalar processor. In fact, the new processor architecture can achieve an IPC (instructions per cycle) rate even larger than the peak fetch bandwidth.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Physics Avoidance & Cooperative Semantics: Inferentialism and Mark Wilson’s Engagement with Naturalism Qua Applied Mathematics

Author: Erkan Ekin
Publication date: 01/01/2020

Mark Wilson argues that the standard categorizations of "Theory T thinking" (logic-centered conceptions of scientific organization, canonized via logical empiricists in the mid-twentieth century) dampen the understanding and appreciation of those strategic subtleties working within science. By "Theory T thinking," we mean to describe the simplistic methodology in which mathematical science allegedly supplies ‘processes’ that parallel nature's own in a tidily isomorphic fashion, wherein "Theory T’s" feigned rigor and methodological dogmas advance inadequate discrimination that fails to distinguish between explanatory structures that are architecturally distinct. One of Wilson's main goals is to reverse such premature exclusions and, thus, early on Wilson returns to John Locke's original physical concerns regarding material science and the congeries of descriptive concern involved in capturing the varied phenomena (i.e., cohesion, elasticity, fracture, and the transmission of coherent work) encountered amongst ordinary solids like wood and steel. Of course, Wilson methodologically updates such a purview by appealing to multiscalar techniques of modern computing, drawing from Robert Batterman's work on the greediness of scales and Jim Woodward's insights on causation.

PhilPapers

Cosmos and History (C&H): The Journal of Natural and Social Philosophy

Control speculation in multithreaded processors through dynamic loop detection

Author: González Colás Antonio María
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

This paper presents a mechanism to dynamically detect the loops that are executed in a program. This technique detects the beginning and the termination of the iterations and executions of the loops without compiler/user intervention. We propose to apply this dynamic loop detection to the speculation of multiple threads of control dynamically obtained from a sequential program. Based on the highly predictable behavior of the loops, the history of the past executed loops is used to speculate the future instruction sequence. The overall objective is to dynamically obtain coarse-grain parallelism (at the thread level) that can be exploited by a multithreaded architecture. We show that for a 4-context multithreaded processor the speculation mechanism provides around 2.6 concurrent threads on average.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

The Administrative-Territorial Boundaries Available for a Multiscalar Analysis of EU Port Cities

Author: Dolana Simona
Publication venue: UNIVERSITY of Universities [UOU]
Publication date: 01/06/2023

Using various administrative-territorial boundaries, inhabitants, and businesses as nodes, port cities in the European Union (EU) could be analysed on multiple scales using a network method that takes into account the most important criteria (transport, population, and economy) for measuring the urban and port functions. The possible beneficiaries of such a methodology are policymakers who can aid municipalities in resolving problems in port cities. Thus, this study aimed to pinpoint existing port city policy domains that can be impacted by such a methodology and make corresponding recommendations. The policy domains identified in most port cities (port, port-city, and transportation policies) were matched with the three criteria established by the methodology. Study findings indicate that the proposed network methodological approach can affect the internal and external configuration of urban and spatial policies. It can also affect their related policy instruments, because these should be selected in light of the port city’s current state.

Repositorio Institucional de la Universidad de Alicante