1,464 research outputs found

    Thread-spawning schemes for speculative multithreading

    Speculative multithreading has recently been proposed to boost performance by exploiting thread-level parallelism in applications that are difficult to parallelize. The performance of these processors depends heavily on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism that divides programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor, and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, it outperforms them by almost 20%. When a realistic value predictor and an 8-cycle thread initialization penalty are considered, the performance difference between them is maintained. The speed-up over single-thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor. Peer Reviewed. Postprint (published version).

    Thread partitioning and value prediction for exploiting speculative thread-level parallelism

    Speculative thread-level parallelism has recently been proposed as a source of parallelism to improve performance in applications where parallel threads are hard to find. However, the efficiency of this execution model strongly depends on the performance of the control and data speculation techniques. Several hardware-based schemes for partitioning the program into speculative threads are analyzed and evaluated. In general, we find that spawning threads associated with loop iterations is the most effective technique. We also show that value prediction is critical for the performance of all of the spawning policies. Thus, a new value predictor, the increment predictor, is proposed. This predictor is specifically designed for this kind of architecture and clearly outperforms the adapted versions of conventional value predictors such as the last-value, stride, and context-based predictors, especially for small history tables. Peer Reviewed. Postprint (published version).
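As a rough illustration of the conventional baselines named in this abstract, the sketch below models a last-value predictor and a stride predictor as tables indexed by instruction address. It does not implement the increment predictor, whose design the abstract does not describe. The class names, the unbounded dictionary tables, and the `accuracy` helper are assumptions made for this example only; real hardware would use small, fixed-size tables.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.table = {}  # pc -> last observed value (unbounded: an assumption)

    def predict(self, pc):
        return self.table.get(pc)  # None when there is no history yet

    def update(self, pc, value):
        self.table[pc] = value


class StridePredictor:
    """Predicts last value plus the difference between the last two values."""

    def __init__(self):
        self.table = {}  # pc -> (last value, last stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + stride

    def update(self, pc, value):
        entry = self.table.get(pc)
        stride = value - entry[0] if entry is not None else 0
        self.table[pc] = (value, stride)


def accuracy(predictor, pc, values):
    """Fraction of values predicted correctly when seen in order (hypothetical helper)."""
    hits = 0
    for value in values:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
    return hits / len(values)
```

On a value stream that grows by a constant amount, such as a loop induction variable, the stride predictor becomes correct after two observations, while the last-value predictor never is.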

    A cost-effective clustered architecture

    In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous work shows, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With minor hardware modifications to a conventional superscalar processor, the issue width can potentially be doubled without increasing the hardware complexity. In fact, the result is a clustered architecture with two heterogeneous clusters. We propose to extend this architecture with dynamic steering logic that sends instructions to either cluster. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimise the trade-off between these two factors. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms the previously proposed one. Peer Reviewed. Postprint (published version).

    Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures

    We exploit the existence of distant parallelism that future compilers could detect, and we characterise its performance under simultaneous multithreading architectures. By distant parallelism we mean parallelism that cannot be captured by the processor instruction window and that can produce threads suitable for parallel execution in a multithreaded processor. We show that distant parallelism can make wider-issue processors feasible by providing more instructions from the distant threads, thus better exploiting the processor's resources when speeding up single integer applications. We also investigate the necessity of out-of-order processors in the presence of multiple threads of the same program. It is important to note that the benefits described are totally orthogonal to any other architectural techniques targeting a single thread. Peer Reviewed. Postprint (published version).

    Object oriented execution model (OOM)

    This paper considers implementing the Object Oriented Programming Model directly in hardware to serve as a base for exploiting object-level parallelism, speculation and heterogeneous computing. Towards this goal, we present a new execution model called the Object Oriented execution Model (OOM), which implements the OO Programming Models. All OOM hardware structures are objects, and the OOM Instruction Set directly utilizes objects while hiding other complex hardware structures. OOM maintains all high-level programming language information until execution time. This enables efficient extraction of the available parallelism in OO serial code at execution time with minimal compiler support. Our results show that OOM utilizes the available parallelism better than the OoO (Out-of-Order) model. Peer Reviewed. Postprint (published version).

    A Survey on Thread-Level Speculation Techniques

    Thread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior compile-time dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field. Funding: MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

    Data speculative multithreaded architecture

    We present a novel processor microarchitecture that relieves three of the most important bottlenecks of superscalar processors: the serialization imposed by true dependences, the relatively small window size and the instruction fetch bandwidth. The new architecture simultaneously executes multiple threads of control obtained from a single program by means of control speculation techniques that require neither compiler or user support nor any special feature in the instruction set architecture. The multiple simultaneous threads execute different iterations of the same loop, which require the same fetch bandwidth as a single thread since they share the same code. Inter-thread dependences, as well as the values that flow through them, are speculated by means of data prediction techniques. The preliminary evaluation results show a significant speed-up when compared with a superscalar processor. In fact, the new processor architecture can achieve an IPC (instructions per cycle) rate even larger than the peak fetch bandwidth. Peer Reviewed. Postprint (published version).

    Physics Avoidance & Cooperative Semantics: Inferentialism and Mark Wilson’s Engagement with Naturalism Qua Applied Mathematics

    Mark Wilson argues that the standard categorizations of "Theory T thinking", that is, logic-centered conceptions of scientific organization (canonized via logical empiricists in the mid-twentieth century), dampen the understanding and appreciation of those strategic subtleties working within science. By "Theory T thinking," we mean to describe the simplistic methodology in which mathematical science allegedly supplies ‘processes’ that parallel nature's own in a tidily isomorphic fashion, wherein "Theory T’s" feigned rigor and methodological dogmas advance inadequate discrimination that fails to distinguish between explanatory structures that are architecturally distinct. One of Wilson's main goals is to reverse such premature exclusions. Thus, early on, Wilson returns to John Locke's original physical concerns regarding material science and the congeries of descriptive concerns involved in capturing the varied phenomena (i.e., cohesion, elasticity, fracture, and the transmission of coherent work) encountered amongst ordinary solids like wood and steel. Of course, Wilson methodologically updates such a purview by appealing to multiscalar techniques of modern computing, drawing from Robert Batterman's work on the greediness of scales and Jim Woodward's insights on causation.

    Control speculation in multithreaded processors through dynamic loop detection

    This paper presents a mechanism to dynamically detect the loops that are executed in a program. This technique detects the beginning and the termination of the iterations and executions of the loops without compiler or user intervention. We propose to apply this dynamic loop detection to the speculation of multiple threads of control dynamically obtained from a sequential program. Based on the highly predictable behavior of loops, the history of past executed loops is used to speculate on the future instruction sequence. The overall objective is to dynamically obtain coarse-grain parallelism (at the thread level) that can be exploited by a multithreaded architecture. We show that for a 4-context multithreaded processor the speculation mechanism provides around 2.6 concurrent threads on average. Peer Reviewed. Postprint (published version).
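The abstract does not spell out how loops are detected, so the following is only a minimal software sketch of one common heuristic, not the paper's hardware mechanism. It assumes that a taken backward branch marks the end of an iteration, that the branch target is the loop head, and that a loop execution ends once control leaves the address range between the head and the branch. The function name `detect_loops` and the `(pc, next_pc)` trace format are hypothetical.

```python
def detect_loops(trace):
    """Scan a trace of (pc, next_pc) pairs, one per executed instruction.

    Returns a dict mapping each loop head address to a list with the
    iteration count of every execution of that loop, in order.
    Simplification: a call out of the loop body is treated as leaving it.
    """
    current = {}   # head -> [address of backward branch, iterations so far]
    finished = {}  # head -> iteration counts of completed executions

    for pc, next_pc in trace:
        # Close every loop whose body [head, branch] no longer contains pc.
        for head in list(current):
            end, iterations = current[head]
            if not (head <= pc <= end):
                finished.setdefault(head, []).append(iterations)
                del current[head]

        if next_pc <= pc:  # taken backward branch: an iteration just ended
            if next_pc in current:
                current[next_pc][0] = max(current[next_pc][0], pc)
                current[next_pc][1] += 1
            else:
                # First detection: iteration 1 has finished, iteration 2 starts.
                current[next_pc] = [pc, 2]

    # Loops still open when the trace ends are reported with their count so far.
    for head, (_, iterations) in current.items():
        finished.setdefault(head, []).append(iterations)
    return finished
```

A per-loop history of iteration counts like the one returned here is the kind of information a speculation mechanism could use to guess how many iterations a future execution of the same loop will run.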

    The Administrative-Territorial Boundaries Available for a Multiscalar Analysis of EU Port Cities

    Using various administrative-territorial boundaries, inhabitants, and businesses as nodes, port cities in the European Union (EU) could be analysed on multiple scales using a network method that takes into account the most important criteria (transport, population, and economy) for measuring urban and port functions. The possible beneficiaries of such a methodology are policymakers who can aid municipalities in resolving problems in port cities. Thus, this study aimed to pinpoint existing port city policy domains that can be impacted by such a methodology and to make corresponding recommendations. The policy domains identified in most port cities (port, port-city, and transportation policies) were matched with the three criteria established by the methodology. Study findings indicate that the proposed network methodological approach can affect the internal and external configuration of urban and spatial policies. It can also affect their related policy instruments, because these should be selected in light of the port city's current state.