12 research outputs found

    A Hardware and Software Integrated Approach for Adaptive Thread Management in Multicore Multithreaded Microprocessors

    Get PDF
    The Multicore Multithreaded Microprocessor maximizes parallelism on a chip for the optimal system performance, such that its popularity is growing rapidly in high-performance computing. It increases the complexity in resource distribution on a chip by leading it to two directions: isolation and unification. On one hand, multiple cores are implemented to deliver the computation and memory accessing resources to more than one thread at the same time. Nevertheless, it limits the threads’ access to resources in different cores, even if extensively demanded. On the other hand, simultaneous multithreaded architectures unify the domestic execu- tion resources together for concurrently running threads. In such an environment, threads are greatly affected by the inter-thread interference. Moreover, the impacts of the complicated distribution are enlarged by variation in workload behaviors. As a result, the microprocessor requires an adaptive management scheme to schedule threads throughout different cores and coordinate them within cores. In this study, an adaptive thread management scheme was proposed, integrating both hardware and software approaches. The instruction fetch policy at the hardware level took the responsibility by prioritizing domestic threads, while the Operating System scheduler at the software level was used to pair threads dynami- vi cally to multiple cores. The tie between them was the proposed online linear model, which was dynamically constructed for every thread based on data misses by the regression algorithm. Consequently, the hardware part of the proposed scheme proactively granted higher priority to the threads with less predicted long-latency loads, expecting they would better utilize the shared execution resources. Mean- while, the software part was invoked by such a model upon significant changes in the execution phases and paired threads with different demands to the same core to minimize competition on the chip. The proposed scheme was compared to its peer designs and overall 43% speedup was achieved by the integrated approach over the combination of two baseline policies in hardware and software, respectively. The overhead was examined carefully regarding power, area, storage and latency, as well as the relationship between the overhead and the performance

    L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors

    Full text link
    © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Improving the utilization of shared resources is a key issue to increase performance in SMT processors. Recent work has focused on resource sharing policies to enhance the processor performance, but their proposals mainly concentrate on novel hardware mechanisms that adapt to the dynamic resource requirements of the running threads. This work addresses the L1 cache bandwidth problem in SMT processors experimentally on real hardware. Unlike previous work, this paper concentrates on thread allocation, by selecting the proper pair of co-runners to be launched to the same core. The relation between L1 bandwidth requirements of each benchmark and its performance (IPC) is analyzed. We found that for individual benchmarks, performance is strongly connected to L1 bandwidth consumption, and this observation remains valid when several co-runners are launched to the same SMT core. Based on these findings we propose two L1 bandwidth aware thread to core (t2c) allocation policies, namely Static and Dynamic t2c allocation, respectively. The aim of these policies is to properly balance L1 bandwidth requirements of the running threads among the processor cores. Experiments on a Xeon E5645 processor show that the proposed policies significantly improve the performance of the Linux OS kernel regardless the number of cores considered.This work was supported by the Spanish Ministerio de Econom´ıa y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-C04-01; and by Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-12) of the ´ Universitat Politecnica de Val ` encia under Grant SP20120748Feliu Pérez, J.; Sahuquillo Borrás, J.; Petit Martí, SV.; Duato Marín, JF. (2013). L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors. IEEE. https://doi.org/10.1109/PACT.2013.6618810
    corecore