    Enhanced applicability of loop transformations

    Synchronization-Point Driven Resource Management in Chip Multiprocessors.

    With the proliferation of Chip Multiprocessors (CMPs), shared-memory multi-threaded programs are spreading rapidly into every application domain. These programs exhibit execution characteristics that go beyond those observed in single-threaded programs, mainly due to data sharing and synchronization. To ensure that next-generation CMPs perform well on such workloads, it is vital to understand how these programs and architectures interact and to exploit the unique opportunities this interaction presents. This thesis examines the time-varying execution characteristics of shared-memory workloads in conjunction with the synchronization points that exist in the programs. The main hypothesis is that the type, the position, and the repetitive execution of synchronization constructs can be exploited to uncover important execution phases and enable new optimization opportunities. The research provides a simple application-driven approach for predicting program behavior and effectively driving dynamic performance optimization and resource management actions in future CMPs. In the first part of this thesis, I show how synchronization points relate to various program-wide periodic behaviors. Based on these observations, I develop a framework in which user-level synchronization primitives are exposed to the hardware and monitored to detect program phases and guide dynamic adaptation. Through workload-driven evaluation, I demonstrate the effectiveness of the framework in improving performance/power in on-chip interconnects. The second part of the thesis explores inter-thread communication behavior in depth. I show that although synchronization points under the shared-memory model do not expose any communication details, they reliably mark the points where coherence communication patterns change or repeat. By leveraging this property, I design a synchronization-point-based coherence predictor that uncovers communication patterns with high accuracy while consuming significantly fewer hardware resources than existing predictors. In the last part, I investigate the underlying reasons that cause threads to wait at synchronization points, wasting resources. I show that these reasons can vary even across different program phases, and that existing critical-path predictors can become ineffective under certain conditions. I then present a new scheme that improves predictability by incorporating history information from previous synchronization points. The new design is robust and can amortize run-time imbalances to improve the system's performance and/or energy efficiency.
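The thesis's central idea, that repeated instances of the same synchronization point tend to be followed by similar behavior, can be illustrated with a small last-value table indexed by a synchronization-point ID. This is a minimal Python sketch under stated assumptions, not the thesis's hardware design: the names `SyncPointPredictor` and `replay_trace`, the use of a static barrier address as the ID, the single numeric metric per interval (for example, interconnect traffic), and the 10% hit tolerance are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SyncPointPredictor:
    """Hypothetical last-value predictor keyed by synchronization-point ID.

    Assumption: each dynamic instance of a synchronization construct (e.g. a
    barrier identified by its static address) starts a new interval. The
    metric observed after the previous instance of the same sync point is
    used as the prediction for the interval after the current instance.
    """
    table: Dict[int, float] = field(default_factory=dict)

    def predict(self, sync_id: int) -> Optional[float]:
        # None signals a cold miss: this sync point has not been seen yet.
        return self.table.get(sync_id)

    def update(self, sync_id: int, observed: float) -> None:
        # Remember what actually happened after this sync point.
        self.table[sync_id] = observed


def replay_trace(trace: List[Tuple[int, float]]) -> Tuple[int, int]:
    """Replay (sync_id, observed_metric) pairs; return (hits, predictions made)."""
    pred = SyncPointPredictor()
    hits = made = 0
    for sync_id, observed in trace:
        guess = pred.predict(sync_id)
        if guess is not None:
            made += 1
            # Hypothetical hit criterion: prediction within 10% of the observation.
            if abs(guess - observed) <= 0.1 * abs(observed):
                hits += 1
        pred.update(sync_id, observed)
    return hits, made
```

For a trace in which two barriers alternate, such as `[(0x40, 100.0), (0x80, 20.0), (0x40, 102.0), (0x80, 21.0), (0x40, 99.0)]`, the first occurrence of each barrier is a cold miss and the three later predictions all fall within tolerance, so `replay_trace` returns `(3, 3)`.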

    A detailed study on phase predictors

    Most programs are repetitive, meaning that some parts of a program are executed more than once. As a result, a number of phases can be extracted, each of which exhibits similar behavior. These phases can then be exploited for various purposes, such as hardware adaptation for energy efficiency. Temporal phase classification schemes divide the execution of a program into consecutive fixed-length intervals, and intervals showing similar behavior are grouped into a phase. When a temporal scheme is used in an on-line system, phase predictors are needed to predict when the next phase transition will occur and what the next phase will be. In this paper, we analyze and compare a number of existing state-of-the-art phase predictors using the SPEC CPU2000 benchmarks, exploring a very large design space. We conclude that the 2-level burst predictor with confidence and conditional update is today's most accurate phase predictor within reasonable hardware budgets.
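Temporal phase classification and the simplest phase predictor can be sketched as follows. This Python example assumes each fixed-length interval is summarized as a vector of execution counts (such as a basic block vector), groups intervals by Manhattan distance to each phase's first interval, and scores a last-value predictor that guesses the next interval stays in the current phase. The function names and the threshold value are hypothetical, and the 2-level burst predictor the paper identifies as most accurate is not reproduced here.

```python
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    # Scale counts so intervals of different total activity are comparable.
    total = sum(v)
    return [x / total for x in v] if total else list(v)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(intervals: List[Sequence[float]], threshold: float) -> List[int]:
    """Assign a phase ID to each fixed-length interval (hypothetical scheme).

    An interval joins the first existing phase whose representative (the
    first interval assigned to it) is within `threshold`; otherwise it
    starts a new phase.
    """
    reps: List[List[float]] = []
    ids: List[int] = []
    for raw in intervals:
        v = normalize(raw)
        for pid, rep in enumerate(reps):
            if manhattan(v, rep) <= threshold:
                ids.append(pid)
                break
        else:
            reps.append(v)
            ids.append(len(reps) - 1)
    return ids


def last_value_accuracy(phase_ids: List[int]) -> float:
    """Accuracy of predicting that each interval repeats the previous phase."""
    if len(phase_ids) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(phase_ids, phase_ids[1:]) if prev == cur)
    return hits / (len(phase_ids) - 1)
```

With intervals `[[90, 10, 0], [88, 12, 0], [5, 5, 90], [6, 4, 90], [91, 9, 0]]` and a threshold of 0.2, `classify` yields `[0, 0, 1, 1, 0]`, and the last-value predictor scores 0.5. It is wrong exactly at the two phase transitions, which is why the predictors compared in the paper must also anticipate when the next transition occurs and what the next phase will be.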