4 research outputs found

    Foundations for Automatic, Adaptable Compilation

    Computational science demands extreme performance because the running time of an application often determines the size of the experiment that a scientist can reasonably compute. Unfortunately, traditional compiler technology is ill-equipped to harness the full potential of today's computing platforms, forcing scientists to spend time manually tuning their applications' performance. Although improving compiler technology should alleviate this problem, two challenges obstruct this goal: hardware platforms are rapidly changing, and application software is difficult to statically model and predict. To address these problems, this thesis presents two techniques that improve a compiler's adaptability: automatic resource characterization and selective, dynamic optimization. Resource characterization empirically measures a system's performance-critical characteristics, which can then be supplied to a parameterized compiler that specializes programs accordingly. Measuring these characteristics matters because a system's physical characteristics do not always match its observed characteristics; resource characterization therefore yields an empirical model of the system's actual behavior, which is better suited to guiding compiler optimizations than a purely theoretical model. This thesis presents techniques for determining a system's data-cache and TLB capacity, line size, and associativity, as well as its instruction-cache capacity. Even with a perfect architectural model, compilers often generate suboptimal code because a program's behavior is difficult to analyze and predict statically. This thesis therefore presents two techniques that enable selective, dynamic optimization when static compilation fails to deliver adequate performance. First, intermediate-representation (IR) annotation generates a fully optimized native binary tagged with a higher-level compiler representation of itself; the native binary benefits from static optimization and code generation, while the IR annotation enables targeted, aggressive dynamic optimization. Second, adaptive code selection lets a program tune its performance empirically throughout execution by automatically identifying and favoring the best-performing variant of a routine. This technique can be used to choose dynamically among different static compilation strategies, or combined with IR annotation to perform dynamic, feedback-directed optimization.
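    As a concrete illustration of resource characterization, the micro-benchmark below estimates the data-cache line size by timing strided traversals of an array much larger than the cache: per-access cost rises with the stride until the stride reaches the line size, after which each cache line services only one access. This is a minimal sketch in the spirit of the thesis, not its actual characterization tool; the array size and stride range are assumptions.

```cpp
// Sketch: estimate data-cache line size from the knee in per-access latency
// as the stride of a large-array traversal grows (illustrative, not the
// thesis's tool).
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t kBytes = 64 * 1024 * 1024;  // assumed: far larger than any cache level
    std::vector<char> buf(kBytes, 1);
    unsigned long sum = 0;

    for (std::size_t stride = 8; stride <= 512; stride *= 2) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 4; ++rep)
            for (std::size_t i = 0; i < kBytes; i += stride)
                sum += buf[i];                    // one memory access per stride step
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        double accesses = 4.0 * static_cast<double>(kBytes / stride);
        std::printf("stride %4zu B: %.2f ns/access\n", stride, ns / accesses);
    }
    volatile unsigned long sink = sum;            // keep the timed loads from being optimized away
    (void)sink;
}
```

    The line size is roughly the smallest stride at which nanoseconds per access stops growing; the same timing framework extends to capacity and associativity by varying the working-set size and the number of conflicting addresses.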

    Potential for hardware-based techniques for reuse distance analysis

    Reuse distance analysis, the prediction of how many distinct memory addresses will be accessed between two accesses to a given address, is an established technique in profile-based compiler optimization, but the cost of collecting the memory-reuse profile has been prohibitive for some applications. In this report, we propose using the hardware monitoring facilities available in existing CPUs to gather an approximate reuse-distance profile. We discuss the difficulties associated with this monitoring technique, the most important being that there is no obvious link between the reuse profile produced by hardware monitoring and the program's actual reuse behavior. We also identify potential applications that a reliable hardware-based reuse distance analysis would make viable.
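    To make the profile concrete, the sketch below computes exact reuse distances from a toy address trace in software; this is the quantity a hardware-based scheme would only approximate. The trace and data structures are illustrative, not the report's proposal.

```cpp
// Sketch: exact reuse distances over a toy memory trace. For each access,
// the reuse distance is the number of *distinct* addresses touched since the
// previous access to the same address.
#include <cstdint>
#include <cstdio>
#include <set>
#include <unordered_map>
#include <vector>

int main() {
    // Toy trace; a real profile would come from instrumentation or, as the
    // report proposes, from hardware monitoring samples.
    std::vector<std::uint64_t> trace = {0xA, 0xB, 0xC, 0xA, 0xB, 0xB, 0xD, 0xA};

    std::unordered_map<std::uint64_t, long> last_time;  // address -> time of its last access
    std::set<long> live;                                // last-access times, one entry per address

    for (long t = 0; t < static_cast<long>(trace.size()); ++t) {
        std::uint64_t addr = trace[t];
        auto it = last_time.find(addr);
        if (it == last_time.end()) {
            std::printf("addr %#llx: cold miss (infinite reuse distance)\n",
                        static_cast<unsigned long long>(addr));
        } else {
            // Distinct addresses touched since the previous access to addr are
            // exactly the set entries after that access's timestamp. std::distance
            // is O(n) here; production tools use balanced trees or approximation.
            long dist = std::distance(live.upper_bound(it->second), live.end());
            std::printf("addr %#llx: reuse distance %ld\n",
                        static_cast<unsigned long long>(addr), dist);
            live.erase(it->second);
        }
        live.insert(t);
        last_time[addr] = t;
    }
}
```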

    Cheetah: Detecting False Sharing Efficiently and Effectively

    False sharing is a notorious performance problem that can occur in multithreaded programs running on today's ubiquitous multicore hardware. It can degrade performance by up to an order of magnitude and significantly hurt scalability, yet identifying it in complex programs is challenging: existing tools either incur significant performance overhead or do not provide enough information to guide code optimization. To address these problems, we develop Cheetah, a profiler that detects false sharing both efficiently and effectively. Cheetah leverages the lightweight hardware performance monitoring units (PMUs) available in most modern CPU architectures to sample memory accesses and, based on the latency information the PMUs collect, introduces the first approach that quantifies the optimization potential of false sharing instances without requiring actual fixes. Cheetah precisely reports false sharing and provides insightful optimization guidance for programmers while adding less than 7% runtime overhead on average, making it ready for real deployment.
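    The pattern Cheetah looks for can be reproduced in a few lines. The sketch below (illustrative, not Cheetah's code) times two threads incrementing logically independent counters, first when the counters likely share a cache line and then when each is padded onto its own line; 64 bytes is an assumed line size, and padding is the usual fix such a report would guide a programmer toward.

```cpp
// Sketch of the false-sharing pattern Cheetah targets (not Cheetah's code):
// in Shared the two counters likely occupy one cache line, which ping-pongs
// between cores; in Padded each counter gets its own 64-byte line (assumed).
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct Shared {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

struct Padded {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename T>
double run() {
    T s;
    auto t0 = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < 20'000'000; ++i) s.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < 20'000'000; ++i) s.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::printf("adjacent counters: %.2f s\n", run<Shared>());
    std::printf("padded counters:   %.2f s\n", run<Padded>());
}
```

    On typical multicore hardware the padded variant runs several times faster, which is the kind of optimization potential Cheetah quantifies from PMU latency samples instead of requiring the programmer to try the fix.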

    Dynamic Reconfigurable Computer With A Dynamic Genetic Algorithm

    The remarkable flexibility of the human brain motivated us to create a system that behaves similarly; flexibility of performance is the main property we emulate. We build a system model of heterogeneous, dynamic multi-core reconfigurable computers driven by a genetic algorithm, and we introduce several strategies for finding the next configuration with that algorithm. Before building the dynamic model, we study heterogeneous, static multi-core reconfigurable computers to verify the optimization capability of the genetic algorithm; the simulation results show that it can optimize the configuration candidates of static systems. We then generate, simulate, and observe dynamic reconfigurable systems, and we conclude that they may offer performance improvements over homogeneous multi-core systems. Our evaluation model is fully parameterized and will be made available to the research community. We suggest future applications and improvements for both static and dynamic systems.
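    The optimization loop the paper builds on can be summarized in a short sketch. Here the configuration encoding (one core-type choice per slot) and the fitness function are hypothetical stand-ins for the paper's parameterized evaluation model, not its actual implementation.

```cpp
// Sketch: a genetic algorithm searching for a multi-core configuration.
// Config encoding and fitness() are hypothetical stand-ins for the paper's
// parameterized evaluation model.
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

using Config = std::array<int, 8>;          // hypothetical: one core type per slot

std::mt19937 rng{42};

double fitness(const Config& c) {           // hypothetical performance model
    double score = 0.0;
    for (int i = 0; i < 8; ++i)
        if (c[i] == i % 3) score += 1.0;    // reward a particular mix of core types
    return score;
}

int main() {
    std::uniform_int_distribution<int> gene(0, 2), slot(0, 7);
    std::vector<Config> pop(20);
    for (auto& c : pop)
        for (auto& g : c) g = gene(rng);    // random initial population

    auto by_fitness = [](const Config& x, const Config& y) { return fitness(x) > fitness(y); };
    for (int gen = 0; gen < 50; ++gen) {
        std::sort(pop.begin(), pop.end(), by_fitness);
        // Keep the fitter half; refill the rest by crossover plus one mutation.
        for (std::size_t i = pop.size() / 2; i < pop.size(); ++i) {
            const Config& p1 = pop[rng() % (pop.size() / 2)];
            const Config& p2 = pop[rng() % (pop.size() / 2)];
            int cut = slot(rng);
            for (int j = 0; j < 8; ++j) pop[i][j] = (j < cut) ? p1[j] : p2[j];
            pop[i][slot(rng)] = gene(rng);  // mutate one gene
        }
    }
    std::sort(pop.begin(), pop.end(), by_fitness);
    std::printf("best fitness after 50 generations: %.1f\n", fitness(pop[0]));
}
```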