3 research outputs found

    A Taxonomy of Performance Assurance Methodologies and its Application in High Performance Computer Architectures

    Full text link
    This paper presents a systematic approach to the complex problem of high confidence performance assurance of high performance architectures based on methods used over several generations of industrial microprocessors. A taxonomy is presented for performance assurance through three key stages of a product life cycle-high level performance, RTL performance, and silicon performance. The proposed taxonomy includes two components-independent performance assurance space for each stage and a correlation performance assurance space between stages. It provides a detailed insight into the performance assurance space in terms of coverage provided taking into account capabilities and limitations of tools and methodologies used at each stage. An application of the taxonomy to cases described in the literature and to high performance Intel architectures is shown. The proposed work should be of interest to manufacturers of high performance microprocessor/chipset architectures and has not been discussed in the literature.Comment: 17 pages, 5 figures, 3 table

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Full text link
    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.2019-07-02T00:00:00

    Massachusetts Domestic and Foreign Corporations Subject to an Excise: For the Use of Assessors (2004)

    Get PDF
    International audienc
    corecore