2,770 research outputs found

    A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q

    The complexity and cost of managing high-performance computing infrastructures are on the rise. Automating management and repair through predictive models to minimize human interventions is an attempt to increase system availability and contain these costs. Building predictive models that are accurate enough to be useful in automatic management cannot be based on restricted log data from subsystems but requires a holistic approach to data analysis from disparate sources. Here we provide a detailed multi-scale characterization study based on four datasets reporting power consumption, temperature, workload, and hardware/software events for an IBM Blue Gene/Q installation. We show that the system runs a rich parallel workload, with low correlation among its components in terms of temperature and power, but higher correlation in terms of events. As expected, power and temperature correlate strongly, while events display negative correlations with load and power. Power and workload show moderate correlations, and only at the scale of components. The aim of the study is a systematic, integrated characterization of the computing infrastructure and discovery of correlation sources and levels to serve as basis for future predictive modeling efforts. Comment: 12 pages, 7 Figures

    Advanced Simulation and Computing FY12-13 Implementation Plan, Volume 2, Revision 0.5

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material. Comment: 72 pages

    Improving data center efficiency through smart grid integration and intelligent analytics

    The ever-increasing growth of the demand in IT computing, storage and large-scale cloud services leads to the proliferation of data centers that consist of (tens of) thousands of servers. As a result, data centers are now among the largest electricity consumers worldwide. Data center energy and resource efficiency has started to receive significant attention due to its economic, environmental, and performance impacts. In tandem, facing increasing challenges in stabilizing the power grids due to growing needs of intermittent renewable energy integration, power market operators have started to offer a number of demand response (DR) opportunities for energy consumers (such as data centers) to receive credits by modulating their power consumption dynamically following specific requirements. This dissertation claims that data centers have strong capabilities to emerge as major enablers of substantial electricity integration from renewables. The participation of data centers in emerging DR, such as regulation service reserves (RSRs), enables the growth of the data center in a sustainable, environmentally neutral, or even beneficial way, while also significantly reducing data center electricity costs. In this dissertation, we first model data center participation in DR, and then propose runtime policies to dynamically modulate data center power in response to independent system operator (ISO) requests, leveraging advanced server power and workload management techniques. We also propose energy and reserve bidding strategies to minimize the data center energy cost. Our results demonstrate that a typical data center can achieve up to 44% monetary savings in its electricity cost with RSR provision, dramatically surpassing savings achieved by traditional energy management strategies. In addition, we investigate the capabilities and benefits of various types of energy storage devices (ESDs) in DR. Finally, we demonstrate RSR provision in practice on a real server.
    In addition to its contributions on improving data center energy efficiency, this dissertation also proposes a novel method to address data center management efficiency. We propose an intelligent system analytics approach, "discovery by example", which leverages fingerprinting and machine learning methods to automatically discover software and system changes. Our approach eases runtime data center introspection and reduces the cost of system management. 2018-11-04

    High Availability and Scalability of Mainframe Environments using System z and z/OS as example

    Mainframe computers are the backbone of industrial and commercial computing, hosting the most relevant and critical data of businesses. One of the most important mainframe environments is IBM System z with the operating system z/OS. This book introduces mainframe technology of System z and z/OS with respect to high availability and scalability. It highlights their presence on different levels within the hardware and software stack to satisfy the needs of large IT organizations.

    Field deployment process transformation in IBM PC services

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2007. Includes bibliographical references (p. 57). Field service, as an important focus area of service operations, has increasingly become a critical component of the overall service offering by high-tech enterprises. Enhancing productivity by optimizing field services could bring significant benefits to the organization. This thesis investigates the field deployment process in IBM PC services and attempts to identify potential areas of improvements by applying principles in capacity management, customer-oriented services, as well as IT technologies, such as database and the Internet. In addition, demand statistics are analyzed to provide important insights into the limitations of the existing largely manual planning and scheduling process. A transformation plan is developed, with due consideration to both the capacity and efficiency of the Customer Solution Center and the overall experience of the end users. by Siyu Fan. M.Eng.

    On the feasibility of collaborative green data center ecosystems

    The increasing awareness of the impact of the IT sector on the environment, together with economic factors, has fueled many research efforts to reduce the energy expenditure of data centers. Recent work proposes to achieve additional energy savings by exploiting, in concert with customers, service workloads and to reduce data centers’ carbon footprints by adopting demand-response mechanisms between data centers and their energy providers. In this paper, we discuss the incentives that customers and data centers can have to adopt such measures and propose a new service type and pricing scheme that is economically attractive and technically realizable. Simulation results based on real measurements confirm that our scheme can achieve additional energy savings while preserving service performance and the interests of data centers and customers. Peer Reviewed. Postprint (author's final draft)

    Fiscal year 1973 scientific and technical reports, articles, papers, and presentations

    Formal NASA technical reports, papers published in technical journals, and presentations by MSFC personnel in FY73 are presented. Papers of MSFC contractors are also included.