
    An Empirical Study on Data Center System Failure Diagnosis

    Data center downtime causes business losses of over a million dollars per hour. Round-the-clock (24x7) data availability is critical to numerous systems, e.g., public utilities, hospitals, and data centers. Service interruption can mean the difference between life and death, higher costs, and poor service quality. This research conducted a system failure diagnosis and reliability assessment for Tier IV data centers (DC), employing Failure Modes, Effects, and Criticality Analysis (FMECA) and the Reliability Block Diagram (RBD). Series-parallel, active standby, k-out-of-n, bridge, full redundancy, fault-tolerant, and multiple-utility techniques were applied in the system failure diagnosis to provide high system availability. Component reliability data were obtained from the IEEE Std. 493 Gold Book. Simulation results from the data center system failure diagnosis reveal the functional steps leading to data center downtime and pinpoint solutions to eliminate or mitigate it. Proposed improvements to the components' inherent characteristics (CIC) and the system connectivity topology (SCT) help reduce the failure rate by 1.1706 hours in 1,000,000 hours of operation.
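
The RBD techniques named in this abstract (series, parallel redundancy, k-out-of-n) reduce to short textbook formulas. The following is a minimal sketch of those formulas only, assuming independent components; the function names and the reliability values are illustrative assumptions, not figures from the study or from IEEE Std. 493.

```python
from math import comb, prod


def series(reliabilities):
    """Series block: the system works only if every component works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel (fully redundant) block: the system fails only if all components fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def k_out_of_n(k, n, r):
    """k-out-of-n block of identical components, each with reliability r.

    The system works if at least k of the n components work.
    """
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical component reliabilities, chosen only for illustration.
    ups_pair = parallel([0.95, 0.95])   # two redundant units
    chillers = k_out_of_n(2, 3, 0.90)   # any 2 of 3 units must run
    print(series([ups_pair, chillers]))  # the two blocks in series
```

A series-parallel diagram is evaluated by collapsing each block with these functions from the inside out. Bridge structures do not decompose this way and need a conditioning or path-set method, which this sketch does not cover.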

    Condition-Based Maintenance for Data Center Operations Management

    This chapter presents data center operations management through four case studies of the power distribution systems (PDS) of data centers (Tier I, Tier II, Tier III, and Tier IV). The four PDS topologies are defined by their single points of failure and their redundant equipment and systems. The concepts of Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR) are applied during PDS design to reduce system downtime. Moreover, MTBF and MTTR are used to estimate the system availability of each Tier classification. Human factors are considered a critical part of data center operations, and operators' knowledge and skills need to be quantified and qualified, for example through certification levels. For sustainable data center operations, new software called Data Center Infrastructure Management (DCIM) has been deployed to monitor and control the entire system's operations, including interactions among the system of systems and human interfaces, using condition-based maintenance (CBM) for preventive and predictive maintenance. Moreover, CBM provides long-term cost savings in total cost of ownership (TCO) and improves energy efficiency.
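
The abstract does not say how availability is estimated from MTBF and MTTR. A common steady-state formula is availability = MTBF / (MTBF + MTTR), and the sketch below assumes that formula. The function names and the sample MTBF and MTTR values are hypothetical and do not come from the chapter.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the long-run fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail):
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    a = availability(mtbf_hours=10_000, mttr_hours=4)
    print(f"availability = {a:.6f}")
    print(f"expected downtime = {annual_downtime_hours(a):.2f} h/year")
```

Under this formula, halving MTTR and doubling MTBF have nearly the same effect on availability whenever MTTR is much smaller than MTBF. This is one reason repair time matters as much as failure frequency in maintenance planning.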