
    A Firmware-first, Early Warning System to Detect Performance Anomalies and Provide Actionable Insights in User-visible Interfaces

    Application performance can be degraded for many reasons. Some sources of degradation remain hidden from users by the very nature of how they occur, such as misconfigured hardware or firmware settings, or a faulty hardware component that is designed to keep the system functional, albeit at reduced capacity. Prevalent monitoring approaches can act on such degradations only after they have persisted in the platform. In this paper, we propose a novel, “firmware-first”, rules-based approach that detects performance anomalies early, both during the boot process and at runtime (where anomalies may arise from autonomous recovery actions taken in hardware or firmware), and propagates them to standard user-visible interfaces, so that customers can make informed decisions before deploying their services.
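
The abstract describes the approach only at a high level. As an illustration, here is a minimal sketch of a rules-based check, assuming the firmware can expose a snapshot of platform state as key/value pairs. The state keys (`dimms_populated`, `cpu_max_mhz`, `power_profile`, ...), the three example rules, and the `evaluate` and `to_event_log` helpers are all hypothetical and are not taken from the paper; a plain list of strings stands in for a user-visible interface such as an event log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical platform-state snapshot, collected during boot or again at
# runtime after an autonomous recovery action (keys are illustrative only).
PlatformState = Dict[str, object]


@dataclass
class Finding:
    rule: str
    severity: str
    message: str


Rule = Callable[[PlatformState], List[Finding]]


def memory_population_rule(state: PlatformState) -> List[Finding]:
    """Flag DIMMs that are installed but disabled (e.g. mapped out after errors)."""
    populated = int(state["dimms_populated"])
    active = int(state["dimms_active"])
    if active < populated:
        return [Finding("memory-mapped-out", "warning",
                        f"{populated - active} of {populated} DIMMs disabled; "
                        "capacity and bandwidth reduced")]
    return []


def cpu_frequency_rule(state: PlatformState) -> List[Finding]:
    """Flag a CPU capped below its rated frequency (e.g. by a firmware setting)."""
    rated = int(state["cpu_rated_mhz"])
    current_max = int(state["cpu_max_mhz"])
    if current_max < rated:
        return [Finding("cpu-frequency-capped", "warning",
                        f"CPU max frequency {current_max} MHz is below rated {rated} MHz")]
    return []


def power_profile_rule(state: PlatformState) -> List[Finding]:
    """Flag a power profile that trades performance for energy savings."""
    profile = str(state["power_profile"])
    if profile != "performance":
        return [Finding("power-profile", "info",
                        f"power profile is '{profile}', not 'performance'")]
    return []


RULES: List[Rule] = [memory_population_rule, cpu_frequency_rule, power_profile_rule]


def evaluate(state: PlatformState, rules: List[Rule] = RULES) -> List[Finding]:
    """Run every rule against the snapshot and collect all findings."""
    findings: List[Finding] = []
    for rule in rules:
        findings.extend(rule(state))
    return findings


def to_event_log(findings: List[Finding]) -> List[str]:
    """Render findings as one-line entries, standing in for a user-visible log."""
    return [f"[{f.severity.upper()}] {f.rule}: {f.message}" for f in findings]


if __name__ == "__main__":
    boot_state: PlatformState = {
        "dimms_populated": 16, "dimms_active": 14,
        "cpu_rated_mhz": 3000, "cpu_max_mhz": 3000,
        "power_profile": "balanced",
    }
    for line in to_event_log(evaluate(boot_state)):
        print(line)
```

Keeping each rule a small independent function means the same rule set can be re-run on a runtime snapshot, for example after hardware disables a DIMM, without changing the evaluation loop.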

    A method and apparatus for faster, reliable and consolidated logging of information during system failure

    For today's servers running mission-critical workloads, downtime is not an option: an outage usually translates into lost revenue, reduced profitability, and potential customer loss. Any interruption in the operation or availability of these workloads ripples throughout the organization. Gathering valid and necessary data about a failure from all possible sources largely determines how quickly and accurately the root cause of the downtime is identified. The data required for this analysis is spread across the firmware and the Operating System (OS) and comes from different sources on the server. It comprises data collected and logged by the firmware, such as error log buffers and event logs, as well as the state of the system at the time of failure, which the operating system captures in core dump files. The most common challenge is collecting this set of interdependent information, which originates from and is stored in different locations on the system. The proposed solution enables a high-availability design by eliminating single points of failure during log collection and retrieval. This disclosure proposes a method and apparatus for faster, reliable, and consolidated logging of the necessary data from different sources when a system failure occurs.
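
The disclosure does not spell out its mechanism here. A minimal sketch of the two ideas it names, consolidating failure data from several sources and avoiding a single point of failure when storing it, might look like the following. It assumes each source is exposed as a callable and approximates "no single point of failure" by writing the same bundle to several independent directories. The collector names, the sample entries, the `collect_all` and `write_redundant` helpers, and the JSON bundle format are hypothetical, not the disclosure's design.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

# A collector returns the data from one source (firmware log, OS dump, ...).
Collector = Callable[[], object]


def collect_all(collectors: Dict[str, Collector]) -> Dict[str, object]:
    """Gather data from every source; a failing source is recorded, not fatal."""
    sources: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    for name, collect in collectors.items():
        try:
            sources[name] = collect()
        except Exception as exc:  # one broken source must not block the others
            errors[name] = repr(exc)
    return {"sources": sources, "collection_errors": errors}


def write_redundant(record: Dict[str, object], failure_id: str,
                    destinations: List[str]) -> List[str]:
    """Write the same bundle to every destination; succeed if at least one lands."""
    payload = json.dumps({"failure_id": failure_id, **record}, sort_keys=True)
    written: List[str] = []
    for directory in destinations:
        path = os.path.join(directory, f"{failure_id}.json")
        tmp = path + ".tmp"
        try:
            with open(tmp, "w", encoding="utf-8") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())
            # os.replace swaps the finished file into place, so readers never
            # see a half-written bundle (atomic on POSIX within one filesystem).
            os.replace(tmp, path)
            written.append(path)
        except OSError:
            continue  # this destination is unavailable; try the next one
    if not written:
        raise RuntimeError("no destination accepted the failure bundle")
    return written


if __name__ == "__main__":
    def read_core_dump_index() -> object:
        raise FileNotFoundError("core dump not yet available")

    collectors: Dict[str, Collector] = {
        "firmware_error_log": lambda: ["example firmware error record"],
        "firmware_event_log": lambda: ["example firmware event record"],
        "os_core_dump": read_core_dump_index,
    }
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        record = collect_all(collectors)
        missing = os.path.join(a, "missing")  # simulates a failed destination
        print(write_redundant(record, "failure-0001", [a, missing, b]))
```

Recording per-source collection errors inside the bundle, rather than aborting, keeps whatever data was available, which matters most when the failure itself has damaged one of the sources.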