2 research outputs found

    Automation and orchestration of hardware and firmware data mining using a smart data analytics platform

    Effective data mining will be important for differentiating and succeeding in the digital economy, especially with increased commoditization and reduced barriers to entry for infrastructure devices such as servers, storage and networking systems. A lot of telemetry data from manufacturing facilities and customers can be used to drive an improved supportability experience, unmatched product quality and better reliability of infrastructure devices such as servers and storage devices. Currently, data mining of hardware, firmware and platform logs is a challenging task because the domain knowledge is complex and, in a large multinational organization, the expertise is distributed across the world. With complexity increasing and data mining remaining a very time-consuming task that requires math/statistics skills, diverse programming and machine learning skills, and cross-domain knowledge, it is important to look at a next-generation analytics solution tailored to infrastructure vendors to improve supportability, quality, reliability, performance and security. In this publication we propose a smart, automated and generic data analytics platform that enables a 24/7 data mining solution using a built-in platform domain modeler, an expert system for analyzing hardware and firmware logs, and a policy manager that allows user-defined hypotheses to be verified around the clock based on policies and configurable triggers. This smart data analytics platform will help democratize data mining of hardware and firmware logs, improve troubleshooting of complex issues, improve the supportability experience, reliability and quality, and reduce warranty costs.

    A method and apparatus for faster, reliable and consolidated logging of information during system failure

    For servers that run mission-critical workloads today, downtime is not an option, and any outage of these servers usually translates to reduced revenue, reduced profitability and potential customer loss. Any interruption in the operation or availability of these workloads will have a ripple effect throughout the organization. Gathering valid and necessary data about the failure event from all possible sources plays a significant role in determining how quickly and accurately the root cause of the server downtime is identified. The data required for such analysis is spread across the firmware and the operating system (OS) and comes from different sources on the server. This information comprises data collected and logged by the firmware, such as the error log buffers and event logs, as well as the state of the system at the time of failure, collected by the operating system in the core dump files. Most often, the challenge lies in collecting the set of interdependent information that originates and is stored at different locations on the system. The proposed solution enables a high-availability design by eliminating the single point of failure during the log collection and retrieval process. This disclosure proposes a method and apparatus for faster, reliable and consolidated logging of necessary data from different sources on occurrence of a system failure.