7 research outputs found

    NETWORK TOPOLOGY RECONSTRUCTION FROM OPERATIONAL LOGS

    Techniques are presented herein that support a new, low-overhead network topology discovery mechanism that is based on data from existing operational log sources such as, for example, syslog entries. Such an approach lowers the operational cost of any solution that requires knowledge of a network’s topology. Additionally, such an approach allows for the ongoing rediscovery of a network’s topology to track over time the documented or undocumented evolution of the same
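As a rough illustration of the idea in this abstract, the sketch below rebuilds a link map from log lines. The abstract does not specify a message format, so the line shape assumed here (`<local-host> ... neighbor <remote-host> ... Up|Down`), the `ADJACENCY` pattern, and the `build_topology` helper are all hypothetical; real syslog adjacency messages differ by vendor and protocol.

```python
import re
from collections import defaultdict

# ASSUMPTION: hypothetical message shape, not taken from the abstract:
#   "<local-host> ... neighbor <remote-host> ... Up|Down"
ADJACENCY = re.compile(
    r"^(?P<local>\S+) .*neighbor (?P<remote>\S+) .*\b(?P<state>Up|Down)\b",
    re.IGNORECASE,
)

def build_topology(lines):
    """Replay log lines in order and return {host: set of neighbor hosts}."""
    links = set()
    for line in lines:
        m = ADJACENCY.search(line)
        if not m or m["local"] == m["remote"]:
            continue  # not an adjacency message, or a self-reference
        link = frozenset((m["local"], m["remote"]))
        if m["state"].lower() == "up":
            links.add(link)      # adjacency came up: record the link
        else:
            links.discard(link)  # adjacency went down: forget the link
    adjacency = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

sample_logs = [
    "r1 adjacency: neighbor r2 on eth0 is Up",
    "r2 adjacency: neighbor r3 on eth1 is Up",
    "r1 adjacency: neighbor r2 on eth0 is Down",
    "r3 cpu utilization high",
]
topology = build_topology(sample_logs)
```

Replaying the messages in order means that re-running the function over a newer log window gives an updated view of the links, which is one simple way to follow topology changes over time.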

    HOLISTIC TELEMETRY ANALYSIS USING DIMENSIONALITY REDUCTION ALGORITHMS

    Dealing with multiple sources of telemetry data is becoming more and more important, especially given the popularity of the microservices architecture. However, monitoring in such a situation becomes challenging as it is not humanly possible to keep track of thousands of counters, particularly when issues span multiple nodes (e.g., microservices). Techniques are presented herein that address the problem of monitoring and analyzing high-dimensional telemetry (comprising, for example, thousands of counters). Aspects of the presented techniques support a system that applies a dimensionality reduction algorithm based on standard deviation and z-score statistical values, transforming data with a large number of dimensions into a two-dimensional array (such as, for example, a scatterplot). Application of the presented techniques enhances issue detection by a human observer and the automated generation of alerts
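The following is a minimal sketch of how a z-score based reduction to two dimensions might look. The abstract does not say which two axes are used, so the choice here (mean absolute z-score and maximum absolute z-score per time step) is an assumption, as are the helper names `zscores` and `reduce_to_2d`.

```python
from statistics import mean, pstdev

def zscores(series):
    """Z-score of each sample within one counter's time series."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # a constant counter never deviates
    return [(x - mu) / sigma for x in series]

def reduce_to_2d(counters):
    """Map {counter name: series} to one (x, y) point per time step.

    ASSUMPTION: x is the mean |z| across counters and y is the max |z|;
    the abstract does not specify the two output dimensions.
    """
    z_by_counter = [zscores(series) for series in counters.values()]
    points = []
    for step in zip(*z_by_counter):
        abs_z = [abs(z) for z in step]
        points.append((mean(abs_z), max(abs_z)))
    return points

# Two counters over four time steps; the second one spikes at the last step.
counters = {
    "cpu": [1, 1, 1, 1],
    "errors": [0, 0, 0, 10],
}
points = reduce_to_2d(counters)
```

Each time step becomes a single point, so a scatterplot of `points` would place the step containing the spike farthest from the origin regardless of how many counters feed the calculation.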

    SYSTEM AND METHOD FOR PREDICTING HARDWARE FAILURES OF ELECTRONIC DEVICES USING ONBOARD SENSORS AND DEVICE LIFECYCLE DATA

    Techniques herein provide a capability to predict failures of hardware by using onboard sensors and provide for the ability to move from detection to prediction for hardware failures. In turn, such techniques can help to reduce downtime due to marginal hardware and improve network availability. The techniques can also help to reduce unnecessary maintenance and changes related to replacing hardware that has not failed, which can lead to business efficiencies for both customers and vendors and may become even more important in a Network-As-A-Service (NAAS) context in which both sides are paid by the vendor

    SYSTEM FOR EXPERT-ASSISTED CAUSAL INFERENCE FOR RANKING EVENTS OF INTEREST IN NETWORKS

    Networks have increased in size and complexity such that the number of events occurring each day has grown drastically. Techniques of this proposal provide for the ability to infer candidates for causal relationships, in some cases with confidence. In particular, a novel machine learning (ML) based system is described that provides for the ability to narrow down candidate temporal patterns that may potentially explain an event of interest (e.g., a network outage). The system is trainable with a human in the loop and is highly effective even with a minimal amount of prior training

    MULTI-ABSTRACTIVE CONTEXT INTERPRETATIONS OF NETWORK EVENTS

    Hybrid and augmented workflows involving predictions or insights produced by automation tools that are handed over to human operators are known to cause cognitive overload. Generally, cognitive overload occurs when an automated system tries to push too much information to a human operator. When such a push of information is sustained over time, cognitive overload leads to what is known as alert fatigue, whereby insights of an automated system are not utilized, which can lead to poor adoption. One type of cognitive overload specific to cognitive systems includes situations in which predictions/insights are not necessarily numerous but rather too complex to understand and interpret. The lack of ability to understand reasons behind predictions can be a barrier to a broader adoption of artificial intelligence (AI) operations. Presented herein is a novel technique to derive explanations for predictions using multiple contexts, which can help system users to rapidly estimate the importance of predictions from several angles, thereby leading to greater trust and system adoption, as well as improved reaction time

    DYNAMIC PRIORITIZATION FOR FULL STACK OBSERVABILITY

    Alert fatigue is a well-known issue that impacts many enterprise information technology (IT) teams. Those teams are constantly looking for ways to reduce the mean time to identify (MTTI) and the mean time to resolve (MTTR) issues to minimize the impact to a business. When such a team is inundated with a very large number of alerts, they become desensitized to those alerts and metrics such as MTTI and MTTR increase. Such a desensitization has other negative repercussions that, together, impact a business and affect the adoption of a full-stack observability (FSO) approach. Techniques are presented herein that address these problems through a dynamic prioritization solution that allows for user inputs and past interactions, and which leverages large language models (LLMs)

    AUTONOMOUS COLLECTION DETECTION AND REMEDIATION DECISIONS BASED ON LOCAL MODELS AND LOCALLY SOURCED DATA

    In the context of distributed monitoring and anomaly detection, a networking device may need to perform anomaly detection based on local data, such as when a remote controller is not reachable during network convergence or other network issues. Anomaly relevance improves if telemetry data used for anomaly detection comes not only from a local device, but also from the device's immediate surroundings (e.g., physical neighbors, protocol peers, redundancy units, etc.). Presented herein are techniques through which a device can reach its own and nearby telemetry sources in a manner that may follow an effective network topology and configuration. Thus, techniques herein may enable the design of intelligent autonomous agents that can operate beyond the scope of a host (and can integrate nearby information to make smarter assessments) but below the network scale and, hence, are capable of scaling well in order to sample data more quickly and merge data more accurately