2,916 research outputs found

    An initial approach to distributed adaptive fault-handling in networked systems

    We present a distributed adaptive fault-handling algorithm for networked systems. The probabilistic approach that we use makes the proposed method capable of adaptively detecting and localizing network faults using simple end-to-end test transactions. Our method operates in a fully distributed manner, such that each network element detects faults using locally extracted information as input. This allows for fast autonomous adaptation to local network conditions in real time, with significantly reduced need for manual configuration of algorithm parameters. Initial results from a small synthetically generated network indicate that satisfactory algorithm performance can be achieved with respect to the number of detected and localized faults, detection time, and false alarm rate.

    Towards Distributed and Adaptive Detection and Localisation of Network Faults

    We present a statistical probing approach to distributed fault detection in networked systems, based on autonomous configuration of algorithm parameters. Statistical modelling is used for detection and localisation of network faults. A detected fault is isolated to a node or link by collaborative fault localisation. From local measurements obtained through probing between nodes, probe response delay and packet drop are modelled via parameter estimation for each link. Estimated model parameters are used for autonomous configuration of algorithm parameters related to probe intervals and detection mechanisms. Expected fault-detection performance is formulated as a cost instead of specific parameter values, significantly reducing configuration effort in a distributed system. The benefit of our algorithm is fault detection with increased certainty based on local measurements, compared to other methods that do not take observed network conditions into account. We investigate the algorithm performance for varying user parameters and failure conditions. The simulation results indicate that more than 95 % of the generated faults can be detected with few false alarms. At least 80 % of the link faults and 65 % of the node faults are correctly localised. The performance can be improved by parameter adjustments and by using alternative paths for communication of algorithm control messages.
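
The idea of configuring a detector from a performance target rather than from hand-picked parameter values can be illustrated with a minimal sketch. It is not the model from the abstract above: it assumes independent packet drops, ignores response delay, and uses hypothetical names (`estimate_drop_rate`, `losses_before_alarm`). The operator supplies an acceptable false-alarm probability, and the number of consecutive lost probes required before raising an alarm is derived from the drop rate estimated on each link.

```python
def estimate_drop_rate(outcomes):
    """Laplace-smoothed estimate of the per-probe drop probability on one link.

    `outcomes` is a sequence of booleans: True if the probe was answered.
    Smoothing keeps the estimate strictly between 0 and 1.
    """
    lost = sum(1 for ok in outcomes if not ok)
    return (lost + 1) / (len(outcomes) + 2)


def losses_before_alarm(drop_rate, false_alarm_target):
    """Smallest n such that drop_rate**n <= false_alarm_target.

    Assumption: drops on a healthy link are independent, so n consecutive
    losses occur by chance with probability drop_rate**n.
    """
    if not 0.0 < drop_rate < 1.0:
        raise ValueError("drop_rate must be strictly between 0 and 1")
    if not 0.0 < false_alarm_target < 1.0:
        raise ValueError("false_alarm_target must be strictly between 0 and 1")
    n = 1
    while drop_rate ** n > false_alarm_target:
        n += 1
    return n


# Synthetic probe histories for two hypothetical links.
clean_link = [True] * 98 + [False] * 2
lossy_link = [True] * 80 + [False] * 20

clean_n = losses_before_alarm(estimate_drop_rate(clean_link), 1e-6)
lossy_n = losses_before_alarm(estimate_drop_rate(lossy_link), 1e-6)
print(clean_n, lossy_n)
```

With the same target, the lossier link needs more consecutive losses before an alarm is raised, which is the kind of local adaptation the abstract describes in general terms.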

    Consistent SDNs through Network State Fuzzing

    The conventional wisdom is that a software-defined network (SDN) operates under the premise that the logically centralized control plane has an accurate representation of the actual data plane state. Nevertheless, bugs, misconfigurations, faults or attacks can introduce inconsistencies that undermine correct operation. Previous work in this area, however, lacks a holistic methodology to tackle this problem and thus addresses only certain parts of it. Yet the consistency of the overall system is only as good as its least consistent part. Motivated by an analogy between network consistency checking and program testing, we propose to add active probe-based network state fuzzing to our consistency check repertoire. Our system, PAZZ, combines production traffic with active probes to continuously test whether the actual forwarding path and decision elements (on the data plane) correspond to the expected ones (on the control plane). Our insight is that active traffic covers inconsistency cases beyond the ones identified by passive traffic. A PAZZ prototype was built and evaluated on topologies of varying scale and complexity. Our results show that PAZZ requires minimal network resources to detect persistent data plane faults through fuzzing and localize them quickly.

    Consistent SDNs through Network State Fuzzing

    The conventional wisdom is that a software-defined network (SDN) operates under the premise that the logically centralized control plane has an accurate representation of the actual data plane state. Unfortunately, bugs, misconfigurations, faults or attacks can introduce inconsistencies that undermine correct operation. Previous work in this area, however, lacks a holistic methodology to tackle this problem and thus addresses only certain parts of it. Yet the consistency of the overall system is only as good as its least consistent part. Motivated by an analogy between network consistency checking and program testing, we propose to add active probe-based network state fuzzing to our consistency check repertoire. Our system, PAZZ, combines production traffic with active probes to periodically test whether the actual forwarding path and decision elements (on the data plane) correspond to the expected ones (on the control plane). Our insight is that active traffic covers inconsistency cases beyond the ones identified by passive traffic. A PAZZ prototype was built and evaluated on topologies of varying scale and complexity. Our results show that PAZZ requires minimal network resources to detect persistent data plane faults through fuzzing and localize them quickly while outperforming baseline approaches. Comment: Added three extra relevant references; the arXiv version was later accepted in IEEE Transactions on Network and Service Management (TNSM), 2019, with the title "Towards Consistent SDNs: A Case for Network State Fuzzing".

    EFFICIENT PROBE STATION PLACEMENT AND PROBE SET SELECTION FOR FAULT LOCALIZATION

    Network fault management has been a focus of research activity, with particular emphasis on fault localization: zeroing in on the exact source of a failure from a set of observed failures. Fault diagnosis is a central aspect of network fault management. Since faults are unavoidable in communication systems, their quick detection and isolation is essential for the robustness, reliability, and accessibility of a system. Probing techniques for fault localization involve the placement of probe stations (specially instrumented nodes from which probes can be sent to monitor the network), which affects the diagnostic capability of the probes sent by those stations. Probe station locations affect probing efficiency, monitoring capability, and deployment cost. We present probe station selection algorithms that aim to minimize the number of probe stations and make the monitoring robust against failures in deterministic as well as non-deterministic environments. We then implement algorithms that exploit interactions between probe paths to find a small collection of probes that can be used to locate faults. Small probe sets are desirable in order to minimize the costs imposed by probing, such as additional network load and data management requirements. We discuss a novel integrated approach to probe station and probe set selection for fault localization. Better placement of probe stations would yield fewer probes, and thus a smaller probe set, while maintaining the same diagnostic power. We provide an experimental evaluation of the proposed algorithms through simulation results.
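
Finding a small probe set with full diagnostic power can be sketched with a common greedy heuristic. This is an illustration under stated assumptions, not the algorithm from the paper above: at most one node is faulty at a time, a probe fails exactly when its path contains the faulty node, and the function name, the probe names, and the toy topology are all hypothetical. The heuristic repeatedly picks the probe that splits the most groups of still-indistinguishable nodes.

```python
from typing import Dict, List, Set

NO_FAULT = "<no-fault>"  # virtual element: forces every node onto some chosen probe


def greedy_probe_set(probes: Dict[str, Set[str]], nodes: Set[str]) -> List[str]:
    """Greedily choose probes until every single-node fault is distinguishable.

    `probes` maps a probe name to the set of nodes on its path.
    """
    # Start with one group: nothing distinguishes any node from "no fault" yet.
    groups = [set(nodes) | {NO_FAULT}]
    remaining = dict(probes)
    chosen: List[str] = []

    while remaining and any(len(g) > 1 for g in groups):
        def split_count(path: Set[str]) -> int:
            # A probe splits a group if it covers some, but not all, of it.
            return sum(1 for g in groups if g & path and g - path)

        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(sorted(remaining), key=lambda name: split_count(remaining[name]))
        if split_count(remaining[best]) == 0:
            break  # no remaining probe refines the partition any further

        path = remaining.pop(best)
        refined = []
        for g in groups:
            for part in (g & path, g - path):
                if part:
                    refined.append(part)
        groups = refined
        chosen.append(best)

    return chosen


# Toy example: four nodes on a line, five candidate probe paths.
nodes = {"A", "B", "C", "D"}
probes = {
    "p1": {"A", "B"},
    "p2": {"B", "C"},
    "p3": {"C", "D"},
    "p4": {"A", "B", "C", "D"},
    "p5": {"D"},
}
chosen = greedy_probe_set(probes, nodes)
print(chosen)  # ['p1', 'p2', 'p3']
```

In this toy case three of the five candidate probes suffice: each node fails a different, non-empty subset of the chosen probes, so the pattern of failed probes identifies the faulty node.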

    Efficient Probing Techniques for Fault Diagnosis

    Increasing network usage and the widespread use of networks for more and more performance-critical applications have created demand for tools that can monitor network health with minimal management traffic. Adaptive probing has the potential to provide effective tools for end-to-end monitoring and fault diagnosis over a network. In this paper we present adaptive probing tools that provide an effective and efficient solution for fault diagnosis. We propose adaptive-probing-based algorithms that perform fault localization by adapting the probe set to localize the faults in the network. We compare the performance and efficiency of the proposed algorithms through simulation results.

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided that covers, inter alia, rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model-based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise.
    Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule-based or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.

    A framework for modelling mobile radio access networks for intelligent fault management

    Postprin