As networked systems grow more complex, we need mechanisms that automatically detect network failures and diagnose their causes. To realize truly self-healing networks, we also need mechanisms that repair these failures and preserve service availability by providing alternatives when a failure is detected. In this paper, we address the detection and diagnosis of network failures. We introduce DYSWIS (“Do you see what I see”), which observes the system from multiple points in the network. Our system consists of detection nodes and diagnosis nodes. Detection nodes detect failures through passive traffic monitoring and active probing. Diagnosis nodes determine the cause of a failure using historical information about similar failures and by performing active tests. Diagnosis nodes are built on a rule engine, with network dependency relationships encoded as rules. We present a prototype that covers the network components found in a VoIP network and demonstrates the feasibility of our solution.
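To make the idea of dependency rules concrete, the following is a minimal sketch of how a diagnosis node might walk dependency relationships to narrow down a cause, and how reports from multiple vantage points might be used to judge the scope of a failure. It is an illustration under stated assumptions, not the DYSWIS implementation: the component names, the `DEPENDENCIES` table, and the functions `diagnose` and `failure_scope` are all hypothetical, and the `probe` callable stands in for whatever active test a real node would run.

```python
# Hypothetical dependency rules for a VoIP setting: each component maps to
# the components it depends on. These names are illustrative assumptions.
DEPENDENCIES = {
    "voip_call": ["sip_proxy", "dns"],
    "sip_proxy": ["dns", "ip_connectivity"],
    "dns": ["ip_connectivity"],
    "ip_connectivity": [],
}


def diagnose(component, probe):
    """Return the deepest failing dependencies of `component`.

    `probe(name)` is an assumed active test that returns True when the
    named component is healthy. A failing dependency is reported as a
    cause only if none of its own dependencies are failing.
    """
    causes = set()
    for dep in DEPENDENCIES[component]:
        if probe(dep):
            continue  # dependency is healthy, so it cannot be the cause
        deeper = diagnose(dep, probe)
        causes.update(deeper if deeper else [dep])
    return sorted(causes)


def failure_scope(peer_sees_failure):
    """Classify a locally observed failure using reports from other nodes.

    `peer_sees_failure` is a list of booleans, one per peer node asked
    whether it sees the same failure.
    """
    if not peer_sees_failure:
        return "unknown"
    failing = sum(peer_sees_failure)
    if failing == 0:
        return "local"    # only the reporting node is affected
    if failing == len(peer_sees_failure):
        return "global"   # every vantage point sees the failure
    return "partial"


if __name__ == "__main__":
    # Simulated probe results: DNS is down, everything else is healthy.
    status = {"sip_proxy": True, "dns": False, "ip_connectivity": True}
    print(diagnose("voip_call", status.get))
    print(failure_scope([False, False, False]))
```

In this sketch the rules are a plain dictionary and the traversal is a depth-first walk; a rule engine would typically express the same dependencies declaratively and could also weigh historical information about similar failures, which is omitted here.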