By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application-specific faults (e.g., error codes). It should also identify culprits at a fine granularity, such as a process or a firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications. It formulates detailed diagnosis as an inference problem that more faithfully captures the behaviors and interactions of fine-grained network components such as processes. The primary challenge in solving this problem is inferring when a component might be impacting another. Our solution is based on an intuitive technique that uses the joint behavior of two components in the past to estimate the likelihood of them impacting one another in the present. We find that our deployed prototype is effective at diagnosing faults that we inject in a live environment. The faulty component is correctly identified as the most likely culprit in 80% of the cases and is almost always in the list of top five culprits.
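The central idea above, judging whether one component is impacting another from how the two behaved together in the past, can be illustrated with a small sketch. It is not NetMedic's implementation. It assumes that each component's state is a vector of numeric variables already normalized to comparable scales, and the names `distance` and `impact_likelihood`, the parameter `k`, and the mapping `1 / (1 + d)` are hypothetical choices made only for illustration. The sketch finds past moments when the source component was in a state similar to its current one and checks whether the destination component was then also close to its current state. If it was, the history is consistent with the source impacting the destination.

```python
import math
from typing import Sequence

# A component's state at one time: values of its variables (e.g., CPU use,
# request rate), assumed here to be pre-normalized to comparable scales.
State = Sequence[float]


def distance(a: State, b: State) -> float:
    """Root-mean-square difference between two equal-length state vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("states must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def impact_likelihood(
    src_history: Sequence[State],
    dst_history: Sequence[State],
    src_now: State,
    dst_now: State,
    k: int = 3,
) -> float:
    """Hypothetical score in (0, 1] for "src is impacting dst right now".

    src_history[t] and dst_history[t] are the two components' states at the
    same past time t.
    """
    if len(src_history) != len(dst_history) or not src_history:
        raise ValueError("histories must be non-empty and aligned in time")
    if k < 1:
        raise ValueError("k must be at least 1")
    # The k past times at which src looked most like it does now.
    times = sorted(range(len(src_history)),
                   key=lambda t: distance(src_history[t], src_now))[:k]
    # Average distance of dst, at those same times, from its current state.
    avg = sum(distance(dst_history[t], dst_now) for t in times) / len(times)
    # Assumed mapping: identical dst behavior gives 1.0; larger gaps decay toward 0.
    return 1.0 / (1.0 + avg)


if __name__ == "__main__":
    src = [[0.1], [0.2], [0.9], [0.8], [0.15]]       # e.g., a process's CPU load
    tracking = [[0.1], [0.2], [0.9], [0.85], [0.1]]  # history that follows src
    unrelated = [[0.9], [0.1], [0.1], [0.2], [0.9]]  # history that ignores src
    print(round(impact_likelihood(src, tracking, [0.9], [0.9], k=2), 3))   # prints 0.976
    print(round(impact_likelihood(src, unrelated, [0.9], [0.9], k=2), 3))  # prints 0.571
```

In the usage example, the destination whose history tracks the source scores higher than one whose history is unrelated. Averaging over the `k` most similar past states, rather than relying on a single one, is a design choice intended to damp noise from any one historical sample.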