4 research outputs found

    Diagnosis of TCP overlay connection failures using Bayesian networks

    Diagnosis of TCP overlay connection failures using Bayesian networks

    When failures occur in Internet overlay connections today, it is difficult for users to determine the root cause of failure. An overlay connection may require TCP connections between a series of overlay nodes to succeed, but accurately determining which of these connections has failed is difficult for users without access to the internal workings of the overlay. Diagnosis using active probing is costly and may be inaccurate if probe packets are filtered or blocked. To address this problem, we develop a passive diagnosis approach that infers the most likely cause of failure using a Bayesian network modeling the conditional probability of TCP failures given the IP addresses of the hosts along the overlay path. We collect TCP failure data for 28.3 million TCP connections using data from the new PlanetSeer overlay monitoring system and train a Bayesian network for the diagnosis of overlay connection failures. We evaluate the accuracy of diagnosis using this Bayesian network on a set of overlay connections generated from observations of CoDeeN traffic patterns and find that our approach can accurately diagnose failures.
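    As a rough illustration of the passive inference described above, the sketch below computes a posterior over which hop of an overlay path caused a failed connection. It is a deliberate simplification, not the paper's Bayesian network: it assumes exactly one hop failed, that hops fail independently, and that each host's TCP failure probability can be estimated from hypothetical (failures, attempts) counts with Laplace smoothing. The function names and sample addresses are illustrative only.

```python
from math import prod


def failure_prob(failures: int, attempts: int) -> float:
    """Laplace-smoothed estimate of P(TCP connection to a host fails).

    Hypothetical estimator; an unseen host (0, 0) gets 0.5.
    """
    return (failures + 1) / (attempts + 2)


def diagnose(hops: list[str], history: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Posterior P(hop h is the single failed hop | overlay connection failed).

    Assumes distinct hop addresses, independent hop failures, and a
    single-fault model; `history` maps an IP address to hypothetical
    (failures, attempts) counts gathered passively.
    """
    p = {h: failure_prob(*history.get(h, (0, 0))) for h in hops}
    # Unnormalized score: hop h fails while every other hop succeeds.
    scores = {h: p[h] * prod(1 - p[o] for o in hops if o != h) for h in hops}
    total = sum(scores.values())
    # Normalize so the posteriors over candidate hops sum to 1.
    return {h: s / total for h, s in scores.items()}
```

    With hypothetical counts where one host has a much higher historical failure rate, that host dominates the posterior, which is the intuition behind diagnosing failures without sending any probes.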

    Statistical learning in network architecture

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 167-[177]).

    The Internet has become a ubiquitous substrate for communication in all parts of society. However, many original assumptions underlying its design are changing. Amid problems of scale, complexity, trust and security, the modern Internet accommodates increasingly critical services. Operators face a security arms race while balancing policy constraints, network demands and commercial relationships. This thesis espouses learning to embrace the Internet's inherent complexity, address diverse problems and provide a component of the network's continued evolution. Malicious nodes, cooperative competition and lack of instrumentation on the Internet imply an environment with partial information. Learning is thus an attractive and principled means to ensure generality and reconcile noisy, missing or conflicting data. We use learning to capitalize on under-utilized information and infer behavior more reliably, and on faster time-scales, than humans with only local perspective. Yet the intrinsically dynamic and distributed nature of networks presents interesting challenges to learning. In pursuit of viable solutions to several real-world Internet performance and security problems, we apply statistical learning methods and develop new, network-specific algorithms as a step toward overcoming these challenges. Throughout, we reconcile placing intelligence at different points in the network with the end-to-end arguments. We first consider learning as an end-node optimization for efficient peer-to-peer overlay neighbor selection and agent-centric latency prediction.

    We then turn to security and use learning to exploit fundamental weaknesses in malicious traffic streams. Our method is both adaptable and not easily subverted. Next, we show that certain security and optimization problems require collaboration, global scope and broad views. We employ ensembles of weak classifiers within the network core to mitigate IP source address forgery attacks, thereby removing incentive and coordination issues surrounding existing practice. Finally, we argue for learning within the routing plane as a means to directly optimize and balance provider and user objectives. This thesis thus serves first to validate the potential for using learning methods to address several distinct problems on the Internet and second to illuminate design principles in building such intelligent systems in network architecture.

    by Robert Edward Beverly, IV. Ph.D.
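    The abstract mentions ensembles of weak classifiers in the network core for flagging forged source addresses. A minimal sketch of a generic weighted-majority ensemble of decision stumps follows. The features `ttl_mismatch` and `prefix_unseen`, the thresholds, and the weights are hypothetical and are not taken from the thesis; the sketch only shows how weak votes combine into one decision.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A weak classifier: votes +1 (spoofed) if feature > threshold, else -1."""

    feature: str      # hypothetical per-packet feature name
    threshold: float  # hypothetical split point
    weight: float     # e.g. an AdaBoost-style alpha; values below are made up

    def vote(self, x: dict[str, float]) -> int:
        return 1 if x[self.feature] > self.threshold else -1


def ensemble_predict(stumps: list[Stump], x: dict[str, float]) -> bool:
    """Weighted majority vote; True means the packet is flagged as spoofed."""
    score = sum(s.weight * s.vote(x) for s in stumps)
    return score > 0
```

    Each stump is individually unreliable; the weighted sum lets several weak signals outvote any single one, which is the usual motivation for ensembles of weak learners.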

    CAPRI: A Common Architecture for Distributed Probabilistic Internet Fault Diagnosis

    PhD thesis

    This thesis presents a new approach to root cause localization and fault diagnosis in the Internet based on a Common Architecture for Probabilistic Reasoning in the Internet (CAPRI), in which distributed, heterogeneous diagnostic agents efficiently conduct diagnostic tests and communicate observations, beliefs, and knowledge to probabilistically infer the cause of network failures. Unlike previous systems that can diagnose only a limited set of network component failures using a limited set of diagnostic tests, CAPRI provides a common, extensible architecture for distributed diagnosis that allows experts to improve the system by adding new diagnostic tests and new dependency knowledge.

    To support distributed diagnosis using new tests and knowledge, CAPRI must overcome several challenges, including the extensible representation and communication of diagnostic information, the description of diagnostic agent capabilities, and efficient distributed inference. Furthermore, the architecture must scale to support diagnosis of a large number of failures using many diagnostic agents. To address these challenges, this thesis presents a probabilistic approach to diagnosis based on an extensible, distributed component ontology to support the definition of new classes of components and diagnostic tests; a service description language for describing new diagnostic capabilities in terms of their inputs and outputs; and a message processing procedure for dynamically incorporating new information from other agents, selecting diagnostic actions, and inferring a diagnosis using Bayesian inference and belief propagation.

    To demonstrate the ability of CAPRI to support distributed diagnosis of real-world failures, I implemented and deployed a prototype network of agents on PlanetLab for diagnosing HTTP connection failures. Approximately 10,000 user agents and 40 distributed regional and specialist agents on PlanetLab collect information from over 10,000 users and diagnose over 140,000 failures using a wide range of active and passive tests, including DNS lookup tests, connectivity probes, Rockettrace measurements, and user connection histories. I show how to improve accuracy and reduce cost by learning new dependency knowledge and introducing new diagnostic agents. I also show that agents can manage the cost of diagnosing many similar failures by aggregating related requests and caching observations and beliefs.