11 research outputs found

    Exploiting Host Availability in Distributed Systems.

    As distributed systems become more decentralized, fluctuating host availability is an increasingly disruptive phenomenon. Older systems such as AFS used a small number of well-maintained, highly available machines to coordinate access to shared client state; server uptime (and thus service availability) was expected to be high. Newer services scale to larger numbers of clients by increasing the number of servers. In these systems, the responsibility for maintaining the service abstraction is spread amongst thousands of machines. In the extreme, each client is also a server that must respond to requests from its peers, and each host can opt in or out of the system at any time. In these operating environments, a non-trivial fraction of servers will be unavailable at any given time. This diffusion of responsibility from a few dedicated hosts to many unreliable ones has a dramatic impact on distributed system design, since it is difficult to build robust applications atop a partially available, potentially untrusted substrate. This dissertation explores one aspect of this challenge: how can a distributed system measure the fluctuating availability of its constituent hosts, and how can it use an understanding of this churn to improve performance and security? This dissertation extends the previous literature in three ways. First, it introduces new analytical techniques for characterizing availability data, applying these techniques to several real networks and explaining the distinct uptime patterns found within. Second, it introduces new methods for predicting future availability, both at the granularity of individual hosts and of clusters of hosts. Third, it describes how to use these new techniques to improve the performance and security of distributed systems.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/58445/1/jmickens_1.pd
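
    A minimal sketch of the kind of availability bookkeeping this abstract describes: deriving per-host availability fractions from up/down probe logs and a naive transition-based predictor for the next interval. The log format, probe interval, and prediction rule are illustrative assumptions, not the dissertation's techniques.

```python
from collections import defaultdict

# Hypothetical event log: (host, timestamp, state) tuples with state in
# {"up", "down"}, sampled at a fixed probe interval (an illustrative assumption).

def availability_fraction(events):
    """Fraction of probes in which each host was observed up."""
    up, total = defaultdict(int), defaultdict(int)
    for host, _ts, state in events:
        total[host] += 1
        up[host] += state == "up"
    return {h: up[h] / total[h] for h in total}

def predict_next(events, host):
    """Naive predictor: P(host is up in the next interval), estimated from the
    empirical transition probability out of its current state."""
    states = [s for h, t, s in sorted(events, key=lambda e: e[1]) if h == host]
    if len(states) < 2:
        return None
    from_up = stayed_up = from_down = came_up = 0
    for prev, cur in zip(states, states[1:]):
        if prev == "up":
            from_up += 1
            stayed_up += cur == "up"
        else:
            from_down += 1
            came_up += cur == "up"
    if states[-1] == "up":
        return stayed_up / from_up if from_up else None
    return came_up / from_down if from_down else None

log = [("hostA", 0, "up"), ("hostA", 300, "up"), ("hostA", 600, "down"),
       ("hostA", 900, "up"), ("hostB", 0, "down"), ("hostB", 300, "up")]
print(availability_fraction(log))    # {'hostA': 0.75, 'hostB': 0.5}
print(predict_next(log, "hostA"))    # 0.5: hostA is up and stayed up half the time
```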

    A technological overview of the guifi.net community network

    This work presents a technological analysis of guifi.net, a free, neutral, and open-access community network. Guifi.net consists of more than 27,000 operational nodes, which makes it the world’s largest community network in terms of the number of nodes and coverage area. This paper describes the characteristics of the network, the link-level topology, its growth over a decade, and its resilience in terms of availability and reachability of network nodes. Our study is based on open data published by guifi.net regarding its nodes and wireless links, monitoring information, community database, and web portal. The data includes historical information that covers the lifetime of the network. The scale and diversity of the network require a separate analysis of subsets of the entire dataset, by area or by separating the core from the leaf nodes. This reveals some diversity in local characteristics caused by several demographic, geographic, technological, and network design factors. We focus on the following aspects: technological network diversity, topology characteristics, evolution of the network over time, analysis of robustness, and its effect on networking service availability. In addition, we analyse how the community, the technology used, the geographical region where the network is deployed, and its self-organised structure shape the network properties and determine its strengths and weaknesses. The study demonstrates that the guifi.net community network is diverse in its technological choices for hardware, link protocols, and channels, and uses a combination of routing protocols, yet provides a common private IP network. The graph topology follows a power-law distribution for links in regions up to a few thousand km², limited to the scope of wireless links. Network growth has two aspects: geographic growth of the network core using long-distance wireless or fibre links, and local growth in density with low-cost leaf nodes. The resilience of the network, derived from node uptime and the structure of the graph, varies across regions: leaf nodes are more fragile than core nodes, and the graph shows varying degrees of resilience to random failures (such as natural causes) or coordinated attacks, depending on network planning, structure, and maturity. The guifi.net community network results from a loosely coupled and decentralised organic growth that exhibits large local differences, diverse growth, and maturity under a common community license and social network.
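
    A minimal sketch of the robustness analysis mentioned above: comparing how the largest connected component of a topology shrinks under random node failures versus targeted (highest-degree) removals. It assumes the networkx library and a synthetic preferential-attachment graph; an analysis of guifi.net itself would instead load the nodes and wireless links from its published open data.

```python
import random
import networkx as nx

def degrade(g, order, steps=5):
    """Remove nodes in the given order; report the size of the largest connected
    component (as a fraction of the original node count) after each batch."""
    n0 = g.number_of_nodes()
    g = g.copy()
    batch = max(1, len(order) // steps)
    fractions = []
    for i in range(0, len(order), batch):
        g.remove_nodes_from(order[i:i + batch])
        largest = max((len(c) for c in nx.connected_components(g)), default=0)
        fractions.append(round(largest / n0, 3))
    return fractions

# Synthetic stand-in for a community-network topology (illustrative assumption);
# a preferential-attachment graph roughly mimics the reported power-law links.
g = nx.barabasi_albert_graph(n=1000, m=2, seed=42)

half = g.number_of_nodes() // 2
random_order = random.Random(0).sample(list(g.nodes), k=half)
targeted_order = [n for n, _deg in sorted(g.degree, key=lambda x: -x[1])][:half]

print("random failures :", degrade(g, random_order))     # degrades gracefully
print("targeted attacks:", degrade(g, targeted_order))   # collapses much faster
```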

    A Framework for anonymous background data delivery and feedback

    The industry’s current methods of collecting background data, such as diagnostic and usage information, are often opaque and require users to place considerable trust in the entity receiving the data. For vendors, having a centralized database of potentially sensitive data is a privacy-protection headache and a potential liability should a breach of that database occur. Unfortunately, high-profile privacy failures are not uncommon, so many individuals and companies are understandably skeptical and choose not to contribute any information. This is unfortunate, since the data could be used to improve reliability, strengthen security, or support valuable academic research into real-world usage patterns. We propose, implement, and evaluate a framework for non-realtime anonymous data collection, aggregation for analysis, and feedback. Departing from the usual “trusted core” approach, we aim to maintain reporters’ anonymity even if the centralized part of the system is compromised. We design a peer-to-peer mix network and its protocol that are tuned to the properties of background diagnostic traffic. Our system delivers data to a centralized repository while maintaining (i) source anonymity, (ii) privacy in transit, and (iii) the ability to provide analysis feedback to the source. By removing the core’s ability to identify the source of data and to track users over time, we drastically reduce its attractiveness as a potential attack target and allow vendors to make concrete and verifiable privacy and anonymity claims.
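
    A minimal sketch of the layered ("onion-style") wrapping that peer-to-peer mix networks rely on, using symmetric Fernet keys from the cryptography library to stand in for each relay's key material. The relay names, path selection, and message format are illustrative assumptions; the framework's actual protocol is tuned to background diagnostic traffic and differs in detail.

```python
import json
from cryptography.fernet import Fernet

# Hypothetical relay keys. In a deployed mix network each relay holds its own
# key and the sender sees only public key material; this is an illustration.
PATH = ["relayA", "relayB", "relayC"]
KEYS = {r: Fernet.generate_key() for r in PATH}

def wrap(report: dict) -> bytes:
    """Build an onion: each relay can strip exactly one encryption layer and
    learns only the next hop, never the original reporter."""
    inner = json.dumps(report)                       # payload for the repository
    for i, relay in reversed(list(enumerate(PATH))):
        next_hop = PATH[i + 1] if i + 1 < len(PATH) else "repository"
        layer = json.dumps({"next": next_hop, "inner": inner})
        inner = Fernet(KEYS[relay]).encrypt(layer.encode()).decode()
    return inner.encode()

def relay_step(relay: str, blob: bytes):
    """What a single relay does: peel its own layer, return (next hop, inner blob)."""
    layer = json.loads(Fernet(KEYS[relay]).decrypt(blob))
    return layer["next"], layer["inner"].encode()

# Walk a report through the path to check the layering.
blob = wrap({"app": "example", "crash_count": 3})
hop = PATH[0]
while hop != "repository":
    hop, blob = relay_step(hop, blob)
print(json.loads(blob))   # {'app': 'example', 'crash_count': 3}
```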

    Failure analysis and reliability -aware resource allocation of parallel applications in High Performance Computing systems

    The demand for more computational power to solve complex scientific problems has been driving the physical size of High Performance Computing (HPC) systems to hundreds or thousands of nodes. Uninterrupted execution of large-scale parallel applications naturally becomes a major challenge because a single node failure interrupts the entire application, and the probability of a job completing decreases as the number of nodes increases. Accurate knowledge of an HPC system's reliability enables runtime systems, such as resource managers, and applications to minimize performance loss due to random failures while also providing better Quality of Service (QoS) to computational users. This dissertation makes three major contributions to reliability evaluation and resource management in HPC systems. First, we study the failure properties of HPC systems and observe that the Times To Failure (TTFs) of individual compute nodes follow a distribution with a time-varying failure rate, such as the Weibull distribution. We then propose a model for the TTF distribution of a system of k independent nodes when individual nodes exhibit time-varying failure rates. Based on the proposed TTF model, we develop reliability-aware resource allocation algorithms and evaluate them on actual parallel workloads and failure data from an HPC system. Our observations indicate that applying a time-varying, failure-rate-based reliability function combined with simple heuristics reduces the performance loss due to unexpected failures by as much as 30 to 53 percent. Finally, we study the effect of reliability with respect to the number of nodes and propose a reliability-aware optimal k-node allocation algorithm for large-scale parallel applications. Our simulation results comparing the optimal k-node algorithm indicate that choosing the number of nodes for large-scale parallel applications based on the reliability of compute nodes can reduce both the overall completion time and the wasted time, since the optimal k may be smaller than the total number of nodes in the system.
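
    A minimal sketch, under assumed parameters, of the reliability reasoning described above: with independent nodes whose times to failure follow a Weibull distribution, the probability that all k allocated nodes survive a run of length t is the product of the per-node reliabilities exp(-(t/η)^β), so adding nodes shortens the ideal runtime but raises the chance of a failure-induced restart. The shape and scale values and the restart-on-failure job model are illustrative assumptions, not the dissertation's fitted model.

```python
import math

# Illustrative Weibull TTF parameters (shape < 1 gives the decreasing hazard
# often seen for compute nodes); these are assumptions, not fitted values.
BETA, ETA_HOURS = 0.7, 2000.0

def node_reliability(t_hours):
    """P(a single node survives t hours) under a Weibull time-to-failure."""
    return math.exp(-((t_hours / ETA_HOURS) ** BETA))

def job_success_prob(t_hours, k):
    """P(all k independent nodes survive a job of length t)."""
    return node_reliability(t_hours) ** k

def expected_completion(work_node_hours, k):
    """Crude expectation assuming the whole job restarts from scratch on any
    node failure (geometric number of attempts, no checkpointing)."""
    t = work_node_hours / k                 # ideal parallel runtime on k nodes
    return t / job_success_prob(t, k)

work = 1000.0                               # node-hours of parallel work
candidates = (8, 16, 32, 64, 128, 256, 512, 1024)
for k in candidates:
    print(f"k={k:5d}  runtime={work / k:7.2f} h  "
          f"P(no failure)={job_success_prob(work / k, k):.3f}  "
          f"E[completion]={expected_completion(work, k):8.1f} h")
best = min(candidates, key=lambda k: expected_completion(work, k))
print("reliability-aware choice of k:", best)   # smaller than the 1024 available
```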

    A Bird's Eye View on the I2P Anonymous File-sharing Environment

    Anonymous communications have been gaining more and more interest from Internet users as privacy and anonymity problems have emerged. Among anonymity-enabled services, anonymous file-sharing is one of the most active and continues to grow. Large-scale monitoring of these systems allows us to grasp how they behave, which types of data are shared among users, and the overall behaviour of the system. But does large-scale monitoring jeopardize the system's anonymity? In this work we present the first large-scale monitoring architecture and experiments on the I2P network, a low-latency message-oriented anonymous network. We characterize the file-sharing environment within I2P and evaluate whether this monitoring affects the anonymity provided by the network. We show that most activities within the network are file-sharing oriented, along with anonymous web hosting. We assess the wide geographical distribution of nodes and the network's popularity. We also demonstrate that group-based profiling is feasible on this particular network.

    Building the Future Internet through FIRE

    The Internet as we know it today is the result of continuous efforts to improve network communications, end-user services, computational processes, and information technology infrastructures. The Internet has become a critical infrastructure for humanity, offering complex networking services and end-user applications that together have transformed nearly every aspect of our lives, especially economic ones. Recently, with the advent of new paradigms, progress in wireless technology, sensor networks, and information systems, and the inexorable shift towards an everything-connected paradigm, first known as the Internet of Things and more recently envisioned as the Internet of Everything, a data-driven society has emerged. In a data-driven society, productivity, knowledge, and experience depend on increasingly open, dynamic, interdependent, and complex Internet services. The challenge in designing the Future Internet is to build robust enabling technologies, to implement and deploy adaptive systems, and to create business opportunities in the face of increasing uncertainty and emergent systemic behaviors where humans and machines cooperate seamlessly.

    High Performance Network Evaluation and Testing

    The Sensor Network Workbench: Towards Functional Specification, Verification and Deployment of Constrained Distributed Systems

    As the commoditization of sensing, actuation, and communication hardware increases, so does the potential for dynamically tasked sense-and-respond networked systems (i.e., Sensor Networks or SNs) to replace existing disjoint and inflexible special-purpose deployments (closed-circuit security video, anti-theft sensors, etc.). While various solutions have emerged for many individual SN-centric challenges (e.g., power management, communication protocols, role assignment), perhaps the largest remaining obstacle to widespread SN deployment is that those who wish to deploy, utilize, and maintain a programmable Sensor Network lack the programming and systems expertise to do so. The contributions of this thesis center on the design, development, and deployment of the SN Workbench (snBench). snBench embodies an accessible, modular programming platform coupled with a flexible and extensible run-time system that, together, support the entire life-cycle of distributed sensory services. As it is impossible to find a one-size-fits-all programming interface, this work advocates the use of tiered layers of abstraction that enable a variety of high-level, domain-specific languages to be compiled to a common (thin-waist) tasking language; this common tasking language is statically verified and can subsequently be re-translated, if needed, for execution on a wide variety of hardware platforms. snBench provides: (1) a common sensory tasking language (Instruction Set Architecture) powerful enough to express complex SN services, yet simple enough to be executed by highly constrained resources with soft real-time constraints; (2) a prototype high-level language (and corresponding compiler) to illustrate the utility of the common tasking language and the tiered programming approach in this domain; (3) an execution environment and run-time support infrastructure that abstract a collection of heterogeneous resources into a single virtual Sensor Network, tasked via this common tasking language; and (4) novel formal methods (i.e., static analysis techniques) that verify safety properties and infer implicit resource constraints to facilitate resource allocation for new services. This thesis presents these components in detail, as well as two case studies: the use of snBench to integrate physical and wireless network security, and the use of snBench as the foundation for semester-long student projects in a graduate-level Software Engineering course.
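
    A minimal sketch of the idea behind a statically verified common tasking language: a tiny, invented instruction list is analysed before deployment to infer the sensing capabilities and a rough memory bound it needs, and a node that cannot satisfy them is rejected up front. The opcodes, cost model, and capability names are hypothetical and are not snBench's actual ISA or analysis.

```python
# Hypothetical micro-ISA: (opcode, argument) pairs for a sense-and-respond task.
TASK = [
    ("sample", "camera"),        # acquire a frame
    ("sample", "motion"),        # acquire a motion-sensor reading
    ("filter", "threshold"),     # drop readings below a threshold
    ("aggregate", "window:30"),  # keep a 30-sample sliding window
    ("notify", "gateway"),       # push an event upstream
]

# Invented static cost model: per opcode, the capability it needs and an
# estimate of the bytes of state it keeps on the node.
def op_cost(op, arg):
    if op == "sample":
        return {arg}, 64
    if op == "filter":
        return set(), 16
    if op == "aggregate":
        return set(), 32 * int(arg.split(":")[1])
    if op == "notify":
        return {"radio"}, 128
    raise ValueError(f"unknown opcode: {op}")   # safety: reject unverifiable programs

def analyze(task):
    """Statically infer the capabilities and memory bound a task requires."""
    caps, mem = set(), 0
    for op, arg in task:
        c, m = op_cost(op, arg)
        caps |= c
        mem += m
    return caps, mem

def can_deploy(task, node_caps, node_mem):
    """Admission check performed before any code runs on the node."""
    caps, mem = analyze(task)
    return caps <= set(node_caps) and mem <= node_mem

caps, mem = analyze(TASK)
print("required capabilities:", caps, "| estimated state:", mem, "bytes")
print("fits camera node?", can_deploy(TASK, {"camera", "motion", "radio"}, 4096))
print("fits bare node?  ", can_deploy(TASK, {"motion", "radio"}, 512))
```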