102 research outputs found

    Scalable discovery of networked data : Algorithms, Infrastructure, Applications

    Get PDF
    Harmelen, F.A.H. van [Promotor]Siebes, R.M. [Copromotor

    Towards a Framework for DHT Distributed Computing

    Get PDF
    Distributed Hash Tables (DHTs) are protocols and frameworks used by peer-to-peer (P2P) systems. They are used as the organizational backbone for many P2P file-sharing systems due to their scalability, fault-tolerance, and load-balancing properties. These same properties are highly desirable in a distributed computing environment, especially one that wants to use heterogeneous components. We show that DHTs can be used not only as the framework to build a P2P file-sharing service, but as a P2P distributed computing platform. We propose creating a P2P distributed computing framework using distributed hash tables, based on our prototype system ChordReduce. This framework would make it simple and efficient for developers to create their own distributed computing applications. Unlike Hadoop and similar MapReduce frameworks, our framework can be used both in both the context of a datacenter or as part of a P2P computing platform. This opens up new possibilities for building platforms to distributed computing problems. One advantage our system will have is an autonomous load-balancing mechanism. Nodes will be able to independently acquire work from other nodes in the network, rather than sitting idle. More powerful nodes in the network will be able use the mechanism to acquire more work, exploiting the heterogeneity of the network. By utilizing the load-balancing algorithm, a datacenter could easily leverage additional P2P resources at runtime on an as needed basis. Our framework will allow MapReduce-like or distributed machine learning platforms to be easily deployed in a greater variety of contexts

    Self-management for large-scale distributed systems

    Get PDF
    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables the tradeoff between high availability and data consistency. Using majority allows avoiding potential drawbacks of a master-based consistency control, namely, a single-point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using state-space model that enables to trade-off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control

    The 5th Conference of PhD Students in Computer Science

    Get PDF

    Big Data Processing with Peer-To-Peer Architectures

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Global connectivity architecture of mobile personal devices

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 193-207).The Internet's architecture, designed in the days of large, stationary computers tended by technically savvy and accountable administrators, fails to meet the demands of the emerging ubiquitous computing era. Nontechnical users now routinely own multiple personal devices, many of them mobile, and need to share information securely among them using interactive, delay-sensitive applications.Unmanaged Internet Architecture (UIA) is a novel, incrementally deployable network architecture for modern personal devices, which reconsiders three architectural cornerstones: naming, routing, and transport. UIA augments the Internet's global name system with a personal name system, enabling users to build personal administrative groups easily and intuitively, to establish secure bindings between his devices and with other users' devices, and to name his devices and his friends much like using a cell phone's address book. To connect personal devices reliably, even while mobile, behind NATs or firewalls, or connected via isolated ad hoc networks, UIA gives each device a persistent, location-independent identity, and builds an overlay routing service atop IP to resolve and route among these identities. Finally, to support today's interactive applications built using concurrent transactions and delay-sensitive media streams, UIA introduces a new structured stream transport abstraction, which solves the efficiency and responsiveness problems of TCP streams and the functionality limitations of UDP datagrams. Preliminary protocol designs and implementations demonstrate UIA's features and benefits. A personal naming prototype supports easy and portable group management, allowing use of personal names alongside global names in unmodified Internet applications. A prototype overlay router leverages the naming layer's social network to provide efficient ad hoc connectivity in restricted but important common-case scenarios.(cont) Simulations of more general routing protocols--one inspired by distributed hash tables, one based on recent compact routing theory--explore promising generalizations to UIA's overlay routing. A library-based prototype of UIA's structured stream transport enables incremental deployment in either OS infrastructure or applications, and demonstrates the responsiveness benefits of the new transport abstraction via dynamic prioritization of interactive web downloads. Finally, an exposition and experimental evaluation of NAT traversal techniques provides insight into routing optimizations useful in UIA and elsewhere.by Bryan Alexander Ford.Ph.D

    D.1.3 – Protocols for emergent localities

    Get PDF
    GDD_HCERES2020This report presents two contributions that illustrate the potential of emerging-locality protocols in large-scale decentralized systems, in two areas of decentralized social computing: recommendation, and eventual consistency of mutable data structures. The first contribution consists of a framework supporting the development of dynamically adaptive decen-tralised recommendation systems. Decentralised recommenders have been proposed to deliver privacy-preserving, personalised and highly scalable on-line recommendations. Current implementations tend, however, to rely on a hard-wired similarity metric that cannot adapt. This constitutes a strong limitation in the face of evolving needs. Our framework address this through a decentralised form of adaptation, in which individual nodes can independently select, and update their own recommendation algorithm, while still collectively contributing to the overall system's mission. Our second contribution addresses the growing demand for differentiated consistency requirements in large-scale applications. A large number of today's applications rely on Eventual Consistency, a consistency model that emphasizes liveness over safety. Designers generally adopt this consistency model uniformly throughout a distributed system due to its ability to scale as the number of users or devices grows larger. But this clashes with the need for differentiated consistency requirements. In this contribution, we address this need by introducing UPS, a novel consistency mechanism that offers differentiated eventual consistency and delivery speed by working in pair with a two-phase epidemic broadcast protocol. We propose a closed-form analysis of our approach's delivery speed, and we evaluate our complete protocol experimentally on a simulated network of one million nodes. To measure the consistency trade-off, we formally define a novel and scalable consistency metric operating at runtime
    corecore