
    An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks

    In the past few years, Online Social Networks (OSNs) have spread dramatically across the world. Facebook [4], one of the largest worldwide OSNs, has 1.35 billion users, 82.2% of whom are outside the US [36]. The browsing and posting interactions (text content) between OSN users lead to user data reads (visits) and writes (updates) in OSN datacenters; Facebook now serves a billion reads and tens of millions of writes per second [37]. Facebook has also become one of the top Internet traffic sources [36] by sharing a tremendous number of large multimedia files, including photos and videos. The servers in datacenters have limited resources (e.g., bandwidth) with which to supply latency-efficient service for multimedia file sharing among rapidly growing numbers of users worldwide. Most online applications operate under soft real-time constraints (e.g., ≤ 300 ms latency) for good user experience, and longer service latency translates directly into lower income. Service latency is therefore a very important Quality of Service (QoS) requirement for the OSN as a web service, since it affects the OSN's revenue and user experience. Also, to increase revenue, OSN service providers need to constrain capital investment, operation costs, and resource (bandwidth) usage costs. It is thus critical for an OSN to supply guaranteed QoS for both text and multimedia content while minimizing its costs. To achieve this goal, this dissertation addresses three problems: i) data distribution among datacenters: how to allocate data (text content) among data servers with low service latency and minimized inter-datacenter network load; ii) efficient multimedia file sharing: how to enable the servers in datacenters to efficiently share multimedia files among users; and iii) cost-minimized data allocation among cloud storages: how to save infrastructure (datacenter) capital investment and operation costs by leveraging commercial cloud storage services.

    Data distribution among datacenters. To serve text content, the new OSN model, which deploys datacenters globally, helps reduce service latency to worldwide distributed users and relieves the load on existing datacenters. However, it incurs a higher inter-datacenter communication load: each datacenter holds a full copy of all data, and the master datacenter updates all the others, generating tremendous load in this new model. Distributed data storage, which stores a user's data only in his/her geographically closest datacenter, mitigates the problem, but frequent interactions between distant users then lead to frequent inter-datacenter communication and hence long service latencies. OSNs therefore need an algorithm that allocates data among datacenters with minimized network load and low service latency.

    Efficient multimedia file sharing. To serve multimedia file sharing to a rapidly growing user population, the file distribution method should be scalable and cost-efficient, e.g., by minimizing the bandwidth usage of the centralized servers. P2P networks have been widely used for file sharing among large numbers of users [58, 131] and meet both the scalability and cost-efficiency requirements. However, without fully utilizing the altruism and trust among friends in OSNs, current P2P-assisted file sharing systems depend on strangers or anonymous users to distribute files, which degrades their performance due to users' selfish and malicious behaviors. OSNs therefore need a cost-efficient and trustworthy P2P-assisted file sharing system to serve multimedia content distribution.
    Cost-minimized data allocation among cloud storages. The new trend of OSNs requires building worldwide datacenters, which introduces a large amount of capital investment and maintenance cost. To save the capital expenditure of building and maintaining hardware infrastructure, OSNs can leverage the storage services of multiple Cloud Service Providers (CSPs) with existing worldwide distributed datacenters [30, 125, 126]. These datacenters offer different Get/Put latencies and unit prices for resource utilization and reservation. Thus, when selecting among different CSPs' datacenters, an OSN, as a cloud customer running a globally distributed application, faces two challenges: i) how to allocate data to worldwide datacenters to satisfy the application's SLA (service level agreement) requirements, including both data retrieval latency and availability; and ii) how to allocate data and reserve resources in datacenters belonging to different CSPs to minimize the payment cost. OSNs therefore need a data allocation system that distributes data among CSPs' datacenters with cost minimization and an SLA guarantee.

    In all, an OSN needs an efficient holistic data distribution and storage solution that minimizes its network load and cost while supplying guaranteed QoS for both text and multimedia content. In this dissertation, we propose methods to solve each of the aforementioned challenges. First, we verify the benefits of the new trend of OSNs and present typical OSN properties that lay the basis of our design. We then propose the Selective Data replication mechanism in Distributed Datacenters (SD3) to allocate user data among geographically distributed datacenters. In SD3, a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data (e.g., status updates, friend posts) for replication, making sure that a replica always reduces inter-datacenter communication.
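    To make SD3's selection rule concrete, here is a minimal sketch, under assumed parameters, of the benefit test a datacenter might apply: replicate remote data only if the inter-datacenter traffic saved by serving visits locally exceeds the traffic added by propagating updates to the new replica. The function and weight names are hypothetical, not SD3's actual algorithm.

```python
def should_replicate(visit_rate: float, update_rate: float,
                     read_cost: float = 1.0, update_cost: float = 1.0) -> bool:
    """Decide whether replicating a remote user's data locally reduces
    inter-datacenter traffic. Illustrative only: SD3 also atomizes a
    user's data by type and measures rates over a time window."""
    saved = visit_rate * read_cost      # remote reads that become local
    added = update_rate * update_cost   # updates that must reach the replica
    return saved > added

# Frequently visited, rarely updated data is worth replicating;
# write-heavy data is not.
assert should_replicate(visit_rate=120, update_rate=30)
assert not should_replicate(visit_rate=50, update_rate=80)
```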
    Second, we analyze a BitTorrent file sharing trace, which proves the necessity of proximity- and interest-aware clustering. Based on the trace study and the OSN properties, to address the second problem, we propose a SoCial Network integrated P2P file sharing system for enhanced Efficiency and Trustworthiness (SOCNET) to fully and cooperatively leverage the common-interest, geographical-closeness, and trust properties of OSN friends. SOCNET uses a hierarchical distributed hash table (DHT) to cluster common-interest nodes, further clusters geographically close nodes into subclusters, and connects the nodes in a subcluster with social links. Thus, when queries travel along trustable social links, they also gain a higher probability of being successfully resolved by proximity-close nodes, simultaneously enhancing efficiency and trustworthiness.

    Third, to handle the third problem, we model the cost minimization problem under SLA constraints using integer programming. Based on this system model, we propose an Economical and SLA-guaranteed cloud Storage Service (ES3), which finds a data allocation and resource reservation schedule with cost minimization and an SLA guarantee. ES3 incorporates (1) a data allocation and reservation algorithm, which allocates each data item to a datacenter and determines the reservation amount on datacenters by leveraging all the pricing policies; (2) a genetic-algorithm-based data allocation adjustment approach, which keeps data Get/Put rates stable in each datacenter to maximize the reservation benefit; and (3) a dynamic request redirection algorithm, which redirects a data request from an over-utilized datacenter to an under-utilized datacenter with sufficient reserved resources when the request rate varies greatly, to further reduce the payment.

    Finally, we conducted trace-driven experiments on a distributed testbed, PlanetLab, and on real commercial cloud storage (Amazon S3, Windows Azure Storage, and Google Cloud Storage) to demonstrate the efficiency and effectiveness of our proposed systems in comparison with other systems. The results show that our systems outperform the others in network savings and data distribution efficiency.
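    Component (3) above can be pictured with a short, hedged sketch: when a datacenter's request rate exceeds its reservation, redirect to the least-utilized peer datacenter that still has reserved headroom. The fields and the selection rule are illustrative assumptions, not ES3's actual interface, which also weighs pricing and SLA latency.

```python
from dataclasses import dataclass

@dataclass
class Datacenter:
    name: str
    reserved: float       # reserved request capacity (requests/s)
    current_rate: float   # currently observed request rate (requests/s)

    @property
    def utilization(self) -> float:
        return self.current_rate / self.reserved

def redirect_target(overloaded: Datacenter,
                    candidates: list[Datacenter]) -> Datacenter | None:
    """Pick the least-utilized datacenter with spare reserved capacity."""
    spare = [d for d in candidates
             if d is not overloaded and d.utilization < 1.0]
    return min(spare, key=lambda d: d.utilization, default=None)

dcs = [Datacenter("us-east", 1000, 1300), Datacenter("eu-west", 800, 400)]
target = redirect_target(dcs[0], dcs)
print(target.name)  # eu-west: it still has reserved headroom
```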

    Efficient and Flexible Search in Large Scale Distributed Systems

    Peer-to-peer (P2P) technology has triggered a wide range of distributed systems beyond simple file sharing: distributed XML databases, distributed computing, server-less web publishing, and networked resource/service sharing are only a few to name. Despite the diversity in applications, these systems share a common problem regarding the search and discovery of information. This commonality stems from the transient node population and the volatile information content of the participating nodes. In such a dynamic environment, users cannot be expected to have exact information about the objects available in the system; rather, queries are based on partial information, which requires the search mechanism to be flexible. On the other hand, to scale with network size, the search mechanism must be bandwidth-efficient. Since the advent of P2P technology, experts from industry and academia have proposed a number of search techniques, none of which provides a satisfactory solution to the conflicting requirements of search efficiency and flexibility. Structured search techniques, mostly Distributed Hash Table (DHT)-based, are bandwidth-efficient, while semi-structured and unstructured techniques are flexible; neither achieves both ends. This thesis defines the Distributed Pattern Matching (DPM) problem: to discover a pattern (i.e., a bit vector) using any subset of its 1-bits, under the assumption that the patterns are distributed across a large population of networked nodes. The search problem in many distributed systems can be reduced to the DPM problem. This thesis also presents two distinct search mechanisms, named Distributed Pattern Matching System (DPMS) and Plexus, for solving the DPM problem. DPMS is a semi-structured, hierarchical architecture aiming to discover a predefined number of matches by visiting a small number of nodes. Plexus, on the other hand, is a structured search mechanism based on the theory of Error Correcting Codes (ECC); its design goal is to discover all the matches by visiting a reasonable number of nodes.
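    The matching test at the heart of DPM is simple to state in code; the hard part, which DPMS and Plexus solve in different ways, is routing a query to the nodes whose patterns could match without flooding the network. A minimal illustration of the local test:

```python
def matches(query: int, pattern: int) -> bool:
    """True if every 1-bit of `query` is also set in `pattern`,
    i.e. the query uses a subset of the pattern's 1-bits."""
    return query & pattern == query

# A node advertising pattern 0b101101 is discoverable by any
# sub-pattern formed from its 1-bits:
assert matches(0b001100, 0b101101)
assert not matches(0b010010, 0b101101)  # query bits 1 and 4 unset in pattern
```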

    Multiple description image and video coding for P2P transmissions

    Peer-to-Peer (P2P) media streaming is nowadays a very attractive topic, because the bandwidth available to serve demanding content scales with the peer population. A key challenge, however, is making content distribution robust to peer transience. Multiple description coding (MDC) has proven very effective against packet losses, since it generates several descriptions and can reconstruct the original information from any number of descriptions that reach the decoder; multiple descriptions are therefore well suited to robust peer-to-peer media streaming. This dissertation not only confirms this, but also shows that varying the redundancy level of the descriptions on the fly can outperform keeping this parameter fixed, and that varying the bitrate on the fly outperforms varying the redundancy. Furthermore, when the redundancy and the bitrate were varied simultaneously, the combined variation proved more efficient when the packet loss is high. The experiments were carried out on an experimental test bed developed for this purpose at the NMCG lab of the University of Beira Interior. REGPROT, a video encoder developed by our research team, was used to split the video into multiple descriptions, which were then distributed among the peers in the test bed; upon a client's request, the encoder decoded the descriptions as they were received. Fundação para a Ciência e a Tecnologia (FCT)
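    A common way to picture MDC is temporal splitting: frames are dealt round-robin across descriptions, and any received subset still yields a playable, lower-quality sequence. The sketch below is this textbook illustration under simplifying assumptions, not REGPROT's actual coding scheme.

```python
def encode_descriptions(frames: list, n: int) -> list[dict]:
    """Split a frame sequence into n descriptions round-robin,
    keeping each frame's index for reassembly."""
    descriptions = [{} for _ in range(n)]
    for i, frame in enumerate(frames):
        descriptions[i % n][i] = frame
    return descriptions

def decode(received: list[dict], total_frames: int) -> list:
    """Reconstruct from ANY subset of descriptions; missing frames
    are concealed (crudely, by repeating the last decoded frame)."""
    merged = {}
    for d in received:
        merged.update(d)
    out, last = [], None
    for i in range(total_frames):
        last = merged.get(i, last)   # stand-in for error concealment
        out.append(last)
    return out

frames = [f"frame{i}" for i in range(6)]
d0, d1, d2 = encode_descriptions(frames, 3)
print(decode([d0, d2], 6))  # playable even though description d1 was lost
```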

    Large-Scale Distributed Coalition Formation

    The CyberCraft project is an effort to construct a large-scale Distributed Multi-Agent System (DMAS) to provide autonomous Cyberspace defense and mission assurance for the DoD. It employs a small but flexible agent structure that is dynamically reconfigurable to accommodate new tasks and policies. This document describes research into developing protocols and algorithms to ensure continued mission execution in a system of one million or more agents, focusing on protocols for coalition formation and Command and Control. It begins by building large-scale routing algorithms for a hierarchical peer-to-peer structured overlay network, called Resource-Clustered Chord (RC-Chord). RC-Chord introduces the ability to efficiently locate agents by the resources they possess. Combined with a task model defined for CyberCraft, this technology feeds into an algorithm that constructs task coalitions in a large-scale DMAS. Experiments reveal the flexibility and effectiveness of these concepts for achieving maximum work throughput in a simulated CyberCraft environment.
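    A hedged sketch of the resource-clustering idea: agents advertising a resource register under that resource's position in the Chord identifier ring, so a coalition builder can resolve "which agents possess resource X?" with a single DHT lookup. The plain dictionary below stands in for Chord's distributed routing, and all names are illustrative rather than RC-Chord's actual protocol.

```python
import hashlib

RING_BITS = 16  # toy identifier space; Chord deployments use 160 bits

def chord_id(key: str) -> int:
    """Map a name onto the Chord identifier ring."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)

# In a real deployment this index is partitioned across nodes by ring
# position; here a local dict stands in for the DHT.
index: dict[int, set[str]] = {}

def advertise(agent: str, resource: str) -> None:
    index.setdefault(chord_id(resource), set()).add(agent)

def locate(resource: str) -> set[str]:
    return index.get(chord_id(resource), set())

advertise("agent-42", "packet-capture")
advertise("agent-77", "packet-capture")
print(locate("packet-capture"))  # {'agent-42', 'agent-77'}
```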

    Video-on-Demand over Internet: a survey of existing systems and solutions

    Video-on-Demand is a service where movies are delivered to distributed users with low delay and free interactivity. The traditional client/server architecture suffers from scalability issues when providing video streaming services, so many systems have been proposed, mostly based on peer-to-peer or hybrid server/peer-to-peer solutions, to address them. This work presents a survey of currently existing or proposed systems and solutions, based upon a subset of representative systems, and defines selection criteria by which these systems can be classified. These criteria take the form of common questions such as: is it video-on-demand or live streaming; is the architecture based on a content delivery network, peer-to-peer, or both; is the delivery overlay tree-based or mesh-based; is the system push-based or pull-based, single-stream or multi-stream; does it use data coding; and how do the clients choose their peers? Representative systems are briefly described to give a summarized overview of the proposed solutions, and four of them are analyzed in detail. Finally, the most promising solutions for future experiments are evaluated.
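    The survey's classification criteria map naturally onto a small record type; the sketch below encodes them with illustrative field names (the survey's own axes and terminology may differ).

```python
from dataclasses import dataclass
from enum import Enum

class Overlay(Enum):
    TREE = "tree"
    MESH = "mesh"

class Transfer(Enum):
    PUSH = "push"
    PULL = "pull"

@dataclass
class StreamingSystem:
    name: str
    vod: bool            # video-on-demand (True) vs. live streaming
    uses_cdn: bool       # has a content-delivery-network component
    uses_p2p: bool       # has a peer-to-peer component
    overlay: Overlay     # tree-based or mesh-based delivery
    transfer: Transfer   # push-based or pull-based
    multi_stream: bool   # multi-stream (True) vs. single-stream
    data_coding: bool    # uses data coding (e.g. MDC, network coding)

# Illustrative entry, not a claim from the survey:
example = StreamingSystem("ExampleVoD", vod=True, uses_cdn=False,
                          uses_p2p=True, overlay=Overlay.MESH,
                          transfer=Transfer.PULL, multi_stream=True,
                          data_coding=False)
```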

    A Peer-to-Peer Network Framework Utilising the Public Mobile Telephone Network

    P2P (Peer-to-Peer) technologies are well established and have now become accepted as a mainstream networking approach. However, the explosion of participating users has not been replicated within the mobile networking domain. Until recently, the lack of suitable hardware and wireless network infrastructure to support P2P activities was perceived as contributing to the problem. This has changed with the ready availability of handsets having ample processing resources and an almost ubiquitous mobile telephone network, coupled with a proliferation of software applications written for the more capable 'smartphone' handsets. P2P systems have not naturally integrated and evolved into the mobile telephone ecosystem the way 'client-server' operating techniques have. Yet as the number of clients for a particular mobile application increases, providing the 'server-side' data storage infrastructure becomes more onerous. P2P systems offer mobile telephone applications a way to circumvent this data storage issue by dispersing it across a network of the participating users' handsets. The main goal of this work was to produce a P2P application framework that supports developers in creating mobile telephone applications that use distributed storage. Effort was devoted to determining appropriate design requirements for a mobile-handset-based P2P system. Some of these requirements relate to the limitations of the host hardware, such as power consumption; others relate to the network upon which the handsets operate, such as connectivity. The thesis reviews current P2P technologies to assess which were viable foundations for the framework; the aim was not to re-invent a P2P system design, but rather to adapt an existing one for mobile operation. Built upon the foundations of a prototype application, the P2P framework resulting from modifications and enhancements grants access, via a simple API (Application Programming Interface), to a subset of Nokia 'smartphone' devices. Unhindered operation across all mobile telephone networks is made possible through a proprietary application implementing NAT (Network Address Translation) traversal techniques. Recognising that handsets operate with limited resources, further optimisation of the P2P framework was also investigated; energy consumption was chosen for further examination because of its impact on handset participation time. This work has proven that operating applications in conjunction with a P2P data storage framework, connected via the mobile telephone network, is technically feasible. It also shows that opportunity remains for further research to realise the full potential of this data storage technique.
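    Given the framework's goal of exposing distributed storage through a simple API, the following is a hypothetical sketch of what such an interface could look like; the class and method names are invented for illustration and are not the thesis's actual API.

```python
from abc import ABC, abstractmethod

class DistributedStore(ABC):
    """Hypothetical shape of a handset-based P2P storage API."""

    @abstractmethod
    def put(self, key: str, value: bytes, replicas: int = 3) -> None:
        """Store a value on several participating handsets so the data
        survives individual peers leaving the network."""

    @abstractmethod
    def get(self, key: str) -> bytes | None:
        """Fetch the value from whichever replica is reachable."""

# An application built on such a framework would use storage without
# operating any server-side infrastructure of its own, e.g.:
#   store.put("user/123/profile", profile_bytes)
#   profile = store.get("user/123/profile")
```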

    The Future of Networking is the Future of Big Data

    Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows, where each of these communities generates, stores, and uses massive datasets that reach into terabytes and petabytes and are projected soon to reach exabytes. These communities are also increasingly moving towards a global collaborative model in which scientists routinely exchange significant amounts of data. The sheer volume of data, and the complexities of maintaining, transferring, and using it, continue to push the limits of current technologies in multiple dimensions: storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources and intelligent routing of requests and data. Finally, as more communities move towards data-intensive, connected workflows, adopting a service model in which the network provides some of the common services reduces not only application complexity but also the need for duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name-based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network. This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the design of in-network protocols for big-data science.
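    Since the thesis touches on scientific data naming, a small sketch helps picture NDN's name-based access: consumers request data by hierarchical name rather than by host address, so a dataset's catalog metadata can be mapped directly into the name. The component order below is an invented illustrative schema, not the naming scheme the thesis proposes.

```python
def ndn_name(project: str, model: str, experiment: str,
             variable: str, time_range: str) -> str:
    """Build a hierarchical, NDN-style name from dataset metadata."""
    components = [project, model, experiment, variable, time_range]
    return "/" + "/".join(c.strip("/") for c in components)

# Consumers fetch by name; the network decides where the bytes come
# from (an in-network cache, a replica, or the original producer).
print(ndn_name("CMIP5", "CESM1", "rcp85", "tas", "2006-2100"))
# -> /CMIP5/CESM1/rcp85/tas/2006-2100
```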

    Application of overlay techniques to network monitoring

    Measurement and monitoring are important for the correct and efficient operation of a network, since these activities provide reliable information and accurate analysis for characterizing and troubleshooting a network's performance. The focus of network measurement is to measure the volume and types of traffic on a particular network and to record the raw measurement results; the focus of network monitoring is to initiate measurement tasks, collect raw measurement results, and report aggregated outcomes. Network systems are continuously evolving: besides incremental change to accommodate new devices, more drastic changes occur to accommodate new applications, such as overlay-based content delivery networks. As a consequence, a network can experience significant increases in size and significant levels of long-range, coordinated, distributed activity; furthermore, heterogeneous network technologies, services, and applications coexist and interact. Reliance upon traditional, point-to-point, ad hoc measurements to manage such networks is becoming increasingly tenuous. In particular, correlated, simultaneous one-way measurements are needed, as is the ability to access measurement information stored throughout the network of interest. To address these new challenges, this dissertation proposes OverMon, a new paradigm for edge-to-edge network monitoring systems built through the application of overlay techniques. Of particular interest, the significant network overhead caused by conventional overlay techniques is addressed by constructing overlay networks with topology awareness: the topology information is derived from interior gateway protocol (IGP) traffic, i.e., OSPF traffic, thus eliminating all overlay-maintenance network overhead. Through a prototype that uses overlays to initiate measurement tasks and to retrieve measurement results, systematic evaluation has been conducted to demonstrate the feasibility and functionality of OverMon. The measurement results show that OverMon achieves good performance in scalability, flexibility, and extensibility, which are important in addressing the new challenges arising from network system evolution. This work therefore contributes an innovative approach of applying overlay techniques to solve realistic network monitoring problems, and provides valuable first-hand experience in building and evaluating such a distributed system.
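    The topology-awareness idea admits a brief sketch: derive a graph from passively observed OSPF link-state traffic, then choose overlay neighbors that are close in the underlay, so the overlay needs no probing traffic of its own to maintain. The graph encoding and neighbor-selection rule below are illustrative assumptions, not OverMon's actual design.

```python
import heapq

def shortest_paths(graph: dict[str, dict[str, int]], src: str) -> dict[str, int]:
    """Dijkstra over a topology graph derived from OSPF link-state data."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def overlay_neighbors(graph: dict[str, dict[str, int]],
                      node: str, k: int = 2) -> list[str]:
    """Pick the k underlay-closest monitors as overlay neighbors."""
    dist = shortest_paths(graph, node)
    ranked = sorted((d, n) for n, d in dist.items() if n != node)
    return [n for _, n in ranked[:k]]

ospf_topology = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 2},
                 "C": {"A": 5, "B": 2}}
print(overlay_neighbors(ospf_topology, "A"))  # ['B', 'C']
```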

    Private and censorship-resistant communication over public networks

    Society's increasing reliance on digital communication networks is creating unprecedented opportunities for wholesale surveillance and censorship. This thesis investigates the use of public networks such as the Internet to build robust, private communication systems that can resist monitoring and attacks by powerful adversaries such as national governments. We sketch the design of a censorship-resistant communication system based on peer-to-peer Internet overlays in which the participants only communicate directly with people they know and trust. This 'friend-to-friend' approach protects the participants' privacy, but it also presents two significant challenges. The first is that, as with any peer-to-peer overlay, the users of the system must collectively provide the resources necessary for its operation; some users might prefer to use the system without contributing resources equal to those they consume, and if many users do so, the system may not be able to survive. To address this challenge we present a new game theoretic model of the problem of encouraging cooperation between selfish actors under conditions of scarcity, and develop a strategy for the game that provides rational incentives for cooperation under a wide range of conditions. The second challenge is that the structure of a friend-to-friend overlay may reveal the users' social relationships to an adversary monitoring the underlying network. To conceal their sensitive relationships from the adversary, the users must be able to communicate indirectly across the overlay in a way that resists monitoring and attacks by other participants. We address this second challenge by developing two new routing protocols that robustly deliver messages across networks with unknown topologies, without revealing the identities of the communication endpoints to intermediate nodes or vice versa. The protocols make use of a novel unforgeable acknowledgement mechanism that proves that a message has been delivered without identifying the source or destination of the message or the path by which it was delivered. One of the routing protocols is shown to be robust to attacks by malicious participants, while the other provides rational incentives for selfish participants to cooperate in forwarding messages.
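    One plausible construction of such an unforgeable acknowledgement, sketched here as an assumption rather than the thesis's exact protocol: the sender attaches the hash of a MAC computed under a key shared with the receiver; relays store only the hash; the receiver's returned MAC then proves delivery to any relay holding the hash while revealing nothing about the endpoints or the path.

```python
import hashlib, hmac, os

def send(key: bytes, message: bytes) -> tuple[bytes, bytes]:
    """Sender attaches H(MAC_k(m)); relays store only this header."""
    ack = hmac.new(key, message, hashlib.sha256).digest()
    header = hashlib.sha256(ack).digest()
    return header, message

def acknowledge(key: bytes, message: bytes) -> bytes:
    """Receiver, sharing `key` with the sender, derives the ack."""
    return hmac.new(key, message, hashlib.sha256).digest()

def relay_verify(stored_header: bytes, ack: bytes) -> bool:
    """A relay checks the ack against the stored header without
    learning the key, the source, or the destination."""
    return hashlib.sha256(ack).digest() == stored_header

key = os.urandom(32)                     # pre-shared by the two endpoints
header, msg = send(key, b"covert hello")
assert relay_verify(header, acknowledge(key, msg))   # genuine ack verifies
assert not relay_verify(header, os.urandom(32))      # forgeries fail
```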