27 research outputs found

    Semantic Routing in Peer-to-Peer Systems

    Get PDF
    Currently search engines like Google, Yahoo and Excite are centralized, which means that all queries that users post are sent to some big servers (or server group) that handle them. In this way it is easy for the systems to relate IP-addresses with the queries posted from them. Clearly privacy is a problem here. Also censoring out certain information which is not 'appropriate' is simple, and shown in recent examples. To give more privacy to the users and make censoring information more difficult, Peer-to-Peer (P2P) systems are a good alternative to the centralized approach. In P2P systems the search functionality can be devided over a large group of autonomous computers (Peers), where each computer only has a very small piece of information instead of everything. Now the problem in such a distributed system is to make the search process efficient in terms of bandwith, storage, time and CPU usage. In this Ph.D. thesis, three approaches are described that try to reach goal of finding the short routes between seeker and providers with high efficiency. These routing algorithms are all applied on 'Semantic-Overlay-Networks' (SONs). In a SON, peers maintain pointers to semantically relevant peers based on content descriptions, which makes them able to choose the relevant peers for queries instead of, for example, choosing random peers. This work tries to show that decentralized search algorithms based on semantic routing are a good alternative to centralized approaches.Harmelen, F.A.H. van [Promotor

    Scalable discovery of networked data : Algorithms, Infrastructure, Applications

    Get PDF
    Harmelen, F.A.H. van [Promotor]Siebes, R.M. [Copromotor

    Confidential Data-Outsourcing and Self-Optimizing P2P-Networks: Coping with the Challenges of Multi-Party Systems

    Get PDF
    This work addresses the inherent lack of control and trust in Multi-Party Systems at the examples of the Database-as-a-Service (DaaS) scenario and public Distributed Hash Tables (DHTs). In the DaaS field, it is shown how confidential information in a database can be protected while still allowing the external storage provider to process incoming queries. For public DHTs, it is shown how these highly dynamic systems can be managed by facilitating monitoring, simulation, and self-adaptation

    Structures and Algorithms for Peer-to-Peer Cooperation

    Full text link
    Peer-to-peer overlay networks are distributed systems, without any hierarchical organization or centralized control. Peers form self-organizing overlay networks that are on top of the Internet. Both parts of this thesis deal with peer-to-peer overlay networks, the first part with unstructured ones used to build a large scale Networked Virtual Environment. The second part gives insights on how the users of a real life structured peer-to-peer network behave, and how well the proposed algorithms for publishing and retrieving data work. Moreover we analyze the security (holes) in such a system. Networked virtual environments (NVEs), also known as distributed virtual environments, are computer-generated, synthetic worlds that allow simultaneous interactions of multiple participants. Many efforts have been made to allow people to interact in realistic virtual environments, resulting in the recent boom of Massively Multiplayer Online Games. In the first part of the thesis, we present a complete study of an augmented Delaunay-based overlay for peer-to-peer shared virtual worlds. We design an overlay network matching the Delaunay triangulation of the participating peers in a generalized d-dimensional space. Especially, we describe the self-organizing algorithms for peer insertion and deletion. To reduce the delay penalty of overlay routing, we propose to augment each node of the Delaunay-based overlay with a limited number of carefully selected shortcut links creating a small-world. We show that a small number of shortcuts is sufficient to significantly decrease the delay of routing in the space. We present a distributed algorithm for the clustering of peers. The algorithm is dynamic in the sense that whenever a peer joins or leaves the NVE, the clustering will be adapted if necessary by either splitting a cluster or merging clusters. The main idea of the algorithm is to classify links between adjacent peers into short intracluster and long inter-cluster links. In a structured system, the neighbor relationship between peers and data locations is strictly defined. Searching in such systems is therefore determined by the particular network architecture. Among the strictly structured systems, some implement a distributed hash table (DHT) using different data structures. DHTs have been actively studied in the literature and many different proposals have been made on how to organize peers in a DHT. However, very few DHTs have been implemented in real systems and deployed on a large scale. One exception is KAD, a DHT based on Kademlia, which is part of eDonkey, a peer-to-peer file sharing system with several million simultaneous users. In the second part of this thesis we give a detailed background on KAD, the organization of the peers, the search and the publish operations, and we describe our measurement methodology. We have been crawling KAD continuously for more than a year. We obtained information about geographical distribution of peers, session times, peer availability, and peer lifetime. We found that session times are Weibull distributed and show how this information can be exploited to make the publishing mechanism much more efficient. As we have been studying KAD over the course of the last two years we have been both, fascinated and frightened by the possibilities KAD offers. We show that mounting a Sybil attack is very easy in KAD and allows to compromise the privacy of KAD users, to compromise the correct operation of the key lookup and to mount distributed denial-of-service attacks with very little resources

    A bottom-up approach to real-time search in large networks and clouds

    Full text link

    Scalability of findability: decentralized search and retrieval in large information networks

    Get PDF
    Amid the rapid growth of information today is the increasing challenge for people to survive and navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. Collection of information in advance and centralization of IR operations are hardly possible because systems are dynamic and information is distributed. While monolithic search systems continue to struggle with scalability problems of today, the future of search likely requires a decentralized architecture where many information systems can participate. As individual systems interconnect to form a global structure, finding relevant information in distributed environments transforms into a problem concerning not only information retrieval but also complex networks. Understanding network connectivity will provide guidance on how decentralized search and retrieval methods can function in these information spaces. The dissertation studies one aspect of scalability challenges facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large scale networks. It focuses on the impact of network structure on search performance and investigates a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit. Experiments involving large scale benchmark collections provide evidence on the Clustering Paradox in the IR context. In an increasingly large, distributed environment, decentralized searches for relevant information can continue to function well only when systems interconnect in certain ways. Relying on partial indexes of distributed systems, some level of network clustering enables very efficient and effective discovery of relevant information in large scale networks. Increasing or reducing network clustering degrades search performances. Given this specific level of network clustering, search time is well explained by a poly-logarithmic relation to network size, indicating a high scalability potential for searching in a continuously growing information space

    Epidemic-based self-organization in peer-to-peer systems

    Get PDF
    Steen, M.R. [Promotor]van Tanenbaum, A.S. [Promotor

    An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks

    Get PDF
    In the past few years, Online Social Networks (OSNs) have dramatically spread over the world. Facebook [4], one of the largest worldwide OSNs, has 1.35 billion users, 82.2% of whom are outside the US [36]. The browsing and posting interactions (text content) between OSN users lead to user data reads (visits) and writes (updates) in OSN datacenters, and Facebook now serves a billion reads and tens of millions of writes per second [37]. Besides that, Facebook has become one of the top Internet traļ¬ƒc sources [36] by sharing tremendous number of large multimedia ļ¬les including photos and videos. The servers in datacenters have limited resources (e.g. bandwidth) to supply latency eļ¬ƒcient service for multimedia ļ¬le sharing among the rapid growing users worldwide. Most online applications operate under soft real-time constraints (e.g., ā‰¤ 300 ms latency) for good user experience, and its service latency is negatively proportional to its income. Thus, the service latency is a very important requirement for Quality of Service (QoS) to the OSN as a web service, since it is relevant to the OSNā€™s revenue and user experience. Also, to increase OSN revenue, OSN service providers need to constrain capital investment, operation costs, and the resource (bandwidth) usage costs. Therefore, it is critical for the OSN to supply a guaranteed QoS for both text and multimedia contents to users while minimizing its costs. To achieve this goal, in this dissertation, we address three problems. i) Data distribution among datacenters: how to allocate data (text contents) among data servers with low service latency and minimized inter-datacenter network load; ii) Eļ¬ƒcient multimedia ļ¬le sharing: how to facilitate the servers in datacenters to eļ¬ƒciently share multimedia ļ¬les among users; iii) Cost minimized data allocation among cloud storages: how to save the infrastructure (datacenters) capital investment and operation costs by leveraging commercial cloud storage services. Data distribution among datacenters. To serve the text content, the new OSN model, which deploys datacenters globally, helps reduce service latency to worldwide distributed users and release the load of the existing datacenters. However, it causes higher inter-datacenter communica-tion load. In the OSN, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. The distributed data storage, which only stores a userā€™s data to his/her geographically closest datacenters, simply mitigates the problem. However, frequent interactions between distant users lead to frequent inter-datacenter com-munication and hence long service latencies. Therefore, the OSNs need a data allocation algorithm among datacenters with minimized network load and low service latency. Eļ¬ƒcient multimedia ļ¬le sharing. To serve multimedia ļ¬le sharing with rapid growing user population, the ļ¬le distribution method should be scalable and cost eļ¬ƒcient, e.g. minimiza-tion of bandwidth usage of the centralized servers. The P2P networks have been widely used for ļ¬le sharing among a large amount of users [58, 131], and meet both scalable and cost eļ¬ƒcient re-quirements. However, without fully utilizing the altruism and trust among friends in the OSNs, current P2P assisted ļ¬le sharing systems depend on strangers or anonymous users to distribute ļ¬les that degrades their performance due to user selļ¬sh and malicious behaviors. Therefore, the OSNs need a cost eļ¬ƒcient and trustworthy P2P-assisted ļ¬le sharing system to serve multimedia content distribution. Cost minimized data allocation among cloud storages. The new trend of OSNs needs to build worldwide datacenters, which introduce a large amount of capital investment and maintenance costs. In order to save the capital expenditures to build and maintain the hardware infrastructures, the OSNs can leverage the storage services from multiple Cloud Service Providers (CSPs) with existing worldwide distributed datacenters [30, 125, 126]. These datacenters provide diļ¬€erent Get/Put latencies and unit prices for resource utilization and reservation. Thus, when se-lecting diļ¬€erent CSPsā€™ datacenters, an OSN as a cloud customer of a globally distributed application faces two challenges: i) how to allocate data to worldwide datacenters to satisfy application SLA (service level agreement) requirements including both data retrieval latency and availability, and ii) how to allocate data and reserve resources in datacenters belonging to diļ¬€erent CSPs to minimize the payment cost. Therefore, the OSNs need a data allocation system distributing data among CSPsā€™ datacenters with cost minimization and SLA guarantee. In all, the OSN needs an eļ¬ƒcient holistic data distribution and storage solution to minimize its network load and cost to supply a guaranteed QoS for both text and multimedia contents. In this dissertation, we propose methods to solve each of the aforementioned challenges in OSNs. Firstly, we verify the beneļ¬ts of the new trend of OSNs and present OSN typical properties that lay the basis of our design. We then propose Selective Data replication mechanism in Distributed Datacenters (SD3) to allocate user data among geographical distributed datacenters. In SD3,a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a userā€™s diļ¬€erent types of data (e.g., status update, friend post) for replication, making sure that a replica always reduces inter-datacenter communication. Secondly, we analyze a BitTorrent ļ¬le sharing trace, which proves the necessity of proximity-and interest-aware clustering. Based on the trace study and OSN properties, to address the second problem, we propose a SoCial Network integrated P2P ļ¬le sharing system for enhanced Eļ¬ƒciency and Trustworthiness (SOCNET) to fully and cooperatively leverage the common-interest, geographically-close and trust properties of OSN friends. SOCNET uses a hierarchical distributed hash table (DHT) to cluster common-interest nodes, and then further clusters geographically close nodes into a subcluster, and connects the nodes in a subcluster with social links. Thus, when queries travel along trustable social links, they also gain higher probability of being successfully resolved by proximity-close nodes, simultaneously enhancing eļ¬ƒciency and trustworthiness. Thirdly, to handle the third problem, we model the cost minimization problem under the SLA constraints using integer programming. According to the system model, we propose an Eco-nomical and SLA-guaranteed cloud Storage Service (ES3), which ļ¬nds a data allocation and resource reservation schedule with cost minimization and SLA guarantee. ES3 incorporates (1) a data al-location and reservation algorithm, which allocates each data item to a datacenter and determines the reservation amount on datacenters by leveraging all the pricing policies; (2) a genetic algorithm based data allocation adjustment approach, which makes data Get/Put rates stable in each data-center to maximize the reservation beneļ¬t; and (3) a dynamic request redirection algorithm, which dynamically redirects a data request from an over-utilized datacenter to an under-utilized datacenter with suļ¬ƒcient reserved resource when the request rate varies greatly to further reduce the payment. Finally, we conducted trace driven experiments on a distributed testbed, PlanetLab, and real commercial cloud storage (Amazon S3, Windows Azure Storage and Google Cloud Storage) to demonstrate the eļ¬ƒciency and eļ¬€ectiveness of our proposed systems in comparison with other systems. The results show that our systems outperform others in the network savings and data distribution eļ¬ƒciency

    Signaling and Reciprocity:Robust Decentralized Information Flows in Social, Communication, and Computer Networks

    Get PDF
    Complex networks exist for a number of purposes. The neural, metabolic and food networks ensure our survival, while the social, economic, transportation and communication networks allow us to prosper. Independently of the purposes and particularities of the physical embodiment of the networks, one of their fundamental functions is the delivery of information from one part of the network to another. Gossip and diseases diffuse in the social networks, electrochemical signals propagate in the neural networks and data packets travel in the Internet. Engineering networks for robust information flows is a challenging task. First, the mechanism through which the network forms and changes its topology needs to be defined. Second, within a given topology, the information must be routed to the appropriate recipients. Third, both the network formation and the routing mechanisms need to be robust against a wide spectrum of failures and adversaries. Fourth, the network formation, routing and failure recovery must operate under the resource constraints, either intrinsic or extrinsic to the network. Finally, the autonomously operating parts of the network must be incentivized to contribute their resources to facilitate the information flows. This thesis tackles the above challenges within the context of several types of networks: 1) peer-to-peer overlays ā€“ computers interconnected over the Internet to form an overlay in which participants provide various services to one another, 2) mobile ad-hoc networks ā€“ mobile nodes distributed in physical space communicating wirelessly with the goal of delivering data from one part of the network to another, 3) file-sharing networks ā€“ networks whose participants interconnect over the Internet to exchange files, 4) social networks ā€“ humans disseminating and consuming information through the network of social relationships. The thesis makes several contributions. Firstly, we propose a general algorithm, which given a set of nodes embedded in an arbitrary metric space, interconnects them into a network that efficiently routes information. We apply the algorithm to the peer-to-peer overlays and experimentally demonstrate its high performance, scalability as well as resilience to continuous peer arrivals and departures. We then shift our focus to the problem of the reliability of routing in the peer-to-peer overlays. Each overlay peer has limited resources and when they are exhausted this ultimately leads to delayed or lost overlay messages. All the solutions addressing this problem rely on message redundancy, which significantly increases the resource costs of fault-tolerance. We propose a bandwidth-efficient single-path Forward Feedback Protocol (FFP) for overlay message routing in which successfully delivered messages are followed by a feedback signal to reinforce the routing paths. Internet testbed evaluation shows that FFP uses 2-5 times less network bandwidth than the existing protocols relying on message redundancy, while achieving comparable fault-tolerance levels under a variety of failure scenarios. While the Forward Feedback Protocol is robust to message loss and delays, it is vulnerable to malicious message injection. We address this and other security problems by proposing Castor, a variant of FFP for mobile ad-hoc networks (MANETs). In Castor, we use the same general mechanism as in FFP; each time a message is routed, the routing path is either enforced or weakened by the feedback signal depending on whether the routing succeeded or not. However, unlike FFP, Castor employs cryptographic mechanisms for ensuring the integrity and authenticity of the messages. We compare Castor to four other MANET routing protocols. Despite Castor's simplicity, it achieves up to 40% higher packet delivery rates than the other protocols and recovers at least twice as fast as the other protocols in a wide range of attacks and failure scenarios. Both of our protocols, FFP and Castor, rely on simple signaling to improve the routing robustness in peer-to-peer and mobile ad-hoc networks. Given the success of the signaling mechanism in shaping the information flows in these two types of networks, we examine if signaling plays a similar crucial role in the on-line social networks. We characterize the propagation of URLs in the social network of Twitter. The data analysis uncovers several statistical regularities in the user activity, the social graph, the structure of the URL cascades as well as the communication and signaling dynamics. Based on these results, we propose a propagation model that accurately predicts which users are likely to mention which URLs. We outline a number of applications where the social network information flow modelling would be crucial: content ranking and filtering, viral marketing and spam detection. Finally, we consider the problem of freeriding in peer-to-peer file-sharing applications, when users can download data from others, but never reciprocate by uploading. To address the problem, we propose a variant of the BitTorrent system in which two peers are only allowed to connect if their owners know one another in the real world. When the users know which other users their BitTorrent client connects to, they are more likely to cooperate. The social network becomes the content distribution network and the freeriding problem is solved by leveraging the social norms and reciprocity to stabilize cooperation rather than relying on technological means. Our extensive simulation shows that the social network topology is an efficient and scalable content distribution medium, while at the same time provides robustness to freeriding
    corecore