117 research outputs found

    Optimising Structured P2P Networks for Complex Queries

    With network-enabled consumer devices becoming increasingly popular, the number of connected devices and available services is growing considerably, with the number of connected devices estimated to surpass 15 billion by 2015. In this increasingly large and dynamic environment it is important that users have a comprehensive, yet efficient, mechanism to discover services. Many existing wide-area service discovery mechanisms are centralised and do not scale to large numbers of users. Additionally, centralised services suffer from issues such as a single point of failure, high maintenance costs, and difficulty of management. As such, this Thesis seeks a Peer-to-Peer (P2P) approach. Distributed Hash Tables (DHTs) are well known for their high scalability, financially low barrier of entry, and ability to self-manage. They can be used to provide not just a platform on which peers can offer and consume services, but also a means for users to discover such services. Traditionally DHTs provide a distributed key-value store, with no search functionality. In recent years many P2P systems have been proposed that support a subset of complex query types, such as keyword search, range queries, and semantic search. This Thesis presents a novel algorithm for performing any type of complex query, from keyword search, to complex regular expressions, to full-text search, over any structured P2P overlay. This is achieved by efficiently broadcasting the search query, allowing each peer to process the query locally, and then efficiently routing responses back to the originating peer. Through experimentation, this technique is shown to be successful when the network is stable; however, performance degrades under high levels of network churn. To address the issue of network churn, this Thesis proposes a number of enhancements which can be made to existing P2P overlays in order to improve the performance of both the existing DHT and the proposed algorithm. Through two case studies these enhancements are shown to improve not only the performance of the proposed algorithm under churn, but also the performance of traditional lookup operations in these networks.
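
The broadcast-then-evaluate-locally idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details are not given here. It assumes a Chord-style ring with finger tables and uses a well-known interval-partitioning broadcast, so that each peer handles the query once. The names `M`, `in_open`, `fingers` and `broadcast`, and the use of a regular expression as the "complex query", are illustrative assumptions.

```python
import re
from bisect import bisect_left
from collections import deque

M = 8  # identifier bits, so the ring has 2**M positions (illustrative value)


def in_open(x, a, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def fingers(node, ring):
    """Distinct Chord-style fingers of `node`, ordered clockwise from it.

    `ring` is the sorted list of all peer identifiers (global knowledge is
    a simplification made for this sketch).
    """
    out = []
    for i in range(M):
        target = (node + (1 << i)) % (1 << M)
        succ = ring[bisect_left(ring, target) % len(ring)]
        if succ != node and succ not in out:
            out.append(succ)
    return out


def broadcast(origin, ring, store, pattern):
    """Deliver `pattern` to every peer once; each peer searches its own items.

    Returns (hits, messages_sent). Each forwarded message carries a limit
    that bounds the arc of the ring the receiver is responsible for.
    """
    query = re.compile(pattern)
    hits, messages = [], 0
    queue = deque([(origin, origin)])  # (peer, limit of the arc it covers)
    while queue:
        node, limit = queue.popleft()
        # Local evaluation: the peer matches the query against its own data.
        hits += [(node, item) for item in store.get(node, []) if query.search(item)]
        # Forward only to fingers inside this peer's arc, splitting the arc.
        inside = [f for f in fingers(node, ring) if in_open(f, node, limit)]
        for i, f in enumerate(inside):
            next_limit = inside[i + 1] if i + 1 < len(inside) else limit
            queue.append((f, next_limit))
            messages += 1
    return hits, messages
```

In a stable ring of N peers this sketch sends N - 1 messages. It does not model churn or the routing of responses back to the originator, both of which the abstract treats as central concerns.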

    Efficient service discovery in wide area networks

    Living in an increasingly networked world, with an abundant number of services available to consumers, the consumer electronics market is enjoying a boom. The average consumer in the developed world may own several networked devices such as games consoles, mobile phones, PDAs, laptops and desktops, wireless picture frames and printers, to name but a few. With this growing number of networked devices comes a growing demand for services, defined here as functions requested by a client and provided by a networked node. For example, a client may wish to download and share music or pictures, find and use printer services, or look up information (e.g. train times, cinema bookings). It is notable that a significant proportion of networked devices are now mobile. Mobile devices introduce a new dynamic to the service discovery problem, such as lower battery and processing power and more expensive bandwidth. Device owners expect to access services not only in their immediate proximity, but further afield (e.g. in their homes and offices). Solving these problems is the focus of this research. This Thesis offers two alternative approaches to service discovery in Wide Area Networks (WANs). Firstly, a unique combination of the Session Initiation Protocol (SIP) and the OSGi middleware technology is presented to provide both mobility and service discovery capability in WANs. Through experimentation, this technique is shown to be successful where the number of operating domains is small, but it does not scale well. To address the issue of scalability, this Thesis proposes the use of Peer-to-Peer (P2P) service overlays as a medium for service discovery in WANs. To confirm that P2P overlays can in fact support service discovery, the Distributed Hash Table (DHT) functionality of distributed systems is used to store and retrieve service advertisements. Through simulation, this is shown to be both a scalable and a flexible service discovery technique. However, the efficiency problems associated with P2P networks are well documented. In a novel approach to reducing messaging costs in P2P networks, multi-destination multicast is used. Two well-known P2P overlays are extended using the Explicit Multi-Unicast (XCAST) protocol. The resulting analysis of this extension provides a strong argument for multiple P2P maintenance algorithms co-existing in a single P2P overlay to provide adaptable performance. A novel multi-tier P2P overlay system is presented, which is tailored for service-rich mobile devices and which provides an efficient platform for service discovery.
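
As a rough illustration of storing and retrieving service advertisements through DHT functionality, as mentioned in this abstract, the sketch below hashes a service type onto an identifier ring and keeps the advertisement at the peer that succeeds that identifier. Everything in it is an assumption made for illustration: the class `ToyDHT`, the methods `advertise` and `discover`, the 16-bit ring, and the example endpoint are hypothetical, and all peers live in one process rather than on a network.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # size of the identifier space (illustrative value)


def ring_id(text):
    """Hash a string onto the identifier ring using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class ToyDHT:
    """In-process stand-in for a DHT: one dictionary per peer."""

    def __init__(self, peer_names):
        self.peers = sorted((ring_id(name), name) for name in peer_names)
        self.tables = {name: {} for name in peer_names}

    def owner(self, key):
        """Peer responsible for `key`: first peer at or after its ring id."""
        ids = [peer_id for peer_id, _ in self.peers]
        index = bisect_left(ids, ring_id(key)) % len(ids)
        return self.peers[index][1]

    def advertise(self, service_type, endpoint):
        """Store a service advertisement at the responsible peer."""
        table = self.tables[self.owner(service_type)]
        table.setdefault(service_type, []).append(endpoint)

    def discover(self, service_type):
        """Return all endpoints advertised under `service_type`."""
        return list(self.tables[self.owner(service_type)].get(service_type, []))
```

A client would call `advertise("printer", "sip:printer@office.example")` once and any peer could later call `discover("printer")`. Only exact-match lookups are possible in this form, which is the limitation that motivates the search techniques discussed elsewhere in this listing.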

    A Study on Load Management for Hierarchical Peer-to-Peer File Search

    In a Peer-to-Peer (P2P) system, multiple interconnected peers or nodes contribute a portion of their resources (e.g., files, disk storage, network bandwidth) in order to inexpensively handle tasks that would normally require powerful servers. Since the emergence of P2P file sharing, load balancing has been considered a primary concern, alongside other issues such as autonomy, fault tolerance and security. During file search, a heavily loaded peer may incur long latency or failure in query forwarding or responding. If there are many such peers in a system, they may cause link or path congestion and consequently degrade the performance of the overall system. To avoid such a situation, some general techniques used in Web systems, such as caching and paging, have been adopted in P2P systems. However, these are insufficient for load balancing, since peers in P2P systems often exhibit high heterogeneity and dynamicity. To overcome this difficulty, the use of super-peers is currently the most promising approach to optimizing the allocation of system load to peers, i.e., it allocates more system load to high-capacity and stable super-peers by assigning the tasks of index maintenance and retrieval to them. In this thesis, we focus on two kinds of super-peer based hierarchical architectures of P2P systems, which are distinguished by the organization of super-peers. For each of them, we discuss system load allocation and propose novel load balancing algorithms for alleviating the load imbalance of super-peers, aiming to decrease the average and variation of query response time during the index retrieval process. More concretely, our contributions to load management solutions for hierarchical P2P file search are the following:
    • In Qin's hierarchical architecture, indices of files held by the user peers in the bottom layer are stored at the super-peers in the middle layer, and the correlation of those two bottom layers is controlled by the central server(s) in the top layer using the notion of tags. In Qin's system, a heavily loaded super-peer can move excessive load to a lightly loaded super-peer by using the notion of task migration. However, such a task migration approach is not sufficient to balance the load of super-peers if the size of tasks is highly imbalanced. To overcome this issue, we propose two task migration schemes for this architecture, aiming to ensure an even load distribution over the super-peers. The first scheme controls the load of each task in order to decrease the total cost of task migration. The second scheme directly balances the load over tasks by reordering the priority of tags used in the query forwarding step. The effectiveness of the proposed schemes is evaluated by simulation. The simulation results indicate that the schemes can work in coordination to alleviate the bottleneck situation of super-peers.
    • In the DHT-based super-peer architecture, indices of files held by the user peers in the lower layer are stored at the DHT-connected super-peers in the upper layer. In DHT-based super-peer systems, the skewness of users' preferences regarding the keywords contained in multi-keyword queries causes a query load imbalance among super-peers that combines both routing and response load. Although index replication has great potential for alleviating this problem, existing schemes either did not explicitly address it or incurred high cost. To overcome this issue, we propose an integrated solution that consists of three replication schemes to alleviate query load imbalance while minimizing the cost. The first scheme is an active index replication that decreases routing load in the super-peer layer and distributes the response load of an index among the super-peers that store a replica. The second scheme is a proactive pointer replication that places location information of an index in order to reduce the maintenance cost between the index and its replicas. The third scheme is a passive index replication that guarantees the maximum query load of super-peers. The simulation results indicate that the proposed schemes can help alleviate the query load imbalance of super-peers. Moreover, a comparison shows that our schemes are more cost-effective in placing replicas than other approaches.
    Hiroshima University, Doctor of Engineering in Information Engineering.

    Distributed AOP middleware for large-scale scenarios

    In this PhD dissertation we present a distributed middleware proposal for large-scale application development. Our main aim is to allow the distributed concerns of these applications, such as replication, to be integrated independently and transparently. Our approach is based on the implementation of these concerns using the paradigm of distributed aspects. In addition, our proposal benefits from the peer-to-peer (P2P) networks and aspect-oriented programming (AOP) substrates to provide these concerns in a decentralized, decoupled, efficient, and transparent way. Our middleware architecture is divided into two layers: a composition model and a scalable deployment platform for distributed aspects. Finally, we demonstrate the viability and applicability of our model via implementation and experimentation of prototypes in real large-scale networks.

    System support for keyword-based search in structured Peer-to-Peer systems

    In this dissertation, we present protocols for building a distributed search infrastructure over structured Peer-to-Peer systems. Unlike existing search engines, which consist of large server farms managed by a centralized authority, our approach makes use of a distributed set of end-hosts built out of commodity hardware. These end-hosts cooperatively construct and maintain the search infrastructure. The main challenges with distributing such a system include node failures, churn, and data migration. Localities inherent in query patterns also cause load imbalances and hot spots that severely impair performance. Users of search systems want their results returned quickly, and in ranked order. Our main contribution is to show that a scalable, robust, and distributed search infrastructure can be built over existing Peer-to-Peer systems through the use of techniques that address these problems. We present a decentralized scheme for ranking search results without prohibitive network or storage overhead. We show that caching allows for efficient query evaluation and present a distributed data structure, called the View Tree, that enables efficient storage and retrieval of cached results. We also present a lightweight adaptive replication protocol, called LAR, that can adapt to different kinds of query streams and is extremely effective at eliminating hot spots. Finally, we present techniques for storing indexes reliably. Our approach is to use an adaptive partitioning protocol to store large indexes and employ efficient redundancy techniques to handle failures. Through detailed analysis and experiments we show that our techniques are efficient and scalable, and that they make distributed search feasible.

    A framework for the dynamic management of Peer-to-Peer overlays

    Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation. At the core of the management framework lies the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an "insider's" view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively. To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed. Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load-balancing and offer improved peer transfer performance.

    M-Grid: A distributed framework for multidimensional indexing and querying of location based big data

    The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, handling of high access scalability, support for millions of users, real-time querying capability and analysis of large volumes of data. Cloud computing aided a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional data cannot be processed efficiently, as key-value stores do not provide means to access multiple attributes. In this thesis we present MGrid, a unifying indexing framework which enables key-value stores to support multidimensional queries. We organize a set of nodes in a P-Grid overlay network which provides fault-tolerance and efficient query processing. We use a Hilbert Space Filling Curve based linearization technique, which preserves data locality, to efficiently manage multi-dimensional data in a key-value store. We propose algorithms to dynamically process range and k nearest neighbor (kNN) queries on linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent from the underlying storage layer and can be implemented on any cloud infrastructure. Experiments on Amazon EC2 show that MGrid achieves a performance improvement of three orders of magnitude in comparison to MapReduce and four times that of the MDHBase scheme.
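
The Hilbert-curve linearization mentioned in this abstract can be sketched with the standard two-dimensional coordinate-to-index mapping. This is a generic textbook construction, not MGrid's implementation; the function names `hilbert_index` and `keys_in_box` and the brute-force range helper are assumptions made for illustration.

```python
def hilbert_index(order, x, y):
    """Distance of cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def keys_in_box(order, x_lo, x_hi, y_lo, y_hi):
    """Sorted one-dimensional keys of every cell in an inclusive rectangle.

    Brute force for clarity; a real system would decompose the rectangle
    into a small number of contiguous key intervals instead.
    """
    return sorted(
        hilbert_index(order, x, y)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
```

Because consecutive keys always belong to neighbouring cells, a rectangular range query tends to map to a few contiguous runs of keys, which a key-value store can serve with ordinary key-range scans.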

    Efficient Peer-to-Peer Namespace Searches

    In this paper we describe new methods for efficient and exact search (keyword and full-text) in distributed namespaces. Our methods can be used in conjunction with existing distributed lookup schemes, such as Distributed Hash Tables, and distributed directories. We describe how indexes for implementing distributed searches can be efficiently created, located, and stored. We describe techniques for creating approximate indexes that can be used to bound the space requirement at individual hosts; such techniques are particularly useful for full-text searches that may require a very large number of individual indexes to be created and maintained. Our methods use a new distributed data structure called the view tree. View trees can be used to efficiently cache and locate results from prior queries. We describe how view trees are created and maintained. We present experimental results, using large namespaces and realistic data, showing that the techniques introduced in this paper can reduce search overheads (both network and processing costs) by more than an order of magnitude. (UMIACS-TR-2004-13)

    Dynamic data placement and discovery in wide-area networks

    The workloads of online services and applications such as social networks, sensor data platforms and web search engines have become increasingly global and dynamic, setting new challenges to providing users with low latency access to data. To achieve this, these services typically leverage a multi-site wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which is determined by the data placement and discovery strategies. Current strategies are static, which offer low latencies upon deployment but worse performance under a dynamic workload. We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access pattern similarities. By dynamically clustering data according to DAC, network traffic in clusters is kept local. We utilise DAC as a key component in reducing access latencies for two application scenarios, emphasising different aspects of the problem: The first scenario assumes the fixed placement of data at sites, and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online and distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads. The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data centre deployment to provide users with low latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. 
It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low latency access to data for static and dynamic workloads.

    Clouder: a flexible large scale decentralized object store

    Doctoral Programme in Informatics (MAP-i). Large scale data stores were initially introduced to support a few concrete extreme scale applications such as social networks. Their scalability and availability requirements often led to sacrificing richer data and processing models, and even elementary data consistency. In strong contrast with traditional relational databases (RDBMS), large scale data stores present very simple data models and APIs, lacking most of the established relational data management operations, and relax consistency guarantees, providing eventual consistency. With a number of alternatives now available and mature, there is an increasing willingness to use them in a wider and more diverse spectrum of applications, by skewing the current trade-off towards the needs of common business users and easing the migration from current RDBMS. This is particularly so when used in the context of a Cloud solution such as a Platform as a Service (PaaS). This thesis aims at reducing the gap between traditional RDBMS and large scale data stores, by seeking mechanisms to provide additional consistency guarantees and higher level data processing primitives in large scale data stores. The devised mechanisms should not hinder the scalability and dependability of large scale data stores. Regarding higher level data processing primitives, this thesis explores two complementary approaches: extending data stores with additional operations such as general multi-item operations; and coupling data stores with other existing processing facilities without hindering scalability. We address these challenges with a new architecture for large scale data stores, efficient multi-item access for large scale data stores, and SQL processing atop large scale data stores. The novel architecture allows finding the right trade-offs among flexible usage, efficiency, and fault-tolerance. To efficiently support multi-item access we extend the data models of first generation large scale data stores with tags and a multi-tuple data placement strategy that allow large sets of related data to be stored and retrieved efficiently at once. For efficient SQL support atop scalable data stores we devise design modifications to existing relational SQL query engines, allowing them to be distributed. We demonstrate our approaches with running prototypes and extensive experimental evaluation using proper workloads based on real applications.