Optimising Structured P2P Networks for Complex Queries
With network-enabled consumer devices becoming increasingly popular, the number of connected devices and available services is growing considerably, with the number of connected devices estimated to surpass 15 billion by 2015. In this increasingly large and dynamic environment it is important that users have a comprehensive, yet efficient, mechanism to discover services.
Many existing wide-area service discovery mechanisms are centralised and do not scale to large numbers of users. Additionally, centralised services suffer from issues such as a single point of failure, high maintenance costs, and difficulty of management. As such, this Thesis seeks a Peer-to-Peer (P2P) approach.
Distributed Hash Tables (DHTs) are well known for their high scalability, low financial barrier to entry, and ability to self-manage. They can be used to provide not just a platform on which peers can offer and consume services, but also a means for users to discover such services.
Traditionally, DHTs provide a distributed key-value store with no search functionality. In recent years many P2P systems have been proposed that support a subset of complex query types, such as keyword search, range queries, and semantic search.
This Thesis presents a novel algorithm for performing any type of complex query, from keyword search, to complex regular expressions, to full-text search, over any structured P2P overlay. This is achieved by efficiently broadcasting the search query, allowing each peer to process the query locally, and then efficiently routing responses back to the originating peer. Through experimentation, this technique is shown to be successful when the network is stable; however, performance degrades under high levels of network churn.
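The broadcast step described above can be sketched for a Chord-like ring. The following minimal illustration (ring size, node set, and function names are ours, not the thesis's) has each node forward the query to its fingers with a shrinking limit, so every peer receives the query exactly once and can evaluate it locally:

```python
# Hedged sketch of partition-based query broadcast over a Chord-like ring.
# Each node delegates disjoint ring segments to its fingers, so no peer is
# contacted twice. RING_BITS and the node ids are invented for illustration.

RING_BITS = 4
RING_SIZE = 1 << RING_BITS

class Node:
    def __init__(self, ident, ring):
        self.id = ident
        self.ring = ring                    # sorted node ids (global view, for the sketch)

    def finger(self, k):
        """Successor of id + 2^k on the ring."""
        target = (self.id + (1 << k)) % RING_SIZE
        for nid in self.ring:
            if nid >= target:
                return nid
        return self.ring[0]                 # wrap around

def in_open_interval(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b                   # interval wraps (a == b means the whole ring)

def broadcast(nodes, nid, limit, deliveries):
    """Deliver the query to nid, then forward to fingers inside (nid, limit),
    narrowing the limit after each forward so segments stay disjoint."""
    deliveries.append(nid)
    cur_limit = limit
    for k in reversed(range(RING_BITS)):
        f = nodes[nid].finger(k)
        if f != nid and in_open_interval(f, nid, cur_limit):
            broadcast(nodes, f, cur_limit, deliveries)
            cur_limit = f

ids = [0, 3, 6, 9, 12]
nodes = {i: Node(i, ids) for i in ids}
deliveries = []
broadcast(nodes, 0, 0, deliveries)          # limit == start id covers the whole ring
```

Responses would then be routed back to the originating peer along the reverse of this delivery tree; that step is omitted here.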
To address the issue of network churn, this Thesis proposes a number of enhancements which can be made to existing P2P overlays in order to improve the performance of both the existing DHT and the proposed algorithm. Through two case studies these enhancements are shown to improve not only the performance of the proposed algorithm under churn, but also the performance of traditional lookup operations in these networks.
Efficient service discovery in wide area networks
Living in an increasingly networked world, with an abundance of
services available to consumers, the consumer electronics market
is enjoying a boom. The average consumer in the developed world may
own several networked devices such as games consoles, mobile phones,
PDAs, laptops and desktops, wireless picture frames and printers to
name but a few. With this growing number of networked devices comes
a growing demand for services, defined here as functions requested
by a client and provided by a networked node. For example, a client
may wish to download and share music or pictures, find and use
printer services, or lookup information (e.g. train times, cinema
bookings).
It is notable that a significant proportion of networked devices are
now mobile. Mobile devices introduce new constraints to the service
discovery problem, such as limited battery and processing power and
more expensive bandwidth. Device owners expect to access services
not only in their immediate proximity, but further afield (e.g. in
their homes and offices). Solving these problems is the focus of
this research.
This Thesis offers two alternative approaches to service discovery
in Wide Area Networks (WANs). Firstly, a unique combination of the
Session Initiation Protocol (SIP) and the OSGi middleware technology
is presented to provide both mobility and service discovery
capability in WANs. Through experimentation, this technique is shown
to be successful where the number of operating domains is small, but
it does not scale well.
To address the issue of scalability, this Thesis proposes the use of
Peer-to-Peer (P2P) service overlays as a medium for service
discovery in WANs. To confirm that P2P overlays can in fact support
service discovery, a technique utilising the Distributed Hash Table
(DHT) functionality of these systems is employed to store and
retrieve service advertisements. Through simulation, this is shown
to be both a scalable and a flexible service discovery technique.
However, the efficiency problems associated with P2P networks are
well documented.
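The DHT-based service discovery described above can be sketched as a small key-value exercise: advertisements are stored under the hash of the service type, so any peer can find providers without a central registry. This is an illustrative stand-in with invented node and service names, not the thesis's implementation:

```python
import bisect
import hashlib

# Illustrative sketch: a local stand-in for a DHT that maps hashed keys to
# owning nodes, consistent-hashing style, and stores service advertisements
# under the service type. Node and endpoint names are made up.

class TinyDHT:
    def __init__(self, node_ids):
        self.ring = sorted((self._h(n), n) for n in node_ids)
        self.store = {n: {} for n in node_ids}

    @staticmethod
    def _h(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def _owner(self, key):
        """Node whose ring position is the successor of the key's hash."""
        h = self._h(key)
        idx = bisect.bisect_left(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

    def advertise(self, service_type, endpoint):
        node = self._owner(service_type)
        self.store[node].setdefault(service_type, []).append(endpoint)

    def discover(self, service_type):
        return self.store[self._owner(service_type)].get(service_type, [])
```

In a real deployment `_owner` would be replaced by the overlay's routing, but the put/get pattern over hashed service types is the same.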
In a novel approach to reduce messaging costs in P2P networks,
multi-destination multicast is used. Two well-known P2P overlays are
extended using the Explicit Multi-Unicast (XCAST) protocol. The
resulting analysis of this extension provides a strong argument for
multiple P2P maintenance algorithms co-existing in a single P2P
overlay to provide adaptable performance. A novel multi-tier P2P
overlay system is presented, which is tailored for service-rich
mobile devices and provides an efficient platform for service
discovery.
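The XCAST idea above can be sketched in a few lines: the sender puts the full destination list in one packet, and each router splits the list by next hop, so a shared link carries at most one copy of the message. The topology and names below are invented for illustration; real XCAST (RFC 5058) operates at the IP layer:

```python
# Hedged sketch of multi-destination multicast: one packet carries the full
# destination list, and each hop splits the list per next hop. The routing
# table below is a toy topology, not a real network.

# next_hop[router][dest] gives the neighbour to use toward dest
TOPOLOGY = {
    "src": {"d1": "r1", "d2": "r1", "d3": "r2"},
    "r1":  {"d1": "d1", "d2": "d2"},
    "r2":  {"d3": "d3"},
}

def xcast_send(node, dests, log):
    """Forward one packet carrying `dests`; split the list by next hop."""
    groups = {}
    for d in dests:
        groups.setdefault(TOPOLOGY[node][d], []).append(d)
    for nxt, sub in groups.items():
        log.append(("send", node, nxt))     # one packet per distinct next hop
        if nxt in TOPOLOGY:
            xcast_send(nxt, sub, log)
        else:
            log.append(("deliver", nxt))    # leaf destination reached
```

On this toy topology the multicast uses 5 link transmissions where per-destination unicast would use 6, which is the saving the extension exploits for overlay maintenance traffic.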
A Study of Load Management for Hierarchical Peer-to-Peer File Search
In a Peer-to-Peer (P2P) system, multiple interconnected peers or nodes contribute a portion of their resources (e.g., files, disk storage, network bandwidth) in order to inexpensively handle tasks that would normally require powerful servers. Since the emergence of P2P file sharing, load balancing has been a primary concern, alongside issues such as autonomy, fault tolerance and security. During file search, a heavily loaded peer may incur long latency or fail to forward or respond to queries. If there are many such peers in a system, they may cause link or path congestion and consequently degrade the performance of the overall system. To avoid such situations, general techniques used in Web systems, such as caching and paging, have been adopted in P2P systems. However, these are insufficient for load balancing, since peers in P2P systems often exhibit high heterogeneity and dynamicity. To overcome this difficulty, the use of super-peers is currently the most promising approach to optimizing the allocation of system load: more load is allocated to high-capacity, stable super-peers by assigning the tasks of index maintenance and retrieval to them.
In this thesis, we focus on two kinds of super-peer-based hierarchical P2P architectures, distinguished by the organization of their super-peers. For each, we discuss system load allocation and propose novel load balancing algorithms that alleviate load imbalance among super-peers, aiming to decrease the average and variance of query response time during index retrieval.
More concretely, the contributions of this thesis to load management for hierarchical P2P file search are the following:
• In Qin's hierarchical architecture, indices of files held by user peers in the bottom layer are stored at super-peers in the middle layer, and the correspondence between those two layers is controlled by the central server(s) in the top layer using the notion of tags. In Qin's system, a heavily loaded super-peer can move excess load to a lightly loaded super-peer through task migration. However, task migration alone is not sufficient to balance the load of super-peers when task sizes are highly imbalanced. To overcome this issue, we propose two task migration schemes for this architecture, aiming to ensure an even load distribution over the super-peers. The first scheme controls the load of each task in order to decrease the total cost of task migration. The second scheme directly balances the load over tasks by reordering the priority of tags used in the query forwarding step. The effectiveness of the proposed schemes is evaluated by simulation. The results indicate that the schemes work in coordination to alleviate bottlenecks at the super-peers.
• In the DHT-based super-peer architecture, indices of files held by user peers in the lower layer are stored at DHT-connected super-peers in the upper layer. In such systems, the skewness of users' preferences for the keywords contained in multi-keyword queries causes query load imbalance among super-peers, combining both routing and response load. Although index replication has great potential for alleviating this problem, existing schemes either do not address it explicitly or incur high cost. To overcome this issue, we propose an integrated solution consisting of three replication schemes that alleviate query load imbalance while minimizing cost. The first is active index replication, which decreases routing load in the super-peer layer and distributes the response load of an index among the super-peers storing its replicas. The second is proactive pointer replication, which places the location information of an index so as to reduce the maintenance cost between the index and its replicas. The third is passive index replication, which bounds the maximum query load of super-peers. Simulation results indicate that the proposed schemes help alleviate the query load imbalance of super-peers. Moreover, comparison shows that our schemes place replicas more cost-effectively than other approaches. (Hiroshima University, Doctor of Engineering in Information Engineering)
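The task-migration idea in the first contribution can be illustrated with a deliberately simple greedy sketch (this is not Qin's scheme or the thesis's algorithms): tasks move from the most-loaded super-peer to the least-loaded one whenever that shrinks the spread.

```python
# Hypothetical greedy task-migration sketch: repeatedly move a task from
# the most-loaded super-peer to the least-loaded one while the move still
# reduces the load gap. Names and loads are invented.

def balance(superpeers):
    """superpeers: dict name -> list of task loads; mutated in place."""
    def load(n):
        return sum(superpeers[n])
    while True:
        hi = max(superpeers, key=load)
        lo = min(superpeers, key=load)
        gap = load(hi) - load(lo)
        # only moves strictly smaller than the gap reduce the imbalance
        candidates = [t for t in superpeers[hi] if t < gap]
        if not candidates:
            return
        task = max(candidates)
        superpeers[hi].remove(task)
        superpeers[lo].append(task)
```

The `t < gap` test is what makes each move strictly decrease the sum of squared loads, so the loop terminates; it also shows why migration alone stalls when individual tasks are large relative to the gap, which motivates the thesis's second, tag-reordering scheme.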
Distributed aop middleware for large-scale scenarios
In this PhD dissertation we present a distributed middleware proposal for large-scale application development. Our main aim is to separate the distributed concerns of these applications, such as replication, so that they can be integrated independently and transparently. Our approach is based on implementing these concerns using the paradigm of distributed aspects, and benefits from peer-to-peer (P2P) network and aspect-oriented programming (AOP) substrates to provide them in a decentralized, decoupled, efficient, and transparent way. Our middleware architecture is divided into two layers: a composition model and a scalable deployment platform for distributed aspects. Finally, we demonstrate the viability and applicability of our model via implementation and experimentation with prototypes in real large-scale networks.
System support for keyword-based search in structured Peer-to-Peer systems
In this dissertation, we present protocols for building a distributed search infrastructure over structured Peer-to-Peer systems. Unlike existing search engines which consist of large server farms managed by a centralized authority, our approach makes use of a distributed set of end-hosts built out of commodity hardware. These end-hosts cooperatively construct and maintain the search infrastructure.
The main challenges with distributing such a system include node failures, churn, and data migration. Localities inherent in query patterns also cause load imbalances and hot spots that severely impair performance. Users of search systems want their results returned quickly, and in ranked order. Our main contribution is to show that a scalable, robust, and distributed search infrastructure can be built over existing Peer-to-Peer systems through the use of techniques that address these problems. We present a decentralized scheme for ranking search results without prohibitive network or storage overhead. We show that caching allows for efficient query evaluation and present a distributed data structure, called the View Tree, that enables efficient storage and retrieval of cached results. We also present a lightweight adaptive replication protocol, called LAR, that can adapt to different kinds of query streams and is extremely effective at eliminating hotspots. Finally, we present techniques for storing indexes reliably. Our approach is to use an adaptive partitioning protocol to store large indexes and employ efficient redundancy techniques to handle failures. Through detailed analysis and experiments we show that our techniques are efficient and scalable, and that they make distributed search feasible.
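The load-triggered replication idea behind protocols like LAR can be sketched as follows. This is our simplified illustration, not LAR's actual protocol: each node counts hits per key and, past an invented threshold, pushes a replica one hop back along the query path so later lookups terminate earlier.

```python
from collections import Counter

# Hedged sketch of adaptive, load-triggered replication: hot keys get
# replicated one hop back along the incoming query path, diverting future
# lookups away from the overloaded home node. Threshold and path are ours.

THRESHOLD = 3

class PathNode:
    def __init__(self, name):
        self.name = name
        self.store = {}             # key -> value (primary or replica)
        self.hits = Counter()

def lookup(path, key):
    """Walk the query path; the first node holding the key answers, and a
    hot key is replicated one hop earlier on the path."""
    for i, node in enumerate(path):
        if key in node.store:
            node.hits[key] += 1
            if node.hits[key] >= THRESHOLD and i > 0:
                path[i - 1].store[key] = node.store[key]   # push replica back
                node.hits[key] = 0
            return node.store[key], node.name
    raise KeyError(key)
```

After enough queries for the same key, upstream nodes absorb the load, which is the hotspot-elimination effect the dissertation evaluates.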
A framework for the dynamic management of Peer-to-Peer overlays
Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation.
At the core of the management framework lies the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an 'insider's' view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively.
To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed.
Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load-balancing, and offer improved peer transfer performance.
M-Grid: A distributed framework for multidimensional indexing and querying of location based big data
The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, handling of high access scalability, support for millions of users, real-time querying capability and analysis of large volumes of data. Cloud computing gave rise to a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional data cannot be processed efficiently, as key-value stores provide no means of accessing multiple attributes.
In this thesis we present MGrid, a unifying indexing framework which enables key-value stores to support multidimensional queries. We organize a set of nodes in a P-Grid overlay network, which provides fault tolerance and efficient query processing. We use a Hilbert space-filling-curve-based linearization technique, which preserves data locality, to efficiently manage multi-dimensional data in a key-value store. We propose algorithms to dynamically process range and k-nearest-neighbor (kNN) queries on linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent of the underlying storage layer and can be implemented on any cloud infrastructure. Experiments on Amazon EC2 show that MGrid achieves a performance improvement of three orders of magnitude over MapReduce and of four times over the MDHBase scheme.
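The linearization step can be illustrated with the classic iterative Hilbert-curve index computation for two dimensions (the thesis applies the idea with more machinery): an (x, y) cell in an n x n grid maps to a single curve position, and nearby cells tend to get nearby positions, which is what lets range and kNN queries scan contiguous key ranges.

```python
# The standard iterative Hilbert-curve conversion for a 2-D grid, shown as
# an illustration of the locality-preserving linearization described above.

def xy2d(n, x, y):
    """Hilbert index of cell (x, y) in an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate/flip the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Using `xy2d(n, x, y)` as the key in a key-value store means a 2-D range query becomes a small number of 1-D key-range scans.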
Efficient Peer-to-Peer Namespace Searches
In this paper we describe new methods for efficient and exact search
(keyword and full-text) in distributed namespaces. Our methods can be
used in conjunction with existing distributed lookup schemes, such as
Distributed Hash Tables and distributed directories. We describe how
indexes for implementing distributed searches can be efficiently
created, located, and stored. We describe techniques for creating
approximate indexes that can be used to bound the space requirement at
individual hosts; such techniques are particularly useful for full-text
searches that may require a very large number of individual indexes to
be created and maintained.
Our methods use a new distributed data structure called the view tree.
View trees can be used to efficiently cache and locate results from
prior queries. We describe how view trees are created and maintained.
We present experimental results, using large namespaces and realistic
data, showing that the techniques introduced in this paper can reduce
search overheads (both network and processing costs) by more than an
order of magnitude.
(UMIACS-TR-2004-13)
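The view-caching idea above can be sketched with a much-simplified structure (not the actual view tree): results cached for a subset of keywords can answer a broader conjunctive query by intersecting with the remaining posting lists.

```python
# Simplified sketch of cached-view reuse: a conjunctive keyword query is
# answered from the largest cached subset view, then narrowed with the
# remaining keywords. The index contents are invented.

class ViewCache:
    def __init__(self, index):
        self.index = index          # keyword -> set of document ids
        self.views = {}             # frozenset(keywords) -> cached result set

    def query(self, keywords):
        keywords = frozenset(keywords)
        # best cached view: the largest subset of the query's keywords
        usable = [v for v in self.views if v <= keywords]
        if usable:
            base = max(usable, key=len)
            result, remaining = set(self.views[base]), keywords - base
        else:
            result, remaining = None, keywords
        for kw in remaining:
            docs = self.index.get(kw, set())
            result = docs if result is None else result & docs
        result = result if result is not None else set()
        self.views[keywords] = result   # cache the new view for later reuse
        return result
```

The real view tree adds a distributed structure for locating such views across hosts; the subset-reuse logic is the part shown here.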
Dynamic data placement and discovery in wide-area networks
The workloads of online services and applications such as social networks, sensor data platforms and web search engines have become increasingly global and dynamic, posing new challenges for providing users with low latency access to data. To achieve this, these services typically leverage a multi-site wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which are determined by the data placement and discovery strategies. Current strategies are static: they offer low latencies upon deployment but degraded performance under dynamic workloads.
We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access pattern similarities. By dynamically clustering data according to DAC, network traffic in clusters is kept local. We utilise DAC as a key component in reducing access latencies for two application scenarios, emphasising different aspects of the problem:
The first scenario assumes the fixed placement of data at sites, and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online and distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads.
The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data centre deployment to provide users with low latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low latency access to data for static and dynamic workloads.
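The data activity correlation idea can be sketched with one plausible choice of similarity metric (the metric, threshold, and data below are our assumptions, not the thesis's): items whose per-site access vectors look alike are grouped so that traffic within a cluster stays local.

```python
import math

# Illustrative DAC-style clustering: items are grouped when their access
# vectors (accesses per site) are similar. Cosine similarity and the 0.8
# threshold are invented for this sketch.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dac_clusters(access, threshold=0.8):
    """access: item -> per-site access counts. Greedy single-link grouping
    of items whose similarity to any cluster member exceeds the threshold."""
    clusters = []
    for item, vec in access.items():
        for cluster in clusters:
            if any(cosine(vec, access[m]) > threshold for m in cluster):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters
```

The thesis maintains such clusters online with split-and-merge and elastic partition algorithms; this static sketch only shows the correlation-based grouping itself.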
Clouder: a flexible large scale decentralized object store
Large scale data stores were initially introduced to support a few concrete extreme-scale
applications such as social networks. Their scalability and availability
requirements often justify sacrificing richer data and processing models, and even
elementary data consistency. In strong contrast with traditional relational databases
(RDBMS), large scale data stores present very simple data models and APIs, lacking
most of the established relational data management operations, and relax consistency
guarantees, providing only eventual consistency.
With a number of alternatives now available and mature, there is an increasing
willingness to use them in a wider and more diverse spectrum of applications,
skewing the current trade-off towards the needs of common business users and easing
the migration from current RDBMS. This is particularly so when they are used in the
context of a Cloud solution such as a Platform as a Service (PaaS).
This thesis aims at reducing the gap between traditional RDBMS and large scale
data stores by seeking mechanisms that provide additional consistency guarantees and
higher level data processing primitives in large scale data stores. The devised
mechanisms should not hinder the scalability and dependability of large scale data
stores. Regarding higher level data processing primitives, this thesis explores two
complementary approaches: extending data stores with additional operations, such as
general multi-item operations; and coupling data stores with other existing processing
facilities without hindering scalability.
We address these challenges with a new architecture for large scale data stores, efficient
multi-item access for large scale data stores, and SQL processing atop large scale
data stores. The novel architecture allows the right trade-offs to be found among
flexible usage, efficiency, and fault tolerance. To efficiently support multi-item
access, we extend the data models of first-generation large scale data stores with
tags and a multi-tuple data placement strategy, which allow large sets of related data
to be stored and retrieved at once. For efficient SQL support atop scalable data
stores, we devise design modifications to existing relational SQL query engines,
allowing them to be distributed.
We demonstrate our approaches with running prototypes and extensive experimental
evaluation using proper workloads.
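The tag-based multi-item access described above can be sketched minimally as follows. The API names and the co-location strategy shown are our illustration, not Clouder's actual interface:

```python
# Loose sketch of tag-based multi-tuple placement: tuples sharing a tag are
# co-located under a tag-derived key so one logical get returns the whole
# related set. Method names are illustrative.

class TaggedStore:
    def __init__(self):
        self.kv = {}                          # physical key -> value

    def put(self, tag, item_key, value):
        bucket = self.kv.setdefault(("tag", tag), {})
        bucket[item_key] = value              # co-located with its tag group

    def get_one(self, tag, item_key):
        return self.kv[("tag", tag)][item_key]

    def get_all(self, tag):
        """Multi-item access: the whole related set in one retrieval."""
        return dict(self.kv.get(("tag", tag), {}))
```

Co-locating a tag's tuples under one key is what makes retrieving a large related set a single operation instead of many scattered gets.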