3,383 research outputs found
Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks
Peer-to-Peer(P2P) systems have emerged as a promising paradigm to structure large scale distributed systems. They provide a robust, scalable and decentralized way to share and publish data.The unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems pertaining to knowledge management and search have focused on simple term based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficiently content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting.
In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past peer interactions and increasing their performance with time.My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms which provide peers in unstructured P2P overlays with the state necessary in order to efficiently locate, disseminate and replicate objects is presented. Also, Existing approaches to federated search are adapted and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments. Furthermore,autonomous and decentralized algorithms that reorganizes an unstructured network topology into a one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments
階層型ピア・ツー・ピアファイル検索のための負荷管理の研究
In a Peer-to-Peer (P2P) system, multiple interconnected peers or nodes contribute a portion of their resources (e.g., files, disk storage, network bandwidth) in order to inexpensively handle tasks that would normally require powerful servers. Since the emergency of P2P file sharing, load balancing has been considered as a primary concern, as well as other issues such as autonomy, fault tolerance and security. In a process of file search, a heavily loaded peer may incur a long latency or failure in query forwarding or responding. If there are many such peers in a system, it may cause link congestion or path congestion, and consequently affect the performance of overall system. To avoid such situation, some of general techniques used in Web systems such as caching and paging are adopted into P2P systems. However, it is highly insufficient for load balancing since peers often exhibit high heterogeneity and dynamicity in P2P systems. To overcome such a difficulty, the use of super-peers is currently being the most promising approach in optimizing allocation of system load to peers, i.e., it allocates more system load to high capacity and stable super-peers by assigning task of index maintenance and retrieval to them.
In this thesis, we focused on two kinds of super-peer based hierarchical architectures of P2P systems, which are distinguished by the organization of super-peers. In each of them, we discussed system load allocation, and proposed novel load balancing algorithms for alleviating load imbalance of super-peers, aiming to decrease average and variation of query response time during index retrieval process.
More concretely, in this thesis, our contribution to load management solutions for hierarchical P2P file search are the following:
• In Qin’s hierarchical architecture, indices of files held by the user peers in the bottom layer are stored at the super-peers in the middle layer, and the correlation of those two bottom layers is controlled by the central server(s) in the top layer using the notion of tags. In Qin’s system, a heavily loaded super-peer can move excessive load to a lightly loaded super-peer by using the notion of task migration. However, such a task migration approach is not sufficient to balance the load of super-peers if the size of tasks is highly imbalanced. To overcome such an issue, in this thesis, we propose two task migration schemes for this architecture, aiming to ensure an even load distribution over the super-peers. The first scheme controls the load of each task in order to decrease the total cost of task migration. The second scheme directly balances the load over tasks by reordering the priority of tags used in the query forwarding step. The effectiveness of the proposed schemes are evaluated by simulation. The result of simulations indicates that all the schemes can work in coordinate, in alleviating the bottleneck situation of super-peers.
• In DHT-based super-peer architecture, indices of files held by the user peers in the lower layer are stored at the DHT connected super-peers in the upper layer. In DHT-based super-peer systems, the skewness of user’s preference regarding keywords contained in multi-keyword query causes query load imbalance of super-peers that combines both routing and response load. Although index replication has a great potential for alleviating this problem, existing schemes did not explicitly address it or incurred high cost. To overcome such an issue, in this thesis, we propose an integrated solution that consists of three replication schemes to alleviate query load imbalance while minimizing the cost. The first scheme is an active index replication in order to decrease routing load in the super-peer layer, and distribute response load of an index among super-peers that stored the replica. The second scheme is a proactive pointer replication that places location information of an index, for reducing maintenance cost between the index and its replicas. The third scheme is a passive index replication that guarantees the maximum query load of super-peers. The result of simulations indicates that the proposed schemes can help alleviating the query load imbalance of super-peers. Moreover, by comparison it was found that our schemes are more cost-effective on placing replicas than other approaches.広島大学(Hiroshima University)博士(工学)Doctor of Engineering in Information Engineeringdoctora
Routing and caching on DHTS
L'obiettivo della tesi e' quello di analizzare i principali meccanismi di caching e routing implementati oggigiorno nelle DHT piu' utilizzate.
In particolare, la nostra analisi mostra come tali meccanismi siano sostanzialmente inefficaci nel garantire un adeguato load balancing tra i peers; le principali cause di questo fenomeno sono individuate nella struttura, eccessivamente rigida, adottata dalle DHT e nella mancanza di correlazione tra meccanismi di routing e di caching.
Viene quindi proposto un diverso overlay, organizzato in base a una struttura ipercubica, che permetta di adottare un algoritmo di routing piu' flessibile e di sviluppare due meccanismi di caching e routing strettamente interconnessi.
In particolare, l'overlay ottenuto riesce a garantire che ogni nodo subisca un carico al piu' costante, con una taglia di cache costante e una complessita' di routing polilogaritmica nel caso peggior
A HOLISTIC REDUNDANCY- AND INCENTIVE-BASED FRAMEWORK TO IMPROVE CONTENT AVAILABILITY IN PEER-TO-PEER NETWORKS
Peer-to-Peer (P2P) technology has emerged as an important alternative to the traditional client-server communication paradigm to build large-scale distributed systems. P2P enables the creation, dissemination and access to information at low cost and without the need of dedicated coordinating entities. However, existing P2P systems fail to provide high-levels of content availability, which limit their applicability and adoption. This dissertation takes a holistic approach to device mechanisms to improve content availability in large-scale P2P systems.
Content availability in P2P can be impacted by hardware failures and churn. Hardware failures, in the form of disk or node failures, render information inaccessible. Churn, an inherent property of P2P, is the collective effect of the users’ uncoordinated behavior, which occurs when a large percentage of nodes join and leave frequently. Such a behavior reduces content availability significantly. Mitigating the combined effect of hardware failures and churn on content availability in P2P requires new and innovative solutions that go beyond those applied in existing distributed systems. To addresses this challenge, the thesis proposes two complementary, low cost mechanisms, whereby nodes self-organize to overcome failures and improve content availability. The first mechanism is a low complexity and highly flexible hybrid redundancy scheme, referred to as Proactive Repair (PR). The second mechanism is an incentive-based scheme that promotes cooperation and enforces fair exchange of resources among peers. These mechanisms provide the basis for the development of distributed self-organizing algorithms to automate PR and, through incentives, maximize their effectiveness in realistic P2P environments.
Our proposed solution is evaluated using a combination of analytical and experimental methods. The analytical models are developed to determine the availability and repair cost properties of PR. The results indicate that PR’s repair cost outperforms other redundancy schemes. The experimental analysis was carried out using simulation and the development of a testbed. The simulation results confirm that PR improves content availability in P2P. The proposed mechanisms are implemented and tested using a DHT-based P2P application environment. The experimental results indicate that the incentive-based mechanism can promote fair exchange of resources and limits the impact of uncooperative behaviors such as “free-riding”
Trade-off among timeliness, messages and accuracy for large-Ssale information management
The increasing amount of data and the number of nodes in large-scale environments
require new techniques for information management. Examples of such environments
are the decentralized infrastructures of Computational Grid and Computational
Cloud applications. These large-scale applications need different kinds
of aggregated information such as resource monitoring, resource discovery or economic
information. The challenge of providing timely and accurate information
in large scale environments arise from the distribution of the information. Reasons
for delays in distributed information system are a long information transmission
time due to the distribution, churn and failures.
A problem of large applications such as peer-to-peer (P2P) systems is the increasing
retrieval time of the information due to the decentralization of the data
and the failure proneness. However, many applications need a timely information
provision. Another problem is an increasing network consumption when the application
scales to millions of users and data. Using approximation techniques allows
reducing the retrieval time and the network consumption. However, the usage of
approximation techniques decreases the accuracy of the results. Thus, the remaining
problem is to offer a trade-off in order to solve the conflicting requirements of
fast information retrieval, accurate results and low messaging cost.
Our goal is to reach a self-adaptive decision mechanism to offer a trade-off
among the retrieval time, the network consumption and the accuracy of the result.
Self-adaption enables distributed software to modify its behavior based on
changes in the operating environment. In large-scale information systems that use
hierarchical data aggregation, we apply self-adaptation to control the approximation
used for the information retrieval and reduces the network consumption and
the retrieval time. The hypothesis of the thesis is that approximation techniquescan reduce the retrieval time and the network consumption while guaranteeing an
accuracy of the results, while considering user’s defined priorities.
First, this presented research addresses the problem of a trade-off among a
timely information retrieval, accurate results and low messaging cost by proposing
a summarization algorithm for resource discovery in P2P-content networks.
After identifying how summarization can improve the discovery process, we propose
an algorithm which uses a precision-recall metric to compare the accuracy
and to offer a user-driven trade-off. Second, we propose an algorithm that applies
a self-adaptive decision making on each node. The decision is about the pruning
of the query and returning the result instead of continuing the query. The pruning
reduces the retrieval time and the network consumption at the cost of a lower accuracy
in contrast to continuing the query. The algorithm uses an analytic hierarchy
process to assess the user’s priorities and to propose a trade-off in order to satisfy
the accuracy requirements with a low message cost and a short delay.
A quantitative analysis evaluates our presented algorithms with a simulator,
which is fed with real data of a network topology and the nodes’ attributes. The
usage of a simulator instead of the prototype allows the evaluation in a large scale
of several thousands of nodes. The algorithm for content summarization is evaluated
with half a million of resources and with different query types. The selfadaptive
algorithm is evaluated with a simulator of several thousands of nodes
that are created from real data. A qualitative analysis addresses the integration
of the simulator’s components in existing market frameworks for Computational
Grid and Cloud applications.
The proposed content summarization algorithm reduces the information retrieval
time from a logarithmic increase to a constant factor. Furthermore, the
message size is reduced significantly by applying the summarization technique.
For the user, a precision-recall metric allows defining the relation between the retrieval
time and the accuracy. The self-adaptive algorithm reduces the number of
messages needed from an exponential increase to a constant factor. At the same
time, the retrieval time is reduced to a constant factor under an increasing number
of nodes. Finally, the algorithm delivers the data with the required accuracy
adjusting the depth of the query according to the network conditions.La gestió de la informació exigeix noves tècniques que tractin amb la creixent
quantitat de dades i nodes en entorns a gran escala. Alguns exemples d’aquests
entorns són les infraestructures descentralitzades de Computacional Grid i Cloud.
Les aplicacions a gran escala necessiten diferents classes d’informació agregada
com monitorització de recursos i informació econòmica. El desafiament de proporcionar
una provisió ràpida i acurada d’informació en ambients de grans escala
sorgeix de la distribució de la informació. Una raó és que el sistema d’informació
ha de tractar amb l’adaptabilitat i fracassos d’aquests ambients.
Un problema amb aplicacions molt grans com en sistemes peer-to-peer (P2P)
és el creixent temps de recuperació de l’informació a causa de la descentralització
de les dades i la facilitat al fracàs. No obstant això, moltes aplicacions necessiten
una provisió d’informació puntual. A més, alguns usuaris i aplicacions accepten
inexactituds dels resultats si la informació es reparteix a temps. A més i més, el
consum de xarxa creixent fa que sorgeixi un altre problema per l’escalabilitat del
sistema. La utilització de tècniques d’aproximació permet reduir el temps de recuperació
i el consum de xarxa. No obstant això, l’ús de tècniques d’aproximació
disminueix la precisió dels resultats. Així, el problema restant és oferir un compromís
per resoldre els requisits en conflicte d’extracció de la informació ràpida,
resultats acurats i cost d’enviament baix.
El nostre objectiu és obtenir un mecanisme de decisió completament autoadaptatiu
per tal d’oferir el compromís entre temps de recuperació, consum de
xarxa i precisió del resultat. Autoadaptacío permet al programari distribuït modificar
el seu comportament en funció dels canvis a l’entorn d’operació. En sistemes
d’informació de gran escala que utilitzen agregació de dades jeràrquica,
l’auto-adaptació permet controlar l’aproximació utilitzada per a l’extracció de la informació i redueixen el consum de xarxa i el temps de recuperació. La hipòtesi
principal d’aquesta tesi és que els tècniques d’aproximació permeten reduir el
temps de recuperació i el consum de xarxa mentre es garanteix una precisió adequada
definida per l’usari.
La recerca que es presenta, introdueix un algoritme de sumarització de continguts
per a la descoberta de recursos a xarxes de contingut P2P. Després d’identificar
com sumarització pot millorar el procés de descoberta, proposem una mètrica que
s’utilitza per comparar la precisió i oferir un compromís definit per l’usuari. Després,
introduïm un algoritme nou que aplica l’auto-adaptació a un ordre per satisfer
els requisits de precisió amb un cost de missatge baix i un retard curt. Basat
en les prioritats d’usuari, l’algoritme troba automàticament un compromís.
L’anàlisi quantitativa avalua els algoritmes presentats amb un simulador per
permetre l’evacuació d’uns quants milers de nodes. El simulador s’alimenta amb
dades d’una topologia de xarxa i uns atributs dels nodes reals. L’algoritme de
sumarització de contingut s’avalua amb mig milió de recursos i amb diferents
tipus de sol·licituds. L’anàlisi qualitativa avalua la integració del components del
simulador en estructures de mercat existents per a aplicacions de Computacional
Grid i Cloud. Així, la funcionalitat implementada del simulador (com el procés
d’agregació i la query language) és comprovada per la integració de prototips.
L’algoritme de sumarització de contingut proposat redueix el temps d’extracció
de l’informació d’un augment logarítmic a un factor constant. A més, també permet
que la mida del missatge es redueix significativament. Per a l’usuari, una
precision-recall mètric permet definir la relació entre el nivell de precisió i el
temps d’extracció de la informació. Alhora, el temps de recuperació es redueix
a un factor constant sota un nombre creixent de nodes. Finalment, l’algoritme
reparteix les dades amb la precisió exigida i ajusta la profunditat de la sol·licitud
segons les condicions de xarxa. Els algoritmes introduïts són prometedors per ser
utilitzats per l’agregació d’informació en nous sistemes de gestió de la informació
de gran escala en el futur
Trade-off among timeliness, messages and accuracy for large-Ssale information management
The increasing amount of data and the number of nodes in large-scale environments
require new techniques for information management. Examples of such environments
are the decentralized infrastructures of Computational Grid and Computational
Cloud applications. These large-scale applications need different kinds
of aggregated information such as resource monitoring, resource discovery or economic
information. The challenge of providing timely and accurate information
in large scale environments arise from the distribution of the information. Reasons
for delays in distributed information system are a long information transmission
time due to the distribution, churn and failures.
A problem of large applications such as peer-to-peer (P2P) systems is the increasing
retrieval time of the information due to the decentralization of the data
and the failure proneness. However, many applications need a timely information
provision. Another problem is an increasing network consumption when the application
scales to millions of users and data. Using approximation techniques allows
reducing the retrieval time and the network consumption. However, the usage of
approximation techniques decreases the accuracy of the results. Thus, the remaining
problem is to offer a trade-off in order to solve the conflicting requirements of
fast information retrieval, accurate results and low messaging cost.
Our goal is to reach a self-adaptive decision mechanism to offer a trade-off
among the retrieval time, the network consumption and the accuracy of the result.
Self-adaption enables distributed software to modify its behavior based on
changes in the operating environment. In large-scale information systems that use
hierarchical data aggregation, we apply self-adaptation to control the approximation
used for the information retrieval and reduces the network consumption and
the retrieval time. The hypothesis of the thesis is that approximation techniquescan reduce the retrieval time and the network consumption while guaranteeing an
accuracy of the results, while considering user’s defined priorities.
First, this presented research addresses the problem of a trade-off among a
timely information retrieval, accurate results and low messaging cost by proposing
a summarization algorithm for resource discovery in P2P-content networks.
After identifying how summarization can improve the discovery process, we propose
an algorithm which uses a precision-recall metric to compare the accuracy
and to offer a user-driven trade-off. Second, we propose an algorithm that applies
a self-adaptive decision making on each node. The decision is about the pruning
of the query and returning the result instead of continuing the query. The pruning
reduces the retrieval time and the network consumption at the cost of a lower accuracy
in contrast to continuing the query. The algorithm uses an analytic hierarchy
process to assess the user’s priorities and to propose a trade-off in order to satisfy
the accuracy requirements with a low message cost and a short delay.
A quantitative analysis evaluates our presented algorithms with a simulator,
which is fed with real data of a network topology and the nodes’ attributes. The
usage of a simulator instead of the prototype allows the evaluation in a large scale
of several thousands of nodes. The algorithm for content summarization is evaluated
with half a million of resources and with different query types. The selfadaptive
algorithm is evaluated with a simulator of several thousands of nodes
that are created from real data. A qualitative analysis addresses the integration
of the simulator’s components in existing market frameworks for Computational
Grid and Cloud applications.
The proposed content summarization algorithm reduces the information retrieval
time from a logarithmic increase to a constant factor. Furthermore, the
message size is reduced significantly by applying the summarization technique.
For the user, a precision-recall metric allows defining the relation between the retrieval
time and the accuracy. The self-adaptive algorithm reduces the number of
messages needed from an exponential increase to a constant factor. At the same
time, the retrieval time is reduced to a constant factor under an increasing number
of nodes. Finally, the algorithm delivers the data with the required accuracy
adjusting the depth of the query according to the network conditions.La gestió de la informació exigeix noves tècniques que tractin amb la creixent
quantitat de dades i nodes en entorns a gran escala. Alguns exemples d’aquests
entorns són les infraestructures descentralitzades de Computacional Grid i Cloud.
Les aplicacions a gran escala necessiten diferents classes d’informació agregada
com monitorització de recursos i informació econòmica. El desafiament de proporcionar
una provisió ràpida i acurada d’informació en ambients de grans escala
sorgeix de la distribució de la informació. Una raó és que el sistema d’informació
ha de tractar amb l’adaptabilitat i fracassos d’aquests ambients.
Un problema amb aplicacions molt grans com en sistemes peer-to-peer (P2P)
és el creixent temps de recuperació de l’informació a causa de la descentralització
de les dades i la facilitat al fracàs. No obstant això, moltes aplicacions necessiten
una provisió d’informació puntual. A més, alguns usuaris i aplicacions accepten
inexactituds dels resultats si la informació es reparteix a temps. A més i més, el
consum de xarxa creixent fa que sorgeixi un altre problema per l’escalabilitat del
sistema. La utilització de tècniques d’aproximació permet reduir el temps de recuperació
i el consum de xarxa. No obstant això, l’ús de tècniques d’aproximació
disminueix la precisió dels resultats. Així, el problema restant és oferir un compromís
per resoldre els requisits en conflicte d’extracció de la informació ràpida,
resultats acurats i cost d’enviament baix.
El nostre objectiu és obtenir un mecanisme de decisió completament autoadaptatiu
per tal d’oferir el compromís entre temps de recuperació, consum de
xarxa i precisió del resultat. Autoadaptacío permet al programari distribuït modificar
el seu comportament en funció dels canvis a l’entorn d’operació. En sistemes
d’informació de gran escala que utilitzen agregació de dades jeràrquica,
l’auto-adaptació permet controlar l’aproximació utilitzada per a l’extracció de la informació i redueixen el consum de xarxa i el temps de recuperació. La hipòtesi
principal d’aquesta tesi és que els tècniques d’aproximació permeten reduir el
temps de recuperació i el consum de xarxa mentre es garanteix una precisió adequada
definida per l’usari.
La recerca que es presenta, introdueix un algoritme de sumarització de continguts
per a la descoberta de recursos a xarxes de contingut P2P. Després d’identificar
com sumarització pot millorar el procés de descoberta, proposem una mètrica que
s’utilitza per comparar la precisió i oferir un compromís definit per l’usuari. Després,
introduïm un algoritme nou que aplica l’auto-adaptació a un ordre per satisfer
els requisits de precisió amb un cost de missatge baix i un retard curt. Basat
en les prioritats d’usuari, l’algoritme troba automàticament un compromís.
L’anàlisi quantitativa avalua els algoritmes presentats amb un simulador per
permetre l’evacuació d’uns quants milers de nodes. El simulador s’alimenta amb
dades d’una topologia de xarxa i uns atributs dels nodes reals. L’algoritme de
sumarització de contingut s’avalua amb mig milió de recursos i amb diferents
tipus de sol·licituds. L’anàlisi qualitativa avalua la integració del components del
simulador en estructures de mercat existents per a aplicacions de Computacional
Grid i Cloud. Així, la funcionalitat implementada del simulador (com el procés
d’agregació i la query language) és comprovada per la integració de prototips.
L’algoritme de sumarització de contingut proposat redueix el temps d’extracció
de l’informació d’un augment logarítmic a un factor constant. A més, també permet
que la mida del missatge es redueix significativament. Per a l’usuari, una
precision-recall mètric permet definir la relació entre el nivell de precisió i el
temps d’extracció de la informació. Alhora, el temps de recuperació es redueix
a un factor constant sota un nombre creixent de nodes. Finalment, l’algoritme
reparteix les dades amb la precisió exigida i ajusta la profunditat de la sol·licitud
segons les condicions de xarxa. Els algoritmes introduïts són prometedors per ser
utilitzats per l’agregació d’informació en nous sistemes de gestió de la informació
de gran escala en el futur.Postprint (published version
Resource Management in a Peer to Peer Cloud Network for IoT
Software-Defined Internet of Things (SDIoT) is defined as merging heterogeneous objects in a form of interaction among physical and virtual entities. Large scale of data centers, heterogeneity issues and their interconnections have made the resource management a hard problem specially when there are different actors in cloud system with different needs. Resource management is a vital requirement to achieve robust networks specially with facing continuously increasing amount of heterogeneous resources and devices to the network. The goal of this paper is reviews to address IoT resource management issues in cloud computing services. We discuss the bottlenecks of cloud networks for IoT services such as mobility. We review Fog computing in IoT services to solve some of these issues. It provides a comprehensive literature review of around one hundred studies on resource management in Peer to Peer Cloud Networks and IoT. It is very important to find a robust design to efficiently manage and provision requests and available resources. We also reviewed different search methodologies to help clients find proper resources to answer their needs
- …