749 research outputs found
`Q-Feed' - An Effective Solution for the Free-riding Problem in Unstructured P2P Networks
This paper presents a solution for reducing the ill effects of free-riders in
decentralised unstructured P2P networks. An autonomous replication scheme is
proposed to improve the availability and enhance system performance. Q-learning
is widely employed in different situations to improve the accuracy in decision
making by each peer. Based on the performance of neighbours of a peer, every
neighbour is awarded different levels of ranks. At the same time a
low-performing node is allowed to improve its rank in different ways.
Simulation results show that Q-learning based free riding control mechanism
effectively limits the services received by free-riders and also encourages the
low-performing neighbours to improve their position. The popular files are
autonomously replicated to nodes possessing required parameters. Due to this
improvement of quantity of popular files, free riders are given opportunity to
lift their position for active participation in the network for sharing files.
Q-feed effectively manages queries from free riders and reduces network traffic
significantlyComment: 14 pages, 10 figure
Exploring heterogeneity of unreliable machines for p2p backup
P2P architecture is a viable option for enterprise backup. In contrast to
dedicated backup servers, nowadays a standard solution, making backups directly
on organization's workstations should be cheaper (as existing hardware is
used), more efficient (as there is no single bottleneck server) and more
reliable (as the machines are geographically dispersed).
We present the architecture of a p2p backup system that uses pairwise
replication contracts between a data owner and a replicator. In contrast to
standard p2p storage systems using directly a DHT, the contracts allow our
system to optimize replicas' placement depending on a specific optimization
strategy, and so to take advantage of the heterogeneity of the machines and the
network. Such optimization is particularly appealing in the context of backup:
replicas can be geographically dispersed, the load sent over the network can be
minimized, or the optimization goal can be to minimize the backup/restore time.
However, managing the contracts, keeping them consistent and adjusting them in
response to dynamically changing environment is challenging.
We built a scientific prototype and ran the experiments on 150 workstations
in the university's computer laboratories and, separately, on 50 PlanetLab
nodes. We found out that the main factor affecting the quality of the system is
the availability of the machines. Yet, our main conclusion is that it is
possible to build an efficient and reliable backup system on highly unreliable
machines (our computers had just 13% average availability)
Data sharing in DHT based P2P systems
International audienceThe evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the "extreme" characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases forproviding data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identies important future research trends in data management in P2P DHT systems
A lightweight distributed super peer election algorithm for unstructured dynamic P2P systems
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de ComputadoresNowadays with the current growth of information exchange, and the increasing mobility of devices, it becomes essential to use technology to monitor this development. For that P2P networks are used, the exchange of information between agencies is facilitated, these now being applied in mobile networks, including MANETs, where they have special features such as the fact that they are semi-centralized, where it takes peers more ability to make a greater role in the network. But those peer with more capacity, which are used in the optimization of various parameters of these systems, such as optimization\to research, are difficult to identify due to the fact that the network does not have a fixed topology, be constantly changing, (we like to go online and offline, to change position, etc.) and not to allow the exchange of large messages. To this end, this thesis proposes a distributed election algorithm of us greater capacity among several possible goals, enhance research in the network. This includes distinguishing characteristics, such as election without global knowledge network, minimal exchange of messages, distributed decision made without dependence on us and the possibility of influencing the election outcome as the special needs of the network
Implications of query caching for JXTA peers
This dissertation studies the caching of queries and how to cache in an efficient way, so that retrieving previously accessed data does not need any intermediary nodes between the data-source peer and the querying peer in super-peer P2P network. A precise algorithm was devised that demonstrated how queries can be deconstructed to provide greater flexibility for reusing their constituent elements. It showed how subsequent queries can make use of more than one previous query and any part of those queries to reconstruct direct data communication with one or more source peers that have supplied data previously. In effect, a new query can search and exploit the entire cached list of queries to construct the list of the data locations it requires that might match any locations previously accessed. The new method increases the likelihood of repeat queries being able to reuse earlier queries and provides a viable way of by-passing shared data indexes in structured networks. It could also increase the efficiency of unstructured networks by reducing traffic and the propensity for network flooding. In addition, performance evaluation for predicting query routing performance by using a UML sequence diagram is introduced. This new method of performance evaluation provides designers with information about when it is most beneficial to use caching and how the peer connections can optimize its exploitation
Trade-off among timeliness, messages and accuracy for large-Ssale information management
The increasing amount of data and the number of nodes in large-scale environments
require new techniques for information management. Examples of such environments
are the decentralized infrastructures of Computational Grid and Computational
Cloud applications. These large-scale applications need different kinds
of aggregated information such as resource monitoring, resource discovery or economic
information. The challenge of providing timely and accurate information
in large scale environments arise from the distribution of the information. Reasons
for delays in distributed information system are a long information transmission
time due to the distribution, churn and failures.
A problem of large applications such as peer-to-peer (P2P) systems is the increasing
retrieval time of the information due to the decentralization of the data
and the failure proneness. However, many applications need a timely information
provision. Another problem is an increasing network consumption when the application
scales to millions of users and data. Using approximation techniques allows
reducing the retrieval time and the network consumption. However, the usage of
approximation techniques decreases the accuracy of the results. Thus, the remaining
problem is to offer a trade-off in order to solve the conflicting requirements of
fast information retrieval, accurate results and low messaging cost.
Our goal is to reach a self-adaptive decision mechanism to offer a trade-off
among the retrieval time, the network consumption and the accuracy of the result.
Self-adaption enables distributed software to modify its behavior based on
changes in the operating environment. In large-scale information systems that use
hierarchical data aggregation, we apply self-adaptation to control the approximation
used for the information retrieval and reduces the network consumption and
the retrieval time. The hypothesis of the thesis is that approximation techniquescan reduce the retrieval time and the network consumption while guaranteeing an
accuracy of the results, while considering user’s defined priorities.
First, this presented research addresses the problem of a trade-off among a
timely information retrieval, accurate results and low messaging cost by proposing
a summarization algorithm for resource discovery in P2P-content networks.
After identifying how summarization can improve the discovery process, we propose
an algorithm which uses a precision-recall metric to compare the accuracy
and to offer a user-driven trade-off. Second, we propose an algorithm that applies
a self-adaptive decision making on each node. The decision is about the pruning
of the query and returning the result instead of continuing the query. The pruning
reduces the retrieval time and the network consumption at the cost of a lower accuracy
in contrast to continuing the query. The algorithm uses an analytic hierarchy
process to assess the user’s priorities and to propose a trade-off in order to satisfy
the accuracy requirements with a low message cost and a short delay.
A quantitative analysis evaluates our presented algorithms with a simulator,
which is fed with real data of a network topology and the nodes’ attributes. The
usage of a simulator instead of the prototype allows the evaluation in a large scale
of several thousands of nodes. The algorithm for content summarization is evaluated
with half a million of resources and with different query types. The selfadaptive
algorithm is evaluated with a simulator of several thousands of nodes
that are created from real data. A qualitative analysis addresses the integration
of the simulator’s components in existing market frameworks for Computational
Grid and Cloud applications.
The proposed content summarization algorithm reduces the information retrieval
time from a logarithmic increase to a constant factor. Furthermore, the
message size is reduced significantly by applying the summarization technique.
For the user, a precision-recall metric allows defining the relation between the retrieval
time and the accuracy. The self-adaptive algorithm reduces the number of
messages needed from an exponential increase to a constant factor. At the same
time, the retrieval time is reduced to a constant factor under an increasing number
of nodes. Finally, the algorithm delivers the data with the required accuracy
adjusting the depth of the query according to the network conditions.La gestió de la informació exigeix noves tècniques que tractin amb la creixent
quantitat de dades i nodes en entorns a gran escala. Alguns exemples d’aquests
entorns són les infraestructures descentralitzades de Computacional Grid i Cloud.
Les aplicacions a gran escala necessiten diferents classes d’informació agregada
com monitorització de recursos i informació econòmica. El desafiament de proporcionar
una provisió rà pida i acurada d’informació en ambients de grans escala
sorgeix de la distribució de la informació. Una raó és que el sistema d’informació
ha de tractar amb l’adaptabilitat i fracassos d’aquests ambients.
Un problema amb aplicacions molt grans com en sistemes peer-to-peer (P2P)
és el creixent temps de recuperació de l’informació a causa de la descentralització
de les dades i la facilitat al fracà s. No obstant això, moltes aplicacions necessiten
una provisió d’informació puntual. A més, alguns usuaris i aplicacions accepten
inexactituds dels resultats si la informació es reparteix a temps. A més i més, el
consum de xarxa creixent fa que sorgeixi un altre problema per l’escalabilitat del
sistema. La utilització de tècniques d’aproximació permet reduir el temps de recuperació
i el consum de xarxa. No obstant això, l’ús de tècniques d’aproximació
disminueix la precisió dels resultats. AixÃ, el problema restant és oferir un compromÃs
per resoldre els requisits en conflicte d’extracció de la informació rà pida,
resultats acurats i cost d’enviament baix.
El nostre objectiu és obtenir un mecanisme de decisió completament autoadaptatiu
per tal d’oferir el compromÃs entre temps de recuperació, consum de
xarxa i precisió del resultat. AutoadaptacÃo permet al programari distribuït modificar
el seu comportament en funció dels canvis a l’entorn d’operació. En sistemes
d’informació de gran escala que utilitzen agregació de dades jerà rquica,
l’auto-adaptació permet controlar l’aproximació utilitzada per a l’extracció de la informació i redueixen el consum de xarxa i el temps de recuperació. La hipòtesi
principal d’aquesta tesi és que els tècniques d’aproximació permeten reduir el
temps de recuperació i el consum de xarxa mentre es garanteix una precisió adequada
definida per l’usari.
La recerca que es presenta, introdueix un algoritme de sumarització de continguts
per a la descoberta de recursos a xarxes de contingut P2P. Després d’identificar
com sumarització pot millorar el procés de descoberta, proposem una mètrica que
s’utilitza per comparar la precisió i oferir un compromÃs definit per l’usuari. Després,
introduïm un algoritme nou que aplica l’auto-adaptació a un ordre per satisfer
els requisits de precisió amb un cost de missatge baix i un retard curt. Basat
en les prioritats d’usuari, l’algoritme troba automà ticament un compromÃs.
L’anà lisi quantitativa avalua els algoritmes presentats amb un simulador per
permetre l’evacuació d’uns quants milers de nodes. El simulador s’alimenta amb
dades d’una topologia de xarxa i uns atributs dels nodes reals. L’algoritme de
sumarització de contingut s’avalua amb mig milió de recursos i amb diferents
tipus de sol·licituds. L’anà lisi qualitativa avalua la integració del components del
simulador en estructures de mercat existents per a aplicacions de Computacional
Grid i Cloud. AixÃ, la funcionalitat implementada del simulador (com el procés
d’agregació i la query language) és comprovada per la integració de prototips.
L’algoritme de sumarització de contingut proposat redueix el temps d’extracció
de l’informació d’un augment logarÃtmic a un factor constant. A més, també permet
que la mida del missatge es redueix significativament. Per a l’usuari, una
precision-recall mètric permet definir la relació entre el nivell de precisió i el
temps d’extracció de la informació. Alhora, el temps de recuperació es redueix
a un factor constant sota un nombre creixent de nodes. Finalment, l’algoritme
reparteix les dades amb la precisió exigida i ajusta la profunditat de la sol·licitud
segons les condicions de xarxa. Els algoritmes introduïts són prometedors per ser
utilitzats per l’agregació d’informació en nous sistemes de gestió de la informació
de gran escala en el futur.Postprint (published version
- …