Content Distribution in P2P Systems
The report provides a literature review of the state of the art in content distribution. The report's contributions are threefold. First, it gives more insight into traditional Content Distribution Networks (CDNs), their requirements and open issues. Second, it discusses Peer-to-Peer (P2P) systems as a cheap and scalable alternative to CDNs and extracts their design challenges. Finally, it evaluates the existing P2P systems dedicated to content distribution according to the identified requirements and challenges.
Improving Data Freshness in Replicated Databases
Projet RODIN. Data replication is often used in distributed database applications to improve data availability and performance. Replicated data must be periodically refreshed using update propagation strategies. Most current strategies guarantee mutual consistency of replicated data but are inefficient. Lazy (or asynchronous) replication is an alternative, more efficient solution in which mutual consistency is relaxed. It is needed in applications such as online financial transactions and telecommunication systems, which require high freshness of replicated data. In this case, the concept of freshness …
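A minimal sketch can show how lazy propagation trades consistency for efficiency. It assumes a single primary copy that commits writes locally and pushes them to secondary replicas only when asked to refresh. Freshness is measured here as the number of committed updates a replica has not yet applied. The class names and this staleness measure are hypothetical and are not the report's own definition of freshness, which is cut off in this excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A secondary copy that applies updates only when it is refreshed."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of primary updates applied so far


class LazyPrimary:
    """Primary copy that commits locally and propagates updates later."""

    def __init__(self, n_replicas: int):
        self.data: dict = {}
        self.log: list = []  # committed updates, in commit order
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        # Commit locally at once; replicas are not touched (lazy propagation).
        self.data[key] = value
        self.log.append((key, value))

    def refresh(self, i: int):
        # Push every update replica i has not seen yet, in commit order.
        r = self.replicas[i]
        for key, value in self.log[r.applied:]:
            r.data[key] = value
        r.applied = len(self.log)

    def staleness(self, i: int) -> int:
        # Hypothetical freshness measure: committed updates not yet applied.
        return len(self.log) - self.replicas[i].applied
```

Writes complete without waiting for any replica. The cost is that each replica is stale until its next refresh, which is why a refresh policy tuned to freshness requirements matters.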
Reducing Network Traffic in Unstructured P2P Systems Using Top-k Queries
A major problem of unstructured P2P systems is their heavy network traffic.
This is caused mainly by high numbers of query answers, many of which are
irrelevant for users. One solution to this problem is to use Top-k queries
whereby the user can specify a limited number (k) of the most relevant answers.
In this paper, we present FD (Fully Distributed), a framework for executing
Top-k queries in unstructured P2P systems, with the objective of reducing
network traffic. FD consists of a family of algorithms that are simple but
effective. FD is completely distributed, does not depend on the existence of
certain peers, and addresses the volatility of peers during query execution. We
validated FD through implementation over a 64-node cluster and simulation
using the BRITE topology generator and SimJava. Our performance evaluation
shows that FD can achieve major performance gains in terms of communication and
response time.
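The traffic argument for Top-k queries can be illustrated with a generic sketch. This is not FD's family of algorithms. Each peer forwards only its k best-scoring local answers, and the query originator merges these lists into a global top-k. The sketch assumes each item is held by exactly one peer and that scores are comparable across peers. Under those assumptions, any item in the global top-k is also in its peer's local top-k, so sending at most k answers per peer loses nothing. The function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Answer = Tuple[float, str]  # (relevance score, item id)


def local_top_k(items: Dict[str, float], k: int) -> List[Answer]:
    """Answers one peer sends back: its k highest-scoring local items."""
    return heapq.nlargest(k, ((score, item) for item, score in items.items()))


def merge_top_k(partials: List[List[Answer]], k: int) -> List[Answer]:
    """Merge the peers' local lists into the global top-k.

    Assuming each item lives on a single peer, every global top-k item is
    also in its own peer's local top-k, so the merge is exact.
    """
    return heapq.nlargest(k, (a for part in partials for a in part))


peers = [{"a": 0.9, "b": 0.2, "c": 0.5}, {"d": 0.8, "e": 0.7}, {"f": 0.1}]
partials = [local_top_k(p, 2) for p in peers]
print(merge_top_k(partials, 2))  # [(0.9, 'a'), (0.8, 'd')]
```

Here each peer sends at most two answers instead of all of its matches. This reduction is the one the abstract targets, before any smarter forwarding is added.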
Multi-objective scheduling of Scientific Workflows in multisite clouds
Clouds appear as appropriate infrastructures for executing Scientific Workflows (SWfs). A cloud is typically made of several sites (or data centers), each with its own resources and data. Thus, it becomes important to be able to execute some SWfs at more than one cloud site because of the geographical distribution of data or available resources among different cloud sites. Therefore, a major problem is how to execute a SWf in a multisite cloud while reducing execution time and monetary costs. In this paper, we propose a general solution based on multi-objective scheduling in order to execute SWfs in a multisite cloud. The solution consists of a multi-objective cost model including execution time and monetary costs, a Single Site Virtual Machine (VM) Provisioning approach (SSVP), and ActGreedy, a multisite scheduling approach. We present an experimental evaluation based on the execution of the SciEvol SWf in the Microsoft Azure cloud. The results reveal that our scheduling approach significantly outperforms two adapted baseline algorithms (which we propose by adapting two existing algorithms) and that the scheduling time is reasonable compared with genetic and brute-force algorithms. The results also show that our cost model is accurate and that SSVP can generate better VM provisioning plans than an existing approach.
Work partially funded by EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, INRIA (MUSIC project) and Microsoft (ZcloudFlow project), and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Kary Ocaña for her help in modeling and executing the SciEvol SWf.
Peer reviewed. Postprint (author's final draft).
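A sketch can show how a multi-objective cost model turns time and money into a single quantity to minimize. It uses a weighted sum of normalized execution time and monetary cost to choose how many VMs to provision at one site. Everything here is an assumption made for illustration. The workload is perfectly parallel, a fixed startup delay applies to each run, billing is per VM-hour without rounding, and the names `Site`, `estimate` and `best_vm_count` are hypothetical. This is neither the paper's cost model nor SSVP.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    vm_speed: float        # work units per hour per VM (assumed)
    price_per_hour: float  # monetary cost per VM-hour (assumed)
    startup_hours: float   # fixed provisioning delay (assumed)


def estimate(site: Site, work: float, n_vms: int):
    """Idealized estimate: perfect parallelism plus a fixed startup delay."""
    hours = site.startup_hours + work / (n_vms * site.vm_speed)
    money = n_vms * site.price_per_hour * hours
    return hours, money


def weighted_cost(hours, money, w_time, max_hours, max_money):
    """Weighted sum of normalized time and money, with w_time in [0, 1]."""
    return w_time * hours / max_hours + (1 - w_time) * money / max_money


def best_vm_count(site, work, w_time, max_vms, max_hours, max_money):
    """Exhaustively pick the VM count with the lowest weighted cost."""
    def cost(n):
        hours, money = estimate(site, work, n)
        return weighted_cost(hours, money, w_time, max_hours, max_money)
    return min(range(1, max_vms + 1), key=cost)
```

Under these assumptions, adding VMs shortens execution but raises cost, because the startup delay is paid once per VM. A weight near 1 therefore favors many VMs and a weight near 0 favors few. For example, with `Site("s1", 10, 1, 0.5)`, 100 work units, at most 16 VMs and normalizers 10.5 h and 18, a weight of 0.5 selects 6 VMs.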
Survey of data replication in P2P systems
Large-scale distributed collaborative applications are becoming common as a result of rapid progress in distributed technologies (grid, peer-to-peer, and mobile computing). Peer-to-peer (P2P) systems are particularly interesting for collaborative applications as they can scale without the need for powerful servers. In P2P systems, data storage and processing are distributed across autonomous peers, which can join and leave the network at any time. To provide high data availability in spite of such dynamic behavior, P2P systems rely on data replication. Some replication approaches assume static, read-only data (e.g. music files). Other solutions deal with updates, but they simplify replica management by assuming no update conflicts or single-master replication (i.e. only one copy of the replicated data accepts write operations). Advanced P2P applications, which must deal with semantically rich data (e.g. XML documents, relational tables, etc.) using a high-level SQL-like query language, are likely to need more sophisticated capabilities such as multi-master replication (i.e. all replicas accept write operations) and update conflict resolution. These issues are addressed by optimistic replication. Optimistic replication allows asynchronous updating of replicas so that applications can progress even though some nodes are disconnected or have failed. As a result, users can collaborate asynchronously. However, concurrent updates may cause replica divergence and conflicts, which must be reconciled. In this survey, we present an overview of data replication, focusing on the optimistic approach that provides good properties for dynamic environments. We also introduce P2P systems and the replication solutions they implement. In particular, we show that current P2P systems do not provide eventual consistency among replicas in the presence of updates, apart from the APPA system, a P2P data management system that we are building.
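In optimistic, multi-master replication, a replica needs a way to tell whether two versions were updated concurrently, which means they diverged and must be reconciled. Vector clocks are one standard mechanism for this. The sketch below is a generic illustration, not necessarily the mechanism used by APPA or by the systems the survey reviews. Replica ids are plain strings chosen for the example.

```python
from typing import Dict

VectorClock = Dict[str, int]  # replica id -> number of updates seen from it


def increment(clock: VectorClock, replica: str) -> VectorClock:
    """Return a copy of clock after a local update at `replica`."""
    new = dict(clock)
    new[replica] = new.get(replica, 0) + 1
    return new


def dominates(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(r, 0) >= n for r, n in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions: 'equal', 'before', 'after' or 'concurrent'."""
    a_ge, b_ge = dominates(a, b), dominates(b, a)
    if a_ge and b_ge:
        return "equal"
    if b_ge:
        return "before"      # a happened before b: b can safely replace a
    if a_ge:
        return "after"
    return "concurrent"      # divergent replicas: reconciliation is needed


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock of a reconciled version: element-wise maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}
```

Two updates made independently from the same version at different replicas yield clocks where neither dominates the other. `compare` then reports `"concurrent"`, which is the case where a conflict-resolution step has to run before the merged clock is adopted.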
Semantic Query Reformulation in Social PDMS
We consider social peer-to-peer data management systems (PDMS), where each
peer maintains both semantic mappings between its schema and some
acquaintances, and social links with peer friends. In this context,
reformulating a query from a peer's schema into other peers' schemas is a hard
problem, as it may generate as many rewritings as the set of mappings from that
peer to the outside and transitively onward, eventually traversing the entire
network. However, not all the obtained rewritings are relevant to a given
query. In this paper, we address this problem by inspecting semantic mappings
and social links to find only relevant rewritings. We propose a new notion of
'relevance' of a query with respect to a mapping, and, based on this notion, a
new semantic query reformulation approach for social PDMS, which achieves great
accuracy and flexibility. To rapidly find the most interesting mappings, we
combine several techniques: (i) social links are expressed as FOAF (Friend of a
Friend) links to characterize peers' friendship, and compact mapping summaries
are used to obtain mapping descriptions; (ii) local semantic views are special
views that contain information about external mappings; and (iii) gossiping
techniques improve the search of relevant mappings. Our experimental
evaluation, based on a prototype on top of PeerSim and a simulated network,
demonstrates that our solution yields greater recall compared to traditional
query translation approaches proposed in the literature.
Comment: 29 pages, 8 figures, query rewriting in PDMS
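A sketch can show why pruning by relevance keeps reformulation from flooding the whole network. It performs a breadth-first traversal of peer-to-peer schema mappings and follows only the mappings judged relevant to the current query. The relevance measure used here, the fraction of query attributes that a mapping translates, is a hypothetical stand-in. It is not the paper's notion of relevance. The mapping representation and function names are also assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# mappings[p][q] maps attribute names of peer p's schema to peer q's schema.
Mappings = Dict[str, Dict[str, Dict[str, str]]]


def relevance(query: Set[str], mapping: Dict[str, str]) -> float:
    """Hypothetical relevance: share of query attributes the mapping covers."""
    if not query:
        return 0.0
    return len(query & set(mapping)) / len(query)


def reformulate(origin: str, query: Set[str], mappings: Mappings,
                threshold: float = 1.0) -> List[Tuple[str, frozenset]]:
    """Breadth-first reformulation that skips mappings below `threshold`."""
    rewritings = []
    seen = {origin}
    frontier = deque([(origin, frozenset(query))])
    while frontier:
        peer, q = frontier.popleft()
        for target, mapping in mappings.get(peer, {}).items():
            if target in seen or relevance(set(q), mapping) < threshold:
                continue  # prune: already visited or mapping not relevant
            rewritten = frozenset(mapping[a] for a in q if a in mapping)
            seen.add(target)
            rewritings.append((target, rewritten))
            frontier.append((target, rewritten))
    return rewritings
```

Raising `threshold` cuts off partially matching mappings and everything reachable only through them. This illustrates the trade-off between the number of rewritings and recall that the abstract addresses.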
Flower-CDN: A hybrid P2P overlay for Efficient Query Processing in CDN
International audience. Many websites with a large user base, e.g., websites of non-profit organizations, do not have the financial means to install large web servers or use specialized content distribution networks such as Akamai. For those websites, we have developed Flower-CDN, a locality-aware, peer-to-peer content distribution network in which the users who are interested in a website support the distribution of its content. The idea is that peers keep the web pages they retrieve and later serve them to other peers that are close to them in locality. Our architecture is a hybrid between structured and unstructured networks. When a node requests a web page from a website for the first time, a locality-aware DHT quickly finds a peer in its neighborhood that has the web page available. Additionally, all peers in a given region that maintain content of a particular website build an unstructured content overlay. Within a content overlay, peers gossip information about their content, allowing the system to maintain accurate information despite failures and churn. In our detailed performance evaluation, we compare Flower-CDN with Squirrel, a content distribution network that is strictly based on DHTs and is not locality-aware. Compared to Squirrel, Flower-CDN reduces lookup latency by a factor of 9 and the transfer distance by a factor of 2. We also show that Flower-CDN's gossiping has low overhead and can be adjusted according to hit-ratio requirements and bandwidth availability.
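The gossip step inside a content overlay can be sketched as generic push-pull anti-entropy, in which peers exchange summaries of which pages each peer holds. This is not Flower-CDN's actual protocol. In the sketch, each peer stamps its own entry with a local counter, and on exchange both sides keep the most recent entry per peer. Because a peer's timestamps all come from its own counter, they increase monotonically. Class and function names are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

# view[peer_id] = (timestamp, pages that peer reported holding)
View = Dict[str, Tuple[int, Set[str]]]


class Peer:
    def __init__(self, pid: str):
        self.pid = pid
        self.clock = 0
        self.pages: Set[str] = set()
        self.view: View = {}

    def store(self, page: str):
        # Keep a retrieved page and refresh this peer's own view entry.
        self.pages.add(page)
        self.clock += 1
        self.view[self.pid] = (self.clock, set(self.pages))

    def merge(self, other: View):
        # For every peer, keep the entry with the highest timestamp.
        for pid, (ts, pages) in other.items():
            if pid not in self.view or ts > self.view[pid][0]:
                self.view[pid] = (ts, set(pages))

    def who_has(self, page: str) -> Set[str]:
        # Look up providers of a page in this peer's (possibly stale) view.
        return {pid for pid, (_, pages) in self.view.items() if page in pages}


def gossip_round(peers, rng: random.Random):
    # Push-pull exchange: each peer swaps views with one random partner.
    for p in peers:
        partner = rng.choice([q for q in peers if q is not p])
        mine, theirs = dict(p.view), dict(partner.view)
        p.merge(theirs)
        partner.merge(mine)
```

How often `gossip_round` runs is the tuning knob. More frequent rounds keep views fresher, which helps the hit ratio, at the price of more bandwidth. This mirrors the adjustable trade-off mentioned in the abstract.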
A Highly Robust P2P-CDN Under Large-Scale and Dynamic Participation
International audience. By building a P2P Content Distribution Network (CDN), peers collaborate to distribute the content of under-provisioned websites and to serve queries for larger audiences on behalf of the websites. This can prove very challenging, given the highly dynamic and autonomous participation of peers. Indeed, the P2P-CDN should adapt to increasing numbers of participants and provide robust algorithms under churn, because these issues have a key impact on performance. Also, the distribution of tasks and content over peers should take their interests into account in order to give them proper incentives to cooperate. Finally, the routing of queries should target peers that are close in locality and serve content from nearby providers to reduce network overload and achieve scalability. We have previously proposed a locality- and interest-aware P2P-CDN, Flower-CDN, which lacks efficient management of robustness and scalability. In this paper, we focus on these crucial shortcomings and propose PetalUp-CDN. The performance evaluation with respect to scalability and churn shows highly significant gains.
Location-Aware Index Caching and Searching for P2P Systems
International audience. Unstructured P2P networks remain widely deployed in file-sharing systems due to their simplicity. However, P2P traffic, mainly composed of repetitive query messages, contributes the largest portion of Internet traffic. The principal causes of this critical issue are search inefficiency and the construction of the P2P overlay without any knowledge of the underlying topology. In order to reduce redundant P2P traffic and to address the limitations of existing solutions, we propose a solution that performs index caching and efficient query routing while supporting keyword search. We aim at improving the probability of finding available copies of requested files by leveraging file replication. In addition, our scheme tries to direct queries to close results by using topological information about the physical distribution of files. We believe that traffic can be reduced and the user experience improved in terms of faster downloads, with minimal overhead.
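Index caching combined with locality-aware answers can be sketched as follows. A peer caches query hits it has seen, mapping each file to its known providers, and answers keyword queries from the cache with the closest provider first. This is a generic illustration, not the paper's scheme. It assumes that round-trip-time estimates to providers are available, that keyword search means a case-insensitive substring match on file names, and that the cache uses LRU eviction. The `IndexCache` name is hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


class IndexCache:
    """LRU cache from file name to known providers and their estimated RTT."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[str, Dict[str, float]]" = OrderedDict()

    def record(self, filename: str, provider: str, rtt_ms: float):
        # Cache a query hit observed by this peer (assumed RTT estimate).
        providers = self.entries.setdefault(filename, {})
        providers[provider] = rtt_ms
        self.entries.move_to_end(filename)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used file

    def search(self, keyword: str) -> List[Tuple[str, str, float]]:
        # Substring match on cached file names; closest provider first.
        hits = []
        for filename, providers in self.entries.items():
            if keyword.lower() in filename.lower():
                for provider, rtt in providers.items():
                    hits.append((filename, provider, rtt))
        return sorted(hits, key=lambda h: h[2])
```

A query answered from the cache does not need to be flooded. Ranking replicas by distance also steers the download toward a nearby copy, which is the combination of effects the abstract aims for.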
- …