An Analysis of BitTorrent Cross-Swarm Peer Participation and Geolocational Distribution
Peer-to-Peer (P2P) file-sharing has become increasingly popular in recent years. In 2012, it was reported that P2P traffic consumed over 5,374 petabytes per month, accounting for approximately 20.5% of consumer internet traffic. TV is the most popular content type on The Pirate Bay (the world's largest BitTorrent indexing website). In this paper, an analysis of the swarms of the
most popular pirated TV shows is conducted. The purpose of this data gathering
exercise is to enumerate the peer distribution at different geolocational
levels, to measure the temporal trend of the swarm and to discover the amount
of cross-swarm peer participation. Snapshots containing peer-related information on the unauthorised distribution of this content were collected at a high frequency, resulting in a more accurate landscape of the total involvement. The volume of data collected throughout the monitoring of the network exceeded 2 terabytes. The analysis and results presented can aid in network usage prediction, bandwidth provisioning, and future network design.
Comment: The First International Workshop on Hot Topics in Big Data and Networking (HotData I)
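The two headline measurements (cross-swarm peer participation and geolocational distribution) reduce to simple set operations over the collected snapshots. A minimal sketch in Python, assuming snapshot records of the form (peer IP, swarm, country), where the country field would come from an offline GeoIP lookup; the records below are illustrative, not the paper's actual schema:

```python
from collections import defaultdict

# Hypothetical snapshot records: (peer IP, swarm, country), as might be produced
# by periodic swarm scrapes plus an offline GeoIP lookup. Values are illustrative.
snapshots = [
    ("203.0.113.7", "show-a-s01e01", "US"),
    ("203.0.113.7", "show-b-s02e03", "US"),
    ("198.51.100.2", "show-a-s01e01", "DE"),
    ("192.0.2.44", "show-b-s02e03", "FR"),
]

swarms_per_peer = defaultdict(set)    # peer IP -> swarms it appeared in
peers_per_country = defaultdict(set)  # country -> distinct peer IPs observed
for ip, swarm, country in snapshots:
    swarms_per_peer[ip].add(swarm)
    peers_per_country[country].add(ip)

# Cross-swarm participation: peers observed in more than one swarm.
cross = {ip for ip, s in swarms_per_peer.items() if len(s) > 1}
print(f"{len(cross)} of {len(swarms_per_peer)} peers appear in 2+ swarms")

# Geolocational distribution at the country level.
for country, peers in sorted(peers_per_country.items()):
    print(country, len(peers))
```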
Static Web content distribution and request routing in a P2P overlay
The significance of collaboration over the Internet has become a cornerstone of modern computing, as the essence of information processing and content management has shifted to networked and Web-based systems. As a result, effective and reliable access to networked resources has become a critical commodity in any modern infrastructure.
In order to cope with the limitations introduced by the traditional client-server networking model, most popular Web-based services employ separate Content Delivery Networks (CDNs) to distribute the server-side resource consumption. Since Web applications are often latency-critical, CDNs are additionally being adopted to optimize the content delivery latencies perceived by Web clients. Because of the prevalent connection model, Web content delivery has grown into a notable industry. The rapid growth in the number of mobile devices further adds to the resources required from the originating server, as content is also accessed on the go.
While the Web has become one of the most utilized sources of information and digital content, the openness of the Internet is simultaneously being reduced by organizations and governments preventing access to undesired resources. Access to information may be regulated or altered to suit political interests or organizational benefits, thus conflicting with the initial design principle of an unrestricted and independent information network.
This thesis contributes to the development of a more efficient and open Internet by combining a feasibility study with a preliminary design of a peer-to-peer based Web content distribution and request routing mechanism. The suggested design addresses both the challenges related to the effectiveness of the current client-server networking model and the openness of information distributed over the Internet. Based on the properties of existing peer-to-peer implementations, the suggested overlay design is intended to provide low-latency access to any Web content without sacrificing end-user privacy. The overlay is additionally designed to increase the cost of censorship by forcing a successful blockade to isolate the censored network from the rest of the Internet.
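As a rough illustration of peer-to-peer request routing (not the thesis's actual design), a consistent-hashing sketch that maps a requested URL onto the overlay peer responsible for serving a cached copy; the peer names are hypothetical:

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    """Position a key on the hash ring via SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

# Hypothetical overlay peers, each occupying one position on the ring.
peers = ["peer-a.example", "peer-b.example", "peer-c.example"]
ring = sorted((h(p), p) for p in peers)

def route(url: str) -> str:
    """Return the peer whose ring position follows the URL's hash."""
    i = bisect_right(ring, (h(url), ""))
    return ring[i % len(ring)][1]  # wrap around the end of the ring

print(route("http://example.org/index.html"))
```

Consistent hashing keeps most URL-to-peer assignments stable when peers join or leave, which is one reason such rings are a common substrate for overlay request routing.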
Incentive-driven QoS in peer-to-peer overlays
A well-known problem in peer-to-peer overlays is that no single entity has control over the software,
hardware and configuration of peers. Thus, each peer can selfishly adapt its behaviour to maximise its
benefit from the overlay. This thesis is concerned with the modelling and design of incentive mechanisms
for QoS-overlays: resource allocation protocols that provide strategic peers with participation incentives,
while at the same time optimising the performance of the peer-to-peer distribution overlay.
The contributions of this thesis are as follows. First, we present PledgeRoute, a novel contribution
accounting system that can be used, along with a set of reciprocity policies, as an incentive mechanism
to encourage peers to contribute resources even when users are not actively consuming overlay services.
This mechanism uses a decentralised credit network, is resilient to sybil attacks, and allows peers to
achieve time and space deferred contribution reciprocity. Then, we present a novel, QoS-aware resource
allocation model based on Vickrey auctions that uses PledgeRoute as a substrate. It acts as an incentive
mechanism by providing efficient overlay construction, while at the same time allocating increasing
service quality to those peers that contribute more to the network. The model is then applied to lag-sensitive chunk swarming, and some of its properties are explored for different peer delay distributions.
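A sealed-bid Vickrey (second-price) auction is the building block named above; a minimal sketch, assuming bids denominated in accrued contribution credit and hypothetical peer names (the thesis's full QoS-aware allocation model is not reproduced here):

```python
# Vickrey (second-price) auction for one service slot: the highest bidder
# wins but pays the second-highest bid, making truthful bidding a dominant
# strategy. Bids stand in for contribution credit; values are illustrative.

def vickrey_winner(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winning peer, price paid) under second-price rules."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

bids = {"peer-a": 12.0, "peer-b": 9.5, "peer-c": 7.0}
winner, price = vickrey_winner(bids)
print(f"{winner} wins and pays {price}")  # peer-a wins and pays 9.5
```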
When considering QoS overlays deployed over the best-effort Internet, the quality received by a
client cannot be attributed completely to either its serving peer or the intervening network between
them. By drawing parallels between this situation and well-known hidden action situations in microeconomics,
we propose a novel scheme to ensure adherence to advertised QoS levels. We then apply
it to delay-sensitive chunk distribution overlays and present the optimal contract payments required,
along with a method for QoS contract enforcement through reciprocative strategies. We also present a
probabilistic model for application-layer delay as a function of the prevailing network conditions.
Finally, we address the incentives of managed overlays, and the prediction of their behaviour. We
propose two novel models of multihoming managed overlay incentives in which overlays can freely
allocate their traffic flows between different ISPs. One is obtained by optimising an overlay utility
function with desired properties, while the other is designed for data-driven least-squares fitting of the
cross elasticity of demand. This last model is then used to solve for ISP profit maximisation.
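The least-squares fit of cross elasticity can be sketched under a standard log-linear demand model; the data are synthetic and the functional form is an assumption for illustration, not necessarily the one adopted in the thesis:

```python
import numpy as np

# Log-linear demand: log(q1) = a + e11*log(p1) + e12*log(p2), where q1 is the
# overlay traffic routed via ISP 1 and p1, p2 are the two ISPs' transit prices.
# e11 is the own-price elasticity, e12 the cross elasticity. Data are synthetic.
p1 = np.array([1.0, 1.2, 1.1, 1.5, 1.3])   # ISP 1 price per period
p2 = np.array([1.0, 0.9, 1.1, 1.0, 1.2])   # ISP 2 price per period
q1 = np.array([10.0, 8.5, 9.6, 7.0, 8.8])  # traffic sent via ISP 1

X = np.column_stack([np.ones_like(p1), np.log(p1), np.log(p2)])
coef, *_ = np.linalg.lstsq(X, np.log(q1), rcond=None)
a, e11, e12 = coef
print(f"own elasticity {e11:.2f}, cross elasticity {e12:.2f}")
```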
High-performance and fault-tolerant techniques for massive data distribution in online communities
The amount of digital information produced and consumed is increasing every day. This rapid growth is driven by advances in computing power and hardware technologies, and by the popularization of user-generated content networks. New hardware is able to process larger quantities of data, which makes it possible to obtain finer results, and as a consequence more data is generated. Scientific applications, in particular, have evolved to benefit from the new hardware capabilities: they typically require large amounts of information as input and generate significant amounts of intermediate data, resulting in large files. Because this increase appears not only in the volume of data but also in the size of individual files, we need methods that provide efficient and reliable data access. Producing such a method is a challenging task due to the number of aspects involved. However, we can leverage the knowledge found in social networks to improve the distribution process. The advent of Web 2.0 has popularized the concept of the social network, which provides valuable knowledge about the relationships among users, and between users and data. Yet extracting this knowledge and defining ways to actively use it to increase the performance of a system remains an open research direction.
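The idea of grouping users by the files they request can be made concrete with off-the-shelf community detection over a bipartite user-file request graph. A minimal sketch using networkx; the request data are toy values, and greedy modularity maximization is one possible algorithm, not necessarily the thesis's choice:

```python
import networkx as nx
from networkx.algorithms import bipartite, community

# Toy bipartite request graph: users on one side, requested files on the other.
requests = [("u1", "f1"), ("u1", "f2"), ("u2", "f1"),
            ("u3", "f3"), ("u4", "f3"), ("u4", "f4")]
B = nx.Graph()
B.add_nodes_from({u for u, _ in requests}, bipartite=0)
B.add_nodes_from({f for _, f in requests}, bipartite=1)
B.add_edges_from(requests)

# Project onto users (two users are linked if they requested common files),
# then group users with similar interests via greedy modularity maximization.
users = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
G = bipartite.weighted_projected_graph(B, users)
for c in community.greedy_modularity_communities(G):
    print(sorted(c))
```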
Additionally, we must take other existing limitations into account. In particular, the interconnection between the different elements of the system is one of the key aspects. New technologies such as mass-produced multicore chips, large storage media, and better sensors have contributed to the increase in data being produced. However, the underlying interconnection technologies have not improved at the same pace. This leads to a situation where vast amounts of data can be produced and need to be consumed by a large number of geographically distributed users, but the interconnection between the two ends does not meet the resulting needs.
In this thesis, we address the problem of efficient and reliable data distribution in geographically distributed systems. We focus on providing a solution that 1) optimizes the use of existing resources, 2) does not require changes to the underlying interconnection, and 3) provides fault-tolerance capabilities. To achieve these objectives, we define a generic data distribution architecture composed of three main components: a community detection module, a transfer scheduling module, and a distribution controller. The community detection module leverages the information found in the social network formed by the users requesting files and produces a set of virtual communities grouping entities with similar interests. The transfer scheduling module produces a plan that distributes all requested files efficiently, improving resource utilization; for this purpose, we model the distribution problem using linear programming and offer a method that permits solving the problem in a distributed fashion. Finally, the distribution controller manages the distribution process using the aforementioned schedule, controls the available server infrastructure, and launches new on-demand resources when necessary.
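The linear-programming formulation can be pictured as a small transportation-style instance: servers with outbound capacity, communities with demand, and per-pair transfer costs. A minimal sketch with scipy; the numbers and the cost-minimizing objective are assumptions for illustration, not the thesis's actual model:

```python
import numpy as np
from scipy.optimize import linprog

# Toy transfer-scheduling LP: x[s, c] = gigabytes server s sends to community c.
# Minimize total transfer cost subject to meeting each community's demand
# without exceeding any server's outbound capacity.
cost = np.array([[1.0, 3.0],    # cost per GB, server 0 -> communities 0, 1
                 [2.0, 1.0]])   # cost per GB, server 1 -> communities 0, 1
capacity = [60.0, 50.0]         # outbound GB each server can push
demand = [40.0, 45.0]           # GB each community must receive

n_s, n_c = cost.shape
# Capacity rows: sum over c of x[s, c] <= capacity[s].
A_ub = np.zeros((n_s, n_s * n_c))
for s in range(n_s):
    A_ub[s, s * n_c:(s + 1) * n_c] = 1.0
# Demand rows: sum over s of x[s, c] == demand[c].
A_eq = np.zeros((n_c, n_s * n_c))
for c in range(n_c):
    A_eq[c, c::n_c] = 1.0

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=capacity,
              A_eq=A_eq, b_eq=demand, bounds=(0, None))
print(res.x.reshape(n_s, n_c))  # optimal transfer plan
```

A decomposition of this LP (e.g. per-community subproblems) is what a distributed solving method would exploit; the sketch above only shows the centralized form.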
Increasing the robustness of networked systems
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 133-143).
What popular news do you recall about networked systems? You've probably heard about the several-hour failure at Amazon's computing utility that knocked down many startups, or the attacks that forced the Estonian government web-sites to be inaccessible for several days, or you may have observed inexplicably slow responses or errors from your favorite web site. Needless to say, keeping networked systems robust to attacks and failures is an increasingly significant problem.
Why is it hard to keep networked systems robust? We believe that uncontrollable inputs and complex dependencies are the two main reasons. The owner of a web-site has little control over when users arrive; the operator of an ISP has little say in when a fiber gets cut; and the administrator of a campus network is unlikely to know exactly which switches or file-servers may be causing a user's sluggish performance. Despite unpredictable or malicious inputs and complex dependencies, we would like a network to manage itself, i.e., diagnose its own faults and continue to maintain good performance.
This dissertation presents a generic approach to harden networked systems by distinguishing between two scenarios. For systems that need to respond rapidly to unpredictable inputs, we design online solutions that re-optimize resource allocation as inputs change. For systems that need to diagnose the root cause of a problem in the presence of complex subsystem dependencies, we devise techniques to infer these dependencies from packet traces and build functional representations that facilitate reasoning about the most likely causes of faults. We present a few solutions, as examples of this approach, that tackle an important class of network failures. Specifically, we address (1) re-routing traffic around congestion when traffic spikes or links fail in internet service provider networks, (2) protecting websites from denial-of-service attacks that mimic legitimate users, and (3) diagnosing causes of performance problems in enterprise and campus-wide networks. Through a combination of implementations, simulations and deployments, we show that our solutions advance the state of the art.
by Srikanth Kandula. Ph.D.
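The dependency-inference step can be pictured with a simple co-occurrence heuristic over flow records parsed from packet traces: if a client's request to service B reliably follows its request to service A within a short window, A -> B becomes a candidate dependency. The records, window, and threshold below are illustrative assumptions, not the dissertation's actual algorithm:

```python
from collections import defaultdict

# Flow records (timestamp_seconds, client, service), sorted by time; synthetic.
flows = [
    (0.00, "host1", "dns"), (0.02, "host1", "web"),
    (5.00, "host2", "dns"), (5.03, "host2", "web"),
    (9.00, "host1", "mail"),
]
WINDOW = 0.5  # seconds: B must follow A within this window to count

follows = defaultdict(int)  # (A, B) -> number of A-then-B co-occurrences
count = defaultdict(int)    # A -> number of flows to A
for i, (t_a, client, a) in enumerate(flows):
    count[a] += 1
    for t_b, client_b, b in flows[i + 1:]:
        if t_b - t_a > WINDOW:
            break  # records are time-sorted, so later ones are out of window
        if client_b == client and b != a:
            follows[(a, b)] += 1

# Score each candidate edge by P(flow to B soon after | flow to A).
for (a, b), n in follows.items():
    score = n / count[a]
    if score >= 0.5:
        print(f"{a} -> {b} (score {score:.2f})")  # dns -> web (score 1.00)
```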
Identifying, Analyzing, and Modeling Flashcrowds in BitTorrent
Flashcrowds, sudden surges of user arrivals, do occur in BitTorrent, and they can lead to severe service deprivation. However, very little is known about their occurrence patterns and characteristics in real-world deployments, and many basic questions about BitTorrent flashcrowds, such as how often they occur and how long they last, remain unanswered. In this paper, we address these questions by studying three datasets that cover millions of swarms from two of the largest BitTorrent trackers. We first propose a model for BitTorrent flashcrowds and a procedure for identifying, analyzing, and modeling them. Then we evaluate quantitatively the impact of flashcrowds on BitTorrent users and develop an algorithm that identifies flashcrowds. Finally, we study statistically the properties of the flashcrowds identified in our datasets, such as their arrival time, duration, and magnitude, and we investigate the relationship between flashcrowds and swarm growth as well as the arrival rate of flashcrowds at BitTorrent trackers. In particular, we find that flashcrowds occur in only a very small fraction (0.3-2%) of swarms, but that they can affect over ten million users.
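A minimal version of flashcrowd identification can be phrased as surge detection on a per-swarm arrival time series: flag intervals whose arrival count exceeds a multiple of the trailing median, then report each surge's start, duration, and magnitude. A sketch with illustrative parameters (the paper's calibrated model is not reproduced):

```python
from statistics import median

# Hourly peer-arrival counts for one swarm (synthetic data).
arrivals = [4, 5, 3, 6, 4, 5, 80, 120, 90, 7, 5, 4]
WINDOW, FACTOR = 5, 10  # trailing-median window and surge threshold

flashcrowds, start, base = [], None, None
for t in range(WINDOW, len(arrivals)):
    if start is None:
        base = median(arrivals[t - WINDOW:t]) or 1  # pre-surge baseline
        if arrivals[t] >= FACTOR * base:
            start = t                               # surge begins
    elif arrivals[t] < FACTOR * base:
        # Surge over: record (start, duration, peak magnitude over baseline).
        flashcrowds.append((start, t - start, max(arrivals[start:t]) / base))
        start = None

for s, dur, mag in flashcrowds:
    print(f"flashcrowd at t={s}, duration {dur}h, peak {mag:.0f}x baseline")
```

Freezing the baseline at surge onset keeps the surge's own samples from inflating the reference level, which is the main subtlety in this kind of threshold detector.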