53 research outputs found

    Exploring Peer-to-Peer Locality in Multiple Torrent Environment

    Experimental analysis of the socio-economic phenomena in the BitTorrent ecosystem

    BitTorrent is the most successful Peer-to-Peer (P2P) application and is responsible for a major portion of Internet traffic. It has been studied extensively using simulations, models and real measurements. Although simulations and modelling are easier to perform, they typically simplify the problems analysed, and in the case of BitTorrent they are likely to miss some of the effects that occur in real swarms. Thus, in this thesis we rely on real measurements. In the first part of the thesis we summarise the measurement techniques used so far and use this summary as a basis for designing our own tools, which allow us to perform different types of analysis at different levels of resolution. Using these tools we collect several large-scale datasets to study different aspects of BitTorrent, with a special focus on socio-economic aspects. Using our datasets, we first investigate the topology of real BitTorrent swarms and how traffic is actually exchanged among peers. Our analysis shows that the resilience of BitTorrent swarms is lower than that of corresponding random graphs. We also observe that ISP policies, locality-aware clients and network events (e.g., network congestion) lead to a locality-biased composition of neighbourhoods in the swarms. This means that a peer has more neighbours from its local provider than would be expected from a purely random neighbour selection process. These results are of interest to companies that use BitTorrent for daily operations as well as to ISPs that carry BitTorrent traffic. In the next part of the thesis we look at BitTorrent from the perspective of the content and the content publishers in major BitTorrent portals. We focus on the factors that seem to drive the popularity of BitTorrent and, as a result, could affect its associated traffic in the Internet. We show that a small fraction of publishers (around 100 users) is responsible for more than two-thirds of the published content.
These publishers can be divided into two groups: (i) profit-driven publishers and (ii) fake publishers. The former group leverages the copyrighted content it publishes (typically very popular) on BitTorrent portals to attract content consumers to its own web sites for financial gain. Removing this group may have a significant impact on the popularity of BitTorrent portals and, as a result, may affect a large portion of the Internet traffic associated with BitTorrent. The latter group is responsible for fake content, which is mostly linked to malicious activity and creates a serious threat for the BitTorrent ecosystem and for the Internet in general. To mitigate this threat, in the last part of the thesis we present a new tool named TorrentGuard for the early detection of fake content, which could help to significantly reduce the number of computer infections and scams suffered by BitTorrent users. This tool is available through a web portal and as a plugin for Vuze, a popular BitTorrent client. Finally, we present MYPROBE, a web portal that allows users to query our database and gather different pieces of information regarding BitTorrent content publishers.

BitTorrent is the most successful peer-to-peer file-sharing application and is responsible for a significant fraction of Internet traffic. Previous work has studied BitTorrent using simulation techniques, analytical models and real measurements. Although analytical and simulation techniques are simpler to apply, they typically present simplified versions of the problems analysed, and in the specific case of BitTorrent they can overlook fundamental aspects or interactions that occur in BitTorrent swarms. Therefore, in this thesis we use real measurement techniques as the pillar of our research. First we present a summary of the measurement techniques used so far in the BitTorrent field, which form the theoretical basis for the design of our own measurement tools; these tools allow us to analyse real BitTorrent swarms. Using the data obtained with these tools we study different aspects of BitTorrent, with a special focus on socio-economic aspects. In the first part of the thesis, we carry out a detailed study of the topology of real BitTorrent swarms as well as of the details of the interactions between peers. Our analysis shows that the resilience of the topology of real BitTorrent swarms is lower than that offered by equivalent random graphs. Furthermore, the results reveal that the policies of Internet providers, together with the incipient use of modified BitTorrent clients and other network effects (e.g., congestion), cause real BitTorrent swarms to exhibit a locality-biased composition. That is, a node has a larger number of neighbours within its own Internet provider than it would obtain in a purely random topology. These results are of interest to companies that use BitTorrent in their operations, as well as to the Internet providers responsible for carrying BitTorrent traffic. In the second part of the thesis, we analyse content publishing in the largest BitTorrent portals. Specifically, the results presented show that only a small group of publishers (around 100) is responsible for making available more than two-thirds of the published content. Moreover, these publishers can be divided into two groups: (i) those with economic incentives and (ii) publishers of fake content. The first group makes copyrighted content (which is typically very popular) available on the main BitTorrent portals in order to attract the consumers of that content to its own web sites and obtain a financial benefit. Removing this group may have a significant impact on the popularity of the main BitTorrent portals as well as on the traffic generated by BitTorrent in the Internet. The second group is responsible for publishing fake content. Most of this content is associated with malicious activity (e.g., the distribution of malicious software) and therefore poses a serious threat to the BitTorrent ecosystem in particular and to the Internet in general. To minimise the effects of the threat posed by these publishers, in the last part of the thesis we present a new tool called TorrentGuard for the early detection of fake content. This tool can be accessed through a web portal and through a plugin for the BitTorrent client Vuze. Finally, we present MYPROBE, a web portal that allows querying a database with up-to-date information about content publishers in BitTorrent.
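
The locality bias described in this abstract (a peer having more same-provider neighbours than random selection would give) can be illustrated with a minimal sketch. It is not the thesis's measurement tooling: the function name, the ISP labels and the input format are assumptions made for illustration only.

```python
from collections import Counter


def local_neighbour_shares(peer_isp, neighbour_isps, swarm_isps):
    """Return (observed, expected) share of same-ISP neighbours.

    Hypothetical inputs (not from the thesis):
      peer_isp       -- ISP label of the peer under study
      neighbour_isps -- ISP labels of the peer's actual neighbours
      swarm_isps     -- ISP labels of all other peers in the swarm

    The expected share is what uniform random neighbour selection
    would give on average: the fraction of swarm peers in peer_isp.
    """
    if not neighbour_isps or not swarm_isps:
        raise ValueError("need at least one neighbour and one swarm peer")
    observed = sum(1 for isp in neighbour_isps if isp == peer_isp) / len(neighbour_isps)
    expected = Counter(swarm_isps)[peer_isp] / len(swarm_isps)
    return observed, expected


# Toy data: 10% of the swarm is in ISP "A", yet half the neighbours are.
observed, expected = local_neighbour_shares(
    "A",
    ["A", "A", "B", "C"],
    ["A"] * 10 + ["B"] * 45 + ["C"] * 45,
)
print(observed, expected)
```

An observed share well above the expected one, as in the toy data, is the kind of pattern the abstract calls a locality-biased neighbourhood.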

    Bitocast: a hybrid BitTorrent and IP Multicast content distribution solution

    Dissertation submitted for the degree of Master in Informatics at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. In recent years we have observed an increased use of the Internet as a means for transmitting large content. There have been several technological attempts to address this problem, including costly distribution networks and, more recently, peer-to-peer (P2P) protocols. Among these P2P protocols, BitTorrent has proven itself an effective means for transmitting large content items and today enjoys great popularity. Numerous researchers have analyzed BitTorrent and proposed concepts and models to enhance its reliability, efficiency and fairness. Further, there are proposals to extend BitTorrent to support on-demand multimedia streaming. In this dissertation we present Bitocast, a content distribution system that combines the IP Multicast and BitTorrent protocols in order to use an Internet Service Provider's network more efficiently and to reduce download time when serving large sets of content to large audiences.

    BitTorrent 시스템에서 컨텐트 번들링 및 배포 (Content Bundling and Publishing in BitTorrent Systems)

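    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2013. Yanghee Choi (최양희).

BitTorrent is the most popular Internet software used for content sharing. As BitTorrent has come into wide use, researchers have studied issues such as its throughput, fairness and incentives, and these studies have produced valuable results regarding BitTorrent performance. However, most studies do not address content bundling and publishing strategies in BitTorrent, namely (1) for what purposes and how BitTorrent publishers bundle files, and (2) what strategies BitTorrent publishers use to achieve their goals. In this dissertation, to investigate these questions on the basis of measured data, we carried out a comprehensive measurement study of The Pirate Bay (TPB), the largest BitTorrent portal. The measured dataset consists of 120 thousand torrents and 16 million users, and we classified content publishers into three types: (i) fake publishers, (ii) profit-driven publishers, and (iii) altruistic publishers. We also examined how bundling and content publishing vary across content categories such as movies, TV, adult content, music, applications, games and e-books. First, we examined the current state of content bundling in order to understand the structural patterns of torrents and the behaviour patterns of swarm participants. In particular, we focused on (1) how widely content bundling is used, (2) which files are bundled into torrents and how, (3) why publishers bundle files, and (4) how users download bundled files. The measurements show that more than 72% of torrents consist of multiple files, which indicates that bundling is widely used for file sharing in BitTorrent. We also found that profit-driven publishers, who advertise web sites for financial gain, tend to prefer bundles. In addition, most of the files (94%) in bundled torrents are selected by users, and bundled torrents are on average more popular than non-bundled torrents. Overall, we found notable differences in the structural patterns of torrents and the characteristics of swarm participants depending on the content category and on whether a torrent is bundled or not. Next, we examined content publishing patterns in BitTorrent from a socio-economic perspective, in terms of (1) how files are published by publishers, (2) what strategies each publisher uses, and (3) how effective those strategies are. The measurements show that a considerable amount of traffic (61%) is generated by downloading fake torrents, which indicates that a large amount of Internet traffic is being wasted unnecessarily. We therefore proposed a method to filter out fake publishers on TPB by taking into account the publishing patterns of fake publishers learned from these measurements, and showed that the proposed method can reduce total download traffic by about 45%. We also found that profit-driven publishers use different publishing strategies depending on their revenue models (for example, recruiting new users to private tracker sites, or getting people to click URL links attached to images).

BitTorrent is one of the most popular applications for sharing content over the Internet.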
The huge success of BitTorrent has led the research community to investigate BitTorrent's behavior in terms of throughput, fairness, and incentive issues, revealing valuable insights into the performance aspects of BitTorrent. However, most of these studies paid little attention to content bundling and publishing strategies in BitTorrent from the following perspectives: (1) how, and for what purposes, are constituent files bundled by BitTorrent publishers? and (2) what strategies are adopted by BitTorrent publishers to achieve their goals? To answer these questions with data from a large-scale BitTorrent system, we conduct comprehensive measurements on one of the largest BitTorrent portals: The Pirate Bay (TPB). From datasets of 120 K torrents and 16 M peers, we classify BitTorrent publishers into three types: (i) fake publishers, (ii) profit-driven publishers, and (iii) altruistic publishers. Throughout this dissertation, we investigate the current practice of bundling and publishing across different content categories: Movie, TV, Porn, Music, Application, Game, and E-book. We first investigate the current practice of content bundling to understand the structural patterns of torrents and the behavior of swarm participants. In particular, we focus on: (1) how prevalent content bundling is, (2) how and what files are bundled into torrents, (3) what motivates publishers to bundle files, and (4) how peers access the bundled files. We find that over 72% of BitTorrent torrents contain multiple files, which indicates that bundling is widely used for file sharing. We reveal that profit-driven BitTorrent publishers, who promote their own web sites for financial gain (e.g., through advertising), tend to prefer bundling. We also observe that most files (94%) in a bundle torrent are selected by users, and that bundle torrents are on average more popular than single (non-bundle) ones.
Overall, there are notable differences in the structural patterns of torrents and swarm characteristics (i) across different content categories and (ii) between single and bundle torrents. We next investigate the current practice of content publishing in BitTorrent from a socio-economic point of view, by unraveling (1) how files are published by publishers, (2) what strategies are adopted by publishers, and (3) how effective those strategies are. We show that a significant amount of traffic (61%) of BitTorrent has been generated (i.e., unnecessarily wasted) to download fake torrents. Therefore, we suggest a method to filter out fake publishers on TPB by considering their distinct publishing patterns learned from our measurement study, and show that the proposed method can reduce around 45% of the total download traffic. We also reveal that profit-driven publishers adopt different publishing strategies according to their revenue models (e.g., advertising private tracker sites to attract potential new members, or exposing image URLs to make people click the URL links).

Abstract
I. Introduction
II. Related Work
    2.1 Multi-torrent Systems
    2.2 Bundling in BitTorrent
    2.3 Bundling in Economics
    2.4 Content publishing in BitTorrent
III. Methodology
    3.1 Measurement Methodology
    3.2 Publisher Classification
IV. Bundling Practice in BitTorrent: What, How, and Why
    4.1 Introduction
    4.2 Datasets
        4.2.1 Torrent Datasets
        4.2.2 Swarm Datasets
    4.3 Single vs. Bundle
        4.3.1 Bundling is widespread
        4.3.2 How files are bundled
    4.4 Main File Analysis in Bundling
        4.4.1 Identifying Main Files
        4.4.2 Constituents of Bundle-k
    4.5 Publisher Analysis
        4.5.1 Contribution of Top-20 Publishers
        4.5.2 Cross-category Publishing of Top-20 Publishers
    4.6 User Access Pattern Analysis
        4.6.1 Popularity Analysis
        4.6.2 Availability Analysis
        4.6.3 The Number of Files Requested by Users in a Bundle Torrent
        4.6.4 Swarm Behaviors versus Bundle-k
    4.7 Discussions
V. Content Publishing Practice in BitTorrent
    5.1 Introduction
    5.2 The Number of Published Torrents
    5.3 Publishers Strategies
        5.3.1 Lifetime of Publishers and their Publishing Rates
        5.3.2 Content Categories
        5.3.3 Advertising Strategies of Profit-driven Publishers
    5.4 Downloaders Behavior
    5.5 Implications on Publishers Strategies
        5.5.1 Fake Publishers
        5.5.2 Profit-driven Publishers
VI. Summary & Future Work
Bibliography
Korean Abstract
Docto

    TOWARDS PRIVACY-PRESERVING AND ROBUST WEB OVERLAYS

    Ph.D. DOCTOR OF PHILOSOPHY

    Development of a system compliant with the Application-Layer Traffic Optimization Protocol

    Integrated master's dissertation in Informatics Engineering. With the ever-increasing Internet usage that accompanies the start of the new decade, optimizing this world-scale network of computers becomes a high priority in a technological sphere where the number of users is rising, as are the Quality of Service (QoS) demands of applications in domains such as media streaming or virtual reality. In the face of rising traffic and stricter application demands, a better understanding of how Internet Service Providers (ISPs) should manage their assets is needed. An important concern is how applications utilize the underlying network infrastructure over which they reside. Most of these applications act with little regard for ISP preferences, as exemplified by their lack of effort to achieve traffic locality during their operation, which would be preferable for network administrators and could also improve application performance. However, even a best-effort attempt by applications to cooperate will hardly succeed if ISP policies aren't clearly communicated to them. Therefore, a system that bridges the interests of both layers has much potential to help achieve a mutually beneficial scenario. The main focus of this thesis is the Application-Layer Traffic Optimization (ALTO) working group, which was formed by the Internet Engineering Task Force (IETF) to explore standardizations for network information retrieval. This group specified a request-response protocol in which authoritative entities provide resources containing network status information and administrative preferences. Infrastructural insight is shared with the intent of enabling a cooperative environment between the network overlay and underlay during application operation, to obtain more efficient use of infrastructure resources and the consequent minimization of the associated operational costs.
This work gives an overview of the historical network tussle between applications and service providers, presents the ALTO working group's project as a solution, implements an extended system built upon their ideas, and finally verifies the developed system's efficiency in a simulation, compared with classical alternatives.

With the increased Internet usage that accompanies the start of the new decade, the need to optimize this global network of computers becomes a high priority in a technological sphere that sees its number of users growing, along with the demand by applications for new Quality of Service (QoS) standards, as seen in domains such as real-time multimedia content streaming and virtual reality experiences. Given the increase in traffic and the stricter application requirements, it is necessary to better understand how Internet service providers (ISPs) should manage their resources. A key point is how applications use network resources: many of them have no regard for ISP preferences, as exemplified by their lack of effort to localize traffic, whereas the opposite would be preferred by network administrators and would have the potential to improve application performance. A best-effort attempt by applications to solve this problem will not succeed if administrative preferences are not clearly communicated. Therefore, a system that serves as a communication bridge between layers can foster a mutually beneficial scenario. The main focus of this thesis is the Application-Layer Traffic Optimization (ALTO) working group, which was formed by the Internet Engineering Task Force (IETF) to explore standardizations for gathering network information. This group specified a protocol in which authoritative entities make available resources with network state information and administrative preferences. Infrastructural knowledge is shared to enable a cooperative environment between overlay and underlay networks, for a more efficient use of resources and the consequent minimization of operational costs. The aim is to give an overview of the historical dispute between applications and ISPs, to present the ALTO working group's project as a solution, to implement and improve upon its ideas, and finally to verify the efficiency of the system in a simulation, compared with classical alternatives.
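
To make the idea of applications acting on provider-supplied preferences concrete, here is a minimal sketch of how a client might rank candidate peers using a cost map of the kind ALTO servers publish. It does not reproduce the thesis's system or the ALTO wire format: the function name, the provider-defined location identifiers (`pid1`, `pid2`, ...) and the cost values are all hypothetical, and the cost map is assumed to be already fetched and parsed into a nested dictionary.

```python
def rank_peers_by_cost(my_pid, candidates, cost_map):
    """Sort candidate peers by the provider-assigned cost from my_pid.

    Assumed (hypothetical) shapes:
      candidates -- list of (peer_id, pid) tuples
      cost_map   -- {source_pid: {destination_pid: numeric cost}}

    Peers whose location has no cost entry are placed last.
    Python's sort is stable, so ties keep their input order.
    """
    costs = cost_map.get(my_pid, {})
    return sorted(candidates, key=lambda c: costs.get(c[1], float("inf")))


# Illustrative values only: lower cost means "preferred by the ISP".
cost_map = {"pid1": {"pid1": 1, "pid2": 5, "pid3": 10}}
candidates = [("a", "pid3"), ("b", "pid1"), ("c", "pid2"), ("d", "pid9")]
print(rank_peers_by_cost("pid1", candidates, cost_map))
```

Preferring low-cost peers first is one simple way an overlay could honour ISP preferences such as traffic locality; a real client would likely combine this with its own performance criteria.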

    Cognitive networking techniques on content distribution networks

    First, we want to design a strategy based on Artificial Intelligence (AI) techniques with the aim of increasing peers' download performance. Some AI algorithms can find patterns in the information available to a peer locally and use them to predict values that cannot be calculated by means of mathematical formulas. An important aspect of these techniques is that they can be trained in order to improve their interpretation of the locally available information. Through this process they can make more accurate predictions and achieve better results. We will use this prediction system to increase our knowledge about the swarm and the peers that are part of it. This increase in global knowledge can be used to optimize the algorithms of BitTorrent and can represent a great improvement in peers' download capacity. Our second challenge is to create a small group of peers (a Crowd) that focus their efforts on improving the condition of the swarm through collaborative techniques. The basic idea of this approach is to organize a group of peers to act as a single node and focus them on getting all pieces of the content they are interested in. This involves avoiding, as far as possible, downloading pieces that any of the members already has. The main goal of this technique is to obtain, as quickly as possible, a copy of the content distributed among all members of the Crowd. Getting a distributed copy of the content is expected to increase the availability of parts and reduce dependence on the seeds (users who have the complete content), which would represent a great benefit for the whole swarm. Another aspect that we want to investigate is the use of a priority system among members of the Crowd. We consider that, in certain situations, prioritizing the Crowd peers at the expense of regular peers can result in a significant increase in the download ratio.
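
The Crowd idea of avoiding pieces that another member already holds can be sketched as a piece-selection rule. This is only an illustration under assumptions: the function name, the use of integer piece indices in sets, and the lowest-index tie-break are hypothetical and are not taken from the work described above.

```python
def next_piece_for_crowd(my_pieces, other_members_pieces, remote_pieces):
    """Choose the next piece to request from a remote peer.

    Hypothetical inputs, all sets of integer piece indices:
      my_pieces            -- pieces this Crowd member already has
      other_members_pieces -- iterable of sets, one per other member
      remote_pieces        -- pieces the remote peer can offer

    Preference order:
      1. a piece that no Crowd member holds yet;
      2. otherwise, a piece this member lacks (the "as far as
         possible" fallback);
      3. otherwise None.
    """
    held_by_crowd = my_pieces | set().union(*other_members_pieces)
    fresh = sorted(remote_pieces - held_by_crowd)
    if fresh:
        return fresh[0]
    fallback = sorted(remote_pieces - my_pieces)
    return fallback[0] if fallback else None


# Piece 3 is the only offered piece that nobody in the Crowd holds.
print(next_piece_for_crowd({0}, [{1}, {2}], {0, 1, 2, 3}))
```

Picking the lowest index is an arbitrary choice to keep the sketch deterministic; a real client would more plausibly combine this rule with a rarest-first policy.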

    Understand the Similarity of Internet Service Providers via Peer-to-Peer User Interest Analysis

    University of Minnesota M.S. thesis. June 2019. Major: Computer Science. Advisor: Haiyang Wang. 1 computer file (PDF); 63 pages. Internet traffic has continued to exhibit exponential growth in the past few years. This forces Internet service providers (ISPs) to continuously invest in infrastructure upgrades and deploy traffic management techniques, such as caching and locality, to meet the increasing user demand. To help ISPs better manage their infrastructures, it is important to compare and understand the similarity of their user interests. However, such a comparison is challenging because ISP data is hard to obtain, not to mention the related modeling and analysis issues. In this thesis, we aim to understand ISP similarity through an extensive analysis of Peer-to-Peer (P2P) user interest. To collect the P2P dataset, we develop a tool to automatically download BitTorrent's meta-info (torrent) files on the Internet. This tool also helps us to collect important peer and content information in these BitTorrent swarms without uploading any copyrighted files. As a result, we successfully obtained 16,697 active peers from 1,721 torrents in 1,097 unique Autonomous Systems (ASes). After that, we adopt classic statistical and clustering approaches to compare their different user interests. Our research shows for the first time the existence of cloud users in real-world content distribution systems such as BitTorrent. The model analysis further indicates that we can adopt similar traffic management approaches (e.g., caching similar contents) across geographically closer ASes.

    Static Web content distribution and request routing in a P2P overlay

    Collaboration over the Internet has become a cornerstone of modern computing, as the essence of information processing and content management has shifted to networked and Web-based systems. As a result, effective and reliable access to networked resources has become a critical commodity in any modern infrastructure. In order to cope with the limitations introduced by the traditional client-server networking model, most of the popular Web-based services have employed separate Content Delivery Networks (CDNs) to distribute the server-side resource consumption. Since Web applications are often latency-critical, CDNs are additionally being adopted to optimize the content delivery latencies perceived by Web clients. Because of the prevalent connection model, Web content delivery has grown into a notable industry. The rapid growth in the number of mobile devices further adds to the resources required from the originating server, as the content is also accessible on the go. While the Web has become one of the most utilized sources of information and digital content, the openness of the Internet is simultaneously being reduced by organizations and governments preventing access to undesired resources. Access to information may be regulated or altered to suit political interests or organizational benefits, thus conflicting with the initial design principle of an unrestricted and independent information network. This thesis contributes to the development of a more efficient and open Internet by combining a feasibility study and a preliminary design of a peer-to-peer based Web content distribution and request routing mechanism. The suggested design addresses both the challenges related to the effectiveness of the current client-server networking model and the openness of information distributed over the Internet.
Based on the properties of existing peer-to-peer implementations, the suggested overlay design is intended to provide low-latency access to any Web content without sacrificing end-user privacy. The overlay is additionally designed to increase the cost of censorship by forcing a successful blockade to isolate the censored network from the rest of the Internet.