224 research outputs found

    Scalable service for flexible access to personal content

    Get PDF

    The evolution of P2P networks for file exchange: the interaction between social controversy and technical change

    Get PDF
    Since the irruption of Napster in 1999, Peer-to-Peer computer networks for file exchange have been at the heart of a heated debate that has eventually evolved into a wide social controversy across the world, involving legal, economical, and even political issues. This essay analyzes the effects of this controversy on the technical innovations that have shaped the evolution of those systems. It argues that the usual image of a single two-sided conflict does not account for most of the technical changes involved. P2P entrepreneurs and creators show a wide range of motivations and business strategies -if any- and users are not a monolithic group with a common set of goals and values. As a result, the actual historical evolution of those networks does not follow a simple linear path but a more complex and multidirectional development

    Deploying Large-Scale Datasets on-Demand in the Cloud: Treats and Tricks on Data Distribution

    Get PDF
    Public clouds have democratised the access to analytics for virtually any institution in the world. Virtual Machines (VMs) can be provisioned on demand, and be used to crunch data after uploading into the VMs. While this task is trivial for a few tens of VMs, it becomes increasingly complex and time consuming when the scale grows to hundreds or thousands of VMs crunching tens or hundreds of TB. Moreover, the elapsed time comes at a price: the cost of provisioning VMs in the cloud and keeping them waiting to load the data. In this paper we present a big data provisioning service that incorporates hierarchical and peer-to-peer data distribution techniques to speed-up data loading into the VMs used for data processing. The system dynamically mutates the sources of the data for the VMs to speed-up data loading. We tested this solution with 1000 VMs and 100 TB of data, reducing time by at least 30 % over current state of the art techniques. This dynamic topology mechanism is tightly coupled with classic declarative machine configuration techniques (the system takes a single high-level declarative configuration file and configures both software and data loading). Together, these two techniques simplify the deployment of big data in the cloud for end users who may not be experts in infrastructure management. Index Terms—Large-scale data transfer, flash crowd, big data, BitTorrent, p2p overlay, provisioning, big data distribution I

    Static Web content distribution and request routing in a P2P overlay

    Get PDF
    The significance of collaboration over the Internet has become a corner-stone of modern computing, as the essence of information processing and content management has shifted to networked and Webbased systems. As a result, the effective and reliable access to networked resources has become a critical commodity in any modern infrastructure. In order to cope with the limitations introduced by the traditional client-server networking model, most of the popular Web-based services have employed separate Content Delivery Networks (CDN) to distribute the server-side resource consumption. Since the Web applications are often latency-critical, the CDNs are additionally being adopted for optimizing the content delivery latencies perceived by the Web clients. Because of the prevalent connection model, the Web content delivery has grown to a notable industry. The rapid growth in the amount of mobile devices further contributes to the amount of resources required from the originating server, as the content is also accessible on the go. While the Web has become one of the most utilized sources of information and digital content, the openness of the Internet is simultaneously being reduced by organizations and governments preventing access to any undesired resources. The access to information may be regulated or altered to suit any political interests or organizational benefits, thus conflicting with the initial design principle of an unrestricted and independent information network. This thesis contributes to the development of more efficient and open Internet by combining a feasibility study and a preliminary design of a peer-to-peer based Web content distribution and request routing mechanism. The suggested design addresses both the challenges related to effectiveness of current client-server networking model and the openness of information distributed over the Internet. Based on the properties of existing peer-to-peer implementations, the suggested overlay design is intended to provide low-latency access to any Web content without sacrificing the end-user privacy. The overlay is additionally designed to increase the cost of censorship by forcing a successful blockade to isolate the censored network from the rest of the Internet

    Smart PIN: performance and cost-oriented context-aware personal information network

    Get PDF
    The next generation of networks will involve interconnection of heterogeneous individual networks such as WPAN, WLAN, WMAN and Cellular network, adopting the IP as common infrastructural protocol and providing virtually always-connected network. Furthermore, there are many devices which enable easy acquisition and storage of information as pictures, movies, emails, etc. Therefore, the information overload and divergent content’s characteristics make it difficult for users to handle their data in manual way. Consequently, there is a need for personalised automatic services which would enable data exchange across heterogeneous network and devices. To support these personalised services, user centric approaches for data delivery across the heterogeneous network are also required. In this context, this thesis proposes Smart PIN - a novel performance and cost-oriented context-aware Personal Information Network. Smart PIN's architecture is detailed including its network, service and management components. Within the service component, two novel schemes for efficient delivery of context and content data are proposed: Multimedia Data Replication Scheme (MDRS) and Quality-oriented Algorithm for Multiple-source Multimedia Delivery (QAMMD). MDRS supports efficient data accessibility among distributed devices using data replication which is based on a utility function and a minimum data set. QAMMD employs a buffer underflow avoidance scheme for streaming, which achieves high multimedia quality without content adaptation to network conditions. Simulation models for MDRS and QAMMD were built which are based on various heterogeneous network scenarios. Additionally a multiple-source streaming based on QAMMS was implemented as a prototype and tested in an emulated network environment. Comparative tests show that MDRS and QAMMD perform significantly better than other approaches

    Efficient Processing of Ranking Queries in Novel Applications

    Get PDF
    Ranking queries, which return only a subset of results matching a user query, have been studied extensively in the past decade due to their importance in a wide range of applications. In this thesis, we study ranking queries in novel environments and settings where they have not been considered so far. With the advancements in sensor technologies, these small devices are today present in all corners of human life. Millions of them are deployed in various places and are sending data on a continuous basis. These sensors which before mainly monitored environmental phenomena or production chains, have now found their way into our daily lives as well; health monitoring being a plausible example of how much we rely on continuous observation of measurements. As the Web technology evolves and facilitates data stream transmissions, sensors do not remain the sole producers of data in form of streams. The Web 2.0 has escalated the production of user-generated content which appear in form of annotated posts in a Weblog (blog), pictures and videos, or small textual snippets reflecting the current activity or status of users and can be regarded as natural items of a temporal stream. A major part of this thesis is devoted to developing novel methods which assist in keeping track of this ever increasing flow of information with continuous monitoring of ranking queries over them, particularly when traditional approaches fail to meet the newly raised requirements. We consider the ranking problem when the information flow is not synchronized among its sources. This is a recurring situation, since sensors are run by different organizations, measure moving entities, or are simply represented by users which are inherently not synchronizable. Our methods are in particular designed for handling unsynchronized streams, calculating an object's score based on both its currently observed contribution to the registered queries as well as the contribution it might have in future. While this uncertainty in score calculation causes linear growth in the space necessary for providing exact results, we are able to define criteria which allows for evicting unpromising objects as early as possible. We also leverage statistical properties that reflect the correlation between multiple streams to predict the future to provide better bounds for the best possible contribution of an object, consequently limiting the necessary storage dramatically. To achieve this, we make use of small statistical synopses that are periodically refreshed during runtime. Furthermore, we consider user generated queries in the context of Web 2.0 applications which aim at filtering data streams in forms of textual documents, based on personal interests. In this case, the dimensionality of the data, the large cardinality of the subscribed queries, as well as the desire for consuming recent information, raise new challenges. We develop new approaches which efficiently filter the information and provide real-time updates to the user subscribed queries. Our methods rely on a novel ordering of user queries in traditional inverted lists which allows the system to effectively prune those queries for which a new piece of information is of no interest. Finally, we investigate high quality search in user generated content in Web 2.0 applications in form of images or videos. These resources are inherently dispersed all over the globe, therefore can be best managed in a purely distributed peer-to-peer network which eliminates single points of failure. Search in such a huge repository of high dimensional data involves evaluating ranking queries in form of nearest neighbor queries. Therefore, we study ranking queries in high dimensional spaces, where the index of the objects is maintained in a purely distributed fashion. Our solution meets the two major requirements of a viable solution in distributing the index and evaluating ranking queries: the underlying peer-to-peer network remains load balanced, and efficient query evaluation is feasible as similar objects are assigned to nearby peers

    Estudo do IPFS como protocolo de distribuição de conteúdos em redes veiculares

    Get PDF
    Over the last few years, vehicular ad-hoc networks (VANETs) have been the focus of great progress due to the interest in autonomous vehicles and in distributing content not only between vehicles, but also to the Cloud. Performing a download/upload to/from a vehicle typically requires the existence of a cellular connection, but the costs associated with mobile data transfers in hundreds or thousands of vehicles quickly become prohibitive. A VANET allows the costs to be several orders of magnitude lower - while keeping the same large volumes of data - because it is strongly based in the communication between vehicles (nodes of the network) and the infrastructure. The InterPlanetary File System (IPFS) is a protocol for storing and distributing content, where information is addressed by its content, instead of its location. It was created in 2014 and it seeks to connect all computing devices with the same system of files, comparable to a BitTorrent swarm exchanging Git objects. It has been tested and deployed in wired networks, but never in an environment where nodes have intermittent connectivity, such as a VANET. This work focuses on understanding IPFS, how/if it can be applied to the vehicular network context, and comparing it with other content distribution protocols. In this dissertation, IPFS has been tested in a small and controlled network to understand its working applicability to VANETs. Issues such as neighbor discoverability times and poor hashing performance have been addressed. To compare IPFS with other protocols (such as Veniam’s proprietary solution or BitTorrent) in a relevant way and in a large scale, an emulation platform was created. The tests in this emulator were performed in different times of the day, with a variable number of files and file sizes. Emulated results show that IPFS is on par with Veniam’s custom V2V protocol built specifically for V2V, and greatly outperforms BitTorrent regarding neighbor discoverability and data transfers. An analysis of IPFS’ performance in a real scenario was also conducted, using a subset of STCP’s vehicular network in Oporto, with the support of Veniam. Results from these tests show that IPFS can be used as a content dissemination protocol, showing it is up to the challenge provided by a constantly changing network topology, and achieving throughputs up to 2.8 MB/s, values similar or in some cases even better than Veniam’s proprietary solution.Nos últimos anos, as redes veiculares (VANETs) têm sido o foco de grandes avanços devido ao interesse em veículos autónomos e em distribuir conteúdos, não só entre veículos mas também para a "nuvem" (Cloud). Tipicamente, fazer um download/upload de/para um veículo exige a utilização de uma ligação celular (SIM), mas os custos associados a fazer transferências com dados móveis em centenas ou milhares de veículos rapidamente se tornam proibitivos. Uma VANET permite que estes custos sejam consideravelmente inferiores - mantendo o mesmo volume de dados - pois é fortemente baseada na comunicação entre veículos (nós da rede) e a infraestrutura. O InterPlanetary File System (IPFS - "sistema de ficheiros interplanetário") é um protocolo de armazenamento e distribuição de conteúdos, onde a informação é endereçada pelo conteúdo, em vez da sua localização. Foi criado em 2014 e tem como objetivo ligar todos os dispositivos de computação num só sistema de ficheiros, comparável a um swarm BitTorrent a trocar objetos Git. Já foi testado e usado em redes com fios, mas nunca num ambiente onde os nós têm conetividade intermitente, tal como numa VANET. Este trabalho tem como foco perceber o IPFS, como/se pode ser aplicado ao contexto de rede veicular e compará-lo a outros protocolos de distribuição de conteúdos. Numa primeira fase o IPFS foi testado numa pequena rede controlada, de forma a perceber a sua aplicabilidade às VANETs, e resolver os seus primeiros problemas como os tempos elevados de descoberta de vizinhos e o fraco desempenho de hashing. De modo a poder comparar o IPFS com outros protocolos (tais como a solução proprietária da Veniam ou o BitTorrent) de forma relevante e em grande escala, foi criada uma plataforma de emulação. Os testes neste emulador foram efetuados usando registos de mobilidade e conetividade veicular de alturas diferentes de um dia, com um número variável de ficheiros e tamanhos de ficheiros. Os resultados destes testes mostram que o IPFS está a par do protocolo V2V da Veniam (desenvolvido especificamente para V2V e VANETs), e que o IPFS é significativamente melhor que o BitTorrent no que toca ao tempo de descoberta de vizinhos e transferência de informação. Uma análise do desempenho do IPFS em cenário real também foi efetuada, usando um pequeno conjunto de nós da rede veicular da STCP no Porto, com o apoio da Veniam. Os resultados destes testes demonstram que o IPFS pode ser usado como protocolo de disseminação de conteúdos numa VANET, mostrando-se adequado a uma topologia constantemente sob alteração, e alcançando débitos até 2.8 MB/s, valores parecidos ou nalguns casos superiores aos do protocolo proprietário da Veniam.Mestrado em Engenharia de Computadores e Telemátic
    corecore