
    Content and Geographical Locality in User-Generated Content Sharing Systems

    User Generated Content (UGC), such as YouTube videos, accounts for a substantial fraction of Internet traffic. To optimize their performance, UGC services usually rely on both proactive and reactive approaches that exploit spatial and temporal locality in access patterns. Alternative types of locality are also relevant but are hardly ever considered together. In this paper, we show on a large YouTube dataset (more than 650,000 videos) that content locality (induced by the related videos feature) and geographic locality are in fact correlated. More specifically, we show how the geographic view distribution of a video can be inferred to a large extent from that of its related videos. We leverage these findings to propose a UGC storage system that proactively places videos close to the expected requests. Compared to a caching-based solution, our system decreases by 16% the number of requests served from a different country than that of the requesting user, and even in this case, the distance between the user and the server is 29% shorter on average.

    CLOSER: A Collaborative Locality-aware Overlay SERvice

    Current Peer-to-Peer (P2P) file sharing systems use a considerable percentage of the bandwidth of Internet Service Providers (ISPs). This paper presents the Collaborative Locality-aware Overlay SERvice (CLOSER), an architecture that aims at lessening the usage of expensive international links by exploiting traffic locality (i.e., a resource is downloaded from inside the ISP whenever possible). The paper proves the effectiveness of CLOSER by analysis and simulation, also comparing this architecture with existing solutions for traffic locality in P2P systems. While savings on international links can be attractive for ISPs, it is necessary to offer some features that can be of interest for users to favor a wide adoption of the application. For this reason, CLOSER also introduces a privacy module that may arouse the users' interest and encourage them to switch to the new architecture.

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Passive characterization of sopcast usage in residential ISPs

    In this paper we present an extensive analysis of traffic generated by SopCast users and collected from operative networks of three national ISPs in Europe. After more than a year of continuous monitoring, we present results about the popularity of SopCast, which is the largely preferred application in the studied networks. We focus on analysis of (i) application and bandwidth usage at different time scales, (ii) peer lifetime, arrival and departure processes, and (iii) peer localization in the world. Results provide useful insights into users' behavior, including their attitude towards P2P-TV application usage and the consequent generated load on the network, which is quite variable based on the access technology and geographical location. Our findings are of interest to researchers investigating users' attitude towards P2P-TV services, seeking to foresee new trends in the future usage of the Internet, and aiming to augment the design of their applications.

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure

    On the Accuracy of Hyper-local Geotagging of Social Media Content

    Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content. Comment: 10 pages