26,080 research outputs found
Content and Geographical Locality in User-Generated Content Sharing Systems
International audienceUser Generated Content (UGC), such as YouTube videos, accounts for a substantial fraction of the Internet traffic. To optimize their performance, UGC services usually rely on both proactive and reactive approaches that exploit spatial and temporal locality in access patterns. Alternative types of locality are also relevant and hardly ever considered together. In this paper, we show on a large (more than 650,000 videos) YouTube dataset that content locality (induced by the related videos feature) and geographic locality, are in fact correlated. More specifically, we show how the geographic view distribution of a video can be inferred to a large extent from that of its related videos. We leverage these findings to propose a UGC storage system that proactively places videos close to the expected requests. Compared to a caching-based solution, our system decreases by 16% the number of requests served from a different country than that of the requesting user, and even in this case, the distance between the user and the server is 29% shorter on average
CLOSER: A Collaborative Locality-aware Overlay SERvice
Current Peer-to-Peer (P2P) file sharing systems make use of a considerable percentage of Internet Service Providers (ISPs) bandwidth. This paper presents the Collaborative Locality-aware Overlay SERvice (CLOSER), an architecture that aims at lessening the usage of expensive international links by exploiting traffic locality (i.e., a resource is downloaded from the inside of the ISP whenever possible). The paper proves the effectiveness of CLOSER by analysis and simulation, also comparing this architecture with existing solutions for traffic locality in P2P systems. While savings on international links can be attractive for ISPs, it is necessary to offer some features that can be of interest for users to favor a wide adoption of the application. For this reason, CLOSER also introduces a privacy module that may arouse the users' interest and encourage them to switch to the new architectur
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Passive characterization of sopcast usage in residential ISPs
Abstract—In this paper we present an extensive analysis of traffic generated by SopCast users and collected from operative networks of three national ISPs in Europe. After more than a year of continuous monitoring, we present results about the popularity of SopCast which is the largely preferred application in the studied networks. We focus on analysis of (i) application and bandwidth usage at different time scales, (ii) peer lifetime, arrival and departure processes, (iii) peer localization in the world. Results provide useful insights into users ’ behavior, including their attitude towards P2P-TV application usage and the conse-quent generated load on the network, that is quite variable based on the access technology and geographical location. Our findings are interesting to Researchers interested in the investigation of users ’ attitude towards P2P-TV services, to foresee new trends in the future usage of the Internet, and to augment the design of their application. I
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data- driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content.Comment: 10 page
- …