16,103 research outputs found
Enabling Social Applications via Decentralized Social Data Management
An unprecedented information wealth produced by online social networks,
further augmented by location/collocation data, is currently fragmented across
different proprietary services. Combined, it can accurately represent the
social world and enable novel socially-aware applications. We present
Prometheus, a socially-aware peer-to-peer service that collects social
information from multiple sources into a multigraph managed in a decentralized
fashion on user-contributed nodes, and exposes it through an interface
implementing non-trivial social inferences while complying with user-defined
access policies. Simulations and experiments on PlanetLab with emulated
application workloads show the system exhibits good end-to-end response time,
low communication overhead and resilience to malicious attacks.Comment: 27 pages, single ACM column, 9 figures, accepted in Special Issue of
Foundations of Social Computing, ACM Transactions on Internet Technolog
Dynamic data placement and discovery in wide-area networks
The workloads of online services and applications such as social networks, sensor data platforms and web search engines have become increasingly global and dynamic, setting new challenges to providing users with low latency access to data. To achieve this, these services typically leverage a multi-site wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which is determined by the data placement and discovery strategies. Current strategies are static, which offer low latencies upon deployment but worse performance under a dynamic workload.
We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access pattern similarities. By dynamically clustering data according to DAC, network traffic in clusters is kept local. We utilise DAC as a key component in reducing access latencies for two application scenarios, emphasising different aspects of the problem:
The first scenario assumes the fixed placement of data at sites, and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online and distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads.
The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data centre deployment to provide users with low latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low latency access to data for static and dynamic workloads.Open Acces
Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement
The problem of identifying the optimal location for a new retail store has
been the focus of past research, especially in the field of land economy, due
to its importance in the success of a business. Traditional approaches to the
problem have factored in demographics, revenue and aggregated human flow
statistics from nearby or remote areas. However, the acquisition of relevant
data is usually expensive. With the growth of location-based social networks,
fine grained data describing user mobility and popularity of places has
recently become attainable.
In this paper we study the predictive power of various machine learning
features on the popularity of retail stores in the city through the use of a
dataset collected from Foursquare in New York. The features we mine are based
on two general signals: geographic, where features are formulated according to
the types and density of nearby places, and user mobility, which includes
transitions between venues or the incoming flow of mobile users from distant
areas. Our evaluation suggests that the best performing features are common
across the three different commercial chains considered in the analysis,
although variations may exist too, as explained by heterogeneities in the way
retail facilities attract users. We also show that performance improves
significantly when combining multiple features in supervised learning
algorithms, suggesting that the retail success of a business may depend on
multiple factors.Comment: Proceedings of the 19th ACM SIGKDD international conference on
Knowledge discovery and data mining, Chicago, 2013, Pages 793-80
- …