An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks
In the past few years, Online Social Networks (OSNs) have dramatically spread over the world. Facebook [4], one of the largest worldwide OSNs, has 1.35 billion users, 82.2% of whom are outside the US [36]. The browsing and posting interactions (text content) between OSN users lead to user data reads (visits) and writes (updates) in OSN datacenters, and Facebook now serves a billion reads and tens of millions of writes per second [37]. Besides that, Facebook has become one of the top Internet traffic sources [36] by sharing a tremendous number of large multimedia files, including photos and videos. The servers in datacenters have limited resources (e.g., bandwidth) to supply latency-efficient service for multimedia file sharing among the rapidly growing user population worldwide. Most online applications operate under soft real-time constraints (e.g., ≤ 300 ms latency) for good user experience, and an application's income decreases as its service latency increases. Thus, service latency is a very important Quality of Service (QoS) requirement for the OSN as a web service, since it affects the OSN's revenue and user experience. Also, to increase OSN revenue, OSN service providers need to constrain capital investment, operation costs, and resource (bandwidth) usage costs. Therefore, it is critical for the OSN to supply a guaranteed QoS for both text and multimedia contents to users while minimizing its costs. To achieve this goal, in this dissertation, we address three problems. i) Data distribution among datacenters: how to allocate data (text contents) among data servers with low service latency and minimized inter-datacenter network load; ii) Efficient multimedia file sharing: how to help the servers in datacenters efficiently share multimedia files among users; iii) Cost minimized data allocation among cloud storages: how to save the infrastructure (datacenter) capital investment and operation costs by leveraging commercial cloud storage services.
Data distribution among datacenters. To serve the text content, the new OSN model, which deploys datacenters globally, helps reduce service latency to worldwide distributed users and relieve the load of the existing datacenters. However, it causes higher inter-datacenter communication load. In the OSN, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. The distributed data storage, which only stores a user's data in his/her geographically closest datacenters, simply mitigates the problem. However, frequent interactions between distant users lead to frequent inter-datacenter communication and hence long service latencies. Therefore, the OSNs need a data allocation algorithm among datacenters with minimized network load and low service latency. Efficient multimedia file sharing. To serve multimedia file sharing with a rapidly growing user population, the file distribution method should be scalable and cost efficient, e.g., minimizing the bandwidth usage of the centralized servers. P2P networks have been widely used for file sharing among a large number of users [58, 131], and meet both the scalability and cost efficiency requirements. However, without fully utilizing the altruism and trust among friends in the OSNs, current P2P-assisted file sharing systems depend on strangers or anonymous users to distribute files, which degrades their performance due to selfish and malicious user behaviors. Therefore, the OSNs need a cost-efficient and trustworthy P2P-assisted file sharing system to serve multimedia content distribution. Cost minimized data allocation among cloud storages. The new trend of OSNs requires building worldwide datacenters, which introduces a large amount of capital investment and maintenance costs.
In order to save the capital expenditures to build and maintain the hardware infrastructures, the OSNs can leverage the storage services from multiple Cloud Service Providers (CSPs) with existing worldwide distributed datacenters [30, 125, 126]. These datacenters provide different Get/Put latencies and unit prices for resource utilization and reservation. Thus, when selecting different CSPs' datacenters, an OSN as a cloud customer of a globally distributed application faces two challenges: i) how to allocate data to worldwide datacenters to satisfy application SLA (service level agreement) requirements including both data retrieval latency and availability, and ii) how to allocate data and reserve resources in datacenters belonging to different CSPs to minimize the payment cost. Therefore, the OSNs need a data allocation system distributing data among CSPs' datacenters with cost minimization and SLA guarantee. In all, the OSN needs an efficient holistic data distribution and storage solution to minimize its network load and cost to supply a guaranteed QoS for both text and multimedia contents. In this dissertation, we propose methods to solve each of the aforementioned challenges in OSNs. Firstly, we verify the benefits of the new trend of OSNs and present typical OSN properties that lay the basis of our design. We then propose a Selective Data replication mechanism in Distributed Datacenters (SD3) to allocate user data among geographically distributed datacenters. In SD3, a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data (e.g., status update, friend post) for replication, making sure that a replica always reduces inter-datacenter communication. Secondly, we analyze a BitTorrent file sharing trace, which proves the necessity of proximity- and interest-aware clustering.
Based on the trace study and OSN properties, to address the second problem, we propose a SoCial Network integrated P2P file sharing system for enhanced Efficiency and Trustworthiness (SOCNET) to fully and cooperatively leverage the common-interest, geographically-close and trust properties of OSN friends. SOCNET uses a hierarchical distributed hash table (DHT) to cluster common-interest nodes, then further clusters geographically close nodes into a subcluster, and connects the nodes in a subcluster with social links. Thus, when queries travel along trustable social links, they also gain a higher probability of being successfully resolved by proximity-close nodes, simultaneously enhancing efficiency and trustworthiness. Thirdly, to handle the third problem, we model the cost minimization problem under the SLA constraints using integer programming. According to the system model, we propose an Economical and SLA-guaranteed cloud Storage Service (ES3), which finds a data allocation and resource reservation schedule with cost minimization and SLA guarantee. ES3 incorporates (1) a data allocation and reservation algorithm, which allocates each data item to a datacenter and determines the reservation amount on datacenters by leveraging all the pricing policies; (2) a genetic algorithm based data allocation adjustment approach, which makes data Get/Put rates stable in each datacenter to maximize the reservation benefit; and (3) a dynamic request redirection algorithm, which dynamically redirects a data request from an over-utilized datacenter to an under-utilized datacenter with sufficient reserved resource when the request rate varies greatly, to further reduce the payment. Finally, we conducted trace-driven experiments on a distributed testbed, PlanetLab, and real commercial cloud storage (Amazon S3, Windows Azure Storage and Google Cloud Storage) to demonstrate the efficiency and effectiveness of our proposed systems in comparison with other systems.
The results show that our systems outperform others in the network savings and data distribution efficiency.
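The abstract states that an SD3 datacenter weighs update rate against visit rate so that a replica always reduces inter-datacenter communication. The following is a minimal sketch of one way such a rule could look, not the dissertation's actual algorithm. The `RemoteData` fields, the byte sizes, and the `threshold` parameter are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RemoteData:
    """Traffic seen by one datacenter for one remote user's data (hypothetical model)."""
    visit_rate: float    # local reads per second of the remote data
    update_rate: float   # writes per second at the data's master datacenter
    read_bytes: float    # bytes moved across datacenters per remote read
    update_bytes: float  # bytes moved across datacenters per replica update


def replication_saving(d: RemoteData) -> float:
    """Estimated inter-datacenter bytes per second saved by holding a local replica.

    Without a replica, every visit is a remote read. With a replica, visits are
    local but every update must be pushed to the replica.
    """
    return d.visit_rate * d.read_bytes - d.update_rate * d.update_bytes


def should_replicate(d: RemoteData, threshold: float = 0.0) -> bool:
    """Replicate only when the estimated saving exceeds the (assumed) threshold."""
    return replication_saving(d) > threshold


if __name__ == "__main__":
    popular = RemoteData(visit_rate=10, update_rate=1, read_bytes=1000, update_bytes=1000)
    rarely_read = RemoteData(visit_rate=0.1, update_rate=5, read_bytes=1000, update_bytes=1000)
    print(should_replicate(popular), should_replicate(rarely_read))
```

Under this toy model, frequently read but rarely updated data is replicated, while data that is updated more often than it is read stays remote. Applying the same test per data type (status update, friend post) would correspond to the atomized replication the abstract mentions.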
Data transfer scheduling with advance reservation and provisioning
Over the years, scientific applications have become more complex and more data intensive. Although institutions and organizations gain access to the resources needed for their large-scale applications through the use of distributed resources, complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties, and to manage the end-to-end processing of data. We present a new data scheduling paradigm with advance reservation and provisioning. Our methodology provides a basis for provisioning end-to-end high performance data transfers which require integration between system, storage and network resources, and coordination between reservation managers and data transfer nodes. This allows researchers/users and higher level meta-schedulers to use data placement as a service where they can plan ahead and reserve time and resources for their data movement operations. We present a novel approach for evaluating time-dependent structures with bandwidth guaranteed paths. We present a practical online scheduling model using advance reservation in dynamic networks with time constraints. In addition, we report a new polynomial algorithm presenting possible reservation options and alternatives for earliest completion and shortest transfer duration. We enhance the advance network reservation system by extending the underlying mechanism to provide a new service in which users submit their constraints and the system suggests possible reservation requests satisfying users' requirements. We have studied scheduling data transfer operations with resource and time conflicts. We have developed a new scheduling methodology considering resource allocation in client sites and bandwidth allocation on the network links connecting resources. Some other major contributions of our study include enhanced reliability, adaptability, and performance optimization of distributed data placement tasks.
While designing this new data scheduling architecture, we also developed other important methodologies such as early error detection, failure awareness, job aggregation, and dynamic adaptation of distributed data placement tasks. The adaptive tuning includes dynamically setting data transfer parameters and controlling utilization of available network capacity. Our research aims to provide a middleware to improve the data bottleneck in high performance computing systems.
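To make the notion of an "earliest completion" reservation option concrete, here is a deliberately simplified sketch for a single link. It is not the polynomial algorithm reported in the abstract, which handles paths through a network. The slot representation `(t_begin, t_end, free_bandwidth)` and the function name are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# One time slot on a single link: (t_begin, t_end, free_bandwidth).
# Slots are assumed sorted by time and non-overlapping.
Slot = Tuple[float, float, float]


def earliest_completion(slots: List[Slot], volume: float, start: float = 0.0) -> Optional[float]:
    """Earliest time at which `volume` units can be fully transferred.

    The transfer may begin no earlier than `start` and greedily uses all free
    bandwidth in every slot. Returns None if the slots cannot carry the volume.
    """
    remaining = volume
    for t_begin, t_end, bandwidth in slots:
        begin = max(t_begin, start)
        if begin >= t_end or bandwidth <= 0:
            continue  # slot is in the past or has no free capacity
        capacity = (t_end - begin) * bandwidth
        if capacity >= remaining:
            return begin + remaining / bandwidth
        remaining -= capacity
    return None


if __name__ == "__main__":
    # Free bandwidth: 5 units/s for 10 s, fully reserved for 10 s, then 10 units/s.
    link = [(0, 10, 5), (10, 20, 0), (20, 30, 10)]
    print(earliest_completion(link, volume=80))
```

A shortest-duration option would instead search for the start time whose window carries the volume in the least elapsed time. That variant is omitted here.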
Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task complicated by a multidimensional, discrete configuration space and the
high cost of space exploration (e.g., by running the application with different
storage configurations). To enable selecting the best configuration in a
reasonable time, we design an end-to-end performance prediction mechanism that
estimates the turn-around time of an application using a storage system under a
given configuration. This approach focuses on a generic object-based storage
system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at data-chunk and control message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro- as well as
synthetic benchmarks mimicking real workflow applications, and a real
application. A preliminary evaluation shows that we are on a good track to
meet our objectives: it can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration.
When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds
In this paper, we study the problem of reducing the delay of downloading data
from cloud storage systems by leveraging multiple parallel threads, assuming
that the data has been encoded and stored in the clouds using fixed rate
forward error correction (FEC) codes with parameters (n, k). That is, each file
is divided into k equal-sized chunks, which are then expanded into n chunks
such that any k chunks out of the n are sufficient to successfully restore the
original file. The model can be depicted as a multiple-server queue with
arrivals of data retrieving requests and a server corresponding to a thread.
However, this is not a typical queueing model because a server can terminate
its operation, depending on when other servers complete their service (due to
the redundancy that is spread across the threads). Hence, to the best of our
knowledge, the analysis of this queueing model remains quite uncharted.
Recent traces from Amazon S3 show that the time to retrieve a fixed size
chunk is random and can be approximated as a constant delay plus an i.i.d.
exponentially distributed random variable. For the tractability of the
theoretical analysis, we assume that the chunk downloading time is i.i.d.
exponentially distributed. Under this assumption, we show that any
work-conserving scheme is delay-optimal among all on-line scheduling schemes
when k = 1. When k > 1, we find that a simple greedy scheme, which allocates
all available threads to the head of line request, is delay optimal among all
on-line scheduling schemes. We also provide some numerical results that point
to the limitations of the exponential assumption, and suggest further research
directions.
Comment: Original accepted by IEEE Infocom 2014, 9 pages. Some statements in
the Infocom paper are corrected.
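As a small illustration of the (n, k) model under the abstract's exponential assumption, consider a single request served by n threads, each downloading a distinct coded chunk, which completes as soon as any k chunks arrive. This is only the single-request building block, not the paper's queueing analysis. The completion time is the k-th order statistic of n i.i.d. exponential variables, whose mean has a standard closed form. The function names and the rate parameter `mu` are assumptions for this sketch.

```python
import random


def expected_kth_of_n(n: int, k: int, mu: float = 1.0) -> float:
    """Mean of the k-th smallest of n i.i.d. Exp(mu) download times.

    Standard order-statistics result: (1/mu) * sum_{i=n-k+1}^{n} 1/i.
    """
    return sum(1.0 / i for i in range(n - k + 1, n + 1)) / mu


def simulate_kth_of_n(n: int, k: int, mu: float = 1.0,
                      trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = sorted(rng.expovariate(mu) for _ in range(n))
        total += times[k - 1]  # the request finishes when the k-th chunk arrives
    return total / trials


if __name__ == "__main__":
    # A (4, 2) code: four parallel chunk downloads, any two suffice.
    print(expected_kth_of_n(4, 2), simulate_kth_of_n(4, 2))
```

For a (4, 2) code with unit rate the closed form gives 1/4 + 1/3 = 7/12, compared with 1/2 + 1 = 3/2 for downloading two uncoded chunks in parallel and waiting for both.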
A game theoretic approach to a peer-to-peer cloud storage model
Classical cloud storage based on external data providers has been recognized
to suffer from a number of drawbacks. This is due to its inherent centralized
architecture which makes it vulnerable to external attacks, malware, technical
failures, as well as to the large premium charged for business purposes. In this
paper, we propose an alternative distributed peer-to-peer cloud storage model
which is based on the observation that the users themselves often have
available storage capabilities to be offered in principle to other users. Our
set-up is that of a network of users connected through a graph, each of them
being at the same time a source of data to be stored externally and a possible
storage resource. We cast the peer-to-peer storage model as a Potential Game
and we propose an original decentralized algorithm which makes units interact,
cooperate, and store a complete back up of their data on their connected
neighbors. We present theoretical results on the algorithm as well as a good
number of simulations which validate our approach.
Comment: 10 pages
A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing
Edge computing is promoted to meet increasing performance needs of
data-driven services using computational and storage resources close to the end
devices, at the edge of the current network. To achieve higher performance in
this new paradigm one has to consider how to combine the efficiency of resource
usage at all three layers of architecture: end devices, edge devices, and the
cloud. While cloud capacity is elastically extendable, end devices and edge
devices are to various degrees resource-constrained. Hence, an efficient
resource management is essential to make edge computing a reality. In this
work, we first present terminology and architectures to characterize current
works within the field of edge computing. Then, we review a wide range of
recent articles and categorize relevant aspects in terms of 4 perspectives:
resource type, resource management objective, resource location, and resource
use. This taxonomy and the ensuing analysis are used to identify some gaps in
the existing research. Among several research gaps, we found that research is
less prevalent on data, storage, and energy as a resource, and less extensive
towards the estimation, discovery and sharing objectives. As for resource
types, the most well-studied resources are computation and communication
resources. Our analysis shows that resource management at the edge requires a
deeper understanding of how methods applied at different levels and geared
towards different resource types interact. Specifically, the impact of mobility
and collaboration schemes requiring incentives is expected to be different in
edge architectures compared to the classic cloud solutions. Finally, we find
that fewer works are dedicated to the study of non-functional properties or to
quantifying the footprint of resource management techniques, including
edge-specific means of migrating data and services.
Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless
Communications and Mobile Computing journal
Dynamic Parameter Allocation in Parameter Servers
To keep up with increasing dataset sizes and model complexity, distributed
training has become a necessity for large machine learning tasks. Parameter
servers ease the implementation of distributed parameter management, a key
concern in distributed training, but can induce severe communication
overhead. To reduce communication overhead, distributed machine learning
algorithms use techniques to increase parameter access locality (PAL),
achieving up to linear speed-ups. We found that existing parameter servers
provide only limited support for PAL techniques, however, and therefore prevent
efficient training. In this paper, we explore whether and to what extent PAL
techniques can be supported, and whether such support is beneficial. We propose
to integrate dynamic parameter allocation into parameter servers, describe an
efficient implementation of such a parameter server called Lapse, and
experimentally compare its performance to existing parameter servers across a
number of machine learning tasks. We found that Lapse provides near-linear
scaling and can be orders of magnitude faster than existing parameter servers
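The idea of dynamic parameter allocation can be illustrated with a toy, single-process model in which a parameter's owning node can change at run time, so later accesses from that node become local. This is a hypothetical sketch and not Lapse's actual interface. The class, its method names, and the `remote_accesses` counter are all assumptions made for illustration.

```python
class ToyParameterServer:
    """In-process stand-in for a parameter server with relocatable parameters."""

    def __init__(self, num_nodes: int) -> None:
        self.stores = [dict() for _ in range(num_nodes)]  # one key-value store per node
        self.owner = {}           # key -> index of the node currently holding it
        self.remote_accesses = 0  # reads that would have crossed the network

    def put(self, key, value, node: int) -> None:
        """Store `value` for `key` on `node`, moving ownership there if needed."""
        old = self.owner.get(key)
        if old is not None and old != node:
            del self.stores[old][key]
        self.stores[node][key] = value
        self.owner[key] = node

    def get(self, key, node: int):
        """Read `key` from `node`, counting the access as remote if owned elsewhere."""
        owner = self.owner[key]
        if owner != node:
            self.remote_accesses += 1
        return self.stores[owner][key]

    def relocate(self, key, node: int) -> None:
        """Move `key` to `node` so that subsequent accesses from it are local."""
        owner = self.owner[key]
        if owner != node:
            self.stores[node][key] = self.stores[owner].pop(key)
            self.owner[key] = node


if __name__ == "__main__":
    ps = ToyParameterServer(num_nodes=2)
    ps.put("w", 1.0, node=0)
    ps.get("w", node=1)        # remote read
    ps.relocate("w", node=1)   # allocate the parameter where it is used
    ps.get("w", node=1)        # now local
    print(ps.remote_accesses)
```

With static allocation, both reads from node 1 would be remote. Relocating the parameter before a burst of accesses is what lets a training algorithm exploit parameter access locality.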