Computing in the RAIN: a reliable array of independent nodes
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
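The storage contribution pairs striping with erasure coding so that data survives the loss of a node. As a minimal sketch of that idea, the snippet below stripes blocks across notional nodes and protects them with a single XOR parity block; the actual RAIN schemes use more general, computationally efficient array codes, and the function names and node count here are illustrative assumptions.

# Minimal sketch of node-level erasure coding in the spirit of RAIN's
# storage schemes. Single XOR parity tolerates one lost node; RAIN's
# actual codes are more general array codes.

def encode(data_blocks: list) -> bytes:
    """Compute an XOR parity block over equal-sized data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks: list, parity: bytes) -> bytes:
    """Reconstruct the single missing data block from survivors + parity."""
    return encode(surviving_blocks + [parity])

# Stripe data across 4 "nodes" plus 1 parity node.
blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = encode(blocks)

# Simulate losing node 2 and rebuilding its block from the others.
rebuilt = recover([blocks[0], blocks[1], blocks[3]], parity)
assert rebuilt == blocks[2]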
Remove-Win: a Design Framework for Conflict-free Replicated Data Collections
Internet-scale distributed systems often replicate data within and across data centers to provide low latency and high availability despite node and network failures. Replicas are required to accept updates without coordinating with each other, and the updates are then propagated asynchronously. This raises the issue of conflict resolution among concurrent updates, which is often challenging and error-prone. The Conflict-free Replicated Data Type (CRDT) framework provides a principled approach to this challenge. This work focuses on a special type of CRDT, namely the Conflict-free Replicated Data Collection (CRDC), e.g. list and queue. A CRDC can hold complex and compound data items, organized in structures of rich semantics. Complex CRDCs can greatly ease the development of upper-layer applications, but they also make conflict resolution notoriously difficult. This explains why existing CRDC designs are tricky and hard to generalize to other data types. A design framework is badly needed to guide the systematic design of new CRDCs.
To address these challenges, we propose the Remove-Win Design Framework. The remove-win strategy for conflict resolution is simple but powerful: the remove operation simply wipes out the data item, no matter how complex its value is. The user of the CRDC only needs to specify conflict resolution for non-remove operations. This resolution is decomposed into three basic cases, which are left as open terms in the CRDC design skeleton. Stubs containing user-specified conflict resolution logic are plugged into the skeleton to obtain concrete CRDC designs. We demonstrate the effectiveness of our design framework via a case study of designing a conflict-free replicated priority queue. Performance measurements also show the efficiency of the design derived from our design framework.
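To make the remove-win idea concrete, here is a toy replicated map in which a remove concurrent with any update wins and wipes the item, whatever the item's value; the plug point where the framework would accept user-specified resolution of concurrent non-remove operations is marked in merge(). The class and field names are illustrative assumptions, not the paper's design skeleton.

# Toy remove-win replicated map. Each key stores its last operation
# (ADD or REMOVE) tagged with a vector clock; on merge, a causally
# newer op wins, and a REMOVE wins over any concurrent op.

from dataclasses import dataclass, field

def dominates(a: dict, b: dict) -> bool:
    """True if vector clock a happens at-or-after b on every replica."""
    return all(a.get(r, 0) >= c for r, c in b.items())

@dataclass
class Entry:
    op: str        # "ADD" or "REMOVE"
    value: object  # payload; None for REMOVE
    clock: dict    # vector clock taken when the op was issued

@dataclass
class RemoveWinMap:
    replica: str
    clock: dict = field(default_factory=dict)
    entries: dict = field(default_factory=dict)  # key -> Entry

    def _tick(self) -> dict:
        self.clock[self.replica] = self.clock.get(self.replica, 0) + 1
        return dict(self.clock)

    def add(self, key, value):
        self.entries[key] = Entry("ADD", value, self._tick())

    def remove(self, key):
        # Remove needs no knowledge of the value's structure: it wipes
        # the item no matter how complex the value is.
        self.entries[key] = Entry("REMOVE", None, self._tick())

    def lookup(self, key):
        e = self.entries.get(key)
        return e.value if e is not None and e.op == "ADD" else None

    def merge(self, other: "RemoveWinMap") -> None:
        for key, theirs in other.entries.items():
            ours = self.entries.get(key)
            if ours is None or dominates(theirs.clock, ours.clock):
                self.entries[key] = theirs       # causally newer op wins
            elif dominates(ours.clock, theirs.clock):
                pass                             # ours is causally newer
            elif theirs.op == "REMOVE":
                self.entries[key] = theirs       # concurrent: remove wins
            elif ours.op == "REMOVE":
                pass                             # concurrent: remove wins
            else:
                # Concurrent non-remove ops: the plug point for
                # user-specified resolution; here, an arbitrary but
                # deterministic tie-break so all replicas converge.
                if sorted(theirs.clock.items()) > sorted(ours.clock.items()):
                    self.entries[key] = theirs
        for r, c in other.clock.items():
            self.clock[r] = max(self.clock.get(r, 0), c)

For example, if one replica updates a key while another concurrently removes it, merging in either order leaves the key absent, which is exactly the remove-win guarantee.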
Online Algorithms for Geographical Load Balancing
It has recently been proposed that Internet energy costs, both monetary and environmental, can be reduced by exploiting temporal variations and shifting processing to data centers located in regions where energy currently has low cost. Lightly loaded data centers can then turn off surplus servers. This paper studies online algorithms for determining the number of servers to leave on in each data center, and then uses these algorithms to study the environmental potential of geographical load balancing (GLB). A commonly suggested algorithm for this setting is “receding horizon control” (RHC), which computes the provisioning for the current time by optimizing over a window of predicted future loads. We show that RHC performs well in a homogeneous setting, in which all servers can serve all jobs equally well; however, we also prove that differences in propagation delays, servers, and electricity prices can cause RHC to perform badly. So, we introduce variants of RHC that are guaranteed to perform well in the face of such heterogeneity. These algorithms are then used to study the feasibility of powering a continent-wide set of data centers mostly by renewable sources, and to understand what portfolio of renewable energy is most effective.
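As a minimal sketch of RHC in the homogeneous setting, the snippet below plans server counts over a prediction window with a small dynamic program and commits only the first slot's decision. The linear energy cost, the per-server switching cost, and the window length are illustrative assumptions, not the paper's exact convex formulation.

# One RHC step: optimize provisioning over the whole window of
# predicted loads (in units of servers needed), keep only the first
# slot's decision, then re-plan at the next slot.

def rhc_step(x_prev: int, predicted: list, max_servers: int,
             energy_cost: float = 1.0, switch_cost: float = 6.0) -> int:
    # frontier maps "servers on at the end of slot t" to
    # (best cost so far, decision committed for the first slot).
    frontier = {x_prev: (0.0, x_prev)}
    for t, load in enumerate(predicted):
        nxt = {}
        for x_old, (cost, first) in frontier.items():
            for x_new in range(load, max_servers + 1):  # must cover the load
                c = (cost + energy_cost * x_new
                     + switch_cost * max(0, x_new - x_old))  # turn-on cost
                f = x_new if t == 0 else first
                if x_new not in nxt or c < nxt[x_new][0]:
                    nxt[x_new] = (c, f)
        frontier = nxt
    return min(frontier.values(), key=lambda cf: cf[0])[1]

# Slide the window over a sequence of predicted loads.
loads = [3, 5, 9, 9, 4, 2, 2, 6]
x, window = 0, 3
for t in range(len(loads) - window + 1):
    x = rhc_step(x, loads[t:t + window], max_servers=10)

Because the switching cost penalizes turning servers on, looking ahead lets the controller ramp up smoothly before a predicted peak rather than paying the full toggling cost at the last moment.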
InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services
Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine the optimal location for hosting application services to achieve reasonable QoS levels. Further, Cloud computing providers are unable to predict the geographic distribution of users consuming their services, hence the load coordination must happen automatically, and the distribution of services must change in response to changes in the load. To counter this problem, we advocate the creation of a federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource, and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and databases) for handling sudden variations in service demands.
This paper presents the vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a set of rigorous performance evaluation studies using the CloudSim toolkit. The results demonstrate that the federated Cloud computing model has immense potential, as it offers significant performance gains in terms of response time and cost savings under dynamic workload scenarios.
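As a toy illustration of the load coordination the paper argues for, the sketch below has a broker route each request to the closest data center with headroom and spill over to the least-loaded peer when everything nearby is hot. The thresholds, fields, and scoring rule are assumptions for illustration, not InterCloud's actual brokering protocol.

# Toy federated placement: pick a data center per request based on
# current utilization and network distance to the user's region.

from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    region: str
    capacity: int          # VM slots
    in_use: int = 0

    @property
    def utilization(self) -> float:
        return self.in_use / self.capacity

def place(request_region: str, dcs: list, latency: dict,
          overload: float = 0.8) -> DataCenter:
    """Pick the closest data center that is not overloaded; if all are
    overloaded, fall back to the least-utilized one (scale-out target)."""
    candidates = [d for d in dcs if d.utilization < overload]
    pool = candidates or dcs
    best = min(pool, key=lambda d: (latency[(request_region, d.region)],
                                    d.utilization))
    best.in_use += 1
    return best

# Requests from EU users route to eu-west while it has headroom.
dcs = [DataCenter("us-east", "NA", 100), DataCenter("eu-west", "EU", 80)]
latency = {("NA", "NA"): 20, ("NA", "EU"): 90,
           ("EU", "EU"): 25, ("EU", "NA"): 95}
chosen = place("EU", dcs, latency)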