3,505 research outputs found
Improving the scalability of cloud-based resilient database servers
Many rely now on public cloud infrastructure-as-a-service for
database servers, mainly, by pushing the limits of existing pooling and
replication software to operate large shared-nothing virtual server clusters.
Yet, it is unclear whether this is still the best architectural choice,
namely, when cloud infrastructure provides seamless virtual shared storage
and bills clients on actual disk usage.
This paper addresses this challenge with Resilient Asynchronous Commit
(RAsC), an improvement to awell-known shared-nothing design based
on the assumption that a much larger number of servers is required for
scale than for resilience. Then we compare this proposal to other database
server architectures using an analytical model focused on peak throughput
and conclude that it provides the best performance/cost trade-off while at
the same time addressing a wide range of fault scenarios
Next Generation Cloud Computing: New Trends and Research Directions
The landscape of cloud computing has significantly changed over the last
decade. Not only have more providers and service offerings crowded the space,
but also cloud infrastructure that was traditionally limited to single provider
data centers is now evolving. In this paper, we firstly discuss the changing
cloud infrastructure and consider the use of infrastructure from multiple
providers and the benefit of decentralising computing away from data centers.
These trends have resulted in the need for a variety of new computing
architectures that will be offered by future cloud infrastructure. These
architectures are anticipated to impact areas, such as connecting people and
devices, data-intensive computing, the service space and self-learning systems.
Finally, we lay out a roadmap of challenges that will need to be addressed for
realising the potential of next generation cloud systems.Comment: Accepted to Future Generation Computer Systems, 07 September 201
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
DISCO: Distributed Multi-domain SDN Controllers
Modern multi-domain networks now span over datacenter networks, enterprise
networks, customer sites and mobile entities. Such networks are critical and,
thus, must be resilient, scalable and easily extensible. The emergence of
Software-Defined Networking (SDN) protocols, which enables to decouple the data
plane from the control plane and dynamically program the network, opens up new
ways to architect such networks. In this paper, we propose DISCO, an open and
extensible DIstributed SDN COntrol plane able to cope with the distributed and
heterogeneous nature of modern overlay networks and wide area networks. DISCO
controllers manage their own network domain and communicate with each others to
provide end-to-end network services. This communication is based on a unique
lightweight and highly manageable control channel used by agents to
self-adaptively share aggregated network-wide information. We implemented DISCO
on top of the Floodlight OpenFlow controller and the AMQP protocol. We
demonstrated how DISCO's control plane dynamically adapts to heterogeneous
network topologies while being resilient enough to survive to disruptions and
attacks and providing classic functionalities such as end-point migration and
network-wide traffic engineering. The experimentation results we present are
organized around three use cases: inter-domain topology disruption, end-to-end
priority service request and virtual machine migration
Recommended from our members
High Availability for Carrier-Grade SIP Infrastructure on Cloud Platforms
SIP infrastructure on cloud platforms has the potential to be both scalable and highly available. In our previous project, we focused on the scalability aspect of SIP services on cloud platforms; the focus of this project is on the high availability aspect. We investigated the effects of component fault on service availability with the goal of understanding how high availability can be guaranteed even in the face of component faults. The experiments were conducted empirically on a real system that runs on Amazon EC2. Our analysis shows that most component faults are masked with a simple automatic failover technique. However, we have also identified fundamental problems that cannot be addressed by simple failover techniques; a problem involving DNS cache in resolvers and a problem involving static failover configurations. Recommendations on how to solve these problems are included in the report
- âŠ