Reliable Computing Service in Massive-scale Systems Through Rapid Low-cost Failover
Large-scale distributed systems deployed as Cloud datacenters are capable of provisioning services to consumers with diverse business requirements. Providers face pressure to deliver uninterrupted, reliable services while reducing operational costs in the face of significant software and hardware failures. A widely adopted means to achieve this goal is to use redundant system components to implement user-transparent failover, yet its effectiveness must be balanced carefully against the overhead it incurs when deployed, an important practical consideration for complex large-scale systems. Failover techniques developed for Cloud systems often suffer serious limitations, including mandatory restart, which leads to poor cost-effectiveness, and a sole focus on crash failures that omits other important failure types such as timing failures and simultaneous failures. This paper addresses these limitations by presenting a new approach to user-transparent failover for massive-scale systems. The approach uses soft-state inference to achieve rapid failure recovery and avoid unnecessary restart, with minimal system resource overhead. It also copes with different failure types, including correlated and simultaneous events. The proposed approach was implemented, deployed, and evaluated within the Fuxi system, the underlying resource management system used within Alibaba Cloud. Results demonstrate that our approach tolerates complex failure scenarios while incurring a worst-case per-instance overhead of 228.5 microseconds and 1.71 percent additional CPU usage.
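As a rough illustration of the general idea only (the paper's own mechanism is not reproduced here), the sketch below shows a hypothetical standby that rebuilds soft state from events the primary has already reported, so that failover can resume work instead of restarting it; the class, the heartbeat timeout, and the task statuses are all assumptions made for this example.

```python
# Minimal sketch (assumptions, not the paper's implementation): a standby
# accumulates "soft state" from events reported by the primary, so that a
# failover infers what is still running rather than restarting everything.
import time

HEARTBEAT_TIMEOUT = 0.5  # seconds without a heartbeat before failover (assumed value)


class Standby:
    def __init__(self):
        self.soft_state = {}                  # task_id -> last reported status
        self.last_heartbeat = time.monotonic()

    def observe(self, task_id, status):
        """Record an event reported by the primary (soft state)."""
        self.soft_state[task_id] = status
        self.last_heartbeat = time.monotonic()

    def primary_failed(self):
        return time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT

    def take_over(self):
        """Infer current work from soft state instead of restarting all tasks."""
        running = [t for t, s in self.soft_state.items() if s == "running"]
        return {"resume": running}


if __name__ == "__main__":
    standby = Standby()
    standby.observe("task-1", "running")
    standby.observe("task-2", "finished")
    time.sleep(0.6)                           # simulate the primary going silent
    if standby.primary_failed():
        print(standby.take_over())            # {'resume': ['task-1']}
```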
Computing at massive scale: Scalability and dependability challenges
Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase in data volume, the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainty and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradation. In this paper we report our latest understanding of the current and future challenges in this area and discuss both existing and potential solutions, especially those concerned with system efficiency, scalability, and dependability. We first introduce a data-driven analysis methodology for characterizing resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine several fundamental challenges and the solutions we are developing to tackle them, including incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning, and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications.
Software Defined Networks based Smart Grid Communication: A Comprehensive Survey
The current power grid is no longer a feasible solution due to ever-increasing user demand for electricity, aging infrastructure, and reliability issues, and thus requires transformation into a better grid, known as the smart grid (SG). The key features that distinguish the SG from the conventional electrical power grid are its capability to perform two-way communication, demand-side management, and real-time pricing. Despite all the advantages that the SG will bring, there are certain issues specific to the SG communication system. For instance, network management of current SG systems is complex, time consuming, and done manually. Moreover, the SG communication (SGC) system is built on vendor-specific devices and protocols, so current SG systems are not protocol independent, which leads to interoperability issues. Software defined networking (SDN) has been proposed to monitor and manage communication networks globally. This article serves as a comprehensive survey on SDN-based SGC. We first discuss a taxonomy of the advantages of SDN-based SGC. We then discuss SDN-based SGC architectures, along with case studies. The article provides an in-depth discussion of routing schemes for SDN-based SGC, as well as a detailed survey of security and privacy schemes applied to SDN-based SGC. We furthermore present challenges, open issues, and future research directions related to SDN-based SGC.
Cassandra File System Over Hadoop Distributed File System
Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous masterless replication that allows low-latency operations for all clients. NoSQL data stores target unstructured data, which is dynamic in nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases, due to its lack of structure and its demands for high scalability and elasticity. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries. The Hadoop Distributed File System is one of many components and projects within the Hadoop ecosystem. The Apache Hadoop project defines Hadoop-DFS as "the primary storage system which is used by Hadoop applications" that enables "reliable, extremely rapid computations". This paper provides a high-level overview of how Hadoop-style analytics (MapReduce, Pig, Mahout and Hive) can be run on data contained in Apache Cassandra without the need for Hadoop-DFS.
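For illustration only, the sketch below shows the underlying idea of analyzing data in place: a word-count style aggregation that reads rows directly from Cassandra with the DataStax Python driver rather than staging the data in Hadoop-DFS first; the keyspace, table, and column names are assumptions, and this is not the Hadoop/Pig/Hive integration the paper itself describes.

```python
# Minimal sketch (assumed keyspace/table/column names): run a word-count style
# aggregation directly over rows stored in Cassandra, i.e. analytics over
# Cassandra data without copying it into Hadoop-DFS first.
from collections import Counter

from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])        # assumed local Cassandra node
session = cluster.connect("demo_ks")    # hypothetical keyspace

counts = Counter()
# "documents" with a text column "body" is a hypothetical schema.
for row in session.execute("SELECT body FROM documents"):
    counts.update(row.body.lower().split())

print(counts.most_common(10))
cluster.shutdown()
```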
FutureGRID: A Program for long-term research into GRID systems architecture
Proceedings of the 2003 UK e-Science All Hands Meeting, 31st August - 3rd September, Nottingham, UK. This is a project to carry out research into long-term GRID architecture in the University of Cambridge Computer Laboratory and the Cambridge eScience Center, with support from the Microsoft Research Laboratory, Cambridge. It is part of a larger vision for future systems architectures for public computing platforms, including both scientific GRID and commodity-level computing such as games, peer-to-peer computing, storage services and so forth, based on work in the laboratories in recent years on massively scalable distributed systems for storage, computation, content distribution and collaboration [26].