140,765 research outputs found

    Building global and scalable systems with atomic multicast

    Get PDF
    The rise of worldwide Internet-scale services demands large distributed systems. Indeed, when handling several million users, it is common to operate thousands of servers spread across the globe. Here, replication plays a central role, as it contributes to improving the user experience by hiding failures and by providing acceptable latency. In this thesis, we claim that atomic multicast, with strong and well-defined properties, is the appropriate abstraction to efficiently design and implement globally scalable distributed systems. Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and to tolerate disasters (i.e., the failure of a whole datacenter), services are increasingly becoming geographically distributed. Data partitioning and replication, combined with local and geographical distribution, introduce daunting challenges, including the need to carefully order requests among replicas and partitions. One way to tackle this problem is to use group communication primitives that encapsulate ordering requirements. While replication is a common technique used to design such reliable distributed systems, to cope with the requirements of modern cloud-based "always-on" applications, replication protocols must additionally allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. We propose a dynamic atomic multicast protocol that fulfills these requirements. It allows resources to be dynamically added to and removed from an online replicated state machine, and crashed processes to be recovered. Major efforts have been spent in recent years to improve the performance, scalability and reliability of distributed systems. To hide the complexity of designing distributed applications, many proposals provide efficient high-level communication abstractions. Since the implementation of a production-ready system based on this abstraction is still a major task, we further propose to expose our protocol to developers in the form of distributed data structures. B-trees, for example, are commonly used in many kinds of applications, including database indexes and file systems. Providing a distributed, fault-tolerant and scalable data structure would help developers to integrate their applications in a distribution-transparent manner. This work describes how to build reliable and scalable distributed systems based on atomic multicast and demonstrates their capabilities with an implementation of a distributed ordered map that supports dynamic re-partitioning and fast recovery. To substantiate our claim, we ported an existing SQL database atop our distributed lock-free data structure.
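    As a rough illustration of the abstraction this thesis builds on, the Go sketch below models atomic multicast as a primitive that delivers each message to every addressed group (partition) in one total order, with a per-partition ordered map applied as a replicated state machine. This is a hypothetical, single-process toy: the names `AtomicMulticast`, `OrderedMap` and their methods are illustrative and not the thesis protocol, which orders messages across independent server groups over a network.

```go
// Toy sketch of atomic multicast driving a replicated ordered map.
// Single-process only; a lock stands in for a real ordering protocol.
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Message is addressed to one or more groups (partitions).
type Message struct {
	Groups []int
	Key    string
	Value  string
}

// AtomicMulticast delivers messages to all addressed groups in a
// total order that is consistent across groups; this toy serializes
// all deliveries under one lock to obtain that order trivially.
type AtomicMulticast struct {
	mu       sync.Mutex
	handlers map[int]func(Message)
}

func NewAtomicMulticast() *AtomicMulticast {
	return &AtomicMulticast{handlers: make(map[int]func(Message))}
}

func (am *AtomicMulticast) Subscribe(group int, h func(Message)) {
	am.mu.Lock()
	defer am.mu.Unlock()
	am.handlers[group] = h
}

// Multicast delivers m to every addressed group while holding the
// lock, so no two deliveries are ever observed in different orders.
func (am *AtomicMulticast) Multicast(m Message) {
	am.mu.Lock()
	defer am.mu.Unlock()
	for _, g := range m.Groups {
		if h, ok := am.handlers[g]; ok {
			h(m)
		}
	}
}

// OrderedMap is one partition's replica state, mutated only by
// delivered commands (state-machine replication).
type OrderedMap struct {
	data map[string]string
}

func (om *OrderedMap) Apply(m Message) {
	om.data[m.Key] = m.Value
}

// Keys returns the keys in sorted order, the "ordered" part of the map.
func (om *OrderedMap) Keys() []string {
	keys := make([]string, 0, len(om.data))
	for k := range om.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	am := NewAtomicMulticast()
	p0 := &OrderedMap{data: make(map[string]string)}
	am.Subscribe(0, p0.Apply)
	am.Multicast(Message{Groups: []int{0}, Key: "b", Value: "2"})
	am.Multicast(Message{Groups: []int{0}, Key: "a", Value: "1"})
	fmt.Println(p0.Keys()) // [a b]
}
```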

    Combining high performance and fault tolerance in a distributed file server

    Get PDF
    Among the most reliable and fault-tolerant components in a distributed system are storage systems. Naturally, the reliability of storage systems is among the most researched issues in distributed computing. Every distributed file system project is based on different assumptions about size, load, amount of sharing, and desirable semantics, making it hard to compare research results fairly. The current Amoeba file server is the Bullet File Server [van Renesse, Tanenbaum, and Wilschut, 1989], which provides immutable files, is optimized for whole-file transfer, and does caching at the file server. It has excellent performance for reading cached files (1.5 + 1.5n ms for n kilobytes) and for sustained file I/O (680 kilobytes per second, both on read and write). Although performance is excellent, there is room for improvement, especially in the areas of fault tolerance, sharing semantics and caching. I am currently doing the back-of-the-envelope design for a new file server that will form the basis of both our normal file system and of a complex-object server which is being designed by the database group at CWI. In addition to the desirable properties of fault tolerance, persistence, consistency, and availability, I am anxious to achieve even better performance than the Bullet server by extensive use of client and server caching. This position paper presents some of our design ideas. Note that this is work in progress.
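    To make the caching argument concrete, here is a minimal, hypothetical Go sketch (not the Bullet server's actual interface) of why immutable files simplify client caching: once a whole file has been fetched, a cache hit can never be stale, so no invalidation or revalidation protocol is needed.

```go
// Toy read path for a whole-file-transfer server with immutable files.
package main

import "fmt"

// FileServer stores immutable files addressed by an ID (a stand-in
// for a capability in a real system).
type FileServer struct {
	files map[string][]byte
}

func (s *FileServer) Read(id string) []byte {
	return s.files[id]
}

// Client caches whole files; since files are immutable, a cached
// copy is always valid and never needs revalidation with the server.
type Client struct {
	server *FileServer
	cache  map[string][]byte
}

func (c *Client) Read(id string) []byte {
	if data, ok := c.cache[id]; ok {
		return data // hit: safe, the contents cannot have changed
	}
	data := c.server.Read(id) // miss: whole-file transfer
	c.cache[id] = data
	return data
}

func main() {
	srv := &FileServer{files: map[string][]byte{"f1": []byte("hello")}}
	cl := &Client{server: srv, cache: make(map[string][]byte)}
	fmt.Println(string(cl.Read("f1"))) // fetched from server
	fmt.Println(string(cl.Read("f1"))) // served from client cache
}
```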

    Quality-aware model-driven service engineering

    Get PDF
    Service engineering and service-oriented architecture as an integration and platform technology are a recent approach to software systems integration. Quality aspects ranging from interoperability to maintainability to performance are of central importance for the integration of heterogeneous, distributed service-based systems. Architecture models can substantially influence the quality attributes of the implemented software systems. Besides the benefits of explicit architectures for maintainability and reuse, architectural constraints such as styles, reference architectures and architectural patterns can influence observable software properties such as performance. Empirical performance evaluation is the process of measuring and evaluating the performance of implemented software. We present an approach for addressing the quality of services and service-based systems at the model level in the context of model-driven service engineering. The focus on architecture-level models is a consequence of the black-box character of services.

    Maintaining consistency in distributed systems

    Get PDF
    In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operations are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group-oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems, often within the same application. This leads us to propose an integrated approach that permits applications to combine virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability.
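    As a small illustration of the linearizability constraint mentioned above, the hypothetical Go sketch below implements a concurrent counter whose operations appear to take effect atomically at a single point while a lock is held, so every concurrent history is equivalent to some sequential one. The names and structure are illustrative only.

```go
// Minimal linearizable concurrent object: a mutex-guarded counter.
package main

import (
	"fmt"
	"sync"
)

type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc takes effect atomically at the instant the lock is held,
// which is the single "linearization point" of the operation.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func (c *Counter) Get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Get()) // always 100: no update is ever lost
}
```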

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.
    Comment: 46 pages, 16 figures, Technical Report

    Reliable scientific service compositions

    Get PDF
    Distributed service-oriented architectures (SOAs) are increasingly used by users who are insufficiently skilled in the art of distributed system programming. Good examples are computational scientists who build large-scale distributed systems using service-oriented Grid computing infrastructures. Computational scientists use these infrastructures to build scientific applications, which are composed from basic Web services into larger orchestrations using workflow languages, such as the Business Process Execution Language. For these users, reliability of the infrastructure is of significant importance, and it has to be provided in the presence of hardware or operational failures. The primitives available to achieve such reliability currently leave much to be desired by users who do not necessarily have a strong education in distributed system construction. We characterise scientific service compositions and the environment they operate in by introducing the notion of global scientific BPEL workflows. We outline the threats to the reliability of such workflows and discuss the limited support that available specifications and mechanisms provide to achieve reliability. Furthermore, we propose a line of research to address the identified issues by investigating autonomic mechanisms that assist computational scientists in building, executing and maintaining reliable workflows.
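    As one concrete example of the kind of reliability primitive such workflows need, the hypothetical Go sketch below retries a failing service invocation with exponential backoff. The function names and bounds are illustrative, not part of BPEL or the paper's proposal, and a real workflow engine would additionally need idempotent or compensable steps for invocations that partially complete.

```go
// Toy retry-with-backoff wrapper for an unreliable workflow step.
package main

import (
	"errors"
	"fmt"
	"time"
)

// invokeWithRetry calls step up to maxAttempts times, doubling the
// wait between attempts, and reports the last error on exhaustion.
func invokeWithRetry(step func() error, maxAttempts int) error {
	backoff := 100 * time.Millisecond
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = step(); err == nil {
			return nil
		}
		fmt.Printf("attempt %d failed: %v\n", attempt, err)
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("step failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	flaky := func() error { // stand-in for a remote Web service call
		calls++
		if calls < 3 {
			return errors.New("transient fault")
		}
		return nil
	}
	if err := invokeWithRetry(flaky, 5); err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("step succeeded")
	}
}
```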

    Distributed multimedia systems

    Get PDF
    Multimedia systems will allow professionals worldwide to collaborate more effectively and to travel substantially less. But for multimedia systems to be effective, a good systems infrastructure is essential. In particular, support is needed for global and consistent sharing of information; for long-distance, high-bandwidth multimedia interpersonal communication; for greatly enhanced reliability and availability; and for security. These systems will also need to be easily usable by lay computer users. In this paper we explore the operating system support that these multimedia systems must have in order to do the job properly.