Search CORE


Middleware support for locality-aware wide area replication

Author: Susarla Sai R.
Publication venue: University of Utah
Publication date: 18/11/2004

Technical report. Coherent wide-area data caching can improve the scalability and responsiveness of distributed services such as wide-area file access, database and directory services, and content distribution. However, distributed services differ widely in the frequency of read/write sharing, the amount of contention between clients for the same data, and their ability to make tradeoffs between consistency and availability. Aggressive replication enhances the scalability and availability of services with read-mostly data or data that need not be kept strongly consistent. However, for applications that require strong consistency of write-shared data, replication must be throttled to achieve reasonable performance. We have developed a middleware data store called Swarm designed to support the wide-area data sharing needs of distributed services. To support the needs of diverse distributed services, Swarm provides: (i) a failure-resilient, proximity-aware data replication mechanism that adjusts the replication hierarchy based on observed network characteristics and node availability, (ii) a customizable consistency mechanism that allows applications to specify allowable consistency-availability tradeoffs, and (iii) a contention-aware caching mechanism that monitors contention between replicas and adjusts its replication policies accordingly. On a 240-node P2P file sharing system, Swarm's proximity-aware caching and replica hierarchy maintenance mechanisms improve latency by 80%, reduce consumed WAN bandwidth by 80%, and limit the impact of high node churn (5 node deaths/sec) to roughly one-fifth that of random replication. In addition, Swarm's contention-aware caching mechanism outperforms RPCs and static caching mechanisms at all levels of contention on an enterprise service workload.
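As a rough illustration of the contention-aware caching idea in (iii), the sketch below tracks how often an object was updated by other replicas between one replica's own accesses, and switches from local caching to forwarding operations to a single master copy once that fraction crosses a threshold. The class `ContentionMonitor`, its sliding window, the 0.5 threshold and the two policy names are all hypothetical; the report's actual contention metric and policies are not described in this abstract.

```python
from collections import deque


class ContentionMonitor:
    """Hypothetical per-object contention tracker for one replica."""

    def __init__(self, window: int = 20, threshold: float = 0.5) -> None:
        self.threshold = threshold            # assumed cutoff, not taken from the report
        self.history = deque(maxlen=window)   # True = remote update seen since our last access

    def record_access(self, conflicted: bool) -> None:
        """Record whether another replica wrote the object since this replica's previous access."""
        self.history.append(conflicted)

    def contention(self) -> float:
        """Fraction of recent accesses that observed a remote update."""
        return sum(self.history) / len(self.history) if self.history else 0.0

    def policy(self) -> str:
        # Low contention: keep a locally cached replica and serve accesses from it.
        # High contention: stop caching and forward operations to a master copy (RPC-style).
        return "forward_to_master" if self.contention() > self.threshold else "cache_locally"


monitor = ContentionMonitor(window=4, threshold=0.5)
for conflicted in [False, False, True, False]:
    monitor.record_access(conflicted)
print(monitor.contention(), monitor.policy())  # 0.25 cache_locally
```

A single threshold can make the policy flap when contention hovers near the cutoff; a practical design would likely add hysteresis, which this sketch omits.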

The University of Utah: J. Willard Marriott Digital Library

Preliminary investigation of the practicality of an industrial training for engineering technology program-industries view

Author: Jalani Jamaludin
Mashori Sumaiya
Norjali Rasida
Sadun Amirul Syafiq
Publication venue: Asian Research Publishing Network (ARPN)
Publication date: 01/01/2015

One of the important aspects of an Engineering Technology (ET) program is that students must gain significant hands-on experience throughout the program. Apart from laboratory work carried out at the university, industrial training can also contribute significant practical work to enhance students' skills. In this study, the ET program is distinguished from the Engineering program by proposing longer periods of industrial training in the ET program. However, the effectiveness of these longer training periods must be investigated to determine whether this framework should be retained for future training. To this end, the university has structured the industrial training so that students undergo two (2) months of training during the third (3rd) semester of year two (2), another two (2) months during the third (3rd) semester of year three (3), and finally six (6) months during the last semester of year four (4) (i.e. the final semester). Interviews were conducted with two industrial panels to assess the effectiveness of the proposed training. Suggestions and ideas from both panels were considered in developing the industrial training syllabus for the ET program.

UTHM Institutional Repository

Directory of Open Access Journals

ElfStore: A Resilient Data Storage Service for Federated Edge and Fog Resources

Author: Monga Sumit Kumar
R Sheshadri K
Simmhan Yogesh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/05/2019

Edge and fog computing have grown popular as IoT deployments become widespread. While application composition and scheduling on such resources are being explored, there is a gap: no distributed data storage service exists on the edge and fog layers, so applications depend solely on the cloud for data persistence. Such a service should reliably store and manage data on fog and edge devices, even in the presence of failures, and offer transparent discovery of and access to data for use by edge computing applications. Here, we present ElfStore, a first-of-its-kind edge-local federated store for streams of data blocks. It uses reliable fog devices as a super-peer overlay to monitor the edge resources, offers federated metadata indexing using Bloom filters, locates data within 2 hops, and maintains approximate global statistics about the reliability and storage capacity of edges. Edges host the actual data blocks, and we use a unique differential replication scheme to select edges on which to replicate blocks, to guarantee a minimum reliability and to balance storage utilization. Our experiments on two IoT virtual deployments with 20 and 272 devices show that ElfStore has low overheads, is bound only by the network bandwidth, has scalable performance, and offers tunable resilience.
Comment: 24 pages, 14 figures, To appear in IEEE International Conference on Web Services (ICWS), Milan, Italy, 201
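The abstract mentions federated metadata indexing with Bloom filters. A minimal sketch of that general idea follows, assuming each fog node keeps a Bloom filter of the block IDs stored on its edges, so that a lookup only needs to contact fogs whose filter may contain the ID. The salted SHA-256 hashing, the filter size and the helper `candidate_fogs` are illustrative assumptions, not ElfStore's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions set in an m-bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def candidate_fogs(block_id: str, fog_filters: dict) -> list:
    """Hypothetical lookup step: fogs whose filter may index block_id."""
    return [fog for fog, bf in fog_filters.items() if bf.might_contain(block_id)]


fogs = {"fog-a": BloomFilter(), "fog-b": BloomFilter()}
fogs["fog-a"].add("block-42")
print(candidate_fogs("block-42", fogs))  # ['fog-a']
```

Because a Bloom filter admits false positives but no false negatives, a candidate fog may turn out not to hold the block, but any fog whose filter rejects the ID can be skipped safely.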

arXiv.org e-Print Archive

Crossref

Operating System Support for Redundant Multithreading

Author: Döbel Björn
Publication date: 25/11/2014

Failing hardware is a fact, and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers have proposed various solutions to this issue, each with its own downsides: specialized hardware components make hardware more expensive to produce and consume additional energy at runtime; fault-tolerant algorithms and libraries enforce specific programming models on the developer; and compiler-based fault tolerance requires the source code of all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and therefore does not rely on hardware extensions such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode.
The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and of other fault tolerance mechanisms. Thereafter I present three case studies that evaluate approaches to protecting RCB components, with the aim of achieving a software stack that is fully protected against hardware errors.
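Triple-modular redundancy, the mode behind the overhead figure reported above, masks a single faulty replica by majority vote. The function below is a generic sketch of that voting step, assuming replica states can be compared as hashable values (for example, a tuple of register contents captured where replicas are compared). It is not Romain's code, and the choice of comparison points is an assumption.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(replica_states: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Majority vote over replica states.

    Returns the majority state and the indices of replicas that disagree with it
    (and would be repaired from the majority). If no strict majority exists, the
    error is detected but not correctable: returns None and flags every replica.
    """
    counts = Counter(replica_states)
    state, n = counts.most_common(1)[0]
    if n <= len(replica_states) // 2:
        return None, list(range(len(replica_states)))
    faulty = [i for i, s in enumerate(replica_states) if s != state]
    return state, faulty


print(vote([7, 3, 7]))  # (7, [1])
```

With three replicas, one disagreeing replica is outvoted and can be corrected from the majority; if all three differ, the fault is detected but cannot be corrected, which mirrors the distinction between the detection and correction rates above.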

Technische Universität Dresden: Qucosa

Extended Fault Taxonomy of SOA-Based Systems

Author: Guru Prasad Bhandari
Ratneshwer Gupta
Publication venue: Faculty of Electrical Engineering and Computing, Univ. of Zagreb
Publication date: 01/01/2017

Service Oriented Architecture (SOA) is considered a standard for enterprise software development. The main characteristics of SOA are dynamic discovery and composition of software services in a heterogeneous environment. These properties pose new challenges for fault management in SOA-based systems (SBS). A proper understanding of the different faults in an SBS is necessary for effective fault handling. A comprehensive three-fold fault taxonomy is presented here that covers distributed, SOA-specific, and non-functional faults in a holistic manner. Such a taxonomy is a key starting point for providing techniques and methods for assessing the quality of a given system. In this paper, an attempt has been made to organize SBS faults into a well-structured taxonomy that may help developers plan suitable fault repair strategies. Some commonly emphasized fault recovery strategies are also discussed, along with challenges that may occur during fault handling of SBSs.

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Flexible consistency for wide area peer replication

Author: Susarla Sai R.
Publication venue: University of Utah
Publication date: 18/11/2004

Technical report. The lack of a flexible consistency management solution hinders P2P implementation of applications involving updates, such as read-write file sharing, directory services, online auctions, and wide-area collaboration. Managing mutable shared data in a P2P setting requires a consistency solution that can operate efficiently over variable-quality, failure-prone networks, support pervasive replication for scaling, and give peers autonomy to tune consistency to their sharing needs and resource constraints. Existing solutions lack one or more of these features. In this paper, we describe a new consistency model for P2P sharing of mutable data called composable consistency, and outline its implementation in a wide-area middleware file service called Swarm. Composable consistency lets applications compose consistency semantics appropriate for their sharing needs by combining a small set of primitive options. Swarm implements these options efficiently to support scalable, pervasive, failure-resilient, wide-area replication behind a simple yet flexible interface. We present two applications to demonstrate the expressive power and effectiveness of composable consistency: a wide-area file system that outperforms Coda in providing close-to-open consistency over WANs, and a replicated BerkeleyDB database that reaps order-of-magnitude performance gains by relaxing consistency for queries and updates.
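To make the idea of composing consistency semantics from primitive options concrete, the sketch below models a consistency specification as a combination of independent options. The option names (`Concurrency`, `Visibility`, `max_staleness_s`), the helper `must_revalidate` and the two presets are hypothetical stand-ins; the report's actual set of primitive options is not listed in this abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concurrency(Enum):
    EXCLUSIVE = "exclusive"      # a single writer at a time
    CONCURRENT = "concurrent"    # readers and writers may proceed in parallel


class Visibility(Enum):
    SESSION = "session"          # local writes are published when the session ends
    PER_ACCESS = "per_access"    # each write is published as soon as it is made


@dataclass(frozen=True)
class ConsistencyOptions:
    """A consistency specification composed from independent primitive options."""
    concurrency: Concurrency
    visibility: Visibility
    max_staleness_s: Optional[float]   # None means no freshness bound


def must_revalidate(opts: ConsistencyOptions, cached_age_s: float) -> bool:
    """True if a cached copy this old may not be used without contacting its source."""
    return opts.max_staleness_s is not None and cached_age_s > opts.max_staleness_s


# Hypothetical presets built from the same primitives.
CLOSE_TO_OPEN = ConsistencyOptions(Concurrency.CONCURRENT, Visibility.SESSION, 0.0)
RELAXED_QUERY = ConsistencyOptions(Concurrency.CONCURRENT, Visibility.PER_ACCESS, 5.0)

print(must_revalidate(CLOSE_TO_OPEN, 0.1), must_revalidate(RELAXED_QUERY, 2.0))  # True False
```

Here `CLOSE_TO_OPEN` approximates close-to-open semantics by demanding a fresh copy on open and publishing writes at session end, while `RELAXED_QUERY` accepts bounded staleness in exchange for fewer round trips, in the spirit of the relaxed BerkeleyDB queries mentioned above.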

The University of Utah: J. Willard Marriott Digital Library

Achieving Causal Consistency under Partial Replication for Geo-distributed Cloud Storage

Author: Mahmood Tariq
Puzhavakath Narayanan Shankaranarayanan
Rao Sanjay
Thottethodi Mithuna
Vijaykumar T. N
Publication venue: Purdue University (bepress)
Publication date: 03/11/2016

Causal consistency has emerged as an attractive middle ground for architecting cloud storage systems, as it allows for high availability and low latency while supporting stronger-than-eventual consistency semantics. However, causally consistent cloud storage systems have seen limited deployment in practice. A key factor is that these systems employ full replication of all data in all data centers (DCs), incurring high cost. A simple extension of current causal systems that supports partial replication by clustering DCs into rings incurs availability and latency problems. We propose Karma, the first system to enable causal consistency for partitioned data stores while achieving the cost advantages of partial replication without the availability and latency problems of the simple extension. Our evaluation with 64 servers emulating 8 geo-distributed DCs shows that Karma (i) incurs much lower cost than a fully replicated causal store, owing to its lower replication factor, and (ii) offers higher availability and better performance than the above partial-replication extension at similar cost.
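Causal consistency is commonly implemented by delaying a replicated write until everything it causally depends on has been applied. The sketch below shows the textbook vector-clock version of that rule for a single fully replicated store. It is not Karma's protocol, which additionally has to cope with partial replication, and the `Write` and `Replica` types are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Write:
    origin: int        # index of the DC that issued the write
    deps: List[int]    # origin's vector clock, including this write
    key: str
    value: str


@dataclass
class Replica:
    n_dcs: int
    clock: List[int] = field(init=False)          # writes applied so far, per origin DC
    store: Dict[str, str] = field(default_factory=dict)
    pending: List[Write] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.clock = [0] * self.n_dcs

    def _deliverable(self, w: Write) -> bool:
        # w must be the next write from its origin, and every write it
        # depends on from other DCs must already have been applied here.
        return w.deps[w.origin] == self.clock[w.origin] + 1 and all(
            w.deps[i] <= self.clock[i] for i in range(self.n_dcs) if i != w.origin
        )

    def receive(self, w: Write) -> None:
        """Buffer an incoming write, then apply every write whose dependencies are met."""
        self.pending.append(w)
        progress = True
        while progress:
            progress = False
            for p in list(self.pending):
                if self._deliverable(p):
                    self.store[p.key] = p.value
                    self.clock[p.origin] += 1
                    self.pending.remove(p)
                    progress = True


r = Replica(n_dcs=2)
r.receive(Write(origin=1, deps=[1, 1], key="y", value="b"))  # depends on DC 0's first write
print(r.store)  # {}
r.receive(Write(origin=0, deps=[1, 0], key="x", value="a"))
print(r.store, r.clock)  # {'x': 'a', 'y': 'b'} [1, 1]
```

A write from DC `j` carrying clock `V` becomes visible only when `V[j]` is the next write expected from `j` and every other entry of `V` is already covered, so a reader never observes an effect before its cause.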

Purdue E-Pubs


Building Reliable Software for Persistent Memory

Author: Zhang Lu
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Persistent memory (PMEM) technologies preserve data across power cycles and provide performance comparable to DRAM. In emerging computer systems, PMEM will operate on the main memory bus, becoming byte-addressable and cache-coherent. One key feature enabled by persistent memory is that software can directly access durable data using the CPU's load/store instructions, even from user space.
However, building reliable software for persistent memory faces new challenges in two areas: crash consistency and fault tolerance. Maintaining crash consistency requires the ability to recover data integrity in the event of system crashes. Using load/store instructions to access durable data introduces a new programming paradigm that is prone to new types of programming errors. Fault tolerance involves detecting and recovering from persistent memory errors, including memory media errors and scribbles from software bugs. With direct access, file systems and user-space applications have to manage these errors explicitly instead of relying on convenient functions from lower I/O stacks.
We identify unique challenges in improving reliability for PMEM-based software and propose solutions. The thesis first introduces NOVA-Fortis, a fault-tolerant PMEM file system incorporating replication, checksums, and parity to protect the file system's metadata and the user's file data. NOVA-Fortis is both fast and resilient in the face of corruption due to media errors and software bugs.
NOVA-Fortis only protects file data accessed via the read() and write() system calls. When an application memory-maps a PMEM file, NOVA-Fortis has to disable file data protection because mmap() leaves the file system unaware of updates made to the file. To protect memory-mapped PMEM data, we present Pangolin, a fault-tolerant persistent object library that protects an application's objects from persistent memory errors.
Writing programs to ensure crash consistency in PMEM remains challenging. Recovery bugs arise as a new type of programming error, preventing a post-crash PMEM file from recovering to a consistent state. Thus, we design two debugging tools for persistent memory programming: PmemConjurer and PmemSanitizer. PmemConjurer is a static analyzer that uses symbolic execution to find recovery bugs without running a compiled program. PmemSanitizer combines compiler instrumentation with run-time recovery bug analysis, complementing PmemConjurer with multi-threading support and store reordering tests.
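The combination of checksums and replication that NOVA-Fortis applies to metadata and file data can be illustrated on a single block. The sketch below keeps two copies plus a CRC32 of the intended contents, verifies on read, and repairs a corrupted copy from the healthy one. It is a simplified, hypothetical illustration (`ProtectedBlock` is not a NOVA-Fortis structure): it omits parity, persistence ordering (cache-line flushes and fences) and crash consistency entirely.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ProtectedBlock:
    primary: bytearray
    replica: bytearray
    checksum: int          # CRC32 of the intended contents

    @classmethod
    def create(cls, data: bytes) -> "ProtectedBlock":
        return cls(bytearray(data), bytearray(data), zlib.crc32(data))

    def read(self) -> bytes:
        """Return verified data, repairing a corrupted copy from the healthy one."""
        if zlib.crc32(self.primary) == self.checksum:
            if zlib.crc32(self.replica) != self.checksum:
                self.replica[:] = self.primary      # repair the replica
            return bytes(self.primary)
        if zlib.crc32(self.replica) == self.checksum:
            self.primary[:] = self.replica          # repair the primary
            return bytes(self.replica)
        raise OSError("both copies are corrupted; the block is unrecoverable")


block = ProtectedBlock.create(b"hello pmem")
block.primary[0] ^= 0x01      # simulate a single-bit media error in the primary copy
print(block.read())           # b'hello pmem'
```

CRC32 detects every single-bit error, so the flipped primary copy is always caught and rewritten from the replica on the next read.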

eScholarship - University of California

DATA REPLICATION IN DISTRIBUTED SYSTEMS USING OLYMPIAD OPTIMIZATION ALGORITHM

Author: Arasteh Bahman
Bouyer Asgarali
Ghanbarzadeh Reza
Nazeri Mehrabani Mahsa
Rouhi Alireza
Tirkolaee Erfan Babaee
Publication venue: University of Nis - Faculty of Philosophy
Publication date: 05/10/2023

Achieving timely access to data objects is a major challenge in large distributed systems such as Internet of Things (IoT) platforms. Therefore, minimizing data read and write times in distributed systems has become a higher priority for system designers and mechanical engineers. Replication and the appropriate placement of replicas on the most accessible data servers is an NP-complete optimization problem. The key objectives of the current study are minimizing data access time, reducing the number of replicas, and improving data availability. The current paper employs the Olympiad Optimization Algorithm (OOA), a novel population-based discrete heuristic algorithm, to solve the replica placement problem; the algorithm is also applicable to other fields such as mechanical and computer engineering design problems. This discrete algorithm was inspired by the learning process of student groups preparing for Olympiad exams. The proposed algorithm, which is divide-and-conquer-based with local and global search strategies, was used to solve the replica placement problem in a standard simulated distributed system. The 'European Union Database' (EUData), which contains 28 nodes as servers and a network architecture in the form of a complete graph, was employed to evaluate the proposed algorithm. The results show that the proposed technique reduces data access time by 39% with around six replicas, substantially outperforming earlier methods. Moreover, the standard deviation of the results across the algorithm's different executions is approximately 0.0062, lower than the standard deviation of the other techniques in the same experiments.
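The replica placement objective can be made concrete with a small cost function. The sketch below assumes that the cost of a placement is the mean latency from each node to its nearest replica on a complete graph described by a latency matrix, and uses a simple greedy heuristic to choose `k` servers. The greedy method is a swapped-in technique for illustration only; it is not the Olympiad Optimization Algorithm, and it ignores the availability objective.

```python
from typing import List, Sequence


def access_cost(latency: Sequence[Sequence[float]], placement: List[int]) -> float:
    """Mean latency from every node to its nearest replica (assumed objective)."""
    n = len(latency)
    return sum(min(latency[i][r] for r in placement) for i in range(n)) / n


def greedy_placement(latency: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick k servers one at a time, each time adding the one that lowers cost most.

    A simple greedy baseline, not the Olympiad Optimization Algorithm.
    """
    placement: List[int] = []
    for _ in range(k):
        candidates = [s for s in range(len(latency)) if s not in placement]
        best = min(candidates, key=lambda s: access_cost(latency, placement + [s]))
        placement.append(best)
    return placement


latency = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 1], [5, 4, 1, 0]]
chosen = greedy_placement(latency, 2)
print(chosen, access_cost(latency, chosen))  # [1, 2] 0.5
```

Greedy selection commits to each choice permanently, which is one motivation for population-based heuristics that explore many candidate placements at once.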

University of Niš: Facta Universitatis (E-Journals) / Универзитет у Нишу

Speculation in Parallel and Distributed Event Processing Systems

Author: Brito Andrey
Publication date: 10/05/2010

Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of network-enabled devices, producing information that can be automatically collected and processed. This increasing availability of online data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events; such operations are called stateless. Other operations depend not only on the current event but also on a state built during the processing of previous events; such operations are called stateful. As the number of ESP applications grows, their requirements become increasingly stringent and are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in an ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii) if failures are to be tolerated, the accumulated state of a stateful operator needs to be saved, and saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues.
For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of events and improves throughput when the operation at hand is parallelizable. In addition, our approach does not require users to understand concurrent programming or to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.
Contents:
1 Introduction
1.1 Event stream processing systems
1.2 Running example
1.3 Challenges and contributions
1.4 Outline
2 Background
2.1 Event stream processing
2.1.1 State in operators: Windows and synopses
2.1.2 Types of operators
2.1.3 Our prototype system
2.2 Software transactional memory
2.2.1 Overview
2.2.2 Memory operations
2.3 Fault tolerance in distributed systems
2.3.1 Failure model and failure detection
2.3.2 Recovery semantics
2.3.3 Active and passive replication
2.4 Summary
3 Extending event stream processing systems with speculation
3.1 Motivation
3.2 Goals
3.3 Local versus distributed speculation
3.4 Models and assumptions
3.4.1 Operators
3.4.2 Events
3.4.3 Failures
4 Local speculation
4.1 Overview
4.2 Requirements
4.2.1 Order
4.2.2 Aborts
4.2.3 Optimism control
4.2.4 Notifications
4.3 Applications
4.3.1 Out-of-order processing
4.3.2 Optimistic parallelization
4.4 Extensions
4.4.1 Avoiding unnecessary aborts
4.4.2 Making aborts unnecessary
4.5 Evaluation
4.5.1 Overhead of speculation
4.5.2 Cost of misspeculation
4.5.3 Out-of-order and parallel processing micro benchmarks
4.5.4 Behavior with example operators
4.6 Summary
5 Distributed speculation
5.1 Overview
5.2 Requirements
5.2.1 Speculative events
5.2.2 Speculative accesses
5.2.3 Reliable ordered broadcast with optimistic delivery
5.3 Applications
5.3.1 Passive replication and rollback recovery
5.3.2 Active replication
5.4 Extensions
5.4.1 Active replication and software bugs
5.4.2 Enabling operators to output multiple events
5.5 Evaluation
5.5.1 Passive replication
5.5.2 Active replication
5.6 Summary
6 Related work
6.1 Event stream processing engines
6.2 Parallelization and optimistic computing
6.2.1 Speculation
6.2.2 Optimistic parallelization
6.2.3 Parallelization in event processing
6.2.4 Speculation in event processing
6.3 Fault tolerance
6.3.1 Passive replication and rollback recovery
6.3.2 Active replication
6.3.3 Fault tolerance in event stream processing systems
7 Conclusions
7.1 Summary of contributions
7.2 Challenges and future work
Appendices
Publications
Pseudocode for the consensus protocol
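The speculative handling of out-of-order events described in this abstract can be sketched as an operator that processes each event as soon as it arrives and, when a late event with an earlier timestamp shows up, rolls its state back and replays the affected events in timestamp order. The dissertation builds its speculation on software transactional memory; this sketch instead uses per-event state snapshots and replay, keeps an unbounded log, and uses hypothetical names throughout.

```python
import copy
from typing import Any, Callable, List, Tuple


class SpeculativeOperator:
    """Stateful operator that processes events optimistically in arrival order
    and rolls back and replays when an event arrives out of timestamp order."""

    def __init__(self, initial_state: Any, update: Callable[[Any, Any], Any]) -> None:
        self.initial_state = initial_state
        self.update = update                        # update(state, event) -> new state
        self.log: List[Tuple[float, Any]] = []      # (timestamp, event) in timestamp order
        self.snapshots: List[Any] = []              # state after each logged event
        self.rollbacks = 0

    def state(self) -> Any:
        return self.snapshots[-1] if self.snapshots else self.initial_state

    def on_event(self, ts: float, event: Any) -> None:
        # Find the event's position in timestamp order (ties keep arrival order).
        pos = len(self.log)
        while pos > 0 and self.log[pos - 1][0] > ts:
            pos -= 1
        if pos < len(self.log):
            self.rollbacks += 1                     # misspeculation: later events ran too early
        self.log.insert(pos, (ts, event))
        # Discard speculative states after the insertion point and replay from there.
        del self.snapshots[pos:]
        state = self.snapshots[pos - 1] if pos > 0 else self.initial_state
        for _, e in self.log[pos:]:
            state = self.update(copy.deepcopy(state), e)
            self.snapshots.append(state)


op = SpeculativeOperator([], lambda state, e: state + [e])
op.on_event(1, "a")
op.on_event(3, "c")   # processed speculatively before "b" has arrived
op.on_event(2, "b")   # late event: roll back to the state after "a", replay "b" and "c"
print(op.state(), op.rollbacks)  # ['a', 'b', 'c'] 1
```

In a full system, outputs computed from a state that is later rolled back would have to be revoked or marked as speculative downstream; this sketch only counts such rollbacks.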

Technische Universität Dresden: Qucosa