
    Detecting Failures in an Asynchronous System That Never Stops Changing

    This thesis presents an algorithm for detecting failures in dynamic asynchronous distributed systems, i.e., environments in which new participants may continually join the system, old participants may continually leave the system (a phenomenon called churn), and active participants may fail. Such behavior is exhibited by many modern dynamic networks, for example peer-to-peer networks, where devices are continually joining and leaving and peers often remain in the network only long enough to retrieve the data they require. Another example is mobile networks, where devices are constantly on the move, resulting in a continual change in participants. In these types of networks, the set of participants is rarely stable for very long and is dynamically changing. Many problems are not solvable if the fraction of participants that have crashed is too large, yet participants will continue to leave the system or crash. To avoid crossing the threshold where too many participants have crashed, it is of paramount importance to detect crashed participants and remove them from the system. The problem of detecting failures has been solved in static and synchronous distributed systems. However, since processes in an asynchronous dynamic distributed system possess no global clock, synchronized logical clocks, or other timing information, detecting failures is a hard problem in such systems. We propose a failure detector for an asynchronous system with churn that exploits the dynamic nature of the system to estimate elapsed time. We design an algorithm to detect failed processes in such an asynchronous system and prove that if a process is identified as crashed by our failure detector, it has indeed failed. We also prove that if the churn continues forever, then under certain circumstances every failed process is eventually identified as crashed.
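
    The thesis specifies its detector formally; as a rough, hypothetical illustration of the core idea (using observed churn events in place of a clock), the Python sketch below suspects a process once enough join announcements have arrived since it was last heard from. The class name, message handlers, and the threshold SUSPECT_AFTER_JOINS are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch: using churn (join events) as a proxy for elapsed time.
# Not the thesis's algorithm; names and the threshold are illustrative only.

SUSPECT_AFTER_JOINS = 10  # joins observed since a process was last heard from

class ChurnFailureDetector:
    def __init__(self):
        self.joins_seen = 0          # total join announcements observed so far
        self.last_heard = {}         # pid -> value of joins_seen at last message
        self.suspected = set()

    def on_join(self, pid):
        """A new participant announced itself; churn acts as a clock tick."""
        self.joins_seen += 1
        self.last_heard[pid] = self.joins_seen
        self._reevaluate()

    def on_heartbeat(self, pid):
        """Any message from pid counts as evidence that it is alive."""
        self.last_heard[pid] = self.joins_seen
        self.suspected.discard(pid)

    def on_leave(self, pid):
        """A graceful departure is not a crash."""
        self.last_heard.pop(pid, None)
        self.suspected.discard(pid)

    def _reevaluate(self):
        for pid, seen_at in self.last_heard.items():
            if self.joins_seen - seen_at >= SUSPECT_AFTER_JOINS:
                self.suspected.add(pid)
```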

    Stronger Methods of Making Quantum Interactive Proofs Perfectly Complete

    This paper presents stronger methods of achieving perfect completeness in quantum interactive proofs. First, it is proved that any problem in QMA has a two-message quantum interactive proof system of perfect completeness with constant soundness error, where the verifier has only to send a constant number of halves of EPR pairs. This in particular implies that the class QMA is included in the class QIP_1(2) of problems having two-message quantum interactive proofs of perfect completeness, which gives the first nontrivial upper bound for QMA in terms of quantum interactive proofs. It is also proved that any problem having an m-message quantum interactive proof system necessarily has an (m+1)-message quantum interactive proof system of perfect completeness. This improves the previous result due to Kitaev and Watrous, where the resulting system of perfect completeness requires m+2 messages if the parallelization result is not used. Comment: 41 pages; v2: soundness parameters improved, correction of a minor error in Lemma 23, and removal of the sentences claiming that our techniques are quantumly nonrelativizing.

    Fault-Tolerant Distributed Services in Message-Passing Systems

    Distributed systems ranging from small local area networks to large wide area networks like the Internet, composed of static and/or mobile users, have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. The main contributions of this dissertation are:
    • Failure detectors are used as a service to solve consensus (agreement among nodes), which is otherwise impossible in failure-prone asynchronous systems. We give an algorithm for crash-failure detection that uses bounded-size messages in an arbitrary, partitionable network composed of badly-behaved channels that can lose and reorder messages.
    • Registers are a fundamental building block for shared memory emulations on top of message-passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard, and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable. We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound, and at most a constant fraction of the nodes in the system can fail by crashing. We prove a lower bound showing that fault-tolerance in dynamic systems with churn is inherently lower than in static systems.
    • We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size, and can tolerate Byzantine nodes. However, the number of Byzantine nodes that can be tolerated is upper bounded by a constant. Although the algorithm requires a known constant upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and the maximum number of Byzantine servers are unknown.
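
    The dissertation's emulations tolerate continuous churn and Byzantine faults and are considerably more involved; as background only, the minimal Python sketch below illustrates the classic static-system, majority-quorum idea (in the style of the ABD emulation) that such register emulations build on. The RPC helper, message fields, and error handling are simplifying assumptions for illustration.

```python
# Simplified static-system sketch of majority-quorum register emulation
# (ABD-style). The dissertation's churn- and Byzantine-tolerant emulations
# are substantially more involved; all names here are illustrative.

class RegisterClient:
    def __init__(self, servers, rpc):
        self.servers = servers   # static list of server ids
        self.rpc = rpc           # rpc(server, request) -> reply; raises ConnectionError on crash

    def _majority(self, request):
        """Send request to servers until a majority has replied."""
        replies = []
        for s in self.servers:
            try:
                replies.append(self.rpc(s, request))
            except ConnectionError:
                pass             # crashed servers simply do not reply
            if len(replies) > len(self.servers) // 2:
                return replies
        raise RuntimeError("no majority of servers reachable")

    def write(self, value):
        # Phase 1: learn the highest timestamp held by some majority.
        replies = self._majority({"op": "read_ts"})
        ts = max(r["ts"] for r in replies) + 1
        # Phase 2: store (ts, value) at a majority.
        self._majority({"op": "store", "ts": ts, "value": value})

    def read(self):
        replies = self._majority({"op": "read_val"})
        newest = max(replies, key=lambda r: r["ts"])
        # Write back the value read so later reads cannot return an older one.
        self._majority({"op": "store", "ts": newest["ts"], "value": newest["value"]})
        return newest["value"]
```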

    The Impact of RDMA on Agreement

    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or the other. With Byzantine failures, we give an algorithm that requires only n \geq 2f_P + 1 processes (where f_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that requires only n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness under standard additional assumptions. Comment: Full version of PODC'19 paper; strengthened broadcast algorithm.
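
    A toy model of the RDMA primitive the paper exploits (dynamically changing write permissions on a shared memory region, which fences out a stale writer) is sketched below in Python; it illustrates the mechanism only and is not the paper's consensus protocol. The class and method names are hypothetical.

```python
# Toy model of RDMA-style per-region write permissions: a region tracks which
# single process may write it, and changing the permission "fences out" the
# previous writer. Illustration of the primitive, not the paper's protocol.

class RDMARegion:
    def __init__(self):
        self.writer = None     # process currently holding write permission
        self.value = None

    def grant_write(self, pid):
        """Granting a new writer implicitly revokes the old one's access."""
        self.writer = pid

    def write(self, pid, value):
        if pid != self.writer:
            return False       # a fenced-out (stale) writer's RDMA write fails
        self.value = value
        return True

    def read(self):
        return self.value


# Usage: a new leader takes over a majority of regions before proposing,
# so a stale leader can no longer overwrite that majority.
regions = [RDMARegion() for _ in range(3)]
for r in regions:
    r.grant_write("p1")
assert regions[0].write("p1", "v1")

for r in regions[:2]:
    r.grant_write("p2")                      # p2 fences p1 out of a majority
assert not regions[0].write("p1", "v1'")     # p1's later write fails here
assert regions[0].write("p2", "v2")
```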

    Average Case Analysis of a Shared Register Emulation Algorithm

    Distributed algorithms are important for managing systems with multiple networked components, where each component operates independently but coordinates to achieve a common goal. Previous theoretical research has produced numerous distributed algorithms but primarily uses mathematical proofs to yield theoretical results, such as worst-case runtime complexity. Less research has been done on how these algorithms behave in practice, such as their average-case runtime. This paper describes the empirical behavior of the distributed algorithm CCReg in a realistic environment, using the language DistAlgo to implement the algorithm. CCReg emulates a shared read/write register using a message-passing system. In particular, CCReg allows the underlying message-passing system to experience continuous changes to the set of components present, and tolerates crash failures of components. When the rate of component change and the fraction of crash failures are bounded, CCReg is proven to work correctly. The original paper specifies bounds on both that are guaranteed to work, and proves those bounds. However, these bounds are restrictive and do not allow for much component change or many crash failures. The goal of our implementation is therefore to determine whether CCReg's theoretical bounds can be relaxed in practice. We focus on CCReg's safety and liveness conditions: the algorithm eventually terminates, and a consistency condition called linearizability is maintained. We use a general method developed by Gibbons and Korach for determining whether any ordering of operations satisfying linearizability exists. We find that, for executions where operations are randomly invoked, the algorithm does not exhibit any adverse behaviors: each execution we tested terminates in finite time and has an order of operations satisfying linearizability. We discuss these findings, as well as future approaches and methodology for testing the theoretical boundaries in practice.
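
    For intuition, the condition being tested can be stated as a brute-force check: does some total order of the operations respect both real-time precedence and read/write register semantics? The Python sketch below performs that naive check; the Gibbons-Korach test the paper actually uses is far more efficient, and the operation format shown here is an assumption for illustration.

```python
# Brute-force linearizability check for a single read/write register history.
# Gibbons and Korach give a much more efficient test; this naive version only
# spells out the condition being checked. Operation format is illustrative:
#   (kind, value, invoke_time, response_time), kind in {"write", "read"}.

from itertools import permutations

def precedes(a, b):
    """a's response came before b's invocation (real-time order)."""
    return a[3] < b[2]

def legal(sequence, initial=None):
    """Does this total order respect read/write register semantics?"""
    current = initial
    for kind, value, _, _ in sequence:
        if kind == "write":
            current = value
        elif current != value:       # a read must return the latest write
            return False
    return True

def linearizable(history, initial=None):
    n = len(history)
    for order in permutations(range(n)):
        rank = {idx: pos for pos, idx in enumerate(order)}
        ordered = [history[i] for i in order]
        if legal(ordered, initial) and all(
            rank[i] < rank[j]
            for i in range(n)
            for j in range(n)
            if precedes(history[i], history[j])
        ):
            return True
    return False

# Example: a write of 1 overlapping a read that returns 1 is linearizable.
h = [("write", 1, 0.0, 2.0), ("read", 1, 1.0, 3.0)]
print(linearizable(h))   # True
```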

    The Parallel Persistent Memory Model

    We consider a parallel computational model that consists of P processors, each with a fast local ephemeral memory of limited size, sharing a large persistent memory. The model allows each processor to fault with bounded probability, and possibly restart. On faulting, all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for designing locality-efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly in the context of failures. The scheduler guarantees a time bound of O(W/P_A + D(P/P_A)\lceil\log_{1/f} W\rceil) in expectation, where W and D are the work and depth of the computation (in the absence of failures), P_A is the average number of processors available during the computation, and f \le 1/2 is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives. Comment: This paper is the full version of a paper at SPAA 2018 with the same name.
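
    A hypothetical sketch of the capsule idea follows: persistent memory survives a fault while ephemeral state does not, so a capsule must be safe to re-execute from its start. The fault model (a random retry), the commit-by-capsule-id scheme, and all names are illustrative, not the paper's construction.

```python
# Hypothetical sketch of the capsule idea: persistent memory survives a fault,
# ephemeral state does not, so a restarted capsule must be safe to re-execute.
# The fault model and the commit scheme below are illustrative only.

import random

persistent = {}          # survives faults (stands in for persistent memory)
FAULT_PROB = 0.3         # probability that a capsule execution faults

def run_capsule(capsule_id, compute):
    """Run `compute` until it completes without faulting; committing its
    result under capsule_id makes re-execution after a fault harmless."""
    while capsule_id not in persistent:      # already committed -> skip entirely
        ephemeral = {}                       # lost on every fault
        result = compute(ephemeral)
        if random.random() < FAULT_PROB:
            continue                         # fault: discard ephemeral state, retry
        persistent[capsule_id] = result      # commit exactly once per capsule

def step(env):
    env["partial"] = 21                      # ephemeral scratch work
    return env["partial"] * 2

run_capsule("capsule-0", step)
print(persistent["capsule-0"])               # 42
```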

    Enabling preemptive multiprogramming on GPUs

    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems are usually running multiple applications, from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements, such as responsiveness, fairness or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels, according to their priorities. We extend an NVIDIA GK110 (Kepler) like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve the execution time of high-priority processes by 15.6x, the average application turnaround time by 1.5x to 2x, and system fairness by up to 3.4x. We would like to thank the anonymous reviewers, Alexander Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Alvarez, and Marc Jorda for their comments and help in improving our work and this paper. This work is supported by the European Commission through the TERAFLUX (FP7-249013), Mont-Blanc (FP7-288777), and RoMoL (GA-321253) projects, by NVIDIA through the CUDA Center of Excellence program, by the Spanish Government through Programa Severo Ochoa (SEV-2011-0067), and by the Spanish Ministry of Science and Technology through the TIN2007-60625 and TIN2012-34557 projects. Peer Reviewed. Postprint (author's final draft).
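
    As a software analogy of the scheduling policy described (not the paper's hardware design), the toy Python sketch below distributes a fixed pool of GPU cores among running kernels in proportion to their priorities; the function and its rounding rule are illustrative assumptions.

```python
# Toy sketch of priority-proportional core partitioning among running kernels.
# Assumes total_cores >= number of kernels; not the paper's hardware mechanism.

def distribute_cores(kernels, total_cores):
    """kernels: dict kernel_name -> priority (positive int)."""
    total_priority = sum(kernels.values())
    shares = {k: max(1, (p * total_cores) // total_priority)
              for k, p in kernels.items()}
    # Hand out any cores lost to integer rounding, highest priority first.
    leftover = total_cores - sum(shares.values())
    for k in sorted(kernels, key=kernels.get, reverse=True):
        if leftover <= 0:
            break
        shares[k] += 1
        leftover -= 1
    return shares

print(distribute_cores({"render": 4, "analytics": 1}, 15))
# {'render': 12, 'analytics': 3}
```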

    Communication in Membrane Systems with Symbol Objects

    This thesis deals with membrane systems with symbol objects as a theoretical framework for distributed parallel multiset processing systems. A halting computation can accept, generate or process a number, a vector or a word, so the system globally defines (by the results of all its computations) a set of numbers, a set of vectors or a set of words (i.e., a language), or a function. The ability of these systems to solve particular problems is investigated, as well as their computational power; e.g., the language families defined by different classes of these systems are compared to the classical ones, i.e., regular languages, context-free languages, languages generated by extended tabled 0L systems, languages generated by matrix grammars without appearance checking, recursively enumerable languages, etc. Special attention is paid to the communication of objects between regions and to the ways of cooperation between objects. The membrane systems are formalized (Section 3.4), and a software tool is constructed for the non-distributed cooperative variant: the configuration browser, i.e., a simulator in which the user chooses the next configuration among the possible ones and can go back. Different distributed models are considered. In the evolution-communication model (Chapter 4), rewriting-like rules are separated from transport rules (called symport and antiport). Proton pumping systems (Sections 4.8, 4.9) are a variant of the evolution-communication systems with a restricted form of cooperation. A special membrane computing model is the purely communicative one, in which objects move together through a membrane. We study the computational power of membrane systems with symport/antiport of 2 or 3 objects (Chapter 5) and the computational power of membrane systems with a limited alphabet (Chapter 6). Determinism (Sections 4.7, 5.5, etc.) is a special, restrictive property of computational systems; the question of whether this restriction reduces their computational power is addressed. The results on proton pumping systems are carried over (Section 7.3) to systems with bi-stable catalysts. Particular examples of membrane systems applications (Sections 7.1, 7.2) are solving NP-complete problems in polynomial time and solving the sorting problem.
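
    As a toy illustration of symport/antiport rules over multisets of symbol objects, the Python sketch below moves objects across a single membrane; a real P-system simulator applies rules in a maximally parallel way across many membranes, which this sketch does not attempt. All names are illustrative.

```python
# Toy illustration of symport/antiport rules on multisets of symbol objects
# across one membrane. A real P-system simulator applies rules maximally in
# parallel over a whole membrane structure; this single-step sketch does not.

from collections import Counter

def can_apply(region, objs):
    """Does the region contain every listed object (with multiplicity)?"""
    return all(region[o] >= n for o, n in Counter(objs).items())

def move(src, dst, objs):
    for o in objs:
        src[o] -= 1
        dst[o] += 1

def symport_in(outside, inside, objs):
    """Symport: the listed objects cross the membrane together, inward."""
    if can_apply(outside, objs):
        move(outside, inside, objs)

def antiport(outside, inside, in_objs, out_objs):
    """Antiport: in_objs enter while out_objs leave, in a single exchange."""
    if can_apply(outside, in_objs) and can_apply(inside, out_objs):
        move(outside, inside, in_objs)
        move(inside, outside, out_objs)

outside = Counter({"a": 2, "b": 1})
inside = Counter({"c": 1})
antiport(outside, inside, in_objs=["a", "b"], out_objs=["c"])
print(dict(inside))   # {'c': 0, 'a': 1, 'b': 1}
```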