34 research outputs found

    Programming with process groups: Group and multicast semantics

    Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects.
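
    As a rough illustration of the ordering guarantees such a multicast primitive provides, the sketch below shows causally ordered delivery using vector clocks. It is a minimal, hypothetical model in the spirit of the abstract, not the Isis implementation; all names are invented.

    # Minimal sketch of causally ordered multicast delivery using vector
    # clocks. Illustrative only: not the Isis implementation described
    # in the abstract.

    class CausalReceiver:
        def __init__(self, pid, group_size):
            self.pid = pid
            self.clock = [0] * group_size   # messages delivered per sender
            self.pending = []               # messages not yet deliverable

        def _deliverable(self, sender, stamp):
            # Deliverable once it is the next message from `sender` and we
            # have delivered everything the sender had delivered when sending.
            if stamp[sender] != self.clock[sender] + 1:
                return False
            return all(stamp[i] <= self.clock[i]
                       for i in range(len(stamp)) if i != sender)

        def receive(self, sender, stamp, payload):
            self.pending.append((sender, stamp, payload))
            delivered, progress = [], True
            while progress:
                progress = False
                for msg in list(self.pending):
                    s, st, p = msg
                    if self._deliverable(s, st):
                        self.clock[s] += 1
                        delivered.append(p)
                        self.pending.remove(msg)
                        progress = True
            return delivered

    A message is simply held back until every message it causally depends on has been delivered, which is the core property a causal multicast primitive must preserve across overlapping groups.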

    Fault tolerance in distributed databases using the fault-tolerant object group communication technique

    With the increasing use of computer networks, applications that rely on distributed databases are growing at the same rate in large corporations. As a result, communication failures are detected more frequently, causing losses for these companies. The field of fault tolerance, and within it the group communication technique, addresses these problems: the technique aims at detecting and correcting errors that arise from communication, processing, or equipment faults. To validate the technique, this study implements a class library whose role is to detect errors generated in distributed data environments; once such failures are detected, the application must also correct them. The modules that make up the application were built with the Visual J++ language. I Workshop de Procesamiento Distribuido y Paralelo (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).
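
    The detection half of the group communication technique can be pictured with a small sketch: a heartbeat-based failure detector that suspects members which fall silent. This is a hedged illustration with assumed names and timeout, not the Visual J++ library from the study.

    # Sketch of a heartbeat-style failure detector, the detection half
    # of a group communication scheme. Timeout and names are illustrative.
    import time

    class HeartbeatDetector:
        def __init__(self, members, timeout=2.0):
            self.timeout = timeout
            self.last_seen = {m: time.monotonic() for m in members}

        def heartbeat(self, member):
            # Called whenever a heartbeat message arrives from `member`.
            self.last_seen[member] = time.monotonic()

        def suspected(self):
            # Members silent longer than the timeout are suspected crashed.
            now = time.monotonic()
            return [m for m, t in self.last_seen.items()
                    if now - t > self.timeout]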

    Building on Quicksand

    Reliable systems have always been built out of unreliable components. Early on, the reliable components were small, such as mirrored disks or ECC (Error Correcting Codes) in core memory. These systems were designed such that failures of these small components were transparent to the application. Later, the size of the unreliable components grew larger and semantic challenges crept into the application when failures occurred. As the granularity of the unreliable component grows, the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system. There are two implications of asynchronous state capture: 1) Everything promised by the primary is probabilistic. There is always a chance that an untimely failure shortly after the promise results in a backup proceeding without knowledge of the commitment. Hence, nothing is guaranteed! 2) Applications must ensure eventual consistency. Since work may be stuck in the primary after a failure and reappear later, the processing order for work cannot be guaranteed. Platform designers are struggling to make this easier for their applications. Emerging patterns of eventual consistency and probabilistic execution may soon yield a way for applications to express requirements for a "looser" form of consistency while providing availability in the face of ever larger failures. This paper recounts portions of the evolution of these trends, attempts to show the patterns that span these changes, and talks about future directions as we continue to "build on quicksand". Comment: CIDR 2009.
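
    The asynchronous state capture the paper describes is easy to model. In the hypothetical sketch below (not code from the paper), the primary acknowledges each request before the backup has seen it; a crash between the acknowledgement and the background ship is exactly the window in which a promise can be lost.

    # Sketch of asynchronous primary/backup replication: the primary acks
    # each request immediately and ships it to the backup in the background,
    # so a crash before the ship completes loses the acknowledged work.
    # Hypothetical illustration, not code from the paper.
    import queue, threading

    class Backup:
        def __init__(self):
            self.log = []

        def apply(self, op):
            self.log.append(op)

    class Primary:
        def __init__(self, backup):
            self.log = []
            self.outbox = queue.Queue()
            threading.Thread(target=self._ship, args=(backup,),
                             daemon=True).start()

        def submit(self, op):
            self.log.append(op)
            self.outbox.put(op)      # shipped later, not awaited
            return "ACK"             # promise made before backup knows

        def _ship(self, backup):
            while True:
                backup.apply(self.outbox.get())

    If the primary crashes while operations are still in its outbox, the backup proceeds without them and they may reappear later, which is why the paper insists applications ensure eventual consistency.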

    C-RAM: Breaking Mobile Device Memory Barriers Using the Cloud

    Mobile applications are constrained by the available memory of mobile devices. We present C-RAM, a system that uses cloud-based memory to extend the memory of mobile devices. It splits application state and its associated computation between a mobile device and a cloud node to allow applications to consume more memory, while minimising the performance impact. C-RAM thus enables developers to realise new applications or port legacy desktop applications with a large memory footprint to mobile platforms without explicitly designing them to account for memory limitations. To handle network failures with partitioned application state, C-RAM uses a new snapshot-based fault tolerance mechanism in which changes to remote memory objects are periodically backed up to the device. After failure, or when network usage exceeds a given limit, the device rolls back execution to continue from the last snapshot. C-RAM supports local execution with an application state that exceeds the available device memory through a user-level virtual memory: objects are loaded on-demand from snapshots in flash memory. Our C-RAM prototype supports Objective-C applications on the unmodified iOS platform. With C-RAM, applications can consume 10× more memory than the device capacity, with a negligible impact on application performance. In some cases, C-RAM even achieves a significant speed-up in execution time (up to 9.7×).
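
    The snapshot-based fault tolerance mechanism reduces to a few lines. The sketch below is a hypothetical model of the idea (periodic local checkpoints of remote state, rollback on failure), not C-RAM's Objective-C runtime.

    # Sketch of snapshot-based rollback for remotely held state: changes
    # are periodically checkpointed locally, and on a network failure the
    # device resumes from the last checkpoint. Hypothetical model only.
    import copy

    class SnapshottedState:
        def __init__(self, state):
            self.state = state
            self.snapshot = copy.deepcopy(state)

        def checkpoint(self):
            # Back up current remote-object state to device storage.
            self.snapshot = copy.deepcopy(self.state)

        def rollback(self):
            # Network failed or usage limit hit: resume from last snapshot.
            self.state = copy.deepcopy(self.snapshot)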

    Communication between fault-tolerant replicated distributed objects

    This work presents a way to make object-oriented applications tolerant to crash failures; it can be used in a distributed environment based on objects that communicate with one another in order to carry out an action. To guarantee the availability of a distributed application, the distributed objects must be replicated; that way, a crash failure of the processor executing a distributed object does not stop the whole application from running. To validate the proposal, two implementations of the same fault-tolerant application are presented, one built with PVM and one with Java sockets, and the results obtained are reported. Track: Networks. Red de Universidades con Carreras en Informática (RedUNCI).
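
    A minimal sketch of the idea, assuming simple active replication with deterministic replicas (the names are hypothetical; the paper's actual implementations used PVM and Java sockets):

    # Sketch of invoking an operation on every replica of a distributed
    # object so that the crash of one processor does not stop the
    # application. Hypothetical model, not the PVM/Java-sockets code.

    class ReplicatedObject:
        def __init__(self, replicas):
            self.replicas = replicas    # stand-ins for remote objects

        def invoke(self, method, *args):
            results = []
            for r in self.replicas:
                try:
                    results.append(getattr(r, method)(*args))
                except ConnectionError:
                    continue            # tolerate a crashed replica
            if not results:
                raise RuntimeError("all replicas failed")
            return results[0]           # replicas are deterministic copies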

    Reliable distributed data stream management in mobile environments

    The proliferation of sensor technology, especially in the context of embedded systems, has brought forward novel types of applications that make use of streams of continuously generated sensor data. Many applications like telemonitoring in healthcare or roadside traffic monitoring and control particularly require data stream management (DSM) to be provided in a distributed, yet reliable way. This is even more important when DSM applications are deployed in a failure-prone distributed setting including resource-limited mobile devices, for instance in applications that aim at remotely monitoring mobile patients. In this paper, we introduce a model for distributed and reliable DSM. The contribution of this paper is threefold. First, in analogy to the SQL isolation levels, we define levels of reliability and describe necessary consistency constraints for distributed DSM that specify the tolerated loss, delay, or re-ordering of data stream elements, respectively. Second, we use this model to design and analyze an algorithm for reliable distributed DSM, namely efficient coordinated operator checkpointing (ECOC). We show that ECOC provides lossless and delay-limited reliable data stream management and thus can be used in critical application domains such as healthcare, where the loss of data stream elements cannot be tolerated. Third, we present detailed performance evaluations of the ECOC algorithm running on mobile, resource-limited devices. In particular, we show that ECOC provides a high level of reliability while, at the same time, featuring good performance characteristics with moderate resource consumption.
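
    As an illustration of the operator checkpointing style that ECOC belongs to, the hedged sketch below saves an operator's state together with its input position so that recovery replays without losing stream elements. It is a simplified stand-in, not the ECOC algorithm itself.

    # Sketch of stream-operator checkpointing: state is saved together
    # with the index of the last processed element, so recovery can
    # replay without loss. Simplified illustration, not ECOC.

    class CheckpointedOperator:
        def __init__(self):
            self.count = 0            # example operator state: a sum
            self.position = 0         # index of next element to process
            self._checkpoint = (0, 0)

        def process(self, stream):
            for elem in stream[self.position:]:
                self.count += elem
                self.position += 1

        def checkpoint(self):
            self._checkpoint = (self.count, self.position)

        def recover(self, stream):
            # Roll back to the checkpoint, then replay buffered elements
            # so no data stream element is lost (lossless recovery).
            self.count, self.position = self._checkpoint
            self.process(stream)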

    The Ridge Operating System: High Performance through Message Passing and Virtual Memory

    The Ridge operating system is decomposed into processes and relies on message passing for its interprocess communication. Messages and processes are used to improve reliability and extensibility and to facilitate networking. The challenge was to provide a high-performance UNIX implementation in this environment. The technique used was to blend other operating system facilities, such as virtual memory, with the message system. Key aspects of the design were to minimize the number of primitives and to provide support from the Ridge instruction set architecture.
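
    The message-passing decomposition can be pictured with a toy model (hypothetical, not Ridge code): each system service is a process that drains a request queue and replies by message.

    # Toy model of an OS decomposed into processes that communicate only
    # by messages: a client puts a request on a server process's queue
    # and blocks on the reply message. Illustrative, not the Ridge system.
    import queue, threading

    def server(requests):
        while True:
            payload, reply_q = requests.get()
            reply_q.put(f"handled: {payload}")   # do the work, reply by message

    requests = queue.Queue()
    threading.Thread(target=server, args=(requests,), daemon=True).start()

    reply_q = queue.Queue()
    requests.put(("read page 7", reply_q))       # send request message
    print(reply_q.get())                         # block on the reply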

    Integrating reliable memory in databases

    Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases: non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection. Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases. However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface to reliable memory by mapping it into the database address space. This places reliable memory under complete database control and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure by write protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous transactions. As a result, we believe that databases should use a memory interface to reliable memory. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/42329/1/778-7-3-194_80070194.pd
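
    The third design, write-protecting buffer pages, can be sketched abstractly. In the hypothetical model below a flag stands in for hardware page protection; it is not the Rio code.

    # Sketch of a persistent buffer cache whose pages are write-protected
    # except during legitimate, declared writes, so that "wild stores"
    # from database bugs fault instead of corrupting committed pages.
    # Hypothetical model: a flag stands in for hardware page protection.

    class ProtectedBufferCache:
        def __init__(self, num_pages):
            self.pages = [bytearray(4096) for _ in range(num_pages)]
            self.writable = [False] * num_pages

        def write(self, page, offset, data):
            # Legitimate write path: unprotect, store, reprotect.
            self.writable[page] = True
            self.pages[page][offset:offset + len(data)] = data
            self.writable[page] = False

        def wild_store(self, page, offset, data):
            # A stray store hits the protection and faults instead of
            # silently corrupting a committed page.
            if not self.writable[page]:
                raise MemoryError("write to protected page")
            self.pages[page][offset:offset + len(data)] = data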