37,526 research outputs found

    FADI: a fault-tolerant environment for open distributed computing

    Get PDF
    FADI is a complete programming environment that serves the reliable execution of distributed application programs. FADI encompasses all aspects of modern fault-tolerant distributed computing. The built-in user-transparent error detection mechanism covers processor node crashes and hardware transient failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism combined with a novel selective message logging technique delivers an efficient, low-overhead backup and recovery mechanism for distributed processes. FADI also provides means for remote automatic process allocation on the distributed system nodes

    Distributed Compressive CSIT Estimation and Feedback for FDD Multi-user Massive MIMO Systems

    Full text link
    To fully utilize the spatial multiplexing gains or array gains of massive MIMO, the channel state information must be obtained at the transmitter side (CSIT). However, conventional CSIT estimation approaches are not suitable for FDD massive MIMO systems because of the overwhelming training and feedback overhead. In this paper, we consider multi-user massive MIMO systems and deploy the compressive sensing (CS) technique to reduce the training as well as the feedback overhead in the CSIT estimation. The multi-user massive MIMO systems exhibits a hidden joint sparsity structure in the user channel matrices due to the shared local scatterers in the physical propagation environment. As such, instead of naively applying the conventional CS to the CSIT estimation, we propose a distributed compressive CSIT estimation scheme so that the compressed measurements are observed at the users locally, while the CSIT recovery is performed at the base station jointly. A joint orthogonal matching pursuit recovery algorithm is proposed to perform the CSIT recovery, with the capability of exploiting the hidden joint sparsity in the user channel matrices. We analyze the obtained CSIT quality in terms of the normalized mean absolute error, and through the closed-form expressions, we obtain simple insights into how the joint channel sparsity can be exploited to improve the CSIT recovery performance.Comment: 16 double-column pages, accepted for publication in IEEE Transactions on Signal Processin

    An approach to rollback recovery of collaborating mobile agents

    Get PDF
    Fault-tolerance is one of the main problems that must be resolved to improve the adoption of the agents' computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for agent systems fault-tolerance. The developed framework deploys a communication-pairs independent check pointing strategy to offer a low-cost, application-transparent model for reliable agent- based computing that covers all possible faults that might invalidate reliable agent execution, migration and communication and maintains the exactly-one execution property
    • …
    corecore