FADI: a fault-tolerant environment for open distributed computing
FADI is a complete programming environment that supports the reliable execution of distributed application programs. FADI encompasses all aspects of modern fault-tolerant distributed computing. The built-in, user-transparent error detection mechanism covers processor node crashes and transient hardware failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism, combined with a novel selective message logging technique, delivers an efficient, low-overhead backup and recovery mechanism for distributed processes. FADI also provides means for remote automatic process allocation on the distributed system nodes.
Distributed Compressive CSIT Estimation and Feedback for FDD Multi-user Massive MIMO Systems
To fully utilize the spatial multiplexing gains or array gains of massive
MIMO, the channel state information must be obtained at the transmitter side
(CSIT). However, conventional CSIT estimation approaches are not suitable for
FDD massive MIMO systems because of the overwhelming training and feedback
overhead. In this paper, we consider multi-user massive MIMO systems and deploy
the compressive sensing (CS) technique to reduce the training as well as the
feedback overhead in the CSIT estimation. Multi-user massive MIMO systems
exhibit a hidden joint sparsity structure in the user channel matrices due to
the shared local scatterers in the physical propagation environment. As such,
instead of naively applying the conventional CS to the CSIT estimation, we
propose a distributed compressive CSIT estimation scheme so that the compressed
measurements are observed at the users locally, while the CSIT recovery is
performed at the base station jointly. A joint orthogonal matching pursuit
recovery algorithm is proposed to perform the CSIT recovery, with the
capability of exploiting the hidden joint sparsity in the user channel
matrices. We analyze the obtained CSIT quality in terms of the normalized mean
absolute error, and through the closed-form expressions, we obtain simple
insights into how the joint channel sparsity can be exploited to improve the
CSIT recovery performance.
Comment: 16 double-column pages, accepted for publication in IEEE Transactions
on Signal Processing
An approach to rollback recovery of collaborating mobile agents
Fault tolerance is one of the main problems that must be resolved to improve the adoption of the agent computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for agent-system fault tolerance. The developed framework deploys a communication-pairs independent checkpointing strategy to offer a low-cost, application-transparent model for reliable agent-based computing that covers all possible faults that might invalidate reliable agent execution, migration and communication, and maintains the exactly-one execution property.