An Approach for Fast Fault Detection in Virtual Network

Abstract

The diversity of applications in cloud computing and the dynamic nature of environment deployment makes virtual machines, containers, and distributed software systems to often have various software failures, which make it impossible to provide external services normally. Whether it is cloud management or distributed application itself, it takes a few seconds to find the fault of protocol class detection methods on the management or control surfaces of distributed applications, hundreds of milliseconds to find the fault of protocol class detection methods based on user interfaces, and the main time from the failure to recovery of distributed software systems is spent in detecting the fault. Therefore, timely discovery of faults (virtual machines, containers, software) is the key to subsequent fault diagnosis, isolation and recovery. Considering the network connection of virtual machines/containers in cloud infrastructure, more and more intelligent virtual network cards are used to connect virtual network elements (Virtual Router or Virtual Switch). This paper studies a fault detection mechanism of virtual machines, containers and distributed software based on the message driven mode of virtual network elements. Taking advantage of the VIRTIO message queue memory sharing feature between the front-end and back-end in the virtual network card of the virtualization network element and the virtual machine or container it detects in the same server in the cloud network, when the virtualization network element sends packets to the virtual machine or container, quickly check whether the message on the queue header of the previously sent VIRTIO message has been received and processed. If it has not been received and processed beyond a certain time threshold, it indicates that the virtual machine, the container and distributed software have failed. The method in this paper can significantly improve the fault detection performance of virtual machine/container/distributed application (from the second pole to the millisecond level) for a large number of business message scenarios, and provide faster fault detection for the rapid convergence of virtual network traffic, migration of computing nodes, and high availability of distributed applications

    Similar works