Inadequate service availability is the top concern when employing Cloud
computing. It has been recognized that zero downtime is impossible for
large-scale Internet services. Nevertheless, by learning from their own past
mistakes and those of others, Cloud vendors can minimize the risk
of future downtime or at least keep the downtime short. To facilitate
summarizing lessons for Cloud providers, we performed a systematic survey of
public Cloud service outage events. This paper reports the results of this
survey. In addition to a set of findings, our work generated a lessons
framework by classifying the outage root causes. The framework can in turn be
used to arrange outage lessons for reference by Cloud providers. In our future
work, this lessons framework can be readily extended to include potentially
new root causes.