Logs have been widely adopted in software system development and maintenance
because of the rich system runtime information they contain. In recent years,
the increase in software size and complexity has led to rapid growth in the
volume of logs. To handle these large volumes of logs efficiently and
effectively, a line of research focuses on intelligent log analytics powered by
AI (artificial intelligence) techniques. However, only a small fraction of
these techniques have been successfully deployed in industry because of the
lack of public log datasets and of the necessary benchmarking on them. To fill this
significant gap between academia and industry and to facilitate more research
on AI-powered log analytics, we have collected and organized loghub, a large
collection of log datasets. In particular, loghub provides 17 real-world log
datasets collected from a wide range of systems, including distributed systems,
supercomputers, operating systems, mobile systems, server applications, and
standalone software. In this paper, we summarize the statistics of these
datasets, introduce some practical log usage scenarios, and present a case
study on anomaly detection to demonstrate how loghub facilitates research
and practice in this field. At the time of writing, loghub
datasets have been downloaded over 15,000 times by more than 380 organizations
from both industry and academia.

Comment: Dataset available at https://zenodo.org/record/322717