The rapid progress of modern computing systems has led to a growing interest
in informative run-time logs. Various log-based anomaly detection techniques
have been proposed to ensure software reliability. However, their
adoption in industry has been limited by the lack of high-quality
public log resources that can serve as training datasets.
While some log datasets are available for anomaly detection, they suffer from
limitations in (1) comprehensiveness of log events; (2) scalability over
diverse systems; and (3) flexibility of log utility. To address these
limitations, we propose AutoLog, the first automated log generation methodology
for anomaly detection. AutoLog uses program analysis to generate run-time log
sequences without actually running the system. AutoLog first probes
comprehensive logging statements associated with the call graphs of an
application. Then, it prunes the call graphs and constructs an execution graph
for each method to find log-related execution paths in a scalable manner.
Finally, AutoLog propagates the anomaly label to each acquired execution path
based on human knowledge. It generates flexible log sequences by walking along
the log execution paths with controllable parameters. Experiments on 50 popular
Java projects show that AutoLog acquires significantly more (9x-58x) log events
than existing log datasets from the same system, and generates log messages
much faster (15x) on a single machine than existing passive data collection
approaches. We hope AutoLog can facilitate the benchmarking and adoption of
automated log analysis techniques.

Comment: The paper has been accepted by ASE 2023 (Research Track)
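
The pipeline described in the abstract (probe logging statements, prune the call graph to log-related methods, then walk the remaining paths with controllable parameters) can be illustrated with a minimal sketch. This is not AutoLog's implementation: the toy call graph, the log messages, the function names, and the parameters `max_depth` and `call_prob` are all hypothetical, and the rule that a sequence is anomalous if any of its statements is labeled anomalous is a simplifying assumption.

```python
import random

# Hypothetical toy call graph: method -> callees (not taken from the paper).
CALL_GRAPH = {
    "main": ["connect", "compute"],
    "connect": ["retry"],
    "retry": [],
    "compute": ["helper"],
    "helper": [],
}

# Hypothetical logging statements per method: (message, is_anomalous).
# The anomaly flags stand in for labels supplied by human knowledge.
LOG_STATEMENTS = {
    "main": [("Service started", False)],
    "connect": [("Connecting to peer", False)],
    "retry": [("Connection failed, retrying", True)],
}


def log_related_methods(call_graph, log_statements):
    """Return methods that log themselves or can reach a method that logs."""
    keep = {m for m in call_graph if log_statements.get(m)}
    changed = True
    while changed:  # fixed-point iteration, so cycles are handled
        changed = False
        for method, callees in call_graph.items():
            if method not in keep and any(c in keep for c in callees):
                keep.add(method)
                changed = True
    return keep


def prune(call_graph, keep):
    """Drop methods (and edges to them) that never lead to a log statement."""
    return {
        m: [c for c in callees if c in keep]
        for m, callees in call_graph.items()
        if m in keep
    }


def walk(call_graph, log_statements, entry, rng, max_depth=5, call_prob=0.8):
    """Simulate one execution from `entry` without running any real code.

    Each call edge is followed with probability `call_prob`, and recursion
    stops at `max_depth`. Returns (messages, is_anomalous).
    """
    events = []

    def visit(method, depth):
        events.extend(log_statements.get(method, []))
        if depth >= max_depth:
            return
        for callee in call_graph[method]:
            if rng.random() < call_prob:
                visit(callee, depth + 1)

    visit(entry, 0)
    messages = [message for message, _ in events]
    is_anomalous = any(flag for _, flag in events)
    return messages, is_anomalous


pruned_graph = prune(CALL_GRAPH, log_related_methods(CALL_GRAPH, LOG_STATEMENTS))
rng = random.Random(0)  # fixed seed so generated sequences are reproducible
sequences = [walk(pruned_graph, LOG_STATEMENTS, "main", rng) for _ in range(3)]
```

In this sketch, `compute` and `helper` are pruned because neither reaches a logging statement, which is what keeps the walk restricted to log-related paths. Varying `call_prob`, `max_depth`, and the seed changes which sequences are produced, loosely mirroring the idea of controllable generation parameters.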