Realistic, relevant, and reproducible experiments often need input traces
collected from real-world environments. We focus in this work on traces of
workflows---common in datacenters, clouds, and HPC infrastructures. We show
that the state of the art in using workflow traces raises important issues: (1)
the use of realistic traces is infrequent, and (2) the use of realistic, {\it
open-access} traces is even less frequent. To alleviate these issues, we introduce the
Workflow Trace Archive (WTA), an open-access archive of workflow traces from
diverse computing infrastructures, together with tooling to parse, validate, and analyze
traces. The WTA includes more than 48 million workflows captured from more than 10
computing infrastructures, representing a broad diversity of trace domains and
characteristics. To emphasize the importance of trace diversity, we
characterize the WTA contents and analyze in simulation the impact of trace
diversity on experiment results. Our results indicate significant differences
in characteristics, properties, and workflow structures between workload
sources, domains, and fields.