Process mining, a sub-field of data science focusing on the analysis of event
data generated during the execution of (business) processes, has changed
tremendously over the past two decades. Whereas the field started in the early
2000s with little to no tool support, several software tools now exist, both
open-source (e.g., ProM and Apromore) and commercial (e.g., Disco, Celonis, and
ProcessGold). The commercial process mining tools provide limited
support for implementing custom algorithms. Moreover, both commercial and
open-source process mining tools are often only accessible through a graphical
user interface, which hampers their usage in large-scale experimental settings.
Initiatives such as RapidProM provide process mining support in the scientific
workflow-based data science suite RapidMiner. However, these offer limited to
no support for algorithmic customization. In this paper, we present Process
Mining for Python (PM4Py), a novel process mining library that aims to bridge
this gap by integrating with state-of-the-art data science libraries, e.g.,
pandas, numpy, scipy, and scikit-learn. We provide a global overview of the
architecture and
functionality of PM4Py, accompanied by some representative examples of its
usage.