Design and Execution of make-like, distributed Analyses based on Spotify's Pipelining Package Luigi
In high-energy particle physics, workflow management systems are primarily
used as tailored solutions in dedicated areas such as Monte Carlo production.
However, physicists performing data analyses are usually required to steer
their individual workflows manually, which is time-consuming and often leads to
undocumented relations between particular workloads. We present a generic
analysis design pattern that copes with the sophisticated demands of end-to-end
HEP analyses and provides a make-like execution system. It is based on the
open-source pipelining package Luigi, which was developed at Spotify and enables
the definition of arbitrary workloads, so-called Tasks, and the dependencies
between them in a lightweight and scalable structure. Further features are
multi-user support, automated dependency resolution and error handling, central
scheduling, and web-based status visualization. In addition to the
built-in support for remote jobs and file systems such as Hadoop and HDFS, we
added support for WLCG infrastructure such as LSF and CREAM job submission, as
well as remote file access through the Grid File Access Library. Furthermore,
we implemented automated resubmission functionality, software sandboxing, and a
command line interface with auto-completion for a convenient working
environment. For the implementation of a cross section measurement,
we created a generic Python interface that provides programmatic access to all
external information such as datasets, physics processes, statistical models,
and additional files and values. In summary, the setup enables the execution of
the entire analysis in a parallelized and distributed fashion with a single
command.
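
The make-like execution model described above can be illustrated with a minimal sketch. It uses only the Python standard library and is loosely modeled on the requires/output/run convention of Luigi Tasks; it is neither Luigi itself nor the authors' implementation. The class names `Task`, `SelectEvents`, and `MakeHistograms` and the helper `build` are hypothetical and chosen purely for illustration.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a workflow task (not Luigi's actual class)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must be complete before this one can run.
        return []

    def output(self):
        # Each task writes a single file named after its class.
        return os.path.join(self.workdir, type(self).__name__ + ".txt")

    def complete(self):
        # As in make, a task counts as done once its output exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class SelectEvents(Task):
    def run(self):
        with open(self.output(), "w") as f:
            f.write("selected events\n")


class MakeHistograms(Task):
    def requires(self):
        return [SelectEvents(self.workdir)]

    def run(self):
        # Read the upstream output and derive this task's output from it.
        with open(self.requires()[0].output()) as f:
            data = f.read()
        with open(self.output(), "w") as f:
            f.write("histograms from: " + data)


def build(task, executed=None):
    """Resolve dependencies depth-first and run only incomplete tasks."""
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for dep in task.requires():
        build(dep, executed)
    task.run()
    executed.append(type(task).__name__)
    return executed


workdir = tempfile.mkdtemp()
first_pass = build(MakeHistograms(workdir))   # runs both tasks in order
second_pass = build(MakeHistograms(workdir))  # outputs exist, nothing to do
print(first_pass)
print(second_pass)
```

Requesting only the final task triggers its dependency first, and a second request does no work because all outputs already exist. A real setup would delegate scheduling, remote job submission, and remote file access to the workflow system rather than to a recursive helper like this one.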