Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures
In recent years, scientific workflows have become mature enough to be
used in production settings. However, despite this increasing maturity, there is
still a shortage of tools for searching, adapting, and reusing workflows, which
hinders broader adoption by the scientific communities. Indeed, due
to the limited availability of machine-readable scientific metadata and the
heterogeneity of workflow specification formats and representations, new ways
to leverage alternative sources of information that complement existing
approaches are needed. In this paper we address such limitations by applying
statistically enriched generalized trie structures to exploit workflow
execution provenance information in order to assist the analysis, indexing and
search of scientific workflows. Our method bridges the gap between the
description of what a workflow is supposed to do according to its specification
and related metadata and what it actually does as recorded in its provenance
execution trace. In doing so, we also show that the proposed method
outperforms SPARQL 1.1 Property Paths for querying provenance graphs.
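
To make the idea of a statistically enriched trie over execution traces more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each provenance trace can be flattened into an ordered list of step names, that every suffix of a trace is inserted so that any contiguous pattern can be looked up from the root, and that the only statistics kept per node are an occurrence count and the set of workflow identifiers. The names `PatternTrie`, `add_trace`, `lookup`, and the sample step names are hypothetical.

```python
class TrieNode:
    """One step in an execution pattern, annotated with simple statistics."""

    def __init__(self):
        self.children = {}      # step name -> TrieNode
        self.count = 0          # occurrences of the pattern ending at this node
        self.workflows = set()  # ids of workflows whose traces contain the pattern


class PatternTrie:
    """Hypothetical index of contiguous step sequences found in execution traces."""

    def __init__(self):
        self.root = TrieNode()

    def add_trace(self, workflow_id, steps):
        # Assumption: inserting every suffix makes each contiguous
        # sub-sequence of the trace reachable as a path from the root.
        for start in range(len(steps)):
            node = self.root
            for step in steps[start:]:
                node = node.children.setdefault(step, TrieNode())
                node.count += 1
                node.workflows.add(workflow_id)

    def lookup(self, pattern):
        # Walk the pattern from the root; a missing child means no trace contains it.
        node = self.root
        for step in pattern:
            node = node.children.get(step)
            if node is None:
                return 0, set()
        return node.count, set(node.workflows)


trie = PatternTrie()
trie.add_trace("wf1", ["fetch", "align", "plot"])
trie.add_trace("wf2", ["fetch", "align", "export"])
trie.add_trace("wf3", ["align", "plot"])

count, workflows = trie.lookup(["align", "plot"])
print(count, sorted(workflows))  # 2 ['wf1', 'wf3']
```

In this sketch a lookup costs time proportional to the pattern length, independent of the number of indexed traces, at the price of storing all suffixes of every trace.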