Optimizing data-intensive workflow execution is essential to many modern
scientific projects such as the Square Kilometre Array (SKA), which will be the
largest radio telescope in the world, collecting terabytes of data per second
for the next few decades. At the core of the SKA Science Data Processor is a
graph execution engine that schedules tens of thousands of algorithmic components
to ingest and transform millions of parallel data chunks in order to solve a
series of large-scale inverse problems within the power budget. To tackle this
challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to
manage data processing pipelines for several SKA pathfinder projects. In this
paper, we discuss the DALiuGE graph scheduling sub-system. By extending
previous studies on graph scheduling and partitioning, we lay the foundation for
polynomial-time optimization methods that minimize both
workflow execution time and resource footprint while satisfying resource
constraints imposed by individual algorithms. We show preliminary results
obtained from three radio astronomy data pipelines.

Comment: Accepted at the HPDC ScienceCloud 2018 Workshop.