High-speed research networks are built to meet the ever-increasing needs of
data-intensive distributed workflows. However, data transfers in these networks
often fail to attain the promised transfer rates for several reasons, including
I/O and network interference, server misconfigurations, and network anomalies.
Although understanding the root causes of performance issues is critical to
mitigating them and increasing the utilization of expensive network
infrastructures, there is currently no available mechanism to monitor data
transfers in these networks. In this paper, we present a scalable, end-to-end
monitoring framework that gathers and stores key performance metrics for file
transfers, shedding light on their performance. The evaluation results
show that the proposed framework can monitor up to 400 transfers per host and
more than 40,000 transfers in total while collecting performance statistics at
one-second precision. We also introduce a heuristic method to automatically
process the gathered performance metrics and identify the root causes of
performance anomalies with an F-score of 87–98%.

Comment: 11 pages, 7 figures, 6 tables