Graph or network representations are an important foundation for data mining
and machine learning tasks in relational data. Many tools of network analysis,
such as centrality measures, information ranking, or cluster detection, rest on the
assumption that links capture direct influence, and that paths represent
possible indirect influence. This assumption is invalidated in time-stamped
network data capturing, e.g., dynamic social networks, biological sequences or
financial transactions. In such data, for two time-stamped links (A,B) and
(B,C), the chronological ordering and timing determine whether a causal path
from node A via B to C exists. Several works have shown that, for this reason,
standard network analysis cannot be applied directly to time-stamped network data.
Existing methods that address this issue require statistics on causal paths,
which are computationally challenging to obtain for large data sets.
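
To make the notion of a causal path concrete, the following minimal sketch counts causal paths of length two by brute force. It is an illustration only, not the efficient algorithm developed in this work. It assumes that links are given as (source, target, timestamp) triples and that two links (A,B) at time t1 and (B,C) at time t2 form a causal path if 0 < t2 - t1 <= delta. The function name `count_causal_paths_len2` and the parameter name `delta` are hypothetical.

```python
from collections import defaultdict


def count_causal_paths_len2(links, delta):
    """Count causal paths A -> B -> C of length two (brute force).

    links: list of (source, target, timestamp) triples.
    delta: assumed maximum time difference between consecutive links.
    Returns a dict mapping (A, B, C) to the number of time-respecting
    link pairs that realise this path.
    """
    # Index links by their source node so continuations can be looked up.
    out_by_node = defaultdict(list)
    for src, dst, t in links:
        out_by_node[src].append((dst, t))

    counts = defaultdict(int)
    for a, b, t1 in links:
        for c, t2 in out_by_node[b]:
            # The second link must occur strictly after the first one,
            # and at most delta time units later.
            if 0 < t2 - t1 <= delta:
                counts[(a, b, c)] += 1
    return dict(counts)


links = [("A", "B", 1), ("B", "C", 2), ("B", "C", 5), ("C", "A", 3)]
paths_small = count_causal_paths_len2(links, delta=1)
paths_large = count_causal_paths_len2(links, delta=4)

# Reversing the chronological order removes the causal path from A to C.
reversed_links = [("B", "C", 1), ("A", "B", 2)]
paths_reversed = count_causal_paths_len2(reversed_links, delta=10)
```

With delta=1, only the link pair at times 1 and 2 supports the path A, B, C. With delta=4, the link (B,C) at time 5 contributes as well. In the reversed example no causal path exists, although a static graph built from the same links would contain the path A, B, C. The nested loop compares every link with every possible continuation, which illustrates why this kind of counting becomes expensive for large data sets.
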
Addressing this problem, we develop an efficient algorithm to count causal
paths in time-stamped network data. Applying it to empirical data, we show that
our method is more efficient than a baseline method implemented in an
open-source data analytics package. Our method works efficiently for different
values of the maximum time difference between consecutive links of a causal
path and supports streaming scenarios. It thus closes a gap that
hinders the efficient analysis of large time series data on complex networks.

Comment: 10 pages, 2 figures