HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020)
When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks
We introduce a framework for the modeling of sequential data capturing
pathways of varying lengths observed in a network. Such data are important,
e.g., when studying click streams in information networks, travel patterns in
transportation systems, information cascades in social networks, biological
pathways or time-stamped social interactions. While it is common to apply graph
analytics and network analysis to such data, recent works have shown that
temporal correlations can invalidate the results of such methods. This raises a
fundamental question: when is a network abstraction of sequential data
justified? Addressing this open question, we propose a framework which combines
Markov chains of multiple, higher orders into a multi-layer graphical model
that captures temporal correlations in pathways at multiple length scales
simultaneously. We develop a model selection technique to infer the optimal
number of layers of such a model and show that it outperforms previously used
Markov order detection techniques. An application to eight real-world data sets
on pathways and temporal networks shows that it allows us to infer graphical
models which capture both topological and temporal characteristics of such
data. Our work highlights fallacies of network abstractions and provides a
principled answer to the open question of when they are justified. Generalizing
network representations to multi-order graphical models, it opens perspectives
for new data mining and knowledge discovery algorithms. Comment: 10 pages, 4 figures, 1 table, companion Python package pathpy
available on GitHub
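The abstract's central point, that a first-order network view can miss temporal correlations in pathways, can be illustrated with a toy comparison. The sketch below is not the paper's multi-order model and does not use the pathpy API; the function name `log_likelihood` and the toy paths are assumptions made for illustration. It fits maximum-likelihood Markov chains of order 1 and order 2 to the same transitions and compares their log-likelihoods.

```python
import math
from collections import Counter

def log_likelihood(paths, order, start):
    """Log-likelihood of the transitions at positions >= start under a
    maximum-likelihood Markov chain of the given order (start >= order)."""
    trans = Counter()  # (history, next_node) -> count
    ctx = Counter()    # history -> count
    for path in paths:
        for i in range(start, len(path)):
            history = tuple(path[i - order:i])
            trans[(history, path[i])] += 1
            ctx[history] += 1
    return sum(n * math.log(n / ctx[h]) for (h, _), n in trans.items())

# Toy data (hypothetical): what follows c depends on where the path came from.
paths = [["a", "c", "d"]] * 10 + [["b", "c", "e"]] * 10

# Score the same transitions (the step out of c) under both orders.
ll_first = log_likelihood(paths, order=1, start=2)
ll_second = log_likelihood(paths, order=2, start=2)
print(ll_first, ll_second)
```

In this toy case the first-order chain sees c followed by d or e with equal frequency, while the second-order chain predicts the step perfectly. A real model selection procedure would also have to account for the extra parameters of the higher-order model, which this sketch ignores.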
Temporal Networks
A great variety of systems in nature, society and technology -- from the web
of sexual contacts to the Internet, from the nervous system to power grids --
can be modeled as graphs of vertices coupled by edges. The network structure,
describing how the graph is wired, helps us understand, predict and optimize
the behavior of dynamical systems. In many cases, however, the edges are not
continuously active. As an example, in networks of communication via email,
text messages, or phone calls, edges represent sequences of instantaneous or
practically instantaneous contacts. In some cases, edges are active for
non-negligible periods of time: e.g., the proximity patterns of inpatients at
hospitals can be represented by a graph where an edge between two individuals
is on throughout the time they are at the same ward. Like network topology, the
temporal structure of edge activations can affect dynamics of systems
interacting through the network, from disease contagion on the network of
patients to information diffusion over an e-mail network. In this review, we
present the emergent field of temporal networks, and discuss methods for
analyzing topological and temporal structure and models for elucidating their
relation to the behavior of dynamical systems. In the light of traditional
network theory, one can see this framework as moving the information of when
things happen from the dynamical system on the network, to the network itself.
Since fundamental properties, such as the transitivity of edges, do not
necessarily hold in temporal networks, many of these methods need to be quite
different from those for static networks.
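The remark that transitivity does not necessarily hold in temporal networks can be made concrete with time-respecting paths. The following is a minimal sketch, assuming directed, instantaneous contacts given as `(u, v, t)` triples and requiring strictly increasing contact times along a path; the function name `earliest_arrival` and the example contacts are illustrative assumptions, not taken from the review.

```python
def earliest_arrival(contacts, source):
    """Earliest arrival time at every node reachable from `source` along
    time-respecting paths (contact times strictly increasing)."""
    arrival = {source: float("-inf")}
    # Process contacts chronologically; a contact (u, v, t) is usable only
    # if u was already reached strictly before time t.
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        if u in arrival and arrival[u] < t < arrival.get(v, float("inf")):
            arrival[v] = t
    return arrival

# Same static edges A->B and B->C, but different temporal order.
wrong_order = [("A", "B", 2), ("B", "C", 1)]
right_order = [("A", "B", 1), ("B", "C", 2)]

reach_wrong = earliest_arrival(wrong_order, "A")
reach_right = earliest_arrival(right_order, "A")
print("C" in reach_wrong, "C" in reach_right)
```

In both cases the aggregated static graph contains the path A, B, C, but C is reachable from A only when the B-to-C contact happens after the A-to-B contact.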
The pseudotemporal bootstrap for predicting glaucoma from cross-sectional visual field data
Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, a leading cause of irreversible blindness in the world. Recently, there has been an explosion in the amount of data being stored on patients who suffer from visual deterioration, including visual field (VF) tests, retinal images, and frequent intraocular pressure measurements. Like the progression of many biological and medical processes, VF progression is inherently temporal in nature. However, many datasets associated with the study of such processes are cross-sectional, and the time dimension is not measured due to the expensive nature of such studies. In this paper, we address this issue by developing a method to build artificial time series, which we call pseudo time series, from cross-sectional data. This involves building trajectories through all of the data that can then, in turn, be used to build temporal models for forecasting (which would otherwise be impossible without longitudinal data). Glaucoma, like many diseases, is a family of conditions and it is, therefore, likely that there will be a number of key trajectories that are important in understanding the disease. In order to deal with such situations, we extend the idea of pseudo time series by using resampling techniques to build multiple sequences prior to model building. This approach naturally handles outliers and multiple possible disease trajectories. We demonstrate some key properties of our approach on synthetic data and present very promising results on VF data for predicting glaucoma.
Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms
Network models of healthcare systems can be used to examine how providers
collaborate, communicate, and refer patients to each other. Most healthcare service
network models have been constructed from patient claims data, using billing
claims to link patients with providers. The data sets can be quite large,
making standard methods for network construction computationally challenging
and thus requiring the use of alternate construction algorithms. While these
alternate methods have seen increasing use in generating healthcare networks,
there is little to no literature comparing the differences in the structural
properties of the generated networks. To address this issue, we compared the
properties of healthcare networks constructed using different algorithms and
the 2013 Medicare Part B outpatient claims data. Three different algorithms
were compared: binning, sliding frame, and trace-route. Unipartite networks
linking either providers or healthcare organizations by shared patients were
built using each method. We found that each algorithm produced networks with
substantially different topological properties. Provider networks adhered to a
power law, and organization networks to a power law with exponential cutoff.
Censoring networks to exclude edges with fewer than 11 shared patients, a common
de-identification practice for healthcare network data, markedly reduced edge
numbers and greatly altered measures of vertex prominence such as the
betweenness centrality. We identified patterns in the distance patients travel
between network providers, and most strikingly between providers in the
Northeast United States and Florida. We conclude that the choice of network
construction algorithm is critical for healthcare network analysis, and discuss
the implications for selecting the algorithm best suited to the type of
analysis to be performed. Comment: With links to comprehensive, high resolution figures and networks via
figshare.com
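The idea of linking providers by shared patients and then censoring weak edges can be sketched in a few lines. This is a plain shared-patient projection that ignores claim dates; it is not one of the three algorithms compared in the paper (binning, sliding frame, trace-route). The function name `shared_patient_network`, the `(patient_id, provider_id)` input format, and the toy claims are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_patient_network(claims, min_shared=11):
    """Build a unipartite provider network from (patient_id, provider_id)
    pairs. Edge weight = number of shared patients; edges with fewer than
    `min_shared` shared patients are censored (dropped)."""
    providers_by_patient = defaultdict(set)
    for patient, provider in claims:
        providers_by_patient[patient].add(provider)

    weights = Counter()
    for providers in providers_by_patient.values():
        # Every pair of providers seen by the same patient shares that patient.
        for a, b in combinations(sorted(providers), 2):
            weights[(a, b)] += 1

    return {edge: w for edge, w in weights.items() if w >= min_shared}

# Toy claims (hypothetical): 11 patients see P1 and P2; 3 of them also see P3.
claims = [(f"pt{i}", p) for i in range(11) for p in ("P1", "P2")]
claims += [(f"pt{i}", "P3") for i in range(3)]

uncensored = shared_patient_network(claims, min_shared=1)
censored = shared_patient_network(claims)  # default threshold of 11
print(len(uncensored), len(censored))
```

Even in this toy case, censoring removes two of the three edges and disconnects P3 entirely, which hints at why measures of vertex prominence can change so much after censoring.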
Temporal networks of face-to-face human interactions
The ever increasing adoption of mobile technologies and ubiquitous services
makes it possible to sense human behavior at unprecedented levels of detail and scale.
Wearable sensors are opening up a new window on human mobility and proximity at
the finest resolution of face-to-face proximity. As a consequence, empirical
data describing social and behavioral networks are acquiring a longitudinal
dimension that brings forth new challenges for analysis and modeling. Here we
review recent work on the representation and analysis of temporal networks of
face-to-face human proximity, based on large-scale datasets collected in the
context of the SocioPatterns collaboration. We show that the raw behavioral
data can be studied at various levels of coarse-graining, which turn out to be
complementary to one another, with each level exposing different features of
the underlying system. We briefly review a generative model of temporal contact
networks that reproduces some statistical observables. Then, we shift our focus
from surface statistical features to dynamical processes on empirical temporal
networks. We discuss how simple dynamical processes can be used as probes to
expose important features of the interaction patterns, such as burstiness and
causal constraints. We show that simulating dynamical processes on empirical
temporal networks can unveil differences between datasets that would otherwise
look statistically similar. Moreover, we argue that, due to the temporal
heterogeneity of human dynamics, in order to investigate the temporal
properties of spreading processes it may be necessary to abandon the notion of
wall-clock time in favour of an intrinsic notion of time for each individual
node, defined in terms of its activity level. We conclude by highlighting several
open research questions raised by the nature of the data at hand. Comment: Chapter of the book "Temporal Networks", Springer, 2013. Series:
Understanding Complex Systems. Holme, Petter; Saramäki, Jari (Eds.)