    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.

    Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020)
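    The abstract does not spell out HYPA's analytical method, so the following is only a minimal sketch of the general idea under an assumed null model, not HYPA itself. For every observed length-2 path (a, b, c), the expected count is computed as if the predecessor a and the successor c of the middle node b were independent, which preserves how often each edge is traversed. The ratio of observed to expected counts then flags paths whose frequency is not explained by edge statistics alone. The function name `path_anomaly_scores` is hypothetical.

```python
from collections import Counter


def path_anomaly_scores(paths):
    """Score each observed length-2 path (a, b, c) by observed / expected count.

    Assumed null model (not HYPA's): at the middle node b, the predecessor a
    and the successor c are independent, while edge traversal counts are kept.
    """
    # Count every length-2 sub-path; paths shorter than 3 nodes add nothing.
    triples = Counter()
    for path in paths:
        for i in range(len(path) - 2):
            triples[tuple(path[i:i + 3])] += 1

    into = Counter()     # (a, b): triples entering b from a
    out = Counter()      # (b, c): triples leaving b towards c
    through = Counter()  # b: all triples passing through b
    for (a, b, c), n in triples.items():
        into[(a, b)] += n
        out[(b, c)] += n
        through[b] += n

    scores = {}
    for (a, b, c), n in triples.items():
        expected = into[(a, b)] * out[(b, c)] / through[b]
        scores[(a, b, c)] = n / expected
    return scores


paths = [["A", "X", "C"]] * 5 + [["B", "X", "D"]] * 5
print(path_anomaly_scores(paths))
# {('A', 'X', 'C'): 2.0, ('B', 'X', 'D'): 2.0}
```

    Both observed paths score 2.0. Each predecessor of X always leads to the same successor, whereas the independence assumption would split the traffic evenly between C and D. Paths that are expected but never observed (such as A -> X -> D) do not appear in the result, so detecting under-represented paths would need a separate pass over candidate paths.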

    When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks

    We introduce a framework for modeling sequential data that capture pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in information networks, travel patterns in transportation systems, information cascades in social networks, biological pathways, or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: when is a network abstraction of sequential data justified? To address this open question, we propose a framework that combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it allows us to infer graphical models that capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. By generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.

    Comment: 10 pages, 4 figures, 1 table, companion Python package pathpy available on GitHub
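    As an illustration only, the sketch below builds a multi-order model in which the i-th node of a path is predicted from its previous min(i, K) nodes, so the first nodes of each path fall back to lower-order layers, and then picks the maximum order K. The abstract does not describe the authors' model selection technique, so the sketch substitutes the Akaike information criterion (AIC). Each layer is also estimated only from the path positions where it is used, which is a simplification. The function names are hypothetical; the companion package pathpy is the authors' own implementation.

```python
import math
from collections import Counter, defaultdict


def multi_order_log_likelihood(paths, max_order):
    """Return (log-likelihood, free parameters) of a multi-order model.

    Node i of a path is predicted from its previous min(i, max_order) nodes,
    using maximum-likelihood transition probabilities for each layer.
    """
    counts = defaultdict(Counter)  # (layer, context) -> successor counts
    for path in paths:
        for i, node in enumerate(path):
            k = min(i, max_order)
            counts[(k, tuple(path[i - k:i]))][node] += 1

    log_lik, params = 0.0, 0
    for successors in counts.values():
        total = sum(successors.values())
        params += len(successors) - 1  # probabilities per context sum to 1
        for n in successors.values():
            log_lik += n * math.log(n / total)
    return log_lik, params


def select_order(paths, max_order):
    """Pick the order in 0..max_order minimising AIC = 2 * params - 2 * log-likelihood."""
    def aic(k):
        log_lik, params = multi_order_log_likelihood(paths, k)
        return 2 * params - 2 * log_lik
    return min(range(max_order + 1), key=aic)


memory = [["A", "X", "C"]] * 5 + [["B", "X", "D"]] * 5
no_memory = [["A", "X", "C"], ["A", "X", "D"], ["B", "X", "C"], ["B", "X", "D"]]
print(select_order(memory, 2))     # 2
print(select_order(no_memory, 2))  # 1
```

    In the first data set the successor of X depends on the node visited before X, so the second-order model wins. In the second it does not, and the extra second-order parameters do not pay for themselves.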

    Temporal Networks

    A great variety of systems in nature, society and technology -- from the web of sexual contacts to the Internet, from the nervous system to power grids -- can be modeled as graphs of vertices coupled by edges. The network structure, describing how the graph is wired, helps us understand, predict and optimize the behavior of dynamical systems. In many cases, however, the edges are not continuously active. As an example, in networks of communication via email, text messages, or phone calls, edges represent sequences of instantaneous or practically instantaneous contacts. In some cases, edges are active for non-negligible periods of time: e.g., the proximity patterns of inpatients at hospitals can be represented by a graph where an edge between two individuals is active for as long as they are on the same ward. Like network topology, the temporal structure of edge activations can affect dynamics of systems interacting through the network, from disease contagion on the network of patients to information diffusion over an e-mail network. In this review, we present the emergent field of temporal networks, and discuss methods for analyzing topological and temporal structure and models for elucidating their relation to the behavior of dynamical systems. In light of traditional network theory, one can see this framework as moving the information of when things happen from the dynamical system on the network to the network itself. Since fundamental properties, such as the transitivity of edges, do not necessarily hold in temporal networks, many of these methods need to be quite different from those for static networks.
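    The loss of transitivity can be illustrated with time-respecting paths. The sketch below computes the earliest time at which each node can be reached from a source, assuming instantaneous undirected contacts (u, v, t) and requiring strictly increasing contact times along a path. This is one common convention; others allow equal times or bound the waiting time between contacts. The function name is hypothetical.

```python
def earliest_arrival(contacts, source, start=0):
    """Earliest time each node can be reached from source via time-respecting paths.

    contacts: iterable of (u, v, t) instantaneous undirected contacts.
    Assumption: contact times along a path must be strictly increasing, and
    only contacts later than `start` can be used.
    Nodes that cannot be reached are absent from the result.
    """
    arrival = {source: start}
    # Processing contacts in time order means the first update is the earliest.
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):  # undirected: try both directions
            if a in arrival and arrival[a] < t and t < arrival.get(b, float("inf")):
                arrival[b] = t
    return arrival


contacts = [("A", "B", 2), ("B", "C", 1)]
print(earliest_arrival(contacts, "A"))  # {'A': 0, 'B': 2}
print(earliest_arrival(contacts, "C"))  # {'C': 0, 'B': 1, 'A': 2}
```

    A is in contact with B, and B with C, yet nothing starting at A can reach C, because the B-C contact happens before A meets B. In the aggregated static graph, A and C would be connected, and reachability in the temporal network is not even symmetric.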

    The pseudotemporal bootstrap for predicting glaucoma from cross-sectional visual field data

    Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, a leading cause of irreversible blindness worldwide. Recently, there has been an explosion in the amount of data stored on patients who suffer from visual deterioration, including visual field (VF) tests, retinal images, and frequent intraocular pressure measurements. Like many biological and medical processes, VF progression is inherently temporal. However, many datasets associated with the study of such processes are cross-sectional, and the time dimension is not measured because such studies are expensive. In this paper, we address this issue by developing a method to build artificial time series, which we call pseudo time series, from cross-sectional data. This involves building trajectories through all of the data that can then, in turn, be used to build temporal models for forecasting (which would otherwise be impossible without longitudinal data). Glaucoma, like many diseases, is a family of conditions, and it is therefore likely that a number of key trajectories are important for understanding the disease. To deal with such situations, we extend the idea of pseudo time series by using resampling techniques to build multiple sequences prior to model building. This approach naturally handles outliers and multiple possible disease trajectories. We demonstrate some key properties of our approach on synthetic data and present very promising results on VF data for predicting glaucoma.
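    The abstract does not specify how the trajectories are built, so the sketch below is a hypothetical stand-in that only shows the overall shape of the approach. Each bootstrap resample (drawn with replacement) is ordered into one pseudo time series by greedy nearest-neighbour chaining, starting from the record with the lowest value of an assumed severity function. The chaining rule, the Euclidean distance, the `severity` argument, and all function names are assumptions, not the paper's method.

```python
import math
import random


def pseudo_time_series(records, severity):
    """Order records into one trajectory by greedy nearest-neighbour chaining.

    Hypothetical rule: start at the record with the lowest severity, then
    repeatedly move to the closest unvisited record (Euclidean distance).
    """
    remaining = list(range(len(records)))
    current = min(remaining, key=lambda i: severity(records[i]))
    order = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining, key=lambda i: math.dist(records[current], records[i]))
        order.append(current)
        remaining.remove(current)
    return [records[i] for i in order]


def bootstrap_pseudo_series(records, severity, n_samples, seed=0):
    """Build several pseudo time series, each from a resample drawn with replacement."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_samples):
        resample = [rng.choice(records) for _ in records]
        series.append(pseudo_time_series(resample, severity))
    return series


records = [[3.0], [1.0], [2.0], [5.0]]
print(pseudo_time_series(records, severity=lambda r: r[0]))
# [[1.0], [2.0], [3.0], [5.0]]
many = bootstrap_pseudo_series(records, severity=lambda r: r[0], n_samples=3)
```

    Repeating the construction on several resamples yields multiple candidate sequences, which is the role resampling plays in the abstract; a temporal forecasting model could then be fitted to each sequence.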

    Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms

    Network models of healthcare systems can be used to examine how providers collaborate, communicate, and refer patients to each other. Most healthcare service network models have been constructed from patient claims data, using billing claims to link patients with providers. The data sets can be quite large, making standard methods for network construction computationally challenging and thus requiring alternative construction algorithms. While these alternative methods have seen increasing use in generating healthcare networks, there is little to no literature comparing the structural properties of the networks they generate. To address this issue, we compared the properties of healthcare networks constructed using different algorithms from the 2013 Medicare Part B outpatient claims data. Three algorithms were compared: binning, sliding frame, and trace-route. Unipartite networks linking either providers or healthcare organizations by shared patients were built using each method. We found that each algorithm produced networks with substantially different topological properties. Provider networks adhered to a power law, and organization networks to a power law with exponential cutoff. Censoring networks to exclude edges with fewer than 11 shared patients, a common de-identification practice for healthcare network data, markedly reduced edge numbers and greatly altered measures of vertex prominence such as betweenness centrality. We identified patterns in the distances patients travel between network providers, most strikingly between providers in the Northeast United States and Florida. We conclude that the choice of network construction algorithm is critical for healthcare network analysis, and we discuss the implications for selecting the algorithm best suited to the type of analysis to be performed.

    Comment: With links to comprehensive, high-resolution figures and networks via figshare.com
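    The abstract names the binning, sliding-frame, and trace-route algorithms without defining them, so the sketch below implements none of them. It shows only the common end product the abstract describes, a unipartite provider network in which providers are linked by shared patients, together with the censoring step that drops edges with fewer than 11 shared patients. It assumes claims are available as (patient_id, provider_id) pairs and ignores visit dates; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_network(claims, min_shared=11):
    """Build a provider network weighted by the number of shared patients.

    claims: iterable of (patient_id, provider_id) pairs (assumed layout).
    Edges with fewer than min_shared shared patients are censored (dropped).
    Returns {(provider_a, provider_b): shared_patient_count}.
    """
    # Collect the distinct providers each patient was billed by.
    providers_by_patient = defaultdict(set)
    for patient, provider in claims:
        providers_by_patient[patient].add(provider)

    # Every pair of providers seen by the same patient shares that patient.
    weights = defaultdict(int)
    for providers in providers_by_patient.values():
        for a, b in combinations(sorted(providers), 2):
            weights[(a, b)] += 1

    return {edge: w for edge, w in weights.items() if w >= min_shared}


claims = ([(p, "P1") for p in range(12)] + [(p, "P2") for p in range(12)]
          + [(p, "P3") for p in range(5)])
print(shared_patient_network(claims))                     # {('P1', 'P2'): 12}
print(len(shared_patient_network(claims, min_shared=1)))  # 3
```

    Even in this toy example, censoring removes two of the three edges and P3 drops out of the network entirely, which hints at why the abstract reports large effects on edge counts and on centrality measures.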

    Temporal networks of face-to-face human interactions

    The ever-increasing adoption of mobile technologies and ubiquitous services makes it possible to sense human behavior at unprecedented levels of detail and scale. Wearable sensors are opening up a new window on human mobility and proximity at the finest resolution, that of face-to-face proximity. As a consequence, empirical data describing social and behavioral networks are acquiring a longitudinal dimension that brings forth new challenges for analysis and modeling. Here we review recent work on the representation and analysis of temporal networks of face-to-face human proximity, based on large-scale datasets collected in the context of the SocioPatterns collaboration. We show that the raw behavioral data can be studied at various levels of coarse-graining, which turn out to be complementary to one another, with each level exposing different features of the underlying system. We briefly review a generative model of temporal contact networks that reproduces some statistical observables. Then we shift our focus from surface statistical features to dynamical processes on empirical temporal networks. We discuss how simple dynamical processes can be used as probes to expose important features of the interaction patterns, such as burstiness and causal constraints. We show that simulating dynamical processes on empirical temporal networks can unveil differences between datasets that would otherwise look statistically similar. Moreover, we argue that, due to the temporal heterogeneity of human dynamics, investigating the temporal properties of spreading processes may require abandoning the notion of wall-clock time in favour of an intrinsic notion of time for each individual node, defined in terms of its activity level. We conclude by highlighting several open research questions raised by the nature of the data at hand.

    Comment: Chapter of the book "Temporal Networks", Springer, 2013. Series: Understanding Complex Systems. Holme, Petter; Saramäki, Jari (Eds.)
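    One simple level of coarse-graining aggregates contact events into consecutive time windows. The sketch below assumes the raw data are (i, j, t) contact events with numeric timestamps and counts events per undirected pair in each window. The window width is the free parameter: narrow windows keep temporal detail, while a single window spanning the whole recording yields the fully aggregated static network. The function name and data layout are assumptions, not the SocioPatterns data format.

```python
from collections import Counter, defaultdict


def coarse_grain(contacts, window):
    """Aggregate timestamped contacts into weighted snapshots.

    contacts: iterable of (i, j, t) contact events with numeric timestamps t.
    Returns {window_index: Counter({(i, j): number of contact events})}.
    """
    snapshots = defaultdict(Counter)
    for i, j, t in contacts:
        edge = (min(i, j), max(i, j))  # undirected: store each pair once
        snapshots[int(t // window)][edge] += 1
    return dict(snapshots)


contacts = [("a", "b", 0), ("b", "a", 20), ("a", "c", 40), ("a", "b", 300)]
print(coarse_grain(contacts, 60))
# {0: Counter({('a', 'b'): 2, ('a', 'c'): 1}), 5: Counter({('a', 'b'): 1})}
print(coarse_grain(contacts, 1000))
# {0: Counter({('a', 'b'): 3, ('a', 'c'): 1})}
```

    Comparing the same data at several window widths is one way to see the complementary views the abstract describes: the 60-unit windows show when the a-b pair meets, while the single large window only records how often.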