2,301 research outputs found

    Outbreak detection for temporal contact data

    Epidemic spreading is a widely studied process due to its importance and possibly grave consequences for society. While the classical context of epidemic spreading refers to pathogens transmitted among humans or animals, it is straightforward to apply similar ideas to the spread of information (e.g., a rumor) or the spread of computer viruses. This paper addresses the question of how to optimally select nodes for monitoring in a network of timestamped contact events between individuals. We consider three optimization objectives: the detection likelihood, the time until detection, and the population that is affected by an outbreak. The optimization approach we use is based on a simple greedy approach and has been proposed in a seminal paper focusing on information spreading and water contamination. We extend this work to the setting of disease spreading and present its application with two example networks: a timestamped network of sexual contacts and a network of animal transports between farms. We apply the optimization procedure to a large set of outbreak scenarios that we generate with a susceptible-infectious-recovered model. We find that simple heuristic methods that select nodes with high degree or many contacts compare well in terms of outbreak detection performance with the (greedily) optimal set of nodes. Furthermore, we observe that nodes optimized on past periods may not be optimal for outbreak detection in future periods. However, seasonal effects may help in determining which past period generalizes well to some future period. Finally, we demonstrate that the detection performance depends on the simulation settings. In general, if we force the simulator to generate larger outbreaks, the detection performance will improve, as larger outbreaks tend to occur in the more connected part of the network where the top monitoring nodes are typically located. 
A natural progression of this work is to analyze how a representative set of outbreak scenarios can be generated, possibly taking into account more realistic propagation models.
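
The greedy selection described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each simulated outbreak scenario is reduced to the set of nodes it infects, and it optimizes only the detection likelihood (the fraction of scenarios that reach at least one monitored node). The function name `greedy_monitors` and the toy scenarios are hypothetical.

```python
def greedy_monitors(scenarios, budget):
    """Greedily pick up to `budget` monitoring nodes.

    `scenarios` is a list of sets; each set holds the nodes infected in one
    simulated outbreak (an assumed, simplified representation). At every step
    the node that detects the most still-undetected scenarios is added.
    Returns the chosen nodes and the fraction of scenarios detected.
    """
    if not scenarios:
        return [], 0.0
    undetected = [set(s) for s in scenarios]
    chosen = []
    for _ in range(budget):
        # Count, for each node, how many undetected scenarios it would catch.
        counts = {}
        for infected in undetected:
            for node in infected:
                counts[node] = counts.get(node, 0) + 1
        if not counts:
            break  # every remaining scenario is empty or already detected
        # Ties are broken by node name so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        undetected = [s for s in undetected if best not in s]
    detected = len(scenarios) - len(undetected)
    return chosen, detected / len(scenarios)


# Toy example with four hypothetical outbreak scenarios.
toy = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
print(greedy_monitors(toy, budget=2))  # (['b', 'c'], 0.75)
```

A degree-based heuristic of the kind the abstract compares against would instead rank nodes once by their number of contacts, without re-counting after each pick.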

    Observer Placement for Source Localization: The Effect of Budgets and Transmission Variance

    When an epidemic spreads in a network, a key question is where its source was, i.e., which node started the epidemic. If we know the time at which various nodes were infected, we can attempt to use this information to identify the source. However, maintaining observer nodes that can provide their infection time may be costly, and we may have a budget k on the number of observer nodes we can maintain. Moreover, some nodes are more informative than others due to their location in the network. Hence, a pertinent question arises: Which nodes should we select as observers in order to maximize the probability that we can accurately identify the source? Inspired by the simple setting in which the node-to-node delays in the transmission of the epidemic are deterministic, we develop a principled approach for addressing the problem even when transmission delays are random. We show that the optimal observer placement differs depending on the variance of the transmission delays and propose approaches in both low- and high-variance settings. We validate our methods by comparing them against state-of-the-art observer placements and show that, in both settings, our approach identifies the source with higher accuracy. Comment: Accepted for presentation at the 54th Annual Allerton Conference on Communication, Control, and Computin

    Inferring spatial source of disease outbreaks using maximum entropy

    Mathematical modeling of disease outbreaks can infer the future trajectory of an epidemic, allowing for making more informed policy decisions. Another task is inferring the origin of a disease, which is relatively difficult with current mathematical models. Such frameworks, across varying levels of complexity, are typically sensitive to input data on epidemic parameters, case counts, and mortality rates, which are generally noisy and incomplete. To alleviate these limitations, we propose a maximum entropy framework that fits epidemiological models, provides calibrated infection origin probabilities, and is robust to noise due to a prior belief model. Maximum entropy is agnostic to the parameters or model structure used and allows for flexible use when faced with sparse data conditions and incomplete knowledge in the dynamical phase of disease-spread, providing for more reliable modeling at early stages of outbreaks. We evaluate the performance of our model by predicting future disease trajectories based on simulated epidemiological data in synthetic graph networks and the real mobility network of New York State. In addition, unlike existing approaches, we demonstrate that the method can be used to infer the origin of the outbreak with accurate confidence. Indeed, despite the prevalent belief on the feasibility of contact-tracing being limited to the initial stages of an outbreak, we report the possibility of reconstructing early disease dynamics, including the epidemic seed, at advanced stages

    SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

    Social networking websites allow users to create and share content. Large information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting the final size of an average information cascade after observing it for just one hour. Comment: 10 pages, published in KDD 201

    Deep Neural Network for Anomaly Detection

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Interdisciplinary Program in Technology Management, Economics and Policy, 2019. 8. Junseok Hwang. Human insurgency is one of the prevalent, incessant, and threatening events happening worldwide. Among the many topics of development studies, a central research focus is to understand and model armed conflicts, which are suspected to be linked to the capacity of a country in various ways, such as food security, child nutrition, economic welfare, and even environmental issues. Mapping human insurgencies is, therefore, imperative. To cope with the atrocities, there have been previous attempts to uncover the latent patterns of human insurgent incidents. The salient behavior of these insurgencies follows a power-law distribution, which exhibits a heavy tail. This feature implies that events far from the norm are nontrivial, in contrast to the normal distribution, where essentially no weight lies far from the mean. This pattern indicates that a few incidents occur with relentless severity, while the majority of events occur with minor severity. To fully exploit the latent behavior of human insurgencies, this research focuses on the anomalies: the events lying on the heavy tail, which have a great number of fatalities but little probability of occurrence. To detect such anomalies, a variational autoencoder (VAE) is used. The essence of this model lies in processing high-volume data and capturing their non-linearity, which makes data-driven detection possible. The results show that the trained model successfully detects anomalies when given test data, showing no false negatives (Type II error) or false positives (Type I error). This predictive model, if well deployed, can give humanitarian aid agencies and governments the ability to allocate resources efficiently, reducing waste and mitigating the level of conflict through targeted preventive policies.
    Contents: Abstract; Contents; List of Tables; List of Figures; Chapter 1. Introduction; Chapter 2. Literature Review (2.1 Human Insurgency; 2.1.1 Definition of Human Insurgency; 2.2 Predictive Models of Human Insurgency; 2.2.1 The limitation of previous literature; 2.3 Latent Behavior of Human Insurgency; 2.4 Anomaly Detection; 2.4.1 Anomaly Detection Methods); Chapter 3. Methodology and Data (3.1 Model; 3.1.1 Variational Inference; 3.1.2 Autoencoder; 3.1.3 Variational Autoencoder (VAE); 3.2 VAE Anomaly detection; 3.3 Analysis Sequence; 3.4 Data Description); Chapter 4. Result (4.1 Reconstruction Error; 4.2 Performance Analysis; 4.2.1 Accuracy; 4.2.2 ROC (Receiver Operating Characteristics); 4.2.3 Precision; 4.2.4 Recall (Sensitivity); 4.2.5 Precision vs Recall); Chapter 5. Discussion and Conclusion (5.1 Implication; 5.2 Limitations; 5.3 Further Research); Bibliography; Abstract (Korean). Maste