    Learning the Probability of Activation in the Presence of Latent Spreaders

    When an infection spreads in a community, an individual's probability of becoming infected depends on both her susceptibility and exposure to the contagion through contact with others. While one often has knowledge regarding an individual's susceptibility, in many cases, whether or not an individual's contacts are contagious is unknown. We study the problem of predicting if an individual will adopt a contagion in the presence of multiple modes of infection (exposure/susceptibility) and latent neighbor influence. We present a generative probabilistic model and a variational inference method to learn the parameters of our model. Through a series of experiments on synthetic data, we measure the ability of the proposed model to identify latent spreaders and predict the risk of infection. Applied to a real dataset of 20,000 hospital patients, we demonstrate the utility of our model in predicting the onset of a healthcare-associated infection using patient room-sharing and nurse-sharing networks. Our model outperforms existing benchmarks and provides actionable insights for the design and implementation of targeted interventions to curb the spread of infection. Comment: To appear in AAA1-1

    Learning patterns from sequential and network data using probabilistic models

    The focus of this thesis is on developing probabilistic models for data observed over temporal and graph domains, and the corresponding variational inference algorithms. In many real-world phenomena, sequential data points that are observed closer in time often exhibit higher degrees of dependency. Similarly, data points observed over a graph domain (e.g., user interests in a social network) may exhibit higher dependencies with lower degrees of separation over the graph. Furthermore, the connectivity structures that define the graph domain can also evolve temporally (i.e., temporal networks) and exhibit dependencies over time. The data sets observed over temporal and graph domains often (but not always) violate the independent and identically distributed (i.i.d.) assumption made by many mathematical models. The works presented in this dissertation address various challenges in modelling data sets that exhibit dependencies over temporal and graph domains. In Chapter 3, I present a stochastic variational inference algorithm that enables factorial hidden Markov models for sequential data to scale up to extremely long sequences. In Chapter 4, I propose a simple but powerful Gaussian process model that captures the dependencies of data points observed on a graph domain, and demonstrate its viability in graph-based semi-supervised learning problems. In Chapter 5, I present a dynamical model for graphs that captures the temporal evolution of the connectivity structures as well as the sparse connectivity structures often observed in real temporal network data sets. Finally, in Chapter 6, I summarise the contributions of the thesis and propose several directions for future work that can build on the proposed methods.

    Cumulative Distribution Functions As The Foundation For Probabilistic Models

    This thesis discusses applications of probabilistic and connectionist models for constructing and training cumulative distribution functions (CDFs). First, it is shown how existing tools from the copula literature can be combined to build probabilistic models. It is found that this simple construction leads to numerical and scalability issues that make training and inference challenging. Next, several innovative ideas combining neural networks, automatic differentiation and copula functions show how to assemble black-box probabilistic models. The basic building block is a cumulative distribution function that is straightforward to construct, composed of arithmetic operations and nonlinear functions. There is no need to assume any specific parametric probability density function (PDF), making the model flexible and normalisation unnecessary. The only requirement is to design a computational graph that parameterises monotonically non-decreasing functions with a constrained range. Training can then be performed using standard tools from any neural network software library. Finally, factorial hidden Markov models (FHMMs) for sequential data are presented. It is shown how to leverage cumulative distribution functions in the form of the Gaussian copula and an amortised stochastic variational method to encode hidden Markov chains coherently. This approach enables efficient learning and inference to model long sequences of high-dimensional data with long-range dependencies. Tackling such complex problems was impossible with the established FHMM approximate inference algorithm. It is empirically verified on several problems that some of the estimators introduced in this work can perform comparably to or better than the currently popular models, especially for tasks requiring tail-area or marginal probabilities that can be read directly from a cumulative distribution function.

    Topic Modeling with Structured Priors for Text-Driven Science

    Many scientific disciplines are being revolutionized by the explosion of public data on the web and social media, particularly in health and social sciences. For instance, by analyzing social media messages, we can instantly measure public opinion, understand population behaviors, and monitor events such as disease outbreaks and natural disasters. Taking advantage of these data sources requires tools that can make sense of massive amounts of unstructured and unlabeled text. Topic models, statistical models that posit low-dimensional representations of data, can uncover interesting latent structure in large text datasets and are popular tools for automatically identifying prominent themes in text. For example, prominent themes of discussion in social media might include politics and health. To be useful in scientific analyses, topic models must learn interpretable patterns that accurately correspond to real-world concepts of interest. This thesis will introduce topic models that can encode additional structures such as factorizations, hierarchies, and correlations of topics, and can incorporate supervision and domain knowledge. For example, topics about elections and Congressional legislation are related to each other (as part of a broader topic of “politics”), and certain political topics have partisan associations. These types of relations between topics can be modeled by formulating the Bayesian priors over parameters as functions of underlying “components,” which can be constrained in various ways to induce different structures. This approach is first introduced through a topic model called factorial LDA, which models a factorized structure in which topics are conceptually arranged in multiple dimensions. Factorial LDA can be used to model multiple types of information, for example topic and political ideology. We then introduce a family of structured-prior topic models called SPRITE, which creates a unifying representation that generalizes factorial LDA as well as other existing topic models, and creates a powerful framework for building new models. This thesis will also show how these topic models can be used in various scientific applications, such as extracting medical information from forums, measuring healthcare quality from patient reviews, and monitoring public opinion in social media.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to distinguish measurement error from systematic inefficiency in the estimation process, and so to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.

    A Unifying Variational Inference Framework for Hierarchical Graph-Coupled HMM with an Application to Influenza Infection

    The Hierarchical Graph-Coupled Hidden Markov Model (hGCHMM) is a useful tool for tracking and predicting the spread of contagious diseases, such as influenza, by leveraging social contact data collected from individual wearable devices. However, the existing inference algorithms depend on the assumption that the infection rates are small in probability, typically close to 0. The purpose of this paper is to build a unified learning framework for latent infection state estimation for the hGCHMM, regardless of the infection rate and transition function. We derive our algorithm based on a dynamic auto-encoding variational inference scheme, thus potentially generalizing the hGCHMM to models other than those that work on highly contagious diseases. We experimentally compare our approach with previous Gibbs EM algorithms and standard mean-field variational inference, on both semi-synthetic data and app-collected epidemiological and social records.