6 research outputs found

    An O(log₂N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures

    Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂N)²) approach.
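    For context, resampling replaces a weighted particle set with an equally weighted one by duplicating high-weight particles and discarding low-weight ones; the redistribution step discussed above is the parallel analogue of that copying. Below is a minimal single-machine sketch of systematic resampling in Python (illustrative only, not the paper's DM algorithm):

        import numpy as np

        def systematic_resample(particles, weights, rng):
            """Draw len(weights) equally weighted particles from a weighted set."""
            n = len(weights)
            # One uniform offset, then n evenly spaced positions in [0, 1).
            positions = (rng.random() + np.arange(n)) / n
            cumulative = np.cumsum(weights)
            cumulative[-1] = 1.0  # guard against floating-point round-off
            # The copying performed by this indexing is what the redistribution
            # step must carry out across DM nodes.
            return particles[np.searchsorted(cumulative, positions)]

        rng = np.random.default_rng(0)
        particles = rng.normal(size=(8, 2))             # 8 particles, 2-D state
        weights = rng.random(8); weights /= weights.sum()
        resampled = systematic_resample(particles, weights, rng)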

    Particle filtering in compartmental projection models

    Simulation models are important tools for real-time forecasting of pandemics. Models help health decision makers examine interventions and secure strong guidance when anticipating outbreak evolution. However, models usually diverge from the real observations. Stochastic factors in pandemic systems, such as changes in human contact patterns, play a substantial role in disease transmission and are not usually captured in traditional dynamic models. In addition, models of emerging diseases face the challenge of limited epidemiological knowledge about the natural history of the disease. Even when information about natural history is available -- for example for endemic seasonal diseases -- transmission models are often simplified and involve omissions. Available data streams can provide a view of the early days of a pandemic, but fail to predict how the pandemic will evolve. Recent developments in computational statistics algorithms, such as Sequential Monte Carlo and Markov Chain Monte Carlo, provide the possibility of creating models based on historical data as well as re-grounding models based on ongoing data observations. The objective of this thesis is to combine particle filtering -- a Sequential Monte Carlo algorithm -- with system dynamics models of pandemics. We developed particle filtering models that can recurrently be re-grounded as new observations become available. To this end, we also examined the effectiveness of this arrangement, which is subject to specifics of the configuration (e.g., frequency of data sampling). While clinically-diagnosed cases are a valuable incoming data stream during an outbreak, a new generation of geo-spatially specific data sources, such as search volumes, can serve as a complementary resource to clinical data. As another contribution, we used particle filtering in a model that can be re-grounded based on both clinical and search volume data. Our results indicate that particle filtering in combination with compartmental models provides accurate projection systems for the estimation of model states and model parameters (particularly compared to traditional calibration methodologies, and in the context of emerging communicable diseases). The results also suggest that more frequent sampling from clinical data markedly improves predictive accuracy. Furthermore, the assumptions made regarding the parameters associated with particle filtering itself and with changes in contact rate were robust across both the amount of empirical data available since the beginning of the outbreak and the inter-observation interval. The results also support the use of data from the Google search API alongside clinical data.
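    A minimal sketch of the core idea being described -- a bootstrap particle filter recurrently re-grounding a stochastic SIR compartmental model against incoming case counts -- is shown below. The model structure, parameter values, and Poisson observation model are illustrative assumptions, not the thesis's models:

        import numpy as np

        rng = np.random.default_rng(1)
        N_POP, N_PART, BETA, GAMMA = 10_000, 500, 0.4, 0.2

        def step_sir(state):
            """Advance each particle's (S, I, R) one day with demographic noise."""
            s, i, r = state.T
            new_inf = rng.binomial(s.astype(int), 1 - np.exp(-BETA * i / N_POP))
            new_rec = rng.binomial(i.astype(int), 1 - np.exp(-GAMMA))
            return np.column_stack([s - new_inf, i + new_inf - new_rec, r + new_rec])

        def assimilate(state, observed_cases):
            """Re-ground the ensemble: weight particles by a Poisson observation
            model on reported cases, then resample in proportion to weight."""
            lam = np.maximum(state[:, 1], 1e-9)    # expected reports ~ prevalence
            logw = observed_cases * np.log(lam) - lam
            w = np.exp(logw - logw.max()); w /= w.sum()
            return state[rng.choice(len(state), size=len(state), p=w)]

        state = np.column_stack([np.full(N_PART, N_POP - 10.0),
                                 np.full(N_PART, 10.0), np.zeros(N_PART)])
        for obs in [12, 18, 25, 40]:               # toy daily case counts
            state = assimilate(step_sir(state), obs)
        print("posterior mean prevalence:", state[:, 1].mean())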

    Data and Design: Advancing Theory for Complex Adaptive Systems

    Complex adaptive systems exhibit certain types of behaviour that are difficult to predict or understand using reductionist approaches, such as linearization or assuming conditions of optimality. This research focuses on the complex adaptive systems associated with public health. These are noted for being driven by many latent forces, shaped centrally by human behaviour. Dynamic simulation techniques, including agent-based models (ABMs) and system dynamics (SD) models, have been used to study the behaviour of complex adaptive systems, including in public health. While much has been learned, such work is still hampered by important limitations. Models of complex systems can themselves be quite complex, increasing the difficulty of explaining unexpected model behaviour, whether that behaviour comes from model code errors or reflects new learning. Model complexity also leads to model designs that are hard to adapt to growing knowledge about the subject area, further reducing model-generated insight. In the current literature on dynamic simulations of human public health behaviour, few capture explicit psychological theories of human behaviour. Given that human behaviour, especially health and risk behaviour, is so central to understanding processes in public health, this work explores several methods to improve the utility and flexibility of dynamic models in public health. This work is undertaken in three projects. The first uses a machine learning algorithm, the particle filter, to augment a simple ABM in the presence of continuous disease prevalence data from the modelled system. It is shown that using the particle filter improves the accuracy of the ABM, although, when compared with previous work using SD with a particle filter, the ABM has some limitations, which are discussed. The second presents a model design pattern that focuses on scalability and modularity to improve the development time, testability, and flexibility of a dynamic simulation for tobacco smoking. This method also supports a general pattern of constructing hybrid models --- those that contain elements of multiple methods, such as agent-based and system dynamics modelling. The method is demonstrated with a stylized example of tobacco smoking in a human population. The final line of work implements this modular design pattern, with differing mechanisms of addiction dynamics, within a rich behavioural model of tobacco purchasing and consumption. It integrates the results from a discrete choice experiment, a widely used economic method for studying human preferences. It compares and contrasts four independent addiction modules under different population assumptions. A number of important insights are discussed: no single module was universally more accurate across all human subpopulations, demonstrating the benefit of exploring a diversity of approaches; increasing the number of parameters does not necessarily improve a module's predictions, since the overall least accurate module had the second highest number of parameters; and slight changes in module structure can lead to drastic improvements, implying the need to be able to iteratively learn from model behaviour.
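    One way to realize the modular design pattern described above is to hide each addiction mechanism behind a common interface, so that modules can be swapped, compared, or combined into hybrid models without touching the agent code. The following is a hypothetical sketch of such a pattern; the class and attribute names are invented for illustration:

        from typing import Protocol

        class AddictionModule(Protocol):
            """Common interface: any module can be swapped in without
            changing the agent model that hosts it."""
            def update(self, agent: "Smoker", dt: float) -> None: ...

        class ThresholdAddiction:
            """Toy module: craving builds during abstinence and is
            partially relieved by smoking."""
            def __init__(self, build: float = 0.1, relief: float = 0.4):
                self.build, self.relief = build, relief

            def update(self, agent: "Smoker", dt: float) -> None:
                if agent.smoked_today:
                    agent.craving = max(0.0, agent.craving - self.relief)
                else:
                    agent.craving += self.build * dt

        class Smoker:
            def __init__(self, module: AddictionModule):
                self.module = module      # injected, so modules are interchangeable
                self.craving, self.smoked_today = 0.0, False

            def step(self, dt: float = 1.0) -> None:
                self.smoked_today = self.craving > 0.5
                self.module.update(self, dt)

        agent = Smoker(ThresholdAddiction())
        for _ in range(30):
            agent.step()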

    Architectures and GPU-Based Parallelization for Online Bayesian Computational Statistics and Dynamic Modeling

    Recent work demonstrates that coupling Bayesian computational statistics methods with dynamic models can facilitate the analysis of complex systems associated with diverse time series, including those involving social and behavioural dynamics. Particle Markov Chain Monte Carlo (PMCMC) methods constitute a particularly powerful class of Bayesian methods combining aspects of batch Markov Chain Monte Carlo (MCMC) and the sequential Monte Carlo method of Particle Filtering (PF). PMCMC can flexibly combine theory-capturing dynamic models with diverse empirical data. Online machine learning is a subcategory of machine learning algorithms characterized by sequential, incremental execution as new data arrive, which can yield updated results and predictions from a growing sequence of available incoming data. While many machine learning and statistical methods have been adapted to online algorithms, PMCMC is one of the many methods whose compatibility with, and adaptation to, online learning remains unclear. In this thesis, I proposed a data-streaming solution supporting PF and PMCMC methods with dynamic epidemiological models and demonstrated several successful applications. By constructing an automated, easy-to-use streaming system, analytic applications and simulation models gain access to arriving real-time data, shortening the time gap between data and resulting model-supported insight. The well-defined architecture design emerging from the thesis would substantially expand traditional simulation models' potential by allowing such models to be offered as continually updated services. Contingent on sufficiently fast execution time, simulation models within this framework can consume incoming empirical data in real time and generate informative predictions on an ongoing basis as new data points arrive. In a second line of work, I investigated the platform's flexibility and capability by extending the system to support a powerful class of PMCMC algorithms with dynamic models while ameliorating such algorithms' traditionally severe performance limitations. Specifically, this work designed and implemented a GPU-enabled parallel version of a PMCMC method with dynamic simulation models. The resulting codebase has readily enabled researchers to adapt their models to state-of-the-art statistical inference methods and to ensure that the computation-heavy PMCMC method can perform significant sampling between the successive arrivals of new data points. Investigating this method's impact with several realistic PMCMC application examples showed that GPU-based acceleration allows for up to a 160x speedup compared to a corresponding CPU-based version not exploiting parallelism. The GPU-accelerated PMCMC and the streaming processing system complement each other, jointly providing researchers with a powerful toolset to greatly accelerate learning and secure additional insight from the high-velocity data increasingly prevalent within social and behavioural spheres. The design philosophy applied supports a platform with broad generalizability and potential for ready future extension. The thesis discusses common barriers and difficulties in designing and implementing such systems and offers solutions to overcome or mitigate them.
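    For orientation, the particle marginal Metropolis-Hastings variant of PMCMC wraps a particle filter inside an MCMC loop, using the filter's unbiased marginal-likelihood estimate to accept or reject parameter proposals; the GPU work described above parallelizes the inner filter. Below is a minimal serial sketch, assuming a hypothetical `particle_filter_loglik(theta, data)` that runs a particle filter and returns its log marginal likelihood estimate:

        import numpy as np

        rng = np.random.default_rng(2)

        def pmmh(particle_filter_loglik, theta0, data, n_iters=1000, step=0.05):
            """Particle marginal Metropolis-Hastings over parameters theta
            (flat prior and Gaussian random-walk proposal, for brevity)."""
            theta = np.asarray(theta0, dtype=float)
            ll = particle_filter_loglik(theta, data)   # noisy but unbiased
            chain = []
            for _ in range(n_iters):
                prop = theta + step * rng.normal(size=theta.shape)
                ll_prop = particle_filter_loglik(prop, data)
                # Standard MH acceptance using the estimated likelihoods.
                if np.log(rng.random()) < ll_prop - ll:
                    theta, ll = prop, ll_prop
                chain.append(theta.copy())
            return np.array(chain)

    Because every proposal requires a full particle filter pass over the data, the inner filter dominates the cost, which is why parallelizing it on a GPU pays off.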

    Incorporating Particle Filtering and System Dynamic Modelling in Infection Transmission of Measles and Pertussis

    Childhood viral and bacterial infections remain an important public health problem, and research into their dynamics has broader scientific implications for understanding both dynamical systems and associated methodologies at the population level. Measles and pertussis are two important childhood infectious diseases. Measles is a highly transmissible disease and is one of the leading causes of death among children under 5 globally. Pertussis (whooping cough) is another common childhood infectious disease, which is most harmful for babies and young children and can be deadly. While the use of ongoing surveillance data and - recently - dynamic models offers insight into measles (or pertussis) dynamics, both suffer notable shortcomings when applied to measles (or pertussis) outbreak prediction. In this thesis, I apply the Sequential Monte Carlo approach of particle filtering, incorporating reported measles and pertussis incidence for Saskatchewan during the pre-vaccination era, using adaptations of previously contributed measles and pertussis compartmental models. To secure further insight, I also perform particle filtering on age-structured adaptations of the models. For some models, I further consider two different methods of configuring the contact matrix. The results indicate that, when used with a suitable dynamic model, particle filtering can offer high predictive capacity for measles and pertussis dynamics and outbreak occurrence in a low-vaccination context. Based on the most competitive model as evaluated by predictive accuracy, I performed prediction and outbreak classification analysis. The prediction results demonstrated that the most competitive models could predict measles and pertussis outbreak patterns and classify whether there will be an outbreak in the next month (area under the ROC curve: 0.89 for measles, 0.91 for pertussis). I conclude that anticipating the outbreak dynamics of measles and pertussis in low-vaccination regions by applying particle filtering with simple measles and pertussis transmission models, incorporating time series of reported case counts, is a valuable technique to assist public health authorities in estimating the risk and magnitude of measles and pertussis outbreaks. Such an approach offers a particularly strong value proposition for other pathogens with little-known dynamics, important latent drivers, and in the context of the growing number of high-velocity electronic data sources. Strong additional benefits are also likely to be realized from extending the application of this technique to highly vaccinated populations.
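    A minimal sketch of how such outbreak classification can be scored: project each filtered particle a month ahead, take the fraction of particles exceeding an outbreak threshold as the predicted outbreak probability, and evaluate against observed labels with the ROC AUC. The projection function, threshold, and labels below are toy assumptions, not the thesis's models:

        import numpy as np
        from sklearn.metrics import roc_auc_score

        def outbreak_probability(particles, project_month, threshold):
            """Fraction of particles whose projected next-month incidence
            exceeds the outbreak threshold."""
            projected = np.array([project_month(p) for p in particles])
            return float((projected > threshold).mean())

        rng = np.random.default_rng(3)
        project = lambda p: p * rng.lognormal(0.0, 0.3)   # toy forward projection

        # Each ensemble holds 200 particles of current incidence; labels mark
        # months in which an outbreak was actually observed.
        months = [rng.normal(m, 8, size=200) for m in (5, 8, 30, 40, 6, 35)]
        labels = [0, 0, 1, 1, 0, 1]
        probs = [outbreak_probability(m, project, threshold=20) for m in months]
        print("AUC:", roc_auc_score(labels, probs))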

    Transmission Modeling with Smartphone-based Sensing

    Infectious disease spread is difficult to accurately measure and model. Even for well-studied pathogens, uncertainties remain regarding the dynamics of mixing behavior and how to balance simulation-generated estimates with empirical data. Smartphone-based sensing data promises the availability of inferred proximate contacts, with which we can improve transmission models. This dissertation addresses the problem of informing transmission models with proximity contact data by breaking it down into three sub-questions. Firstly, can proximity contact data inform transmission models? To this question, an extended-Kalman-filter-enhanced System Dynamics Susceptible-Infectious-Removed (EKF-SD-SIR) model demonstrated the filtering approach, as a framework, for informing System Dynamics models with proximity contact data. This combination results in recurrently re-grounded system status as empirical data arrive throughout disease transmission simulations---simultaneously accounting for empirical data accuracy, growing simulation error between measurements, and supporting estimation of changing model parameters. However, as revealed by this investigation, the filtering approach is limited by the quality and reliability of sensing-informed proximate contacts, which leads to the dissertation's second and third questions---investigating the impact of the temporal and spatial resolution of sensing-inferred proximity contact data on transmission models. GPS co-location and Bluetooth beaconing are two common measurement modalities for sensing proximity contacts, with different underlying technologies and tradeoffs. However, both measurement modalities have shortcomings and are prone to false positives or negatives when used to detect proximate contacts, because unmeasured environmental influences bias the data. Will differences in sensing modalities impact transmission models informed by proximity contact data? The second part of this dissertation compares GPS- and Bluetooth-inferred proximate contacts by assessing their impact on simulated attack rates in corresponding proximate-contact-informed agent-based Susceptible-Exposed-Infectious-Recovered (ABM-SEIR) models of four distinct contagious diseases. Results show that the inferred proximate contacts resulting from these two measurement modalities are different and give rise to significantly different attack rates across multiple data collections and pathogens. While the advent of commodity mobile devices has eased the collection of proximity contact data, battery capacity and associated costs impose tradeoffs between the frequency and scanning duration used for proximate-contact detection. The choice of a balanced sensing regime involves specifying temporal resolutions and interpreting sensing data---depending on circumstances such as the characteristics of a particular pathogen, accompanying disease, and underlying population. How will the temporal resolution of sensing impact transmission models informed by proximity contact data? Furthermore, how will circumstances alter the impact of temporal resolution? The third part of this dissertation investigates the impacts of sensing regimes on findings from two sampling methods of sensing at widely varying inter-observation intervals, by synthetically downsampling proximity contact data from five contact network studies---each of which measured participant-participant contact every 5 minutes for durations of four or more weeks. The impact of downsampling is evaluated through ABM-SEIR simulations at both the population and individual level for 12 distinct contagious diseases and associated variants of concern. Studies in this part find that for epidemiological models employing proximity contact data, both the observation paradigm and the inter-observation interval used to collect proximity contact data affect the simulation results. Moreover, the impact depends on population characteristics and on pathogen infectiousness (as reflected, for example, in the basic reproduction number, R₀). By comparing the performance of the two sampling methods, we found that in most cases, periodically observing for a certain duration can collect proximity contact data that allows agent-based models to produce a reasonable estimate of the attack rate; however, higher-resolution data are preferred for modeling individual infection risk. Findings from this part of the dissertation represent a step towards providing the empirical basis for guidelines to inform data collection that is at once efficient and effective. This dissertation addresses the problem of informing transmission models with proximity contact data in three steps. Firstly, the demonstration of an EKF-SD-SIR model suggests that the filtering approach can improve System Dynamics transmission models by leveraging proximity contact data. In addition, experiments with the EKF-SD-SIR model also revealed that the filtering approach is constrained by the limited quality and reliability of sensing-data-inferred proximate contacts. The following two parts of the dissertation investigate spatial-temporal factors that can affect the quality and reliability of sensor-collected proximity contact data. In the second step, the impact of spatial resolution is illustrated by differences between two typical sensing modalities---Bluetooth beaconing versus GPS co-location. Experiments show that, in general, proximity contact data collected with Bluetooth beaconing lead to transmission models with results different from those driven by proximity contact data collected with GPS co-location. Awareness of the differences between sensing modalities can aid researchers in incorporating proximity contact data into transmission models. Finally, in the third step, the impact of temporal resolution is elucidated by investigating differences between the results of transmission models driven by proximity contact data collected at varying observation frequencies. These differences are evaluated under alternative assumptions regarding sampling method, disease/pathogen type, and underlying population. Experiments show that the impact of sensing regimes is influenced by the type of disease/pathogen and the underlying population, while occasional periodic sampling can be a reasonable choice across all situations. This dissertation demonstrated the value of a filtering approach for enhancing transmission models with sensor-collected proximity contact data, and explored spatial-temporal factors that impact the accuracy and reliability of such data. Furthermore, this dissertation offers guidance for future sensor-based proximity contact data collection and highlights needs and opportunities for further research on sensing-inferred proximity contact data for transmission models.
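    A minimal sketch of the synthetic downsampling idea: given contact events recorded at the studies' 5-minute base resolution, retain only those falling inside periodic observation windows, mimicking a duty-cycled sensing regime. The window and interval values below are illustrative assumptions:

        import numpy as np

        BASE_RES_S = 5 * 60   # source studies sampled contacts every 5 minutes

        def downsample(events, scan_s, interval_s):
            """Keep contact events observed during periodic scans lasting
            scan_s seconds, taken once every interval_s seconds."""
            t = np.array([e[0] for e in events])
            keep = (t % interval_s) < scan_s
            return [e for e, k in zip(events, keep) if k]

        # events: (timestamp_s, person_a, person_b) contact records over one day
        events = [(t, "a", "b") for t in range(0, 24 * 3600, BASE_RES_S)]
        sparse = downsample(events, scan_s=5 * 60, interval_s=60 * 60)
        print(f"{len(sparse)}/{len(events)} contacts retained")  # 24/288

    The downsampled event stream then feeds the same ABM-SEIR simulation as the full-resolution data, so attack-rate estimates under each sensing regime can be compared directly.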