119 research outputs found

    Iterative Reclassification in Agglomerative Clustering

    Adaptive sequential Monte Carlo for multiple changepoint analysis

    Process monitoring and control requires detection of structural changes in a data stream in real time. This article introduces an efficient sequential Monte Carlo algorithm designed for learning unknown changepoints in continuous time. The method is intuitively simple: new changepoints for the latest window of data are proposed by conditioning only on data observed since the most recent estimated changepoint, as these observations carry most of the information about the current state of the process. The proposed method shows improved performance over the current state of the art. Another advantage of the proposed algorithm is that it can be made adaptive, varying the number of particles according to the apparent local complexity of the target changepoint probability distribution. This saves valuable computing time when changes in the changepoint distribution are negligible, and enables re-balancing of the importance weights of existing particles when a significant change in the target distribution is encountered. The plain and adaptive versions of the method are illustrated using the canonical continuous time changepoint problem of inferring the intensity of an inhomogeneous Poisson process, although the method is generally applicable to any changepoint problem. Performance is demonstrated using both conjugate and non-conjugate Bayesian models for the intensity. Appendices to the article are available online, illustrating the method on other models and applications
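
    The conjugate case mentioned in the abstract can be illustrated with a minimal sketch, assuming a Gamma(shape a, rate b) prior on a constant intensity within one segment. The function names and prior parameterization are illustrative assumptions, not taken from the article, and the sketch covers only the per-segment computation, not the sequential Monte Carlo sampler itself.

```python
import math

def gamma_poisson_posterior(a, b, n_events, duration):
    """Posterior (shape, rate) of a constant Poisson intensity in one segment,
    given a Gamma(a, b) prior and n_events observed over `duration`."""
    return a + n_events, b + duration

def log_marginal_likelihood(a, b, n_events, duration):
    """Log marginal likelihood of the event times in the segment.
    The likelihood lambda^n * exp(-lambda * T) is integrated against the
    Gamma(a, b) prior, giving b^a / Gamma(a) * Gamma(a + n) / (b + T)^(a + n)."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + n_events)
            - (a + n_events) * math.log(b + duration))

# Hypothetical segment: 9 events in 2 time units under a Gamma(1, 1) prior.
shape, rate = gamma_poisson_posterior(1.0, 1.0, 9, 2.0)
posterior_mean = shape / rate
```

    Because the segment's marginal likelihood is available in closed form here, candidate changepoint positions could in principle be compared without sampling the intensity; a non-conjugate model would not offer this shortcut.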

    Network-wide anomaly detection via the Dirichlet process

    Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory
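
    The conjugacy of the Dirichlet process can be sketched as follows for a single node's observed connections. This is a hedged illustration of the standard Dirichlet process predictive rule, with hypothetical names (`dp_predictive`, `alpha`, `base_prob`) and made-up host labels; it is not the article's full scoring procedure.

```python
from collections import Counter

def dp_predictive(history, candidate, alpha, base_prob):
    """Predictive probability that the next observation equals `candidate`,
    under a Dirichlet process prior with concentration `alpha` and base
    measure assigning probability `base_prob` to `candidate`."""
    counts = Counter(history)
    n = len(history)
    return (alpha * base_prob + counts[candidate]) / (alpha + n)

# Hypothetical connection history for one node (labels are assumptions).
history = ["hostA", "hostA", "hostA", "hostB"]
p_seen = dp_predictive(history, "hostA", alpha=1.0, base_prob=0.25)
p_new = dp_predictive(history, "hostC", alpha=1.0, base_prob=0.25)
```

    A connection never seen before receives a small but non-zero probability, which is what allows rare or new connections to be scored as surprising rather than impossible.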

    Mutually exciting point process graphs for modelling dynamic networks

    A new class of models for dynamic networks is proposed, called mutually exciting point process graphs (MEG). MEG is a scalable network-wide statistical model for point processes with dyadic marks, which can be used for anomaly detection when assessing the significance of future events, including previously unobserved connections between nodes. The model combines mutually exciting point processes to estimate dependencies between events and latent space models to infer relationships between the nodes. The intensity functions for each network edge are characterized exclusively by node-specific parameters, which allows information to be shared across the network. This construction enables estimation of intensities even for unobserved edges, which is particularly important in real-world applications, such as computer networks arising in cyber-security. A recursive form of the log-likelihood function for MEG is obtained, which is used to derive fast inferential procedures via modern gradient ascent algorithms. An alternative EM algorithm is also derived. The model and algorithms are tested on simulated graphs and real-world datasets, demonstrating excellent performance. Supplementary materials for this article are available online.
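
    The idea of a recursive log-likelihood can be illustrated on a much simpler model than MEG: a univariate self-exciting (Hawkes) process with an exponential kernel. The sketch below is an assumption-laden stand-in, not the MEG likelihood; the function name and parameters (`mu`, `alpha`, `beta`) are hypothetical.

```python
import math

def hawkes_loglik(times, horizon, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process observed on [0, horizon],
    with intensity lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j)).
    `times` must be sorted in increasing order."""
    loglik = -mu * horizon
    a = 0.0      # a = sum over earlier events of exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Recursive update: avoids re-summing over all earlier events.
            a = math.exp(-beta * (t - prev)) * (1.0 + a)
        loglik += math.log(mu + alpha * a)
        # Integral of this event's kernel over (t, horizon].
        loglik -= (alpha / beta) * (1.0 - math.exp(-beta * (horizon - t)))
        prev = t
    return loglik

# Hypothetical event times and parameters.
value = hawkes_loglik([0.5, 1.0, 2.5], horizon=3.0, mu=0.4, alpha=0.6, beta=1.2)
```

    The recursion brings the cost down from quadratic to linear in the number of events, which is the kind of saving that makes gradient-based fitting practical on large event streams.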

    Standardized partial sums and products of p-values

    In meta-analysis, a diverse range of methods for combining multiple p-values have been applied throughout the scientific literature. For sparse signals where only a small proportion of the p-values are truly significant, a technique called higher criticism has previously been shown to have asymptotic consistency and more power than Fisher’s original method. However, higher criticism and other related methods can still lack power. Three new, simple-to-compute statistics are now proposed for detecting sparse signals, based on standardizing partial sums or products of p-value order statistics. The use of standardization is theoretically justified with results demonstrating asymptotic normality, and avoids the computational difficulties encountered when working with analytic forms of the distributions of the partial sums and products. In particular, the standardized partial product demonstrates more power than existing methods for both the standard Gaussian mixture model and a real data example from computer network modeling.
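
    For reference, Fisher's original method, the baseline named in the abstract, can be sketched in a few lines. This shows only the baseline, not the new standardized statistics, and it assumes independent p-values; the function name is illustrative. Under the null, minus twice the sum of the log p-values follows a chi-square distribution with 2n degrees of freedom, whose tail probability has a closed form for even degrees of freedom.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.
    The statistic -2 * sum(log p) is chi-square with 2n degrees of freedom
    under the null; its upper tail is exp(-x/2) * sum_{k<n} (x/2)^k / k!."""
    n = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # statistic divided by 2
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total

# Hypothetical input: two moderately small p-values.
combined = fisher_combined_pvalue([0.05, 0.05])
```

    Every p-value contributes to Fisher's statistic, which is one reason methods that focus on the smallest order statistics can do better when only a few signals are present.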

    Changepoint detection in non-exchangeable data

    Changepoint models typically assume the data within each segment are independent and identically distributed conditional on some parameters that change across segments. This construction may be inadequate when data are subject to local correlation patterns, often resulting in many more changepoints fitted than preferable. This article proposes a Bayesian changepoint model that relaxes the assumption of exchangeability within segments. The proposed model supposes data within a segment are m-dependent for some unknown m⩾0 that may vary between segments, resulting in a model suitable for detecting clear discontinuities in data that are subject to different local temporal correlations. The approach is suited to both continuous and discrete data. A novel reversible jump Markov chain Monte Carlo algorithm is proposed to sample from the model; in particular, a detailed analysis of the parameter space is exploited to build proposals for the orders of dependence. Two applications demonstrate the benefits of the proposed model: computer network monitoring via change detection in count data, and segmentation of financial time series

    Measurements of Low Temperature Rate Coefficients for the Reaction of CH with CH₂O and Application to Dark Cloud and AGB Stellar Wind Models

    Rate coefficients have been measured for the reaction of CH radicals with formaldehyde, CH₂O, over the temperature range of 31–133 K using a pulsed Laval nozzle apparatus combined with pulsed laser photolysis and laser-induced fluorescence spectroscopy. The rate coefficients are very large and display a distinct decrease with decreasing temperature below 70 K, although classical collision rate theory fails to reproduce this temperature dependence. The measured rate coefficients have been parameterized and used as input for astrochemical models for both dark cloud and Asymptotic Giant Branch stellar outflow scenarios. The models predict a distinct change (up to a factor of two) in the abundance of ketene, H₂CCO, which is the major expected molecular product of the CH + CH₂O reaction

    Clades and clans: a comparison study of two evolutionary models

    The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model are two binary tree generating models that are widely used in evolutionary biology. Understanding the distributions of clade sizes under these two models provides valuable insights into macro-evolutionary processes, and is important in hypothesis testing and Bayesian analyses in phylogenetics. Here we show that these distributions are log-convex, which implies that very large clades or very small clades are more likely to occur under these two models. Moreover, we prove that there exists a critical value κ(n) for each n⩾4 such that for a given clade with size k, the probability that this clade is contained in a random tree with n leaves generated under the YHK model is higher than that under the PDA model if 1<k<κ(n), and lower if κ(n)<k<n. Finally, we extend our results to binary unrooted trees, and obtain similar results for the distributions of clan sizes. Comment: 21 pages

    Low temperature gas phase reaction rate coefficient measurements: Toward modeling of stellar winds and the interstellar medium

    Stellar winds of Asymptotic Giant Branch (AGB) stars are responsible for the production of ∼85% of the gas molecules in the interstellar medium (ISM), and yet very few of the gas phase rate coefficients under the relevant conditions (10–3000 K) needed to model the rate of production and loss of these molecules in stellar winds have been experimentally measured. If measured at all, the value of the rate coefficient has often only been obtained at room temperature, with extrapolation to lower and higher temperatures using the Arrhenius equation. However, non-Arrhenius behavior has often been observed in the few rate coefficients measured at low temperatures. For reactions studied previously, theoretical simulations of the formation of long-lived pre-reaction complexes and quantum mechanical tunneling through the barrier to reaction have been used to fit this non-Arrhenius behavior of the rate coefficients. Reaction rate coefficients that were predicted to produce the largest change in the production/loss of Complex Organic Molecules (COMs) in stellar winds at low temperatures were selected from a sensitivity analysis. Here we present measurements of rate coefficients using a pulsed Laval nozzle apparatus with the Pulsed Laser Photolysis - Laser Induced Fluorescence (PLP-LIF) technique. Gas flow temperatures between 30 and 134 K have been produced by the University of Leeds apparatus through the controlled expansion of N₂ or Ar gas through Laval nozzles of a range of Mach numbers between 2.49 and 4.25. Reactions of interest include those of OH, CN, and CH with volatile organic species, in particular formaldehyde, a molecule which has been detected in the ISM. Kinetics measurements of these reactions at low temperatures will be presented using the decay of the radical reagent.
    Since formaldehyde and the formyl radical (HCO) are potential building blocks of COMs in the interstellar medium, low temperature reaction rate coefficients for their production and loss can help to predict the formation pathways of COMs observed in the interstellar medium.
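
    The Arrhenius extrapolation mentioned in the abstract can be sketched as follows. The pre-exponential factor and activation energy are hypothetical placeholders, not measured values from this work; the point is only to show how strongly a room-temperature fit with a positive barrier predicts the rate coefficient to fall at the low temperatures of interest.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

def arrhenius(pre_exponential, activation_energy, temperature):
    """Arrhenius rate coefficient k(T) = A * exp(-Ea / (R * T)).
    activation_energy is in J mol^-1, temperature in K; k has the units of A."""
    return pre_exponential * math.exp(-activation_energy / (R * temperature))

# Hypothetical parameters (assumptions for illustration only).
A = 1.0e-11   # cm^3 molecule^-1 s^-1
Ea = 2000.0   # J mol^-1
k_room = arrhenius(A, Ea, 298.0)
k_cold = arrhenius(A, Ea, 30.0)
ratio = k_cold / k_room
```

    With any positive barrier the extrapolated value at 30 K is far below the room-temperature value, so a reaction whose measured rate coefficient stays large at low temperature cannot be described by such an extrapolation.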