37 research outputs found

    Simulation of improper complex-valued sequences


    Disassortativity of computer networks

    Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortativity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

    Network-wide anomaly detection via the Dirichlet process

    Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory.
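
The conjugacy mentioned in this abstract can be illustrated with a minimal sketch. It is not the article's implementation: the function names, the uniform base measure, the concentration parameter `alpha` and the toy sender counts are all assumptions made for illustration. For one recipient, the Dirichlet process posterior predictive probability of the next sender `x` is `(alpha * base[x] + count[x]) / (alpha + n)`, and a simple surprise score sums the predictive mass of all senders no more likely than the observed one.

```python
from collections import Counter


def dp_predictive(counts, base, alpha, x):
    """Posterior predictive probability that the next sender is x,
    under a Dirichlet process prior with base measure `base` and
    concentration `alpha` (both hypothetical choices)."""
    n = sum(counts.values())
    return (alpha * base[x] + counts.get(x, 0)) / (alpha + n)


def surprise_p_value(counts, base, alpha, x):
    """Total predictive mass on senders that are no more likely than x.
    Small values flag connections that are unusual for this recipient."""
    px = dp_predictive(counts, base, alpha, x)
    probs = (dp_predictive(counts, base, alpha, s) for s in base)
    return sum(p for p in probs if p <= px + 1e-12)


# Toy data (assumed): four possible senders, uniform base measure,
# and a recipient that has so far only heard from "a" and "b".
senders = ["a", "b", "c", "d"]
base = {s: 1.0 / len(senders) for s in senders}
history = Counter({"a": 8, "b": 2})
alpha = 1.0

for s in senders:
    print(s, dp_predictive(history, base, alpha, s),
          surprise_p_value(history, base, alpha, s))
```

A sender never seen before, such as "c", receives a small but non-zero predictive probability, so it scores as surprising without being treated as impossible.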

    Kinematics of complex-valued time series


    On Testing for Impropriety of Complex-Valued Gaussian Vectors


    Choosing between methods of combining p-values

    Combining p-values from independent statistical tests is a popular approach to meta-analysis, particularly when the data underlying the tests are either no longer available or are difficult to combine. A diverse range of p-value combination methods appear in the literature, each with different statistical properties. Yet all too often the final choice used in a meta-analysis can appear arbitrary, as if all effort has been expended building the models that gave rise to the p-values. Birnbaum (1954) showed that any reasonable p-value combiner must be optimal against some alternative hypothesis. Starting from this perspective and recasting each method of combining p-values as a likelihood ratio test, we present theoretical results for some of the standard combiners which provide guidance about how a powerful combiner might be chosen in practice.
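
As a rough illustration of why the choice of combiner matters, the sketch below implements two widely used combiners, Fisher's method and Stouffer's method, using only the Python standard library. The abstract does not say which combiners the paper studies, so this selection, the function names and the example p-values are assumptions. Fisher's statistic is `-2 * sum(log p_i)`, which is chi-squared with `2k` degrees of freedom under the null; Stouffer's method averages the corresponding normal scores.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, k):
    """Survival function of a chi-squared variable with 2k degrees of
    freedom, via the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Fisher's method: combined p-value from independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even(stat, len(p_values))


def stouffer_combine(p_values):
    """Stouffer's method: sum the normal scores, rescale, convert back."""
    std_normal = NormalDist()
    z = sum(std_normal.inv_cdf(1.0 - p) for p in p_values)
    z /= math.sqrt(len(p_values))
    return 1.0 - std_normal.cdf(z)


# Hypothetical inputs: one strong signal among otherwise null results.
p_values = [0.01, 0.9]
print("Fisher:  ", fisher_combine(p_values))
print("Stouffer:", stouffer_combine(p_values))
```

On inputs like these, with one very small p-value and one large one, Fisher's method returns the smaller combined p-value, while Stouffer's method is pulled towards the null by the large p-value. This is one concrete sense in which each combiner favours a different kind of alternative.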

    Choosing between methods of combining p-values


    Objective quantification of nanoscale protein distributions

    Nanoscale distribution of molecules within small subcellular compartments of neurons critically influences their functional roles. Although numerous ways of analyzing the spatial arrangement of proteins have been described, a thorough comparison of their effectiveness is missing. Here we present an open source software, GoldExt, with a plethora of measures for quantification of the nanoscale distribution of proteins in subcellular compartments (e.g. synapses) of nerve cells. First, we compared the ability of five different measures to distinguish artificial uniform and clustered patterns from random point patterns. Then, the performance of a set of clustering algorithms was evaluated on simulated datasets with a predefined number of clusters. Finally, we applied the best performing methods to experimental data, and analyzed the nanoscale distribution of different pre- and postsynaptic proteins, revealing random, uniform and clustered sub-synaptic distribution patterns. Our results reveal that application of a single measure is sufficient to distinguish between different distributions.

    Long memory estimation for complex-valued time series

    Long memory has been observed for time series across a multitude of fields and the accurate estimation of such dependence, e.g. via the Hurst exponent, is crucial for the modelling and prediction of many dynamic systems of interest. Many physical processes (such as wind data) are more naturally expressed as a complex-valued time series to represent magnitude and phase information (wind speed and direction). With data collection ubiquitously unreliable, irregular sampling or missingness is also commonplace and can cause bias in a range of analysis tasks, including Hurst estimation. This article proposes a new Hurst exponent estimation technique for complex-valued persistent data sampled with potential irregularity. Our approach is justified through establishing attractive theoretical properties of a new complex-valued wavelet lifting transform, also introduced in this paper. We demonstrate the accuracy of the proposed estimation method through simulations across a range of sampling scenarios and complex- and real-valued persistent processes. For wind data, our method highlights that inclusion of the intrinsic correlations between the real and imaginary data, inherent in our complex-valued approach, can produce different persistence estimates than when using real-valued analysis. Such analysis could then support alternative modelling or policy decisions compared with conclusions based on real-valued estimation.

    Meta-analysis of mid-p-values: some new results based on the convex order

    The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. The property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.
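
To make the distinction concrete, here is a minimal sketch that computes both quantities for an upper-tailed test with a binomial test statistic. The binomial setting, the function names and the numbers are assumptions chosen for illustration, not taken from the paper. The ordinary p-value is `P(T >= t)`, while the mid-p-value counts only half of the probability of the observed value: `P(T > t) + 0.5 * P(T = t)`.

```python
from math import comb


def binom_pmf(k, n, q):
    """Probability that a Binomial(n, q) variable equals k."""
    return comb(n, k) * q**k * (1.0 - q) ** (n - k)


def upper_tail_p_values(t, n, q):
    """Return (ordinary p-value, mid-p-value) for observing T = t
    when T ~ Binomial(n, q) under the null and large T is extreme."""
    greater = sum(binom_pmf(k, n, q) for k in range(t + 1, n + 1))
    equal = binom_pmf(t, n, q)
    return greater + equal, greater + 0.5 * equal


# Hypothetical example: 9 successes in 10 trials, null success rate 0.5.
n, q = 10, 0.5
p, mid_p = upper_tail_p_values(9, n, q)
print("ordinary p-value:", p)
print("mid-p-value:     ", mid_p)

# Under the null, the mid-p-value has mean exactly 1/2, like a uniform
# variable, whereas the ordinary p-value has mean above 1/2.
mean_mid = sum(binom_pmf(t, n, q) * upper_tail_p_values(t, n, q)[1]
               for t in range(n + 1))
mean_p = sum(binom_pmf(t, n, q) * upper_tail_p_values(t, n, q)[0]
             for t in range(n + 1))
print("null mean of mid-p:", mean_mid, " null mean of p:", mean_p)
```

The mid-p-value is never larger than the ordinary p-value, which is the source of the extra power. The matching null mean of 1/2 is consistent with the uniform distribution dominating it in the convex order, since that order compares distributions with equal means.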