26 research outputs found

    Simulating High-Dimensional Multivariate Data using the bigsimr R Package

    It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages empower high-dimensional random vector simulation with arbitrary marginal distributions and dependency via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphical-processing-unit-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach, up to d = 10,000. We describe example workflows and apply them to a high-dimensional data set -- RNA-sequencing data obtained from breast cancer tumor samples. Comment: 22 pages, 10 figures, https://cran.r-project.org/web/packages/bigsimr/index.htm
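
The idea of simulating correlated data with arbitrary marginals and a target Spearman correlation can be illustrated with a two-variable Gaussian-copula sketch. This is a generic textbook construction written in plain Python, not the bigsimr API, and it is an assumption that it resembles what the package does internally. All function names (`spearman_to_pearson`, `simulate_pair`, `exponential_inv_cdf`, `pareto_inv_cdf`, `spearman`) are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def spearman_to_pearson(rho_s):
    """Latent Gaussian correlation that yields Spearman correlation rho_s."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def exponential_inv_cdf(rate):
    """Quantile function of an exponential distribution (illustrative marginal)."""
    return lambda u: -math.log1p(-u) / rate


def pareto_inv_cdf(x_min, alpha):
    """Quantile function of a Pareto distribution (illustrative marginal)."""
    return lambda u: x_min * (1.0 - u) ** (-1.0 / alpha)


def simulate_pair(n, rho_s, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n pairs whose marginals follow the given quantile functions and
    whose Spearman correlation is approximately rho_s."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rho = spearman_to_pearson(rho_s)
    eps = 1e-12  # keep uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Normal CDF maps to uniforms; quantile functions map to the marginals.
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


def _ranks(values):
    """Zero-based ranks; ties are not handled (continuous data assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def spearman(pairs):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    rx = _ranks([x for x, _ in pairs])
    ry = _ranks([y for _, y in pairs])
    mean = (len(pairs) - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


if __name__ == "__main__":
    sample = simulate_pair(5000, 0.6, exponential_inv_cdf(1.5), pareto_inv_cdf(1.0, 3.0))
    print(round(spearman(sample), 3))
```

Because Spearman correlation is invariant under monotone transformations, the target only has to be matched on the latent Gaussian scale. A Pearson target would need a marginal-dependent adjustment that this sketch does not attempt.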

    Machine learning method for the classification of the state of living organisms’ oscillations

    The World Health Organization highlights the urgent need to address the global threat posed by antibiotic-resistant bacteria. Efficient and rapid detection of bacterial response to antibiotics and their virulence state is crucial for the effective treatment of bacterial infections. However, current methods for investigating bacterial antibiotic response and metabolic state are time-consuming and lack accuracy. To address these limitations, we propose a novel method for classifying bacterial virulence based on statistical analysis of nanomotion recordings. We demonstrated the method by classifying living Bordetella pertussis bacteria in the virulent or avirulent phase, and dead bacteria, based on their cellular nanomotion signal. Our method offers significant advantages over current approaches, as it is faster and more accurate. Additionally, its versatility allows for the analysis of cellular nanomotion in various applications beyond bacterial virulence classification.

    Weak Limits for Multivariate Random Sums

    Let {X_i, i ≥ 1} be a sequence of i.i.d. random vectors in R^d, and let ν_p, 0 … Keywords: random sum, stable law, heavy-tailed distribution, geometric stable distribution, Linnik distribution, tail probability, mixture.

    On moments and tail behavior of ν-stable random variables

    In this paper a class of limiting probability distributions of normalized sums of a random number of i.i.d. random variables is considered. The representation of such distributions via stable laws and asymptotic behavior of their moments and tail probabilities are established. Keywords: random summation, tail behavior of a distribution, stable law, geometric stable law.

    Stochastic modeling of regime shifts

    Probabilistic methods for modeling the distribution of regimes and their shifts over time are developed by drawing on statistical decision and limit theory of random sums. Multi-annual episodes of opposite sign are graphically and numerically represented by their duration, magnitude, and intensity. Duration is defined as the number of consecutive years above or below a reference line, magnitude is the sum of time series values for any given duration, and intensity is the ratio between magnitude and duration. Assuming that a regime shift can occur every year, independently of prior years, the waiting times for the regime shift (or regime duration) are naturally modeled by a geometric distribution. Because magnitude can be expressed as a random sum of N random variables (where N is duration), its probability distribution is mathematically derived and can be statistically tested. Here we analyze a reconstructed time series of the Pacific Decadal Oscillation (PDO), explicitly describe the geometric, exponential, and Laplace probability distributions for regime duration and magnitude, and estimate parameters from the data, obtaining a reasonably good fit. This stochastic approach to modeling duration and magnitude of multi-annual events enables the computation of probabilities of climatic episodes, and it provides a rigorous solution to deciding whether two regimes are significantly different from one another.
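
The definitions of duration, magnitude, and intensity given in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `regime_episodes` and `geometric_mle` are hypothetical, values exactly on the reference line are counted as "above", and magnitude is computed from deviations about the reference line (identical to the raw sum when the reference is zero). The geometric parameter is estimated with the standard maximum likelihood estimator for a geometric distribution on 1, 2, 3, ..., which is the reciprocal of the mean duration.

```python
from itertools import groupby


def regime_episodes(series, reference=0.0):
    """Split a series into consecutive runs above/below the reference line."""
    episodes = []
    for above, run in groupby(series, key=lambda v: v >= reference):
        deviations = [v - reference for v in run]
        duration = len(deviations)           # number of consecutive years
        magnitude = sum(deviations)          # sum of values over the run
        episodes.append({
            "sign": "+" if above else "-",
            "duration": duration,
            "magnitude": magnitude,
            "intensity": magnitude / duration,  # magnitude per year
        })
    return episodes


def geometric_mle(durations):
    """MLE of the yearly shift probability p for durations on {1, 2, ...}."""
    return len(durations) / sum(durations)


if __name__ == "__main__":
    toy_series = [1.0, 2.0, -0.5, -1.5, -1.0, 0.5]  # made-up values, not PDO data
    eps = regime_episodes(toy_series)
    for e in eps:
        print(e)
    print(geometric_mle([e["duration"] for e in eps]))
```

For the toy series the three episodes have durations 2, 3, and 1, magnitudes 3.0, -3.0, and 0.5, and intensities 1.5, -1.0, and 0.5, so the estimated shift probability is 0.5.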

    A bivariate Levy process with negative binomial and gamma marginals

    The joint distribution of X and N, where N has a geometric distribution and X is the sum of N IID exponential variables (independent of N), is infinitely divisible. This leads to a bivariate Levy process {(X(t), N(t)), t >= 0}, whose coordinates are correlated negative binomial and gamma processes. We derive basic properties of this process, including its covariance structure, representations, and stochastic self-similarity. We examine the joint distribution of (X(t), N(t)) at a fixed time t, along with the marginal and conditional distributions, joint integral transforms, moments, infinite divisibility, and stability with respect to random summation. We also discuss maximum likelihood estimation and simulation for this model.
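
The pair (X, N) described in the first sentence of this abstract is straightforward to simulate directly from its definition. The sketch below assumes N is geometric on 1, 2, 3, ... with success probability p and that the exponential summands have rate beta; the parameter names and the functions `sample_geometric` and `sample_pair` are hypothetical, and the code covers only the pair itself, not the Levy process built from it.

```python
import math
import random


def sample_geometric(p, rng):
    """Inverse-transform draw of N on {1, 2, ...} with P(N = k) = p (1 - p)^(k - 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return int(math.log(u) / math.log1p(-p)) + 1


def sample_pair(p, beta, rng):
    """Draw (X, N): N geometric, X the sum of N independent Exp(beta) variables."""
    n = sample_geometric(p, rng)
    x = sum(rng.expovariate(beta) for _ in range(n))
    return x, n


if __name__ == "__main__":
    rng = random.Random(2024)
    draws = [sample_pair(0.25, 2.0, rng) for _ in range(10000)]
    print(sum(x for x, _ in draws) / len(draws))  # sample mean of X
    print(sum(n for _, n in draws) / len(draws))  # sample mean of N
```

As a sanity check, E[N] = 1/p, and a geometric sum of exponentials is again exponential, so E[X] = 1/(p * beta). With p = 0.25 and beta = 2 the sample means should be close to 4 and 2 respectively.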

    Preparing Students for the Future: Extreme Events and Power Tails

    We provide tools for identification and exploration of data with very large variability having power law tails. Such data describe extreme features of processes such as fire losses, flood, drought, financial gain/loss, hurricanes, population of cities, among others. Prediction and quantification of extreme events are at the forefront of the current research needs, as these events have the strongest impact on our lives, safety, economics, and the environment. We concentrate on the intuitive, rather than rigorous mathematical treatment of models with heavy tails. Our goal is to introduce instructors to these important models and provide some tools for their identification and exploration. The methods we provide may be incorporated into courses such as probability, mathematical statistics, statistical modeling or regression methods. Our examples come from ecology and census fields. Supplementary materials for this article are available online.