182 research outputs found

Simulating High-Dimensional Multivariate Data using the bigsimr R Package

Author: Bedrick Edward J.
Knudson Alexander D.
Kozubowski Tomasz J.
Nguyen Tin
Panorska Anna K.
Petereit Juli
Piegorsch Walter W.
Schissler A. Grant
Tran Duc
Publication date: 11/11/2021

It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages empower high-dimensional random vector simulation with arbitrary marginal distributions and dependency via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphical-processing-unit-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach, up to d = 10,000. We describe example workflows and apply them to a high-dimensional data set: RNA-sequencing data obtained from breast cancer tumor samples.

Comment: 22 pages, 10 figures, https://cran.r-project.org/web/packages/bigsimr/index.htm
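As a rough illustration of simulating correlated data with arbitrary marginal distributions, the sketch below uses a Gaussian copula (a NORTA-style construction) for just two variables, with only the Python standard library. It is not the bigsimr API: the function names `simulate_pair` and `spearman`, the choice of exponential and uniform margins, and the parameter values are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def simulate_pair(n, rho, rate=1.0, seed=0):
    """Draw n pairs with Exponential(rate) and Uniform(0, 1) margins,
    coupled through a Gaussian copula with latent correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then through each margin's inverse CDF.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)  # guard against log(0)
        u2 = std_normal.cdf(z2)
        x1 = -math.log(1.0 - u1) / rate  # exponential inverse CDF
        x2 = u2                          # uniform margin needs no transform
        pairs.append((x1, x2))
    return pairs


def spearman(pairs):
    """Spearman rank correlation (assumes no ties, as with continuous data)."""
    n = len(pairs)

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order):
            result[index] = rank
        return result

    rx = ranks([p[0] for p in pairs])
    ry = ranks([p[1] for p in pairs])
    mean = (n - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


if __name__ == "__main__":
    sample = simulate_pair(5000, rho=0.7)
    print(round(spearman(sample), 3))
```

For a Gaussian copula, the Spearman correlation of the output relates to the latent correlation by (6/π)·arcsin(ρ/2), so a target rank correlation can be matched by inverting that formula before simulating. Scaling this idea to thousands of dimensions is where nearest-correlation-matrix and GPU-accelerated steps of the kind the abstract mentions become relevant.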

arXiv.org e-Print Archive

On a Calculus-based Statistics Course for Life Science Students

Author: Diamond J
Efron B
Hall M. R.
Hoel P. G.
Hogg R. V.
Lansing J. S.
Meyer S. L.
Moore D. S.
Nelson D. L.
Nolan D.
Piegorsch W. W.
Pitman J.
Polya G.
Powell L. A.
Ross S.
Rossman A.
Siegrist K.
Wiehe T
Zweig M. H.
Publication venue: American Society for Cell Biology
Publication date: 01/01/2010

The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a growing knowledge of biology. We illustrate our methods and practices through examples from the curriculum.

CiteSeerX

Crossref

PubMed Central

Efficient Parallel Statistical Model Checking of Biochemical Networks

Author: A. Pnueli
A. S. Miner
Adnan Aziz
B. Novak
Christel Baier
D. Donaldson R.
D.O. Morgan
D.T. Gillespie
Davide Prandi
E. B. Wilson
Edmund M Clarke
Edmund M. Clarke
François Fages
H. A. Hansson
H. Kitano
H. Li
H. Younes
J.-P. Katoen
Jaco van de Pol
Jiří Barnat
L. Dematte
Laurence Calzone
Lawrence D. Brown
Lubos Brim
M. Kwiatkowska
M. Scarpa
Michele Forlin
P. Ballarini
Paolo Ballarini
T. Tian
Thomas Hérault
Tommaso Mazza
Walter W. Piegorsch
William J. Stewart
Publication venue: Open Publishing Association
Publication date: 01/01/2009

We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic. Exact probabilistic verification approaches, such as CSL/PCTL model checking, are undermined by a huge computational demand, which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds for a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm to generate execution samples. Three key aspects improve its efficiency. First, sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability that P holds is based on an efficient variant of the Wilson method, which ensures faster convergence. Third, the whole methodology is designed to run in parallel, and a prototype software tool has been implemented that performs the sampling/verification process in parallel on an HPC architecture.
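The estimation step described in this abstract can be sketched as a sequential sampling loop with a Wilson score interval. The sketch below uses the standard Wilson interval, not the paper's efficient variant, and it replaces stochastic simulation plus on-the-fly LTL checking with a hypothetical `run_trial` callback that simply returns True or False. The names `wilson_interval` and `estimate_probability` and the stopping thresholds are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Standard Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def estimate_probability(run_trial, half_width=0.02, min_runs=30, max_runs=100_000):
    """Sample run_trial() until the Wilson interval is narrow enough.

    run_trial is a stand-in for "simulate one trace and check property P";
    it must return True when the property holds on that trace.
    """
    successes = 0
    n = 0
    while n < max_runs:
        n += 1
        if run_trial():
            successes += 1
        if n >= min_runs:
            low, high = wilson_interval(successes, n)
            if (high - low) / 2.0 <= half_width:
                break
    low, high = wilson_interval(successes, n)
    return successes / n, (low, high), n


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical model in which the property holds with probability 0.3.
    estimate, interval, runs = estimate_probability(lambda: rng.random() < 0.3)
    print(estimate, interval, runs)
```

Because each trial is independent, batches of trials could be farmed out to separate workers and their success counts summed before recomputing the interval, which is the general shape of the parallel design the abstract refers to.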

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals