Simulating High-Dimensional Multivariate Data using the bigsimr R Package
Accurate data simulation is critical when employing Monte Carlo
techniques and evaluating statistical methodology. In this era of big data,
measurements are often correlated and high dimensional, such as data obtained
in high-throughput biomedical experiments. Because of the computational complexity
and the lack of user-friendly software for simulating these massive
multivariate constructions, researchers resort to simulation designs that posit
independence or perform arbitrary data transformations. To close this gap, we
developed the Bigsimr Julia package with R and Python interfaces. This paper
focuses on the R interface. These packages empower high-dimensional random
vector simulation with arbitrary marginal distributions and dependency via a
Pearson, Spearman, or Kendall correlation matrix. bigsimr contains
high-performance features, including multi-core and
graphical-processing-unit-accelerated algorithms to estimate correlation and
compute the nearest correlation matrix. Monte Carlo studies quantify the
accuracy and scalability of our approach, up to . We describe example
workflows and apply the method to a high-dimensional data set: RNA-sequencing data
obtained from breast cancer tumor samples.
Comment: 22 pages, 10 figures,
https://cran.r-project.org/web/packages/bigsimr/index.htm
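To make the simulation idea concrete, here is a minimal Python sketch of a Gaussian-copula (NORTA-style) construction, assuming that is the kind of approach the package takes; it is not the bigsimr API. Correlated standard normals are pushed through the normal CDF and then through each target inverse CDF. The helper names (`spearman_to_pearson`, `kendall_to_pearson`, `simulate`) and the example marginals are hypothetical. The conversions 2 sin(πρ_s/6) and sin(πτ/2) give the latent normal correlation that produces a target Spearman or Kendall correlation; a Pearson target with non-normal margins has no such closed form and is not handled here. The sketch also assumes the converted matrix is positive definite, so it skips the nearest-correlation-matrix step mentioned above.

```python
import math
import random
from statistics import NormalDist

def spearman_to_pearson(rho_s: float) -> float:
    """Latent normal correlation that yields Spearman correlation rho_s."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)

def kendall_to_pearson(tau: float) -> float:
    """Latent normal correlation that yields Kendall correlation tau."""
    return math.sin(math.pi * tau / 2.0)

def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate(n, latent_corr, inv_cdfs, rng):
    """Gaussian-copula draws: correlated normals -> uniforms -> target marginals."""
    low = cholesky(latent_corr)
    d = len(latent_corr)
    std = NormalDist()
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # clamp away from 0 and 1 so every inverse CDF stays finite
        u = [min(max(std.cdf(v), 1e-12), 1.0 - 1e-12) for v in x]
        rows.append([q(ui) for q, ui in zip(inv_cdfs, u)])
    return rows

# Hypothetical example: exponential(rate 2) and normal(10, 3) margins,
# target Spearman correlation 0.5 between them.
r = spearman_to_pearson(0.5)
corr = [[1.0, r], [r, 1.0]]
margins = [lambda u: -math.log1p(-u) / 2.0, NormalDist(10.0, 3.0).inv_cdf]
data = simulate(20000, corr, margins, random.Random(42))
```

Because each margin is a monotone transform of its latent normal, the rank correlation of `data` should sit near 0.5 up to sampling error. A plain Cholesky loop keeps the sketch dependency-free; at high dimension, a vectorized or GPU linear-algebra library would be the practical choice.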
Efficient Parallel Statistical Model Checking of Biochemical Networks
We consider the problem of verifying stochastic models of biochemical
networks against behavioral properties expressed in temporal logic terms. Exact
probabilistic verification approaches, such as CSL/PCTL model
checking, are undermined by a huge computational demand that rules them out for
most real case studies. Less demanding approaches, such as statistical model
checking, estimate the likelihood that a property is satisfied by sampling
executions of the stochastic model. We propose a methodology for
efficiently estimating the likelihood that an LTL property P holds for a
stochastic model of a biochemical network. As with other statistical
verification techniques, the methodology we propose uses a stochastic
simulation algorithm to generate execution samples; however, three
key aspects improve its efficiency. First, sample generation is driven
by on-the-fly verification of P, which results in optimal overall simulation
time. Second, the confidence interval for the probability that P holds is
estimated with an efficient variant of the Wilson method, which ensures
faster convergence. Third, the whole methodology is designed to run in
parallel, and a prototype software tool has been implemented that
performs the sampling/verification process in parallel on an HPC
architecture.
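The sampling loop described above can be sketched in Python under loudly stated assumptions: the model is a hypothetical birth-death process simulated with Gillespie's direct method, P is a simple time-bounded "eventually X ≥ threshold" property rather than general LTL, and the interval is the standard Wilson score interval, not the efficient variant the authors describe. Each run stops as soon as P is decided, which is the point of on-the-fly verification. Names such as `sample_eventually` and `estimate` are illustrative only.

```python
import math
import random

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Standard Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half

def sample_eventually(rng, k_prod, k_deg, x0, threshold, horizon):
    """One Gillespie run of a hypothetical birth-death process (0 -> X at rate
    k_prod, X -> 0 at rate k_deg * x), checked on the fly against
    P = 'X reaches threshold by time horizon'. Returns as soon as P is decided."""
    t, x = 0.0, x0
    while True:
        if x >= threshold:
            return True                  # P satisfied: stop simulating
        a_birth, a_death = k_prod, k_deg * x
        a_total = a_birth + a_death      # > 0 because k_prod > 0
        t += rng.expovariate(a_total)    # waiting time to the next reaction
        if t > horizon:
            return False                 # horizon passed: P violated
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1

def estimate(rng, width=0.02, z=1.96, batch=100, **model):
    """Sample runs in batches until the Wilson interval is at most `width` wide."""
    hits = n = 0
    while True:
        for _ in range(batch):
            hits += sample_eventually(rng, **model)
        n += batch
        lo, hi = wilson_interval(hits, n, z)
        if hi - lo <= width:
            return hits / n, (lo, hi), n

p, (lo, hi), runs = estimate(random.Random(1), width=0.05, k_prod=10.0,
                             k_deg=1.0, x0=0, threshold=15, horizon=5.0)
```

The runs are independent, so the batch loop is the natural place to parallelize (for example with a process pool); this sketch stays sequential for clarity.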
Stimulus-Response Analysis for Data in the Form of Proportions
Dichotomous response models are common in many engineering settings, and dichotomous responses are an important endpoint in quality control and quality testing. Often, they represent the response of some experimental unit to an environmental or chemical stimulus, or of the unit over time, etc. Independent observations on each unit produce a value in the set {0,1} with some probability of binary response, p. A common design involves T populations, treatment groups, dose levels, etc. When some score or other quantification of the stimulus, x_i (i = 1, ..., T), has been recorded along with the observations, an important issue for statistical study is the characterization of the stimulus-response relationship for use in prediction or assessment of the underlying phenomenon. Statistically, the recorded observations at the i-th treatment level are taken as the number of "positive" outcomes, Y_i, among the n_i experimental units examined
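As an illustration of characterizing the stimulus-response relationship from counts Y_i out of n_i at levels x_i, here is a minimal Python sketch that assumes a logistic model, logit(p_i) = b0 + b1 x_i, fitted by maximum likelihood with Newton-Raphson. The logistic link is one common choice and an assumption here, not necessarily the model the paper develops; `fit_logistic` and the example data are hypothetical.

```python
import math

def fit_logistic(x, y, n, iters=50, tol=1e-10):
    """ML fit of logit(p_i) = b0 + b1 * x_i with Y_i ~ Binomial(n_i, p_i),
    by Newton-Raphson on the log-likelihood (assumed model)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0      # score and information entries
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p              # observed minus expected count
            w = ni * p * (1.0 - p)           # binomial variance = weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical dose levels; counts set to their (non-integer) expected values
# under b0 = -2, b1 = 0.5, so the fit should recover those parameters.
doses = [0.0, 2.0, 4.0, 6.0, 8.0]
units = [100] * 5
positives = [m / (1.0 + math.exp(2.0 - 0.5 * d)) for m, d in zip(units, doses)]
b0, b1 = fit_logistic(doses, positives, units)
ed50 = -b0 / b1
```

Under this assumed model, the stimulus level at which the response probability is 0.5 (the ED50) is -b0/b1.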
Tables Of P-Values For t- And Chi-Square Reference Distributions
An important area of statistical practice involves determination of P-values when performing significance testing. If the null reference distribution is standard normal, then many standard statistical texts provide a table of probabilities that may be used to determine the P-value; examples include Casella and Berger (1990), Hogg and Tanis (1997), Iman (1994), Moore and McCabe (1993), Neter et al. (1996), Snedecor and Cochran (1980), Sokal and Rohlf (1995), and Steel and Torrie (1980), among many others. If the null reference distribution is slightly more complex, however, such as a t-distribution or a χ²-distribution, most standard textbooks give only upper-α critical points rather than actual P-values. With the advent of modern statistical computing power, this is not a major concern; most statistical computing packages can output P-values associated w
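For integer degrees of freedom, both tail areas have classical finite-series forms, so a P-value table can be generated without special-function libraries. The Python sketch below assumes integer df and uses those series (with θ = arctan(|t|/√ν) for the t case); it is an illustration, not the method used to build the published tables, and the function names are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P-value P(X >= x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: chi-square(1) tail plus sqrt(2/pi) e^{-x/2} (x^{1/2} + x^{3/2}/3 + ...)
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total

def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided P-value P(|T| >= |t|) for T ~ t(df), integer df >= 1."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # odd df: A = (2/pi) * (theta + sin * (cos + (2/3) cos^3 + ...))
        inner, term = 0.0, c
        for k in range(1, df - 1, 2):
            inner += term
            term *= c * c * (k + 1) / (k + 2)
        a = (2.0 / math.pi) * (theta + s * inner)
    else:
        # even df: A = sin * (1 + (1/2) cos^2 + (1*3)/(2*4) cos^4 + ...)
        inner, term = 0.0, 1.0
        for k in range(1, df, 2):
            inner += term
            term *= c * c * k / (k + 1)
        a = s * inner
    return max(0.0, 1.0 - a)          # A = P(|T| < |t|)
```

A table row is then a loop over statistic values for a fixed df, for example `[round(chi2_sf(x, 3), 4) for x in (1, 2, 5, 10)]`.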
On confidence bands and set estimators for the simple linear model
This paper reviews the duality between confidence bands and (convex) set estimators in simple linear regression. Applications of this duality are explored. These include the nature of polygonal sets and the development of an algorithm that approximates the coverage probability of smooth confidence band functions.
Keywords: simultaneous inference; linear regression; linear segment confidence bands; coverage probability approximation
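Coverage of a smooth band can always be checked by brute force. The Python sketch below estimates, by plain Monte Carlo over a finite grid, the simultaneous coverage of the Working-Hotelling band b0 + b1 x ± c s √(1/n + (x − x̄)²/Sxx) on an interval; this is a swapped-in baseline, not the duality-based approximation algorithm the paper develops. The design points, grid, and helper names are hypothetical. The multiplier c = √(2 F_{2,n−2;1−α}) uses the closed form of the F(2, ν) quantile, (ν/2)(α^(−2/ν) − 1).

```python
import math
import random

def working_hotelling_c(n: int, alpha: float = 0.05) -> float:
    """Working-Hotelling multiplier sqrt(2 * F_{2, n-2; 1-alpha}), using the
    closed-form F(2, nu) quantile (nu/2) * (alpha**(-2/nu) - 1)."""
    nu = n - 2
    return math.sqrt(nu * (alpha ** (-2.0 / nu) - 1.0))

def band_coverage(xs, grid, c, reps, rng):
    """Monte Carlo estimate of P(true line lies inside the band at every grid point).
    Coverage of this pivotal band does not depend on the true intercept, slope,
    or error SD, so data are simulated from the line y = 0 with N(0, 1) errors."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    covered = 0
    for _ in range(reps):
        ys = [rng.gauss(0.0, 1.0) for _ in xs]
        ybar = sum(ys) / n
        b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        b0 = ybar - b1 * xbar
        s = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
        # band: b0 + b1*g +/- c * s * sqrt(1/n + (g - xbar)^2 / sxx); true mean is 0
        covered += all(
            abs(b0 + b1 * g) <= c * s * math.sqrt(1.0 / n + (g - xbar) ** 2 / sxx)
            for g in grid
        )
    return covered / reps

xs = [float(i) for i in range(10)]
grid = [i / 10 for i in range(91)]        # check containment on [0, 9]
c_wh = working_hotelling_c(len(xs))
cov = band_coverage(xs, grid, c_wh, 2000, random.Random(7))
```

Over the whole real line this band has exact coverage 1 − α, so restricting the check to [0, 9] should give an estimate at or slightly above 0.95, up to Monte Carlo error.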