34 research outputs found
Two New Estimators of Entropy for Testing Normality
We present two new estimators for estimating the entropy of absolutely
continuous random variables. Some properties of them are considered,
specifically consistency of the first is proved. The introduced estimators are
compared with the existing entropy estimators. Also, we propose two new tests
for normality based on the introduced entropy estimators and compare their
powers with the powers of other tests for normality. The results show that the
proposed estimators and test statistics perform very well in estimating entropy
and testing normality. A real example is presented and analyzed. Comment: 28 pages
Improved entropy based test of uniformity using ranked set samples
Ranked set sampling (RSS) is known to be superior to traditional simple random sampling (SRS) in the sense that it often leads to more efficient inference procedures. The basic version of RSS has been extensively modified to produce schemes that yield more accurate estimators of population attributes. Multistage ranked set sampling (MSRSS) is one such variation that surpasses RSS. Entropy has been instrumental in constructing criteria for fitting parametric models to data. The goal of this article is to develop tests of uniformity based on sample entropy under RSS and MSRSS designs. A Monte Carlo simulation study is carried out to compare the power of the proposed tests, under several alternative distributions, with that of the ordinary test based on SRS. The results show that the new entropy tests have higher power than the original one for nearly all sample sizes and alternatives considered.
New Entropy Estimators with Smaller Root Mean Squared Error
New estimators of the entropy of a continuous random variable are suggested. The proposed estimators are investigated under simple random sampling (SRS), ranked set sampling (RSS), and double ranked set sampling (DRSS) methods. The estimators are compared with the Vasicek (1976) and Al-Omari (2014) entropy estimators, both theoretically and by simulation, in terms of root mean squared error (RMSE) and bias. The results indicate that the suggested estimators have smaller RMSE and bias than the competing estimators introduced by Vasicek (1976) and Al-Omari (2014).
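For orientation, the Vasicek (1976) baseline named in this abstract is a spacing-based estimator, H(m, n) = (1/n) * sum_i log( (n / (2m)) * (X_(i+m) - X_(i-m)) ), where order statistics with index below 1 or above n are replaced by X_(1) and X_(n). The following is a minimal sketch of that baseline only, not of the new estimators proposed in the paper. The function name `vasicek_entropy` is hypothetical, and the window size `m` must be chosen by the caller.

```python
import math


def vasicek_entropy(sample, m):
    """Vasicek (1976) spacing estimator of differential entropy, in nats.

    Illustrative sketch: `vasicek_entropy` is a hypothetical name, and the
    window size m is left to the caller. Assumes no two order statistics
    that are compared are equal (ties would give log(0)).
    """
    x = sorted(float(v) for v in sample)
    n = len(x)
    if not 1 <= m < n / 2:
        raise ValueError("window m must satisfy 1 <= m < n/2")
    total = 0.0
    for i in range(n):
        # Clamp indices at the boundaries: X_(i) = X_(1) for i < 1
        # and X_(i) = X_(n) for i > n.
        upper = x[min(i + m, n - 1)]
        lower = x[max(i - m, 0)]
        total += math.log(n / (2.0 * m) * (upper - lower))
    return total / n
```

As a sanity check, the estimate for a large standard normal sample can be compared with the exact entropy 0.5 * log(2 * pi * e); the estimator is known to be biased for finite samples, so only approximate agreement should be expected.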
The Road to Quantum Computational Supremacy
We present an idiosyncratic view of the race for quantum computational
supremacy. Google's approach and IBM challenge are examined. An unexpected
side-effect of the race is the significant progress in designing fast classical
algorithms. Quantum supremacy, if achieved, won't make classical computing
obsolete. Comment: 15 pages, 1 figure
A comprehensive empirical power comparison of univariate goodness-of-fit tests for the Laplace distribution
In this paper, we do a comprehensive survey of all univariate goodness-of-fit
tests that we could find in the literature for the Laplace distribution, which
amounts to a total of 45 different test statistics. After eliminating
duplicates and considering parameters that yield the best power for each test,
we obtain a total of 38 different test statistics. An empirical power
comparison study of unmatched size is then conducted using Monte Carlo
simulations, with 400 alternatives spanning over 20 families of distributions,
for various sample sizes and confidence levels. A discussion of the results
follows, where the best tests are selected for different classes of
alternatives. A similar study was conducted for the normal distribution in
Romão et al. (2010), although on a smaller scale. Our work improves
significantly on Puig & Stephens (2000), which was previously the best-known
reference of this kind for the Laplace distribution. All test statistics and
alternatives considered here are integrated within the PoweR package for the R
software. Comment: 37 pages, 1 figure, 20 tables
Spatial heterogeneity of air pollution statistics in Europe
Air pollution is one of the leading causes of death globally, and continues to have a detrimental effect on our health. In light of these impacts, an extensive range of statistical modelling approaches has been devised in order to better understand air pollution statistics. However, the time-varying statistics of different types of air pollutants are far from being fully understood. The observed probability density functions (PDFs) of concentrations depend very much on the spatial location and on the pollutant substance. In this paper, we analyse a large variety of data from 3544 different European monitoring sites and show that the PDFs of nitric oxide (NO), nitrogen dioxide (NO2) and particulate matter ([Formula: see text] and [Formula: see text]) concentrations generically exhibit heavy tails and are asymptotically well approximated by q-exponential distributions with a given width parameter [Formula: see text]. We observe that the power-law parameter q and the width parameter [Formula: see text] vary widely for the different spatial locations. For each substance, we find different patterns of parameter clouds in the [Formula: see text] plane. These depend on the type of pollutants and on the environmental characteristics (urban/suburban/rural/traffic/industrial/background). This means the effective statistical physics description of air pollution exhibits a strong degree of spatial heterogeneity.
Estimating Gene Interactions Using Information Theoretic Functionals
With an abundance of data resulting from high-throughput technologies, such as DNA microarrays,
a race has been on for the last few years to determine the structures and functions of genes and
their products, the proteins. Inference of gene interactions lies at the core of these efforts.
In all this activity, three important research issues have emerged. First, in much of the current
literature on gene regulatory networks, dependencies among variables (in our case, genes) are
assumed to be linear in nature, when in real-life scenarios this is seldom the case.
This disagreement leads to systematic deviation and biased evaluation. Second, although
the problem of undersampling features in every piece of work as one of the major causes of
poor results, in practice it is overlooked and rarely addressed explicitly. Finally, inference
of network structures, although based on rigorous mathematical foundations and computational
optimizations, often displays poor fitness values and biologically unrealistic link structures,
due to a large extent to the discovery of pairwise-only interactions.
In our search for robust, nonlinear measures of dependency, we advocate that mutual information
and related information theoretic functionals (conditional mutual information, total
correlation) are possibly the most suitable candidates to capture both linear and nonlinear
interactions between variables, and resolve higher order dependencies.
To address these issues, we researched and implemented, under a common framework, a selection
of nonparametric estimators of mutual information for continuous variables. Their assessment
focused on their robustness to limited sample sizes and their extensibility to higher
dimensions, which is important for detecting more complex interaction structures. Two
assessment scenarios were carried out: one with simulated data, and one in which the estimators
were bootstrapped within state-of-the-art network inference algorithms to monitor their predictive
power and sensitivity. The tests revealed that, in small-sample regimes, there is a significant
difference in the performance of different estimators, and that naive methods such as uniform
binning gave consistently poor results compared with more sophisticated methods.
Finally, a custom, modular mechanism is proposed for the inference of gene interactions,
targeting the identification of some of the most common substructures in genetic networks,
which we believe will help improve accuracy and predictability scores.
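To make the "uniform binning" baseline mentioned in this abstract concrete, here is a minimal sketch of a plug-in mutual information estimator on an equal-width grid. It is an illustration under stated assumptions, not the thesis's implementation: the function name `binned_mutual_information` and the default of 10 bins are hypothetical choices.

```python
import math
from collections import Counter


def binned_mutual_information(xs, ys, bins=10):
    """Plug-in mutual information (in nats) from equal-width binning.

    Illustrative sketch: the name and the default bin count are
    assumptions, not taken from the abstract above.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # A constant variable falls entirely into one bin.
            return [0] * len(values)
        width = hi - lo
        # The maximum value is clamped into the last bin.
        return [min(int((v - lo) / width * bins), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log( p(i, j) / (p(i) * p(j)) ), using raw counts.
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi
```

With few samples, many cells of the grid are empty or hold a single point, which inflates the estimate; this is one plausible reason for the poor small-sample behaviour of binning that the abstract reports.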