Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis
Supervised classification of biological samples based on genetic information
(e.g. gene expression profiles) is an important problem in biostatistics. In
order to find both accurate and interpretable classification rules, variable
selection is indispensable. This article explores how an assessment of the
individual importance of variables (effect size estimation) can be used to
perform variable selection. I review recent effect size estimation approaches
in the context of linear discriminant analysis (LDA) and propose a new
conceptually simple effect size estimation method which is at the same time
computationally efficient. I then show how to use effect sizes to perform
variable selection based on the misclassification rate, which is the
data-independent expectation of the prediction error. Simulation studies and real
data analyses illustrate that the proposed effect size estimation and variable
selection methods are competitive. Particularly, they lead to both compact and
interpretable feature sets.
Comment: 21 pages, 2 figures
Sample genealogies and genetic variation in populations of variable size
We consider neutral evolution of a large population subject to changes in its
population size. For a population with a time-variable carrying capacity we
have computed the distributions of the total branch lengths of its sample
genealogies. Within the coalescent approximation we have obtained a general
expression, Eq. (27), for the moments of these distributions for an arbitrary
smooth dependence of the population size on time. We investigate how the
frequency of population-size variations alters the distributions. This allows
us to discuss their influence on the distribution of the number of mutations,
and on the population homozygosity in populations with variable size.
Comment: 19 pages, 8 figures, 1 table
Finite size effect of harmonic measure estimation in a DLA model: Variable size of probe particles
A finite size effect in the probing of the harmonic measure in simulation of
diffusion-limited aggregation (DLA) growth is investigated. We introduce a
variable size of probe particles to estimate the harmonic measure and extract the
fractal dimension of DLA clusters by taking two limits: vanishingly small probe
particle size and infinitely large DLA cluster size. We generate 1000
DLA clusters consisting of 50 million particles each, using an off-lattice
killing-free algorithm developed in earlier work. The introduced method leads
to unprecedented accuracy in the estimation of the fractal dimension. We
discuss the variation of the probability distribution function with the size of
probing particles.
Less redundant codes for variable size dictionaries
We report on a family of variable-length codes with less redundancy than the flat code used in most variable-size dictionary-based compression methods. The length of codes belonging to this family is still bounded above by ⌈log_2 |D|⌉, where |D| denotes the dictionary size. We describe three of these codes, namely, the balanced code, the phase-in-binary code (PB), and the depth-span code (DS). As the name implies, the balanced code is constructed from a height-balanced tree, so it has the shortest average codeword length. The corresponding coding tree for the PB code has the interesting property that it is made of full binary phases, and thus the code can be computed efficiently using simple binary shifting operations. The DS coding tree is maintained in such a way that the coder always finds the longest extendable codeword and extends it until it reaches the maximum length. It is optimal with respect to the code-length contrast. The PB and balanced codes yield similar improvements, around 3% to 7%, which is very close to the relative redundancy in the flat code. The DS code is particularly good at dealing with files with a large amount of redundancy, such as a running sequence of one symbol. We also conducted an empirical study of the codeword distribution in the LZW dictionary and proposed a scheme called dynamic block shifting (DBS) to further improve the codes' performance. Experiments suggest that the DBS is helpful in compressing random sequences. From an application point of view, the PB code with DBS is recommended for general practical usage.
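To make the idea of a phased-in code concrete, here is a minimal sketch of the standard phased-in (truncated) binary construction, which assigns shorter codewords to some dictionary indices while never exceeding ⌈log_2 |D|⌉ bits. This is an assumption about what the PB code resembles; the abstract does not give the construction, so the paper's actual PB code may differ in detail. The function name `phase_in_encode` is hypothetical.

```python
def phase_in_encode(index: int, size: int) -> str:
    """Return a phased-in (truncated) binary codeword for index in [0, size).

    Assumption: this is the textbook truncated binary code, used here only
    to illustrate the idea; it is not taken from the paper itself.
    """
    if size < 1 or not 0 <= index < size:
        raise ValueError("index must satisfy 0 <= index < size")
    k = size.bit_length() - 1          # floor(log2(size))
    if k == 0:
        return ""                      # a one-entry dictionary needs no bits
    short = (1 << (k + 1)) - size      # how many indices get the k-bit codeword
    if index < short:
        return format(index, f"0{k}b")             # short codeword: k bits
    return format(index + short, f"0{k + 1}b")     # long codeword: k + 1 bits


# With a 5-entry dictionary, three indices use 2 bits and two use 3 bits,
# whereas a flat code would spend 3 bits on every index.
codewords = [phase_in_encode(i, 5) for i in range(5)]
```

Only shifts, an addition, and a comparison are needed, which is consistent with the abstract's remark that such a code can be computed with simple binary shifting operations.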
On a Poissonian Change-Point Model with Variable Jump Size
A model of Poissonian observation having a jump (change-point) in the
intensity function is considered. Two cases are studied. The first one
corresponds to the situation when the jump size converges to a non-zero limit,
while in the second one the limit is zero. The limiting likelihood ratios in
these two cases are quite different. In the first case, like in the case of a
fixed jump size, the normalized likelihood ratio converges to a log Poisson
process. In the second case, the normalized likelihood ratio converges to a log
Wiener process, and so the statistical problems of parameter estimation and
hypothesis testing are asymptotically equivalent in this case to the well-known
problems of change-point estimation and testing for the model of a signal in
white Gaussian noise. The properties of the maximum likelihood and Bayesian
estimators, as well as those of the general likelihood ratio, Wald's and
Bayesian tests are deduced from the convergence of normalized likelihood
ratios. The convergence of the moments of the estimators is also established.
The obtained theoretical results are illustrated by numerical simulations.
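As a rough illustration of maximum likelihood estimation in a change-point model of this kind, the sketch below assumes the simplest special case: a Poisson process on [0, horizon] whose intensity is a known constant `rate_before` up to the change point and a known constant `rate_after` afterwards. These assumptions, the grid search, and the function name `change_point_mle` are illustrative only and are not taken from the paper.

```python
import bisect
import math


def change_point_mle(events, horizon, rate_before, rate_after, grid_step=0.01):
    """Grid-search MLE of the change point theta in [0, horizon].

    Assumed model (illustrative): intensity is rate_before on [0, theta]
    and rate_after on (theta, horizon], with both rates known.
    """
    events = sorted(events)
    total = len(events)
    best_theta, best_loglik = 0.0, -math.inf
    steps = int(round(horizon / grid_step))
    for i in range(steps + 1):
        theta = i * grid_step
        n_before = bisect.bisect_right(events, theta)  # events in [0, theta]
        # Poisson-process log-likelihood, dropping terms free of theta
        loglik = (n_before * math.log(rate_before)
                  + (total - n_before) * math.log(rate_after)
                  - rate_before * theta
                  - rate_after * (horizon - theta))
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta


# Hand-built event times: sparse before t = 4, dense afterwards.
events = [0.5, 1.5, 2.5, 3.5] + [4.05 + 0.1 * j for j in range(60)]
estimate = change_point_mle(events, 10.0, 1.0, 10.0)
```

For this hand-built sample the estimate lands close to 4, just before the first event of the dense stretch.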
