
    Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis

    Supervised classification of biological samples based on genetic information (e.g. gene expression profiles) is an important problem in biostatistics. To find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is also computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to feature sets that are both compact and interpretable. Comment: 21 pages, 2 figures
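
As a rough illustration of how effect sizes can drive selection through a misclassification rate, the sketch below assumes two classes with equal priors, independent unit-variance variables, and known standardized mean differences. Under those assumptions the LDA error rate is Φ(-Δ/2), where Δ² is the sum of squared effect sizes. The function names and the `tolerance` parameter are hypothetical, and this is not the estimation method proposed in the article.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def misclassification_rate(effects) -> float:
    """LDA error rate for equal priors and independent unit-variance variables.

    `effects` holds the standardized mean differences of the variables used.
    """
    delta = math.sqrt(sum(d * d for d in effects))
    return normal_cdf(-delta / 2.0)


def select_variables(effects, tolerance: float = 0.01):
    """Smallest top-ranked set whose error is within `tolerance` of the full set."""
    order = sorted(range(len(effects)), key=lambda j: -abs(effects[j]))
    target = misclassification_rate(effects)
    chosen = []
    for j in order:
        chosen.append(j)
        current = misclassification_rate([effects[i] for i in chosen])
        if current - target <= tolerance:
            break
    return sorted(chosen)


if __name__ == "__main__":
    effects = [2.0, 0.1, -1.5, 0.05]  # illustrative values, not real data
    print(select_variables(effects))
```

Ranking by absolute effect size keeps the selected set compact: variables with tiny effects barely change Δ and are dropped first.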

    Sample genealogies and genetic variation in populations of variable size

    We consider neutral evolution of a large population subject to changes in its population size. For a population with a time-variable carrying capacity, we have computed the distributions of the total branch lengths of its sample genealogies. Within the coalescent approximation, we have obtained a general expression, Eq. (27), for the moments of these distributions for an arbitrary smooth dependence of the population size on time. We investigate how the frequency of population-size variations alters the distributions. This allows us to discuss their influence on the distribution of the number of mutations and on the population homozygosity in populations with variable size. Comment: 19 pages, 8 figures, 1 table
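
For orientation, the constant-size baseline of the quantity studied here has a closed form. The sketch below assumes the standard Kingman coalescent with time scaled so that each pair of lineages coalesces at rate 1. It does not reproduce the paper's Eq. (27) for variable population size, and the function names are hypothetical.

```python
from fractions import Fraction


def total_branch_length_mean(n: int) -> Fraction:
    """Expected total branch length of a sample of n lineages (constant size).

    With k lineages the waiting time is exponential with rate k(k-1)/2 and
    contributes k times its length, giving 2 * sum_{j=1}^{n-1} 1/j.
    """
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    return 2 * sum(Fraction(1, j) for j in range(1, n))


def total_branch_length_variance(n: int) -> Fraction:
    """Variance of the total branch length: 4 * sum_{j=1}^{n-1} 1/j^2."""
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    return 4 * sum(Fraction(1, j * j) for j in range(1, n))


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, total_branch_length_mean(n), total_branch_length_variance(n))
```

Exact fractions avoid rounding noise when these baseline moments are compared with results for a time-dependent population size.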

    Finite size effect of harmonic measure estimation in a DLA model: Variable size of probe particles

    A finite size effect in the probing of the harmonic measure in simulations of diffusion-limited aggregation (DLA) growth is investigated. We introduce a variable size of probe particles to estimate the harmonic measure and extract the fractal dimension of DLA clusters, taking two limits: vanishingly small probe particle size and infinitely large DLA cluster size. We generate 1000 DLA clusters consisting of 50 million particles each, using an off-lattice killing-free algorithm developed in earlier work. The introduced method leads to unprecedented accuracy in the estimation of the fractal dimension. We discuss the variation of the probability distribution function with the size of the probe particles.

    Less redundant codes for variable size dictionaries

    We report on a family of variable-length codes with less redundancy than the flat code used in most variable size dictionary-based compression methods. The length of codes belonging to this family is still bounded above by ⌈log_2 |D|⌉, where |D| denotes the dictionary size. We describe three of these codes, namely, the balanced code, the phase-in-binary code (PB), and the depth-span code (DS). As the name implies, the balanced code is constructed from a height-balanced tree, so it has the shortest average codeword length. The corresponding coding tree for the PB code has the interesting property that it is made of full binary phases, and thus the code can be computed efficiently using simple binary shifting operations. The DS coding tree is maintained in such a way that the coder always finds the longest extendable codeword and extends it until it reaches the maximum length. It is optimal with respect to the code-length contrast. The PB and balanced codes yield similar improvements of around 3% to 7%, which is very close to the relative redundancy of the flat code. The DS code is particularly good at dealing with files with a large amount of redundancy, such as a running sequence of one symbol. We also conducted an empirical study of the codeword distribution in the LZW dictionary and propose a scheme called dynamic block shifting (DBS) to further improve the codes' performance. Experiments suggest that DBS is helpful in compressing random sequences. From an application point of view, the PB code with DBS is recommended for general practical usage.
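
To make the idea of a code that is shorter than the flat code yet still bounded by ⌈log_2 |D|⌉ concrete, here is a minimal sketch of the textbook phased-in (truncated) binary code. It is assumed to be close in spirit to the PB code described above, but the paper's exact construction may differ. The function names are hypothetical.

```python
def phase_in_encode(index: int, size: int) -> str:
    """Codeword (as a bit string) for `index` in a dictionary of `size` entries."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    if size == 1:
        return ""  # a single entry needs no bits
    k = size.bit_length() - 1        # floor(log2(size))
    short = (1 << (k + 1)) - size    # how many entries get only k bits
    if index < short:
        return format(index, f"0{k}b")
    # Remaining entries get k + 1 bits, shifted so no short word is a prefix.
    return format(index + short, f"0{k + 1}b")


def phase_in_decode(bits: str, size: int) -> int:
    """Inverse of phase_in_encode for a bit string holding one codeword."""
    if size == 1:
        return 0
    k = size.bit_length() - 1
    short = (1 << (k + 1)) - size
    value = int(bits[:k], 2)
    if value < short:
        return value
    # Long codeword: consume one more bit, then undo the shift.
    return ((value << 1) | int(bits[k])) - short


if __name__ == "__main__":
    size = 5
    for i in range(size):
        print(i, phase_in_encode(i, size))
```

Only shifts, additions, and comparisons are needed, which matches the abstract's remark that such codes can be computed with simple binary shifting operations. For a dictionary of 5 entries the flat code spends 3 bits per entry, while this code averages 2.4 bits under a uniform distribution.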

    On a Poissonian Change-Point Model with Variable Jump Size

    A model of Poissonian observation having a jump (change-point) in the intensity function is considered. Two cases are studied. The first one corresponds to the situation when the jump size converges to a non-zero limit, while in the second one the limit is zero. The limiting likelihood ratios in these two cases are quite different. In the first case, as in the case of a fixed jump size, the normalized likelihood ratio converges to a log Poisson process. In the second case, the normalized likelihood ratio converges to a log Wiener process, so the statistical problems of parameter estimation and hypothesis testing are asymptotically equivalent in this case to the well-known problems of change-point estimation and testing for the model of a signal in white Gaussian noise. The properties of the maximum likelihood and Bayesian estimators, as well as those of the general likelihood ratio, Wald's and Bayesian tests, are deduced from the convergence of normalized likelihood ratios. The convergence of the moments of the estimators is also established. The obtained theoretical results are illustrated by numerical simulations.
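
A small numerical experiment of the kind mentioned above might look like the following sketch. It assumes a simplified setting that is not taken from the paper: a single trajectory on [0, horizon], piecewise-constant known intensities with an upward jump, and an unknown change-point `tau`. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_events(lam_before, lam_after, tau, horizon, rng):
    """Event times of a Poisson process whose rate jumps at tau."""
    events = []
    for start, stop, lam in ((0.0, tau, lam_before), (tau, horizon, lam_after)):
        # Memorylessness lets each constant-rate segment be simulated afresh.
        t = start + rng.expovariate(lam)
        while t < stop:
            events.append(t)
            t += rng.expovariate(lam)
    return events


def log_likelihood(events, tau, lam_before, lam_after, horizon):
    """Log-likelihood of change-point tau; events with t < tau count as 'before'."""
    n_before = sum(1 for t in events if t < tau)
    n_after = len(events) - n_before
    return (n_before * math.log(lam_before) + n_after * math.log(lam_after)
            - lam_before * tau - lam_after * (horizon - tau))


def mle_change_point(events, lam_before, lam_after, horizon):
    """Maximum likelihood estimate of tau, assuming an upward jump."""
    if lam_after <= lam_before:
        raise ValueError("this sketch assumes lam_after > lam_before")
    # For an upward jump the log-likelihood increases between events, so its
    # supremum is approached at an event time or at the horizon.
    candidates = sorted(events) + [horizon]
    return max(candidates,
               key=lambda tau: log_likelihood(events, tau, lam_before,
                                              lam_after, horizon))


if __name__ == "__main__":
    rng = random.Random(0)
    events = simulate_events(1.0, 4.0, tau=30.0, horizon=60.0, rng=rng)
    print(mle_change_point(events, 1.0, 4.0, horizon=60.0))
```

Shrinking the gap between `lam_after` and `lam_before` in such an experiment is one way to explore the regime in which the jump size tends to zero.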