
3,878,722 research outputs found

Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis

Author: Klaus Bernd
Publication date: 08/08/2012

Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. To find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is also computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to feature sets that are both compact and interpretable. Comment: 21 pages, 2 figures
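As a rough illustration of how effect sizes relate to a misclassification rate, the following sketch uses the textbook two-class LDA setting. The assumptions are not taken from the abstract: equal class priors, known class means $\mu_1, \mu_2$, and a known common diagonal covariance with variances $\sigma_j^2$. The symbols $\delta_j$, $\Delta_S$, and the selected set $S$ are illustrative notation, and the article's own estimator may differ.

```latex
% Standardized effect size of variable j (assumed notation)
\delta_j = \frac{\mu_{1j} - \mu_{2j}}{\sigma_j}

% Squared Mahalanobis distance between the classes using only the selected set S
\Delta_S^2 = \sum_{j \in S} \delta_j^2

% Misclassification rate of the optimal linear rule restricted to S,
% where \Phi is the standard normal distribution function
\mathrm{Err}(S) = \Phi\!\left(-\frac{\Delta_S}{2}\right)
```

Under these assumptions, adding a variable with a large $|\delta_j|$ lowers the error noticeably, while variables with $\delta_j$ near zero contribute almost nothing, which is one way to see why ranking by effect size can yield compact feature sets.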

arXiv.org e-Print Archive

CiteSeerX

Scheduling variable-size packets in the DAVID metropolitan area network

Author: Bianco Andrea
Finochietto J.
Galante G.
Neri Fabio
Sarra V.
Publication venue: IEEE
Publication date: 01/01/2004

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Sample genealogies and genetic variation in populations of variable size

Author: Eriksson A.
Mehlig B.
Rafajlovic M.
Sagitov S.
Publication venue: 'Genetics Society of America'
Publication date: 30/11/2009

We consider neutral evolution of a large population subject to changes in its population size. For a population with a time-variable carrying capacity we have computed the distributions of the total branch lengths of its sample genealogies. Within the coalescent approximation we have obtained a general expression, Eq. (27), for the moments of these distributions for an arbitrary smooth dependence of the population size on time. We investigate how the frequency of population-size variations alters the distributions. This allows us to discuss their influence on the distribution of the number of mutations, and on the population homozygosity in populations with variable size. Comment: 19 pages, 8 figures, 1 table

arXiv.org e-Print Archive

Crossref

Finite size effect of harmonic measure estimation in a DLA model: Variable size of probe particles

Author: Anton Yu. Menshutin
Lev N. Shchur
Valery M. Vinokour
Publication venue: 'Elsevier BV'
Publication date: 01/10/2008

A finite size effect in the probing of the harmonic measure in simulations of diffusion-limited aggregation (DLA) growth is investigated. We introduce a variable size of probe particles to estimate the harmonic measure and extract the fractal dimension of DLA clusters by taking two limits: vanishingly small probe particle size and infinitely large DLA cluster size. We generate 1000 DLA clusters consisting of 50 million particles each, using an off-lattice killing-free algorithm developed in earlier work. The introduced method leads to unprecedented accuracy in the estimation of the fractal dimension. We discuss the variation of the probability distribution function with the size of the probing particles.

arXiv.org e-Print Archive

Crossref

Less redundant codes for variable size dictionaries

Author: Rajpoot Nasir M.
Yao Zhen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

We report on a family of variable-length codes with less redundancy than the flat code used in most variable-size dictionary-based compression methods. The length of the codes belonging to this family is still bounded above by $\lceil \log_2 |D| \rceil$, where $|D|$ denotes the dictionary size. We describe three of these codes, namely the balanced code, the phase-in-binary (PB) code, and the depth-span (DS) code. As the name implies, the balanced code is constructed from a height-balanced tree, so it has the shortest average codeword length. The coding tree for the PB code has the interesting property that it is made of full binary phases, so the code can be computed efficiently using simple binary shifting operations. The DS coding tree is maintained in such a way that the coder always finds the longest extendable codeword and extends it until it reaches the maximum length. It is optimal with respect to the code-length contrast. The PB and balanced codes yield similar improvements of around 3% to 7%, which is very close to the relative redundancy of the flat code. The DS code is particularly good at handling files with a large amount of redundancy, such as a long run of one symbol. We also carried out an empirical study of the codeword distribution in the LZW dictionary and propose a scheme called dynamic block shifting (DBS) to further improve the codes' performance. Experiments suggest that DBS is helpful in compressing random sequences. From an application point of view, the PB code with DBS is recommended for general practical use.
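To make the length bound concrete, here is a minimal Python sketch of the standard phased-in (truncated) binary code, compared with a flat code of $\lceil \log_2 |D| \rceil$ bits per index. This is the textbook construction and is only assumed to resemble the PB code of the abstract, which may be built differently; the function names `flat_code_length`, `phased_in_encode`, and `average_length` are hypothetical.

```python
def flat_code_length(n: int) -> int:
    """Bits per index in a flat code for a dictionary of n entries: ceil(log2 n)."""
    return (n - 1).bit_length()


def phased_in_encode(index: int, n: int) -> str:
    """Encode index in [0, n) with the standard phased-in (truncated) binary code.

    Assumption: textbook construction, not necessarily the paper's PB code.
    """
    if not 0 <= index < n:
        raise ValueError("index must satisfy 0 <= index < n")
    if n == 1:
        return ""  # a single entry needs no bits
    k = n.bit_length() - 1      # floor(log2 n)
    short = (1 << (k + 1)) - n  # number of codewords that get only k bits
    if index < short:
        return format(index, f"0{k}b")
    # Remaining indices are shifted by `short` and written with k + 1 bits,
    # which keeps the whole code prefix-free.
    return format(index + short, f"0{k + 1}b")


def average_length(n: int) -> float:
    """Average codeword length when all n indices are equally likely."""
    return sum(len(phased_in_encode(i, n)) for i in range(n)) / n


if __name__ == "__main__":
    n = 5
    print([phased_in_encode(i, n) for i in range(n)])
    # ['00', '01', '10', '110', '111']
    print(flat_code_length(n), average_length(n))
```

For a dictionary of five entries the flat code spends 3 bits on every index, while this code spends 2 bits on three of them and never exceeds 3 bits, which is the kind of saving the abstract attributes to less redundant codes.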

CiteSeerX

Warwick Research Archives Portal Repository

On a Poissonian Change-Point Model with Variable Jump Size

Author: Dachian Serguei
Yang Lin
Publication date: 24/02/2015

A model of Poissonian observation having a jump (change-point) in the intensity function is considered. Two cases are studied. The first corresponds to the situation in which the jump size converges to a non-zero limit, while in the second the limit is zero. The limiting likelihood ratios in these two cases are quite different. In the first case, as in the case of a fixed jump size, the normalized likelihood ratio converges to a log Poisson process. In the second case, the normalized likelihood ratio converges to a log Wiener process, so the statistical problems of parameter estimation and hypothesis testing are asymptotically equivalent to the well-known problems of change-point estimation and testing for the model of a signal in white Gaussian noise. The properties of the maximum likelihood and Bayesian estimators, as well as those of the general likelihood ratio, Wald's, and Bayesian tests, are deduced from the convergence of the normalized likelihood ratios. The convergence of the moments of the estimators is also established. The theoretical results are illustrated by numerical simulations.

arXiv.org e-Print Archive

HAL Clermont Université