Gene ranking and biomarker discovery under correlation
Biomarker discovery and gene ranking are standard tasks in high-throughput
genomic analysis. Typically, the ordering of markers is based on a
stabilized variant of the t-score, such as the moderated t or the SAM
statistic. However, these procedures ignore gene-gene correlations, which may
have a profound impact on the gene orderings and on the power of the subsequent
tests.
We propose a simple procedure that adjusts gene-wise t-statistics to take
account of correlations among genes. The resulting correlation-adjusted
t-scores ("cat" scores) are derived from a predictive perspective, i.e. as a
score for variable selection to discriminate group membership in two-class
linear discriminant analysis. In the absence of correlation the cat score
reduces to the standard t-score. Moreover, using the cat score it is
straightforward to evaluate groups of features (i.e. gene sets). For
computation of the cat score from small sample data we propose a shrinkage
procedure. In a comparative study comprising six different synthetic and
empirical correlation structures we show that the cat score improves estimation
of gene orderings and leads to higher power for fixed true discovery rate, and
vice versa. Finally, we also illustrate the cat score by analyzing metabolomic
data.
The shrinkage cat score is implemented in the R package "st" available from
URL http://cran.r-project.org/web/packages/st/
Comment: 18 pages, 5 figures, 1 table
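The abstract above does not give a formula, so the following is only a minimal sketch under a stated assumption: that the cat score is obtained by decorrelating the vector of t-scores with the inverse square root of the gene-gene correlation matrix P, i.e. cat = P^(-1/2) t. To stay dependency-free it handles just two genes, where P^(-1/2) has a closed form. The function name `cat_scores_two_genes` is hypothetical, and the shrinkage estimation of P mentioned in the abstract is not shown; the correlation `r` is simply passed in.

```python
import math


def cat_scores_two_genes(t1, t2, r):
    """Decorrelate two t-scores, assuming cat = P^(-1/2) t (illustrative only).

    t1, t2: gene-wise t-scores; r: correlation between the two genes, |r| < 1.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # P = [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r with eigenvectors
    # (1, 1)/sqrt(2) and (1, -1)/sqrt(2), hence P^(-1/2) = [[a, b], [b, a]].
    inv_sqrt_plus = 1.0 / math.sqrt(1.0 + r)
    inv_sqrt_minus = 1.0 / math.sqrt(1.0 - r)
    a = 0.5 * (inv_sqrt_plus + inv_sqrt_minus)
    b = 0.5 * (inv_sqrt_plus - inv_sqrt_minus)
    return a * t1 + b * t2, b * t1 + a * t2


# With r = 0 the scores are returned unchanged, matching the statement that
# the cat score reduces to the t-score in the absence of correlation.
uncorrelated = cat_scores_two_genes(2.0, -1.0, 0.0)

# Positively correlated genes with effects of the same sign are shrunk,
# while effects of opposite sign are amplified.
same_sign = cat_scores_two_genes(2.0, 2.0, 0.5)
opposite_sign = cat_scores_two_genes(2.0, -2.0, 0.5)
```

Under this assumption the squared cat scores sum to t' P^(-1) t, which is one way to see why scores for a group of features can be combined by adding their squares.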
Face recognition in different subspaces - A comparative study
Face recognition is one of the most successful applications of image analysis and understanding and has gained much attention in recent years. Among many approaches to the problem of face recognition, appearance-based subspace analysis still gives the most promising results. In this paper we study the three most popular appearance-based face recognition projection methods (PCA, LDA and ICA). All methods are tested in equal working conditions regarding preprocessing and algorithm implementation on the FERET data set with its standard tests. We also compare the ICA method with its whitening preprocess and find that there is no significant difference between them. When we compare different projections with different metrics, we find that the LDA+COS combination is the most promising for all tasks. The L1 metric gives the best results in combination with PCA and ICA1, and COS is superior to any other metric when used with LDA and ICA2. Our results are compared to other studies and some discrepancies are pointed out.
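To make the role of the metric concrete, here is a small sketch of nearest-neighbour matching with the L1 and COS metrics. It assumes the face images have already been projected into a subspace (by PCA, LDA or ICA) and are given as plain feature vectors; the projection step itself is not shown. The cosine distance is taken as one minus the cosine similarity, which is an assumption about the exact definition used in the paper, and all names and the toy vectors are hypothetical.

```python
import math


def l1_distance(x, y):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def nearest_identity(probe, gallery, distance):
    """Return the gallery label whose vector is closest to the probe."""
    return min(gallery, key=lambda name: distance(probe, gallery[name]))


# Toy projected features: subject_a points in the same direction as the probe
# but is far away; subject_b is close but points in a different direction.
gallery = {"subject_a": [10.0, 10.0], "subject_b": [1.0, 0.0]}
probe = [1.0, 1.0]

match_l1 = nearest_identity(probe, gallery, l1_distance)
match_cos = nearest_identity(probe, gallery, cosine_distance)
```

In this toy case the two metrics pick different identities, because L1 is sensitive to vector length while the cosine distance only compares directions. That difference is one reason the best metric can depend on the projection method.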
Classification of geometrical objects by integrating currents and functional data analysis. An application to a 3D database of Spanish child population
This paper focuses on the application of Discriminant Analysis to a set of
geometrical objects (bodies) characterized by currents. A current is a relevant
mathematical object to model geometrical data, like hypersurfaces, through
integration of vector fields along them. As a consequence of the choice of a
vector-valued Reproducing Kernel Hilbert Space (RKHS) as a test space to
integrate hypersurfaces, it is possible to consider that hypersurfaces are
embedded in this Hilbert space. This embedding enables us to consider
classification algorithms of geometrical objects. A method to apply Functional
Discriminant Analysis in the obtained vector-valued RKHS is given. This method
is based on the eigenfunction decomposition of the kernel. So, the novelty of
this paper is the reformulation of a size and shape classification problem in
Functional Data Analysis terms using the theory of currents and vector-valued
RKHS. This approach is applied to a 3D database obtained from an anthropometric
survey of the Spanish child population with a potential application to online
sales of children's wear.
PLS dimension reduction for classification of microarray data
PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets.
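The abstract does not say which property of the first PLS component is proven, so the sketch below should not be read as the paper's result. It only illustrates a standard fact about PLS with a single response, assuming centred but unscaled predictors: the first weight vector is proportional to X'y, and for a 0/1 class label this is proportional to the difference of the class mean vectors, which links the first component to gene-wise group differences. The function names and the toy data are hypothetical.

```python
def first_pls_weights(X, y):
    """Unnormalised first PLS weight vector: centred X transposed times centred y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [
        sum((row[j] - x_mean[j]) * (yi - y_mean) for row, yi in zip(X, y))
        for j in range(p)
    ]


def class_mean_difference(X, y):
    """Per-feature difference between the class-1 mean and the class-0 mean."""
    p = len(X[0])
    rows1 = [row for row, yi in zip(X, y) if yi == 1]
    rows0 = [row for row, yi in zip(X, y) if yi == 0]
    mean1 = [sum(r[j] for r in rows1) / len(rows1) for j in range(p)]
    mean0 = [sum(r[j] for r in rows0) / len(rows0) for j in range(p)]
    return [m1 - m0 for m1, m0 in zip(mean1, mean0)]


# Toy data: five samples, two "genes", binary class labels.
X = [[1.0, 5.0], [3.0, 5.0], [4.0, 9.0], [6.0, 7.0], [8.0, 8.0]]
y = [0, 0, 1, 1, 1]

weights = first_pls_weights(X, y)
diff = class_mean_difference(X, y)

# The proportionality constant is n1 * n0 / n.
n1, n0 = sum(y), len(y) - sum(y)
scale = n1 * n0 / len(y)
```

Ranking genes by the absolute value of these weights is therefore the same as ranking them by absolute mean difference between the classes, under the stated assumptions.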