A new regularized least squares support vector regression for gene selection
Background: Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly with the disease, and may restrict the selection of influential genes.

Results: A novel approach to gene selection is proposed based on kernel similarities and kernel weights. We do not assume uniformity for subject contribution. Weights are calculated via regularized least squares support vector regression (RLS-SVR) of class levels on kernel similarities and are used to weight subject contribution. The cumulative sums of weighted expression levels are then ranked to select responsible genes. These procedures also work for multiclass classification. We demonstrate this algorithm on acute leukemia, colon cancer, small round blue cell tumors of childhood, breast cancer, and lung cancer studies, using kernel Fisher discriminant analysis and support vector machines as classifiers. Other procedures are compared as well.

Conclusion: This approach is easy to implement and fast in computation for both binary and multiclass problems. The gene set provided by the RLS-SVR weight-based approach contains fewer genes and achieves higher accuracy than other procedures.
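The weighting step described above admits a compact sketch. Below is a minimal illustration, assuming a linear kernel and a plain ridge solve for the RLS-SVR step; the function name, the regularization value, and the label encoding are assumptions, not the paper's exact procedure.

```python
import numpy as np

def rls_svr_gene_scores(X, y, lam=1.0):
    """Sketch of the RLS-SVR weighting idea: regress class labels on
    kernel similarities, then rank genes by weighted expression sums.
    X: (n_subjects, n_genes) expression matrix; y: labels in {-1, +1}.
    The linear kernel and the ridge solve are illustrative assumptions."""
    K = X @ X.T                                      # kernel similarities between subjects
    n = K.shape[0]
    alpha = np.linalg.solve(K + lam * np.eye(n), y)  # subject weights
    scores = np.abs(X.T @ alpha)                     # weighted expression sum per gene
    return np.argsort(scores)[::-1]                  # genes ranked by influence

# usage: top_genes = rls_svr_gene_scores(X, y)[:50]
```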
Linear Time Feature Selection for Regularized Least-Squares
We propose a novel algorithm for greedy forward feature selection for
regularized least-squares (RLS) regression and classification, also known as
the least-squares support vector machine or ridge regression. The algorithm,
which we call greedy RLS, starts from the empty feature set, and on each
iteration adds the feature whose addition provides the best leave-one-out
cross-validation performance. Our method is considerably faster than the
previously proposed ones, since its time complexity is linear in the number of
training examples, the number of features in the original data set, and the
desired size of the set of selected features. Therefore, as a side effect we
obtain a new training algorithm for learning sparse linear RLS predictors which
can be used for large-scale learning. This speed is made possible by matrix-calculus shortcuts for leave-one-out evaluation and feature addition. We
experimentally demonstrate the scalability of our algorithm and its ability to
find good-quality feature sets.
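The leave-one-out shortcut that makes such selection feasible is the standard hat-matrix identity for RLS. The sketch below uses that identity inside a naive greedy loop; the paper's linear-time algorithm replaces the explicit solve with incremental matrix-calculus updates, which this illustration omits.

```python
import numpy as np

def loo_error(Xs, y, lam=1.0):
    """Leave-one-out MSE for RLS via the hat-matrix shortcut:
    the LOO residual is (y_i - yhat_i) / (1 - H_ii), so no refit
    per held-out example is needed."""
    H = Xs @ np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T)
    resid = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(resid ** 2)

def greedy_rls(X, y, k, lam=1.0):
    """Illustrative greedy forward selection: on each iteration add the
    feature whose inclusion gives the best LOO score. (This naive version
    refits per candidate; the paper's version avoids that cost.)"""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        errs = [(loo_error(X[:, selected + [j]], y, lam), j) for j in remaining]
        best = min(errs)[1]
        selected.append(best)
        remaining.remove(best)
    return selected
```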
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data
Gene expression analysis aims at identifying the genes able to accurately
predict biological parameters like, for example, disease subtyping or
progression. While accurate prediction can be achieved by means of many
different techniques, gene identification, due to gene correlation and the
limited number of available samples, is a much more elusive problem. Small
changes in the expression values often produce different gene lists, and
solutions which are both sparse and stable are difficult to obtain. We propose
a two-stage regularization method able to learn linear models characterized by
a high prediction performance. By varying a suitable parameter, these linear
models allow one to trade sparsity for the inclusion of correlated genes and to
produce gene lists that are almost perfectly nested. Experimental results on
synthetic and microarray data confirm the attractive properties of the
proposed method and its potential as a starting point for further biological
investigation.
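As a rough illustration of the sparsity-versus-correlation trade-off and the two-stage structure, the sketch below pairs an elastic-net selection stage with a ridge refit on the selected genes; the sklearn solvers, the parameter grid, and all names here are assumptions standing in for the paper's own procedure.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

def two_stage_gene_lists(X, y, eps_grid=(0.1, 0.5, 0.9), alpha=0.1):
    """Sketch in the spirit of the abstract: stage 1 selects genes with an
    elastic-net penalty (shifting weight from l1 to l2 trades sparsity for
    the inclusion of correlated genes, yielding roughly nested lists);
    stage 2 refits a ridge model on the selected genes."""
    lists = {}
    for eps in eps_grid:
        enet = ElasticNet(alpha=alpha, l1_ratio=1.0 - eps).fit(X, y)
        genes = np.flatnonzero(enet.coef_)           # selected gene indices
        model = Ridge(alpha=1.0).fit(X[:, genes], y) if genes.size else None
        lists[eps] = (genes, model)
    return lists  # larger eps -> more l2 weight -> larger, correlated lists
```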
Regularization Paths for Generalized Linear Models via Coordinate Descent
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include ℓ_1 (the lasso), ℓ_2 (ridge regression), and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
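The core coordinate-wise update is a soft-thresholding step. Here is a minimal sketch of the naive cyclical update for the elastic net, assuming standardized features; it illustrates the idea rather than the optimized glmnet implementation.

```python
import numpy as np

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0): the lasso shrinkage operator."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=100):
    """Minimal cyclical coordinate descent for the elastic net,
    assuming columns of X are standardized (mean 0, variance 1)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                                  # current residual
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + beta[j]           # partial-residual correlation
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1 - alpha))
            r += X[:, j] * (beta[j] - new)            # keep residual in sync
            beta[j] = new
    return beta
```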
Adaptive Huber Regression
Big data can easily be contaminated by outliers or contain variables with
heavy-tailed distributions, which makes many conventional methods inadequate.
To address this challenge, we propose the adaptive Huber regression for robust
estimation and inference. The key observation is that the robustification
parameter should adapt to the sample size, dimension and moments for optimal
tradeoff between bias and robustness. Our theoretical framework deals with
heavy-tailed distributions with bounded (1+δ)-th moment for any δ > 0. We establish a sharp phase transition for robust estimation of regression
parameters in both low and high dimensions: when δ ≥ 1, the estimator
admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on
the data, while only a slower rate is available in the regime 0 < δ < 1.
Furthermore, this transition is smooth and optimal. In addition, we extend the
methodology to allow both heavy-tailed predictors and observation noise.
Simulation studies lend further support to the theory. In a genetic study of
cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown
to be more robust and predictive.
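A minimal sketch of the idea follows: gradient descent on the Huber loss with a robustification parameter that grows with the sample size, so the estimator trades a little bias for heavy-tail robustness. The sqrt(n / log n) scaling and the crude scale estimate below are illustrative assumptions, not the paper's exact calibration.

```python
import numpy as np

def huber_grad(resid, tau):
    """Derivative of the Huber loss: linear inside [-tau, tau] and clipped
    outside, which caps each observation's influence."""
    return np.clip(resid, -tau, tau)

def adaptive_huber_regression(X, y, n_iter=500, lr=0.1):
    """Sketch of adaptive Huber regression by gradient descent; assumes
    roughly standardized features. tau grows with the sample size; the
    sqrt(n / log n) rule here is an illustrative stand-in for the paper's
    adaptive choice."""
    n, d = X.shape
    beta = np.zeros(d)
    sigma = np.std(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])  # crude noise scale
    tau = sigma * np.sqrt(n / np.log(n))                          # robustification parameter
    for _ in range(n_iter):
        resid = y - X @ beta
        beta += lr * X.T @ huber_grad(resid, tau) / n             # gradient step
    return beta
```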