
Brunel University Research Archive

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Author: Chen Lei
Chen Zhongxue
Deng Youping
Huang Xudong
Liu Jianzhong
Liu Qingzhong
Qiao Mengyu
Sung Andrew H
Wang Zhaohui
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Microarray data have a large number of variables and a small sample size. In microarray data analysis, two important issues are how to choose genes that provide reliable and accurate prediction of disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean that information redundancy can be exploited to potentially reduce classification cost in terms of time and money. Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. Using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions: On average, with popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed the other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy, and that Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

Scholarly Works @ SHSU (Sam Houston State University)

Harvard University - DASH

Springer - Publisher Connector

Public Library of Science (PLOS)

Feature selection for chemical sensor arrays using mutual information

Author: A Kraskov
A Krause
A Rakotomamonjy
A Vergara
Amalia Z. Berna
AZ Berna
B Nelson
B Raman
C Cortes
C Guestrin
CC Chang
CE Shannon
E Llobet
H Dacres
H Koinuma
H Peng
H Zheng
I Guyon
I Rodriguez-Lujan
IJ Myung
J Fonollosa
J Gardner
James P. Brody
Joseph T. Lizier
L Breiman
L Olsson
M Aleixandre
M Pardo
Mikhail Prokopenko
MK Muezzinoglu
N Friedman
R Battiti
R Binions
S Marco
S Martínez
S Pashami
Stephen C. Trowell
T Nowotny
TC Pearce
Thomas Nowotny
TM Cover
X. Rosalind Wang
XR Wang
Y Saeys
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established the best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
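
As a rough illustration of the filter idea in the abstract above, the following sketch scores each feature by its mutual information with the class label and ranks the features by that score. It is a simplification under stated assumptions, not the authors' procedure: the features are assumed to be already discretized, each feature is scored individually rather than as a set, and the names `mutual_information`, `rank_features`, `sensor_1` and `sensor_2` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature names sorted by decreasing MI with the labels, score dict)."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (made up for illustration): two discretized sensor features, two chemicals.
labels = ["a", "a", "b", "b"]
columns = {
    "sensor_1": [0, 0, 1, 1],  # determines the label exactly
    "sensor_2": [0, 1, 0, 1],  # statistically independent of the label
}
order, scores = rank_features(columns, labels)
```

Because no classifier is trained during scoring, a filter of this kind is cheap compared with a wrapper that evaluates a classifier on every candidate feature combination.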

Western Sydney ResearchDirect

Sussex Research Online

FigShare

Optimized parameter search for large datasets of the regularization parameter and feature selection for ridge regression

Author: AN Tikhonov
B Efron
Benjamin Schrauwen
D Allen
David Verstraeten
F Ojeda
G Cawley
G Golub
G Huang
I Guyon
J Sherman
J Suykens
Joni Dambre
K Brabanter De
K Pelckmans
KA Toh
Ken Caluwaerts
M Lukoševičius
P Buteneers
Pieter Buteneers
W Press
X Dutoit
Y Miche
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets and the computational cost does not depend on the size of the training set. We extend this algorithm to forward or backward feature selection in which the optimal regularization parameter is selected for each possible feature set. These feature selection algorithms yield solutions with a sparse weight matrix using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of the order with the number of applied regularization parameters, the number of folds in the validation set, the number of input features and the number of data samples in the training set. Our implementation has a computational complexity of the order . This computational cost is smaller than that of regression without regularization for large datasets and is independent of the number of applied regularization parameters and the size of the training set. Combined with a feature selection algorithm the algorithm is of complexity and for forward and backward feature selection respectively, with the number of selected features and the number of removed features. This is an order faster than and for the naive implementation, with for large datasets. To show the performance and reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation and extreme learning machines.
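
One way a cross-validated search over the regularization parameter can avoid revisiting the training samples is to accumulate per-fold sufficient statistics once and evaluate every candidate parameter from those statistics alone. The sketch below shows this for a single input feature, where the matrix inverse reduces to a division. It is an assumption-laden illustration, not the paper's algorithm, and the names `fold_statistics`, `cv_error` and `best_lambda` are hypothetical.

```python
def fold_statistics(xs, ys, n_folds):
    """One pass over the data: per-fold sums of x*x, x*y and y*y."""
    stats = [[0.0, 0.0, 0.0] for _ in range(n_folds)]
    for i, (x, y) in enumerate(zip(xs, ys)):
        s = stats[i % n_folds]  # round-robin fold assignment (an arbitrary choice)
        s[0] += x * x
        s[1] += x * y
        s[2] += y * y
    return stats


def cv_error(stats, lam):
    """Summed squared validation error for ridge parameter lam, from fold statistics only."""
    total = [sum(s[j] for s in stats) for j in range(3)]
    err = 0.0
    for sxx, sxy, syy in stats:
        # Ridge weight trained on all other folds: w = sum(xy) / (sum(xx) + lam)
        w = (total[1] - sxy) / (total[0] - sxx + lam)
        # sum((w*x - y)^2) over the held-out fold, expanded in terms of the sums
        err += w * w * sxx - 2.0 * w * sxy + syy
    return err


def best_lambda(xs, ys, lambdas, n_folds=2):
    """Pick the candidate lam with the lowest cross-validation error."""
    stats = fold_statistics(xs, ys, n_folds)
    return min(lambdas, key=lambda lam: cv_error(stats, lam))


# Toy data (made up): y = 2x exactly, so no regularization is needed.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
chosen = best_lambda(xs, ys, [0.0, 1.0, 10.0])
```

After the single accumulation pass, the cost of trying another candidate parameter depends only on the number of folds, not on the number of samples.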

Ghent University Academic Bibliography

ofw: An R package to select continuous variables for multiclass classification with a stochastic wrapper method

Author: Chabrier Patrick
Le Cao Kim-Anh
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2008

When dealing with high-dimensional, low-sample-size data, feature selection is often needed to reduce the dimension of the variable space while optimizing the classification task. Few tools exist for selecting variables in such data sets, especially when classes are numerous (> 2). We have developed ofw, an R package that implements, in the context of classification, the meta-algorithm "optimal feature weighting". We focus on microarray data, although the method can be applied to any p >> n problem with continuous variables. The aim is to select relevant variables and to numerically evaluate the resulting variable selection. Two versions are proposed, applying supervised multiclass classifiers such as classification and regression trees and support vector machines. Furthermore, a weighted approach can be chosen to deal with unbalanced classes, a common characteristic of microarray data sets.

Scientific Publications of the University of Toulouse II Le Mirail

Journal of Statistical Software

HAL-INSA Toulouse

ProdInra

University of Queensland eSpace

Acoustic and Device Feature Fusion for Load Recognition

Author: Gluhak Alexander
Imran Muhammad Ali
Nati Michele
Rajasegarar Sutharshan
Zoha Ahmed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Appliance-specific Load Monitoring (LM) provides a possible solution to the problem of energy conservation, which is becoming increasingly challenging due to growing energy demands within offices and residential spaces. It is essential to perform automatic appliance recognition and monitoring for optimal resource utilization. In this paper, we study the use of non-intrusive LM methods that rely on steady-state appliance signatures for classifying the most commonly used office appliances, while demonstrating their limitation in accurately discerning low-power devices due to overlapping load signatures. We propose a multilayer decision architecture that uses audio features derived from device sounds and fuses them with load signatures acquired from an energy meter. For the recognition of device sounds, we perform feature set selection by evaluating combinations of time-domain and FFT-based audio features on state-of-the-art machine learning algorithms. Support vector machines show the highest recognition performance in both the device and audio recognition experiments. Further, we demonstrate that our proposed feature set, a concatenation of device audio features and load signatures, significantly improves device recognition accuracy compared with the use of steady-state load signatures only.

Deakin Research Online

University of Surrey

Enlighten

Surrey Research Insight

Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study

Author: Baumgartner Christian
Dehmer Matthias
Graber Armin
Kusonmano Kanthida
Liedl Klaus R.
Netzer Michael
Publication venue: The Scientific World Journal
Publication date: 01/01/2012

A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of widely applied classifiers: support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying pool sizes, also considering the effects of feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms the other investigated algorithms, while accuracy levels are comparable among the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that best meets experimental objectives and budgetary conditions, including time constraints.

AJOL - African Journals Online

Optimization of RBF-SVM hyperparameters using genetic algorithm for face recognit

Author: Ibrahim Y.
Okafor E.
Yahaya B.
Publication venue: 'African Journals Online (AJOL)'
Publication date: 24/03/2021

Manual grid-search tuning of machine learning hyperparameters is very time-consuming. To curb this problem, we propose the use of a genetic algorithm (GA) for the selection of optimal radial-basis-function support vector machine (RBF-SVM) hyperparameters: the regularization parameter C and the cost-factor γ. The resulting optimal parameters were used during the training of face recognition models. To train the models, we independently extracted features from the ORL face image dataset using local binary patterns (handcrafted) and deep learning architectures (pretrained variants of VGGNet). The resulting features were passed as input to either a linear SVM or the optimized RBF-SVM. The results show that models from the optimized RBF-SVM combined with deep learning or handcrafted features surpass models obtained from the linear SVM combined with the same features in most of the data splits. The study demonstrated that it is profitable to optimize the hyperparameters of an SVM to obtain the best classification performance. Keywords: Face Recognition, Feature Extraction, Local Binary Patterns, Transfer Learning, Genetic Algorithm and Support Vector Machines

The Stock Trend Prediction Using Volume Weighted Support Vector Machine with a Hybrid Feature Selection Method to Predict the Stock Price Trend in Tehran Stock Exchange

Author: Nahid Dana
Saeed Bajalan
Saeed Fallahpour
Publication venue: 'Geophysical Center of the Russian Academy of Sciences'
Publication date: 01/11/2016

In this study, a prediction model based on support vector machines (SVM), improved by adding a volume-weighted penalty function, was introduced to increase the accuracy of forecasting short-term trends on the stock market and to develop an optimal trading strategy. Along with the VW-SVM classifier, a hybrid feature selection method was used to select the optimal feature subset, consisting of F-score as the filter part and supported sequential forward selection as the wrapper part. To verify the capability of the proposed model in predicting short-term trends, a trading strategy was developed. The model input included several technical indicators and statistical measures calculated for 10 selected stocks from the Tehran Stock Exchange. The results show that the VW-SVM, combined with the hybrid feature selection method, significantly increases the profitability of the proposed strategy compared to rival strategies, in terms of both the overall rate of return and the maximum drawdown during the trading period.
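
The filter stage mentioned in the abstract above can be sketched as follows, assuming "F-score" refers to the widely used two-class discriminative score (between-class separation of the feature means divided by the sum of within-class sample variances). The abstract does not spell out the formula, so this reading is an assumption, and the function name `f_score` and the toy values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature.

    pos, neg: values of the feature for the positive and negative class.
    Each class needs at least two values so the sample variance is defined.
    """
    overall = mean(list(pos) + list(neg))
    # Between-class separation: squared distance of each class mean from the overall mean
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class spread: sample variances (n - 1 denominator)
    within = variance(pos) + variance(neg)
    return between / within


# Toy indicator values (made up): the first feature separates up-trend days
# from down-trend days, the second does not.
separating = f_score([3.0, 5.0], [1.0, 1.0])
uninformative = f_score([1.0, 3.0], [1.0, 3.0])
```

In a hybrid scheme of the kind the abstract describes, features with low scores would be discarded before the more expensive wrapper search runs on the remainder.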