
    Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree With Gene Selection

We investigate the problem of multiclass cancer classification with gene selection from gene expression data. Two multiclass classifiers with built-in gene selection are proposed: a fuzzy support vector machine (FSVM) and a binary classification tree based on SVMs. Using the F-test and SVM-based recursive feature elimination (SVM-RFE) as gene selection methods, we test three combinations in our experiments: the SVM-based classification tree with the F-test, the SVM-based classification tree with SVM-RFE, and FSVM with SVM-RFE. To accelerate computation, the strongest genes are preselected. The proposed techniques are applied to breast cancer, small round blue-cell tumor, and acute leukemia data. Compared with existing multiclass cancer classifiers and with the two SVM-based classification tree variants examined in this paper, FSVM with SVM-RFE finds the genes most relevant to particular cancer types while achieving high recognition accuracy.
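As a hedged illustration of the gene-selection step this abstract builds on, here is a minimal SVM-RFE sketch in Python with scikit-learn; the synthetic data and the parameter choices (sample counts, number of genes kept, elimination step) are my assumptions, not the authors' setup.

```python
# Minimal sketch of SVM-RFE gene selection. Synthetic data stands in
# for a gene-expression matrix; all sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in: 200 samples x 500 "genes", 3 cancer classes.
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM ranks genes by weight magnitude; RFE repeatedly drops
# the weakest 10% of the remaining genes until 20 are left.
svm = LinearSVC(C=1.0, max_iter=10000)
rfe = RFE(estimator=svm, n_features_to_select=20, step=0.1)
rfe.fit(X_train, y_train)

print("selected gene indices:", np.where(rfe.support_)[0])
print("test accuracy on selected genes:", rfe.score(X_test, y_test))
```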

    Large Margin Distribution Machine Recursive Feature Elimination

We gratefully thank Dr Teng Zhang and Prof Zhi-Hua Zhou for providing the source code of "LDM" and for their kind technical assistance. This work is supported by the National Natural Science Foundation of China (Nos. 61472159, 61572227) and the Development Project of Jilin Province of China (Nos. 20160204022GX, 2017C033). This work is also partially supported by the 2015 Scottish Crucible Award funded by the Royal Society of Edinburgh and the 2016 PECE bursary provided by the Scottish Informatics & Computer Science Alliance (SICSA).

    Random Forests Based Rule Learning And Feature Elimination

Much research combines data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources, so an effective algorithm that can simultaneously extract decision rules and select critical features, offering good interpretability while preserving predictive performance, is valuable. We propose an efficient approach that combines rule extraction and feature elimination, based on 1-norm regularized random forests. The approach simultaneously extracts a small number of rules generated by random forests and selects the important features. To evaluate it, we applied it to several drug activity prediction data sets, microarray data sets, a seacoast chemical sensor data set, a Stockori flowering time data set, and three data sets from the UCI repository. The approach performs comparably to state-of-the-art prediction algorithms such as random forests while generating only a small number of decision rules, some of which are significant for the problem being studied. It shows high potential for both prediction and interpretation in real applications.
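The abstract does not give implementation details, so the following Python sketch only approximates the idea: treat each random-forest leaf as a candidate rule, encode rule activations as binary features, and let an L1 (1-norm) penalty keep a small subset. The encoding via leaf indices and all parameters are my assumptions, not the paper's method.

```python
# Hedged approximation of rule extraction + 1-norm selection:
# (1) grow a small forest, (2) treat each leaf as a candidate rule,
# (3) keep few rules via L1-regularized logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=300, n_features=30, random_state=0)

forest = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0)
forest.fit(X, y)

# apply() maps each sample to the leaf (rule) it lands in, per tree.
leaves = forest.apply(X)                       # shape: (n_samples, n_trees)
rules = OneHotEncoder().fit_transform(leaves)  # binary rule-activation matrix

# The L1 penalty zeroes out most rule weights, leaving a small rule set.
selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector.fit(rules, y)
kept = np.flatnonzero(selector.coef_[0])
print(f"kept {kept.size} of {rules.shape[1]} candidate rules")
```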

    Assessment of SVM Reliability for Microarray Data Analysis

The goal of our research is to provide techniques that can assess and validate the results of SVM-based analysis of microarray data. We present preliminary results on the effect of mislabeled training samples. In several systematic experiments on artificial and real medical data, we flipped the labels of a fraction of the training data before training SVMs. We show that a relatively small number of mislabeled examples can dramatically decrease performance, as visualized on ROC graphs. This phenomenon persists even when the dimensionality of the input space is drastically reduced, for example by feature selection. Moreover, we show that for SVM recursive feature elimination, even a small fraction of mislabeled samples can completely change the resulting set of genes. This work is an extended version of a previous paper [MBN04].
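A minimal sketch of the label-flipping experiment described here, under assumed data and parameters (synthetic high-dimensional data in place of microarrays, a linear-kernel SVM, and arbitrary flip fractions):

```python
# Sketch of the mislabeling experiment: flip a fraction of training
# labels and watch test ROC AUC degrade. Data/parameters are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=1000,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for flip_frac in (0.0, 0.05, 0.10, 0.20):
    y_noisy = y_tr.copy()
    idx = rng.choice(len(y_tr), size=int(flip_frac * len(y_tr)),
                     replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]        # flip the selected binary labels
    clf = SVC(kernel="linear").fit(X_tr, y_noisy)
    auc = roc_auc_score(y_te, clf.decision_function(X_te))
    print(f"flipped {flip_frac:.0%}: test AUC = {auc:.3f}")
```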

    Backward Sequential Feature Elimination And Joining Algorithms In Machine Learning

The Naïve Bayes model is a special case of Bayesian networks with strong independence assumptions, typically used for classification. It is trained on the given data to estimate the parameters necessary for classification, and it is popular because it is simple yet efficient and accurate. While Naïve Bayes is considered accurate on most problem instances, there is a set of problems for which it does not give accurate results compared with other classifiers such as decision tree algorithms; one likely reason is its strong independence assumption. This project searches for dependencies between features and studies the consequences of exploiting those dependencies when classifying instances. We propose two algorithms, Backward Sequential Joining and Backward Sequential Elimination, that can be applied to improve the accuracy of the Naïve Bayes model. We then compare the accuracies of the different algorithms and draw conclusions from the results.
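A minimal sketch of Backward Sequential Elimination wrapped around a Naïve Bayes classifier; the greedy first-improvement criterion, the cross-validation scoring, and the dataset are my assumptions rather than the project's exact procedure.

```python
# Hedged sketch of Backward Sequential Elimination (BSE) for Naive Bayes:
# greedily drop a feature whenever its removal improves CV accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
kept = list(range(X.shape[1]))
best = cross_val_score(GaussianNB(), X, y, cv=5).mean()

improved = True
while improved and len(kept) > 1:
    improved = False
    for f in list(kept):
        trial = [k for k in kept if k != f]
        score = cross_val_score(GaussianNB(), X[:, trial], y, cv=5).mean()
        if score > best:          # removal helps: commit and rescan
            best, kept, improved = score, trial, True
            break

print("kept features:", kept, "CV accuracy:", round(best, 3))
```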

    Does Gaussian Elimination Teach About Uninterpretable Feature Elimination in CHL?

Are uninterpretable features (uF) similar to vanishing unknowns in linear algebra? Does Gaussian elimination (GE) teach us anything about uF elimination (uFE) in the computational procedures of human natural language (CHL)? It may teach us nothing; it may even be pointless to compare GE to uFE. To answer the question, we compare GE with uFE, raise questions that have not been posed seriously before, and perform a toy experiment of linear-algebraic uFE with unknowns weighted by coefficients and constants that express the ratio (symmetry degree) between the probe (P) and the goal (G). The three types of solvability in GE parallel those in uFE: ① a unique solution (successful GE), ② no solution (failed GE), and ③ infinitely many solutions (failed GE) respectively correspond to ① complete AGREE, ② incomplete AGREE, and ③ internal merge. ③ is recycled in uFE, but not in GE. The experiment answers (or deepens) many uFE puzzles. We also investigate whether graph theory teaches us anything about P, G, and sentence structures.
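For concreteness, the three GE solvability types the abstract maps onto ① complete AGREE, ② incomplete AGREE, and ③ internal merge can be illustrated with small 2x2 systems; these examples are my own, not from the paper.

```latex
% Three solvability types of a linear system under Gaussian elimination
% (illustrative examples, not taken from the paper).
\begin{align*}
\text{(1) unique:}   \quad & x + y = 2,\; x - y = 0  && \Rightarrow (x, y) = (1, 1) \\
\text{(2) none:}     \quad & x + y = 2,\; x + y = 3  && \Rightarrow 0 = 1 \text{ (inconsistent)} \\
\text{(3) infinite:} \quad & x + y = 2,\; 2x + 2y = 4 && \Rightarrow x = 2 - t,\; y = t
\end{align*}
```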