1,577 research outputs found

    Embedded Feature Ranking for Ensemble MLP Classifiers

    Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Feature selection is essential in the medical field; however, the process is complicated by censoring, the defining characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, although machine learning classifiers are preferred. Such classifiers are rarely employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that have employed machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that handles censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased predictions, especially in highly censored data. Other methods cannot deal with high censoring. This article therefore proposes a new hybrid feature selection method that offers a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce the features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
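
    The difference between simple and weighted majority voting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the weights below are hypothetical placeholders standing in for whatever survival metric the authors use, and the function names are invented for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine the labels predicted by several classifiers for one patient.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> non-negative weight
                 (hypothetical; assumed to come from a survival metric)
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    # Largest total weight wins; ties go to the smallest label so the
    # result is deterministic.
    return min(scores, key=lambda label: (-scores[label], label))


def simple_majority_vote(predictions):
    """Simple majority voting is the special case of equal weights."""
    return weighted_majority_vote(predictions, {n: 1.0 for n in predictions})


# Illustrative values only, not taken from the paper.
weights = {"svm": 0.3, "nn": 0.9, "knn": 0.4}
votes = {"svm": 0, "nn": 1, "knn": 0}  # 1 = high risk, 0 = low risk

simple_result = simple_majority_vote(votes)
weighted_result = weighted_majority_vote(votes, weights)
```

    With these placeholder values the two schemes disagree: two of the three classifiers vote for class 0, but the single classifier voting for class 1 carries more weight than the other two combined.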

    Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

    This study evaluates several feature ranking techniques together with some classifiers based on machine learning to identify relevant factors regarding the probability of contracting breast cancer and improve the performance of risk prediction models for breast cancer in a healthy population. The dataset with 919 cases and 946 controls comes from the MCC-Spain study and includes only environmental and genetic features. Breast cancer is a major public health problem. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Likewise, quantifying the stability of feature selection methods becomes essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of performance for a set of predictive models. Furthermore, their robustness is quantified to analyze both the similarity between the feature selection rankings and their own stability. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC) metric. The top 47 ranked features obtained with this approach, fed to the Logistic Regression classifier, achieve an AUC of 0.616. This is an improvement of 5.8% in comparison with the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable (as did Random Forest), whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that the stability and performance of the model should be studied together: Random Forest and SVM-RFE turned out to be the most stable algorithms, but in terms of model performance SVM-RFE outperforms Random Forest.
    The study was partially funded by the “Accion Transversal del Cancer”, approved on the Spanish Ministry Council on the 11th October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI08/0533, PI08/1359, PS09/00773, PS09/01286, PS09/01903, PS09/02078, PS09/01662, PI11/01403, PI11/01889, PI11/00226, PI11/01810, PI11/02213, PI12/00488, PI12/00265, PI12/01270, PI12/00715, PI12/00150), by the Fundación Marqués de Valdecilla (API 10/09), by the ICGC International Cancer Genome Consortium CLL, by the Junta de Castilla y León (LE22A10-2), by the Consejería de Salud of the Junta de Andalucía (PI-0571), by the Conselleria de Sanitat of the Generalitat Valenciana (AP 061/10), by the Recercaixa (2010ACUP 00310), by the Regional Government of the Basque Country, by European Commission grants FOOD-CT-2006-036224-HIWATE, by the Spanish Association Against Cancer (AECC) Scientific Foundation, and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona), which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036), as well as at the Public Health Laboratory from Gipuzkoa and the Basque Biobank. Furthermore, sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d’Oncologia de Catalunya (XBTC). Biological samples were stored at the “Biobanco La Fe”, which is supported by Instituto de Salud Carlos III (RD 09 0076/00021), and FISABIO biobanking, which is supported by Instituto de Salud Carlos III (RD09 0076/00058).
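
    The abstract of this entry speaks of quantifying the stability of feature rankings without naming the measure used. One common way to do this, shown here purely as an assumption, is the mean pairwise Jaccard similarity of the top-k feature sets obtained from different resamples. The feature names and rankings below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_k_stability(rankings, k):
    """Mean pairwise Jaccard similarity of the top-k features.

    rankings: list of rankings, each a list of feature names ordered
              from most to least important (one ranking per resample).
    """
    if len(rankings) < 2:
        raise ValueError("need at least two rankings to compare")
    tops = [ranking[:k] for ranking in rankings]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings from three resamples of the same data.
stable_rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
]
unstable_rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f4", "f3", "f2", "f1"],
    ["f2", "f4", "f1", "f3"],
]

stable_score = top_k_stability(stable_rankings, k=2)
unstable_score = top_k_stability(unstable_rankings, k=2)
```

    A score near 1 means the same features keep reaching the top k regardless of the resample; a score near 0 means the selected subset changes almost entirely from run to run.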

    Astrophysical Data Analytics based on Neural Gas Models, using the Classification of Globular Clusters as Playground

    In Astrophysics, the identification of candidate Globular Clusters through deep, wide-field, single band HST images is a typical data analytics problem, where methods based on Machine Learning have revealed a high efficiency and reliability, demonstrating the capability to improve the traditional approaches. Here we experimented with some variants of the known Neural Gas model, exploring both supervised and unsupervised paradigms of Machine Learning, on the classification of Globular Clusters extracted from the NGC1399 HST data. The main focus of this work was to use a well-tested playground to scientifically validate such models for further extended experiments in astrophysics, using other standard Machine Learning methods (for instance Random Forest and Multi Layer Perceptron neural network) for a comparison of performance in terms of purity and completeness.
    Comment: Proceedings of the XIX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL 2017), Moscow, Russia, October 10-13, 2017, 8 pages, 4 figures
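
    Purity and completeness, the two metrics named in this abstract, are usually defined like precision and recall: purity is the fraction of objects labeled as the target class that really belong to it, and completeness is the fraction of true members that were recovered. A minimal sketch under that assumption follows; the labels are hypothetical and not taken from the NGC1399 data.

```python
def purity_completeness(y_true, y_pred, positive=1):
    """Return (purity, completeness) for the given positive class.

    purity       = TP / (TP + FP)  (same as precision)
    completeness = TP / (TP + FN)  (same as recall)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    purity = tp / (tp + fp) if tp + fp else 0.0
    completeness = tp / (tp + fn) if tp + fn else 0.0
    return purity, completeness


# Hypothetical labels: 1 = globular cluster, 0 = other source.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

purity, completeness = purity_completeness(y_true, y_pred)
```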

    DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection

    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.
    Android malware has continued to grow in volume and complexity, posing significant threats to the security of mobile devices and the services they enable. This has prompted increasing interest in employing machine learning to improve Android malware detection. In this paper, we present a novel classifier fusion approach based on a multilevel architecture that enables effective combination of machine learning algorithms for improved accuracy. The framework (called DroidFusion) generates a model by training base classifiers at a lower level and then applies a set of ranking-based algorithms to their predictive accuracies at the higher level in order to derive a final classifier. The induced multilevel DroidFusion model can then be utilized as an improved accuracy predictor for Android malware detection. We present experimental results on four separate datasets to demonstrate the effectiveness of our proposed approach. Furthermore, we demonstrate that the DroidFusion method can also effectively enable the fusion of ensemble learning algorithms for improved accuracy. Finally, we show that the prediction accuracy of DroidFusion, despite only utilizing a computational approach in the higher level, can outperform stacked generalization, a well-known classifier fusion method that employs a meta-classifier approach in its higher level.
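
    The general idea of deriving a final classifier from a ranking of base classifiers can be sketched as follows. This is not DroidFusion's actual set of ranking algorithms, which the abstract does not specify; it is a simplified stand-in in which each base classifier receives a weight equal to its rank by validation accuracy. All classifier names and accuracy values are hypothetical.

```python
def rank_weights(accuracies):
    """Map classifier name -> rank-based weight (worst = 1, best = n).

    accuracies: dict mapping classifier name -> validation accuracy.
    Ties in accuracy are broken by name so the ranking is deterministic.
    """
    ordered = sorted(accuracies, key=lambda name: (accuracies[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def fuse(predictions, weights):
    """Rank-weighted vote over binary labels (1 = malware, 0 = benign)."""
    malware = sum(w for name, w in weights.items() if predictions[name] == 1)
    benign = sum(w for name, w in weights.items() if predictions[name] == 0)
    return 1 if malware > benign else 0


# Hypothetical validation accuracies of four base classifiers.
val_acc = {"tree": 0.91, "rules": 0.88, "bayes": 0.84, "knn": 0.95}
weights = rank_weights(val_acc)

# The two best-ranked classifiers flag the app; the two weakest do not.
preds = {"tree": 1, "rules": 0, "bayes": 0, "knn": 1}
decision = fuse(preds, weights)
```

    In this toy case an unweighted vote would be tied at two against two, while the rank-based weights let the more accurate base classifiers decide. Unlike stacked generalization, no meta-classifier is trained at the higher level.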

    Identifying hazardousness of sewer pipeline gas mixture using classification methods: a comparative study

    In this work, we formulated a real-world problem related to sewer pipeline gas detection using classification-based approaches. The primary goal of this work was to identify the hazardousness of a sewer pipeline in order to offer safe, non-hazardous access to sewer pipeline workers, so that human fatalities, which occur due to toxic exposure to sewer gas components, can be avoided. The dataset, acquired through laboratory tests, experiments, and various literature sources, was organized to design a predictive model able to classify hazardous and non-hazardous situations in a sewer pipeline. To design such a prediction model, several classification algorithms were used, and their performances were evaluated and compared, both empirically and statistically, over the collected dataset. In addition, the performances of several ensemble methods were analyzed to understand the extent of improvement offered by these methods. The results of this comprehensive study showed that the instance-based learning algorithm performed better than many other algorithms, such as multilayer perceptron, radial basis function network, support vector machine, and reduced pruning tree. Similarly, it was observed that the multi-scheme ensemble approach enhanced the performance of the base predictors.