Ensembles of wrappers for automated feature selection in fish age classification
In feature selection, the most important features must be chosen so as to reduce their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features in fish age classification. The effectiveness of this procedure has been tested on an Atlantic cod database for several powerful statistical learning classifiers. The subsets based on the few features selected, e.g. otolith weight and fish weight, are particularly noteworthy given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which manual feature selection was performed. Peer reviewed. Postprint (author's final draft).
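The abstract does not say how the individual wrappers are combined. As a minimal sketch, assuming a simple frequency vote over the subsets each wrapper returns, the aggregation step could look like the following. The function name, the `min_votes` threshold, and the example subsets are hypothetical; only otolith weight and fish weight are named in the abstract.

```python
from collections import Counter


def vote_on_features(subsets, min_votes):
    """Keep every feature that at least `min_votes` wrappers selected."""
    # set() guards against a wrapper listing the same feature twice.
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Hypothetical outputs of three wrapper runs (illustrative, not from the paper).
wrapper_subsets = [
    ["otolith_weight", "fish_weight", "fish_length"],
    ["otolith_weight", "fish_weight"],
    ["otolith_weight", "otolith_area"],
]

selected = vote_on_features(wrapper_subsets, min_votes=2)
print(selected)  # ['fish_weight', 'otolith_weight']
```

Raising `min_votes` yields smaller, more conservative subsets; other aggregation rules (for example, ranking by mean wrapper score) are equally plausible.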
The Superiority of the Ensemble Classification Methods: A Comprehensive Review
Modern technologies, which are characterized by cyber-physical systems and the internet of things, expose organizations to big data, which in turn can be processed to derive actionable knowledge. Machine learning techniques have been widely employed in both supervised and unsupervised settings in an effort to develop systems capable of making feasible decisions in light of past data. To enhance the accuracy of supervised learning algorithms, various classification-based ensemble methods have been developed. Herein, we review the superiority exhibited by ensemble learning algorithms based on past work carried out over the years. Moreover, we compare and discuss the common classification-based ensemble methods, with an emphasis on the boosting and bagging ensemble-learning models. We conclude by setting out the superiority of the ensemble learning models over individual base learners. Keywords: Ensemble, supervised learning, Ensemble model, AdaBoost, Bagging, Randomization, Boosting, Strong learner, Weak learner, classifier fusion, classifier selection, Classifier combination. DOI: 10.7176/JIEA/9-5-05 Publication date: August 31st 2019
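To make the bagging idea concrete, here is a minimal sketch, not taken from the review itself: each base learner is a one-dimensional decision stump trained on a bootstrap resample, and the ensemble predicts by majority vote. The toy data, function names, and the choice of stumps as weak learners are all assumptions made for illustration; boosting is not shown.

```python
import random
from collections import Counter


def fit_stump(points):
    """Return the threshold with the fewest training errors.

    The stump predicts class 1 when x > threshold, else class 0.
    """
    xs = sorted({x for x, _ in points})
    midpoints = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates = [xs[0] - 1.0] + midpoints + [xs[-1] + 1.0]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in points)

    return min(candidates, key=errors)


def bagging_fit(points, n_models, rng):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    return [fit_stump([rng.choice(points) for _ in points])
            for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Majority vote over the stumps' individual predictions."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy, linearly separable data: (feature value, class label).
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
models = bagging_fit(data, n_models=11, rng=random.Random(0))
print(bagging_predict(models, 0.5), bagging_predict(models, 4.5))
```

An odd number of models avoids tied votes in the two-class case.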
Robust predictive modelling of water pollution using biomarker data
This paper describes the methodology of building a predictive model for the
purpose of marine pollution monitoring, based on low quality biomarker data.
A step-by-step, systematic data analysis approach is presented, resulting in
the design of a purely data-driven model, able to accurately discriminate between
various coastal water pollution levels.
Environmental scientists often try to apply various machine learning
techniques to their data without much success, mostly because of a lack of
experience with different methods and the required ‘under the hood’ knowledge.
Thus this paper is a result of a collaboration between the machine learning and
environmental science communities, presenting a predictive model development
workflow, as well as discussing and addressing potential pitfalls and difficulties.
The novelty of the modelling approach presented lies in the successful application
of machine learning techniques to high dimensional, incomplete biomarker
data, which to our knowledge has not been done before and is the result of close
collaboration between the machine learning and environmental science communities.
Density Preserving Sampling: Robust and Efficient Alternative to Cross-validation for Error Estimation
Estimation of the generalization ability of a classification
or regression model is an important issue, as it indicates
the expected performance on previously unseen data and is
also used for model selection. Currently used generalization
error estimation procedures, such as cross-validation (CV) or
bootstrap, are stochastic and, thus, require multiple repetitions
in order to produce reliable results, which can be computationally
expensive, if not prohibitive. The correntropy-inspired
density-preserving sampling (DPS) procedure proposed in this paper
eliminates the need for repeating the error estimation procedure
by dividing the available data into subsets that are guaranteed to
be representative of the input dataset. This allows the production
of low-variance error estimates with an accuracy comparable to
10 times repeated CV at a fraction of the computations required
by CV. This method can also be used for model ranking and
selection. This paper derives the DPS procedure and investigates
its usability and performance using a set of public benchmark
datasets and standard classifiers.
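The following sketch illustrates the baseline that the abstract compares against, repeated k-fold cross-validation, and why it needs repetition: each run shuffles the data differently and so returns a slightly different error estimate. It does not implement DPS. The nearest-centroid classifier, the synthetic data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean, pstdev


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(xs, ys):
    """Nearest-centroid model: the mean feature value of each class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: mean(vals) for label, vals in groups.items()}


def predict_centroid(model, x):
    return min(model, key=lambda label: abs(x - model[label]))


def cv_error(xs, ys, k, rng):
    """One run of k-fold CV; returns the fraction of misclassified points."""
    wrong = 0
    for test in kfold_indices(len(xs), k, rng):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(predict_centroid(model, xs[i]) != ys[i] for i in test)
    return wrong / len(xs)


# Synthetic two-class data (illustrative only).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(1.5, 1.0) for _ in range(30)]
ys = [0] * 30 + [1] * 30

# Ten repetitions of 10-fold CV, each with a fresh shuffle.
estimates = [cv_error(xs, ys, k=10, rng=rng) for _ in range(10)]
print("mean error:", mean(estimates), "spread across runs:", pstdev(estimates))
```

The spread across runs is the variability that a deterministic, representative split would aim to remove.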
Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.
Feature selection is essential in the medical domain; however, the process is complicated by censoring, a distinctive characteristic of survival data. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. These classifiers are less often employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employ machine learning classifiers, the partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of p-values of the log-rank test, sensitivity, and concordance index.
This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
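The two combination rules named in the abstract can be sketched generically as follows. This is an illustration under stated assumptions, not the paper's method: the abstract says the weights derive from a survival metric but does not give the formula, so the weights and the risk labels below are hypothetical placeholders.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    With all weights equal this reduces to simple majority voting.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical risk labels from the three base classifiers (SVM, NN, KNN).
predictions = ["high", "low", "low"]

# Simple majority voting: every classifier counts equally.
simple = weighted_majority_vote(predictions, [1.0, 1.0, 1.0])

# Weighted voting: hypothetical per-classifier weights, e.g. derived from
# a validation-set performance score.
weighted = weighted_majority_vote(predictions, [0.9, 0.3, 0.4])

print(simple, weighted)  # low high
```

The example shows how a single well-performing classifier can overrule two weaker ones once weights are introduced.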