9,745 research outputs found
Imbalanced Ensemble Classifier for learning from imbalanced business school data set
Private business schools in India face a common problem of selecting quality
students for their MBA programs to achieve the desired placement percentage.
Generally, such data sets are biased towards one class, i.e., imbalanced in
nature. And learning from the imbalanced dataset is a difficult proposition.
This paper proposes an imbalanced ensemble classifier which can handle the
imbalanced nature of the dataset and achieves higher accuracy in case of the
feature selection (selection of important characteristics of students) cum
classification problem (prediction of placements based on the students'
characteristics) for Indian business school dataset. The optimal value of an
important model parameter is found. Numerical evidence is also provided using
Indian business school dataset to assess the outstanding performance of the
proposed classifier
Evaluation of Performance Measures for Classifiers Comparison
The selection of the best classification algorithm for a given dataset is a
very widespread problem, occuring each time one has to choose a classifier to
solve a real-world problem. It is also a complex task with many important
methodological decisions to make. Among those, one of the most crucial is the
choice of an appropriate measure in order to properly assess the classification
performance and rank the algorithms. In this article, we focus on this specific
task. We present the most popular measures and compare their behavior through
discrimination plots. We then discuss their properties from a more theoretical
perspective. It turns out several of them are equivalent for classifiers
comparison purposes. Futhermore. they can also lead to interpretation problems.
Among the numerous measures proposed over the years, it appears that the
classical overall success rate and marginal rates are the more suitable for
classifier comparison task
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
- …