Search CORE

3,624 research outputs found

Efficient treatment of outliers and class imbalance for diabetes prediction

Author: KORKONTZELOS YANNIS
NNAMOKO NONSO
Publication venue: 'Elsevier BV'
Publication date: 30/04/2020
Field of study

Edge Hill University Research Information Repository

Linear and Order Statistics Combiners for Pattern Classification

Author: Ghosh Joydeep
Tumer Kagan
Publication venue
Publication date: 01/01/1999
Field of study

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.Comment: 31 page

arXiv.org e-Print Archive

CiteSeerX

A big data MapReduce framework for fault diagnosis in cloud-based manufacturing

Author: Ajay Kumar (192967)
Alok Choudhary (1251471)
Lakshman S. Thakur (7199684)
Ravi Shankar (103040)
Publication venue
Publication date: 04/03/2016
Field of study

This research develops a MapReduce framework for automatic pattern recognition based on fault diagnosis by solving data imbalance problem in a cloud-based manufacturing (CBM). Fault diagnosis in a CBM system significantly contributes to reduce the product testing cost and enhances manufacturing quality. One of the major challenges facing the big data analytics in cloud-based manufacturing is handling of datasets, which are highly imbalanced in nature due to poor classification result when machine learning techniques are applied on such datasets. The framework proposed in this research uses a hybrid approach to deal with big dataset for smarter decisions. Furthermore, we compare the performance of radial basis function based Support Vector Machine classifier with standard techniques. Our findings suggest that the most important task in cloud-based manufacturing, is to predict the effect of data errors on quality due to highly imbalance unstructured dataset. The proposed framework is an original contribution to the body of literature, where our proposed MapReduce framework has been used for fault detection by managing data imbalance problem appropriately and relating it to firm’s profit function. The experimental results are validated using a case study of steel plate manufacturing fault diagnosis, with crucial performance matrices such as accuracy, specificity and sensitivity. A comparative study shows that the methods used in the proposed framework outperform the traditional ones

Loughborough University Institutional Repository

Feature Selection For The Fuzzy Artmap Neural Network Using A Hybrid Genetic Algorithm And Tabu Search

Author: Tang Weng Chin
Publication venue
Publication date: 01/07/2007
Field of study

Prestasi pengelas rangkaian neural amat bergantung kepada set data yang digunakan dalam process pembelajaran. The performance of Neural-Network (NN)-based classifiers is strongly dependent on the data set used for learning

Repository@USM

A Machine Learning Technique for Detection of Social Media Fake News

Author: Arowolo Micheal Olaolu
Misra Sanjay
Ogundokun Oluwaseun Ogundokun
Publication venue
Publication date: 01/01/2023
Field of study

publishedVersio

IFE Brage