
36 research outputs found

A hybrid algorithm to improve the accuracy of support vector machines on skewed data-sets

Author: A. Fernández
B.X. Wang
G. Wu
G.E. Batista
H. Han
N.V. Chawla
R. Akbani
S. García
Z.-Q. Zeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Over the past few years, it has been shown that the generalization power of Support Vector Machines (SVM) falls dramatically on imbalanced data-sets. In this paper, we propose a new method to improve the accuracy of SVM on imbalanced data-sets. First, we use undersampling and an SVM to obtain the initial support vectors (SVs) and a sketch of the hyperplane. These support vectors help to generate new artificial instances, which serve as the initial population of a genetic algorithm. The genetic algorithm improves the population of artificial instances from one generation to the next and eliminates instances that introduce noise into the hyperplane. Finally, the generated and evolved data are added to the original data-set to reduce the imbalance and improve the generalization ability of the SVM on skewed data-sets.

Crossref

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México

A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

Author: Dapeng Wang
Yong Zhang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

In imbalanced learning, resampling methods modify an imbalanced dataset to form a balanced dataset. Many base classifiers perform better on balanced datasets than on imbalanced ones. This paper proposes a cost-sensitive ensemble method based on a cost-sensitive support vector machine (SVM) and query-by-committee (QBC) to solve imbalanced data classification. The proposed method first divides the majority-class dataset into several subdatasets according to the proportion of imbalanced samples and trains subclassifiers using the AdaBoost method. Then, the proposed method generates candidate training samples with the QBC active learning method and uses the cost-sensitive SVM to learn from the training samples. On 5 class-imbalanced datasets, experimental results show that the proposed method achieves higher area under the ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods.
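The F-measure and G-mean named in this abstract can be computed from a binary confusion matrix. The sketch below uses the common textbook definitions, with the minority class treated as positive; the function name `imbalance_metrics` and its argument names are illustrative assumptions and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute common class-imbalance metrics from confusion-matrix counts.

    Hypothetical helper: tp/fn count minority (positive) samples,
    tn/fp count majority (negative) samples.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # G-mean: geometric mean of the two per-class recalls
    g_mean = math.sqrt(sensitivity * specificity)
    # F-measure (F1): harmonic mean of precision and recall
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # 10 minority and 90 majority samples; accuracy alone would look high here
    print(imbalance_metrics(tp=8, fn=2, tn=80, fp=10))
```

Both metrics penalize a classifier that ignores the minority class, which plain accuracy does not.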

Crossref

Directory of Open Access Journals

Hyperspectral Image Processing for Detection and Grading of Skin Erythema

Author: Abdlaty Ramy
Doerwald-Munoz Lilian
Drew Mark
Fang Qiyin
Hayward Joseph
Madooei Ali
Zerubia Josiane
Publication venue: HAL CCSD
Publication date: 11/02/2017

Visual assessment is the most common clinical investigation of skin reactions in radiotherapy. Due to the subjective nature of this method, additional noninvasive techniques are needed for more accurate evaluation. Our goal is to evaluate the effectiveness of hyperspectral image analysis for that purpose. In this pilot study, we focused on detection and grading of skin erythema. This paper reports our proposed processing pipeline and experimental findings. Experiments have been performed to demonstrate the efficacy of the proposed approach for (1) reproducing clinical assessments, and (2) outperforming RGB imaging data.

INRIA a CCSD electronic archive server

Some Approaches for Software Defect Prediction

Author: Raukas Hans
Publication date: 01/01/2017

The main idea of this thesis is to give a general overview of the processes within software defect prediction models that use machine learning classifiers, and to analyze some of the results of the evaluation experiments conducted in the research papers covered in this work. Additionally, a brief explanation is given of the algorithms used within the software defect prediction models covered in this work, and some of the evaluation measures used to assess the prediction accuracy of software defect prediction models are listed and explained. A general overview of the processes within a handful of specific software defect prediction models is also provided.

DSpace at Tartu University Library

An advance extended binomial GLMBoost ensemble method with synthetic minority over-sampling technique for handling imbalanced datasets

Author: Mallick Manas Kumar
Mishra Debahuti
Rout Neelam
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 31/03/2023

Classification is an important activity in a variety of domains. The class imbalance problem has reduced the performance of traditional classification approaches. An imbalance problem arises when the class distributions among the instances of a classification dataset are mismatched. The model proposed in this study to manage imbalance issues is an advance extended binomial GLMBoost (EBGLMBoost) coupled with the synthetic minority over-sampling technique (SMOTE). SMOTE is used in the proposed model to ensure that the target variable's distribution is balanced, whereas the GLMBoost ensemble techniques are built to deal with imbalanced datasets. For the entire experiment, twenty different datasets are used, and support vector machine (SVM), Nu-SVM, bagging, and AdaBoost classification algorithms are compared with the suggested method. The model's sensitivity, specificity, geometric mean (G-mean), precision, recall, and F-measure, in percent, are 99.37, 66.95, 80.81, 99.21, 99.37, and 99.29 on the training datasets and 98.61, 54.78, 69.88, 98.77, 96.61, and 98.68 on the testing datasets, respectively. The Wilcoxon test indicates that the proposed technique performed well on unbalanced data. Finally, the proposed solutions are capable of efficiently dealing with the problem of class imbalance.

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

A big data MapReduce framework for fault diagnosis in cloud-based manufacturing

Author: Ajay Kumar
Alok Choudhary
Lakshman S. Thakur
Ravi Shankar
Publication date: 04/03/2016

This research develops a MapReduce framework for automatic pattern recognition-based fault diagnosis that addresses the data imbalance problem in cloud-based manufacturing (CBM). Fault diagnosis in a CBM system contributes significantly to reducing product testing costs and enhancing manufacturing quality. One of the major challenges facing big data analytics in cloud-based manufacturing is the handling of highly imbalanced datasets, because machine learning techniques give poor classification results when applied to them. The framework proposed in this research uses a hybrid approach to deal with big datasets for smarter decisions. Furthermore, we compare the performance of a radial basis function-based Support Vector Machine classifier with standard techniques. Our findings suggest that the most important task in cloud-based manufacturing is to predict the effect of data errors on quality arising from highly imbalanced, unstructured datasets. The proposed framework is an original contribution to the body of literature: our MapReduce framework has been used for fault detection by managing the data imbalance problem appropriately and relating it to the firm's profit function. The experimental results are validated using a case study of steel plate manufacturing fault diagnosis, with crucial performance metrics such as accuracy, specificity, and sensitivity. A comparative study shows that the methods used in the proposed framework outperform the traditional ones.

Loughborough University Institutional Repository

Automatic Defect Detection for TFT-LCD Array Process Using Quasiconformal Kernel Support Vector Data Description

Author: Yan-Jen Chen
Yi-Hung Liu
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2011

Defect detection has been considered an efficient way to increase the yield rate of panels in thin film transistor liquid crystal display (TFT-LCD) manufacturing. In this study we focus on the array process since it is the first and key process in TFT-LCD manufacturing. Various defects occur in the array process, and some of them could cause great damage to the LCD panels. Thus, designing a method that can robustly detect defects in images captured from the surface of LCD panels has become crucial. Previously, support vector data description (SVDD) has been successfully applied to LCD defect detection. However, its generalization performance is limited. In this paper, we propose a novel one-class machine learning method, called quasiconformal kernel SVDD (QK-SVDD), to address this issue. The QK-SVDD can significantly improve the generalization performance of the traditional SVDD by introducing the quasiconformal transformation into a predefined kernel. Experimental results, obtained on real LCD images provided by an LCD manufacturer in Taiwan, indicate that the proposed QK-SVDD not only achieves a high defect detection rate of 96%, but also greatly improves the generalization performance of SVDD. The improvement has been shown to be over 30%. In addition, the results show that the QK-SVDD defect detector can complete defect detection on an LCD image within 60 ms.
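For orientation, the two ingredients named in this abstract can be written in their usual textbook form. The notation below (radius R, center a, slack variables ξ_i, trade-off C, feature map φ, base kernel K, positive factor function c) is assumed for illustration; the abstract does not state the paper's own notation or its specific choice of c.

```latex
% Standard SVDD primal: smallest hypersphere enclosing the target-class data
\[
\min_{R,\,a,\,\xi}\; R^{2} + C \sum_{i=1}^{N} \xi_{i}
\quad \text{s.t.} \quad
\lVert \phi(x_{i}) - a \rVert^{2} \le R^{2} + \xi_{i},
\qquad \xi_{i} \ge 0,\; i = 1, \dots, N
\]

% Quasiconformal transformation of a predefined kernel K by a positive function c
\[
\tilde{K}(x, y) = c(x)\, c(y)\, K(x, y)
\]
```

A test point x is accepted as normal when its feature-space distance to the center a does not exceed R, and rejected as a defect otherwise.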

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

TransportTP: A two-phase classification approach for membrane transporter prediction and characterization

Author: Benedito Vagner A
Li Haiquan
Udvardi Michael K
Zhao Patrick Xuechun
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Membrane transporters play crucial roles in living cells. Experimental characterization of transporters is costly and time-consuming. Current computational methods for transporter characterization still require extensive curation efforts, especially for eukaryotic organisms. We developed a novel genome-scale transporter prediction and characterization system called TransportTP that combines homology-based and machine learning methods in a two-phase classification approach. First, traditional homology methods were employed to predict novel transporters based on sequence similarity to known classified proteins in the Transporter Classification Database (TCDB). Second, machine learning methods were used to integrate a variety of features to refine the initial predictions. A set of rules based on transporter features was developed by machine learning using well-curated proteomes as guides. Results: In a cross-validation using the yeast proteome for training and the proteomes of ten other organisms for testing, TransportTP achieved an equivalent recall and precision of 81.8%, based on TransportDB, a manually annotated transporter database. In an independent test using the Arabidopsis proteome for training and four recently sequenced plant proteomes for testing, it achieved a recall of 74.6% and a precision of 73.4%, according to our manual curation. Conclusions: TransportTP is the most effective tool for eukaryotic transporter characterization to date.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

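Penggunaan Random Under Sampling Untuk Penanganan Ketidakseimbangan Kelas Pada Prediksi Cacat Software Berbasis Neural Network [Using Random Under Sampling to Handle Class Imbalance in Neural Network-Based Software Defect Prediction]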

Author: Irawan E. (Erna)
Wahono R. S. (Romi)
Publication venue: None
Publication date: 01/01/2015

Software errors or defects can lead to reduced software quality and high repair costs. Software defect prediction is very important in software engineering, particularly for addressing effectiveness and efficiency so that software quality can be improved. The Neural Network (NN) is a classification algorithm that has been proven capable of handling nonlinear data, is highly sensitive to the data, and is able to analyze large data sets. The NASA MDP dataset is a nonlinear software metrics dataset commonly used in software defect prediction research; 62 of 208 studies use NASA datasets. NASA MDP has a weakness, namely imbalanced classes, which can degrade the performance of software defect prediction models. To handle class imbalance in the NASA MDP dataset, a data-level method, Random Under Sampling (RUS), is used. RUS is intended to correct class imbalance. The proposed method for handling class imbalance with the Neural Network (NN) is the application of RUS. The proposed experiments compare Neural Network performance before and after applying RUS, and also compare it with other models. The experimental results give an average AUC of 0.80 for NN and 0.82 for NN+RUS. The Wilcoxon and Friedman tests show that the AUC of NN+RUS differs significantly from that of NN, with Wilcoxon p-value = 0.002 and Friedman p-value = 0.003 (p < 0.05). According to the Friedman test, there is a significant difference in AUC between NN+RUS and NN, NN+SMOTE, NB, and C45, since the p-value is < 0.0001. It can therefore be concluded that applying RUS is shown to handle the class imbalance problem in neural network-based software defect prediction.
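Random under sampling, as described in this abstract, discards majority-class samples at random until the classes are the same size. The following is a minimal standard-library sketch of that general idea; the function name `random_under_sample`, the fixed seed, and the choice to balance every class down to the smallest class size are assumptions for illustration and do not reproduce the paper's experimental setup.

```python
import random


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many items as the smallest class.

    Hypothetical helper: returns new (samples, labels) lists and leaves
    the inputs unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Keep the smallest class whole; sample without replacement from larger ones
        kept = xs if len(xs) == target else rng.sample(xs, target)
        out_samples.extend(kept)
        out_labels.extend([y] * len(kept))
    return out_samples, out_labels


if __name__ == "__main__":
    data = list(range(100))
    tags = [1] * 10 + [0] * 90  # 10 defective modules, 90 clean ones
    balanced_x, balanced_y = random_under_sample(data, tags)
    print(len(balanced_x), balanced_y.count(0), balanced_y.count(1))
```

The balanced lists would then be passed to the classifier in place of the original training data; the test data is normally left untouched so that evaluation reflects the real class distribution.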

Neliti

Máquinas de soporte vectorial sobre conjuntos de datos no balanceados: propuesta de un nuevo sesgo [Support vector machines on imbalanced data sets: proposal of a new bias]

Author: Angulo Bahón Cecilio
González Abril Luis
Núñez Castro Haydemar
Publication date: 01/01/2012

In learning from imbalanced data sets, the support vector machine (SVM) can exhibit poor performance on the minority class because, like other learning machines, it is designed to induce a classification model based on a global error. To improve its performance on this type of problem, this work proposes a post-processing strategy based on computing a new bias, or threshold, that takes into account the class proportions in the data set and allows the function learned by the SVM to be adjusted to improve its performance on the minority class. This solution requires neither tuning new parameters nor modifying the standard optimization problem used to train the SVM. Experimental results on 23 data sets with different degrees of imbalance show that classification of the minority class is indeed improved, as measured by the g-mean and sensitivity.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC