36 research outputs found

    A hybrid algorithm to improve the accuracy of support vector machines on skewed data-sets

    Get PDF
    Over the past few years, has been shown that generalization power of Support Vector Machines (SVM) falls dramatically on imbalanced data-sets. In this paper, we propose a new method to improve accuracy of SVM on imbalanced data-sets. To get this outcome, firstly, we used undersampling and SVM to obtain the initial SVs and a sketch of the hyperplane. These support vectors help to generate new artificial instances, which will take part as the initial population of a genetic algorithm. The genetic algorithm improves the population in artificial instances from one generation to another and eliminates instances that produce noise in the hyperplane. Finally, the generated and evolved data were included in the original data-set for minimizing the imbalance and improving the generalization ability of the SVM on skewed data-sets

    A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

    Get PDF
    In imbalanced learning methods, resampling methods modify an imbalanced dataset to form a balanced dataset. Balanced data sets perform better than imbalanced datasets for many base classifiers. This paper proposes a cost-sensitive ensemble method based on cost-sensitive support vector machine (SVM), and query-by-committee (QBC) to solve imbalanced data classification. The proposed method first divides the majority-class dataset into several subdatasets according to the proportion of imbalanced samples and trains subclassifiers using AdaBoost method. Then, the proposed method generates candidate training samples by QBC active learning method and uses cost-sensitive SVM to learn the training samples. By using 5 class-imbalanced datasets, experimental results show that the proposed method has higher area under ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods

    Hyperspectral Image Processing for Detection and Grading of Skin Erythema

    Get PDF
    International audienceVisual assessment is the most common clinical investigation of skin reactions in radiotherapy. Due to the subjective nature of this method, additional noninvasive techniques are needed for more accurate evaluation. Our goal is to evaluate the effectiveness of hyperspectral image analysis for that purpose. In this pilot study, we focused on detection and grading of skin Erythema. This paper reports our proposed processing pipeline and experimental findings. Experiments have been performed to demonstrate the efficacy of the proposed approach for (1) reproducing clinical assessments, and (2) outperforming RGB imaging data

    Some Approaches for Software Defect Prediction

    Get PDF
    KĂ€esoleva töö peamiseks eesmĂ€rgiks on anda ĂŒldisem ĂŒlevaade protsessidest tarkvara vigade hindamise mudelites, mis kasutavad masinĂ”ppe klassifikaatoreid, ja analĂŒĂŒsida mĂ”ningaid hindamiseskperimentide tulemusi, mis on lĂ€bi viidud antud töös refereeritud uurimistöödes. Lisaks on antud lĂŒhike selgitus antud töös vaadeldavates tarkvara vigade hindamise mudelites kasutatud algoritmidest ja tuuakse vĂ€lja ning seletatakse lahti mĂ”ned hinnangumÔÔdikud, mida kasutatakse tarkvara vigade hindamise mudelite hindamistĂ€psuste mÔÔtmiseks. Tuuakse vĂ€lja ka ĂŒldine ĂŒlevaade vaadeldavates tarkvara vigade hindamise mudelites toimuvatest protsessidest.The main idea of this thesis is to give a general overview of the processes within the soft-ware defect prediction models using machine learning classifiers and to provide analysis to some of the results of the evaluation experiments conducted in the research papers covered in this work. Additionally, a brief explanation of the algorithms used within the software defect prediction models covered in this work is given and some of the evaluation measures used to evaluate the prediction accuracy of software defect prediction models are listed and explained. Also, a general overview of the processes within a handful of specific software defect prediction models is provided

    An advance extended binomial GLMBoost ensemble method with synthetic minority over-sampling technique for handling imbalanced datasets

    Get PDF
    Classification is an important activity in a variety of domains. Class imbalance problem have reduced the performance of the traditional classification approaches. An imbalance problem arises when mismatched class distributions are discovered among the instances of class of classification datasets. An advance extended binomial GLMBoost (EBGLMBoost) coupled with synthetic minority over-sampling technique (SMOTE) technique is the proposed model in the study to manage imbalance issues. The SMOTE is used to solve the proposed model, ensuring that the target variable's distribution is balanced, whereas the GLMBoost ensemble techniques are built to deal with imbalanced datasets. For the entire experiment, twenty different datasets are used, and support vector machine (SVM), Nu-SVM, bagging, and AdaBoost classification algorithms are used to compare with the suggested method. The model's sensitivity, specificity, geometric mean (G-mean), precision, recall, and F-measure resulted in percentages for training and testing datasets are 99.37, 66.95, 80.81, 99.21, 99.37, 99.29 and 98.61, 54.78, 69.88, 98.77, 96.61, 98.68, respectively. With the help of the Wilcoxon test, it is determined that the proposed technique performed well on unbalanced data. Finally, the proposed solutions are capable of efficiently dealing with the problem of class imbalance

    A big data MapReduce framework for fault diagnosis in cloud-based manufacturing

    Get PDF
    This research develops a MapReduce framework for automatic pattern recognition based on fault diagnosis by solving data imbalance problem in a cloud-based manufacturing (CBM). Fault diagnosis in a CBM system significantly contributes to reduce the product testing cost and enhances manufacturing quality. One of the major challenges facing the big data analytics in cloud-based manufacturing is handling of datasets, which are highly imbalanced in nature due to poor classification result when machine learning techniques are applied on such datasets. The framework proposed in this research uses a hybrid approach to deal with big dataset for smarter decisions. Furthermore, we compare the performance of radial basis function based Support Vector Machine classifier with standard techniques. Our findings suggest that the most important task in cloud-based manufacturing, is to predict the effect of data errors on quality due to highly imbalance unstructured dataset. The proposed framework is an original contribution to the body of literature, where our proposed MapReduce framework has been used for fault detection by managing data imbalance problem appropriately and relating it to firm’s profit function. The experimental results are validated using a case study of steel plate manufacturing fault diagnosis, with crucial performance matrices such as accuracy, specificity and sensitivity. A comparative study shows that the methods used in the proposed framework outperform the traditional ones

    Automatic Defect Detection for TFT-LCD Array Process Using Quasiconformal Kernel Support Vector Data Description

    Get PDF
    Defect detection has been considered an efficient way to increase the yield rate of panels in thin film transistor liquid crystal display (TFT-LCD) manufacturing. In this study we focus on the array process since it is the first and key process in TFT-LCD manufacturing. Various defects occur in the array process, and some of them could cause great damage to the LCD panels. Thus, how to design a method that can robustly detect defects from the images captured from the surface of LCD panels has become crucial. Previously, support vector data description (SVDD) has been successfully applied to LCD defect detection. However, its generalization performance is limited. In this paper, we propose a novel one-class machine learning method, called quasiconformal kernel SVDD (QK-SVDD) to address this issue. The QK-SVDD can significantly improve generalization performance of the traditional SVDD by introducing the quasiconformal transformation into a predefined kernel. Experimental results, carried out on real LCD images provided by an LCD manufacturer in Taiwan, indicate that the proposed QK-SVDD not only obtains a high defect detection rate of 96%, but also greatly improves generalization performance of SVDD. The improvement has shown to be over 30%. In addition, results also show that the QK-SVDD defect detector is able to accomplish the task of defect detection on an LCD image within 60 ms

    TransportTP: A two-phase classification approach for membrane transporter prediction and characterization

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Membrane transporters play crucial roles in living cells. Experimental characterization of transporters is costly and time-consuming. Current computational methods for transporter characterization still require extensive curation efforts, especially for eukaryotic organisms. We developed a novel genome-scale transporter prediction and characterization system called TransportTP that combined homology-based and machine learning methods in a two-phase classification approach. First, traditional homology methods were employed to predict novel transporters based on sequence similarity to known classified proteins in the Transporter Classification Database (TCDB). Second, machine learning methods were used to integrate a variety of features to refine the initial predictions. A set of rules based on transporter features was developed by machine learning using well-curated proteomes as guides.</p> <p>Results</p> <p>In a cross-validation using the yeast proteome for training and the proteomes of ten other organisms for testing, TransportTP achieved an equivalent recall and precision of 81.8%, based on TransportDB, a manually annotated transporter database. In an independent test using the Arabidopsis proteome for training and four recently sequenced plant proteomes for testing, it achieved a recall of 74.6% and a precision of 73.4%, according to our manual curation.</p> <p>Conclusions</p> <p>TransportTP is the most effective tool for eukaryotic transporter characterization up to date.</p

    Penggunaan Random Under Sampling Untuk Penanganan Ketidakseimbangan Kelas Pada Prediksi Cacat Software Berbasis Neural Network

    Full text link
    Penurunan kualitas software dan biaya perbaikan yang tinggi dapat diakibatkan kesalahan atau cacat pada software. Prediksi cacat software sangat penting di dalam software engineering, terutama dalam mengatasi masalah efektifitas dan efisiensi sehingga dapat meningkatkan kualitas software. Neural Network (NN) merupakan algoritma klasifikasi yang telah terbukti mampu mengatasi masalah data nonlinear dan memiliki sensitifitas yang tinggi terhadap suatu data serta mampu menganalisa data yang besar. Dataset NASA MDP merupakan data metric yang nonlinear perangkat lunak yang biasa digunakan untuk penelitian software defect prediction (prediksi cacat software). Terdapat 62 penelitian dari 208 penelitian menggunakan dataset NASA. NASA MDP memiliki kelemahan yaitu kelas yang tidak seimbang sehingga dapat menurunkan kinerja dari model prediksi cacat software. Untuk menangani ketidakseimbangan kelas dalam dataset NASA MDP adalah dengan menggunakan metode level data yaitu Random Under Sampling (RUS). RUS ditujukan untuk memperbaiki ketidakseimbangan kelas. Metode yang diusulkan untuk menangani ketidakseimbangan kelas pada Neural Network (NN) adalah penerapan RUS. Eksperimen yang diusulkan untuk membandingkan hasil kinerja Neural Network sebelum dan sesudah diterapkan metode RUS, serta dibandingkan dengan model yang lainnya. Hasil Eksperimen rata-rata AUC pada NN (0.80) dan NN+RUS (0.82). Hasil uji Wilcoxon dan Friedman menunjukan bahwa bahwa AUC NN+RUS memiliki perbedaan yang signifikan dengan NN dengan p-value wilcoxon = 0.002 dan p-value friedman = 0.003 (p&lt;0.05). Menurut uji friedman terdapat perbedaan AUC yang signifikan antara NN+RUS dengan NN, NN+SMOTE, NB, dan C45 karena nilai p-value &lt; 0.0001. Maka dapat disimpulkan bahwa penerapan model RUS terbukti dapat menangani masalah ketidakseimbangan kelas pada prediksi cacat software berbasis neural network

    MĂĄquinas de soporte vectorial sobre conjuntos de datos no balanceados: propuesta de un nuevo sesgo

    Get PDF
    En el aprendizaje con conjuntos de datos no balanceados, la måquina de soporte vectorial (SVM) puede exhibir un bajo rendimiento sobre la clase minoritaria ya que, como otras måquinas de aprendizaje, estån diseñadas para inducir un modelo de clasificación basado en un error global. Con el fin de mejorar su desempeño en este tipo de problemas, en este trabajo se propone una estrategia de post-procesamiento basada en el cålculo de un nuevo sesgo o umbral que toma en cuenta la proporción de las clases en el conjunto de datos y que permite ajustar la función aprendida por la SVM para mejorar su desempeño sobre la clase minoritaria. Esta solución no supone la entonación de nuevos paråmetros ni la modificación del problema de optimización eståndar para entrenar la SVM. Los resultados obtenidos de la experimentación sobre 23 conjuntos de datos con diferentes grados de desbalance, muestran que efectivamente se logra mejorar las clasificaciones sobre la clase minoritaria, medidas en función de g-media y la sensibilidad.Preprin
    corecore