8 research outputs found
Automated Classification System for HEp-2 Cell Patterns
Human Epithelial Type-2 (HEp-2) cells are essential in diagnosing autoimmune diseases. Indirect immunofluorescence (IIF) imaging is a fundamental technique for detecting antinuclear antibodies in HEp-2 cells. The four main HEp-2 cell patterns to be identified are nucleolar, homogeneous, speckled, and centromere. The most commonly used classification method is manual evaluation, which is prone to human error. This paper proposes an automated method for classifying HEp-2 cell patterns. The first stage enhances the image using histogram equalization, contrast adjustment, and a Wiener filter. The second stage segments the image using a Sobel filter and a mean filter. The third stage extracts features based on shape properties. The last stage classifies the cells based on the extracted property data. The results obtained exceed 90% for nucleolar and centromere patterns and are about 70% for homogeneous and speckled patterns. For future work, another feature extraction method needs to be introduced to increase classification accuracy; the suggested approach is to analyze and extract data based on the texture of the image.
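The abstract above describes the enhancement pipeline only in outline. As a rough illustration of its first step, histogram equalization, here is a minimal pure-Python sketch (the function name and toy image are our own; a real pipeline would use an image-processing library):

```python
def histogram_equalize(image, levels=256):
    """Spread a grayscale image's intensity histogram across the full range.

    `image` is a list of rows of integer pixel values in [0, levels).
    Returns a new image of the same shape.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    # Histogram of intensity counts.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function over intensities.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)

    def remap(p):
        if n == cdf_min:  # flat image: nothing to equalize
            return p
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))

    return [[remap(p) for p in row] for row in image]

# A dark, low-contrast 2x3 patch is stretched toward the full 0-255 range.
dark = [[50, 51, 52], [52, 53, 54]]
bright = histogram_equalize(dark)
```

After equalization the darkest pixel maps to 0 and the brightest to 255, which is exactly the contrast stretch the enhancement stage relies on before filtering.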
Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance
Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they typically require large storage space as well as relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune traditional bagging by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, as the two-stage pruning strategy, were all examined. Compared with the single pruning methods, the two-stage pruning methods can further reduce the ensemble size and improve classification performance. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to traditional bagging, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. It was also observed that “AP+DP” outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that both reduce the ensemble size and improve classification accuracy.
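The core of the pruning idea can be sketched in pure Python. The sketch below shows traditional bagging plus the accuracy-based (AP) stage only; the distance-based (DP) stage depends on a distance measure defined in the article and is not reproduced here. The base learner is an illustrative one-dimensional decision stump, not one of the four classifiers the article actually uses:

```python
import random

def train_stump(points, labels):
    """Fit a 1-D decision stump: predict 1 when x >= threshold, else 0.

    Tries each midpoint between distinct sorted values and keeps the
    threshold with the highest training accuracy.
    """
    best = (-1.0, None)
    candidates = sorted(set(points))
    thresholds = [(a + b) / 2 for a, b in zip(candidates, candidates[1:])] or [candidates[0]]
    for t in thresholds:
        acc = sum((x >= t) == y for x, y in zip(points, labels)) / len(points)
        best = max(best, (acc, t))
    return best[1]

def bagged_stumps(points, labels, n_estimators=15, seed=0):
    """Traditional bagging: each stump is trained on a bootstrap resample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(points)) for _ in points]
        stumps.append(train_stump([points[i] for i in idx], [labels[i] for i in idx]))
    return stumps

def accuracy_prune(stumps, points, labels, keep=5):
    """AP stage: rank ensemble members by accuracy on a held-out set, keep the best."""
    def acc(t):
        return sum((x >= t) == y for x, y in zip(points, labels)) / len(points)
    return sorted(stumps, key=acc, reverse=True)[:keep]

def vote(stumps, x):
    """Majority vote of the (pruned) ensemble."""
    return int(sum(x >= t for t in stumps) > len(stumps) / 2)

# Toy 1-D data: class 0 below 0.5, class 1 above. For brevity the training
# data doubles as the validation set, which a real AP stage would avoid.
points = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
pruned = accuracy_prune(bagged_stumps(points, labels), points, labels, keep=5)
```

The pruned ensemble keeps only a third of the original 15 stumps, which is the storage and prediction-time saving the article targets.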
Splitting Rules and the Application of Bagging to Classification Trees. Case Study: Child Labor in Central Sulawesi Province
Classification is a statistical method used to group observations. Among the classification methods in statistics is the classification tree, which produces a tree-structured model. The construction of a classification tree is determined by the splitting-rule process, which is based on an impurity measure; the measures used in this thesis are the Gini index and the Twoing index criteria. Classification trees have several advantages related to the model and its classification results, but they also have weaknesses in model stability and prediction accuracy. To address these weaknesses, bagging (bootstrap aggregating) was applied to the classification tree to improve stability and prediction accuracy. The method was applied to data on child labor in Central Sulawesi province. The Gini index and the Twoing index have different concepts and forms, but each has its own advantages in use. For the child labor case in Central Sulawesi, both indices produced identical optimal classification trees with six splitting variables: the child's school participation, the child's age, the child's sex, the number of household members, per-capita income, and the education level of the head of household. Applying the bagging technique to the classification tree in this case improved its classification accuracy.
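The two impurity measures behind the thesis's splitting rules have standard textbook forms, sketched below in pure Python (the toy labels are our own; the thesis applies these to the child-labor survey data). Both criteria reward splits that separate the classes cleanly, which is why they can produce identical optimal trees:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(parent, left, right):
    """Decrease in Gini impurity achieved by a binary split."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def twoing(parent, left, right):
    """Twoing criterion: (pL * pR / 4) * (sum_j |p(j|L) - p(j|R)|)^2.

    `left` and `right` are the (non-empty) child nodes of the split.
    """
    n, classes = len(parent), set(parent)
    p_left, p_right = len(left) / n, len(right) / n
    cl, cr = Counter(left), Counter(right)
    spread = sum(abs(cl[j] / len(left) - cr[j] / len(right)) for j in classes)
    return p_left * p_right / 4 * spread ** 2

# A perfect split of a balanced two-class node maximizes both criteria.
parent = ["work", "work", "school", "school"]
left, right = ["work", "work"], ["school", "school"]
```

For this perfect split the Gini gain is 0.5 (the parent's entire impurity) and the Twoing value is 0.25, its maximum for a balanced two-class node.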
Class imbalance ensemble learning based on the margin theory
The proportion of instances belonging to each class in a dataset plays an important role in machine learning. However, real-world data often suffer from class imbalance. Dealing with multi-class tasks in which classes have different misclassification costs is harder than dealing with two-class tasks. Undersampling and oversampling are two of the most popular data preprocessing techniques for imbalanced datasets. Ensemble classifiers have been shown to be more effective than data sampling techniques at enhancing the classification performance on imbalanced data. Moreover, combining ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble margin based algorithm that handles imbalanced classification by employing more low-margin examples, which are more informative than high-margin samples. The algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly, as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
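The supervised and unsupervised margins mentioned above have widely used vote-based definitions, sketched below in pure Python (the paper may use different variants; the example votes are our own). A low or negative margin marks an example the ensemble finds hard, which is why such examples are the informative ones to keep when undersampling:

```python
from collections import Counter

def supervised_margin(votes, true_label):
    """(votes for the true class - most votes for any other class) / total votes."""
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    v_other = max((v for c, v in counts.items() if c != true_label), default=0)
    return (v_true - v_other) / len(votes)

def unsupervised_margin(votes):
    """(most votes - second-most votes) / total votes; needs no true label."""
    top = Counter(votes).most_common(2)
    first = top[0][1]
    second = top[1][1] if len(top) > 1 else 0
    return (first - second) / len(votes)

# Ten base classifiers vote on one example whose true class is "minority".
votes = ["minority"] * 4 + ["majority"] * 6
```

Here the supervised margin is (4 - 6) / 10 = -0.2: the example is misclassified by the ensemble and would be prioritized when building the balanced training sets.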
Analysis of HEp-2 images using MD-LBP and MAD-bagging
Indirect immunofluorescence imaging is employed to identify antinuclear antibodies in HEp-2 cells, which forms the basis for diagnosing autoimmune diseases and other important pathological conditions involving the immune system. Six categories of HEp-2 cells are generally considered, namely homogeneous, fine speckled, coarse speckled, nucleolar, cytoplasmic, and centromere cells. Typically, this categorisation is performed manually by an expert and is hence both time consuming and subjective. In this paper, we present a method for automatically classifying HEp-2 cells using texture information in conjunction with a suitable classification system. In particular, we extract multidimensional local binary pattern (MD-LBP) texture features to characterise the cell area. These then form the input for a classification stage, for which we employ a margin distribution based bagging pruning (MAD-Bagging) classifier ensemble. We evaluate our algorithm on the ICPR 2012 HEp-2 contest benchmark dataset and demonstrate that it gives excellent performance, superior to all algorithms entered in the competition.
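MD-LBP is the paper's multidimensional extension of the classic local binary pattern; the extension itself is not reproduced here, but the underlying 8-neighbour LBP descriptor it builds on can be sketched in pure Python (function names and the toy patch are our own):

```python
def lbp_code(image, r, c):
    """Classic 8-neighbour local binary pattern code for pixel (r, c).

    Each neighbour contributes one bit: 1 if it is >= the centre pixel.
    Bits are read clockwise starting from the top-left neighbour.
    """
    centre = image[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        code = (code << 1) | (image[r + dr][c + dc] >= centre)
    return code

def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels: a 256-bin texture feature."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

# A flat 3x3 patch: every neighbour equals the centre, so every bit is 1.
flat = [[7] * 3 for _ in range(3)]
```

The resulting histogram is the fixed-length texture feature vector that, in the MD-LBP variant, feeds the MAD-Bagging classification stage.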