8 research outputs found

    Automated Classification System for HEp-2 Cell Patterns

    Human Epithelial Type-2 (HEp-2) cells are essential in diagnosing autoimmune diseases. Indirect immunofluorescence (IIF) imaging is a fundamental technique for detecting antinuclear antibodies in HEp-2 cells. The four main HEp-2 cell patterns to be identified are nucleolar, homogeneous, speckled, and centromere. The most commonly used classification method is manual evaluation, which is prone to human error. This paper proposes an automated method for classifying HEp-2 cell patterns. The first stage is image enhancement using histogram-equalization contrast adjustment and a Wiener filter. The second stage uses a Sobel filter and a mean filter for segmentation. The third stage extracts features based on shape properties. The last stage classifies the cells based on the extracted properties. The results obtained are above 90% accuracy for the nucleolar and centromere patterns and about 70% for the homogeneous and speckled patterns. For future work, another feature extraction method needs to be introduced to increase classification accuracy; the suggested approach is to analyze and extract data based on the texture of the image.
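    As a rough illustration of the four-stage pipeline the abstract describes, the sketch below wires together histogram equalization, Wiener filtering, Sobel/mean-filter segmentation, shape-property extraction, and a simple rule-based classifier using scikit-image and SciPy. The specific functions, window sizes, and decision thresholds are assumptions for illustration, not the authors' implementation.

        # Minimal sketch of the four-stage pipeline, assuming scikit-image/SciPy
        # stand-ins for each step; all thresholds below are illustrative only.
        import numpy as np
        from scipy.ndimage import uniform_filter
        from scipy.signal import wiener
        from skimage import exposure, filters, measure

        def classify_hep2_pattern(image):
            # Stage 1: enhancement -- histogram equalization, then Wiener filtering
            enhanced = exposure.equalize_hist(image)
            denoised = wiener(enhanced, (5, 5))

            # Stage 2: segmentation -- Sobel edges smoothed by a mean filter, then thresholded
            edges = filters.sobel(denoised)
            smoothed = uniform_filter(edges, size=3)
            mask = smoothed > filters.threshold_otsu(smoothed)

            # Stage 3: shape-property feature extraction from the segmented regions
            props = measure.regionprops(measure.label(mask))
            num_regions = len(props)
            mean_area = float(np.mean([p.area for p in props])) if props else 0.0
            mean_ecc = float(np.mean([p.eccentricity for p in props])) if props else 0.0

            # Stage 4: classification from the shape properties (hypothetical rules)
            if num_regions > 20 and mean_area < 50:
                return "centromere"
            if num_regions <= 5 and mean_area > 500:
                return "homogeneous"
            if mean_ecc > 0.8:
                return "nucleolar"
            return "speckled"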

    Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance

    Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they usually require large storage space and relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve the classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune the traditional bagging algorithm by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, as the two-stage pruning strategy, were all examined. Compared with the single pruning methods, we found that the two-stage pruning methods can further reduce the ensemble size and improve the classification performance. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to traditional bagging, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. It was also observed that “AP+DP” outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that can both reduce the ensemble size and improve the classification accuracy.
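    As a concrete sketch of the “AP+DP” idea described above, prune by accuracy first and then prune near-duplicate members by the distance between their prediction vectors, the following scikit-learn example is one possible reading; the validation-set criterion, the median accuracy cut-off, and the Hamming-distance threshold are assumptions, not the authors' exact rules.

        # Two-stage pruning of a bagging ensemble: accuracy-based pruning (AP)
        # followed by distance-based pruning (DP).  Thresholds are illustrative.
        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

        bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
        bag.fit(X_tr, y_tr)

        # Stage 1 (AP): keep members whose validation accuracy is at least the median.
        preds = np.array([est.predict(X_val) for est in bag.estimators_])
        accs = (preds == y_val).mean(axis=1)
        ap_idx = [i for i in np.argsort(accs)[::-1] if accs[i] >= np.median(accs)]

        # Stage 2 (DP): among the survivors, drop members whose prediction vector
        # lies within a small Hamming distance of one that was already kept.
        kept = []
        for i in ap_idx:
            if all(np.mean(preds[i] != preds[j]) > 0.02 for j in kept):
                kept.append(i)

        # Majority vote of the pruned sub-ensemble on the validation set.
        vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds[kept])
        print(f"pruned size: {len(kept)}/50, accuracy: {(vote == y_val).mean():.3f}")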

    Splitting Rules and the Application of Bagging to Classification Trees. Case Study: Child Labor in Central Sulawesi Province

    Classification is a statistical method used to group observations. Among the classification methods available in statistics is the classification tree, which produces a tree-structured model. The construction of a classification tree is determined by the splitting-rule process, which is based on an impurity measure; the measures used in this thesis are the Gini index and the Twoing index. Classification trees have several advantages related to the model and its classification results, but they also have weaknesses in model stability and prediction accuracy. To address these weaknesses, bagging (bootstrap aggregating) is applied to the classification tree to improve stability and prediction accuracy. The method is applied to child labor data from Central Sulawesi province. The Gini index and the Twoing index have different concepts and forms, but each has its own advantages in use. For the child labor case in Central Sulawesi, both indices produced identical optimal classification trees built from six variables: the child's school participation, the child's age, the child's sex, the number of household members, per-capita income, and the head of household's education level. Applying the bagging technique to the classification tree in this case improved the classification accuracy.
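    For readers who want to experiment with the same ingredients, the sketch below pairs a Gini-split CART tree with bagging via scikit-learn (which implements the Gini criterion but not Twoing) and adds a small helper that evaluates the Twoing criterion for a candidate split; the dataset handling and parameter choices are illustrative assumptions.

        # Bagged classification trees with the Gini splitting rule, plus a helper
        # computing the Twoing value (pL*pR/4) * (sum_j |p(j|L) - p(j|R)|)^2.
        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        def twoing(y_left, y_right):
            """Twoing criterion for one candidate split into left/right child nodes."""
            n_l, n_r = len(y_left), len(y_right)
            p_l, p_r = n_l / (n_l + n_r), n_r / (n_l + n_r)
            classes = np.union1d(y_left, y_right)
            diff = sum(abs(np.mean(y_left == c) - np.mean(y_right == c)) for c in classes)
            return (p_l * p_r / 4.0) * diff ** 2

        # Bagging stabilises the tree and usually improves predictive accuracy.
        bagged_cart = BaggingClassifier(
            DecisionTreeClassifier(criterion="gini", min_samples_leaf=5),
            n_estimators=100,
            random_state=0,
        )
        # bagged_cart.fit(X_train, y_train); bagged_cart.score(X_test, y_test)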

    Ensemble Pruning Using Spectral Coefficients


    Class imbalance ensemble learning based on the margin theory

    The proportion of instances belonging to each class in a dataset plays an important role in machine learning. However, real-world data often suffer from class imbalance. Dealing with multi-class tasks with different misclassification costs per class is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for dealing with imbalanced datasets. Ensemble classifiers have been shown to be more effective than data sampling techniques at enhancing the classification performance of imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble-margin-based algorithm, which handles imbalanced classification by employing more low-margin examples, which are more informative than high-margin samples. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly, as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we also compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
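    One way to read the margin-guided undersampling idea is sketched below: an initial bagging ensemble scores every majority-class example by a supervised ensemble margin, and each balanced training set then favours low-margin majority examples. The margin definition, the probe ensemble, and the sampling weights are assumptions for illustration, not the authors' exact algorithm.

        # Margin-guided undersampling for imbalanced two-class data (sketch).
        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        def supervised_margins(estimators, X, y):
            votes = np.array([est.predict(X) for est in estimators])   # shape (T, n)
            correct = (votes == y).mean(axis=0)                        # fraction voting the true class
            return 2.0 * correct - 1.0                                 # margin in [-1, 1]

        def margin_undersample_bags(X, y, n_bags=10, random_state=0):
            rng = np.random.default_rng(random_state)
            probe = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                                      random_state=random_state).fit(X, y)
            margins = supervised_margins(probe.estimators_, X, y)

            classes, counts = np.unique(y, return_counts=True)
            minority, majority = classes[counts.argmin()], classes[counts.argmax()]
            min_idx = np.flatnonzero(y == minority)
            maj_idx = np.flatnonzero(y == majority)

            # Lower margin -> higher sampling weight (more informative example).
            w = 1.0 - (margins[maj_idx] + 1.0) / 2.0 + 1e-6
            w /= w.sum()

            bags = []
            for _ in range(n_bags):
                sampled_maj = rng.choice(maj_idx, size=len(min_idx), replace=False, p=w)
                idx = np.concatenate([min_idx, sampled_maj])
                bags.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
            return bags   # predictions would be combined by majority vote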

    Analysis of HEp-2 images using MD-LBP and MAD-bagging

    Indirect immunofluorescence imaging is employed to identify antinuclear antibodies in HEp-2 cells, which forms the basis for diagnosing autoimmune diseases and other important pathological conditions involving the immune system. Six categories of HEp-2 cells are generally considered, namely homogeneous, fine speckled, coarse speckled, nucleolar, cytoplasmic, and centromere cells. Typically, this categorisation is performed manually by an expert and is hence both time consuming and subjective. In this paper, we present a method for automatically classifying HEp-2 cells using texture information in conjunction with a suitable classification system. In particular, we extract multidimensional local binary pattern (MD-LBP) texture features to characterise the cell area. These then form the input for a classification stage, for which we employ a margin-distribution-based bagging pruning (MAD-Bagging) classifier ensemble. We evaluate our algorithm on the ICPR 2012 HEp-2 contest benchmark dataset and demonstrate it to give excellent performance, superior to all algorithms that were entered in the competition.
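    As a rough sketch of the texture side of this pipeline, the function below computes a standard uniform local-binary-pattern histogram over the segmented cell area as a stand-in for the MD-LBP descriptor; the radius, neighbourhood size, and binning are assumptions, and the MAD-Bagging ensemble that consumes these features is not shown.

        # Per-cell LBP histogram feature (stand-in for MD-LBP), using scikit-image.
        import numpy as np
        from skimage.feature import local_binary_pattern

        def lbp_histogram(cell_image, cell_mask, radius=1, n_points=8):
            lbp = local_binary_pattern(cell_image, n_points, radius, method="uniform")
            values = lbp[cell_mask > 0]          # restrict to the cell area
            n_bins = n_points + 2                # uniform patterns plus one "non-uniform" bin
            hist, _ = np.histogram(values, bins=n_bins, range=(0, n_bins), density=True)
            return hist                          # feature vector fed to the classifier ensemble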