Implementation of the Support Vector Machine Method for Identifying Cabbage Leaf Diseases
Cabbage is one of the vegetables most widely consumed by the public, and cabbage seedling production is frequently hampered by pest attacks. A key factor in successful cabbage production is the seedling development stage, which is feared to be especially vulnerable to pests. In this study, digital image processing is used to identify pests/diseases on cabbage seedlings. The research begins with collecting images of cabbage leaves. The next stage is image pre-processing: the background is removed from the input image, which is then converted to grayscale to obtain the values used in subsequent steps. These values are then classified with the Support Vector Machine (SVM) method. Training is performed with Sequential Training, followed by a testing phase. The classification results are influenced by the segmentation procedure and by the input parameters used during training. Test results show an average classification accuracy of 80.55%.
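The pipeline described above (grayscale conversion followed by an SVM trained with a sequential update scheme) can be sketched in pure Python. This is a minimal illustration, not the paper's implementation: the luminosity weights, the sequential update rule, and all parameter values (gamma, lam, C, iteration count) are assumptions, and the toy data merely stands in for grayscale leaf features.

```python
# Illustrative sketch: grayscale feature extraction + a kernel SVM trained
# with a simple sequential (iterative) dual update. Parameter values are
# assumptions for this toy example, not the paper's settings.

def to_grayscale(pixels):
    """Convert (R, G, B) pixels to luminosity grayscale values."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def sequential_train(X, y, gamma=0.01, lam=0.5, C=1.0, iters=100):
    """Grow the dual variables alpha with a sequential update rule."""
    n = len(X)
    D = [[y[i] * y[j] * (linear_kernel(X[i], X[j]) + lam ** 2)
          for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            E = sum(alpha[j] * D[i][j] for j in range(n))
            delta = gamma * (1.0 - E)
            alpha[i] = min(max(alpha[i] + delta, 0.0), C)  # clip to [0, C]
    return alpha

def predict(X, y, alpha, x):
    score = sum(alpha[i] * y[i] * linear_kernel(X[i], x) for i in range(len(X)))
    return 1 if score >= 0 else -1

# Toy data standing in for grayscale leaf features: two separable classes.
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
alpha = sequential_train(X, y)
```

A real pipeline would, of course, extract these features from segmented leaf images rather than from hand-written points.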
Efficient Optimization of Performance Measures by Classifier Adaptation
In practical applications, machine learning algorithms often need to learn classifiers that optimize domain-specific performance measures. Previous research has focused on learning the needed classifier in isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth performance measures remains hard. In this paper, rather than learning the needed classifier by directly optimizing the specific performance measure, we circumvent this problem with a novel two-step approach called CAPO: first train nonlinear auxiliary classifiers with existing learning methods, then adapt the auxiliary classifiers to the specific performance measure. In the first step, auxiliary classifiers can be obtained efficiently using off-the-shelf learning algorithms. For the second step, we show that the classifier adaptation problem reduces to a quadratic programming problem, similar to linear SVMperf, which can be solved efficiently. By exploiting nonlinear auxiliary classifiers, CAPO can generate a nonlinear classifier that optimizes a large variety of performance measures, including all performance measures based on the contingency table as well as AUC, while maintaining high computational efficiency. Empirical studies show that CAPO is effective and computationally efficient, even more so than linear SVMperf.
Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 201
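The two-step idea can be illustrated with a deliberately simplified sketch: two hand-made scorers stand in for the nonlinear auxiliary classifiers trained with off-the-shelf algorithms, and a small grid search over a mixing weight and a threshold stands in for the paper's SVMperf-style quadratic program. Everything below (the scorers, the grid, the F1 objective) is an assumption for illustration only.

```python
def f1_score(y_true, y_pred):
    """F1 computed from the contingency table."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Step 1: auxiliary classifiers (here, two fixed scoring rules standing in
# for nonlinear models obtained from off-the-shelf learning algorithms).
def aux_a(x):
    return x[0]  # score from "classifier" A

def aux_b(x):
    return x[1]  # score from "classifier" B

def adapt(X, y, grid=21):
    """Step 2: adapt the auxiliary outputs to the target measure (F1 here)
    by a crude grid search over a mixing weight w and a threshold t."""
    best = (0.0, 0.5, 0.0)  # (f1, w, t)
    for i in range(grid):
        w = i / (grid - 1)
        scores = [w * aux_a(x) + (1 - w) * aux_b(x) for x in X]
        for t in sorted(set(scores)):
            pred = [1 if s >= t else 0 for s in scores]
            f1 = f1_score(y, pred)
            if f1 > best[0]:
                best = (f1, w, t)
    return best
```

The point of the sketch is the division of labor: the expensive nonlinear learning happens once in step 1, and only a cheap low-dimensional adaptation is tied to the target measure.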
Surrogate regret bounds for generalized classification performance metrics
We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics that are linear-fractional functions of the false positive and false negative rates (examples include the F-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function f is learned by minimizing a surrogate loss for binary classification on the training sample. It is assumed that the surrogate loss is a strongly proper composite loss function (examples include logistic loss, squared-error loss, exponential loss, etc.). Then, given f, a threshold θ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained from thresholding f at θ), measured with respect to the target metric, is upper-bounded by the regret of f measured with respect to the surrogate loss. We also extend our results to cover multilabel classification and provide regret bounds for micro- and macro-averaging measures. Our findings are further analyzed in a computational study on both synthetic and real data sets.
Comment: 22 pages
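The two-step procedure can be sketched concretely: learn f by minimizing logistic loss (a strongly proper composite loss) with gradient descent on the training sample, then tune the threshold θ on a validation sample by directly maximizing the target metric, F1 here. The data, step size, and iteration count below are illustrative assumptions.

```python
import math

def f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fit_logistic(X, y, lr=0.5, iters=500):
    """Step 1: learn a real-valued function f (a linear score here)
    by stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def tune_threshold(w, X_val, y_val):
    """Step 2: pick the threshold on f's validation scores that
    maximizes the target metric (F1)."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in X_val]
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        pred = [1 if s >= t else 0 for s in scores]
        score = f1(y_val, pred)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

Only candidate thresholds at observed scores need to be checked, since F1 is piecewise constant in the threshold.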
A Feature Selection Method for Multivariate Performance Measures
Feature selection with specific multivariate performance measures is key to the success of many applications, such as image retrieval and text classification. Existing feature selection methods are usually designed for classification error. In this paper, we propose a generalized sparse regularizer. Based on the proposed regularizer, we present a unified feature selection framework for general loss functions. In particular, we study a novel feature selection paradigm that optimizes multivariate performance measures. The resulting formulation is a challenging problem for high-dimensional data, so a two-layer cutting plane algorithm is proposed to solve it, and its convergence is presented. In addition, we adapt the proposed method to optimize multivariate measures for multiple instance learning problems. Comparisons with state-of-the-art feature selection methods show that the proposed method is superior. Extensive experiments on large-scale, high-dimensional real-world datasets show that the proposed method outperforms l1-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly improved performance over SVM in terms of F1-score.
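The core idea, selecting features against a multivariate measure rather than classification error, can be illustrated with a drastically simplified stand-in (this is not the paper's two-layer cutting plane algorithm): score each candidate feature by the best F1 a one-feature threshold rule achieves, then keep the top-k. The scoring rule and the toy data are assumptions for illustration.

```python
def f1_of(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_f1_single_feature(col, y):
    """Best F1 achievable by thresholding a single feature column."""
    best = 0.0
    for t in sorted(set(col)):
        pred = [v >= t for v in col]
        best = max(best, f1_of(y, pred))
    return best

def select_features(X, y, k):
    """Rank features by their single-feature F1 and keep the top k."""
    d = len(X[0])
    scored = [(best_f1_single_feature([row[j] for row in X], y), j)
              for j in range(d)]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]
```

Unlike this greedy per-feature ranking, the paper's sparse-regularizer formulation scores subsets jointly, which matters when features are only informative in combination.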
Regularized F-Measure Maximization for Feature Selection and Classification
Receiver Operating Characteristic (ROC) analysis is a common tool for assessing the performance of classifiers. It has gained much popularity in medical and other fields, including biological markers and diagnostic testing, particularly because misclassification costs are often unknown in real-world problems; the ROC curve and related utility functions such as the F-measure can therefore be more meaningful performance measures. The F-measure combines recall and precision into a single global measure. In this paper, we propose a novel method based on regularized F-measure maximization. The proposed method assigns different costs to positive and negative samples and performs simultaneous feature selection and prediction with an L1 penalty. It is especially useful when the data set is highly unbalanced, or when the labels for negative (positive) samples are missing. Our experiments with benchmark, methylation, and high-dimensional microarray data show that, in these limited experiments, the performance of the proposed algorithm is better than or equivalent to that of other popular classifiers.
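A minimal sketch of the ingredients the abstract describes, assuming a cost-weighted logistic loss with an L1 penalty solved by proximal gradient descent (soft-thresholding); the cost values, penalty strength, and toy data are illustrative, and this is not the authors' exact formulation.

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of the L1 penalty: shrinks v toward zero."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def fit_l1_weighted_logistic(X, y, c_pos=2.0, c_neg=1.0, lam=0.05,
                             lr=0.1, iters=2000):
    """Cost-weighted logistic regression with an L1 penalty (ISTA).
    c_pos/c_neg reweight positive/negative samples, which helps on
    unbalanced data; lam drives irrelevant weights to exactly zero,
    giving simultaneous feature selection and prediction."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            c = c_pos if yi == 1 else c_neg
            for j in range(d):
                grad[j] += c * (p - yi) * xi[j] / n
        # gradient step on the weighted loss, then L1 proximal step
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w
```

On data where one feature carries the signal and another is noise, the L1 proximal step leaves the noise weight at exactly zero, which is the feature-selection effect the abstract refers to.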