Binarized support vector machines
The widely used Support Vector Machine (SVM) method has been shown to yield very good results in
Supervised Classification problems. Other methods, such as Classification Trees, have become
more popular among practitioners than SVM thanks to their interpretability, which is an important
issue in Data Mining.
In this work, we propose an SVM-based method that automatically detects the most important
predictor variables, and the role they play in the classifier. In particular, the proposed method is
able to detect those values and intervals which are critical for the classification. The method
involves solving a Linear Programming problem with a large number of decision variables. The
numerical experience reported shows that a rather direct use of the standard Column-Generation
strategy leads to a classification method which, in terms of classification ability, is
competitive with the standard linear SVM and Classification Trees. Moreover, the proposed method
is robust, i.e., it is stable in the presence of outliers and invariant to changes of scale or
measurement units of the predictor variables.
When the complexity of the classifier is an important issue, a wrapper feature selection method is
applied, yielding simpler but still competitive classifiers.
Keywords: Supervised classification, Binarization, Column generation, Support vector machines
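The binarization idea can be sketched with off-the-shelf tools: cut each predictor into intervals, one-hot encode the interval memberships, and let an L1-penalised linear SVM zero out the unimportant indicators. This is a minimal illustration only, not the authors' LP/column-generation formulation; the toy data, bin count, and C value are arbitrary assumptions.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: the class depends only on whether x1 falls in
# the critical interval (-0.2, 0.4); x2 is pure noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] > -0.2) & (X[:, 0] < 0.4)).astype(int)

# "Binarize": cut each predictor into quantile intervals and one-hot encode them
binarizer = KBinsDiscretizer(n_bins=8, encode="onehot-dense", strategy="quantile")
Xb = binarizer.fit_transform(X)

# An L1-penalised linear SVM drives the weights of irrelevant interval
# indicators to zero, so the surviving coefficients mark the critical cut-points
clf = LinearSVC(penalty="l1", dual=False, C=1.0).fit(Xb, y)
active = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
print(f"{active.size} of {Xb.shape[1]} interval indicators kept")
```

Because the cut-points come from quantiles, the encoded features are unchanged under any monotone rescaling of the inputs, which mirrors the scale-invariance claimed in the abstract.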
Support Vector Machines Supported by K-means Clustering for Document Classification
Documents with large and highly varied data often complicate the classification process. This can be improved by addressing the variation in the data to produce better accuracy. This study proposes a new method for categorizing English-language text documents by first grouping them with K-Means Clustering, after which the documents are classified using multi-class Support Vector Machines (SVM). With this grouping, the data used to build each classification model is more homogeneous. Experiments on the titles of scientific journal articles show that the proposed method improves accuracy, achieving 88.1% accuracy, 96.7% precision, and 94.4% recall with the number of clusters set to 5.
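The two-stage scheme (cluster first, then classify within each cluster) can be sketched as follows. This is an assumed reading of the pipeline on synthetic stand-in vectors, not the paper's text-vectorization setup; only the cluster count of 5 comes from the abstract, everything else is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in for vectorized documents: synthetic blobs, three categories
X, y = make_blobs(n_samples=300, centers=6, random_state=0)
y = y % 3

# Stage 1: group the data with K-means (the abstract uses 5 clusters)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Stage 2: one multi-class SVM per cluster, trained on more homogeneous data
models = {}
for c in range(5):
    Xc, yc = X[km.labels_ == c], y[km.labels_ == c]
    if np.unique(yc).size == 1:          # degenerate cluster: constant label
        models[c] = int(yc[0])
    else:
        models[c] = SVC(kernel="linear").fit(Xc, yc)

def predict(x):
    c = km.predict(x.reshape(1, -1))[0]  # route the document to its cluster's SVM
    m = models[c]
    return m if isinstance(m, int) else int(m.predict(x.reshape(1, -1))[0])

preds = np.array([predict(x) for x in X])
acc = (preds == y).mean()
print(f"training accuracy: {acc:.3f}")
```

The design intuition matches the abstract: each SVM sees only one cluster's documents, so the variation within its training set is smaller than in the full corpus.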
Probabilistic Kernel Support Vector Machines
We propose a probabilistic enhancement of standard kernel Support Vector
Machines for binary classification, in order to address the case when, along
with given data sets, a description of uncertainty (e.g., error bounds) may be
available on each datum. In the present paper, we specifically consider
Gaussian distributions to model uncertainty. Thereby, our data consist of pairs
(μ_i, Σ_i), i = 1, …, N, along with an indicator y_i ∈ {−1, +1}
to declare membership in one of two categories for each pair.
These pairs may be viewed as representing the mean and covariance, respectively,
of random vectors taking values in a suitable linear space (typically
ℝ^n). Thus, our setting may also be viewed as a modification of
Support Vector Machines to classify distributions, albeit, at present, only
Gaussian ones. We outline the formalism that allows computing suitable
classifiers via a natural modification of the standard "kernel trick." The main
contribution of this work is to point out a suitable kernel function for
applying Support Vector techniques to the setting of uncertain data for which a
detailed uncertainty description is also available (herein, "Gaussian points").
Comment: 6 pages, 6 figures
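The abstract does not spell out the kernel, but one natural candidate for such Gaussian points is the expected RBF kernel, E[exp(−||x − y||² / (2σ²))] with x ~ N(μ1, Σ1) and y ~ N(μ2, Σ2), which has the closed form σ^d · det(Σ1 + Σ2 + σ²I)^(−1/2) · exp(−½ (μ1 − μ2)ᵀ (Σ1 + Σ2 + σ²I)^(−1) (μ1 − μ2)). The sketch below plugs this Gram matrix into a standard SVC; this is an assumed kernel for illustration, and the paper's actual choice may differ.

```python
import numpy as np
from sklearn.svm import SVC

def gauss_kernel(mu1, S1, mu2, S2, sigma=1.0):
    """Expected RBF kernel between N(mu1, S1) and N(mu2, S2) (assumed form)."""
    d = mu1.size
    S = S1 + S2 + sigma**2 * np.eye(d)   # covariances add under the Gaussian integral
    m = mu1 - mu2
    return sigma**d / np.sqrt(np.linalg.det(S)) * np.exp(-0.5 * m @ np.linalg.solve(S, m))

# Hypothetical "Gaussian points": (mean, covariance) pairs in two classes
rng = np.random.default_rng(0)
means = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
covs = [rng.uniform(0.1, 0.5) * np.eye(2) for _ in range(40)]
labels = np.array([0] * 20 + [1] * 20)

# Precompute the Gram matrix and hand it to a standard SVC
G = np.array([[gauss_kernel(means[i], covs[i], means[j], covs[j])
               for j in range(40)] for i in range(40)])
clf = SVC(kernel="precomputed").fit(G, labels)
acc = (clf.predict(G) == labels).mean()
print(f"training accuracy: {acc:.3f}")
```

Note how the data covariances enter the kernel directly: more uncertain points are compared through a flatter Gaussian, so the classifier trusts them less; with Σ1 = Σ2 = 0 the expression reduces to the ordinary RBF kernel on the means.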