201,453 research outputs found

    Model Selection for Support Vector Machine Classification

    We address the problem of model selection for Support Vector Machine (SVM) classification. For a fixed functional form of the kernel, model selection amounts to tuning the kernel parameters and the slack penalty coefficient C. We begin by reviewing a recently developed probabilistic framework for SVM classification. An extension to the case of SVMs with quadratic slack penalties is given, and a simple approximation for the evidence is derived, which can be used as a criterion for model selection. We also derive the exact gradients of the evidence in terms of posterior averages and describe how they can be estimated numerically using Hybrid Monte Carlo techniques. Though computationally demanding, the resulting gradient ascent algorithm is a useful baseline tool for probabilistic SVM model selection, since it can locate maxima of the exact (unapproximated) evidence. We then perform extensive experiments on several benchmark data sets. The aim of these experiments is to compare the performance of probabilistic model selection criteria with alternatives based on estimates of the test error, namely the so-called "span estimate" and Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all the "simple" model criteria (Laplace evidence approximations, and the Span and GACV error estimates) exhibit multiple local optima with respect to the hyperparameters. While some of these give performance that is competitive with results from other approaches in the literature, a significant fraction lead to rather higher test errors. The results for the evidence gradient ascent method show that the exact evidence also exhibits local optima, but these give test errors which are much less variable and also consistently lower than for the simpler model selection criteria.
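
For orientation, the role of the slack penalty coefficient C can be seen in the standard textbook soft-margin SVM primal, sketched below. This is the generic form, not necessarily the exact parameterisation used in the paper: p = 1 gives the linear slack penalty and p = 2 the quadratic one (some authors scale the quadratic term by 1/2), and the RBF kernel with width parameter \gamma is an assumed example of a "fixed functional form".

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i^{\,p},
  \qquad p \in \{1, 2\}, \\
\text{s.t.}\quad & y_i \big( w^{\top} \phi(x_i) + b \big) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1,\dots,n, \\
\text{with, e.g.,}\quad & k(x, x') = \phi(x)^{\top} \phi(x') = \exp\!\big( -\gamma\, \lVert x - x' \rVert^{2} \big).
\end{aligned}
```

In this reading, model selection means choosing the pair (C, \gamma) by some criterion such as the evidence or an estimate of the test error.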

    Mixed Integer Linear Programming for Feature Selection in Support Vector Machine

    This work focuses on the support vector machine (SVM) with feature selection. A MILP formulation is proposed for the problem. The choice of suitable features to construct the separating hyperplanes is modelled in this formulation by including a budget constraint that sets in advance a limit on the number of features to be used in the classification process. We propose both an exact and a heuristic procedure to solve this formulation efficiently. Finally, the model is validated on some well-known data sets and compared with classical classification methods. Comment: 37 pages, 20 figures
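
As a rough illustration of what a budget-constrained formulation can look like, the block below sketches a generic \ell_1-norm soft-margin SVM with binary feature indicators. It is not taken from the paper: the auxiliary variables u_j (bounding |w_j|), the indicators z_j, the big-M constant M, and the budget B are all assumptions made for this sketch.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,u,\,z}\quad & \sum_{j=1}^{d} u_j + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i \Big( \sum_{j=1}^{d} w_j x_{ij} + b \Big) \ge 1 - \xi_i, && i = 1,\dots,n, \\
& -u_j \le w_j \le u_j, && j = 1,\dots,d, \\
& u_j \le M z_j, && j = 1,\dots,d, \\
& \sum_{j=1}^{d} z_j \le B, \\
& \xi_i \ge 0, \quad z_j \in \{0, 1\}.
\end{aligned}
```

Setting z_j = 0 forces u_j = 0 and therefore w_j = 0, so at most B features can carry a nonzero weight in the separating hyperplane.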

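    METODE SUPPORT VECTOR MACHINE UNTUK MELAKUKAN ANALISIS SENTIMEN PADA MARKETPLACE DENGAN PERBANDINGAN CIRI PADA LEVEL ASPEK (Support Vector Machine Method for Marketplace Sentiment Analysis with Feature Comparison at the Aspect Level)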

    Sentiment analysis is an interdisciplinary field spanning natural language processing, artificial intelligence, and text mining. Its central task is polarity classification, which determines whether a sentiment is positive or negative. This study applies support vector machine classification to 648 consumer reviews collected from a marketplace selling mobile phones. Three aspects are identified as indicators of sentiment in the marketplace: service, delivery, and product. The slang dictionary used for normalization contains 552 slang words. The study compares feature analysis methods to obtain the best classification result, because classification accuracy is influenced by the feature analysis process. Comparing n-gram and TF-IDF features with the Support Vector Machine method shows that unigrams give the highest accuracy, at 80.87%. The results indicate that, for aspect-level sentiment analysis with feature comparison, unigram features combined with Support Vector Machine classification form the best model. Keywords: sentiment analysis, e-commerce, marketplace, feature extraction, TF-IDF, n-gram, support vector machine
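
The two feature representations compared in this abstract can be illustrated with a minimal, standard-library sketch. The three toy reviews are invented for illustration and are not from the study's data; the TF-IDF variant used here (relative term frequency times the log of inverse document frequency) is one common definition and may differ from the one the authors used.

```python
import math
from collections import Counter


def unigram_counts(tokens):
    """Bag-of-words (unigram) representation: raw term counts."""
    return Counter(tokens)


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    weight = (count / document length) * log(N / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-normalised reviews (illustration only).
reviews = [
    "fast delivery good product".split(),
    "slow delivery bad service".split(),
    "good service good product".split(),
]

unigrams = [unigram_counts(tokens) for tokens in reviews]
tfidf = tfidf_vectors(reviews)
print(unigrams[2])
print(tfidf[0])
```

Unigram counts treat every term equally, whereas TF-IDF down-weights terms such as "delivery" that appear in several reviews. Either representation could then be fed to an SVM classifier.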

    Gene Expression-Based Glioma Classification Using Hierarchical Bayesian Vector Machines

    This paper considers several Bayesian classification methods for the analysis of glioma cancer with microarray data, based on reproducing kernel Hilbert space under the multiclass setup. We consider the multinomial logit likelihood as well as the likelihood related to the multiclass Support Vector Machine (SVM) model. It is shown that our proposed Bayesian classification models with multiple shrinkage parameters can produce a more accurate classification scheme for glioma cancer than several existing classical methods. We have also proposed a Bayesian variable selection scheme, integrated with our model, for selecting the differentially expressed genes. This integrated approach improves classifier design by yielding simultaneous gene selection.

    Incremental continuous ant colony optimization technique for support vector machine model selection problem

    Ant Colony Optimization has been used to solve the Support Vector Machine model selection problem. Ant Colony Optimization originally deals with discrete optimization problems. When Ant Colony Optimization is applied to optimizing Support Vector Machine parameters, which are continuous variables, the continuous values need to be discretized. This discretization loses some information and hence affects the classification accuracy and search time. This study proposes an algorithm that can optimize Support Vector Machine parameters using Incremental Continuous Ant Colony Optimization without the need to discretize the continuous values of the Support Vector Machine parameters. Seven datasets from UCI were used to evaluate the credibility of the proposed hybrid algorithm in terms of classification accuracy. Promising results were obtained when compared to the grid search technique.

    Modeling Suspicious Email Detection using Enhanced Feature Selection

    The paper presents a suspicious email detection model which incorporates enhanced feature selection. We propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algorithms have achieved good accuracy for this task. However, the results achieved by those algorithms can be further improved by using appropriate feature selection mechanisms. We have identified a specific feature selection scheme that improves the performance of the existing algorithms.

    Feature selection for sky image classification based on self adaptive ant colony system algorithm

    Statistical-based feature extraction has typically been used to obtain the important features from sky images for cloud classification. The extracted features include many noisy, redundant, and irrelevant features, which can reduce classification accuracy and increase computation time. Thus, this paper proposes a new feature selection algorithm to distinguish significant features from the extracted features using an ant colony system (ACS). The informative features are extracted from the sky images using a Gaussian smoothness standard deviation, and then represented in a directed graph. In the feature selection phase, the self-adaptive ACS (SAACS) algorithm has been improved by enhancing the exploration mechanism to select only the significant features. Support vector machine, kernel support vector machine, multilayer perceptron, random forest, k-nearest neighbor, and decision tree were used to evaluate the algorithms. Four datasets are used to test the proposed model: Kiel, Singapore whole-sky imaging categories, MGC Diagnostics Corporation, and greatest common divisor. The SAACS algorithm is compared with six bio-inspired benchmark feature selection algorithms. The SAACS algorithm achieved a classification accuracy of 95.64%, which is superior to all the benchmark feature selection algorithms. Additionally, the Friedman test and Mann-Whitney U test are employed to statistically evaluate the efficiency of the proposed algorithms.

    Solving Support Vector Machine Model Selection Problem Using Continuous Ant Colony Optimization

    Ant Colony Optimization has been used to solve the Support Vector Machine model selection problem. Ant Colony Optimization originally deals with discrete optimization problems. When Ant Colony Optimization is applied to optimizing Support Vector Machine parameters, which are continuous variables, the continuous values need to be discretized. This discretization loses some information and hence affects the classification accuracy and search time. This study proposes an algorithm that can optimize Support Vector Machine parameters using Continuous Ant Colony Optimization without the need to discretize the continuous values of the Support Vector Machine parameters. Eight datasets from UCI were used to evaluate the credibility of the proposed hybrid algorithm in terms of classification accuracy and size of the feature subset. Promising results were obtained when compared to the grid search technique, GA with feature chromosome-SVM, PSO-SVM, and GA-SVM.
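
To make the idea of searching continuous SVM parameters without discretization concrete, here is a minimal sketch of an archive-based continuous ant colony search in the general style of ACO_R. It is not the authors' algorithm: the function name `aco_r_minimize`, the default settings, and the quadratic `toy_cv_error` (a stand-in for a real cross-validation error over log2 C and log2 gamma) are all assumptions made for illustration.

```python
import math
import random


def aco_r_minimize(objective, bounds, archive_size=10, n_ants=4,
                   q=0.3, xi=0.85, iterations=200, seed=0):
    """Minimise `objective` over a box using a solution archive.

    Returns (best_value, best_point). Hypothetical helper, not a library API.
    """
    rng = random.Random(seed)
    k = archive_size

    # Start from uniformly random solutions, kept sorted best-first.
    archive = []
    for _ in range(k):
        point = [rng.uniform(low, high) for low, high in bounds]
        archive.append((objective(point), point))
    archive.sort(key=lambda item: item[0])

    # Rank-based Gaussian weights: better ranks are picked as guides more often.
    weights = [
        math.exp(-(rank ** 2) / (2.0 * (q * k) ** 2))
        / (q * k * math.sqrt(2.0 * math.pi))
        for rank in range(k)
    ]

    for _ in range(iterations):
        candidates = []
        for _ in range(n_ants):
            guide = rng.choices(range(k), weights=weights)[0]
            point = []
            for dim, (low, high) in enumerate(bounds):
                mean = archive[guide][1][dim]
                # Step size follows the archive's spread around the guide.
                spread = sum(abs(sol[dim] - mean) for _, sol in archive) / (k - 1)
                value = rng.gauss(mean, xi * spread)
                point.append(max(low, min(high, value)))  # stay inside the box
            candidates.append((objective(point), point))
        # Keep only the k best of the old archive plus the new candidates.
        archive = sorted(archive + candidates, key=lambda item: item[0])[:k]

    return archive[0]


def toy_cv_error(point):
    """Stand-in for cross-validation error; minimum at log2 C = 3, log2 gamma = -5."""
    log2_c, log2_gamma = point
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


search_box = [(-5.0, 15.0), (-15.0, 3.0)]  # assumed ranges for log2 C, log2 gamma
best_error, best_point = aco_r_minimize(toy_cv_error, search_box)
print(best_error, best_point)
```

A grid search over the same box can only return points on its grid, whereas the archive search samples real-valued candidates directly. In practice `toy_cv_error` would be replaced by an actual cross-validated SVM error.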