182,909 research outputs found

    Robust Support Vector Machines For Implicit Outlier Removal

    The support vector machine (SVM) is a machine learning algorithm that has been successfully applied to classification problems since its introduction in the early 1990s. It is based on Vladimir Vapnik's work on statistical learning theory and is theoretically well founded. Following the discriminative approach, the SVM yields a classifier that separates two classes by a hyperplane. The training instances are classified according to the sign of their distance to the hyperplane. This hyperplane is defined by a small number of training instances such that the distance of the training instances of both classes to the hyperplane is maximized and the misclassification error is minimized. Hence the support vector machine belongs to the family of maximum margin classifiers. Since the support vector machine does not estimate the underlying class conditional distribution of the training instances, but instead uses them directly to construct the classifier, it is important that the training instances are sampled from the underlying class conditional distribution. If this is not the case because the training set is contaminated with outliers, the accuracy of the classifier defined by the support vector machine decreases. Based on this observation, several approaches have been proposed to improve the robustness of the support vector machine against outliers in the training data. In this thesis we discuss the class of robust support vector machines, which aim to make the standard support vector machine robust against noise by implicit outlier filtering. These approaches use the support vector machine to detect and remove outliers based on their position relative to the separating hyperplane. Since the success of those methods has only been demonstrated empirically, we conduct a thorough experimental study to determine under which conditions those robust methods can be applied in practice. We are especially interested in whether the additional parameter that controls the removal of outliers can be estimated from a training set that is contaminated by outliers.
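    The two ideas in this abstract, classifying by the sign of the distance to a hyperplane and filtering outliers by their position relative to it, can be illustrated with a minimal sketch. It is not the thesis's method: the hyperplane `(w, b)` is fixed by hand rather than obtained from an SVM fit, and `max_violation` is a hypothetical name standing in for the additional parameter that controls outlier removal. In the approaches described, the classifier would presumably be retrained on the filtered set.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def predict(w, b, x):
    """Classify x by the sign of its distance to the hyperplane."""
    return 1 if signed_distance(w, b, x) >= 0 else -1

def filter_outliers(w, b, samples, labels, max_violation):
    """Keep samples that lie at most max_violation on the wrong side.

    max_violation is an assumed, illustrative threshold: a sample with
    label y is dropped when y * distance < -max_violation.
    """
    kept = []
    for x, y in zip(samples, labels):
        if y * signed_distance(w, b, x) >= -max_violation:
            kept.append((x, y))
    return kept

# Hand-picked hyperplane x1 = 0 (an assumption, not a trained SVM).
w, b = [1.0, 0.0], 0.0
samples = [[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.0], [-3.0, 0.0]]
labels = [1, 1, -1, -1, 1]  # the last point is labelled +1 but lies deep in the -1 region

kept = filter_outliers(w, b, samples, labels, max_violation=1.0)
print(len(kept), "of", len(samples), "samples kept")
```

    Here only the last sample is removed, because it lies three units on the wrong side of the hyperplane while the threshold allows one.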

    Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository

    Machine learning enables computers to learn from data without being explicitly programmed [1, 2]. Machine learning can be classified as supervised and unsupervised learning. In supervised learning, computers learn a function that maps an input to an output based on training input-output pairs [3]. Among the most efficient and widely used supervised learning algorithms are K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor (LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this paper is to implement these learning algorithms on eleven different datasets from the UCI machine learning repository to observe the variation of accuracies for each of the algorithms on all datasets. Analyzing the accuracy of the algorithms gives a brief idea of the relationship between the machine learning algorithms and the data dimensionality. All the algorithms are developed in Matlab. Based on these accuracy observations, KNN, SVM, LMNN, and ENN can be compared by their performance on each dataset. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018)

    Fuzzy Least Squares Twin Support Vector Machines

    Least Squares Twin Support Vector Machine (LST-SVM) has been shown to be an efficient and fast algorithm for binary classification. It combines the operating principles of Least Squares SVM (LS-SVM) and Twin SVM (T-SVM): it constructs two non-parallel hyperplanes (as in T-SVM) by solving two systems of linear equations (as in LS-SVM). Despite its efficiency, LST-SVM is still unable to cope with two features of real-world problems. First, in many real-world applications, labels of samples are not deterministic; they come naturally with their associated membership degrees. Second, samples in real-world applications may not be equally important, and their importance degrees affect the classification. In this paper, we propose Fuzzy LST-SVM (FLST-SVM) to deal with these two characteristics of real-world data. Two models are introduced for FLST-SVM. The first model builds crisp hyperplanes using training samples and their corresponding membership degrees. The second model, on the other hand, constructs fuzzy hyperplanes using training samples and their membership degrees. Numerical evaluation of the proposed method on synthetic and real datasets demonstrates a significant improvement in the classification accuracy of FLST-SVM compared to well-known existing versions of SVM.

    Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data

    The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.

    Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description

    Support Vector Data Description (SVDD) is a machine-learning technique used for single-class classification and outlier detection. The SVDD formulation with a kernel function provides a flexible boundary around the data. The values of the kernel function parameters affect the nature of the data boundary. For example, it is observed that with a Gaussian kernel, as the value of the kernel bandwidth is lowered, the data boundary changes from spherical to wiggly. A spherical data boundary leads to underfitting, and an extremely wiggly data boundary leads to overfitting. In this paper, we propose an empirical criterion to obtain good values of the Gaussian kernel bandwidth parameter. This criterion provides a smooth boundary that captures the essential geometric features of the data.
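    The role of the bandwidth can be seen directly in the kernel values, before any SVDD model is fitted. The sketch below is not the paper's peak criterion. It assumes the common parameterization k(x, y) = exp(-||x - y||^2 / (2 s^2)), and the names `gaussian_kernel`, `kernel_matrix`, and `s` are illustrative. With a very small `s` every pair of distinct points looks unrelated, which is consistent with a tight, wiggly boundary. With a very large `s` all points look alike, which is consistent with a near-spherical one.

```python
import math

def gaussian_kernel(x, y, s):
    """Gaussian kernel with bandwidth s (assumed form: exp(-d^2 / (2 s^2)))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))

def kernel_matrix(points, s):
    """Pairwise kernel values for a list of points."""
    return [[gaussian_kernel(p, q, s) for q in points] for p in points]

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

narrow = kernel_matrix(points, 0.1)   # small bandwidth: close to the identity matrix
wide = kernel_matrix(points, 100.0)   # large bandwidth: close to an all-ones matrix

for name, matrix in (("s = 0.1", narrow), ("s = 100", wide)):
    print(name)
    for row in matrix:
        print("  ", ["%.4f" % value for value in row])
```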
