
    On robustness properties of convex risk minimization methods for pattern recognition

    The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds on the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given. Keywords: AdaBoost loss function, influence function, kernel logistic regression, robustness, sensitivity curve, statistical learning, support vector machine, total variation.
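    The central object in this line of analysis is the influence function from robust statistics. For readers unfamiliar with it, the standard (Hampel) definition is the following; the notation here is generic, not necessarily the paper's:

```latex
\mathrm{IF}(z; T, P) \;=\; \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)P + \varepsilon\,\Delta_z\bigr) - T(P)}{\varepsilon}
```

    Here $T$ is the estimator (the classifier produced by convex risk minimization), $P$ the data-generating distribution, and $\Delta_z$ the point mass at $z$; a bounded influence function means a single contaminating observation has only a limited effect on the learned classifier.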

    Learning from data with uncertainty: Robust multiclass kernel-based classifiers and regressors.

    Motivated by the presence of uncertainty in real data, this research investigates a robust optimization approach applied to multiclass support vector machines (SVMs) and support vector regression. Two new kernel-based methods are developed to address data with uncertainty, where each data point lies inside a sphere of uncertainty. For classification problems, the models, called robust SVM (R-SVM) and robust feasibility approach (R-FA), are extensions of the SVM approach. The two models are compared in terms of robustness and generalization error. For comparison purposes, the robust minimax probability machine (MPM) is applied and compared with the above methods. From the empirical results, we conclude that the R-SVM performs better than the robust MPM. For regression problems, the models are called robust support vector regression (R-SVR) and robust feasibility approach for regression (R-FAR). The proposed robust methods can improve the mean squared error (MSE) in regression problems.
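    For intuition, spherical uncertainty sets typically lead to a worst-case margin constraint of the following form. This is a standard robust-SVM reformulation written in our own notation (radius $r_i$ for the uncertainty sphere around $x_i$), not necessarily the exact model of the paper:

```latex
\min_{w,\,b,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
y_i\,(w^\top x_i + b) \;\ge\; 1 - \xi_i + r_i\,\|w\|, \qquad \xi_i \ge 0
```

    since the worst case of $y_i\,w^\top(x_i+\delta)$ over $\|\delta\|\le r_i$ is attained at $y_i\,w^\top x_i - r_i\|w\|$.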

    Nonparametric Regression via StatLSSVM

    We present a new MATLAB toolbox, for Windows and Linux, for nonparametric regression estimation based on the statistical library for least squares support vector machines (StatLSSVM). The StatLSSVM toolbox is written so that only a few lines of code are necessary to perform standard nonparametric regression, regression with correlated errors, and robust regression. In addition, construction of additive models and of pointwise or uniform confidence intervals is supported. A number of tuning criteria, such as classical cross-validation, robust cross-validation, and cross-validation for correlated errors, are available, and minimization of these criteria proceeds without any user interaction.
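    The toolbox itself is MATLAB, but the core computation behind an LS-SVM regressor is a single linear system. Below is a minimal NumPy sketch of that underlying method, not the StatLSSVM API; the parameter names `gamma` (regularization constant) and `sigma` (RBF bandwidth) are our own:

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # top row: sum of dual coefficients is 0
    A[1:, 0] = 1.0                      # first column: bias term
    A[1:, 1:] = K + np.eye(n) / gamma   # regularized kernel block
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]              # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Prediction: f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```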

    Supervised Machine Learning for Signals Having RRC Shaped Pulses

    The classification performances of supervised machine learning techniques such as support vector machines, neural networks, and logistic regression are compared for modulation recognition purposes. Simple and robust features are used to distinguish continuous-phase FSK from QAM-PSK signals. Signals having root-raised-cosine (RRC) shaped pulses are simulated in extremely noisy conditions with joint impairments of block fading, lack of symbol and sampling synchronization, carrier offset, and additive white Gaussian noise. The features are based on the sample mean and sample variance of the imaginary part of the product of two consecutive complex signal values.
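    The feature construction described here is simple enough to sketch directly. The exact product convention is our assumption (the abstract only says "product of two consecutive complex signal values"; we use the conjugate product, which is common in modulation recognition):

```python
import numpy as np

def fsk_vs_qam_features(x):
    """Features from the imaginary part of products of consecutive samples.

    x: 1-D complex array of received baseband samples.
    Returns (sample mean, sample variance) of Im{x[n] * conj(x[n-1])}.
    The conjugation is our assumption: it makes the imaginary part track
    the phase increment, which behaves differently for continuous-phase
    FSK than for QAM/PSK pulse trains.
    """
    p = x[1:] * np.conj(x[:-1])   # products of consecutive samples
    im = np.imag(p)
    return im.mean(), im.var()
```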

    Using neural networks and support vector machines for default prediction in South Africa

    A thesis submitted to the Faculty of Computer Science and Applied Mathematics, University of the Witwatersrand, in fulfillment of the requirements for the Master of Science (MSc), Johannesburg, February 2017. This is a thesis on credit risk and in particular bankruptcy prediction. It investigates the application of machine learning techniques such as support vector machines and neural networks for this purpose. This is not a thesis on support vector machines and neural networks; it simply uses these functions as tools to perform the analysis. Neural networks are a type of machine learning algorithm. They are nonlinear models inspired by the biological networks of neurons found in the human central nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. Neural networks can approximate most nonlinear functions, making them a quite powerful class of models. Support vector machines (SVMs) are a more recent development from the machine learning community. In machine learning, SVMs are supervised learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM takes a set of input data and predicts, for each given input, which of two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification into the two different data classes. Traditional bankruptcy prediction modelling has been criticised because it makes certain assumptions about the underlying data. For instance, a frequent requirement for multivariate analysis is a joint normal distribution and independence of variables. Support vector machines (and neural networks) are a useful tool for default analysis because they make far fewer assumptions about the underlying data. In this framework, support vector machines are used as a classifier to discriminate between defaulting and non-defaulting companies in a South African context. The input data required is a set of financial ratios constructed from each company's historic financial statements. The data is then divided into two groups: companies that have defaulted and companies that are healthy (non-default). The final data sample used for this thesis consists of 23 financial ratios from 67 companies listed on the JSE. Furthermore, for each company the probability of default is predicted. The results are benchmarked against more classical methods commonly used for bankruptcy prediction, such as linear discriminant analysis and logistic regression. The results of the support vector machines, neural networks, linear discriminant analysis and logistic regression are then assessed via their receiver operating characteristic curves and profitability ratios to determine which model is more successful at predicting default.
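    As a rough illustration of the benchmarking pipeline described above, here is a hedged sketch using scikit-learn as a stand-in for whatever tooling the thesis actually used; the data is a synthetic placeholder with the stated shape (67 companies, 23 ratios), not the thesis data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hypothetical stand-in: 67 companies x 23 financial ratios, binary default label.
rng = np.random.default_rng(0)
X = rng.normal(size=(67, 23))
y = rng.integers(0, 2, size=67)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

models = {
    "SVM": SVC(probability=True),          # probability=True -> default probabilities
    "Neural net": MLPClassifier(max_iter=2000),
    "LDA": LinearDiscriminantAnalysis(),
    "Logistic": LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])  # ROC-based comparison
    print(f"{name}: ROC AUC = {auc:.2f}")
```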

    Robust Learning from Bites for Data Mining

    Some methods from statistical machine learning and from robust statistics have two drawbacks. First, they are so computer-intensive that they can hardly be used for massive data sets, say with millions of data points. Second, robust and non-parametric confidence intervals for the predictions of the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed over several processors if available, and can reduce the computation time substantially. Our main focus is on robust general support vector machines (SVMs) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful for fitting robust estimators in parametric models to huge data sets. Keywords: Breakdown point, convex risk minimization, data mining, distributed computing, influence function, logistic regression, robustness, scalability.
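    A minimal sketch of the bite-wise scheme as we read it: partition the data into "bites", fit one model per bite, aggregate by the median, and build a distribution-free order-statistic confidence interval for that median. Scikit-learn's SVR stands in for the paper's robust SVMs, and the CI construction is our reading of "distribution-free confidence intervals for the median":

```python
import numpy as np
from scipy.stats import binom
from sklearn.svm import SVR

def fit_bites(X, y, B=8):
    """Fit one SVM per random disjoint 'bite' of the data (parallelizable)."""
    idx = np.random.permutation(len(y))
    return [SVR().fit(X[part], y[part]) for part in np.array_split(idx, B)]

def predict_median_ci(models, X_new, alpha=0.05):
    """Median of the B bite predictions plus a distribution-free CI.

    Treating the B predictions at each point as an i.i.d. sample, the
    order statistics indexed by Binomial(B, 1/2) quantiles bracket the
    median with probability at least 1 - alpha.
    """
    P = np.stack([m.predict(X_new) for m in models])  # shape (B, n_new)
    P.sort(axis=0)                                    # order statistics per point
    B = P.shape[0]
    lo = int(binom.ppf(alpha / 2, B, 0.5))            # lower order-statistic index
    hi = min(int(binom.ppf(1 - alpha / 2, B, 0.5)), B - 1)
    return np.median(P, axis=0), P[lo], P[hi]
```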

    Comparing Machine Learning and Logistic Regression Methods for Predicting Hypertension Using a Combination of Gene Expression and Next-Generation Sequencing Data

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk from genotypes to incorporate gene expression data and rare variants. We then apply two different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare their performance to logistic regression. Performance was not radically different across the three methods, although the linear support vector machine tended to show small gains in predictive ability relative to the radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, predictive ability showed a statistically significant decrease for both the radial support vector machine and logistic regression. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
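    The three-way comparison itself reduces to swapping the kernel. A hedged sketch with simulated stand-in data (the GAW19 features and the paper's extended risk-prediction approach are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in for genotype + gene-expression features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)

for name, model in [("linear SVM", SVC(kernel="linear")),
                    ("radial SVM", SVC(kernel="rbf")),
                    ("logistic regression", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(model, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(f"{name}: CV accuracy = {acc:.2f}")
```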

    Predictability, Stability, and Computability of Locally Learnt SVMs

    We take a look at the principles of predictability, stability, and computability in the field of support vector machines. Support vector machines (SVMs), well known in machine learning, play a successful role in classification and regression in many areas of science. In the past three decades, much research has been conducted on the statistical and computational properties of support vector machines and related kernel methods. On the one hand, consistency (predictability) and robustness (stability) of the method are of interest. On the other hand, from an applied point of view, there is interest in a method that can deal with many observations and many features (computability). Since SVMs require a lot of computing power and storage capacity, various approaches for processing large data sets have been proposed. One of them is called regionalization. It divides the space of input variables into possibly overlapping regions in a data-driven way and defines the prediction function by combining locally learnt support vector machines. Another advantage of regionalization should be mentioned: if the generating distribution has different characteristics in different regions of the input space, learning only one “global” SVM may lead to an imprecise estimate. Locally trained predictors can overcome this problem. It can be shown that a locally learnt predictor is consistent and robust under assumptions that can be checked by the user of the method.
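    A minimal sketch of the regionalization idea as described above: partition the input space data-dependently, train one SVM per region, and predict with the region's local model. K-means stands in for the paper's data-driven partition, and overlapping regions are omitted for brevity:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

class RegionalSVR:
    """Locally learnt SVMs: one SVR per k-means region of the input space."""

    def fit(self, X, y, n_regions=4):
        self.km = KMeans(n_clusters=n_regions, n_init=10).fit(X)
        labels = self.km.labels_
        self.models = {r: SVR().fit(X[labels == r], y[labels == r])
                       for r in range(n_regions)}
        return self

    def predict(self, X_new):
        regions = self.km.predict(X_new)            # assign each point to a region
        out = np.empty(len(X_new))
        for r, m in self.models.items():
            mask = regions == r
            if mask.any():
                out[mask] = m.predict(X_new[mask])  # local model's prediction
        return out
```

    Besides scalability, this mirrors the imprecision argument in the abstract: a single global SVM must compromise across regions with different characteristics, whereas each local model only has to fit its own region.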