124,988 research outputs found

    e-Distance Weighted Support Vector Regression

    Full text link
    We propose a novel support vector regression approach called e-Distance Weighted Support Vector Regression (e-DWSVR).e-DWSVR specifically addresses two challenging issues in support vector regression: first, the process of noisy data; second, how to deal with the situation when the distribution of boundary data is different from that of the overall data. The proposed e-DWSVR optimizes the minimum margin and the mean of functional margin simultaneously to tackle these two issues. In addition, we use both dual coordinate descent (CD) and averaged stochastic gradient descent (ASGD) strategies to make e-DWSVR scalable to large scale problems. We report promising results obtained by e-DWSVR in comparison with existing methods on several benchmark datasets

    Reliability and validity in comparative studies of software prediction models

    Get PDF
    Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation and compare a machine learning and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models

    A New Robust Regression Method Based on Minimization of Geodesic Distances on a Probabilistic Manifold: Application to Power Laws

    Get PDF
    In regression analysis for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. In many situations, the assumptions underlying OLS are not fulfilled, and several other approaches have been proposed. However, most techniques address only part of the shortcomings of OLS. We here discuss a new and more general regression method, which we call geodesic least squares regression (GLS). The method is based on minimization of the Rao geodesic distance on a probabilistic manifold. For the case of a power law, we demonstrate the robustness of the method on synthetic data in the presence of significant uncertainty on both the data and the regression model. We then show good performance of the method in an application to a scaling law in magnetic confinement fusion.Comment: Published in Entropy. This is an extended version of our paper at the 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), 21-26 September 2014, Amboise, Franc

    Fitting Prediction Rule Ensembles with R Package pre

    Get PDF
    Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre is described and illustrated through application on a dataset on the prediction of depression. Furthermore, accuracy and sparsity of PREs is compared with that of single trees, random forest and lasso regression in four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction
    corecore