12,892 research outputs found

    Mathematical programming for piecewise linear regression analysis

    In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks and piece-wise regression. In terms of piece-wise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piece-wise linear regression method is introduced based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments, and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, the mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains have been used to demonstrate the efficiency of our proposed method. It is shown that our proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and that it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of if-then rules that are easily interpretable. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions and achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
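
The if-then-rule form of such a model can be illustrated with a minimal Python sketch. It only evaluates an already-fitted piecewise model and its total absolute error; it does not solve the mixed integer program from the abstract. The function names, the example break-point and coefficients, and the convention that a sample lying exactly on a break-point belongs to the segment on its right are all assumptions made for illustration.

```python
from bisect import bisect_right


def predict(x, partition_feature, breakpoints, coefs, intercepts):
    """Evaluate a piecewise linear model on one sample.

    x                 -- list of feature values
    partition_feature -- index of the single feature that is partitioned
    breakpoints       -- sorted break-point locations on that feature
    coefs, intercepts -- one coefficient vector and intercept per segment
    """
    # "if x[f] < b1 then segment 0, else if x[f] < b2 then segment 1, ..."
    # (assumed convention: a value equal to a break-point goes to the right)
    segment = bisect_right(breakpoints, x[partition_feature])
    return intercepts[segment] + sum(w * v for w, v in zip(coefs[segment], x))


def total_absolute_error(samples, targets, **model):
    """Sum of |y - prediction| over all samples (the fitting objective)."""
    return sum(abs(y - predict(x, **model)) for x, y in zip(samples, targets))


# Hypothetical model: two segments split at 2.0 on feature 0.
model = dict(
    partition_feature=0,
    breakpoints=[2.0],
    coefs=[[1.0, 0.0], [-1.0, 0.5]],
    intercepts=[0.0, 4.0],
)
left = predict([1.0, 3.0], **model)   # segment 0: 0.0 + 1.0*1.0 + 0.0*3.0
right = predict([3.0, 2.0], **model)  # segment 1: 4.0 - 1.0*3.0 + 0.5*2.0
```

With more break-points the lookup stays the same: each additional break-point adds one rule and one set of regression coefficients.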

    Piecewise Regression through the Akaike Information Criterion using Mathematical Programming

    In machine learning, regression analysis is a tool for predicting the output variables from a set of known independent variables. Through regression analysis, a function that captures the relationship between the variables is fitted to the data. Many methods from the literature tackle this problem with various degrees of difficulty. Some simple methods include linear regression and least squares, while others, such as support vector regression, are more complicated. Piecewise or segmented regression is a method of analysis that partitions the independent variables into intervals and fits a function to each interval. In this work, the Optimal Piecewise Linear Regression Analysis (OPLRA) model from the literature is used to tackle the problem of segmented analysis. This model is a mathematical programming approach, formulated as a mixed integer linear programming problem, that optimally partitions the data into multiple regions and calculates the regression coefficients while minimising the Mean Absolute Error of the fitting. However, the number of regions must be known a priori. For this work, an extension of the model is proposed that can optimally decide on the number of regions using information criteria. Specifically, the Akaike Information Criterion is used and the objective is to minimise its value. By using the criterion, the model no longer needs a heuristic approach to decide on the number of regions, and it also deals with the problem of overfitting and model complexity.
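
How an information criterion trades goodness of fit against the number of regions can be sketched in a few lines of Python. The sketch assumes the common Gaussian-error form AIC = n ln(RSS / n) + 2k, which may differ from the exact expression used in the paper, and it compares already-fitted candidates instead of embedding the criterion in the optimisation model. The function names and the candidate numbers are hypothetical.

```python
import math


def aic(n_samples, rss, n_params):
    """AIC under an assumed Gaussian error model: n * ln(RSS / n) + 2k."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def select_regions(n_samples, candidates):
    """Return the number of regions whose fit has the lowest AIC.

    candidates maps a number of regions to (rss, n_params) for that fit.
    """
    return min(candidates, key=lambda r: aic(n_samples, *candidates[r]))


# Hypothetical fits on 100 samples: (residual sum of squares, parameters).
candidates = {
    1: (50.0, 2),
    2: (20.0, 4),
    3: (19.5, 6),
}
best = select_regions(100, candidates)
```

Here the third region lowers the residual only slightly, so the 2k penalty outweighs the gain and the two-region fit is selected.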

    Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

    DNA copy number and mRNA expression are widely used data types in cancer studies, which combined provide more insight than separately. Whereas in the existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
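
A piecewise linear spline can be written as an ordinary linear model over a truncated-line basis, which the following Python sketch illustrates. It assumes a continuous spline with basis 1, x and max(0, x - knot) for each knot; the paper's constrained PLRS specification may use a different basis and additional constraints, and the knots and coefficients here are made up for illustration.

```python
def spline_basis(x, knots):
    """Basis row [1, x, (x - k1)+, (x - k2)+, ...] for a linear spline."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def spline_value(x, knots, coefs):
    """Evaluate the spline: dot product of the basis row and coefficients."""
    return sum(c * b for c, b in zip(coefs, spline_basis(x, knots)))


# Hypothetical knots and coefficients: slope 1 below 1.0, slope 0 between
# 1.0 and 2.0 (1 - 1), and slope 2 above 2.0 (1 - 1 + 2).
knots = [1.0, 2.0]
coefs = [0.0, 1.0, -1.0, 2.0]
row = spline_basis(2.5, knots)
value = spline_value(2.5, knots, coefs)
```

Each knot coefficient is the change in slope at that knot, so sign or equality constraints on these coefficients translate directly into constraints on the shape of the fitted relationship.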