16 research outputs found

    Tree regression models using statistical testing and mixed integer programming

    Regression analysis is a statistical procedure that fits a mathematical function to a set of data in order to capture the relationship between dependent and independent variables. In tree regression, tree structures are constructed by repeatedly splitting the input space into two subsets, creating if-then-else rules. Such models are popular in the literature because they can be computed quickly and are simple to interpret. This work introduces a tree regression algorithm that exploits the optimisation model of an existing method from the literature, the Mathematical Programming Tree (MPtree), to optimally split nodes into subsets, and applies a statistical test to assess the quality of each partitioning. Additionally, an approach that splits nodes using multivariate decision rules is explored and compared with univariate splitting in terms of predictive performance and computational efficiency. Finally, a novel mathematical model is introduced that performs subset selection at each node, choosing an optimal set of variables to consider for splitting, which improves the computational performance of the proposed algorithm.
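    The split-then-test loop described above can be pictured with the following minimal sketch. It is an illustrative stand-in, not the MPtree MIP formulation: the optimal split is found by exhaustive search over a single input variable, and the statistical test is approximated by a Welch t-statistic threshold of |t| > 2 (roughly the 5% level). All function names are hypothetical.

```python
import math

def best_split(xs, ys):
    """Exhaustively find the binary split of one variable minimising SSE
    (a stand-in for the MPtree optimisation model)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(xs)):
        left = [ys[order[i]] for i in range(k)]
        right = [ys[order[i]] for i in range(k, len(xs))]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[1]:
            thr = (xs[order[k - 1]] + xs[order[k]]) / 2   # midpoint threshold
            best = (thr, sse, left, right)
    return best

def welch_t(a, b):
    """Welch's t-statistic between two samples (no p-value computed here)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / max(len(a) - 1, 1)
    vb = sum((x - mb) ** 2 for x in b) / max(len(b) - 1, 1)
    return (ma - mb) / (math.sqrt(va / len(a) + vb / len(b)) or 1e-12)

def grow(xs, ys, min_leaf=2):
    """Recursively split while the children differ significantly (|t| > 2)."""
    if len(ys) < 2 * min_leaf:
        return {"leaf": sum(ys) / len(ys)}
    thr, _, left, right = best_split(xs, ys)
    if len(left) < min_leaf or len(right) < min_leaf or abs(welch_t(left, right)) < 2:
        return {"leaf": sum(ys) / len(ys)}          # split rejected: stop here
    lx, ly = zip(*[(x, y) for x, y in zip(xs, ys) if x <= thr])
    rx, ry = zip(*[(x, y) for x, y in zip(xs, ys) if x > thr])
    return {"split": thr,
            "l": grow(list(lx), list(ly), min_leaf),
            "r": grow(list(rx), list(ry), min_leaf)}
```

    In the actual method, the exhaustive search is replaced by a mixed integer programme and the acceptance rule by a proper hypothesis test.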

    Piecewise Regression through the Akaike Information Criterion using Mathematical Programming

    In machine learning, regression analysis is a tool for predicting an output variable from a set of known independent variables. Through regression analysis, a function that captures the relationship between the variables is fitted to the data. Many methods in the literature tackle this problem, with varying degrees of complexity. Some, such as linear regression and least squares, are simple; others, such as support vector regression, are more complicated. Piecewise or segmented regression is a method of analysis that partitions the independent variables into intervals and fits a function to each interval. In this work, the Optimal Piecewise Linear Regression Analysis (OPLRA) model from the literature is used to tackle the problem of segmented analysis. This model is a mathematical programming approach, formulated as a mixed integer linear programming problem, that optimally partitions the data into multiple regions and calculates the regression coefficients while minimising the Mean Absolute Error of the fitting. However, the number of regions must be known a priori. In this work, an extension of the model is proposed that can optimally decide on the number of regions using information criteria. Specifically, the Akaike Information Criterion is used and the objective is to minimise its value. By using the criterion, the model no longer needs a heuristic approach to decide on the number of regions, and it also addresses the problems of overfitting and model complexity.
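    The role of the information criterion can be pictured with the following sketch. It is not the proposed MILP: for each candidate number of regions it fits an equal-count piecewise linear model by ordinary least squares, then picks the region count minimising the Gaussian-likelihood form of the AIC, n ln(SSE/n) + 2k. All function names are hypothetical, and the segment boundaries here are fixed rather than optimised.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line through one segment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-12
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b                       # intercept, slope

def piecewise_aic(xs, ys, regions):
    """AIC of an equal-count piecewise linear fit with `regions` segments."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n, sse = len(xs), 0.0
    bounds = [round(r * n / regions) for r in range(regions + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        seg_x = [xs[order[i]] for i in range(lo, hi)]
        seg_y = [ys[order[i]] for i in range(lo, hi)]
        a, b = fit_line(seg_x, seg_y)
        sse += sum((y - (a + b * x)) ** 2 for x, y in zip(seg_x, seg_y))
    k = 2 * regions + (regions - 1)             # slopes + intercepts + breakpoints
    return n * math.log(sse / n + 1e-12) + 2 * k

def choose_regions(xs, ys, max_regions=4):
    """Pick the number of regions with the smallest AIC."""
    return min(range(1, max_regions + 1), key=lambda r: piecewise_aic(xs, ys, r))
```

    On V-shaped data, for example, two regions fit perfectly and the 2k penalty then rules out any further, superfluous regions.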

    Winterberg's conjectured breaking of the superluminal quantum correlations over large distances

    We elaborate further on a hypothesis by Winterberg that turbulent fluctuations of the zero point field may lead to a breakdown of the superluminal quantum correlations over very large distances. A phenomenological model that was proposed by Winterberg to estimate the transition scale of the conjectured breakdown does not lead to a distance large enough to be agreeable with recent experiments. We consider, but rule out, the possibility of a steeper slope in the energy spectrum of the turbulent fluctuations, due to compressibility, as a possible mechanism that may lead to an increased lower bound for the transition scale. Instead, we argue that Winterberg overestimated the intensity of the ZPF turbulent fluctuations. We calculate a very generous corrected lower bound for the transition distance which is consistent with current experiments. Comment: 7 pages, submitted to Int. J. Theor. Phys.

    Recent Developments in Understanding Two-dimensional Turbulence and the Nastrom-Gage Spectrum

    Two-dimensional turbulence appears to be a more formidable problem than three-dimensional turbulence, despite the numerical advantage of working with one less dimension. In the present paper we review recent numerical investigations of the phenomenology of two-dimensional turbulence as well as recent theoretical breakthroughs by various leading researchers. We also review efforts to reconcile the observed energy spectrum of the atmosphere (the Nastrom-Gage spectrum) with the predictions of two-dimensional turbulence and quasi-geostrophic turbulence. Comment: Invited review; accepted by J. Low Temp. Phys.; Proceedings for Warwick Turbulence Symposium Workshop on Universal features in turbulence: from quantum to cosmological scales, 200

    Piecewise regression analysis through information criteria using mathematical programming

    Regression is a predictive analysis tool that examines the relationship between independent and dependent variables. The goal of this analysis is to fit a mathematical function that describes how the value of the response changes when the values of the predictors vary. The simplest form of regression is linear regression, which, in the case of multiple regression, tries to explain the data by simply fitting a hyperplane minimising the absolute error of the fitting. Piecewise regression analysis partitions the data into multiple regions and fits a regression function to each one. Such an approach is the OPLRA (Optimal Piecewise Linear Regression Analysis) model (Yang, Liu, Tsoka, & Papageorgiou, 2016), a mathematical programming approach that optimally partitions the data into multiple regions and fits a linear regression function to each, minimising the Mean Absolute Error between prediction and truth. However, using many regions to describe the data can lead to overfitting and poor results. In this work an extension of the OPLRA model is proposed that deals with the problem of selecting the optimal number of regions as well as overfitting. To achieve this, information criteria such as the Akaike and the Bayesian criteria are used, which reward predictive accuracy and penalise model complexity.
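    The two criteria differ only in how they price model complexity. A minimal sketch of their Gaussian-likelihood forms, with n observations, k fitted parameters and residual sum of squares SSE (function names hypothetical):

```python
import math

def aic(n, k, sse):
    # Akaike Information Criterion: fixed penalty of 2 per parameter
    return n * math.log(sse / n) + 2 * k

def bic(n, k, sse):
    # Bayesian Information Criterion: penalty of ln(n) per parameter
    return n * math.log(sse / n) + k * math.log(n)
```

    Once n exceeds e^2 (about 7.4 samples), BIC's per-parameter penalty ln(n) is larger than AIC's 2, so BIC tends to select fewer regions than AIC on the same data.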

    A graph theory approach for scenario aggregation for stochastic optimisation

    The development of fast, robust and reliable computational tools capable of addressing process management under uncertain conditions is an active topic in the current literature, particularly in process systems engineering. Scenario reduction strategies have emerged as an alternative to overcome the traditional issues associated with large-scale scenario-based problems. This work proposes a novel and flexible scenario-reduction alternative that integrates data mining, graph theory and community detection concepts to represent the uncertain information as a network and identify the most efficient communities/clusters. The capabilities of the proposed approach were tested by solving a set of two-stage mixed-integer linear programming problems under uncertainty. For comparison and validation purposes, these problems were also solved using two available methods (SCENRED and OSCAR). This comparison demonstrates that the results obtained by using the proposed approach are, in terms of quality and accuracy, at least as good as or better than those obtained by using SCENRED and OSCAR. Additionally, the practical advantage of the proposed parameter definition rule is demonstrated as a way to overcome the limitations of the current alternatives (i.e. arbitrary user-defined parameters).
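    The general shape of scenario aggregation, clustering similar scenarios and carrying their probability mass onto one representative each, can be pictured with the following sketch. It is not the proposed community-detection method: a greedy single-linkage merge stands in for detecting communities in the scenario network, and all names are hypothetical.

```python
import math

def reduce_scenarios(scenarios, probs, n_clusters):
    """Merge the closest scenario clusters (single linkage) until n_clusters
    remain, then keep one representative per cluster with the summed
    probability of its members."""
    clusters = [[i] for i in range(len(scenarios))]
    dist = lambda i, j: math.dist(scenarios[i], scenarios[j])
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)          # merge the closest pair
    reduced = []
    for c in clusters:
        rep = max(c, key=lambda i: probs[i])    # keep the likeliest scenario
        reduced.append((scenarios[rep], sum(probs[i] for i in c)))
    return reduced
```

    The reduced set preserves total probability, so a two-stage stochastic programme solved over it remains a valid (if coarser) approximation of the full scenario tree.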