16 research outputs found
Tree regression models using statistical testing and mixed integer programming
Regression analysis is a statistical procedure that fits a mathematical function to a set of data in order to capture the relationship between dependent and independent variables. In tree regression, tree structures are constructed by repeated splits of the input space into two subsets, creating if-then-else rules. Such models are popular in the literature because they can be computed quickly and interpreted easily. This work introduces a tree regression algorithm that exploits the optimisation model of an existing method from the literature, the Mathematical Programming Tree (MPtree), to optimally split nodes into subsets, and applies a statistical test to assess the quality of each partition. Additionally, an approach that splits nodes using multivariate decision rules is explored and compared in terms of predictive performance and computational efficiency. Finally, a novel mathematical model is introduced that performs subset selection at each node, choosing an optimal set of variables to be considered for splitting and thereby improving the computational performance of the proposed algorithm.
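The core idea of optimally splitting a node can be illustrated without the MILP machinery. The sketch below is a toy illustration, not the MPtree model itself: it enumerates candidate thresholds for a single variable and picks the split minimising the sum of absolute errors (SAE) of a constant fit in each child, mirroring the L1 objective used in the abstract. All function names and the data are illustrative.

```python
# Toy sketch of optimal univariate node splitting under an L1 objective.
# Not the MPtree MILP: a brute-force enumeration that finds the same
# optimal threshold for a single candidate variable.

def sae(ys):
    """Sum of absolute errors around the median (the optimal constant under L1)."""
    ys = sorted(ys)
    med = ys[len(ys) // 2]
    return sum(abs(y - med) for y in ys)

def best_split(xs, ys):
    """Return (threshold, child SAE) of the best binary split on one variable."""
    pairs = sorted(zip(xs, ys))
    best_thr, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint between points
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        cost = sae(left) + sae(right)
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr, best_cost

# Two clearly separated clusters: the split should land between x=3 and x=10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 20]
thr, cost = best_split(xs, ys)  # thr = 6.5
```

In the paper the split quality is then assessed with a statistical test before the node is accepted; here one would compare `cost` against the parent node's SAE to decide whether the reduction is significant.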
Piecewise Regression through the Akaike Information Criterion using Mathematical Programming
In machine learning, regression analysis is a tool for predicting the output variables from a set of known independent variables. Through regression analysis, a function that captures the relationship between the variables is fitted to the data. Many methods in the literature tackle this problem, with varying degrees of complexity. Some simple methods include linear regression and least squares, while others are more complicated, such as support vector regression. Piecewise or segmented regression is a method of analysis that partitions the independent variables into intervals and fits a separate function to each interval. In this work, the Optimal Piecewise Linear Regression Analysis (OPLRA) model from the literature is used to tackle the problem of segmented analysis. This model is a mathematical programming approach, formulated as a mixed integer linear programming problem, that optimally partitions the data into multiple regions and calculates the regression coefficients, while minimising the Mean Absolute Error of the fitting. However, the number of regions must be known a priori. For this work, an extension of the model is proposed that can optimally decide on the number of regions using information criteria. Specifically, the Akaike Information Criterion is used and the objective is to minimise its value. By using the criterion, the model no longer needs a heuristic approach to decide on the number of regions, and it also deals with the problem of overfitting and model complexity.
Winterberg's conjectured breaking of the superluminal quantum correlations over large distances
We elaborate further on a hypothesis by Winterberg that turbulent fluctuations of the zero point field (ZPF) may lead to a breakdown of the superluminal quantum correlations over very large distances. A phenomenological model that was proposed by Winterberg to estimate the transition scale of the conjectured breakdown does not lead to a distance that is large enough to be agreeable with recent experiments. We consider, but rule out, the possibility of a steeper slope in the energy spectrum of the turbulent fluctuations, due to compressibility, as a possible mechanism that may lead to an increased lower bound for the transition scale. Instead, we argue that Winterberg overestimated the intensity of the ZPF turbulent fluctuations. We calculate a very generous corrected lower bound for the transition distance which is consistent with current experiments.
Comment: 7 pages, submitted to Int. J. Theor. Phy
Recent Developments in Understanding Two-dimensional Turbulence and the Nastrom-Gage Spectrum
Two-dimensional turbulence appears to be a more formidable problem than three-dimensional turbulence, despite the numerical advantage of working with one less dimension. In the present paper we review recent numerical investigations of the phenomenology of two-dimensional turbulence as well as recent theoretical breakthroughs by various leading researchers. We also review efforts to reconcile the observed energy spectrum of the atmosphere (the Nastrom-Gage spectrum) with the predictions of two-dimensional turbulence and quasi-geostrophic turbulence.
Comment: Invited review; accepted by J. Low Temp. Phys.; Proceedings for Warwick Turbulence Symposium Workshop on Universal features in turbulence: from quantum to cosmological scales, 200
Piecewise regression analysis through information criteria using mathematical programming
Regression is a predictive analysis tool that examines the relationship between independent and dependent variables. The goal of this analysis is to fit a mathematical function that describes how the value of
the response changes when the values of the predictors vary. The simplest form of regression is linear regression which, in the case of multiple regression, tries to explain the data by simply fitting a hyperplane
minimising the absolute error of the fitting. Piecewise regression analysis partitions the data into multiple regions and fits a regression function to each one. One such approach is the OPLRA (Optimal Piecewise Linear Regression Analysis) model (Yang, Liu, Tsoka, & Papage, 2016), a mathematical programming approach that optimally partitions the data into multiple regions and fits a linear regression function to each, minimising the Mean Absolute Error between prediction and truth. However, using many regions to describe the data can lead to overfitting and poor results. In this work an extension of the OPLRA model is proposed that deals with the problem of selecting the optimal number of regions as well as overfitting. To achieve this, information criteria such as the Akaike and the Bayesian are used, which reward predictive accuracy and penalise model complexity.
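The difference between the two criteria mentioned above comes down to the complexity penalty. The snippet below is a hedged illustration using the standard Gaussian-error forms, AIC = n ln(SSE/n) + 2k and BIC = n ln(SSE/n) + k ln(n); the SSE values and parameter count are invented for the example, not taken from the paper.

```python
# Hedged sketch: AIC vs BIC trade-offs for a piecewise model with R regions.
# k counts free parameters: here R*(p+1) regression coefficients plus R-1
# breakpoints, with p = 1 predictor. SSE values are illustrative only.
import math

def aic(n, sse, k):
    return n * math.log(sse / n) + 2 * k

def bic(n, sse, k):
    return n * math.log(sse / n) + k * math.log(n)

n = 100
for regions, sse in [(1, 400.0), (2, 120.0), (4, 100.0)]:
    k = 2 * regions + (regions - 1)  # coefficients + breakpoints
    print(regions, round(aic(n, sse, k), 1), round(bic(n, sse, k), 1))
```

With these illustrative numbers AIC favours the 4-region fit while BIC, whose penalty k ln(n) exceeds 2k whenever n ≥ 8, favours 2 regions, which is exactly why the criteria guard against over-partitioning in different degrees.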
A graph theory approach for scenario aggregation for stochastic optimisation
The development of fast, robust and reliable computational tools capable of addressing process management under uncertain conditions is an active topic in the current literature, particularly in process systems engineering. Scenario reduction strategies have emerged as an alternative to overcome the traditional issues associated with large-scale scenario-based problems. This work proposes a novel and flexible scenario-reduction alternative that integrates data mining, graph theory and community detection concepts to represent the uncertain information as a network and identify the most efficient communities/clusters. The capabilities of the proposed approach were tested by solving a set of two-stage mixed-integer linear programming problems under uncertainty. For comparison and validation purposes, these problems were also solved using two available methods (SCENRED and OSCAR). This comparison demonstrates that the results obtained by using the proposed approach are at least as good as or better than, in terms of quality and accuracy, the results obtained by using SCENRED and OSCAR. Additionally, the practical advantage of the proposed parameter definition rule is demonstrated as a way to overcome the limitations of the current alternatives (i.e. arbitrary user-defined parameters).
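The graph-based aggregation idea can be sketched simply. The example below is an illustration of the general approach, not the paper's exact algorithm: scenarios become nodes, edges connect scenarios whose distance falls below a threshold, and each connected component is collapsed into one representative scenario carrying the component's total probability. A community-detection method would refine the clustering, but components already convey the idea; all names, data, and the threshold are illustrative.

```python
# Hedged sketch of scenario aggregation via a similarity graph.
# Connected components stand in for the paper's community detection step.

def components(n_nodes, edges):
    """Connected components via union-find with path compression."""
    parent = list(range(n_nodes))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n_nodes):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def aggregate(scenarios, probs, threshold):
    """Cluster similar scenarios; keep the most probable one per cluster,
    assigning it the cluster's aggregated probability."""
    n = len(scenarios)
    dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))  # L1 distance
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist(scenarios[i], scenarios[j]) <= threshold]
    reduced = []
    for comp in components(n, edges):
        rep = max(comp, key=lambda i: probs[i])
        reduced.append((scenarios[rep], sum(probs[i] for i in comp)))
    return reduced

# Four scenarios forming two natural clusters.
scenarios = [(10.0, 1.0), (10.5, 1.1), (30.0, 5.0), (29.5, 5.2)]
probs = [0.3, 0.2, 0.25, 0.25]
reduced = aggregate(scenarios, probs, threshold=2.0)  # two representatives
```

The reduced set keeps the total probability mass at 1, so the downstream two-stage stochastic program can be solved over far fewer scenarios without re-normalisation.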