A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation
Subset selection for multiple linear regression aims to construct a
regression model that minimizes errors by selecting a small number of
explanatory variables. Once a model is built, various statistical tests and
diagnostics are conducted to validate the model and to determine whether the
regression assumptions are met. Most traditional approaches require human
decisions at this step; for example, the user may add or remove variables
until a satisfactory model is obtained. However, this trial-and-error strategy
cannot guarantee that a subset that minimizes the errors while satisfying all
regression assumptions will be found. In this paper, we propose a fully
automated model building procedure for multiple linear regression subset
selection that integrates model building and validation based on mathematical
programming. The proposed model minimizes mean squared errors while ensuring
that the majority of the important regression assumptions are met. We also
propose an efficient constraint to approximate the constraint for the
coefficient t-test. When no subset satisfies all of the considered regression
assumptions, our model provides an alternative subset that satisfies most of
these assumptions. Computational results show that our model yields better
solutions (i.e., satisfying more regression assumptions) compared to the
state-of-the-art benchmark models while maintaining similar explanatory power.
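The integrated selection-and-validation idea above can be illustrated with a small sketch: enumerate candidate subsets, fit each by ordinary least squares, and accept only subsets whose coefficients pass an approximate t-test, falling back to the best unrestricted subset when none qualifies. This is a toy stand-in for the paper's mathematical program, not the authors' formulation; the threshold `t_crit = 2.0` is an assumed approximation of the t-test critical value.

```python
import itertools
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares: returns coefficients, residuals, and t-statistics."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)                      # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient std errors
    return beta, resid, beta / se

def select_subset(X, y, max_vars=2, t_crit=2.0):
    """Among subsets of up to max_vars columns, return the minimum-MSE subset
    whose coefficients all pass an approximate t-test (|t| >= t_crit).
    Falls back to the overall minimum-MSE subset if no subset passes,
    mirroring the paper's 'alternative subset' behavior."""
    best_valid, best_any = None, None
    for k in range(1, max_vars + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            beta, resid, t = ols_fit(X[:, cols], y)
            mse = resid @ resid / len(y)
            if best_any is None or mse < best_any[0]:
                best_any = (mse, cols)
            if np.all(np.abs(t) >= t_crit):
                if best_valid is None or mse < best_valid[0]:
                    best_valid = (mse, cols)
    return best_valid if best_valid is not None else best_any

# Synthetic data: only the first two of five variables matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
mse, cols = select_subset(X, y)
print(cols)  # recovers the two truly informative variables: (0, 1)
```

Exhaustive enumeration is only feasible for small problems; the mathematical-programming formulation in the paper is what makes the search tractable at scale.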
Towards an Iterative Algorithm for the Optimal Boundary Coverage of a 3D Environment
This paper presents a new optimal algorithm for locating a set of sensors in 3D that can see the boundaries of a polyhedral environment. Our approach is iterative and is based on a lower bound on the number of sensors and on a restriction of the original problem requiring each face to be observed in its entirety by at least one sensor. The lower bound allows evaluating the quality of the solution obtained at each step, and halting the algorithm if the solution is satisfactory. The algorithm asymptotically converges to the optimal solution of the unrestricted problem if the faces are subdivided into smaller parts.
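The restricted problem described above (each face fully visible from at least one chosen sensor) is a set-cover instance, and the lower-bound test can be sketched with a greedy cover. This is a simplified illustration under assumed inputs, not the paper's algorithm: the visibility data is hypothetical, and the bound used here is only the trivial one (faces divided by the largest single-sensor coverage).

```python
import math

def greedy_cover(candidates, faces):
    """Greedy set cover: repeatedly pick the sensor that sees the most
    still-uncovered faces, until every face is covered."""
    uncovered = set(faces)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("some faces are not visible from any candidate")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

def solve_with_bound(candidates, faces):
    """Solve the restricted problem and report a trivial lower bound on the
    optimum: if the cover size matches the bound, the solution is provably
    optimal and the iteration can halt."""
    lower = math.ceil(len(faces) / max(len(v) for v in candidates.values()))
    cover = greedy_cover(candidates, faces)
    return cover, lower

# Hypothetical visibility data: sensor position -> faces it sees in full.
candidates = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}}
cover, lower = solve_with_bound(candidates, {1, 2, 3, 4, 5, 6})
print(len(cover), lower)  # 2 2 -> the greedy cover meets the lower bound
```

When the cover size exceeds the bound, the paper's scheme subdivides faces into smaller parts and repeats, which is what drives the asymptotic convergence to the unrestricted optimum.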
Positive Semidefinite Metric Learning Using Boosting-like Algorithms
The success of many machine learning and pattern recognition methods relies
heavily upon the identification of an appropriate distance metric on the input
data. It is often beneficial to learn such a metric from the input training
data, instead of using a default one such as the Euclidean distance. In this
work, we propose a boosting-based technique, termed BoostMetric, for learning a
quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance
metric requires enforcing the constraint that the matrix parameter to the
metric remains positive definite. Semidefinite programming is often used to
enforce this constraint, but scales poorly and is not easy to implement.
BoostMetric is instead based on the observation that any positive semidefinite
matrix can be decomposed into a linear combination of trace-one rank-one
matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak
learners within an efficient and scalable boosting-based learning process. The
resulting methods are easy to implement, efficient, and can accommodate various
types of constraints. We extend traditional boosting algorithms in that the
weak learner is a positive semidefinite matrix with trace one and rank one
rather than a classifier or regressor. Experiments on various datasets
demonstrate that the proposed algorithms compare favorably to
state-of-the-art methods in terms of classification accuracy and running time.
Comment: 30 pages, appearing in the Journal of Machine Learning Research