Best Subset Selection via a Modern Optimization Lens
In the last twenty-five years (1990-2014), algorithmic advances in integer
optimization combined with hardware improvements have resulted in an
astonishing 200 billion factor speedup in solving Mixed Integer Optimization
(MIO) problems. We present a MIO approach for solving the classical best subset
selection problem of choosing $k$ out of $p$ features in linear regression
given $n$ observations. We develop a discrete extension of modern first-order
continuous optimization methods to find high quality feasible solutions that we
use as warm starts to a MIO solver that finds provably optimal solutions. The
resulting algorithm (a) provides a solution with a guarantee on its
suboptimality even if we terminate the algorithm early, (b) can accommodate
side constraints on the coefficients of the linear regression and (c) extends
to finding best subset solutions for the least absolute deviation loss
function. Using a wide variety of synthetic and real datasets, we demonstrate
that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in
minutes to provable optimality, and finds near optimal solutions for $n$ in the
100s and $p$ in the 1000s in minutes. We also establish via numerical
experiments that the MIO approach performs better than Lasso and
other popularly used sparse learning procedures, in terms of achieving sparse
solutions with good predictive power.
Comment: This is a revised version (May, 2015) of the first submission in June
201
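The abstract above mentions a discrete extension of first-order methods that produces feasible solutions to warm-start the MIO solver. One common way to realize that idea is projected gradient descent with hard thresholding; the sketch below illustrates it under stated assumptions and is not the authors' algorithm. The names `hard_threshold`, `gradient`, and `iht`, the fixed step size, and the toy data are all hypothetical.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def gradient(X, y, beta):
    """Gradient of 0.5 * ||y - X beta||^2 with respect to beta."""
    n, p = len(X), len(beta)
    resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
    return [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]


def iht(X, y, k, step, iters=500):
    """Gradient step followed by projection onto vectors with at most k nonzeros.

    Assumption: step is at most 1 / L, where L is the largest eigenvalue
    of X^T X; the caller is responsible for choosing it.
    """
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(X, y, beta)
        beta = hard_threshold([b - step * gj for b, gj in zip(beta, g)], k)
    return beta


# Toy design with orthonormal columns, so L = 1 and step = 1.0 is valid.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
y = [3.0, 0.1, -2.0, 0.0]
beta_warm = iht(X, y, k=2, step=1.0, iters=50)
```

The result is a k-sparse coefficient vector that could serve as a starting point for an exact solver. It carries no optimality certificate on its own.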
Data-driven Algorithms for Dimension Reduction in Causal Inference
In observational studies, the causal effect of a treatment may be confounded
with variables that are related to both the treatment and the outcome of
interest. In order to identify a causal effect, such studies often rely on the
unconfoundedness assumption, i.e., that all confounding variables are observed.
The choice of covariates to control for, which is primarily based on subject
matter knowledge, may result in a large covariate vector in the attempt to
ensure that unconfoundedness holds. However, including redundant covariates can
affect bias and efficiency of nonparametric causal effect estimators, e.g., due
to the curse of dimensionality. Data-driven algorithms for the selection of
sufficient covariate subsets are investigated. Under the assumption of
unconfoundedness the algorithms search for minimal subsets of the covariate
vector. Based, e.g., on the framework of sufficient dimension reduction or
kernel smoothing, the algorithms perform a backward elimination procedure
assessing the significance of each covariate. Their performance is evaluated in
simulations and an application using data from the Swedish Childhood Diabetes
Register is also presented.
Comment: 27 pages, 2 figures, 11 tables
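The backward elimination described in the abstract above can be sketched generically as follows. The `is_significant` callback is a hypothetical stand-in for whatever test the algorithms actually use (for example, one based on sufficient dimension reduction or kernel smoothing); the abstract does not specify its form, so this shows only the control flow.

```python
def backward_eliminate(covariates, is_significant):
    """Repeatedly drop one covariate judged non-significant given the others.

    is_significant(c, rest) is a user-supplied (hypothetical) test that
    returns True when covariate c should be kept given the remaining ones.
    """
    selected = list(covariates)
    changed = True
    while changed:
        changed = False
        for c in list(selected):
            rest = [v for v in selected if v != c]
            if not is_significant(c, rest):
                selected = rest   # drop c and restart the scan
                changed = True
                break
    return selected


# Toy test (assumption): only "age" and "bmi" matter, whatever else is included.
relevant = {"age", "bmi"}
subset = backward_eliminate(
    ["age", "noise1", "bmi", "noise2"],
    lambda c, rest: c in relevant,
)
```

Restarting the scan after each removal lets the significance of the remaining covariates be reassessed against the reduced set.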
Considerate Approaches to Achieving Sufficiency for ABC model selection
For nearly any challenging scientific problem, evaluation of the likelihood is
problematic if not impossible. Approximate Bayesian computation (ABC) allows us
to apply the whole Bayesian formalism to problems where we can use simulations
from a model, but cannot evaluate the likelihood directly. When summary
statistics of real and simulated data are compared, rather than the data
directly, information is lost unless the summary statistics are sufficient.
Here we employ an information-theoretical framework that can be used to
construct (approximately) sufficient statistics by combining different
statistics until the loss of information is minimized. Such sufficient sets of
statistics are constructed for both parameter estimation and model selection
problems. We apply our approach to a range of illustrative and real-world model
selection problems.
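To make the role of summary statistics concrete, here is a minimal ABC rejection sampler. It is a generic textbook sketch, not the information-theoretic construction the abstract describes. The toy model (normal data with known unit variance, a uniform prior, and the sample mean as summary) and every function name are assumptions chosen for illustration; in this toy model the sample mean happens to be sufficient for the mean.

```python
import random
import statistics


def abc_rejection(observed_summary, simulate, summary, prior_sample,
                  eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s = summary(simulate(theta, rng))
        if abs(s - observed_summary) <= eps:
            accepted.append(theta)
    return accepted


def simulate_normal(theta, rng, n=20):
    """Toy simulator (assumption): n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def prior_uniform(rng):
    """Toy prior (assumption): theta uniform on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


rng = random.Random(0)
posterior = abc_rejection(
    observed_summary=2.0,          # pretend the observed sample mean was 2.0
    simulate=simulate_normal,
    summary=statistics.mean,
    prior_sample=prior_uniform,
    eps=0.5,
    n_draws=2000,
    rng=rng,
)
```

The accepted values approximate the posterior given the summary. If the summary were not sufficient, that posterior would differ from the posterior given the full data, which is the information loss the abstract refers to.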
Chromatic number of the product of graphs, graph homomorphisms, Antichains and cofinal subsets of posets without AC
We have observations concerning the set theoretic strength of the following
combinatorial statements without the axiom of choice. 1. If in a partially
ordered set, all chains are finite and all antichains are countable, then the
set is countable. 2. If in a partially ordered set, all chains are finite and
all antichains have size $\aleph_{\alpha}$, then the set has size $\aleph_{\alpha}$
for any regular $\aleph_{\alpha}$. 3. CS (Every partially
ordered set without a maximal element has two disjoint cofinal subsets). 4. CWF
(Every partially ordered set has a cofinal well-founded subset). 5. DT
(Dilworth's decomposition theorem for infinite p.o.sets of finite width). 6. If
the chromatic number of a graph $G_1$ is finite (say $k<\omega$), and the
chromatic number of another graph $G_2$ is infinite, then the chromatic
number of $G_1 \times G_2$ is $k$. 7. For an infinite graph $G$ and a finite
graph $H$, if every finite subgraph of $G$ has a homomorphism into $H$, then
so has $G$. Further we study a few statements
restricted to linearly-ordered structures without the axiom of choice.
Comment: Revised version