
Best Subset Selection via a Modern Optimization Lens

Author: Bertsimas Dimitris
King Angela
Mazumder Rahul
Publication date: 11/07/2015

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

Comment: This is a revised version (May, 2015) of the first submission in June 201
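To make the problem statement concrete, the following is a minimal sketch of best subset selection by exhaustive enumeration: among all subsets of $k$ out of $p$ features, pick the one whose least-squares fit has the smallest residual sum of squares. This is not the MIO formulation or the first-order warm-start method the abstract describes; it is a brute-force baseline that is only feasible for small $p$. The function name `best_subset` and the synthetic data are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Brute-force best subset selection (illustrative, not the paper's MIO method).

    Fits ordinary least squares on every size-k subset of columns of X and
    returns (support, coefficients, rss) for the subset with the smallest
    residual sum of squares.
    """
    n, p = X.shape
    best_support, best_beta, best_rss = None, None, np.inf
    for support in combinations(range(p), k):
        cols = list(support)
        # Least-squares fit restricted to the chosen columns.
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
        if rss < best_rss:
            best_support, best_beta, best_rss = support, beta, rss
    return best_support, best_beta, best_rss


# Hypothetical noiseless data: only features 1, 4 and 6 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
true_support = (1, 4, 6)
y = X[:, list(true_support)] @ np.array([2.0, -3.0, 1.5])

support, beta, rss = best_subset(X, y, k=3)
```

The loop visits every one of the $\binom{p}{k}$ subsets, so its cost grows combinatorially in $p$; that growth is the reason exact solvers and good warm starts matter at the problem sizes quoted in the abstract.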

arXiv.org e-Print Archive

Data-driven Algorithms for Dimension Reduction in Causal Inference

Author: de Luna Xavier
Häggström Jenny
Persson Emma
Waernbaum Ingeborg
Publication venue: 'Elsevier BV'
Publication date: 31/08/2016

In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure assessing the significance of each covariate. Their performance is evaluated in simulations and an application using data from the Swedish Childhood Diabetes Register is also presented.

Comment: 27 pages, 2 figures, 11 table

arXiv.org e-Print Archive

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Considerate Approaches to Achieving Sufficiency for ABC model selection

Author: Chris Barnes
Michael Stumpf
Sarah Filippi
Tom Thorne
Publication date: 01/01/2011

For nearly any challenging scientific problem, evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to apply the whole Bayesian formalism to problems where we can simulate from a model but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient. Here we employ an information-theoretical framework that can be used to construct (approximately) sufficient statistics by combining different statistics until the loss of information is minimized. Such sufficient sets of statistics are constructed for both parameter estimation and model selection problems. We apply our approach to a range of illustrative and real-world model selection problems.
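As an illustration of the general ABC idea mentioned above, here is a minimal sketch of a rejection sampler: draw a parameter from the prior, simulate data, and keep the draw when the simulated summary statistic lies within a tolerance of the observed one. It does not implement the information-theoretic construction of summary statistics that the abstract proposes. The function `abc_rejection`, the Gaussian toy model, the uniform prior, and the tolerance are all assumptions chosen for illustration; the sample mean is used as the summary because it is sufficient for the mean of a Gaussian with known variance.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulate, summary, eps, n_draws, rng):
    """Basic ABC rejection sampler (illustrative sketch).

    Keeps a prior draw theta whenever the summary of data simulated under
    theta is within eps of the summary of the observed data.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),  # assumed flat prior
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,  # sample mean as the summary statistic
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior)
```

The accepted draws approximate the posterior only as well as the summary statistic retains information about the parameter, which is the loss the abstract aims to control.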

arXiv.org e-Print Archive

Chromatic number of the product of graphs, graph homomorphisms, Antichains and cofinal subsets of posets without AC

Author: Banerjee Amitayu
Gyenis Zalán
Publication date: 25/08/2020

We have observations concerning the set-theoretic strength of the following combinatorial statements without the axiom of choice. 1. If in a partially ordered set, all chains are finite and all antichains are countable, then the set is countable. 2. If in a partially ordered set, all chains are finite and all antichains have size $\aleph_{\alpha}$, then the set has size $\aleph_{\alpha}$ for any regular $\aleph_{\alpha}$. 3. CS (Every partially ordered set without a maximal element has two disjoint cofinal subsets). 4. CWF (Every partially ordered set has a cofinal well-founded subset). 5. DT (Dilworth's decomposition theorem for infinite p.o.sets of finite width). 6. If the chromatic number of a graph $G_{1}$ is finite (say $k<\omega$), and the chromatic number of another graph $G_{2}$ is infinite, then the chromatic number of $G_{1}\times G_{2}$ is $k$. 7. For an infinite graph $G=(V_{G}, E_{G})$ and a finite graph $H=(V_{H}, E_{H})$, if every finite subgraph of $G$ has a homomorphism into $H$, then so has $G$. Further, we study a few statements restricted to linearly-ordered structures without the axiom of choice.

Comment: Revised versio

arXiv.org e-Print Archive