
    Best Subset Selection via a Modern Optimization Lens

    In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200-billion-fold speedup in solving Mixed Integer Optimization (MIO) problems. We present an MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high-quality feasible solutions, which we use as warm starts for an MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if the algorithm is terminated early, (b) can accommodate side constraints on the coefficients of the linear regression, and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s to provable optimality in minutes, and finds near-optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popular sparse learning procedures in terms of achieving sparse solutions with good predictive power. Comment: This is a revised version (May 2015) of the first submission in June 201
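    The sketch below illustrates the kind of discrete first-order step described above. It is written in pure Python and is not the authors' code. Each iteration takes a gradient step on the least-squares loss 0.5 * ||y - X beta||^2 and then keeps only the k largest-magnitude coefficients (iterative hard thresholding). The function names, the squared-Frobenius-norm step size, and the fixed iteration count are assumptions made for illustration. The output would serve only as a warm start; the MIO solve that certifies optimality is not shown.

```python
def residual_ss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2, with X given as a list of rows."""
    return sum(
        (target - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, target in zip(X, y)
    )


def hard_threshold(v, k):
    """Projection onto {beta : ||beta||_0 <= k}: keep the k largest-magnitude entries."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht_best_subset(X, y, k, iters=500):
    """Projected gradient descent on 0.5 * ||y - X beta||^2 subject to ||beta||_0 <= k."""
    n, p = len(X), len(X[0])
    # The squared Frobenius norm of X upper-bounds the largest eigenvalue of X^T X
    # (the Lipschitz constant of the gradient), so 1/L is a safe, conservative step.
    L = sum(xij ** 2 for row in X for xij in row)
    beta = [0.0] * p
    for _ in range(iters):
        resid = [t - sum(xij * bj for xij, bj in zip(row, beta)) for row, t in zip(X, y)]
        # Gradient of 0.5 * ||y - X beta||^2 is -X^T (y - X beta).
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([b - g / L for b, g in zip(beta, grad)], k)
    return beta
```

    Take an orthonormal design such as the 4x4 identity, with y = [3, 0, -2, 0.5] and k = 2. The iterates keep coordinates 0 and 2 and converge toward [3, 0, -2, 0]. On general designs, hard thresholding only guarantees a good feasible point, not a global optimum, which is why a solver is still needed to close the optimality gap.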

    Data-driven Algorithms for Dimension Reduction in Causal Inference

    In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. To identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject-matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect the bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for selecting sufficient covariate subsets are investigated. Under the assumption of unconfoundedness, the algorithms search for minimal subsets of the covariate vector. Based on, e.g., the framework of sufficient dimension reduction or on kernel smoothing, the algorithms perform a backward elimination procedure that assesses the significance of each covariate. Their performance is evaluated in simulations, and an application using data from the Swedish Childhood Diabetes Register is also presented. Comment: 27 pages, 2 figures, 11 tables
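    The backward elimination loop can be sketched generically as follows. The test that decides whether a covariate is still needed is left to a hypothetical, user-supplied `p_value(c, current)` callback. The sufficient-dimension-reduction and kernel-smoothing tests from the paper are not implemented here, and the names and the `alpha = 0.05` default are illustrative assumptions.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def backward_eliminate(
    covariates: Iterable[Hashable],
    p_value: Callable[[Hashable, Tuple[Hashable, ...]], float],
    alpha: float = 0.05,
) -> List[Hashable]:
    """Generic backward elimination over a covariate set.

    p_value(c, current) is a hypothetical, user-supplied test returning the p-value
    for "c is not needed given the other covariates in current"; an SDR- or
    kernel-based test would plug in here.
    """
    current = list(covariates)
    while current:
        # Re-test every remaining covariate: removing one can change the others' p-values.
        scores = {c: p_value(c, tuple(current)) for c in current}
        least_significant = max(current, key=lambda c: scores[c])
        if scores[least_significant] <= alpha:
            break  # every remaining covariate is significant at level alpha
        current.remove(least_significant)
    return current
```

    Recomputing all p-values after each removal matters when covariates act as proxies for one another. Suppose x2 and x3 carry the same confounding information. Each then looks redundant while the other is present. Removing one, and then re-testing, keeps the other from also being dropped.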

    Considerate Approaches to Achieving Sufficiency for ABC model selection

    For nearly any challenging scientific problem, evaluating the likelihood is problematic, if not impossible. Approximate Bayesian computation (ABC) allows us to employ the full Bayesian formalism for problems where we can simulate from a model but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data themselves, information is lost unless the summary statistics are sufficient. Here we employ an information-theoretic framework that can be used to construct (approximately) sufficient statistics by combining different statistics until the loss of information is minimized. Such sets of statistics are constructed for both parameter estimation and model selection problems. We apply our approach to a range of illustrative and real-world model selection problems.
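    The following rejection-ABC sketch makes the model selection setting concrete. It chooses between two hypothetical models for count data (Poisson vs. geometric) by comparing summary statistics. It is not the information-theoretic construction described in the abstract. The models, priors, tolerance `eps`, and function names are all illustrative assumptions. The switch `use_variance` shows how the choice of summaries matters: both models can reproduce a given sample mean, but only the Poisson model ties the variance to the mean.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_geometric(q, rng):
    """Number of failures before the first success, with success probability q."""
    k = 0
    while rng.random() >= q:
        k += 1
    return k


def simulate(model, n, rng):
    """Draw a parameter from its (assumed) prior, then n observations from the model."""
    if model == "poisson":
        lam = rng.uniform(0.5, 10.0)
        return [sample_poisson(lam, rng) for _ in range(n)]
    q = rng.uniform(0.1, 0.9)
    return [sample_geometric(q, rng) for _ in range(n)]


def summaries(data, use_variance):
    """Summary statistics: the sample mean, optionally plus the sample variance."""
    stats = [statistics.mean(data)]
    if use_variance:
        stats.append(statistics.variance(data))
    return stats


def abc_model_choice(observed, n_draws=5000, eps=2.0, use_variance=True, seed=0):
    """Rejection ABC: the share of accepted draws per model approximates
    P(model | summaries are within eps of the observed ones)."""
    rng = random.Random(seed)
    s_obs = summaries(observed, use_variance)
    accepted = {"poisson": 0, "geometric": 0}
    for _ in range(n_draws):
        model = rng.choice(["poisson", "geometric"])  # uniform prior over models
        s_sim = summaries(simulate(model, len(observed), rng), use_variance)
        if math.dist(s_sim, s_obs) <= eps:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no draws accepted; increase eps or n_draws")
    return {m: c / total for m, c in accepted.items()}
```

    Running `abc_model_choice` on Poisson-generated data with `use_variance=False` and then with `use_variance=True` contrasts an insufficient summary with a more informative one. With the mean alone, both models can match the data. Adding the variance lets the accepted draws discriminate between them.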

    Chromatic number of the product of graphs, graph homomorphisms, Antichains and cofinal subsets of posets without AC

    We make observations concerning the set-theoretic strength of the following combinatorial statements without the axiom of choice.
    1. If in a partially ordered set all chains are finite and all antichains are countable, then the set is countable.
    2. For any regular ℵ_α: if in a partially ordered set all chains are finite and all antichains have size ℵ_α, then the set has size ℵ_α.
    3. CS (Every partially ordered set without a maximal element has two disjoint cofinal subsets).
    4. CWF (Every partially ordered set has a cofinal well-founded subset).
    5. DT (Dilworth's decomposition theorem for infinite posets of finite width).
    6. If the chromatic number of a graph G_1 is finite (say k < ω) and the chromatic number of another graph G_2 is infinite, then the chromatic number of G_1 × G_2 is k.
    7. For an infinite graph G = (V_G, E_G) and a finite graph H = (V_H, E_H), if every finite subgraph of G has a homomorphism into H, then so does G.
    Further, we study a few statements restricted to linearly ordered structures without the axiom of choice. Comment: Revised version
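    The sketch below illustrates statement 6 for finite graphs only. It assumes that × denotes the categorical (tensor) product, as in Hedetniemi's conjecture. In that product, (u_1, v_1) and (u_2, v_2) are adjacent exactly when u_1 ~ u_2 in G_1 and v_1 ~ v_2 in G_2. Coloring each product vertex by the color of its first coordinate then gives a proper coloring, which is the easy inequality χ(G_1 × G_2) ≤ χ(G_1). Nothing here touches the harder lower bound or the choiceless set theory, and the helper names are hypothetical.

```python
from itertools import product


def tensor_product(g1, g2):
    """Categorical (tensor) product: (u1, v1) ~ (u2, v2) iff u1 ~ u2 in G1 and v1 ~ v2 in G2.

    Graphs are dicts mapping each vertex to the set of its neighbours (undirected, loopless).
    """
    return {
        (u1, v1): {(u2, v2) for u2 in g1[u1] for v2 in g2[v1]}
        for u1 in g1
        for v1 in g2
    }


def is_proper(graph, coloring):
    """True iff no edge joins two vertices of the same color."""
    return all(coloring[a] != coloring[b] for a in graph for b in graph[a])


def lift_coloring(coloring1, product_graph):
    """Color (u, v) by the color of u: adjacent product vertices have adjacent,
    hence differently colored, first coordinates."""
    return {vertex: coloring1[vertex[0]] for vertex in product_graph}


def chromatic_number(graph):
    """Brute force over all colorings; only feasible for very small graphs."""
    vertices = list(graph)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            if is_proper(graph, dict(zip(vertices, colors))):
                return k
    return 0  # the empty graph
```

    For example, with K3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, `chromatic_number(tensor_product(K3, K3))` returns 3. The lifted coloring attains this bound, and the triangle on (0, 0), (1, 1), (2, 2) rules out 2 colors.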