7 research outputs found

    Conditional Sparse Linear Regression

    Machine learning and statistics typically focus on building models that capture the vast majority of the data, possibly ignoring a small subset of data as "noise" or "outliers." By contrast, here we consider the problem of jointly identifying a significant (but perhaps small) segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit. We contend that such tasks are of interest not only because the models themselves may achieve better predictions in such special cases, but also because they may aid our understanding of the data. We give algorithms for such problems under the sup norm, when this unknown segment of the population is described by a k-DNF condition and the regression fit is s-sparse, for constant k and s. For the variants of this problem in which the regression fit is not so sparse or the expected error is used, we give a preliminary algorithm and highlight the question as a challenge for future work.
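
    To make the task concrete, here is a toy brute-force sketch of the search space in Python. This is an illustration only, not the paper's algorithm: it enumerates small conjunctions over Boolean attributes together with s-sparse coordinate supports, fits each candidate by ordinary least squares as a stand-in solver, and scores candidates by their sup-norm (worst-case) residual on the covered points. The function name and the coverage threshold mu are illustrative assumptions.

    from itertools import combinations
    import numpy as np

    def conditional_sparse_fit(B, X, y, k=2, s=2, mu=0.1):
        """B: (n, m) Boolean attributes defining candidate conditions;
        X: (n, d) real-valued features; y: (n,) targets.
        Returns (term, support, coeffs, sup_err) for the best candidate found."""
        best = None
        # Candidate conditions: conjunctions of at most k Boolean attributes.
        for t in range(1, k + 1):
            for term in combinations(range(B.shape[1]), t):
                covered = B[:, list(term)].all(axis=1)
                if covered.mean() < mu:  # require a significant segment
                    continue
                Xc, yc = X[covered], y[covered]
                # s-sparse fit: try every coordinate support of size s.
                for supp in combinations(range(X.shape[1]), s):
                    w, *_ = np.linalg.lstsq(Xc[:, list(supp)], yc, rcond=None)
                    sup_err = np.max(np.abs(Xc[:, list(supp)] @ w - yc))
                    if best is None or sup_err < best[-1]:
                        best = (term, supp, w, sup_err)
        return best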

    Conditional Linear Regression

    Work in machine learning and statistics commonly focuses on building models that capture the vast majority of the data, possibly ignoring a segment of the population as outliers. However, a good, simple model may not exist for the whole distribution, so we instead seek a small subset on which such a model does exist. We give a computationally efficient algorithm, with theoretical analysis, for the conditional linear regression task: the joint task of identifying a significant portion of the data distribution, described by a k-DNF, along with a linear predictor that has small loss on that portion. In contrast to work in robust statistics on small subsets, our loss bounds do not depend on the density of the portion we fit, and compared to previous work on conditional linear regression, our algorithm's running time scales polynomially with the sparsity of the linear predictor. We also demonstrate empirically that our algorithm can leverage this advantage to obtain a k-DNF with a better linear predictor in practice.
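
    As a rough formalization (notation ours, not necessarily the paper's), the conditional linear regression task can be written as

        \min_{c \in k\text{-DNF},\ w \in \mathbb{R}^d} \ \mathbb{E}\left[ \ell(\langle w, x \rangle, y) \mid c(x) \right] \quad \text{s.t.} \quad \Pr[c(x)] \ge \mu,

    where c ranges over k-DNF conditions on the Boolean attributes, w is the (sparse) linear predictor, \ell is the loss, and \mu is the minimum probability mass the identified portion must cover.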

    CSP-Completeness And Its Applications

    We build on previous ideas used to study both reductions between CSP-refutation problems and improper learning, and reductions among CSP-refutation problems themselves, to extend hardness results that depend on the assumption that refuting random CSP instances is hard for certain choices of predicates (such as k-SAT). First, we argue the hardness of the fundamental problem of learning conjunctions in a one-sided, PAC-like learning model that has appeared in several forms over the years. In this model, the goal is to produce a hypothesis that above all guarantees a small false-positive rate, while minimizing the false-negative rate among such hypotheses. Further, we formalize a notion of CSP-refutation reductions and CSP-refutation completeness, and use these, along with candidate CSP-refutation-complete predicates, to provide further evidence for the hardness of several problems.
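
    Roughly, and in our own notation (an assumption about the precise formulation), the one-sided learning model above asks for a hypothesis h approximating a target f over a distribution D such that

        \Pr_{x \sim D}\left[ h(x) = 1 \wedge f(x) = 0 \right] \le \varepsilon \quad \text{while minimizing} \quad \Pr_{x \sim D}\left[ h(x) = 0 \wedge f(x) = 1 \right],

    i.e., the false-positive rate is bounded as a hard constraint, and the false-negative rate is minimized subject to it.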

    Fundamental limits and algorithms for sparse linear regression with sublinear sparsity

    We establish exact asymptotic expressions for the normalized mutual information and minimum mean-square error (MMSE) of sparse linear regression in the sub-linear sparsity regime. Our result is achieved by generalizing the adaptive interpolation method in Bayesian inference from linear regimes to sub-linear ones. We also propose a modification of the well-known approximate message passing (AMP) algorithm to approach the MMSE fundamental limit, and rigorously analyze its state evolution. Our results show that the traditional linear assumption between the signal dimension and the number of observations in the replica and adaptive interpolation methods is not necessary for sparse signals. They also show how to modify existing AMP algorithms for linear regimes to handle sub-linear ones.
    Comment: 45 pages, 2 figures. Under review for publication. Added some auxiliary proofs.
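
    For orientation, here is a minimal sketch of a standard AMP iteration for the linear model y = A x + noise with a soft-thresholding denoiser. This is the textbook variant, not the paper's modified algorithm for the sub-linear sparsity regime; the threshold theta and the iteration count are illustrative assumptions.

    import numpy as np

    def soft_threshold(v, t):
        # Elementwise soft-thresholding denoiser.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def amp(y, A, n_iter=30, theta=0.1):
        m, n = A.shape
        x = np.zeros(n)
        z = y.copy()
        for _ in range(n_iter):
            r = x + A.T @ z                   # effective observation fed to the denoiser
            x_new = soft_threshold(r, theta)  # denoising step
            # Onsager correction: (1/m) * number of coordinates past the threshold.
            b = np.count_nonzero(x_new) / m
            z = y - A @ x_new + b * z
            x = x_new
        return x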

    Washington University Senior Undergraduate Research Digest (WUURD), Spring 2018

    From the Washington University Office of Undergraduate Research Digest (WUURD), Vol. 13, 05-01-2018. Published by the Office of Undergraduate Research. Joy Zalis Kiefer, Director of Undergraduate Research and Associate Dean in the College of Arts & Sciences.