204 research outputs found

    Approximate resilience, monotonicity, and the complexity of agnostic learning

    Full text link
    A function ff is dd-resilient if all its Fourier coefficients of degree at most dd are zero, i.e., ff is uncorrelated with all low-degree parities. We study the notion of approximate\mathit{approximate} resilience\mathit{resilience} of Boolean functions, where we say that ff is α\alpha-approximately dd-resilient if ff is α\alpha-close to a [1,1][-1,1]-valued dd-resilient function in 1\ell_1 distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class CC over the uniform distribution. Roughly speaking, if all functions in a class CC are far from being dd-resilient then CC can be learned agnostically in time nO(d)n^{O(d)} and conversely, if CC contains a function close to being dd-resilient then agnostic learning of CC in the statistical query (SQ) framework of Kearns has complexity of at least nΩ(d)n^{\Omega(d)}. This characterization is based on the duality between 1\ell_1 approximation by degree-dd polynomials and approximate dd-resilience that we establish. In particular, it implies that 1\ell_1 approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal α\alpha-approximately Ω~(αn)\widetilde{\Omega}(\alpha\sqrt{n})-resilient monotone functions for all α>0\alpha>0. Prior to our work, it was conceivable even that every monotone function is Ω(1)\Omega(1)-far from any 11-resilient function. Furthermore, we construct simple, explicit monotone functions based on Tribes{\sf Tribes} and CycleRun{\sf CycleRun} that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas

    Agnostic Learning of Disjunctions on Symmetric Distributions

    Full text link
    We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over {0,1}n\{0,1\}^n. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution D\mathcal{D}, there exists a set of nO(log(1/ϵ))n^{O(\log{(1/\epsilon)})} functions S\mathcal{S}, such that for every disjunction cc, there is function pp, expressible as a linear combination of functions in S\mathcal{S}, such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D} or ExD[c(x)p(x)]ϵ\mathbf{E}_{x \sim \mathcal{D}}[ |c(x)-p(x)|] \leq \epsilon. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time nO(log(1/ϵ))n^{O( \log{(1/\epsilon)})}. The best known previous bound is nO(1/ϵ4)n^{O(1/\epsilon^4)} and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution D\mathcal{D}, such that the minimum degree of a polynomial that 1/31/3-approximates the disjunction of all nn variables is 1\ell_1 distance on D\mathcal{D} is Ω(n)\Omega( \sqrt{n}). Therefore the learning result above cannot be achieved via 1\ell_1-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution D\mathcal{D} and every disjunction cc, there exists a polynomial pp of degree O(log(1/ϵ))O(\log{(1/\epsilon)}) such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D}. This was first proved by Blais et al. (2008) via a more involved argument

    Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

    Get PDF
    Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on Rn\mathbb{R}^n in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to exp(x0.99)\exp(-|x|^{0.99}). Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only O(log(1/δ))O(\log(1/\delta))-wise independence, for a tail probability of δ\delta. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

    Evaluating explainability for machine learning predictions using model-agnostic metrics

    Full text link
    Rapid advancements in artificial intelligence (AI) technology have brought about a plethora of new challenges in terms of governance and regulation. AI systems are being integrated into various industries and sectors, creating a demand from decision-makers to possess a comprehensive and nuanced understanding of the capabilities and limitations of these systems. One critical aspect of this demand is the ability to explain the results of machine learning models, which is crucial to promoting transparency and trust in AI systems, as well as fundamental in helping machine learning models to be trained ethically. In this paper, we present novel metrics to quantify the degree of which AI model predictions can be easily explainable by its features. Our metrics summarize different aspects of explainability into scalars, providing a more comprehensive understanding of model predictions and facilitating communication between decision-makers and stakeholders, thereby increasing the overall transparency and accountability of AI systems

    Interpretable Machine Learning Model for Clinical Decision Making

    Get PDF
    Despite machine learning models being increasingly used in medical decision-making and meeting classification predictive accuracy standards, they remain untrusted black-boxes due to decision-makers\u27 lack of insight into their complex logic. Therefore, it is necessary to develop interpretable machine learning models that will engender trust in the knowledge they generate and contribute to clinical decision-makers intention to adopt them in the field. The goal of this dissertation was to systematically investigate the applicability of interpretable model-agnostic methods to explain predictions of black-box machine learning models for medical decision-making. As proof of concept, this study addressed the problem of predicting the risk of emergency readmissions within 30 days of being discharged for heart failure patients. Using a benchmark data set, supervised classification models of differing complexity were trained to perform the prediction task. More specifically, Logistic Regression (LR), Random Forests (RF), Decision Trees (DT), and Gradient Boosting Machines (GBM) models were constructed using the Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD). The precision, recall, area under the ROC curve for each model were used to measure predictive accuracy. Local Interpretable Model-Agnostic Explanations (LIME) was used to generate explanations from the underlying trained models. LIME explanations were empirically evaluated using explanation stability and local fit (R2). The results demonstrated that local explanations generated by LIME created better estimates for Decision Trees (DT) classifiers
    corecore