Search CORE

204 research outputs found

Approximate resilience, monotonicity, and the complexity of agnostic learning

Author: Dachman-Soled Dana
Feldman Vitaly
Tan Li-Yang
Wan Andrew
Wimmer Karl
Publication venue
Publication date: 09/07/2014
Field of study

A function

f

d

-resilient if all its Fourier coefficients of degree at most

d

are zero, i.e.,

f

is uncorrelated with all low-degree parities. We study the notion of

\mathit{approximate}

\mathit{resilience}

of Boolean functions, where we say that

f

\alpha

-approximately

d

-resilient if

f

\alpha

-close to a

[-1,1]

-valued

d

-resilient function in

\ell_1

distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class

C

over the uniform distribution. Roughly speaking, if all functions in a class

C

are far from being

d

-resilient then

C

can be learned agnostically in time

n^{O(d)}

and conversely, if

C

contains a function close to being

d

-resilient then agnostic learning of

C

in the statistical query (SQ) framework of Kearns has complexity of at least

n^{\Omega(d)}

. This characterization is based on the duality between

\ell_1

approximation by degree-

d

polynomials and approximate

d

-resilience that we establish. In particular, it implies that

\ell_1

approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal

\alpha

-approximately

\widetilde{\Omega}(\alpha\sqrt{n})

-resilient monotone functions for all

\alpha>0

. Prior to our work, it was conceivable even that every monotone function is

\Omega(1)

-far from any

1

-resilient function. Furthermore, we construct simple, explicit monotone functions based on

{\sf Tribes}

and

{\sf CycleRun}

that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas

arXiv.org e-Print Archive

CiteSeerX

Crossref

Agnostic Learning of Disjunctions on Symmetric Distributions

Author: Feldman Vitaly
Kothari Pravesh
Publication venue
Publication date: 25/05/2015
Field of study

We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over

\{0,1\}^n

. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution

\mathcal{D}

, there exists a set of

n^{O(\log{(1/\epsilon)})}

functions

\mathcal{S}

, such that for every disjunction

c

, there is function

p

, expressible as a linear combination of functions in

\mathcal{S}

, such that

p

\epsilon

-approximates

c

\ell_1

distance on

\mathcal{D}

\mathbf{E}_{x \sim \mathcal{D}}[ |c(x)-p(x)|] \leq \epsilon

. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time

n^{O( \log{(1/\epsilon)})}

. The best known previous bound is

n^{O(1/\epsilon^4)}

and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution

\mathcal{D}

, such that the minimum degree of a polynomial that

1/3

-approximates the disjunction of all

n

variables is

\ell_1

distance on

\mathcal{D}

\Omega( \sqrt{n})

. Therefore the learning result above cannot be achieved via

\ell_1

-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution

\mathcal{D}

and every disjunction

c

, there exists a polynomial

p

of degree

O(\log{(1/\epsilon)})

such that

p

\epsilon

-approximates

c

\ell_1

distance on

\mathcal{D}

. This was first proved by Blais et al. (2008) via a more involved argument

arXiv.org e-Print Archive

CiteSeerX

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Author: Bun Mark
Steinke Thomas
Publication venue
Publication date: 08/12/2014
Field of study

Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on

\mathbb{R}^n

in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to

\exp(-|x|^{0.99})

. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only

O(\log(1/\delta))

-wise independence, for a tail probability of

\delta

. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Evaluating explainability for machine learning predictions using model-agnostic metrics

Author: da Costa Kleyton
Koshiyama Adriano
Modenesi Bernardo
Munoz Cristian
Publication venue
Publication date: 29/01/2024
Field of study

Rapid advancements in artificial intelligence (AI) technology have brought about a plethora of new challenges in terms of governance and regulation. AI systems are being integrated into various industries and sectors, creating a demand from decision-makers to possess a comprehensive and nuanced understanding of the capabilities and limitations of these systems. One critical aspect of this demand is the ability to explain the results of machine learning models, which is crucial to promoting transparency and trust in AI systems, as well as fundamental in helping machine learning models to be trained ethically. In this paper, we present novel metrics to quantify the degree of which AI model predictions can be easily explainable by its features. Our metrics summarize different aspects of explainability into scalars, providing a more comprehensive understanding of model predictions and facilitating communication between decision-makers and stakeholders, thereby increasing the overall transparency and accountability of AI systems

arXiv.org e-Print Archive

Interpretable Machine Learning Model for Clinical Decision Making

Author: El-Sharif Ali
Publication venue: NSUWorks
Publication date: 01/01/2021
Field of study

Despite machine learning models being increasingly used in medical decision-making and meeting classification predictive accuracy standards, they remain untrusted black-boxes due to decision-makers\u27 lack of insight into their complex logic. Therefore, it is necessary to develop interpretable machine learning models that will engender trust in the knowledge they generate and contribute to clinical decision-makers intention to adopt them in the field. The goal of this dissertation was to systematically investigate the applicability of interpretable model-agnostic methods to explain predictions of black-box machine learning models for medical decision-making. As proof of concept, this study addressed the problem of predicting the risk of emergency readmissions within 30 days of being discharged for heart failure patients. Using a benchmark data set, supervised classification models of differing complexity were trained to perform the prediction task. More specifically, Logistic Regression (LR), Random Forests (RF), Decision Trees (DT), and Gradient Boosting Machines (GBM) models were constructed using the Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD). The precision, recall, area under the ROC curve for each model were used to measure predictive accuracy. Local Interpretable Model-Agnostic Explanations (LIME) was used to generate explanations from the underlying trained models. LIME explanations were empirically evaluated using explanation stability and local fit (R2). The results demonstrated that local explanations generated by LIME created better estimates for Decision Trees (DT) classifiers

NSU Works