
    A composition theorem for the Fourier Entropy-Influence conjecture

    The Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai [FK96] seeks to relate two fundamental measures of Boolean function complexity: it states that $H[f] \leq C \cdot Inf[f]$ holds for every Boolean function $f$, where $H[f]$ denotes the spectral entropy of $f$, $Inf[f]$ is its total influence, and $C > 0$ is a universal constant. Despite significant interest in the conjecture, it has only been shown to hold for a few classes of Boolean functions. Our main result is a composition theorem for the FEI conjecture. We show that if $g_1, \ldots, g_k$ are functions over disjoint sets of variables satisfying the conjecture, and if the Fourier transform of $F$ taken with respect to the product distribution with biases $E[g_1], \ldots, E[g_k]$ satisfies the conjecture, then their composition $F(g_1(x^1), \ldots, g_k(x^k))$ satisfies the conjecture. As an application, we show that the FEI conjecture holds for read-once formulas over arbitrary gates of bounded arity, extending a recent result [OWZ11] which proved it for read-once decision trees. Our techniques also yield an explicit function with the largest known ratio of $C \geq 6.278$ between $H[f]$ and $Inf[f]$, improving on the previous lower bound of 4.615.
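    The two quantities in the conjecture can be made concrete with a small computation. Below is a minimal sketch using the standard definitions from Boolean function analysis (assumed here, not quoted from the paper): for $f : \{-1,1\}^n \to \{-1,1\}$ with Fourier expansion $f = \sum_S \hat{f}(S)\, \chi_S$, the spectral entropy is $H[f] = \sum_S \hat{f}(S)^2 \log_2(1/\hat{f}(S)^2)$ and the total influence is $Inf[f] = \sum_S |S|\, \hat{f}(S)^2$. The function `maj3` is an illustrative example, not one from the paper.

```python
import itertools
import math

def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients fhat(S) for all S subseteq [n],
    for f mapping {-1,1}^n to {-1,1}."""
    points = list(itertools.product([-1, 1], repeat=n))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return {S: sum(f(x) * math.prod(x[i] for i in S) for x in points)
               / len(points)
            for S in subsets}

def spectral_entropy(coeffs):
    # H[f] = sum_S fhat(S)^2 * log2(1 / fhat(S)^2), over nonzero coefficients.
    return sum(c * c * math.log2(1.0 / (c * c))
               for c in coeffs.values() if c != 0)

def total_influence(coeffs):
    # Inf[f] = sum_S |S| * fhat(S)^2.
    return sum(len(S) * c * c for S, c in coeffs.items())

# Example: majority on 3 bits. The FEI conjecture asserts that
# spectral_entropy <= C * total_influence for a universal constant C.
maj3 = lambda x: 1 if sum(x) > 0 else -1
coeffs = fourier_coefficients(maj3, 3)
print(spectral_entropy(coeffs), total_influence(coeffs))
```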

    DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

    Many machine learning problems require performing dataset valuation, i.e., quantifying the incremental gain, with respect to some relevant pre-defined utility, of aggregating an individual dataset with others. As seminal examples, dataset valuation has been leveraged in collaborative and federated learning to create incentives for data sharing across several data owners. The Shapley value has recently been proposed as a principled tool to achieve this goal due to its formal axiomatic justification. Since its computation often requires exponential time, standard approximation strategies based on Monte Carlo integration have been considered. Such generic approximation methods, however, remain expensive in some cases. In this paper, we exploit knowledge about the structure of the dataset valuation problem to devise more efficient Shapley value estimators. We propose a novel approximation of the Shapley value, referred to as discrete uniform Shapley (DU-Shapley), which is expressed as an expectation under a discrete uniform distribution with support of reasonable size. We justify the relevance of the proposed framework via asymptotic and non-asymptotic theoretical guarantees and show that DU-Shapley tends towards the Shapley value when the number of data owners is large. The benefits of the proposed framework are finally illustrated on several dataset valuation benchmarks. DU-Shapley outperforms other Shapley value approximations, even when the number of data owners is small.
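    For context, the sketch below shows the classical permutation-sampling Monte Carlo estimator that the abstract describes as the standard baseline; the DU-Shapley estimator itself is defined in the paper and is not reproduced here. The `utility` function is a hypothetical user-supplied callable mapping a coalition of owner indices to a real-valued score (e.g., validation accuracy of a model trained on the pooled data).

```python
import random

def monte_carlo_shapley(n_owners, utility, n_permutations=200, seed=0):
    """Estimate each owner's Shapley value by averaging marginal
    contributions over random orderings of the owners."""
    rng = random.Random(seed)
    values = [0.0] * n_owners
    for _ in range(n_permutations):
        order = list(range(n_owners))
        rng.shuffle(order)
        coalition = set()
        prev = utility(coalition)
        for owner in order:
            coalition.add(owner)
            curr = utility(coalition)
            values[owner] += curr - prev  # marginal contribution of owner
            prev = curr
    return [v / n_permutations for v in values]

# Toy usage: utility = size of the union of the owners' (overlapping) datasets.
datasets = [{1, 2, 3}, {3, 4}, {5}]
u = lambda S: len(set().union(*(datasets[i] for i in S))) if S else 0
print(monte_carlo_shapley(len(datasets), u))
```

    Each permutation requires one utility evaluation per owner, which is exactly the cost the paper targets: DU-Shapley replaces this sampling with an expectation under a discrete uniform distribution of modest support.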

    An efficient approach to pruning regression trees using a modified Bayesian information criterion

    By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches, which uses a modified Bayesian information criterion, consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees whose prediction accuracy is comparable to that of trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.
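    To illustrate the general idea, here is a minimal sketch of bottom-up pruning with a BIC-style rule. The criterion used is an assumed change-point-detection-flavoured penalty (keep a split only if its reduction in residual sum of squares exceeds roughly $C \sigma^2 \log n$), not the paper's exact modified BIC; the dict-based tree, `sigma2`, and `c` are all illustrative assumptions.

```python
import numpy as np

def rss(y):
    """Residual sum of squares around the mean."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def prune(node, n, sigma2, c=2.0):
    """node is either a leaf {'y': targets} or an internal node
    {'y': targets, 'left': subtree, 'right': subtree}. Prunes bottom-up
    in place and returns the (possibly collapsed) node."""
    if 'left' not in node:
        return node
    node['left'] = prune(node['left'], n, sigma2, c)
    node['right'] = prune(node['right'], n, sigma2, c)
    # Consider collapsing only once both children have become leaves.
    if 'left' not in node['left'] and 'left' not in node['right']:
        gain = rss(node['y']) - rss(node['left']['y']) - rss(node['right']['y'])
        if gain <= c * sigma2 * np.log(n):  # split not worth its penalty
            node.pop('left')
            node.pop('right')
    return node

# Toy usage: a depth-1 tree over pure-noise data, where the split is spurious.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
tree = {'y': y, 'left': {'y': y[:50]}, 'right': {'y': y[50:]}}
print('left' in prune(tree, n=y.size, sigma2=1.0))  # likely False: pruned away
```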

    Learning Stochastic Decision Trees
