A composition theorem for the Fourier Entropy-Influence conjecture
The Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai [FK96]
seeks to relate two fundamental measures of Boolean function complexity: it
states that $H[\hat{f}^2] \le C \cdot \mathrm{Inf}[f]$ holds for every Boolean function $f$, where
$H[\hat{f}^2]$ denotes the spectral entropy of $f$, $\mathrm{Inf}[f]$ is its total influence,
and $C > 0$ is a universal constant. Despite significant interest in the
conjecture it has only been shown to hold for a few classes of Boolean
functions.
Our main result is a composition theorem for the FEI conjecture. We show that
if $g_1, \ldots, g_k$ are functions over disjoint sets of variables satisfying the
conjecture, and if the Fourier transform of $F$ taken with respect to the
product distribution with biases $\mathbf{E}[g_1], \ldots, \mathbf{E}[g_k]$ satisfies the conjecture,
then their composition $F(g_1(x^1), \ldots, g_k(x^k))$ satisfies the conjecture. As
an application we show that the FEI conjecture holds for read-once formulas
over arbitrary gates of bounded arity, extending a recent result [OWZ11] which
proved it for read-once decision trees. Our techniques also yield an explicit
function with the largest known ratio between $H[\hat{f}^2]$ and
$\mathrm{Inf}[f]$, improving on the previous lower bound of 4.615.
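To make the two quantities in the conjecture concrete, the following is a minimal brute-force sketch (not taken from the paper) that computes the Fourier coefficients, the spectral entropy, and the total influence of a small Boolean function over {-1, 1}^n under the uniform distribution. The function names and the 3-bit majority example are assumptions chosen for illustration; the enumeration is exponential in n and is only meant for tiny inputs.

```python
from itertools import product
from math import log2


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {-1, 1}^n -> {-1, 1} (uniform distribution).

    S is a tuple of variable indices; f_hat(S) = E_x[f(x) * prod_{i in S} x_i].
    """
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for mask in product([0, 1], repeat=n):
        subset = tuple(i for i in range(n) if mask[i])
        total = 0
        for x in points:
            chi = 1
            for i in subset:
                chi *= x[i]  # parity character on the subset
            total += f(x) * chi
        coeffs[subset] = total / len(points)
    return coeffs


def spectral_entropy(coeffs):
    """Shannon entropy (base 2) of the distribution S -> f_hat(S)^2."""
    return sum(c * c * log2(1.0 / (c * c)) for c in coeffs.values() if c != 0)


def total_influence(coeffs):
    """Total influence via the Fourier formula: sum over S of |S| * f_hat(S)^2."""
    return sum(len(subset) * c * c for subset, c in coeffs.items())


def majority3(x):
    """Illustrative example: majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    coeffs = fourier_coefficients(majority3, 3)
    entropy = spectral_entropy(coeffs)
    influence = total_influence(coeffs)
    # For 3-bit majority the four nonzero coefficients each have squared weight 1/4,
    # so the entropy is 2.0 and the total influence is 1.5.
    print(entropy, influence, entropy / influence)
```

For this example the ratio of spectral entropy to total influence is 4/3, well below the lower bounds on the constant discussed in the abstract.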
DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation
Many machine learning problems require performing dataset valuation, i.e. to
quantify the incremental gain, to some relevant pre-defined utility, of
aggregating an individual dataset to others. As seminal examples, dataset
valuation has been leveraged in collaborative and federated learning to create
incentives for data sharing across several data owners. The Shapley value has
recently been proposed as a principled tool to achieve this goal due to formal
axiomatic justification. Since its computation often requires exponential time,
standard approximation strategies based on Monte Carlo integration have been
considered. Such generic approximation methods, however, remain expensive in
some cases. In this paper, we exploit the knowledge about the structure of the
dataset valuation problem to devise more efficient Shapley value estimators. We
propose a novel approximation of the Shapley value, referred to as discrete
uniform Shapley (DU-Shapley) which is expressed as an expectation under a
discrete uniform distribution with support of reasonable size. We justify the
relevancy of the proposed framework via asymptotic and non-asymptotic
theoretical guarantees and show that DU-Shapley tends towards the Shapley value
when the number of data owners is large. The benefits of the proposed framework
are finally illustrated on several dataset valuation benchmarks. DU-Shapley
outperforms other Shapley value approximations, even when the number of data
owners is small.
Comment: 22 pages
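For context, the generic Monte Carlo baseline that the abstract contrasts with can be sketched as permutation sampling: average each owner's marginal contribution over random arrival orders. This is a minimal illustration of that standard estimator, not of DU-Shapley itself; the owner names, dataset sizes, and both toy utility functions below are assumptions made up for the example.

```python
import random
from math import sqrt


def monte_carlo_shapley(players, utility, num_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings.

    `utility` maps a frozenset of players to a number.
    """
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = []
        previous = utility(frozenset())
        for p in order:
            coalition.append(p)
            current = utility(frozenset(coalition))
            values[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: v / num_permutations for p, v in values.items()}


# Hypothetical data owners and dataset sizes (illustrative only).
sizes = {"A": 100, "B": 300, "C": 600}


def additive_utility(coalition):
    """Toy utility: total number of pooled data points."""
    return float(sum(sizes[p] for p in coalition))


def concave_utility(coalition):
    """Toy utility with diminishing returns in the pooled dataset size."""
    return sqrt(sum(sizes[p] for p in coalition))


if __name__ == "__main__":
    print(monte_carlo_shapley(sizes, additive_utility))
    print(monte_carlo_shapley(sizes, concave_utility))
```

With the additive utility every marginal contribution equals the owner's dataset size, so the estimate is exact. With the concave utility the estimates are noisy, but they still sum to the utility of the full coalition because the marginal gains telescope within each permutation. Each permutation costs one utility evaluation per owner, which is the expense that structure-aware estimators aim to reduce.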
An efficient approach to pruning regression trees using a modified Bayesian information criterion
By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches, which uses a modified Bayesian information criterion, consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.