    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:
    $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
    $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
    We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms---which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
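    As a concrete illustration, the following is a minimal brute-force sketch of this heuristic in Python, feasible only for small $n$. It assumes $f$ is supplied as a callable on $\{0,1\}^n$ tuples, and it uses a local stopping rule (return a leaf once the majority label $\varepsilon$-approximates the current subfunction) as a simple stand-in for the global $\varepsilon$-approximation test described above; the helper names are illustrative, not from the paper.

        import itertools

        def influence(f, n, i):
            # Pr over uniform x in {0,1}^n that flipping coordinate i changes f(x).
            flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i+1:])
                        for x in itertools.product((0, 1), repeat=n))
            return flips / 2**n

        def restrict(f, i, b):
            # Subfunction f_{x_i = b}: coordinate i is forced to b.
            return lambda x: f(x[:i] + (b,) + x[i+1:])

        def build_tree(f, n, eps):
            # Top-down heuristic: query the most influential variable first.
            values = [f(x) for x in itertools.product((0, 1), repeat=n)]
            maj = 1 if 2 * sum(v == 1 for v in values) >= len(values) else -1
            if sum(v != maj for v in values) / len(values) <= eps:
                return maj  # leaf: maj already eps-approximates this subfunction
            i = max(range(n), key=lambda j: influence(f, n, j))
            return (i, build_tree(restrict(f, i, 0), n, eps),
                       build_tree(restrict(f, i, 1), n, eps))

        # Example: exact tree (eps = 0) for 3-bit majority.
        maj3 = lambda x: 1 if sum(x) >= 2 else -1
        print(build_tree(maj3, 3, eps=0.0))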

    Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture

    Given $f:\{-1,1\}^n \rightarrow \{-1,1\}$, define the \emph{spectral distribution} of $f$ to be the distribution on subsets of $[n]$ in which the set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2] \leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and $\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one of the major open problems in the analysis of Boolean functions, and settling it would have several interesting consequences. Previous results on the FEI conjecture have been obtained largely through direct calculation. In this paper we study a natural interpretation of the conjecture, which states that there exists a communication protocol which, given a subset $S$ of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using at most $C\cdot\operatorname{Inf}[f]$ bits in expectation. Using this interpretation, we are able to show the following results:
    1. First, if $f$ is computable by a read-$k$ decision tree, then $\operatorname{H}[\widehat{f}^2] \leq 9k\cdot\operatorname{Inf}[f]$.
    2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2] \leq 12d\cdot\operatorname{Inf}[f]$.
    3. Finally, we give a new proof of the main theorem of O'Donnell and Tan (ICALP 2013), i.e. that their FEI$^+$ conjecture composes.
    In addition, we show that natural improvements to our decision tree results would be sufficient to prove the FEI conjecture in its entirety. We believe that our methods give more illuminating proofs than previous results about the FEI conjecture.
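    For small $n$, all the quantities in the conjecture can be computed directly by brute force. The following sketch (our own illustration, not from the paper) enumerates the Fourier coefficients of an $f$ given as a callable on $\{-1,1\}^n$ tuples; entropy is taken base 2 here, which only rescales the constant $C$.

        import itertools, math

        def fourier_weights(f, n):
            # Brute-force Fourier coefficients of f: {-1,1}^n -> {-1,1}.
            # Returns {S: f_hat(S)^2}; by Parseval these weights sum to 1,
            # so they form a genuine probability distribution over subsets.
            inputs = list(itertools.product((-1, 1), repeat=n))
            weights = {}
            for k in range(n + 1):
                for S in itertools.combinations(range(n), k):
                    coeff = sum(f(x) * math.prod(x[i] for i in S)
                                for x in inputs) / 2**n
                    weights[S] = coeff**2
            return weights

        def spectral_entropy(weights):
            # Shannon entropy (base 2) of the spectral distribution f_hat^2.
            return -sum(w * math.log2(w) for w in weights.values() if w > 0)

        def total_influence(weights):
            # Inf[f] = sum over S of |S| * f_hat(S)^2.
            return sum(len(S) * w for S, w in weights.items())

        # Example: 3-bit majority gives H[f_hat^2] = 2.0 and Inf[f] = 1.5,
        # comfortably within the conjectured H <= C * Inf.
        maj3 = lambda x: 1 if sum(x) > 0 else -1
        w = fourier_weights(maj3, 3)
        print(spectral_entropy(w), total_influence(w))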

    Bridging the Gap Between the Least and the Most Influential Twitter Users

    Social networks play an increasingly important role in shaping the behaviour of users of the Web. Arguably, Twitter stands out from the rest, not only for the platform's simplicity but also for the great influence that messages sent over the network can have. The impact of such messages determines the influence of a Twitter user, and is what tools such as Klout, PeerIndex or TwitterGrader aim to calculate. Reducing all the factors that make a person influential to a single number is not an easy task, and the effort involved could become useless if Twitter users do not know how to improve that number. In this paper we identify the specific actions a Twitterer should carry out to increase their influence in each of the above-mentioned tools, applying data mining techniques based on classification and regression algorithms to information collected from a set of Twitter users.
    This work has been partially funded by the European Commission project "SiSOB: An Observatorium for Science in Society based in Social Models" (http://sisob.lcc.uma.es) (contract no. FP7 266588), "Sistemas Inalámbricos de Gestión de Información Crítica" (code TIN2011-23795, granted by the MEC, Spain) and "3DTUTOR: Sistema Interoperable de Asistencia y Tutoría Virtual e Inteligente 3D" (code IPT-2011-0889-900000, granted by the MINECO, Spain).
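    As a rough, hypothetical sketch of this kind of pipeline (the feature names, data, and model are illustrative stand-ins, not the paper's actual setup), one could fit a regression model on per-user activity features and rank the features by importance to see which actions move an influence score the most:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        features = ["tweets_per_day", "followers", "retweets_received",
                    "mentions", "lists_membership"]

        rng = np.random.default_rng(0)
        X = rng.random((500, len(features)))   # stand-in for crawled user stats
        y = 3 * X[:, 2] + 2 * X[:, 1] + rng.normal(0, 0.1, 500)  # synthetic "score"

        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        for name, imp in sorted(zip(features, model.feature_importances_),
                                key=lambda t: -t[1]):
            print(f"{name}: {imp:.2f}")  # most actionable features first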