Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
$f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$
at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the
left and right subtrees respectively; terminate once the tree is an
$\varepsilon$-approximation of $f$. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
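Upper bound: For every $f$ with decision tree size $s$ and every
$\varepsilon \in (0, \frac{1}{2})$, this heuristic builds a decision tree of size
at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
Lower bound: For every $\varepsilon \in (0, \frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that
this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.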
We also obtain upper and lower bounds for monotone functions:
$s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms, which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
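The top-down heuristic described in this abstract can be illustrated with a minimal brute-force sketch for small functions with values in {-1, +1}. It is not the paper's learning algorithm. All names (`build_top_down`, `influence`, `leaf_stats`, `and2`) are hypothetical, and the order in which leaves are split (largest influence weighted by the leaf's mass 2^-depth) is an assumption, since the abstract does not specify it. The sketch enumerates all 2^n inputs, so it is only usable for small n.

```python
from itertools import product

def restricted_inputs(n, restriction):
    """Yield every x in {0,1}^n that agrees with the partial assignment."""
    free = [i for i in range(n) if i not in restriction]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in restriction.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)

def influence(f, n, i, restriction):
    """Fraction of consistent inputs where flipping coordinate i changes f."""
    flips = total = 0
    for x in restricted_inputs(n, restriction):
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        flips += f(x) != f(y)
        total += 1
    return flips / total

def leaf_stats(f, n, restriction):
    """Return (majority label, fraction of ALL inputs this leaf mislabels)."""
    values = [f(x) for x in restricted_inputs(n, restriction)]
    ones = sum(v == 1 for v in values)
    label = 1 if 2 * ones >= len(values) else -1
    wrong = len(values) - max(ones, len(values) - ones)
    return label, wrong / 2 ** n

def build_top_down(f, n, eps):
    """Split leaves greedily until the tree errs on at most an eps fraction."""
    leaves = [{}]  # each leaf is a partial assignment {variable index: bit}
    while True:
        error = sum(leaf_stats(f, n, r)[1] for r in leaves)
        if error <= eps:
            return leaves
        # Assumed split rule: pick the (leaf, variable) pair maximising
        # influence of the variable at that leaf times the leaf mass 2^-depth.
        _, k, i = max(
            (influence(f, n, i, r) * 2.0 ** -len(r), k, i)
            for k, r in enumerate(leaves)
            for i in range(n) if i not in r
        )
        r = leaves.pop(k)
        leaves += [{**r, i: 0}, {**r, i: 1}]

def and2(x):
    """Example target: AND of the first two bits, with values in {-1, +1}."""
    return 1 if x[0] and x[1] else -1
```

Calling `build_top_down(and2, 3, 0.0)` grows the tree until it computes `and2` exactly, while a larger `eps` lets the procedure stop earlier with fewer leaves. The irrelevant third variable has influence zero, so it is never queried.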
Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture
Given $f : \{-1,1\}^n \to \{-1,1\}$, define the \emph{spectral
distribution} of $f$ to be the distribution on subsets of $[n]$ in which the
set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier
Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that
there is some absolute constant $C$ such that $\mathrm{H}[\widehat{f}^2] \le C \cdot \mathrm{Inf}[f]$. Here,
$\mathrm{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and
$\mathrm{Inf}[f]$ is the total influence of $f$. This conjecture is one
of the major open problems in the analysis of Boolean functions, and settling
it would have several interesting consequences.
Previous results on the FEI conjecture have been largely through direct
calculation. In this paper we study a natural interpretation of the conjecture,
which states that there exists a communication protocol which, given a subset $S$
of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using
at most $C \cdot \mathrm{Inf}[f]$ bits in expectation.
Using this interpretation, we are able to show the following results:
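1. First, if $f$ is computable by a read-$k$ decision tree, then
$\mathrm{H}[\widehat{f}^2] \le 9k \cdot \mathrm{Inf}[f]$.
2. Next, if $f$ has $\mathrm{Inf}[f] \ge 1$ and is computable by a
decision tree with expected depth $d$, then $\mathrm{H}[\widehat{f}^2] \le 12 \log_2 d \cdot \mathrm{Inf}[f]$.
3. Finally, we give a new proof of the main theorem of O'Donnell and Tan
(ICALP 2013), i.e. that their FEI$^+$ conjecture composes.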
In addition, we show that natural improvements to our decision tree results
would be sufficient to prove the FEI conjecture in its entirety. We believe
that our methods give more illuminating proofs than previous results about the
FEI conjecture.
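The two quantities compared by the FEI conjecture can be computed by brute force for small Boolean functions. The following is a minimal sketch, assuming a function given as a Python callable on tuples in {-1, 1}^n. The helper names (`fourier_weights`, `spectral_entropy`, `total_influence`, `maj3`) are hypothetical. It uses the standard identity that the total influence equals the sum over sets S of |S| times the squared Fourier coefficient of S.

```python
from itertools import product
from math import log2

def fourier_weights(f, n):
    """Return {S: fhat(S)^2}, with S encoded as a tuple of 0/1 indicators."""
    points = list(product((-1, 1), repeat=n))
    weights = {}
    for S in product((0, 1), repeat=n):
        total = 0
        for x in points:
            # chi_S(x) is the product of x_i over the coordinates i in S.
            chi = 1
            for xi, si in zip(x, S):
                if si:
                    chi *= xi
            total += f(x) * chi
        # fhat(S) is the average of f(x) * chi_S(x) over all inputs x.
        weights[S] = (total / len(points)) ** 2
    return weights

def spectral_entropy(weights):
    """Shannon entropy (in bits) of the spectral distribution."""
    return sum(w * log2(1 / w) for w in weights.values() if w > 0)

def total_influence(weights):
    """Total influence via the identity Inf[f] = sum_S |S| * fhat(S)^2."""
    return sum(sum(S) * w for S, w in weights.items())

def maj3(x):
    """Example: majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1
```

For `maj3`, the spectral distribution is uniform over four sets (the three singletons and the full set), so the entropy is 2 bits and the total influence is 1.5. The two-bit parity `lambda x: x[0] * x[1]` has entropy 0 and total influence 2.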
Bridging the Gap Between the Least and the Most Influential Twitter Users
Social networks play an increasingly important role in shaping the behaviour of users of the Web. Conceivably, Twitter stands out from the others, not only for the platform's simplicity but also for the great influence that the messages sent over the network can have. The impact of such messages determines the influence of a Twitter user and is what tools such as Klout, PeerIndex or TwitterGrader aim to calculate. Reducing all the factors that make a person influential into a single number is not an easy task, and the effort involved is wasted if Twitter users do not know how to improve that number. In this paper we identify which specific actions a Twitterer should carry out to increase their influence in each of the above-mentioned tools. For this purpose, we apply data mining techniques based on classification and regression algorithms to the information collected from a set of Twitter users.
This work has been partially funded by the European Commission Project "SiSOB: An Observatorium for Science in Society based in Social Models" (http://sisob.lcc.uma.es) (Contract no.: FP7 266588), "Sistemas Inalámbricos de Gestión de Información Crítica" (with code number TIN2011-23795 and granted by the MEC, Spain) and "3DTUTOR: Sistema Interoperable de Asistencia y Tutoría Virtual e Inteligente 3D" (with code number IPT-2011-0889-900000 and granted by the MINECO, Spain).