11,244 research outputs found
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosting Decision Tree (GBDT) are popular machine learning
algorithms with implementations such as LightGBM and in popular machine
learning toolkits like Scikit-Learn. Many implementations can only produce
trees in an offline manner and in a greedy manner. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss in order to allow decision splits to be updated in an
online manner and provide extensions to allow splits points to be altered as a
neural architecture search problem. We provide learning bounds for our neural
network.Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
The cavity approach for Steiner trees packing problems
The Belief Propagation approximation, or cavity method, has been recently
applied to several combinatorial optimization problems in its zero-temperature
implementation, the max-sum algorithm. In particular, recent developments to
solve the edge-disjoint paths problem and the prize-collecting Steiner tree
problem on graphs have shown remarkable results for several classes of graphs
and for benchmark instances. Here we propose a generalization of these
techniques for two variants of the Steiner trees packing problem where multiple
"interacting" trees have to be sought within a given graph. Depending on the
interaction among trees we distinguish the vertex-disjoint Steiner trees
problem, where trees cannot share nodes, from the edge-disjoint Steiner trees
problem, where edges cannot be shared by trees but nodes can be members of
multiple trees. Several practical problems of huge interest in network design
can be mapped into these two variants, for instance, the physical design of
Very Large Scale Integration (VLSI) chips. The formalism described here relies
on two components edge-variables that allows us to formulate a massage-passing
algorithm for the V-DStP and two algorithms for the E-DStP differing in the
scaling of the computational time with respect to some relevant parameters. We
will show that one of the two formalisms used for the edge-disjoint variant
allow us to map the max-sum update equations into a weighted maximum matching
problem over proper bipartite graphs. We developed a heuristic procedure based
on the max-sum equations that shows excellent performance in synthetic networks
(in particular outperforming standard multi-step greedy procedures by large
margins) and on large benchmark instances of VLSI for which the optimal
solution is known, on which the algorithm found the optimum in two cases and
the gap to optimality was never larger than 4 %
Pairwise MRF Calibration by Perturbation of the Bethe Reference Point
We investigate different ways of generating approximate solutions to the
pairwise Markov random field (MRF) selection problem. We focus mainly on the
inverse Ising problem, but discuss also the somewhat related inverse Gaussian
problem because both types of MRF are suitable for inference tasks with the
belief propagation algorithm (BP) under certain conditions. Our approach
consists in to take a Bethe mean-field solution obtained with a maximum
spanning tree (MST) of pairwise mutual information, referred to as the
\emph{Bethe reference point}, for further perturbation procedures. We consider
three different ways following this idea: in the first one, we select and
calibrate iteratively the optimal links to be added starting from the Bethe
reference point; the second one is based on the observation that the natural
gradient can be computed analytically at the Bethe point; in the third one,
assuming no local field and using low temperature expansion we develop a dual
loop joint model based on a well chosen fundamental cycle basis. We indeed
identify a subclass of planar models, which we refer to as \emph{Bethe-dual
graph models}, having possibly many loops, but characterized by a singly
connected dual factor graph, for which the partition function and the linear
response can be computed exactly in respectively O(N) and operations,
thanks to a dual weight propagation (DWP) message passing procedure that we set
up. When restricted to this subclass of models, the inverse Ising problem being
convex, becomes tractable at any temperature. Experimental tests on various
datasets with refined or regularization procedures indicate that
these approaches may be competitive and useful alternatives to existing ones.Comment: 54 pages, 8 figure. section 5 and refs added in V
An analysis of chaining in multi-label classification
The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minimization perspective, thereby helping to gain a better understanding of this approach. As a main result, we clarify that the original chaining method seeks to approximate the joint mode of the conditional distribution of label vectors in a greedy manner. As a result of a theoretical regret analysis, we conclude that this approach can perform quite poorly in terms of subset 0/1 loss. Therefore, we present an enhanced inference procedure for which the worst-case regret can be upper-bounded far more tightly. In addition, we show that a probabilistic variant of chaining, which can be utilized for any loss function, becomes tractable by using Monte Carlo sampling. Finally, we present experimental results confirming the validity of our theoretical findings
- …