Combinatorial optimization problems require an exhaustive search to find the
optimal solution. A convenient approach to solving combinatorial optimization
tasks in the form of Mixed Integer Linear Programs is Branch-and-Bound.
Branch-and-Bound solver splits a task into two parts dividing the domain of an
integer variable, then it solves them recursively, producing a tree of nested
sub-tasks. The efficiency of the solver depends on the branchning heuristic
used to select a variable for splitting. In the present work, we propose a
reinforcement learning method that can efficiently learn the branching
heuristic. We view the variable selection task as a tree Markov Decision
Process, prove that the Bellman operator adapted for the tree Markov Decision
Process is contracting in mean, and propose a modified learning objective for
the reinforcement learning agent. Our agent requires less training data and
produces smaller trees compared to previous reinforcement learning methods.Comment: Submitted to NeurIPS 202