We tackle the Online 3D Bin Packing Problem, a challenging yet practically
useful variant of the classical Bin Packing Problem. In this problem, the items
are delivered to the agent without informing the full sequence information.
Agent must directly pack these items into the target bin stably without
changing their arrival order, and no further adjustment is permitted. Online
3D-BPP can be naturally formulated as Markov Decision Process (MDP). We adopt
deep reinforcement learning, in particular, the on-policy actor-critic
framework, to solve this MDP with constrained action space. To learn a
practically feasible packing policy, we propose three critical designs. First,
we propose an online analysis of packing stability based on a novel stacking
tree. It attains a high analysis accuracy while reducing the computational
complexity from O(N2) to O(NlogN), making it especially suited for RL
training. Second, we propose a decoupled packing policy learning for different
dimensions of placement which enables high-resolution spatial discretization
and hence high packing precision. Third, we introduce a reward function that
dictates the robot to place items in a far-to-near order and therefore
simplifies the collision avoidance in movement planning of the robotic arm.
Furthermore, we provide a comprehensive discussion on several key implemental
issues. The extensive evaluation demonstrates that our learned policy
outperforms the state-of-the-art methods significantly and is practically
usable for real-world applications.Comment: Science China Information Science