Heavy goods vehicles are a vital backbone of the supply chain delivery system
but also contribute significantly to carbon emissions, with a loading
efficiency of only 60% in the United Kingdom. Collaborative vehicle routing has been
proposed as a way to increase efficiency, but challenges remain before it
becomes practical. One key challenge is the efficient computation of viable
solutions for co-loading and routing. Current operations research methods
scale non-linearly with problem size and must therefore be restricted to
limited geographic areas to return results in time for day-to-day
operations. This yields only locally optimal routes and leaves global
optimisation potential untouched. We develop a reinforcement learning model to
solve the three-dimensional loading capacitated vehicle routing problem in
approximately linear time. While this problem has been studied extensively in
operations research, no publications on solving it with reinforcement learning
exist. We demonstrate the favourable scaling of our reinforcement learning
model and benchmark our routing performance against state-of-the-art methods.
The model achieves an average gap of between 3.83% and 8.10% relative to
established methods. Our model not only represents a promising first step
towards large-scale logistics optimisation with reinforcement learning but also
lays the foundation for this research stream.