Neural combinatorial optimization as an enabler technology to design real-time virtual network function placement decision systems

Abstract

158 p.The Fifth Generation of the mobile network (5G) represents a breakthrough technology for thetelecommunications industry. 5G provides a unified infrastructure capable of integrating over thesame physical network heterogeneous services with different requirements. This is achieved thanksto the recent advances in network virtualization, specifically in Network Function Virtualization(NFV) and Software Defining Networks (SDN) technologies. This cloud-based architecture not onlybrings new possibilities to vertical sectors but also entails new challenges that have to be solvedaccordingly. In this sense, it enables to automate operations within the infrastructure, allowing toperform network optimization at operational time (e.g., spectrum optimization, service optimization,traffic optimization). Nevertheless, designing optimization algorithms for this purpose entails somedifficulties. Solving the underlying Combinatorial Optimization (CO) problems that these problemspresent is usually intractable due to their NP-Hard nature. In addition, solutions to these problems arerequired in close to real-time due to the tight time requirements on this dynamic environment. Forthis reason, handwritten heuristic algorithms have been widely used in the literature for achievingfast approximate solutions on this context.However, particularizing heuristics to address CO problems can be a daunting task that requiresexpertise. The ability to automate this resolution processes would be of utmost importance forachieving an intelligent network orchestration. In this sense, Artificial Intelligence (AI) is envisionedas the key technology for autonomously inferring intelligent solutions to these problems. Combining AI with network virtualization can truly transform this industry. Particularly, this Thesis aims at using Neural Combinatorial Optimization (NCO) for inferring endsolutions on CO problems. NCO has proven to be able to learn near optimal solutions on classicalcombinatorial problems (e.g., the Traveler Salesman Problem (TSP), Bin Packing Problem (BPP),Vehicle Routing Problem (VRP)). Specifically, NCO relies on Reinforcement Learning (RL) toestimate a Neural Network (NN) model that describes the relation between the space of instances ofthe problem and the solutions for each of them. In other words, this model for a new instance is ableto infer a solution generalizing from the problem space where it has been trained. To this end, duringthe learning process the model takes instances from the learning space, and uses the reward obtainedfrom evaluating the solution to improve its accuracy.The work here presented, contributes to the NCO theory in two main directions. First, this workargues that the performance obtained by sequence-to-sequence models used for NCO in the literatureis improved presenting combinatorial problems as Constrained Markov Decision Processes (CMDP).Such property can be exploited for building a Markovian model that constructs solutionsincrementally based on interactions with the problem. And second, this formulation enables toaddress general constrained combinatorial problems under this framework. In this context, the modelin addition to the reward signal, relies on penalty signals generated from constraint dissatisfactionthat direct the model toward a competitive policy even in highly constrained environments. Thisstrategy allows to extend the number of problems that can be addressed using this technology.The presented approach is validated in the scope of intelligent network management, specifically inthe Virtual Network Function (VNF) placement problem. This problem consists of efficientlymapping a set of network service requests on top of the physical network infrastructure. Particularly,we seek to obtain the optimal placement for a network service chain considering the state of thevirtual environment, so that a specific resource objective is accomplished, in this case theminimization of the overall power consumption. Conducted experiments prove the capability of theproposal for learning competitive solutions when compared to classical heuristic, metaheuristic, andConstraint Programming (CP) solvers

    Similar works