VNF-AAPC : accelerator-aware VNF placement and chaining by Sharma, Gourav Prateek et al.
VNF-AAPC: Accelerator-aware VNF Placement and Chaining
Gourav Prateek Sharma, Wouter Tavernier, Didier Colle, Mario Pickavet
Email: gouravprateek.sharma@ugent.be
IDLab, Department of Information Technology
Ghent University - IMEC
Technologiepark-Zwijnaarde 126, 9052 Gent
1 Abstract
In recent years, telecom operators have been migrating towards network architectures based on Network Function
Virtualization in order to reduce their high Capital Expenditure (CAPEX) and Operational Expenditure (OPEX).
However, virtualization of some network functions is accompanied by a significant degradation of Virtual Network
Function (VNF) performance in terms of throughput or energy consumption. To address these challenges, the
use of hardware-accelerators, e.g. FPGAs and GPUs, to offload CPU-intensive operations from performance-critical
VNFs has been proposed.
Allocation of NFV infrastructure (NFVi) resources for VNF placement and chaining (VNF-PC) has been a major
area of research recently. A variety of resource allocation models have been proposed to achieve various operator
objectives, e.g. minimizing CAPEX, OPEX, latency, etc. However, the VNF-PC resource allocation problem for the
case when NFVi incorporates hardware-accelerators remains unaddressed. Ignoring hardware-accelerators in NFVi
while performing resource allocation for VNF-chains can nullify the advantages resulting from the use of
hardware-accelerators. Therefore, accurate models and techniques for the accelerator-aware VNF-PC (VNF-AAPC) are
needed in order to achieve the overall efficient utilization of all NFVi resources including hardware-accelerators.
This paper investigates the problem of VNF-AAPC, i.e., how to allocate usual NFVi resources along with
hardware-accelerators to VNF-chains in a cost-efficient manner. In particular, we propose two methods to tackle the
VNF-AAPC problem. The first approach is based on Integer Linear Programming (ILP), which jointly optimizes VNF
placement, chaining and accelerator allocation while conforming to all NFVi constraints. The second approach is
a heuristic-based method that addresses the scalability issue of the ILP approach. The heuristic solves the
VNF-AAPC problem by following a two-step algorithm.
The experimental evaluations indicate that incorporating accelerator-awareness in VNF-PC strategies can help
operators to achieve additional cost-savings from the efficient allocation of hardware-accelerator resources.
2 Keywords
Hardware-Accelerators; NFV; VNF; Placement; Chaining; Allocation; FPGA; GPU
3 Introduction
The incessant expansion in the number of connected users and network-services has resulted in exponential growth
of traffic on the networks of telecom-operators. Telecom infrastructure thus needs to be scaled periodically to cope
with the increasing traffic demands, which results in high Capital Expenditure (CAPEX) and Operational Expenditure
(OPEX). However, the growth in Average Revenue Per User (ARPU) has been marginal due to the
cut-throat competition among the operators. As a result, operators are forced to seek new network architectures
that are scalable, agile and cost-efficient [1].
Network Function Virtualization (NFV) is a technology which leverages IT virtualization techniques for consolidating
network appliances onto commercial-off-the-shelf (COTS) server machines. NFV aims to replace Network
Functions (NFs) based on proprietary ASICs, also known as middleboxes, by their software instances running on
general-purpose platforms consisting of x86 or ARM based high-volume servers (HVS). The software implementation
of an NF running in a virtualized environment is called a Virtual Network Function (VNF). Fig. 1 shows the
reference architecture of NFV as proposed by ETSI [2]. The purpose of the virtualization layer is to abstract the
NFV Infrastructure (NFVi), which includes the compute, storage and networking resources, from VNFs running
over it. Various virtualization technologies, e.g. VMs and containers, are exploited for the realization of the
virtualization layer.
Figure 1: Reference NFV architecture by ETSI [2]. Components shown in blue need to be added/updated as a
result of the inclusion of hardware-acceleration in NFVi.
NFV offers several advantages, such as the reduction in CAPEX and OPEX, faster time-to-market (TTM) of services, ease of service
management and upgrade, etc. [1]. Despite these advantages, replacing NF middleboxes with
VNFs can have a detrimental effect on their packet-processing performance, e.g. loss of throughput and/or
non-deterministic latency. Furthermore, the growth in the computational capacity of CPUs is flattening over time
due to the expected end of Dennard scaling and Moore's law in the coming years [3]. Performance improvement
of software-based packet-processing platforms is expected to fall short of the increasing data traffic
on telecom networks. Therefore, matching the performance of middleboxes will remain one of the key challenges faced
by operators with regard to widespread NFV adoption. This challenge has led to a recent
interest in hardware-acceleration techniques for VNFs using externally connected hardware devices, e.g. Graphics
Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), Network Processing Units (NPUs), etc.
Hardware-accelerators and CPUs can be used in conjunction such that CPU-intensive tasks are offloaded from
VNFs to hardware-accelerators and the rest of the VNF operations are performed by the CPU of general-purpose
hardware (COTS servers). As a consequence, an improvement in the overall packet-processing performance can be
achieved.
Due to the upward trend of outsourcing network processing to the cloud, data centers (DCs) are being considered
as NFVi. Energy costs in a DC, which include the cost of energy spent in servers, switches and cooling,
constitute the main share of the OPEX. A large number of VNF CPU cycles are consumed in packet-processing
tasks which would consume only a fraction of the energy if implemented in hardware. For example, using
hardware-acceleration to offload iFFT/FFT in cloud-RAN (C-RAN) scenarios to FPGAs, GPUs or DSPs can result in a power
saving of about 70% per carrier [4]. Moreover, additional VNFs can be accommodated on the same NFVi
as some CPU cores are freed because of the offload to hardware-accelerators.
Accelerator resources are being increasingly integrated with the NFVi layer along with the usual compute, network
and storage resources (Fig. 1). However, the current Management and Orchestration (MANO) layer is mostly unaware of the
acceleration requirements of VNFs and the location of hardware-accelerators in NFVi. As a result, the resource
allocation decisions taken by the orchestrator are agnostic to VNF requirements and the locality of hardware-accelerator
resources. This could lead to sub-optimal utilization of NFVi resources. In particular, the inefficient allocation of
hardware-accelerator resources can negate the advantages resulting from the use of hardware-accelerators in NFV
environments.
An overview of the accelerator-agnostic and accelerator-aware resource allocation procedures for VNF instantiation
is depicted in Fig. 2 (a) and (b) respectively [2]. In the regular accelerator-agnostic VNF orchestration procedure
(Fig. 2 (a)), the NFV Orchestrator (NFVO) first validates the received VNF instantiation request and passes
the corresponding VNF descriptor (VNFD) to the VNF Manager (VNFM). As the VNFM is agnostic to the accelerator
requirement of the VNF and to the existence of any offload capability in NFVi, it requests the reservation of regular
NFVi resources (compute, storage, and network) via the Virtual Infrastructure Manager (VIM), which in turn allocates
VMs/containers for the VNF and attaches them to the network. The VIM acknowledges the NFVO when the resource
reservation is complete. Further, deployment-specific configuration of VMs/containers can be performed through the
corresponding VNFM after the VNF instantiation is completed. The instantiated VNF cannot offload its operations
to a hardware-accelerator as it is not allocated any such special resource. However, the NFVO can ask the VIM to
reserve hardware-accelerator resources for the VNF if it is aware of specific VNF requirements and the presence of
offload capabilities in NFVi, as shown in Fig. 2 (b). After processing the accelerator requirement mentioned in the
VNFD, the VNFM requests resource allocation including hardware-accelerator resources. The instantiated VNF
can now offload specific operations depending on the available types of accelerator implementations and the amount of
resources.
Figure 2: Processes involved in (a) accelerator-agnostic and (b) accelerator-aware VNF instantiation.
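The difference between the two procedures of Fig. 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Vnfd` fields, the `build_reservation` function, the request dictionary and all numeric values are hypothetical names and figures introduced here, not part of the ETSI MANO interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Vnfd:
    """Hypothetical, heavily simplified VNF descriptor."""
    name: str
    cpu_cores: int
    storage_gb: int
    bandwidth_mbps: int
    accelerator: Optional[str] = None  # e.g. "aes-256"; None if no offload is requested


def build_reservation(vnfd: Vnfd, accelerator_aware: bool) -> Dict[str, object]:
    """Return the resource request a VNFM would pass on to the VIM (assumed shape)."""
    request: Dict[str, object] = {
        "compute": vnfd.cpu_cores,
        "storage": vnfd.storage_gb,
        "network": vnfd.bandwidth_mbps,
    }
    # Only an accelerator-aware VNFM processes the accelerator requirement in the VNFD.
    if accelerator_aware and vnfd.accelerator is not None:
        request["accelerator"] = vnfd.accelerator
    return request


# Illustrative values only.
ipsec = Vnfd("ipsec", cpu_cores=4, storage_gb=10, bandwidth_mbps=1000, accelerator="aes-256")
agnostic_request = build_reservation(ipsec, accelerator_aware=False)
aware_request = build_reservation(ipsec, accelerator_aware=True)
```

In the agnostic case the accelerator requirement is silently dropped, which is why the instantiated VNF has nothing to offload to.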
Hardware-accelerator resources have not yet been considered
in the existing resource allocation models for NFV. VNF Placement and Chaining (VNF-PC) is the most important
component of the NFV resource allocation procedure. VNF-PC is considered an NP-hard problem and has been
widely researched in the literature [5]. With the inclusion of hardware-accelerator resources in NFVi, solving
only the VNF-PC problem is not sufficient to obtain an efficient allocation of NFVi resources. The VNF-PC problem
needs to be altered in order to incorporate the resource allocation component for hardware-accelerators. We refer
to this new problem as the Accelerator-aware VNF Placement and Chaining Problem (VNF-AAPC). Our objective
in this paper is to model the VNF-AAPC problem and propose a scalable approach to solve this problem in a
time-efficient manner. To address this objective, we make the following contributions in
this paper:
1. To obtain optimal solutions for the VNF-AAPC problem, we present an Integer Linear Program (ILP)
formulation of this problem. It is a single-step exact method which jointly optimizes three decisions, namely
(i) VNF placement, (ii) chaining and (iii) accelerator allocation.
2. We design an efficient heuristic to solve the VNF-AAPC problem for DC topologies. This heuristic is particularly
useful for large-size instances of the VNF-AAPC problem where the ILP model becomes too time-consuming
to solve.
3. We also evaluate the ILP model and the proposed heuristic on two different data-center topologies. Furthermore,
we compare the performance of accelerator-agnostic and accelerator-aware heuristics. Additionally, we
present an analysis of the achievable cost-savings resulting from the use of hardware-accelerators in NFVi.
Section 4 of this paper discusses hardware-acceleration in NFV environments. Relevant
literature in the domain of NFV resource allocation is presented in Section 5. Section 6 gives an overview of the
VNF-AAPC problem and Section 7 describes its ILP formulation. The proposed heuristic to solve the VNF-AAPC problem is discussed in Section 8.
Performance evaluation results and a comparison of the ILP model with our heuristic are reported in Section 9. Finally,
future work and conclusions of this paper are presented in Section 10.
4 Hardware-acceleration in NFV
The transition of telecom network architectures from purpose-built network appliances to VNFs running on
COTS servers still faces multiple challenges [1], [6]. One of the key obstacles is the virtualization of all NFs without
breaking the Service Level Agreements (SLAs) of network services. It has been observed in many instances
that the performance of VNFs is significantly degraded as compared to their hardware counterparts [7]. The authors
of [7] investigated the impact of virtualizing a firewall on its packet-processing performance. Processing latency in
the virtual firewall could reach ten times the processing latency of the hardware firewall. Performance
benchmarking of IPSec is reported in a white paper by Intel [8]. The results show that the processing of 48 Gbps
IPSec traffic requires on average 9.5 CPU cores. The same traffic can, however, be processed using 4.6 CPU
cores when AES-GCM de/encryption is accelerated using a hardware-accelerator, saving about half of
the CPU cores. Therefore, not just the performance boost of VNFs but also the overall reduction in CPU utilization
motivates the use of hardware-accelerators in NFV environments.
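The "about half" figure follows directly from the two core counts quoted from [8]. A minimal check in Python, where the helper name `core_saving` is introduced here purely for illustration:

```python
def core_saving(cores_software: float, cores_offloaded: float) -> float:
    """Fraction of CPU cores freed when part of the work is offloaded."""
    if cores_software <= 0:
        raise ValueError("cores_software must be positive")
    return (cores_software - cores_offloaded) / cores_software


# Core counts for 48 Gbps IPSec traffic as quoted in the text from [8].
saving = core_saving(9.5, 4.6)
print(f"CPU cores saved: {saving:.1%}")  # about 51.6%
```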
A large number of VNFs involve CPU-intensive tasks like de-duplication, cryptography, compression, etc. [8], [9],
[10], [11], [12], [13], [14]. The software implementation of these tasks has been found to be very energy inefficient
(in terms of the number of operations performed per unit of energy consumed) as compared to their hardware implementation, resulting in
excessive CPU utilization. The motivation behind using hardware-accelerators in NFV environments is that specific
VNF components run more efficiently if implemented in hardware as opposed to software running on the CPU of
a general-purpose COTS server. For example, GPUs have been used to speed up video-transcoding applications
(H.264 and H.265) by 9.6x over a software-only solution while being 6.4x more energy efficient [13].
Packet-processing in a VNF is usually accomplished by sequentially executing instructions of the VNF (VM/container)
on one or more CPU cores. The packet-processing paradigm in architectures like FPGAs, NPUs, and GPUs is
fundamentally different from that of a CPU. A GPU chip consists of thousands of computational cores to which
execution units, also known as GPU threads, can be delegated. Each GPU core executes the same NF on different
packets sent by the CPU to the GPU memory [15]. With the large thread-level parallelism of GPUs and good
memory communication (low latency and high bandwidth), high packet-processing performance can be achieved.
GPUNFV is a GPU-based NFV system which demonstrated line-rate packet-processing for stateful VNFs (e.g.
flow monitor, firewall) by exploiting the parallelism of GPUs [16]. FPGAs, on the other hand, contain millions of logic
elements, each of which contains lookup tables (LUTs) for implementing combinational logic and registers to store
intermediary results. FPGAs also contain Block RAM (BRAM) to store large amounts of data that need to
be read from (written to) the main memory (RAM). Logic elements on an FPGA can be configured to realize
different packet-processing functionalities. The parallelism in CPUs and GPUs is limited by the number of cores they
have. In contrast, due to the massive gate-level parallelism available on an FPGA, many processing tasks can
be easily pipelined [17]. As a result, packet-processing tasks in VNFs can be offloaded to an FPGA very efficiently.
Hardware-acceleration can be applied to a variety of VNFs that can benefit from the different kinds of parallelism
available on hardware-accelerators. Table 1 lists various VNFs alongside their sub-tasks that can benefit from
offloading to hardware-accelerators. VNFs which contain components like cryptography, compression, de/encoding,
etc., can be very efficiently offloaded to hardware-accelerators. Using the example of an IPSec VNF, we next describe
the most common approach to hardware-acceleration in NFV, i.e., FPGA look-aside acceleration.
Table 1: VNFs and their sub-tasks that can benefit from offloading to hardware-accelerators.
VNF | Sub-tasks | References | Reported benefit
 |  | [8], [9], [10], [11] | CPU usage reduction of 50% and 94%
 |  | [11], [12] | 20x throughput improvement [12]
Media Transcoding | VP8, H.264, H.265 | [13] | 9.6x gain in performance (FPS)
 |  | [4], [14] | C-RAN power consumption reduction from 70 W/carrier to 18 W/carrier when iFFT/FFT are offloaded. Turbo decoding time can be reduced by 50-60% by offloading it to an accelerator.
Dedup | Rabin hash, marker selection, chunk hash | [12] | 8.2x improvement in throughput over software-only Dedup VNF
4.1 VNF hardware-acceleration example
IPSec tunneling is one of the most popular ways of securing inter-network communication between branch-offices
of an enterprise or LTE networks via encrypted tunnels [18]. Fig. 3 (a) shows a standard IPSec setup. At one
end of the IPSec tunnel, a VM containing an IPSec application (e.g. libreswan) is running on a server. The IPSec
VNF must perform all the required cryptographic functions (en/decryption and SHA) on IPSec packets. These
functions are usually provided by a software library (e.g. SSL) which contains implementations of various ciphers
(e.g. DES-128, AES-128,256) and hashes (e.g. md5, SHA-256,512). Nowadays, certain CPUs (e.g.
x86 processors from Intel and AMD) offer AES-NI and SHA-NI instructions dedicated to de/encryption and hashing operations, which
result in better performance as compared to traditional CPU architectures. Despite this improvement, a
large number of CPU cores are still required to process IPSec traffic at line-rate, e.g., 9.5 CPU cores are
required to handle IPSec traffic at 48 Gbps [8] as compared to only 3.3 CPU cores for processing plain IP traffic
(without IPSec). Moreover, the packet-processing cost (CPU cycles/packet) varies with the packet size, which makes
a software-based IPSec solution inefficient for longer IPSec packets (> 1200 B).
Next, we describe the look-aside VNF hardware-acceleration approach taking IPSec as an example. A hardware
designer typically first writes the required hardware-accelerator (e.g. AES-256, SHA-512) in a Hardware Description
Language (HDL), e.g. VHDL or Verilog. The HDL design is then compiled to a programming file, called a bitfile,
using FPGA synthesis and implementation tools. The bitfile is then used to program the FPGA fabric in order
to instantiate the desired accelerator function. The accelerator can later be modified, or a new accelerator can
be instantiated, by re-programming the FPGA fabric with a bitfile corresponding to the new accelerator. This
makes FPGAs re-programmable, unlike ASICs, which offer only a limited amount of configuration. In Fig. 3 (b), the
AES (encryption and decryption) and SHA hash accelerators are instantiated by downloading their bitfiles to the
FPGA card. Now, AES-256 de/encryption and SHA-512 hash operations can be offloaded from the IPSec VNF to
accelerators running on the FPGA card [8]. For each IPSec packet, its payload is sent to the accelerator memory
in order to perform the required cryptographic functions. After the function computation is over, the result of
the operation is copied back from the accelerator memory to the main memory. The communication between the
main memory and accelerators is accomplished via the PCIe bus. The overhead due to communication between
CPUs and accelerators becomes insignificant for large packet sizes. Moreover, hybrid chips like Intel Xeon+FPGA
integrated-FPGA CPUs provide a tight coupling between CPUs and FPGAs, so that both CPUs and FPGAs can
access the same memory and avoid excessive overhead due to data transfers between them [19]. Nevertheless,
many CPU cores are relieved from performing intensive cryptographic operations, thus a large number of CPU
Figure 3: Illustration of the setup in (a) non-accelerated and (b) accelerated operation of the IPSec VNF.
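One way to read the remark that the CPU-accelerator communication overhead becomes insignificant for large packets is that a fixed per-transfer cost is amortized over work that grows with the packet size. The sketch below illustrates this reading only; the split into a fixed and a size-proportional part, the function name and both parameter values are assumptions made here, not measurements from the paper or from [8].

```python
def fixed_overhead_fraction(packet_bytes: int,
                            fixed_overhead_us: float = 2.0,
                            per_byte_us: float = 0.01) -> float:
    """Share of the per-packet offload time spent on the fixed transfer cost.

    fixed_overhead_us: assumed size-independent cost of one look-aside round trip.
    per_byte_us: assumed size-proportional cost (data transfer plus processing).
    """
    size_dependent_us = packet_bytes * per_byte_us
    return fixed_overhead_us / (fixed_overhead_us + size_dependent_us)


# The fixed share shrinks monotonically as packets get larger.
fractions = {size: fixed_overhead_fraction(size) for size in (64, 256, 1024, 1500)}
for size, share in fractions.items():
    print(f"{size:>5} B: fixed overhead share = {share:.2f}")
```

Under these assumed parameters the fixed share drops from roughly three quarters for 64 B packets to roughly one eighth for 1500 B packets.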
4.2 Trade-offs
Although the highest programmability and flexibility can be achieved by running VNFs on CPUs (x86 or ARM), NF
implementations based on technologies like GPUs, FPGAs and NPUs could be necessary for some performance-critical
VNFs. Therefore, the spectrum of VNF implementation technologies results in a variety of solutions, ranging from
highly flexible, full-software NFs at one end to high-performance ASIC implementations at the other,
with hardware-accelerated VNFs situated in between. The authors of [18] proposed an architecture for the unified
handling and abstraction of hardware-accelerators in order to ease the manageability of accelerators. A virtual
accelerator layer along with standard interfaces can be used in order to avoid compatibility and portability issues.
This also helps to separate the concerns of VNF developers and hardware-accelerator designers. By abstracting
hardware-accelerators, the same VNF image can be used with many hardware-accelerators without any modification.
Fig. 4 compares various VNF implementation technologies based on their performance
and flexibility metrics [18]. A purpose-built ASIC implementation of an NF offers the highest packet-processing
performance, but only very limited configuration is possible, e.g. the update of forwarding tables in a router. On the
other hand, platforms based on COTS servers offer huge programmability/flexibility, e.g. the update of protocols, at
the cost of performance. Although devices with intermediate performance and flexibility, e.g. GPUs and FPGAs,
can also be used to realize full VNFs, the more complex the packet-processing task is, the more challenging
it is to implement on an FPGA or GPU. Hybrid platforms with a combination of CPU + hardware-accelerator
(CPU+FPGA or CPU+GPU) are the most popular approach to achieving high performance without losing too
much programmability/flexibility. In hybrid platforms, the performance-critical tasks, e.g. en/decryption and
hashing, are implemented in hardware while other complex tasks still run in software on a CPU.
Taking into account service requirements and the trade-offs of various technologies, telecom operators or third-party
VNF developers have to select the right platform for their VNF implementation. For example, an IPSec VNF running
on a CPU, when offloaded to an FPGA, can improve its throughput and halve its CPU usage for a small additional
investment. Due to their widespread popularity in packet-processing applications, we focus only on FPGAs as
hardware-accelerators for VNFs. However, the models and heuristics proposed in this paper could be easily adapted
to different types of hardware-accelerators depending upon their nature. There are two popular modes of using
hardware-accelerators in NFV environments, namely look-aside and bump-in-the-wire [20]. The look-aside mode
of hardware-acceleration is generally used to offload compute-intensive algorithms, e.g., offloading the crypto-operations
of IPSec to an FPGA. "Bump-in-the-wire" (in-line) is the other mode, where packets are processed on the fly, e.g.
on P4 switches or smartNICs, as they are transferred to/from the network. Bump-in-the-wire is therefore the
preferred mode for accelerating the first/last VNFs of a VNF-chain [3]. In this paper, our focus
is on modelling scenarios with the look-aside mode of acceleration. However, to accommodate scenarios with
bump-in-the-wire acceleration, appropriate constraints regarding position-aware acceleration and latency
requirements can be added to the proposed model.
Figure 4: Comparison of various technologies for VNF implementation [18]. Green region: CPU+GPU and Orange
region: CPU+FPGA.
accelerator cards. The network packets reaching a VNF running on a host are transferred to a particular
accelerator instance over the PCIe bus. Offloading packet processing from stateless VNFs to accelerator instances is
straightforward, as the order of incoming packets is not important. To offload stateful VNFs, where the state of
a VNF is required to process packets, input packets along with the VNF state are transferred to the accelerator
instance [16]. The state in the VNF is updated after the completion of processing in the accelerator instance.
5 Related Works
Various mathematical models and algorithms have been proposed to tackle the VNF-PC problem. A solution to the
VNF-PC problem allocates NFVi resources for the placement and chaining of VNFs. This problem is
similar to the Virtual Network Embedding (VNE) problem, a well-known problem in the area of network virtualization
[5]. In the VNF-PC problem, VNFs are equivalent to the virtual nodes of VNE, which are chained by virtual links. In
addition to that, VNF-PC has an accompanying optimization goal which is described by the objective function
of the problem. The objective function could be the minimization of power consumption or of the required number of
server-nodes and links, or the maximization of resiliency, QoS, net profit, etc.
The VNF-PC problem has been tackled using two different approaches in the past. The first approach is to exploit
exact methods that result in an optimal solution, but this approach is generally useful only for small-scale instances of
the VNF-PC problem. The other approach is to use heuristics and thereby trade a
small amount of efficiency for scalability.
Using an ILP formulation, the authors of [21] modeled resource allocation in a hybrid NFV scenario where services
are provided using both dedicated hardware appliances and VNFs. This model was evaluated using two types of
service chain requests and a small service provider scenario. A Mixed Integer Quadratically Constrained Program
(MIQCP) model for the VNF-PC optimization problem was introduced in [22]. Pareto set analysis was performed
to investigate the trade-offs between three different objective functions. The evaluation of the model shows that the
objective function (e.g. minimization of latency, link utilization or allocated nodes) has a direct impact on the
VNF placement and chaining. The authors of [23] formulated the multi-objective VNF-PC problem considering both
legacy Traffic Engineering (TE) ISP goals and combined TE-NFV goals.
Because of the inherent complexity of the VNF-PC problem, exact approaches based on ILP/MILP become impractical
for realistic network sizes. Therefore, many heuristic-based algorithms have also been proposed to solve this problem
in a reasonable time.
The problem of Elastic VNF Placement (EVNFP) was studied in [24] and an ILP model was presented for minimizing
operational costs in NFV scenarios. The authors also developed an algorithm called Simple Lazy Facility Location
(SLFL) in order to solve the EVNFP problem in polynomial time. Evaluations show that SLFL reduced operational
costs by 5-8% and also increased the request acceptance rate by 2x as compared to the first-fit and random
alternatives.
S. Sahhaf et al. studied the decomposition and embedding of network services in [25]. An ILP model was proposed
whose objective was to minimize the total cost due to the mapping of different decomposed VNF components
(e.g. VM, container, DPDK) to the physical nodes in NFVi. A heuristic algorithm consisting of two phases,
backtracking and mapping, was also proposed. The experimental results show a decrease in mapping cost and an
increase in the request acceptance ratio in the long run for both the ILP and heuristic approaches.
F. Carpio et al. studied the problem of network load balancing for the deployment of Service Function Chains
(SFCs) [26]. In particular, the authors addressed the problem of distance-to-data center by the use of VNF replicas
in order to load balance the network. Three approaches (an ILP model, a Genetic Algorithm, and a random-fit placement
algorithm) were designed and compared to realize an efficient VNF placement and replication method in an NFV
environment.
Although many resource allocation studies have been carried out in the past, only two studies have considered
hardware-accelerators in their models. H. Fan et al. proposed an architecture to implement uniform deployment
and allocation of accelerator resources in NFV environments [27]. The authors proposed an algorithm to achieve
efficient allotment of accelerator resources in forwarding and server nodes. The algorithm takes as input the network
topology and the capacity of physical resources, and outputs the amount of accelerator resources that should be provided
on forwarding or server nodes. This study concerns the optimization of accelerator resource provisioning, not
the optimization of accelerator allocation to VNFs.
The concept of heterogeneous components has been described in [28]. A heterogeneous service consists of multiple
implementation options that could be deployed to serve the dynamic requirements of the service. The paper studied
the problem of joint Scaling, Placement and Routing (SPRING) for heterogeneous services. To address the SPRING
problem, a MILP formulation and a heuristic algorithm were proposed. The SPRING model focuses on efficient
resource allocation in heterogeneous infrastructure with lower processing times. However, that work does not consider the
distribution of hardware-accelerator resources in a data-center. Furthermore, its VNF-PC decisions do not take into
account the communication between the hardware-accelerator and the CPU on a server-node.
This work is a major extension of our previous work, where we modeled only VNF placement in a heterogeneous
NFV environment [29] using a best-fit based approach. Here, we address the complete problem of accelerator-aware
VNF placement and chaining along with a thorough evaluation of the ILP model and heuristics.
6 Problem Overview
Services in the NFV domain are realized by processing network traffic through a sequence of VNFs. In order to
fully exploit the benefits of NFV technology, it is necessary to efficiently allocate NFVi resources to VNF-chains.
Resource allocation requires a mapping of the service's VNF Forwarding Graph (VNF-FG) to NFVi resources [5].
A VNF-FG consists of nodes representing VNFs and edges representing virtual links between VNFs. Therefore, the
mapping process can be thought of as a two-step process, namely (i) VNF placement and (ii) VNF chaining. "VNF
placement" involves the assignment of VNFs to COTS servers, whereas the "VNF chaining" step involves the allocation of
a path in the physical network to every virtual link of the VNF-FG. "VNF chaining" ensures the appropriate steering
of network traffic through the sequence of VNFs constituting the service. Together, these two steps are referred to as the
VNF placement and chaining (VNF-PC) problem.
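The two mapping steps can be sketched on a toy instance as follows. The physical topology, the VNF names and the fixed placement are hypothetical, and breadth-first shortest-path routing is used only as a simple stand-in for the chaining step; it is not the method proposed in this paper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical physical network (adjacency list): two servers, two switches.
PHYSICAL_LINKS: Dict[str, List[str]] = {
    "server1": ["switch1"],
    "server2": ["switch2"],
    "switch1": ["server1", "switch2"],
    "switch2": ["server2", "switch1"],
}

# Hypothetical VNF-FG: a chain of three VNFs joined by two virtual links.
VNF_FG_EDGES: List[Tuple[str, str]] = [("fw", "nat"), ("nat", "ids")]

# Step (i), VNF placement: every VNF is assigned to a COTS server.
PLACEMENT: Dict[str, str] = {"fw": "server1", "nat": "server1", "ids": "server2"}


def shortest_path(src: str, dst: str) -> List[str]:
    """Breadth-first search for a hop-count shortest path in the physical network."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in PHYSICAL_LINKS[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    raise ValueError(f"no physical path from {src} to {dst}")


def chain(placement: Dict[str, str],
          edges: List[Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Step (ii), VNF chaining: map every virtual link to a physical path."""
    return {(u, v): shortest_path(placement[u], placement[v]) for u, v in edges}


paths = chain(PLACEMENT, VNF_FG_EDGES)
```

Virtual links between VNFs placed on the same server map to a single-node path, whereas the link between `nat` and `ids` consumes bandwidth on three physical links.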
In addition to the usual compute, network and storage resources, NFVi also includes hardware-accelerator resources.
With the inclusion of hardware-accelerators in NFVi, VNF-PC models must be revised. In order to ensure efficient
utilization of all NFVi resources, both placement and chaining decisions should take into account the accelerator
resources (e.g. total logic elements and BRAM of FPGAs, cores/threads of GPUs) along with the usual NFVi
resources, i.e. compute, storage and network. This problem will be referred to as the accelerator-aware VNF placement
and chaining (VNF-AAPC) problem.
We motivate the importance of modeling the VNF-AAPC problem with a simple example illustrated in Fig. 5. As input, the NFVi consists of five server-nodes, each with 5 CPU cores, connected with each other as shown in Fig. 5. One of the server-nodes is equipped with a hardware-accelerator card connected over the PCIe bus. The objective of the VNF-PC problem is to deploy VNF-chains $s_1$ and $s_2$ using as few server-nodes as possible. The CPU requirement of each VNF is indicated in the box above it. VNF $f^1_2$ is an 'accelerate-able' VNF, i.e., it consumes 4 CPU units when it is not accelerated and 2 CPU units when it can offload its operations to an accelerator on a hardware-accelerator card. For the sake of simplicity, we assume sufficient bandwidth is available on physical links for the chaining of VNFs. The result of the usual (accelerator-agnostic) VNF placement method, where only CPU resources are considered, is shown in Fig. 5 (a). In total, five server-nodes are required for the deployment of $s_1$ and $s_2$. With the accelerator-aware strategy, however, only four server-nodes are required for the placement of the same VNF-chains, as shown in Fig. 5 (b). This is because the VNF $f^1_2$ is deployed on a server-node attached with a hardware-accelerator card.
Figure 5: Illustration comparing VNF placement in accelerator-agnostic and accelerator-aware VNF placement
scenarios. The CPU requirement of each VNF is indicated in the box above it.
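The effect of acceleration on CPU demand in this example can be sketched in a few lines of Python. This is only an illustration: the 5-core server and the 4/2 CPU units of the accelerate-able VNF come from the example above, whereas the co-located VNF `g` and its 3-core demand are hypothetical values chosen to show when consolidation becomes possible.

```python
from dataclasses import dataclass


@dataclass
class VNF:
    name: str
    cpu0: int        # cores needed without acceleration
    cpu_r: int = 0   # cores saved when offloading to an accelerator


def effective_cpu(vnf, accelerated):
    """CPU demand of a VNF, reduced by cpu_r if it is accelerated."""
    return vnf.cpu0 - (vnf.cpu_r if accelerated else 0)


def fits(vnfs, accelerated_names, capacity):
    """True if all VNFs fit together on a server with `capacity` cores."""
    used = sum(effective_cpu(v, v.name in accelerated_names) for v in vnfs)
    return used <= capacity


f12 = VNF("f12", cpu0=4, cpu_r=2)   # values taken from the example
g = VNF("g", cpu0=3)                # hypothetical co-located VNF

without_accel = fits([f12, g], set(), capacity=5)    # 4 + 3 cores needed
with_accel = fits([f12, g], {"f12"}, capacity=5)     # 2 + 3 cores needed
print(without_accel, with_accel)
```

Under these assumed numbers, the two VNFs share one server only when `f12` is accelerated, which is the kind of saving that lets the accelerator-aware strategy use fewer server-nodes.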
7 ILP Formulation
Next, we introduce the notation, decision variables, objective function and constraints required for the ILP formulation of the VNF-AAPC problem. The ILP model for the VNF-AAPC problem provides a single-step method for obtaining an optimal resource allocation.
Table 2 describes the notation used in the formulation. The NFVi network is represented by a connected directed graph $G = (N, E)$. The set $N$ consists of all the physical nodes in the NFVi network and $E$ represents all the physical links between nodes. A node can be a computational device (e.g. a COTS server) or a forwarding device (e.g. a switch). $N^c \subset N$ denotes the set of all COTS servers having the computational resources required to run VNFs. The capacity of the different resource types in a node $n \in N^c$ is denoted by three parameters: $R_{cpu}(n)$, $R_{bus}(n)$ and $R_{acc}(n)$. $R_{cpu}(n)$ denotes the total number of CPU cores available for running VNFs on node $n$. The IO communication capacity of a node depends on the bandwidth (Mbps) of the PCI(e) bus, which is denoted by $R_{bus}(n)$. The same PCI(e) bus is shared for two tasks: first, communication with accelerators and, second, sending/receiving packets to/from the network using the NIC.
The amount of resources present on the hardware-accelerator card attached to server-node $n$ is represented by $R_{acc}(n)$. We use $R_{acc}(n)$ only to denote the total number of logic elements present on an FPGA board. However, other resources, such as the amount of BRAM available on an FPGA, can be represented similarly.
$A$ is a catalog of all types of accelerator implementations available for instantiation on the hardware-accelerator cards attached to server-nodes. For example, if FPGA bitfile implementations are available only for AES and SHA accelerators, i.e. $A = \{AES, SHA\}$, en/decryption (AES) or hashing (SHA) tasks from an IPSec VNF can be offloaded to AES and SHA accelerators running on an FPGA card. However, tasks like the en/decoding involved in the vTC (video transcoding) VNF cannot be offloaded using any accelerator implementation present in $A$.
Each implementation of an accelerator type $a \in A$ requires a certain amount of resources, represented by $r(a)$, on the hardware-accelerator card. Again, $r(a)$ can be used to represent the requirement of any type of resource on a hardware-accelerator card. In our formulation, $r(a)$ denotes only the number of logic elements required to implement an accelerator of type $a$ on an FPGA card.
We assume each service-request $s$ received by the telecom operator consists of a VNF-FG $G_s$ and a corresponding bandwidth requirement ($t_s$). The VNF-FG $G_s = (F_s, L_s)$ of the service-request $s$ consists of a set of VNFs $F_s$ and a set of virtual links between VNFs denoted by $L_s$. Two consecutive VNFs of the service-request $s$, denoted by $f^s_k$ and $f^s_{k+1}$, are connected by the virtual link $(f^s_k, f^s_{k+1}) \in L_s$. For the sake of simplicity, we assume the traffic compression ratio of every VNF is 1. This implies that the amount of traffic ($t_s$) does not change while passing through a sequence of VNFs.
The CPU requirement of a VNF $f^s \in F_s$, in terms of the total number of cores required, is denoted by $cpu_0(f^s)$, and the reduction in the total number of cores due to offloading is denoted by $cpu_r(f^s)$. In other words, $cpu_0(f^s)$ denotes the number of CPU cores required by the VNF $f^s$ to process the network traffic arriving at the rate $t_s$. The type of accelerator required for offloading a VNF $f^s$ is denoted by $atype(f^s)$.
$\alpha^n_{f^s}$ is a binary variable indicating whether VNF $f$ of the service-request $s$ is placed on node $n$. Allocation of an accelerator to VNF $f^s$ placed on node $n$ is indicated by $\beta^n_{f^s}$. A computational node $n \in N^c$ is said to be in use if at least one VNF is placed on $n$; this is denoted by a binary variable $x_n$. Instantiation of an accelerator of type $a$ on the hardware-accelerator card attached to $n$ is indicated by a binary variable $\delta^n_a$. $y^{n_i,n_j}_{f^s_k,f^s_{k+1}}$ is an indicator variable denoting whether the physical path in $G$ to which the virtual link $(f^s_k, f^s_{k+1}) \in L_s$ is mapped contains the physical link $(n_i, n_j)$.
In a scenario where a telecom operator leases server-nodes from an Infrastructure Provider (InP) to deploy VNF-chains, she ought to acquire as few server-nodes as possible. The cost of a server-node is included in the total cost if that node is used to host at least one VNF. The cost of using a computational node $n \in N^c$ is denoted by $c_n$ (in $). Usually, parameters like $R_{cpu}(n)$, $R_{bus}(n)$ and $R_{acc}(n)$ determine the value of $c_n$.
Table 2: Description of parameters and decision variables

Input parameters

| Notation | Description |
|---|---|
| $G$ | Directed graph $G = (N, E)$ representing the network. |
| $N$ | Set of all forwarding and computational nodes within the network. |
| $N^c$ | Set $N^c \subset N$ of all nodes of the network with positive computational resources (all server-nodes). |
| $b(n_i, n_j)$ | Maximum bandwidth (in Mbps) of a physical link $(n_i, n_j) \in E$. |
| $R_{cpu}(n)$ | Maximum CPU resources (in total number of CPU cores) available on $n \in N^c$. |
| $R_{acc}(n)$ | Maximum accelerator-fabric resources (in total number of logic elements) available on $n \in N^c$. |
| $R_{bus}(n)$ | Maximum bandwidth (in Mbps) of the PCIe bus of node $n \in N^c$. |
| $A$ | Set of all available accelerator types (in the NFVi). |
| $r(a)$ | Resource requirement (logic elements) of the accelerator type $a \in A$. |
| $S$ | Set of all VNF-chains. |
| $G_s$ | Directed graph $G_s = (F_s, L_s)$ representing the VNF-FG of request $s \in S$. |
| $F_s$ | Set of all VNFs in the VNF-FG of the VNF-chain $s \in S$. |
| $L_s$ | Set of all directed virtual links in the VNF-FG of the VNF-chain $s \in S$. |
| $t_s$ | Throughput requirement (Mbps) of the VNF-chain $s \in S$. |
| $cpu_0(f^s)$ | CPU requirement (cores) of VNF $f \in F_s$. |
| $cpu_r(f^s)$ | CPU reduction (cores) for VNF $f \in F_s$. |
| $atype(f^s)$ | Type of accelerator needed for acceleration of VNF $f \in F_s$. |
| $c_n$ | Cost ($) of running a computational node $n \in N^c$. |

Decision variables

| Notation | Description |
|---|---|
| $\alpha^n_{f^s}$ | Binary variable indicating if VNF $f^s$ of VNF-chain $s$ is placed on $n$. |
| $\beta^n_{f^s}$ | Binary variable indicating if VNF $f^s$ of VNF-chain $s$ is accelerated on $n$. |
| $x_n$ | Binary variable indicating if computational node $n \in N^c$ is used for hosting at least one VNF. |
| $y^{n_i,n_j}_{f^s_k,f^s_{k+1}}$ | Binary variable indicating if the physical path to which the virtual link $(f^s_k, f^s_{k+1})$ is mapped contains the physical link $(n_i, n_j) \in E$. |
Next, we discuss the objective function and constraints describing the ILP model for the accelerator-aware VNF placement and chaining problem.
7.1 Objective
The objective (1) of our ILP formulation is to minimize the total cost incurred by the operator from the use of server-nodes, some of which are attached to a hardware-accelerator card. The decision variable $x_n$ indicates whether server-node $n$ is in use.

$$\text{minimize} \quad \sum_{n \in N^c} c_n \, x_n \tag{1}$$

7.2 Constraints
We classify all the constraints into four categories: (i) physical node constraints, (ii) link mapping constraints, (iii) accelerator constraints and (iv) auxiliary constraints, which are explained as follows.
7.2.1 Physical Node Constraints
The sum of the effective CPU usage of all VNFs placed on a node must not exceed the node's maximum CPU capacity. This constraint is given in (2).
The constraint in (3) reflects the finite availability of resources on the hardware-accelerator card for the instantiation of accelerators.
The rate of communication between VNFs and accelerators instantiated on the hardware-accelerator card is bounded by the maximum bandwidth of the PCIe bus, as indicated in (4). The first term on the LHS of (4) is the bus bandwidth consumed by the traffic between neighboring VNFs: the traffic from VNFs ($f^s_k$) placed on server-node $n_i$ to their neighboring VNFs ($f^s_{k+1}$) placed on $n_j$ is summed, and the factor of two accounts for the traffic both coming to and going from the VNFs running on server-node $n_i$. The term $2 t_s \beta^{n_i}_{f^s}$ accounts for the traffic exchanged over the bus between an accelerated VNF and its accelerator.

$$\sum_{s \in S,\, f^s \in F_s} \left[ \alpha^n_{f^s}\, cpu_0(f^s) - \beta^n_{f^s}\, cpu_r(f^s) \right] \le R_{cpu}(n) \quad \forall n \in N^c \tag{2}$$

$$\sum_{a \in A} \delta^n_a \, r(a) \le R_{acc}(n) \quad \forall n \in N^c \tag{3}$$

$$2 \sum_{s \in S} \sum_{(f^s_k, f^s_{k+1}) \in L_s} \sum_{n_j : (n_i, n_j) \in E} t_s \, y^{n_i,n_j}_{f^s_k,f^s_{k+1}} + \sum_{s \in S,\, f^s \in F_s} 2 t_s \, \beta^{n_i}_{f^s} \le R_{bus}(n_i) \quad \forall n_i \in N^c \tag{4}$$
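As a rough illustration of the three capacity checks described in this subsection, the sketch below evaluates them for a single server-node and a given placement. It is not the ILP itself: the function name, the dictionary layout and all numeric values in the usage example are assumptions made for illustration, and the bus term simply follows the textual description (twice the traffic of virtual links leaving the node, plus $2 t_s$ per accelerated VNF).

```python
def node_constraints_ok(node, vnfs, accel_sizes, outgoing_link_traffic):
    """Check CPU, accelerator and bus capacity of one server-node.

    node: {"R_cpu": cores, "R_acc": logic elements, "R_bus": Mbps}
    vnfs: VNFs placed on the node, each a dict with keys
          "cpu0", "cpu_r", "t_s" and "accelerated"
    accel_sizes: r(a) of every accelerator instantiated on the node's card
    outgoing_link_traffic: t_s of each virtual link leaving the node
    """
    cpu_used = sum(
        v["cpu0"] - (v["cpu_r"] if v["accelerated"] else 0) for v in vnfs
    )
    acc_used = sum(accel_sizes)
    bus_used = 2 * sum(outgoing_link_traffic) + sum(
        2 * v["t_s"] for v in vnfs if v["accelerated"]
    )
    return {
        "cpu": cpu_used <= node["R_cpu"],
        "acc": acc_used <= node["R_acc"],
        "bus": bus_used <= node["R_bus"],
    }


# Hypothetical node and placement (illustrative values only).
node = {"R_cpu": 5, "R_acc": 100_000, "R_bus": 1_000}
vnfs = [
    {"cpu0": 4, "cpu_r": 2, "t_s": 100, "accelerated": True},
    {"cpu0": 3, "cpu_r": 0, "t_s": 100, "accelerated": False},
]
accelerated_case = node_constraints_ok(node, vnfs, [40_000], [100])

vnfs[0]["accelerated"] = False
plain_case = node_constraints_ok(node, vnfs, [], [100])
print(accelerated_case, plain_case)
```

In this toy instance the CPU check passes only when the first VNF is accelerated, while the accelerator and bus checks pass in both cases.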
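7.2.2 Physical link constraints
The flow-conservation constraint is described in (5). This constraint ensures that a virtual link $(f^s_k, f^s_{k+1})$ is always mapped to a physical path in the network. It also ensures that for a non-computation node $n \in N \setminus N^c$, the net traffic outflow or inflow is always zero.
The constraint in (6) guarantees that the sum of bandwidths allocated to virtual links on a physical link $(n_i, n_j)$ does not exceed the bandwidth of that link.

$$\sum_{n_j : (n_i, n_j) \in E} y^{n_i,n_j}_{f^s_k,f^s_{k+1}} - \sum_{n_j : (n_j, n_i) \in E} y^{n_j,n_i}_{f^s_k,f^s_{k+1}} = \alpha^{n_i}_{f^s_k} - \alpha^{n_i}_{f^s_{k+1}} \quad \forall n_i \in N,\ \forall s \in S,\ \forall (f^s_k, f^s_{k+1}) \in L_s \tag{5}$$

$$\sum_{s \in S} \sum_{(f^s_k, f^s_{k+1}) \in L_s} t_s \, y^{n_i,n_j}_{f^s_k,f^s_{k+1}} \le b(n_i, n_j) \quad \forall (n_i, n_j) \in E \tag{6}$$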
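7.2.3 Accelerator constraints
The constraint in (7) follows from the fact that a VNF $f^s$ can be given access to an accelerator on a node $n$ only if it is placed on that node.
The constraint in (8) ensures that an accelerator of a particular type is instantiated if a non-zero number of VNFs are using that accelerator type. This constraint is easily linearized by replacing it with the pair of constraints (9a-9b). $M_1$ (big M) in constraint (9b) is a constant with a value greater than the total number of VNFs $f^s$ in all the service-chain requests $s \in S$.

$$\beta^n_{f^s} \le \alpha^n_{f^s} \quad \forall n \in N,\ \forall s \in S,\ \forall f^s \in F_s \tag{7}$$

$$\delta^n_a = \begin{cases} 1 & \text{if } \sum_{s \in S,\, f^s \in F_s :\, atype(f^s) = a} \beta^n_{f^s} > 0 \\ 0 & \text{otherwise} \end{cases} \quad \forall n \in N,\ \forall a \in A \tag{8}$$

$$\delta^n_a \le \sum_{s \in S,\, f^s \in F_s :\, atype(f^s) = a} \beta^n_{f^s} \quad \forall n \in N,\ \forall a \in A \tag{9a}$$

$$\sum_{s \in S,\, f^s \in F_s :\, atype(f^s) = a} \beta^n_{f^s} \le M_1 \delta^n_a \quad \forall n \in N,\ \forall a \in A \tag{9b}$$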
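The big-M linearization of the indicator constraint (8) by the pair (9a-9b) can be checked exhaustively on a small case. The sketch below is only a sanity check under the stated assumption that $M_1$ exceeds the number of VNFs; the function names are illustrative and not part of the formulation.

```python
from itertools import product


def indicator(betas):
    """Constraint (8): delta is 1 exactly when some VNF uses the accelerator."""
    return 1 if sum(betas) > 0 else 0


def feasible_deltas(betas, big_m):
    """Binary delta values satisfying both (9a) delta <= sum(beta)
    and (9b) sum(beta) <= M1 * delta."""
    total = sum(betas)
    return [d for d in (0, 1) if d <= total and total <= big_m * d]


num_vnfs = 3
M1 = num_vnfs + 1   # any constant larger than the total number of VNFs

# For every 0/1 assignment of beta, the linear pair admits exactly one
# delta, and it equals the indicator value.
all_match = all(
    feasible_deltas(betas, M1) == [indicator(betas)]
    for betas in product((0, 1), repeat=num_vnfs)
)
print(all_match)
```

The same argument applies to the pair (11a-11b) that ties $x_n$ to the placement variables.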
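7.2.4 Auxiliary Constraints
The set of constraints in this subsection restricts the values of the decision variables $x_n$, $y^{n_i,n_j}_{f^s_k,f^s_{k+1}}$, $\alpha^n_{f^s}$ and $\beta^n_{f^s}$.
A server-node is considered to be running if at least one VNF is mapped onto it, as indicated by the constraint in (10). The pair of constraints (11a-11b) forces $x_n$ to be equal to 1 if at least one VNF is placed on node $n$. In constraint (11b), $M_2$ is a constant with a value greater than the total number of VNFs $f^s$ in all service-chain requests $s \in S$.

$$x_n = \begin{cases} 1 & \text{if } \sum_{s \in S,\, f^s \in F_s} \alpha^n_{f^s} > 0 \\ 0 & \text{otherwise} \end{cases} \quad \forall n \in N \tag{10}$$

$$x_n \le \sum_{s \in S,\, f^s \in F_s} \alpha^n_{f^s} \quad \forall n \in N \tag{11a}$$

$$\sum_{s \in S,\, f^s \in F_s} \alpha^n_{f^s} \le M_2 x_n \quad \forall n \in N \tag{11b}$$

Each VNF in a service-chain request must be placed exactly once. This is represented by the constraint in (12). The constraint in (13) states that the decision variables can only take binary (0 or 1) values.

$$\sum_{n \in N} \alpha^n_{f^s} = 1 \quad \forall s \in S,\ \forall f^s \in F_s \tag{12}$$

$$x_n,\ y^{n_i,n_j}_{f^s_k,f^s_{k+1}},\ \alpha^n_{f^s},\ \beta^n_{f^s} \in \{0, 1\} \quad \forall n \in N,\ \forall (n_i, n_j) \in E,\ \forall s \in S,\ \forall f^s \in F_s,\ \forall (f^s_k, f^s_{k+1}) \in L_s \tag{13}$$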
The above ILP formulation implements a single-step method to solve the VNF-AAPC problem. For a given NFVi graph $G$ and a set of service-chain requests $S$, the ILP formulation not only gives an optimum placement and chaining solution but also gives an optimum accelerator allocation $\beta^n_{f^s}$ for VNFs.
As VNF-PC is considered to be an NP-hard problem, solving it exactly does not scale with the problem size. Accelerator awareness further increases its complexity. As a result, the ILP formulation of the VNF-AAPC problem is challenging to solve for networks of realistic sizes. In order to address the non-scalability issue with ILP, we propose a heuristic-based approach.
Next, we describe two heuristic-based algorithms for solving the VNF-PC problem for an NFVi containing hardware-accelerator resources along with the usual resources. The first heuristic we propose is an accelerator-agnostic algorithm which does not take into account the presence of hardware-accelerators in the NFVi while performing VNF-FG mapping. This algorithm serves as a baseline for the evaluation of our second algorithm, i.e., the accelerator-aware VNF-PC heuristic.
8.1 Accelerator-agnostic VNF-PC heuristic
The accelerator-agnostic VNF-PC heuristic involves the hierarchical deployment of VNF-chains [30]. Hierarchical deployment exploits the classification of nodes into the different levels of DC topologies, e.g. the levels in a leaf-spine DC topology are server, rack, and cluster. Starting from the lowest level, i.e. the server-node level, VNF-PC is attempted at each level until the VNF-chain is deployed. Also, the previously used server-node is checked first for the placement of the subsequent VNF of a VNF-chain, resulting in the localization of VNFs of the same VNF-chain.
The pseudo-code for the accelerator-agnostic VNF-PC algorithm is given in Alg. 1. The procedure AgPlaceChain is called from Alg. 2 in order to map the VNF-FGs ($G_s = (F_s, L_s)$) corresponding to all service-requests (line 3) onto the NFVi. The mapping of each VNF-FG is attempted at different levels of the data-center, e.g., in a leaf-spine topology, first at node, then at rack, and finally at cluster level (Alg. 2 lines 2-6). Mapping at node level is done by assigning NodesSet equal to the set of all server-nodes $n \in N^c$. If no single server-node is able to host all the VNFs of a chain, NodesSet is assigned all nodes per rack. If the VNF-chain cannot be mapped onto any rack, NodesSet is assigned all the nodes in the cluster and VNF-PC is attempted again.
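The node, rack and cluster escalation described above can be sketched as follows. This is a simplified illustration rather than Alg. 1 or Alg. 2: it considers only CPU capacity, uses deterministic first-fit instead of random node selection, skips chaining and acceleration, and the rack layout and CPU demands in the example are hypothetical.

```python
def candidate_levels(racks):
    """Yield (level name, list of candidate node sets), lowest level first.

    racks: dict mapping a rack id to the list of its server-node ids.
    """
    servers = [n for nodes in racks.values() for n in nodes]
    yield "node", [[n] for n in servers]
    yield "rack", [list(nodes) for nodes in racks.values()]
    yield "cluster", [servers]


def place_hierarchically(chain_cpu, racks, free_cpu):
    """Place a whole chain inside one node set at the lowest possible level.

    chain_cpu: CPU demand of each VNF in chain order.
    free_cpu: dict node id -> free cores; updated only on success.
    Returns (level, list of hosting nodes) or (None, []).
    """
    for level, node_sets in candidate_levels(racks):
        for nodes in node_sets:
            trial = dict(free_cpu)      # work on a copy so failures leave no trace
            placement = []
            for demand in chain_cpu:
                host = next((n for n in nodes if trial[n] >= demand), None)
                if host is None:
                    break               # this node set cannot host the chain
                trial[host] -= demand
                placement.append(host)
            else:
                free_cpu.update(trial)  # commit the successful placement
                return level, placement
    return None, []


racks = {"rack0": ["n0", "n1"], "rack1": ["n2", "n3"]}
free_cpu = {n: 5 for nodes in racks.values() for n in nodes}
level, placement = place_hierarchically([4, 2, 2, 3], racks, free_cpu)
print(level, placement)
```

With these assumed values, no single node and no single rack can host the chain, so the placement succeeds only at cluster level and spans both racks.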
In Alg. 1, for each VNF $f^s \in F_s$, placement is first tried on the previously used node $n_p$. A new node is selected only if enough CPU resources are not available on $n_p$ (lines 7-15). Accelerator allocation is then attempted on node $n_p$ (line 16) by invoking the procedure AccelVNF. When all VNFs $f^s \in F_s$ are placed, virtual links are mapped to physical paths in $G$ using the procedure ChainVNFs. If the placement of any VNF $f^s \in F_s$ fails or the procedure ChainVNFs returns False, all resources are restored to the values they had at the start of the procedure AgPlaceChain (lines 23-25).
The procedure AccelVNF (Alg. 3) checks whether an accelerator can be granted to VNF $f^s$ on node $n_p$. This is done by verifying whether enough CPU and bus resources are available on $n_p$ (line 2). If $atype(f^s)$ is not already instantiated on the hardware-accelerator card attached to node $n_p$, it is checked whether enough accelerator resources are available on the card (lines 5-11) to instantiate the accelerator type $atype(f^s)$. All the required resources are updated accordingly if $f^s$ is allocated an accelerator (lines 9-10, 13-14) in this procedure.
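A minimal Python sketch of the acceleration check described above is given below. It follows the textual description of AccelVNF, not the exact pseudo-code: the dictionary layout is hypothetical, and the CPU test assumes the accelerated demand $cpu_0 - cpu_r$ must fit on the node, which the text does not spell out.

```python
def accel_vnf(vnf, node, t_s, r):
    """Try to place `vnf` on `node` with an accelerator.

    vnf:  {"cpu0": cores, "cpu_r": cores saved, "atype": accelerator type}
    node: {"cpu": free cores, "bus": free Mbps, "acc": free logic elements,
           "accels": set of accelerator types already instantiated}
    r:    dict accelerator type -> logic elements needed per instance
    Returns True and updates `node` on success; leaves it untouched otherwise.
    """
    cpu_needed = vnf["cpu0"] - vnf["cpu_r"]     # assumption: accelerated demand
    if node["cpu"] < cpu_needed or node["bus"] < 2 * t_s:
        return False
    atype = vnf["atype"]
    if atype not in node["accels"]:
        if r[atype] > node["acc"]:
            return False                        # card cannot host a new instance
        node["acc"] -= r[atype]
        node["accels"].add(atype)
    node["cpu"] -= cpu_needed
    node["bus"] -= 2 * t_s                      # traffic to and from the accelerator
    return True


# Hypothetical node, catalog and VNF values.
node = {"cpu": 5, "bus": 1_000, "acc": 100_000, "accels": set()}
r = {"AES": 40_000}
vnf = {"cpu0": 4, "cpu_r": 2, "atype": "AES"}

first = accel_vnf(vnf, node, 100, r)    # instantiates AES on the card
second = accel_vnf(vnf, node, 100, r)   # reuses the existing AES instance
third = accel_vnf(vnf, node, 100, r)    # fails: only 1 free core remains
print(first, second, third, node)
```

Note that the second call consumes no additional logic elements, since the accelerator type is already instantiated on the card.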
The chaining procedure for mapping virtual links to physical paths is described in Alg. 4. For each virtual link $(f^s_k, f^s_{k+1}) \in L_s$, the set of all shortest paths between the two physical nodes hosting $f^s_k$ and $f^s_{k+1}$ is first stored in $P$ (line 7). Each path is checked sequentially for available bandwidth on all of its physical links using the procedure bw (line 9). If a path with enough bandwidth is available, $\gamma[f^s_k, f^s_{k+1}]$ (line 13) along with the bus (line 12) and link bandwidths (line 11) are updated for every physical link $(n_i, n_j) \in p$ in the path $p$. If any virtual link cannot be mapped to a physical path, the values of resources and variables are reverted to the values they had at the start of the procedure (lines 19-22).
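The chaining step can be sketched with a breadth-first search that enumerates all minimum-hop paths and a bandwidth check per path. This is an illustrative simplification, not Alg. 4 itself: bus-bandwidth bookkeeping is omitted, and the tiny leaf-spine graph, link capacities and function names are assumptions.

```python
from collections import deque


def shortest_paths(adj, src, dst):
    """All minimum-hop paths from src to dst (BFS with predecessor lists)."""
    dist, preds = {src: 0}, {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)      # another equally short way to reach v
    if dst not in dist:
        return []

    def build(v):
        if v == src:
            return [[src]]
        return [path + [v] for u in preds[v] for path in build(u)]

    return build(dst)


def chain_vnfs(links, host, adj, bw, t_s):
    """Map each virtual link to a shortest path with enough bandwidth.

    links: list of (vnf_a, vnf_b); host: vnf -> node;
    bw: dict (u, v) -> free bandwidth, updated in place.
    Returns the link-to-path mapping, or None after restoring bw on failure.
    """
    saved = dict(bw)
    mapping = {}
    for a, b in links:
        if host[a] == host[b]:
            mapping[(a, b)] = [host[a]]     # same node: no network path needed
            continue
        for path in shortest_paths(adj, host[a], host[b]):
            hops = list(zip(path, path[1:]))
            if all(bw[h] >= t_s for h in hops):
                for h in hops:
                    bw[h] -= t_s
                mapping[(a, b)] = path
                break
        else:
            bw.clear()
            bw.update(saved)                # revert all reservations
            return None
    return mapping


# Hypothetical two-leaf, two-spine fabric with one congested uplink.
adj = {
    "n0": ["l0"], "n1": ["l1"],
    "l0": ["n0", "s0", "s1"], "l1": ["s0", "s1", "n1"],
    "s0": ["l0", "l1"], "s1": ["l0", "l1"],
}
bw = {(u, v): 1_000 for u in adj for v in adj[u]}
bw[("l0", "s0")] = 50
mapping = chain_vnfs([("f1", "f2")], {"f1": "n0", "f2": "n1"}, adj, bw, 100)
print(mapping)
```

Because the uplink towards the first spine lacks capacity in this example, the virtual link is routed over the second spine.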
An example illustrating the working of the accelerator-agnostic VNF-PC heuristic is shown in Fig. 6. Consider two VNF-chains ($s_1 : f^1_1 \to f^1_2 \to f^1_3 \to f^1_4$, $s_2 : f^2_1 \to f^2_2 \to f^2_3$) to be deployed on a given NFVi, which here is a DC in the leaf-spine topology. The heuristic starts with the chain $s_1$ and first tries its deployment at the server-node level. As no server-node can accommodate all VNFs of this VNF-chain, the heuristic moves to the next level of the topology, i.e., the rack level. Again, no rack has enough resources to host the complete VNF-chain $s_1$. Therefore, the heuristic now considers all server-nodes of the cluster and uses rack0 and rack1 for the placement of VNF-chain $s_1$. After the placement of all VNFs of the first chain is completed, network bandwidth is allocated to the virtual links of the VNF-chain via the ChainVNFs procedure, as shown in Fig. 6. The same process is followed for the deployment of the VNF-chain $s_2$, which is deployed in rack2.
Figure 6: Illustration showing placement and chaining in accelerator-agnostic VNF-PC heuristic on the leaf-spine topology.
8.2 Accelerator-aware VNF-PC heuristic
The accelerator-aware VNF-PC heuristic combines hierarchical deployment with segmentation of VNF-chains. A VNF-chain is first split at 'accelerate-able' VNFs, i.e. VNFs $f^s$ which require hardware-accelerators ($atype(f^s) \in A$). VNF-chain deployment is then performed in two phases. In the first phase, the accelerate-able VNFs are placed and allocated accelerators. In the second phase, the remaining VNF-chain segments are mapped to the NFVi using hierarchical deployment.
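The segmentation step can be illustrated with a short Python sketch that splits a chain at its accelerate-able VNFs. The function name and the list-based chain representation are assumptions made for illustration; the example chain mirrors $s_1$ with its second VNF being accelerate-able.

```python
def split_chain(chain, accelerable):
    """Split a VNF-chain at its accelerate-able VNFs.

    chain: VNF names in chain order.
    accelerable: set of VNF names that have an accelerator type in A.
    Returns (accelerate-able VNFs, remaining segments), both in chain order.
    """
    accel, segments, current = [], [], []
    for vnf in chain:
        if vnf in accelerable:
            accel.append(vnf)
            if current:                 # close the segment preceding this VNF
                segments.append(current)
                current = []
        else:
            current.append(vnf)
    if current:
        segments.append(current)
    return accel, segments


accel, segments = split_chain(["f11", "f12", "f13", "f14"], {"f12"})
print(accel, segments)
```

The first list is handled in the first phase (placement with accelerator allocation), and each remaining segment is then mapped hierarchically in the second phase.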
The procedure for the allocation of accelerators to VNFs (PlaceAccelVNFs) is shown in Alg. 5. First, a list of all server-nodes attached with a hardware-accelerator card is stored in $N_a$. Accelerate-able VNFs constituting the VNF-chain request $s$ are assigned to $F^s_{acc}$. Accelerator allocation is then attempted for every VNF in $F^s_{acc}$. A list of server-nodes with enough resources and having the accelerator $atype(f^s)$ already instantiated on their attached hardware-accelerator card is stored in $N_{f^s}$ (line 6). Out of all server-nodes in $N_{f^s}$, the node closest to where (any) previous VNF of the same VNF-chain request $s$ was placed is assigned to $n_a$ (lines 8-9). If no node has sufficient resources and an accelerator of type $atype(f^s)$ instantiated on it, the node with the highest CPU utilization is selected from $N_a$ (line 11). Using the procedure AccelVNF (Alg. 3), placement and accelerator allocation for VNF $f^s$ is attempted on $n_a$ (line 13).
In Alg. 6, each server-node used in the previous step is iterated over for the complete mapping of the remaining VNF-chain segments (lines 4-27). Un-mapped segments of all service-requests for which at least one adjacent VNF is placed on node $n$ are identified (lines 5-9). An attempt is then made to map each segment $seg$ in $S^n_{seg}$ as close to $n$ as possible. The process followed for the mapping of each VNF-FG segment $seg \in S^n_{seg}$ is similar to the one followed in the accelerator-agnostic VNF-PC heuristic (Alg. 2). The mapping is attempted first at the node level, then at the level of the rack containing the node, and finally at the whole cluster level using the procedure AgPlaceChain (line 14). In addition, the newly placed VNF segment $seg$ and its adjacent VNFs, which were previously placed using PlaceAccelVNFs, are linked via the procedure ChainVNFs (line 19).
Finally, VNF-chain requests which have not yet been mapped to the NFVi are identified (line 28). The set $S_R$ contains all those VNF-chain requests $s \in S$ for which either (i) no VNF has an accelerator implementation available in $A$ or (ii) enough resources were not available to allocate accelerators to VNFs during the first step (line 2). Mapping of all service-requests in $S_R$ is attempted in the hierarchical way (lines 31-35) discussed in Alg. 2.
Again, consider the deployment of two VNF-chains ($s_1 : f^1_1 \to f^1_2 \to f^1_3 \to f^1_4$, $s_2 : f^2_1 \to f^2_2 \to f^2_3$) on the same NFVi topology, as shown in Fig. 7. In the accelerator-aware VNF-PC heuristic, the accelerate-able VNFs of the two VNF-chains are placed in the first phase, so $f^1_2$ is placed on the first server-node of rack0, which has an attached hardware-accelerator card. In the second phase, the heuristic loops over the server-nodes which have VNFs placed on them, while determining and placing the remaining segments of VNF-chains. Therefore, deployment of the VNF-chain segments $f^1_1$ and $f^1_3 \to f^1_4$ is attempted using the same procedure as discussed for the accelerator-agnostic heuristic. After the successful deployment of the VNF-chain segments of $s_1$, the two segments $f^1_1$ and $f^1_3 \to f^1_4$ are chained to the VNF $f^1_2$ via the ChainVNFs procedure. Finally, the second VNF-chain $s_2$ (which has no accelerate-able VNFs) is deployed in rack1 using the same procedure as followed in the accelerator-agnostic heuristic. It can be observed that the accelerator-aware VNF-PC heuristic uses one server-node fewer than the accelerator-agnostic heuristic for the deployment of VNF-chains $s_1$ and $s_2$.

Algorithm 1: Accelerator-agnostic VNF-PC procedure.
1  Procedure AgPlaceChain(NodesSet, α, β, γ, (F_s, L_s)):
2    tries, plc, n_p ← 0, True, φ;
3    while tries ≤ MAX_TRIES do
4      for Nodes in NodesSet do
5        α_0, β_0, Nodes_0 ← α, β, Nodes;
6        for f^s in F_s do
           ...
8            N ← {n : n ∈ Nodes, R_cpu(n) ≥ cpu_0(f^s)};
           ...
13           n_p ← Random(N);
14         end
15       end
         ...
23       if ChainVNFs(α, γ, L_s, G) == False or plc == False then
         ...
29     tries ← tries + 1;
30   end
31 end

Algorithm 2: Main service-chain allocation procedure.
1  Procedure AllocateChain(G, N_c, racks, cluster, G_s, α, β, γ):
2    for NodesSet in {{{n} : n ∈ N_c}, racks, clusters} do
       ...
Algorithm 3: VNF acceleration procedure
1  Procedure AccelVNF(f^s, n_p, node_accels, t_s):
2    if ... or R_bus(n_p) < 2t_s then
3      return False;
4    else
5      if atype(f^s) ∉ node_accels[n_p] then
6        if r(atype(f^s)) > R_acc(n_p) then
7          return False;
8        else
9          R_acc(n_p) ← R_acc(n_p) − r(atype(f^s));
           ...

Figure 7: Illustration showing placement and chaining in accelerator-aware VNF-PC heuristic on the leaf-spine topology.
Algorithm 4: VNF chaining procedure
1  Procedure ChainVNFs(α, γ, L_s, G):
2    G_0(N_0, E_0) ← G(N, E);
3    γ_0 ← γ;
4    for (f^s_k, f^s_{k+1}) in L_s do
5      done ← False;
6      if α[f^s_k] ≠ α[f^s_{k+1}] then
7        P ← ShortestPaths(G, α[f^s_k], α[f^s_{k+1}]);
8        for p in P do
9          if bw(p) ≥ t_s then
10           for (n_i, n_j) in p do
11             b(n_i, n_j) ← b(n_i, n_j) − t_s;
12             R_bus(n_i) ← R_bus(n_i) − 2t_s;
               ...
19     if done == False then
20       G(N, E) ← G_0(N_0, E_0);
         ...
Algorithm 5: Placement procedure for accelerate-able VNFs.
1  Procedure PlaceAccelVNFs(α, β, γ, node_accels, S):
2    N_a ← {n ∈ N_c : R_acc(n) > 0};
3    for s in S do
4      F^s_acc ← {f^s ∈ F_s : atype(f^s) ∈ A};
5      for f^s in F^s_acc do
         ...
7        if N_{f^s} ≠ φ then
8          n_a ← select node from N_{f^s} where previous VNF of the service chain s was placed;
         ...
11         n_a ← select a node from N_a with sufficient resources;
12       end
13       if AccelVNF(f^s, n_a, node_accels, t_s) then
         ...
18   used_nodes ← all used nodes in N;
19   return used_nodes;
20 end
Algorithm 6: Accelerator-aware VNF-PC procedure.
1  Procedure AccelAwarePlaceChain(α, β, γ, S):
2    used_nodes ← PlaceAccelVNFs(α, β, S);
3    S^p_seg ← {};
4    for n in used_nodes do
5      node_chains ← {s ∈ S : ∃ f^s ∈ F_s placed on n};
6      S^n_seg ← {};
7      for chain in node_chains do
8        S^n_seg ← S^n_seg ∪ {all possible chain segments in chain};
9      end
       /* place remaining segments of chains */
10     for seg in S^n_seg do
11       G_seg ← VNF forwarding sub-graph corresponding to seg;
12       if seg ⊄ S^p_seg then
13         for NodesSet in {{n}, rack_n, cluster_n} do
14           if AgPlaceChain({NodesSet}, α, β, γ, G_seg) then
15             f^seg_l ← leftmost VNF of seg;
16             f^acc_l ← VNF which needs to be linked with the leftmost VNF of seg;
17             f^seg_r ← rightmost VNF of seg;
18             f^acc_r ← VNF which needs to be linked with the rightmost VNF of seg;
19             if ChainVNFs(α, γ, {(f^acc_l, f^seg_l), (f^seg_r, f^acc_r)}, G) then
               ...
28   S_R ← S \ {s ∈ S : α[f^s] ≠ φ, ∀f^s ∈ F_s};
     /* placement and chaining of remaining service-chains */
29   for s in S_R do
30     for NodesSet in {{{n} : n ∈ N_c}, racks, clusters} do
         ...
The objective of this section is to assess the scalability and efficiency of the ILP model and VNF-PC heuristics using simulation experiments. We first describe the simulation environment used in our evaluation and then present the results obtained after performing experiments on the ILP model and heuristics.
9.1 Setup and Parameters
Table 3: Default values/range of various parameters involved in simulation experiments.

| Parameter | Value or range | Parameter | Value or range |
|---|---|---|---|
| $\lvert S \rvert$ | [5, 250] | $c_o(f^c)$ | 3-5 (cores) |
| $R_{cpu}(n)$ | 24 (cores) | $c_i(f^c)$ | (0.40 - 0.60) $c_o(f^c)$ |
| $R_{bus}(n)$ | 80 (Gbps) | $b(n_i, n_j)$ | 10, 40 (Gbps) |
| VNF-chain length | 4-6 | accel. type ($a$) | $a_1$, $a_2$, $a_3$ |
| $t_s$ | 100-500 (Mbps) | $r(a)$ (LUTs) | 40k, 28k, 30k |
| $c_n$ | 1, 1.20 | | |
The ILP model for the VNF-AAPC problem has been built using DOcplex (Decision Optimization CPLEX Modeling), the Python API of IBM's ILOG CPLEX. DOcplex provides a user-friendly API to write the ILP model, which is then solved by the CPLEX solver. All heuristic algorithms are written in Python. We used an Intel Xeon server machine with a quad-core CPU @ 2.40GHz and 16GB of RAM running Ubuntu 16.04 to carry out the evaluations of the ILP and heuristics. Each data point reported in the evaluations is an average over 10 iterations, shown with the corresponding confidence interval of one standard deviation (68%).
For the evaluation of the heuristics, we have considered two different DC topologies for simulating the physical network: (i) three-tier and (ii) leaf-spine. For the three-tier topology, we vary the value of k to adjust the size of the network. For example, when k=6 we have k=6 pods, each pod containing k/2=3 access switches and k/2=3 aggregate switches. Each access switch (ToR switch) is connected to 6/2=3 server-nodes, and therefore the total number of server-nodes in all the pods is 54. For leaf-spine, we have considered 4 core-switches and 16 leaf-switches (ToR switches). Each leaf-switch is connected to 16 server-nodes, therefore, resulting in a total of 320 server-nodes. In both topologies, the links connecting server-nodes with ToR switches and switches with switches are 10Gbps and 40Gbps links, respectively.
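The server count of the three-tier topology follows directly from the description above (k pods, k/2 access switches per pod, k/2 server-nodes per access switch). The helper below is only a convenience sketch with a hypothetical function name; it assumes an even k.

```python
def three_tier_servers(k):
    """Number of server-nodes in the three-tier topology for an even k."""
    pods = k
    access_switches_per_pod = k // 2
    servers_per_access_switch = k // 2
    return pods * access_switches_per_pod * servers_per_access_switch


servers_k6 = three_tier_servers(6)     # the k = 6 example from the text
servers_k10 = three_tier_servers(10)
print(servers_k6, servers_k10)
```

The count grows as k³/4, so a modest increase in k enlarges the simulated network considerably.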
Each server-node has 24 CPU cores, 16GB/s of PCIe bandwidth, and has 100k LUTs if a hardware-accelerator card is attached to the server-node. For simplicity, we assume the cost $c_n$ of a server-node to be 1.20$ if it is attached with a hardware-accelerator, otherwise 1$. The other parameters considered in the evaluations are given in Table 3.
9.1.1 Comparison of ILP and Heuristic
Before presenting evaluation results regarding total node costs, we first report total execution times for the ILP and heuristic approaches. Fig. 9 shows the distribution of total execution time for both approaches when deploying 5 VNF-chains on the leaf-spine topology shown in Fig. 8. As expected, the execution time of the ILP approach is orders of magnitude larger than that of the heuristic approach. Moreover, when the number of VNF-chains to be deployed on the given topology becomes large ($\lvert S \rvert \ge 15$), the total execution time can reach several hours.
Fig. 10 shows the evolution of CPLEX solutions with time for the deployment of 15 VNF-chains. It can be observed that CPLEX takes about 2 hours to complete the execution for this instance, yet the gap between the incumbent solution and the lower bound estimated by CPLEX after one hour is negligible (0.5%). Nevertheless, only small instances of the VNF-AAPC problem can be solved using the ILP approach in a reasonable time.
Fig. 11 compares the ILP and the heuristic in terms of total node cost for the deployment of different numbers of VNF-chains. Here we have limited the maximum execution time of CPLEX instances to one hour. The bar chart shows (i) the ILP incumbent solution (ILP), (ii) the best lower bound (ILP-LB) estimated by CPLEX within one hour and (iii) the VNF-AAPC heuristic solution. We can observe that there is a small penalty (on average ∼5%) when using the heuristic approach instead of the ILP approach. As mentioned earlier, the gap between ILP-LB and ILP is almost negligible after one hour of CPLEX execution time.

























Figure 9: Comparison of ILP model and heuristic in terms of total execution times for the leaf-spine topology.
In some instances, the total node cost obtained with the ILP approach is higher than with the heuristic method. As the total execution time is limited, CPLEX was not able to reach the optimal solution in the given time, and the VNF-AAPC heuristic method achieves a more efficient allocation than the ILP approach. Although CPLEX can find the optimal solution if allowed to run without any time limit, the performance of the heuristic is still very close to the lower bound estimated by CPLEX. As mentioned earlier, it is impractical to use the ILP to solve problem instances larger than 15 VNF-chains.
Figure 10: Evolution of ILP’s incumbent solution and lower-bound for (a) full execution of CPLEX and (b) for first one hour.
Figure 11: Comparison of ILP model and heuristic in terms of total node costs in the leaf-spine topology for different number of VNF-chains.
9.1.2 VNF-PC Heuristic Comparison
Here, we compare the performance of the heuristic algorithms among themselves in terms of the following performance metrics.
1. First, total node cost is the cost due to server-nodes, which also includes the additional cost due to the installation of hardware-accelerators in some of the server-nodes. The comparison of node costs indicates the cost saving in the NFVi resulting from the use of a particular VNF-PC scheme.
2. Second, β/α is the ratio of the total number of VNFs allocated hardware-accelerators to the total number of VNFs in all VNF-chains. It is possible that a VNF-PC algorithm does not allocate an accelerator to an accelerate-able VNF. This metric shows the efficiency of the VNF-PC algorithm in terms of the utilization of hardware-accelerator resources. A higher value of β/α indicates a more efficient allocation of hardware-accelerator resources by the VNF-PC algorithm.
3. Third, CPUrem is the average number of CPU cores per server-node left unallocated after the completion of VNF-PC. A high CPUrem indicates poor consolidation of VNFs, thereby resulting in an overall inefficient allocation of resources.
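The three metrics above could be computed from a finished placement roughly as follows. This is a sketch under stated assumptions: the dictionary layout is hypothetical, and CPUrem is averaged over the server-nodes in use, which is one possible reading of the definition. The node costs 1.20 and 1 match the setup described earlier; the other values are illustrative.

```python
def placement_metrics(nodes, vnfs):
    """Return (total node cost, beta/alpha ratio, CPUrem).

    nodes: list of {"cost": float, "cpu_free": int, "used": bool}
    vnfs:  list of {"accelerated": bool}, one entry per placed VNF
    """
    used = [n for n in nodes if n["used"]]
    total_cost = sum(n["cost"] for n in used)
    accelerated = sum(1 for v in vnfs if v["accelerated"])
    beta_alpha = accelerated / len(vnfs) if vnfs else 0.0
    # Assumption: average only over nodes hosting at least one VNF.
    cpu_rem = sum(n["cpu_free"] for n in used) / len(used) if used else 0.0
    return total_cost, beta_alpha, cpu_rem


nodes = [
    {"cost": 1.20, "cpu_free": 2, "used": True},    # node with accelerator card
    {"cost": 1.00, "cpu_free": 4, "used": True},
    {"cost": 1.00, "cpu_free": 24, "used": False},  # idle node, not charged
]
vnfs = [{"accelerated": True}] + [{"accelerated": False}] * 3
total_cost, beta_alpha, cpu_rem = placement_metrics(nodes, vnfs)
print(total_cost, beta_alpha, cpu_rem)
```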
To benchmark the performance of the proposed VNF-AAPC heuristic, we also evaluate the performance of the accelerator-agnostic VNF-PC heuristic, which serves as the baseline for the evaluation of our VNF-AAPC heuristic.
Fig. 12 (a) and (d) show the total node costs incurred by the operator as a result of deploying different numbers of VNF-chains on three-tier and leaf-spine DC topologies, respectively. In both topologies, the results show the lowest resource cost for the accelerator-aware VNF-PC heuristic. This arises from the efficient consolidation of VNFs by the accelerator-aware heuristic, as explained below, in contrast to the poor VNF consolidation in the accelerator-agnostic heuristic.
For the accelerator-agnostic VNF-PC, the VNF placement process is unaware of the presence of accelerator resources on a server-node. The chance of an accelerator being allocated to a VNF depends on the odds of an accelerate-able VNF being placed on a server-node with a hardware-accelerator. The probability ($p_{acc}$) of allocation of an accelerator to a VNF is given by

$$p_{acc} = \rho^{vnf}_{acc} \times \rho^{n}_{acc}$$

Here, $\rho^{vnf}_{acc}$ is the fraction of VNFs that can be offloaded using a hardware-accelerator (accelerate-able VNFs) and $\rho^{n}_{acc}$ is the fraction of server-nodes attached with a hardware-accelerator. Therefore, $p_{acc}$ gives the odds of accelerator allocation to a VNF with the accelerator-agnostic heuristic. This can also be verified from the resulting β/α ratios depicted in Fig. 12 (b) and (e). The β/α ratio for the accelerator-agnostic heuristic remains smaller than that of the accelerator-aware heuristic for any number of VNF-chains. Accelerators are allocated to VNFs explicitly in the accelerator-aware heuristic, in contrast to the accelerator-agnostic heuristic, where accelerator allocation is arbitrary. Moreover, we observed an increase in the β/α ratio with an increasing number of VNF-chains. This observation can be attributed to the fact that the accelerator-aware heuristic attempts to reuse the deployed accelerator instances and the nodes attached with hardware-accelerators. Therefore, increasing the number of VNF-chains increases the number of candidates for accelerator allocation, thus resulting in a better β/α ratio.
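The dependence of the accelerator-agnostic allocation odds on the two fractions can be illustrated with a small simulation. The sketch assumes that each VNF is accelerate-able with probability ρvnfacc and lands, independently and regardless of capacity, on an accelerator-equipped node with probability ρnacc; both the independence assumption and the example fractions are illustrative and not taken from the evaluation.

```python
import random


def p_acc(rho_vnf, rho_node):
    """Probability that an agnostically placed VNF ends up accelerated."""
    return rho_vnf * rho_node


def simulate(rho_vnf, rho_node, trials=100_000, seed=0):
    """Estimate the accelerated fraction by random, capacity-blind placement."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        accelerable = rng.random() < rho_vnf     # VNF can use an accelerator
        on_accel_node = rng.random() < rho_node  # node has an accelerator card
        hits += accelerable and on_accel_node
    return hits / trials


expected = p_acc(0.5, 0.4)       # hypothetical fractions
estimated = simulate(0.5, 0.4)
print(expected, estimated)
```

Halving the fraction of accelerator-equipped nodes halves the expected accelerated fraction under this model.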
The CPUrem metrics for both heuristics are depicted in Fig. 12 (c) and (f). VNF consolidation on server-nodes tends to increase with the total number of VNF-chains, as the chance of placing a VNF on an already used server-node increases with the total number of VNFs. This is confirmed by the decreasing CPUrem with the increasing number of VNF-chains for both topologies. Moreover, as more VNFs are granted an accelerator by the accelerator-aware heuristic than by the accelerator-agnostic heuristic, the corresponding CPUrem is smaller and therefore more VNF consolidation is achieved.
We also show the impact of changing the fraction of nodes ρnacc with an attached hardware-accelerator card on the
overall performance metrics when deploying the same set of VNF-chains. For this experiment, we decreased the
fraction of server-nodes attached with a hardware-accelerator in a three-tier topology with k = 10 and measured
the performance metrics for both heuristics. It can be observed from Fig. 13 that the total node cost for the
accelerator-aware heuristic remains lower than that of the accelerator-agnostic heuristic for all values of ρnacc.
Also, as the fraction of nodes with a hardware-accelerator ρnacc is reduced, the additional accelerator cost
decreases for both accelerator-agnostic and accelerator-aware VNF-PC heuristics; this decrease is, however, offset
by the additional cost of the extra server-nodes required.
As expected, the β/α ratio decreases with decreasing ρnacc for both accelerator-agnostic and accelerator-aware
heuristics. The β/α ratio decreases almost linearly with the decrease in ρnacc. This, again, arises from the fact
that the probability of accelerator allocation in the accelerator-agnostic heuristic is directly proportional to
the ρnacc value, which is not the case with the accelerator-aware heuristic. As placement decisions for
accelerate-able VNFs are made separately in the accelerator-aware VNF-PC heuristic, a decrease in the ρnacc value
has no drastic impact on its β/α ratio. There is no significant change in CPUrem for either heuristic as ρnacc
changes. However, as expected, VNF consolidation is better for the accelerator-aware heuristic than for the
accelerator-agnostic heuristic.
9.2 Overall cost analysis
In this section, we analyze the cost-saving achieved as a result of incorporating hardware-acceleration in NFVi.
We assume that the total number of server-nodes (without any hardware-accelerator) required for the deployment of
a given set of VNF-chains is N. The total cost Cost0 incurred by the operator as a result of running these
server-nodes can be expressed as follows:
$$Cost_0 = N c_0 \tag{14}$$
Here, c0 is the cost of running a single server-node (without a hardware-accelerator).
After the installation of hardware-accelerators in server-nodes, the total cost of deployment Costacc for the same
set of VNF-chains can be expressed as follows:
$$Cost_{acc} = (1 + c_{acc})\, c_0 N (1 - \rho_{red})\, \rho^{n}_{acc} + c_0 N (1 - \rho_{red})(1 - \rho^{n}_{acc}) \tag{15}$$
The first and second terms of eq. 15 refer to the cost of using server-nodes with and without
hardware-accelerators, respectively. cacc is the additional cost of installing a hardware-accelerator in a
server-node relative to the original server-node cost, ρnacc is the fraction of server-nodes that are equipped
with a hardware-accelerator, and ρred is the reduction in the number of server-nodes after hardware-acceleration
of VNFs, relative to the original total number of server-nodes N.
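The two cost expressions (eq. 14 and eq. 15) can be evaluated with a short sketch. The function and parameter names are illustrative choices, and the sample values in the usage part are hypothetical rather than results from the evaluation.

```python
def cost_without_accelerators(n_nodes: float, c0: float) -> float:
    """Eq. 14: total cost of running N plain server-nodes."""
    return n_nodes * c0


def cost_with_accelerators(n_nodes: float, c0: float, c_acc: float,
                           rho_red: float, rho_n_acc: float) -> float:
    """Eq. 15: total cost after installing accelerators.

    n_nodes   -- N, nodes needed without any accelerator
    c0        -- cost of one plain server-node
    c_acc     -- extra accelerator cost, relative to c0
    rho_red   -- relative reduction in the number of nodes
    rho_n_acc -- fraction of nodes equipped with an accelerator
    """
    remaining_nodes = n_nodes * (1.0 - rho_red)
    accelerated_part = (1.0 + c_acc) * c0 * remaining_nodes * rho_n_acc
    plain_part = c0 * remaining_nodes * (1.0 - rho_n_acc)
    return accelerated_part + plain_part


if __name__ == "__main__":
    # Hypothetical scenario: 100 nodes, unit node cost, accelerators add 20%
    # per node, every fifth node has one, and 18% fewer nodes are needed.
    base = cost_without_accelerators(100, 1.0)
    accelerated = cost_with_accelerators(100, 1.0, c_acc=0.2, rho_red=0.18, rho_n_acc=0.2)
    print(f"Cost0 = {base:.2f}, Costacc = {accelerated:.2f}")
```

Keeping the two terms of eq. 15 as separate variables makes it easy to see how much of the total is attributable to accelerator-equipped nodes.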
Fig. 14 compares the total server-nodes required for the deployment of 100 VNF-chains on a leaf-spine topology
in two cases: (i) when server-nodes are not attached with any hardware-accelerator card and (ii) when server-nodes
are attached with a hardware-accelerator card. We used the accelerator-agnostic heuristic for the case when NFVi
does not contain any hardware-accelerators and the accelerator-aware heuristic for the case when it does. It can
be observed that the relative reduction ρred in the total number of server-nodes is about 18-20%.
Figure 12: Comparison of accelerator-agnostic and accelerator-aware heuristics in terms of total node costs, β/α
ratio and CPUrem for three-tier and leaf-spine topologies. Plots for three-tier topology are shown in (a), (b) and
(c); plots for leaf-spine topology are shown in (d), (e) and (f).
Figure 13: Impact of fraction of nodes with accelerator ρnacc on the total number of required nodes for deployment
of 100 VNF-chains on three-tier topology (k = 10) using accelerator-agnostic and accelerator-aware VNF-PC
heuristics.
Figure 14: Total server-nodes required for the deployment of 100 VNF-chains on a leaf-spine topology in two cases,
(i) when server-nodes are not attached with any hardware-accelerator card and (ii) when server-nodes are attached
with a hardware-accelerator card.
Relative cost-savings (G) is the relative reduction in the total cost as a result of using hardware-acceleration in
NFVi. G can be obtained using the expression shown below:
$$G = \frac{Cost_0 - Cost_{acc}}{Cost_0} = 1 - (1 - \rho_{red})\left(1 - \rho^{n}_{acc} + (1 + c_{acc})\,\rho^{n}_{acc}\right) \tag{16}$$
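As a worked simplification of eq. 16, the ρnacc terms can be collected and the expression solved for the node reduction needed to reach a target saving. The shorthand x and the target symbol G* are introduced here only for illustration and do not appear elsewhere in the paper.

```latex
\begin{align*}
G &= 1 - (1-\rho_{red})\bigl(1 - \rho^{n}_{acc} + (1 + c_{acc})\,\rho^{n}_{acc}\bigr) \\
  &= 1 - (1-\rho_{red})\,(1 + x), \qquad x = c_{acc}\,\rho^{n}_{acc} \\[4pt]
G \ge G^{*} \;&\Longleftrightarrow\; \rho_{red} \ge 1 - \frac{1 - G^{*}}{1 + x} \\
G > 0 \;&\Longleftrightarrow\; \rho_{red} > \frac{x}{1 + x}
\end{align*}
```

The last line is the break-even condition: the node reduction has to outweigh the average per-node cost increase x before any saving is realized.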
Eq. 16 gives an expression for the achievable cost-saving in terms of the relative server-node reduction ρred and
the additional cost of hardware-accelerators (cacc). Using eq. 16, we can plot the minimum ρred required to achieve
a given cost-saving G, as shown in Fig. 15. Fig. 15 shows four contours corresponding to four different G values.
For example, to achieve an overall 15% saving (G = 0.15) on server-node cost, the VNF-PC algorithm should achieve
at least an 18% reduction in total server-nodes when an additional cost of 18.5% is needed for the installation
of hardware-accelerators. As expected, higher G values require a high server-node reduction ρred and low additional
costs cacc. Therefore, efficient accelerator-aware VNF-PC heuristics are required to gain the benefits of
hardware-accelerators, even as the costs of hardware-accelerators are expected to fall in the future.
As stated earlier, a reduction of about ρred = 18-20% in total server-nodes can be obtained using our VNF-AAPC
heuristic. As a result, an overall cost-saving (G) of about 15% is achievable by the operator.
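The contours of Fig. 15 can be approximated numerically by inverting eq. 16 for ρred. The sketch below is illustrative only: the function names are hypothetical, and the value ρnacc = 0.2 is an assumption (it is not stated in this section), chosen because it roughly reproduces the 18% figure quoted in the example above for G = 0.15 and cacc = 0.185.

```python
def relative_cost_saving(rho_red: float, c_acc: float, rho_n_acc: float) -> float:
    """Eq. 16: relative saving G for a given node reduction and accelerator cost."""
    return 1.0 - (1.0 - rho_red) * (1.0 - rho_n_acc + (1.0 + c_acc) * rho_n_acc)


def min_node_reduction(target_g: float, c_acc: float, rho_n_acc: float) -> float:
    """Smallest rho_red that reaches target_g, obtained by solving eq. 16 for rho_red."""
    return 1.0 - (1.0 - target_g) / (1.0 + c_acc * rho_n_acc)


if __name__ == "__main__":
    RHO_N_ACC = 0.2  # assumed fraction of accelerator-equipped nodes
    # One row per contour; the four target G values are assumed for illustration.
    for target_g in (0.05, 0.10, 0.15, 0.20):
        points = [
            (c_acc, min_node_reduction(target_g, c_acc, RHO_N_ACC))
            for c_acc in (0.0, 0.1, 0.185, 0.3)
        ]
        row = ", ".join(f"c_acc={c:.3f} -> rho_red>={r:.3f}" for c, r in points)
        print(f"G={target_g:.2f}: {row}")
```

Solving for ρred in closed form avoids a numerical root search and makes the dependence on cacc explicit: the required node reduction grows as accelerators become more expensive.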
Figure 15: Variation of relative node reduction ρred with respect to additional hardware-accelerator cost for
VNF-PC heuristics. Each line represents the locus of all points with a fixed value of cost-saving G.
10 Conclusion
NFVi generally includes all hardware and software components required to build a virtualized environment for
running VNFs. However, due to specific performance or energy goals, it becomes essential to provide some kind
of acceleration to certain VNFs. Current NFVi resource allocation models do not consider hardware-accelerator
resources while performing placement and chaining of VNFs, which results in an inefficient utilization of NFVi
resources.
In this paper, we modeled the VNF-AAPC problem for NFV environments containing hardware-accelerators along
with the usual NFVi resources. To tackle the VNF-AAPC problem, we proposed two approaches: (i) an ILP method
and (ii) a heuristic algorithm. As opposed to the ILP approach, the heuristic-based method is able to scale with
the problem size at the cost of a small penalty. Both approaches aim at minimizing the cost incurred by the
operator due to the utilization of resources for the deployment of VNF-chains. The heuristic-based approach
performs the tasks of VNF placement and chaining in two phases: (i) placement of accelerate-able VNFs and
(ii) placement and chaining of the remaining VNF-chain segments. The proposed methods were evaluated using
simulation experiments and compared in terms of their resulting cost and other performance metrics. The simulation
results indicate that the accelerator-aware heuristic approach can achieve 12-14% cost-savings compared to the
accelerator-agnostic heuristic. Finally, we also performed an overall cost analysis of the use of
hardware-accelerators in NFV environments. The analysis shows that the proposed accelerator-aware VNF-PC heuristic
could be used to achieve significant cost-savings when using hardware-accelerators in NFVi.
Hardware-accelerators are utilized not only in cloud DCs for performance enhancement of VNFs but also in
other scenarios, e.g. network edges and Centralized Radio Access Networks (CRANs). To reduce the energy cost and
meet strict performance requirements in CRAN, various techniques to offload baseband processing functions, e.g.
iFFT/FFT and turbo-coding, using hardware-accelerators are being investigated. However, the problem of modeling
resource dimensioning for virtual base stations in cloud RAN (C-RAN) architectures with hardware-accelerators
remains to be investigated.
11 Acknowledgments
This work was funded through NGPaaS, under grant number 761557, in the scope of the European Commission
Horizon 2020 and 5G-PPP programs.
References662
[1] B. Yi, X. Wang, K. Li, M. Huang, et al., “A comprehensive survey of network function virtualization,”663
Computer Networks, vol. 133, pp. 212–262, 2018.664
[2] “Architectural framework,” 2013. Online; Release v1.1.1 2013-10.665
[3] L. Linguaglossa, S. Lange, S. Pontarelli, G. Rétvári, D. Rossi, T. Zinner, R. Bifulco, M. Jarschel, and666
G. Bianchi, “Survey of performance acceleration techniques for network function virtualization,” Proceedings667
of the IEEE, vol. 107, no. 4, pp. 746–764, 2019.668
[4] N. Nikaein, “Processing radio access network functions in the cloud: Critical issues and modeling,” in Pro-669
ceedings of the 6th International Workshop on Mobile Cloud Computing and Services, pp. 36–43, ACM, 2015.670
[5] J. G. Herrera and J. F. Botero, “Resource allocation in NFV: A comprehensive survey,” IEEE Transactions671
on Network and Service Management, vol. 13, no. 3, pp. 518–532, 2016.672
[6] B. Han, V. Gopalakrishnan, L. Ji, and S. Lee, “Network function virtualization: Challenges and opportunities673
for innovations,” IEEE Communications Magazine, vol. 53, no. 2, pp. 90–97, 2015.674
[7] S. Gebert, A. Müssig, S. Lange, T. Zinner, N. Gray, and P. Tran-Gia, “Processing time comparison of a675
hardware-based firewall and its virtualized counterpart,” in International Conference on Mobile Networks and676
Management, pp. 220–228, Springer, 2016.677
[8] “Addressing 5G Network Function requirements.” White Paper, 2018. Online.678
[9] G. P. Sharma, W. Tavernier, D. Colle, and M. Pickavet, “Dynamic hardware-acceleration of VNFs in NFV679
environments,” in 2019 Sixth International Conference on Software Defined Systems (SDS), pp. 254–259, June680
2019.681
[10] Z. Martinasek, J. Hajny, D. Smekal, L. Malina, D. Matousek, M. Kekely, and N. Mentens, “200 gbps hardware682
accelerated encryption system for fpga network cards,” in Proceedings of the 2018 Workshop on Attacks and683
Solutions in Hardware Security, pp. 11–17, ACM, 2018.684
[11] X. Li, X. Wang, F. Liu, and H. Xu, “Dhl: Enabling flexible software network functions with fpga acceleration,”685
in 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 1–11, IEEE,686
2018.687
[12] X. Ge, Y. Liu, D. H. Du, L. Zhang, H. Guan, J. Chen, Y. Zhao, and X. Hu, “OpenANFV: Accelerating688
network function virtualization with a consolidated framework in openstack,” in ACM SIGCOMM Computer689
Communication Review, vol. 44, pp. 353–354, ACM, 2014.690
[13] A. Albanese, P. S. Crosta, C. Meani, and P. Paglierani, “Gpu-accelerated video transcoding unit for multi-691
access edge computing scenarios,” in Proceeding of ICN, 2017.692
[14] M. Masoudi, M. G. Khafagy, A. Conte, A. El-Amine, B. Françoise, C. Nadjahi, F. E. Salem, W. Labidi,693
A. Süral, A. Gati, et al., “Green mobile networks for 5g and beyond,” IEEE Access, vol. 7, pp. 107270–107299,694
2019.695
[15] S. Han, K. Jang, K. Park, and S. Moon, “Packetshader: a gpu-accelerated software router,” ACM SIGCOMM696
Computer Communication Review, vol. 41, no. 4, pp. 195–206, 2011.697
[16] X. Yi, J. Duan, and C. Wu, “Gpunfv: a gpu-accelerated nfv system,” in Proceedings of the First Asia-Pacific698
Workshop on Networking, pp. 85–91, 2017.699
[17] B. Li, K. Tan, L. L. Luo, Y. Peng, R. Luo, N. Xu, Y. Xiong, P. Cheng, and E. Chen, “Clicknp: Highly flexible700
and high performance network processing with reconfigurable hardware,” in Proceedings of the 2016 ACM701
SIGCOMM Conference, pp. 1–14, ACM, 2016.702
[18] Z. Bronstein, E. Roch, J. Xia, and A. Molkho, “Uniform handling and abstraction of NFV hardware acceler-703
ators,” IEEE Network, vol. 29, no. 3, pp. 22–29, 2015.704
26
[19] Y. Watanabe, Y. Kobayashi, T. Takenaka, T. Hosomi, and Y. Nakamura, “Accelerating NFV application using705
cpu-fpga tightly coupled architecture,” in 2017 International Conference on Field Programmable Technology706
(ICFPT), pp. 136–143, IEEE, 2017.707
[20] “Acceleration technologies; report on acceleration technologies & use cases,” 2015. Online; Release v1.1.1708
2015-12.709
[21] H. Moens and F. De Turck, “VNF-P: A model for efficient placement of virtualized network functions,” in 10th710
International Conference on Network and Service Management (CNSM) and Workshop, pp. 418–423, IEEE,711
2014.712
[22] S. Mehraghdam, M. Keller, and H. Karl, “Specifying and placing chains of virtual network functions,” in 2014713
IEEE 3rd International Conference on Cloud Networking (CloudNet), pp. 7–13, Oct 2014.714
[23] B. Addis, D. Belabed, M. Bouet, and S. Secci, “Virtual network functions placement and routing optimization,”715
in 2015 IEEE 4th International Conference on Cloud Networking (CloudNet), pp. 171–177, IEEE, 2015.716
[24] M. Ghaznavi, A. Khan, N. Shahriar, K. Alsubhi, R. Ahmed, and R. Boutaba, “Elastic virtual network function717
placement,” in 2015 IEEE 4th International Conference on Cloud Networking (CloudNet), pp. 255–260, IEEE,718
2015.719
[25] S. Sahhaf, W. Tavernier, M. Rost, S. Schmid, D. Colle, M. Pickavet, and P. Demeester, “Network service720
chaining with optimized network function embedding supporting service decompositions,” Computer Networks,721
vol. 93, pp. 492–505, 2015.722
[26] F. Carpio, S. Dhahri, and A. Jukan, “VNF placement with replication for load balancing in NFV networks,”723
in 2017 IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2017.724
[27] H. Fan, Y. Hu, S. Zhang, and Q. Ren, “Hardware acceleration resource allocation mechanism for VNF,”725
Procedia computer science, vol. 131, pp. 746–755, 2018.726
[28] S. Dräxler and H. Karl, “SPRING: Scaling, placement, and routing of heterogeneous services with flexible727
structures,” in 2019 IEEE Conference on Network Softwarization (NetSoft), pp. 115–123, June 2019.728
[29] G. P. Sharma, W. Tavernier, D. Colle, and M. Pickavet, “VNF-AAP: Accelerator-aware virtual network729
function placement,” 2019.730
[30] N. Kodirov, S. Bayless, F. Ruffy, I. Beschastnikh, H. H. Hoos, and A. J. Hu, “VNF chain allocation and731
management at data center scale,” in Proceedings of the 2018 Symposium on Architectures for Networking and732
Communications Systems, pp. 125–140, ACM, 2018.733
27
