Abstract-New waveforms are being considered for the fifth generation (5G) of cellular networks to exploit the underutilized, fragmented spectrum. Filter Bank Multicarrier (FBMC) is one candidate, as it provides lower adjacent channel leakage. This paper provides a first estimate of the silicon area required for FBMC in comparison to OFDM, assuming a 65nm CMOS technology. The paper concludes that the silicon area overhead introduced by this more complex waveform is acceptable.
I. INTRODUCTION
The Next Generation Mobile Networks (NGMN) Alliance highlighted the necessity to make more spectrum available in the existing sub-6 GHz radio bands and to introduce new agile waveforms that exploit the existing underutilized fragmented spectrum, in order to satisfy specific Fifth Generation (5G) operating scenarios [1].
In order to maximize spectral efficiency, strict synchronization and orthogonality between users within a single cell are imposed by the LTE and LTE-A standards. However, sporadic traffic has emerged as an important service for future generations of cellular networks (5G) [2]. Besides, spectrum allocation for LTE has led to a substantially fragmented spectrum. To cope with this fragmentation, carrier aggregation has been designed to achieve much higher rates by flexibly aggregating non-contiguous frequency bands [3]. Because spectrum is scarce and expensive, it should be used as efficiently as possible. Waveform orthogonality such as that of OFDM imposes limitations on spectrum utilization, such as the need for large guard bands with respect to other networks in order to satisfy spectral mask requirements. In addition, sporadic traffic, notably introduced by machine-type communications, suffers from serious limitations in terms of latency and signaling overhead due to synchronization constraints. Therefore, relaxed synchronization and access to fragmented spectrum have been considered key requirements for future generations of wireless networks [2], [4]. This requirement for spectrum agility has encouraged the study of alternative multicarrier waveforms such as FBMC, which provide better adjacent channel leakage performance without compromising spectral efficiency [5], [6].
Complexity evaluations of these new waveforms have already been performed [6]. However, it is often very difficult to draw conclusions about silicon cost when only a high-level complexity analysis is performed. The most advanced complexity studies have compared OFDM and FBMC implementations on FPGA [7]. However, FPGAs are highly configurable integrated circuits, and FPGA resource usage does not directly determine silicon area and, ultimately, silicon cost.
The objective of this study is to build on the work of [7] and evaluate the overhead in terms of silicon area between OFDM and one of the 5G candidates, namely FBMC.
The paper is organized as follows. After a brief description of the main implementation architectures of OFDM and FBMC, this paper compares both architectures and recalls the preliminary results found in [7]. A first silicon area estimation is then made based on the same architectures by evaluating, for both designs, the memory area requirements on the one hand and the logic cell area requirements on the other hand. Finally, a first estimate of silicon area is derived for a 65nm CMOS technology node. Section IV concludes the paper.
II. PHY ARCHITECTURE FOR OFDM AND FBMC RECEIVERS

A. OFDM Receiver architecture
The architecture of OFDM receivers has been widely investigated in the literature [8]. A typical architecture of an OFDM receiver is depicted in Figure 1.
A time domain (TD) synchronization module estimates the start of the multicarrier symbol. This information is used to align an N-point FFT that is processed on the received data every N + N_GI samples, where N_GI is the size of the guard interval of the OFDM symbol. The N points generated by the FFT are then simultaneously stored in a memory unit for later processing and used by a frequency domain synchronization detector to estimate the carrier frequency offset (CFO).
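The FFT alignment described above can be illustrated with a minimal sketch. It only computes which sample ranges feed each N-point FFT, assuming the symbol start has already been estimated by the TD synchronization module. The function name `ofdm_fft_windows` and the guard interval of 72 samples are illustrative assumptions, not values taken from the reference design.

```python
def ofdm_fft_windows(num_samples, n_fft, n_gi, symbol_start):
    """Return the (start, stop) sample ranges of successive N-point FFT windows.

    symbol_start is the index of the first guard-interval sample of the
    first symbol, assumed to be provided by the TD synchronization module.
    """
    period = n_fft + n_gi            # one FFT is processed every N + N_GI samples
    windows = []
    start = symbol_start + n_gi      # skip the guard interval of the first symbol
    while start + n_fft <= num_samples:
        windows.append((start, start + n_fft))
        start += period
    return windows


# Hypothetical numbers: N = 1024, N_GI = 72, first symbol starts at sample 10.
windows = ofdm_fft_windows(num_samples=5000, n_fft=1024, n_gi=72, symbol_start=10)
```

Each window is exactly N samples long and consecutive windows are spaced by N + N_GI samples, so the guard interval is never passed to the FFT.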
On the channel estimation datapath, CFO compensation is first performed in the frequency domain using a feed-forward approach. Then, the channel coefficients are estimated on the pilot subcarriers before interpolation for every active subcarrier. Once the channel is estimated on all the active subcarriers the response is stored in a dedicated channel response memory. Depending on the pilot carrier distribution within the time frequency grid, a time interpolation may also be performed. The data buffered in the memory unit are then processed through a one-tap per subcarrier equalizer. Demapping and Log-Likelihood Ratio (LLR) computation complete the inner receiver architecture. A soft-input Forward Error Correction (FEC) decoder finally recovers the originally sent messages.
B. FBMC Receiver architecture
Fig. 2. Typical FBMC receiver block diagram [7]
A multicarrier system can be described by a synthesis/analysis filter bank, i.e. a transmultiplexer structure. The synthesis filter bank is composed of a set of parallel transmit filters. FBMC waveforms utilize a prototype filter designed to give a good frequency localization of the subcarriers. The prototype filter considered in this paper is based on the frequency sampling technique [9]. This technique gives the advantage of using a closed-form representation that includes only a few adjustable design parameters.
The most significant parameter is the duration of the impulse response of the prototype filter, expressed by the overlapping factor K. The closed-form impulse response of the prototype filter is given in [9]; an overlapping factor of K = 4 is considered here, and N is the number of carriers. The larger the overlapping factor K, the more localized the signal will be in frequency. Adjacent carriers significantly overlap with this kind of filtering. In order to keep adjacent carriers orthogonal, real and pure imaginary values alternate on successive carrier frequencies and on successive transmitted symbols for a given carrier at the transmitter side (Offset-QAM modulation is used). The well-adjusted frequency localization of the prototype filter guarantees that only adjacent carriers interfere with each other. This allows for a more flexible operation than OFDM for Frequency Division Multiple Access (FDMA), i.e. non-synchronous flexible frequency division multiple access [10].
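As an illustration of the frequency sampling technique, the sketch below evaluates a prototype filter in the form commonly quoted in the FBMC literature for K = 4: h[n] = 1 + 2 Σ_{k=1}^{K-1} (-1)^k H_k cos(2πkn/(KN)), for n = 0, ..., KN - 1. Both this expression and the coefficients H_1 = 0.971960, H_2 = √2/2, H_3 = 0.235147 are assumptions; the exact expression and normalization used by the authors are not recoverable from this text.

```python
import math

# Frequency-domain coefficients commonly quoted for K = 4 (assumed here).
H = [1.0, 0.971960, math.sqrt(2) / 2, 0.235147]


def prototype_filter(n_carriers, k=4, coeffs=H):
    """Impulse response of length K*N built from K frequency-domain samples."""
    length = k * n_carriers
    h = []
    for n in range(length):
        acc = coeffs[0]
        for i in range(1, k):
            # The alternating sign centres the main lobe at n = K*N/2.
            acc += 2 * (-1) ** i * coeffs[i] * math.cos(2 * math.pi * i * n / length)
        h.append(acc)
    return h


# Small illustrative case: N = 16 carriers, hence 64 taps.
h = prototype_filter(n_carriers=16)
```

With these coefficients the response starts from approximately zero, is symmetric around its centre tap, and spans K multicarrier symbols, which is why K is called the overlapping factor.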
Most of the published receiver architectures are based on PolyPhase Network (PPN) receivers [9]. In this scheme, the filter bank process is applied in the time domain before the FFT using a polyphase filter. It reduces the size of the FFT but makes the receiver less tolerant to large channel delay spread or to synchronization mismatch of the FFT. Therefore, this strategy is not well adapted to hardware implementation, as it requires more control logic. In [11], the authors describe a high performance receiver architecture denoted FS-FBMC (Frequency Spreading FBMC). One advantage of this architecture comes from the fact that time synchronization may be performed in the frequency domain independently of the position of the FFT [11]. This is realized by combining time synchronization with channel equalization. Moreover, good performance is achieved for channels exhibiting large delay spread [11]. This asynchronous frequency domain processing provides a receiver architecture that allows for flexible reception and is particularly well adapted to the envisaged scenarios.
FBMC waveforms are expected to be spectrally more efficient than OFDM when relaxed synchronization between users is considered. Therefore, a preferred architecture for FBMC receivers should be able to efficiently demodulate the signal in the frequency domain without a priori knowledge of the FFT timing alignment (i.e. the location of the FFT block) [7]. An FBMC receiver architecture based on this criterion is given in Figure 2. A free-running FFT of size KN is processed every block of N/2 samples, generating KN points that are stored in a memory unit for later processing. In parallel, a frequency domain synchronization detector detects the start of burst and directly estimates the CFO at the output of the FFT. On the channel estimation datapath, CFO compensation is first performed in the frequency domain using a feed-forward approach. Then, as for OFDM, channel coefficients are estimated on the pilot subcarriers before being interpolated on every active subcarrier. Once the channel is estimated on all the active subcarriers, the response is stored for each user in a dedicated channel response memory. The data buffered in the memory unit are then processed through a one-tap per subcarrier equalizer before being filtered by the FBMC prototype filter. Demapping and Log-Likelihood Ratio (LLR) computation complete the inner receiver architecture. As far as the LLR computation is concerned, processing is slightly different for FBMC than for OFDM. Indeed, for a receiver based on the FS-FBMC architecture, the computation of the LLR associated with a bit from an observation symbol is a function of 2K − 1 channel coefficients [10].
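The free-running FFT scheduling can be sketched in the same way as for OFDM. The snippet below is only an illustration of the block timing (a KN-point window advanced by N/2 samples, with no prior alignment); the function names are hypothetical.

```python
def fs_fbmc_fft_windows(num_samples, n_carriers, k=4):
    """Return (start, stop) ranges of free-running KN-point FFT windows.

    A new window starts every N/2 samples, independently of the burst timing.
    """
    size = k * n_carriers          # FFT size KN
    hop = n_carriers // 2          # one FFT every N/2 samples
    return [(s, s + size) for s in range(0, num_samples - size + 1, hop)]


def windows_covering(sample, windows):
    """Count how many FFT windows contain a given sample index."""
    return sum(1 for start, stop in windows if start <= sample < stop)


# N = 1024 and K = 4 as in the reference design; 8192 samples is arbitrary.
windows = fs_fbmc_fft_windows(num_samples=8192, n_carriers=1024, k=4)
overlap = windows_covering(4095, windows)
```

Away from the edges of the buffer, each input sample is seen by 2K = 8 successive FFT windows, which illustrates why the FFT and its output buffer are larger than in the OFDM receiver.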
III. VLSI EVALUATION AND COMPARISON
A. FPGA reference implementation
A complexity evaluation has already been performed for a Xilinx Kintex-7 FPGA and the results are summarized in Table I [7]. The study concluded that, in terms of digital logic, when implemented on an FPGA platform, FBMC takes around 30% extra area in comparison to OFDM (as the overall amount of signal processing units such as DSP48E1 is relatively small in comparison to logic units such as Slice Registers and LUTs). However, a memory overhead of almost 300% is necessary for the FBMC implementation. This overhead essentially comes from the frequency-spreading (FS) FBMC architecture, which processes data in blocks of size KN, where K is the overlapping factor of the FBMC waveform, instead of blocks of size N for a Polyphase Network (PPN) implementation (K = 4 and N = 1024 have been used). The authors of [7] suggested that this memory overhead is not a significant issue, since it may not be as costly in a silicon design, where memory blocks are highly optimized for occupancy. The purpose of this study is to propose a first evaluation of silicon cost for FBMC receivers and to compare it to OFDM receivers.
B. Memory area requirements
Although more advanced technology nodes are currently being considered for cellular modem implementations, a 65nm Complementary Metal Oxide Semiconductor (CMOS) technology has been considered for this study. Access to the development kit for this technology was the main driver for this choice. The FPGA-based design has been analysed for both OFDM and FBMC. Memory usage is considered first for the OFDM and FBMC architectures presented in Figures 1 and 2.
For OFDM, synchronization is performed by the time domain (TD) synchronization module and followed by an FFT of size N. The TD synchronization module optimally localizes the FFT window. Successive N-point blocks are stored in a memory unit called the main memory unit, which buffers the data for later processing. In parallel, a frequency domain synchronization detector estimates the carrier frequency offset (CFO) at the output of the FFT. On the channel estimation datapath, CFO compensation is first performed in the frequency domain using a feed-forward approach. Then, channel coefficients are estimated on the pilot subcarriers before being interpolated for every active subcarrier. Once the channel is estimated on all the active subcarriers, the response is stored in a dedicated channel response memory. The data buffered in the memory unit are then processed through a one-tap per subcarrier equalizer. Demapping and Log-Likelihood Ratio (LLR) computation complete the inner receiver architecture. Soft-input FEC decoders finally recover the originally sent messages.
The corresponding FBMC receiver architecture is depicted in Figure 2. An asynchronous FFT of size KN is processed every block of N/2 samples, generating KN points. These successive KN points are stored in a memory unit, which buffers the data for later processing. In parallel, a frequency domain synchronization detector detects the start of burst and estimates the CFO directly at the output of the FFT.
Once a start of burst is detected, on the channel estimation datapath, CFO compensation is first performed in the frequency domain using a feed-forward approach. Then, as in OFDM, channel coefficients are estimated on the pilot subcarriers before being interpolated for every active subcarrier. Once the channel is estimated on all the active subcarriers, the response is stored in a dedicated channel response memory. The data buffered in the memory unit are then processed through a one-tap per subcarrier equalizer before being filtered by the FBMC prototype filter; apart from this filtering, the processing is similar to that of the OFDM receiver. Demapping and LLR computation complete the inner receiver architecture.
Four main blocks of memory are considered in both receiver designs: the main memory, the channel interpolation memories in the channel estimation function, the equalizer memory and finally the FEC interleaver memory. For OFDM, when a 1024-point FFT (equivalent to the 10MHz LTE mode) is considered, the main memory block consists of 16 I/Q OFDM symbols of 512 carriers (it is assumed that less than half of the carriers are active). We assumed that the output of the FFT consists of 16-bit complex numbers, i.e. 16 bits for each of the I and Q components. Therefore, the size of the main memory block for OFDM is equal to 512 × 16 × (16 + 16) bits, or 8192 words of 32 bits. The interpolation memories are used for channel estimation. Two memory blocks are considered, one for the datapath and one to hold the interpolation filter coefficients. Assuming pilot tones are located every 4 carriers at most, the interpolation datapath memory consists of 128 pilots and 512 interpolated carriers, or 640 words of 32 bits. The filter coefficients used to interpolate the channel are coded on 12 bits and consist of 256 real coefficients. Furthermore, for added flexibility, the indices of the active carriers and of the pilots are dynamically stored and consist of 4224 words of 12 bits. Therefore, the interpolation coefficient memory has a size of 4480 words of 12 bits.
The equalizer memory is located in the inner decoder and holds the 512 active carriers at the output of the equalizer and the associated channel information for each carrier. It therefore consists of 512 words of 32 + 16 = 48 bits. The channel information is assumed real and represented on 16 bits, while the data are assumed complex and represented on 32 bits. Finally, the interleaver memory is considered with an interleaving depth of 2048 words of 6-bit LLR information. Because the block is implemented as a ping-pong buffer, the required memory has a size of 4096 words of 6 bits.
For FBMC, the frequency spreading receiver architecture has been considered. In this case, the FFT is K times larger than the number of carriers. With an overlapping factor of K = 4, the FFT size is equal to 4096 instead of 1024 points. The main memory block is hence increased by the same factor and must have a size of 32768 words of 32 bits. The interpolation memories used for channel estimation are similar to the ones used by OFDM. Two main memory blocks are considered, one for the datapath and one to hold the interpolation filter coefficients. Assuming the pilot tones are located every 4 carriers at most, the interpolation datapath memory consists of 128 pilots and 512 × 4 = 2048 interpolated carriers. This makes the datapath interpolation memory 2176 words of 32 bits. For the filter coefficients, the same hypotheses as for OFDM are considered: a memory of 4480 words of 12 bits (256 filter coefficients and 4224 carrier and pilot indices), which is consistent with the total reported in Table II. Finally, equalizer and interleaver memories are assumed to be of the same size for FBMC as for OFDM, since these processes are performed after the data signal has been filtered by the FBMC prototype filter and decimated.
TABLE II (recovered row). Total amount of memory (in bits): OFDM 385536, FBMC 1221120
A summary of the memory requirements for the described architectures is given in Table II. Memory requirements for FBMC are around 3 times larger than for OFDM when the total amount of required memory bits is compared. This was expected and is already evident from Table I: Block RAM usage is 2.5 times larger for FBMC than for OFDM.
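The totals of Table II can be cross-checked from the block sizes given in the text. The sketch below is a simple tally; the dictionary keys are illustrative names, and the interpolation coefficient memory is taken as 4480 words of 12 bits for both receivers, which is the value that reproduces the reported totals.

```python
def memory_bits(blocks):
    """Total number of bits for a set of (words, bits_per_word) memory blocks."""
    return sum(words * width for words, width in blocks.values())


# OFDM receiver memories as described in the text: (words, bits per word).
ofdm = {
    "main": (8192, 32),
    "interp_datapath": (640, 32),
    "interp_coeffs": (4480, 12),
    "equalizer": (512, 48),
    "interleaver": (4096, 6),      # ping-pong buffer, 2 x 2048 LLR words
}

# FBMC differs only in the memories located before prototype filtering.
fbmc = dict(ofdm, main=(32768, 32), interp_datapath=(2176, 32))

ofdm_total = memory_bits(ofdm)
fbmc_total = memory_bits(fbmc)
ratio = fbmc_total / ofdm_total
```

The main memory alone accounts for 1048576 of the 1221120 bits of the FBMC receiver, which is why the overall ratio (about 3.2) stays below the factor K = 4 applied to that block.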
Each of the identified memory blocks has been generated for a 65nm CMOS VLSI implementation. Area, maximum operating speed and power consumption have been reported for each generated memory block. Memory area requirements have also been given in terms of equivalent kgates. For the 65nm CMOS technology, we used the reference figure of 855 kgates per mm². The results are given in Table III. In terms of area, the memory required for OFDM is 3 times smaller than the memory required for FBMC. Despite the very dense optimisation of memory blocks for VLSI applications, memory area is dominated by what we have designated as the main memory. Standard VLSI memory generators have size limitations of around 8192 addresses for 32-bit wide words. Since the main memory is relatively large for FBMC, 4 blocks of 8192 words had to be generated in order to realize the 32768-word main memory. Hence, the FBMC main memory did not benefit from the expected scaling effect of the technology when compared to the main memory of OFDM. The maximum speed of the generated memories is around 350MHz for both designs, and the power consumption when the memories are operated at 100MHz is estimated at around 20mW for FBMC and 10mW for OFDM. At this frequency, power consumption is dominated by the main memory. When the operating frequency is increased from 100MHz to 340MHz, power consumption increases as expected and is estimated at around 40mW for FBMC and 30mW for OFDM, as the power consumption of the other memories becomes more significant with respect to that of the main memory.
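The two rules of thumb used in this paragraph, the memory generator depth limit and the kgates-per-mm² conversion, can be written down as a minimal sketch. The constant and function names are illustrative; the 8192-word limit and the 855 kgates per mm² figure are the ones quoted in the text.

```python
import math

MAX_WORDS_PER_MACRO = 8192     # approximate generator limit for 32-bit words
KGATES_PER_MM2 = 855           # reference density used for the 65nm technology


def macros_needed(words):
    """Number of generated memory macros required to hold `words` words."""
    return math.ceil(words / MAX_WORDS_PER_MACRO)


def area_to_kgates(area_mm2):
    """Convert a silicon area in mm^2 to equivalent kgates."""
    return area_mm2 * KGATES_PER_MM2


fbmc_macros = macros_needed(32768)   # FBMC main memory
ofdm_macros = macros_needed(8192)    # OFDM main memory
```

Because the FBMC main memory is built from four separate macros, its area scales roughly linearly with the number of words instead of benefiting from the density gain of a single larger macro.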
C. Logic area requirements
Finally, the area of the implemented logic has been estimated for both OFDM and FBMC. Starting from Table I, the same architecture design can be evaluated for an ASIC VLSI implementation. Memory usage has already been estimated independently. Combinatorial logic, register logic and DSP logic should therefore be evaluated. It is usually not straightforward to evaluate the equivalent gate count of a design from an FPGA synthesis report. However, an upper bound of the gate count can be estimated by using the following assumption: combinatorial LUTs, slice registers and DSP logic scale similarly for a VLSI implementation as for an FPGA implementation for these types of design. Furthermore, DSP48 cells implement an 18 × 18 multiplier on the Kintex-7 and can be bounded to a maximum of 2500 equivalent NAND gates. Slice registers can be used to implement up to an 8-bit register and can be estimated to be equivalent to 60 NAND gates. Finally, LUTs are used for combinatorial logic and implement functions ranging from 1 equivalent NAND gate (typically when an inverter is considered) up to 200 NAND gates, when they are not used as internal memory. We consider that, when the design is relatively large, an upper bound of LUT usage can be set to 120 equivalent gates. With these hypotheses, an upper-bound area estimate has been computed for both receiver designs (OFDM and FBMC) and is summarized in Table IV. We found that the proposed architecture requires a maximum of 9.1mm² for OFDM and 12.2mm² for FBMC.
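The upper-bound estimate amounts to a weighted sum of FPGA resource counts. The sketch below applies the per-cell bounds quoted in the text (2500, 60 and 120 equivalent NAND gates) and the 855 kgates per mm² density. The resource counts in the example are purely hypothetical placeholders, since the values of Table I are not reproduced here.

```python
# Upper-bound equivalent NAND gates per FPGA primitive, as quoted in the text.
UPPER_BOUND_GATES = {"dsp48": 2500, "slice_reg": 60, "lut": 120}
KGATES_PER_MM2 = 855


def logic_area_mm2(resources, gates_per_cell):
    """Estimate logic area from FPGA resource counts and per-cell gate weights."""
    gates = sum(count * gates_per_cell[name] for name, count in resources.items())
    return gates / (KGATES_PER_MM2 * 1000)


# Hypothetical resource counts, for illustration only (not taken from Table I).
example_resources = {"dsp48": 100, "slice_reg": 20000, "lut": 30000}
example_area = logic_area_mm2(example_resources, UPPER_BOUND_GATES)
```

With these weights the LUT and register terms dominate the sum for typical designs, which is consistent with the observation that the small number of DSP48 cells has little influence on the comparison.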
In order to refine the area estimate, the FEC decoder has been fully synthesized using the 65nm CMOS technology. The principle of the proposed estimation approach is to take a representative block of the receiver as a benchmark. Because the receiver area is mainly dominated by datapath logic, the results found for the representative block can be used to evaluate the rest of the design. Synthesis results for the FEC decoder give an area of around 0.5mm², or 420 kgates. We hence estimated that the design is mainly based on an equivalent 14-bit datapath and therefore that DSP48 cells account for an equivalent 1200 gates. Slice registers are estimated to use around 50% of the logic available per slice register cell and thus account for 30 equivalent NAND gates per cell for this design. Finally, the LUT figure is adjusted so that the estimate for the FEC design matches the synthesized area. The adjusted average figure for LUTs is thus equal to 60 equivalent NAND gates per LUT. These figures are used to derive a more refined area estimate of both receiver designs. With these hypotheses, the best estimates of silicon area are 4.7mm² for the OFDM receiver design and 6.5mm² for the FBMC receiver design. A significant amount of area is taken by the inner receiver, which includes the channel estimation and demapping. As expected, memory area is much less significant than datapath digital logic. Memory, FFT and FEC decoders each account for approximately 10% of the design area. Control accounts for 20% of the design area. This is because control registers and their associated decoding logic are quite complex, as the reference design is highly flexible. Finally, the inner receiver (equalizer, channel estimation and demapping) accounts for the remaining half of the design. When OFDM and FBMC are compared in terms of silicon area, an overhead of around 40% is expected for the FBMC receiver. This overhead tends to confirm that memory usage is not a major contributor to silicon area, as memory is highly optimized for occupancy.
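The reported figures can be checked with simple arithmetic. The sketch below computes the relative overhead from the two refined area estimates and applies the approximate area shares quoted in the text; applying the same rounded shares to both designs is an assumption made for illustration only.

```python
OFDM_AREA_MM2 = 4.7
FBMC_AREA_MM2 = 6.5

# Approximate area shares quoted in the text (rounded figures).
BREAKDOWN = {
    "memory": 0.10,
    "fft": 0.10,
    "fec_decoder": 0.10,
    "control": 0.20,
    "inner_receiver": 0.50,
}


def split_area(total_mm2):
    """Distribute a total area over the functional blocks."""
    return {block: total_mm2 * share for block, share in BREAKDOWN.items()}


overhead = FBMC_AREA_MM2 / OFDM_AREA_MM2 - 1     # relative FBMC overhead
ofdm_blocks = split_area(OFDM_AREA_MM2)
```

The ratio evaluates to roughly 0.38, in line with the overhead of around 40% stated for the FBMC receiver.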
IV. CONCLUSION
Alternative waveforms have been considered for 5G, and this paper is a first attempt to evaluate the silicon cost of one of the candidates, namely FBMC, compared to OFDM. Typical receiver silicon area has been estimated at around 4.7mm² for OFDM and 6.5mm² for FBMC, assuming a 65nm CMOS technology. More than the absolute silicon area estimate, it is the area overhead that should be underlined: despite a significant increase in complexity, the FBMC design is estimated to be only 40% larger than the OFDM design. Moreover, this alternative waveform offers a very flexible design that matches the dynamic spectrum access requirements of the next generation of cellular networks.
V. ACKNOWLEDGMENT
