Abstract-Voltage and frequency are dynamically scaled to produce energy-efficient multi-core network-on-chip (NoC) architectures. The techniques employed for this purpose are analyzed in detail, and the most effective ones, judged by their optimized performance, are reported. We also highlight the most promising high-performance and energy-minimizing techniques.
I. INTRODUCTION
Power reduction and energy efficiency are among the key metrics in the design of upcoming multi-core network-on-chip (NoC) architectures. The NoC is the high-performance and scalable alternative to the old-fashioned bus-based architecture. However, the NoC consumes a lot of power [3]. Dynamic voltage and frequency scaling (DVFS) is a hardware feature that adjusts, in real time, the operating voltage and clock frequency of a processor [19, 20]. It is an effective technique for increasing the energy efficiency of a network by reducing the energy dissipation. The key idea is to provide the circuit with just enough speed and voltage to process the assigned workload. The technique is implemented by moving the processor from a high-power state to a low-power state when the workload decreases [5, 12, 13, 16].
This work is inspired by emerging technological trends. Modern chips have several cores embedded on them and are designed to achieve optimal power-performance [25]. The applications running on these cores vary in the power-performance they require to complete an assigned task. A designer needs to keep this variation in mind and adjust the voltage and frequency judiciously, in proportion to the workload. DVFS is used to improve energy efficiency during the low-utilization phases of applications. Lowering the applied voltage and frequency in the low-utilization phase helps prevent thermal violations during sustained high utilization [18].
The remainder of the paper is organized as follows. Section II covers the basics of the DVFS technique. Section III discusses the relationship between voltage and frequency. The choice of DVFS as the design technique for voltage-frequency scaling is discussed in Section IV. Different methodologies of using DVFS are detailed in Section V. Section VI concludes the survey.
II. BASICS OF DVFS
Power dissipation in digital Complementary Metal Oxide Semiconductor (CMOS) circuits has two major sources. The first, arising from bias and leakage currents, is known as static power [26]. Static power depends on the process and design technologies [22]. The second is called dynamic power and arises from the charging and discharging of the node capacitances of the circuit [11]. The aim of DVFS is to reduce the dynamic power, which can be modeled as [11]:
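In its widely used form, given here as an assumption about the model in [11] rather than a quotation of it, the dynamic power is written with a switching activity factor $\alpha$ and a switched node capacitance $C$ (both symbols are introduced here for illustration only):

```latex
\begin{equation}
P_{dyn} = \alpha \, C \, V_{DD}^{2} \, f_{CLK} \tag{1}
\end{equation}
```

Here $V_{DD}$ is the supply voltage and $f_{CLK}$ is the clock frequency.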
III. V/F SCALING LINEARLY
For efficient control and smooth application of DVFS, a linear relationship is expected between the clock frequency and the applied voltage [8]. Because the frequency depends linearly on the voltage, lowering the voltage has the same proportionate effect as lowering the frequency, so the power consumption can be modeled in terms of voltage alone. However, if the ratio of voltage to frequency scaling is not linear, the delay line has to be reprogrammed.
Assuming a linear relationship between $V_{DD}$ and $f_{CLK}$, the power dissipation is given as:
$$P_{dyn} \propto V_{DD}^{3} \qquad (2)$$
The key design idea of DVFS is governed by the above proportional relationship: a reduction in the clock frequency or supply voltage results in a cubic decrease in the power consumed.
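The cubic relationship can be checked numerically. The sketch below assumes the standard dynamic power model P = a * C * V^2 * f, with the frequency scaled in proportion to the voltage; the function names, the activity factor, the capacitance, and the nominal operating point are illustrative assumptions and are not taken from the surveyed work.

```python
def dynamic_power(v_dd, f_clk, activity=0.5, capacitance=1e-9):
    """Dynamic power in watts under the assumed model P = a * C * V^2 * f."""
    return activity * capacitance * v_dd ** 2 * f_clk


def scaled_power_ratio(scale):
    """Ratio of scaled to nominal power when V_DD and f_CLK are both
    multiplied by `scale` (the linear V/F assumption)."""
    v_nom, f_nom = 1.0, 1.0e9  # hypothetical nominal operating point
    nominal = dynamic_power(v_nom, f_nom)
    scaled = dynamic_power(v_nom * scale, f_nom * scale)
    return scaled / nominal


ratio = scaled_power_ratio(0.8)
print(f"power ratio at 80% V/F: {ratio:.3f}")
```

Under these assumptions, running at 80% of the nominal voltage and frequency leaves roughly half of the nominal dynamic power, since 0.8 cubed is 0.512.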
IV. DESIGN OBJECTIVE
The design objective in applying DVFS is a design with the lowest leakage power in the NoC nodes during the inactive phase. Low leakage power can be obtained by assigning the sleep mode to the inactive cores of a NoC, which is done by reducing the voltage applied to the node. However, a node in the sleep mode is not completely turned off; it remains functional at a minimum voltage cost [6]. Performance worsens when the supply voltage is scaled aggressively to improve the energy efficiency [27].
Energy-saving opportunities depend on the workload characteristics and vary accordingly. As the node count increases for a given problem size, the energy efficiency decreases. Energy-saving opportunities increase as the workload of a node falls below its utilization threshold [15]. With DVFS, the energy efficiency tends to be higher than with other voltage and frequency (V/F) scaling techniques, such as Dynamic Power Management (DPM), Dynamic Voltage Scaling (DVS), and Dynamic Frequency Scaling (DFS). Because of the exponential increase in transistor density per core and in the level of compaction, a mechanism is required that yields high performance while maintaining a good energy budget. DVFS meets this need and has a positive impact on the growth of multi-core architectures.
V. EVOLUTION OF DVFS
A lot of research has been done on improving the energy efficiency of today's multi-core architectures [1]. A brief overview of a few recent approaches to power optimization using DVFS in multi-core NoCs is presented here. Lee et al. [14] emphasized the need for a Power Management Unit (PMU) that controls the generation of the supply voltage and clock to fine-tune the speed of the target domain. The occupancy level and performance requirement of each domain are reported to the PMU, which decides the frequency and voltage update for each power domain. V/F line usage by the PMU causes the system to lock the V/F line [14]. The lock essentially discourages multi-threading.
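The per-domain decision made by a PMU can be illustrated with a minimal sketch. It assumes a small table of discrete V/F operating points and a PMU that picks, for each domain, the lowest point whose frequency covers the reported demand. The table values, the `select_operating_point` and `pmu_decide` names, and the selection rule are hypothetical and are not taken from [14].

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    voltage_v: float
    freq_mhz: float


# Hypothetical V/F table, sorted from lowest to highest power.
OPERATING_POINTS = [
    OperatingPoint(0.7, 400.0),
    OperatingPoint(0.9, 800.0),
    OperatingPoint(1.1, 1200.0),
]


def select_operating_point(occupancy, required_mhz):
    """Return the lowest V/F point that serves the domain's demand.

    occupancy    -- fraction of time the domain is busy (0.0 to 1.0)
    required_mhz -- frequency needed at full occupancy
    """
    demand_mhz = occupancy * required_mhz
    for point in OPERATING_POINTS:
        if point.freq_mhz >= demand_mhz:
            return point
    return OPERATING_POINTS[-1]  # saturate at the highest point


def pmu_decide(domains):
    """Map each power domain name to its chosen operating point."""
    return {
        name: select_operating_point(occupancy, required_mhz)
        for name, (occupancy, required_mhz) in domains.items()
    }


decisions = pmu_decide({"cpu0": (0.3, 1200.0), "cpu1": (0.9, 1200.0)})
for name, point in decisions.items():
    print(name, point.voltage_v, point.freq_mhz)
```

In this example the lightly loaded domain is assigned the lowest point while the busy domain keeps the highest one, which is the behavior the PMU description implies.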
Malkowski et al. [17] exploit the fact that memory-bound code can be power-optimized to save energy. As soon as the application enters a memory-bound phase, prefetchers are activated. In the memory-bound phase, the execution time is dominated by the latency of accesses to the main memory, and applications tend to be idle, waiting for the required data. Consequently, reducing the CPU voltage and frequency in this phase does not lower the performance, and power optimization can be achieved.
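Why a memory-bound phase tolerates a lower CPU frequency can be seen with a simple two-term execution-time model. The sketch assumes that compute time scales inversely with the CPU frequency while memory-stall time is independent of it. The model, the function names, and all numbers are illustrative assumptions, not results from [17].

```python
def execution_time(compute_s, memory_s, freq_scale):
    """Estimated run time when the CPU clock is multiplied by freq_scale.

    compute_s -- time spent computing at nominal frequency (scales with 1/f)
    memory_s  -- time stalled on main memory (assumed frequency-independent)
    """
    return compute_s / freq_scale + memory_s


def slowdown(compute_s, memory_s, freq_scale):
    """Relative run-time increase caused by scaling the CPU clock."""
    nominal = execution_time(compute_s, memory_s, 1.0)
    return execution_time(compute_s, memory_s, freq_scale) / nominal - 1.0


# Hypothetical phases, each 1 s long at nominal frequency.
memory_bound = slowdown(compute_s=0.1, memory_s=0.9, freq_scale=0.5)
compute_bound = slowdown(compute_s=0.9, memory_s=0.1, freq_scale=0.5)
print(f"memory-bound slowdown:  {memory_bound:.0%}")
print(f"compute-bound slowdown: {compute_bound:.0%}")
```

Under these assumptions, halving the clock costs about 10% of run time in the memory-bound phase but about 90% in the compute-bound phase, so the memory-bound phase is the better candidate for V/F reduction.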
Beigne et al. [2] proposed a fully power-aware, locally synchronous and globally asynchronous NoC circuit for implementing the DVFS mechanism. Adaptive design techniques are used for each synchronous NoC unit. The main CPU is required to directly disable or enable the units for entering the power-off mode and executing the reset phase [2]. Multiple supply lines for V/F decrease the voltage-frequency transition delay, so the system becomes more time-efficient.
Howard et al. [10] described an evolutionary approach to fine-grained power management of multi-core NoCs using Voltage and Frequency Islands (VFIs). The power management protocols in the software exploit the benefits of the VFIs through DVFS. With DVFS active, a reduction of 80% in the measured power can be achieved compared to the phase when DVFS is inactive.
The performance of a many-core processor can be increased by using more cores and parallelizing the workloads [7]. Many-core processors with DVFS can optimize the power-performance of parallel workloads by varying the V/F level of the VFIs.
Per-core DVFS achieves better application performance and control accuracy than per-chip DVFS. It is known that DVFS allows a cubic reduction in power density for each core of a Chip Multi-processor (CMP) [23]. The performance of the CMP can be further enhanced by minimizing the transition time of V/F scaling. Cores built from interband Tunnel Field-Effect Transistors (TFETs) are more energy-efficient at low frequencies than CMOS cores. Therefore, a thread migration scheme has been proposed that combines the two types of cores in a multi-core architecture and applies DVFS for V/F scaling. Low-frequency applications are migrated to the TFET cores, while high-frequency applications are handled more efficiently by the CMOS cores. The scheme achieves average dynamic energy and leakage energy savings of 17% and 30%, respectively, at the cost of a 1% performance degradation [21].
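The migration policy can be sketched as a simple threshold rule. The crossover frequency, the `assign_core_type` and `migrate` names, and the thread list are hypothetical; the text above does not give the decision rule used in [21].

```python
# Hypothetical crossover frequency below which a TFET core is assumed
# to be more energy-efficient than a CMOS core.
TFET_CROSSOVER_MHZ = 500.0


def assign_core_type(required_mhz, crossover_mhz=TFET_CROSSOVER_MHZ):
    """Choose the core type for a thread from its frequency demand."""
    return "TFET" if required_mhz <= crossover_mhz else "CMOS"


def migrate(threads):
    """Group thread names by the core type they should run on.

    threads -- dict mapping thread name to required frequency in MHz
    """
    placement = {"TFET": [], "CMOS": []}
    for name, required_mhz in threads.items():
        placement[assign_core_type(required_mhz)].append(name)
    return placement


placement = migrate({"logger": 200.0, "codec": 1500.0, "sensor": 350.0})
print(placement)
```

A real scheme would also have to weigh the migration cost against the expected saving, which this sketch ignores.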
Herbert et al. [8] proposed using a process variation parameter to shift work to efficient dies or cores. The work shifting can be divided into two levels: (a) among dies belonging to a given speed bin and (b) between VFIs on a given die. Traditional DVFS schemes have neglected both the spatially correlated within-die process variations and the die-to-die variations, which, if addressed, can improve energy efficiency [8]. A balance between performance and power in a CMP can be achieved efficiently using fine-grained DVFS [9]. Therefore, CMPs are moving from full-chip V/F control towards finer-grained methods. By shifting work from inefficient, leaky processing units to less leaky and more efficient ones, more power can be conserved [8].
Thread packing along with DVFS is used to maximize efficiency [4]. Reda et al. [18] proposed a novel scheme that specifies the number of threads running on a core and uses a power cap mechanism to maximize the performance of multi-threaded workloads mapped onto a variable number of cores. The thread reduction methodology, with the power cap enabled, limits the maximum power that a particular core can use at any instant of time. Power is conserved by assigning the sleep mode to the idle cores [18]. The technique was tested on a quad-core (Intel Core i7) server and reduced the workload energy consumption by an average of 51.6% compared to existing control techniques that implement DVFS alone.
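The combination of thread packing and a power cap can be illustrated with a minimal sketch. It assumes a fixed power draw per active core and a round-robin packing rule; the `pack_threads` name, the parameters, and the numbers are hypothetical and do not reproduce the controller in [18].

```python
def pack_threads(n_threads, n_cores, power_cap_w, core_power_w):
    """Pack threads onto as many cores as the power cap allows.

    Returns (assignment, sleeping), where assignment[i] lists the thread
    ids placed on active core i and sleeping lists the ids of idle cores.
    """
    affordable = int(power_cap_w // core_power_w)
    # At least one core stays active so the workload can make progress.
    active = max(1, min(n_cores, n_threads, affordable))
    assignment = [[] for _ in range(active)]
    for thread_id in range(n_threads):
        assignment[thread_id % active].append(thread_id)  # round-robin
    sleeping = list(range(active, n_cores))
    return assignment, sleeping


assignment, sleeping = pack_threads(
    n_threads=6, n_cores=4, power_cap_w=45.0, core_power_w=20.0
)
print("active cores:", assignment)
print("sleeping cores:", sleeping)
```

With a 45 W cap and an assumed 20 W per core, only two cores stay active; the six threads are packed onto them and the remaining two cores are put to sleep.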
Energy consumption can be reduced further, in comparison to all the DVFS techniques discussed above, if the processor is put into the deep sleep state [24]. In the deep sleep mode, the processor is paused and its power consumption is significantly low [24]. As the intensity of the workload increases from light to medium, DVFS is used to guarantee high performance by scaling the processor frequency. Table 1 summarizes the research on energy efficiency using DVFS in multi-core NoCs. The comparison in the table provides a reference to the parameters that need to be considered when choosing the best design technique for an implementation using DVFS. A check mark (✓) in the table indicates that the parameter is fulfilled, a cross mark (✗) indicates that the parameter under consideration is unfulfilled, and a negative 1 indicates a performance degradation of -1%. An N/A entry indicates that the parameter was either inapplicable or not discussed.
Table 1. Energy efficiency using DVFS in multi-core NoC
Ref.    Reported value    Performance degradation
[12]    19%               -1%
[4]     10%
[3]     20-60%
[13]    Info. N/A
[2]     22-86%
[6]     61.69%
[14]    63%
VI. CONCLUSIONS
Techniques using DVFS have been studied in detail to find the most energy-efficient way of implementing DVFS. The implementation of traditional DVFS operations can be decomposed into two parts, (a) clock transition and (b) voltage transition, depending on the workload of the processor. Optimal core mapping also contributes to boosting the performance. DVFS has become an integral part of the NoC, both for saving power and for reducing congestion by increasing the frequency of the congested router. Therefore, DVFS is essential for increasing the throughput of the system.
