Battery-operated low-power portable computing devices are becoming an inseparable part of human daily life. One of the major goals is to achieve the longest battery life in such a device. Additionally, the need for performance in processing multimedia content is ever increasing. Processing image and video content consumes more power than other applications. A widely used approach to improving energy efficiency is to implement the computationally intensive functions as digital hardware accelerators. Spatial filtering is one of the most commonly used methods of digital image processing. As per Fourier theory, an image can be considered as a two-dimensional signal that is composed of spatially extended two-dimensional sinusoidal patterns called gratings. Spatial frequency theory states that sinusoidal gratings can be characterised by their spatial frequency, phase, amplitude, and orientation. This article presents results from our investigation into the impact of these characteristics of a digital image on the energy efficiency of hardware-accelerated spatial filters employed to process the same image. Two greyscale images, each of size 128 × 128 pixels and comprising two-dimensional sinusoidal gratings at the maximum spatial frequency of 64 cycles per image, orientated at 0° and 90°, respectively, were processed in a hardware-implemented Gaussian smoothing filter. The energy efficiency of the filter was compared with the baseline energy efficiency of processing a featureless plain black image. The results show that the energy efficiency of the filter drops to 12.5% when the gratings are orientated at 0°, whilst it rises to 72.38% at 90°.
INTRODUCTION AND MOTIVATION
For the past 30 years, Moore's law together with Dennard scaling have driven the era of modern computing, providing exponential increases in performance. Moore's law [25] states that the number of transistors in an integrated circuit doubles approximately every 2 years, whereas Dennard scaling [8] claims that even though transistors get smaller, their power density remains constant. Another related law, Koomey's law [20], states that performance per watt would double every 1.57 years. However, the achievable integrated circuit density has exceeded the levels within which Dennard scaling and Koomey's law were applicable, and the computational capabilities of multi-cores are still rising, but with much less enhancement in energy efficiency. The International Technology Roadmap for Semiconductors (ITRS) reported that, following Moore's law, the transistor density continues to double every 2 years; however, the energy efficiency of transistors is increasing only by 1.4×. This shortfall in energy efficiency indicates the end of the Dennardian scaling era, where progress was measured by improvements in transistor count and speed, and the beginning of a new era, where advances are measured by improvements in transistor energy efficiency [2]. All of this has resulted in another technological constraint known as the utilisation wall, which limits the portion of the chip that can be used at full performance within the power budget at the same time [11]. This constraint limits the number of transistors that can be active at a given time. Therefore, some parts of the chip, i.e., transistors, have to remain inactive or underperforming to allow the chip to function within the power budget. This inactivity in the chip presents the current major technological issue of dark silicon.
It is important to mention the three key prevailing technological bottlenecks for high-performance computational efficiency gains. These are the memory bottleneck, the Instruction-Level Parallelism (ILP) bottleneck, and the power bottleneck. The memory bottleneck relates to the recognised technological constraint that memory speed does not increase as fast as computing speed, and as a result it is difficult to hide memory latency. ILP quantifies the number of instructions that can be executed in a single clock cycle. However, Amdahl's law [1] states that the maximum speedup of a program is limited by the serial portion of the code. This limitation presents the ILP bottleneck. The utilisation wall and dark silicon together present the power bottleneck. Therefore, it becomes necessary to explore all the avenues of reducing power consumption and improving the energy efficiency of such a digital system. Energy efficiency optimisation has become an essential objective in the design of modern embedded systems. The main motivation of this article is to address the third bottleneck, the power bottleneck.
Portable mobile devices such as tablets, mobile phones, IoT devices, wearable computing devices, and so on, are becoming part of daily human life [23] . Many such devices with a screen or a camera include some form of digital image processing circuit. These devices mainly run on a battery, and therefore battery lifetime is a critical factor for their continued functioning. It has now been established that multimedia applications that involve processing image and video content dominate the power consumption in any battery-operated computing device.
Digital images are essentially a collection of pixels. These pixels are samples of intensity values that are represented in the form of binary numbers comprising 1s and 0s. The variation in the content of digital images can be considered to be the variation in the values of the constituent pixels and vice versa. When the image is processed in a hardware-accelerated image processing block, which is fundamentally a digital logic circuit, these binary numbers directly contribute to the switching of the digital logic circuit. It is now well known that the amount of switching is one of the major contributing factors in the dynamic power consumption of a digital logic circuit. Therefore, the binary pixel values must have some direct impact on the power consumption and thus the energy efficiency of the circuit.
Let us examine the structure of an image closely by taking an example of a greyscale image of size 128 × 128 pixels, as shown in Figure 1. A greyscale image is composed of pixels, and these pixel values range from 0 to 255 if the pixel width is, as is typical, 8 bits. Let the shade of the image be just plain grey, i.e., all the pixels have the same value. Let us, for the sake of this example, take the value of the pixel, i.e., the shade, to be 170 in decimal. If this number is represented in hexadecimal, then it is 0xAA, and in binary it is 10101010. Table 1 shows a generic binary representation of the grey image.
Investigating the Impact of Image Content on the Energy Efficiency 57:3
Note that, moving from one pixel to the next and, within a pixel, from one bit to the next, the bit value transitions from 1 to 0 and from 0 to 1 are purely due to the binary representation of the pixel data. On a screen, this image would appear featureless and contentless to human eyes, as shown in Figure 1. This means that even if the visual content in an image is not changing spatially, switching still exists purely because of the way pixel values are represented. Therefore, if the content changes, then the characteristics of the switching will change even more. In a video of a newsreader delivering a news bulletin, one can say that the content is nearly static and changing very slowly in comparison with a sports video [22]. However, as explained, there is always inherent continuous switching activity due to the way pixel values are represented in the binary number system, and this switching activity cannot be avoided.
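The inherent switching described above can be illustrated with a short sketch. It is not part of the original study: the function name `intra_pixel_transitions` is hypothetical, and counting changes between adjacent bit positions is only an assumed, simplified proxy for the transitions visible in the binary representation of a pixel.

```python
def intra_pixel_transitions(value: int, width: int = 8) -> int:
    """Count 0->1 and 1->0 changes between adjacent bits of one pixel value."""
    bits = format(value, f"0{width}b")  # e.g. 170 -> "10101010"
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# A plain grey image row: every pixel holds the same value, 170 (0xAA).
grey_row = [170] * 128

print(format(170, "08b"))                                  # binary form of the grey shade
print(intra_pixel_transitions(170))                        # transitions inside one grey pixel
print(intra_pixel_transitions(0))                          # plain black pixel
print(sum(intra_pixel_transitions(p) for p in grey_row))   # one row of the grey image
```

Under this proxy, the visually featureless grey image carries many bit-level transitions, whereas a plain black image carries none.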
This phenomenon of inherent switching warranted some initial empirical evidence to motivate a more detailed investigation. Therefore, a two-dimensional Gaussian filter with a 5 × 5 kernel and an input clock frequency of 500MHz was implemented for a Xilinx Virtex-6 [3] Field Programmable Gate Array (FPGA) using the Xilinx System Generator [38, 24] Electronic Design Automation (EDA) tool. Four images, each of size 640 × 480 pixels, were processed in the filter: Lena, a chequerboard, a plain white image, and a plain black image. The dynamic power consumed in processing each of the images was estimated using the simulation-based power estimation design flow in the Xilinx System Generator. The results, as detailed in Table 2, intrigued us and pointed to a new, more focused direction: investigating the impact of image content on the energy efficiency of the filter.
The research question that begs an answer is as follows: Does the content in an arbitrary image have an impact on the energy efficiency of a hardware-accelerated image processing function employed to filter the same image? If so, then what is the impact, how can it be quantified? To answer this question objectively, other related questions are required to be answered first. This article attempts to answer the following subordinate research questions using supporting evidence provided in the existing literature:
• How can the content of an image be quantified such that the relationship between the content and energy efficiency can be investigated?
• What operations can be commonly performed to process a digital image?
BACKGROUND AND LITERATURE REVIEW
The research in this article combines research from several fields. Consequently, contrary to the usual expectations of a literature review, most of the available literature could not be critically reviewed directly against the research work presented in this article. Nonetheless, the review presented in this section is driven by the research questions and seeks to provide background information and support for the main and subordinate research questions, deductions, experimental design, results, and findings presented in this article, as follows:
How can the content of an image be quantified such that the relationship between the content and energy efficiency can be investigated? Spatial frequency theory [27] defines an image as an accumulation of many primitive spatial "atoms" whereby these primitives are spatially extended patterns called sinusoidal gratings. Sinusoidal gratings are two-dimensional patterns whose luminance varies according to the sine wave over one spatial dimension and is constant over the perpendicular dimension. The primitive sinusoidal gratings can be characterised using four parameters: spatial frequency, phase, amplitude, and orientation. Applying the Fourier analysis method to a two-dimensional image produces a sum of a set of sinusoidal gratings that vary in spatial frequency, phase, amplitude, and orientation. The summation of all of these gratings at the proper amplitudes and phases would produce the original image. Fourier analysis can be used to decompose complex images into primitive components [9, 21, 27, 29, 37] .
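The decomposition idea above can be sketched in one dimension. The following is a minimal illustration, assuming a single row of a grating whose luminance varies horizontally; the helper names `grating_row` and `dft_magnitudes` are hypothetical, and a naive discrete Fourier transform is used only to keep the example free of external libraries.

```python
import cmath
import math
from typing import List


def grating_row(width: int, cycles: float, phase: float = 0.0) -> List[float]:
    """One row of a sinusoidal grating, with luminance scaled to [0, 1]."""
    return [0.5 + 0.5 * math.sin(2 * math.pi * cycles * x / width + phase)
            for x in range(width)]


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Naive discrete Fourier transform magnitudes (O(n^2), fine for a demo)."""
    n = len(signal)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * x / n)
                    for x, s in enumerate(signal)))
            for k in range(n)]


row = grating_row(width=32, cycles=4)
mags = dft_magnitudes(row)
# Skip bin 0 (the mean luminance) and search the non-redundant half only.
peak_bin = max(range(1, 17), key=lambda k: mags[k])
print(peak_bin)  # index of the dominant spatial frequency, in cycles per row
```

The dominant bin of the spectrum recovers the spatial frequency used to synthesise the row, which is the sense in which a grating is a primitive component of the image.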
What operations can be commonly performed to process a digital image? Digital image processing operations are typically classified into three categories: Point-based Operations, Local Neighbourhood Operations, and Global Operations. The Local Neighbourhood Operations exploit and work on the spatial characteristics around a pixel; therefore, these types of operations are also called spatial filters. The focus of this research was on the image processing operations that work on the spatial characteristics of an image. In a spatial filter, the input is typically convolved with a filter mask, or kernel, to generate the output, as shown in Figure 2. The kernel contains weights or coefficients for producing the desired filter response. Spatial filters are widely used in image processing and in the pre-processing stages of image processing pipelines.
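The convolution of an image I by a kernel H is given by:
\[ I'(u, v) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(u - i, v - j) \cdot H(i, j) \tag{1} \]
This is denoted by I' = I * H.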
Fig. 2. Sliding window-based spatial filtering of an image using convolution [42].
Here, H is the impulse response function. This is because the kernel function, H, convolved with an impulse signal, δ(i, j) (an image that is 0 everywhere except at the origin), reproduces itself:
\[ H * \delta = H \tag{2} \]
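The impulse response property can be checked numerically. The sketch below is an illustrative direct implementation of two-dimensional convolution, assuming zero padding outside the image and a kernel whose origin sits at its centre; the function name `convolve2d` and the small test data are hypothetical.

```python
def convolve2d(image, kernel):
    """Direct 2-D convolution; pixels outside the image are treated as zero."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2   # kernel origin at its centre
    out = [[0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    # I(u - i, v - j), with kernel indices centred on the origin
                    y, x = u - (i - r_off), v - (j - c_off)
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += image[y][x] * kernel[i][j]
            out[u][v] = acc
    return out


# Impulse image: zero everywhere except a single 1 at the centre.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]   # deliberately asymmetric so that any flip would show

result = convolve2d(impulse, kernel)
for row in result:
    print(row)
```

The 3 × 3 region around the impulse position in the output contains the kernel coefficients unchanged, which is the behaviour expected of an impulse response.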
Why is there a need to accelerate digital image processing operations in hardware (digital integrated circuits)?
There is a significant rising trend in low-power and ultra-low-power battery-operated portable mobile computing devices. Such devices include mobile phones, tablet computers, Wireless Sensor Network (WSN) nodes, Internet of Things (IoT) sensors, e-health systems, security systems, home automation and environmental monitoring systems, and so on. Mobile devices run on a battery and are therefore extremely constrained by the battery-imposed energy budget. The energy density of lithium-ion batteries has improved by only 10% a year; battery technology has therefore not scaled in step with Moore's law, due to a fundamental physics limitation [17].
Computer vision and image processing applications are becoming popular in mobile battery-powered devices ranging from everyday smartphones to Unmanned Aerial Vehicles (UAVs) [36]. These algorithms and applications were originally designed for high-performance desktop computers; however, they are now required to be deployed onto much less powerful but energy-efficient mobile computing platforms. Designers are expected to increase throughput per watt to meet the performance and energy efficiency requirements. For example, a typical digital camera capturing VGA resolution (640 × 480) video at a rate of 30 frames per second requires processing of 27 million pixels per second [18]. This rise in the required energy efficiency is due to real-time computing requirements and limited data transfer capabilities. This need for energy efficiency imposes an implied requirement to carry out the image processing required by these applications on the edge, i.e., locally, in the computing device.
Performance is becoming a major issue as the traditional single and multi-core scaling techniques employed in the design of mobile CPUs are failing to keep up with the demands of mobile technology [17]. The single-core thermal design point (TDP) of mobile CPUs has saturated at around 1.5W, a ceiling analogous to the 100W power ceiling common to desktop CPUs. Moreover, the energy efficiency improvement of mobile CPUs has plateaued, as the performance improvements do not make up for the additional power consumption. Additionally, dark silicon is becoming a major problem due to increasing transistor densities and TDP. Customised hardware accelerators appear to be the way forward in terms of sustaining power, performance, and energy improvements for future computing. Modern mobile SoCs comprise a number of custom hardware accelerators, and this number will continue to rise in the future. There is a 3.5 times rise in fixed-function accelerators across the six most recent Apple SoCs. The ITRS predicts thousands of different on-chip accelerators by 2022. To increase performance and reduce energy costs, application-specific processors should be used to exploit the structure of algorithms [5].
At the sub-symbolic level, the mathematical operations involved in processing images (i.e., convolution operations consisting of multiply-accumulate (MAC) operations) need to be repeated on the image data many times. Accordingly, it remains difficult to achieve real-time performance in software-based implementations of image processing while maintaining constraints on the energy consumption of battery-operated mobile devices. These types of processing applications could therefore benefit from hardware-enabled parallelisation. In this research, FPGAs were chosen as the hardware platform on which to deploy and perform the experiments; however, the approach can be generalised to other hardware acceleration platforms such as ASICs.
How can the power consumption of a digital integrated circuit be calculated? Most modern silicon chips are manufactured using Complementary Metal Oxide Semiconductor (CMOS) technology. The main advantage of CMOS is its low power consumption. The power consumption in a CMOS circuit can be given by the following equation:
\[ P_{Total} = P_{D} + P_{S} \tag{3} \]
where P_Total is the total power consumption, P_D is the dynamic component, and P_S is the static component of the power consumption. The dynamic power consumption, P_D, of a CMOS Integrated Circuit (IC) has two components, namely the switching power consumption, P_SW, and the short-circuit power consumption, P_SC. A typical example of current flowing through a CMOS NOT gate (inverter) when its output is switching from 0 to 1 and from 1 to 0 is shown in Figure 3. The current I_0→1 charges the output capacitance C_L during the output transition from 0 to 1, and the current I_1→0 flows from the output capacitance to ground during the output transition from 1 to 0, discharging the output capacitance. Dynamic power consumption contributes significantly to the overall power consumption when the circuit is switching at high frequency, due to the charging and discharging of a capacitive output load [34].
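The dynamic power relations discussed next can be sketched in their standard textbook forms, numbered here to match the equation references in the text. These are assumed forms inferred from the symbol definitions rather than a transcription: in particular, the placement of α_t in the short-circuit term, and the absence of a constant factor such as 1/2 in the switching term, depend on how transitions are counted.

```latex
\begin{align}
P_{D}  &= P_{SW} + P_{SC} \tag{4} \\
P_{SW} &= \alpha_t \, C_L \, V_{dd}^{2} \, f_{clk} \tag{5} \\
P_{SC} &= \alpha_t \, f_{clk} \, t_{SC} \, V_{dd} \, I_{CC.max} \tag{6}
\end{align}
```

In both assumed forms, the power scales linearly with the switching activity α_t, which is the parameter through which the processed data can influence dynamic power.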
Here, in Equation (4), P_D is the total dynamic power, which is the sum of P_SW, the power consumption due to switching of the transistors, and P_SC, the power consumption due to the momentary short circuit between V_DD and ground. This short circuit occurs when one transistor is turning ON while the other is switching OFF; at that time there exists a direct momentary path between V_DD and ground [31]. Equation (5) shows the switching component of the power consumption, where C_L is the load capacitance, f_clk is the clock frequency, V_dd is the supply voltage, and α_t is the node switching activity factor. Equation (6) provides a simplified formula that models the average short-circuit power for a CMOS gate [4, 31]. In Equation (6), I_CC.max is the peak current, which depends on the saturation current of the devices and therefore on the transistor dimensions, and t_SC is related to the signal rise time and fall time [31]. Dynamic power can be reduced significantly using techniques that address the voltage and frequency parameters of Equation (5) by down-scaling the supply voltage and frequency as and when required [26]. However, in many situations, scaling the clock frequency or voltage while changing the relative speed of the components of the design to support the scaling can cause system malfunctions. For example, the conventional architectures based on time-multiplexing in DSP circuits and microprocessors do not allow downscaling of voltage [26]. In such cases, alternative solutions must be explored. One such method is to reduce the effective capacitance of the digital design. The effective capacitance C_Eff is defined as the product of the average switching activity (α_t, the average number of transitions per clock cycle) and the total circuit capacitive load.
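The sensitivity of switching power to each parameter can be made concrete with a small calculation. This is a sketch that assumes the common form P_SW = α_t · C_L · V_dd² · f_clk; the function names and the operating point are hypothetical and chosen only to make the ratios visible, and the ratios do not depend on any constant factor in the formula.

```python
def switching_power(alpha_t, c_load, v_dd, f_clk):
    """Switching power in watts, assuming P_SW = alpha_t * C_L * V_dd^2 * f_clk."""
    return alpha_t * c_load * v_dd ** 2 * f_clk


def effective_capacitance(alpha_t, c_total):
    """C_Eff: average switching activity times total circuit capacitive load."""
    return alpha_t * c_total


# Hypothetical operating point (not measured values from the study).
base = switching_power(alpha_t=0.5, c_load=1e-9, v_dd=1.0, f_clk=100e6)
half_voltage = switching_power(alpha_t=0.5, c_load=1e-9, v_dd=0.5, f_clk=100e6)
half_activity = switching_power(alpha_t=0.25, c_load=1e-9, v_dd=1.0, f_clk=100e6)

print(base)                   # watts at the baseline operating point
print(half_voltage / base)    # effect of halving the supply voltage
print(half_activity / base)   # effect of halving the switching activity
print(effective_capacitance(0.25, 1e-9))
```

Halving the supply voltage quarters the switching power, whereas halving the switching activity halves it; the latter is the lever available when voltage and frequency cannot be scaled.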
Can the content processed by a digital integrated circuit have an impact on the power consumption of the circuit?
Much of the research on estimating and optimizing the power consumption of embedded systems does not consider α_t, the node switching activity factor, as a potential candidate for power optimization, even though it appears in Equations (5) and (6). This omission may be because α_t depends on the input data and, in any embedded signal processing system, the input data are generally not known a priori. However, in the case of digital image processing, the input data are the input image, and when images and videos are processed offline the input data are known a priori. Even in the case of a surveillance camera capturing live images of a scene, the image of the background and foreground remains static if there is no activity. This staticity enables the image content to be known a priori. Knowledge of the data allows accurate estimation of the resultant switching activity within a hardware processing pipeline and, as a result, enables accurate estimation of power consumption. In a Power Analysis Attack (PAA) scenario, the secret key (data) of a cryptographic core can be retrieved by measuring CMOS power consumption [4].
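When the pixel data are known in advance, a first-order activity estimate can be computed directly from them. The sketch below is illustrative only: it assumes pixels arrive one per clock cycle on a parallel 8-bit bus and counts the bits that flip between consecutive pixels, which approximates activity on the input bus and not on the internal nodes of a filter. The function names are hypothetical.

```python
def toggles_between(a: int, b: int) -> int:
    """Number of bit positions that differ between two pixel values."""
    return bin(a ^ b).count("1")


def average_toggle_rate(pixels, width=8):
    """Mean fraction of bus bits that flip per pixel-to-pixel step."""
    steps = list(zip(pixels, pixels[1:]))
    flipped = sum(toggles_between(a, b) for a, b in steps)
    return flipped / (len(steps) * width)


plain_black = [0x00] * 128
# One row in which black and white pixels alternate, i.e. 64 cycles
# across a 128-pixel row.
alternating_row = [0x00, 0xFF] * 64

print(average_toggle_rate(plain_black))
print(average_toggle_rate(alternating_row))
```

The two rows sit at opposite extremes of this proxy, which suggests why content known a priori can inform a power estimate.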
If digitally stored data can be identified reliably by measuring power consumption, then the converse should also be possible: power consumption can be estimated accurately from the data, particularly in the case of digital image processing. It was demonstrated that, by analysing a household's electricity usage profile at a higher sample rate [13], it was possible to identify which channel the TV set in the household was displaying. If content can be detected from power consumption, then power consumption should be estimable from the content. This phenomenon motivated us further to carry out a detailed investigation in the area of our research.
Five algorithms, (1) motion estimation, (2) Discrete Cosine Transform (DCT), (3) three-dimensional graphics rendering, (4) Lempel-Ziv lossless compression, and (5) Viterbi decoding, were examined for dynamic adaptation based on variations in the input signal statistics, with a view to reducing power consumption and improving performance [19]. Burleson et al. [6] provided power-performance tradeoffs for a dynamically parameterised MPEG-4 motion estimation algorithm. They reported that selecting the correct parameters based on the operating environment reduced the average power consumption by 40% for a 2% loss in compression. Fanucci and Saponara [12] presented a data-driven clock gating technique to switch off portions of their low-power and low-complexity VLSI architecture implementation of the two-dimensional Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT). The system monitored the input data for zero values, the Null Row Check (NRC), and for sign-extended most significant bits, the Sign Extension Check (SEC), to turn off portions of the implemented circuit. The authors stated that, for typical H.263/MPEG video coding applications, their approach provided 36% and 26% power reduction in IDCT and DCT modes, respectively. None of this research explains the relationship between image content characterised by spatial frequencies and the power and energy consumption of the implemented circuit. Moreover, it does not take into account the inherent switching present in an image or a frame of a video due to the way pixel values are represented in binary.
Hadizadeh et al. [15] proposed a method for producing energy-efficient images for energy-adaptive displays such as OLED displays while preserving perceptual quality relative to the original images. The authors exploited the property of OLED displays whereby the energy consumption of pixels is directly proportional to the luminance of the pixels. The authors used a Just-Noticeable-Difference (JND) threshold to reduce the luminance of the pixels in an image. The authors were able to demonstrate empirically that their proposed method reduced energy consumption by about 14.1% while preserving the perceptual quality of the displayed images. This research clearly demonstrates that image content can have an impact on the energy efficiency of the hardware used to display the image and serves as supporting evidence for the findings presented in this article.
How are the commonly used image processing functions implemented in hardware? Spatial image filtering is carried out by performing convolution between a two-dimensional kernel and the image. Many image processing algorithms work in a very similar manner to a two-dimensional convolution of an image. The process of calculating the output pixel in a convolution involves a rectangular window of input image pixels and a few constant coefficients, typically fetched in row-major order. This window is then slid across the whole input image to produce the pixel values for the output image. Therefore, convolution is also said to work in a sliding window manner. This sliding window is also called a stencil [5]. As shown in Figure 4, the hardware implementation of a convolution kernel contains a window function, a line buffer, and a stencil register [5]. The window function accepts the pixel values supplied by the stencil register, processes each of the values with the corresponding coefficient value, and calculates the output pixel. The line buffer stores the rows of pixel values that are required to be re-used between successive row traversals. The stencil register is provided with a refreshed column of pixel values for each overlapping window of the input image.
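The interplay of line buffer, stencil register, and window function can be modelled in software. The following is a behavioural sketch under stated assumptions, not the hardware design used in this work: pixels arrive one at a time in row-major order, only fully valid windows produce an output, and the window function applies the coefficients without flipping the kernel (which equals convolution for symmetric kernels such as a Gaussian). The name `stream_filter` is hypothetical.

```python
from collections import deque


def stream_filter(pixels, width, kernel):
    """Stream row-major pixels through a line buffer, a stencil register, and a
    window function; yields one output per fully valid K x K window."""
    k = len(kernel)
    line_buffer = deque(maxlen=k - 1)   # the previous k-1 image rows, oldest first
    current_row = []
    stencil = deque(maxlen=k)           # k columns of k pixels, oldest column first
    for index, pixel in enumerate(pixels):
        col = index % width
        if col == 0:
            stencil.clear()             # a new row starts a fresh set of columns
        current_row.append(pixel)
        if len(line_buffer) == k - 1:
            # Refreshed column: buffered rows above plus the incoming pixel.
            column = [row[col] for row in line_buffer] + [pixel]
            stencil.append(column)
            if len(stencil) == k:
                # Window function: weighted sum of stencil values and coefficients.
                yield sum(kernel[r][c] * stencil[c][r]
                          for r in range(k) for c in range(k))
        if col == width - 1:
            line_buffer.append(current_row)   # the oldest row drops out automatically
            current_row = []


image = list(range(16))                 # a 4 x 4 image holding the values 0..15
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(list(stream_filter(image, width=4, kernel=box)))
```

Each incoming pixel is read once, and the line buffer supplies the rows above it, which is the data re-use that lets hardware avoid repeated pixel fetches.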
Fig. 5. Cascading kernels in image processing applications [35].
As shown in Figure 5 , more complex image processing operations can be implemented by cascading the kernels. These kernels work in the same way as a convolution kernel. Therefore, such convolution family applications can be implemented by reusing hardware components from a single kernel application and interconnecting them [5] .
Most image processing applications can be constructed from a set of "stencil" kernels [5] . Many applications in the domains of computer vision, image and signal processing, and computational photography could be mapped onto a virtual machine model of the stencil kernel. Stencil kernels typically involve computing the pixel in an output image from a fixed-size sliding window of pixels in its corresponding input image. Stencil kernels are essentially spatial filters that mainly use a convolution operator for processing [5] .
The sliding window technique is one of the most widely used techniques in image processing algorithms [10]. A hardware implementation typically buffers image rows on the chip to benefit from the locality of the data and to avoid unnecessary off-chip pixel transfers. Sliding Window Operations (SWOs) are typically deployed on FPGA-based prototyping boards as hardware accelerators for image processing applications [16].
Gupta and Prasanna [14] proposed a configurable image convolution architecture in which the input pixel resolution, the image size, the convolution window size, the coefficients, and the type of memory used can be explored to identify design tradeoffs in obtaining energy efficiency. The authors carried out design space exploration with these parameters and constructed an energy model to estimate the energy consumption. The authors used the number of operations per joule as their metric for energy efficiency. The authors claimed to have achieved energy efficiency of up to 32.98Gops/J and sustained peak energy efficiency up to 34.38%. Even though the authors carried out the design space exploration with energy efficiency as their main objective, they did not take into account the impact of the switching variability in the input data on the energy consumption.
Experimental Design
The following sections detail the experimental design.
Dependent Variables
Energy efficiency was selected as the main dependent variable; however, power and energy consumption were the other related dependent variables of interest.
Independent Variables
Since the impact of the content of an image was investigated, the parameters that characterise the image content at the fundamental level were selected as the independent variables. The aim was to capture a statistically significant sample from the population while considering practicalities of implementation and simulation time. The selected independent variables are shown in Table 3.
Table 3. Independent variables
Spatial Frequency: Spatial frequency of sinusoidal gratings present in a synthetic dataset of images of sinusoidal gratings. The spatial frequency of gratings ranged from 0 to the maximum number of gratings that can be accommodated in an image of a given size based on the Nyquist-Shannon theorem.
Orientation: Orientation of sinusoidal gratings present in the dataset of images of sinusoidal gratings. The orientation of gratings ranged from 0° to 180°. This is because gratings rotate around the centre and cover the remaining 180°, thus covering the entire 360°.
Phase: Images of sinusoidal gratings where the phase of gratings ranged from 0° to 360°.
Contrast: Images of sinusoidal gratings where the contrast of images was varied as given by the Michelson contrast. The maximum value was 1.
Image Size: From the literature, it was found that, typically, square images were used in image processing research, and their dimensions range from 16 × 16 pixels to 1,024 × 1,024 pixels.
Spatial Filter Operation: Image processing operations that are sliding window with a two-dimensional kernel-based spatial filter architecture were selected for the experimentation. These operations were implemented using Xilinx System Generator. A library of such operations was created.
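Two of the limits attached to the independent variables can be expressed as short calculations. This sketch assumes the usual definition of Michelson contrast, (L_max − L_min) / (L_max + L_min), and the Nyquist-Shannon limit of one cycle per two pixels; the function names are hypothetical.

```python
def michelson_contrast(l_max: float, l_min: float) -> float:
    """Michelson contrast of a pattern with the given luminance extremes."""
    return (l_max - l_min) / (l_max + l_min)


def max_cycles_per_image(width_pixels: int) -> int:
    """Highest grating frequency representable: one cycle needs two pixels."""
    return width_pixels // 2


print(michelson_contrast(255, 0))     # full-range 8-bit grating
print(michelson_contrast(170, 170))   # plain grey image, no variation
print(max_cycles_per_image(128))      # limit for a 128-pixel-wide image
```

For a 128 × 128 image the limit is 64 cycles per image, which matches the maximum spatial frequency quoted in the abstract.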
Prototyping Platform
The Xilinx ISE and System Generator tool version 14.7 [38], together with Matlab-Simulink and the Image Processing Toolbox, version 2012a, was used to implement and prototype the entire library of spatial filters for FPGA implementation. The System Generator extends the Matlab-Simulink environment to enable hardware design, providing high-level abstractions that can be automatically compiled into an FPGA. The System Generator also carries out full timing simulation-based power estimation using the Xilinx power estimation tool, XPower Analyzer (XPA) [40]. The particular design flow offered by the System Generator, known as the Timing and Power Analysis flow, was used in the experiments. The output at the end of this flow shows both timing and power analysis. This tool takes into account the exact logic and routing resources used and the actual activity from design simulation. All of the implemented spatial filters were also functionally validated on the Xilinx ML605 [39] by prototyping them on the FPGA hardware.
Library of HW Implemented Spatial Filters
To explore the impact of image content on the energy consumption of various spatial filter operations, a library of hardware-implemented spatial filter operations was developed. These filters included line buffers, Difference of Gaussian (DoG) Operation, SIFT Detector, Gaussian 3 × 3, Gaussian 5 × 5, Gaussian 7 × 7, Gaussian 9 × 9, Gaussian Separable 5 × 5, Laplacian 3 × 3, Mean Filter 5 × 5, Median Filter 3 × 3, Morphological Filter 5 × 5, and Sobel Filter 3 × 3. These spatial filters were implemented based on the most commonly used hardware architecture, the two-dimensional kernel sliding window architecture. The input image and the kernel coefficients were not stored in memories, to isolate the energy efficiency of the implemented FPGA logic. The image was streamed from the Matlab-Simulink environment into the hardware block, whereas the kernel coefficients were hardcoded into the logic. The convolution block required multiplying the input sample by a coefficient and then adding the product to the result from the next pixel. Therefore, the implementation required a number of multiply-and-accumulate blocks, each consisting of a multiplier and an adder block. Since the Matlab-Simulink-based Xilinx System Generator tool was used for the design entry, each of the implemented spatial filters was saved and stored as a Simulink model file with an "mdl" file extension. The Gaussian smoothing spatial filter with a kernel size of 5 × 5 was selected as the template spatial filter on which most experiments were carried out.
Software
The following software programs were implemented to generate, extract and process the necessary input data for the experimentation.
• Synthetic Image Data Set Generator Tool: a program that synthesised images with sinusoidal gratings while varying spatial frequency, orientation, and image size upon user configuration.
• Spatial Filter Configuration and Model Creator Tool: a program that automatically configured the existing hardware-implemented spatial filter to adapt to and support varying image sizes and clock frequencies.
• Extraction & Tabulation Tool: a program to automatically extract the necessary information from the timing and power reports generated by the EDA tools and tabulate it in a CSV format.
• Co-ordinating Tool: a program coordinating the entire experiment automatically.
Generating Synthetic Images with Sinusoidal Patterns
A dataset of synthetic images was generated using a Matlab script. A black circular mask was applied to every image with a sinusoidal grating. The mask ensured that the length of the gratings remained uniform across all the different orientations, as shown in Figure 6. The generated images had the Michelson contrast set to 1, which meant that the pixel values of most of the gratings spanned the full range of 256 levels, i.e., from 0x00 to 0xFF, with black and white half cycles of equal width.
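The generation step described above can be sketched as follows. This is a minimal illustration in Python, not the Matlab script used in the study: the function names, the default phase of 90°, the sign convention for the orientation angle, and the exact mask radius are all assumptions made for the example.

```python
import math

def make_grating(size, cycles, orientation_deg=0.0, phase_deg=90.0, mask=True):
    """Return a size x size greyscale grating as a list of rows (values 0..255).

    Hypothetical helper: 0 degrees gives vertical bars, 90 degrees gives
    horizontal bars. The angle sign convention is an assumption.
    """
    theta = math.radians(orientation_deg)
    phase = math.radians(phase_deg)
    centre = (size - 1) / 2.0
    radius = size / 2.0  # assumed mask radius: half the image width
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            # Position along the direction in which the intensity varies.
            u = x * math.cos(theta) + y * math.sin(theta)
            value = math.sin(2.0 * math.pi * cycles * u / size + phase)
            pixel = int(round(127.5 * (1.0 + value)))  # map [-1, 1] to [0, 255]
            if mask and math.hypot(x - centre, y - centre) > radius:
                pixel = 0  # black outside the circular aperture
            row.append(pixel)
        image.append(row)
    return image

def michelson_contrast(image):
    """(Imax - Imin) / (Imax + Imin) over all pixels of the image."""
    values = [p for row in image for p in row]
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi + lo) if (hi + lo) else 0.0
```

With these assumptions, a call such as `make_grating(128, 64, mask=False)` yields one-pixel-wide vertical stripes, which is the maximum spatial frequency for a 128-pixel-wide image.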
Results
In this section, the results from the experimental exploration are presented. The JMP software [32, 33] was used extensively for plotting graphs and data analysis.
Metrics
In the experimentation, power consumption, measured in watts (W), was used for the initial validation. In the final validation, however, the energy efficiency of the spatial filter was investigated.
The image processing workload can be characterised by the image size and the kernel size. The workload in terms of image and kernel size was kept constant in the experiments where the image content was varied. Energy efficiency was considered to be the number of operations per unit of dynamic energy consumed. For an image processing operation such as convolution, where the image size is N × N and the kernel size is K × K, the energy efficiency can be given by $N^2 K^2$ divided by the dynamic energy consumed by the spatial filter [14]. The metric Giga-Operations per joule was used for the energy efficiency analysis. The metric for the spatial frequency of an image comprising only two-dimensional sinusoidal gratings was the number of cycles per image. The maximum number of sinusoidal grating cycles that can fit in an image is half of the image width in pixels. The orientation of the sinusoidal gratings in an image was measured in degrees.
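As a worked illustration of this metric, the sketch below computes Giga-Operations per joule from the image size, kernel size, dynamic power, and processing time, and expresses one result relative to a baseline. The function names and the two power values are hypothetical; only the 180,499ns processing time for a 128 × 128 image is taken from the results reported in this article.

```python
def energy_efficiency_gops_per_joule(image_size, kernel_size, dynamic_power_w, time_s):
    """Operations per joule, in Giga-Operations per joule (GOps/J)."""
    operations = (image_size ** 2) * (kernel_size ** 2)  # N^2 * K^2
    energy_j = dynamic_power_w * time_s                  # dynamic energy in joules
    return operations / energy_j / 1e9

def relative_efficiency_percent(efficiency, baseline_efficiency):
    """Efficiency expressed as a percentage of the baseline efficiency."""
    return 100.0 * efficiency / baseline_efficiency

# Illustrative numbers only: both power figures below are assumptions.
processing_time_s = 180499e-9   # reported time for one 128 x 128 image
baseline_power_w = 0.002        # hypothetical dynamic power, plain black image
grating_power_w = 0.010         # hypothetical dynamic power, grating image

baseline = energy_efficiency_gops_per_joule(128, 5, baseline_power_w, processing_time_s)
grating = energy_efficiency_gops_per_joule(128, 5, grating_power_w, processing_time_s)
relative = relative_efficiency_percent(grating, baseline)
```

Because the workload and the processing time are fixed in these experiments, the relative efficiency reduces to the ratio of the baseline dynamic power to the dynamic power measured for the image under test.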
Experimental Assumptions
It has been established that a spatial filter follows a common anatomical structure in its hardware implementation. Therefore, the default template architecture for all the experiments was the 5 × 5 Gaussian Filter implemented in the System Generator. The default clock frequency of the experimentation was set to 100MHz, and the default image size was set to 128 × 128 pixels; however, image sizes of 256 × 256 and 512 × 512 pixels were also used. The test images were synthesised images of sinusoidal gratings of varying phase, orientation, spatial frequency, and contrast. In all the experiments, the orientation of the sinusoidal gratings was measured clockwise from the vertical.
Initial Validation
First, it was investigated how the power consumption varies as each of the independent variables is varied. The Coefficient of Variation (CV) of the power consumption results was compared amongst the various independent variables. The coefficient of variation, or relative standard deviation (RSD), is the ratio of the standard deviation to the mean (average). This statistic is a measure of spread that describes the amount of variability relative to the mean. Since the statistic is a unitless ratio, it can be used to objectively compare the spread of datasets that have different units or different means, and that is exactly what was done. If the CV of a set of results was found to be significantly less than the others, then the corresponding variable was omitted from the experiments. The significance threshold for the CV comparison was set to 2%. Table 4 shows the summary of CVs for all the independent variables. It is quite clear from the table that the variability in the data for the independent variables Contrast and Phase is significantly lower (0.08% and 1.24%, respectively) than for all the other variables. Such low variability can only occur if the effect of Contrast and Phase on the dependent variable is negligible. Therefore, the independent variables Contrast and Phase were omitted from the experiments. Moreover, the contrast was normalized to make image calculations independent of the contrast. Since the contrast is given by the Michelson contrast, the contrast normalisation was performed using the contrast stretching method to cover the maximum range of an 8-bit pixel value, which is 0 to 255. This normalization was carried out by stretching the range of intensity values to make full use of the possible values [30].
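The screening rule described above can be sketched in a few lines. This is an illustrative reading rather than the analysis actually performed (which used JMP): the function names are hypothetical, and the use of the sample standard deviation, as opposed to the population one, is an assumption.

```python
import statistics

def coefficient_of_variation_percent(samples):
    """CV (relative standard deviation) as a percentage of the mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return 100.0 * statistics.stdev(samples) / mean

def is_negligible(cv_percent, threshold_percent=2.0):
    """True when a variable's CV falls below the 2% screening threshold."""
    return cv_percent < threshold_percent

# CVs reported for Contrast and Phase; both fall below the threshold.
omitted = [name for name, cv in (("Contrast", 0.08), ("Phase", 1.24))
           if is_negligible(cv)]
```

Applied to the reported values, both Contrast (0.08%) and Phase (1.24%) fall below the 2% threshold, which matches their omission from the later experiments.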
Experimental Exploration
The main aim here was to investigate the relationship of the image content in the form of the spatial frequencies and the orientations of those spatial frequencies present in an image, with the energy consumption of the hardware implemented spatial filter that was applied to process the same image.
Spatial Frequency
First, the key results showing the impact of spatial frequencies on energy consumption are presented. The spatial frequency was varied with orientation while keeping the image size at 128 × 128, the kernel size at 5 × 5, the clock frequency at 100MHz, and the filtering operation as the template Gaussian Filter. Figure 7 and Figure 8 are example images of the sinusoidal gratings used in the experimentation. These images were processed in the implemented filter using the simulation-based power estimation flow in the System Generator tool. Table 5 shows selected results (spatial frequencies 0, 1, 2, 4, 8, 16, 32, and 64) from the experiment. The time taken to process one image of 128 × 128 pixels was 180,499ns. The energy efficiency of a plain black image was considered as the baseline for the analysis of the results. Here the orientation was fixed to 0° and 90° to assess the impact of the variation in spatial frequency. It can be seen from the table that, at 0° orientation, the energy efficiency drops to 12.05% for the maximum spatial frequency of 64 cycles per image for a 128 × 128 image. However, for an orientation of 90°, the energy efficiency is at 72.38% when the spatial frequency of the image is at the maximum. The energy efficiency in terms of Giga-Operations per joule versus spatial frequency overlaid with orientation is plotted in Figure 9. The graph shows that the energy efficiency decreases with the increase in spatial frequency. The energy efficiency is at a maximum when the orientation is 90° whilst it is at a minimum when the orientation is 0°.
Fig. 9. Energy efficiency vs. spatial frequency, overlaid by orientations, image size 128 × 128.
Image Size 256 × 256
Table 6 shows selected results (spatial frequencies 0, 1, 2, 4, 8, 16, 32, 64, and 128) from the experiment where the spatial frequency and the orientation were varied while the spatial filter was the template Gaussian filter with kernel size 5 × 5, image size 256 × 256, and clock frequency 100MHz. The time taken to process one image of 256 × 256 pixels was 721,171ns. The energy consumption and energy efficiency of the black image were considered as the baseline for the analysis of the results. Here, the orientation was fixed to 0° and 90° to assess the impact of the variation in spatial frequency. The energy efficiency was considered in terms of Giga-Operations per joule. It can be seen from the table that, at 0° orientation, the energy efficiency drops to 14.46% for the maximum spatial frequency of 128 cycles per image for a 256 × 256 image. However, for an orientation of 90°, the energy efficiency is at 86.73% when the spatial frequency of the image is at the maximum. The energy efficiency versus spatial frequency overlaid with orientation is plotted in Figure 10. The graph shows that the energy efficiency decreases with the increase in spatial frequency. The energy efficiency is at a maximum when the orientation is 90° whilst it is at a minimum when the orientation is 0°.
Image Size 512 × 512
Table 7 shows selected results (spatial frequencies 0, 1, 2, 4, 8, 16, 32, 64, 128, and 256) from the experiment where the spatial frequency and the orientation were varied while the spatial filter was the template Gaussian filter with kernel size 5 × 5, image size 512 × 512, and clock frequency 100MHz. The time taken to process one image of 512 × 512 pixels was 2,883,859ns. The energy consumption and energy efficiency of a black image were considered as the baseline for the analysis of the results. Here, the orientation was fixed to 0° and 90° to assess the impact of the variation in spatial frequency. The energy efficiency was considered in terms of Giga-Operations per joule. It can be seen from the table that, at 0° orientation, the energy efficiency drops to 15.14% for the maximum spatial frequency of 256 cycles per image for a 512 × 512 image. However, for an orientation of 90°, the energy efficiency is at 93.76% when the spatial frequency of the image is at the maximum. The energy efficiency versus spatial frequency overlaid with orientation is plotted in Figure 11. The graph shows that the energy efficiency decreases with an increase in spatial frequency. The energy efficiency is at a maximum when the orientation is 90° whilst it is at a minimum when the orientation is 0°.
All Image Sizes
To isolate the impact of image size on the dependent variable, images of varying sizes were synthesised. The spatial frequency of the sinusoidal grating in the image was set to 1 cycle per image, the clock frequency to 100MHz, the phase to 90°, and the orientation to 0°. Images with varying sizes of 16 × 16, 32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512, and 1,024 × 1,024 pixels were synthesised. Since the width of the line buffers in the spatial filter changes with the width of an image, dedicated template spatial filters with line buffers of different sizes to accommodate each of these different image sizes were developed and implemented. These images were processed in the implemented filters and power consumption was estimated in the System Generator tool. Figure 12 shows the graph plot between dynamic power consumption and image size in a linear-log scale.
It is important to note in the graph that smaller images, such as 16 × 16, 32 × 32, and 64 × 64, are shown to consume more dynamic power than some of the larger sizes. This result is counterintuitive because, generally, an increase in the image size increases the amount of logic needed to store and process the image, which should result in increased dynamic power consumption. When this phenomenon was investigated in detail in the power analysis reports, it was found to be largely due to the power consumed by the primary inputs and outputs (IOs). These images were processed with varying orientations in the template 5 × 5 Gaussian filter, and power consumption was estimated in the System Generator tool. Figure 13 is a graph of dynamic IO power vs. image size in a linear-log scale. As the image size increases, the IO power decreases. This decline in the IO power is only possible if the number of IO transitions per unit time is higher for the smaller image than for the larger image. This phenomenon is explained in detail in the analysis section.
However, when the energy consumed to process the image is plotted against image size, the graph follows intuition: the consumed energy increases with the image size, as shown in Figure 14. The graph is plotted using a log-log scale on the X- and Y-axes. Figure 15 is a graph that plots dynamic energy against varying image sizes (16 × 16, 32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512, and 1,024 × 1,024 pixels) while varying spatial frequencies and orientation, using a log-log scale. Here the range of the spatial frequencies is from 1 to 4, because the minimum image size that is explored is 16 × 16. The energy consumption increases with the increase in image size.
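One way to see why energy rises with image size even where dynamic power falls is to look at the processing times reported for the 128 × 128, 256 × 256, and 512 × 512 experiments. The short calculation below assumes that dynamic energy is simply dynamic power multiplied by processing time, as in the energy efficiency metric; the subscripts denote the image width and are notation introduced here for illustration.

$$E_{dyn} = P_{dyn} \times t, \qquad \frac{t_{256}}{t_{128}} = \frac{721{,}171}{180{,}499} \approx 3.995, \qquad \frac{t_{512}}{t_{256}} = \frac{2{,}883{,}859}{721{,}171} \approx 3.999$$

The processing time therefore roughly quadruples each time the image width doubles, so the energy would only stay flat if the dynamic power fell by a factor of about four at each step.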
Figure 16 shows the graph of energy efficiency in Giga-Operations per joule versus image size overlaid with spatial frequency, with the orientation set to 0°. This graph is plotted using a linear scale on its Y-axis and a log scale on its X-axis to better represent the data. As can be seen, the energy efficiency increases for smaller images but decreases for larger images as the image size and spatial frequency increase. Figure 17 shows the graph of energy efficiency in Giga-Operations per joule versus image size overlaid with spatial frequency, with the orientation set to 90°. This graph is plotted using a linear-log scale. It can be seen that the energy efficiency increases for smaller images but decreases for larger images as the image size and spatial frequency increase.
Fig. 17. Energy efficiency vs. image size while varying spatial frequency and orientation is set to 90°.
Orientation
Table 9 shows selected results (orientations 0°, 30°, 45°, 60°, 90°, 120°, 135°, and 150° out of 0°, 11.25°, 22.5°, 30°, 33.75°, 45°, 56.25°, 60°, 67.5°, 78.75°, 90°, 101.25°, 112.5°, 120°, 123.75°, 135°, 146.25°, 150°, 157.5°, and 168.75°) from the experiment where the orientation was varied and the spatial frequency was set to 1 and 32 for an image size of 128 × 128 and the template Gaussian filter with kernel size 5 × 5 and clock frequency 100MHz. The time taken to process one image of 128 × 128 pixels was 180,499ns. The energy consumption and energy efficiency of the black image were considered as the baseline for the analysis of the results. The energy efficiency was considered in terms of Giga-Operations per joule. It can be seen from the table that, for a spatial frequency of one cycle per image, the energy efficiency drops to 24.41% at 0° orientation and peaks at 74.17% at 90° orientation. For a spatial frequency of 32 cycles per image, the energy efficiency is at a minimum of 13.15% at 0° orientation and peaks at 69.55% at 90° orientation. Figure 18 is the graph of energy efficiency in Giga-Operations per joule against the various orientation values in degrees. Again, it is important to note that, for both spatial frequencies, the energy efficiency is at a maximum when the orientation is 90° and at a minimum when the orientation is 0°. Therefore, in the experiments, the orientations were limited to 0° and 90°, as this range covered the entire population.
Image Operations
In this section, the energy consumption of various hardware-accelerated spatial filter operations is explored. These include line buffers, Difference of Gaussian (DoG), SIFT Detector, Gaussian 3 × 3, Gaussian 5 × 5, Gaussian 7 × 7, Gaussian 9 × 9, Gaussian Separable 5 × 5, Laplacian 3 × 3, Mean Filter 5 × 5, Median Filter 3 × 3, Morphological Filter 5 × 5, and Sobel Filter 3 × 3. Figure 19 and Figure 20 show the graphs of dynamic energy efficiency, given in Giga-Operations per joule, against the various filter operations while spatial frequencies are varied and the orientation is set to 0° and 90°, respectively. The energy efficiency for 0° orientation shows a slightly decreasing trend as the spatial frequency increases, whereas the energy efficiency for 90° orientation is nearly constant from spatial frequency 1 onward. The most complex image processing pipeline is the SIFT detector, which consumes the largest amount of power and hence is the least energy efficient. It is important to note that all the lines in the graph follow the general curve explored previously with the Gaussian 5 × 5 template spatial filter. This means the results for the template filter can be generalized to any spatial filter that follows the same architecture.
Analysis
This section analyses and explains the results.
Impact of Spatial Frequency
It is important to note that, as shown in the graphs in Figure 9, Figure 10, and Figure 11, the maximum power consumption occurs when the spatial frequency of the image is at its maximum, e.g., 64 sinusoidal gratings for a 128 × 128 image. As defined by the Nyquist-Shannon theorem, the maximum number of sinusoidal gratings that can be fitted into an image is the image width divided by 2. Therefore, for an image of size 128 × 128, the maximum number of sinusoidal gratings that can be fitted is 64. At 0° orientation, this maximum-frequency grating is an image of one-pixel-wide black and white stripes, i.e., a two-dimensional square wave of vertical black and white bars. Since the value of a white 8-bit pixel is 255 in decimal or 0xFF in hexadecimal and the value of a black pixel is 0 in decimal or 0x00 in hexadecimal, traversing the image in a horizontal direction, the pixel values change from 0x00 to 0xFF and from 0xFF to 0x00, which means that every bit of the 8-bit-wide pixel changes from 0 to 1 and from 1 to 0. This change in the bit values results in the maximum number of transitions at the IO ports of the FPGA when the image is scanned in row-major order. These transitions also contribute to the maximum amount of switching in the logic.
Moreover, the power consumption of the plain black and plain white images is at its lowest and is almost the same for both, because the pixel values in these images do not change. In a black image, all the pixel values are 0x00, and in a white image, all the pixel values are 0xFF, without any variation.
In the middle, the power consumption generally increases with the spatial frequency of the image. This rise is mainly due to the variation in the content caused by the spatial frequencies present in the image: the pixel values vary more, which increases the amount of switching at the IO ports and in the logic. This content-driven switching is defined by the switching activity factor $\alpha_t$.
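The bit-level effect of the one-pixel-wide stripes can be checked with a small sketch that counts, for each of the 8 bit lines, how often the value changes between consecutive pixels of a row. The helper name and the example rows are assumptions introduced for illustration; they model only the pixel stream at the input port, not the FPGA logic.

```python
def count_bit_toggles(stream, width=8):
    """Count transitions on each bit line (index 0 = LSb) of a pixel stream."""
    toggles = [0] * width
    for previous, current in zip(stream, stream[1:]):
        changed = previous ^ current  # 1 wherever a bit differs between pixels
        for bit in range(width):
            if (changed >> bit) & 1:
                toggles[bit] += 1
    return toggles

striped_row = [0xFF, 0x00] * 64   # one row of the 64-cycle grating at 0 degrees
black_row = [0x00] * 128          # one row of the plain black baseline image

striped_toggles = count_bit_toggles(striped_row)
black_toggles = count_bit_toggles(black_row)
```

For the striped row, every bit line changes at every one of the 127 pixel-to-pixel steps, whereas the plain black row produces no transitions at all, which is the contrast between the worst case and the baseline described above.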
To understand the effect of the spatial frequency in the middle region of the graphs in Figure 9, Figure 10, and Figure 11, where the energy consumption increases with spatial frequency, the two main factors that impact the amount of switching in the circuit should be understood: the transition density and the static transition probability of every bit of an 8-bit binary number, since a pixel in a digital image is typically represented as an 8-bit binary number. The transition density of a signal, denoted by $\alpha_t$, is the average number of transitions of the signal per unit time. The static transition probability of the signal is the probability of the signal being high at any given time [28]. As seen previously in Figure 3, it is during the transition from 0 to 1 that current is drawn into the circuit, which contributes to the power consumption of the circuit.
An 8-bit binary number ranges from 0x00 to 0xFF ($255_{10}$). The normal binary sequence goes from $00000000_2$ to $11111111_2$.
Switching activity, $P_{0 \to 1}$, has two components: a static component that is a function of the logic topology and a dynamic component that is a function of the timing behaviour of the logic circuit (including glitching). In this article, only the static component is considered, for two reasons: first, the dynamic component, for example glitching, depends on the exact implementation of the logic circuit, which cannot be known in advance; and, second, to limit the scope of this research work.
The static transition probability of a single binary bit can be given by:
$$P_{0 \to 1} = P_{out=0} \times P_{out=1}$$
where $P_{0 \to 1}$ is the probability of the output bit transitioning from 0 to 1, $P_{out=0}$ is the probability of the output bit being 0, and $P_{out=1}$ is the probability of the output bit being 1. Moreover, in an 8-bit binary number, the significance of the bits increases moving from the Least Significant bit (LSb) to the Most Significant bit (MSb), and the value of the 8-bit binary number is given by $2^7 \times bit7 + 2^6 \times bit6 + 2^5 \times bit5 + 2^4 \times bit4 + 2^3 \times bit3 + 2^2 \times bit2 + 2^1 \times bit1 + 2^0 \times bit0$. Therefore, moving from LSb to MSb, each bit switches from 0 to 1 progressively less often. As an example, in the normal binary sequence the LSb toggles between 0 and 1 on every count, whereas each higher bit, going from right to left, toggles at $1/2^n$ times the toggle rate of the LSb, where $n$ is the bit position. Therefore, the transition density halves with each step from LSb to MSb.
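The halving of the transition density can be made concrete for the normal binary sequence. The short derivation below assumes a free-running count from 0 to 255, one increment per sample, and introduces $\alpha_n$ as illustrative notation for the transition density of bit $n$ (with $n = 0$ the LSb).

$$\alpha_n = \frac{\alpha_0}{2^n}, \qquad n = 0, 1, \ldots, 7$$

Over the 255 increments from $00000000_2$ to $11111111_2$, bit $n$ changes state $\lfloor 255 / 2^n \rfloor$ times, giving 255, 127, 63, 31, 15, 7, 3, and 1 transitions for bit0 through bit7, respectively.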
Therefore, for a binary number that is more than one bit wide, 8 bits in this case, each bit has a different static transition probability. For example, the probability of transition $P_{LSB(0 \to 1)}$ of the LSb can be given by multiplying the probability of the bit being "0," $P_0$, by the probability of the bit being "1," $P_1$.
The static transition probability of the LSb in an 8-bit binary number can be given by:
$$P_{LSB(0 \to 1)} = P_0 \times P_1 = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$
Moving from LSb to MSb, the static transition probability reduces by 4. Therefore, the individual static transition probabilities of each bit in an 8-bit binary number are given in Table 10 [7].
However, when the spatial frequency is at the maximum, 64, the pixel value alternates between black and white, i.e., from 00000000 to 11111111 and back to 00000000. This means that the static transition probability of each bit is at the maximum of 1/4, which is the same as that of Bit0, the LSb. The transition density of each of the bits is also the same as that of the LSb. More generally, as the spatial frequency is increased, the number of 8-bit binary values (samples) that represent one cycle of the two-dimensional sinusoidal grating is reduced.
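The reduction in samples per cycle, and its effect on the upper bits, can be illustrated by sweeping the spatial frequency of a single image row and counting the transitions on each bit line. This is a sketch under stated assumptions: the helper names are hypothetical, the phase is fixed at 90°, and only one row of a 0° grating is modelled.

```python
import math

def grating_row(width, cycles, phase_deg=90.0):
    """One row of a 0-degree sinusoidal grating, quantised to 8 bits."""
    phase = math.radians(phase_deg)
    return [int(round(127.5 * (1.0 + math.sin(2.0 * math.pi * cycles * x / width + phase))))
            for x in range(width)]

def bit_toggles(row, width=8):
    """Transitions per bit line (index 0 = LSb) between consecutive pixels."""
    return [sum(((a ^ b) >> bit) & 1 for a, b in zip(row, row[1:]))
            for bit in range(width)]

def sweep(width=128, frequencies=(1, 2, 4, 8, 16, 32, 64)):
    """Map each spatial frequency (cycles per image) to its per-bit toggles."""
    return {f: bit_toggles(grating_row(width, f)) for f in frequencies}

results = sweep()
msb_toggles = {f: counts[7] for f, counts in results.items()}
samples_per_cycle = {f: 128 // f for f in results}
```

In this model the most significant bit changes only twice per cycle, so its toggle count grows with the number of cycles in the row, and at 64 cycles per image, where only two samples represent each cycle, all eight bit lines toggle on every pixel.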
Impact of Image Size
The same effect occurs when the image size is reduced but the spatial frequency is kept constant. The transition density and static transition probabilities of the bits, going from LSb to MSb (right to left), increase and reach a maximum of 1/4, depending on the spatial frequency or the size of the image. The transition density, together with the static transition probability, increases the switching at the primary IO ports of the spatial filter, as the input to the filter is a stream of 8-bit pixels scanned from the image in row-major order.
This rise in switching of the primary IO ports is the reason why the IO power consumption increases when the spatial frequency of the sinusoidal gratings is increased, or image size is reduced while keeping the spatial frequency constant. However, since the amount of logic used in implementing the spatial filter in hardware to process a smaller, say, 16 × 16, image is considerably less than a larger image of, say, 1,024 × 1,024 size, the impact of the power consumed by the logic is not as significant as the power consumed by the IO. Therefore, for smaller images, the IO power dominates the total power consumption.
Similarly, for a given spatial frequency, the proportion of the IO power in the total power consumption reduces as the image size increases, because the static transition probabilities of the middle bits (between LSb and MSb) reduce.
Signal Rate and Transition Density
Further empirical evidence of the impact of the signal rate, or transition density, was obtained by extracting the switching activity information for the input and output ports from the Xilinx Power Analyser (XPA). The information provided under the term "signal rate" in the XPA power consumption report was used for this purpose. Xilinx [41] defines the signal rate as the number of millions of transitions per second (2xClockRate in MHz) for the signal under consideration.
Since the clock frequency is kept fixed in most of the experiments, only the pixel values at the 8-bit primary input (gateway_in) and output (gateway_out) ports of the spatial filter hardware implementation are considered. The gateway_in port runs from the most significant bit, gateway_in7, to the least significant bit, gateway_in0, and similarly for the gateway_out port. In a clock-driven synchronous design, the maximum signal rate of a data signal is half that of the input clock. Therefore, where the clock frequency is 100MHz, the clock itself makes 200 million transitions per second (Mtps), while data that change only once every clock cycle have a maximum signal rate of 100 Mtps. If the synchronous design has components that change on both clock edges, i.e., positive and negative, then for the given clock frequency of 100MHz, the data signal rate would be 200Mtps. Figure 21 shows the graph of the mean signal rate, in Mtps, of each bit of the 8-bit input pixel versus the image size and confirms the theoretical findings explained above. This graph is plotted using a linear-log scale. As can be seen in the graph, the signal rate of the most significant bits increases as the image size decreases, thus increasing the power consumed in the IO ports. Here the spatial frequency is fixed to one cycle per image and the orientation is set to 0°. Figure 22 shows the same impact of varying image size on the output port gateway_out, using a linear-log scale. The mean signal rate of each of the bits in the 8 bits of gateway_out increases from the least significant bit to the most significant bit as the image size decreases.
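The dependence of the signal rate on image size can be approximated with a simple model of the input stream. The sketch below is an illustration only and does not reproduce the XPA figures: it assumes one pixel is presented per clock cycle, considers a single row of a one-cycle-per-image grating at 0°, and ignores transitions at row boundaries. The function names are hypothetical.

```python
import math

def grating_row(width, cycles=1, phase_deg=90.0):
    """One row of a 0-degree sinusoidal grating, quantised to 8 bits."""
    phase = math.radians(phase_deg)
    return [int(round(127.5 * (1.0 + math.sin(2.0 * math.pi * cycles * x / width + phase))))
            for x in range(width)]

def signal_rate_mtps(row, bit, clock_mhz=100.0):
    """Approximate signal rate of one bit line in millions of transitions/s.

    Assumption: one pixel per clock cycle, so a bit that changes on every
    pixel approaches clock_mhz Mtps.
    """
    toggles = sum(((a ^ b) >> bit) & 1 for a, b in zip(row, row[1:]))
    return clock_mhz * toggles / len(row)

sizes = (16, 32, 64, 128, 256, 512, 1024)
msb_rate_by_size = {n: signal_rate_mtps(grating_row(n), bit=7) for n in sizes}
```

Under these assumptions the most significant bit toggles twice per row regardless of the row length, so its modelled signal rate falls in inverse proportion to the image width, which is the direction of the trend reported for gateway_in7.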
For the case where the spatial frequencies are varied by providing varying frequency grating images to the spatial filter, Figure 23 shows the graph of signal rate of each of the gateway_in bits versus spatial frequency, using a linear-log scale. Regression lines are fitted through each of the data points of gateway_in bits to extract the trend. It can be observed from the graph that as the spatial frequency increases, the trend is that there is an increase in the signal rate from the least significant bit (gateway_in0) to the most significant bit (gateway_in7) in gateway_in input port. Here the orientation is fixed to 0°.
The same general trend of increasing signal rate continues for the pixel output port gateway_out of the spatial filter, as shown in Figure 24. This graph is plotted using a linear-log scale.
In the case where the size of an image is varied while keeping the spatial frequency fixed, a larger image requires more logic in the spatial filter to process it, so the rise in image size increases the power consumption in the logic, thus increasing the overall power and energy consumption. This rise in the occupied logic area can be observed in Figure 25, where the area occupied by the hardware-implemented spatial filter, in terms of FPGA slices, is plotted against the size of the image to be processed. The graph uses a log-log scale.
However, the energy consumption, which is the power consumed multiplied by the time taken to process the image, is as expected: smaller images take less energy to process than larger images, as shown in Figure 25, of dynamic energy consumption vs the image size.
Impact of Orientation
To assess the impact of the orientation on the energy efficiency of the spatial filter in processing the image, we need to refer to the anatomy of the image and the hardware implementation of a spatial filter. In an image of a sinusoidal grating with 0° orientation, discounting the black circular mask in the image, the content, as given by the pixel values, varies only in the horizontal direction; in the vertical direction, the variation is zero and the pixel values remain constant. Now, in the template spatial filter, the pixels are scanned in row-major order. Therefore, when the pixel values are presented at the input ports of the hardware-implemented spatial filter, the variation in the pixel values as experienced by the spatial filter is at its maximum. The resulting rise in switching at the IO ports contributes to the IO power consumption in the FPGA and, through the increased amount of switching, to the dynamic power consumption in the logic. However, when the sinusoidal grating is aligned horizontally, i.e., orientated at 90°, the pixel values do not change in the horizontal direction, and thus when the image is scanned in row-major order, the values presented at the IO ports of the spatial filter switch less than for any other orientation. This phenomenon has a direct impact on the power and energy consumption of the spatial filter used to process the image. If the spatial filter scanned the image in column-major order, then the effect would be reversed.
Table 11. Power and Energy Consumption of Elephant versus Tiger
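The scan-order argument for orientation can be checked on a small synthetic example. The sketch below is a simplified model, not the hardware flow: it builds unmasked 32 × 32 gratings at the maximum frequency of 16 cycles per image, streams them in row-major and column-major order, and counts the total number of bit transitions in the stream. All names are hypothetical and the phase of 90° is an assumption.

```python
import math

def grating(size, cycles, orientation_deg, phase_deg=90.0):
    """Unmasked size x size grating; 0 degrees = vertical bars (assumed)."""
    theta, phase = math.radians(orientation_deg), math.radians(phase_deg)
    return [[int(round(127.5 * (1.0 + math.sin(
                2.0 * math.pi * cycles
                * (x * math.cos(theta) + y * math.sin(theta)) / size + phase))))
             for x in range(size)] for y in range(size)]

def row_major(image):
    """Pixel stream scanned row by row, left to right."""
    return [p for row in image for p in row]

def column_major(image):
    """Pixel stream scanned column by column, top to bottom."""
    return [image[y][x] for x in range(len(image[0])) for y in range(len(image))]

def total_toggles(stream):
    """Total number of bit transitions between consecutive pixels."""
    return sum(bin(a ^ b).count("1") for a, b in zip(stream, stream[1:]))

vertical_bars = grating(32, 16, 0.0)     # 0-degree orientation
horizontal_bars = grating(32, 16, 90.0)  # 90-degree orientation

row_scan = (total_toggles(row_major(vertical_bars)),
            total_toggles(row_major(horizontal_bars)))
column_scan = (total_toggles(column_major(vertical_bars)),
               total_toggles(column_major(horizontal_bars)))
```

In this model the row-major scan sees far more transitions for the 0° grating than for the 90° grating, and switching to a column-major scan swaps the two counts, consistent with the reversal described above.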
Validation on Natural Images
Given any two images of a tiger and an elephant as shown in Table 11 , one could pose a question as to whether the image of the tiger would consume a different amount of energy than that of the elephant while filtering them using the same digital circuit. Table 11 shows the energy consumption of an image of an elephant and a tiger. These images were processed in the Gaussian 5 × 5 template spatial filter, and the energy consumption was estimated. Care was taken to ensure that the Region Of Interest (ROI), in this case, the elephant and the tiger, was re-sized to have a very similar area in order for an objective comparison to happen. As seen in the table, there is a clear difference in the power and energy consumption between the image of the elephant and the tiger. Here, the image of the tiger consumes more power and energy than the elephant.
Analysis
This difference in the energy consumption between the images of the tiger and the elephant can be explained by Figure 26 and Figure 27, which show the signal rate of the 8 bits of the gateway_in input port and the 8 bits of the gateway_out output port for various images, of size 256 × 256 pixels, of the two animals: seven arbitrary greyscale images of elephants and four arbitrary greyscale images of tigers. Here, the Region of Interest was not scaled to be exactly similar in area; however, the size of the images was kept the same. The disparity in the ROI was accepted because the aim was to observe the impact of the animal images on the signal rate of the IO ports. The mean signal rate of each bit of the input port gateway_in and the output port gateway_out was plotted. It is quite clear that, in both the input and output ports, the signal rate generally rises from the elephant images to the tiger images. The most plausible explanation for this rising trend is that the images of the tiger contain higher spatial frequencies, due to the stripes of the tiger, whereas the images of the elephants are predominantly shades of grey, so their spatial frequencies are lower. These differences in spatial frequencies contribute to the differences in the IO port signal rate, which in turn contribute to the power and energy consumption of the spatial filtering circuit.
CONCLUSION
An experimental framework was developed comprising a library of spatial filters implemented in hardware and a reference dataset. This framework included the development of software utilities to customise the spatial filters automatically to create hardware design instances on which to perform the empirical exploration. Accordingly, software utilities were developed to create a dataset of synthetic images comprisingtwo-dimensional sinusoidal gratings and utilities to automate the experimental process. The developed HW library of spatial filters was deployed in the respective series of experiments conducted in this research to enable the empirical demonstration of the results. Thus a reference framework has been established for quantification of image content, an image content processing complexity metric, for modelling the computational energy efficiency in digital processing of images. The results of experiments have shown that the hardware-accelerated spatial filter consumed more energy to process an image with a higher complexity metric, e.g., the image of a tiger required more energy to process than the image of an elephant. This decline in the energy efficiency is because of the spatial frequencies present in the image of the tiger due to its stripes. These spatial frequencies contribute to a higher number of switching of signals as required in processing the image, thus increasing the overall dynamic power consumption. Some of the notable contributions made in this article by empirical demonstration are as follows:
• Even a plain grey image consumes dynamic power when processed in a digital circuit. This phenomenon is mainly due to the inherent switching present in the pixels represented in binary format.
• The impact of contrast and phase in a sinusoidal grating image on the dynamic power consumption of a spatial filter is not statistically significant.
• The maximum amount of energy is consumed when the orientation of the sinusoidal grating in the image is at 0° and the least energy is consumed when the orientation is at 90°. It was discovered that this effect was due to the row-major order scanning of the image and the horizontal symmetry of the hardware blocks used to store the image rows.
• The variation in the spatial frequency of an image has a significant impact on the energy efficiency of the spatial filter used for processing it. It was confirmed that this impact was due to the gradual increase in the transition density and the static transition probabilities of the individual bits of the input port of the spatial filter, from the least significant bit to the most significant bit, which results from the binary pixel values.
• The variation in the orientation of the spatial frequencies also has a significant impact on the energy efficiency of the spatial filter.
• Different types of spatial filters consume different amounts of energy; however, they follow the same model and the difference in energy consumption is constant based on the filter used.
• Smaller images, such as 16 × 16, 32 × 32, and 64 × 64, consume more dynamic power than larger images. This phenomenon was found to be largely due to the power consumed by the primary IOs. As the image size is increased, the IO power decreases, because the number of IO switching events per unit of time is higher for smaller images than for larger images. Energy efficiency increases for smaller images but decreases for larger images as the image size and spatial frequency increase.
Accordingly, the results should serve to motivate insights and further research in pursuit of the optimisation of the computational energy efficiency of hardware-accelerated image processing algorithms.
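The orientation effect listed among the contributions can be reproduced qualitatively in software. The sketch below is an illustration under stated assumptions, not the article's experimental flow: it builds an 8-bit sinusoidal grating whose intensity varies either within each row or from row to row, then counts the bit flips seen by a row-major scan. The helper names `grating` and `total_toggles` are hypothetical, and no mapping between the two cases and the 0° and 90° labels used in the article is assumed.

```python
import math


def grating(size, cycles, vary_along_rows):
    """Build a size x size 8-bit sinusoidal grating with `cycles` per image.

    vary_along_rows=True  -> intensity changes from pixel to pixel within a row.
    vary_along_rows=False -> intensity changes only from one row to the next.
    """
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            t = x if vary_along_rows else y
            level = 0.5 + 0.5 * math.cos(2 * math.pi * cycles * t / size)
            row.append(round(255 * level))
        image.append(row)
    return image


def total_toggles(image):
    """Count bit flips between consecutive pixels of a row-major scan."""
    stream = [pixel for row in image for pixel in row]
    return sum(bin(a ^ b).count("1") for a, b in zip(stream, stream[1:]))


# Maximum spatial frequency for a 16 x 16 image: 8 cycles per image.
within_rows = total_toggles(grating(16, 8, vary_along_rows=True))
between_rows = total_toggles(grating(16, 8, vary_along_rows=False))
```

With these assumed parameters the grating that varies within each row flips all eight bits at every pixel, whereas the grating that varies between rows flips them only at row boundaries, so the row-major scan sees far fewer transitions in the second case.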
DISCUSSION AND FUTURE WORK
Hadizadeh et al. [15] proposed a method for producing energy-efficient images for energy-adaptive OLED displays while preserving the perceptual quality of the original images. Similarly, the findings presented in this article should motivate consideration of the image attributes that influence the energy efficiency of processing, and accordingly attempts to optimise the tradeoff among the design, messaging, and rendered perceptual quality objectives of such images so as to enhance the energy efficiency of the hardware-accelerated spatial filter used to process them. For example, images whose content has large vertically (90°) orientated structures could be rotated to 0° so that the vertical structures are orientated horizontally and then processed through a hardware-accelerated digital spatial filter that scans the image in row-major order, which could potentially result in less switching and hence energy savings. Furthermore, the spatial frequencies of an image could be reduced, without affecting its ultimately rendered perceptual quality, prior to processing in the spatial filter, which could potentially result in energy savings.
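The rotation idea can be sketched as a simple pre-processing decision. The following is a hypothetical illustration rather than a method evaluated in this article: it compares the number of bit flips a row-major scan would see for an image and for its 90° clockwise rotation, and keeps whichever is lower. The names `scan_toggles`, `rotate90`, and `cheaper_orientation` are assumptions, and bit flips at the input are used only as a rough stand-in for dynamic power.

```python
def scan_toggles(image):
    """Count bit flips between consecutive pixels of a row-major scan."""
    stream = [pixel for row in image for pixel in row]
    return sum(bin(a ^ b).count("1") for a, b in zip(stream, stream[1:]))


def rotate90(image):
    """Rotate a list-of-rows image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def cheaper_orientation(image):
    """Return ("rotated", image') or ("original", image), whichever
    produces fewer bit flips under a row-major scan."""
    rotated = rotate90(image)
    if scan_toggles(rotated) < scan_toggles(image):
        return "rotated", rotated
    return "original", image


# Hypothetical 8 x 8 image with two-pixel-wide vertical stripes.
vertical_stripes = [[0, 0, 255, 255, 0, 0, 255, 255] for _ in range(8)]
label, chosen = cheaper_orientation(vertical_stripes)
```

In a complete workflow the filtered result would have to be rotated back, and the output is only guaranteed to match the unrotated result when the filter kernel is itself unchanged by a 90° rotation; both points would need to be weighed against any energy saved.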
Further research, beyond the scope of this article, could extend this work by evaluating the energy efficiency of processing additional images, including other types of synthetic and natural images, and/or device/algorithm/configuration/workflow variations, essentially exploring the computational energy-efficiency correlates of image processing, for example:
• Varying image sizes, including aspect ratios other than that of a square image.
• Using natural images and varying the content in them.
• Varying image content and investigating the impact on the other two major dependent variables, namely, the area and the performance of the hardware-accelerated spatial filters.
• Exploring other types of spatial filter architectures, different from the commonly used architecture presented in this article.
• Varying the hardware implementation platforms, such as other FPGA devices and ASIC implementations.
• Varying the content present in a colour image and investigating its impact on the energy efficiency of a software or hardware-accelerated image processing algorithm.
• Varying configurations of algorithms and workflows involving GPUs, embedded microprocessors such as ARM, and traditional single-core and multi-core CPUs.
