Abstract: The key obstacle to communicating images over wireless sensor networks has been the lack of suitable processing architectures and communication strategies to deal with the large volume of image data. High packet error rates and the resulting need for retransmission make image communication inefficient in terms of energy and bandwidth. This paper presents a novel architecture and protocol for energy-efficient image processing and communication over wireless sensor networks. Practical results show that these approaches make image communication over wireless sensor networks feasible, reliable and efficient.
I. INTRODUCTION

In Wireless Multimedia Sensor Networks (WMSN), the large volume of multimedia data generated by the sensor nodes means that both processing and transmission consume more energy than in any other type of wireless sensor network (WSN). This calls for energy-aware multimedia processing algorithms and energy-efficient communication [1] that maximize network lifetime while meeting QoS constraints.
A few protocols have been proposed for image transmission over WSN [2]-[4]. Reference [2] provides a reliable, synchronous transport protocol (RSTP) with connection termination similar to TCP, but does not consider the resource limitations of WSN. Reference [3] presents an energy-efficient and reliable transport protocol (ERTP) with hop-by-hop reliability control, which adjusts the maximum number of retransmissions of a packet. Reference [4] proposes a reliable asynchronous image transfer (RAIT) protocol, which applies a double sliding window method, whereby network layer packets are checked and stored in a queue, to prevent packet loss. With protocols that provide reliability at the transport [3] or network [4] layer, packets that are erroneous at the application layer can still be forwarded to the base station, requiring retransmission and incurring the associated energy cost [5]. In [6], the authors stated that multi-hop transmission of JPEG2000 images is not feasible due to interference and packet loss, a statement also cited elsewhere in the literature [5], [7].
In this paper, image transmission over multi-hop WSN is shown to be feasible using a combination of an energy-efficient processing architecture and a reliable application layer protocol that reduces the packet error rate and the number of retransmissions. A novel FPGA architecture extracts updated objects from the background image, and only these updated objects are transmitted using the proposed protocol, providing energy-efficient image transmission in error-prone environments.
Fig. 1 presents the architecture of the proposed WMSN processing system. The network processor performs standard operations as well as customized instructions that support the operation of the wireless transceiver; it runs at a low clock frequency to keep power consumption low. The image processing block runs at a high clock frequency to process images quickly. By default, the image processing block is in inactive mode (sleep mode with its clock source suppressed), and the network processor can quickly switch it into active mode whenever an object extraction task needs to be performed.
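The benefit of keeping the image processing block asleep until the network processor wakes it can be estimated with a simple time-weighted average. The sketch below is illustrative only: the function name is hypothetical, the active power (21.96 mW), inactive power (0.3 mW) and task duration (49.15 ms) are the figures reported in Section IV, and the one-task-per-second period is an assumed workload, not a measured one.

```python
def average_power_mw(p_active_mw: float, p_sleep_mw: float,
                     t_active_s: float, period_s: float) -> float:
    """Time-weighted average power (mW) of a block that is active for
    t_active_s seconds in every period_s seconds and asleep otherwise.

    mW * s = mJ, so the numerator is the energy per period in mJ.
    """
    if period_s <= 0 or not 0 <= t_active_s <= period_s:
        raise ValueError("need 0 <= t_active_s <= period_s and period_s > 0")
    energy_mj = p_active_mw * t_active_s + p_sleep_mw * (period_s - t_active_s)
    return energy_mj / period_s


# Hypothetical workload: one object extraction task per second.
print(f"{average_power_mw(21.96, 0.3, 49.15e-3, 1.0):.2f} mW")  # prints "1.36 mW"
```

Under this assumed workload, the block's average draw stays far below its active-mode power, which is the point of running it only on demand.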
II. ARCHITECTURE FOR OBJECT EXTRACTION
In WMSN applications, the camera mote often has a fixed field of view. In this case, background subtraction is a commonly used approach for detecting moving (updated) objects [8]. The basic idea is to detect objects from the difference between the current frame and a background image, which represents the static scene in the camera view without any moving objects. The background image must be updated regularly to adapt to changes in the camera view. Among background subtraction methods, the Running Gaussian Average appears to have the fastest processing speed and the lowest memory requirements [9]. It has been further optimised for FPGA implementation and incorporated into the proposed WMSN system.
A. Background Subtraction
The Running Gaussian Average model [8] is based on ideally fitting a Gaussian probability density function to the last n values of a pixel. The background pixel value at frame n is updated by the running average in (1):
$$B_n = \alpha F_n + (1 - \alpha)\, B_{n-1} \tag{1}$$
where $B_n$ is the updated background average, $F_n$ is the current frame intensity, $B_{n-1}$ is the previous background average, and $\alpha$ is an updating constant. The value of $\alpha$ is chosen in the form $1/2^k$, because multiplication by $1/2^k$ can be implemented by a simple right shift by k bits, greatly reducing hardware complexity. Rearranging, (1) can be written as:
$$B_n = B_{n-1} + \frac{F_n - B_{n-1}}{2^k} \tag{2}$$
At each new frame, the pixel is classified as foreground (i.e. as belonging to an updated object) if the following condition holds:
$$\left| F_n - B_{n-1} \right| > T \tag{3}$$
where T is the updating threshold. The difference $(F_n - B_{n-1})$ computed in (2) is reused in (3). This sharing, together with the multiplier-free implementation, leads to an optimized hardware architecture. The result of (3) is used to decide whether the background image needs to be updated. The hardware implementation of (2) and (3) is shown in Fig. 2. By default, $B_n$ is calculated for every pixel in each frame; however, the decision to update the previous average $B_{n-1}$ depends on the value of the 1-bit update signal U. This signal is saved in an external RAM to provide the information required to locate the updated objects to be extracted.
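A software model of the per-pixel datapath may help make (2) and (3) concrete. This is a sketch, not the Fig. 2 circuit: pixels are assumed to be integers, the defaults k = 3 and T = 20 are hypothetical, and the rule that the background is written back only when U = 0 is our assumption about how U gates the update.

```python
def update_pixel(f_n: int, b_prev: int, k: int, t: int) -> tuple[int, int]:
    """One step of (2) and (3) for a single integer pixel.

    Returns (B_n, U): the running average with alpha = 1/2**k, and the 1-bit
    update signal that is 1 when the pixel is classified as foreground.
    """
    diff = f_n - b_prev            # computed once, shared by (2) and (3)
    u = 1 if abs(diff) > t else 0  # (3): foreground test against threshold T
    # (2): division by 2**k as an arithmetic right shift (floors, like a
    # two's-complement shifter in hardware); no multiplier is needed.
    b_n = b_prev + (diff >> k)
    return b_n, u


def process_frame(frame, background, k=3, t=20):
    """Apply (2)-(3) to every pixel of a frame given as lists of int rows.

    Assumption: the stored background is refreshed only where U == 0, so
    foreground objects are not blended into the background.
    """
    new_background, update_mask = [], []
    for frame_row, bg_row in zip(frame, background):
        out_row, u_row = [], []
        for f, b in zip(frame_row, bg_row):
            b_n, u = update_pixel(f, b, k, t)
            out_row.append(b if u else b_n)
            u_row.append(u)
        new_background.append(out_row)
        update_mask.append(u_row)
    return new_background, update_mask


bg, mask = process_frame([[100, 200]], [[80, 80]], k=3, t=30)
print(bg, mask)  # prints "[[82, 80]] [[0, 1]]"
```

The update mask returned here plays the role of the U values stored in external RAM for the object extraction stage.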
B. Object Extraction
An efficient object extraction algorithm has been implemented to detect portions of the current frame that differ significantly from the background image. It scans the update signals (U) row by row and column by column to determine whether the number of consecutive differences (1s) exceeds a predetermined difference threshold. The proposed object extraction scheme, consisting of background subtraction, row/column scanning and threshold comparison blocks, was implemented and tested on various FPGAs. Table I compares the proposed architecture with [10] and [11], which report synthesis results for similar object extraction functions. Both [10] and [11] used Altera FPGAs to synthesise their architectures; for a fair comparison, Table I therefore reports synthesis results of the proposed architecture on the same Altera FPGAs. The proposed architecture requires significantly fewer FPGA resources (logic elements, LEs) than [10] and [11]. Owing to its multiplier-free, optimized hardware architecture, it achieves a maximum operating frequency (Fmax) of 125.4 MHz on an Altera Cyclone III, the highest reported so far.
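The row/column scan can be sketched in software as follows. The rule used to turn qualifying rows and columns into an object region is not spelled out above; this sketch assumes a single object and takes the bounding box spanned by the first and last qualifying row and column. Function names and the threshold value are hypothetical.

```python
def longest_run(bits) -> int:
    """Length of the longest run of consecutive 1s in a sequence of 0/1."""
    best = current = 0
    for b in bits:
        current = current + 1 if b else 0
        best = max(best, current)
    return best


def extract_bbox(mask, run_threshold: int):
    """Scan rows and columns of the update mask U and return the inclusive
    bounding box (top, bottom, left, right) of rows/columns whose longest run
    of 1s exceeds run_threshold, or None if nothing qualifies.
    """
    rows = [i for i, row in enumerate(mask) if longest_run(row) > run_threshold]
    cols = [j for j, col in enumerate(zip(*mask)) if longest_run(col) > run_threshold]
    if not rows or not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


mask = [
    [1, 0, 0, 0, 0, 0],  # isolated noise pixel: run of 1, rejected
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(extract_bbox(mask, run_threshold=2))  # prints "(1, 3, 2, 4)"
```

Requiring a run of consecutive 1s, rather than any single 1, is what lets the scan ignore isolated noisy pixels in U.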
III. IMAGE TRANSMISSION PROTOCOL
The first challenge in image transmission is that compressed images are sensitive to packet errors. A reliable transmission protocol is therefore needed at the application layer to ensure that all image packets are sent and received correctly and in the right order. We have implemented such a protocol at the application layer, as depicted in Fig. 3. The basic idea is to divide the image into a number of packets, assign each packet an ID, and add packet control and error detection. Table II describes the structure of the protocol messages implemented at the application layer.
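The packetization idea (split, number, add error detection) can be sketched as below. Table II is not reproduced here, so the field layout is an assumption: a 2-byte packet ID, a 2-byte total packet count and a 1-byte payload length, followed by the payload and a CRC-16-CCITT computed with Python's `binascii.crc_hqx`. The 64-byte default payload is hypothetical, chosen to fit comfortably inside a 127-byte 802.15.4 frame.

```python
import binascii
import struct

# Hypothetical header: packet ID, total packet count, payload length.
HEADER = struct.Struct(">HHB")
CRC = struct.Struct(">H")


def packetize(image: bytes, payload_size: int = 64) -> list[bytes]:
    """Split an image into ID-numbered packets, each ending in a CRC-16."""
    if not 0 < payload_size <= 255:
        raise ValueError("payload_size must fit in one byte")
    chunks = [image[i:i + payload_size]
              for i in range(0, len(image), payload_size)] or [b""]
    packets = []
    for packet_id, chunk in enumerate(chunks):
        body = HEADER.pack(packet_id, len(chunks), len(chunk)) + chunk
        packets.append(body + CRC.pack(binascii.crc_hqx(body, 0xFFFF)))
    return packets


def parse(packet: bytes):
    """Return (packet_id, total, payload), or None if the CRC or length fails."""
    if len(packet) < HEADER.size + CRC.size:
        return None
    body = packet[:-CRC.size]
    (crc,) = CRC.unpack(packet[-CRC.size:])
    if binascii.crc_hqx(body, 0xFFFF) != crc:
        return None  # corrupted: the receiver would request retransmission
    packet_id, total, length = HEADER.unpack_from(body)
    payload = body[HEADER.size:]
    return (packet_id, total, payload) if len(payload) == length else None


def reassemble(parsed) -> bytes:
    """Order payloads by packet ID so the image is rebuilt in the right order."""
    return b"".join(payload for _, _, payload in sorted(parsed))
```

Because each packet carries its own ID and checksum, the receiver can detect a corrupted or missing packet and rebuild the image in order once all IDs from 0 to total-1 have arrived.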
Even with the above application layer protocol incorporated within 802.15.4, practical experiments show that when trans[…] [4], where multiple nodes can start data transmission at the same time, the proposed protocol allows only one node to transmit packets to the base station. As shown in Fig. 3, when the base station starts an image transmission, it first broadcasts a small START-OF-TRANSMISSION message to all nodes in the network. This message also predefines the image packet size and forbids any node that is not part of the image transmission link from sending anything until it receives the END-OF-TRANSMISSION message. This mechanism ensures that only one image packet is sent at a time, avoiding collision and congestion and thereby reducing packet loss and the associated energy cost of retransmission.
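The START/END gating can be modelled as a small per-node state machine. The message fields and class names below are hypothetical stand-ins for the messages in Table II; the sketch only captures the rule stated above, namely that a node outside the image link is silenced from START-OF-TRANSMISSION until END-OF-TRANSMISSION, and that START fixes the packet size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlMessage:
    kind: str                      # "START" or "END" (hypothetical encoding)
    link: frozenset = frozenset()  # IDs of nodes on the image link
    packet_size: int = 0           # image packet size fixed by START


@dataclass
class Node:
    node_id: int
    silenced: bool = False
    packet_size: Optional[int] = None

    def on_control(self, msg: ControlMessage) -> None:
        if msg.kind == "START":
            self.packet_size = msg.packet_size
            # Nodes outside the image link must stay quiet until END.
            self.silenced = self.node_id not in msg.link
        elif msg.kind == "END":
            self.silenced = False

    def may_transmit(self) -> bool:
        return not self.silenced


nodes = [Node(i) for i in range(4)]
for n in nodes:
    n.on_control(ControlMessage("START", frozenset({0, 2}), packet_size=64))
print([n.may_transmit() for n in nodes])  # prints "[True, False, True, False]"
```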
From our practical experiments with a multi-hop WMSN of the proposed FPGA-based nodes, we can confirm that a JPEG2000 image could not be reconstructed at the base station when the standard 802.15.4 protocol was used. The test results in Table III show that the proposed queue control strategy significantly reduces the Packet Error Rate (PER) and increases data throughput for multi-hop communications. In these tests, smaller packets achieved higher throughput than larger packets; however, this may not hold in other environments, for example where the SNR is lower. Without queue control, the image data flow is transparent to the intermediate nodes, which increases the probability of packet errors being introduced at those nodes.
With queue control, the data packets are always checked for correctness using Cyclic Redundancy Checks (CRC) before forwarding. This reduces PER at the base station. The throughput in Table III is calculated based on the actual image data received at the base station over time.
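A sketch of the queue control step at an intermediate node: each queued packet is checked against its trailing CRC-16 before it is forwarded. What a relay does with a failed packet is not detailed above; here it is simply counted as a retransmission request to the previous hop, which is an assumption, as is the CRC-16-CCITT choice. The throughput helper follows the stated definition for Table III (image data received over time) and reports bits per second, also an assumption about units.

```python
import binascii
import struct
from collections import deque


def crc_ok(packet: bytes) -> bool:
    """True if the trailing 2-byte CRC-16-CCITT matches the packet body."""
    if len(packet) < 2:
        return False
    (crc,) = struct.unpack(">H", packet[-2:])
    return binascii.crc_hqx(packet[:-2], 0xFFFF) == crc


def relay_queue(queue: deque) -> tuple[list[bytes], int]:
    """Forward only packets that pass the CRC; count the rest as NACKs."""
    forwarded, retransmit_requests = [], 0
    while queue:
        packet = queue.popleft()
        if crc_ok(packet):
            forwarded.append(packet)
        else:
            retransmit_requests += 1  # hypothetical: ask previous hop to resend
    return forwarded, retransmit_requests


def throughput_bps(image_bytes_received: int, elapsed_s: float) -> float:
    """Throughput counted on image data actually received, in bit/s."""
    return 8 * image_bytes_received / elapsed_s


body = b"image payload"
good = body + struct.pack(">H", binascii.crc_hqx(body, 0xFFFF))
bad = bytes([good[0] ^ 0x01]) + good[1:]  # one flipped bit in transit
print(relay_queue(deque([good, bad, good]))[1])  # prints "1"
```

Stopping a corrupted packet at the first relay that sees it keeps it from reaching the base station and triggering a costlier end-to-end retransmission.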
IV. NETWORK SET UP, TESTING AND RESULTS
The WMSN processing architecture presented in this paper, including a network processor and the object extraction block, was implemented on a Xilinx Spartan-3 FPGA. To further reduce the size of the updated object, and consequently the transmission energy, an efficient JPEG2000 Discrete Wavelet Transform (DWT) processor [12] was used; only the low-frequency sub-band [12] of the transformed object is transmitted. Two FPGA prototypes were developed and connected to Digi XBee and Microchip MRF24J40 transceivers, respectively. These FPGA prototypes were used in conjunction with COTS-based WSN nodes (Atmel ATmega328p microcontroller) to set up a WMSN for testing. Each camera node includes a CMOS camera and a battery. The camera is a small, low-cost, low-power OmniVision OV7640/8 VGA CMOS colour sensor.
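The low-frequency (LL) sub-band can be illustrated with one level of the reversible LeGall 5/3 integer lifting transform from JPEG2000 Part 1. Whether [12] uses the 5/3 or the 9/7 filter, and how many decomposition levels it applies, is not stated here, so treat the filter choice as an assumption. The sketch uses whole-sample symmetric extension and assumes even image dimensions.

```python
def lift53(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the reversible 5/3 lifting DWT on an even-length signal.

    Returns (s, d): low-pass (approximation) and high-pass (detail) samples.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("sketch assumes an even length >= 2")

    def ext(i: int) -> int:
        # Whole-sample symmetric extension on the right: x[n] -> x[n - 2].
        return x[i] if i < n else x[2 * (n - 1) - i]

    half = n // 2
    # Predict step: detail = odd sample minus floor of its neighbours' mean.
    d = [x[2 * i + 1] - ((x[2 * i] + ext(2 * i + 2)) >> 1) for i in range(half)]
    # Update step: symmetric extension on the left gives d[-1] = d[0].
    s = [x[2 * i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def ll_subband(image: list[list[int]]) -> list[list[int]]:
    """LL sub-band: low-pass along rows, then low-pass along columns."""
    low_rows = [lift53(row)[0] for row in image]
    low_cols = [lift53(list(col))[0] for col in zip(*low_rows)]
    return [list(row) for row in zip(*low_cols)]


print(lift53([0, 1, 2, 3, 4, 5, 6, 7]))  # prints "([0, 2, 4, 6], [0, 0, 0, 1])"
```

One decomposition level keeps a quarter of the samples (a 160x100 object yields an 80x50 LL band), which is roughly where the additional transmission saving comes from.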
A. Network Set Up and Operation
The experimental setup of a wireless sensor network with 10 nodes, including several camera nodes (CAM1, CAM2, etc.), is shown at the bottom of Fig. 4. All multimedia data captured by the camera nodes can be transferred to, and monitored on, a base station connected to a PC. The background image and the extracted objects were received and reconstructed by software running on the PC. The graphical user interface (GUI) is shown in Fig. 4 along with the background image and the extracted objects.
B. Power Analysis
Table IV presents the power consumption of the image processing block in the active state for both object extraction and DWT, as reported by Xilinx Power Analyzer. In the inactive state, the power consumption of this block drops to only 0.3 mW. These results indicate that the proposed architecture is suitable for low-power applications. Using a PicoScope, we measured the total power consumed by the FPGA board over time, as shown in Fig. 5. When idle, the entire FPGA board consumes a standby power of ∼54 mW. The total power consumption rises to ∼82 mW during transmission of an image frame; the difference of ∼28 mW is the power required to transmit the frame. Measurements have also shown that only 10 mW is consumed by object extraction and DWT processing (at 50 MHz) for a background image of 640x480 pixels. The power consumption for image processing (∼10 mW) is therefore much lower than that for data communication (∼28 mW). Because communication dominates, the proposed object extraction scheme, which reduces the amount of data to be transmitted, significantly reduces the energy required for image transmission. Experiments have confirmed that the total energy used to process an image and transmit an updated (i.e. extracted) object of size 160x100 is ∼20 times less than the energy required to transmit only the original 640x480 image. More importantly, the proposed application layer protocol contributes significantly to reducing the energy cost of image communication. This is due to the combination of the queue control strategy, which reduces packet error rates, and the strategy of allowing only one node to transmit at a time, which reduces the likelihood of collision and congestion.
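The board-level figures above can be checked with simple arithmetic. Under the rough assumption (ours, not a measurement) that transmission time, and hence transmission energy at a fixed ∼28 mW, scales linearly with the number of pixels sent, the pixel-count ratio alone is of the same order as the reported ∼20x saving. The function names are illustrative.

```python
def tx_power_mw(active_mw: float, idle_mw: float) -> float:
    """Transmission power as the rise above the board's standby power."""
    return active_mw - idle_mw


def pixel_ratio(full: tuple[int, int], obj: tuple[int, int]) -> float:
    """How many times more pixels the full frame has than the object."""
    return (full[0] * full[1]) / (obj[0] * obj[1])


print(tx_power_mw(82.0, 54.0))              # prints "28.0"
print(pixel_ratio((640, 480), (160, 100)))  # prints "19.2"
```

This ignores the processing energy added on the object path and the further reduction from sending only the DWT low-frequency sub-band, so it is a plausibility check rather than a model of the measurement.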
C. Comparison with COTS based WSN
Software was developed to run the background subtraction and object extraction functions on an ATmega328p microcontroller-based WSN node. In active mode, the ATmega328p node consumes about 11 mW at 16 MHz, 3.3 V and 25 °C, while the proposed architecture on an XC3S1000 FPGA at 50 MHz consumes 21.96 mW (Table IV). However, the ATmega328p system requires more than 2 seconds to complete the image processing task, whereas the proposed processing system completes the same task in only 49.15 ms, i.e. more than 40 times faster. Consequently, the total energy consumption of the ATmega328p node is 22.8 mJ, ∼20 times higher than that of the proposed system. The processing time and energy consumption results are summarised in Table V. These results show that the proposed WMSN processing system is highly energy efficient and much faster than COTS processor-based systems. In addition, the proposed system effectively detects updated objects in the camera view.
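The figures in this comparison are consistent with E = P × t. The short check below uses only the numbers reported above; the function name is illustrative.

```python
def energy_mj(power_mw: float, time_s: float) -> float:
    """Energy in mJ from power in mW and time in s (mW * s = mJ)."""
    return power_mw * time_s


fpga_mj = energy_mj(21.96, 49.15e-3)  # proposed system: about 1.08 mJ
atmega_mj = 22.8                      # reported for the ATmega328p node
atmega_s = atmega_mj / 11.0           # implied run time: about 2.07 s

print(f"FPGA energy  {fpga_mj:.2f} mJ")            # prints "FPGA energy  1.08 mJ"
print(f"speed-up     {atmega_s / 49.15e-3:.1f}x")  # prints "speed-up     42.2x"
print(f"energy ratio {atmega_mj / fpga_mj:.1f}x")  # prints "energy ratio 21.1x"
```

The implied ATmega328p run time (∼2.07 s) matches "more than 2 seconds", and the two ratios match "more than 40 times faster" and "∼20 times" respectively.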
V. CONCLUSIONS
The object extraction architecture, coupled with the DWT processor, significantly reduces the energy cost of image transmission. The application layer protocol proposed in this paper incorporates an effective queue control strategy to reduce the packet error rate. In addition, the protocol allows only one node to transmit at a time, thereby reducing collision and congestion, and consequently the number of retransmissions. The practical results presented in this paper demonstrate the effectiveness of the proposed techniques, namely a significant reduction in the energy cost of image communication. Contrary to claims in the existing literature, the proposed strategies make image communication over wireless sensor networks feasible.
