CITRIC: A low-bandwidth wireless camera network platform by Phoebus Chen et al.
CITRIC: A LOW-BANDWIDTH WIRELESS CAMERA NETWORK PLATFORM
Phoebus Chen†∗, Parvez Ahammad†, Colby Boyer†, Shih-I Huang§, Leon Lin§, Edgar Lobaton†,
Marci Meingast†, Songhwai Oh‡, Simon Wang§, Posu Yan†, Allen Y. Yang†, Chuohao Yeo†,
Lung-Chung Chang§, J.D. Tygar†, and S. Shankar Sastry†
† Electrical Engineering and Computer Sciences, University of California, Berkeley; CA 94720, USA
‡ Electrical Engineering and Computer Sciences, University of California, Merced; CA 95344, USA
§ Industrial Technology Research Institute; Chutung, Hsinchu, Taiwan 310, R.O.C.
ABSTRACT
In this paper, we propose and demonstrate a novel wireless
camera network system, called CITRIC. The core component
of this system is a new hardware platform that integrates a
camera, a frequency-scalable (up to 624MHz) CPU, 16MB
FLASH, and 64MB RAM onto a single device. The device
then connects with a standard sensor network mote to form
a camera mote. The design enables in-network processing
of images to reduce communication requirements, which have
traditionally been high in existing camera networks with centralized
processing. We also propose a back-end client/server
architecture to provide a user interface to the system and sup-
port further centralized processing for higher-level applica-
tions. Our camera mote enables a wider variety of distributed
pattern recognition applications than traditional platforms be-
cause it provides more computing power and tighter integra-
tion of physical components while still consuming relatively
little power. Furthermore, the mote easily integrates with ex-
isting low-bandwidth sensor networks because it can com-
municate over the IEEE 802.15.4 protocol with other sensor
network platforms. We demonstrate our system on three ap-
plications: image compression, target tracking, and camera
localization.
Index Terms— Wireless Sensor Network, Camera Sensor,
Sensor Architecture, Embedded System.
1. INTRODUCTION
Wireless sensor networks (WSNs) have emerged as a new
class of information technology infrastructure where comput-
ing is embedded into the physical world [9, 1, 11]. A WSN
∗Corresponding author. Email: phoebusc@eecs.berkeley.edu.
This work is partially supported by ARO MURI W911NF-06-1-0076 and
by TRUST (Team for Research in Ubiquitous Secure Technology), which
receives support from the National Science Foundation (NSF award num-
ber CCF-0424422) and the following organizations: AFOSR (#FA9550-06-
1-0244), BT, Cisco, ESCHER, HP, IBM, iCAST, Intel, Microsoft, ORNL,
Pirelli, Qualcomm, Sun, Symantec, Telecom Italia, and United Technologies.
consists of a large number of spatially distributed devices with
computing and sensing capabilities, i.e., motes, which form
an ad-hoc wireless network for communication. Applications
of WSNs include building control [13], environmental moni-
toring [29], trafﬁc control [20], manufacturing and plant au-
tomation [34], service robotics [17], and surveillance [23].
The standardization of communication protocols for sensor
networks, namely IEEE 802.15.4 and ZigBee, has facilitated
the effort to commercialize WSNs.
The research in WSNs has traditionally focused on low-
bandwidth sensors (e.g., acoustic, vibration, and infrared sen-
sors) that limit the ability to identify complex, high-level
physical phenomena. This limitation can be addressed by
integrating high-bandwidth sensors, such as image sensors,
to provide visual veriﬁcation, in-depth situational awareness,
recognition, and other capabilities. This new class of WSNs
is called heterogeneous sensor networks (HSNs). The inte-
gration of high-bandwidth sensors and low-power wireless
communication in HSNs requires new in-network informa-
tion processing techniques and networking techniques to re-
duce the communication cost for long-term deployment.
In this paper, we describe the design and evaluation
of a wireless camera mote for HSNs, called the CITRIC
mote, which is a wireless camera hardware platform with a
1.3-megapixel camera, a PDA-class processor, 64MB RAM,
and 16MB FLASH. This new platform will help us develop
a new set of in-network information processing and network-
ing techniques for HSNs. Since wireless camera networks
performing in-network processing are relatively new, it is im-
portant for our platform to balance performance with ease of
development of in-network computer vision algorithms to en-
able a wider base of applications. Modularity is a key tenet
of our design, reﬂected in the separation of the image pro-
cessing and networking hardware on the CITRIC mote and in
the separation of functions in our client/server back-end soft-
ware architecture for the entire CITRIC system. Surveillance
is used as an example scenario throughout this paper.
Figure 1 shows a typical network conﬁguration for our
978-1-4244-2665-2/08/$25.00 © 2008 IEEE
Proceedings of the 2nd ACM/IEEE International Conference on Distributed Smart Cameras, September 2008, pp. 1-10
Fig. 1. Architecture of our wireless camera network.
surveillance system. The CITRIC motes are wirelessly net-
worked with each other and possibly with other types of
motes over the IEEE 802.15.4 protocol. Some motes also
communicate with gateway computers that are connected to
the Internet. The motes ﬁrst perform pre-processing functions
on images captured from the camera sensors and then send the
results over the network to a central server, which routes the
information to various clients for further processing and visu-
alization. The server itself may also provide some centralized
processing and logging of data. This architecture allows vari-
ous clients to interact with different subsets of the motes and
support different high-level applications.
We envision our surveillance system being deployed within a
perimeter (e.g., a building or a park) where security can be
administered by a single entity. Multiple surveillance sys-
tems can also be connected over the Internet. The central
server should not be a signiﬁcant bottleneck in the system
because most of the image processing and computer vision
algorithms will run on the motes, so the wired back-end
system will not be processing raw image streams. Also,
by not streaming images over the network, the system pro-
vides better security against eavesdropping and better privacy
protection to those under surveillance.
The rest of this section surveys existing camera mote plat-
forms and motivates why a new design is necessary to meet
all our design requirements. We believe that our platform pro-
vides the best balance between performance, cost, power con-
sumption, ease of development, and ease of deployment.
1.1. Related Work
Similar to the design of our platform, many of the existing
camera motes consist of a camera-and-processor board and
a networking mote. A comparison of some representative
platforms with our platform is shown in Table 1. A good
treatment of the baseline computation requirements for in-
network image processing can be found in [8]. The network-
ing motes have minimal on-board processing, typically not
suitable for running image processing or computer vision al-
gorithms.
Some platforms in the past focused on streaming video to a
centralized server for processing, such as eCAM [25], a small
wearable camera platform consisting of an image compres-
sion module (no programmable CPU) and a networking node.
One of the earliest camera motes with signiﬁcant on-
board processing is Panoptes [10]. The latest version of
the Panoptes platform consists of a Stargate “gateway mote,”
an 802.11b PCMCIA wireless card, and a USB camera.
Panoptes is targeted at applications where one would selec-
tively stream video to conserve bandwidth. To this end, the
platform has a priority-based adaptive buffering scheme, a
ﬁlter to remove uninteresting video frames, a video/camera
query system, and video compression. The use of commer-
cial devices in Panoptes, instead of a tightly integrated de-
sign, imposes extra limitations. Most notably, the frame rate
of the camera is limited by the USB bus speed, which forces
the USB camera to compress the image and the Stargate pro-
cessor to decompress the image to perform processing, thus
consuming extra computation and power.
On the other hand, the Cyclops [27], WiSN [8], and WiCa
[15] platforms have much tighter camera and on-board pro-
cessor integration. Cyclops was designed for low power op-
eration and connects a complex programmable logic device
(CPLD) directly to the camera for basic image processing
such as background subtraction and frame differencing. However,
the 8-bit, 7.3MHz low-power CPU and 64KB RAM
limit the computational capability for supporting higher-level
computer-vision algorithms. WiSN uses a more powerful
32-bit, 48MHz CPU and also 64KB RAM, but the proces-
sor is shared between networking and image processing pro-
cesses. Similar to the Cyclops, the second generation WiCa
mote speeds up low-level image processing using an 84-MHz
Xetal-II SIMD processor, which has a linear processor array
of 320 parallel processing elements and a 16-bit global con-
trol processor for higher-level sequential processing. It uses a
separate 8051 MCU and ZigBee module for networking [14].
The platform most similar to the CITRIC mote is a proto-
type platform used by [30], which consists of an iMote2 [5]
connected to a custom-built camera sensor board. The plat-
form consists of an XScale CPU running at a slightly lower
clock speed, 32MB RAM, 32MB FLASH, and an OmniVi-
sion camera. Unlike the CITRIC mote, the networking and
image processing functions are both performed on the XScale
processor, and the platform does not have a built-in micro-
phone. The separation of the image processing unit from the
networking unit in the CITRIC mote allows for easy develop-
ment and testing of various image processing and computer
vision algorithms.
Finally, multi-tiered camera networks have also been proposed, in which low-cost/power/resolution camera motes wake up higher-cost/power/resolution cameras to capture and process interesting images. One such notable multi-tier camera network system is SensEye [16], which consists of three tiers of cameras. In the future, we also envision deploying our CITRIC mote in a multi-tier network, particularly one composed of heterogeneous sensors (e.g., passive-infrared motion sensors and microphones).

Table 1. Comparison of existing wireless camera mote platforms with the new CITRIC mote platform.
Platform | Processor | RAM | ROM | Camera | Wireless
eCAM [25] | OV528 Serial Bridge (JPEG compression only) | N/A | N/A | COMedia C328-7640 board, uses OV7640 camera (640 × 480 pixel @ 30fps) | Eco node, uses nRF24E1 radio+MCU (1Mb/s, 10m range)
Panoptes [10] | Intel XScale PXA255 (400MHz, 32-bit CPU) | 64MB | 32MB | Logitech 3000 USB Camera¹ (640 × 480 pixel @ ≈13fps; 160 × 120 pixel @ ≈30fps) | 802.11 PCMCIA card (11Mb/s for 802.11b)
Cyclops [27] | Atmel ATmega128L (7.3728MHz, 8-bit CPU); Xilinx XC2C256 CoolRunner (16MHz CPLD) | 64KB | 512KB | ADCM-1700 (352 × 288 pixel @ 10fps) | Mica2 mote, uses TR1000 radio (40kbps)
WiSN [8] | Atmel AT91SAM7S (48MHz, 32-bit ARM7TDMI CPU) | 64KB (+32KB²) | 256KB (+2MB²) | ADCM-1670 (352 × 288 pixel @ 15fps); ADNS-3060 (30 × 30 pixel @ 100fps) | Built-in CC2420 radio (802.15.4, 250kbps)
WiCa (Gen 2) [15, 14] | Xetal-II (84MHz, 320 PE LPA + GCP) | 1.75MB | N/A | Unknown (640 × 480 pixel @ 30fps) | Aquis Grain ZigBee+MCU, uses CC2420 radio (802.15.4, 250kbps)
iMote2+Cam [30] | Intel XScale PXA271 (up to 416MHz, 32-bit CPU) | 32MB | 32MB | OV7649 (640 × 480 pixel @ 30fps; 320 × 240 pixel @ 60fps) | Built-in CC2420 radio (802.15.4, 250kbps)
CITRIC | Intel XScale PXA270 (up to 624MHz, 32-bit CPU) | 64MB | 16MB | OV9655 (1280 × 1024 pixel @ 15fps; 640 × 480 pixel @ 30fps) | Tmote Sky mote, uses CC2420 radio (802.15.4, 250kbps)
2. ARCHITECTURE AND DESIGN
2.1. Camera Mote
The CITRIC platform consists of a camera daughter board
connected to a Tmote Sky board (see Figure 2, left). The
Tmote Sky [19] is a variant of the popular Telos B mote
[26] for wireless sensor network research, which uses a Texas
Instruments MSP430 microcontroller and Chipcon CC2420
IEEE 802.15.4-compliant radio, both selected for low-power
operation.
The camera daughter board consists of a 4.6cm ×
5.8cm processor board and a detachable image sensor board
(see Figure 2, middle). The design of the camera board uses
a small number of functional blocks to minimize size, power
consumption, and manufacturing costs.
In choosing the onboard processor, we had the option of using
either field-programmable gate arrays (FPGAs) or general-purpose
processors running embedded Linux. Although FPGAs have advantages
in terms of speed and low power consumption, the user would need to program in a
hardware description language, making algorithm implemen-
tation and debugging a time-consuming process. On the other
hand, many well-studied image processing and computer vi-
sion algorithms have been efﬁciently coded in C/C++, such as
the OpenCV library [2]. Therefore, we chose to use a general-
purpose processor running embedded Linux (as opposed to
TinyOS [32]) for the camera board for rapid prototyping and
ease of programming and maintenance.
2.1.1. CMOS image sensor
The camera for our platform is the OmniVision OV9655, a
low-voltage SXGA (1.3-megapixel) CMOS image sensor that
offers the full functionality of a camera and image processor
on a single chip. It supports image sizes SXGA (1280 × 1024),
VGA, CIF, and any size scaling down from CIF to 40 × 30,
and provides 8-bit/10-bit images. The image array is capable
of operating at up to 30 frames per second (fps) in VGA, CIF,
and lower resolutions, and 15fps in SXGA. The OV9655 is
designed to perform well in low-light conditions [24]. The
typical active power consumption is 90mW (15fps @SXGA)
and the standby current is less than 20µA.
¹ Frame rate limited by compression and USB bandwidth.
² External memory extension. Extending both RAM and ROM is not permitted.
Fig. 2. (Left) Assembled camera daughter board with Tmote. (Middle) Camera daughter board with major functional units outlined. (Right) Block diagram of major camera board components.
2.1.2. Processor
The PXA270 [12] is a ﬁxed-point processor with a maximum
speed of 624MHz, 256KB of internal SRAM, and a Wireless
MMX coprocessor to accelerate multimedia operations. The
processor is voltage and frequency scalable for low power op-
eration, with a minimum voltage and frequency of 0.85V and
13MHz, respectively. Furthermore, the PXA270 features the
Intel Quick Capture Interface, which eliminates the need for
external preprocessors to connect the processor to the camera
sensor. Finally, we chose the PXA270 because of its maturity
and the popularity of its software and development tools. The
current CITRIC platform supports CPU speeds of 208, 312,
416, and 520MHz.
2.1.3. External Memory
The PXA270 is connected to 64MB of 1.8V Qimonda Mo-
bile SDRAM and 16MB of 1.8V Intel NOR FLASH. The
SDRAM is for storing image frames during processing, and
the FLASH is for storing code. 64MB of SDRAM is more
than sufficient for storing 2 frames at 1.3-megapixel resolution
(3 bytes/pixel × 1.3 megapixels × 2 frames ≈ 8MB), the
minimal requirement for background subtraction. 64MB is
also the largest Single Data Rate (SDR) mobile SDRAM component
currently on the market that the PXA270 supports natively. As for the FLASH, the code
size for most computer vision algorithms falls well under
16MB. Our selection criteria for the types of non-volatile and
volatile memory are access speed/bandwidth, capacity, power
consumption, cost, physical size, and availability.
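As a quick check of the frame-buffer arithmetic above, the sketch below recomputes it exactly. It assumes, as the text does, 3 bytes per pixel (24-bit color). It also takes "1.3 megapixels" to mean the OV9655's full SXGA resolution of 1280 × 1024 and counts the 64MB SDRAM in binary megabytes. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to buffer `frames` uncompressed frames of width x height
// pixels at `bytes_per_pixel` bytes each (3 for 24-bit color).
std::uint64_t frame_buffer_bytes(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel,
                                 std::uint64_t frames) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel * frames;
}

// True if `bytes` fits in an SDRAM of `sdram_mib` binary megabytes (MiB).
bool fits_in_sdram(std::uint64_t bytes, std::uint64_t sdram_mib) {
    return bytes <= sdram_mib * 1024ULL * 1024ULL;
}
```

For SXGA, `frame_buffer_bytes(1280, 1024, 3, 2)` is 7,864,320 bytes (7.5MiB). Two full frames therefore occupy less than an eighth of the 64MB SDRAM.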
Our choices for non-volatile memory were NAND and
NOR FLASH, where the former has lower cost-per-bit and
higher density but slower random access and the latter has
the capability to execute code directly out of the non-volatile
memory on boot up (eXecution-In-Place, XIP) [6]. NOR
FLASH was chosen not only because it supported XIP, but
also because NAND Flash is not natively supported by the
PXA270 processor.
Our choices for volatile memory were Mobile SDRAM
and Pseudo SRAM, both of which consume very little power.
Low power consumption is an important factor when choos-
ing memory because it has been demonstrated that the mem-
ory in handsets demands up to 20 percent of the total power
budget, equal to the power demands of the application pro-
cessor [33]. Mobile SDRAM was chosen because of its sig-
niﬁcantly higher density and speed.
We had to forgo using multi-chip packages (MCPs) that incorporate
a complete memory subsystem (e.g., NOR + Pseudo SRAM, NAND + Mobile
SDRAM, NOR + NAND + Mobile SDRAM) in a single component due to their limited availability, but
they may be used in future versions of the platform.
2.1.4. Microphone
In order to run high-bandwidth, multi-modal sensing algo-
rithms fusing audio and video sensor outputs, it was impor-
tant to include a microphone on the camera daughter board
rather than use a microphone attached to the Tmote Sky wire-
less mote. This simpliﬁed the operation of the entire sys-
tem by dedicating the communication between the Tmote
Sky and the camera daughter board to data that needed to
be transmitted over the wireless network. The microphone
on the board is connected to the Wolfson WM8950 mono au-
dio ADC, which was designed for portable applications. The
WM8950 features high-quality audio (at sample rates from 8
to 48ks/s) with low power consumption (10mA in all-on 48ks/s
mode) and integrates a microphone preamplifier to reduce the
number of external components [35].
2.1.5. Power Management
The camera daughter board uses the NXP PCF50606, a power
management IC for the XScale application processors, to
manage the power supply and put the system into sleep mode.
When compared to an equivalent solution with multiple dis-
crete components, the PCF50606 signiﬁcantly reduces the
system cost and size [21]. The entire camera mote, includ-
ing the Tmote Sky, is designed to be powered by either four
AA batteries, a USB cable, or a 5V DC power adapter cable.
2.1.6. USB to UART bridge
The camera daughter board uses the Silicon Laboratories
CP2102 USB-to-UART bridge controller to connect the
UART port of the PXA270 with a USB port on a personal
computer for programming and data retrieval. Silicon Labo-
ratories provides royalty-free Virtual COM Port (VCP) device
drivers that allow the camera mote to appear as a COM port
to PC applications [28]. The CP2102 is USB 2.0 full-speed
(12Mbps) compliant, and was chosen because it minimizes
the number of physical components on the PCB.
The camera daughter board also has a JTAG interface for
programming and debugging.
2.2. Wireless Communications
As shown in Figure 1, sensor data in our system ﬂow from
the motes to a gateway over the IEEE 802.15.4 protocol, then
from the gateway over a wired Internet back-end to a cen-
tralized server, and ﬁnally from the server to the client(s).
The maximum data rate of 802.15.4 is 250kbps per frequency
channel (16 channels available in the 2.4 GHz band), far too
low for a camera mote to stream images back to the server at a
high enough quality and frame rate for real-time applications.
A key tenet of our design is to push computing out to the
edge of the network and only send post-processed data (for
instance, low-dimensional features from an image) in real-
time back to the centralized server and clients for further pro-
cessing. If an event of interest occurs in the network, we can
then send a query for the relevant image sequence to be com-
pressed and sent back to the server over a slightly longer pe-
riod of time. Since we are using commercial off-the-shelf
motes running TinyOS/NesC, we can easily substitute differ-
ent standard routing protocols to suit an application’s particu-
lar needs. For instance, the real-time requirements of surveil-
lance imply that typical communication does not need to run
over a reliable transport protocol.
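The following sketch makes the bandwidth gap concrete by computing the minimum time to push a payload through a 250kbps link. It assumes an uncompressed 8-bit grayscale VGA frame (640 × 480 bytes). It also ignores 802.15.4 header, acknowledgment, and contention overhead, so real transfers would be slower. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on the time (seconds) to send `payload_bytes` over a link
// whose raw bit rate is `bits_per_second`. MAC/PHY headers, ACKs, retries,
// and channel contention are not modeled, so real transfers take longer.
double min_transfer_seconds(std::uint64_t payload_bytes, double bits_per_second) {
    assert(bits_per_second > 0.0);
    return static_cast<double>(payload_bytes) * 8.0 / bits_per_second;
}
```

At best, one raw VGA frame takes about 9.8s. An 8-byte bounding box (four 16-bit coordinates, an assumed encoding) takes about 0.26ms. This asymmetry motivates sending features instead of images.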
2.3. Client/Server Interface
From a more abstract point of view, the sensor network can be
modeled as a shared computing resource consisting of a set of
nodes that can be accessed by multiple users concurrently. As
such, users logging into the system can assign different tasks
to different nodes. The ﬁrst user to log in to a node becomes
a “manager” of that node (see Figure 3). Other users logging
in to the system can assign tasks to any unmanaged nodes,
but will only be able to listen to the data output by managed
nodes. For example, if there is a node performing a tracking
task for a given user then a new user will not be able to assign
a face recognition task to the node but s/he will still be able
to log in and listen to the output data for the tracking task. As
users log out of the nodes, the manager role is passed on to
the next logged in user. In the future, the system will be ex-
tended to allow users to negotiate the use of the resources by
integrating an interface for users to yield the role of manager
to others.
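The manager hand-off rule described above can be modeled as a per-node queue of logged-in users whose head is the manager. The sketch below is a hypothetical illustration of that policy, not the CITRIC server code. The class and method names are assumptions, and an empty manager string stands for an unmanaged node.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>

// Hypothetical per-node session bookkeeping: users are kept in login
// order, and the user at the front of the queue is the node's manager.
class NodeSessions {
public:
    // Logs a user in; the first user on an empty node becomes the manager.
    void login(const std::string& user) {
        assert(!user.empty());  // empty string is reserved for "no manager"
        if (std::find(users_.begin(), users_.end(), user) == users_.end())
            users_.push_back(user);
    }
    // Logs a user out; if the manager leaves, the next user takes over.
    void logout(const std::string& user) {
        users_.erase(std::remove(users_.begin(), users_.end(), user), users_.end());
    }
    // Returns the current manager, or "" if the node is unmanaged.
    std::string manager() const {
        return users_.empty() ? std::string() : users_.front();
    }
    // Tasks may be assigned to an unmanaged node, or by its manager.
    bool can_assign_task(const std::string& user) const {
        return users_.empty() || users_.front() == user;
    }
    // Any logged-in user may listen to the node's output.
    bool can_listen(const std::string& user) const {
        return std::find(users_.begin(), users_.end(), user) != users_.end();
    }
private:
    std::deque<std::string> users_;
};
```

Passing the role "to the next logged in user" is interpreted here as the next user in login order. The text does not pin down the ordering.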
Note that in this shared computing model, the server is “invisible”
to the users. In reality, the server will provide some
services (such as database storage for more memory-intensive
tasks), but its role will be hidden from the user.
3. CAMERA MOTE BENCHMARKS
3.1. Energy Consumption
The power consumption of the camera mote was determined
by logging the current and voltage of the device when it was
connected to four AA batteries (outputting ≈ 6V). A Tek-
tronix AM 503B Current Probe Ampliﬁer was used to convert
current to voltage, and a National Instruments 9215 USB data
logger was used to log both the voltage of the batteries and
the voltage of the current probe.
First, we measured the power consumption of the camera
daughter board alone running Linux but with no active pro-
cesses (Idle). We then took the same measurement but with
the Tmote attached, although no data was sent to the Tmote
(Idle+Tmote). In this test, the Tmote was running an appli-
cation that waits to receive any packets from the camera board
and transmits over the radio. On average, Idle consumes 428–
478mW, and Idle+Tmote consumes 527 – 594mW, depend-
ing on the processor speed.
We also measured the power consumption of the mote run-
ning a typical background subtraction function. The test uti-
lizes all the components of the mote by both running the CPU
and using the Tmote to transmit the image coordinates of the
foreground. At a processor speed of 520MHz, the power
consumption was 970mW. Note that the power consumption may
be reduced by enabling power management on the Tmote.
The current draw is relatively constant over time, even
though the voltage of the batteries decreases with time. The
calculations were made using the nominal voltage of 6V in
order to be consistent, since each experiment starts and ends
with a different voltage. If we assume the camera mote consumes
about 1W and runs on four AA batteries with 2700mAh capacity
(about 16.2Wh at the nominal 6V), we expect the camera mote to last over 16 hours under
continuous operation.
Fig. 3. Screenshot of user interface. (Left) A text menu is available for assigning tasks to the motes. At the top of this menu is a table showing that the user is the “manager” of one node and “not logged in” on the other node. (Right) The graphical visualization of the network physical layout and a detected foreground region from a selected mote.
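The lifetime estimate can be reproduced by converting the pack's charge rating into energy at the nominal voltage. This minimal sketch assumes an ideal 2700mAh, 6V pack (four AA cells in series) and a constant load. It deliberately ignores voltage sag, rate-dependent capacity loss, and regulator losses, so it gives a rough ceiling rather than a prediction. The function name is illustrative.

```cpp
#include <cassert>

// Estimated runtime in hours for a constant-power load of `load_watts`
// drawn from a pack rated `capacity_mah` at `nominal_volts`. Treats the
// pack as an ideal energy store (no sag, no rate or regulator losses).
double battery_hours(double capacity_mah, double nominal_volts, double load_watts) {
    assert(capacity_mah > 0.0 && nominal_volts > 0.0 && load_watts > 0.0);
    const double energy_wh = capacity_mah / 1000.0 * nominal_volts;
    return energy_wh / load_watts;
}
```

Using the measured 970mW background-subtraction load instead of the rounded 1W, the same formula gives roughly 16.7 hours.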
3.2. Speed
The speed benchmarks for the camera board were chosen to
reﬂect typical image processing computations. We compared
the benchmarks with and without the Intel Integrated Perfor-
mance Primitives (IPP) library to evaluate whether IPP pro-
vides a signiﬁcant performance increase.
All benchmarks were performed on 512 × 512 image ar-
rays. The Add benchmark adds two arrays. The Background
Subtraction benchmark computes the difference of two arrays
and then compares the result against a constant threshold to
get a boolean array (mask). The Median Filter benchmark
performs smoothing by taking the median pixel value of a
3 × 3 pixel area at each pixel. The Canny benchmark imple-
ments the ﬁrst stage of the Canny edge detection algorithm.
The benchmark results for Add and Background Subtraction
were averaged over 1000 trials, while those for Median Filter
and Canny were averaged over 100 trials.
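For reference, a plain 3 × 3 median filter of the kind timed in the Median Filter benchmark can be written as follows. This is a generic C++ sketch, not the benchmark source or the IPP routine. It assumes 8-bit grayscale input in row-major order and copies border pixels unchanged, since the benchmark's border handling is not stated.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 median filter on a row-major 8-bit grayscale image. Pixels on the
// image border (where the window would fall outside) are copied unchanged.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& src,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> dst(src);
    std::array<std::uint8_t, 9> window{};
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = src[(y + dy) * width + (x + dx)];
            // Place the 5th-smallest of the 9 samples (the median) at index 4.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            dst[y * width + x] = window[4];
        }
    }
    return dst;
}
```

`std::nth_element` finds the median in linear expected time per window, which avoids a full sort of the nine samples.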
Figure 4 shows the average execution time of one itera-
tion for each benchmark. Note that the IPP versions of the
functions are not necessarily always faster than their non-
IPP counterparts. For example, the Background Subtraction
benchmark consists of an arithmetic operation and a compar-
ison. Implemented in IPP, this requires two function calls and
thus two iterations through the entire array. But implemented
without IPP, we can perform both operations on the same it-
eration through the array, resulting in only one iteration and
fewer memory accesses. Such non-IPP optimizations should
be taken into consideration when building future applications
in order to obtain maximum performance. Also, the non-
linear performance curve for different CPU frequencies can
be attributed to the constant speed of memory access (the bus
speed is 208MHz regardless of the processor speed).
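The loop-fusion argument for Background Subtraction can be illustrated with two equivalent implementations. Both sketches are plain C++ stand-ins written for this illustration and do not call IPP. The first mimics a two-call pipeline that materializes a difference image and then thresholds it. The second fuses both operations into one pass. The strict `>` comparison against the threshold follows the mask definition used in Section 4.2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Two-pass version: one pass writes |frame - background| into a temporary
// image, a second pass thresholds it into a 0/1 mask.
Image bg_subtract_two_pass(const Image& frame, const Image& background, int threshold) {
    assert(frame.size() == background.size());
    Image diff(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        diff[i] = static_cast<std::uint8_t>(std::abs(int(frame[i]) - int(background[i])));
    Image mask(frame.size());
    for (std::size_t i = 0; i < diff.size(); ++i)
        mask[i] = diff[i] > threshold ? 1 : 0;
    return mask;
}

// Fused version: difference and comparison share one loop, so each input
// pixel is read once and no temporary image is written.
Image bg_subtract_fused(const Image& frame, const Image& background, int threshold) {
    assert(frame.size() == background.size());
    Image mask(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        mask[i] = std::abs(int(frame[i]) - int(background[i])) > threshold ? 1 : 0;
    return mask;
}
```

Both functions return the same mask. The fused version reads each input pixel once and never writes the temporary array, which is the reduction in memory traffic the text credits for the speed-up.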
4. APPLICATIONS
In this section, we demonstrate the capacity and performance
of the proposed platform via three representative low-level vi-
sion applications: image compression, target tracking, and
camera localization. The algorithms are implemented in
C/C++ on the camera motes and a base-station computer. In
all the experiments, the processor speed is set at 520 MHz.
Fig. 4. Average run time of basic image processing functions on 512 × 512 images (over ≥ 100 iterations). Execution time at 520 MHz is shown in parentheses.
4.1. Image Compression
Image compression is an important function for many camera-based
applications, especially in networks designed to push
captured images to a back-end server for further processing.
We quantitatively measure the speed of image compression
on the CITRIC mote platform, and the rate-distortion and
time-distortion trade-offs between two compression schemes:
JPEG and Compressed Sensing (CS) [4].
The embedded Linux OS includes the IJG library that im-
plements JPEG compression.3 Since the onboard CPU only
has native support for ﬁxed-point arithmetic, we used the in-
teger DCT implementation.
The CS scheme using random matrices [7, 3] has been
shown to provide unique advantages in lossy compression,
particularly in low bandwidth networks with energy and com-
putation constrained nodes. While a detailed discussion about
CS is outside the scope of this paper, we brieﬂy describe the
key elements used in our experiments. Suppose an $n \times n$ image
or image block $I \in \mathbb{R}^{n^2}$, stacked in vector form, can be
written as a linear combination of a set of basis vectors, i.e.,
$I = Fx$, where $F \in \mathbb{R}^{n^2 \times m}$ is some (possibly overcomplete)
linear basis. If one assumes that only a sparse set of basis vectors
is needed, i.e., all but a small percent of the coefficients
in $x$ are (close to) zero, then it has been shown that, with
overwhelming probability, $x$ can be stably recovered from a
small number of random projections of the original image $I$,
$y \doteq RI = RFx \in \mathbb{R}^{d}$, where $R \in \mathbb{R}^{d \times n^2}$ is a random
projection matrix, and $d$ is the number of coefficients used in
compression [7, 3]. The sparse representation $x$ can be recovered
via $\ell_1$-minimization:
$$x^* = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad y = RFx. \qquad (1)$$
The reconstruction of the image is then $\hat{I} = Fx^*$.
³ Available from http://www.ijg.org
In this paper, the components of the random matrix R
are assigned to be ±1 with equal probability (i.e., the
Rademacher distribution). Therefore, computing the random
projections only involves addition and subtraction, which is
particularly suitable for ﬁxed-point processors.4 We apply
an Exponential-Golomb coding method to encode the random
projection coefficients. At the decoder, the inverse DCT matrix
is used as the image basis $F$ to perform reconstruction via
$\ell_1$-minimization.
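The encoder side of the CS scheme is simple enough to sketch in a few lines. The example below assumes several details the text does not specify:

- an xorshift32 generator;
- a per-row seed derived from a shared seed, so the decoder can regenerate any row of R (and hence build R′ after packet loss);
- the top PRNG bit as the ±1 sign;
- the common zigzag mapping of signed values to order-0 Exp-Golomb code numbers.

All function names are hypothetical, and the decoder's ℓ1 reconstruction is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// xorshift32 (Marsaglia); the state must never be zero.
std::uint32_t xorshift32(std::uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Computes y = R * I, where R is d x n with +1/-1 entries drawn from a PRNG.
// Only integer additions and subtractions are used. Row r is regenerated
// from (seed, r), so R never has to be stored or transmitted.
std::vector<std::int32_t> rademacher_project(const std::vector<std::uint8_t>& image,
                                             std::size_t d, std::uint32_t seed) {
    std::vector<std::int32_t> y(d, 0);
    for (std::size_t r = 0; r < d; ++r) {
        std::uint32_t state = seed ^ (0x9E3779B9u * static_cast<std::uint32_t>(r + 1));
        if (state == 0) state = 1;  // avoid xorshift's all-zero fixed point
        std::int32_t acc = 0;
        for (std::uint8_t px : image) {
            // The top bit of the PRNG output selects the sign of R[r][i].
            if (xorshift32(state) >> 31) acc += px;
            else acc -= px;
        }
        y[r] = acc;
    }
    return y;
}

// Maps signed values to Exp-Golomb code numbers: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ...
std::uint32_t signed_to_code_num(std::int32_t v) {
    assert(v != INT32_MIN);  // its code number would not fit in 32 bits
    const std::int64_t w = v;
    return static_cast<std::uint32_t>(w > 0 ? 2 * w - 1 : -2 * w);
}

// Appends the order-0 Exp-Golomb code of k: (L - 1) zero bits followed by
// the L-bit binary form of k + 1, most significant bit first.
void exp_golomb_append(std::uint32_t k, std::vector<bool>& bits) {
    const std::uint64_t v = static_cast<std::uint64_t>(k) + 1;
    int len = 0;
    for (std::uint64_t t = v; t != 0; t >>= 1) ++len;
    for (int i = 0; i < len - 1; ++i) bits.push_back(false);
    for (int i = len - 1; i >= 0; --i) bits.push_back(((v >> i) & 1u) != 0);
}
```

Each row depends only on the shared seed and its index. The receiver can therefore rebuild exactly the rows that match the coefficients it actually received.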
Compared to image compression based on block DCT
transform or wavelets, it is no surprise that random projection
requires more coefficients to achieve the same com-
pression quality. On the other hand, the random projection
scheme has the following unique advantages:
1. Transmission of the random projections y is robust to
packet loss in the network. Even if part of the coefﬁ-
cients in y is lost during transmission, the receiver can
still adaptively create the appropriate measurement matrix
$R'$ and carry out $\ell_1$-minimization at the expense of
less accuracy.
2. It is straightforward to implement a progressive com-
pression protocol using random projection, e.g., one can
construct additional random projections of the image
signal I to improve the reconstruction accuracy.
3. In terms of security, if (part of) the projection signal y
is intercepted but the random seed used to generate the
random matrix is not known to the intruder, it is more
difficult to decipher the original signal I than it would be
from intercepted DCT or wavelet coefficients.
We use two standard test images, “Lena” and “Barbara”,
for our compression evaluations. Each test image is a 512 ×
512 grayscale picture. To evaluate compression performance,
we measure the reconstruction quality with peak signal-to-
noise ratio (PSNR), the byte rate, and the compression time.
4We have compared the reconstruction quality using ±1 random coefﬁ-
cients with general real Gaussian coefﬁcients, and we found the difference is
minimal.
The byte rate indicates the size of the data transmitted by the
mote while the compression time gives an indication of the
amount of energy expended while compressing. Each test image
is compiled into the program code and copied to a memory
buffer at run time before compression on the mote, while
the reconstruction is performed on the computer. All time
measurements are averaged over 1000 trials.
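PSNR, the quality measure used here, compares a reconstruction against the original through the mean squared error. Below is a minimal sketch for 8-bit grayscale images. It assumes a peak value of 255 and equal-sized row-major buffers, and the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two equal-sized 8-bit grayscale images:
// PSNR = 10 * log10(255^2 / MSE). Identical images yield +infinity.
double psnr_db(const std::vector<std::uint8_t>& ref,
               const std::vector<std::uint8_t>& rec) {
    assert(ref.size() == rec.size() && !ref.empty());
    double sse = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double e = double(ref[i]) - double(rec[i]);
        sse += e * e;
    }
    const double mse = sse / static_cast<double>(ref.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

Two images that differ by one gray level everywhere have MSE = 1 and a PSNR of about 48.13dB.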
Figure 5 shows selected results from our experiments.
Since JPEG is a widely used compression scheme with well
understood performance, we leave out its Rate-Distortion
curve for space reasons. For the Time-Distortion curve us-
ing JPEG, an important observation is that if we extrapolate
the curve to the point where quality goes to 0, we arrive at the
computational overhead of JPEG, i.e., we still require a computational
time of 70ms per image. In contrast, there is almost
no overhead in CS: if we look at its Time-Distortion
curve, as the reconstruction quality tends towards zero, the
computation time required also tends towards zero.
4.2. Single Target Tracking via Background Subtraction
Due to the inherent richness of the visual medium, video-
based scene analysis (such as tracking, counting, and recog-
nition) typically requires a background-subtraction step that
focuses attention of the system on smaller regions of inter-
est in order to reduce the complexity of data processing. A
simple approach to background subtraction from video data
is via frame differencing [27]. This approach compares each
incoming frame with a background image model and clas-
siﬁes the pixels of signiﬁcant variation as part of the fore-
ground. The foreground pixels are then processed for identi-
ﬁcation and tracking. The success of frame differencing de-
pends on the robust extraction and maintenance of the back-
ground model. Some of the known challenges include illu-
mination changes, vacillating backgrounds, shadows, visual
clutter, and occlusion [31].
Because there is no single background model that can ad-
dress all these challenges, the model must be selected based
on application requirements. In the context of camera motes,
the computational complexity of the algorithm and run-time
are also crucial factors, which justiﬁes our choice of a simple
background subtraction technique such as frame differencing
in this demonstration.
The data flow in Figure 6 shows the target-detection algorithm
based on background subtraction. The first component performs
two tasks: a background-foreground segmentation and an update
of the background model $B_t$. An initial mask
$M_t^0(i,j) := |I_t(i,j) - B_t(i,j)| > \tau$
is set based on a specified threshold $\tau$, and then
post-processed using median filtering (a $9 \times 9$ block is
used in our examples). All contiguous foreground regions in
$M_t^0$ are grouped together as blobs, and any blob smaller than
a specified threshold is removed. The result is a segmentation
output $M_t$, a binary array with a value of 1 for foreground
and 0 for background. Finally, a set of boxes $S_t$ bounding the
resulting blobs in $M_t$ is computed and used for tracking. The
tracking results from a single camera view are displayed in
Figure 7.
Fig. 5. (Left) Time-Distortion curve for JPEG. (Middle) Rate-Distortion curve for CS with random projections. (Right) Time-Distortion curve for CS with random projections.
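The segmentation stage that turns $M_t^0$ into $M_t$ and $S_t$ can be sketched as follows. This is a minimal stdlib-only Python sketch, not the CITRIC C implementation: it assumes the mask is a list of rows of 0/1 values, treats "contiguous" as 4-connected (an assumption; 8-connectivity is equally plausible), and the function names and the `min_area` parameter are hypothetical. Boxes are returned as (x_min, y_min, x_max, y_max) with x as the column index.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def median_filter(mask: Mask, k: int = 9) -> Mask:
    """k x k median filter on a binary mask (the text uses k = 9).
    Near the border only the in-bounds part of the window is used."""
    h, w, r = len(mask), len(mask[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                mask[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            # For 0/1 values the median is 1 iff ones are a strict majority
            # (ties in even-sized border windows resolve to 0).
            out[i][j] = 1 if 2 * sum(window) > len(window) else 0
    return out


def blobs_to_boxes(mask: Mask, min_area: int) -> Tuple[Mask, List[Box]]:
    """Group 4-connected foreground pixels into blobs, drop blobs smaller
    than min_area, and return the cleaned mask M_t and the boxes S_t."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    boxes: List[Box] = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            # Breadth-first search collects one blob.
            pixels, queue = [], deque([(i, j)])
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # blob too small: leave it as background
            for y, x in pixels:
                out[y][x] = 1
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return out, boxes
```

The double loop in `median_filter` costs O(k² · H · W) and is only meant to make the steps concrete; an on-mote version would use an optimized routine.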
Fig. 6. Data-flow diagram of mote target-detection algorithm.
Fig. 7. Tracking results for a single camera view. (Left) Sample
image. (Right) The path represents the motion of the center of mass
of the target.
Figure 8 illustrates the data flow of the entire system. In
this example, two CITRIC motes perform background subtraction
in real time. The resulting images are stored locally on the
motes and downloaded offline (left side of Figure 8). Features
extracted from the observations, namely the coordinates of the
bounding boxes, are sent to the server via radio. A client can
then log on to the server and receive information streamed from
the motes. The right side of Figure 8 shows a screenshot of the
current visualization available on the client. The left plot in
this visualization is a diagram of the floor plan where the
camera motes are located, with rays specifying the angles at
which detections occurred. The right plot depicts the bounding
boxes for the corresponding camera mote.
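To make the bandwidth argument concrete, the sketch below packs the bounding boxes from one frame into a compact binary message and computes the horizontal angle of the ray through a box center, the kind of quantity drawn as rays on the client's floor-plan plot. It is a minimal Python sketch under stated assumptions: the message layout in the hypothetical `pack_boxes`, the ideal pinhole model, and the horizontal field of view passed to the hypothetical `bearing_deg` are illustrative, not the CITRIC wire format or the client's actual computation.

```python
import math
import struct
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def pack_boxes(mote_id: int, frame_no: int, boxes: List[Box]) -> bytes:
    """Hypothetical radio payload: a 4-byte header (mote id, frame number,
    box count) followed by four unsigned 16-bit coordinates per box."""
    payload = struct.pack("<BHB", mote_id, frame_no, len(boxes))
    for box in boxes:
        payload += struct.pack("<4H", *box)
    return payload


def bearing_deg(box: Box, image_width: int, hfov_deg: float) -> float:
    """Horizontal angle (degrees) of the ray through the box center,
    relative to the optical axis, for an ideal pinhole camera whose
    horizontal field of view is hfov_deg (an assumed parameter)."""
    cx = (box[0] + box[2]) / 2.0
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((cx - image_width / 2.0) / focal_px))
```

With two boxes the message is 20 bytes, which fits in a single IEEE 802.15.4 frame (at most 127 bytes), whereas one uncompressed 8-bit grayscale 320 × 240 image is 76,800 bytes. Placing the ray on the floor plan would additionally require each camera's orientation.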
Fig. 8. Data diagram for Target Tracking application. Motes with
corresponding local observations are displayed to the left, and client
visualization to the right.
The execution time per frame for background subtraction and the
bounding box computation is typically 0.2 – 0.4s at a resolution
of 320 × 240, and 0.3 – 0.8s at 640 × 480. The frame rate is not
fixed because the execution time of the algorithm varies with
the number of foreground pixels.
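Since the frame rate is set by the per-frame execution time $t$, the reported ranges translate directly into achievable frame rates $f = 1/t$, assuming no additional capture or radio time beyond the figures above:

```latex
f = \frac{1}{t}: \qquad
320 \times 240:\; t \in [0.2,\, 0.4]\,\mathrm{s} \;\Rightarrow\; f \in [2.5,\, 5]\ \text{frames/s}, \qquad
640 \times 480:\; t \in [0.3,\, 0.8]\,\mathrm{s} \;\Rightarrow\; f \in [1.25,\, 3.3]\ \text{frames/s}.
```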
4.3. Camera Localization using Multi-Target Tracking
Most camera networks benefit from knowing where the sensors are
located relative to one another in a common coordinate frame,
i.e., localization. Camera localization is the process of
finding the position of the cameras as well as the orientation
of each camera's field of view. In this subsection, we
demonstrate a localization method based on [18]. The method
simultaneously tracks multiple objects and uses the recovered
tracks as features to estimate the position and orientation of
the cameras up to a scale factor.
The tracks of the moving objects are formed in the image
plane using a single point per object at each time instance. In
our experiment, this point is the center of the bounding box
around a detected foreground object. Over the whole surveillance
sequence of duration $T$, there is an unknown number $K$ of
objects. Denote the number of independent objects detected at
time $t$ as $n_t$ and the set of their coordinates in the image as
$y_t = \{y_t^j \in \mathbb{R}^2 : j = 1,\dots,n_t\}$. Then, the set
of all detections during the whole sequence is
$Y = \bigcup_{t=1}^{T} y_t$. The problem of multi-target tracking
is defined as a partition of $Y$,
$\omega = \{\tau_0, \tau_1, \dots, \tau_K\}$,
where $\tau_0$ collects the false alarm samples, and $\tau_i$,
$i = 1,\dots,K$, collect the samples that form the $K$ tracks,
respectively.
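The partition $\omega$ can be made concrete by indexing each detection $y_t^j$ by the pair $(t, j)$. The sketch below, with hypothetical names, checks that $\tau_0, \tau_1, \dots, \tau_K$ cover $Y$ without overlap, and forms each detection's point as the bounding-box center, as in our experiment. The extra rule that a track holds at most one detection per time step is an assumption following the usual MCMCDA convention of [22]; this is a validity check only, not the MCMCDA search itself.

```python
from typing import Dict, List, Set, Tuple

Obs = Tuple[int, int]  # (t, j) identifies detection y_t^j, with j = 1..n_t


def box_center(x_min: float, y_min: float, x_max: float, y_max: float) -> Tuple[float, float]:
    """Single image point used per object: the bounding-box center."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def is_valid_partition(n_per_time: Dict[int, int], tau0: Set[Obs], tracks: List[Set[Obs]]) -> bool:
    """Check that {tau0} + tracks partitions Y, where n_per_time[t] = n_t,
    and (assumed convention) that no track has two detections at one time."""
    all_obs = {(t, j) for t, n in n_per_time.items() for j in range(1, n + 1)}
    parts = [tau0] + tracks
    # Union equals Y and sizes add up to |Y|  <=>  disjoint and covering.
    if set().union(*parts) != all_obs or sum(len(p) for p in parts) != len(all_obs):
        return False
    for track in tracks:
        times = [t for t, _ in track]
        if len(times) != len(set(times)):
            return False
    return True
```

For instance, with $n_1 = n_2 = 2$ and $n_3 = 1$, two tracks $\{(1,1),(2,1)\}$ and $\{(1,2),(2,2)\}$ plus $\tau_0 = \{(3,1)\}$ form a valid partition.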
One common drawback of target tracking via direct background
subtraction is that the method is not stable enough for
tracking multiple moving objects. In [22], the multi-target
tracking algorithm models each of the $K$ tracks using a linear
dynamic model. The posterior probability $P(\omega|Y)$ is then
maximized using Markov Chain Monte Carlo Data Association
(MCMCDA), and the maximum a posteriori (MAP) partition
$\omega^*$ gives the tracks of the $K$ objects in the image
sequence. This solution allows us to build tracks from multiple
moving objects at any given time and to use more information
from the dynamics of the scene than the background subtraction
results alone provide.
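The section does not restate which linear dynamic model [22] uses; a constant-velocity model is a common choice and is assumed here. The sketch below filters one image coordinate of one track with a standard Kalman filter under that model. It only illustrates the per-track linear dynamics that the MCMCDA posterior is built on; it does not implement the MCMCDA sampler over partitions $\omega$, and the function name and noise parameters `q` and `r` are hypothetical.

```python
from typing import List, Tuple


def kalman_cv_1d(zs: List[float], dt: float = 1.0, q: float = 1e-2,
                 r: float = 1.0) -> List[Tuple[float, float]]:
    """Kalman filter for x_{k+1} = F x_k + w, z_k = H x_k + v with state
    x = [position, velocity], F = [[1, dt], [0, 1]], H = [1, 0].
    Q = q * I and R = r are assumed noise levels. Returns the filtered
    (position, velocity) after each measurement."""
    x = [zs[0], 0.0]              # start at the first measured position
    P = [[r, 0.0], [0.0, 1e3]]    # velocity initially very uncertain
    out = [(x[0], x[1])]
    for z in zs[1:]:
        # Predict: x <- F x,  P <- F P F^T + Q.
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update with the scalar position measurement z.
        s = p00 + r                   # innovation variance H P H^T + R
        k0, k1 = p00 / s, p10 / s     # Kalman gain K = P H^T / S
        innov = z - x[0]
        x = [x[0] + k0 * innov, x[1] + k1 * innov]
        # Covariance update: P <- (I - K H) P.
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        out.append((x[0], x[1]))
    return out
```

Applied separately to the x and y coordinates of the bounding-box centers in a candidate track, a filter like this yields predictions against which new detections can be scored.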
In our experiment, we positioned two camera motes 8.5
feet apart and pointed them at an open area where people were
walking, as shown by the top row of pictures in Figure 9. Each
camera mote ran background subtraction on its current image
and then sent the bounding box coordinates back to the base
station for each detected foreground object. The center of
each bounding box was used to build the image tracks over
time on the base station computer, as shown in Figure 10.
It can be seen that multiple tracks are successfully estimated
from the image sequence.
Fig. 9. (Top) Image frames from the left and right camera motes,
respectively, viewing the scene. (Bottom) The detected foreground
objects from the scene.
Fig. 10. The tracks of the moving objects in the image planes of the
left and right camera motes, respectively, formed by MCMCDA.
The localization algorithm then takes these tracks and performs
multiple-view track correspondence. This method is particularly
suitable for a low-bandwidth camera network because it works
well on wide-baseline images and images lacking distinct static
features [18]. Furthermore, only the coordinates of the
foreground objects need to be transmitted, not
entire images. In implementing the localization, tracks from
the two image sequences are compared, and we adapt the method so
that minimizing the reprojection error determines which tracks
best correspond between images. We used 43 frames from the
cameras at an image resolution of 640 × 480. Foreground objects
were detected in 22 of the 43 frames, and tracks were built from
these detected foreground objects. Four tracks were built in the
first camera and five tracks in the second camera. Using the
adapted localization method, we localized the two cameras
relative to one another with an average reprojection error of
4.94 pixels. This estimate was based on the four track matches
between the two cameras that minimize the reprojection error.
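The average reprojection error reported above is the mean pixel distance between observed image points and the projections of the corresponding estimated 3D points. The sketch below shows how such an average is computed, assuming each camera is described by a 3 × 4 pinhole projection matrix; estimating those matrices and the 3D points (the localization itself, following [18]) is not shown, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(P: Sequence[Sequence[float]], X: Point3) -> Point2:
    """Project a 3D point with a 3x4 camera matrix P (homogeneous coordinates)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def mean_reprojection_error(P: Sequence[Sequence[float]], points3d: List[Point3],
                            observed: List[Point2]) -> float:
    """Average Euclidean distance, in pixels, between observed image
    points and the projections of the estimated 3D points."""
    errors = [math.dist(project(P, X), x) for X, x in zip(points3d, observed)]
    return sum(errors) / len(errors)
```

For example, with focal length 100 and the principal point at the origin, the point (1, 2, 10) projects to (10, 20); an observation at (13, 24) contributes an error of 5 pixels.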
Fig. 11. (Left) The matching of tracks between the cameras that
were used for localization. (Right) The reprojection error measured
in pixels for each of the 20 points of the tracks.
The accuracy of the camera localization estimate is affected
by a few factors. First, the choice of the (low-cost) camera
has an effect on the quality of the captured image. Second,
the precision of the synchronization between the cameras
affects the accuracy of the image correspondence. Last, we only
used a small number of frames to estimate track correspondence.
Using a longer image sequence with more data points can reduce
the estimation error.
5. CONCLUSION AND FUTURE WORK
We have presented the architecture of CITRIC, a new wire-
less camera mote system for low-bandwidth networks. The
system enables the captured images to be processed locally
on the camera board, and only compressed low-dimensional
features are transmitted through the wireless network. To this
end, the CITRIC mote has been designed to have state-of-the-art
computing power and memory (up to 624MHz, 32-bit XScale
processor; 64MB RAM; 16MB FLASH), and runs embedded Linux. The
mote communicates over the IEEE 802.15.4
protocol, which also makes it easy to integrate with existing
HSNs.
We plan to improve the usability of our system by enabling
clients to manage and interact with clusters of motes instead
of individual motes. We will also expand the available C li-
brary of image processing functions on our camera motes and
evaluate their performance. The platform will enable investi-
gators to explore different distributed camera network appli-
cations.
6. REFERENCES
[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci.
A survey on sensor networks. IEEE Comm Mag, 40(8):102–
116, 2002.
[2] G. Bradski, A. Kaehler, and V. Pisarevsky. Learning-based
computer vision with Intel’s Open Source Computer Vision Li-
brary. Intel Technology Journal, May 2005.
[3] E. Candès. Compressive sampling. In Proc of Int Congress of
Math, 2006.
[4] E. Candes and J. Romberg. Practical signal recovery from ran-
dom projections. In Wavelet App in Signal and Image Proc,
SPIE, 2004.
[5] Crossbow Inc. Imote2: High-Performance Wireless Sensor
Network Node Datasheet, 2008.
[6] P. DiPaolo. NOR continues to battle NAND flash memory in
the handset. http://www.wirelessnetdesignline.
com/showArticle.jhtml?articleID=165701488,
2005.
[7] D. Donoho and M. Elad. Optimal sparse representation in
general (nonorthogonal) dictionaries via ℓ1 minimization. Proc of
NAS of USA, pages 2197–2202, March 2003.
[8] I. Downes, L. Rad, and H. Aghajan. Development of a mote
for wireless image sensor networks. In COGIS, 2006.
[9] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting
the physical world with pervasive networks. Pervasive Com-
puting, 1(1):59–69, 2002.
[10] W. Feng, E. Kaiser, W. Feng, and M. L. Baillif. Panoptes: scal-
able low-power video sensor networking technologies. ACM
Trans Multi Comp Comm Appl, 1(2):151–167, 2005.
[11] H. Gharavi and S. Kumar. Special issue on sensor networks
and applications. Proc of IEEE, 91(8):1151–1256, 2003.
[12] Intel Corp. Intel PXA270 Processor Electrical, Mechanical
and Thermal Specification Datasheet, 2004.
[13] M. Kintner-Meyer and R. Conant. Opportunities of wireless
sensors and controls for building operation. Energy Engineer-
ing Journal, 102(5):27–48, 2005.
[14] R. Kleihorst. Personal communication about Xetal-II WiCa
platform, June 2008.
[15] R. Kleihorst, A. Abbo, B. Schueler, and A. Danilin. Cam-
era mote with a high-performance parallel processor for real-
time frame-based video processing. In ICDSC, pages 109–116,
Sept. 2007.
[16] P. Kulkarni, D. Ganesan, P. Shenoy, and Q. Lu. SensEye: a
multi-tier camera sensor network. In MULTIMEDIA, pages
229–238, 2005.
[17] A. LaMarca, W. Brunette, D. Koizumi, M. Lease, S. Sigurds-
son, K. Sikorski, D. Fox, and G. Borriello. Making sensor net-
works practical with robots. In PERVASIVE, pages 152–166,
2002.
[18] M. Meingast, S. Oh, and S. Sastry. Automatic camera network
localization using object image tracks. In ICCV, 2007.
[19] Moteiv Corp. Tmote Sky Datasheet, 2003.
[20] M. Nekovee. Ad hoc sensor networks on the road: the promises
and challenges of vehicular ad hoc networks. In Workshop on
Ubiq Comp and e-Research, 2005.
[21] NXP Semiconductor. PCF50606: How to con-
nect to the Intel Bulverde application processor.
http://www.nxp.com/acrobat_download/
literature/9397/75009763.pdf.
[22] S. Oh, S. Russell, and S. Sastry. Markov chain Monte Carlo
data association for general multiple-target tracking problems.
In CDC, 2004.
[23] S. Oh, L. Schenato, P. Chen, and S. Sastry. Tracking and co-
ordination of multiple agents using sensor networks: System
design, algorithms and experiments. Proc of IEEE, 95:234–
254, 2007.
[24] Omnivision Technologies Inc. OV9655 Color CMOS SXGA
(1.3MegaPixel) CAMERACHIP with OmniPixel Technology
Datasheet, 2006.
[25] C. Park and P. Chou. eCAM: ultra compact, high data-rate
wireless sensor node with a miniature camera. In SenSys, pages
359–360, 2006.
[26] J. Polastre, R. Szewczyk, and D. Culler. Telos: Enabling ultra-
low power wireless research. In IPSN/SPOTS, 2005.
[27] M. Rahimi, R. Baer, O. Iroezi, J. Garcia, J. Warrior, D. Estrin,
and M. Srivastava. Cyclops: In situ image sensing and inter-
pretation. In Embedded Networked Sensor Systems, 2005.
[28] Silicon Lab. CP2102 single-chip USB to UART bridge.
http://www.silabs.com/public/documents/
tpub_doc/dsheet/Microcontrollers/
Interface/en/cp2102.pdf, 2007.
[29] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Main-
waring, and D. Estrin. Habitat monitoring with sensor net-
works. Comm of ACM, 47(6):34–40, 2004.
[30] T. Teixeira, D. Lymberopoulos, E. Culurciello, Y. Aloimonos,
and A. Savvides. A lightweight camera sensor network oper-
ating on symbolic information. In Proc of the First Workshop
on Distributed Smart Cameras, 2006.
[31] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower:
Principles and practice of background maintenance. In ICCV,
pages 255–261, 1999.
[32] UC Berkeley and TinyOS Open Source Community. TinyOS
community forum. http://www.tinyos.net/.
[33] O. Vargas. Achieve minimum power consumption in mobile
memory subsystems. http://www.eetasia.com/ART_8800408762_499486_TA_673f1760.HTM, 2006.
[34] A. Willig, K. Matheus, and A. Wolisz. Wireless technology in
industrial networks. Proc of IEEE, 93(6):1130–1151, 2005.
[35] Wolfson Micro. Wolfson WM8950. http://www.
wolfsonmicro.com/uploads/documents/en/
WM8950.pdf, 2008.