University of Texas at El Paso

DigitalCommons@UTEP
Open Access Theses & Dissertations

2011-01-01

UTEPcam: A Scalable Wireless Vision Sensor
Architecture For Computational, Power And
Bandwidth Constrained Scenarios
Ricardo Zepeda
University of Texas at El Paso, rzepeda@miners.utep.edu

Recommended Citation
Zepeda, Ricardo, "UTEPcam: A Scalable Wireless Vision Sensor Architecture For Computational, Power And Bandwidth Constrained
Scenarios" (2011). Open Access Theses & Dissertations. 2419.
https://digitalcommons.utep.edu/open_etd/2419

This is brought to you for free and open access by DigitalCommons@UTEP. It has been accepted for inclusion in Open Access Theses & Dissertations
by an authorized administrator of DigitalCommons@UTEP. For more information, please contact lweber@utep.edu.

UTEPcam: A SCALABLE WIRELESS VISION SENSOR ARCHITECTURE
FOR COMPUTATIONAL, POWER AND BANDWIDTH CONSTRAINED
SCENARIOS

Ricardo Zepeda
Department of Electrical and Computer Engineering

APPROVED:

John A. Moya, Ph.D., Chair

Virgilio Gonzalez, Ph.D.

Vivek Tandon, Ph.D.

Benjamin C. Flores, Ph.D.
Acting Dean of the Graduate School

Copyright ©

by
Ricardo Zepeda
2011

Dedication

To my family, who have encouraged, supported and believed in me since day one.

UTEPcam: A SCALABLE WIRELESS VISION SENSOR ARCHITECTURE
FOR COMPUTATIONAL, POWER AND BANDWIDTH CONSTRAINED
SCENARIOS

by

Ricardo Zepeda, BSEE

THESIS
Presented to the Faculty of the Graduate School of
The University of Texas at El Paso
in Partial Fulfillment
of the Requirements
for the Degree of

MASTER OF SCIENCE

Department of Electrical & Computer Engineering, ECE
THE UNIVERSITY OF TEXAS AT EL PASO
August 2011

Acknowledgements
First and foremost, I would like to thank God for giving me the wisdom that I needed to avoid
and overcome many of the obstacles that were put in my way. I would also like to thank Dr. Rosiles, who
both guided and supported me through this endeavor. Dr. Rosiles not only introduced me to this research
but also taught me everything I know about the subject. I would also like to thank Dr. Moya for
stepping in as chair to make the completion of this project possible. Without the help of both,
I know I would never have succeeded. I would also like to thank Linda Romero and Fernando Cervantes,
whose empowering yet soothing words encouraged me even during my bleakest moments. I would
also like to thank Dr. Nava and the UTEP Electrical and Computer Engineering department for giving
me the opportunity to pursue my master’s degree at UTEP. To the people whom I forgot to thank, I would
like to say thank you and I am sorry.

Abstract
UTEPcam is a low-cost, low-power vision sensor node system. UTEPcam is composed of an
Atmel ATmega32 8-bit microcontroller, a CY7C09099V static RAM chip, an OV6620 CMOS image
sensor, an XBEE transceiver, an SD Flash memory card, and four logic gates. UTEPcam’s simple yet
efficient architecture enables it to capture video at one frame per second. At its absolute highest, it is
estimated that UTEPcam consumes only 1.321 Amps. When in standby, UTEPcam consumes 21
microamps. Furthermore, UTEPcam’s program takes up only 1 Kbyte of memory space.
Unlike most vision sensor node architectures, UTEPcam uses a simple 8-bit MCU as its CPU. This
results in a small, low-power and long-service-life image sensor node. Furthermore, unlike other
image sensor node architectures, UTEPcam’s architecture allows its CPU to be swapped for either a
resource-rich or a computationally-constrained CPU. If the UTEPcam CPU is swapped for a resource-rich
CPU, then its features may be increased. Conversely, if the UTEPcam CPU is swapped for a
computationally-constrained CPU, then UTEPcam’s power consumption may be reduced even further.
In addition, the fact that UTEPcam’s program takes up only 1 Kbyte of memory space leaves plenty of
room for software improvements such as incorporating image processing into the vision system, thereby
converting UTEPcam into a smart camera. UTEPcam can transmit either pre-recorded video or
video in real time. Although UTEPcam is set up to be controlled via UTEPcamView, a
Graphical User Interface (GUI) designed for UTEPcam, it can also be controlled from any terminal,
including Windows’ default HyperTerminal. When UTEPcam is run from HyperTerminal, the incoming
image data can be saved to a regular text file.

Table of Contents

Acknowledgements
Abstract
Table of Contents
List of Figures
Chapter 1: Introduction
    1.1 Problem Description
    1.2 Thesis Motivation
    1.3 Research Questions
    1.4 Research Contribution
    1.5 Thesis Statement
Chapter 2: A Review on Embedded Vision Systems and Networks
    2.1 A description of digital image/video acquisition systems
    2.2 Wireless Sensor Networks
        2.2.1 CPU
        2.2.2 Power Source
        2.2.3 Transceiver
        2.2.4 Sensors
    2.3 Vision Sensors
    2.4 Smart Cameras - State of the Art
Chapter 3: UTEPcam: A Vision Sensor Architecture
    3.1 Architectural Requirements
        3.1.1 CMOS Imaging Sensor Resolution
        3.1.2 Communication protocols
    3.2 Overview of Image Acquisition Architecture
    3.3 A Low Power, Scalable and Wireless UTEPcam Architectural Implementation
        3.3.1 CMOS Imaging Sensor Subsystem
        3.3.2 RAM Subsystem
        3.3.3 FLASH Subsystem
        3.3.4 Communication Interface
    3.4 Functional description of the architecture
    3.5 Software architecture
    3.6 Graphical User Interface Design
Chapter 4: Evaluation of UTEPcam
    4.1 Image Acquisition system analysis
    4.2 Scalable Feature of UTEPcam
        4.2.1 Scalable Image Sensor
        4.2.2 Scalable CPU
        4.2.3 Scalable Source Code
    4.3 Clock correction
Chapter 5: Conclusion and future work
Bibliography
Vita

List of Figures
Figure 2.1.1: Computationally rich image acquisition architecture.
Figure 2.1.2: Frame-Grabber image acquisition architecture.
Figure 3.2.1: UTEPcam Image acquisition architecture.
Figure 3.2.2: Traditional Image Acquisition clock generator circuit.
Figure 3.2.3: UTEPcam clock generator circuit.
Figure 3.3: UTEPcam pin level diagram.
Figure 3.3.1: OV6620 Data Transmission Timing Diagram.
Figure 3.3.2: Simplified CY7C09099V Schematic.
Figure 3.5.1: UTEPcam Software Flowchart.
Figure 3.5.2: UTEPcam Block Diagram.
Figure 4.1.1: CIF Mode Image.
Figure 4.1.2: QCIF Mode Image.

Chapter 1: Introduction
1.1 Problem Description
There are many instances where a user desires to monitor a given area for a particular
amount of time. Whether on a battlefield where lives are at stake or along points of entry where
drug trafficking and illegal immigration are present, video may provide immense amounts of
valuable data that can aid the decision-making process. Although technologies with image
capturing capabilities exist, they cannot always be used directly to monitor a given area. Digital
cameras and digital camcorders, for example, can capture high-resolution images and video, but
they cannot transmit the captured data wirelessly. Cell phones, on the other hand, can both capture
high-fidelity images/video and transmit the data wirelessly; however, relative to their small
batteries, they may consume excessive amounts of power. Moreover, recharging or replacing
batteries on a battlefield or along points of entry poses safety risks and may expose the equipment.
There also exist specialized embedded systems called Wireless Sensor Networks (WSNs).
A WSN is a computer network designed specifically to monitor a given area for a given amount of
time. Unfortunately, WSNs equipped with image/video capturing capabilities are plagued by
the same problems faced by cell phones, digital cameras, and camcorders. Embedded systems
with video/image capturing capabilities generally require a CPU with a lot of processing power.
Unfortunately, there is a direct correlation between CPU processing power and power
consumption: the more processing power a CPU has, the more power it consumes. Because
WSNs are powered by batteries and not through the power grid, the power consumption, and
hence the processing power, of a WSN is very constrained. Not surprisingly, the service life of most
WSNs ends when the power source is depleted. These limitations have a direct impact on the
design of a WSN: every watt that can be saved by a WSN must be saved. Thus,
while computer vision can reveal immense amounts of information about the surrounding
environment of the WSN, incorporating an imaging sensor into a computationally constrained
WSN is not a trivial task. For the most part, imaging sensors have therefore been out of reach for
low-cost, low-processing-power, battery-operated WSNs.

1.2 Thesis Motivation
Talks with U.S. Customs and Border Protection suggested that they have a requirement
for image communications under resource-constrained scenarios. Further, the solution to these
problems must utilize low-cost, long-life devices that can be used in areas with difficult access
and non-existent communications infrastructure. This real-world scenario combined with the
technological advances mentioned below motivated the research reported in this thesis.
Low-cost hardware platforms and imaging sensors are available that allow a new
generation of video surveillance cameras capable of high-resolution, network-based image
communication [13]. At the frontier, the newest generation of cameras is also capable of doing
basic in-situ (i.e., embedded) computer vision. The driving applications for these cameras are in
the areas of surveillance and health care.
Additionally, very low-cost sensors are being used in cell phones and other portable
devices. The computing power of these devices allows for some level of image processing
and analysis within the device, yielding wirelessly accessible vision-based services like barcode
scanning and identification of geographical landmarks [8].
In parallel, the area of low-power, low-data-rate WSNs has matured with standards like
802.15.4 and ZigBee. One particular situation for WSNs is the case where power and
computational constraints are also part of the deployment scenario. Extensive work has been
done over the last decade on the communications aspect of WSNs [19]. A natural follow-on is to
consider the transmission of image data across such networks.

1.3 Research Question
A challenge faced in designing a vision sensor node with limited processing power is
how to interface the image sensor, which continuously outputs data, with a
CPU that cannot keep up with the data rate of the image sensor. This challenge
requires a new architecture. Based on this idea, the goal of this work was to
obtain a high-quality image at the speed of the image sensor and, if possible, to improve the
resolution of the image. The proposed architecture would ideally provide an efficient way of
capturing images at the right speed while consuming very little electrical power. The result
would be a high-quality monitoring sensor node.

1.4 Research Contribution
This thesis reports a wireless vision sensor node architecture for deployment in
resource-constrained environments. The architecture, UTEPcam, is demonstrated with a working
prototype as a proof of concept. UTEPcam is both a low-cost and low-power vision sensor node system and is
composed of an Atmel ATmega32 8-bit microcontroller, a CY7C09099V static RAM chip, an
OV6620 CMOS image sensor, an XBEE transceiver, an SD Flash memory card, and four logic
gates. UTEPcam’s operating program also takes up only 1 Kbyte of memory space. This
architecture is further capable of acquiring images at QCIF and CIF resolutions in either Bayer
or luminance format.

The simple yet efficient architecture of UTEPcam enables it to transmit pre-recorded
video or to capture and transmit video at one frame per second while using very little power. At its
absolute maximum, UTEPcam consumes 3.1 mA, and while in standby only 21 μA. Four
AA batteries are estimated to provide about 4000 mAh for the UTEPcam. This results in a
small, low-power and long-service-life image sensor node.
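
As a rough illustration of what these figures imply for service life, the following sketch divides the quoted battery capacity by the quoted current draw. It is an idealized estimate only: it assumes a constant load and ignores battery self-discharge, voltage regulation losses, and duty cycling between active and standby modes. The function name is hypothetical.

```python
def battery_life_hours(capacity_mah: float, current_ma: float) -> float:
    """Ideal runtime in hours for a constant load: capacity (mAh) / current (mA)."""
    if current_ma <= 0:
        raise ValueError("current must be positive")
    return capacity_mah / current_ma


CAPACITY_MAH = 4000.0  # capacity quoted in the text for four AA batteries
ACTIVE_MA = 3.1        # quoted absolute-maximum current draw
STANDBY_MA = 21e-3     # quoted standby draw of 21 microamps, expressed in mA

active_h = battery_life_hours(CAPACITY_MAH, ACTIVE_MA)
standby_h = battery_life_hours(CAPACITY_MAH, STANDBY_MA)

print(f"continuously active: {active_h:.0f} h ({active_h / 24:.1f} days)")
print(f"standby only:        {standby_h:.0f} h ({standby_h / 24 / 365:.1f} years)")
```

Under these assumptions, continuous operation at the maximum draw lasts on the order of weeks, while standby operation is limited in practice by the shelf life of the batteries rather than by the load.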
UTEPcam can be controlled via UTEPcamView, a Graphical User Interface (GUI)
designed for UTEPcam, or it can be controlled from a terminal, including Windows’ default
HyperTerminal. When UTEPcam is run from HyperTerminal, the incoming image data can be saved
to a regular text file.
UTEPcam has a flexible architecture that allows its CPU to be swapped for either a
resource-rich or a computationally-constrained CPU. If UTEPcam’s CPU is swapped for a
resource-rich CPU, then its features may be increased. Conversely, if UTEPcam’s CPU is
swapped for a computationally-constrained CPU, then UTEPcam’s power consumption may be
reduced even further. In addition, UTEPcam’s small program memory footprint leaves plenty of
room for software improvements such as incorporating image processing into the vision system,
thereby converting UTEPcam into a smart camera.

1.5 Thesis Statement
This thesis presents a solution to image communications and sensing in
resource-constrained environments. The solution consists of a vision sensor architecture capable of
acquiring images at different resolutions and time frames. Further, this architecture may process,
store and wirelessly transmit imagery under power and communication constraints.

The proposed architecture includes a low-power 8-bit microcontroller unit (MCU).
Further, using a 128 Kbyte RAM, it is possible to acquire video at one frame per second,
and these images can be stored in flash memory for deferred transmission and further processing.
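
The role of the 128 Kbyte RAM as a frame buffer can be checked with simple arithmetic. The sketch below assumes the standard CIF (352 x 288) and QCIF (176 x 144) dimensions and one 8-bit sample per pixel; the bytes-per-pixel value and the helper name are assumptions made for illustration, not figures taken from this thesis.

```python
RAM_BYTES = 128 * 1024   # 128 Kbyte RAM quoted in the text
BYTES_PER_PIXEL = 1      # assumption: one 8-bit sample per pixel

# Standard CIF and QCIF frame dimensions (width, height) in pixels.
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288)}


def frame_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Storage needed for one uncompressed frame."""
    return width * height * bytes_per_pixel


for name, (width, height) in RESOLUTIONS.items():
    size = frame_bytes(width, height)
    whole_frames = RAM_BYTES // size
    print(f"{name}: {size} bytes per frame, {whole_frames} whole frame(s) fit in RAM")
```

Under these assumptions a single CIF frame occupies most of the RAM, while several QCIF frames fit at once.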
The thesis presents an architecture that can be realistically implemented and
manufactured. A major focus of the thesis has been on the development of the hardware
architecture, with a basic software architecture that illustrates the main functionalities. Future
work can focus on developing a software model that allows in-situ compression,
low-level image analysis, and computer vision.

Chapter 2: A Review of Embedded Vision Systems and Networks
2.1 A description of digital image/video acquisition systems
Embedded systems with image acquisition capabilities use one of two architectures:
computationally-rich or computationally-constrained (frame-grabber) type image acquisition
approaches [13]. As can be seen in Figure 2.1.1, in a computationally-rich type image
acquisition architecture the CPU is both directly connected to the image sensor and handles the
task of latching an image/frame from the image sensor. In this thesis, a resource-rich system is
considered to be one that uses a 32-bit or 64-bit CPU, and a computationally-constrained system
to be one that uses an 8-bit or 16-bit CPU.

Figure 2.1.1: Computationally rich image acquisition architecture.
As can be seen from Figure 2.1.2, the frame-grabber type image acquisition architecture
consists of an image sensor, a frame grabber, RAM, and a CPU. When a frame-grabber type image
acquisition architecture acquires a frame/image, the image/frame is latched by the frame grabber
instead of the CPU. This permits the use of a computationally-constrained CPU instead of a
resource-rich CPU. However, this type of architecture may use additional power, since it
includes an additional IC (the frame grabber). The fact that a frame grabber is a complicated
circuit further increases power consumption. A frame grabber typically consists of a signal
conditioner, an analog-to-digital converter, memory, and an NTSC/SECAM/PAL decoder.

Figure 2.1.2: Frame-Grabber type image acquisition architecture.
Several other important differences exist between the above two architectures. Firstly,
resource-rich CPUs typically do not require external RAM while computationally-constrained
CPUs do. Computationally-constrained CPUs typically have only small quantities of RAM,
normally in the kilobyte range, while resource-rich CPUs have abundant quantities of
RAM, normally in the megabyte range. Since resource-rich CPUs are equipped
with enough RAM to hold a single frame, there is no need for external RAM.
Although many CPUs are equipped with a lot of FLASH memory, this has no bearing on whether
external RAM is needed. Internal FLASH memory stores static data, such as functions and
programs. External FLASH memory, unlike internal FLASH memory, can hold both program code
and frame/image data. Secondly, CMOS image sensors output data faster than most
computationally-constrained CPUs can handle. A common solution to this architectural dilemma is
to send a command to the image sensor instructing it to scale down its internal clock. However,
scaling down the clock not only decreases the image sensor frame rate but also increases the
probability of image smear.

2.2 Wireless Sensor Networks
Wireless Sensor Networks (WSNs) are an emerging research area which is gaining
considerable attention from both the media and academia. Typically, each WSN node, also called
a sensor node, is made up of a CPU, a power source, a transceiver and a variety of sensors [7].
Further, a WSN is a wireless computer network whose nodes are battery-operated,
spatially-distributed, and wirelessly-interconnected. The sole purpose of a WSN is usually to monitor the
environment in which it operates. Temperature, sound, vibration, pressure, motion, and
pollution are just a few of the environmental data that can be gathered by a WSN.
There is a direct correlation between CPU processing power and CPU power
consumption: the more processing power a CPU has, the more power it consumes. Because
WSNs are powered by batteries and not through the power grid, the power consumption, and
hence the processing power, of a WSN is very constrained. Furthermore, unlike traditional computer
networks, the service life of a WSN ends when the power source is depleted [11]. These
limitations have a direct impact on the design of a WSN.
Largely because of power constraints, one extremely useful sensor is missing
from most WSNs: the image sensor. The advantages of incorporating image sensors into a WSN
are immediately apparent, e.g., distinguishing between animals and humans, friend or foe, male or
female, etc. However, imaging sensors for the most part have been designed to mate with
resource-rich CPU hosts, although advancements in CMOS image sensor technology are making
it possible to incorporate them into low-cost, low-processing-power, battery-operated WSNs.
2.2.1 CPU
To conserve power, typical WSN sensor nodes have single-core CPUs. The CPUs
typically found in WSNs are 8-bit, 16-bit, and even 32-bit. Eight-bit CPUs have lower
power consumption than their 32-bit counterparts; the trade-off is that they have
far less processing power. Conversely, 32-bit CPUs have higher processing power than
their 8-bit counterparts, but their power consumption is substantially higher. As a result,
WSNs equipped with resource-rich CPUs have better processing power but shorter service
lives.
2.2.2 Power source
Although WSNs can be powered by any number of sources, they are most often powered
by batteries for the simple reason that batteries are the most reliable source of electrical power.
With solar cells, which are devices that convert sunlight directly into electricity, a WSN
can be powered by sunlight. However, sunlight is not always available. Nights, cloudy days
and even solar eclipses render solar-powered WSNs powerless. WSNs can also be powered by
tapping into the power grid; however, the whole concept of a WSN is that it specifically not be
powered through the power grid.
2.2.3 Transceiver
The transceiver is what allows each sensor node in the WSN to communicate with other
sensor nodes and the outside world. At the time of the publication of this thesis, there were three types
of transceivers, each with its advantages and disadvantages. For example, a laser-type
transceiver requires the least amount of power to operate but also needs a direct line of sight
to the transceiver with which it communicates. Infrared transceivers are similar and also
do not need an antenna, but they are limited in the frequencies on which they can broadcast.
Radio-frequency type transceivers offer a broader beam width than infrared and laser type
transceivers, but their power consumption is the greatest. Given the benefits and drawbacks of
each, radio-frequency type transceivers appear to offer the most suitable type of communication
in a WSN, and they are the most commonly used WSN transceivers. Because the transceiver is
the device that consumes the most power in a sensor node, transmitting is always kept
to a minimum.
2.2.4 Sensors
The most common types of sensors found in WSNs are low-power sensors which, as
their name implies, consume very little power. Temperature, pressure, sound, light, humidity and
ultrasonic sensors are just a few examples of low-power sensors that can be found on sensor
nodes. There are two types of low-power sensors: analog and digital. Analog sensors have a
short learning curve but are more prone to noise. Conversely, digital sensors have a long
learning curve but are essentially immune to noise. Typically, both analog and digital sensors
run on 3.3 or 5 volts.
Although low-power sensors provide useful data, image sensors, which are high-power
sensors, provide some of the most useful data. A WSN which has incorporated imaging sensors
is referred to as a wireless multimedia network (WMN). A WMN can incorporate image sensors
into one, some or all of its sensor nodes. Unfortunately, incorporating an imaging sensor into a WSN
is not a trivial task, given that image sensors consume a lot of power and are designed to interface
with resource-rich CPUs.

2.3 Vision Sensors
A few vision sensor node architectures have been reported in the literature. Three that
have been thoroughly described and evaluated in research and commercial settings are
reviewed here. The AVRcam [3], Cyclops [12], and Meerkats [9] are all vision sensor nodes that share
architectural components: a CPU, RAM and/or flash memory, and a transceiver. The idea behind these
vision sensor nodes is simple: to acquire and transmit images and/or video. Some of these vision
sensor node architectures are equipped with computationally-constrained CPUs while others
possess resource-rich CPUs. In addition, some of these architectures are equipped with external
RAM and flash memories. Furthermore, some of these image sensor node architectures support
only wired, and not wireless, communication, although wireless communication could be
added externally.
The AVRcam is equipped with an ATmega8 8-bit CPU, which makes it the architecture
with the biggest computational constraint. The AVRcam uses neither the computationally-rich
nor the frame-grabber type image acquisition architecture, since no large external RAM or flash
memory is available. It is unique in that it does not store any image/frame but instead transmits
image data as it is acquired. That is, when it has stored an entire image row in its internal one
Kbyte RAM, it transmits the row through a USART to a host for further processing or storage. Further,
after it transmits, the AVRcam’s software waits for the next transmission frame in order to latch
the next image line. The image itself is acquired with an OV6620 CMOS camera configured at
144 x 176 pixel resolution.
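
The row-at-a-time mode of operation described above can be sketched as a small host-side simulation. This is not the AVRcam firmware: the sensor read and the USART are replaced by hypothetical stand-in functions, and the sketch assumes 176 one-byte pixels per row and 144 rows per frame.

```python
ROW_BUFFER_BYTES = 1024   # internal RAM size quoted in the text
WIDTH, HEIGHT = 176, 144  # assumption: 176 one-byte pixels per row, 144 rows


def stream_rows(read_row, send, height=HEIGHT):
    """Latch one row at a time and send it immediately; no full frame is ever stored."""
    for y in range(height):
        row = read_row(y)                  # latch a single row from the sensor
        if len(row) > ROW_BUFFER_BYTES:
            raise MemoryError("row does not fit in the row buffer")
        send(bytes(row))                   # transmit the row before latching the next


def fake_sensor_row(y):
    """Hypothetical stand-in for the image sensor: a simple gradient pattern."""
    return bytes((x + y) % 256 for x in range(WIDTH))


received = []                              # hypothetical stand-in for the host side
stream_rows(fake_sensor_row, received.append)
frame = b"".join(received)                 # the host reassembles the frame
print(len(received), "rows received,", len(frame), "bytes in total")
```

Only one row is held by the sender at any time, which is why a one Kbyte buffer suffices, but each row is latched at a different moment, which is the source of the motion artifacts associated with this approach.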
Since the AVRcam is not equipped with a large RAM, it generates a large processing
delay in acquiring each row of pixels. This mode of operation can introduce severe motion
artifacts if the objects in the scene move faster than the sensor frame rate. However, on the
positive side, out of all the vision sensor nodes discussed here, the AVRcam consumes the least
amount of power in part because it does not have an external FLASH or RAM type memory.
The Cyclops [12] is a vision sensor node equipped with an ATmega128L 8-bit core, which
also constrains the computational capabilities of this architecture. However, it is equipped with
both internal and external memory, which it uses for a frame-grabber type image acquisition
architecture. The external frame memory includes 64 Kbytes of flash and 64 Kbytes of RAM.
Image sensing is performed via an ADCM-1700 (Agilent Technologies) with a 352 x 288 pixel
resolution. For image transmission, the Cyclops is built to be interfaced with the MICA
2 motes [12] commonly used for wireless sensor research.
In a frame-grabber type image acquisition architecture the image/frame is latched by a
frame grabber instead of the CPU. Although Cyclops does not have a frame grabber IC, it does
have a complex programmable logic device (CPLD) which is programmed to behave as one.
The images output by the Cyclops image sensor are latched by the
CPLD and transferred to the external RAM chip. From there, the image stored in the external
RAM can be copied to an SD FLASH card for permanent storage or
sent out via a USART.
The Meerkats vision sensor node [9] is equipped with a PXA255 32-bit MCU
manufactured by Intel and Marvell. Its image sensor is a Logitech QuickCam Pro 4000 webcam,
which has a 640 x 480 pixel resolution. Although Meerkats has no external RAM,
it does have 64 Mbytes of internal SDRAM. Additionally, it is equipped with
32 Mbytes of internal flash memory. Meerkats is also equipped with an Orinoco Gold
802.11b PCMCIA wireless card to support wireless communications.
The Meerkats architecture is built on top of a Crossbow Stargate development board
equipped with an Ethernet connection. With this computationally-rich type image acquisition
architecture, the CPU is both directly connected to the image sensor and handles the task of
latching an image/frame from the image sensor. Also, unlike the Cyclops and AVRcam,
Meerkats has a full operating system to manage all its hardware resources. For this task,
Meerkats uses the Stargate version 7.3 operating system, which is an embedded Linux system
(kernel 2.4.19). Further, the Meerkats program code is actually a Linux
application. The webcam used by Meerkats as its image sensor, although efficient, consumes
more power than a regular image sensor simply because it has an embedded MCU. This is also
the case for the Orinoco wireless card, which was originally designed to mate with laptop computers
and not vision sensor nodes.

2.4 Smart Cameras – State of the Art
A recent survey article [4] has defined the taxonomy for smart cameras. Such cameras
are vision systems capable of analyzing the images they capture in order to provide a description
and features that can be used by intelligent and autonomous systems as part of decision making
processes. As such, smart cameras are typically equipped with a digital signal processor (DSP)
and a graphics processing unit (GPU). They also generally have charge-coupled device (CCD)
image sensors which have a much higher pixel resolution than CMOS type image sensors.
Furthermore, most if not all smart cameras communicate through TCP/IP-based networks and
use the internet as a transmission medium. An Ethernet connection can provide transfer speeds
that exceed 10 megabits per second, which is a much greater bandwidth than that of a vision
sensor network and makes it possible to transmit higher resolution imagery. With all of these
features, smart cameras must typically be powered through the power grid.
Unlike traditional cameras, smart cameras perform image processing before transmitting
the image/video. Segmentation, edge detection and image compression are just a few of the
possible processing steps that can be done on an image with a smart camera. Sometimes the
output of a smart camera is not an image at all.
Smart cameras share architectural similarities with vision sensor nodes. Both have a
CPU, image sensor, and transceiver. On the other hand, smart cameras typically include a 32-bit
CPU and are also equipped with second and third processing chips, i.e., a DSP as well as a GPU.
DSP chips can process image/video data faster and more efficiently than their 32-bit
CPU counterparts. These faster processors are needed to perform real-time image processing.
Smart cameras are typically equipped with frame grabbers. A frame grabber is a
hardware device which is designed to interface directly to either a CMOS or a CCD image sensor.
Frame grabbers capture one frame from a video stream which is being output by an image
sensor and store it in fast RAM memory. Frame grabbers essentially relieve the CPU,
interrupting it only when a frame has been grabbed from the image sensor video stream.

Chapter 3: UTEPcam: A vision sensor architecture
This chapter introduces the UTEPcam architecture. The requirement revisions, design
decisions and selection of components necessary to arrive at the current architecture description
occurred over many months and several design iterations, and required consideration of many
design tradeoffs. The final result is presented in a simplified and clear manner. First, the final
list of requirements is described. Second, a system level description of the architecture is
provided. Third, a detailed physical specification of the hardware architecture embodied in a
realization with real components is presented. Fourth, the software architecture of the vision
sensor is discussed. Finally, a GUI that was used to interface the sensor with a PC environment
is presented.

3.1 Architecture requirements
The UTEPcam design was inspired by the simple design of the AVRCam
described in Section 2.3. As mentioned before, the AVRCam image sensor is controlled by an
8-bit microcontroller which in turn transmits an image to a host using the UART interface. More
specifically, the AVRCam acquires and transmits one row of pixels at a time from the MCU
internal RAM. The row is transmitted immediately through the UART which implies that the
overall acquisition speed is determined by the UART baud rate. Hence, a single image is actually
formed by rows acquired across many frames. The MCU controls this rather complex acquisition
process by tracking the image sensor synchronization signals.
The objective in this thesis was to maintain some of the simple features of the AVRCam
which consisted of using a low-end MCU to control the acquisition process while achieving full
video/image acquisition and storage before transmission. Hence, more advanced features found
in vision sensor nodes and smart cameras discussed in Sections 2.3 and 2.4 need to be
incorporated. The challenge of this thesis is to specify an architecture that achieves performance
similar to complex architectures while maintaining the simplest hardware configuration that will
minimize power consumption.
As was discussed in Chapter 2, vision sensor nodes and smart cameras share a common
purpose; they do, however, differ in architecture. The architecture of a vision sensor node
must operate on limited power, while the smart camera architecture assumes the availability
of unlimited power. This difference is the main reason why the image quality of a smart camera
is better than that of a vision sensor node. For example, because of the unlimited power supplied
to smart cameras, they can afford MCUs with higher processing power as well as high resolution
video. Vision sensor nodes, however, have a limited power budget that requires a CPU with less
processing power and consequently a reduction in resolution.
Hence, the challenge for a vision sensor node with limited processing power is to achieve
intercommunication between the image sensor, which is continuously receiving information, and
a low-end CPU which cannot keep up with the data speed required by the image sensor. This
challenge opens the door to explore new architectural approaches to vision sensor node design.
The proposed architecture will provide an efficient way of capturing images at the right speed
that consumes very little electrical power.
In order to design a low-power vision sensor that can be deployed in constrained
environments, several requirements and constraints must be met. These requirements are
described in detail in the following discussion.
3.1.1 CMOS sensor resolution
Due to power consumption requirements, CMOS sensor resolution must be chosen to not
be very high, but not so low as to generate low quality images. The adequate resolution was
chosen to be either QCIF 176 x 144 or CIF 352 x 288 pixels. By default, the CMOS resolution is
in QCIF mode, but if the user sees something of interest and wants to know more about it,
the user may choose the CIF mode to explore the details. Since the CIF mode contains more
pixels than the QCIF mode, more memory has to be allocated in the RAM chip in the CIF
mode than in the QCIF mode. Since each pixel has eight bits, the number of address locations in
the CIF mode is 101,376, while in the QCIF mode there are only 25,344 address locations. As a
consequence, the CIF mode consumes more power than the QCIF mode. In CIF mode, the
resolution is high but it consumes four times more power than in QCIF mode,
while in QCIF mode the power consumption is lower but the image quality is poorer. Moreover,
because the CIF mode contains four times as many pixels as the QCIF mode, the frame rate in
CIF mode (1/2 frame/sec) is four times slower than the frame rate in QCIF mode (2 frames/sec).
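The address counts quoted above follow directly from the frame dimensions. The short Python sketch below reproduces that arithmetic for illustration only; the names `RESOLUTIONS`, `RAM_BYTES` and `frame_bytes` are hypothetical and are not part of the UTEPcam software. It assumes one byte (eight bits) per pixel, as stated in the text, and the 128 Kbyte RAM chip selected in Section 3.2.

```python
# Frame dimensions (width, height) in pixels for the two supported modes.
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288)}

# Capacity of the 128 Kbyte RAM chip selected for UTEPcam.
RAM_BYTES = 128 * 1024


def frame_bytes(mode, bytes_per_pixel=1):
    """Return the number of RAM address locations one frame occupies."""
    width, height = RESOLUTIONS[mode]
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    for mode in RESOLUTIONS:
        size = frame_bytes(mode)
        print(f"{mode}: {size} bytes, fits in RAM: {size <= RAM_BYTES}")
```

A CIF frame holds exactly four times as many pixels as a QCIF frame, and a single frame of either size fits within the 128 Kbyte RAM chip.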

The inclusion of a RAM device is crucial to achieve real-time image/video capture as has
been discussed before. Additionally, since video requires several frames over a certain period of
time, and because the RAM chip only grabs one image at a time, it is necessary to store the
image in a flash memory first before transmitting.
3.1.2 Communication protocols
The proposed architecture requires the use of several communication protocols. For example, the
communication between the computer and the microcontroller uses a USART communication
protocol. Via this link the computer sends commands to the microcontroller, for instance to take a
picture, record a video, or change the image resolution of the image sensor. The microcontroller
then takes action depending on the command given. If the command is to change the image
resolution, either to the CIF or QCIF mode, the microcontroller enables the CMOS image sensor
and adjusts its image resolution accordingly through the I2C serial communication protocol. If
the command is to capture an image or video, then the microcontroller enables the CMOS image
sensor through the I2C communication protocol, the RAM chip through a parallel
communication protocol, and the flash memory through SPI communication protocol. The
requirements set for each device are due to the hardware and software constraints of the device. For
example, the CMOS image sensor can communicate only through I2C protocol, the flash only
through SPI protocol, and the RAM chip only through a parallel communication interface
protocol, while the microcontroller can communicate in a variety of ways to different devices.
The communication between the computer and the microcontroller, discussed above, was
for testing purposes only. Since the sensor node is intended to communicate with the
computer over long distances, an XBee wireless link will eventually be
used. Zigbee is the wireless transmission protocol which the XBee hardware device utilizes [16,
17]. The Zigbee protocol allows transmission at a minimum of 4800 and a maximum of 115200
bits per second. Of course, when transmitting at the maximum baud rate, the power consumption
of the XBee hardware device increases. If conservation of power is desired, then the baud
rate should be decreased. During image acquisition, the baud rate in a vision sensor node should
be set to the maximum possible baud rate. At any other time, the baud rate should be set to
the minimum possible value to decrease power consumption.
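To give a feel for what the two baud rate limits mean for image transfer, the following sketch estimates how long one frame takes to send over an asynchronous serial link. It is an approximation only. The function name `transfer_seconds` is hypothetical, and the figure of ten bits on the wire per data byte assumes 8N1 framing (one start bit, eight data bits, one stop bit), which is an assumption rather than a setting taken from this thesis. Radio protocol overhead and retransmissions are ignored.

```python
def transfer_seconds(num_bytes, baud_rate, bits_per_byte=10):
    """Estimate the time needed to send num_bytes over a serial link.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit);
    this framing is an assumption of the sketch.
    """
    return num_bytes * bits_per_byte / baud_rate


if __name__ == "__main__":
    qcif_bytes = 176 * 144  # 25,344 bytes per QCIF frame
    cif_bytes = 352 * 288   # 101,376 bytes per CIF frame
    for baud in (4800, 115200):
        print(baud,
              transfer_seconds(qcif_bytes, baud),
              transfer_seconds(cif_bytes, baud))
```

Under these assumptions a QCIF frame takes about 2.2 seconds at 115200 bits per second and about 52.8 seconds at 4800 bits per second, which illustrates why the maximum rate is preferred whenever image data is being moved.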

3.2 Overview of the Image Acquisition Architecture
The image acquisition architecture for the UTEPcam is shown in Figure 3.2.1. Here we
focus on the acquisition system consisting of the CMOS image sensor, an MCU and a RAM
chip. The non-volatile storage and communication systems are discussed in the following
section. The design is general and can accommodate any level of complexity. For the UTEPcam,
an 8-bit microcontroller, a CIF/QCIF image sensor and a 128 Kbyte RAM chip were selected.
The key component of the acquisition architecture is a RAM device equipped with an internal
counter for address generation. This feature greatly reduces the complexity of the frame capture
process and allows a direct connection between the CMOS image sensor and the RAM chip
such that the frame rate of the CMOS image sensor is not affected. The details of the RAM
interface are discussed in the next section.
The proposed architecture can be contrasted with those described in Section 2.3. First, the
UTEPcam does not require a computationally rich MCU to control the image capture. Unlike
computationally-rich (and power hungry) image acquisition architectures, when the UTEPcam
acquires an image, the (low power) MCU is bypassed completely and the image is fed directly
to the RAM chip. Second, the UTEPcam acquires images without the need of a fourth device
(e.g. a CPLD) as in the frame grabber architecture of Cyclops (see Section 2.3). Hence it can be
argued that the overall cost and power consumption of the UTEPcam will be lower due to the
initial selection of components in the architecture.
RAM chips can be clocked at much higher speeds than CMOS image sensors. As a result,
with the UTEPcam image acquisition architecture there is no need to reduce the image sensor
clock, replace the CPU, or add an additional circuit. Furthermore, the proposed image acquisition
architecture interrupts the CPU only when a complete video frame has been grabbed and
transferred to RAM memory. This feature improves the efficiency of the vision sensor node
since the CPU is active only to take care of post-acquisition tasks, and similarly, the CMOS
sensor and the RAM are only activated at capture time.

Figure 3.2.1 UTEPcam image acquisition architecture

As can be seen from Figure 3.2.2, traditional image sensor nodes utilize one fast
clock-generator to generate the clock for the entire image sensor node system. A single fast
clock-generator is used in order to:
a) Increase the number of frames per second the image sensor can output and
b) Decrease the possibility of images being smeared during acquisition.
Unfortunately, the faster a CPU is clocked, the more power it consumes. However, a fast
clock is really necessary only during image acquisition. More significantly, most of
the time image sensor nodes are not acquiring an image. Techniques to reduce power
consumption in traditional image sensor nodes include powering down the image sensor and
RAM chip when an image or video is not being acquired. However, the mere fact that a
clock-generator is generating a clock signal increases the power consumption of the system.
Furthermore, powering down both the image sensor and the RAM chip does not change the fact
that the CPU is constantly and unnecessarily being clocked at high speed.

Figure 3.2.2 Traditional image acquisition clock generator circuit
In order to reduce these power inefficiencies, UTEPcam’s architecture utilizes two
clock-generators, one fast and one slow, as shown in Figure 3.2.3. The slow clock-generator
provides the clock signal to the MCU. The fast clock-generator clocks both the CMOS image
sensor and the RAM chip. By providing different frequencies to the image sensor, MCU and
RAM chip, it is possible to simultaneously maintain a higher frame rate, have low MCU power
consumption, and decrease the probability of image smearing. When the UTEPcam is not
acquiring, or in the process of acquiring an image, the fast clock-generator, the RAM chip and
CMOS image sensor are disabled. In addition the MCU always uses a slow clock signal, keeping
power consumption to a minimum.

Figure 3.2.3 UTEPcam clock generator circuit
3.3 A Low Power, Scalable and Wireless UTEPcam Architectural Implementation
This section describes an implementation of UTEPcam. UTEPcam’s engine is composed
of five modules: a Cypress CY7C09099V Synchronous Dual-Port Static RAM chip, an
Omnivision OV6620 CMOS imaging sensor, an Atmega32 8-bit AVR microcontroller, a
removable flash drive and an XBee wireless transceiver. A brief review of the OV6620 CMOS
imaging sensor, CY7C09099V Synchronous Dual-Port Static RAM chip, Atmega32 8-bit AVR
microcontroller, SanDisk flash drive and XBee wireless transceiver is given below.
Although most of the information about each chip can be acquired from the datasheet of the
respective device, a brief review of the relevant points of
each device will simplify the understanding of UTEPcam’s architecture. The pin-level
diagram of UTEPcam’s architecture is shown in Figure 3.3.

Figure 3.3 UTEPcam pin level diagram.
3.3.1 CMOS Imaging Sensor subsystem

The CMOS image sensor converts an optical image into a continuous electrical signal
which is then sampled and converted into its digital equivalent. Although there are different
brands of CMOS image sensors on the market today, almost all of them function the same way
[10, 14]. Almost all CMOS image sensors have a Pixel Data bus and PCLK, HREF, and
VSYNC pins. The PCLK, HREF and VSYNC pins are considered synchronization pins; they are
used to ensure there is constant synchronization between the image sensor and the host device.
Furthermore, almost all CMOS image sensors can be configured through some type of serial
communication protocol. The UTEPcam’s image sensor is the Omnivision OV6620 CMOS
image sensor [10].
The Pixel Data pins provide the sampled light intensity values at a particular point in
time. The Pixel Data pins on the OV6620 CMOS imaging sensor are made up of sixteen pins:
eight UV pins and eight Y pins. The eight UV and eight Y pins are collectively called UV and Y
channels, respectively. Depending on how the OV6620 CMOS sensor is configured, the Y and
UV channels can output the luma and the chroma component (YCrCb) or the green and the
red/green pixel values (Bayer mode) of an image [18]. A more detailed explanation will be
presented later.
The VSYNC pin signals that a new image is about to be
output by the OV6620 CMOS imaging sensor through the Pixel Data pins. Of all the
synchronization pins (PCLK, HREF and VSYNC), VSYNC toggles the least. Each
time a new image frame is about to be output by the image sensor, the VSYNC pin pulses:
it goes from low to high and then back to low. The VSYNC pin remains high for approximately
eight OV6620 CMOS image sensor clock cycles when in default mode.
The HREF pin signals that an image line is about to be
transmitted through the Pixel Data pins. When the OV6620 CMOS imaging sensor starts
transmitting a line of pixels through the Pixel Data pins, the HREF pin transitions from a low to
a high state. Once that particular line of pixels has finished transmitting, the HREF pin
transitions back to a low state. After the CMOS image sensor sends a complete line of pixels
through the Pixel Data pins, the HREF pin pauses and remains low, for the same amount of time
that it took to transmit the line of pixels. This pause is needed so that the image sensor can clear
its previous data and acquire the new data. If the image sensor is programmed in common
intermediate format (CIF) mode, the HREF pin toggles 288 times per image frame. However, if
the image sensor is programmed in quarter common intermediate format (QCIF) mode, the
HREF pin toggles 144 times.
Whenever the PCLK pin transitions from a low to a high state, it signifies that there is
valid pixel data on the Pixel Data pins; if the Pixel Data pins are sampled at any other moment in
time, the data is not guaranteed to be valid. If the OV6620 image sensor is programmed in CIF
mode, then the PCLK pin toggles 101,376 times per image frame; however, if the sensor is
programmed in QCIF mode, then the PCLK pin toggles 25,344 times.
Figure 3.3.1 shows a simplified timing diagram of the chain of events just described when the
sensor is set to QCIF mode. If this mode is changed to CIF mode, then the diagram is still
applicable, except that HREF will toggle 288 times instead of 144 times per frame and PCLK will
toggle 352 times instead of 176 times per line.

Figure 3.3.1 OV6620 Data Transmission Timing Diagram
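The relationship between the three synchronization signals can also be sketched in a few lines of Python. The model below is idealized and for illustration only: it assumes one VSYNC pulse per frame, one HREF pulse per line and one PCLK rising edge per pixel (that is, PCLK gated by HREF), and the names `frame_edges` and `count_edges` are hypothetical.

```python
from collections import Counter


def frame_edges(width, height):
    """Yield one label per rising edge seen during a single frame.

    Idealized model: one VSYNC pulse per frame, one HREF pulse per line,
    and one PCLK rising edge per pixel (PCLK gated by HREF).
    """
    yield "VSYNC"
    for _line in range(height):
        yield "HREF"
        for _pixel in range(width):
            yield "PCLK"


def count_edges(width, height):
    """Count the rising edges of each synchronization signal in one frame."""
    return Counter(frame_edges(width, height))


if __name__ == "__main__":
    print("QCIF:", dict(count_edges(176, 144)))
    print("CIF: ", dict(count_edges(352, 288)))
```

Counting the edges in this way reproduces the per-frame figures given in the text: 144 HREF pulses and 25,344 PCLK edges in QCIF mode, and 288 HREF pulses and 101,376 PCLK edges in CIF mode.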

Most if not all CMOS image sensors use a serial communication protocol to configure
their camera settings [10]. I2C, TWI, USART, UART and SPI are five popular serial
communication protocols. The communication protocol used by the UTEPcam image sensor is
TWI, Atmel’s name for its I2C-compatible two-wire interface. Via this protocol, UTEPcam
configures its image sensor resolution (the default CIF 352 x 288 or QCIF 176 x 144), gating of
PCLK with HREF, video output (default YCrCb or Bayer format), etc.
3.3.2 RAM subsystem
The CY7C09099V is a synchronous dual-port static RAM chip manufactured by Cypress
Semiconductor Corporation [3]. It supplies 128K x 8 memory and is available in a 100-pin Thin
Quad Plastic Flatpack (TQFP) package. The chip’s dual porting allows each port to independently
and simultaneously read from or write to any location in memory. A key feature of the
CY7C09099V is an integrated binary counter. This feature is not typically found in RAM
devices. As will be explained later, this feature was one of the main reasons behind the success
of UTEPcam.
The simplified schematic diagram of the CY7C09099V RAM is shown in Figure 3.3.2. Each
port on the chip has Address Strobe, Chip Enable, Clock, Counter Enable, Counter Reset,
Output Enable, Read/Write Enable, and Flow-Through/Pipelined Select pins. Furthermore, each
CY7C09099V port has a seventeen-pin Address input and an eight-pin Data Input/Output bus. The
Chip Enable pins from both the left and right port are each composed of two pins; CE0 and CE1.
To enable a desired port on a CY7C09099V RAM chip, CE0 and CE1 should be set logically
high and logically low, respectively. When enabled, the respective port will consume
approximately 115 milliamperes of current. If CE0 and CE1 have a configuration other than the
one just stated, the respective port is put into standby mode, which will cause the RAM chip to
consume less power. As described later, the UTEPcam takes this feature into account in order to
control power consumption. As an additional note, constructing each port’s Chip Enable out of
two pins instead of one makes it possible to cascade multiple RAM chips to create, if
needed, a bigger RAM. This feature allows future expansion of the architecture.

Figure 3.3.2 Simplified CY7C09099V schematic
Each CY7C09099V RAM port has a Read/Write pin which determines whether the
respective port will be configured to read or write data; if Read/Write is set high, then the
selected port will be configured to read data, else the respective port will be configured to write
data. Because the CY7C09099V is a dual-port RAM chip, its two ports access
the same memory array and hence share the same memory address space. One CY7C09099V port can
be configured to read what the other port is writing by setting one port’s Read/Write pin high and
the other one low. The only precaution that must be taken when exercising this feature is to
ensure not to read the same memory location that is being written; if this is done, there is no
guarantee that the data written to or the data read from the RAM chip will be correct. When one
port is configured to read what the other port is writing, the former port can be clocked at a much
slower frequency. One of the main reasons behind the success of the UTEPcam is the ability to
read, at a slower frequency, what is being written to another faster clocked port.
Each CY7C09099V RAM chip port has eight bidirectional Data Input/Output pins.
Output Enable determines whether these pins will be configured as an output or an input bus. If
Output Enable is set low, then the eight-pin Data Input/Output bus will provide data; otherwise
the bus is expecting to receive data. Additionally, the Read/Write pin
and the Output Enable pin must be configured together for proper operation. If the
Read/Write pin is configured for reading, then the Output Enable pin should be set low; if
the Read/Write pin is configured for writing, then the Output Enable pin should be set high.
The Output Enable pin is the only pin on the CY7C09099V RAM chip
that is not synchronous to the rising edge of the clock. Thus, great care must be taken when
configuring the Output Enable pin. If the Output Enable pin is set low while trying to provide
data to the RAM chip, a write clash will occur, potentially damaging the CY7C09099V
chip.
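The pairing rule between the Read/Write and Output Enable pins can be summarized as a small truth table. The Python sketch below encodes the rule as described in the paragraph above; it is a behavioral illustration, not a model taken from the datasheet, and the function name `port_bus_state` and the state labels it returns are hypothetical.

```python
def port_bus_state(read_write, output_enable):
    """Classify one port's data-bus state from two pin levels (1 = high, 0 = low).

    Follows the text: Read/Write high selects reading, and Output Enable low
    makes the RAM drive the Data Input/Output bus.
    """
    reading = read_write == 1
    driving = output_enable == 0
    if reading and driving:
        return "read"        # RAM drives the bus with stored data
    if not reading and not driving:
        return "write"       # bus is an input; an external device supplies data
    if not reading and driving:
        return "contention"  # RAM and external device would both drive the bus
    return "idle"            # read selected, but the outputs are not driven


if __name__ == "__main__":
    for rw in (0, 1):
        for oe in (0, 1):
            print(f"R/W={rw} OE={oe} -> {port_bus_state(rw, oe)}")
```

Only the "read" and "write" combinations are intended operating states; the "contention" case is the write clash that the text warns against.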
The Address Strobe pin is an input pin which serves as an address qualifier input pin.
This pin receives a signal to specify the source of the RAM address that is to be latched. If the
address strobe input pin is set high, the RAM chip will latch the address from the output of the
internal binary counter. Conversely, if the address strobe input pin is set low, the RAM chip will
latch an external address from the seventeen pin address input bus. In addition, if the address
strobe input pin is set low, it specifies both the address to the RAM chip and the starting point
from which the counter will start counting.
Both Counter Enable and Counter Reset are routed directly to the internal binary counter.
The Counter Enable pin enables or disables the internal binary counter. If Counter Enable is set
low, then the internal binary counter is enabled and it increments on each rising edge of the
clock; else the counter is disabled. If either Address Strobe or Counter Reset is set low, then the
binary counter will be disabled even if the Counter Enable is set low. As can be guessed, the
Counter Reset pin resets the internal binary counter. Unlike the Counter Enable pin, the Counter
Reset pin always remains active, and can reset the counter, regardless of the state of the
Address Strobe or the Counter Enable pin.
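The interaction of the Address Strobe, Counter Enable and Counter Reset pins described above can be captured in a short behavioral model. The Python sketch below is an illustration under stated assumptions, not a cycle-accurate model of the CY7C09099V: the class name `AddressCounter` is hypothetical, the counter is assumed to reset to address zero, and wrap-around at the top of the 17-bit address range is assumed. All three control inputs are treated as active low, as in the text.

```python
ADDRESS_MASK = (1 << 17) - 1  # seventeen address pins, i.e. 128K locations


class AddressCounter:
    """Behavioral model of one port's address logic (controls are active low)."""

    def __init__(self):
        self.address = 0

    def clock(self, address_strobe=1, counter_enable=1, counter_reset=1,
              external_address=0):
        """Apply one rising clock edge and return the resulting address."""
        if counter_reset == 0:
            # Reset always wins, regardless of the other two pins.
            # Resetting to zero is an assumption of this sketch.
            self.address = 0
        elif address_strobe == 0:
            # Load an external address; counting later resumes from here.
            self.address = external_address & ADDRESS_MASK
        elif counter_enable == 0:
            # Increment; wrap-around at 128K is an assumption of this sketch.
            self.address = (self.address + 1) & ADDRESS_MASK
        # Otherwise the counter is disabled and holds its value.
        return self.address


if __name__ == "__main__":
    counter = AddressCounter()
    counter.clock(counter_reset=0)
    for _ in range(176 * 144):          # one QCIF frame of pixel clocks
        counter.clock(counter_enable=0)
    print(counter.address)
```

In this model, resetting the counter and then holding Counter Enable low for one QCIF frame of clock edges leaves the counter at 25,344, so each pixel lands in its own consecutive address without any address generation by the MCU.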
3.3.3 FLASH subsystem
As was mentioned previously, UTEPcam uses FLASH type memory to permanently store
captured frames. This subsystem also allows UTEPcam to record video. Further, it gives
UTEPcam the ability to retransmit frames/images if the original transmission failed. When the
RAM chip has latched a complete set of frames/images, those frames/images are copied from the
RAM chip to the FLASH type memory. A QCIF image is 25,344 bytes and a CIF image is
101,376 bytes. This means that the minimum FLASH type memory size is 25,344 bytes if set to
QCIF mode or 101,376 bytes if set to CIF mode. The serial communication port which is used to
read from and write to the FLASH utilizes the serial peripheral interface (SPI) communication
protocol. The FLASH type memory considered here has a 10 MHz SPI bandwidth [1]. UTEPcam’s
maximum frame rate is thus directly proportional to the SPI bandwidth.
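As a rough illustration of how the SPI bandwidth bounds the frame rate, the sketch below converts the 10 MHz figure into an upper limit on frames copied to FLASH per second. It is an optimistic estimate only: it assumes one bit is transferred per SPI clock and ignores command overhead, FLASH write latency and any lower SPI clock imposed by the MCU. The function names are hypothetical, and the 14-frame clip length is the GUI default mentioned in Section 3.6.

```python
def spi_bytes_per_second(spi_clock_hz):
    """Upper bound on SPI throughput: one bit per clock, eight bits per byte."""
    return spi_clock_hz / 8


def max_frames_per_second(frame_bytes, spi_clock_hz=10_000_000):
    """Upper bound on frames copied to FLASH per second, ignoring all overhead."""
    return spi_bytes_per_second(spi_clock_hz) / frame_bytes


def clip_bytes(num_frames, frame_bytes):
    """FLASH space needed to store a clip of num_frames frames."""
    return num_frames * frame_bytes


if __name__ == "__main__":
    for name, size in (("QCIF", 176 * 144), ("CIF", 352 * 288)):
        print(name,
              round(max_frames_per_second(size), 1), "frames/s at most,",
              clip_bytes(14, size), "bytes for a 14-frame clip")
```

Because the bound scales linearly with the SPI clock, halving the clock halves the achievable frame rate, which is the proportionality noted above.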
Although RAM is faster than FLASH type memory, it is a lot more expensive. When this
thesis was published, the average cost per Gigabyte of FLASH type memory was 2.5 dollars.
When compared to RAM memory, which was 35 dollars per GB, that is a substantial difference
[15]. Because the cost per byte of RAM memory is so high, very little of it can be found on most
embedded systems and the same is true for UTEPcam.
As was mentioned previously FLASH type memory also gives UTEPcam the ability to
re-transmit an image/frame at a later time. Flash type memory is a non-volatile type of memory,
which means that unlike RAM type memory, it retains its information even if power is removed.
RAM type memory is volatile which means it forgets its information if power is removed.
UTEPcam’s current architecture shuts down the RAM, FLASH, CMOS image sensor and XBee
transceiver subsystems and hibernates the CPU if UTEPcam encounters interference during
transmission. When the interference subsides it re-enables the RAM, FLASH, CMOS image
sensor and XBee transceiver subsystems and un-hibernates the CPU. This would not be possible
if the only type of memory to be found in UTEPcam’s architecture was RAM memory.
3.3.4 Communication Interface
UTEPcam receives and sends commands via the universal synchronous/asynchronous
receiver/transmitter (USART) serial communication protocol. Take Video (TV), Take Picture (TP),
and Change Resolution (CR) are just a few of the possible commands that can be sent to the
UTEPcam via USART. UTEPcam can send or receive commands to and from either a vision
sensor node or a computer. If commands are sent by a computer, a MAX232 voltage level shifter
should be used (see below). UTEPcam can be configured to send or receive commands wired or
wirelessly; if, however, UTEPcam is configured for wireless transmission it is more susceptible
to noise. Furthermore, UTEPcam’s USART baud rate can be configured as well with pros and
cons to high and low baud rates.
If UTEPcam receives its commands from another computer, as stated above a MAX232
voltage level shifter must be used. Microcontrollers consider five volts a logical one and zero
volts a logical zero. A computer’s RS-232 serial port, however, uses bipolar signaling: a negative
voltage (down to about -15 volts) represents a logical one and a positive voltage (up to about
+15 volts) represents a logical zero. If UTEPcam is configured for wired communication, a MAX232
voltage level converter must be used to convert computer operating voltages into MCU operating
voltages. If UTEPcam is configured for wireless communication, an XBee radio transceiver is
inserted in between the MAX232 voltage level converter and UTEPcam’s MCU. There is no need to
modify UTEPcam’s software if wireless transmission is used; however, UTEPcam is more
susceptible to noise if configured for wireless transmission.
UTEPcam uses an XBee multipoint radio frequency module when configured for wireless
communications. XBee is a radio transceiver module which follows the Zigbee wireless
transmission protocol. XBees operate in the industrial, scientific and medical (ISM) radio band.
Unlike other bands which require a license, the ISM band is an unlicensed band. Unfortunately, a
lot of wireless devices such as baby monitors, WLAN devices, and garage door openers also
transmit in the ISM band [6]. This leaves UTEPcam vulnerable to interference or noise. In
addition, increasing UTEPcam’s baud rate further increases its susceptibility to noise. The
lowest and highest baud rates that UTEPcam can achieve are 4800 and 115200 bits per second,
respectively.

3.4 Functional description of the architecture
This section presents the overall functionality of the UTEPcam architecture by describing
different modes of operations and the necessary control logic and protocols for different modes
of operation. To control power consumption, UTEPcam has four modes of operation. The four
modes of operation are the idle mode, the acquisition mode, the Flash storage mode, and the
communication mode. Each mode is designed to enable only the modules that are needed to carry
out the sequence of events for that particular mode. The modules that are enabled and disabled are
RAM, FLASH, image sensor, transceiver and CPU.
Idle mode is UTEPcam’s default mode. During idle mode, UTEPcam is not executing
any command. Every module (RAM, FLASH, transceiver and the image sensor) is disabled and
the CPU is in stand-by. UTEPcam consumes the least amount of power when it is in idle mode.
If UTEPcam receives the command TV or TP, the mode is changed from idle to
acquisition mode. During acquisition mode, the RAM and image sensor are enabled and the
FLASH and transceiver are disabled. When in acquisition mode the CPU is in active mode.
During acquisition mode, UTEPcam is acquiring an image and hence the RAM chip is latching
an image/frame from the image sensor.
Once an image/frame is latched by the RAM chip, UTEPcam changes from acquisition
mode to FLASH storage mode. During this mode the image is assumed to be in the RAM chip
and is waiting to be copied to the FLASH memory. During this mode neither the image sensor
nor the transceiver is needed, hence they are disabled. The RAM and FLASH modules however
are both enabled during FLASH storage mode. When in FLASH storage mode, the CPU is also
in active mode.
If the user desires for UTEPcam to transmit the acquired images, UTEPcam changes
from idle mode or FLASH storage mode to communication mode. During communication mode,
both the RAM chip and the image sensor are disabled but the FLASH memory and the
transceiver are enabled. During communication mode the CPU is grabbing the acquired images
from FLASH and transmitting to the transceiver module.
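The four modes and the modules each one enables can be written down as a simple lookup table. The Python sketch below is one possible way to express the description above; the names `MODE_MODULES`, `CPU_STATE` and `switch_mode` are hypothetical and do not come from the UTEPcam firmware. The CPU is listed as active in communication mode because the text describes it moving images from FLASH to the transceiver; that reading is an inference.

```python
# Modules enabled in each operating mode, as described in Section 3.4.
MODE_MODULES = {
    "idle": frozenset(),
    "acquisition": frozenset({"ram", "image_sensor"}),
    "flash_storage": frozenset({"ram", "flash"}),
    "communication": frozenset({"flash", "transceiver"}),
}

# CPU state per mode: stand-by when idle, active otherwise (the
# communication entry is inferred from the text).
CPU_STATE = {
    "idle": "standby",
    "acquisition": "active",
    "flash_storage": "active",
    "communication": "active",
}


def switch_mode(current, target):
    """Return (modules to enable, modules to disable) when changing modes."""
    enable = MODE_MODULES[target] - MODE_MODULES[current]
    disable = MODE_MODULES[current] - MODE_MODULES[target]
    return set(enable), set(disable)


if __name__ == "__main__":
    sequence = ["idle", "acquisition", "flash_storage", "communication", "idle"]
    for current, target in zip(sequence, sequence[1:]):
        print(current, "->", target, switch_mode(current, target))
```

Expressed this way, each transition touches only the modules whose state actually changes. For example, moving from acquisition to FLASH storage enables the FLASH and disables the image sensor while the RAM stays on.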

3.5 Software architecture
The software program was designed to primarily conserve and reduce power
consumption. If the user, for example, does not provide any input to the system, the system will
be and remain in sleep mode until the user decides to send a command. If the UTEPcam system
receives a signal from UTEPcamVIEW, the software will check for the following commands:
“Take Video (TV)”, “Take Picture (TP)”, and “Adjust Resolution (AR)”. If any other characters
are received by the system, the system ignores them and goes back to sleep. If the
characters received match one of the commands above, the system will execute the desired
operation.
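The command check described above amounts to a table lookup with a default action. The following Python sketch illustrates the idea; it is not the UTEPcam firmware, and the names `COMMANDS` and `handle_input` as well as the action labels are hypothetical. The two-character codes are the ones listed in this section.

```python
# Two-character command codes listed in Section 3.5.
COMMANDS = {
    "TV": "take_video",
    "TP": "take_picture",
    "AR": "adjust_resolution",
}


def handle_input(received):
    """Map received characters to an action; anything else means sleep."""
    return COMMANDS.get(received, "sleep")


if __name__ == "__main__":
    for text in ("TV", "TP", "AR", "XX"):
        print(text, "->", handle_input(text))
```

Any input that is not an exact match for one of the listed codes falls through to the sleep action, mirroring the behavior described in the text.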
If the user types the command “TV”, UTEPcam will record a video clip with the user
specified number of frames. Then, as can be seen in the flowchart in Figure 3.5.1, the system will
turn on the transceiver, the RAM chip, the CMOS image sensor and the flash SD card. The
UTEPcam will then try to capture the image until the desired number of frames is met. Capturing
an image or images, if a video is requested, requires the use of the RAM counter, and the
VSYNC. For instance, the MCU checks if the VSYNC pin in the CMOS image sensor is set
high, and if it is, the RAM counter is enabled. After the RAM counter is enabled, the MCU
checks if VSYNC is once again set high, and if it is, the RAM counter is disabled and an entire
frame is stored inside the RAM chip. Once this is done, the MCU then transfers the captured
frame from the RAM chip to the SD card for permanent storage. If the user wishes to capture a
video, which requires more than one frame, then the above process is repeated until all frames
are copied to the SD card. Once this phase is completed, the image or the frames can be sent
wirelessly at the user’s request. The sequence of events is shown on Figure3.5.1.
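The VSYNC-gated capture described above can be sketched as a small simulation. This is only a model under stated assumptions: the function name `capture_frame`, the representation of the sensor output as `(vsync, pixel)` pairs, and the use of rising-edge detection on VSYNC are all illustrative choices and are not taken from the UTEPcam firmware, where the RAM counter is a hardware circuit rather than a software list.

```python
def capture_frame(samples):
    """Collect the pixel values that lie between two consecutive VSYNC pulses.

    `samples` is an iterable of (vsync, pixel) pairs, one pair per pixel clock.
    Returns the list of pixels, or None if no complete frame was seen.
    """
    frame = []
    counter_enabled = False  # models the RAM counter enable line
    previous_vsync = 0
    for vsync, pixel in samples:
        rising_edge = vsync == 1 and previous_vsync == 0
        previous_vsync = vsync
        if rising_edge and not counter_enabled:
            counter_enabled = True  # first VSYNC: start storing the frame
            continue
        if rising_edge and counter_enabled:
            return frame  # second VSYNC: the frame is complete
        if counter_enabled and vsync == 0:
            frame.append(pixel)  # store one pixel per clock
    return None


stream = [(0, 9), (1, 0), (0, 1), (0, 2), (0, 3), (1, 0), (0, 4)]
print(capture_frame(stream))  # prints [1, 2, 3]
```

For a video clip, the same routine would be called once per requested frame, with each returned frame copied to permanent storage before the next capture starts.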

Figure 3.5.1 UTEPcam software flowchart

Figure 3.5.2 UTEPcam block diagram [2]
The UTEPcam embedded software was written in a modular fashion. In total there are
thirteen modules, and their relationships can be seen in Figure 3.5.2. The most important
module is the Camera Interface class. This class sets up the ports which interface to the
RAM chip, SD card, XBEE and the CMOS image sensor. The data gathered by the Camera
Interface is used by the Frame Manager to make frame-level decisions. A User-Interface
Manager processes incoming commands and generates outgoing serial packets. Finally, a
simple event-dispatching executive sits at the top of UTEPcam’s embedded software
system and handles the events generated in the system.

3.6 Graphical user interface design
A LabVIEW graphical user interface enables operation of the UTEPcam. Using the
interface, a user can make the UTEPcam take a picture or record a video clip, and can
access several other options. One of these options checks whether the interface is
connected to the vision sensor node. If it is connected, the vision sensor node returns an
acknowledgment to the user interface, which confirms that the GUI and the UTEPcam are
communicating. This check is typically done before any further action is performed.
If the user decides to record a video clip, the number of desired frames must be
entered; by default, the number of frames is 14. Another option the user can set is the
location or file where the raw image data is stored. This raw data is stored as a row
vector in a text file. To reproduce the video, a user may utilize MATLAB to reshape the
row vector into a series of frames. If the user only wants to take a single image, the user
must likewise specify the file directory where the raw image data is to be stored. After
the raw data has been obtained and converted into an image with MATLAB, the GUI
enables the user to view that image by selecting the directory where the image created by
MATLAB is stored. Since the image produced by MATLAB is read by the user interface,
the MATLAB output directory must point to the same location as the directory from which
the user interface reads the image for display in the GUI.
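The reshaping step can be sketched as follows. The thesis performs this step in MATLAB; the version below is a plain Python equivalent given only for illustration. The function name `reshape_frames` and the row-by-row (row-major) pixel ordering are assumptions, since the ordering of the stored row vector is not specified here, and MATLAB's own `reshape` fills arrays column by column.

```python
def reshape_frames(row_vector, rows, cols):
    """Split a flat list of pixel values into frames of `rows` x `cols` pixels.

    Assumes the pixels are stored frame after frame, row by row.
    """
    frame_size = rows * cols
    if frame_size == 0 or len(row_vector) % frame_size != 0:
        raise ValueError("row vector length is not a whole number of frames")
    frames = []
    for start in range(0, len(row_vector), frame_size):
        flat = row_vector[start:start + frame_size]
        # Cut the flat frame into `rows` lines of `cols` pixels each.
        frames.append([flat[r * cols:(r + 1) * cols] for r in range(rows)])
    return frames


# Two tiny 2 x 2 "frames" stored back to back in one row vector.
print(reshape_frames([1, 2, 3, 4, 5, 6, 7, 8], 2, 2))
# prints [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

A single picture is simply the case in which the row vector holds exactly one frame.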

Chapter 4: Evaluation of UTEPcam
4.1 Image acquisition system analysis
The UTEPcam system enables the acquisition of images as well as video. The resolution
of the acquired image can be set by the user through the LabVIEW user interface. Each
resolution has advantages and disadvantages. For instance, the image in Figure 4.1.1 below
was acquired in CIF mode and shows greater contrast and detail than the image captured in
QCIF mode shown in Figure 4.1.2. One disadvantage of using the CIF mode is that it
produces more pixels and consequently requires more memory in the RAM chip to store the
temporary image data before the MCU sends the data to the SD card. Recording video in
this resolution also requires more data to be transferred from the MCU to the SD card, and
since the MCU and the flash communicate over SPI (Serial Peripheral Interface), the frame
rate is not as fast as that of a video captured in QCIF mode. Assuming a wired
transmission, transmitting video between the MCU and the user interface in CIF mode will
take four times as long as the same transmission in QCIF mode. The main disadvantage of
acquiring an image in QCIF mode is that the resolution is lower than in CIF mode.
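The factor of four follows directly from the pixel counts of the two formats, as the short calculation below shows. It uses the standard CIF (352 x 288) and QCIF (176 x 144) frame dimensions; the 356 X 292 array size quoted elsewhere in this thesis for the OV6620 gives a ratio slightly above four. The helper name `pixel_count` is illustrative.

```python
# Standard CIF and QCIF frame dimensions as (width, height) in pixels.
CIF = (352, 288)
QCIF = (176, 144)


def pixel_count(resolution):
    """Number of pixels in one frame of the given (width, height)."""
    width, height = resolution
    return width * height


ratio = pixel_count(CIF) / pixel_count(QCIF)
print(pixel_count(CIF), pixel_count(QCIF), ratio)  # prints 101376 25344 4.0
```

Because the link speed is fixed, the time to move one frame scales with the number of pixels, so a CIF frame needs four times the RAM and four times the transfer time of a QCIF frame.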
The lines present in both Figure 4.1.1 and Figure 4.1.2 represent the distortion caused by
reducing the amplitude of the clock pulse from 5 V to 3.3 V. This causes some pixel values
not to be latched in the RAM chip, and the resulting incomplete data is then sent to the
SD card. If there is an interfering object between the vision sensor node and the user
interface that operates at the same frequencies as the vision sensor node, then the lines
shown in the images will appear in larger quantities throughout the image and with
higher intensity levels.

Figure 4.1.1 CIF mode image

Figure 4.1.2 QCIF mode image
4.2 Scalable features of the UTEPcam
Although UTEPcam’s performance is satisfactory, there are some possibilities for
improvement. Since UTEPcam’s hardware and software architectures are scalable,
UTEPcam’s pixel resolution may be drastically increased with some slight hardware and
software modifications. On the hardware side, this would be accomplished by swapping the
computationally-constrained CPU for a resource-rich CPU, replacing the OV6620 image
sensor with one of higher pixel resolution, and integrating a variable clock generator
circuit. On the software side, modifying UTEPcam’s program so that the OV6620 is left in
its default 356 X 292 pixel resolution would also increase the pixel resolution of UTEPcam.
Increasing the pixel resolution of UTEPcam does not come without a cost. While both
hardware and software modifications will increase UTEPcam’s pixel resolution, they will also
increase its power consumption. In some instances, however, the default UTEPcam pixel
resolution is insufficient and hence it needs to be increased.

4.2.1 Scalable image sensor
A CMOS image sensor pixel is a single photodetector, that is, a sensor which measures
the amount of photon energy radiated onto its surface. A CMOS image sensor is built from
thousands of pixels arranged in a “two-dimensional array of pixels”. Digital images are
created by focusing an object’s optical image onto the two-dimensional array, followed by
sampling and quantizing the focused image. Different CMOS image sensors have
two-dimensional arrays of different sizes. Generally, high-quality CMOS image sensors
tend to have denser two-dimensional arrays rather than physically bigger ones [10, 14].
There is a distinction between increasing the size of a CMOS sensor’s two-dimensional
array of pixels and increasing its “pixel density.” Two CMOS image sensors can have the
same number of pixel rows and pixel columns in their two-dimensional arrays of pixels
while their pixel densities differ. Denser CMOS image sensors tend to have more rows and
columns in their images. CMOS image sensors with low pixel density tend to produce
blockier images than their high pixel density counterparts.
The pixel resolution of UTEPcam can be enhanced simply by replacing the CMOS image
sensor with one of greater pixel density. Conversely, if a reduction in power consumption
is desired, UTEPcam’s CMOS image sensor can be replaced with a lower-resolution CMOS
image sensor. It is possible to swap UTEPcam’s CMOS image sensor because, although
different CMOS image sensors have pixel arrays of different sizes, they nevertheless
function similarly. All CMOS image sensors have VSYNC, HREF and PCLK pins, with which
they signal, respectively, that a new image is about to be sent on the pixel data bus, that
a new line is about to start, and that there is valid pixel data on the pixel data bus. What
differentiates CMOS image sensors of different quality is the number of times the
synchronization pins toggle per image. For example, a CMOS image sensor with a
144 X 176 two-dimensional array toggles its VSYNC, HREF and PCLK once, 144 and 176
times respectively per image. Similarly, a 292 X 356 CMOS image sensor toggles its
VSYNC, HREF and PCLK once, 292 and 356 times respectively per image.

4.2.2 Scalable CPU
It was previously mentioned that UTEPcam’s architecture allows its CPU to be swapped
for either a resource-rich or a computationally-constrained CPU. If UTEPcam’s CPU is
swapped for a resource-rich CPU, its resolution may increase. Conversely, if it is swapped
for a computationally-constrained CPU, its power consumption may be reduced. It is
possible to swap UTEPcam’s CPU because the Atmega32 architecture is not unique. Almost
all current CPUs are equipped with at least a few kilobytes of EEPROM, SRAM and FLASH
memory. Additionally, most CPU manufacturers integrate timers, SPI, USART, TWI,
analog-to-digital converter (A/D) and comparator circuits into their CPU dies.
Furthermore, most CPU vendors manufacture their CPUs with several I/O ports.
It is also possible to swap UTEPcam’s CPU because most of the image acquisition
process is handled by the CMOS image sensor and the static RAM chip. UTEPcam’s CPU
merely transfers the collected image from the RAM chip to the SD flash card. A
resource-rich CPU will transfer the image data from the RAM chip to the Flash card faster
than a computationally-constrained CPU will. Although resource-rich CPUs have wider
CPU registers, the increase in pixel resolution comes from their ability to be clocked
faster. This is because UTEPcam’s image acquisition architecture outputs the pixel data
one byte at a time on an 8-bit wide bus. The width of this bus also means that the CPU
must be at least an 8-bit CPU. Although it is theoretically possible to swap in a CPU with
a narrower data path, doing so would require a complete modification of UTEPcam’s image
acquisition architecture.

4.2.3 Source code alteration
It is also possible to increase UTEPcam’s pixel resolution without modifying UTEPcam’s
hardware architecture. As was previously mentioned, the OV6620 CMOS image sensor can be
set to two resolution modes: QCIF and CIF. By default, the image sensor is in CIF mode,
and UTEPcam’s software sends a command that switches it to QCIF mode. By simply
removing the instruction that sends the QCIF mode command to the OV6620, it is possible
to double the pixel resolution of UTEPcam. The register and bit that determine the OV6620
resolution are CMOC and bit 5, respectively. If bit 5 of CMOC is set high, the OV6620 is
set to QCIF resolution; otherwise it is set to CIF resolution.
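The effect of bit 5 can be illustrated with ordinary bit masking. The sketch below operates on a plain integer that stands in for the 8-bit register value; the function names are hypothetical, and writing the value to the sensor over its control bus is deliberately left out because that interface is not described in this section.

```python
QCIF_BIT = 5               # bit position given in the text
QCIF_MASK = 1 << QCIF_BIT  # 0x20


def select_qcif(register_value: int) -> int:
    """Return the 8-bit register value with the QCIF bit set."""
    return (register_value | QCIF_MASK) & 0xFF


def select_cif(register_value: int) -> int:
    """Return the 8-bit register value with the QCIF bit cleared."""
    return register_value & ~QCIF_MASK & 0xFF


def is_qcif(register_value: int) -> bool:
    """True if the QCIF bit is set in the register value."""
    return bool(register_value & QCIF_MASK)


print(hex(select_qcif(0x00)))  # prints 0x20
print(hex(select_cif(0xFF)))   # prints 0xdf
```

Masking only bit 5 leaves the other seven bits of the register unchanged, which matters if those bits control unrelated sensor settings.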

4.3 Clock correction
Several problems were encountered during the implementation of the UTEPcam
architecture. One of the problems was caused by the different operating voltages of the
clock in the CMOS image sensor and the clock in the RAM chip. The peak-to-peak voltage
of the clock for the CMOS image sensor is 5 V, while the peak-to-peak voltage of the clock
for the RAM chip is 3.3 V.
An initial solution to this problem was to implement a voltage divider using a network of
resistors connected in series between the RAM chip and the CMOS image sensor. However,
the output of the network distorted the clock pulse fed into the RAM chip. A better
solution was to reduce the voltage of the clock pulse using a pair of diodes connected in
series. The resultant clock pulse was less distorted.
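For reference, the resistor ratio that a two-resistor divider would need for this level shift can be worked out with the ideal, unloaded divider formula. The sketch below is illustrative only: the function names and the example resistor values are assumptions, the resistor values actually used in UTEPcam are not given here, and the formula does not model the loading and capacitance effects that distort a fast clock edge.

```python
def divider_output(v_in, r_top, r_bottom):
    """Output voltage of an ideal, unloaded two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def divider_ratio(v_in, v_out):
    """Required r_top / r_bottom ratio to scale v_in down to v_out."""
    return (v_in - v_out) / v_out


print(round(divider_ratio(5.0, 3.3), 3))                # prints 0.515
print(round(divider_output(5.0, 1700.0, 3300.0), 2))    # prints 3.3
```

The example values of 1.7 kΩ and 3.3 kΩ are hypothetical and were chosen only because they satisfy the ratio exactly.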

Chapter 5: Conclusions and Future Work
UTEPcam is a low-power, low-cost image sensor node whose architecture enables it to
acquire mid to high resolution images without significantly increasing power consumption.
UTEPcam’s performance is satisfactory considering its power consumption and processing
power, but it can be further improved.
At the time of this publication, UTEPcam’s frame rate was limited to two frames per
second. The bottleneck was the speed at which UTEPcam’s CPU could transfer the images
stored on the RAM chip to the Flash memory. According to the datasheets for the CPU, the
CY7C09099V RAM, and the Flash, the theoretical transfer speed should be around 8 Mbits
per second. It is hypothesized that the deviation from the anticipated transfer rate was
caused by improper initialization. Before data can be transferred to or from the flash
memory, the memory must be initialized by running the SPI clock for approximately 80
clock pulses. During this initialization, the SPI clock should be set no higher than 1 MHz.
After initialization, the SPI clock can be increased up to 20 MHz. The UTEPcam designer
was unaware of this protocol and as a result left the SPI clock at only 1 MHz after
initialization.
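The impact of the SPI clock on transfer time can be estimated with a simple calculation. The sketch below is an idealized model under stated assumptions: one bit is shifted per clock cycle, a QCIF frame is taken as 176 x 144 pixels at one byte per pixel, and all command, response and RAM read overhead is ignored. The function name `spi_transfer_seconds` is illustrative.

```python
def spi_transfer_seconds(n_bytes, clock_hz):
    """Ideal time to shift n_bytes over SPI at one bit per clock cycle."""
    return n_bytes * 8 / clock_hz


FRAME_BYTES = 176 * 144  # QCIF frame, assuming one byte per pixel
INIT_BYTES = 80 // 8     # 80 clock pulses correspond to 10 dummy bytes

slow = spi_transfer_seconds(FRAME_BYTES, 1_000_000)   # clock left at 1 MHz
fast = spi_transfer_seconds(FRAME_BYTES, 20_000_000)  # clock raised to 20 MHz

print(f"1 MHz:  {slow:.4f} s per frame")
print(f"20 MHz: {fast:.4f} s per frame")
print(f"speed-up: {slow / fast:.0f}x")
```

Under these assumptions, raising the clock after initialization shortens the SPI portion of each frame transfer by the ratio of the two clock rates; the measured frame rate would also depend on overheads that this model leaves out.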
The current UTEPcam embedded software utilizes approximately 500 bits of the 8K flash
program memory, 20 bytes of the 1K RAM, and 0 bytes of the 512-byte EEPROM. Thus, there
is plenty of space left on the system to add custom image processing algorithms. Color
detection, image subtraction and image compression are examples of algorithms which have
been incorporated into image sensor nodes similar to UTEPcam. By incorporating a
compression algorithm into UTEPcam’s embedded software, power consumption may be
further decreased. Of all the modules on UTEPcam, the transceiver is by far the device
that consumes the most power. Decreasing the image size would decrease the transmission
time, resulting in lower power consumption.

Bibliography
[1]

“AN037 – Interfacing to an MMC or SD card via SPI.” June 27, 2008
<http://www.cyantechnology.com/public/AN037InterfacingtoanMMCorSDCardviaSPI.pdf>.

[2]

“ATMEL 8-bit AVR Microcontroller with 32K Bytes In-System Programmable Flash.”
Atmel. Aug. 30, 2006
<http://www.atmel.com/dyn/resources/prod_documents/doc2503.pdf>.

[3]

“AVRcam.” JRobot. 2004. July 3, 2007 <http://www.jrobot.net/Projects/AVRcam.html>.

[4]

Belbachir, Ahmet N. Smart Cameras. New York: Springer, 2010.

[5]

“Cypress Perform Synchronous Dual-Port Static RAM.” Cypress Semiconductor
Corporation. 2010. January 13, 2008 <http://www.cypress.com/?docID=22263>.

[6]

“Electronic Code of Federal Regulations.” January 27, 2009
<http://edocket.access.gpo.gov/cfr_2005/octqtr/pdf/47cfr15.247.pdf>.

[7]

“How Motes Work.” How Stuff Works. Marshall Brain. 1998-2011. May 6, 2008
<http://computer.howstuffworks.com/mote.htm>.

[8]

“How 2-D Bar Codes Work.” How Stuff Works. Jonathan Attleberry. 1998-2011. May 2,
2008 <http://www.howstuffworks.com/innovation/repurposed-inventions/2dbarcodes.htm>.

[9]

Margi, C. B., X. Lu, G. Zhang, et al. “Meerkats: A Power-Aware, Self-Managing
Wireless Camera Network for Wide Area Monitoring.” July 1, 2008
<http://users.soe.ucsc.edu/~manduchi/papers/meerkats-dsc06-final.pdf>.

[10]

“Omnivision Advanced Preliminary Information.” July 3, 2007
<http://www.cs.cmu.edu/~cmucam/Downloads/ov6620DSLF.PDF>.

[11]

Radke, Richard J. “A Survey of Distributed Computer Vision Algorithms.” September 13,
2009 <http://www.ecse.rpi.edu/~rjradke/papers/jaise08-radke.pdf>.

[12]

Rahimi, Mohammad, Rick Baer, Obimdinachi I. Iroezi, et al. “Cyclops: In Situ Image
Sensing and Interpretation in Wireless Sensor Networks.” November 2-5, 2005.
September 2, 2009
<http://www.google.com/url?sa=t&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Fciteseer.ist.psu.edu%2Fviewdoc%2Fdownload%3Bjsessionid%3D79EFE60E007CFCA0748E238F63820A3F%3Fdoi%3D10.1.1.92.3772%26rep%3Drep1%26type%3Dpdf&rct=j&q=Cyclops%3A%20In%20Situ%20Image%20Sensing%20and%20Interpretation%20in%20Wireless%20Sensor%20Networks&ei=UEMuTtzoOJH2tgP6wqTzDw&usg=AFQjCNEe9FMoTQcJtgylH9CsUCT4ZQ5C6g&cad=rja>.

[13]

Sharif, Atif, Vidyasagar Potdar, Elizabeth Chang. “Wireless Multimedia Sensor Network
Technology: A Survey.” IEEE (2009): 608-613.

[14]

Sparkfun Electronics. February 3, 2008 <www.sparkfun.com>.

[15]

Walmart. 2011. Walmart Stores Inc. February 15, 2008. <www.Walmart.com>.

[16]

“XBee Multipoint RF Modules.” January 17, 2008
<http://www.digi.com/pdf/ds_xbeemultipointmodules.pdf>.

[17]

“XBee®/XBee-PRO® RF Modules.” September 23, 2009. Digi International Inc.
September 13, 2009
<http://www.sparkfun.com/datasheets/Wireless/Zigbee/XBeeDatasheet.pdf>.

[18]

“YCbCr.” Wikipedia. July 22, 2010. June 23, 2009
<http://en.wikipedia.org/wiki/YCbCr>.

[19]

Zainaldin, Ahmed, Ioannis Lambadaris, Biswajit Nandy. “Video over Wireless Zigbee
Networks: Multichannel Multi-radio Approach.” IEEE (2008): 882-887.

Vita
Ricardo Zepeda was born on September 1, 1983 in El Paso, Texas. During his last
semester as an undergraduate student, he and his partner Andres Ibarra were awarded the
2007 best senior project design. In spring 2007 he received his Bachelor’s degree in
Electrical Engineering from The University of Texas at El Paso (UTEP). In his first
semester after receiving his bachelor’s degree, he was offered a research position with
the Wireless Sensor Network Research Group. As a research assistant he worked on
incorporating CMOS image sensors into wireless sensor networks. Upon entering the
master’s degree program, he was offered another research position under Dr. Rolfe
Sassenfeld. While there he was given the task of improving a multi-frame blind
deconvolution (MFBD) algorithm. He received his Bachelor’s and Master’s degrees from
The University of Texas at El Paso (UTEP) in fall of 2007 and spring of 2010.

Permanent address:

12493 Angie Bombach
El Paso, TX, 79928

This thesis was typed by Ricardo Zepeda.

