ABSTRACT
INTRODUCTION
Although face recognition has been researched for many years, promising laboratory systems developed with off-line face databases have not fared well when deployed in the real world. This is largely because of the deleterious effect of image acquisition problems such as lighting angle, facial expression, and head pose. Accuracy may drop to 10% or even lower under uncontrolled image acquisition conditions. Such conditions are often encountered in automatic identity capture for video surveillance and for identification of faces from a mobile camera phone.
Indeed, the mobile phone is a very interesting target platform for advanced pattern recognition. Many modern phones can reliably recognize speech in noisy environments. Some recognize handwriting, including Chinese characters. Newer phones are equipped with fingerprint reading sensors and software. At first glance, it seems strange that such advanced pattern recognition algorithms representing decades of research are found on garden-variety mobile handsets. Yet the small size of the mobile phone encourages the development of pattern recognition interfaces rather than bulky keyboards. The high cost of licensing these algorithms is defrayed by the sheer volume of mobile phones being sold every year. With this in mind, the authors have implemented an early prototype of a face recognition module on a mobile camera phone so the camera can be used to identify the person holding the phone and unlock the keyboard. Robust face recognition is a key requirement of such a system.
Our recent research has led to recognition systems that are much less sensitive to uncontrolled image capture. These methods are described in this chapter. Now the challenge is to incorporate these new algorithms into embedded platforms to help realize our dream of ubiquitous, low-cost face recognition.
BACKGROUND OF FACE RECOGNITION
Face recognition under uncontrolled image acquisition conditions is a challenging goal, not only because of the gross similarity of all faces, but also because of the vast differences between face images of the same person due to variations in lighting conditions, facial expression, and pose. An ideal face recognition system should recognize a known face in a novel image and be maximally insensitive to nuisance variations in image acquisition. However, differences between images of the same face due to these nuisance variations are normally greater than those between different faces (Adini, Moses & Ullman, 1997), which makes this task exceedingly difficult. Variations in illumination, facial expression, and head pose significantly degrade the performance of a face recognition system. Among these three variations, pose variation is the hardest to model (Phillips, Grother, Micheals, Blackburn, Tabassi & Bone, 2003). Therefore, most existing face recognition systems work well only in well-controlled environments (Chellappa, Wilson & Sirohey, 1995) with frontal images taken under constrained or laboratory conditions. However, this requirement is too strict to be met in many situations, or when only a few gallery images are available, such as in recognizing people from images captured on mobile cameras.
Illumination-Insensitive Face Recognition
Many approaches have been proposed for illumination-invariant recognition (Yilmaz & Gokmen, 2000; Yongsheng & Leung, 2002) and expression-invariant recognition (Beymer & Poggio, 1995; Black, Fleet & Yacoob, 2000) . Many of these methods suffer from the need to have large numbers of example images for training, or they are computationally too expensive to be used in mobile camera phone platforms. In 2004, we developed adaptive principal component analysis (APCA) and rotated APCA (Chen & Lovell, 2004; Lovell & Chen, 2005) , which inherited merits from both PCA and FLD (Fisher linear discriminant) (Belhumeur, Hespanha & Kriegman, 1997) by warping the face subspace according to the within-class and between-class covariance to compensate for illumination and facial expression variations. We sometimes refer to the highly representative features used in these techniques as "eigenfisher-faces" to reflect the hybrid nature of their derivation.
We first apply principal component analysis (PCA) (Turk & Pentland, 1991a, 1991b) to project each face image into a face subspace of reduced dimension, forming an m-dimensional feature vector $s_{j,k}$, with $k = 1, 2, \ldots, K_j$ denoting the $k$th sample of class $S_j$. After constructing the face subspace for image representation, we warp this space to enhance class separability. According to Bayes' rule, the conditional density function is:
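Under the Gaussian assumption adopted in this section, the class-conditional density takes the standard multivariate normal form. The expression below is a sketch written in the notation of the surrounding text (feature dimension $m$, class mean $u_j$, class covariance $\mathrm{cov}_j$), assuming that standard form:

```latex
p(s \mid S_j) = \frac{1}{(2\pi)^{m/2}\,\lvert \mathrm{cov}_j \rvert^{1/2}}
\exp\!\left[-\tfrac{1}{2}\,(s-u_j)^{T}\,\mathrm{cov}_j^{-1}\,(s-u_j)\right]
```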
where $u_j$ is the mean of class $S_j$ and $\mathrm{cov}_j$ is the covariance matrix of $S_j$, under the assumption of Gaussian-distributed random variables. In order to compensate for the influence of between-class covariance on the estimation of the pdf, we use a whitening power parameter, p, to whiten the distribution:
We then define the filtering matrix γ by:
where q is an exponential scaling factor determined empirically. The cost function combines the error rate with the ratio of within-class distance to between-class distance; we optimize it empirically using the objective function defined in Box 2, where $d_{jj,k0}$ is the distance between the sample $s_{j,k}$ and $s_{j,0}$, the standard reference image (typically the normally illuminated neutral image) for class $S_j$. Correspondingly, $d_{jm,k0}$ is the between-class distance between sample $s_{j,k}$ and the reference image $s_{m,0}$ for class $S_m$.
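The two ingredients of this cost function can be illustrated with a minimal Python sketch. It assumes Euclidean distances in the feature space and one reference vector per class; the function name `reference_distance_stats` is hypothetical, and the way the two quantities are combined in Box 2 is not reproduced here.

```python
import numpy as np

def reference_distance_stats(samples, labels, references):
    """Return (within/between distance ratio, nearest-reference error rate).

    samples:    (n, m) feature vectors s_{j,k}
    labels:     (n,) integer class index j of each sample
    references: (J, m) one reference vector s_{j,0} per class
    Euclidean distance is an assumption made for this illustration.
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    references = np.asarray(references, dtype=float)
    # d[i, c] = distance from sample i to the reference image of class c
    d = np.linalg.norm(samples[:, None, :] - references[None, :, :], axis=2)
    rows = np.arange(len(samples))
    within = d[rows, labels]                      # d_{jj,k0}
    other = np.ones(d.shape, dtype=bool)
    other[rows, labels] = False
    between = d[other].reshape(len(samples), -1)  # d_{jm,k0} for m != j
    ratio = within.mean() / between.mean()
    # A sample is misclassified when its nearest reference is not its own class
    error_rate = float(np.mean(d.argmin(axis=1) != labels))
    return float(ratio), error_rate

refs = np.array([[0.0, 0.0], [10.0, 0.0]])
feats = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
ratio, err = reference_distance_stats(feats, [0, 0, 1, 1], refs)
```

A smaller ratio and a lower error rate both indicate better class separation, so a search over p and q would look for parameter values that reduce them.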
Expression-Invariant Face Recognition
We rotate the feature space according to the within-class covariance to enhance the representativeness of the features and to improve estimation of the conditional pdf. The rotation matrix R is a set of eigenvectors obtained by applying singular value decomposition to the within-class covariance matrix. Every face vector s is transformed into the new space by R:
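A minimal sketch of this rotation step in Python with NumPy follows. It assumes the transformed vector is $R^T s$ and that the within-class covariance is pooled over all classes; both are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def within_class_rotation(features, labels):
    """Return (R, singular_values) from the pooled within-class covariance.

    features: (n, m) feature vectors, one per row
    labels:   (n,) class index of each row
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    m = features.shape[1]
    s_w = np.zeros((m, m))
    for c in np.unique(labels):
        x = features[labels == c]
        centered = x - x.mean(axis=0)       # remove the class mean
        s_w += centered.T @ centered
    s_w /= len(features)
    # For a symmetric positive semi-definite matrix, the left singular
    # vectors returned by the SVD are also its eigenvectors.
    u, singular_values, _ = np.linalg.svd(s_w)
    return u, singular_values

def rotate(features, rotation):
    """Apply R^T s to every row vector s (assumed form of the transform)."""
    return np.asarray(features, dtype=float) @ rotation
```

Because R is orthonormal, the rotation preserves distances between feature vectors while aligning the axes with the directions of within-class variation.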
We test RAPCA, APCA, and PCA on the Asian Face Image Database (I.M. Lab), which consists of 535 facial images under five different standardized illuminations and 428 images with four different facial expressions corresponding to 107 subjects.
Each image is 171 × 171 pixels with 256 grey levels per pixel. We perform threefold cross-validation on the database in an attempt to obtain reliable estimates of accuracy on unknown faces. Figure 2-4 shows that our methods offer significant recognition performance gains over standard PCA when changes in lighting, expression, and both lighting and expression are present.
APCA and RAPCA perform well against both lighting variation and expression change, but they only perform well with frontal face images as registered in the database. In the mobile camera phone scenario, many face images captured by the cameras are not perfectly frontal, so the need for a pose-invariant face recognizer becomes crucial.
Background of Pose-Insensitive Face Recognition
The performance of face recognition systems drops significantly when large pose variations are present (Phillips et al., 2003). Many approaches have been proposed to compensate for pose change. Wiskott, Fellous, Krüger, and von der Malsburg (1997) extend the dynamic link architecture (DLA)-based face recognizer (Lades et al., 1993) to deal with larger pose variations. The face image is represented by a labeled graph called the face bunch graph (FBG), which consists of N nodes connected by E edges. The nodes are located at facial landmarks $x_n$, $n = 1, \ldots, N$, which are called fiducial points. By using such a graph, the correspondences between face images across different viewpoints can be found. Matching an FBG on a new image is done by maximizing a graph similarity between an image graph and the FBG at that pose. DLA approaches are invariant to pose change with rotations in depth of less than 30 degrees, but the computation is too expensive to be used in a real-time face recognition system.
Pentland, Moghaddam, and Starner (1994) proposed two methods for face recognition under variable pose. Given N individuals under M different views, the first method carries out face recognition and pose estimation in a universal eigenspace computed from the combination of all NM images. In this way, a "parametric eigenspace" encodes both identity and view conditions (Murase & Nayar, 1993). The other method is to build a "view-based" set of M separate eigenspaces, each capturing the variation of the N individuals in a common view (Darrell & Pentland, 1993). In the view-based method, each view space's eigenvectors are used to compute the "distance-from-face-space" (DFFS) (Turk & Pentland, 1991a, 1991b); once the proper view space is determined, the input face image is encoded using the eigenvectors of that view space and then recognized.
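The view-based selection described above can be sketched in a few lines of Python with NumPy. This is an illustration only: it assumes DFFS is the Euclidean norm of the reconstruction residual, and the function names (`fit_view_space`, `dffs`, `select_view`) are hypothetical.

```python
import numpy as np

def fit_view_space(images, k):
    """Fit one eigenspace for a single view.

    images: (n, d) vectorised face images taken from the same view
    Returns the mean image and a (d, k) orthonormal basis.
    """
    images = np.asarray(images, dtype=float)
    mean = images.mean(axis=0)
    # Rows of vt are the orthonormal principal directions
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k].T

def dffs(image, mean, basis):
    """Distance from face space: norm of what the basis cannot reconstruct."""
    centered = np.asarray(image, dtype=float) - mean
    coeffs = basis.T @ centered
    residual = centered - basis @ coeffs
    return float(np.linalg.norm(residual)), coeffs

def select_view(image, view_spaces):
    """Pick the view space with the smallest DFFS and encode the image in it."""
    distances = [dffs(image, mean, basis)[0] for mean, basis in view_spaces]
    best = int(np.argmin(distances))
    return best, dffs(image, *view_spaces[best])[1]
```

The coefficients returned by `select_view` would then be passed to whatever classifier operates within the chosen view space.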
In 2003, Chai, Shan, and Gao (Xiujuan, Shiguang & Wen, 2003) presented a framework for pose-invariant face recognition using a pose alignment method based on a statistical transformation. Their algorithm partitions the face into three rectangular regions, and the affine transformation parameters associated with various poses are learned from the one-to-one rectangle mapping relations. These parameters can be used to align an input nonfrontal face image to a frontal view, and the virtual frontal view can be generated through polynomial warping. Experiments on the FERET dataset (NIST, 2001) show that their method can increase the recognition rate by an average of 17.75% compared to face recognition without pose alignment. However, their recognition rate was still quite low, reaching an average of only 58% under pose rotations of 30 degrees. Moreover, they did not develop an automatic way to mark the key points on facial structures.
Cootes, Walker, and Taylor (2000) proposed "view-based active appearance models," which are based on the idea that a small number of 2D statistical models are sufficient to capture the shape and appearance of a face from any viewpoint. They demonstrated that only three distinct models are needed to deal with pose angle change from left to right profile; each model is trained on the labeled images of a variety of people. They learn the relationship between model parameters and head orientation based on the assumption that the model parameters trace out an approximately elliptical path, which can be used both to estimate the orientation of any head pose and to synthesize a new face at any orientation. They applied this method to face tracking but did not do any face recognition experiments.
Sanderson, Bengio, and Gao (2006) address the pose mismatch problem by extending each frontal face model with artificially synthesized models for nonfrontal views. The synthesis methods are based on several implementations of maximum likelihood linear regression (MLLR) and standard multivariate linear regression (LinReg). In the MLLR-based approaches, they used prior information to construct generic statistical face models for different views. A "generic" GMM was used to represent a population of faces. Each nonfrontal generic model is constructed by learning and applying an MLLR-based transformation to the frontal generic model. The LinReg approach is similar to the MLLR-based approach. The main difference is that it learns a common relation between two sets of feature vectors instead of learning the transformation between generic models. They evaluate these two approaches by applying them to two face verification systems: a holistic system based on PCA-derived features, and a local feature system based on DCT-derived features (Sanderson & Paliwal, 2002).
Experiments on the FERET database show that for the holistic system, the LinReg-based technique is better suited than the MLLR-based technique. They also show that the local feature system is less affected by view changes than the holistic system.
HEAD POSE COMPENSATION

Facial Feature Interpretation
Active shape models (ASMs) (Cootes, 1992; Cootes, Taylor, Cooper & Graham, 1995) and active appearance models (AAMs) (Cootes, Edwards & Taylor, 2001; Edwards, Taylor & Cootes, 1998), first introduced by Cootes and Taylor, are the best-known and most robust approaches for building deformable models and using them to interpret images.
Statistical Models of Shape and Appearance
The shape of an object is represented by a set of n landmarks, which can be in any dimension. Good landmarks are the points that lie at the corners of object boundaries or obvious biological landmarks. For instance, a face image can be labeled by N landmark points, $\{(x_i, y_i)\}$, which are located on key feature points such as the eyes, nose, mouth, chin, and so forth, and be represented by a 2N-element vector x: $x = (x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N)$
After shape normalization, we apply principal component analysis (PCA) to the sample shape to reduce the dimensionality. Any shape x in the training set can be represented by:
where $\bar{x}$ is the mean shape vector, $P_s$ is a matrix containing the k eigenvectors with the largest eigenvalues, and $b_s$ is a vector representing the weighting of the eigen-components. We can also generate a statistical appearance model to represent texture variations. We warp each training sample to the mean shape to form a texture vector $g_{im}$, which is the texture covered by the mean shape. We then apply PCA to these data, using methods similar to those used to generate the statistical shape model. Any texture g in the training set can be represented as:
where $\bar{g}$ is the mean appearance vector, $P_g$ is the matrix describing the appearance variations learned from the training sets, and $b_g$ is the component vector.
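Both the shape and the texture models are linear PCA models, so one small sketch covers them. The Python code below assumes the usual linear form $x \approx \bar{x} + P_s b_s$ (and likewise $g \approx \bar{g} + P_g b_g$), with shapes already normalized; the function names are hypothetical.

```python
import numpy as np

def build_linear_model(data, k):
    """Fit a PCA model to row vectors (shapes or shape-free textures).

    Returns the mean vector and a (dim, k) matrix whose columns are the
    k principal directions with the largest variance.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T

def encode(x, mean, p):
    """Model parameters b for an example x: b = P^T (x - mean)."""
    return p.T @ (np.asarray(x, dtype=float) - mean)

def decode(b, mean, p):
    """Reconstruct an example from its parameters: x = mean + P b."""
    return mean + p @ b
```

Calling `build_linear_model` once on the landmark vectors and once on the warped textures would give the shape and texture models, respectively.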
Combination of Statistical Shape and Appearance Models
The shape and appearance parameters $b_s$ and $b_g$ can be used to describe the shape and appearance of any example. As there are correlations between the shape and appearance variations of the same person, we combine the two parameter vectors into a single vector:
where $W_s$ is a diagonal matrix of weights that accounts for the difference in units between the shape and texture parameters. We apply PCA to these vectors to get:
where $P_c$ are eigenvectors and c is a vector of appearance parameters controlling both shape and texture of the model.
Here, $\bar{x}$ is the mean shape, $\bar{g}$ is the mean appearance, $Q_s$ is the matrix describing the shape variations, with the shape eigenvectors as its columns, and $Q_g$ is the matrix describing the texture variations, with the texture eigenvectors as its columns. The vector of components c is used to control the shape and texture change.
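For reference, the combined model can be written out in the form commonly used in the active appearance model literature (Cootes, Edwards & Taylor, 2001). This is a sketch under the assumption that the chapter follows that formulation; the symbols $P_{cs}$ and $P_{cg}$, denoting the shape and texture blocks of $P_c$, are introduced here for illustration only.

```latex
b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix}
  = \begin{pmatrix} W_s P_s^{T}(x - \bar{x}) \\ P_g^{T}(g - \bar{g}) \end{pmatrix},
\qquad
b = P_c\, c,
\qquad
P_c = \begin{pmatrix} P_{cs} \\ P_{cg} \end{pmatrix}

x = \bar{x} + Q_s c, \quad Q_s = P_s W_s^{-1} P_{cs},
\qquad
g = \bar{g} + Q_g c, \quad Q_g = P_g P_{cg}
```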
Interpreting Images Using Active Shape and Appearance Models
Interpreting images means we try to find a set of model parameters containing shape, orientation, scale, position, and texture information of this object to generate a sample of this model that can best match the input image. Given a novel image, active shape models are used to interpret the shape information, and active appearance models are used to interpret the texture information.
Combination of Cascade Face Detector with Active Appearance Model Search
The initialization of the active appearance model search is a critical problem since the original AAM search is a local gradient ascent. Some AAM searches that failed due to poor initialization can be seen in Figure 3-1.
Viola and Jones (2001) proposed an image-based face detection system that can achieve remarkably good performance in real time. The main idea of their method is to combine weak classifiers based on simple binary features, which can be computed extremely fast. Simple rectangular Haar-like features are extracted; face and nonface classification is done using a cascade of successively more complex classifiers that discards nonface regions and only sends facelike candidates to the next layer's classifier. Thus, it employs a "coarse-to-fine" strategy (Fleuret & Geman, 2001). Each layer's classifier is trained by the AdaBoost learning algorithm, a boosting algorithm that can fuse many weak classifiers into a single more powerful classifier. Our face detector is based on the Viola-Jones approach using our own training sets, as illustrated by Figure 3-2.
The cascade face detector finds the location of a human face in an input image and provides a good starting point for the subsequent AAM search, which then precisely marks the major facial features such as the mouth, eyes, nose, and so forth.
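The reason rectangular Haar-like features can be computed so quickly is the integral image, which reduces any rectangle sum to four table look-ups. The Python sketch below illustrates this, together with the early-rejection logic of a cascade. It is a simplification: each stage here is a single feature with a threshold, whereas a trained Viola-Jones stage is a boosted combination of many weak classifiers, and all function names are hypothetical.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.cumsum(np.cumsum(np.asarray(gray, dtype=np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))  # padding avoids boundary checks

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four look-ups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

def cascade_accepts(ii, stages):
    """stages: list of (feature_fn, threshold) pairs.

    all() stops at the first failing stage, so most nonface windows
    are rejected after only a few cheap tests.
    """
    return all(feature_fn(ii) >= threshold for feature_fn, threshold in stages)
```

In a full detector, the accepted window would then be handed to the AAM search as its initial position and scale.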
Pose Estimation
Here we follow the method of Cootes, et al. (2000). They assume that the model vector c is related to the viewing angle, θ, approximately by a correlation model:
The estimation of θ is not entirely accurate due to landmark annotation errors or possibly regression learning errors. But this model appears to be quite sufficient to be used to synthesize the face images for recognition.
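In Cootes, et al. (2000), the correlation model is usually written as $c = c_0 + c_c \cos\theta + c_s \sin\theta$, where $c_0$, $c_c$, and $c_s$ are vectors learned from training data. The Python sketch below assumes that form; it fits the three vectors by least squares and then recovers θ from a new parameter vector using the pseudo-inverse of the matrix $(c_c \mid c_s)$. The function names are hypothetical.

```python
import numpy as np

def fit_pose_model(params, angles_deg):
    """Least-squares fit of c = c0 + cc*cos(theta) + cs*sin(theta).

    params:     (n, dim) appearance parameter vectors c
    angles_deg: (n,) known viewing angles in degrees
    """
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    design = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(params, dtype=float), rcond=None)
    c0, cc, cs = coeffs          # each row of coeffs is one model vector
    return c0, cc, cs

def estimate_angle(c, c0, cc, cs):
    """Estimate the viewing angle (degrees) of a parameter vector c."""
    rc = np.column_stack([cc, cs])                    # (dim, 2)
    xa, ya = np.linalg.pinv(rc) @ (np.asarray(c, dtype=float) - c0)
    return float(np.degrees(np.arctan2(ya, xa)))      # (xa, ya) ~ (cos, sin)
```

Because `estimate_angle` uses only the component of c that lies in the span of $c_c$ and $c_s$, identity-specific variation in c does not need to be removed before the angle is estimated.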
Frontal View Synthesis
After we estimate the angle θ, we can use the model to synthesize new views. Here we will synthesize a frontal view face image, which will be used for face recognition.
Let $c_{res}$ be the residual vector, which is not explained by the correlation model:
As we want to synthesize a frontal view face image, α is 0, so this becomes:
The shape and texture at angle 0° can be calculated by:
The new frontal face image then can be reconstructed. A front view synthesized from the frontal correlation model can be seen in Figure 3-3. 
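The synthesis step can be sketched in Python as follows. The sketch assumes the correlation model $c = c_0 + c_c \cos\theta + c_s \sin\theta$ of Cootes, et al. (2000), a residual $c_{res} = c - (c_0 + c_c \cos\theta + c_s \sin\theta)$, and a synthesized vector $c(\alpha) = c_0 + c_c \cos\alpha + c_s \sin\alpha + c_{res}$, which reduces to $c_0 + c_c + c_{res}$ for the frontal view. The function names are hypothetical.

```python
import numpy as np

def synthesize_at_angle(c, theta_deg, alpha_deg, c0, cc, cs):
    """Move parameter vector c from estimated angle theta to target angle alpha."""
    theta, alpha = np.radians([theta_deg, alpha_deg])
    c = np.asarray(c, dtype=float)
    # Residual: the part of c the pose model does not explain (identity etc.)
    c_res = c - (c0 + cc * np.cos(theta) + cs * np.sin(theta))
    return c0 + cc * np.cos(alpha) + cs * np.sin(alpha) + c_res

def shape_and_texture(c, x_mean, g_mean, q_s, q_g):
    """Shape and texture generated by the combined appearance model."""
    return x_mean + q_s @ c, g_mean + q_g @ c
```

Calling `synthesize_at_angle` with `alpha_deg=0` gives the frontal parameters; the texture returned by `shape_and_texture` would then be warped onto the returned shape to reconstruct the image.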
RESULTS FROM EXPERIMENTS
Training of Frontal Face Models
To train the frontal face model, we collected face image samples from 40 individuals for the training set. For each person, we had three images in 
Training of Correlation Model
We then apply our combined AdaBoost-based cascade face detector and AAM search on the rest of the images of the FERET b-series dataset (NIST, 2001), where each person has seven pose angles, ranging from left 25°, 15°, 0°, to right 15°, 25°. Despite Cootes, et al.'s statement that a near-frontal face model can deal with pose change from left 45° to right 45°, we found that the AAM search cannot locate the facial features precisely on most face images with high (40°) pose rotation, even given a very good initialization position, especially for face images with heavy noise. AAM search on the remaining face images, which have smaller pose angle changes, achieves a 95% search accuracy rate. Frontal parameters $c_0$, $c_c$, and $c_s$ are learned from the successful AAM search samples. By using the same method, we get the rotation parameters for the left and right rotation models, respectively.
Synthesis Results from a Frontal Face Image
Given a frontal face image as in Figure 4-4, Figure 4-5 shows the synthesized results from left 45° to right 45°.
Figure 4-5. Synthesized results from left 45° to right 45°
High Pose Angle Face Recognition Results
We trained our APCA face recognition model using the Asian Face Database (I.M. Lab). We selected face images from 46 persons with good AAM search results; each person had seven pose angles ranging from left 25°, 15°, 0°, to right 15°, 25°. We then formed two datasets: one is the original image set and the other is the synthesized frontal image set; each of them contains 322 images. We only registered the frontal view images into the gallery and applied both PCA and APCA on the high pose angle images for testing (276 images on each of the datasets). The overall recognition results from the threefold cross-validated trials are shown in Figure 4-6.
It can be seen from Figure 4-6 that the recognition rates of PCA and APCA on synthesized images are much higher than those on the original high pose angle images. The recognition rate increases by up to 48 percentage points, from 9% to 57%, for images with a view angle of 25°. Even for smaller rotation angles of less than 15°, the accuracy increases by up to 29 percentage points, from 50% to 79%. Note that the recognition performance of APCA is always significantly higher than that of PCA, which is consistent with the results in Chen and Lovell (2004) and Lovell and Chen (2005).
REAL-TIME AUTOMATED FACE RECOGNITION SYSTEMS
To date, the majority of the research work on automated face recognition (AFR) has focused primarily on developing novel algorithms and/or improving the efficiency and accuracy of existing algorithms. As a result, most solutions developed (similar to the examples given in previous sections) are typically high-level software programs targeted for general-purpose processors that are expensive and usually nonreal-time solutions. Since face recognition is typically the first step and frequently a bottleneck in most solutions due to the large search space and computationally intensive operations, it is reasonable to suggest an embedded implementation specifically optimized to detect faces and recognize them. An embedded solution would entail many advantages such as cost and miniaturization, as only a subset of the hardware components are required compared to 
Related Work
There is a vast range of embedded technologies and associated design methodologies that can be employed for the design of an embedded AFR system. The most common technologies are pure hardware, embedded microprocessors, and configurable hardware.
Pure hardware systems are typically based on very large-scale integrated circuit (VLSI) semiconductor technology implemented as application-specific integrated circuits (ASIC). Compared to the other technologies, ASICs have a high operating frequency resulting in better performance, low power consumption, high degree of parallelism, and well-established design tools. However, a large amount of development time is required to optimize and implement the designs. Also, due to the fixed nature of this technology, the resulting solutions are not flexible and cannot be easily changed, resulting in high development costs and risk.
On the other hand, software programs implemented on general purpose processors (GPPs) offer a great deal of flexibility, coupled with very well-established design tools that can automatically optimize the designs; little development time and cost are required, so less risk is assumed. GPPs are ideally suited to applications that are primarily made up of control processing because operations are carried out sequentially (one operation after another). However, they are disadvantaged because minimal or no special instructions are available to assist with data processing (B.D.T. Inc., 2004). Digital signal processors (DSPs) extend GPPs in the direction of increasing parallelism and providing additional support for applications requiring large amounts of data processing. The drawbacks of microprocessors (both GPPs and DSPs) are high power consumption and inferior performance compared to an ASIC. The performance of the final solution is limited by the selected processor.
Finally, configurable platforms such as field programmable gate arrays (FPGAs) combine some of the advantages of both pure hardware and pure software solutions (B.D.T. Inc., 2002); more specifically, the high parallelism and computational speed of hardware and the flexibility and short design cycle of software. By inheriting characteristics from both hardware and software solutions, the design space for FPGAs is naturally extended for better trade-offs between performance and cost. These design trade-offs are far superior to those of pure hardware or software solutions alone (B.D.T. Inc., 2005). From an efficiency point of view, the performance measures for FPGAs (i.e., operating frequency, power consumption, etc.) are generally halfway between the corresponding hardware and software measures.
Theocharides, Link, Vijaykrishnan, Irwin, and Wolf (2004) investigated the implementation of a neural network-based face detection algorithm in 160nm VLSI technology. The authors' motivation in selecting this technology is due largely to its ability to support a compact and high-speed implementation. The face detection algorithm used is that proposed by Rowley, Baluja, and Kanade (1998a), which is selected for the following properties: high detection rate, faces can be in different orientations, and the algorithm has a high degree of parallelism. The last property is the most important one in pure hardware-based implementations.
VLSI-Based Face Detector
The implemented system operates on greyscale images with a dimension of 300 × 300 pixels and exhibits the following operating characteristics: a clock frequency of 409.5 kHz, power consumption of 7.35 W, and an area of 30.4 mm². Though the operating frequency is relatively low, a throughput of 424 images per second was achieved. The system is also able to maintain a reasonably high detection accuracy of 75%, which is comparable to its software counterpart.
As illustrated by Theocharides, et al. (2004), VLSI-based implementations can yield an accurate solution with high throughput and low power consumption. However, as mentioned previously, pure hardware-based systems are extremely inflexible. If any system parameters require changing, for example, the size of the input image, the overall system, including the environment in which the circuit is integrated, may need to be redesigned.
Face Detection Implemented on Configurable Platforms
Several configurable hardware-based implementations exist, including those by McCready (2000), Sadri, et al. (2004), and Paschalakis and Bober (2003). McCready (2000) specifically designed a novel face detection algorithm for the Transmogrifier-2 (TM-2) configurable platform. The Transmogrifier-2 is a multiboard FPGA-based architecture proposed by Lewis, Galloway, Van Ierssel, Rose, and Chow (1998) that is made up of between one and 16 boards. Each prototyping board consists of two Altera Flex 10K100 FPGA chips, four ICube field programmable interconnect device (FPID) chips, and 8 MB of SRAM, and operates at a frequency of 12.5 MHz.
The algorithm was intentionally designed with minimal mathematical operations that could execute in parallel; engineering effort was devoted to reducing the number of multiplications required. The implemented system required nine boards of the TM-2 system and 31,500 logic cells (LCs). The system can process 30 images per second with a detection accuracy of 87%. The hardware implementation is said to be 1,000 times faster than the equivalent software implementation.
On the other hand, Sadri, et al. (2004) implemented the neural network-based algorithm proposed by on the Xilinx Virtex-II Pro XC2VP20 FPGA. Skin color filtering and edge detection are incorporated to reduce the search space. The solution is partitioned such that all regular operations are implemented in hardware, while all irregular control-based operations are implemented on Xilinx's embedded hardcore PowerPC processor. This partitioning allows the advantages of both hardware and software to be exploited simultaneously. The system operates at 200 MHz and can process up to nine images per second. Inadequate information is given regarding the resource usage, but a minimum of 18,000 look-up tables (LUTs) and 2 MB of memory are required.
Finally, Paschalakis and Bober (2003) designed a system with the intention of integrating it within a power- and resource-constrained environment such as 3G mobile phones, where the face can be tracked for mobile conferencing applications. The algorithm is based on skin color modeling, which makes use of the LogRG 2D color space proposed by Forsyth, Saja, et al. (2005). Essentially, detection is implemented by identifying the largest skin region and tracking the change in the centroids of these regions over subsequent frames. Strictly speaking, this is not face detection, but it is adequate for the application for which it is intended. The system is implemented on an Altera Apex20K EP20K1000EBC652-1 device, requiring 8.3% of the total logic elements (LE), 1.4% of the embedded system blocks (ESB), and 700 bytes of memory. The system operates at 33 MHz and can process up to 434 frames per second.
The examples presented illustrate the obvious compromises between accuracy and algorithm robustness vs. the amount of resources required; that is, to improve the performance of the face detection algorithms, one must either increase the embedded design complexity, which generally results in higher power consumption and hardware costs, or settle for an inferior solution.
System Level-Based Design Methodology
Kianzad, et al. (2005) present a framework for designing and exploring different possible face detection implementations on embedded reconfigurable systems. The face detection algorithm employed is that presented by Moon, Chellappa, and Rosenfeld (2002), which is based on using edge information to detect objects in 2D. In order to exploit parallelism, multiple processors are instantiated, and synchronous dataflow diagrams based on synchronization graph models are used to analyze and schedule task execution on each processor. The proposed design was implemented on the Xilinx ML310 development board and is able to process 12 frames per second.
The problem with this proposed design approach is that the model is not easily extendable to other face detection algorithms and hardware platforms. The models created are tightly coupled with the face detection algorithm and hardware architecture used. As a result, in order to consider different design alternatives, several models will need to be created.
DSP-Based Implementations
There are two DSP-based AFR system implementations by Batur, Flinchbaugh, and Hayes (2003) and Wei and Bigdeli (2004) that are reported in the literature.
The implementation by Batur, et al. (2003) consists of four stages: face detection, face feature localization, face normalization, and face recognition. The probabilistic visual learning algorithm proposed by Moghaddam and Pentland (1997) is used for face detection. The system was implemented on the Texas Instruments TMS320C6416 fixed point DSP operating at 500 MHz.
On the other hand, the system implemented by Wei and Bigdeli (2004) is made up of three stages: image normalization, face detection, and face recognition. Where face detection is performed using the algorithm proposed by . The system was implemented on the Analog Devices ADSP-BF535 EZ-KIT Lite development board containing a 16-bit fixed point DSP operating at 300MHz.
Similar optimization techniques were employed by both systems, including converting floating point operations to fixed point, writing time-consuming functions in assembly, exploiting the available parallelism in the DSPs, and using look-up tables in place of complex arithmetic operations. Unfortunately, both implementations still require over a second to process an image, with face detection consuming the majority of the processing time.
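Two of these techniques, fixed point conversion and look-up tables, can be illustrated with a short Python sketch. It assumes a 16-bit Q15 format (15 fractional bits) and uses a 256-entry table for exp(-x) as an example of a replaced arithmetic operation; the format, the table size, and the function names are choices made for illustration and are not taken from either cited system.

```python
import math

Q = 15                # Q15: 1 sign bit, 15 fractional bits
ONE = 1 << Q          # fixed point representation of 1.0 (before saturation)

def to_fixed(x):
    """Quantise a float to a saturated 16-bit Q15 integer."""
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))

def to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / ONE

def fixed_mul(a, b):
    """Q15 x Q15 -> Q15 multiply with rounding and saturation."""
    product = (a * b + (1 << (Q - 1))) >> Q
    return max(-ONE, min(ONE - 1, product))

# Look-up table replacing exp(-x) for x in [0, 8)
LUT_SIZE, X_MAX = 256, 8.0
EXP_LUT = [to_fixed(math.exp(-i * X_MAX / LUT_SIZE)) for i in range(LUT_SIZE)]

def exp_neg_lut(x):
    """Approximate exp(-x) in Q15 by indexing the precomputed table."""
    index = min(LUT_SIZE - 1, max(0, int(x * LUT_SIZE / X_MAX)))
    return EXP_LUT[index]
```

On a fixed point DSP, the multiply maps to a single integer multiply and shift, and the table look-up replaces a costly library call with one memory read, at the price of quantization error that depends on the table size.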
Future Directions
Field programmable gate array technology has gained considerable attention over the last decade as an established platform for embedded systems. With the improvement of design tools, it is a platform that supports easy integration of both hardware and software solutions, allowing their advantages to be easily exploited. Software support takes the form of embedded processors, which provide flexibility, while hardware support is in the form of custom instructions, custom peripherals, or coprocessors, allowing parallel and fast execution. The advantage of this design paradigm is that existing high-level software solutions can be easily ported onto an embedded system, and unlike GPPs or DSPs, hardware support is readily available. Also, this form of hardware-software partitioning provides an easy means of continual design refinement directed toward evolving system specifications and performance improvements.
As most existing AFR algorithms inherently make use of complex operations and are generally not parallel, pure hardware- or software-based implementations are not ideal solutions. Our proposed method will use an embedded softcore processor integrated with custom processor instructions to aid with complex operations relating to the face detection algorithm.
Optimization Using Custom Instructions
Configurable custom processors are becoming an increasingly popular implementation technology for addressing the demands of complex embedded applications. Unlike traditional hardwired processors, which offer a fixed instruction set onto which application code is mapped, configurable processors can be augmented with application-specific instructions, implemented as hardware logic, to remove bottlenecks. This amounts to a form of hardware-software partitioning in which the efficiency of hardware and the flexibility of software are combined.
There are a number of benefits in extending a configurable processor with custom instructions. First, transparency: the added custom instructions improve the performance of the tasks for which they are designed with only minor changes to the original code. Second, rapid development and short time-to-market: a wide variety of off-the-shelf configurable cores can be used as a base for development, and additional instructions can be integrated into the processor core as the need to extend its computational capabilities arises. Finally, low-cost access to domain-specific processors: applications within the same area generally share fundamental characteristics. These characteristics can be captured as a set of instructions and applied to a variety of similar applications, for example, multimedia applications (Bigdeli, Biglari-Abhari, Leung & Wang, 2004).
Unfortunately, there are two minor drawbacks to using custom instructions. First, additional hardware is required, so a bank of resources must be set aside specifically for custom instructions. This is becoming less of an issue as embedded technologies become more economical. Second, because the custom instructions are integrated directly into the processor's pipeline, a poorly designed instruction may degrade the maximum operating frequency.
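The transparency benefit described above can be sketched in C. The example is illustrative only: the chapter does not state which operations were accelerated, so the sum-of-absolute-differences operation, the slot index `SAD4_CI_INDEX`, and the guard macro `HAVE_SAD4_CI` are all hypothetical. The sketch assumes a Nios II GCC toolchain, which provides the `__builtin_custom_inii` built-in for invoking a custom instruction with two integer operands. On any other target the plain C version is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot number assigned to the custom instruction when the
 * processor is generated. */
#define SAD4_CI_INDEX 0

/* Software reference: sum of absolute differences over four 8-bit pixels
 * packed into each 32-bit word. */
static inline uint32_t sad4_sw(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        int32_t pa = (int32_t)((a >> (8 * i)) & 0xFFu);
        int32_t pb = (int32_t)((b >> (8 * i)) & 0xFFu);
        sum += (uint32_t)(pa > pb ? pa - pb : pb - pa);
    }
    return sum;
}

/* The only change to the application code: one wrapper that selects the
 * custom instruction when it is available (assumed build configuration). */
static inline uint32_t sad4(uint32_t a, uint32_t b)
{
#if defined(__NIOS2__) && defined(HAVE_SAD4_CI)
    return (uint32_t)__builtin_custom_inii(SAD4_CI_INDEX, (int)a, (int)b);
#else
    return sad4_sw(a, b);
#endif
}

/* Callers are unchanged whether the hardware or software path is used. */
uint32_t sad_block(const uint32_t *x, const uint32_t *y, size_t nwords)
{
    uint32_t total = 0;
    for (size_t i = 0; i < nwords; ++i)
        total += sad4(x[i], y[i]);
    return total;
}
```

Keeping the software version alongside the wrapper also gives a reference against which the hardware result can be checked during verification.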
Augmenting configurable processor cores with custom instructions is a proven optimization technique that has been applied to a wide range of embedded applications. Published examples include encryption (Groszschaedl, Kumar & Paar, 2004; Juliato, Araujo, Lopez & Dahab, 2005; Tai-Chi, Zeien, Roach & Robinson, 2006), audio encoding (Bower, 2004), embedded real-time operating systems (RTOS) (Oliver, Mohammed, Krishna & Maskell, 2004), biometrics (Aaraj, Ravi, Raghunathan & Jha, 2006; Gupta, Ravi, Raghunathan & Jha, 2005), and multimedia (Tsutsui, Masuzaki, Izumi, Onoye & Nakamura, 2002).
Custom Instruction Design Flow
The design flow for identifying and integrating custom instructions into configurable processors is summarized in Figure 5.1. This is a generic framework that could be applied to any application. First, the software code is profiled to reveal bottlenecks that could be alleviated by introducing custom instructions. The operation with the most significant impact on overall performance is optimized first.
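The profiling step described above can be sketched in C as follows. This is an assumption-laden illustration rather than the authors' tooling: the helper names are hypothetical, timing uses the standard `clock()` function, and the expected gain is estimated with Amdahl's law.

```c
#include <stddef.h>
#include <time.h>

/* Time one processing stage by running it `reps` times; returns the total
 * CPU time in seconds as reported by clock(). */
double time_stage(void (*stage)(void), unsigned reps)
{
    clock_t start = clock();
    for (unsigned i = 0; i < reps; ++i)
        stage();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Index of the stage with the largest measured time, i.e. the first
 * candidate for a custom instruction. Returns 0 when n is 0. */
size_t pick_bottleneck(const double *seconds, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (seconds[i] > seconds[best])
            best = i;
    return best;
}

/* Amdahl's law: overall speedup when a stage that takes `fraction` of the
 * total run time is made `stage_speedup` times faster. */
double overall_speedup(double fraction, double stage_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup);
}
```

For example, if a stage accounts for 75% of the run time and a custom instruction makes it three times faster, `overall_speedup(0.75, 3.0)` gives an overall speedup of 2. This is why the dominant operation is the natural first target.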
There are two approaches to designing the new custom instructions. One method is to reuse IP cores already available in the development suite or third-party libraries, which will further shorten development time. Alternatively, the instructions are designed from scratch. This generally requires greater engineering effort, but the designer has better control over design decisions.
Figure 5.1. Custom instructions design flow
Once the hardware module for the instruction is implemented and tested, it is added to the processor, and the whole system is regenerated. Then the software code is updated to make use of the new instructions. Finally, the functionality of the system is verified to ensure that no bugs have been introduced with the new instruction. This process is repeated until either the performance requirements are met or the resource limits are reached.
Design considerations
As discussed in previous sections, in practice, most cameras, such as those used in CCTV, are positioned so that capturing a frontal image is not possible. Furthermore, in real-world applications, lighting conditions are usually not ideal. To achieve real-time performance, a combination of optimization techniques is applied to improve overall system speed: low-resolution images, fixed-point arithmetic, conversion of key functions to low-level code, and, most importantly, custom instructions.
CONCLUSION AND FUTURE WORK
In this chapter, we described two face recognition algorithms, adaptive principal component analysis (APCA) and rotated adaptive principal component analysis (RAPCA), which are insensitive to illumination and expression variations. We then extended our previous work to multiview face recognition by interpreting facial features and synthesizing realistic frontal face images when given a single novel face image. The experimental results show that after frontal pose synthesis, the recognition rate increases significantly, especially for larger rotation angles.
Furthermore, we examined how an automated face recognition system can be implemented on embedded systems and explored various design approaches. We currently have two prototype systems for real-time automated face recognition. The first prototype was implemented entirely on an Analog Devices Blackfin DSP processor and is capable of verifying a face against a database of 16 faces in under a second. This was done as a replacement for PIN identification on a NOKIA mobile phone. The second prototype was developed using a hardware-software approach on a NIOS II processor with extended instructions. The NIOS II processor was configured on an Altera FPGA.
In our continuing work, we will extend the work described in this chapter to pose angle changes larger than 25°. We will also use the face models and correlation models developed in this chapter to synthesize virtual views under different lighting conditions, facial expressions, and poses when given a frontal face image. These synthesized virtual images can be used as training samples for face recognition algorithms such as support vector machines (SVMs) or neural networks. Thus, we can form a face recognition system using dual algorithms: one is adaptive principal component analysis, and the other is an SVM- or neural network-based algorithm, which would enhance the system performance.
ENDNOTE
1 Ten to 15 minutes to recognize a face from 111 models using a computer of the day in 1996 (Wiskott & von der Malsburg, 1996).
