The proposed research aims at inexpensive, specialized firmware solutions for smart algorithms designed to assist remote surveillance using a group of mini-UAVs. We identify an application problem, blind sources separation for sub-pixel target identification, that needs to be resolved either to compensate for hardware inefficiencies or to reduce communication and energy consumption, so that the deployment of these mini-UAVs becomes feasible and affordable.
I. Introduction
The US Marine Corps is in need of inexpensive, unmanned, over-the-horizon targeting, surveillance, and reconnaissance. The recently developed mini-UAVs, e.g. Silver Fox and ScanEagle, serve this need very well. These devices have a comparatively low price, which makes them highly expendable. In addition, a group of these devices may be deployed to achieve swarm intelligence, which has been well demonstrated in insect societies. To keep the price low, these devices normally use smart algorithms to compensate for hardware inefficiencies and to reduce redundancies in data transmissions, saving both communication and energy cost. Szu et al. explored smart real-time software that performs blind sources separation at the sub-pixel level.
We choose to design firmware solutions for single-pixel blind sources separation (BSS) using the smart Lagrange Constraint Neural Network (LCNN) algorithm. Independent Component Analysis (ICA) has been one of the popular methods for implementing BSS. It uses statistics of all orders to factorize the joint probability density function (pdf) into independent sources that can reveal the hidden distribution over a new subspace, the feature space. The LCNN methods avoid pixel averaging, which makes sub-pixel restoration feasible. The algorithms are based on the MaxEnt and thermodynamics principles, which do not require the sources to be statistically independent. More importantly, the pixel-by-pixel processing allows easier firmware implementation. It is capable of solving several existing data processing problems onboard mini-UAVs, including video mosaicking, dimensionality reduction of multi-/hyper-spectral images, and jitter restoration.
Our research focuses on the development of inexpensive, low-power, specialized firmware for these smart algorithms. Generalist firmware offers more functionality than necessary, and it delivers that capability at a higher price and with substantially higher energy consumption. We envision that future firmware implementations will be more specialized. In addition, because of their convergence rates and the large volume of data, most smart algorithms are slow processes. Software parallelism can help, but only to a limited extent. Firmware implementation provides not only an optimal environment for parallelism, but also potentially faster, real-time solutions.
II. Pixel-Variant Blind Sources Separation
Information theory provides a constructive criterion for determining probability distributions based on partial knowledge, which leads to a type of statistical inference called maximum entropy (MaxEnt). Jaynes [1] proposed that the MaxEnt distribution has one important property: no possibility is ignored. It is the least biased estimate based on the given partial information. Thermodynamics theory states that an equilibrium system possesses a minimum Helmholtz free energy

$$ H = E - TS \tag{1} $$
where E is the internal energy, T denotes the temperature, H represents the free energy of the system, and the thermodynamic entropy S is the same as the information theory entropy of the probability distribution except for a Boltzmann constant [1]. Based on the MaxEnt and thermodynamics theories, Szu et al. derived a series of methodologies and philosophies called Lagrange Constrained Neural Network (LCNN) [3] [4] [5] [6] [7] to solve the space-variant imaging problem. Unlike the ICA algorithms, which are based on the a-posteriori MaxEnt (H(y)), the LCNN algorithms are based on the a-priori MaxEnt (H(s)). Eq. 1 forms the basis of the contrast function for all the optimization problems of LCNN.
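The minimum free energy principle can be checked numerically on a toy system. The following is a minimal sketch, not the authors' implementation: it assumes a discrete system with three hypothetical state energies, sets the Boltzmann constant to 1, and takes the free energy as the mean energy minus T times the Shannon entropy. Under these assumptions the Boltzmann distribution should give a lower free energy than any other distribution, for example the uniform one.

```python
import math

def shannon_entropy(p):
    """Entropy -sum(p_i * log p_i) in nats; zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def free_energy(p, energies, temperature):
    """Free energy <E> - T*S, with the Boltzmann constant set to 1 (assumption)."""
    mean_energy = sum(pi * ei for pi, ei in zip(p, energies))
    return mean_energy - temperature * shannon_entropy(p)

def boltzmann(energies, temperature):
    """Distribution p_i proportional to exp(-E_i / T)."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]   # hypothetical state energies, for illustration only
T = 1.0                      # hypothetical temperature

p_eq = boltzmann(energies, T)
p_uniform = [1.0 / len(energies)] * len(energies)

f_eq = free_energy(p_eq, energies, T)
f_uniform = free_energy(p_uniform, energies, T)
partition = sum(math.exp(-e / T) for e in energies)
```

Here `f_eq` equals `-T * log(partition)` and is smaller than `f_uniform`, which is the sense in which the equilibrium distribution is both the minimum free energy and the least biased choice for a given mean energy.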
Consider the linear mixing model x = As, where x is the observed signal, each column of A represents a material signature, and s is the abundance source, or the mixing vector. The Shannon entropy is specified by

$$ S = -\sum_{j} s_j \log s_j \tag{2} $$
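To make the mixing model concrete, here is a small sketch under stated assumptions: the 3-band, 2-material signature matrix and the abundance values are hypothetical, and the abundances within one pixel are assumed to sum to 1. It forms one observed pixel x = As and evaluates the standard Shannon entropy of the abundance vector.

```python
import math

def mix(A, s):
    """Linear mixing x = A s, with A given as a list of rows."""
    return [sum(a_ij * s_j for a_ij, s_j in zip(row, s)) for row in A]

def shannon_entropy(s):
    """Entropy -sum(s_j * log s_j) in nats; zero abundances contribute 0."""
    return -sum(s_j * math.log(s_j) for s_j in s if s_j > 0.0)

# Hypothetical signatures: each COLUMN of A is one material, each row one band.
A = [[0.9, 0.2],
     [0.5, 0.6],
     [0.1, 0.8]]

s_mixed = [0.25, 0.75]   # sub-pixel mixture of the two materials
s_pure = [1.0, 0.0]      # pixel covered by the first material only

x_mixed = mix(A, s_mixed)            # observed spectrum of the mixed pixel
h_mixed = shannon_entropy(s_mixed)   # positive for a mixed pixel
h_pure = shannon_entropy(s_pure)     # zero for a pure pixel
```

A pure pixel has zero entropy and an evenly mixed pixel has the maximum, log 2 for two materials, so the entropy measures how uncommitted the abundance estimate is.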
The contrast function defined in the a-priori MaxEnt methodology for the BSS problem can be regarded as a constrained entropy function with Lagrangian constraint multipliers:
where double recursions are required to estimate the two unknowns s and A. After initializing the Lagrangian multipliers λ, ... $a_{ij} s_j$ (4) and then updates λ
The second recursion updates the reflectance matrix according to
It can be seen that the unknown source signature matrix A is estimated with a rank-1 outer product. Because $s = A^{-1}x$ presumes that a unique inverse exists, A must be full rank. In [7], Szu et al. suggested a full-rank Hebbian learning rule based on the Riemannian metric.
This learning rule is similar to the ICA algorithm in Eq. ?? except that there is no need to calculate the post-processing weight matrix W, thus no pixel average is needed. Furthermore, the nonlinear function ϕ() is replaced by the Lagrange multiplier λ. They further proposed a singular value decomposition (SVD) based update of the Lagrange multipliers. The change of the Lagrange multipliers can be determined by the inverse of the Jacobian-like matrix

$$ \Delta\lambda = J^{-1} \Delta x \tag{8} $$

where
To de-couple the system, SVD is applied to matrix J such that
where $\tilde{J}$ is a diagonal matrix, and E is a matrix whose columns are the eigenvectors of matrix J. Then the update of the Lagrange multiplier is given by
The advantage of pixel-based processing is its great potential for parallel implementation, which would compensate for the heavy computational burden associated with any restoration algorithm, especially BSS. However, the challenges of pixel-based processing, including limited a-priori information, no use of neighborhood relations, and unstable convergence, also make algorithm design and development considerably more difficult.
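The decoupled multiplier update described above can be illustrated on the smallest possible case. This is a sketch under explicit assumptions, not the authors' update rule: it assumes a symmetric positive-definite 2x2 Jacobian-like matrix J (for which the SVD coincides with the eigendecomposition), and the function name `decoupled_update_2x2` and the numeric values are hypothetical. It diagonalizes J with a single rotation, solves two independent scalar equations, and rotates back to obtain Δλ satisfying J Δλ = Δx.

```python
import math

def decoupled_update_2x2(J, dx):
    """Return dlam solving J * dlam = dx for a symmetric 2x2 J via diagonalization."""
    (a, b), (b_lower, d) = J
    if abs(b - b_lower) > 1e-12:
        raise ValueError("this sketch assumes a symmetric J")
    # Rotation angle that zeroes the off-diagonal term of E^T J E.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    e1, e2 = (c, s), (-s, c)                        # columns of E: eigenvectors of J
    mu1 = a * c * c + 2.0 * b * c * s + d * s * s   # entries of the diagonal matrix
    mu2 = a * s * s - 2.0 * b * c * s + d * c * c
    if min(abs(mu1), abs(mu2)) < 1e-12:
        raise ValueError("J is numerically singular")
    # Project dx onto the eigenvectors: two independent scalar equations.
    y1 = e1[0] * dx[0] + e1[1] * dx[1]
    y2 = e2[0] * dx[0] + e2[1] * dx[1]
    z1, z2 = y1 / mu1, y2 / mu2
    # Rotate the decoupled solution back to the original coordinates.
    return [e1[0] * z1 + e2[0] * z2, e1[1] * z1 + e2[1] * z2]

J = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical Jacobian-like matrix
dx = [1.0, 2.0]                # hypothetical data mismatch
dlam = decoupled_update_2x2(J, dx)
```

Because each decoupled component is a single division, the per-pixel work after diagonalization is independent across components, which is the property that makes this form attractive for a parallel firmware implementation.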
III. The Design of a Virtual Integration Platform for Smart Algorithms
In the VLSI circuit and system domain, virtual design platforms are becoming more and more popular as a result of market demand for fast design and implementation. We classify existing virtual platforms into two categories: embedded system oriented platforms and VLSI system oriented platforms. The objective of the embedded system platforms [8] [9] [10] is mostly to provide flexible hardware/software (HW/SW) co-design environments that dynamically partition the system architecture. The VLSI system oriented platforms [11] [12] [13] [14], on the other hand, set up re-use based design environments in which system designs targeting FPGA or ASIC based system-on-a-chip (SoC) can be developed and verified very efficiently, thereby reducing the time-to-market. The virtual platforms in this category can be further divided into behavior, register transfer level (RTL) [15], and system levels [16] [17] [18] [19] according to the abstraction used in the synthesis procedure.
The advantages of a virtual platform in design efficiency, flexibility, and re-use are clear. We have proposed a VLSI system oriented virtual platform for smart algorithms that targets SoC platforms with embedded FPGA. The concept of the virtual SoC platform is to provide a design environment in which a set of pre-qualified IPs can be selected using pre-defined metrics and integrated with a pre-defined design architecture. This virtual platform plays an important role in VLSI design methodology, enabling earlier validation of various applications under both regular and faulty conditions.
A. A Signal and Image Processing Virtual Platform
As the foundation of the virtual platform, the design reuse paradigm bridges the gap between design productivity and manufacturing capacity, leading to shorter time to market, higher performance/price ratio, and higher quality. It is also the key to design, improvement, and testing of the SoC implementations. Based on these advantages, the virtual platform supports easy plug and play of IP models in a seamless way to the user.
The virtual platform is composed of three sections: the sensing section, the digital processing section, and the communication section, as demonstrated in Fig. 1. Every section is expandable and contains various virtual components (VCs). All VCs are synthesized and optimized using CAD tools and are ready to be composed into the whole design. The sensing section handles a variety of sensing modalities and analog-digital conversion (ADC) for the digital processing (computing) section. The virtual platform architecture poses a challenge for SoC design integration, where analog and digital circuits coexist on a common substrate with the actual sensing platform.
The communication section includes network and MAC layer protocol designs, which are responsible for sending and receiving signals through RF circuits after digital-analog conversion (DAC).
The computing section, which mainly consists of cores and an IP library, provides multiple pre-processing and image understanding blocks. From previous experience, the IP design of most SoC applications is subject to many constraints originating from application requirements and architecture limitations. For such application-oriented platforms, a promising approach is to drive the IP design at the algorithm level with the system integration constraints. At the same time, both temporal and functional application constraints and system I/O are taken into account. Hence, VCs within the IP library are evaluated in terms of extensive performance parameters including power consumption, circuit delay, frequency, and core area. The user can select appropriate blocks, such as A and B, for specific applications according to application constraints including accuracy requirements, processing speed, and expense. In addition, the virtual platform allows users to add their own processing blocks.
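The selection step described above can be pictured as a filter over the IP library. The following is only an illustrative sketch: the class `VirtualComponent`, the function `select_components`, the component names, and every numeric value are hypothetical and are not taken from the actual library. Each VC is characterized by the four parameters named in the text, and the user's constraints prune the library to the feasible blocks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualComponent:
    """One pre-characterized block in the IP library (all fields hypothetical)."""
    name: str
    power_mw: float       # power consumption
    delay_ns: float       # circuit delay
    max_freq_mhz: float   # maximum operating frequency
    area_mm2: float       # core area

def select_components(library, max_power_mw, max_delay_ns, min_freq_mhz, max_area_mm2):
    """Return the VCs that satisfy every constraint, lowest power first."""
    feasible = [
        vc for vc in library
        if vc.power_mw <= max_power_mw
        and vc.delay_ns <= max_delay_ns
        and vc.max_freq_mhz >= min_freq_mhz
        and vc.area_mm2 <= max_area_mm2
    ]
    return sorted(feasible, key=lambda vc: vc.power_mw)

# Hypothetical library entries; the numbers are placeholders, not measurements.
library = [
    VirtualComponent("geometric_correction", 120.0, 8.0, 100.0, 2.5),
    VirtualComponent("ica_core", 300.0, 12.0, 80.0, 6.0),
    VirtualComponent("lcnn_core", 180.0, 10.0, 90.0, 4.0),
]

chosen = select_components(
    library, max_power_mw=200.0, max_delay_ns=10.0, min_freq_mhz=85.0, max_area_mm2=5.0
)
chosen_names = [vc.name for vc in chosen]
```

A user-supplied block would simply be one more `VirtualComponent` entry appended to `library`, characterized with the same four parameters.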
The advantages of the virtual platform include:
• enabling the integration and simulation of new functional IP blocks in the existing environment.
• allowing these IP blocks to be tested and evaluated on different SoC architectures.
• admitting design re-use at any level of abstraction.
At any time a virtual platform could easily be tested on FPGAs for prototyping, and then converted to custom ASICs.
The virtual platform design methodology relies upon the IP library of smart algorithms, which models the most important features of the commonly used VCs. It is impossible to prepare in advance every design property for all possible functional and structural combinations. The virtual platform therefore has to be limited to a range of applications that share common features and requirements, so that the design cycle, quality, and cost are predictable and comparable. In this work, we target a high-speed, low-energy signal and image processing IP library, for which the pre-processing VC for polynomial-approximation-based geometric correction and the processing VC for ICA were presented in the first quarterly report.
B. FPGA Prototyping Platforms and System-On-Chip (SoC) Implementation
The preliminary designs of smart algorithms come in the form of hardware description language (HDL) code, whereas the integrated circuit for an FPGA or SoC platform comes in the form of a chip. However, the real hardware platform is not available until late in the design process.
To achieve fast, low-cost validation and testing on hardware, FPGA prototyping is the first step to consider in the physical implementation procedure. Although many applications require platforms with reconfigurable hardware, which points to the use of FPGAs, FPGAs are energy-hungry devices and are not suitable as the backbone of battery-powered mini-UAVs. Hence, the FPGA is used only as a prototyping platform before SoC fabrication.
The computational power that applications require is the driving force behind the development of FPGA/ASIC technologies. The recent growth of these technologies has gone far beyond the Moore's law prediction that the number of transistors per integrated circuit doubles every 18 months.
With the fast development of physical, material, and electronics technologies, current FPGAs provide end users with sufficient capacity and powerful computing capability. Take the Pilchard reconfigurable computing platform as an example: it plugs into a memory slot on a SUN workstation, as shown in Fig. 2. The Pilchard platform contains a Xilinx VIRTEX V1000E FPGA that provides 1M equivalent gates for programming. In general, FPGA platforms use PCI or PCMCIA slots to exchange data with memory and communicate with the CPU. However, the data transfer speed can be extremely slow for applications with large data sets. The Pilchard platform instead uses the DIMM RAM slot as an interface and is compatible with the 168-pin, 3.3 V, 133 MHz, 72-bit, registered synchronous DRAM in-line memory modules (SDRAM DIMMs) PC133 standard [20], thereby achieving a very high data transfer rate. Compared to the Pilchard platform, the more powerful Xilinx Virtex II Pro FPGA is embedded with two PowerPC cores and is manufactured in a 0.15 µm 8-layer metal process with 0.12 µm high-speed transistors. It allows users to implement designs on 8M logic gates with a 420 MHz internal clock speed and 840 Mb/s I/O [21].
Figure 3 shows a Xilinx Virtex II Pro based FPGA platform that complies with the PCI-X 133 MHz, PCI 66 MHz, and PCI 33 MHz standards. After the FPGA prototyping validation, we can finalize the design with SoC fabrication of the physical circuit on board the mini-UAVs. Table 1 summarizes and predicts the trend of change in SoC design productivity. It shows that the change of manpower on SoC design depends on the improvement of reuse overhead and the efficiency of the design methodology.
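The gap between the interface standards named above can be estimated with simple arithmetic. The sketch below is a rough illustration, not a measurement of either platform: it computes theoretical peak bandwidth as clock rate times data width, and it assumes bus widths that the text does not state (32-bit for PCI at 33 MHz, 64-bit for PCI at 66 MHz and PCI-X at 133 MHz) as well as 64 data bits out of the 72-bit DIMM width, with the remaining 8 bits assumed to carry error-correction code. Sustained rates in practice are lower.

```python
def peak_bandwidth_mb_per_s(clock_mhz, data_width_bits):
    """Theoretical peak in MB/s, assuming one transfer of data_width_bits per clock."""
    return clock_mhz * data_width_bits / 8.0

# (clock in MHz, data width in bits); widths are assumptions, see the text above.
interfaces = {
    "PCI 33 MHz, 32-bit": (33.0, 32),
    "PCI 66 MHz, 64-bit": (66.0, 64),
    "PCI-X 133 MHz, 64-bit": (133.0, 64),
    "PC133 SDRAM DIMM, 64 data bits": (133.0, 64),
}

peaks = {name: peak_bandwidth_mb_per_s(*spec) for name, spec in interfaces.items()}
dimm_vs_pci = peaks["PC133 SDRAM DIMM, 64 data bits"] / peaks["PCI 33 MHz, 32-bit"]
```

Under these assumptions the DIMM interface offers roughly eight times the peak bandwidth of a basic PCI slot, which is consistent with the motivation for a memory-slot interface when data sets are large.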
IV. Experiments and Results
We use the remote sensing data set (92AV3C9) obtained from a 1992 AVIRIS data set to show the effectiveness of the MaxEnt blind sources separation algorithm. The original data set contains 220 spectral bands, each approximately 10 nm wide and centered at the appropriate wavelength. The 92AV3C9 data consist of 9 spectral bands selected from this original hyperspectral data, as shown in Fig. 4, with the ground truth illustrated in Fig. 5. Note that this test set contains different classes of vegetation, some of which possess features very similar to others. This makes the classification problem more complicated. Moreover, since the total number of classes (16) is larger than the number of spectral bands (9), some classes must be misclassified. The recovered source images are shown in Fig. 6. We observe that spectral band 1 corresponds to the corn class. The woods class is highlighted in bands 2 and 3. Spectral bands 4, 5, 6, and 8 are very similar and cover the classes soybeans-notill, soybeans-min, and soybeans-clean. The stone-steel towers class is shown in spectral band 7. Note that this band also covers part of the background class. We expect that some manmade objects exist in the same areas.
V. Discussion
Besides smart algorithm development for sub-pixel target detection, which concerns the performance of an individual mini-UAV system, an equally important issue is robust surveillance using a network of heterogeneous mini-UAVs. The main concern is how to make these UAVs position themselves to form a so-called mosaic pattern, such that the network can optimally or sub-optimally interact with the environment in terms of providing better sensing coverage, comprehensive and fault-tolerant information, and longer network lifetime. Algorithms inspired by biological systems, specifically the formation of the human and animal color visual systems, are under study.
