The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean of the squared error. The filter is powerful in several respects: it provides estimates of past, present, and future states, and it can do so even when the precise nature of the modeled system is unknown and in the presence of measurement and process noise. Moreover, the Kalman filter for linear estimation is the most complex and precise algorithm used for target tracking. However, running Kalman filter algorithms in software for a multitarget tracking (MTT) radar system results in a very long computation time, which may not be suitable for real-time processing under today's warfare constraints. Consequently, a hardware alternative has to be developed, but it may incur a large area overhead that is unsuitable for area-constrained platforms such as the nodes of a sensor network. In this paper, we break the arrays into their scalar forms and develop a fully pipelined hardware architecture for the radar tracking Kalman filter, with time-division-multiplexed blocks to decrease the silicon area. The proposed architecture contains 6 multipliers, 2 dividers, 9 adders, 5 subtractors, one control unit, and some registers and multiplexers for pipeline and control.
Introduction
Several methods have been applied to radar target tracking in the literature. An overview of these methods can be found in [1]-[5]. Recent research has focused on applications of the Kalman filter in sensor networks, such as location algorithms, the extended Kalman filter for energy-efficient wireless sensor networks, and data fusion in sensor networks [6]-[8]. The Kalman filter is the most complex and precise algorithm for approaching the radar tracking problem. However, the available computational resources are not sufficient for such an accurate and complex algorithm, which requires high speed and a high degree of parallelism. MTT systems should be able to track many targets and perform complex computations to meet today's warfare needs. Software solutions are still not fast enough for this kind of computation despite the huge progress in computer and software research. A trivial way around this limitation is to use simpler algorithms that software approaches can handle, but the tracking accuracy is then much lower and more tracks are lost. A dedicated hardware solution should therefore be designed to meet the system specifications for performance and computational speed. To this end, the authors in [9] developed a fully hardware-type, maximally parallel, FPGA-based Kalman tracking filter coprocessor for a track-while-scan (TWS) radar system. They showed that their implementation outperforms other implementations in terms of performance. However, this came at a cost. Many hardware components had to be deployed in the architecture, resulting in a more complex, area- and power-consuming system. Moreover, it resulted in idle time for certain blocks and wasted power consumption. This was not a problem in the past; nowadays, however, optimizing speed, area, and power is a major design issue.
For instance, to implement a Kalman filter in sensor networks, the designer should take care of the speed, area, and power constraints imposed by the sensor architecture. Therefore, we developed an architecture that achieves better utilization of the hardware components without affecting the computational speed. In other words, some of the blocks are used in a time division multiplex fashion, thus decreasing the required number of these blocks to achieve the same computational task. Moreover, pipelining the design would result in better utilization of the data path and increased computational speed.
The remainder of this paper is organized as follows: Section 2 describes the Kalman filter operation and [...] [9], which will also be examined in our work, is a Kalman tracking filter for TWS (track-while-scan)
$\hat{X}_k^- = A\hat{X}_{k-1} + B u_{k-1}$
where T is the inverse of the radar antenna scan rate, and the states are $X_1(k) = \rho(k)$ for the aircraft range, $X_2(k) = \dot{\rho}(k)$ for the radial velocity, $X_3(k) = \theta(k)$ for the aircraft bearing, and $X_4(k) = \dot{\theta}(k)$ for the bearing rate (or angular velocity). Y(k) is the output of the system. The process noise terms are U1(k), which represents the change in radial velocity over the interval T, and U2(k), which represents the change in bearing rate over the interval T. The measurement noise terms are V1(k) and V2(k). They are all assumed to be white Gaussian noise [2]. Applying the Kalman filter algorithm to the above would result in:
P1(k/k-1) is the a priori error covariance estimate,
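For orientation, the standard discrete Kalman recursion can be written in the k/k-1 notation as follows. This is the textbook form rather than a reproduction of the scalar equations used in the architecture; the symbols H (measurement matrix), Q (process noise covariance), R (measurement noise covariance), and G (gain) are assumptions about naming.

```latex
\begin{aligned}
\hat{X}(k/k-1) &= A\,\hat{X}(k-1/k-1) \\
P(k/k-1)       &= A\,P(k-1/k-1)\,A^{T} + Q \\
G(k)           &= P(k/k-1)\,H^{T}\,\bigl[H\,P(k/k-1)\,H^{T} + R\bigr]^{-1} \\
\hat{X}(k/k)   &= \hat{X}(k/k-1) + G(k)\,\bigl[Y(k) - H\,\hat{X}(k/k-1)\bigr] \\
P(k/k)         &= \bigl[I - G(k)\,H\bigr]\,P(k/k-1)
\end{aligned}
```

The covariance and gain lines do not depend on the measurements Y(k), which is why they can be computed by a separate part of the data path from the state estimate.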
Matlab implementation
To observe the behavior of the Kalman filter on the TWS example, we developed Matlab code that applies the Kalman filter algorithm to the case described in section 2.2. More specifically, we applied the algorithm to estimate the bearing of a missile following a semicircular path with a radius of 250. The measurement noise is assumed to be white Gaussian noise with a power of 15 dBW. 1/T is set to 1/16, the two variance terms are set to 300 and $1.3 \times 10^{-8}$, respectively, and the number of samples is 500. Figure 3 illustrates the noisy measurements and the corresponding estimate provided by the Kalman filter. The error between the expected path and the estimated path is approximately 1.02‰.
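The experiment above can be approximated with a minimal Python sketch. It is not the Matlab code used in the paper. It assumes a two-state model [bearing, bearing rate] with constant rate, a radar located at the center of the semicircle so that the true bearing sweeps linearly from 0 to 180 degrees, bearing measured in degrees, and the value 1.3e-8 used as the process noise variance. The function names, the initial covariance, and the seed are hypothetical choices.

```python
import math
import random


def track_bearing(measurements, T, q, r):
    """Scalar-form Kalman filter for the state [bearing, bearing rate].

    Assumed model (not taken from the paper): x1 <- x1 + T*x2, x2 <- x2 + noise,
    with only the bearing x1 being measured.
    """
    x1, x2 = measurements[0], 0.0      # initial bearing and bearing rate
    p11, p12, p22 = r, 0.0, 1.0        # initial covariance; p21 equals p12 by symmetry
    estimates = []
    for z in measurements:
        # Time update (prediction).
        x1 = x1 + T * x2
        p11 = p11 + 2.0 * T * p12 + T * T * p22
        p12 = p12 + T * p22
        p22 = p22 + q
        # Gain: one shared denominator, two divisions.
        s = p11 + r
        g1, g2 = p11 / s, p12 / s
        # Measurement update of the state.
        innovation = z - x1
        x1 += g1 * innovation
        x2 += g2 * innovation
        # Measurement update of the covariance, P = (I - G H) P.
        p22 = p22 - g2 * p12           # uses the predicted p12
        p12 = (1.0 - g1) * p12
        p11 = (1.0 - g1) * p11
        estimates.append(x1)
    return estimates


def simulate(n=500, T=16.0, q=1.3e-8, noise_dbw=15.0, seed=0):
    """Return (RMS error of raw measurements, RMS error of filtered estimates)."""
    rng = random.Random(seed)
    r = 10.0 ** (noise_dbw / 10.0)     # 15 dBW -> variance of about 31.6
    truth = [180.0 * k / (n - 1) for k in range(n)]
    meas = [b + rng.gauss(0.0, math.sqrt(r)) for b in truth]
    est = track_bearing(meas, T, q, r)

    def rms(values):
        return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, truth)) / n)

    return rms(meas), rms(est)


if __name__ == "__main__":
    rms_meas, rms_est = simulate()
    print(f"RMS error of measurements: {rms_meas:.3f} deg")
    print(f"RMS error of estimates:    {rms_est:.3f} deg")
```

Under these assumptions the filtered bearing should have a smaller RMS error than the raw measurements, since the constant-rate model matches the simulated path. The numbers it prints are not comparable to the error figure reported above.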
Filter architecture
In [9], a fully hardware-type Kalman filter was implemented by breaking down equations (1) [...] deploying it on chip, for instance, on a sensor network node, which is very constrained in area and power.
Proposed architecture
For the reasons discussed in the previous section, the motivation of this work is to use the blocks of the circuit in a time-division-multiplexed fashion, along with pipelining the data path, in order to (1) decrease the number of hardware components in the circuit, (2) achieve higher utilization of the blocks, and (3) increase the speed of the filter. In other words, the design objective of this paper is to implement the Kalman filter in minimum area while solving the real-time target tracking problem with acceptable accuracy.
Architecture overview
As with every newly proposed architecture, decisions and compromises have to be made. First, a compromise had to be made between area and speed when choosing the structure of the complex blocks such as the multiplier and the divider. More specifically, for the divider, the combinational structure (i.e., the array divider [11]) is used because of its modularity and speed, and the non-restoring algorithm is applied with carry-lookahead addition/subtraction. Although the carry-lookahead circuit occupies a large area, it achieves very fast addition compared with other adder circuits. This speed is needed because the divider is combinational and each row has to wait for the carry out of the previous row. Providing this carry out as soon as possible for each row of the architecture is therefore a design concern. Despite this area overhead, the compromise is acceptable because the number of dividers in the circuit is decreased: instead of using 4 dividers based on ripple-carry addition, we use only 2 dividers that are slightly larger in area but much faster than the ripple-carry version. Regarding the multiplier, the combinational structure (i.e., the array multiplier [12]) with the modified Booth algorithm has been applied for the same reasons. Carry-save addition was found to be sufficient to keep up with the divider speed, reducing the area overhead compared with carry-lookahead addition. Moreover, an additional area reduction is achieved by removing the constant dividers (i.e., the /T step), because in most applications the value of T can be 8 or 16, so a simple shift is enough. The shift could be done using barrel shifters; however, they were avoided by providing the inputs to the block already shifted, which saves more area.
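The non-restoring algorithm mentioned above can be illustrated with a bit-level software sketch. This models only the arithmetic, one loop iteration per row of an array divider; it says nothing about the carry-lookahead circuitry or the actual VHDL. The function names and the 12-bit default width are assumptions for illustration.

```python
def nonrestoring_divide(dividend, divisor, bits=12):
    """Unsigned non-restoring division; returns (quotient, remainder).

    Each loop iteration corresponds to one row of an array divider:
    shift in the next dividend bit, then subtract or add the divisor
    depending on the sign of the current partial remainder.
    """
    if divisor <= 0 or divisor >= (1 << bits):
        raise ValueError("divisor must be in 1 .. 2**bits - 1")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend must fit in the given width")
    mask = (1 << bits) - 1
    remainder = 0          # signed partial remainder
    quotient = dividend    # dividend bits are shifted out, quotient bits shifted in
    for _ in range(bits):
        msb = (quotient >> (bits - 1)) & 1
        was_negative = remainder < 0
        remainder = 2 * remainder + msb
        quotient = (quotient << 1) & mask
        # No restore step: a negative remainder is corrected by adding next time.
        remainder = remainder + divisor if was_negative else remainder - divisor
        if remainder >= 0:
            quotient |= 1
    if remainder < 0:      # single final correction
        remainder += divisor
    return quotient, remainder


def divide_by_power_of_two(value, shift):
    """Division by T = 2**shift (e.g. T = 8 or 16) done as a right shift."""
    return value >> shift


if __name__ == "__main__":
    print(nonrestoring_divide(100, 7))      # quotient and remainder of 100 / 7
    print(divide_by_power_of_two(1000, 4))  # 1000 / 16, truncated
```

Because every row performs exactly one addition or subtraction and never restores, the row delay is set by the adder's carry path, which is the reason a fast carry circuit matters in a combinational array.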
Second, to increase the utilization of the complex blocks such as the divider and the multiplier, we pipeline the internal structure of each of them, which raises both throughput and utilization. For instance, suppose that the divider takes T ns to complete one division. When it is pipelined into 2 stages, it can approach a throughput of 2 divisions per T ns. Again, a compromise had to be made between the number of internal stages and the recursive nature of the Kalman filter. As shown in the Kalman algorithm equations, in order to process new data, the system has to wait for the algorithm to finish processing the previous iteration, because its result is used to calculate the new data set. Consequently, no more than 2 data sets are available for the divider at time t. For this reason, 2 stages appear to be a good choice that also satisfies the area constraints. Another reason for this internal pipeline is to make all the stages of the architecture equal in latency. Pipelining the whole data path is a major change with respect to the design proposed in [9]. By pipelining, we divided the architecture into equal-latency stages, thus increasing the frequency of the system. The proposed architecture is discussed in the next section.
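The effect of the 2-stage split can be checked with a small timing model. It is idealized: the stages are assumed to have exactly equal latency and pipeline register overhead is ignored, neither of which is stated in the text. The function name and the 100 ns figure are hypothetical.

```python
def pipelined_time(n_ops, total_latency_ns, stages):
    """Time to finish n_ops back-to-back operations in an ideal pipeline.

    Assumes equal-latency stages and no register overhead (idealized model).
    """
    stage_latency = total_latency_ns / stages
    # The first result needs all stages; each further result needs one more stage time.
    return (n_ops + stages - 1) * stage_latency


if __name__ == "__main__":
    T = 100.0  # hypothetical latency of one unpipelined division, in ns
    print(pipelined_time(2, T, stages=1))  # two divisions, no pipelining: 2T
    print(pipelined_time(2, T, stages=2))  # two divisions, 2 stages: 1.5T
```

In this model two consecutive divisions finish in 1.5 T instead of 2 T, and in steady state one result emerges every T/2. With at most 2 data sets available at a time, adding stages beyond 2 would leave them empty.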
Proposed architecture block diagrams
After clarifying the new ideas proposed in this paper, along with the decisions and compromises taken into consideration, we built the architecture for the target tracking Kalman filter coprocessor in VHDL to test its functionality. The block diagram responsible for calculating P11, P12, P21, P22, P33, P34, P43, P44, G11, G21, G32, and G42 is shown in figure 3. The shaded blocks represent a pipelined divider or multiplier. In the first iteration, initial values have to be provided to the system, depending on the application. Note that the output is fed back to the input because the algorithm is recursive.
The adder circuit is shown in figure 4. The adders used are ripple-carry adders because they occupy a small area. Although they are slow, this is not important because the divider and multiplier stages are slower. Therefore, the speed of the ripple-carry adders is sufficient while satisfying our area minimization goal.
The second block diagram of the filter is responsible for estimating the states of the system, given the noisy measurements and the data sets from figure 3. Figure 5 depicts the details. A control unit responsible for controlling the data flow in the data path is designed and implemented in VHDL. The proposed architecture contains 6 multipliers, 2 dividers, 9 adders, 5 subtractors, and 4 shifters, which is exactly half the number of components used in the architecture proposed in [9]. Simulation results and conclusions are presented in the next section.
Fig. 4 Adder circuit
Results
The first step in testing the proposed architecture was compiling the design using Xilinx ISE 7.1 and choosing the Virtex XCV800-HQ240 family. We simulated the design using 12-bit integer numbers. 9 adders, 5 [...] [9]. A comparison table between [9] and our work is provided below. [...] figure 6. The error between the exact expected circle and the hardware results is approximately 4.9‰. This error can be further reduced by increasing the width of the bus, which is straightforward. The second example is a straight-line path. The results are shown in figure 7. Again, the loss in accuracy was found to be approximately 4.9‰ compared with the exact straight-line path.
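The dependence of accuracy on bus width can be illustrated with a small quantization sketch. It covers only the rounding of a single value to an n-bit unsigned integer, not the accumulated error of the full filter. The full-scale value of 250 and the function names are hypothetical.

```python
def quantize(value, full_scale, bits):
    """Round value to the nearest level of an n-bit unsigned integer grid."""
    levels = (1 << bits) - 1
    code = round(value / full_scale * levels)   # integer code on the bus
    return code * full_scale / levels           # value the code represents


def worst_case_error(bits, full_scale=250.0, samples=2000):
    """Largest rounding error over a sweep of values, relative to full scale."""
    worst = 0.0
    for i in range(samples + 1):
        value = full_scale * i / samples
        worst = max(worst, abs(quantize(value, full_scale, bits) - value))
    return worst / full_scale


if __name__ == "__main__":
    for width in (8, 12, 16):
        print(width, worst_case_error(width))
```

The rounding error is bounded by half of one quantization step, so each additional bit roughly halves this contribution. The error observed in hardware also includes rounding inside the multipliers and dividers, which this sketch does not model.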
Conclusion
The Kalman filter is a powerful tool for tracking moving targets. Specifically, MTT systems, such as TWS radar operations, require this type of filter to meet the speed and accuracy constraints of warfare. Several approaches have been implemented to achieve better performance for the Kalman operation. In [9], a parallel hardware architecture was proposed and implemented on an FPGA. However, many hardware components had to be deployed in the circuit, which is not suitable for today's area constraints, especially if the Kalman filter needs to be placed on chip, for example, on a sensor network node. Therefore, we implemented a new architecture that applies pipelining techniques and uses some components in a time-division fashion in order to reduce their number as much as possible while enhancing the performance of the system. This was achieved using 9 adders, 5 subtractors, 4 shifters, 2 dividers, and 6 multipliers. The worst-case computation time was found to be 293 ns versus the 1.8247 µs achieved by [9]. The error between the expected path and the hardware-generated path was found to be 4.9‰. On the other hand, the error between the Matlab-generated path using the floating-point representation and the hardware integer-mapped path was found to be 8.3%. Given the area and speed achieved by our implementation, this Kalman filter can be applied to tracking operations, providing a very accurate estimate in a short time while using less area. Hence, deploying the filter in a sensor network can be made feasible in the future. Moreover, future work may include a power analysis of the proposed design to prove that it is more power efficient than other implementations. However, this is beyond the scope of this paper.
Acknowledgment
