This letter presents a pipelined circuit that calculates the linear regression. The proposed circuit processes a continuous flow of data, needs no memory to store the input samples, and supports a variable regression length that can be reconfigured at run time. The circuit is also area-efficient, as it consists of a small number of adders, multipliers and dividers. These features make it well suited to real-time applications and to calculating the linear regression of a large number of samples.
The Linear Regression: The linear regression [1] is used to determine the relation between a dependent variable, $Y$, and an independent variable, $X$, based on a set of $N$ pairs of samples, $(X_i, Y_i)$, where $i = 1, \ldots, N$. The variables are assumed to be related by a line
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad (1)$$
where $\epsilon_i$ is the error of the $i$th sample.
The line that best fits the data is calculated by minimizing the mean square error (MSE). This provides $b_0$ and $b_1$, which are the estimators of $\beta_0$ and $\beta_1$ for the y-intercept and the slope of the line, respectively:
where the sums are defined for the interval $i = 1, \ldots, N$. Finally, the mean square error of the linear regression is calculated as
As a result, (2), (3) and (4) provide the values of the three parameters of interest, $b_1$, $b_0$ and MSE, respectively.
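For reference, the textbook closed-form least-squares expressions for these three quantities can be written in terms of running sums as follows. This is the standard form, not necessarily the arrangement of terms used in (2)-(4), and the $1/N$ normalization of the MSE is an assumption.

```latex
\begin{align*}
b_1 &= \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N \sum X_i^2 - \left(\sum X_i\right)^2}, \\
b_0 &= \frac{\sum Y_i - b_1 \sum X_i}{N}
     = \frac{\sum Y_i \sum X_i^2 - \sum X_i \sum X_i Y_i}{N \sum X_i^2 - \left(\sum X_i\right)^2}, \\
\mathrm{MSE} &= \frac{1}{N} \sum \left(Y_i - b_0 - b_1 X_i\right)^2
     = \frac{1}{N} \left( \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i \right).
\end{align*}
```

In this form, all three results depend on the samples only through five sums ($\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum X_i Y_i$, $\sum Y_i^2$) and $N$.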
Proposed Architecture: The continuous-flow variable-length memoryless linear regression architecture is shown in Fig. 1. The circuit is divided into three blocks that calculate the accumulations, the main computations and the divisions, respectively. The first block, in Fig. 1(a), calculates the summations
This block needs only five registers, which are the only storage elements in the architecture. This is very little storage compared with memory-based architectures, which need to store all the input samples. These savings in memory are especially significant when calculating the linear regression on large amounts of data. The second block, in Fig. 1(b), calculates the main operations
For this block, the architecture admits two options: fully pipelined and time-multiplexed. The fully pipelined architecture is the direct implementation of the operations in Fig. 1(b). The multiplication by 2 in Fig. 1(b) is carried out by the 1-bit shift represented by (<< 1). This shift is hard-wired and therefore does not need any hardware. Furthermore, the adders and multipliers are shared among different computations. For instance, the term $NC - A^2$ is reused to calculate $H$, $I$ and $K$. This reduces the number of adders and multipliers in the circuit.
The time-multiplexed architecture takes into account the fact that the main operations in Fig. 1(b) need to be calculated only once, just after the first stage of accumulators has processed the $N$ input data. Therefore, the operations in Fig. 1(b) can be multiplexed in time. By doing this, only one adder and one multiplier are needed, at the expense of a few extra registers. Table 1 shows the register allocation procedure. By writing partial results sequentially in these registers, the output results can be provided in two iterations. Note that, because the registers are written in order, the second iteration only overwrites registers whose values are no longer needed.
The third block, in Fig. 1(c), calculates the divisions to obtain the parameters of the linear regression. The fully pipelined architecture uses the three dividers shown in Fig. 1(c), whereas the time-multiplexed approach uses only one divider. Furthermore, in applications where the outputs of the regression are compared to a threshold, such as [5], these dividers can be substituted by constant multipliers. This simplifies the hardware. For instance, in order to check whether the error is larger than a threshold value $TH_{MSE}$, the
Note that the former requires a divider and the latter only needs a constant multiplier.
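The substitution of a divider by a multiplier described above can be sketched in Python. This is a minimal illustration rather than the circuit of Fig. 1(c): the function name, the argument names and the expression of the MSE as a ratio of sums are assumptions derived from the textbook least-squares formulas, with the MSE normalized by $N$.

```python
def mse_exceeds_threshold(n, sx, sy, sxx, sxy, syy, threshold):
    """Check MSE > threshold without performing a division.

    Hypothetical helper. The arguments are the number of samples and the
    running sums of X, Y, X^2, X*Y and Y^2. With MSE = num / (n * den),
    the test MSE > threshold is equivalent to num > threshold * n * den
    whenever n * den > 0.
    """
    den = n * sxx - sx * sx          # shared denominator term
    if den <= 0:
        raise ValueError("all X values are equal; the slope is undefined")
    # Numerator of the sum of squared errors, expressed with the sums only.
    num = syy * den - sy * sy * sxx + 2 * sx * sy * sxy - n * sxy * sxy
    # Division-free comparison: one multiplication by the threshold instead.
    return num > threshold * n * den
```

For example, the samples (0, 0), (1, 1), (2, 1), (3, 2) give the sums 6, 4, 14, 9 and 6, so the MSE is 0.05 and `mse_exceeds_threshold(4, 6, 4, 14, 9, 6, 0.04)` returns `True`, whereas a threshold of 0.06 returns `False`.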
The total numbers of components for the fully pipelined and time-multiplexed architectures are summarized in Table 2. As explained above, the small number of components is achieved by sharing terms in the calculations and reusing components. Furthermore, the number of components is fixed for any $N$. This is a significant advantage for large $N$, where other designs need large memories.
The latency also benefits from the proposed design. The results are provided a short time after the last sample has been collected. Unlike in memory-based approaches, this latency is independent of $N$. This can be observed in Fig. 1, where the time to calculate the main operations and the divisions does not depend on $N$ and is constant once the registers in the accumulator block have been updated with the last sample.
The circuit can be used for any length of the regression, $N$, and the length can be reconfigured at run time. The reason is that the circuit provides incremental results of the regression. Therefore, the parameters of the regression are obtained simply by collecting the values at the outputs when $N$ samples have arrived. The calculations are restarted by resetting the registers of the accumulators in Fig. 1(a).
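The incremental behaviour described above can be modelled in software with a short Python sketch. It is a behavioural illustration under stated assumptions, not the hardware of Fig. 1: the class and attribute names are hypothetical, the samples are taken to be integers (as in fixed-point hardware), and the output formulas are the textbook least-squares expressions with the MSE normalized by $N$.

```python
from fractions import Fraction


class StreamingRegression:
    """Memoryless linear regression model: five running sums plus a counter."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Restarting a regression only clears the accumulators.
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = self.syy = 0

    def push(self, x, y):
        # One sample per call, mirroring one sample per clock cycle.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def results(self):
        # Can be read after any number of samples, so N is chosen at run time.
        den = self.n * self.sxx - self.sx * self.sx
        if den == 0:
            raise ValueError("need at least two distinct X values")
        b1 = Fraction(self.n * self.sxy - self.sx * self.sy, den)
        b0 = Fraction(self.sy * self.sxx - self.sx * self.sxy, den)
        num = (self.syy * den - self.sy * self.sy * self.sxx
               + 2 * self.sx * self.sy * self.sxy - self.n * self.sxy * self.sxy)
        mse = Fraction(num, self.n * den)
        return b0, b1, mse
```

Pushing the samples (0, 0), (1, 1), (2, 1), (3, 2) and then calling `results()` yields $b_0 = 1/10$, $b_1 = 3/5$ and an MSE of $1/20$. No sample is stored at any point, and calling `reset()` starts a new regression of any length.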
Finally, the circuit can process a continuous flow of data at a rate of one sample per clock cycle, which allows for high throughput. In continuous flow, the time-multiplexed version has the limitation that the computations of a regression cannot start before those of the previous regression have finished. This sets a lower limit on the number of samples of the regression. However, if $N$ is large enough, the accumulation stage can process new samples while stages two and three finish the calculations on the previous regression. This guarantees continuous flow. As a result, the fully pipelined architecture is more suitable when the number of samples is small, and the time-multiplexed architecture is preferable for large $N$, as it reduces the hardware components and still guarantees continuous flow.
Conclusion: A circuit for calculating the linear regression is proposed in this letter. The circuit supports continuous flow and variable length, which can be configured at run time. The circuit uses few hardware components and removes the need for a memory to store the samples. The circuit is suitable for calculating the linear regression in real time, especially when the number of samples is large.
