An artificial neural network algorithm is implemented on field programmable gate array (FPGA) hardware. A feed-forward neural network with one hidden layer is used to discriminate one class of patterns from another in real time. With five 8-bit input patterns, six hidden nodes, and one 8-bit output, the implemented hardware neural network makes a decision on a set of input patterns in 11 clocks, and the result is identical to what is expected from off-line computation.
Introduction
Artificial neural networks (NN) are widely used for pattern recognition problems in particle physics experiments. Typical applications include particle recognition in tracking systems [1], event classification in physics analyses [2, 3, 4], and hardware triggers [5, 6]. For hardware triggers, realization with standard electronics [7], with a dedicated NN chip [8], and recently with field programmable gate arrays (FPGA) [6] has been studied.
In particular, recent advances in digital technology may make it possible to consider transferring complex level 2 NN-based pattern recognition tasks to the level 1 trigger using FPGA technology.
In this work, a hardware implementation of a feed-forward NN using FPGA technology is developed. First, the NN is trained in an offline computing environment in order to determine the weights and thresholds of the network.
As an intermediate step, a standalone C++ program is then written to discretize the NN computation, for a bit-by-bit comparison with the response from the hardware. The hardware implementation is then carried out by programming an FPGA. The performance of the implemented hardware NN and its possible application to the first level trigger in high energy physics experiments are discussed at the end.
Network Architecture
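A feed-forward neural network feature function $F_i(x_1, x_2, \ldots, x_N)$ [9] may be represented by the following formula:

$$ F_i(x_1, x_2, \ldots, x_N) = g\left[ \frac{1}{T} \sum_{j} \omega_{ij} \, g\left( \frac{1}{T} \sum_{k=1}^{N} \omega_{jk} x_k + \theta_j \right) + \theta_i \right] \tag{1} $$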
where the weights $\omega_{ij}$ and thresholds $\theta_j$ are parameters to be fitted to the entire set of input patterns $\{x_i\}$. Eq. (1) represents a structure with one hidden layer, where the first summation is over hidden nodes and the second is over input nodes. In Eq. (1), $T$ is a "temperature" term that rescales the sum, $N$ denotes the number of input nodes of the network, and $g(x)$ is the activation function of the neurons. The following non-linear function is frequently used [9] to model the activation of the neuron:

$$ g(x) = \frac{1}{1 + e^{-2x}} \tag{2} $$

In this study, a feed-forward neural network with 5 input nodes, one hidden layer of 6 nodes, and an output layer of one node is constructed (referred to as the 5-6-1 structure from now on). To establish a baseline NN performance, two sets of 5-variable Gaussian random numbers are generated, one referred to as "signal" and the other as "background". The NN is trained using the JETNET [10] program. In total, 5,000 patterns are used for the training, over 3,000 cycles. The inverse temperature term is set to 1.0, and the back-propagation learning rule [11] is used. Although this learning rule is rather complicated, training on 5,000 input patterns takes less than one minute on a modern personal computer. After the training, a total of 36 weights and 7 threshold values are saved in order to calculate the NN output for given patterns.

The standalone C++ program will be a strong debugging tool when one writes the hardware description language code to program the hardware.
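The real-valued computation of Eq. (1) and Eq. (2) for the 5-6-1 structure can be sketched as follows. This is a minimal illustration, not the JETNET code or the authors' program: the `Network` struct, its field names, and the `forward` function are hypothetical, and the thresholds are assumed to be added after the sum is rescaled by $1/T$, following the expression $\frac{1}{T} \sum_k \omega_{jk} x_k + \theta_j$ used in the text.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Sizes of the 5-6-1 network described in the text.
constexpr int kInputs = 5;
constexpr int kHidden = 6;

// Hypothetical container for the trained parameters (names are illustrative).
// It holds 30 + 6 = 36 weights and 6 + 1 = 7 thresholds.
struct Network {
    std::array<std::array<double, kInputs>, kHidden> w_hidden;  // omega_jk
    std::array<double, kHidden> theta_hidden;                   // theta_j
    std::array<double, kHidden> w_out;                          // omega_ij
    double theta_out;                                           // theta_i
    double temperature;                                         // T
};

// Activation function of Eq. (2): g(x) = 1 / (1 + exp(-2x)).
inline double activation(double x) {
    return 1.0 / (1.0 + std::exp(-2.0 * x));
}

// Forward pass of Eq. (1) for a single input pattern.
// Assumption: each threshold is added after the 1/T rescaling of the sum.
inline double forward(const Network& net, const std::array<double, kInputs>& x) {
    assert(net.temperature > 0.0);
    double outer = 0.0;
    for (int j = 0; j < kHidden; ++j) {
        double inner = 0.0;
        for (int k = 0; k < kInputs; ++k) {
            inner += net.w_hidden[j][k] * x[k];
        }
        const double h = activation(inner / net.temperature + net.theta_hidden[j]);
        outer += net.w_out[j] * h;
    }
    return activation(outer / net.temperature + net.theta_out);
}
```

With all weights and thresholds set to zero, every hidden node evaluates to g(0) = 0.5 and so does the output, which makes a convenient sanity check before loading trained parameters.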
First, the real-valued input patterns, weights, and thresholds are converted into 8-bit integer values. The 8-bit width is chosen so that the firmware design is suitable for the moderate-performance FPGA chips available on the market. The standalone C++ program mentioned above reads in the integer-valued input patterns and performs the neural computation purely in integer-valued space with pre-stored weights and thresholds. The activation function in Eq. (2) is replaced with an 8-bit wide integer lookup table, which gives faster access to the activation function in the hardware implementation described later. The NN output is also rescaled to be bounded within $[0, 2^8]$. However, the internal networks storing the values of $\frac{1}{T} \sum_k \omega_{jk} x_k + \theta_j$ have a bit width of 32, and therefore little information is lost in storing values in the internal networks.
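The integer-valued computation described above can be sketched as follows. The text does not specify the fixed-point encoding, so everything about the encoding here is an assumption made for illustration: a signed 8-bit representation with scale factor `kScale` (real value = integer / 32), a 256-entry lookup table addressed by the rescaled sum plus an offset of 128, an output range of 0 to 255, and $T = 1$. The names `IntNetwork`, `quantize`, `make_lut`, `lookup`, and `forward_int` are hypothetical. The 32-bit accumulators and the lookup table replacing Eq. (2) follow the text.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr int kInputs = 5;
constexpr int kHidden = 6;
constexpr int kScale = 32;     // assumed fixed-point scale: real value = integer / 32
constexpr int kLutSize = 256;  // 8-bit address space of the activation table

// Integer parameters, each within the signed 8-bit range but held in 32-bit words.
struct IntNetwork {
    std::array<std::array<std::int32_t, kInputs>, kHidden> w_hidden;
    std::array<std::int32_t, kHidden> theta_hidden;
    std::array<std::int32_t, kHidden> w_out;
    std::int32_t theta_out;
};

// Round a real value to signed 8-bit fixed point, saturating at the limits.
inline std::int32_t quantize(double v) {
    long q = std::lround(v * kScale);
    if (q > 127) q = 127;
    if (q < -128) q = -128;
    return static_cast<std::int32_t>(q);
}

// Tabulate Eq. (2): address a represents x = (a - 128) / kScale,
// and the stored value is g(x) scaled to the range 0..255.
inline std::array<std::uint8_t, kLutSize> make_lut() {
    std::array<std::uint8_t, kLutSize> lut{};
    for (int a = 0; a < kLutSize; ++a) {
        const double x = static_cast<double>(a - 128) / kScale;
        const double g = 1.0 / (1.0 + std::exp(-2.0 * x));
        lut[static_cast<std::size_t>(a)] = static_cast<std::uint8_t>(std::lround(g * 255.0));
    }
    return lut;
}

// Table access for an argument given in units of 1/kScale, clamped to the table range.
inline std::uint8_t lookup(const std::array<std::uint8_t, kLutSize>& lut, std::int32_t x_scaled) {
    std::int32_t addr = x_scaled + 128;
    if (addr < 0) addr = 0;
    if (addr > kLutSize - 1) addr = kLutSize - 1;
    return lut[static_cast<std::size_t>(addr)];
}

// Integer-only forward pass with 32-bit accumulators (T = 1 assumed).
inline std::uint8_t forward_int(const IntNetwork& net,
                                const std::array<std::int32_t, kInputs>& x,
                                const std::array<std::uint8_t, kLutSize>& lut) {
    std::int32_t outer = net.theta_out * 255;  // threshold brought to scale kScale * 255
    for (int j = 0; j < kHidden; ++j) {
        std::int32_t inner = net.theta_hidden[j] * kScale;  // scale kScale * kScale
        for (int k = 0; k < kInputs; ++k) {
            inner += net.w_hidden[j][k] * x[k];
        }
        const std::uint8_t h = lookup(lut, inner / kScale);  // truncating division
        outer += net.w_out[j] * h;
    }
    return lookup(lut, outer / 255);
}
```

Only the table address and the stored activation values are 8 bits wide in this sketch; the sums stay in 32-bit words until they are rescaled, so round-off enters at the quantization of the parameters and at the table access rather than during accumulation.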
With these conversions to integer-based calculations, the performance of the NN output is degraded, because the conversion from real to integer numbers is a round-off process and it is natural to lose information in such a process. Figure 2 (a) and (b) show these effects in detail. The output of the integer-valued NN algorithm is shown in Fig. 2 (a). Its discrimination power can be compared with Fig. 1 (b), where the real-valued NN output is plotted. One can easily see that the discrimination is significantly weaker in Fig. 2 (a). The correlation between the real-valued and integer-valued neural computation outcomes is shown in Fig. 2 (b). There is a strong positive correlation between the two, indicating that the integer-valued version of the NN performs well, but there are also cases where the resolution is smeared toward background-like patterns relative to the prediction of the real-valued version. We attribute the degradation at this stage purely to the effect of round-off.
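The size of the round-off discussed above depends on the number of bits. As a rough illustration, the sketch below quantizes a value in [0, 1] to a given bit width and maps it back, so that the worst-case error of half a quantization step can be compared for different widths, for instance 8 and 10 bits. The functions `quantize_roundtrip` and `max_roundoff` are hypothetical helpers, and uniform quantization over [0, 1] is an assumption; the text does not state the scheme actually used.

```cpp
#include <cassert>
#include <cmath>

// Map v in [0, 1] to the nearest of 2^bits uniformly spaced levels
// and convert the result back to a real value.
inline double quantize_roundtrip(double v, int bits) {
    assert(bits >= 1 && bits <= 30);
    assert(v >= 0.0 && v <= 1.0);
    const double top = static_cast<double>((1L << bits) - 1);  // highest integer code
    return std::round(v * top) / top;
}

// Largest possible round-off error for the given bit width: half of one step.
inline double max_roundoff(int bits) {
    assert(bits >= 1 && bits <= 30);
    return 0.5 / static_cast<double>((1L << bits) - 1);
}
```

Under this assumption the worst-case error is 0.5/255 for 8 bits and 0.5/1023 for 10 bits, roughly a factor of four smaller.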
In order to study explicitly the effect of the number of bits used in the discretization of the computation, the integer-valued NN is also implemented with 10-bit resolution. Figure.
Table 1: Signal-to-background ratios for the output of JETNET (real-valued), C++ versions
