Abstract-This paper presents an analysis of the integration of neural network hardware into a PC and a solution to the cache coherence problem. The analysis is carried out by counting clock cycles in CPU-only operation and comparing them with mixed CPU-ANN mode. The cache coherence problem is resolved by a hardware-based protocol executed on an additional cache consistency controller.
INTRODUCTION
Typical DRAM operates sequentially, which does not match CPU speed. Attempts to apply Artificial Neural Network (ANN) hardware to the PC have been limited by this sequential characteristic. A few approaches have been proposed to relieve the CPU-to-DRAM bottleneck. There are other extensive works that utilize neural networks as hardware computing units. Typical neural network computation is simulated by sequential programming because conventional sequential computers are more suitable in some respects, such as large and accurate data storage and logic data processing. Besides, conventional computer standards are already developed and well defined. However, simulating a neural network by sequential programming is limited by the memory bandwidth of the conventional PC architecture. This paper is based on a previous work, An Efficient Approach to Engage Neural Net Hardware to PC [1].
This architecture is shown in Figure 1. The previous work is designed to solve the memory bandwidth bottleneck by adding ANN hardware to a multiple-bank SIMM RAM board, where it acts as a co-processor. The ANN hardware can access multiple memory banks simultaneously, in parallel. However, two issues remain unsolved: the model's performance analysis and the cache coherence problem. The following steps are performed for the performance analysis.
The test-mode part of the neural net simulator program is written in C/C++ (no on-board training). The program is compiled into assembly code [6].
The instruction executions are simulated in order to accumulate the number of Pentium clock cycles. The Pentium processor clock cycles are compared between purely sequential execution and mixed sequential-parallel execution. The analysis is conducted to obtain the speedup criteria.
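The test-mode computation that the steps above measure can be sketched as a two-layer forward pass. This is a minimal illustration in Python, not the authors' C/C++ simulator: the sigmoid activation, the function names, and the example weights are all assumptions, since the source names none of them. The helper `mac_count` shows why the number of input, hidden, and output nodes drives the sequential cycle count.

```python
import math

def sigmoid(x):
    # Assumed activation function; the source does not specify one.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """One fully connected layer: one weighted sum per output node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(input_vector, weight1, weight2):
    """Test-mode pass: computation1 (input -> hidden), then computation2 (hidden -> output)."""
    hidden = layer(input_vector, weight1)
    return layer(hidden, weight2)

def mac_count(n_input, n_hidden, n_output):
    """Multiply-accumulate operations needed per input vector when run sequentially."""
    return n_input * n_hidden + n_hidden * n_output

# Hypothetical example: 2 input nodes, 3 hidden nodes, 1 output node, all-zero weights.
weight1 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 hidden nodes x 2 input nodes
weight2 = [[0.0, 0.0, 0.0]]                      # 1 output node x 3 hidden nodes
output = forward([1.0, -1.0], weight1, weight2)
```

In a sequential run every multiply-accumulate costs CPU cycles, so the work per input vector grows with `mac_count`; the ANN hardware is described as reading multiple memory banks in parallel instead.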
The running time of the neural net computation program (in test mode) may be described as follows.
$T_{CPU}$ is the total execution and preparing time in purely sequential execution (without ANN hardware).
$T_{prep}$ is the preparing time for reading input vectors, reading weight1 vectors (for the input layer and hidden layer), and reading weight2 vectors (for the hidden layer and output layer).
$T_{comp1\_seq}$ is the computation time in computation1 (for the input layer and hidden layer), as a sequential operation.
$T_{comp2\_seq}$ is the computation time in computation2 (for the hidden layer and output layer), as a sequential operation. From equations (1) and (2), we obtain.

ANN. This overhead is about 12-26 clock cycles.
1. Four factors affect $T_{CPU}$: the number of input vectors, the number of input nodes, the number of hidden nodes, and the number of output nodes.
2. In mixed sequential-parallel execution ($T_{mix}$), the execution time of the sequential part is the same as that in $T_{CPU}$. The parallel operations ($T_{comp1\_par}$ and $T_{comp2\_par}$) are affected only by the number of input vectors.
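The dependence on these factors can be illustrated with a simple cycle-count model. The sketch below is an assumption-laden illustration, not the paper's equations (which are missing from this text): it assumes the sequential cost is proportional to the number of multiply-accumulate operations, that each parallel layer costs a fixed number of cycles per input vector, and that the preparing time is passed in directly. The names `cycles_per_mac`, `cycles_per_layer`, and `t_prep`, and all numeric values, are hypothetical.

```python
def t_cpu(n_vectors, n_input, n_hidden, n_output, t_prep, cycles_per_mac):
    """Purely sequential execution: preparing time plus both layer computations."""
    macs_per_vector = n_input * n_hidden + n_hidden * n_output
    return t_prep + n_vectors * macs_per_vector * cycles_per_mac

def t_mix(n_vectors, t_prep, cycles_per_layer):
    """Mixed execution: same sequential preparing part, two parallel layer computations."""
    return t_prep + n_vectors * 2 * cycles_per_layer

def speedup(sequential_cycles, mixed_cycles):
    """Ratio of purely sequential cycles to mixed sequential-parallel cycles."""
    return sequential_cycles / mixed_cycles

# Hypothetical numbers chosen only to exercise the model.
seq = t_cpu(n_vectors=100, n_input=10, n_hidden=5, n_output=2,
            t_prep=1000, cycles_per_mac=4)
mix = t_mix(n_vectors=100, t_prep=1000, cycles_per_layer=20)
ratio = speedup(seq, mix)
```

In this model the node counts appear only in `t_cpu`, while `t_mix` grows only with the number of input vectors, which mirrors the two observations above.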
A SOLUTION OF CACHE COHERENCE PROBLEM
RESULTS AND ANALYSIS
1. The sequential part of mixed sequential-parallel execution includes preparing input vectors and weight vectors in the CPU. The numbers of input nodes, hidden nodes, and output nodes do not affect the parallel execution part.
2. Increasing the number of input vectors affects both purely sequential execution and mixed sequential-parallel execution. It results in longer input preparation time and longer computation time. As the number of input vectors increases, purely sequential execution takes relatively longer than mixed sequential-parallel execution.
3. The execution time of purely sequential execution is 3.3 to 24.5 times that of mixed sequential-parallel execution without pipelining between ANN layers, and 4.9 to 26.8 times with pipelining.
4. Execution time with pipelined ANN layers is less than with non-pipelined execution.
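The benefit of pipelining the two ANN layers can be illustrated with the standard two-stage pipeline timing formula. This is a generic sketch under stated assumptions, not the timing of the actual hardware: it assumes each layer takes a fixed number of cycles per input vector and that the second layer can process one vector while the first layer starts the next. The per-layer cycle counts are hypothetical.

```python
def non_pipelined_cycles(n_vectors, t_layer1, t_layer2):
    """Each vector passes through both layers before the next vector starts."""
    return n_vectors * (t_layer1 + t_layer2)

def pipelined_cycles(n_vectors, t_layer1, t_layer2):
    """Two-stage pipeline: after the first vector fills the pipeline,
    one result completes every max(t_layer1, t_layer2) cycles."""
    if n_vectors == 0:
        return 0
    return t_layer1 + t_layer2 + (n_vectors - 1) * max(t_layer1, t_layer2)

# Hypothetical per-layer costs in clock cycles.
plain = non_pipelined_cycles(100, 20, 10)
piped = pipelined_cycles(100, 20, 10)
```

With a single input vector the two cases cost the same; the gap grows with the number of input vectors because the slower layer alone sets the steady-state rate.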
0-7803-7278-6/02/$10.00 ©2002 IEEE
[Figure: $T_{CPU}$, $T_{mix\_np}$, and $T_{mix\_pipe}$]
In the case where the input vectors and weights are in cache memory, the clock cycles can be calculated, and Table 4 shows the invalidation time in the L1 and L2 caches compared with $T_{mix}$ for pipelined and non-pipelined computation. If the number of input vectors increases, $T_{mix}$ also increases. If the number of output addresses increases, the invalidation time also increases.
The cache consistency controller (CCC) provides consistency between cache and DRAM. The drawback of this approach is that when a cache update takes place, the CPU must hold all cache operations, which introduces a relatively short delay.
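The invalidation idea can be sketched as a small software model. This is an illustration only, assuming a word-granular cache keyed by address and a controller that sees every ANN write to DRAM; the class and method names are hypothetical, and the model does not represent the hardware protocol's signals or the CPU hold during a cache update.

```python
class Cache:
    """Toy CPU cache: maps an address to the value last read from DRAM."""
    def __init__(self):
        self.lines = {}

    def read(self, address, dram):
        if address not in self.lines:
            self.lines[address] = dram[address]  # miss: fill from DRAM
        return self.lines[address]

    def invalidate(self, address):
        """Drop the cached copy if present; report whether anything was dropped."""
        if address in self.lines:
            del self.lines[address]
            return True
        return False

class CacheConsistencyController:
    """Toy CCC: invalidates cached addresses that the ANN hardware overwrites."""
    def __init__(self, cache):
        self.cache = cache
        self.invalidations = 0

    def ann_write(self, address, value, dram):
        dram[address] = value
        if self.cache.invalidate(address):  # address match: stale copy removed
            self.invalidations += 1

dram = {0x100: 1, 0x104: 2}
cache = Cache()
ccc = CacheConsistencyController(cache)
first = cache.read(0x100, dram)   # CPU caches the old value
ccc.ann_write(0x100, 9, dram)     # cached address: invalidated
ccc.ann_write(0x104, 7, dram)     # not cached: nothing to invalidate
second = cache.read(0x100, dram)  # miss, refilled with the ANN result
```

Only addresses that are actually cached cost an invalidation, so in this model the invalidation work grows with the number of output addresses the ANN writes.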
CONCLUSIONS
This paper presents a performance analysis of neural net hardware integrated into a PC and a solution to the cache coherence problem.
The performance analysis is obtained by comparing the number of Pentium clock cycles between purely sequential execution and mixed sequential-parallel execution. When an ANN module is integrated into a conventional PC, various factors affect the performance of mixed CPU-ANN computation, but for a relatively large ANN the parallel computation in the ANN module is clearly an advantage.
The cache coherence problem is resolved by an additional cache consistency controller (CCC). This controller notifies the cache controller to invalidate cache memory if the addresses are the same as addresses held in cache memory. The number of input vectors is the main factor affecting parallel computation, and the number of output addresses is the major factor in cache invalidation time.
