Abstract. Hierarchical convolutional neural networks are a well-known robust image-recognition model. In order to apply this model to robot vision or various intelligent vision systems, its VLSI implementation with high performance and low power consumption is required. This paper proposes a convolutional network VLSI architecture using a hybrid approach composed of pulse-width modulation (PWM) and digital circuits. We call this approach merged/mixed analog-digital architecture. The VLSI includes PWM neuron circuits, PWM/digital converters, digital adder-subtracters, and digital memory. We have designed and fabricated a VLSI chip by using a 0.35 μm CMOS process. The VLSI chip can perform 6-bit precision convolution calculations for an image of 100×100 pixels with a receptive field area of up to 20×20 pixels within 5 ms, which means a performance of 2 GOPS. Power consumption of PWM neuron circuits is estimated to be 20 mW. We have verified successful operations using a fabricated VLSI chip.
Introduction
For object detection or recognition from natural images, processing models for extracting image features should tolerate pattern deformations and pattern position shifts. Convolutional neural networks with a hierarchical structure, which imitate the visual nervous system in the brain, have such functions [1] [2] [3].
The operations required for implementing convolutional networks are multiplication by weights and nonlinear conversion, as in ordinary neural network models. Because these operations require huge computational power, an efficient VLSI implementation is required to execute them in real time and with low power consumption for intelligent applications such as robot vision. Various neural network VLSIs have been actively developed, and an analog VLSI processor suitable for convolutional networks has also been reported [4].
We have previously proposed a new circuit architecture based on a pulse-width modulation (PWM) approach that merges the analog and digital approaches [5]. This architecture combines advantages of both approaches; in particular, it achieves low power consumption and is suitable for implementing neural networks.
In this paper, by combining this merged analog-digital architecture with the digital approach, we propose a convolutional network VLSI architecture that consists of PWM neuron circuits and digital memory. We also present the measurement results of a VLSI chip fabricated using a 0.35 μm CMOS process.
Hierarchical Convolutional Network Model
Figure 1 shows the principle of pattern detection using a convolutional network. The first layer of the hierarchical structure only receives images. The following layers consist of two sub-layers: a feature detection (FD) layer and a feature pooling (FP) layer. Each layer includes several feature classes, each of which has neurons that react to the same image feature. The neurons are arranged in a 2-D array to maintain the feature position of the input image. Therefore, the feature class pixel size is equal to the input image pixel size, and each neuron corresponds to one pixel. Every neuron is connected to the neurons in a predefined area near the same position in the previous layer, which is called a receptive field. The FP neurons are used to achieve recognition that is tolerant to pattern deformation and position shifts. The FD neurons serve to integrate features. Through the hierarchically repeated structure, simple local features (e.g., line segments) of the input image are gradually assembled into complex features.
Operations between layers can be regarded as convolutions because all neurons belonging to a feature class have a receptive field with the same weight distribution. The receptive field of an FP neuron lies on the same feature class of the previous FD layer. All neurons of the FP layer have the same positive weight distribution, in which the weight is largest at the center of the receptive field and decreases with distance from the center. This weight distribution allows the FP layers to tolerate shifts of feature positions in the FD layers. On the other hand, the receptive fields of the FD neurons lie on all feature classes of the previous FP layer. The weights of the FD neurons are obtained by training.
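The shift tolerance provided by the FP weight distribution can be illustrated with a minimal sketch. The Gaussian profile, the value of `sigma`, the 7×7 map size, and the function names `fp_kernel` and `convolve_valid` are illustrative assumptions; the text only states that the weights are positive, largest at the center, and decreasing away from it.

```python
import math

def fp_kernel(m, sigma=1.0):
    """m x m positive weights, largest at the center, normalized to sum to 1.
    The Gaussian profile and sigma are assumptions made for illustration."""
    c = (m - 1) / 2.0
    k = [[math.exp(-((r - c) ** 2 + (s - c) ** 2) / (2 * sigma ** 2))
          for s in range(m)] for r in range(m)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def convolve_valid(image, kernel):
    """Apply the same receptive-field weights at every position (no padding).
    The kernel is not flipped; for a symmetric kernel this makes no difference."""
    n, m = len(image), len(kernel)
    out_size = n - m + 1
    return [[sum(kernel[r][s] * image[y + r][x + s]
                 for r in range(m) for s in range(m))
             for x in range(out_size)] for y in range(out_size)]

# One active FD neuron, and the same feature shifted by one pixel.
n, m = 7, 3
fd_a = [[0.0] * n for _ in range(n)]
fd_b = [[0.0] * n for _ in range(n)]
fd_a[3][3] = 1.0
fd_b[3][4] = 1.0

kernel = fp_kernel(m)
fp_a = convolve_valid(fd_a, kernel)
fp_b = convolve_valid(fd_b, kernel)

# The central FP neuron still responds after the shift, only more weakly.
center = (n - m) // 2
print(fp_a[center][center], fp_b[center][center])
```

In this sketch the central FP neuron gives its largest response when the feature sits at the center of its receptive field and a smaller but nonzero response when the feature is displaced by one pixel.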
Convolutional Network VLSI Architecture
We propose a VLSI architecture that implements hierarchical convolutional networks. Because the number of processing circuits that can be integrated in a chip is limited, it is difficult to realize all connections of the hierarchical network with physical processing circuits. Therefore, in our architecture, the neuron circuits are reused repeatedly by time-sharing operation.
Time-sharing operation in the convolutional network is shown in Fig. 2. The feature class size and the receptive field size are assumed to be $N \times N$ and $m \times m$ pixels, respectively.
The outputs of $N$ neurons belonging to one column of a feature class are input to ...
The block diagram of our convolutional network circuit is shown in Fig. 3. By utilizing the advantage of small circuit size in the PWM approach, $m$-input PWM neuron circuits are integrated. To achieve time-sharing operation, the partial accumulation results of neuron operation are temporarily held in the neuron circuit. These partial results are accumulated and stored in an SRAM through the PWM/digital converter (WDC) and the digital adder-subtracter (DAS).
Although we assumed that the number of inputs of the neuron circuits is $m \times m$, convolution with a smaller receptive field size can be calculated by setting the extra inputs to zero. Convolution with a larger receptive field size can also be calculated by time-sharing operation.
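The idea of accumulating partial results over time can be sketched in software. The following is a minimal illustration, not the chip's actual schedule: writing N for the feature class width and m for the receptive field width, it assumes that N − m + 1 neurons with m inputs each process one input column per time slot and that the partial sums are added into a memory array standing in for the WDC, DAS, and SRAM path. The function names and the column-by-column ordering are assumptions.

```python
def direct_convolution(image, kernel):
    """Reference result: full m x m receptive field evaluated at once."""
    n, m = len(image), len(kernel)
    out = n - m + 1
    return [[sum(kernel[r][s] * image[y + r][x + s]
                 for r in range(m) for s in range(m))
             for x in range(out)] for y in range(out)]

def time_shared_convolution(image, kernel):
    """Same result using only m-input neurons and an accumulation memory."""
    n, m = len(image), len(kernel)
    out = n - m + 1
    sram = [[0] * out for _ in range(out)]   # accumulated partial results
    cycles = 0
    for x in range(out):                     # output column being computed
        for s in range(m):                   # one kernel column per time slot
            # Outputs of the N neurons in one column of the previous layer.
            column = [image[y][x + s] for y in range(n)]
            for y in range(out):             # these neurons run in parallel in hardware
                partial = sum(kernel[r][s] * column[y + r] for r in range(m))
                sram[y][x] += partial        # add or subtract, then write back
            cycles += 1
    return sram, cycles

# Small integer example so that both results can be compared exactly.
image = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
result, cycles = time_shared_convolution(image, kernel)
print(result == direct_convolution(image, kernel), cycles)
```

Under these assumptions the number of time slots is (N − m + 1) · m, and a receptive field smaller than m × m corresponds to padding the kernel with zero weights.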
PWM Neuron Circuit

Connection Model
In general feedforward networks, the internal state $u_i$ and output $o_i$ of postsynaptic neuron $i$ are given by the following equations, respectively:
$$u_i = \sum_j w_{ij}\, o_j, \tag{1}$$
$$o_i = f(u_i), \tag{2}$$
where $w_{ij}$ is the connection weight from presynaptic neuron $j$ to postsynaptic neuron $i$, and $f$ is the nonlinear conversion function.
In the conventional model, the synapse part multiplies $o_j$ by $w_{ij}$ and the neuron (soma) part executes summation and nonlinear conversion $f(u_i)$. From eqs. (1) and (2), the output of neuron $i$ is given by
$$o_i = f\Bigl(\sum_j w_{ij}\, o_j\Bigr). \tag{3}$$
However, in our circuit model, both nonlinear conversion and multiplication are performed by the synapse part, and the neuron part executes summation and outputs the internal state, as shown in Fig. 4. Thus, from eqs. (1) and (2), the internal state $u_i$, which is the output of neuron $i$, is given by
$$u_i = \sum_j w_{ij}\, f(u_j). \tag{4}$$
Equations (3) and (4) are equivalent operations in hierarchical networks.
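The equivalence of the conventional model and the circuit model can be checked numerically. The sketch below is a minimal illustration that assumes a sigmoid for the nonlinear conversion function and uses arbitrary weights; the function names and all numerical values are hypothetical.

```python
import math

def f(u):
    """Nonlinear conversion function; a sigmoid is assumed for illustration."""
    return 1.0 / (1.0 + math.exp(-u))

def conventional_layer(o_prev, weights):
    """Synapse multiplies outputs by weights; soma sums and applies f."""
    return [f(sum(w * o for w, o in zip(row, o_prev))) for row in weights]

def circuit_layer(u_prev, weights):
    """Synapse applies f and multiplies; neuron sums and outputs the internal state."""
    return [sum(w * f(u) for w, u in zip(row, u_prev)) for row in weights]

# Internal states of a first layer and arbitrary weights for two further layers.
u1 = [0.5, -1.0, 2.0]
w2 = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
w3 = [[1.5, -0.8]]

# Conventional model: neurons pass outputs o between layers.
o1 = [f(u) for u in u1]
o3 = conventional_layer(conventional_layer(o1, w2), w3)

# Circuit model: neurons pass internal states u between layers.
u3 = circuit_layer(circuit_layer(u1, w2), w3)

# Applying f to the final internal state recovers the conventional output.
print(o3[0], f(u3[0]))
```

The two models differ only in where the nonlinear conversion is applied along the signal path, so the internal states, and therefore the outputs, agree at every layer.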
Circuit Design
Our PWM neuron circuit is shown in Fig. 5. Its operation is as follows: (1) ...
Experimental Results Using a Fabricated VLSI Chip
We fabricated a convolutional network VLSI by using a 0.35 μm CMOS process. Since we defined the operation cycle time as 1.6 μs, the whole convolution operation requires about 5 ms. This chip achieves an operation performance of 2 GOPS by parallel operations for 81 ($= N - m + 1$) neurons and 1620 ($= (N - m + 1)m$) synapses.
We estimate the power consumption of the PWM neuron circuits to be 20 mW, whereas the digital circuit block consumes 190 mW.
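The quoted performance figure can be related to the circuit counts with a short calculation. The sketch below takes the feature class width as 100 and the receptive field width as 20 (the image and receptive field sizes given in the abstract) and the cycle time as 1.6 μs. Counting two operations per synapse per cycle (one multiplication and one addition) is an assumption made here to reproduce the 2 GOPS figure; the variable names are hypothetical.

```python
# Sizes taken from the abstract: 100 x 100 image, 20 x 20 receptive field.
N, m = 100, 20

neurons = N - m + 1        # neuron circuits operating in parallel
synapses = neurons * m     # m inputs per neuron circuit

cycle_time_s = 1.6e-6      # operation cycle time (1.6 microseconds)
total_time_s = 5e-3        # time quoted for the whole convolution

# Assumption: each synapse contributes one multiply and one add per cycle.
ops_per_synapse = 2
gops = ops_per_synapse * synapses / cycle_time_s / 1e9

# Number of operation cycles that fit into the quoted convolution time.
cycles = total_time_s / cycle_time_s

print(neurons, synapses, round(gops, 3), round(cycles))
```

Under these assumptions the estimate is 2 × 1620 / 1.6 μs ≈ 2.03 GOPS, consistent with the stated 2 GOPS, and about 3125 cycles fit into 5 ms.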
We have verified that all circuit components operate successfully. The whole operation for a convolutional network has also been verified. 
Conclusion
We proposed a merged/mixed analog-digital VLSI architecture for convolutional neural networks using PWM and digital circuit techniques.
A neuron circuit with 20 synapses was designed. Nonlinear conversion and multiplication by connection weights are realized by two MOSFETs, so a very small layout area and low power consumption of the synapse part were achieved. Since the connections between layers have the same weight distribution, hierarchical networks can be constructed by feedback and time-sharing operations using the convolutional network VLSI.
We designed and fabricated a convolutional network VLSI with an operation performance of 2 GOPS, and verified successful operations of all circuit components.
