In this paper, we propose a CMOS-memristor hybrid circuit that performs 4-bit multiplication for future energy-efficient computing in nano-scale digital systems. The proposed circuit is based on a parallel architecture with AND and OR planes, which improves its power-delay product compared to the conventional CMOS array multiplier. In particular, SPECTRE simulation of the proposed hybrid circuit with 0.13-μm CMOS devices and memristors estimates that the proposed multiplier has a 48% better power-delay product than the conventional CMOS array multiplier. In addition to this improvement in energy efficiency, the 4-bit multiplier can occupy a smaller area than the conventional array multiplier, because each cross-point memristor can be made as small as 4F².
thought to be implementable simply at the cross points between two orthogonal metal lines [4], [5], [6].

[Equation (1): symbols lost; surviving subscripts: 0 1 2 3 4 5 6 7; 30 31 32 33; 20 21 22]
(1)
The right array in Fig. 2 realizes the OR plane.
For the first column of the OR plane, F0 is 1 when either A'B' or AC' is 1. Hence F0 can be expressed as F0 = A'B' + AC', as indicated in Fig. 2.
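The two-plane evaluation of F0 can be illustrated with a short behavioral sketch in Python. This is not the authors' circuit or simulation model: the dictionary encoding of the planes, the input names A, B, C as plain logic values, and the helper names `evaluate` and `truth_table` are all assumptions made for illustration. Each AND-plane row forms one product term, and each OR-plane column combines the product terms connected to it.

```python
from itertools import product

# Hypothetical encoding (an assumption, not taken from the paper):
# each AND-plane row lists the literals wired to that product term
# as (input_name, required_value) pairs.
AND_PLANE = {
    "A'B'": [("A", 0), ("B", 0)],
    "AC'": [("A", 1), ("C", 0)],
}

# Each OR-plane column lists the product terms wired to that output.
OR_PLANE = {
    "F0": ["A'B'", "AC'"],
}


def evaluate(inputs):
    """Evaluate every OR-plane output for one input assignment."""
    # AND plane: a product term is 1 only if all of its literals match.
    terms = {
        name: int(all(inputs[var] == val for var, val in literals))
        for name, literals in AND_PLANE.items()
    }
    # OR plane: an output is 1 if any connected product term is 1.
    return {
        out: int(any(terms[t] for t in connected))
        for out, connected in OR_PLANE.items()
    }


def truth_table():
    """Return (A, B, C, F0) rows for all eight input combinations."""
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        f0 = evaluate({"A": a, "B": b, "C": c})["F0"]
        rows.append((a, b, c, f0))
    return rows


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

Adding another output column to this sketch only requires a new entry in `OR_PLANE` that reuses existing product terms, which mirrors how several outputs can share one AND plane.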
[Figure: power-delay product (mW×ps) versus supply voltage (V). Legend: the proposed hybrid circuit of CMOS and memristor for 4-bit multiplier; the CMOS array multiplier; the dynamic memristor multiplier.]

terms of the power consumption, the power-delay product, and the delay time. In Fig. 5(a), the multiplied input vectors of A × B are changed between 0000 × 0000 and 1111 × 1111.
In the simulation, the operating frequency is fixed at 100 MHz and the supply voltage is varied from 0.8 V to 1.6 V. The proposed circuit consumes 15% more power on average than the conventional CMOS array multiplier.
However, the proposed circuit shows a shorter delay time than the conventional array multiplier, as shown in Fig. 5(b).
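As a rough consistency check, the reported power and power-delay figures can be combined to estimate the implied delay ratio. This sketch rests on two assumptions that the text does not confirm: that "better by 48%" means the power-delay product of the proposed circuit is 48% lower, and that the 15% average power increase applies under the same conditions. The symbols P (power), t_d (delay time), and PDP are introduced here only for illustration.

```latex
\mathrm{PDP} = P \cdot t_d
\quad\Longrightarrow\quad
\frac{t_{d,\mathrm{prop}}}{t_{d,\mathrm{CMOS}}}
  = \frac{\mathrm{PDP}_{\mathrm{prop}} / \mathrm{PDP}_{\mathrm{CMOS}}}
         {P_{\mathrm{prop}} / P_{\mathrm{CMOS}}}
  \approx \frac{1 - 0.48}{1 + 0.15}
  = \frac{0.52}{1.15}
  \approx 0.45
```

Under these assumptions, the delay of the proposed circuit would be somewhat less than half that of the conventional array multiplier, so the reduction in delay outweighs the increase in power.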
The shorter delay time of the proposed circuit is mainly due to its shorter signal path, which consists of only the AND plane and the OR plane, as shown in Fig. 3(a). In contrast, the conventional CMOS array multiplier has a much longer signal path that includes many logic gates and many 1-bit adders to deliver the carry signal calculated by the present-stage adder to the following stage. Fig. 5(c)
