ABSTRACT Delay caused by carry propagation significantly reduces the efficiency of numerical computing in computer systems, and the more data bits the operands have, the worse the delay becomes. Because the high information capacity of light allows the operands in an optical computer to have a huge number of data bits, carry delay becomes very serious there. Multiplication is one of the most important and common calculations, and many multiplications can be carried out in parallel by exploiting the huge number of data bits of an optical computer, so studying how to reduce or even eliminate carry delay in multiplication is very important. To improve multiplication efficiency, a new carry-free method for designing and implementing multiplication on the Ternary Optical Computer (TOC) is put forward. Fully considering the relations between the operands in the modified signed-digit (MSD) number system, a parallel multiplication implementation strategy is presented together with its resource allocation method. The experimental results show that the proposed strategy is correct. It exploits the advantages of optical computing and ensures that no carry delay is introduced in the process of multiplication, which greatly improves the efficiency of optical multiplication applications.
I. INTRODUCTION
Multiplication is one of the most common mathematical operations, and it is the basis of other numerical applications such as the discrete Fourier transform (DFT), vector-matrix multiplication, matrix-matrix multiplication and convolution [1], [2]. Its efficiency strongly influences the applications based on numerical calculation. Therefore, in the era of electronic computing, many studies have focused on multipliers. There are several typical multipliers: the shift-add multiplier, the Booth multiplier, the modified-Booth multiplier, the adder-array multiplier and the Wallace-tree multiplier [3], [4]. Though these studies are quite different, they all aim to improve the efficiency of multipliers. Generally, they can be divided into two categories: one reduces the number of partial products and the other increases the parallelism of the summation of partial products. For example, the studies on the Booth multiplier and the modified-Booth multiplier belong to the former, and the studies on the adder-array multiplier and the Wallace-tree multiplier belong to the latter. All these studies are interesting and improve the efficiency of multiplication to some extent. However, they cannot avoid serial carry in the process of summing the partial products [5], [6]. As the number of data bits of the operands increases, especially in optical computing, carry delay becomes very serious or even unacceptable [7]-[9].
The associate editor coordinating the review of this manuscript and approving it for publication was Sukhdev Roy.
Owing to its high information capacity, low energy consumption and other advantages, optical computing is very promising and has attracted wide attention [10]-[17]. The high information capacity of light means that an optical computer may process data with a huge number of data bits. Using this characteristic, multiple multiplications can be carried out in parallel on an optical computer, whereas on electronic computers this can only be achieved with multiple cores, many cores or multiple nodes.
To make full use of these advantages, many scholars have focused on optical computing. Different aspects have been widely studied, including the principles of optical computing, the architecture of computers and components, the design and implementation of optical computers, key components, circuits and so on [15]-[23]. Among these studies, the Ternary Optical Computer (TOC) [20]-[23] is a typical line of research. Using the polarization and intensity of light to express information, TOC is a typical optoelectronic hybrid computer [14]. Specifically, it uses two orthogonal polarization states together with the dark state of light to express ternary information. Since the concept of TOC was proposed, many theories have been put forward, many key components have been designed and several generations of experimental systems have been constructed [20], [23], [24]. Based on these, many applications have been studied on TOC to utilize the advantages of optical computing [9], [20]-[22], [25]. Among these applications, addition is one of the most typical and successful. The addition put forward and implemented on TOC is carry-free: it completes in three steps with five logic transformations no matter how many bits the operands have. That is, there is no carry delay in the addition operation, which allows carry-free addition to fully utilize the advantages of optical computing.
To further utilize the advantages of optical computing in more numerical computing applications, a carry-free multiplication is studied and implemented to verify the feasibility and potential advantages of multiplication on TOC. The multiplication implementation scheme on TOC is presented, which includes the allocation of processor bits for operators, the operator reconfiguration rule, data encoding, data decoding, data feedback, the adjustment of data bits and so on. Besides, a pipeline mechanism is presented that makes full use of the fact that TOC can have a large number of processor bits, each of which can be reconfigured and allocated independently. The parallel scheme adopted in the implementation of the multiplication application can greatly improve its efficiency. The implementation of the multiplication application verifies the correctness and high efficiency of optical applications on TOC.
II. RELATED WORK
A. MODIFIED SIGNED-DIGIT NUMBER SYSTEM
In 1961, Avizienis proposed the modified signed-digit (MSD) number system, a three-valued binary number system with signed digits [26]. Using this number system, any decimal number A can be expressed in MSD form as

$$A = \sum_{i} a_i 2^{i},$$

where each digit $a_i$ takes a value from the set {u, 0, 1} and u represents −1. The weight $2^i$ shows that the MSD number system is still a binary number system. Generally, the MSD number system has the following characteristics.
(1) The opposite of an MSD number can be obtained by applying the NOT operation to each bit of the MSD number, that is, by exchanging 1 and u and keeping 0.
(2) If the highest nonzero bit of an MSD number is 1, the number is positive. If the highest nonzero bit is u, it is negative.
(3) The decimal number 0 has a unique representation.
(4) MSD is a binary number system with redundancy. This means that a decimal number can be expressed by different MSD numbers.
TOC uses three physical states of light to express ternary information, and it adopts the MSD number system in its design to avoid carry delay in the process of addition [5]. The implementation of the multiplication application is based on the MSD number system.
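The characteristics above can be illustrated with a minimal sketch. The string encoding (most significant digit first, with the letter u for −1) and the helper names `msd_to_int` and `msd_negate` are assumptions made for illustration only; they are not part of the TOC software.

```python
# Digits are written most significant first; 'u' stands for -1.
DIGIT_VALUE = {"u": -1, "0": 0, "1": 1}
NEGATE = {"u": "1", "0": "0", "1": "u"}


def msd_to_int(msd: str) -> int:
    """Evaluate sum(a_i * 2**i) for an MSD string (Horner's rule)."""
    value = 0
    for ch in msd:
        value = 2 * value + DIGIT_VALUE[ch]
    return value


def msd_negate(msd: str) -> str:
    """Digit-wise NOT: exchange 1 and u, keep 0."""
    return "".join(NEGATE[ch] for ch in msd)


if __name__ == "__main__":
    # Redundancy: three different MSD strings that all encode decimal 5.
    for s in ("0101", "1u01", "10uu"):
        print(s, msd_to_int(s))
    # Negation: the digit-wise NOT of an encoding of 5 encodes -5.
    print(msd_negate("1u01"), msd_to_int(msd_negate("1u01")))
```

The three strings in the usage part show the redundancy of characteristic (4), and the last line shows characteristic (1).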
B. MSD ADDITION
Previous studies show that carry-free addition can be implemented with five three-valued logical transformations in the MSD number system. The five logical transformations are actually of four kinds (T (T2), W, T', W'), which are given in Table 1 [27]. In Table 1, a and b are one-bit MSD numbers, and the logical transformation T2 is the same as T. According to the relations between the transformations, an addition can be accomplished in three steps: transformations T and W can be applied in parallel, and T' and W' can be applied at the same time. Specifically, the three steps are as follows.
(1) Apply transformations T and W to the augend and addend bit by bit. Append one 0 to the tail of the result of transformation T and add one 0 in front of the result of transformation W.
(2) Apply transformations T' and W' separately to the results of transformations T and W bit by bit. Append one 0 to the tail of the result of transformation T' and add one 0 in front of the result of transformation W'.
(3) Apply transformation T2, which is the same as transformation T, to the results of transformations T' and W' bit by bit. The result is the sum of the addition.
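The three steps can be sketched in a few lines of Python. Table 1 is not reproduced in this text, so the truth tables below are an assumption: they follow the definitions commonly used for carry-free MSD addition and are keyed by the digit sum a + b, which is possible because all four transformations are symmetric in a and b. The function names are hypothetical, and Table 1 remains the authoritative definition.

```python
DIGIT = {"u": -1, "0": 0, "1": 1}
SYMBOL = {-1: "u", 0: "0", 1: "1"}

# ASSUMED truth tables, indexed by the digit sum a + b (check against Table 1).
T = {-2: -1, -1: -1, 0: 0, 1: 1, 2: 1}
W = {-2: 0, -1: 1, 0: 0, 1: -1, 2: 0}
T_PRIME = {-2: -1, -1: 0, 0: 0, 1: 0, 2: 1}
W_PRIME = {-2: 0, -1: -1, 0: 0, 1: 1, 2: 0}


def apply(table, x, y):
    """Apply one transformation to two digit lists, position by position."""
    return [table[a + b] for a, b in zip(x, y)]


def msd_add(x: str, y: str) -> str:
    """Carry-free MSD addition; the result is two digits longer than the operands."""
    n = max(len(x), len(y))
    a = [DIGIT[c] for c in x.rjust(n, "0")]
    b = [DIGIT[c] for c in y.rjust(n, "0")]
    # Step 1: T and W in parallel; 0 appended to T, 0 prepended to W.
    t = apply(T, a, b) + [0]
    w = [0] + apply(W, a, b)
    # Step 2: T' and W' in parallel; 0 appended to T', 0 prepended to W'.
    t1 = apply(T_PRIME, t, w) + [0]
    w1 = [0] + apply(W_PRIME, t, w)
    # Step 3: T2 (same table as T) yields the sum.
    return "".join(SYMBOL[d] for d in apply(T, t1, w1))


def msd_to_int(msd: str) -> int:
    value = 0
    for ch in msd:
        value = 2 * value + DIGIT[ch]
    return value


if __name__ == "__main__":
    total = msd_add("1011", "u0uu0")   # 11 + (-22)
    print(total, msd_to_int(total))
```

Each step touches every digit position independently, so the number of steps does not depend on the operand length, which is the property the text relies on.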
C. ARCHITECTURE OF TOC
In the study of optical computing, scholars found that information can be expressed by combining the intensity and the polarization of light. Based on this, Jin Y et al. put forward the principle and architecture of TOC [14], which uses two orthogonal polarization states of light (horizontal and vertical polarization) together with the dark state to express ternary information. The architecture of TOC is shown in Figure 1. As Figure 1 shows, S represents the light source, which provides the main light on which the operands are encoded. E is the encoder, which is composed of two liquid crystal display (LCD) arrays and a polarizer. It is responsible for converting electronic signals into optical signals. O is the optical processor, which is composed of an LCD array and two polarizers. It accomplishes the optical computation. The optical processor of TOC is divided into four parts, named HH, VH, HV and VV, according to the directions of the polarizers. D represents the decoder. It decodes the result of the optical processor into electronic signals and stores them in the electronic memory. The monitoring system monitors and verifies the data of the encoder, optical processor and decoder. For more details about TOC, please refer to [14], [28].
III. MULTIPLICATION APPLICATION ON TOC
TOC is an optical digital computing system with many advantages. For instance, its logical operators can be reconfigured, the number of data bits of its processor can be huge and is easily extended, and it supports carry-free MSD addition. Based on these characteristics, this section presents a parallel implementation scheme for the multiplication application and analyses its implementation process on TOC. This includes the allocation of processor bits, the reconfiguration of operators, data encoding, data decoding, data feedback, the adjustment of data bits, the count of operations and so on.
A. IMPLEMENTATION PRINCIPLE OF MULTIPLICATION APPLICATION
MSD multiplication is implemented with transformation M and MSD addition. The truth table of transformation M is as Table 2 shows. Assume the numbers of bits of multiplicand A and multiplier B are n and m respectively, that is, $A = a_{n-1} a_{n-2} \ldots a_1 a_0$ and $B = b_{m-1} b_{m-2} \ldots b_1 b_0$. The multiplication is carried out in the following steps.
Step 1: Generate $B_j$, $j \in \{0, 1, \ldots, m-1\}$, by copying bit $b_j$ of multiplier B n times, so that the length of $B_j$ is identical to that of multiplicand A. $B_j$ is expressed as

$$B_j = \underbrace{b_j b_j \ldots b_j}_{n}$$

Step 2: Transformation M is applied to A and $B_j$ bit by bit to get m partial products:

$$R_j = M(a_{n-1}, b_j)\, M(a_{n-2}, b_j) \ldots M(a_0, b_j), \quad j = 0, 1, \ldots, m-1$$

Step 3: j zeros are appended on the right of each partial product $R_j$ to get the operands of the sum:

$$Q_j = R_j \underbrace{0 \ldots 0}_{j}$$

Step 4: The sum of $Q_0, Q_1, \ldots, Q_j, \ldots, Q_{m-1}$ is computed by MSD addition. This sum is the product of A and B.
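Steps 1 to 3 can be sketched as follows. Table 2 is not reproduced in this text, so the sketch assumes that transformation M is the digit-wise product of two MSD digits; the function names and the string encoding (most significant digit first, u for −1) are likewise illustrative assumptions. Step 4 is only checked here through the decimal values of the operands.

```python
DIGIT = {"u": -1, "0": 0, "1": 1}
SYMBOL = {-1: "u", 0: "0", 1: "1"}


def transform_m(a: str, b: str) -> str:
    """ASSUMED truth table of M: the product of two MSD digits."""
    return SYMBOL[DIGIT[a] * DIGIT[b]]


def sum_operands(A: str, B: str) -> list:
    """Return Q_0 .. Q_{m-1} for multiplicand A and multiplier B."""
    n = len(A)
    operands = []
    # reversed(B) visits b_0 first, so j is the weight of the multiplier digit.
    for j, b_j in enumerate(reversed(B)):
        B_j = b_j * n                                              # Step 1
        R_j = "".join(transform_m(a, b) for a, b in zip(A, B_j))   # Step 2
        operands.append(R_j + "0" * j)                             # Step 3
    return operands


def msd_to_int(msd: str) -> int:
    value = 0
    for ch in msd:
        value = 2 * value + DIGIT[ch]
    return value


if __name__ == "__main__":
    ops = sum_operands("1011", "10u1")     # 11 * 7
    print(ops)
    print(sum(msd_to_int(q) for q in ops))
```

Every partial product depends only on A and one digit of B, so the m converters can work independently of each other.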
B. PARALLEL IMPLEMENTATION SCHEME OF MULTIPLICATION APPLICATION
In order to exploit the advantages of TOC, a parallel implementation of the multiplication application is presented based on its implementation principle. Basically, the partial products are generated in parallel by transformation M, and the sum of the partial products is obtained with binary-addition-tree MSD addition. The implementation process of multiplication is as Figure 2 shows. In Figure 2, there are m converters M working in parallel, as many as the number of bits of the multiplier. $add_{i,j}$ denotes adder j in tier i, where $i \in \{1, 2, \ldots, \lceil \log_2 m \rceil\}$ and $j \in \{1, 2, \ldots, t_i\}$, with $t_i$ the number of adders in tier i. With binary-addition-tree MSD addition, the sum of m partial products needs $\lceil \log_2 m \rceil$ tiers of addition. Character C at the bottom of Figure 2 is the product of the multiplication. It is easy to see that the range of j in $add_{i,j}$ depends on the number of partial products m: $t_1$ equals $\lceil m/2 \rceil$ and $t_{i+1}$ equals $\lceil t_i/2 \rceil$. Converter M implements transformation M and the MSD adder implements addition. An MSD adder consists of five converters: T, W, T', W' and T2. Among these, converter T2 is the same as converter T.
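The shape of the addition tree can be computed directly from the number of partial products. The sketch below assumes that a tier with an odd number of operands is padded with one zero operand, so each tier halves the operand count rounding up; the function name `adders_per_tier` is hypothetical.

```python
def adders_per_tier(m: int) -> list:
    """Number of MSD adders in each tier when m partial products are summed."""
    tiers = []
    operands = m
    while operands > 1:
        operands = (operands + 1) // 2   # ceil(operands / 2) adders, one result each
        tiers.append(operands)
    return tiers


if __name__ == "__main__":
    for m in (4, 5, 8):
        print(m, adders_per_tier(m))
```

The length of the returned list is the number of tiers, and its first entry is the largest number of adders that must be available at the same time.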
In the multiplication scheme, parallel implementation is utilized in several aspects, as follows.
(1) As each data bit of TOC can be independently reconfigured and controlled, all the transformations M are implemented in parallel, which ensures that the partial products are obtained with high efficiency.
The following example, whose partial products correspond to multiplicand A = (1011) MSD and multiplier B = (10u1) MSD, illustrates the process.
(1) Four partial products are generated with four converters M working in parallel. Each converter applies transformation M bit by bit to the corresponding pair of multiplicand and assistant data. The obtained partial products are (1011) MSD, (u0uu) MSD, (0000) MSD and (1011) MSD. Process the partial products with the rule mentioned in Section III-A to obtain the operands of the binary-addition-tree MSD addition. They are (1011) MSD, (u0uu0) MSD, (000000) MSD and (1011000) MSD.
(2) Set (1011) MSD and (u0uu0) MSD as the operands of one adder, and set (000000) MSD and (1011000) MSD as the operands of the other. The two additions are calculated in parallel. After the three-step operations of MSD addition, the sum of (1011) MSD and (u0uu0) MSD is (00u011u) MSD, and the sum of (000000) MSD and (1011000) MSD is (01u10u000) MSD.
(3) Set (00u011u) MSD and (01u10u000) MSD as the operands of the adder. After the three-step operations of MSD addition, the sum of (00u011u) MSD and (01u10u000) MSD is (0001u010u01) MSD. It is the product of the multiplication.
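The binary-addition-tree summation used in the example can be sketched as below. The truth tables are the same assumption as for the MSD adder of Section II-B (keyed by the digit sum, to be checked against Table 1), restated compactly so that the block runs on its own; `tree_sum` is a hypothetical name, and the Python loop only simulates sequentially what the adders of one tier would do in parallel on TOC.

```python
DIGIT = {"u": -1, "0": 0, "1": 1}
SYMBOL = {-1: "u", 0: "0", 1: "1"}
# ASSUMED truth tables, indexed by the digit sum a + b (check against Table 1).
T = {-2: -1, -1: -1, 0: 0, 1: 1, 2: 1}
W = {-2: 0, -1: 1, 0: 0, 1: -1, 2: 0}
T_PRIME = {-2: -1, -1: 0, 0: 0, 1: 0, 2: 1}
W_PRIME = {-2: 0, -1: -1, 0: 0, 1: 1, 2: 0}


def msd_add(x: str, y: str) -> str:
    """Three-step carry-free MSD addition of two MSD strings."""
    n = max(len(x), len(y))
    a = [DIGIT[c] for c in x.rjust(n, "0")]
    b = [DIGIT[c] for c in y.rjust(n, "0")]
    t = [T[p + q] for p, q in zip(a, b)] + [0]
    w = [0] + [W[p + q] for p, q in zip(a, b)]
    t1 = [T_PRIME[p + q] for p, q in zip(t, w)] + [0]
    w1 = [0] + [W_PRIME[p + q] for p, q in zip(t, w)]
    return "".join(SYMBOL[T[p + q]] for p, q in zip(t1, w1))


def tree_sum(operands: list) -> str:
    """Sum the operands tier by tier, pairing neighbours in each tier."""
    tier = list(operands)
    while len(tier) > 1:
        if len(tier) % 2:
            tier.append("0")     # pad an odd tier with a zero operand
        # The additions of one tier are independent of each other.
        tier = [msd_add(tier[k], tier[k + 1]) for k in range(0, len(tier), 2)]
    return tier[0]


if __name__ == "__main__":
    print(tree_sum(["1011", "u0uu0", "000000", "1011000"]))
```

With the four operands of the example, the first tier yields the two intermediate sums of step (2) and the second tier yields the product of step (3).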
C. RESOURCE ALLOCATION FOR THE PARALLEL MULTIPLICATION ON TOC
To implement the parallel multiplication on TOC efficiently, the allocation of optical processor resources is discussed. In the parallel implementation scheme mentioned above, all converters M are reconfigured at one time to maximize the efficiency of transformation M. The MSD adders can be reconfigured in two ways. The first method is to reconfigure the MSD adders in each tier according to the operands of that tier. The second method is to reconfigure the MSD adders once, according to the operands of the last tier of MSD addition. Because every reconfiguration of an MSD adder incurs extra cost, the second method is adopted for the sake of execution efficiency. Assume that multiplicand A and multiplier B are the same as in Section III-A, with n and m bits respectively. To implement transformation M in parallel, m converters M are needed, and each converter M needs n processor bits. Therefore, the number of processor bits for converters M is $q = n \times m$. The results of transformation M are filled with zeros on the right to get the operands of the sum, so the largest length of a partial product is $n + m - 1$. This is the length of the operands in the first tier of the binary-addition-tree MSD addition. In each tier, the result of an MSD addition is two data bits longer than its operands [14], and $\lceil \log_2 m \rceil$ tiers of binary-addition-tree MSD addition are needed. Therefore, the length of the operands in the last tier is $n + m - 1 + 2 \times (\lceil \log_2 m \rceil - 1) = n + m + 2 \times \lceil \log_2 m \rceil - 3$. When an MSD adder is reconfigured, converters T and W each need $n + m + 2 \times \lceil \log_2 m \rceil - 3$ processor bits, converters T' and W' each need $n + m + 2 \times \lceil \log_2 m \rceil - 2$ processor bits, and converter T2 needs $n + m + 2 \times \lceil \log_2 m \rceil - 1$ processor bits.
Summing these five converters, one MSD adder needs $s = 5 \times m + 5 \times n + 10 \times \lceil \log_2 m \rceil - 11$ bits of the optical processor. In the summation of the partial products, the number of processor bits for MSD adders is $\lceil m/2 \rceil \times s$. In the implementation of the multiplication application, $q + \lceil m/2 \rceil \times s$ optical processor bits are therefore needed to reconfigure converters M and MSD adders. Let $LN = q + \lceil m/2 \rceil \times s$. A continuous idle area of LN processor bits in the optical processor is thus required to allocate converters M and MSD adders. Assume that enough processor bits are available and that the start address of the continuous idle area of the optical processor is p. Converters M are reconfigured between p and p + q − 1, and MSD adders are reconfigured between p + q and p + LN − 1. The details of the allocation of processor bits for converters M and MSD adders are shown in Table 3.
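The processor-bit budget can be evaluated with a short sketch. It assumes that the logarithm and the adder count in the formulas are rounded up to integers and that at least two partial products are summed; the function name `allocate` and the dictionary layout are hypothetical, and Table 3 remains the authoritative allocation.

```python
def allocate(n: int, m: int, p: int = 1) -> dict:
    """Processor-bit budget for an n-bit multiplicand and an m-bit multiplier.

    p is the start address of the continuous idle area.
    """
    if m < 2:
        raise ValueError("the addition tree needs at least two partial products")
    tiers = (m - 1).bit_length()            # ceil(log2(m)) without floating point
    q = n * m                               # bits for the m converters M
    last = n + m + 2 * tiers - 3            # operand length in the last tier
    bits_t_w = last                         # each of T and W
    bits_t1_w1 = last + 1                   # each of T' and W'
    bits_t2 = last + 2                      # T2
    s = 2 * bits_t_w + 2 * bits_t1_w1 + bits_t2
    adders = (m + 1) // 2                   # ceil(m / 2) adders in the first tier
    ln = q + adders * s
    return {
        "q": q,
        "s": s,
        "adders": adders,
        "LN": ln,
        "m_range": (p, p + q - 1),
        "adder_range": (p + q, p + ln - 1),
    }


if __name__ == "__main__":
    print(allocate(n=5, m=4, p=1))
```

For n = 5 and m = 4 the sketch gives s = 54 and LN = 128.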
IV. EXPERIMENTS AND ANALYSIS

A. SIMULATION
In order to verify the correctness of the proposed multiplication implementation method, simulations have been carried out. In the simulations, 1, u and 0 denote the three MSD digit values, with u standing for −1 as in Section II-A. The simulation results are as Table 4 shows. Note that the simulation results agree with the products computed in decimal, which indicates that the multiplication implementation method is correct.
VOLUME 7, 2019
B. EXPERIMENTS ON TOC
Extensive experiments have been done on TOC-SD11, which is composed of two parts: the master computer and the slave computer. The master computer is an electronic computer with the 64-bit Windows 8.1 operating system, a Pentium(R) Dual-Core CPU @3.00GHz and 4GB of DDR4 memory. The main task of the master computer is to adjust and encode the user's input data and the feedback data. The slave computer is TOC. The LCD array of the optical processor in each area of TOC has 576 pixels, that is, the optical processor can process 576 bits of data in parallel. For more details about TOC, interested readers are referred to [14]. The physical experiment system is shown in Figure 3.
In the experiments, 1, u and 0 represent light with horizontal polarization, light with vertical polarization and the dark state, respectively. The operands of the third group are chosen as experimental data. That is, multiplicand A is (u1011) MSD and multiplier B is (10u1) MSD.
(1) The lengths of A and B are 5 and 4 respectively. Therefore, the implementation of the multiplication application needs 4 converters M and 2 MSD adders. One converter M needs 5 processor bits, so reconfiguring the converters M needs 20 processor bits. One MSD adder consists of 5 converters, T, W, T', W' and T2, in which converter T2 is the same as converter T. Among these converters, T and W need 10 processor bits each, T' and W' need 11 processor bits each, and T2 needs 12 processor bits. Therefore, one MSD adder needs 54 processor bits, and reconfiguring the two MSD adders needs 108 processor bits. In total, 128 processor bits are needed to implement the multiplication application. The beginning number of the LCD array is 1. Operators are reconfigured in the ranges of data bits given by the rules in Table 5.
Copy each bit of multiplier B to generate the assistant data, as Table 6 shows. In Table 5, the length of each assistant data is identical to that of multiplicand A.
The assistant data and the multiplicand are encoded according to Table 6. Then the rule mentioned in Section III-B is followed to implement transformation M. The encoded multiplicand and assistant data are shown in Table 7.
The subscript B of the data in Table 7 means that the encoded data is in binary. There are four pairs of operands for the converters M, and there are two types of data: data for the main light path and data for the control light path. Because the truth table of logical transformation M is symmetric, the operation places no requirement on which operand goes to the main light path and which goes to the control light path. The multiplicand is selected as the data for the main light path and the assistant data is selected as the data for the control light path. The results of the transformations M are shown in Figure 4, in which a bright pixel in the HH or VH region represents 1 and a bright pixel in the VV or HV region represents u. If the processor bits at the corresponding position of all four regions are dark at the same time, the result is 0. Details of the results are presented in Table 8 according to Figure 4. The result of transformation M consists of the results of the four regions.
The decoded results of transformation M are as Table 8 shows, and they are utilized as the operands of the MSD adders.
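The decoding rule for the four regions can be written as a small sketch. Representing each region as a list of 0/1 pixel states and raising an error when a position is bright in both a "1" region and a "u" region are assumptions made for illustration; the decoder of TOC itself is not described at this level of detail in the text.

```python
def decode_regions(hh, vh, hv, vv) -> str:
    """Combine the four region read-outs into one MSD string.

    Each argument is a sequence of 0/1 pixel states (1 = bright), one entry
    per processor bit, most significant bit first (hypothetical encoding).
    """
    digits = []
    for pos, (p_hh, p_vh, p_hv, p_vv) in enumerate(zip(hh, vh, hv, vv)):
        is_one = bool(p_hh or p_vh)      # bright in HH or VH means 1
        is_u = bool(p_hv or p_vv)        # bright in HV or VV means u
        if is_one and is_u:
            raise ValueError(f"conflicting bright pixels at position {pos}")
        digits.append("1" if is_one else "u" if is_u else "0")
    return "".join(digits)


if __name__ == "__main__":
    # Made-up pixel states for a 4-bit result; not taken from Figure 4.
    print(decode_regions(hh=[1, 0, 0, 0], vh=[0, 0, 0, 1],
                         hv=[0, 1, 0, 0], vv=[0, 0, 0, 0]))
```

A position that is dark in all four regions decodes to 0, as stated for the experimental read-out.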
Before the addition operations start, the number of operands should be checked. If the number of operands is odd, one 0 operand is added to take part in the MSD addition. The data bits of the results of transformation M are adjusted according to the implementation principle of multiplication, and they serve as the operands of the converters T and W in the MSD adders. The adjusted data for the decoded results of transformation M are shown in Table 8. The results of transformations T and W of the second MSD addition are (00u1011000) MSD and (0001u0uu00) MSD, which are shown in Figure 6.
Decode the results of transformations T and W in the MSD addition. Append one 0 to the right of the result of transformation T and one 0 to the left of the result of transformation W. In transformations T' and W', the result of transformation T is set as the data for the main light path and the result of transformation W is set as the data for the control light path. The results of transformations T' and W' in the first MSD addition are (00000u0u000) MSD and (0000101011u) MSD. The results of transformations T' and W' in the second MSD addition are (00010000000) MSD and (00u0u10u000) MSD. These data are shown in Figure 6.
Decode the results of transformations T' and W' in the MSD addition. Append one 0 to the right of the result of transformation T' and one 0 to the left of the result of transformation W'. In transformation T2, the results of transformations T' and W' are used as the data for the main light path and for the control light path respectively. The results of transformation T2 of the first and the second MSD additions are (00000000011u) MSD and (00000u10u000) MSD respectively, which are shown in Figure 7.
Once the first tier of MSD addition is finished, its results are utilized as the operands of transformations T and W of the next MSD addition. The addition process is the same as that of the first tier. The last operation process is shown in Figure 8. The result of the multiplication is (000000u00u01) MSD, or −35 in decimal, which is the same as the result obtained with decimal multiplication.
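The reported values can be cross-checked by decoding them to decimal. The sketch below only evaluates the MSD strings quoted in the text; the helper name `msd_to_int` is hypothetical, and the check does not re-run the optical experiment.

```python
DIGIT = {"u": -1, "0": 0, "1": 1}


def msd_to_int(msd: str) -> int:
    """Decimal value of an MSD string (most significant digit first)."""
    value = 0
    for ch in msd:
        value = 2 * value + DIGIT[ch]
    return value


# MSD strings quoted from the experiment description.
A = "u1011"
B = "10u1"
FIRST_TIER_SUMS = ["00000000011u", "00000u10u000"]
PRODUCT = "000000u00u01"

if __name__ == "__main__":
    print("A =", msd_to_int(A), "B =", msd_to_int(B))
    print("first-tier sums:", [msd_to_int(s) for s in FIRST_TIER_SUMS])
    print("product:", msd_to_int(PRODUCT))
```

The two first-tier sums add up to the decoded product, and the decoded product equals the decimal product of the decoded operands.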
V. CONCLUSION
To exploit the advantages of optical computing, a multiplication application is realized on TOC. Considering that TOC has many processor bits and is easily extended, a parallel implementation scheme for the multiplication application is put forward. Meanwhile, the process of the multiplication operation on TOC is analysed and its implementation is demonstrated. Extensive experiments show that the presented multiplication method is correct and feasible. It can accomplish multiplications in parallel with high efficiency, and no carry delay arises in the process of multiplication. It is very promising for mathematical and other applications that are based on multiplication.
