Architectural optimization of a digital optical multiplier by unknown
Cybernetics and Systems Analysis, Vol. 43, No. 5, 2007
NEW MEANS OF CYBERNETICS, INFORMATICS,
COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS
ARCHITECTURAL OPTIMIZATION
OF A DIGITAL OPTICAL MULTIPLIER
A. V. Anisimov and I. A. Zavadskyi UDC 681.325.57
A computational model of estimation of the time complexity of logical circuits constructed from
elements of an optical element base is investigated. A fast parallel multiplier is constructed.
Keywords: optical switch, switching element, multiplier, logical circuit, synchronous arithmetic.
In the past decade, the design of high-speed parallel computers has stimulated an active search for new physical principles that could
underlie future processors. Increasing attention is being given to light as an information carrier. On the one
hand, the development of nanophotonic technologies in the near future will make it possible to control individual photons as
bits in a quantum computer. On the other hand, the control of ultrashort laser pulses makes it possible to perform classical
computations at tens of gigahertz. A change in the basic physics of computers requires new approaches to the modeling of
their functioning. In Sec. 1 of this article, a model of computations is investigated that allows one to estimate the speed of
logical circuits with allowance made for the distinctive features of processing optical signals. In Sec. 2, this model is used in
designing a parallel optical multiplier and, in Sec. 3, its time and space complexity are estimated and compared with
well-known types of parallel multipliers, in particular, with the device proposed in [1].
These investigations were motivated by the results of current work on the creation and
improvement of miniature optical switches at many research centers in Japan, Western Europe, and the USA. Several
companies produce industrial models of such devices. Their characteristics allow one to create networks consisting of several
switches or several tens of sequentially connected switches without any additional equipment such as optical amplifiers. This
is amply sufficient for wide use of optical switches in telecommunications but insufficient for the construction of high-speed
logical circuits and low-power computers. Nevertheless, the creation of optical switches that can be used as an element base
of universal processors is the objective of several international programs. Taking into account the high rate of improvement of the
characteristics of optical switching elements during the past 5–7 years, the creation of an efficient digital optical processor
can be considered as a problem that can be solved in the next decade.
It may be noted that the most promising basic elements of optical logical circuits are Fabry–Perot microresonators
and Mach–Zehnder switches in photonic crystals. A detailed description of the operating principle and characteristics of these
devices can be found in [2] and [3], respectively, and a brief review of them is given in [1].
1. MODELING OF OPTICAL COMPUTERS
In contrast to the majority of investigations in the field of optical computations, the authors focus their attention not
on the optimization of physical parameters of an element base but on the construction of logical circuits in which the
distinctive features of optical switches are used with maximal efficiency. We consider erroneous the widespread approach to
the creation of optical logical circuits in which switches are used to build some collection of basic gates (for
example, disjunctions, conjunctions, negations, one-digit adders, etc.) and these gates then replace the corresponding
elements in well-known circuits constructed from the traditional transistor-resistor element base. This approach is
inefficient because the model of computations that is used for the estimation of the
complexity of traditional logical circuits does not allow one to adequately estimate time characteristics of optical circuits. To
prove this thesis, let us consider some key distinctive features of digital optical computations.
1060-0396/07/4305-0749 © 2007 Springer Science+Business Media, Inc.
Taras Shevchenko University, Kiev, Ukraine, mi@unicyb.kiev.ua; zava@ukr.net. Translated from Kibernetika i
Sistemnyi Analiz, No. 5, pp. 165–177, September–October 2007. Original article submitted May 14, 2007.
All the signals in computing circuits constructed from electronic logical elements are equivalent but, in optical
switches, one must distinguish between two types of signals, namely, control and information ones. These signals are usually
of different nature; for example, a control signal can be electric and an information signal can be optical, or both signals can
be light fluxes with different wavelengths. The interaction between control and information signals is based on a definite
physical process, for example, the pumping or relaxation of a resonator. The speed of this process and also the
speed of transformation of an information signal into a control one determine the switch time of an optical element (we
denote it by $t_{sw}$). The time of transfer of an information signal through an elementary optical device ($t_{trans}$) is almost the
same as the time of its transmission along a fragment of a waveguide that has the same length as the switch itself. The value
of $t_{trans}$ must also take into account the time the information signal needs to traverse the minimal possible distance between
switches. For the majority of modern optical switches, the ratio $t_{sw}/t_{trans}$ ranges from several units
to several hundreds. Thus, there is a potential possibility to accelerate computations by constructing logical circuits so that the
number of control signals that are computed sequentially is as small as possible.
An abstraction of a switch is a switching element (SE). It has no more than two information inputs and outputs and
also one control input. Depending on the value arriving at the control input, signals from some information inputs are
transmitted to definite information outputs. In [1] and [4], SEs with two information inputs and one information output
(Fig. 1a) were considered. For the unit control signal, the information output is connected with the input denoted by a single line
and, for the zero one, it is connected with the input denoted by a double line. In this case, the signal at the information input
that is not connected with the output is lost. Such losses can be avoided with the help of an SE with one information input
and two outputs (Fig. 1b). For the zero control signal, the input is connected with the output denoted by a double line and, for
the unit signal, it is connected with the output denoted by a single line. In this case, the signal is absent at the output that is
not connected to the input, i.e., this output is considered to be zero. In what follows, we will use precisely such SEs, which do
not lead to losses of information signals and, hence, allow one to construct circuits with considerably lower power
consumption. It is precisely an SE with one information input and two outputs that models a Mach–Zehnder switch.
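The two kinds of switching element described above can be written down as small truth functions. The following is only a sketch: the function names and the ordering of the returned outputs are illustrative assumptions, not notation from the article.

```python
from typing import Tuple


def se_two_in_one_out(control: int, single_in: int, double_in: int) -> int:
    """SE of Fig. 1a: two information inputs, one information output.

    A unit control signal selects the input drawn with a single line,
    a zero control signal selects the input drawn with a double line.
    The signal on the unselected input is lost.
    """
    return single_in if control == 1 else double_in


def se_one_in_two_out(control: int, info: int) -> Tuple[int, int]:
    """SE of Fig. 1b: one information input, two information outputs.

    Returns (single_line_output, double_line_output); this ordering is an
    assumption of the sketch. The output that is not connected to the
    input carries no signal and is treated as zero.
    """
    if control == 1:
        return info, 0
    return 0, info
```

With this model, the SE of Fig. 1b never discards a signal that is present at its input: the input value always appears on exactly one of the two outputs.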
Fig. 1. Switching element: (a) with two information inputs and one information output; (b) with one information input and two information outputs.
Fig. 2. A single-clock switch circuit.
Fig. 3. Switching circuit whose time complexity depends on the value of $t_{sw}/t_{trans}$.
If we connect information inputs of some SEs to information outputs of other SEs, then a circuit will be constructed
in which all the SEs are simultaneously switched, i.e., a single-clock switch circuit. The total operation time of such a circuit
equals $t_{sw} + L t_{trans}$, where L is the number of SEs in the longest path connecting an arbitrary input with an arbitrary output.
For example, the operation time of the circuit presented in Fig. 2 equals $t_{sw} + 2t_{trans}$. This principle of determination of time
complexity is easily generalized to the case when a computer consists of several single-clock circuits connected so that any
of them can begin its operation only after the completion of operation of all the circuits on which it depends. Devices of this
type that are designed for the execution of the most important arithmetic operations are considered in [1]. In many cases, their
time complexity determined as above is substantially less than the time complexity of devices performing the same operations
with the highest speed from the viewpoint of the classical approach according to which the operation time of a circuit is
determined by the largest number of elements along the path from its inputs to outputs.
However, the principle described above does not allow one to adequately estimate time complexities of switching
circuits in the general case since, depending on the value of $t_{sw}/t_{trans}$, the operation time of the same circuit can be
determined by different factors. For example, let us consider the circuit presented in Fig. 3. If we assume that signals
simultaneously arrive at all the inputs of the circuit at the moment of time 0, then SE 2 switches at the moment of time
$2t_{sw} + t_{trans}$ and its information input is computed at the moment $t_{sw} + 4t_{trans}$. Hence, if we have $t_{sw} > 3t_{trans}$, then the
operation time of the circuit equals $2t_{sw} + 2t_{trans}$, and if we have $t_{trans} \le t_{sw} \le 3t_{trans}$, then its time complexity amounts to
$t_{sw} + 5t_{trans}$.
Let us consider the formula that specifies the operation time $T(s)$ of a switching circuit s and allows one to solve the
problem described above:
$$T(s) = \max_{\pi \in V_s} \bigl( sw(\pi)\, t_{sw} + l(\pi)\, t_{trans} \bigr). \qquad (1)$$
Here, $V_s$ is the set of all the paths from the inputs of the circuit s to its outputs, $\pi$ is some path, i.e., a sequence of
adjacent SEs that can be connected by information and control signals, $sw(\pi)$ is the number of control signals along the path
$\pi$, and $l(\pi)$ is the total number of SEs along the path $\pi$.
As a rule, the better the modeling means being used, the more perfect the devices constructed with their help. In
particular, striving to minimize the time complexity computed by formula (1), we will construct (in Sec. 2) a parallel optical
multiplier whose size is considerably smaller and architecture is simpler than those of the circuit of multiplication of
multidigit numbers from [1].
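The estimate given by formula (1) can be tried out numerically. The sketch below assumes that every path is summarized by a hypothetical pair (sw, l): the number of control signals along the path and the number of SEs along it. The two pairs used for the circuit of Fig. 3 are inferred from the two operation times quoted in the text, $2t_{sw} + 2t_{trans}$ and $t_{sw} + 5t_{trans}$; they are not read off the figure itself.

```python
from typing import Iterable, Tuple


def operation_time(paths: Iterable[Tuple[int, int]],
                   t_sw: float, t_trans: float) -> float:
    """Maximum over all paths of sw * t_sw + l * t_trans.

    Each path is given as (sw, l): the number of control signals that are
    computed sequentially along it and the total number of SEs on it.
    """
    return max(sw * t_sw + l * t_trans for sw, l in paths)


# Assumed path summary for the circuit of Fig. 3 (see the lead-in).
FIG3_PATHS = [(2, 2), (1, 5)]

if __name__ == "__main__":
    for t_sw in (2.0, 3.0, 4.0):
        print(t_sw, operation_time(FIG3_PATHS, t_sw, 1.0))
```

For $t_{sw} > 3t_{trans}$ the path with two sequential switchings dominates, and for smaller ratios the longer path with a single switching dominates, which is the dependence on $t_{sw}/t_{trans}$ discussed for Fig. 3.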
Before describing the structure of the multiplier, we will pay attention to one more distinctive feature of digital optical
computations. In addition to active elements such as switches, logical circuits also contain passive elements that do not
change their states as a result of definite physical processes. They include optical waveguides, waveguide bends, waveguide
branchings, etc. In our case, it is especially important that optical connectors realizing the logical function OR are also
passive elements. In particular, in [5], the Y-shaped connection of waveguides in a photonic crystal (Fig. 4) is considered and it
is shown that, when a defect of special form is created at the junction point of the waveguides, the optical energy is
transmitted in the directions $a_1 \to b$ or $a_2 \to b$ rather than in the directions $a_1 \to a_2$ or $a_2 \to a_1$. Other passive devices for
connection of waveguides, for example, a directional coupler [6, 7], are also widely used. Thus, the execution of the
operation OR in optical computers does not involve a physical process that takes some time; it is performed as a
result of a special connection of channels along which optical signals are transmitted. This distinctive feature is used in Sec.
2, where the logical OR is denoted by a filled triangle and is not taken into account in determining the time complexity of
circuits. It is worth noting that the multiplying circuit considered in Sec. 2 can also be realized without OR connectors if we
use SEs with two information inputs and one output.
Fig. 4. Y-connection of waveguides that realizes the operation OR in a photonic crystal.
2. OPTICAL MULTIPLIER
The circuit that is described below and realizes the multiplication operation is an improved version of the circuit proposed in [1].
Both these circuits belong to the class of multilevel matrix multipliers. A matrix of partial
products is applied to the input of the first level of such a multiplier. At the output of the ith level, we obtain $k_i < k_{i-1}$ numbers
whose sum is equal to the sought-for product. In particular, at the output of the last level, we obtain two numbers that should
be added together.
Multilevel matrix multipliers differ in the method of translating the sum of $k_{i-1}$ numbers
(a $k_{i-1}$-row code) into the sum of $k_i$ numbers (a $k_i$-row code). In particular, in [1], the matrix of addends is partitioned into
cells and the circuit that is described in detail in [4] and is designed for the transformation of multirow codes into one-row ones
is applied to each of these cells (hereafter, we call the multiplier from [1] cellular). An important advantage of the circuit from
[4] is its two-clock switch time, but it also has two essential drawbacks. First, the circuit from [4] is rather long: if it is
applied to a cell of size $a \times b$, then the length of the path over which an information signal is transmitted is proportional to
ab. To decrease this length, in a cellular multiplier the circuit from [4] is applied not to the entire matrix of partial
products but to its fragments. Second, the size of the circuit from [4] leaves something to be desired since, for the
summation of a b-bit numbers, it is proportional to $a^2 b$. We will describe a circuit that makes it possible to perform a
similar operation with the help of $O(ab)$ SEs. It is the parallel application of such circuits to fragments of the matrix of addends
that forms the transformation performed at each level or at some levels of the multiplier described in this work.
2.1. A Circuit Translating a Multirow Code into a One-Row One. In constructing the circuit described below, as
well as in designing the circuit from [4], we proceeded from the following fact: the ith bit of the sum is determined by the
parity of the number of unities in the ith column of the matrix of addends with allowance made for all the carries arriving
from the bits to the right. However, the method of determination of the parity of the number of unities in a column and the method
of transferring carries to the columns on the left are different in the mentioned circuits.
In the circuit from [4], for calculation of the number of unities, a one-cycle decoder translating an arbitrary binary
vector into a vector of the form 0...01...1 with the same number of unities was used. Then the $2j$th output of the decoder
was used as the jth information input of the decoder that processes the next column. Thus, a carry was realized, i.e., instead
of each pair of unities, one unity was “written” in the adjacent column to the left. The decoder has quadratic space
complexity with respect to the length of the vector being decoded and the bits of the vector are its control signals. Since
decoders are connected only by information inputs/outputs, the entire circuit from [4] is switched during one clock cycle.
The parity of the number of unities in a vector can also be found with the help of a linear-size circuit, and we will
replace the decoder by such a circuit. The same circuit will realize the transfer of carries to the adjacent column to
the left. It may be noted that a linear circuit has no information channels in which pairs of unities would be “accumulated.”
They are also absent in the similar circuit that processes the column to the left and, hence, the only way to
introduce carries into the circuit that processes the column to the left is to use them as some of its control inputs. The
structure of a circuit that processes a column and also the technique of connection of circuits corresponding to adjacent
columns are shown in Fig. 5.
Fig. 5. Determination of the parity of the number of unities in a column and the carry to its adjacent column.
Fig. 6. Circuit for translation of multirow codes into one-row ones.
Let us consider the principle of operation of the circuit presented in Fig. 5. Each pair of SEs located at one horizontal
level is controlled by one signal. We note that the bits 01 or 10 arrive at the inputs of any pair of SEs controlled by the ijth
signal $x_{ij}$, and the choice of a concrete pair of bits is determined by the parity of the number of unities in the vector
$x_{1j}, \ldots, x_{i-1,j}$. This can be easily proved by induction. Let us consider the SEs that are denoted by the numbers 1–4 in
Fig. 5 [...]. If we have $x_{11} = 1$, [...], but if we have $x_{11} = 0$, [...]. We now [...].
Thus, if we have $x_{ij} = 0$, then the same signals arrive at the inputs of the $(i+1)$th pair
of SEs as at the inputs of the ith pair, and if we have $x_{ij} = 1$, then the pair of signals 01 is replaced by 10 and vice versa. This
implies that if the vector $x_{1j}, \ldots, x_{i-1,j}$ contains an odd number of unities, then 10 arrives at the inputs of the pair of SEs
controlled by the ijth signal $x_{ij}$ and 01 arrives otherwise, i.e., the zero output of the last left element connected by an
OR connector with the unit output of the last right element equals 1 exactly when the number of unities in the
corresponding column of the matrix of addends is odd.
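The parity argument above can be checked with a small state-level model. The sketch below does not reproduce the wiring of Fig. 5; it only tracks the pair of bits (01 or 10) that travels from one pair of SEs to the next, and it assumes that the pair 01 is fed to the first pair of SEs. The function names are illustrative.

```python
from typing import Sequence, Tuple


def pair_after_column(bits: Sequence[int]) -> Tuple[int, int]:
    """Pair of signals that leaves the chain after all control bits.

    A zero control bit passes the pair through unchanged; a unit control
    bit replaces 01 by 10 and vice versa.
    """
    pair = (0, 1)  # assumed initial pair: no unities seen yet
    for x in bits:
        if x == 1:
            pair = (pair[1], pair[0])
    return pair


def column_parity(bits: Sequence[int]) -> int:
    """1 if the column contains an odd number of unities, otherwise 0."""
    return pair_after_column(bits)[0]
```

In this model the pair 10 appears exactly when an odd number of unit control bits has been passed, so the first component of the final pair is the parity bit of the column.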
Let us consider the process of transfer of a carry to the left column. Again, we assume that SEs 1 and 2 in Fig. 5 form
an arbitrary pair of SEs that is controlled by the ijth signal $x_{ij}$. A carry signal must be generated in the case when $x_{ij}$ completes
the next pair of unities in the vector $x_{1j}, \ldots, x_{ij}$, i.e., if we have $In_1 = 1$ and $x_{ij} = 1$. In order that this signal influence the parity
of the number of unities in the column to the left, it should be transformed into a control signal for a pair of SEs in the
circuit that processes this column (see Fig. 5). It may be noted that carry signals cannot be generated by two adjacent bits of
the vector $x_{1j}, \ldots, x_{ij}$ and, hence, the carry generated by SE 1 is connected by an OR connector with the carry generated by
SE 3. We will also pay attention to the fact that the carry signals from the right column are processed only after processing
all the bits of the left column. Though, from a logical viewpoint, carry signals can be processed at any moment, for example,
before the processing of the bits of a column or alternately with such processing, it is precisely processing them last,
in the order in which they are generated by the right column, that makes it possible to optimize the time characteristics
of the circuit, which will be shown below. The technique of connection of circuits processing several columns of a matrix of
addends is represented in Fig. 6. We call the circuit obtained as a result of this connection basic.
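The carry mechanism described above can be simulated at the logical level. This is a sketch under stated assumptions, not the circuit of Fig. 6 itself: a carry is raised whenever a unit control bit arrives while the running parity is 1, carries of adjacent positions are merged by OR in pairs, and the merged carries are appended after the own bits of the next column. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def process_column(controls: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (parity bit, carry control signals for the column to the left)."""
    parity = 0
    raw_carries = []
    for x in controls:
        raw_carries.append(parity & x)  # x completes a pair of unities
        parity ^= x
    # Two adjacent positions never both raise a carry, so OR-ing them in
    # pairs halves the number of signals without losing any carry.
    carries = [
        raw_carries[k] | (raw_carries[k + 1] if k + 1 < len(raw_carries) else 0)
        for k in range(0, len(raw_carries), 2)
    ]
    return parity, carries


def add_columns(columns: Sequence[Sequence[int]]) -> List[int]:
    """Sum bits (least significant first) for a matrix given by columns."""
    sum_bits: List[int] = []
    carries: List[int] = []
    j = 0
    while j < len(columns) or any(carries):
        own_bits = list(columns[j]) if j < len(columns) else []
        # Own bits first, then the carries from the right column.
        parity, carries = process_column(own_bits + carries)
        sum_bits.append(parity)
        j += 1
    return sum_bits


def to_columns(numbers: Sequence[int], width: int) -> List[List[int]]:
    """Column j holds bit j of every addend."""
    return [[(v >> j) & 1 for v in numbers] for j in range(width)]


def to_int(bits: Sequence[int]) -> int:
    return sum(bit << j for j, bit in enumerate(bits))
```

For example, `to_int(add_columns(to_columns([5, 7, 3, 6], 3)))` reproduces the ordinary sum of the four addends. The halving of the carry vector in `process_column` mirrors the statement that two control signals of one column correspond to one control signal in the column to the left.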
Here, $S_1$, $S_2$, and $S_3$ are circuit fragments processing the bits of the matrix of addends, $P_2$–$P_5$ are fragments
processing carries, and W is the circuit that processes the high-order bits of the sum.
The structure of the circuit W is presented in Fig. 7. In this circuit, SEs form a decoder with a vector of the form
$(\delta_k, \ldots, \delta_1) = (0 \ldots 010 \ldots 0)$ at its output. In this vector, unity is located at a position w if the outputs of the last circuit $P_j$
contain w unities. If the outputs of the last circuit $P_j$ contain no unity, then we obtain the zero vector at the output of the
decoder. Given a vector $(\delta_k, \ldots, \delta_1)$, we can easily determine the number $w = w_m \ldots w_0$ with the help of multibit OR
circuits $w_0 = \delta_1 \vee \delta_3 \vee \delta_5 \vee \ldots$, $w_1 = \delta_2 \vee \delta_3 \vee \delta_6 \vee \delta_7 \vee \ldots$, $w_2 = \delta_4 \vee \delta_5 \vee \delta_6 \vee \delta_7 \vee \delta_{12} \vee \ldots$, etc.
These circuits are realized with the help of two-input OR circuits connected in the form of a treelike or linear structure and do
not require any time for switching.
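The OR circuits that turn the decoder output into a binary number can be illustrated as follows. In this sketch the decoder output is a list in which at most one entry equals 1; the list layout and the names are assumptions made for the illustration.

```python
from typing import List, Sequence


def encode_count(one_hot: Sequence[int]) -> List[int]:
    """Binary digits (least significant first) of the marked position.

    one_hot[p - 1] is the decoder output for position p = 1..k. Binary
    digit number `bit` is the OR of all positions p whose own binary
    representation has that digit set, e.g. digit 0 collects positions
    1, 3, 5, ... and digit 1 collects positions 2, 3, 6, 7, ...
    """
    k = len(one_hot)
    digits = []
    for bit in range(k.bit_length()):
        acc = 0
        for p in range(1, k + 1):
            if (p >> bit) & 1:
                acc |= one_hot[p - 1]
        digits.append(acc)
    return digits
```

Since only OR operations are involved, this step corresponds to passive connections and adds no switching time in the model of Sec. 1.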
We denote by $SP_i$ the entire circuit that processes the ith column and consists of fragments $S_i$ and $P_i$. All these
circuits are of the form that is presented in Fig. 5 and that corresponds to the rectangle enclosed by a dotted line in Fig. 6.
Note that the processing of carries from the leftmost column, in addition to the fragments $P_i$ located above the fragments $S_i$,
requires some additional circuits $P_j$; their number does not exceed the number of fragments $S_i$ and its exact value will be
determined below. We will also show that fragments $P_i$ at most double the size of the circuits that process columns. This
means that the addition of a b-bit numbers requires $O(ab)$ SEs.
In contrast to the circuit from [4] that switches in two clock cycles, the switch time of the circuit being considered,
measured in clock cycles, exceeds the number of columns, which is the price paid for the decrease in size, but the loss can be
reduced to zero by varying the lengths of columns. The idea consists of providing the condition under which an information signal does not wait
for the corresponding control signal but, during the delay caused by switching, passes along the chain of SEs for which
control signals are already computed. In other words, the lengths of circuits $S_i$ should be selected so that they all
simultaneously come into operation and, at the same time, the information signal arrives at the last pair of SEs in a
fragment $P_i$ exactly at the moment of their switching.
Let $L_{sw}$ be the number of SEs in circuits $S_i$ or $P_i$ through which the information signal passes during the switching of one
SE. $L_{sw}$ approximately equals $t_{sw}/t_{trans}$ if $t_{trans}$ and $t_{sw}$ take into account the time of transformation of an information
signal into a control one, the time of passage of a signal along the branches of optical waveguides between switches, and other
“overheads.” $LS_i$, $LP_i$, and $LSP_i$ denote the lengths of the fragments $S_i$ and $P_i$ and of the entire circuit processing the ith column,
respectively, i.e., we have $LSP_i = LS_i + LP_i$.
Let us show that, to meet the above conditions of absence of “idle time” in the circuit, it is necessary and sufficient
that the following equality be true when $i > 1$:
$$LS_i = (LS_1 + iL_{sw})/2. \qquad (2)$$
By the construction of the basic circuit, the last SE in each column is switched a time $t_{sw}$ after the switching of
the last SE in the previous column. During this time, the information signal (IS) in each circuit $SP_i$ in which it has not yet
reached the end has time to pass through $L_{sw}$ SEs. Moreover, the IS in each column is propagated during the same time as the
IS in the first column, i.e., any IS passes through $LS_1$ SEs even before the switching of the last SE in the first column. Thus,
we have the relationship
$$LSP_i = LS_1 + (i - 1)L_{sw}. \qquad (3)$$
On the other hand, taking into account that each two control signals in a circuit $SP_{i-1}$ correspond to one control
signal in the fragment $P_i$, we obtain the relationship
$$LP_i = LSP_{i-1}/2. \qquad (4)$$
Relationships (3) and (4) imply equality (2) since we have
$$LS_i = LSP_i - LP_i = LS_1 + (i-1)L_{sw} - LSP_{i-1}/2 = LS_1 + (i-1)L_{sw} - (LS_1 + (i-2)L_{sw})/2 = (LS_1 + iL_{sw})/2.$$
In particular, as is obvious from relations (2)–(4), we have
$LP_i < LS_i$ and, hence, the circuit size linearly depends on the number of bits in the matrix of addends.
Assuming that a circuit contains b columns $S_i$ that process n bits in all and c columns $P_j$ that do not belong to $SP_i$ and
that its rightmost column processes $a = LS_1$ bits, we determine the relationship between the parameters a, b, c, and n. The
parameter b of the basic circuit is expressed through the parameter a and the circuit length n, namely, b is the largest integer
that satisfies inequality (5) that follows from relationship (2),
$$a + \sum_{i=2}^{b} \frac{a + iL_{sw}}{2} \le n. \qquad (5)$$
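The choice of b can be illustrated numerically. The sketch below assumes that inequality (5) states that the total length of the fragments $S_1, \ldots, S_b$ does not exceed n, with $LS_1 = a$ and $LS_i = (a + iL_{sw})/2$ for $i > 1$ taken from relationship (2). This reading is an assumption of the sketch, and the function names are hypothetical.

```python
from fractions import Fraction


def total_s_length(a: int, b: int, l_sw: int) -> Fraction:
    """Total number of bits processed by the fragments S_1..S_b (b >= 1)."""
    return a + sum(Fraction(a + i * l_sw, 2) for i in range(2, b + 1))


def largest_b(a: int, n: int, l_sw: int) -> int:
    """Largest b whose fragments S_1..S_b together process at most n bits.

    Returns 0 if even the first column does not fit.
    """
    b = 0
    while total_s_length(a, b + 1, l_sw) <= n:
        b += 1
    return b


if __name__ == "__main__":
    # Illustrative values only: a = 0 and L_sw = 240.
    print(largest_b(a=0, n=1000, l_sw=240))
```

With a = 0 and $L_{sw} = 240$, the cumulative lengths under this reading are 0, 240, 600, 1080, ..., so n = 1080 is the smallest n that admits b = 4.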
Fig. 7. The circuit W that processes high-order bits.
The parameter a should be selected so that, for the integer b corresponding to it, relationship (5) is as close as possible to
equality. This makes it possible to avoid time losses connected with the delay of the beginning of propagation of ISs through
the circuit $S_b$ if its length is too small. Among all the values of a that satisfy this condition, the smallest value should be
chosen. This becomes obvious if we note that the operation time of the entire cascade of circuits $SP_i$ equals $t_{trans} \cdot LSP_b + t_{sw}$,
i.e., it decreases with decreasing $LSP_b$. To a smaller a corresponds a larger b and, hence, a smaller length of the circuit $SP_b$.
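The operation time of the cascade can be evaluated directly. The sketch below reads the cascade time as one switching time plus the transfer time through the $LSP_b$ SEs of the longest column, with $LSP_b = a + (b-1)L_{sw}$ taken from relationship (3). The parameter values in the example are illustrative and are not taken from Table 1.

```python
def cascade_time(a: int, b: int, l_sw: float, t_sw: float, t_trans: float) -> float:
    """Operation time of the cascade SP_1..SP_b: t_sw + LSP_b * t_trans."""
    lsp_b = a + (b - 1) * l_sw  # relationship (3) for the last column
    return t_sw + lsp_b * t_trans


if __name__ == "__main__":
    # Illustrative values with L_sw equal to t_sw / t_trans exactly.
    print(cascade_time(a=4, b=5, l_sw=16, t_sw=8.0, t_trans=0.5))
```

When $L_{sw}$ equals $t_{sw}/t_{trans}$ exactly, the expression simplifies to $b\,t_{sw} + a\,t_{trans}$: each additional column costs one switching time and only the first column contributes transfer time.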
We will also investigate the following question: what is the expedient number of circuits $P_j$ that do not belong to $SP_j$,
i.e., we estimate the value of the parameter c. These circuits cannot begin to operate simultaneously, and since the length of each
next circuit $P_j$ is half the length of the previous one, $P_j$ must begin to operate at the moment that allows it to complete its
operation a time $t_{sw}$ after the completion of operation of $P_{j-1}$. Therefore, the computation of all the bits of the sum
with the help of the cascade of circuits $P_j$ is inefficient from the viewpoint of time complexity. Instead, when the length of
the vector of carries becomes sufficiently small, we will use one circuit W after the cascade consisting of c circuits $P_j$ for
computation of the remaining bits of the sum. In determining the value of c, we proceed from the fact that the size of the
circuit W should not “distort” the general estimate of space complexity. This size amounts to $(LP_c/2)^2/2 = LP_c^2/8$ SEs.
But if these $LP_c$ bits are further processed by circuits $P_j$, then we will need $2(LP_c/2 + LP_c/4 + \ldots + 1) = 2LP_c - 2$
SEs. Thus, as soon as the inequality $LP_c^2/8 < 2LP_c - 2$ is fulfilled, which is equivalent to inequality (6) since $LP_c$ is an integer,
the use of circuits $P_{c+1}, P_{c+2}, \ldots$ loses any meaning at all and their cascade should be replaced by one circuit W,
$$LP_c < 15. \qquad (6)$$
The use of the circuit W in the case when we have $LP_c \ge 15$ will reduce the operation time of the basic circuit at the cost of
an increase in its size.
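The threshold in inequality (6) can be checked by comparing the two size estimates for integer lengths. The sketch below takes the size of W as $LP_c^2/8$ and the size of the continued cascade as $2LP_c - 2$; the function names are illustrative, and the check is restricted to lengths of at least 2, an assumption made here because a carry vector of length 1 needs no further processing.

```python
def w_size(lp_c: int) -> float:
    """Number of SEs in the circuit W applied to LP_c carry bits."""
    return lp_c ** 2 / 8


def cascade_size(lp_c: int) -> int:
    """Number of SEs if the LP_c bits are processed by further circuits P_j."""
    return 2 * lp_c - 2


def w_is_smaller(lp_c: int) -> bool:
    """True when replacing the rest of the cascade by W does not cost size."""
    return w_size(lp_c) < cascade_size(lp_c)


if __name__ == "__main__":
    for lp_c in (8, 14, 15, 16):
        print(lp_c, w_size(lp_c), cascade_size(lp_c), w_is_smaller(lp_c))
```

For integer lengths from 2 upward, the comparison flips between 14 and 15, in agreement with inequality (6).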
We estimate the size and total operation time of the basic circuit. If we assume that the circuit W is absent and that all
carries are processed by a cascade of circuits $P_j$, then a sufficiently exact upper bound of the circuit size can be easily
obtained. To each of n inputs of the circuit correspond two SEs. Moreover, each pair of bits generates its carry bit, each pair
of carry bits generates one more such bit, etc., i.e., no more than $(n/2 + n/4 + \ldots + 1) < n$ carry bits are generated in the
aggregate and each of them is processed by two SEs. In total, we obtain that the circuit size is no more than 4n SEs. When
$LP_c \ge 15$, this estimate can be larger by at most $LP_c^2/8 - 2LP_c + 2$. But if we have $LP_c < 15$, then the mentioned
value will be negative and it also should be added to 4n. Since we have $LP_c = LSP_b/2^c = (a + (b-1)L_{sw})/2^c$, we obtain
the following formula for the upper estimate of the basic circuit size:
$$4n + LP_c^2/8 - 2LP_c + 2, \quad \text{where } LP_c = (a + (b-1)L_{sw})/2^c. \qquad (7)$$
The computation of the total operation time of the basic circuit is sufficiently easy, namely, all the SEs in the circuits
$S_1$–$S_b$ are first switched in parallel, then the IS is propagated through these circuits and, when it reaches the end of the
circuit $S_1$, the last SEs of circuits $P_1$–$P_{b+c}$ are sequentially switched without delays and the last SEs in each column of the
circuit W are switched after them. The time of transfer of the information signal through the circuits $SP_2$–$SP_b$ and $P_1$–$P_c$
can be neglected since the IS is transferred during switchings, but one should take into account the time of transfer of the IS
through [...]. As a result, we obtain the following formula for the
determination of the total operation time of the basic circuit:
$$(b + c)\,t_{sw} + \bigl(a + (a + (b-1)L_{sw})/2^{c}\bigr)\,t_{trans}. \qquad (8)$$
Table 1 presents the characteristics of the basic circuit constructed from SEs realized with the help of Fabry–Perot
microresonators ($t_{sw} = 8$ ps, $t_{trans} = 0.033$ ps, and $L_{sw} = 240$) for different values of n that were chosen so that the parameter a
was equal to 0. In this case, the value of b was determined as the largest integer satisfying inequality (5) and, hence, the
values of n presented in the table are the least n for given values of b. After fixing the values of a, b, and n, the parameter c
was varied from the largest value satisfying inequality (6) to a value that was less by 1–2, which determined various ratios
between time and space complexities. For circuits realized with the help of another element base, similar calculations can be
performed.
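A calculation of this kind can be sketched for the size estimate. The code below evaluates the upper bound $4n + LP_c^2/8 - 2LP_c + 2$ with $LP_c = (a + (b-1)L_{sw})/2^c$, using a = 0 and $L_{sw} = 240$ as in the text. The values of n, b, and c in the example are illustrative assumptions and are not entries of Table 1.

```python
def lp_c(a: int, b: int, c: int, l_sw: int) -> float:
    """Length of the carry vector after c additional circuits P_j."""
    return (a + (b - 1) * l_sw) / 2 ** c


def size_upper_bound(n: int, a: int, b: int, c: int, l_sw: int) -> float:
    """Upper estimate of the number of SEs in the basic circuit."""
    lp = lp_c(a, b, c, l_sw)
    return 4 * n + lp ** 2 / 8 - 2 * lp + 2


if __name__ == "__main__":
    L_SW = 240  # Fabry-Perot microresonators, as stated in the text
    for c in (5, 6, 7):
        print(c, lp_c(0, 5, c, L_SW), size_upper_bound(1680, 0, 5, c, L_SW))
```

Decreasing c by one doubles $LP_c$, so the quadratic term of W grows quickly; this is the trade-off between time and space mentioned above.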
2.2. Multiplying Circuit. The matrices of addends that are located at each level of a multilevel matrix multiplier
should be completely covered by the inputs of the circuits that translate multirow codes into one-row ones. Note that, though
the input columns of each of such circuits can be arbitrarily formed from bits of the corresponding columns of the matrix of
addends, from the viewpoint of decreasing the length of interelement connections, the arrangement of inputs in the form
of a staircase such as that presented in Fig. 6 is most convenient. In this case, the entire matrix of partial products at the first
level of the multiplier can be covered by the inputs of the circuits of translation of multirow codes as is shown in Fig. 8,
where i denotes the inputs of the ith circuit. As a rule, this matrix is represented in the form of a parallelogram but it can also
be represented in the form of a triangle by arranging the contacts in the left part of the parallelogram in a different way.
Since the size of the basic circuit linearly depends on the number of bits processed by it, a partitioning of the matrix of
addends into horizontal strips each of which is processed by a collection of basic circuits will not decrease the total size.
Calculations have shown that such a partitioning does not substantially reduce the operation time of each basic circuit; therefore, we
will assume that the inputs of these circuits cover the largest possible number of bits from the upper end of the matrix to its lower
end. Gaps remain at the corners of the triangle of the matrix of addends, but they can be eliminated by a proper selection
of the value for the parameter a of the corresponding basic circuits and also by replacing condition (2) by the inequality
$LS_i \le (LS_1 + iL_{sw})/2$. The only requirement imposed on the circuits that cover bits at the corners is that their operation time must
not exceed the operation time of the largest basic circuit denoted by the number 2 in Fig. 8.
At the first level of the multiplier of n-bit numbers, the largest basic circuit will have n inputs. Its operation time is
specified by formula (8) and is equal to the time of processing the first level. As is easily seen, the word length of the number
into which n input bits are translated by the largest basic circuit is more than the word length of the number $LSP_b$ by $b - 1$,
i.e., amounts to $b - 1 + \lceil \log LSP_b \rceil = b - 1 + \lceil \log(a + (b-1)L_{sw}) \rceil$ bits. As is obvious, this value can be the upper bound of
the number of rows of the code into which an n-row matrix of partial products is translated by the circuit being considered.
Computations showed that, for n within 10 thousand and for the values of $L_{sw} \ge 4$ (this inequality is fulfilled for all the
considered types of element bases), this value does not exceed several tens and it is precisely the number of addends arriving
at the second level of the multiplier. They can be processed with the help of the circuit considered in this work or with the
help of the circuit from [1]. Against the background of the total space complexity, the size of the circuit from [1] applied to
such a small number of addends will not be considerably larger than the size of the circuit that is described in this work and
realizes computations at the second level of the multiplier. However, the circuit from [1] can turn out to be faster and, hence,
it makes sense to construct a combined multiplier that performs computations according to the scheme described in this work
only at the first or at the first and second levels whose outputs become inputs of the circuit from [1].
Let us estimate the size of the circuit located at the first level of the multiplier. Since the lengths of the circuits that
translate a multirow code into a one-row one are different and the values of the parameters a, b, and c also vary, formula (7)
is difficult to use for estimating the total size of the first level of the multiplier. A sufficiently exact upper
bound of this size can be obtained by the following approach, similar to that used in estimating the size of the basic circuit:
two SEs correspond to each bit of the matrix of partial products, and the total number of such bits in multiplying two
n-bit numbers equals $n^2$, which yields $2n^2$ SEs. Moreover, each pair of bits generates its carry bit, each pair of
carry bits generates one more carry bit, etc., i.e., no more than
$n^2/2 + n^2/4 + \ldots + 1$ carry bits are produced in total, each bit being processed by two SEs. In total, we obtain that the size of the circuit
is no larger than $4n^2$. We note that the size of the two-level multiplier from [1] is asymptotically larger since it is equal to
TABLE 1

Fig. 8. Configuration of inputs of
SEs, and the multiplier considered in this article, when constructed from Fabry–Perot
microresonators for values of n up to 10 thousand, will also be two-level.
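
The counting argument behind the bound on the first-level size can be checked numerically. The following is only a minimal sketch, not the authors' procedure: the function name `first_level_size_bound` is hypothetical, and integer halving is an assumption standing in for "each pair of bits generates a carry bit."

```python
def first_level_size_bound(n: int) -> int:
    """Estimate the number of SEs at the first level for two n-bit factors."""
    product_bits = n * n           # bits in the matrix of partial products
    ses = 2 * product_bits         # two SEs per partial-product bit
    carries = product_bits // 2    # each pair of bits yields one carry bit
    while carries >= 1:
        ses += 2 * carries         # two SEs per carry bit
        carries //= 2              # each pair of carries yields one more carry
    return ses


for n in (64, 400, 1024):
    bound = first_level_size_bound(n)
    # The count never exceeds four SEs per partial-product bit.
    assert bound <= 4 * n * n
    print(n, bound, 4 * n * n)
```

For every n tried, the count stays just below four SEs per bit of the matrix of partial products, which is the bound stated in the text.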
Let us make some comments on the value of the parameter c in basic circuits. As has been noted in item 2.1, it
makes no sense to assign this parameter a value larger than the least number for which the inequality $LP_c \le 15$ holds. If
this inequality is true, then $4n^2$ remains the upper bound of the circuit size. However, for longer circuits, i.e., those
located closer to the center of the matrix of addends, it makes sense to decrease the value of this parameter since it is
precisely these circuits that determine the time complexity of a level, and the operation time of a basic circuit decreases
as c decreases. Therefore, in designing basic circuits for concrete values of n and $L_{sw}$, in the cases when this
essentially improves the time complexity of the largest basic circuit, we choose c smaller by 1–3 than the
least number satisfying the inequality $LP_c \le 15$. In these cases, the time complexity of several basic circuits located at the
center of the matrix of addends should be reduced to the time complexity of the largest basic circuit by selecting their
parameters c. This approach leads to an excess over the space complexity $4n^2$ that should be taken into account. We assume
that the largest basic circuit increases its size by $\Delta$ owing to the additional decrease in the parameter c. Then we consider that
all the basic circuits in which the value of the parameter b equals its value in the largest basic circuit
also increase their sizes by $\Delta$. In the circuits whose parameter b is smaller by unity, the parameter c can be closer by one to the
value satisfying the inequality $LP_c \le 15$ than the parameter c in the largest basic circuit. We assume that all such circuits also
increase their sizes by the same value, which, like the number of circuits, can be determined from Table 1 or from a
similar table for another element base. For example, if n = 400 and circuits constructed from Fabry–Perot
microresonators are considered, then, for the largest basic circuit, we have a = 20, b = 3, and the least value of c satisfying the
inequality $LP_c \le 15$ equals 5. If we put c = 3, then $LP_c = 62$, the excess of size of the largest basic circuit amounts to
$LP_c^2/8 - 2LP_c + 2 \approx 359$ SEs, and, taking into account the contents of Table 1, the total excess of size can be computed as
follows: $2 \cdot (400 - 360) \cdot 359 + 2 \cdot (360 - 120) \cdot 54 = 54640$ SEs. The described approach allows one to obtain an upper bound
of the space complexity of the circuit.
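
The arithmetic of the n = 400 example can be retraced with a short sketch. Both expressions are assumed readings of the example: the excess of the largest basic circuit is taken as LP_c^2/8 - 2·LP_c + 2 rounded up, and the total as 2·(400 - 360)·359 + 2·(360 - 120)·54. The function and argument names are hypothetical; the values 360, 120, and 54 are the ones the example takes from Table 1.

```python
import math


def excess_of_largest_circuit(lp_c: int) -> int:
    # Assumed form of the size excess: LP_c^2/8 - 2*LP_c + 2, rounded up to whole SEs.
    return math.ceil(lp_c ** 2 / 8 - 2 * lp_c + 2)


def total_excess(n: int, len_hi: int, len_lo: int, excess_hi: int, excess_lo: int) -> int:
    # Assumed reading: circuits with lengths in (len_hi, n] carry excess_hi each,
    # circuits with lengths in (len_lo, len_hi] carry excess_lo each, and the
    # leading factor 2 is kept exactly as it appears in the example.
    return 2 * (n - len_hi) * excess_hi + 2 * (len_hi - len_lo) * excess_lo


delta = excess_of_largest_circuit(62)
print(delta)                                   # excess of the largest basic circuit
print(total_excess(400, 360, 120, delta, 54))  # total excess in SEs
```

Under this reading the excess of the largest circuit is 359 SEs and the two terms sum to 54640 SEs.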
3. COMPARATIVE ANALYSIS OF TIME
AND SPACE COMPLEXITY OF MULTIPLIERS
Table 2 presents the time and space characteristics of different circuits that translate multirow codes into two-row codes
for SEs constructed from Fabry–Perot microresonators. Using the methods described above, the parameters a and b of all
basic circuits are uniquely determined from the word length n of the factors. The only parameter that can be varied and
that determines the parameters c of the other basic circuits is the parameter c of the largest basic circuit. Let us select
“reasonable” values of this parameter that decrease the operation time of the largest basic circuit by
10–20% while increasing the circuit size by about the same percentage. A further decrease in c progressively decreases the operation
time while increasing the circuit size.
When $n \le 1024$, after using the circuit described in this work at the first level, the number of addends at the second
level does not exceed 12. The circuit from [1] is most efficient for adding this number of 2n-bit numbers and transforms
the matrix of addends into a two-row code at one level, i.e., the entire circuit is two-level.
In addition to the characteristics of the circuit considered in Sec. 2, for each n, we present the characteristics of the
time-optimal circuit from [1] and also of the asymptotically fastest classical multilevel multiplier [8] constructed from the elements of
the corresponding optical element base. At each level of this multiplier, a three-row code is translated into a two-row code.
Switching circuits that realize this transformation are described in [1].
We note that, by increasing the number of levels and the operation time of the circuit from [1], one can reduce its size. When
the time complexities of the circuit from [1] and the circuit proposed in this work are equated, the latter has a size
advantage of only about 20–30%, depending on n. However, a smaller number of levels of the multiplier is in itself an essential
advantage. It is also important that the considered circuit for translating multirow codes is flat and, taking into
account the subcircuits for processing carries, can be placed on two planes parallel to the matrix of partial products (if the $O(n)$
subcircuits of W are not taken into account). At the same time, a natural construction for the two-level multiplier from [1]
consists of placing $O(n^{3/2})$ subcircuits from [4] perpendicularly to the matrix of addends.
It should also be noted that, when optical-optical Mach–Zehnder switches in photonic crystals are used as
the element base, the circuit described above has scarcely any size advantage over the multiplier from [1], and
its operation time is less by 10–40% when the word length of the factors is within 1000. This is explained by the smallness
of the value of $t_{sw}/t_{trans}$ for a Mach–Zehnder switch: this smallness implies a small cell size in the multiplier from [1],
whereas the circuit considered above reduces the space complexity arising in this multiplier during the processing of larger cells.
REFERENCES
1. A. V. Anisimov and I. A. Zavadskyi, “Synchronous optical multipliers,” Cybernetics and Systems Analysis, No. 4,
102–116 (2006).
2. C. Angulo Barrios, V. R. Almeida, R. R. Panepucci, et al., “Compact silicon tunable Fabry–Perot resonator with low
power consumption,” IEEE Photonics Technology Letters, 16, No. 2, 506–508 (2004).
3. K. Asakawa, Y. Sugimoto, and Y. Watanabe, “Photonic crystal and quantum dot technologies for all-optical switch
and logic device,” New Journal of Physics, 8 (208) (2006), http://ej.iop.org/links/r9P_wqX0w/_BYBIl782xGbA
Ynnav5vpA/njp6_9_208.pdf.
4. I. Zavadskyi, “Multiplication using switching elements,” Visnyk Kyivskogo Univ., Ser. Fiz.-Mat. Nauk, No. 4,
145–156 (1999).
5. “Topology optimization of asymmetric Y-junction for air-bridge type photonic crystal slab waveguides,” in: Proc.
PECS-VII (2007), cmpweb.ameslab.gov/PECSVII/Abstracts/WATANABEyoshinori.pdf.
6. F. Cuesta-Soto, A. Martinez, J. Garcia, et al., “All-optical switching structure based on a photonic crystal directional
coupler,” Optics Express, 12, No. 1, 161–167 (2004).
7. Ts. Shyh-Lin (2) and Lu Chun-Yi, “BPM simulation and comparison of 1 × 2 directional waveguide coupling and
Y-junction coupling silicon-on-insulator optical couplers,” Fiber Integr. Opt., 21, No. 6, 417–433 (2002).

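TABLE 2

        Circuit of Sec. 2     Circuit from [1]      Multiplier from [8]
   n    time   size (SEs)     time   size (SEs)     time   size (SEs)
  64     51        16400       29       133100       89        54600
 128     58        84100       36       156700       97       219400
 256     69       310600       40       918500      113       881300
 512     77      1361000       43      4596000      130      3532000
1024     87      5051000       50     25436000      146     14139000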
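
The rows of Table 2 can be compared directly. The sketch below is illustrative only and rests on an assumption about the column order: each row is read as n followed by (time, size) pairs for the circuit of Sec. 2, the circuit from [1], and the multilevel multiplier from [8], in that order. The names `TABLE_2` and `compare` are hypothetical.

```python
# n -> ((time, size) of Sec. 2 circuit, (time, size) of [1], (time, size) of [8]);
# the assignment of columns to circuits is an assumption.
TABLE_2 = {
    64: ((51, 16400), (29, 133100), (89, 54600)),
    128: ((58, 84100), (36, 156700), (97, 219400)),
    256: ((69, 310600), (40, 918500), (113, 881300)),
    512: ((77, 1361000), (43, 4596000), (130, 3532000)),
    1024: ((87, 5051000), (50, 25436000), (146, 14139000)),
}


def compare(n: int) -> tuple[float, float, float]:
    """Return (size / 4n^2, size of [1] / size, time / time of [1]) for the Sec. 2 circuit."""
    (time_sec2, size_sec2), (time_ref1, size_ref1), _ = TABLE_2[n]
    return (
        size_sec2 / (4 * n * n),   # how far the size departs from the 4n^2 estimate
        size_ref1 / size_sec2,     # size ratio relative to the circuit from [1]
        time_sec2 / time_ref1,     # time ratio relative to the circuit from [1]
    )


for n in TABLE_2:
    bound_ratio, size_ratio, time_ratio = compare(n)
    print(f"n={n}: size/4n^2={bound_ratio:.3f}, size[1]/size={size_ratio:.2f}, "
          f"time/time[1]={time_ratio:.2f}")
```

Read this way, the first size column stays within roughly 30% of $4n^2$, which would be consistent with the excess introduced by lowering the parameter c.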
