Purdue University

Purdue e-Pubs
Department of Computer Science Technical
Reports

Department of Computer Science

1981

Processor Displacement: An Area-Time Trade-Off Method for
VLSI Design
David M. DeRuyck
Lawrence Snyder
John D. Unruh

Report Number:
81-394

DeRuyck, David M.; Snyder, Lawrence; and Unruh, John D., "Processor Displacement: An Area-Time Trade-Off Method for VLSI Design" (1981). Department of Computer Science Technical Reports. Paper 320.
https://docs.lib.purdue.edu/cstech/320

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries.
Please contact epubs@purdue.edu for additional information.

PROCESSOR DISPLACEMENT: AN AREA-TIME TRADE OFF
METHOD FOR VLSI DESIGN

David M. DeRuyck†
Lawrence Snyder
John D. Unruh†

Department of Computer Sciences,
Purdue University,
West Lafayette, Indiana 47907

ABSTRACT

Direct VLSI implementation of pipelined (systolic) processor arrays can lead to an "over parallelized" design causing the chip to have unused or underutilized area. Processor displacement design is a methodology that provides a spectrum of designs with differing time-area trade offs. The methodology is motivated, presented in detail, and illustrated by several examples. Direct experience for the Transitive Closure and Dynamic Programming systolic arrays is presented.

The work described herein is part of the Blue CHiP Project and is supported in part by the Office of Naval Research Contracts N00014-80-K-0816 and N00014-81-K-0360. The latter is Special Research Opportunities Task SRO-100.

†Authors' permanent address: Bell Telephone Laboratories, Naperville, IL 60566.


INTRODUCTION
Area-time trade offs for computing functions in VLSI technologies have been the subject of much study in recent years [1,2,3,4]. Although important theoretically, these results tend to be based on asymptotic analysis and employ rather coarse resource measures. To date, their impact on VLSI design and layout has been minimal.

We report on a methodology called processor displacement design which provides area-time trade offs for pipelined arrays of processors (systolic arrays [7]) that are useful for practical VLSI design and layout problems.

Processor displacement gives the VLSI designer a range of choices that can be balanced to conform to constraints such as "pin" limitations and to increase the size of the problem solved with a given chip area. The widespread interest in systolic algorithms (see the references [5,6]) provides many opportunities to apply the methodology.
There are several benefits to processor displacement design. It provides a means of rapidly responding to the uneven improvements in process technology, e.g., when feature sizes reduce without a corresponding improvement in packaging technology. It gives a rational basis for deciding between serial or parallel data transfer on and off the chip. The methodology can even be transferred to solving the problem of mapping large problems onto fixed size multiprocessor architectures.
The remainder of the paper is organized as follows. The next section gives an example of the use of the methodology as well as its benefits and liabilities. Next comes a thorough presentation of the methodology. The final section gives a summary and a discussion of some remaining issues.

MOTIVATION AND PROBLEM CONTEXT

In order to illustrate processor displacement, consider an idealized design situation. A systolic array processing element cell, visualized as containing processing circuitry and state memory, has been designed.

[Inline figure: a processing element cell divided into "circuit" and "state" regions.]
Four processing elements of a linear systolic array have been implemented as shown in Figure 1. (Kung and Leiserson's lower triangular banded-system solver is of this variety [8, pp. 285-288]. The state values are x, y and a, and the circuitry performs y ← y + ax.) We suppose that the four elements fully utilize the available chip area at λ = 2x μm (for some real value x > 0), and that the timing is such that all processors are active on each step once the pipeline has been filled. Moreover, we assume the eight ports of the array are connected to the eight pins of our (over simplified) package. (We can ignore power, ground and clocking wires in this discussion.)

[Figure 1. Four processing elements of a linear systolic array.]

Now suppose the circuit is to be fabricated with a λ = x μm process. This factor of two density improvement enables the systolic array to be realized with only one-fourth the area of the previous implementation (Figure 2). It is possible, therefore, to increase the implementation to sixteen processing elements (Figure 3). Notice that this can be done by a global reorganization of the cells without any cell redesign.

[Figure 2. The four-element array at λ = x μm, occupying one-fourth of the chip area.]

The sixteen element systolic array has twenty ports, but for the sake of this discussion, we still assume that only eight pins are available. It is possible to multiplex the pins, but doing so has a liability: processing elements must remain idle awaiting data. Not only does this mean that we never have all sixteen copies of the processing circuitry active at once and thus waste silicon area, we must also break open our completed cell design to add idling logic.
[Figure 3. Sixteen processing elements.]

Specification of sixteen processing elements without a corresponding increase in pin availability causes us to over parallelize the design. We simply have more processing circuitry than can be utilized. Although this simplified example can be fixed by adopting a larger package, it is illustrative of a general fact that cannot be ignored: There must be a balance between the parallel computation capability of the processing circuitry and the data transfer capability provided by the pins. It is this balance that processor displacement design is intended to control.
Continuing with the example, notice that in both of the λ = x μm implementations, silicon area is wasted; either it is unused (Figure 2) or underutilized because of multiplexing (Figure 3). We can bring this wasted area into productive use by increasing the size of the problem solved on a chip. The idea is to reduce the amount of processing circuitry until it matches the data transfer capacity of the pins. (In this case, only four copies of the circuitry are required (Figure 4), although the situation is more complicated in general.) The remainder of the effective chip area is dedicated to state storage for processing elements that will be implemented by essentially multiprogramming the circuitry. A multiplexor is provided for this purpose. Each cluster of state storage cells and processing circuitry is called a multiPE. By using this processor displacement approach, we have increased the size of the implemented systolic array to 28.

[Figure 4. Four multiPEs implementing a 28-element systolic array.]

This factor of seven improvement in effective density from a factor of two improvement in wire width was achieved without an increase in available pins. We paid for the improvement with a loss in time, but assessing the exact amount is difficult. The layout in Figure 2 is faster than that of Figure 1 by the speed improvements due to scaling, provided the data can be delivered fast enough. This gain is offset in the layout of Figure 3 since it is slower by a factor of four compared with Figure 2, assuming we do not multiplex the four "end" ports. Compared with Figure 3, the layout of Figure 4 loses a factor somewhat less than two under the same assumptions on multiplexing.
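The slowdown factors quoted above can be checked with a little arithmetic. The following is a minimal sketch, assuming (as the text does) that the four "end" ports keep dedicated pins so that only the remaining pins are multiplexed; the function name `multiplex_factor` is illustrative and does not appear in the original report.

```python
from fractions import Fraction

def multiplex_factor(elements: int, pins: int, end_ports: int = 4) -> Fraction:
    """Pin cycles needed to feed one logical step of a linear array.

    Assumption: a linear array of `elements` cells has `elements + end_ports`
    ports, and the end ports keep dedicated (unmultiplexed) pins.
    """
    data_ports = elements            # one multiplexable port per cell
    data_pins = pins - end_ports     # pins left once the end ports are wired
    return Fraction(data_ports, data_pins)

fig3 = multiplex_factor(16, 8)   # sixteen cells, eight pins (Figure 3)
fig4 = multiplex_factor(28, 8)   # twenty-eight logical cells (Figure 4)

print(fig3)          # slowdown of Figure 3 relative to Figure 2
print(fig4 / fig3)   # further slowdown of Figure 4 relative to Figure 3
```

Under these assumptions the first ratio is 4 and the second is 7/4, which is consistent with "a factor of four" and "somewhat less than two". Speed gains from scaling are deliberately left out of the sketch.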
THE METHODOLOGY

The methodology to be described is not, as yet, a fully mechanical procedure suitable for computer implementation as a subroutine in a CAD system. It requires the designer to make judgements and estimates based on his experience. Nevertheless, the process is quite procedural and we will organize our presentation according to the six steps of the methodology.

In order to aid the reader in understanding the detailed discussion of the individual steps, we give the methodology in its entirety:

1. Develop an abstract systolic array processor (ASAP) to solve a problem of arbitrary size.
2. Design the processing element cell.
3. Figure the effective chip area and the pin count.
4. Determine if processor displacement is needed.
5. Compute the amount of processor displacement.
6. Lay out the displaced processors and establish their timing protocols.

We now describe each step in detail.

1. Develop an abstract systolic array processor (ASAP) to solve a problem of arbitrary size. Systolic arrays are regular, locally connected arrays of one (or a small number of) processing element(s) that operate in a synchronous, pipelined manner and have external connections only at the perimeter. (See references [9,10] for characterizations by the inventors.) Three kinds of interconnection structure are typical: linearly connected, orthogonally connected, and hexagonally connected. Other connections have appeared, such as the toroidally connected Transitive Closure Systolic Array [11], and these are suitable for our methodology provided that the connections are sufficiently "local" that clustering preserves the interconnection structure.

In general, the "size" of an ASAP will be proportional to its perimeter and describes some property of the size of a given problem. For example, in the Kung and Leiserson Banded Matrix Systolic Arrays [8], it is the width of the band, not the size of the matrix, that determines the size of the array. Thus, the width is designated as the size, n, of the ASAP. In the case of the Transitive Closure Systolic Array [11], the size, n, is the number of vertices.

The ASAP will determine two functions which have the size as a parameter:

m(n) = number of processing elements in an ASAP of size n,

z(n) = number of inputs of an ASAP of size n.
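For the arrays discussed in this report the two functions are simple closed forms. The sketch below records them as plain Python functions; the function names are illustrative only, the linear-array forms come from the worked example of Step 5, and the forms for the Transitive Closure and Dynamic Programming arrays are those listed in Table I.

```python
# m(n): number of processing elements in an ASAP of size n.
# z(n): number of inputs per logical step of an ASAP of size n.

# Linear array of the motivating example (four "end" wires left out of z).
def m_linear(n): return n
def z_linear(n): return n

# Transitive Closure array, as listed in Table I.
def m_transitive_closure(n): return n * n
def z_transitive_closure(n): return 2 * n

# Dynamic Programming array, as listed in Table I.
def m_dynamic_programming(n): return n * n
def z_dynamic_programming(n): return 5 * n

print(m_linear(28), z_linear(28))
print(m_transitive_closure(28), z_transitive_closure(28))
print(m_dynamic_programming(8), z_dynamic_programming(8))
```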
The z(n) function describes the number of "values" that must be transferred to an array of size n on each logical step, once the pipeline is full. These inputs are the candidates for multiplexing and so the function must be formulated with some care. In particular, for uniformity it may be wise to omit certain inputs from this function as was done with the four "end" wires in the example of the last section. The term "values" here refers to logical values, not bits. (See Step 3 for further discussion.) For example, the linear ASAP of the last section has z(n) = n.

2. Design the processing element cell. The objective is not to design a single, monolithic cell, but rather to design two cells: a processing circuitry cell, pc cell, and a state memory cell, sm cell. Together, these two cells should define a processing element for the ASAP. But they should also define a family of cells, each one composed of one instance of the pc cell and multiple instances of the sm cell. These serve as multiPEs when multiplexing control logic is added. These conditions imply not only that the two cells have a compatible geometry, but that they are compatible with additional copies of the sm cell. (See Step 6 for a discussion of the effect of various clustering choices.) In order that "high level" manipulation of these components be possible without any internal modification, bus wires and selection lines should be incorporated into the sm cell.
Although many systolic arrays use only one kind of processing element, it is possible that several types will be required [8]. If this is the case, several pc cell types will obviously have to be designed. Several sm cells may be required too, although these tend to be the same over the entire array. When multiple element types are required, there will be geometric constraints within the multiPE as well as between multiPEs. Moreover, there may be limits on the kinds of clustering possible (see Step 6). These considerations should obviously be assessed before designing.

There are two values that are determined by the cell design that will be needed later:

a = area of one processing element, i.e., area of a pc cell and an sm cell,

q = that fraction of a used by the sm cell, i.e., area of an sm cell / a.

Since the subsequent analysis only requires these two values and not the designs themselves, it is sufficient to have good estimates in order to proceed.

By proceeding on the basis of good estimates, information can be learned about two important design decisions. First, it is possible that given layout dimensions and certain clustering strategies can lead to multiPE geometries that do not pack well into the available chip area. This could make a processor displacement design unachievable. By estimating the area, we can determine the degree of clustering and this will allow us to infer preferred cell dimensions that will pack easily. Secondly, it may not be obvious how much parallelism is appropriate for data transmission. Since this decision will probably influence cell design, we can work through the methodology with several assumptions on the extent of parallelism and compare the results. This approach is recommended when speed is a significant consideration.

3. Figure the effective chip area and the pin count. Not all of the chip area is available for use by the systolic array processing elements. In addition to input/output pads, we may need area for multiplexor logic, bus wires for routing signals between the pads and the array elements, and possibly, buffers for timing (see Step 6). The area occupied by all of these overhead components should be determined (or estimated). Define the remaining area as

A = effective chip area.

We assume that A is a rectangle with dimensions that permit convenient packing of pc and sm cells.

Of the pins available on the intended package, some will be dedicated to power, ground and clocking signals. The remainder will be assigned to the data transmission activity of the systolic array. If certain ports were not included in the z(n) definition (Step 1), then they must be permanently assigned to pins and the available number reduced accordingly. If there is a single output from the array, this should be included in the permanently assigned pins.

The remaining pins are available to be used by the multiPEs. If the processing elements use parallel input (and, perhaps, output), then divide the available pins by the width of the parallelism. (This allows us to refer to a "pin" without reference to serial or parallel data transfer.) Now, if the ASAP produces multiple outputs, then we assume there are z(n) of them and that they use the same degree of parallelism as the input. If so, divide the number of available pins by two, since for each pin assigned to the input, one must be assigned to an output. (Any other ratio can be handled analogously.) Define this result to be

p = number of available pins.

This is the number of data "values" that can be read in a single logical step (see Step 1).
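The pin bookkeeping of this step can be sketched as a small function. This is an illustration under stated assumptions rather than part of the original method: the name `logical_pins` and its parameters are hypothetical, and rounding down with integer division is an assumption made here for the case where the pins do not divide evenly.

```python
def logical_pins(package_pins, overhead_pins, reserved_pins=0,
                 width=1, separate_outputs=False):
    """Number of logical "pins" p available to the multiPEs (Step 3).

    package_pins     : pins on the intended package
    overhead_pins    : pins for power, ground and clocking
    reserved_pins    : pins for ports left out of z(n), plus a single output
    width            : bits moved in parallel per logical value
    separate_outputs : True if z(n) outputs need pins of their own
    """
    remaining = package_pins - overhead_pins - reserved_pins
    p = remaining // width          # assumption: leftover pins go unused
    if separate_outputs:
        p //= 2                     # one output pin for each input pin
    return p

# Six-bit data on a 64-pin package with four overhead pins.
print(logical_pins(64, 4, width=6))
# The motivating linear array: eight pins, four taken by the "end" ports.
print(logical_pins(8, 0, reserved_pins=4))
```

The first call reproduces the figure of 10 logical pins used for the Dynamic Programming array in the EXAMPLES section, and the second gives the p = 4 of the motivating example.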

4. Determine if processor displacement is needed. The objective of this step is to determine if there are sufficiently many pins to permit a direct implementation of a portion of the systolic array. Clearly,

$$ y = \frac{A}{a} $$

processing elements could fit into the available chip area. By solving

$$ m(n) = y $$

for n we can determine the size of the region of the ASAP that fits on one chip. This region of the ASAP will require z(n) pins. Thus, if

$$ z(n) \le p, $$

direct implementation of the systolic array is possible with full parallelism.

Care must be exercised in interpreting the results of the preceding test. If direct implementation is possible there is still the problem of packing the pc cell, sm cell pairs into the available area. In the following we assume that

$$ z(n) > p $$

by a "substantial" amount.
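The test of this step can be sketched in a few lines. The helper names below are hypothetical, n is restricted to integers (an assumption made here), and the sample values are the Transitive Closure parameters of Table I at λ = 2μm with areas expressed in square microns.

```python
def largest_size(m, limit):
    """Largest integer n >= 0 with m(n) <= limit (m assumed non-decreasing)."""
    n = 0
    while m(n + 1) <= limit:
        n += 1
    return n

def needs_displacement(m, z, A, a, p):
    """Return (n, flag): the directly implementable size and whether z(n) > p."""
    y = A / a                    # processing elements that fit in area A
    n = largest_size(m, y)       # integer solution of m(n) = y, rounded down
    return n, z(n) > p

m = lambda n: n * n              # Transitive Closure: m(n) = n^2
z = lambda n: 2 * n              # Transitive Closure: z(n) = 2n
A = 6000.0 * 6000.0              # 6mm x 6mm effective area
a = 11286 * 2.0 ** 2             # 11286 lambda^2 at lambda = 2 microns

n, displaced = needs_displacement(m, z, A, a, p=60)   # 64 pins less 4 overhead
print(n, displaced)
print(needs_displacement(m, z, A, a, p=36))           # 40 pins less 4 overhead
```

With 60 data pins the sketch reports that no displacement is needed at n = 28, in agreement with the remark in the EXAMPLES section. With 36 data pins the same region needs 56 inputs and the test fails. The figure of four overhead pins for the 40-pin package is an assumption carried over from the 64-pin case.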

5. Compute the amount of processor displacement. Using the previously defined functions,

m(n) = number of processing elements in an ASAP of size n,
z(n) = number of inputs of an ASAP of size n,

and constants,

a = area of a processing element,
q = fraction of a required for state memory,
A = effective chip area,
p = number of available pins,

we can derive an expression for the area occupied by a displaced processor design.

The key quantity in our analysis is the bundling function,

$$ k(n) = \frac{z(n)}{p}, $$

which describes the degree of multiplexing required of the p pins in order to deliver the z(n) inputs required by one logical step of an ASAP of size n. Thus, in k(n) processing steps the data for one logical step of a size n ASAP can be read. Since the systolic array is assumed to require this many inputs on each logical step, the parallelism of the ASAP should be reduced by a factor of k(n). Thus, each multiPE should simulate k(n) logical processing elements, and hence the name "bundling function."

With k(n) bundling, the m(n) logical processing elements can be simulated by m(n)/k(n) multiPEs each containing k(n) − 1 additional memory states. The total must thus satisfy

$$ A = \left[ \frac{m(n)}{k(n)} + \frac{k(n) - 1}{k(n)}\, m(n)\, q \right] a. $$

Substituting for k(n) and simplifying the resulting quotients yields

$$ A = \left[ m(n)\, \frac{p}{z(n)} + \frac{z(n) - p}{z(n)}\, m(n)\, q \right] a. $$

Further simplification gives

$$ A = \frac{a\, p\, m(n)}{z(n)} \left[ 1 + q \left( \frac{z(n)}{p} - 1 \right) \right] \tag{1} $$

Since all quantities are known as a result of Steps 1-3, we can solve for n and determine the size of the ASAP that could be designed. Knowing n allows us to compute the bundling factor, k, from k(n).

Example: Equation (1) can be used to derive the displaced processor design given in the second section. For the linear systolic array, m(n) = n and z(n) = n. We used p = 4 before and if we use a = 1 and q = 1/2, then A = 16 is appropriate. Thus

$$ 16 = 4 \left[ 1 + \frac{1}{2} \left( \frac{n}{4} - 1 \right) \right] = 2 + \frac{n}{2} $$

and n = 28. Since k(n) = n/4, the bundling factor is 7.
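Equation (1) can also be solved numerically when a closed form is inconvenient. The following is a minimal sketch, assuming n is an integer and that the area given by equation (1) does not decrease as n grows; the function names are illustrative and not taken from the report.

```python
def displaced_area(n, m, z, a, q, p):
    """Right-hand side of equation (1): area of a displaced design of size n."""
    return a * p * m(n) / z(n) * (1 + q * (z(n) / p - 1))

def solve_size(m, z, A, a, q, p):
    """Largest integer n whose displaced design fits in area A.

    Assumes the area is non-decreasing in n and that n = 1 fits.
    """
    n = 1
    while displaced_area(n + 1, m, z, a, q, p) <= A:
        n += 1
    return n

linear = lambda n: n             # m(n) = z(n) = n for the linear array
n = solve_size(linear, linear, A=16, a=1, q=0.5, p=4)
k = linear(n) / 4                # bundling function k(n) = z(n)/p
print(n, k)
```

For the values of the example this search stops at n = 28 with k = 7, matching the result obtained by hand.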

Obviously, some judgement must be employed in applying equation (1). For example, since the bundling factor describes the extent of the multiplexing and the number of displaced processors in a multiPE, the design is greatly simplified if it is an integer. Thus, one might choose the greatest n less than that determined by equation (1) such that k(n) is an integer.
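Choosing such an n is a simple downward search. The sketch below is one way to do it, with a hypothetical function name and with exact rational arithmetic so that the integrality test is not affected by rounding.

```python
from fractions import Fraction

def largest_integral_bundling(n_max, z, p):
    """Greatest n <= n_max for which k(n) = z(n)/p is an integer.

    Returns the pair (n, k), or None if no such n exists.
    """
    for n in range(n_max, 0, -1):
        k = Fraction(z(n), p)
        if k.denominator == 1:
            return n, int(k)
    return None

# Linear array with p = 4: n = 28 already gives an integer k.
print(largest_integral_bundling(28, lambda n: n, 4))
# Transitive Closure with z(n) = 2n and p = 36, starting from n = 71.
print(largest_integral_bundling(71, lambda n: 2 * n, 36))
```

The second call returns n = 54 with k = 3, the reduced problem size discussed for the 40-pin Transitive Closure case in the EXAMPLES section, assuming p = 36 for that package.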
6. Lay out the displaced processors and establish their timing protocols. With a bundling factor of k established in Step 5, we lay out the multiPEs such that each contains a copy of the pc cell and k copies of the sm cell. The multiPEs are then laid out in the available area such that their input/output ports are connected either to input/output pads or the ports of adjacent processing elements. The layout problem, as has already been mentioned, is subject to packing difficulties when the dimensions of the available area are not multiples of the dimensions of the multiPEs. Wiring the ports of a linear systolic array should be a straightforward operation. But wiring and timing two dimensional systolic arrays presents some interesting problems.

Each multiPE will simulate a contiguous region of k logical processing elements of the ASAP. The geometry of this region significantly affects the multiplexing operation. Let the bundling factor k = 4 and consider two multiPEs that simulate regions of the ASAP with input on two sides, each with different geometries.* (Refer to Figure 5.) MultiPE B simulates a 2 x 2 block of logical processing elements while multiPE C simulates a columnar region.

[Figure 5. MultiPE B (a 2 x 2 block) and multiPE C (a columnar region).]

The key difference between multiPEs B and C is that when they appear along the perimeter (ignore the corner case for the moment) of the ASAP, they have different numbers of external ports; B has two while C has four. Since each pin will deliver four values in a logical step, each C multiPE processes exactly the amount of data provided by the pin. But each B multiPE can only use two inputs during a logical cycle, so two B multiPEs must be attached to one pin.

* Notice that we are not referring to the geometry of the pc cell and sm cell organization of the layout.

The main consequence of several multiPEs sharing one pin is that the order in which the constituent processing elements are simulated cannot be the same. For example, suppose two B multiPEs appear along the perimeter and share a pin. Then on processing step one, they cannot both simulate processing element <1,1> since that would require them both to read from the pin simultaneously. We must either introduce a buffer or change the order in which constituent processing elements are simulated, so that they are not both reading at once. Notice that linear multiPEs are not subject to this difficulty.

Handling the external input for the multiPE that simulates the corner processing element adds a bit more complexity because for either geometry, it has more inputs than the others. In either case the multiPE will be connected to two different pins. Again, changing the ordering of the simulations (now it must be done along both sides) or buffering solves the problem.

Perhaps the simplest solution is to use C multiPEs such that the corner multiPE simulates in sequential order down the column and the k−1 adjacent C multiPEs simulate in an order that is a cyclic shift (upward) of this sequence. (Obviously, analogous remarks apply for the output.)
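One reading of this cyclic-shift scheme can be sketched as follows. The sketch assumes that k columnar multiPEs sit side by side along one edge and share a single pin, that only the perimeter element of each column (row 0 here) reads that pin, and that multiPE j uses the corner's sequence shifted by j positions. These details, and the function names, are an interpretation and are not spelled out in the report.

```python
def simulation_orders(k):
    """Order in which each of k columnar multiPEs simulates its k rows.

    Row 0 is the perimeter element that reads the shared pin. MultiPE 0
    (the corner) runs rows 0..k-1 in sequence; multiPE j uses the same
    sequence cyclically shifted by j positions.
    """
    base = list(range(k))
    return [base[j:] + base[:j] for j in range(k)]

def pin_readers(orders):
    """For each processing step, list the multiPEs that simulate row 0."""
    k = len(orders)
    return [[j for j, order in enumerate(orders) if order[step] == 0]
            for step in range(k)]

orders = simulation_orders(4)
for j, order in enumerate(orders):
    print("multiPE", j, "simulates rows in order", order)
print(pin_readers(orders))
```

Because every multiPE reaches its perimeter element on a different processing step, exactly one of them reads the shared pin on each step, so neither a buffer nor idle time is needed under these assumptions.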
EXAMPLES

We have used the processor displacement methodology to analyze two systolic array algorithms. The computed results are summarized in Table I. The processing element layouts use Mead and Conway [8] design rules. The expression "p = z(n)" means that pins are assumed to be unlimited.

TABLE I.

                       Transitive Closure        Dynamic Programming
params
  m(n)                 n²                        n²
  z(n)                 2n                        5n
  A                    6mm x 6mm                 6mm x 6mm
  a                    11286 λ²                  285000 λ²
  q                    0.5                       0.25

  λ                    2μm         1μm           2μm         1μm

p = z(n)
  n                    28          56            5           11
  m(n)                 784         3136          25          121

package size = 40
  n                    32          71            9           20
  m(n)                 1024        5041          81          400
  multiPEs             512         1261          11          24

package size = 64
  n                    28          66            8           19
  m(n)                 784         4356          64          361
  multiPEs             784         1452          16          [illegible]

For the Transitive Closure Systolic Array [11], n is the number of vertices. Since the input does not overlap with the output, the same pins are used for both operations. Notice that there is no benefit in processor displacement for λ = 2μm and package size of 64 since only 56 pins are needed in addition to the four overhead pins, i.e., in this technology it is possible to have full parallelism.

The Dynamic Programming Systolic Array solves string distance measurement using an n x n array with six bit data. Thus, for a 64-pin package p = (64 − 4)/6 = 10 logical pins. Each cell requires three values from the north and two from the west.

Notice that the values in Table I may be optimistic in the sense that "divisibility constraints" have been ignored. To use the pins optimally, n should be chosen so that z(n)/k(n) is an integer. If it is not, some bandwidth will be wasted or the timing will be significantly complicated. Another constraint that one might require is for m(n)/k, the number of multiPEs used, to be an integer. Fractional numbers could be achieved by having multiPEs simulate fewer than k logical processing elements. The Dynamic Programming array for packages of 64 pins and λ = 2μm is the only table entry to satisfy both constraints. We can often reduce the problem size to enforce the divisibility constraints. For example, in the Transitive Closure, package of 40 pins, λ = 1μm case, the problem size must be reduced to n = 54 before the divisibility constraints are met. This choice reduces the area utilization from better than 98% to about 61%. However, if the available area were about 1.5% larger (or equivalently, the cells were proportionately smaller), a problem size of n = 72 (with k = 4) would be possible. This would require redesigning the input/output pad area. In general the divisibility constraints can be controlled with several parameters and the optimal combination depends on the designer's judgement.
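The utilization figures quoted for the Transitive Closure array can be checked against equation (1). The sketch below rests on two assumptions that the text does not state outright: the 40-pin package loses four pins to overhead, as the 64-pin package does, and data transfer is serial so that p = 36. The function name is illustrative.

```python
def utilization(n, m, z, A, a, q, p):
    """Fraction of the effective area A used by a displaced design of size n."""
    used = a * p * m(n) / z(n) * (1 + q * (z(n) / p - 1))   # equation (1)
    return used / A

m = lambda n: n * n          # Transitive Closure, Table I
z = lambda n: 2 * n
A = 6000.0 * 6000.0          # 6mm x 6mm, in square microns
a = 11286.0                  # 11286 lambda^2 at lambda = 1 micron
q = 0.5
p = 40 - 4                   # assumption: four overhead pins, serial transfer

for n in (71, 54, 72):
    k = z(n) / p             # bundling function k(n) = z(n)/p
    print(n, round(k, 2), round(100 * utilization(n, m, z, A, a, q, p), 1))
```

Under these assumptions n = 71 uses roughly 99% of the area, n = 54 (with k = 3) about 61%, and n = 72 (with k = 4) needs between one and two percent more area than is available, which is in line with the figures given above.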
SUMMARY

We have presented a six step methodology that allows the amount of parallelism in a systolic array to be matched to the data transfer bandwidth provided by the pins. The technique appears to be applicable to systolic arrays with a wide variety of characteristics. It provides a means of evaluating the benefits of serial vs. parallel data transfer and for fully utilizing the available silicon.


Temple Window Lattice. Mount Omei, Szechwan, (date unknown).
REFERENCES

[1]  C. D. Thompson
     Complexity Theory for VLSI
     Ph.D. Thesis, Carnegie-Mellon University, 1979

[2]  R. P. Brent and H. T. Kung
     Area Time Complexity of Binary Multiplication
     Technical Report CMU-CS-136, Carnegie-Mellon University, 1979

[3]  Jean Vuillemin
     A Combinatorial Limit to the Computing Power of V.L.S.I. Circuitry
     Proceedings of the 21st Annual Symposium on the Foundations of
     Computer Science, IEEE, 1980

[4]  R. J. Lipton and Robert Sedgewick
     Lower Bounds for VLSI
     Proceedings of the 13th Annual Symposium on the Theory of Computing, 1981

[5]  John P. Gray (ed.)
     VLSI 81: Proceedings of an International Conference on VLSI
     Academic Press, 1981

[6]  H. T. Kung, Bob Sproull, and Guy Steele (eds.)
     VLSI Systems and Computations
     Computer Science Press, 1981

[7]  H. T. Kung and C. E. Leiserson
     Systolic arrays (for VLSI)
     Technical Report CS-79-103, Carnegie-Mellon University, 1979. (Also
     in reference [8].)

[8]  Carver Mead and Lynn Conway
     Introduction to VLSI Systems
     Addison Wesley, 1980

[9]  Charles E. Leiserson
     Area-Efficient VLSI Computation
     Ph.D. Thesis, Carnegie-Mellon University, 1980

[10] H. T. Kung
     Why Systolic Arrays?
     Computer, IEEE, January 1982

[11] L. J. Guibas, H. T. Kung and C. D. Thompson
     Direct VLSI Implementation of Combinatorial Algorithms
     Proceedings of the CalTech Conference on VLSI, California Institute
     of Technology, 1979

