Implementing nested conditional statements in SIMD machines by Middleton, David
NASA Contractor Report 181832 
ICASE REPORT NO. 89-27 
ICASE 
IMPLEMENTING NESTED CONDITIONAL
STATEMENTS IN SIMD MACHINES
David Middleton 
Contract No. NAS1-18605
April 1989 
INSTITUTE FOR COMPUTER APPLICATIONS IN SCIENCE AND ENGINEERING 
NASA Langley Research Center, Hampton, Virginia 23665 
Operated by the Universities Space Research Association 
National Aeronautics and 
Space Administration 
Langley Research Center
Hampton, Virginia 23665
https://ntrs.nasa.gov/search.jsp?R=19890013702 2020-03-20T03:17:25+00:00Z
Implementing nested conditional statements in SIMD machines 
David Middleton 
Institute for Computer Applications 
in Science and Engineering¹
NASA Langley Research Center 
Abstract 
SIMD computers consist of a very large number of processors executing a common
sequence of instructions. Maintaining the full speedup potential of such machines
is most sensitive to conditional execution in their programs, regions of code
where some PEs perform no useful work. Techniques are presented for efficiently
implementing nested conditional statements, specifically if and case statements,
in SIMD machines, while adding minimal specialized hardware.
1 Introduction 
An SIMD parallel computer typically provides a very large number of processing
elements (PEs) at the cost of constraining them to execute a common sequence of instructions.
Regions of code requiring conditional execution, such as occur in case and if statements,
interfere with maintaining the full speedup potential of such machines. While conditional
statements are being executed, those PEs whose data do not satisfy the predicate
perform no useful work. This paper describes implementing nested case and if statements
efficiently, both with respect to the number of instructions used and the amount of
specialized hardware needed by the PEs.
Bruner and Reeves implement nested if statements in the PEs of the MPP [BR83]
(distinguished by use of the where keyword from those performed in the central controller).
Thinking Machines provide similar facilities in CM Lisp and C*. A specialised hardware
stack for implementing nested conditional statements effectively has been suggested
elsewhere [FMPC]. This work studies implementing nested conditional statements on abstract
SIMD machines rather than specific ones, in order to determine the appropriate amount
of hardware support.
¹ This work was supported by the National Aeronautics and Space Administration under NASA Contract
No. NAS1-18605 while the author was in residence at ICASE.
In most SIMD computer designs, each processing element (PE) has some form of
Enable register which controls whether the globally issued instructions may modify the
data held in that PE. When an if statement occurs, for example, all PEs would evaluate
the predicate and load the result into their Enable register. Those PEs for which the
predicate is false would effectively ignore subsequent instructions until their Enable bit
is reset to true.
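The effect of the Enable register can be sketched in a few lines of Python. This is only an
illustrative model, not code from the report: representing each register as one list entry per
PE, and the helper name masked_store, are assumptions made for the example.

```python
def masked_store(dest, enable, src):
    """Write src[i] into dest[i] only in PEs whose Enable bit is set."""
    return [s if e else d for d, e, s in zip(dest, enable, src)]


x = [3, -1, 4, -5]                  # one datum per PE
enable = [v < 0 for v in x]         # "if x < 0": each PE loads its predicate result
x = masked_store(x, enable, [-v for v in x])   # "x := -x", ignored by disabled PEs
enable = [True] * len(x)            # after the statement every PE is enabled again
print(x)                            # [3, 1, 4, 5]
```

Every PE receives the same instruction; the Enable bit alone decides whether it takes effect.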
In order to nest conditional statements, each PE must maintain a stack of these enable
bits, with the topmost element indicating whether that PE is active. Section 2 describes
the transformations to this stack associated with the keywords in if and case statements;
Section 3 presents implementations of these transforms for various SIMD designs; Section 4
presents variations that use less PE memory.
2 The abstract stack of Enable bits 
An if statement takes the form 
if <predicate> then <statement> [else <statement>] endif, 
where <predicate> and <statement> can contain other conditional statements. At the
endif, every PE, including those that are disabled, pops its Enable stack. Consequently,
every PE, including those that are disabled, must push a value on its Enable stack at the
start of each if statement. At the then, every PE replaces its current Enable bit with
<current enable> ∧ <predicate result> (the value of the predicate alone being undefined
in disabled PEs). If an optional else is encountered, the Enable bit on the top of the
stack is inverted in those PEs that were enabled immediately outside this if statement, as
indicated by the second stack element. Figure 1 shows the transitions each keyword causes
to the different stack configurations that it can encounter. (Those configurations can arise
from keywords other than the next one on the left; for example, endif may apply to a
stack most recently changed by a then rather than an else, but the possible configurations
resulting from each keyword are the same.)
Figure 1. Enable stack transitions in an if statement
(diagram not reproduced; columns IF, THEN, ELSE, ENDIF)
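The four transitions described above can be sketched as operations on one Python list per PE,
with the top of the stack at the end of the list. This is a minimal model of the abstract
stack only; the function names and the labelling example are assumptions for illustration and
do not correspond to instructions of any particular machine.

```python
def do_if(stacks):
    """Every PE, enabled or not, pushes a copy of its current Enable bit."""
    for s in stacks:
        s.append(s[-1])


def do_then(stacks, pred):
    """Top of stack becomes (current enable AND predicate result)."""
    for s, p in zip(stacks, pred):
        s[-1] = s[-1] and p


def do_else(stacks):
    """Invert the top bit only where the second stack element is set."""
    for s in stacks:
        if s[-2]:
            s[-1] = not s[-1]


def do_endif(stacks):
    """Every PE pops its Enable stack."""
    for s in stacks:
        s.pop()


def masked_assign(stacks, target, value):
    """Useful work: takes effect only in PEs whose top Enable bit is set."""
    for i, s in enumerate(stacks):
        if s[-1]:
            target[i] = value


x = [-2, -1, 1, 2]
label = [None] * len(x)
stacks = [[True] for _ in x]          # all PEs start enabled

do_if(stacks); do_then(stacks, [v > 0 for v in x])
do_if(stacks); do_then(stacks, [v > 1 for v in x])   # nested if
masked_assign(stacks, label, "big")
do_else(stacks)
masked_assign(stacks, label, "small")
do_endif(stacks)
do_else(stacks)
masked_assign(stacks, label, "neg")
do_endif(stacks)

print(label)    # ['neg', 'neg', 'small', 'big']
```

In this sketch the PEs holding negative values stay disabled throughout the inner statement,
because the inner else consults the second stack element before inverting anything.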
A common programming technique for SIMD machines is to enumerate all possible
states of the PEs and to issue instructions for each case after enabling the appropriate
PEs. Consequently, case statements are important. They take the form
caseif <predicate> then <statement>
[elseif <predicate> then <statement>]
[default <statement>]
endcase.
Figure 2. Enable stack for case statement
(diagram not reproduced; surviving labels: Not Yet Run, Previous Enable Bit)
A case statement can be implemented with if statements, but the depth of the stack grows
with the number of case branches. This growth can be avoided by exploiting the fact that
the nested if statements all terminate together.
Within the scope of a case statement, the PEs may be in one of four states: (1) enabled
and performing one of the branches of the case statement (or evaluating a predicate);
(2) disabled, not yet having performed a branch; (3) disabled and having already
performed a branch; and (4) disabled by a conditional statement enclosing the case
statement. States 3 and 4 are equivalent until the case statement terminates and the PEs in
state 3 are re-enabled. PEs in state 2 change to state 1 right before a predicate is evaluated,
that is, at an elseif or default, and change back to state 2 at the then if the predicate is
false. PEs in state 1 change to state 3 after performing their branch, that is, at an elseif
or default.
A case statement uses two bits in the Enable stack as shown in Figure 2: the new
Enable bit, which distinguishes state 1 from the others, and a Not-Yet bit, which
distinguishes state 2 from states 3 and 4. Section 4 presents an alternative implementation of case
statements that only pushes one bit onto the Enable stack.
Figure 3 shows the transitions associated with each keyword in a case statement, the [...]

Figure 3. Enable stack transitions in a case statement
(diagram not reproduced; columns CASEIF, THEN, ELSEIF 1, ELSEIF 2, ENDCASE)

[A caseif] twice duplicates the Enable bit as it exists immediately prior to the case statement, and
an endcase removes two elements to return the stack to its initial configuration. At a
then, the predicate result replaces the Enable bit in enabled PEs, or equivalently, the
logical and of the Enable bit and the predicate result replaces the Enable bit in all PEs.
Two actions occur at an elseif. First, PEs in state 1, having now finished their branch
of the case statement, change to state 3. Second, PEs in state 2 change to state 1 in
order to evaluate the subsequent predicate. These actions are accomplished by clearing
the Not-Yet bit in enabled PEs, and then copying the Not-Yet bit to the Enable bit in all
PEs. Default is logically equivalent to "elseif true then", which reduces to elseif.
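The case-statement transitions can be modelled in the same abstract way. In the sketch below
each PE's stack is a Python list whose last entry is the Enable bit and whose second-to-last
entry is the Not-Yet bit; the function names and the sign-classification example are
assumptions for illustration.

```python
def caseif(stacks):
    """Duplicate the current Enable bit twice: once as Not-Yet, once as Enable."""
    for s in stacks:
        e = s[-1]
        s.append(e)      # Not-Yet
        s.append(e)      # new Enable


def then(stacks, pred):
    """Enable <- Enable AND predicate result, in all PEs."""
    for s, p in zip(stacks, pred):
        s[-1] = s[-1] and p


def elseif(stacks):
    """Clear Not-Yet in enabled PEs, then copy Not-Yet to Enable in all PEs."""
    for s in stacks:
        if s[-1]:
            s[-2] = False      # state 1 -> 3
        s[-1] = s[-2]          # state 2 -> 1


def default(stacks):
    """default behaves like 'elseif true then'."""
    elseif(stacks)
    then(stacks, [True] * len(stacks))


def endcase(stacks):
    """Remove the Enable and Not-Yet entries."""
    for s in stacks:
        del s[-2:]


def masked_assign(stacks, target, value):
    for i, s in enumerate(stacks):
        if s[-1]:
            target[i] = value


x = [-3, 0, 5, 7]
label = [None] * len(x)
# The last PE is already disabled by some enclosing statement (state 4).
stacks = [[True], [True], [True], [False]]

caseif(stacks); then(stacks, [v < 0 for v in x])
masked_assign(stacks, label, "neg")
elseif(stacks); then(stacks, [v == 0 for v in x])
masked_assign(stacks, label, "zero")
default(stacks)
masked_assign(stacks, label, "pos")
endcase(stacks)

print(label)    # ['neg', 'zero', 'pos', None]
```

However many branches the statement has, each PE's stack grows by exactly two entries.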
3 Implementing the abstract stack in actual machines 
The abstract stack described above is now implemented for realistic SIMD machines, 
with the twin aims of providing speed while requiring minimal additional hardware. The 
purpose is both to show how these conditional statements can be implemented in SIMD 
machines that are already designed and to suggest the appropriate amount of conditional- 
statement-specific hardware to be added to new designs. 
We start with a very simple model of SIMD computer under the assumption that SIMD 
machines with less hardware would need to simulate the Enable stack anyway, and that 
their best approach would depend on the specific instructions available, and the related 
data paths among the PE registers and memory. 
We do not consider the specific instructions necessary for evaluating predicates.
Although the time spent evaluating predicates probably dominates the housekeeping costs
associated with maintaining the Enable stack, the target applications dictate the
important data types and so the common predicates¹. The original motivation for this study
was a proposal involving specialized hardware support for the Enable stack itself, rather
than the general mechanisms which already support predicate evaluation.
¹ We also assume that the disabled PEs do not affect the evaluation of the predicate for the enabled
PEs; for example, if the predicate involves communication operations or global reductions, such as the
IFANY statement in the MPP, then the disabled PEs do not alter the result.
Typical SIMD machines use a one-address instruction format: each instruction issued
by the central controller specifies an operation, a single address into the local memory
of each PE, and, implicitly, one or more special purpose registers in each PE's ALU².
One of these, the Enable register, must be set for any other register or memory location
to be affected by the instruction. That is, instructions affecting the Enable register are
performed in all PEs, while instructions affecting other PE state are performed only in
PEs whose Enable register is set. We assume predicate results reside in another register,
P, that can perform arbitrary logical operations (as in the MPP).
The topmost element of the Enable stack resides in the Enable register; the rest are
stored in the local memory of a PE. The Enable stacks have the same height at all times,
allowing a common stack pointer to be maintained by the central controller. Local
addressing by individual PEs, as provided by the ILLIAC IV and the Connection Machine 2,
provides no gain in implementing these stacks.
This first model of SIMD machine can implement the various keywords as shown
below. Each keyword (with instruction count for comparison) is followed by the
instructions for the PEs. Parentheses distinguish addresses into PE memory from PE register
names. The expression within the parentheses is computed by the array controller
concurrently with its issuing instructions and may involve side-effects such as pre-incrementing
or post-decrementing the Enable stack pointer, SP. The prefixes global and masked show
explicitly whether the operation occurs in all PEs or only those whose Enable register is
set; they are redundant for this model since global is equivalent to the destination being
the Enable register.
The difficulty for this model lies in saving the Enable register, which is needed to
duplicate the top of the Enable stack for if and caseif, and to generate the new Enable bit
from the old for then and else. Since writes do not occur in PEs whose Enable register
is clear, saving the Enable register requires that the destination be cleared beforehand
(effectively using two instructions to save the Enable register, one when it is clear and a
second when it is set). A program using these statements must preclear the stack at the [...]
² It seems likely that SIMD computers would follow the same evolution as other classes of computers and
acquire multiple general purpose registers. The model described here reflects current SIMD designs.

[Instruction listing for the first model. The instruction column did not survive; the only
operand fragment remaining is "global Enable". The comments of the listing, in order, are:]
;; Push previous enable (≡ Enable reg) onto
;; PE mem. Use 1 if no path from Enable.
;; Unchanged if Enable not set.
;; Enable ← Enable ∧ P.
;; Clear Stack element.
;; Can use 1 instead of Enable.
;; second stack element
;; Invert top of memory stack
;; iff second stack element true.
;; Reload Enable.
;; Reset stack.
;; Push Previous Enable (≡ Enable reg).
;; Push Not Yet (≡ Enable reg).
;; As above.
;; Clear Not Yet in memory.
;; Load Not Yet into Enable.
;; Clean up stack: clear and pop Not Yet;
;; clear and pop Current Enable location.
Endif and endcase acquire an additional one or two instructions respectively to
maintain the unused stack entries at zero. Then requires moving the predicate result from the
P register to the Enable register, but for this model, the value of the P register is
undefined in disabled PEs (even if the Enable register were combined into the predicate result).
Then takes the logical product of the Enable and P registers by performing a masked
store of P into a previously cleared location (for which the stack is convenient). Else
conditionally inverts the Enable register according to a memory location, specifically the
second element of the Enable stack.
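The masked-store trick used by then can be illustrated with a small Python model. It is a
sketch of the idea in the paragraph above, not a reconstruction of the report's instruction
sequence; the helper names and the two-step decomposition are assumptions.

```python
def masked_store(mem, enable, src):
    """First-model write: only PEs whose Enable register is set are affected."""
    return [s if e else m for m, e, s in zip(mem, enable, src)]


def then_first_model(slot, enable, p):
    """Return the new Enable register, computed through a memory slot."""
    slot = masked_store(slot, enable, p)   # masked store of P into the slot
    return list(slot)                      # global load of the slot into Enable


enable = [True, True, False, False]
p = [True, False, True, False]     # P is meaningless in the two disabled PEs

cleared = [False] * 4              # slot precleared, as the text requires
new_enable = then_first_model(cleared, enable, p)
print(new_enable)                  # [True, False, False, False]

stale = [False, False, True, True]           # slot NOT precleared
wrong = then_first_model(stale, enable, p)
print(wrong)                       # [True, False, True, True]
```

With a precleared slot the result equals Enable AND P. With stale contents, disabled PEs pick
up whatever was left in memory, which is why unused stack entries are kept at zero.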
An alternative approach to saving the Enable register under this model is to duplicate
its value on the Enable stack (making that one element deeper). Such an approach might
also be necessary for specific designs (like the MPP) whose Enable register is not easily
accessible. Any new value computed for the Enable register is first pushed on the stack [...]

[Instruction listing for the alternative approach. The instruction column did not survive;
the operand fragments remaining are "masked (SP-1)", "global Enable", "masked (SP)" and "← 0".
The comments of the listing, in order, are:]
;; Duplicate new stack top from Enable reg.
;; Update stack top in memory
;; and in Enable register.
;; second stack element
;; Stack top in memory
;; only changed where second stack elt. true.
;; Push Not Yet and Current Enable.
;; (order reversed from previous set)
;; As above.
;; Clear stack top in memory: state 1 → 3.
;; (Was part of 'then' in previous set).
;; Clear Not Yet in memory: state 1 → 3.
;; Load Not Yet: state 2 → 1.
;; Update stack top in memory.
;; Clear and pop Current Enable location.
;; Clear and pop Not Yet location.
If statements require only 9 instructions instead of 12 since there is less need to store
the Enable register before changing it. However, case statements acquire an additional
instruction per branch to maintain the duplicate top of stack.
Adding special purpose hardware to each PE can reduce this overhead for conditional
statements. The second model of SIMD computer to be considered extends the first model
by adding an Enable field to the instruction format, as the MPP does for many instructions.
This field closely coincides with the distinction between instructions manipulating the
Enable stack, which all PEs perform, and instructions performing useful work, which only
some PEs perform, according to their Enable register. This extension effectively adds
new instructions purely for manipulating the Enable stack; however, the hardware cost of
"decoding" the Enable field is negligible.
Each PE determines whether its memory and registers (now including the Enable
register) can be updated according to the logical sum of this instruction bit and the contents
of the Enable register. The prefixes global and masked now refer to this Enable field of the
instruction: global means the bit is set and the instruction takes effect in all PEs; masked
means the bit is clear, so the instruction's effect depends on each PE's Enable register. A
tick emphasizes those instructions exploiting this facility.
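The write-enable rule of the second model can be sketched as follows. The function execute and
its argument names are assumptions for illustration; the rule itself (update where the
instruction's Enable field OR the PE's Enable register is set) is the one stated above.

```python
def execute(enable_field, enable_reg, dest, src):
    """Update dest[i] with src[i] where (enable_field OR enable_reg[i]) holds."""
    return [s if (enable_field or e) else d
            for d, e, s in zip(dest, enable_reg, src)]


enable = [True, True, False, False]
p = [True, False, True, False]
slot = [True, False, True, False]      # arbitrary stale memory contents

# A global instruction writing memory: Enable is copied in every PE,
# so the destination no longer has to be precleared.
slot = execute(True, enable, slot, enable)
print(slot)                            # [True, True, False, False]

# A masked instruction whose destination is the Enable register itself:
# enabled PEs load P, disabled PEs keep their 0, giving Enable AND P.
new_enable = execute(False, enable, enable, p)
print(new_enable)                      # [True, False, False, False]
```

Both uses are exactly the cases the first model could not express in a single instruction.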
[Instruction listing for the second model. Only part of the instruction column survived;
lost instructions are marked [...]. Operand fragments whose position is uncertain:
"← P", "Enable", "(SP-1) ←", "← P".]
if       (1)  global'  [...]             ;; copy stack top to memory
then     (1)  masked'  [...]             ;; Enable ← Enable and P.
                                         ;; Assumes paths exist between P and Enable.
else          [...]                      ;; Load second stack element.
              [...]                      ;; P only changed where 2nd element true.
                                         ;; This sequence assumes P not live.
endif    (1)  global   [...]
caseif        [...]                      ;; Push Previous Enable.
              [...]                      ;; Push Not Yet.
then     (1)                             ;; As above.
elseif   (2)  masked   (SP) ← 0          ;; Clear Not Yet in memory.
              global   Enable ← (SP)     ;; Load Not Yet.
endcase  (1)           (SP-)             ;; concurrently in array controller.
              global   Enable ← (SP-)
With this extension, there is no need to preclear the Enable stack nor to duplicate the 
Enable register on the memory stack. Case statements require 3 instructions per branch 
instead of the 5 or 6 required in the previous implementations. Except for else, the if 
statement keywords all reduce to one instruction; if statements require 7 instructions 
compared with 12 or 9 above. 
Else remains unwieldy, which suggests adding further special instructions. The
necessary operation is Enable ← Second ∧ ¬Enable, where Second is the PE memory location
holding the second Enable stack element. Since Enable implies Second, the above
expression is equivalent to Enable ← Second ⊕ Enable (⊕ being exclusive or). The MPP exploits
this to use a special facility which tests the equality of its Enable register, G, and its
logic register, P. The MPP generates the value P=G, which is equal to ¬(P ⊕ G), and so can
implement else with
else (2) global' P ← (SP-1)
         global  Enable ← P=G.
A better approach is to make the Enable register a general purpose boolean register,
supporting all logical combinations of the Enable register with another source; this allows
flexibility for implementing other conditional statements in the future. The action at an
else becomes
else (1) global' Enable ← ¬Enable ∧ (SP),
and the if statement reduces to four instructions of overhead, one at each keyword.
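The equivalence used for else can be checked by enumeration. The sketch below reads the else
operation as Second AND (NOT Enable), which is the form consistent with Section 2 (invert the
top Enable bit where the second stack element is set), and compares it with exclusive or over
all four input combinations. Variable names are illustrative.

```python
from itertools import product

rows = []
for second, enable in product([False, True], repeat=2):
    and_not = second and not enable        # Second AND (NOT Enable)
    xor = second != enable                 # Second XOR Enable
    reachable = (not enable) or second     # "Enable implies Second"
    rows.append((second, enable, and_not, xor, reachable))

# The two forms agree on every combination that can actually occur ...
agree_when_reachable = all(a == x for _, _, a, x, r in rows if r)
# ... and differ only on the impossible one (Enable set, Second clear).
differ_when_unreachable = any(a != x for _, _, a, x, r in rows if not r)

print(agree_when_reachable, differ_when_unreachable)   # True True
```

The single excluded combination cannot arise, since a PE enabled inside a statement must have
been enabled immediately outside it.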
4 Implementing the abstract stack densely 
Some current SIMD machines are constrained by the amount of memory available in
each PE. Two methods are presented for reducing the space used to implement nested
conditional statements.
The first technique for saving space is to replace the stack of Not-Yet bits with a single
statically allocated bit. As case statements nest, the stack of states in each PE follows a
particular pattern. At a caseif, PEs in states 2, 3, and 4 push a new state 4 on their
Enable stacks. PEs in state 1 push a new state 1 which may be overwritten by states 2
or 3 during the statement's execution. Thus, every Enable stack consists of zero or more
1 states, a single state 1, 2 or 3, and zero or more 4 states. Since the Not-Yet bit is always
zero in PEs in state 4 and always one in PEs in state 1, it need only exist explicitly for
the Enable stack entry that is in state 1, 2 or 3.
The Not-Yet bit is only read at occurrences of elseif, to distinguish PEs in state 2
from the others; it is only written at a caseif, where it is set on being allocated, or at an
elseif, where it is cleared in enabled PEs. The actions associated with the case
statement keywords are modified so that PEs in state 4 neither write nor read their Not-Yet
bit, in order to leave the Not-Yet bit unchanged in any PEs having entered states 2
and 3. The operation Enable ← Not-Yet, associated with an elseif, is replaced with
Enable ← Not-Yet ∧ Previous Enable. The Previous Enable is one in PEs in states 2
and 3, so their operation remains unchanged. The Previous Enable is zero in PEs in
state 4, so the Not-Yet bit is irrelevant. Assuming instructions with an enable field, the
actions associated with the different keywords become
caseif   (2)  masked   (Not-Yet) ← 1        ;; initialise for PEs in state 1
              global'  (+SP) ← Enable       ;; push Previous Enable on memory stack
then     (1)  masked'  Enable ← P
elseif   (3)  masked   (Not-Yet) ← 0        ;; clear Not Yet in memory
              global   Enable ← (Not-Yet)   ;; load Not Yet
              masked'  Enable ← (SP-1)      ;; ∧ Previous Enable
endcase  (1)  global   Enable ← (SP-)
PEs in states 2 and 3 may push state 4's on the stack; this will have no effect on
the Not-Yet bit. PEs in state 1 do not use their Not-Yet bit without either setting it
immediately beforehand at an elseif, or re-initialising it at a more deeply nested caseif.
Thus, any changes made to the Not-Yet bit by other nested case statements have no effect.
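A small simulation makes it easier to see that one static Not-Yet bit survives nesting. The
sketch below follows the keyword actions listed above; the class name, the use of the top of
the memory stack as Previous Enable, and the classification example are assumptions for
illustration.

```python
class PEArray:
    """Enable register, memory stack of Previous Enable bits, one Not-Yet bit per PE."""

    def __init__(self, n):
        self.enable = [True] * n
        self.not_yet = [False] * n
        self.stack = [[] for _ in range(n)]

    def caseif(self):
        for i, e in enumerate(self.enable):
            if e:
                self.not_yet[i] = True        # masked (Not-Yet) <- 1
            self.stack[i].append(e)           # global push of Previous Enable

    def then(self, pred):
        self.enable = [e and p for e, p in zip(self.enable, pred)]

    def elseif(self):
        for i, e in enumerate(self.enable):
            if e:
                self.not_yet[i] = False       # masked (Not-Yet) <- 0
            # Enable <- Not-Yet AND Previous Enable
            self.enable[i] = self.not_yet[i] and self.stack[i][-1]

    def default(self):
        self.elseif()
        self.then([True] * len(self.enable))

    def endcase(self):
        self.enable = [s.pop() for s in self.stack]

    def assign(self, target, value):
        for i, e in enumerate(self.enable):
            if e:
                target[i] = value


def parity_case(pes, v, label, prefix):
    """A nested case statement that reuses the same Not-Yet bit."""
    pes.caseif(); pes.then([x % 2 == 1 for x in v])
    pes.assign(label, prefix + "-odd")
    pes.default()
    pes.assign(label, prefix + "-even")
    pes.endcase()


v = [3, 4, 50, 51, 500]
label = [None] * len(v)
pes = PEArray(len(v))

pes.caseif(); pes.then([x < 10 for x in v])
parity_case(pes, v, label, "small")
pes.elseif(); pes.then([x < 100 for x in v])
parity_case(pes, v, label, "mid")
pes.default()
pes.assign(label, "large")
pes.endcase()

print(label)   # ['small-odd', 'small-even', 'mid-even', 'mid-odd', 'large']
```

The PE holding 500 waits in state 2 through both nested statements, and its Not-Yet bit is
still set when the outer default finally enables it.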
The second technique for saving space replaces the Enable stack with a count of the
number of disabling entries. This approach needs only logarithmic instead of linear space
at the cost of taking logarithmic rather than constant time. This approach is unattractive
because programs suited to SIMD machines are unlikely to use conditional statements
nested deeply enough to justify the added delay of performing bit-serial arithmetic;
nevertheless, the central controller need only perform addition to the necessary length, since
the maximum possible count is known at all times.
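The space trade-off can be made concrete with a short calculation. It assumes one stack bit per
nesting level and a counter able to represent every value from 0 up to the nesting depth; both
function names are illustrative.

```python
def stack_bits(depth):
    """Bits per PE for an explicit Enable stack: one per nesting level."""
    return depth


def count_bits(depth):
    """Bits per PE for a counter holding any value from 0 to depth."""
    return max(1, depth.bit_length())


for depth in (1, 3, 4, 16, 64):
    print(depth, stack_bits(depth), count_bits(depth))
# 1 1 1
# 3 3 2
# 4 4 3
# 16 16 5
# 64 64 7
```

The counter width is also the number of bit-serial steps needed to update it, which is the
logarithmic time cost mentioned above.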
A method is shown for the case statement; if statements can be implemented by
interpreting them as
caseif <predicate> then <statement> [default <statement>] endcase,
which is as efficient as techniques designed specifically for the if and allows the two
statements to be used together.
Each PE holds the following values: Enable, either the register or a separate memory
location if the instruction set causes frequent saving to be necessary (we assume the second
SIMD model in this section); Not-Yet, a single static memory location; and Count, the
number of state 4's on the abstract Enable stack. An additional variable, Zero, is redundant
but improves efficiency. The following values encode the different states:

State   Enable   Not-Yet   Count   Zero
  1       1         -        0       1
  2       0         1        0       1
  3       0         0        0       1
  4       0         -      n > 0     0
Not-Yet is 1 for PEs in state 2, not having been altered since the caseif at this level
initialised it, and 0 for PEs in state 3, not having been altered since an elseif at this
level cleared it. Not-Yet is undefined for PEs in state 1 because a preceding, more deeply
nested, case statement may have altered it, and is undefined for PEs in state 4, depending
on whether this state occurs above a state 2 or a state 3.
The code for manipulating these values at the different keywords becomes 
caseif   (1)       masked   (Not-Yet) ← 1             ;; set if state 1; no change if state 4.
         (# bits)  global'  HalfAdd (Count), Enable   ;; increment counter in disabled cells
         (1)       global'  (Zero) ← Enable
then     (1)       masked'  Enable ← P                ;; as before.
elseif   (3)       masked   (Not-Yet) ← 0             ;; clear Not Yet as before
                   global   Enable ← (Not-Yet)        ;; Enable ← Not-Yet
                   masked'  Enable ← (Zero)           ;; ∧ Previous Enable
endcase  (1)       global   Enable ← (Zero)           ;; states 1, 2 and 3 → 1.
         (# bits)  global'  Decr (Count), (Zero)      ;; Pop a state 4; perhaps reset Zero.
The subroutine HalfAdd, executed with the enable field set in all its instructions, adds one
bit to a number, in this instance, incrementing Count in disabled PEs. The subroutine
Decr subtracts the inverse of a bit from a number, in this case, decrementing Count in
disabled PEs. As a side-effect, it sets Zero when appropriate. These two operations are
merged together to exploit possible arithmetic operations that may be provided by the
SIMD machine. For example, the MPP provides single instructions that perform half and
full adds; the test for a zero result can be interleaved with these operations as the bits
pass through the arithmetic registers.
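The count-based encoding can be simulated in the same style. The sketch below uses a Python
integer for Count instead of bit-serial HalfAdd and Decr subroutines, and takes the increment
to apply in disabled PEs as the text describes; the class name and the example program are
assumptions for illustration.

```python
class CountedPEArray:
    """Per-PE state: Enable, one Not-Yet bit, Count of state-4 entries, Zero flag."""

    def __init__(self, n):
        self.enable = [True] * n
        self.not_yet = [False] * n
        self.count = [0] * n
        self.zero = [True] * n

    def caseif(self):
        for i, e in enumerate(self.enable):
            if e:
                self.not_yet[i] = True     # masked (Not-Yet) <- 1
            else:
                self.count[i] += 1         # increment Count in disabled PEs
            self.zero[i] = e               # (Zero) <- Enable

    def then(self, pred):
        self.enable = [e and p for e, p in zip(self.enable, pred)]

    def elseif(self):
        for i, e in enumerate(self.enable):
            if e:
                self.not_yet[i] = False    # masked (Not-Yet) <- 0
            # Enable <- Not-Yet AND Zero (Zero plays the role of Previous Enable)
            self.enable[i] = self.not_yet[i] and self.zero[i]

    def default(self):
        self.elseif()
        self.then([True] * len(self.enable))

    def endcase(self):
        for i in range(len(self.enable)):
            self.enable[i] = self.zero[i]      # states 1, 2 and 3 -> 1
            if not self.zero[i]:
                self.count[i] -= 1             # pop a state 4
            self.zero[i] = self.count[i] == 0  # perhaps reset Zero

    def assign(self, target, value):
        for i, e in enumerate(self.enable):
            if e:
                target[i] = value


v = [-5, 0, 3, 8]
label = [None] * len(v)
pes = CountedPEArray(len(v))

pes.caseif(); pes.then([x < 0 for x in v])
pes.assign(label, "neg")
pes.elseif(); pes.then([x == 0 for x in v])
pes.assign(label, "zero")
pes.default()
pes.caseif(); pes.then([x < 5 for x in v])     # case nested in the default branch
pes.assign(label, "small")
pes.default()
pes.assign(label, "large")
pes.endcase()
pes.endcase()

print(label)   # ['neg', 'zero', 'small', 'large']
```

During the nested statement the first two PEs carry a Count of 1 and a clear Zero flag, so the
inner elseif cannot re-enable them even where their Not-Yet bit happens to be set.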
5 Conclusions 
The efficient execution of nested conditional statements on SIMD machines requires
little additional hardware to support the consequent Enable stack. Special purpose stack
hardware is not required; the stack can be stored in the general PE memory and
manipulated by standard instructions. An Enable field in the instruction helps noticeably, as
does, to a lesser extent, providing general logical operations on the Enable register.
Separating the abstract Enable stack and its evolution from the actual machine facilities
greatly aided designing the instruction sequences for the if and case statements. This
approach should be used for implementing these statements on other SIMD designs or for
implementing other conditional statements in general.
In the final set of instructions, the Enable instruction field, which is manipulated and
issued by the array controller, rather than being part of the PEs, is applied to subroutine
calls. This raises issues for the design of the array controller and software techniques as
regards the several side-effects that a subroutine may have.
References 
[BR83] J. D. Bruner and A. P. Reeves, "A Parallel P-code for Parallel Pascal and
Other High Level Languages," International Conference on Parallel Programming,
pp. 240-243, August 1983.
[FMPC] [...] Second Symposium on the Frontiers of Massively Parallel Computation,
Virginia, October 1988, and Computer Science TR88-048, UNC Chapel Hill.
Thinking Machines CM Lisp, CM C*.
Report Documentation Page
1. Report No.: NASA CR-181832; ICASE Report No. 89-27
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: IMPLEMENTING NESTED CONDITIONAL STATEMENTS IN SIMD MACHINES
5. Report Date: April 1989
6. Performing Organization Code:
7. Author(s): David Middleton
8. Performing Organization Report No.: 89-27
9. Performing Organization Name and Address: Institute for Computer Applications in Science
   and Engineering, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665-5225
10. Work Unit No.: 505-90-21-01
11. Contract or Grant No.: NAS1-18605
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration,
    Langley Research Center, Hampton, VA 23665-5225
13. Type of Report and Period Covered: Contractor Report
14. Sponsoring Agency Code:
Final Report
16. Abstract:
    SIMD computers consist of a very large number of processors executing a common
    sequence of instructions. Maintaining the full speedup potential of such machines
    is most sensitive to conditional execution in their programs, regions of code
    where some PEs perform no useful work. Techniques are presented for efficiently
    implementing nested conditional statements, specifically if and case statements,
    in SIMD machines, while adding minimal specialized hardware.
17. Key Words (Suggested by Author(s)): conditional statements; SIMD machines
18. Distribution Statement: 60 - Comp. Oper. & Software; 61 - Comp. Prog. & Hardware;
    Unclassified - Unlimited
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of pages: 16
22. Price: A03
NASA-Langley, 1989
