Missouri University of Science and Technology

Scholars' Mine
Computer Science Technical Reports

Computer Science

01 Jul 1985

A Parallel Branch and Bound Algorithm for Integer Linear
Programming Models
Rochelle L. Boehning
Billy E. Gillett
Missouri University of Science and Technology

Follow this and additional works at: https://scholarsmine.mst.edu/comsci_techreports
Part of the Computer Sciences Commons

Recommended Citation
Boehning, Rochelle L. and Gillett, Billy E., "A Parallel Branch and Bound Algorithm for Integer Linear
Programming Models" (1985). Computer Science Technical Reports. 81.
https://scholarsmine.mst.edu/comsci_techreports/81

This Technical Report is brought to you for free and open access by Scholars' Mine. It has been accepted for
inclusion in Computer Science Technical Reports by an authorized administrator of Scholars' Mine. This work is
protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the
permission of the copyright holder. For more information, please contact scholarsmine@mst.edu.

A PARALLEL BRANCH AND BOUND ALGORITHM FOR
INTEGER LINEAR PROGRAMMING MODELS
Rochelle L. Boehning* and Billy E. Gillett

CSc-85-2

Department of Computer Science
University of Missouri-Rolla
Rolla, Missouri

65401

(314) 341-4491

*This report is substantially the Ph.D. dissertation of the
first author, completed July 1985.

ABSTRACT

A parallel branch and bound algorithm is developed for use with
MIMD computers to study the efficiency of parallel processors on
general integer linear programming problems. The Haldi and IBM
test problems and a System Design model are used in the
implementation of the algorithm. Initially the algorithm solves
the Haldi and IBM test problems on a single processor computer
which simulates a multiple processor computer. The algorithm is
then implemented on the Denelcor HEP multiprocessor using two of
the IBM problems to compare the results of the simulation to the
results using an MIMD computer. Finally the algorithm is
implemented on the HEP using the System Design model to show a
case in which the number of pivots decreases as the number of
processes is increased from seven to the process limit of
sixteen. In general, it is shown that super linear efficiency
can be achieved using multiple processors.


ACKNOWLEDGEMENTS

I wish to thank my advisory committee for their help and guidance
throughout my graduate work in computer science. They have been
much more than an advisory committee, they have also been my
teachers, supervisors, colleagues and friends. A special thanks
goes to the Praters for opening their home to me and giving much
needed emotional support. My advisor, Bill Gillett, is very
special, having kept me on the path of my program and never
forgetting to encourage me when problems arose.

In 1966, John DeCicco of Illinois Institute of Technology
convinced me that I could do research. His patience and guidance
kept alive in me the hope of finally obtaining the Ph.D.

Thanks to Ralph Butler for his work on the HEP which kept me from
having to reinvent the wheel.

Finally, I must thank my family for their understanding,
encouragement and support. Without the strength of my wife, this
endeavor would have been impossible.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES

I.   INTRODUCTION
     A. STATEMENT OF THE PROBLEM
     B. TECHNIQUES FOR SOLVING THE PROBLEM
II.  LITERATURE SEARCH
III. PARALLEL BRANCH AND BOUND ALGORITHM
     A. INTRODUCTION
     B. THE ALGORITHM
     C. LEVELS OF PARALLELISM
IV.  SIMULATION OF A MULTIPROCESSOR
     A. INTRODUCTION
     B. BRANCH AND BOUND
     C. BRANCH AND BOUND WITH PARALLEL HYPERPLANE CUTS
     D. BRANCH AND BOUND WITH EXPLICIT ENUMERATION ON SOME
        (0,1) VARIABLES
     E. RESULTS USING THE THREE TECHNIQUES ON THE HALDI AND IBM
        TEST PROBLEMS
        1. Branch and Bound Results
        2. Results using Branch and Bound with Parallel
           Hyperplane Cuts
        3. Comparison of Branch and Bound with and without
           Parallel Hyperplane Cuts
        4. Results Using Explicit Enumeration Techniques
V.   IMPLEMENTATION OF PARALLEL ALGORITHM ON AN MIMD COMPUTER
     A. INTRODUCTION
     B. PROGRAMMING THE HEP
     C. RESULTS OF THE TEST PROBLEMS IMPLEMENTED ON THE HEP
VI.  A CASE STUDY
     A. THE SYSTEM DESIGN PROBLEM
     B. APPLICATION OF THE ALGORITHM
     C. RESULTS USING THE HEP
     D. CONCLUSIONS
VII. CONCLUSIONS AND DIRECTIONS FOR FUTURE RESEARCH
     A. CONCLUSIONS
     B. SUGGESTIONS FOR FUTURE RESEARCH

BIBLIOGRAPHY
VITA

APPENDICES
     A. THE DENELCOR HEP (HETEROGENEOUS ELEMENT PROCESSOR)
     B. PL/I PROGRAM TO SIMULATE BRANCH AND BOUND TECHNIQUES WITH
        AND WITHOUT PARALLEL HYPERPLANE CUTS
     C. MACROS USED IN THE C PROGRAM
     D. C LANGUAGE PROGRAM FOR THE PARALLEL BRANCH AND BOUND
        ALGORITHM, WITH MACROS, FOR THE DENELCOR HEP AT THE
        ARGONNE NATIONAL LABORATORY

LIST OF TABLES

TABLE
I.   BRANCH AND BOUND
II.  BRANCH AND BOUND WITH PARALLEL CUTS
III. COMPARISONS OF THE TEST PROBLEMS USING SINGLE AND MULTIPLE
     PROCESSORS AND USING BRANCH AND BOUND WITH AND WITHOUT
     PARALLEL HYPERPLANE CUTS
IV.  HALDI-10 WITH ZERO-ONE ENUMERATION USING SEVEN PROCESSORS
V.   IMPLEMENTATION OF LEVEL 2 PARALLELISM ON HEP
VI.  EQUIPMENT OFFERED BY SUPPLIERS FOR THE SYSTEM DESIGN MODEL
VII. SYSTEM DESIGN MODEL RESULTS USING THE PARALLEL BRANCH AND
     BOUND ALGORITHM AND THE HEP MULTIPROCESSOR

I. INTRODUCTION

A. STATEMENT OF THE PROBLEM

The solving of integer linear programming models using the
simplex method with a branch and bound algorithm lends itself
naturally to implementation on a multiprocessor computer. The
parallel implementation of components of a branch and bound
algorithm on a multiprocessor computer gives rise to the
possibility of achieving super linear efficiency. This means
that n processors working in parallel can solve a given problem
in fewer total operations than a single processor.

This paper describes a parallel branch and bound algorithm for
solving integer linear programming models. The algorithm uses a
combination of parallel branch and bound techniques with and
without parallel hyperplane cuts. Problems with some (0,1)
variables were also investigated using the above techniques in
combination with explicit enumeration. The algorithm was
initially implemented using a single processor to simulate many
processors working in parallel. This process was used to
investigate the IBM and Haldi test problems [1] to demonstrate
that parallel processors could achieve super linear efficiency.
These problems were chosen since they were designed to test ILP
algorithms, are well known and are considered small but
difficult. The algorithm was then implemented on a multiple
instruction stream, multiple data stream computer. The IBM-3 and
IBM-4 test problems and a System Design model were chosen for
investigation on the parallel processing computer to compare
with the simulation results.
B. TECHNIQUES FOR SOLVING THE PROBLEM

The simulation was done using the PL/I language on the IBM 4381
computer. The computer simulated was a Multiple Instruction
stream, Multiple Data stream (MIMD) type computer with a common
memory as well as individual memories in each processor.

The algorithm was then programmed in the "C" language and
implemented on the Denelcor HEP multiprocessor at Argonne
National Laboratory. The HEP is a general purpose computer that
can handle multiple instruction streams and multiple data
streams (MIMD) [APPENDIX A]. The macros used to implement the
parallel algorithm were adapted from the FORTRAN macros written
by Lusk and Overbeek [2,3] and the "C" macros written by
Butler [4].

II. LITERATURE SEARCH

The fields of parallel processing and integer linear programming
(ILP) have, until recently, been studied by separate groups.
This is evidenced by the fact that most of the literature on
parallel processing is in the electrical engineering journals
with very few articles concerning parallel processing appearing
in the operations research journals. The number of papers
published in the Proceedings of the International Conference on
Parallel Processing has more than doubled in the past five
years. The early parallel computing machines which were built in
the seventies were of the Single Instruction stream, Multiple
Data stream (SIMD) type. Most of the commercially available
multiprocessor computers today are still of this type [5].

The operations research community had been disappointed with the
applicability of the SIMD type of super computer to mathematical
programming (MP) in general and to linear algorithms in
particular [6,7,8]. The hope now is that Multiple Instruction
stream, Multiple Data stream (MIMD) computers will help the
field of MP [8]. The main advantage the MIMD computers have over
SIMD computers in solving problems with systems of linear
equations is that the pivot step can be done in parallel with
column operations [7]. Present MP applications are mostly in the
area of matrix decomposition algorithms [6,7,9,10,11,12] and
partial differential equations [13], with some in graph
theory [14].

Techniques for obtaining parallelism include dividing a single
execution string into several concurrent "threads" [15] and
dividing programs into sections that "reflect the logical
structure" of the problem concerned [16,17].

Working with a parallel computer and global variables causes the
concepts of mutual exclusion to be much more critical than in a
sequential computer, and hence the use of locks, semaphores and
monitors cannot be left to the operating system [17,18,19].
These topics will be discussed later in the paper.
Although the Branch and Bound algorithm for ILP seems to lend
itself very naturally to parallelism, the work with this method
has been restricted to the area of (0,1) implicit enumeration
problems. Two of these papers are of special interest, Lai [20]
and Gehringer et al. [11].

Lai deals with anomalies in the knapsack problem and the
travelling salesman problem. Anomalies are defined in the
following manner: If n processors take I(n) iterations to do a
particular problem and m processors take I(m) iterations for the
same problem and n < m, the inequalities I(n)/I(m) <= m/n and
I(n) >= I(m) should hold. If either of these inequalities does
not hold it is said to demonstrate anomalous behavior. Lai found
that in the knapsack problem anomalous behavior occurred in ten
percent of the tests, and that no anomalous behavior occurred in
the travelling salesman test problems. He found, in the knapsack
problem, that the speed-up ratio I(n)/I(2n) (i.e. doubling the
number of processors) varied from 0.15 to 14.6. Since the number
of processors was doubled, the limits should have been 1.0 (i.e.
taking as many iterations on twice as many processors) and 2.0
(i.e. taking half as many iterations using twice as many
processors). He also felt that an acceptable ratio would have
been 1.6. Although it was not stated, it appears that this was
to make up for added switching, communication and blocking time
due to the extra processors. The binary state space tree was
used, since it was believed that the use of an n-ary state space
tree would have taken weeks of computer time to complete the
simulation. Iterations were used as measurements but were
referred to as time measurements, and since no parallel machine
was mentioned, it is assumed that the work was done as a
simulation.
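
The two inequalities in the anomaly definition can be checked mechanically. The following Python sketch is only an illustration of that definition: the function name `is_anomalous` and the sample iteration counts are assumptions made here and are not taken from Lai [20].

```python
from fractions import Fraction


def is_anomalous(n, iters_n, m, iters_m):
    """Return True if the pair (n, I(n)), (m, I(m)) with n < m violates
    either I(n)/I(m) <= m/n or I(n) >= I(m)."""
    if not n < m:
        raise ValueError("expected n < m")
    ratio_ok = Fraction(iters_n, iters_m) <= Fraction(m, n)
    monotone_ok = iters_n >= iters_m
    return not (ratio_ok and monotone_ok)


# Hypothetical iteration counts when doubling from 2 to 4 processors.
print(is_anomalous(2, 100, 4, 60))   # ratio between 1 and 2
print(is_anomalous(2, 100, 4, 120))  # more iterations with more processors
print(is_anomalous(2, 100, 4, 10))   # ratio larger than m/n = 2
```

Exact rational arithmetic is used for the ratio so that borderline cases such as I(n)/I(m) = m/n are not misclassified by rounding.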
Gehringer, Jones and Segall [11] reported on how the
Carnegie-Mellon group [10], especially Raskin [21], used the Cm*
to compare the Cm* used as a multiprocessor and the Cm* used as
a network. The Cm* consists of several mini-processors linked by
intercluster busses. This architecture makes the computer behave
more like several separate computers than a single
multiprocessor. The problem was a set partitioning integer
problem using an enumeration algorithm that performs an n-ary
tree search in a large, relatively sparse binary matrix for a
min-cost solution. The matrix was two dimensional with a size
usually in the order of hundreds by thousands.
Greater-than-linear speed-up was obtained with a 10-processor
Cm*. The speed-up ratio is the time for one processor to
complete a task divided by the time for n processors to do the
same task. For n processors to achieve at least linear speed-up,
this ratio should be at least n. Greater-than-linear speed-up in
this case means that one processor will take more than n times
as much time to complete a task as n processors will take doing
the same task.

In this algorithm's initialization phase, a large number of
possible solutions are put in a global stack, from which all the
processors choose their work. As the search proceeds, the cost
of the best solution found so far by any processor is stored as
a global variable. All processors compare their current cost
value to it and begin to backtrack in the search when the global
cost is lower. The multiprocessor could be "lucky", in that one
of its processors might encounter a near-optimal solution at the
outset and then none of the processors would have to do very
much work. The uni-processor version, which does not encounter
the near optimal solution until later, has the disadvantage of
having done a more complete search over the earlier possible
solutions. The multiprocessor could also be "unlucky" if at the
outset the near optimal solution is encountered by the
uniprocessor. This would cause both the uniprocessor and the
multiprocessor to enter at the same time and therefore before
the cost could be determined, the other processors in the
multiprocessor version would have wasted processing time on
their initial solutions.
The conclusion concerning the ILP test runs was that, although
greater-than-linear speed-up was obtainable, it will not usually
be obtained. Greater-than-linear speed-up was obtained in one of
the five integer programming runs using two through eight
processors. All five runs produced approximately linear speed-up
for two processors. For eight processors, the worst speed-up was
5.5 and the best was 9.75 where 8 would be linear.
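
As a small worked illustration of the speed-up ratio defined above, the Python sketch below classifies a measured ratio against the linear value n. The helper names and the `efficiency` quantity (speed-up divided by n, a common convention that is assumed here and not defined in the papers cited) are illustrative only; the two sample ratios are the eight-processor values quoted above.

```python
def speedup(time_one, time_n):
    """Time for one processor divided by time for n processors."""
    return time_one / time_n


def classify(ratio, n, tol=1e-9):
    """Compare a speed-up ratio with the linear value n."""
    if ratio > n + tol:
        return "greater than linear"
    if ratio < n - tol:
        return "less than linear"
    return "linear"


# Worst and best eight-processor speed-ups reported for the Cm* runs.
for ratio in (5.5, 9.75):
    efficiency = ratio / 8  # assumed convention: speed-up per processor
    print(ratio, classify(ratio, 8), efficiency)
```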
The literature in computer science is void of papers on the use
of parallel processors in the solution of ILP problems using
simplex type algorithms. The only papers which use parallel
branch and bound techniques in ILP were those for the (0,1) type
of problem such as AND/OR-tree searches, state space searches,
game-tree searches and finding shortest paths in trees
[22,23,24,25]. Most of these types of problems use only addition
and subtraction since binary matrices are used. Those using
weighting factors include some integer multiplication but none
use floating point multiplication or division.

The simplex method [26], which is used on more general types of
ILP problems, typically uses a large number of divisions on each
pivot operation, and there are usually many pivot operations in
a problem, hence the computational difficulty is much greater
and much more prone to computational errors such as round-off.
Taha [27] suggests that in the general ILP problems, a
combination of methods, such as branch-and-bound, cutting plane,
and implicit enumeration, may give better results than any one
method used alone.

III. PARALLEL BRANCH AND BOUND ALGORITHM

A. INTRODUCTION

Consider the integer linear programming model:

\[
\text{Maximize } z = \sum_{j=1}^{p} c(j)\,x(j)
\]
\[
\text{subject to } \sum_{j=1}^{p} a(i,j)\,x(j) \le b(i), \qquad i = 1, 2, \ldots, n
\]
\[
x(j) \ge 0, \text{ integer}
\]

where n = number of constraints, p = number of variables, c(j)
are the cost coefficients, a(i,j) and b(i) are constants in the
constraints, x(j) are the variables and z is the objective
function value.

B. THE ALGORITHM

STEP 1.
Initialize Lower Bound (LB) = -10E10 (a number with a large
enough absolute value to approximate negative infinity for the
problem), state the number of processors and the initial simplex
dimensions and then load the initial simplex.

STEP 2.
Calculate the continuous solution using the Lexicographical
Column Dual Simplex Method [27].

STEP 3.
If the continuous solution is an all-integer solution, the
problem is solved, so print the results and stop; otherwise, the
final tableau of the continuous solution, referred to as a node,
is stored as a layer of a three dimensional matrix (node
storage) and the z-value (objective function value) is stored as
an element of a Z-vector protected by an ASKFOR monitor
[APPENDIX C] which prevents two processors from accessing the
same node. The Z-vector elements are labeled Z(i), where i
represents the level of the node in the node storage, and are
not changed outside of the monitor.
STEP 4.
Any free processor (one that is not calculating a node) may ask
the monitor for a simplex tableau with the maximum z-value
(maximum upper bound node selection). If there is no available
node, either all the nodes are fathomed and the problem is
finished or some node is being calculated by some other
processor and the free processor waits for the next clock tick
(100ns for the HEP) and tries again. If there is an available
maximum node at level i in the node storage, mark Z(i) to
indicate that the node is being processed and go to step 5.
STEP 5.
Select the first row of the node obtained from step 4 which has
a non integer x variable value. This will be the row used to
determine the branching variable (First Fraction Variable
Selection). Add the down constraint to the simplex tableau and
add the up constraint to a copy of the simplex tableau [27].
This gives the two branches (nodes) of the Branch and Bound
Algorithm.

STEP 6.
Calculate the LP solution at each of these nodes by the methods
of step 2. During the calculation, after each pivot operation,
the processor compares the floor of the z-value of the node it
is calculating with the present lower bound, where the floor of
a number is the greatest integer less than or equal to the
number. If the floor of the z-value is less than or equal to the
value of the lower bound, the node is fathomed. To indicate that
a node is fathomed, set Z(i)=-10E10 which will be less than or
equal to the lower bound so that node will no longer be
considered. Otherwise, the pivots are continued until the
simplex tableau is primal feasible. For each node, if the value
of the objective function Z and the associated x values are
integer, Z is compared to the present lower bound and if larger,
set LB=Z(i) to indicate the new present lower bound. Whether
integer or not, if the z-value is less than or equal to the
present lower bound, the node is said to be fathomed. This means
that no better solution will be found along that branch and
hence it is pruned. When all nodes except the one with the
present lower bound are fathomed, the problem is completed. If
all nodes are fathomed and the lower bound is still negative
infinity, there is no integer solution.

STEP 7.
If a calculated node is not fathomed it is stored in the node
storage matrix. The down node will replace the branching node
and the up node will form a new layer in the node storage. If
the node storage has any unfathomed nodes, go to step 4;
otherwise, stop and print the results.
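
The selection and fathoming rules in steps 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PL/I or C program of the appendices: the function names are hypothetical, values are held as exact `Fraction` objects instead of the floating point values a simplex code would use, and the LP solution step itself is omitted.

```python
import math
from fractions import Fraction

NEG_INF = -10e10  # stand-in for negative infinity, as in step 1


def first_fraction_row(x_values):
    """First Fraction Variable Selection (step 5): index of the first
    row whose x value is not an integer, or None if all are integer."""
    for row, x in enumerate(x_values):
        if x.denominator != 1:
            return row
    return None


def is_fathomed(z, lower_bound):
    """Fathoming test of step 6: floor(z) <= LB."""
    return math.floor(z) <= lower_bound


def update_lower_bound(z, x_values, lower_bound):
    """Step 6 bound update: an all-integer node with a larger z-value
    becomes the new lower bound; otherwise the bound is unchanged."""
    all_integer = z.denominator == 1 and all(
        x.denominator == 1 for x in x_values
    )
    if all_integer and z > lower_bound:
        return z
    return lower_bound


x = [Fraction(2), Fraction(7, 2), Fraction(1, 3)]
print(first_fraction_row(x))              # second row is the first fraction
print(is_fathomed(Fraction(-29, 2), -15)) # floor(-14.5) is -15
```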

C. LEVELS OF PARALLELISM

There are several levels of parallelism possible in the
implementation of the algorithm.

Level 1.
Steps 4, 5, 6 and 7 are combined into one logical module. Each
free processor:
(a) checks the monitor for an available node and receives the
    node,
(b) adds the up constraint to a copy of the branching node and
    stores the uncalculated up node,
(c) adds the down constraint, calculates the down node, then
(d) calculates the up node, stores both results in the node
    storage and puts the z-values in the Z-vector.
Thus, any free processors will only have access to these nodes
after both are calculated and stored.
Level 2.
The same as Level 1 except change (c) and (d) of Level 1 as
follows:
(c*) adds the down constraint, calculates the down node, stores
     it in the node storage and puts the z-value in the
     Z-vector, then
(d*) calculates the up node, stores it in the node storage and
     puts the z-value in the Z-vector.
This gives the opportunity for a free processor to obtain the
down node while the first processor is calculating the up node.

Level 3.
The same as Level 1 except change (b) and (d) of Level 1 as
follows:
(b") adds the up constraint to a copy of the branching node and
     stores the uncalculated up node in the node storage,
     marking it as an uncalculated node,
(d") stores the results in the node storage and puts the
     z-value in the Z-vector.
When the up branch is examined by a free processor it will have
the Z(i) value of the node before branching. This will put it
back in equal contention with all available nodes. The last line
of step 4 of the algorithm would be replaced by:
If there is an available node, check to see if it is an
uncalculated node. If it is, go to step 6; otherwise go to
step 5.
Level 4.
Another level of parallelism is also desirable if many
processors are available and the Simplex tableaus are very
large. When the pivot operation is called in any of the above
steps, the processor is given the indexes of the leaving row and
the entering column. It then performs the element operation n*p
times, where p is the number of variables and n is the sum of
the number of variables, the number of constraints and the
number of branches already done. These operations are
independent of each other and could therefore be assigned to
different processors. This assignment of processors could either
be done row by row or element by element. If by row, a processor
is given a row index and it performs the p operations associated
with that row. If by element, a processor is given the row and
column indexes of the element it is to process.
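
The row-by-row assignment described for Level 4 can be sketched as follows. This Python example is an assumption-laden illustration: it uses a textbook Gauss-Jordan row pivot, which may differ from the column tableau form used by the Lexicographical Column Dual Simplex Method, and the function name `pivot` and the `workers` parameter are hypothetical. Python threads are used only to show that each row update is an independent task; they do not by themselves give the hardware parallelism of a machine such as the HEP.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def pivot(tableau, r, c, workers=4):
    """Pivot on element (r, c); every row update is an independent task."""
    pivot_row = [v / tableau[r][c] for v in tableau[r]]

    def update(i):
        # The pivot row is scaled; every other row has a multiple of the
        # scaled pivot row subtracted so that column c becomes zero.
        if i == r:
            return pivot_row
        factor = tableau[i][c]
        return [v - factor * pv for v, pv in zip(tableau[i], pivot_row)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update, range(len(tableau))))


T = [
    [Fraction(2), Fraction(1), Fraction(4)],
    [Fraction(1), Fraction(3), Fraction(5)],
]
print(pivot(T, 0, 0))
```

Because each task reads only the old tableau and the scaled pivot row, no locking is needed between row updates.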
Level 1 was used in a test version of the code on the HEP but
was discarded because of its poor use of parallelism. Levels 1
and 2 are identical on a single processor but Level 2 utilizes
multiple processors better because it makes the down node
available sooner. Level 2 was implemented on the Denelcor HEP
for the IBM-3 and IBM-4 test problems. There is only one line of
code difference in the two levels. Level 3 was used in the
simulation for all of the test problems. The simulation was done
before the HEP became available and extensive changes in the
code including a different node storage would have been
necessary to obtain Level 3 parallelism on the HEP. Level 4
could be implemented using more monitors and using methods of
calculating the simplex tableaus similar to methods in the
literature involving Gaussian elimination techniques [7]. This
would utilize idle processors but should not change the number
of pivots.
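
The role the monitor plays in step 4, and the reason Level 2 makes a node visible sooner than Level 1, can be sketched with a lock-protected Z-vector. The class below is only loosely modeled on that role: the name `NodeMonitor` and its methods are hypothetical and are not the ASKFOR macros of APPENDIX C.

```python
import threading

NEG_INF = -10e10  # marks a fathomed node, as in step 6


class NodeMonitor:
    """Lock-protected Z-vector with maximum upper bound node selection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._z = []        # Z-vector: one z-value per stored node
        self._busy = set()  # nodes handed out and not yet released

    def store(self, z):
        """Store a newly calculated node; it is available immediately."""
        with self._lock:
            self._z.append(z)
            return len(self._z) - 1

    def ask_for(self):
        """Return the index of the available node with the largest z-value
        and mark it as being processed, or None if nothing is available."""
        with self._lock:
            free = [i for i, z in enumerate(self._z)
                    if z != NEG_INF and i not in self._busy]
            if not free:
                return None
            best = max(free, key=lambda k: self._z[k])
            self._busy.add(best)
            return best

    def release(self, i, z=NEG_INF):
        """Replace node i by a new z-value (fathomed by default)."""
        with self._lock:
            self._z[i] = z
            self._busy.discard(i)


monitor = NodeMonitor()
down = monitor.store(15.5)  # Level 2: stored as soon as it is calculated
print(monitor.ask_for() == down)
```

Under Level 1 the call to `store` for the down node would be delayed until the up node was also finished, so a free processor calling `ask_for` in the meantime would find nothing to do.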

IV. SIMULATION OF A MULTIPROCESSOR

A. INTRODUCTION

A single processor was used to simulate the multiprocessor
implementation of the Parallel Branch and Bound Algorithm.
Level 3 parallelism was simulated using the IBM 4381 computer
and the PL/I language program given in APPENDIX B. The
simulation explored every node which could be applicable, which
in most cases meant that every branch was pursued until either
an integer lower bound was found, infeasibility occurred or the
node could not be used because of its value relative to the
known solution and its location in the branching tree. Three ILP
techniques were studied for simulating implementation on a
multiprocessor: branch and bound, branch and bound with parallel
hyperplane cuts, and branch and bound with (0,1) explicit
enumeration.

B. BRANCH AND BOUND

Branch and bound with first fraction variable selection and
maximum upper bound node selection was used in all simulations.

The first fraction variable selection was used since it shortens
the search for the branching variable in the node storage and
uses only comparisons instead of calculating penalty functions.

The maximum upper bound node selection was chosen because of its
simplicity, needing only a comparison search, and because of its
favorable numbers of pivot calculations when compared with the
more complicated penalty function methods [28]. The Beale and
Small [27] method gains storage efficiency for sequential
processing, especially in the case in which the first direction
contains the solution. This efficiency is gained by the use of a
stack type storage with backtracking. The use of backtracking
makes the variable selection automatic for one processor but
more complicated for multiprocessors. The use of multiple
processors also necessitates a stack for each processor and
hence this method was not pursued.
C. BRANCH AND BOUND WITH PARALLEL HYPERPLANE CUTS

The techniques of part B were combined with the following
addition. If a primal feasible solution does not have an integer
z-value, a fractional cut is performed which is parallel to the
objective function. This cut is called the "parallel hyperplane
cut," and is performed immediately after a branching node is
chosen and before the up and down constraints are added. This
cut is different from the usual cutting plane techniques in that
it is performed on the objective function row instead of a basic
variable row.
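
One standard way to form a fractional cut from a tableau row is Gomory's fractional cut. The Python sketch below applies it to an objective row written as z + sum of a_j x_j = z0 over nonbasic variables x_j >= 0, with z integer at every integer solution. That row convention, the function names and the sample numbers are assumptions made for illustration; the report's PL/I program in APPENDIX B may organize the computation differently.

```python
from fractions import Fraction
from math import floor


def frac(v):
    """Fractional part v - floor(v), which lies in [0, 1)."""
    return v - floor(v)


def objective_row_cut(z0, coeffs):
    """Return (cut_coeffs, cut_rhs) for the cut sum f_j x_j >= f_0 taken
    from the objective row, or None when z0 is already an integer."""
    f0 = frac(z0)
    if f0 == 0:
        return None
    return [frac(a) for a in coeffs], f0


# Hypothetical objective row: z + (3/4)x1 - (1/4)x2 + 2x3 = 31/2.
cut = objective_row_cut(
    Fraction(31, 2), [Fraction(3, 4), Fraction(-1, 4), Fraction(2)]
)
print(cut)
```

The cut is generated only when the z-value is fractional, which matches the condition stated above for performing the parallel hyperplane cut.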
D. BRANCH AND BOUND WITH EXPLICIT ENUMERATION ON SOME (0,1)
   VARIABLES

The calculation of the continuous solution for the first node
usually takes several pivot operations, hence in the case of
using a dedicated multiprocessor, all processors but one are
idle during this time. Since only one additional constraint has
been added for the up or down branch, it quite frequently takes
only one additional pivot to obtain the continuous feasible
solution for the next node. To utilize the other processors
while the first processor is calculating the first node, an
explicit enumeration technique was used.

The explicit enumeration technique used gives one or more of the
(0,1) variables the value 0 or 1 and then performs the
calculations on the resulting simplex tableau. This tableau is
smaller since at least one variable and row have been eliminated
from the original tableau. An enumeration was performed by each
free processor while the first processor was calculating the
continuous solution for the first node. Since these processors
were working on smaller simplex tableaus, it was hoped that they
would have their results in fewer pivots than the first
processor would need so their results would be ready for the
first processor when it had completed its calculations. It was
hoped that this would not only keep more processors busy while
the continuous solution was being calculated, but might either
give the solution or at least give a lower bound and hence a
better idea of which variable to pursue next.
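
Fixing a (0,1) variable shrinks the problem in a simple way, which the following Python sketch illustrates on the constraint data directly. The function name `fix_variable` and the small data set are hypothetical. The sketch removes only the column of the fixed variable and adjusts the right-hand sides; an explicit bound row for that variable, if the tableau carries one, would also be dropped, which is presumably the eliminated row referred to above.

```python
def fix_variable(A, b, c, j, value):
    """Fix x_j = value (0 or 1) and return the reduced problem as
    (A', b', c', constant), where constant is the objective
    contribution c_j * value of the fixed variable."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    new_A = [row[:j] + row[j + 1:] for row in A]
    new_b = [bi - row[j] * value for row, bi in zip(A, b)]
    new_c = c[:j] + c[j + 1:]
    return new_A, new_b, new_c, c[j] * value


# Hypothetical data: two constraints, three variables, x_2 fixed at 1.
A = [[2, 3, 1], [1, 0, 4]]
b = [10, 8]
c = [5, 4, 3]
print(fix_variable(A, b, c, 1, 1))
```

Each free processor could be handed one such reduced problem, one per choice of variable and value, while the first processor works on the full tableau.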
E. RESULTS USING THE THREE TECHNIQUES ON THE HALDI AND IBM TEST
   PROBLEMS

In all implementations, the number of pivot operations was used
as a measure of performance since it was found that the time to
do a pivot operation did not significantly change in a given
problem as one row (up or down constraint) was added. The clock
times available to this project on the IBM 4381 were in
increments of ten milliseconds so were of little value.

IBM-1 through IBM-5 and HALDI-1 through HALDI-10 [1] were used
as test problems in the simulation. The continuous solutions for
the IBM-1, HALDI-7 and HALDI-8 were also the integer solutions
so no branching was needed.
1. Branch and Bound Results.

The results of the simulation of the parallel branch and bound
algorithm using branch and bound techniques are given in
TABLE I. Two of the twelve test problems (IBM-2 and HALDI-4)
showed super linear efficiency, although the improvement was
only one pivot operation less than with a single processor. Five
of the test problems showed linear efficiency and five showed
less than linear efficiency. Only HALDI-10 showed an increase of
more than one pivot operation on the best of the multiprocessing
tests over the single processor.

In general, after the optimal number of processors has been
reached, the addition of more processors causes the number of
nodes and the number of pivots to increase. The exceptions are
HALDI-5, HALDI-6 and HALDI-9 where both the number of pivots and
the number of nodes remained constant regardless of the number
of processors.

With the exceptions of IBM-3 and IBM-5, no test problem would
utilize more processors than the number of variables in the
problem.

TABLE I
BRANCH AND BOUND

IBM-2 (7 VARIABLES)
# processors   # pivots   # nodes
      1           12         5
      2           12         5
      3           11         6
      4           12         7
      5        Will not use 5 processors

IBM-3 (7 VARIABLES)
# processors   # pivots   # nodes
      1           30        21
      2           31        21
      3           31        21
      4           31        21
      5           33        23
      6           37        27
      7           39        28
      8           39        28
      9        Will not use 9 processors

IBM-4 (15 VARIABLES)
# processors   # pivots   # nodes
      1           55        14
      2           57        15
      3           65        20
      4           69        23
      5           74        26
      6           83        29
      7           70        26
      8           59        24
      9           61        25
     10           64        26
     11           55        22
     12           56        23
     13           57        24
     14        Will not use 14 processors

IBM-5 (15 VARIABLES)
# processors   # pivots   # nodes
      1          526       251
      2          526       251
      3          527       251
      4          527       251
      8          532       252

HALDI-1 (5 VARIABLES)
# processors   # pivots   # nodes
      1           14         9
      2           15         9
      3           16        11
      4           15        10
      5           16        11
      6        Will not use 6 processors

HALDI-2 (5 VARIABLES)
# processors   # pivots   # nodes
      1           14         9
      2           15        10
      3           16        11
      4           15        10
      5           16        11
      6        Will not use 6 processors

HALDI-3 (5 VARIABLES)
# processors   # pivots   # nodes
      1           12         7
      2           13         8
      3           13         8
      4           15        10
      5           16        11
      6        Will not use 6 processors

HALDI-4 (5 VARIABLES)
# processors   # pivots   # nodes
      1           14         8
      2           15         9
      3           13         8
      4           15        10
      5           16        11
      6        Will not use 6 processors

HALDI-5 (5 VARIABLES)
# processors   # pivots   # nodes
      1           14         9
      2           14         9
      3           14         9
      4           14         9
      5        Will not use 5 processors

HALDI-6 (5 VARIABLES)
# processors   # pivots   # nodes
      1           11         7
      2           11         7
      3        Will not use 3 processors

HALDI-9 (6 VARIABLES)
# processors   # pivots   # nodes
      1           13         7
      2           13         7
      3        Will not use 3 processors

HALDI-10 (12 VARIABLES)
# processors   # pivots   # nodes
      1           39         9
      2           43        11
      3           54        14
      4           53        14
      5           51        17
      6           57        19
      7        Will not use 7 processors

In the case of IBM-3 using eight processors, the eighth
processor performed only one pivot operation and following that
calculation, two processors were idle.

Although the simulation of IBM-5 was not carried

out beyond eight processors,

it appeared from the branching

trees that more than fifteen processors might be utilized
with very little change in the number of nodes visited,

but

with an increasing number of pivots.
The IBM-5 problem had almost no differences in the number of
nodes visited in obtaining a solution because of the large number
of distinct solutions which were at the same level of the
branching tree. The slight differences in the relative number of
pivots seemed to depend mostly on how many nodes were calculated
with z-value floors that were the same as the z-value of the
solution and on how many of their pivot operations were performed
before this floor was obtained. For example, if -15 is the z-value
of a feasible integer solution, whenever any pivot operation gives
a value less than -14, its floor is then -15 so the node is
fathomed. In the IBM-5 problem, the first feasible integer
solution was the optimal solution regardless of the number of
processors. Because of the enormous amount of time consumed
simulating IBM-5 and the inability to see any useful patterns,
further investigation of this problem was delayed until a large
MIMD machine was available to check for possible patterns using
parallel processors.
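
The fathoming test in the example above can be sketched in a few
lines. This is only an illustration, assuming a maximization
problem whose integer solutions always have integer z-values; the
function name is_fathomed and its arguments are hypothetical and
are not taken from the report's code.

```python
import math

def is_fathomed(node_z, incumbent_z):
    """Return True when a node cannot beat the incumbent integer solution.

    Assumes maximization with an integer-valued objective, so the best
    integer z-value reachable below this node is floor(node_z).
    """
    return math.floor(node_z) <= incumbent_z

# Incumbent z-value of -15, as in the example in the text.
print(is_fathomed(-14.3, -15))  # value below -14: floor is -15
print(is_fathomed(-13.5, -15))  # floor is -14, node stays open
```

A node whose current value is exactly -14 is not fathomed under
this test, which matches the "less than -14" condition in the
text.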
The IBM-4 problem seemed ideal for multiprocessors because of the
path pursued using the sequential branch and bound algorithm. The
sequential algorithm pursued a path down the right branch of the
branching tree where the optimal integer solution was eight levels
deep with a shortest path of thirty-two pivots. There was,
however, a path down the left side of the tree with an optimal
solution six levels deep and with a shortest path of twenty-six
pivots. A shortest path is the path needing the fewest pivot
operations that a processor would take if it knew where the
optimal integer solution was. In the simulation, this shortest
path was not utilized until eleven processors were in use, so the
advantages associated with the shortest path were offset by
examining nodes which were not examined by using one processor.
2. Results Using Branch and Bound with Parallel Hyperplane Cuts.
The results of this simulation are given in TABLE II. Since every
node in HALDI-9 had an integer z-value it did not utilize the
parallel hyperplane cuts, hence HALDI-9 is not in TABLE II.

When parallel cuts were combined with the branch and bound
methods, HALDI-4 was the only case where there was less than
linear efficiency using the best performance of multiprocessors
against single processors. Seven of the problems demonstrated
super linear efficiency. HALDI-5, HALDI-6 and HALDI-9 remained at
a constant level regardless of the number of processors. No test
problem, with the possible exception of IBM-5, would use as many
processors as the number of variables in the problem. In HALDI-1
and HALDI-2, as the number of processors increased, the efficiency
increased, up to the maximum number of processors which would be
used.

IBM-4 dropped from ninety pivots with a single processor to
fifty-three pivots with seven processors.

TABLE II
BRANCH AND BOUND WITH PARALLEL CUTS

IBM-2
7 VARIABLES
# processors    # pivots    # nodes
     1             12          4
     2             11          4
     3             12          5
     4          Will not use 4 processors

IBM-3
7 VARIABLES
# processors    # pivots    # nodes
     1             40         18
     2             40         18
     3             43         19
     4             46         20
     5             48         20
     6             48         20
     7          Will not use 7 processors

IBM-4
15 VARIABLES
# processors    # pivots    # nodes
     1             90         23
     2             73         21
     3             83         24
     4             58         20
     5             54         19
     6             57         21
     7             53         20   best
     8             54         21
     9             55         22
    10             56         23
    11             57         23
    12             58         24
    13          Will not use 13 processors

IBM-5
15 VARIABLES
# processors    # pivots    # nodes
     1           1137        331
     2           1098        327
     3           1133        329
     7           1081        319   best
     8           1131        330

HALDI-1
5 VARIABLES
# processors    # pivots    # nodes
     1             16          5
     2             18          6
     3             14          5
     4          Will not use 4 processors

HALDI-2
5 VARIABLES
# processors    # pivots    # nodes
     1             19          7
     2             14          5
     3             13          5
     4          Will not use 4 processors

HALDI-3
5 VARIABLES
# processors    # pivots    # nodes
     1             20          5
     2             19          5
     3             19          5
     4          Will not use 4 processors

HALDI-4
5 VARIABLES
# processors    # pivots    # nodes
     1             15          4
     2             20          7
     3             21          7
     4          Will not use 4 processors

HALDI-5
5 VARIABLES
# processors    # pivots    # nodes
     1             21          7
     2             21          7
     3             21          7
     4          Will not use 4 processors

HALDI-6
5 VARIABLES
# processors    # pivots    # nodes
     1             16          5
     2             16          5
     3          Will not use 3 processors

HALDI-10
12 VARIABLES
# processors    # pivots    # nodes
     1             63         11
     2             64         11
     3             62         11
     4             58         13
     5             66         13
     6          Will not use 6 processors
These results sound very good since they show that a method is
available which utilizes multiprocessors efficiently. However,
these results need to be compared with the results obtained using
Branch and Bound without parallel hyperplane cuts to get a more
complete picture.
3. Comparison of Branch and Bound with and without Parallel
Hyperplane Cuts. TABLE III gives the comparison between the two
methods. In all but IBM-2 the number of pivot operations for a
single processor is smaller without the parallel hyperplane cuts
than with them. In IBM-2 using two processors, the number of pivot
operations is the same with or without the cuts.

In IBM-5 the comparison is significant since the number of pivot
operations more than doubled when the cuts were added, causing
more than five hundred extra pivot operations and visiting almost
one hundred extra nodes.

The comparisons in IBM-4 start out almost as bad, with fifty-five
pivot operations without the cuts and ninety pivot operations with
them. However, as the number of processors increases, so does the
efficiency of the parallel cuts. With seven processors, the
parallel cuts give better efficiency than the single processor
with or without the parallel cuts. HALDI-2 is the other problem in
which three processors with parallel cuts perform better than one
processor with or without parallel cuts. Also, in HALDI-2, the
number of nodes visited with three processors using the parallel
cuts is less than one half the number of nodes visited with three
processors not using parallel cuts.

TABLE III
COMPARISONS OF THE TEST PROBLEMS
USING SINGLE AND MULTIPLE PROCESSORS
AND USING BRANCH AND BOUND WITH AND WITHOUT
PARALLEL HYPERPLANE CUTS

                      Single          Multiple Processors
                      Processor       Best Performance
                      Performance
                      Number of       Number of    Number of
Problem    Method     Pivots          Pivots       Processors

Haldi-1    B&B           14              15          2,4
Haldi-1    +PC           16              14          3
Haldi-2    B&B           14              15          2
Haldi-2    +PC           19              13          3
Haldi-3    B&B           12              13          2
Haldi-3    +PC           20              17          3
Haldi-4    B&B           14              13          3
Haldi-4    +PC           15              20          2
Haldi-5    B&B           14              14          2,3,4
Haldi-5    +PC           21              16          3
Haldi-6    B&B           11              11          2
Haldi-6    +PC           16              16          2
Haldi-7    B&B        The continuous solution was integer
Haldi-8    B&B        The continuous solution was integer
Haldi-9    B&B           13              13          2
Haldi-9    +PC           13              13          2
Haldi-10   B&B           39              43          2
Haldi-10   +PC           63              58          4
IBM-1      B&B        The continuous solution was integer
IBM-2      B&B           12              11          3
IBM-2      +PC           11              11          2
IBM-3      B&B           30              31          2,3,4
IBM-3      +PC           40              40          2
IBM-4      B&B           55              55          11
IBM-4      +PC           90              53          7
IBM-5      B&B          526             526          2
IBM-5      +PC         1137            1081          7

B&B = Branch and Bound without Parallel Cuts
+PC = Branch and Bound with Parallel Cuts
As was stated earlier, generally fewer processors will be used
than the number of variables in the problem. Also, the optimal
number of processors is about one half of the number of variables.
These statements are true whether parallel cuts are used or not.
4. Results Using Explicit Enumeration Techniques. The HALDI-10
test problem was chosen for solution by branch and bound using
explicit enumeration, since it was the largest of the test
problems which had (0,1) variables. There are twelve variables in
the HALDI-10 problem and six of these are (0,1) variables.
Although only five or six processors could be used in the branch
and bound method with or without parallel hyperplane cuts, seven
processors were used in this case. Seven processors were chosen
since there were six (0,1) variables. One processor would work on
the continuous solution while the other six could be working on
the (0,1) variables. On these other six, one of the six variables
could be set equal to one and the other five set equal to zero.
This would cut the size of the resulting problems in half and
perhaps give some information about feasibility together with a
lower bound. It was hoped the number of pivot operations performed
by the processors doing the (0,1) enumerations would be
considerably smaller than the number of pivot operations performed
by the processor working on the continuous solution. However, no
processor needed fewer pivot operations than the processor doing
the continuous solution. The only (0,1) processor obtaining a
feasible integer solution needed thirteen pivot operations while
the continuous solution took only eleven. This not only gave the
lower bound too late for immediate use, it also used a processor
for two pivot operation periods that could have been used by one
of the branches from the continuous solution. In addition, the
lower bound given was too low to be of any value in the problem.
The information given by the other five processors showed only
that each of those solutions was feasible.
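
The division of work among the seven processors described above
can be sketched as follows. This is a hypothetical illustration
only: the function name enumeration_duties and the numbering of
the (0,1) variables from 1 to 6 are assumptions, and no simplex
work is performed.

```python
def enumeration_duties(num_binary):
    """List the duty of each processor: one continuous solve plus one
    subproblem per (0,1) variable, fixed at one with the others at zero."""
    duties = [("continuous", None)]
    for k in range(1, num_binary + 1):
        fixed = {j: (1 if j == k else 0) for j in range(1, num_binary + 1)}
        duties.append(("fixed", fixed))
    return duties

duties = enumeration_duties(6)
for proc, (kind, fixed) in enumerate(duties, start=1):
    print(proc, kind, fixed)
```

With six (0,1) variables this yields seven duties, one per
processor, which is the reason given in the text for choosing
seven processors.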
Better results were obtained by placing the (0,1) variables in
the first columns; any non-(0,1) value would then be taken care of
first, at a usual cost of two to four pivot operations per
variable.

The results of this simulation are in TABLE IV.

TABLE IV
HALDI-10 WITH ZERO-ONE ENUMERATION
USING SEVEN PROCESSORS

Processor    Duty                       # pivots    value
    1        Continuous Solution           11       18.709
    2        x(1)=1, Other (0,1)=0         13       12 (INT)
    3        x(2)=1, Other (0,1)=0         11       14.71
    4        x(3)=1, Other (0,1)=0         12       11.33
    5        x(4)=1, Other (0,1)=0         12        9.11
    6        x(5)=1, Other (0,1)=0         14       10.28
    7        x(6)=1, Other (0,1)=0         11       10.87

OPTIMAL SOLUTION IS 17

V. IMPLEMENTATION OF PARALLEL ALGORITHM ON AN MIMD COMPUTER

A. INTRODUCTION
After the simulation was completed, the author became aware of an
MIMD computer at Argonne National Laboratory which was available
for graduate student research in parallel processing applications.
This gave the opportunity for implementation of the Parallel
Branch and Bound Algorithm on an MIMD machine.

Denelcor is the manufacturer of the machine which is called the
HEP (Heterogeneous Element Processor). The HEP is described in
APPENDIX A. Macros had been written to convert FORTRAN routines to
routines for MIMD computers in general and the HEP in particular.
Butler [4] translated some of these macros to the C language for
use in his research. The adaptations of these macros for the
parallel algorithm are given in APPENDIX C. The HEP had compilers
for only the FORTRAN and C languages so the decision was made to
convert the basic PL/I simulation code to the C language. The C
language was chosen because of its ALGOL-like structure. The main
problems in the basic translation were the lack of built-in
functions which are so plentiful in PL/I and the subroutine
structure which does not permit internal subroutines.

B. PROGRAMMING THE HEP
The early attempts to convert the single processor version of the
C language program to a multiple process version met with the many
frustrations of trying to think in parallel. A program may run
perfectly on a single processor but when the number of processes
is changed to two, the computer may abnormally end with no reasons
given. The HEP architecture uses the creation of processes rather
than processors since several processes may be in the pipe at the
same time on any processor.

The connection to the HEP is through a modem and telephone lines.
System breakdowns are frequent and the causes are not always
apparent even when the system is rebooted. Since the work on the
computer is usually done late at night when the phone rates are
less, rebooting the computer is not always possible until the next
day.
The amount of memory given to this type of project was said to be
about 1.5M. With double precision arithmetic, this amount of
memory is too small to run the larger test problems, hence IBM-3
and IBM-4 were chosen. IBM-3 is a good test problem to check for
robustness of the code on a single processor. It is small, several
of the branches lead to infeasibilities and round off error can
cause some of the branch and bound coding techniques to miss the
optimal integer solution and instead stop with a non-optimal
integer solution. IBM-3 was therefore a good candidate for
checking the algorithm and the C language code on the parallel
process computer. IBM-4 was chosen because of the extensive
simulation effort already done. The simulation had shown that a
large number of ties (nodes with the same z-values) were
encountered causing a large fluctuation in the paths taken to a
solution. There are also several possible solutions.
The results from the HEP are printed in the order in which the
information gets to the front-end-machine which handles the I/O
and the control program. All processes compete for the I/O
buffers. The results are not separated on the printout in terms of
processes. The simulation results help to place results with
processes.

Locks were used around all print statements. Without them, if two
processes want to print at the same time, the printout is garbled
with intermixed messages. Since locks slow the machine, fewer and
shorter print statements were designed. The change in the size of
print statements can have an effect on which node a process
examines.
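
The idea of locking around print statements can be illustrated
with a short sketch. Python threads stand in here for HEP
processes, and the names print_lock, report and lines are
hypothetical; the HEP lock macros used in the report are not
reproduced.

```python
import threading

print_lock = threading.Lock()
lines = []  # stands in for the shared printout

def report(process_id, pivots):
    # Hold the lock for the whole message so it cannot be interleaved
    # with output from another thread.
    with print_lock:
        message = f"process {process_id}: {pivots} pivots"
        lines.append(message)
        print(message)

threads = [threading.Thread(target=report, args=(i, 10 + i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The order of the messages still depends on which thread reaches
the lock first, which mirrors the remark that HEP results are
printed in the order in which they reach the front-end machine.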
Global variables are needed so all the processes can have access
to shared information. The Z-vector was protected by a monitor and
not changed outside of it. The code, given in APPENDIX D, has the
access to this ASKFOR monitor on line 123. The finding of the next
node to be examined is done through the GETPROB macro which is
contained in the monitor.
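
A monitor of this general kind can be sketched as a class whose
shared data is touched only while a lock is held. This is not the
ASKFOR monitor or the GETPROB macro from the report: the class
NodeMonitor, its methods, and the rule of handing out the open
node with the largest z-value (maximization assumed) are all
assumptions made for illustration.

```python
import threading

class NodeMonitor:
    """Shared pool of open nodes; all access goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open = {}             # node number -> z-value of that node
        self.best_integer_z = None  # incumbent z-value (maximization assumed)

    def put(self, node, z):
        with self._lock:
            self._open[node] = z

    def report_integer(self, z):
        # Keep only the best integer z-value reported so far.
        with self._lock:
            if self.best_integer_z is None or z > self.best_integer_z:
                self.best_integer_z = z

    def get_problem(self):
        """Remove and return the open node with the largest z-value,
        or None when the pool is empty."""
        with self._lock:
            if not self._open:
                return None
            node = max(self._open, key=self._open.get)
            return node, self._open.pop(node)

monitor = NodeMonitor()
monitor.put(1, 18.7)
monitor.put(2, 17.2)
print(monitor.get_problem())
```

Keeping both the node pool and the incumbent inside one object
means no process can change the shared values except through the
locked methods.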
The heart of the parallelism is the "work" subroutine. All other
subroutines needed in the parallelism are called from this module.
The part of the PL/I simulation program that was the main program
has been replaced by lines 90 through 98. These lines call the
macros which CREATE the number of processes, time the parallelism,
start the problem, call the "work" module and end the parallel
part of the program.
C. RESULTS OF THE TEST PROBLEMS IMPLEMENTED ON THE HEP
Both IBM-3 and IBM-4 were run successfully as single processor
problems on the VAX-11/780 at Rolla before transferring the code
to the HEP. The problems were then run with "numprocs" (number of
processes) set equal to one. More processes were then added to see
whether the actual runs would agree with the simulation. Since the
parallel processes code is written for level 2 parallelism, the
original branching trees [27] were used to check the results.
These results are given in TABLE V and correspond with the Level 2
simulation results. Clock times in units of 100ns were also given
for the problems after the continuous solution was calculated and
problems were being assigned to the multiple processors.

IBM-3 was run with one through six processes with very little
difference in the numbers of pivots from the results obtained in
the simulation. The times for parallelism indicated that two
processes complete the problem in about one half the time of one
processor. However, with three processes the number of pivots is
larger and the time is not close to being as small as one third of
the time for one processor. As the number of processes increases
from four to five, the time actually increases by .001 sec, and
the number of pivots increases from thirty-three to thirty-four.
An increase of three pivots in going from five to six processes
increases the time by .01 sec.

TABLE V
IMPLEMENTATION OF LEVEL 2 PARALLELISM ON HEP

IBM-3
# processors    # pivots    time in ns
     1             30        250432900
     2             30        139863000
     3             31        123222300
     4             33        119084700
     5             34        120791500
     6             37        131717100

IBM-4
# processors    # pivots    time in ns
     1             55        979241500
     2            123       1407789100
     3            128       1073274500
     4            154       1063886900
     6            155        836087200
     7            134        691446400
     8            143        693102900

The IBM-4 results indicate a vast difference between
implementation on the HEP using Level 2 parallelism and the
simulation which used Level 3 parallelism (compare TABLE I with
TABLE V). Level 2 causes the free processes to wait until the down
node is calculated before the up node can be obtained for
calculation, even though the up row had been added and the up node
stored before the down row was added. Level 2 parallelism causes
each process to act on more complete information before obtaining
a node to process than level 3 parallelism does. This takes time,
and in the case of IBM-4, caused the calculation of many more
nodes than with level 3 parallelism.
Because of the difficulty involved in getting on the HEP and in
staying on it for long enough periods of time to do multiple runs,
IBM-4 was run using one, two, three, four, six, seven and eight
processes. Another factor was that the IBM-4 problem used so much
more memory with multiple processes (since each process is
allotted its own memory) that memory exceptions would sometimes
occur merely by changing the number of processes. Sometimes this
would terminate the session.

In going from one process to four, the number of pivots almost
tripled while the processing time only went up slightly. With four
processes, one of the processes (or combinations of processes)
took the shortest path to a solution, but this was not enough to
make up for all of the nodes calculated by the other three
processes. With one process the shortest path to a solution was of
length thirty-two, hence only twenty-three pivots were performed
on other nodes, nineteen of which were performed to obtain the
continuous solution. With three and four processes the shortest
path to their solution was of length twenty-six, using the other
pivots for other nodes (each of the solutions will have a shortest
path).

With two and seven processes, the shortest path to their solution
was of length thirty-five. In this case, three and one half times
as many processes took half as much time to do only eleven more
pivots. This shows that not only was more work done with more
processes, but also the average time per process per pivot
increased. Six and eight processes took distinct paths with
shortest path length of thirty-four. The eight different numbers
of processes found five different solutions.

VI. A CASE STUDY

A. THE SYSTEM DESIGN PROBLEM
The following system design (SD) problem is described by Plane
and McMillan [29].

An electric power company plans to build a steam generating plant
capable of producing 2 million kilowatt hours of electrical energy
per day. The major equipment in the plant will consist of boilers,
generators, and condensers. The sources of supply for these pieces
of equipment have been narrowed to 11 manufacturers. In TABLE VI
are presented data relative to the equipment offered by these
suppliers (A through K).

TABLE VI
EQUIPMENT OFFERED BY SUPPLIERS FOR THE SYSTEM DESIGN MODEL

                        Steam    Electricity   Initial   Operating
                        Kcfm     Kwh/day       Cost      Cost
Variable   Supplier                            ($K)      $K/yr

Boilers (capacity)
X(1)          A          100                      50        50
X(2)          B          140                      75        60
X(3)          C           90                      60        40
X(4)          D           80                      50        20

Generators (requirements)
X(5)          E           70        500          600        60
X(6)          F          120        650          600        75
X(7)          G          150        700          800        75
X(8)          H          100        800          750        90

Condensers (capacity)
X(9)          A           50                      25         3
X(10)         I           65                      17         4
X(11)         J           70                      20         4
X(12)         K           55                      13         2

The power company wants to design that system which will meet the
energy capacity of the plant with the least cost. The requirements
are as follows:

A. The capacity of the set of generators selected must be at least
2 million kwh/day.

B. The steam requirements of the generators must be met by the
combined capacities of the boilers selected.

C. The steam capacities of the set of condensers selected must be
adequate to accommodate the steam capacities of the boilers.

D. Equipment of one supplier is interchangeable with that of
another supplier except in the case of supplier A. Note that
supplier A produces both boilers and condensers. The operating
costs quoted for supplier A's boiler and condenser are based on
the assumption that each of A's boilers will be matched with two
of A's condensers, since they constitute a matched set. If each
and every one of A's boilers is not used with two of A's
condensers then an added $10,000 annual operating cost can be
expected.

E. Because of their size and shape it is impracticable to fit one
of supplier F's generators into the plant with one or more of
supplier G's generators.

F. Supplier K's condensers are of such construction that they must
be used in pairs.

G. Management has decided the initial capital outlay for the
system should not exceed $2.3 million.

Assume the power company's objective is to minimize the expected
annual operating cost of the system, subject to the above
constraints. This leads to the following integer linear
programming formulation.
MAXIMIZE -50X(1)-60X(2)-40X(3)-20X(4)-60X(5)-75X(6)-75X(7)
         -90X(8)-3X(9)-4X(10)-4X(11)-4X(12)-10X(13)
SUBJECT TO
   500X(5)+650X(6)+700X(7)+800X(8) >= 2000
   100X(1)+140X(2)+90X(3)+80X(4)
      -70X(5)-120X(6)-150X(7)-100X(8) >= 0
   50X(9)+65X(10)+70X(11)+110X(12) >= 100X(1)+140X(2)
                                      +90X(3)+80X(4)
   2X(1)-X(9) <= 100X(13)
   X(6) <= 4-4X(14)
   X(7) <= 3-3(1-X(14))
   50X(1)+75X(2)+60X(3)+50X(4)
      +600X(5)+600X(6)+800X(7)
      +750X(8)+25X(9)+17X(10)
      +20X(11)+26X(12) <= 2300
   X(i) >= 0, X(i) integer, and X(13),X(14) = (0,1)
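
As a check on the formulation, a candidate design can be tested
against the constraints directly. The sketch below is illustrative
only: the function check_design is hypothetical, the inequality
directions are taken from requirements A through G, and the trial
design was picked by hand and is not claimed to be optimal.

```python
def check_design(x):
    """x[1]..x[14] hold X(1)..X(14); x[0] is unused padding.

    Returns (feasible, annual operating cost in $K).
    """
    boilers = 100*x[1] + 140*x[2] + 90*x[3] + 80*x[4]
    steam_needed = 70*x[5] + 120*x[6] + 150*x[7] + 100*x[8]
    condensers = 50*x[9] + 65*x[10] + 70*x[11] + 110*x[12]
    initial_cost = (50*x[1] + 75*x[2] + 60*x[3] + 50*x[4]
                    + 600*x[5] + 600*x[6] + 800*x[7] + 750*x[8]
                    + 25*x[9] + 17*x[10] + 20*x[11] + 26*x[12])
    feasible = (
        all(v >= 0 and v == int(v) for v in x[1:])
        and x[13] in (0, 1) and x[14] in (0, 1)
        and 500*x[5] + 650*x[6] + 700*x[7] + 800*x[8] >= 2000
        and boilers >= steam_needed
        and condensers >= boilers
        and 2*x[1] - x[9] <= 100*x[13]
        and x[6] <= 4 - 4*x[14]
        and x[7] <= 3 - 3*(1 - x[14])
        and initial_cost <= 2300
    )
    operating_cost = (50*x[1] + 60*x[2] + 40*x[3] + 20*x[4] + 60*x[5]
                      + 75*x[6] + 75*x[7] + 90*x[8] + 3*x[9] + 4*x[10]
                      + 4*x[11] + 4*x[12] + 10*x[13])
    return feasible, operating_cost

# A hand-picked feasible design (not claimed to be optimal):
# X(2)=2, X(4)=1, X(6)=2, X(8)=1, X(10)=1, X(12)=3, all others zero.
trial = [0,  0, 2, 0, 1,  0, 2, 0, 1,  0, 1, 0, 3,  0, 0]
print(check_design(trial))
```

Maximizing the negated cost in the formulation is the same as
minimizing the operating cost returned here.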

This problem is notorious for poor performance using cutting
plane methods, especially the all integer cuts. Syslo [30] did not
obtain a solution after 15 minutes of computation time on an
Amdahl 470 V/6 and 350,000 iterations.
B. APPLICATION OF THE ALGORITHM
The SD problem was attempted using the VAX 11/780 with the
dimensions of the node storage matrix declared as 50,16,50, where
the first dimension is the number of rows anticipated in the
simplex tableau, the second is the number of columns and the third
is the number of layers needed in the node storage. These were
larger than any of the test problems except IBM-5. However, the
problem did not run. The first and third dimensions were then each
increased to 100, with similar results. The problem was then
partially simulated on the IBM 4381 to find what to expect for an
upper bound on the dimensions. The number of layers of the node
storage was found to be at least 117 and the number of rows needed
to get there was determined to be no more than 35. With new
dimensions of 35,15,125, the problem obtained the solution in 501
pivots with a depth of 10 in the branching tree, and the shortest
path to the only solution obtained was of length 21.
This problem was larger than any of the IBM problems tried on the
HEP. When the problem was attempted on the HEP, the error message
indicated a memory problem and the program abnormally terminated.
The next run brought the system down. The first attempt to take
care of the problem was to call Argonne and ask for more memory,
however it was not clear to the people there how this could be
done and the Denelcor representative had just been changed. They
did not think that it was a memory allocation problem.

The next attempt was to split the problem into two problems and
test it on the UMR VAX, eliminating X(6) on one run and X(7) on
the next. This would also eliminate X(14). In eliminating X(6),
124 levels of the node storage were needed, 424 pivots were
performed and all exposed nodes were fathomed with no integer
solution. The elimination of X(7) proved more fruitful, giving the
solution in 312 pivots, using 87 levels of node storage. The
splitting, however, took a total of 736 pivots with at least 624
needed if two processes communicated with each other to know when
the solution had been found. Also, the size of the smaller
subproblem was still too large to obtain a solution on the HEP if
lack of allocated memory was the problem.
By cutting the dimensions down to 33,15,40, the HEP performed
until these limits were reached with multiple processes. However,
the algorithm did not get close to a solution, but verified that
the code would run if the memory problem was corrected or more
memory could be allocated.

The HEP, however, was not the problem. After system breakdowns at
intervals of approximately two hours for almost a week, the
problem was found to be in the way the VAX front-end-machine
operating system was looking at the memory. The operating system
was executing code which should never have been executed. Possibly
a memory error was sending the code to the wrong place and thus
causing the system breakdowns. New memory boards were installed
and the system immediately went down. By hiding 4M of the original
memory from the operating system and allowing it access only to
the 8M of newly installed memory the system breakdowns stopped.
Even though this memory problem was on the VAX and not the HEP,
this temporary fix allowed a continuation of the study of the SD
problem, although the memory problem with the front-end-machine
has not been resolved. The current logon messages indicate a need
to save results immediately since system breakdowns are still
occurring daily.

The use of MIMD machines with their front-end and the associated
operating system problems are still not fully understood.
C. RESULTS USING THE HEP
Times in terms of 100ns intervals were taken for the parallel
portion of the code (APPENDIX C, lines 94-97). Speedup is defined
to be the quotient of the time for a single process to complete a
task and the time for n processes to complete the same task. For n
processes, linear speedup would be n. If the quotient is greater
than n, super linear speedup is said to be obtained. Efficiency is
as defined in CHAPTER IV.
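
The speedup quotient defined above can be computed directly from
the average times reported in TABLE VII. In this small sketch the
dictionary seconds holds the TABLE VII times for one, two and
three processes, and the helper name speedup is hypothetical.

```python
# Average seconds in parallel processing, from TABLE VII.
seconds = {1: 12.034, 2: 6.115, 3: 4.215}

def speedup(times, n):
    """Time for one process divided by the time for n processes."""
    return times[1] / times[n]

for n in (2, 3):
    print(n, round(speedup(seconds, n), 2))
```

The two printed quotients are 1.97 and 2.86.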
The following results can be seen from TABLE VII. Using two
processes gives a speedup in time of 1.97, and linear efficiency
is obtained. With three processes, a speedup of 2.86 and super
linear efficiency are obtained. It was not until eight processes
were in use, when only 495 pivots were necessary, that the super
linear efficiency became significant. With 8, 10 and 16 processes,
successive runs sometimes gave a different number of pivots. This
occurred because with that many processes, completion of a pivot
and the reporting of the results to the Z-vector may occur in the
same clock time (100ns interval) on two processes and therefore
the same path to the solution may not always be followed. In these
cases, the times varied slightly also, hence the times and number
of pivots for these runs for a given number of processes were
averaged.
TABLE VII
SYSTEM DESIGN MODEL RESULTS
USING THE PARALLEL BRANCH AND BOUND ALGORITHM
AND THE HEP MULTIPROCESSOR

Number of      Average Number    Average Number Seconds
Processors     of Pivots         in Parallel Processing
     1             501                 12.034
     2             501                  6.115
     3             500                  4.215
     4             503                  3.307
     5             504                  2.845
     6             506                  2.482
     7             500                  2.195
     8             495                  1.982
     9             494                  1.839
    10             496                  1.848
    12             495                  1.830
    14             481                  1.762
    16             472                  1.793

D. CONCLUSIONS
As a general rule in the test problems, more processes than the
number of variables should not be utilized. However, in the SD
problem which has 14 variables, not only were 16 processes
utilized, but the efficiency generally increased as the number of
processes increased. This indicates that in some real life type
models, super linear efficiency can be obtained and the number of
processes efficiently utilized may be more than was indicated by
the test problems.

VII. CONCLUSIONS AND DIRECTIONS FOR FUTURE RESEARCH

A. CONCLUSIONS
The use of parallel processing in the solution of general integer
linear programming models is desirable in some cases as is
evidenced by the study of the System Design model and the Haldi
and IBM test problems.

Super linear efficiency is obtainable for some types of integer
linear programming problems using parallel branch and bound
techniques and the simplex method. In the System Design problem,
as the number of processes was increased, better efficiency was
achieved up to the maximum number of processes available. For the
Haldi and IBM test problems, TABLE I shows that the best
efficiency was obtained using approximately one half as many
processes as the number of variables.
In general, if multiple processes give linear or super linear
efficiency, there exists a point at which the addition of more
processes degrades the performance. One reason for this is that
with multiple processes, nodes are explored that would not have
been explored with a single process. This can be seen in TABLE I
and TABLE II by noting the number of nodes examined as the number
of processes increased. A way to overcome this is for multiple
processes to pursue a shorter path to a solution. This may mean
more nodes, but it must mean fewer pivots.

TABLE I and TABLE II also show that there is an upper limit for
the number of processes which can be utilized. There can never be
fewer pivots performed by a process than the length of the
shortest path to a solution. When some process, or combination of
processes, uses this path, any additional processes added will not
improve the efficiency. On the other hand, if the branch and bound
algorithm using a single process causes a path other than the
shortest one to be pursued, multiple processes may give better
efficiency.

For single processors, the addition of parallel cuts generally did
not improve the efficiency. In some problems, it more than doubled
the number of pivots needed. Using parallel cuts generally started
out inefficiently but improved as the number of processes was
increased. The combination of parallel hyperplane cuts with branch
and bound increased efficiency in only one fourth of the test
problems.
The use of explicit enumeration in conjunction with branch and
bound for the purpose of keeping processes busy does not seem to
be efficient. A much better way to keep the extra processes busy
while one process is finding the continuous solution would be to
use level 4 parallelism.

The present code is portable between MIMD machines except for a
few lines in some of the macros. The code is robust, giving
correct answers to the System Design model and to the Haldi and
IBM problems tested for which the allotted memory space allowed
the program to finish.

The algorithm is general and leaves choices as to which path to follow in the branching tree. One such choice is which node should be chosen when there is a tie, that is, whenever two nodes have the same z-value. This became evident when level 2 parallelism was used on IBM-4. For nodes with the same z-value, level 2 parallelism, combined with the storage numbering method of the program, caused the choosing of the smallest numbered node, whereas the simulation using level 3 parallelism used the first-in-first-out technique for choosing the node.
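
The two tie-breaking rules can be contrasted in a short sketch. This is an illustration only, not the selection code of the program: the arrays `z`, `open` and `created`, the sentinel `NO_NODE`, and both function names are assumptions introduced here.

```c
#include <assert.h>

/* Sentinel meaning "no open node found" (hypothetical). */
#define NO_NODE (-1)

/* Pick the open node with the largest z-value; among equal z-values
   keep the smallest numbered node. */
int pick_smallest_index(const double z[], const int open[], int count)
{
    int best = NO_NODE;
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (!open[i]) continue;
        /* strict comparison: a tie leaves the earlier index in place */
        if (best == NO_NODE || z[i] > z[best]) best = i;
    }
    return best;
}

/* Same selection, but a tie goes to the node created first.
   created[i] is a hypothetical creation stamp (smaller = older). */
int pick_fifo(const double z[], const int open[], const int created[],
              int count)
{
    int best = NO_NODE;
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (!open[i]) continue;
        if (best == NO_NODE || z[i] > z[best] ||
            (z[i] == z[best] && created[i] < created[best]))
            best = i;
    }
    return best;
}
```

With z-values {5, 7, 7, 3} and creation stamps {0, 3, 1, 2}, the first rule selects node 1 and the second selects node 2, so the two rules can lead down different paths of the branching tree even though both pick a node with the best z-value.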
B. SUGGESTIONS FOR FUTURE RESEARCH
Brown and Almasi [31] believe that a new era in high performance computing is beginning which will be dominated by parallel computing and that its application will pace future development in manufacturing and knowledge-intensive industries.
The research started in this paper is now being continued in collaboration with Ralph Butler. Our research is centered on the implementation of level 4 parallelism. As more memory becomes available for this type of research on the HEP or other MIMD type computers, larger problems could be used to test the algorithm.
Different techniques for handling the node storage could be tried which would use a different numbering system on the nodes. This may give a better way to tell which process is working with a particular node and make the tracing of shortest paths easier. It might also give more efficient use of the node storage, by eliminating a level from the storage as soon as the node is fathomed and reusing it. Some different method of keeping track of the best present lower bound would also need to be devised.
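
One way the reuse of fathomed storage might look is sketched below. It is a generic free-list allocator, not a method taken from this report; `NodePool`, `pool_acquire`, `pool_release` and the limit `MAX_NODES` are hypothetical names, with the value 20 chosen only to mirror the node limit in the declarations of Appendix D.

```c
#include <assert.h>

#define MAX_NODES 20   /* example capacity; an assumption, see lead-in */

typedef struct {
    int free_slots[MAX_NODES];  /* stack of reusable slot numbers   */
    int free_count;             /* how many entries are on the stack */
    int next_unused;            /* first slot never handed out yet   */
} NodePool;

void pool_init(NodePool *p)
{
    p->free_count = 0;
    p->next_unused = 0;
}

/* Returns a slot for a new node, or -1 when storage is exhausted.
   Released slots are reused before fresh ones are taken. */
int pool_acquire(NodePool *p)
{
    if (p->free_count > 0) return p->free_slots[--p->free_count];
    if (p->next_unused < MAX_NODES) return p->next_unused++;
    return -1;
}

/* Called when a node is fathomed so that its storage can be reused. */
void pool_release(NodePool *p, int slot)
{
    assert(slot >= 0 && slot < p->next_unused);
    assert(p->free_count < MAX_NODES);
    p->free_slots[p->free_count++] = slot;
}
```

On a shared-memory machine the two pool operations would have to run inside a monitor or lock, and, as noted above, the bookkeeping for the best present lower bound would need to be redesigned because slot numbers would no longer reflect the order in which nodes were created.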

Techniques involving a combination of parallel hyperplane cuts with branch and bound may still have merit if used in a different way. An example might be to use the cuts only when the branch and bound technique gets the same z-value for two or three consecutive pivots, or nodes.
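
The trigger suggested above could be detected with a small counter. The sketch below is an assumption about how such a test might be written; `StallTracker`, `stall_init`, `stall_update`, the `limit` argument and the tolerance `tol` are all hypothetical and are not part of the existing program.

```c
#include <assert.h>

/* Tracks how many consecutive pivots (or nodes) produced the same z. */
typedef struct {
    double last_z;   /* most recent z-value seen                     */
    int    repeats;  /* length of the current run of equal z-values  */
} StallTracker;

void stall_init(StallTracker *t)
{
    t->last_z = 0.0;
    t->repeats = 0;
}

/* Record a new z-value. Returns 1 once `limit` consecutive values
   agree to within `tol`, i.e. when a cut might be tried; else 0. */
int stall_update(StallTracker *t, double z, int limit, double tol)
{
    double diff = z - t->last_z;
    assert(limit >= 2 && tol >= 0.0);
    if (diff < 0.0) diff = -diff;
    if (t->repeats > 0 && diff <= tol)
        t->repeats++;
    else
        t->repeats = 1;      /* a different value starts a new run */
    t->last_z = z;
    return t->repeats >= limit;
}
```

Calling `stall_update` after every pivot with `limit` set to 2 or 3 would correspond to the "two or three consecutive pivots" mentioned in the text; whether this improves efficiency is exactly the open question raised here.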

Parallelism should be investigated using new techniques for solving linear programming problems. Extensions to the C language like those suggested by Nacini [32] could also be studied.

BIBLIOGRAPHY

[1] Garfinkel, R. S. and Nemhauser, G. L., Integer Programming, John Wiley and Sons, New York, (1972).

[2] Lusk, E. L. and Overbeek, R. A., "Use of Monitors in Fortran: A Tutorial on the Barrier, Self-Scheduling Do-Loop, and ASKFOR Monitors," ANL-84-51, Argonne National Laboratory, Argonne, IL, (July 1984).

[3] Lusk, E. L. and Overbeek, R. A., "Implementation of Monitors with Macros: A Programming Aid for the HEP and Other Parallel Processors," Technical Report ANL-83-97, Argonne National Laboratory, Argonne, IL, (July 1984).

[4] Butler, R. A., "An Algorithm for Parallel Subsumption," Unpublished Ph.D. dissertation, University of Missouri-Rolla, (May 1985).

[5] Hwang, K. and Briggs, F. A., Computer Architecture and Parallel Processing, McGraw-Hill, New York, (1984).

[6] Heller, D., "A Survey of Parallel Algorithms in Numerical Linear Algebra," SIAM Review, 20, 4, (1978), p 740-776.

[7] Lord, R. E., Kowalik, J. S., and Kumar, S. P., "Solving Linear Algebraic Equations on an MIMD Computer," Journal of the ACM, 30, 1, (1983), p 103-117.

[8] Daniel, R. C., "LP-Based Mathematical Programming--The Significance of Recent Developments," The Journal of the Operational Research Society, 32, 2, (1981), p 113-118.

[9] Kumar, S. P. and Kowalik, J. S., "Parallel Factorization of a Positive Definite Matrix on an MIMD Computer," Proceedings of the 1984 International Conference on Parallel Processing, Computer Society Press, (1984).

[10] Fuller, S. J., Ousterhout, J. K., Raskin, L., Rubinfeld, P. I., Sindhu, P. J. and Swan, R., "Multi-Microprocessors: An Overview and Working Example," Proceedings of the IEEE, 66, 2, (1978), p 216-228.

[11] Gehringer, E. F., Jones, A. K., and Segall, Z. Z., "The Cm* Testbed," Computer, 15, 10, (1982), p 40-53.

[12] Sameh, A. H., "Numerical Parallel Algorithms--A Survey," High Speed Computer and Algorithm Organization, Academic Press, New York, (1977), p 207-228.

[13] Traub, J. F., "Iterative Solution of Tridiagonal Systems on Parallel or Vector Computers," Complexity for Sequential and Parallel Numerical Algorithms, Academic Press, New York, (1973), p 49-82.

[14] Quinn, M. J. and Yoo, Y. B., "Data Structures for the Efficient Solution of Graph Theoretic Problems on Tightly-Coupled MIMD Computers," Proceedings of the 1984 International Conference on Parallel Processing, Computer Society Press, (1984).

[15] Witt, B. I., "Parallelism, Pipelines, and Partitions: Variations on Communication Modules," Computer, 18, 2, (1985), p 105-112.

[16] Barlow, R. H., "Performance Measures for Parallel Algorithms," Parallel Processing Systems, an Advanced Course, (Evans, D. E., Editor), Cambridge: Cambridge University Press, (1982), p 179-189.

[17] Jones, A. and Schwarz, R., "Experience Using Multiprocessor Systems--A Status Report," Computing Surveys, 12, 2, (1980), p 121-165.

[18] Andrews, G. R. and Schneider, F. B., "Concepts and Notations for Concurrent Programming," ACM Computing Surveys, 15, 1, (1983).

[19] Deitel, H. M., An Introduction to Operating Systems, Addison Wesley Publishing Company, Reading, MA, (1984).

[20] Lai, T. H., "Anomalies in Parallel Branch and Bound Algorithms," Proceedings of the 1983 International Conference on Parallel Processing, IEEE Computer Society Press, (1983), p 183-190.

[21] Satyanarayanan, M., Multiprocessors: A Comparative Study, Prentice Hall, Englewood Cliffs, NJ, (1980).

[22] Li, G. and Wah, B. W., "How To Cope With Anomalies In Parallel Approximate Branch-And-Bound Algorithms," AAAI-84 National Conference of Artificial Intelligence, (August 1984).

[23] Kumar, V. and Kanal, L., "Parallel Branch-and-Bound Formulations for AND/OR Tree Search," Department of Computer Sciences, University of Texas at Austin, TR-83-14, Austin, Texas, (August 1983).

[24] Quinn, M. J. and Deo, N., "An Upper Bound for the Speedup of Parallel Branch-and-Bound Algorithms," Computer Science Department, Washington State University, CS-83-112, Pullman, Washington, (May 1983).

[25] Deo, N., Yoo, T. B. and Lord, R. E., "A Parallel Algorithm for the One-to-All, Mixed-Weight, Shortest Path Problem," Computer Science Department, Washington State University, CS-83-113, Pullman, Washington, (June 1983).

[26] Gillett, B. E., Introduction to Operations Research, A Computer-Oriented Algorithmic Approach, McGraw Hill Book Company, New York, (1976).

[27] Taha, H. A., Integer Programming Theory, Applications, and Computations, Academic Press, New York, (1975).

[28] Chambless, S. D., "Users Guide to an Integer Programming Package," Unpublished Paper, Computer Science Department, University of Missouri, Rolla, (May 1984).

[29] Plane, D. R. and McMillan, C., Discrete Optimization, Integer Programming and Network Analysis for Management Decisions, Prentice Hall, Englewood Cliffs, NJ, (1971), p 153, 159.

[30] Syslo, M. M., Deo, N. and Kowalik, J. S., Discrete Optimization Algorithms with Pascal Programs, Prentice Hall, Englewood Cliffs, NJ, (1983), p 94, 96.

[31] Brown, J. C. and Almasi, G., "Research in Parallel Computing," Computer, 17, 7, (1984), p 92-93.

[32] Nacini, H., "A Few Statement Types Adapt C-language to Parallel Processing," Electronics, 57, 13, (1984), p 125-129.

[33] Smith, B. J., "A Pipelined, Shared Resource MIMD Computer," Proceedings of the International Conference on Parallel Processing, Computer Science Press, (August 1978).

[34] Mullin, R., Nemeth, E. and Weidenhofer, N., "Will Public Key Crypto Systems Live Up To Their Expectations?" Proceedings of the 1984 International Conference on Parallel Processing, Computer Society Press, (1984).

[35] Heterogeneous Element Processor (HEP) Hardware Reference Manual, Publication 9000003, Denelcor, Inc., (1982).

VITA

Rochelle Lloyd Boehning was born on November 12, 1932 near Diamond, Missouri. He received his elementary and secondary schooling in Seneca, Missouri. He received his college education from Joplin Junior College in Joplin, Missouri; Northeastern Oklahoma A & M in Miami, Oklahoma; Kansas State College of Pittsburg in Pittsburg, Kansas; the University of Missouri-Columbia, in Columbia, Missouri; the University of Kansas in Lawrence, Kansas; the University of Wisconsin-Madison, in Madison, Wisconsin; the Illinois Institute of Technology, in Chicago, Illinois; the University of Arkansas, in Fayetteville, Arkansas; and the University of Missouri-Rolla in Rolla, Missouri. He received a Bachelor of Science in Education degree in Mathematics in July, 1959 and a Master of Science degree in Mathematics in July, 1960 from Kansas State College of Pittsburg, in Pittsburg, Kansas. He received a Master of Science degree in Computer Science in July, 1983 from the University of Missouri-Rolla in Rolla, Missouri.

He has been enrolled in the Graduate School of the University of Missouri-Rolla since August, 1982.

APPENDIX A

THE DENELCOR HEP (HETEROGENEOUS ELEMENT PROCESSOR)

The HEP is a large-scale, high-speed, general-purpose mainframe data processor. It is designed for applications that can effectively use a processing speed of 10 to 160 million instructions per second (Mips). HEP achieves this throughput with a multiple instruction stream, multiple data stream (MIMD) architecture.
MIMD architecture allows user processes, or programs, to execute in parallel. Each process has its own independent instruction stream operating on its own data stream. Processes cooperate by sharing data and solving parts of the same problem in parallel. In HEP, high-use logic functions are pipelined to further increase performance so that new inputs can start processing without waiting for previous input to finish. Also, all of the instruction streams and their active data streams are always in main memory; even though processes share the computing resource, no active time is required to load and store processes when selected to run.
The hardware components that make the HEP system unique are the central processing unit (CPU), the switch module, and the data memory module.
The CPU is the basic computing unit of the HEP system. There are as many as 16 CPUs in a machine configuration. Each CPU includes 2K 64-bit words of register memory, 4K 64-bit words of constant memory, at least 32K 64-bit words of program memory, 64 user processes, 64 supervisor processes, nine function execution units, MIMD architecture and pipelined logic.
A process in a HEP CPU is an instruction stream (or program) stored in program memory for execution. To be executed, a process must be created in an active task. A task is the fundamental protection domain in a CPU. When a task is activated, explicit areas of each type of memory are defined for process use in that task. Tasks can overlap if they are to cooperate in solving a common problem. Processes can be created and terminated in a task whenever appropriate to optimize parallelism and maximize throughput.
In a CPU, instruction processing is available uniformly to all active tasks. Each 100 nanoseconds (ns), a task is selected and one instruction from a process in the task is accepted for processing. The processes for a task are queued so that only one instruction for a process is accessible at a time. Instructions are processed on a first-in, first-out sequence according to the position of the process in the task queue.
When a process is read out of a task queue, it enters a control pipeline known as the instruction loop, where decoding and operand fetching are performed. The instruction loop is divided into eight 100ns time phases. An instruction takes 800ns to get through the pipeline but there can be eight instructions in the pipeline at once. The function units, such as float adder, float/integer multiplier, and hardware access unit, are all completed in 800ns in synchronism with the instruction loop. Functions which are not completed in 800ns, such as the divider and the scheduler function unit (SFU), are called asynchronous functions. The divide operation takes 1700ns to complete. To maintain the throughput, the divider is replicated rather than pipelined and can accept new operands and begin execution every 100ns until all modules are busy. Process data synchronization is maintained by inhibiting access to the memory location receiving the result of the asynchronous function until after the result has been stored.
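
The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below is only that arithmetic; the function names are hypothetical, and the number of divider modules it computes is what the quoted timings would require to sustain one divide every 100ns, not a figure stated in this report.

```c
#include <assert.h>

/* Millions of instructions per second for one CPU that accepts one
   instruction every `issue_ns` nanoseconds. */
double cpu_mips(double issue_ns)
{
    assert(issue_ns > 0.0);
    return 1000.0 / issue_ns;   /* 1e9 ns per second / 1e6 = 1000 */
}

/* Instructions in flight when each takes `latency_ns` to pass through
   the loop and a new one enters every `issue_ns`. */
int instructions_in_flight(int latency_ns, int issue_ns)
{
    assert(latency_ns > 0 && issue_ns > 0);
    return latency_ns / issue_ns;
}

/* Divider copies needed to accept a new divide every `issue_ns` when
   one divide occupies a module for `divide_ns` (ceiling division). */
int divider_modules_needed(int divide_ns, int issue_ns)
{
    assert(divide_ns > 0 && issue_ns > 0);
    return (divide_ns + issue_ns - 1) / issue_ns;
}
```

With a 100ns issue interval one CPU yields 10 Mips, and sixteen such CPUs yield 160 Mips, which appears consistent with the range of 10 to 160 Mips given at the start of this appendix. The 800ns loop holds eight instructions, and a 1700ns divide would need 17 modules to keep pace.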
The SFU controls all operations that access data memory. Asynchronous SFU operations require a random, intermediate time to complete, so the SFU withholds the process from the task queue and maintains it in an SFU queue. When complete, the process is requeued using a special asynchronous access port.
The HEP switch is a flexibly-configured, programmable network that interconnects CPUs, data memory modules, I/O control processors (a VAX 11/780 in our case) and other system devices. It uses packet switching techniques to route messages among the units that comprise the system. Each node in the switch network has three full-duplex ports, so it can simultaneously send and receive three messages. Each message processed by the switch contains the address of the unit to which it is directed and the data being transmitted. Each message has an associated age (priority), which is incremented each time the message is routed if the original routing is other than the optimal direction. Each node has an input rate of 100ns. The propagation time through the node is 50ns. Therefore, the switch is configured in such a way that adjacent nodes have alternate input cycles.
Data memory provides communication and process synchronization between tasks active in different CPUs. All data memory in HEP occupies one continuous address space, regardless of the number of memory units. The entire memory is addressable by all the CPUs via the switch. Local data memory can be made available to each CPU by special allocation if needed. SFU access to data memory via the switch is asynchronous with the instruction and data loops. The SFU uses control logic to synchronize multiple concurrent accesses to data memory and to ensure correct relinking to the task queues when the data memory operation is complete. Synchronous and asynchronous access to data memory are through separate ports, so accessing conflicts do not occur.
Program memory stores instruction streams to be executed. Because of the execute-only characteristics of program memory and the ability to subdivide other memory domains by indexed addressing, data environments need not be bound to instruction streams. Several processes can execute the same instruction stream using separate data streams in other memories. Conflict between writing synchronous and asynchronous results is avoided by using separate access times to the memories [2,3,5,33,34,35].

APPENDIX B

PL/I PROGRAM TO SIMULATE BRANCH AND BOUND TECHNIQUES WITH AND WITHOUT PARALLEL HYPERPLANE CUTS

//C9040D JOB (0465VSIB,LPGM,C9040D,UMRVMB),'BOEHNING,CHELLE',
//  TIME=1,MSGCLASS=A
//  EXEC PLIXCLG,PARM.PLI='GOSTMT,MARGINS(2,72,1)'
//PLI.SYSPRINT DD DUMMY
//PLI.SYSIN DD *
 LEXDUAL: PROC OPTIONS(MAIN);
   DCL A(0:99,0:31) FLOAT DEC(16) INIT((3100)0),
       (MINC,SUM,MINCOST,MINACT,SUM1,FK,QUO) FLOAT DEC(16),
       (E,I,J,K,L,M,N,P,Q,R,MINCOL,MINRW,FLAG,FLAG1,COL_DN,
        COL_UP) FIXED DEC(3), T CHAR(9),
       (MILLI,T1,T2,ST1,ST2,LT2,ELAPSED_TIME,TOTAL_TIME,
        LOAD_TIME) FIXED DEC(5),
       (SYSIN,SYSPRINT) FILE,
       (ABS,SUBSTR,FLOOR,CEIL,MOD,TIME,CHAR) BUILTIN;
   /*************************************************/
   /* LOAD REQUIREMENTS=B(I) AND ACTIVITIES=A(I,J)  */
   /*************************************************/
   GET LIST(P,Q,R);  /* P=NO. OF VARIABLES, Q=NO. OF CONSTRAINTS */
   DO I=P+1 TO R;
     GET LIST((A(I,J) DO J=0 TO P));
   END;
   DO I=P+1 TO R;
     DO J=0 TO P;
       A(I,J)=-1*A(I,J);
     END;
   END;
   /****************************/
   /* LOAD NON BASIC VARIABLES */
   /****************************/
   DO J=1 TO P;
     A(J,J)=-1;
   END;
   /* LOAD COST COEFFICIENTS=C(J) */
   DO J=1 TO P;
     GET LIST(A(0,J));
     A(0,J)=(-1*A(0,J));
   END;
   CALL MINCOLM;
   N=R;
   IF FLAG=1 THEN DO;          /*DUAL INFEASIBLE*/
     N=R+1;
     CALL NEWROW;
     L=N;
     E=MINCOL;
     CALL SIMPLEX;
     CALL PIVOT;               /*THE SIMPLEX IS NOW DUAL FEASIBLE*/
   END;
   DO UNTIL (MINRW=0);
     CALL ROUNDA;
     CALL MINROW;
     IF MINRW=0
       THEN DO;                /*PRIMAL FEASIBILITY*/
         L=0;
         E=0;
         CALL PRNTSOL;
       END;
       ELSE DO;
         L=MINRW;
         CALL LEXMIN;
         IF FLAG1=0
           THEN DO;
             E=0;
             CALL SIMPLEX;
             PUT SKIP(3) LIST('NO FEASIBLE SOLUTION');
             MINRW=0;
           END;
           ELSE
             CALL PIVOT;
       END;
   END;
   DO M=1 TO 20 UNTIL (FK=0);
     IF(((A(0,0)-FLOOR(A(0,0)))>1E-10)
       &((CEIL(A(0,0))-A(0,0)>1E-10)))
       THEN DO;
         CALL ROUNDA;
         CALL PCUT; PUT SKIP LIST('PARALLEL CUT');
         L=N;
         CALL LEXMIN;
         CALL PIVOT;
         DO UNTIL(MINRW=0);
           CALL MINROW;
           IF MINRW=0
             THEN DO;
               L=0; E=0;
               CALL PRNTSOL;
             END;
             ELSE DO;
               L=MINRW;
               CALL LEXMIN;
               IF FLAG1=0
                 THEN DO;
                   E=0;
                   CALL SIMPLEX;
                   PUT SKIP(3) LIST('NO FEASIBLE SOLUTION');
                   MINRW=0;
                 END;
                 ELSE CALL PIVOT;
             END;
         END;
       END;
       ELSE PUT SKIP(3) LIST('Z IS INTEGER');
     CALL ROUNDA;
     CALL FIRSTFRACTION;
     IF FK=0
       THEN
         DO;
           PUT SKIP LIST('INTEGER SOLUTION');
           CALL PRNTSOL;
         END;
       ELSE
         DO;
           IF M=2 | M=3
             THEN
               DO;
                 CALL UPBRANCH; PUT SKIP LIST('UP BRANCH');
                 IF COL_UP=0
                   THEN PUT SKIP LIST('UP BRANCH IS INFEASIBLE');
                   ELSE
                     DO;
                       E=COL_UP;
                       L=N;
                     END;
               END;
             ELSE
               DO;
                 CALL DNBRANCH; PUT SKIP LIST('DOWN BRANCH');
                 IF COL_DN=0
                   THEN PUT SKIP LIST('DOWN BRANCH IS INFEASIBLE');
                   ELSE
                     DO;
                       E=COL_DN;
                       L=N;
                     END;
               END;
           CALL PIVOT;
           DO UNTIL(MINRW=0);
             CALL MINROW;
             IF MINRW=0
               THEN DO;
                 L=0; E=0;
                 CALL PRNTSOL;
               END;
               ELSE DO;
                 L=MINRW;
                 CALL LEXMIN;
                 IF FLAG1=0
                   THEN DO;
                     E=0;
                     CALL SIMPLEX;
                     PUT SKIP(3) LIST('NO FEASIBLE SOLUTION');
                     MINRW=0;
                   END;
                   ELSE CALL PIVOT;
               END;
           END;
         END;
   END;
   PUT SKIP EDIT('M=',M)(A,F(3));
   /*****************************************/
   /*   THIS GIVES THE CONTINUOUS SOLUTION  */
   /*****************************************/
   /*****************************************/
   /*                NEWROW                 */
   /*LOAD THE ROW TO OBTAIN DUAL FEASIBILITY*/
   /*****************************************/
 NEWROW: PROC;
   SUM=0;
   DO J=1 TO P;
     SUM=SUM+ABS(A(0,J));
     A(N,J)=1;
   END;
   SUM1=0;
   DO I=1 TO R;
     SUM1=SUM1+ABS(A(I,0));
   END;
   IF SUM1>SUM*10
     THEN A(N,0)=SUM1;
     ELSE A(N,0)=SUM*10;
 END NEWROW;
   /****************************************/
   /*               MINCOLM                */
   /* FINDS MOST NEGATIVE COST COEFFICIENT */
   /****************************************/
 MINCOLM: PROC;
   FLAG=0;
   MINCOST=-.0001;
   MINCOL=0;
   DO J=1 TO P;
     IF MINCOST>A(0,J)
       THEN DO;
         FLAG=1;
         MINCOST=A(0,J);
         MINCOL=J;
       END;
   END;
 END MINCOLM;
   /***************************/
   /* FINDS MOST NEGATIVE ROW */
   /***************************/
 MINROW: PROC;
   MINACT=-.00001;
   MINRW=0;
   DO I=1 TO N;
     IF A(I,0)<-.00001
       THEN
         IF MINACT>A(I,0)
           THEN DO;
             MINACT=A(I,0);
             MINRW=I;
           END;
   END;
 END MINROW;

   /*   FINDS THE MIN COLUMN   */
   /***************************/
 LEXMIN: PROC;
   MINC=10E11;
   FLAG1=0;
   DO J=1 TO P;
     IF A(L,J)<-.0001
       THEN IF
         MINC>ABS(A(0,J)/A(L,J))
           THEN DO;
             FLAG1=1;
             MINC=ABS(A(0,J)/A(L,J));
             E=J;
           END;
   END;
 END LEXMIN;
   /*********************/
   /* CALCULATES A(I,J) */
   /*********************/
 PIVOT: PROC;
   T=TIME; MILLI=SUBSTR(T,5,5); T1=MILLI;
   DO I=0 TO N;
     IF I~=L THEN
       DO J=0 TO P;
         IF J~=E THEN
           A(I,J)=A(I,E)*A(L,J)/(-1*A(L,E))+A(I,J);
       END;
   END;
   DO I=0 TO N;
     IF I~=L THEN
       A(I,E)=A(I,E)/(-1*A(L,E));
   END;
   DO J=0 TO P;
     IF J~=E THEN A(L,J)=0;
     ELSE A(L,E)=-1;
   END;
   T=TIME; MILLI=SUBSTR(T,5,5); T2=MILLI;
   IF T2>=T1 THEN
     ELAPSED_TIME=T2-T1;
   ELSE DO; T2=T2+60;
     ELAPSED_TIME=T2-T1; END;
   PUT SKIP(2) EDIT('ELAPSED TIME = ',ELAPSED_TIME)(A,F(5));
   PUT SKIP EDIT('Z=',A(0,0))(A,F(16,8));
 END PIVOT;

   /**********************/
   /* PRINTS THE SIMPLEX */
   /**********************/
 SIMPLEX: PROC;
   /* L IS LEAVING VARIABLE, E IS ENTERING COLUMN */
   PUT SKIP(3) EDIT('L = ',L)(COL(2),A,F(3));
   PUT SKIP(2) EDIT('E = ',E)(COL(2),A,F(3));
   DO I=0 TO N;
     PUT SKIP(2) LIST(' ');
     PUT EDIT((A(I,J) DO J=0 TO P))(F(9,1));
   END;
   PUT SKIP(3) LIST('******************************');
 END SIMPLEX;
   /*******************/
   /* PRINTS SOLUTION */
   /*******************/
 PRNTSOL: PROC;
   DO I=0 TO P;
     IF I=0 THEN
       PUT SKIP(2) EDIT('OPTIMAL VALUE',A(0,0))
                       (COL(2),A,F(12,5));
     ELSE PUT SKIP(2) EDIT('X',CHAR(I),'=',A(I,0))
                       (COL(2),3 A,F(12,5));
   END;
 END PRNTSOL;

   /*****************************************/
   /* ROUNDS TO INTEGERS IF THE VALUES ARE  */
   /* WITHIN E-10 OF AN INTEGER             */
   /*****************************************/
 ROUNDA: PROC;
   DO I=0 TO N;
     DO J=0 TO P;
       IF (A(I,J)-FLOOR(A(I,J)))<1E-10
         THEN A(I,J)=FLOOR(A(I,J));
         ELSE IF (CEIL(A(I,J))-A(I,J))<1E-10
           THEN A(I,J)=CEIL(A(I,J));
     END;
   END;
 END ROUNDA;

   /*  PARALLEL CUTTING PLANE  */
   /*  CUTS Z TO INTEGER VALUE */
 PCUT: PROC;
   K=0; FK=0;
   DO I=0 TO P UNTIL (FK~=0);
     FK=A(I,0)-FLOOR(A(I,0));
     K=I;
   END;
   IF FK=0
     THEN PUT SKIP(3) LIST('INTEGER SOLUTION');
     ELSE DO;
       N=N+1;
       A(N,0)=-FK;
       DO J=1 TO P;
         A(N,J)=FLOOR(A(K,J))-A(K,J);
       END;
     END;
 END PCUT;
   /*********************************/
   /*  FINDS FIRST FRACTIONAL VALUE */
   /*  IN THE FIRST COLUMN          */
   /*********************************/
 FIRSTFRACTION: PROC;
   FK=0; K=0;
   DO I=1 TO P UNTIL(FK~=0);
     FK=A(I,0)-FLOOR(A(I,0));
     IF FK~=0
       THEN
         DO;
           K=I;
           PUT SKIP EDIT('FK=',FK,'K=',K)(A,F(5,2),A,F(3));
         END;
   END;
 END FIRSTFRACTION;
   /****************************************/
   /*  ADDS DOWN BRANCH IF IT IS FEASIBLE  */
   /****************************************/
 DNBRANCH: PROC;
   QUO=1E10; COL_DN=0;
   DO J=1 TO P;
     IF A(K,J)>0
       THEN
         IF QUO>A(0,J)/A(K,J)
           THEN
             DO;
               QUO=A(0,J)/A(K,J);
               COL_DN=J;
             END;
   END;
   IF COL_DN~=0
     THEN
       DO;
         N=N+1;
         A(N,0)=-FK;
         DO J=1 TO P;
           A(N,J)=-A(K,J);
         END;
       END;
 END DNBRANCH;
   /****************************************/
   /*   ADDS UP BRANCH IF IT IS FEASIBLE   */
   /****************************************/
 UPBRANCH: PROC;
   QUO=1E10; COL_UP=0;
   DO J=1 TO P;
     IF A(K,J)<0
       THEN
         IF QUO>ABS(A(0,J)/A(K,J))
           THEN
             DO;
               QUO=ABS(A(0,J)/A(K,J));
               COL_UP=J;
             END;
   END;
   IF COL_UP~=0
     THEN
       DO;
         N=N+1;
         A(N,0)=FK-1;
         DO J=1 TO P;
           A(N,J)=A(K,J);
         END;
       END;
 END UPBRANCH;
   PUT SKIP(2) LIST('IBM6 PARALLEL CUTS WITH AND B');
 END LEXDUAL;
//LKED.SYSPRINT DD DUMMY
//GO.SYSPRINT DD SYSOUT=A
//GO.SYSIN DD *
/*

APPENDIX C

MACROS USED IN THE C PROGRAM

/*******   macro definitions   *********/

/* USED IN THE ASKFOR MONITOR TO GET THE MAXIMUM UPPER
   BOUND VALUE AND SUBSCRIPT */
#define GETPROB(D1,D2) \
    if (m > -1) \
    { \
        MAXZ(); \
        D1 = br; \
        D2 = 0; \
    }

#define RESET \
    m = -1;

#define PROBSTRT \
    MENTER(s1,0) \
    m = 0; \
    CONTINUE(s1,0,0) \
    MEXIT(s1,0)

#define MENTER(D1,D2)           /* ENTER THE MONITOR */
#define CONTINUE(D1,D2,D3)
#define MEXIT(D1,D2)            /* LEAVE THE MONITOR */
#define NEWPROC(D1)             /* START NEW PROCESS */
#define AINIT(D1)               /* INITIALIZE THE ASKFOR MONITOR */
#define ADEC(D1)                /* DECLARE THE ASKFOR MONITOR */
#define BARINIT(D1)             /* INITIALIZE THE BARRIER MONITOR */
#define LOCKINIT(D1)            /* INITIALIZE THE LOCKS */
#define LOCKDEC(D1)             /* DECLARE THE LOCKS */
#define LOCK(D1)                /* USED FOR PRINT STATEMENTS */
#define UNLOCK(D1)
#define BARRIER(D1,D2)          /* DEFINED BY LUSK & OVERBEEK */
#define PROBEND(D1,D2)          /* END OF PROBLEM */
#define PROGEND(D1)             /* END OF PROGRAM */
#define CREATE(D1)              /* CREATES PROCESSES */
#define CLOCK(D1) D1=0;
#define ASKFOR(D1,D2,D3,D4,D5) D2 = 1; D4
/* DEFINED BY LUSK AND OVERBEEK */

APPENDIX D

C LANGUAGE PROGRAM FOR THE PARALLEL BRANCH AND BOUND ALGORITHM WITH MACROS, FOR THE DENELCOR HEP, ARGONNE NATIONAL LABORATORY

#include<stdio.h>
#include<math.h>
int N[20],br,rc,m,lb,clk1,clk2,pct;
int p,q,r,numprocs;
double B[50][16][20],Z[20],C,LB;
NEWPROC(slave)
ADEC(s1)
LOCKDEC(3)
main()
{
    /* load requirements b(i) and activities a(i,j)*/
    int i,j,flag,mincol,sum,sum1;
    double mincost,ffchek();
    AINIT(s1)
    LOCKINIT(3)

    scanf("%d",&numprocs);
    printf(" numprocs = %d \n",numprocs);
    scanf("%d %d %d",&p,&q,&r);
    printf("p= %d   q= %d   r= %d\n",p,q,r);
    for(i=p+1;i<r+1;i++){
        for(j=0;j<p+1;j++)
        {
            scanf("%f",&B[i][j][0]);
        }
    }

    /* load non basic variables */
    for(j=1;j<p+1;j++){
        B[j][j][0]=(-1);
    }

    /* load cost coefficients c(j) */

    for(j=1;j<p+1;j++){
        scanf("%f",&B[0][j][0]);
        B[0][j][0]=(-1)*B[0][j][0];
    }
    m = 0;
    /* find most negative cost coefficient */
    pct=0;
    flag=0;
    mincost=(-.0001);
    mincol=0;
    for(j=1;j<p+1;j++){
        if(mincost>B[0][j][0])
        {
            flag=1;
            mincost=B[0][j][0];
            mincol=j;
        }
    }
    if(flag==1){                    /* dual infeasible */
        r=r+1;
        /* add a new row of 1's */
        sum=0;
        for(j=1;j<=p;j++){
            sum=sum+fabs(B[0][j][0]);
            B[r][j][0]=1;
        }
        sum1=0;
        for(i=1;i<=r;i++){
            sum1=sum1+fabs(B[i][0][0]);
        }
        if(sum1>sum*10){
            B[r][0][0]=sum1;
        }
        else{
            B[r][0][0]=sum*10;
        }
        pivot(r,m,r,mincol);
        printsol(m);
    } /* The simplex is now dual feasible */
    /* Now we try for primal feasibility */
    LB=(-10e10);
    primal(r,m);
    /* Uses Primal to obtain the primal solution */
    C=Z[0]=B[0][0][0];
    N[0]=r;
    printf("CONTINUOUS SOLUTION IS %f\n",C);
    /**************************************/
    /* This Gives The Continuous Solution */
    /**************************************/
    if(ffchek(m)==0){
        printf("CONTINUOUS SOLUTION IS INTEGER\n");
        lb=0;
        goto answer;
    }
    /******************/
    RESET
    for (i=1; i < numprocs; i++) {
        CREATE(slave);
    }
    CLOCK(clk1)
    PROBSTRT
    work('m');
    CLOCK(clk2)
    PROGEND(s1)
    /******************/
answer:
    printf("pivot count =%d\n",pct);
    printf("\nm=%d\n",m);
    printf("INTEGER SOLUTION Z[%d]=%f\n",lb,B[0][0][lb]);
    printf(" \n total time was %d\n", clk2 - clk1);
    printf("lb=%d   LB=%f\n",lb,LB);
    printsol(lb);
} /* end main */
/****************************************************/
/*                   subroutines                    */
/****************************************************/
/****************************************************/
/***************************/
slave() { work('s'); }
/***************************/

/****************************************************/
work(who)
/****************************************************/
char who;
{
    int i, j, mw, n, brw, arc;
    double ffchek();
    for (;;) {    /* forever */
        ASKFOR(s1,arc,numprocs,GETPROB(brw,arc),RESET)
        if (arc == -1 || (arc != 0 && who == 'm'))
            break;
        if (arc != 0) continue;
        if(pct>=100 || m>=19){    /* safety valve */
            LOCK(1)
            printf("pivot count=%d\n",pct);
            UNLOCK(1)
            break;
        }
        /*****************/
        n=N[brw];
        dnbrn(n,brw);             /* adds the down row */
        if(dnbrn(n,brw)==1) printf("error\n");
        MENTER(s1,0)
        m=m+1; mw=m;
        Z[m]=(-10e5);
        MEXIT(s1,0)
        for(i=0;i<=n;i++){
            for(j=0;j<=p;j++)
                B[i][j][mw]=B[i][j][brw];    /* copies node */
        }
        upbrn(n,mw);              /* adds the up row */
        N[brw]=N[mw]=n+1;
        if(Z[brw]<(-10e6)) goto nosoll;
        n=N[brw];
        primal(n,brw);            /* calculates the down node */
        /********************************/
        if(B[0][0][brw]>LB){
            if(ffchek(brw)==0){   /*check for integer vector*/
                if(B[0][0][brw]!=floor(C)){
                    LOCK(1)
                    printf("NEW INTEGER LOWER BOUND, Zd[%d]=%f\n",brw,B[0][0][brw]);
                    UNLOCK(1)
                    if(LB<=(-10e5) || B[0][0][brw]>=LB){
                        LB=B[0][0][brw];
                        MENTER(s1,0)
                        Z[brw]=(-10e10);
                        MEXIT(s1,0)
                        lb=brw;
                        for(i=0;i<=mw;i++){
                            if(LB<floor(Z[i])) goto nosoll;
                        }
                    }
                }
                lb=brw;
                if (who == 'm') goto endwork;
            }
        }
        MENTER(s1,0)
        Z[brw]=B[0][0][brw];
        MEXIT(s1,0)
nosoll:
        /*************************************************/
        if(Z[mw]<(-10e6)) goto nosolu;
        n=N[mw];
        primal(n,mw);             /* calculate the up node */
        /*************************************************/
        if(B[0][0][mw]>LB){
            if(ffchek(mw)==0){    /* check for integer vector */
                if(B[0][0][mw]!=floor(C)){
                    LOCK(1)
                    printf("NEW INTEGER LOWER BOUND, Zu[%d]=%f\n",mw,B[0][0][mw]);
                    UNLOCK(1)
                    if(LB<=(-10e5) || B[0][0][mw]>=LB){
                        LB=B[0][0][mw];
                        Z[mw]=(-10e10);
                        lb=mw;
                        for(i=0;i<=mw;i++){
                            if(LB<floor(Z[i])) goto nosolu;
                        }
                    }
                }
                lb=mw;
                if (who == 'm') goto endwork;
            }
        }
        /*************************************************/
        MENTER(s1,0)
        Z[mw]=B[0][0][mw];
        MEXIT(s1,0)
nosolu: printf("");
    } /* end forever */
endwork: return(0);
} /* end work */

/****************************************************/
MAXZ()  /* Calculates the present Maximum Upper Bound*/
/****************************************************/
{
    int i,frc;
    double MAX;
    br=(-1);
    rc=(-1);
    frc=0;
    MAX=LB;
    for(i=0;i<=m;i++){
        if(Z[i]<=LB && Z[i]>(-10e4)){
            Z[i]=(-10e10);    /*NODE FATHOMED*/
            printf("node Z[%d] fathomed \n",i);
            continue;
        }
        if(Z[i]>MAX && Z[i]>(-10e4)){
            MAX=Z[i];
            br=i;
            rc=0;
        }
        if(Z[i]>(-10e6) && Z[i]<(-10e4)) frc=1;
    } /* end for */
    if(rc==0) Z[br]=(-10e5);
    else if(frc==1) rc=1;
    return(0);
} /* end MAXZ */

/*************************************************/
/*************************************************/
primal(n,x)  /* Uses Pivot and Lexmin
                to obtain the primal solution */
/*************************************************/
int n,x;
{
    int L,E;
    do{
        L=minrow(n,x);
        if(L!=0){
            E=lexmin(x,L);
            if(E!=0){
                pivot(n,x,L,E);
                LOCK(1)
                printf("Z[%d]=%f\n",x,B[0][0][x]);
                UNLOCK(1)
                if(floor(B[0][0][x])<=LB){
                    L=0;
                    MENTER(s1,0)
                    Z[x]=B[0][0][x]=(-10e10);
                    MEXIT(s1,0)
                    LOCK(1)
                    printf("NODE Z[%d] FATHOMED\n",x);
                    UNLOCK(1)
                }
                else printf("");
            }
            else{
                MENTER(s1,0)
                Z[x]=B[0][0][x]=(-10e10);
                MEXIT(s1,0)
                LOCK(1)
                printf("No Feasible Solution for Z[%d]\n",x);
                UNLOCK(1)
                L=0;
            }
        }
    }
    while(L!=0);
} /* End Primal */

/****************************************************/
pivot(n,x,L,E)
/* Performs the pivot operation on the Simplex */
/****************************************************/
int n,x,L,E;
{
    int i,j;
    pct=pct+1;
    for(i=0;i<=n;i++){
        if(i!=L){
            for(j=0;j<=p;j++){
                if(j!=E){
                    B[i][j][x]=B[i][E][x]*B[L][j][x]
                               /((-1)*B[L][E][x])+B[i][j][x];
                }
            }
        }
    }
    for(i=0;i<=n;i++){
        if(i!=L){
            (B[i][E][x]=B[i][E][x]/((-1)*B[L][E][x]));
        }
    }
    for(j=0;j<=p;j++){
        if(j!=E)(B[L][j][x]=0);
        else(B[L][E][x]=(-1));
    }
    return(0);
} /* end pivot */

/****************************************************/
minrow(n,x)  /* Finds the most neg. row */
/****************************************************/
int n,x;
{
    int i,minrw;
    double minact;
    minact=(-.00001);
    minrw=0;
    for(i=1;i<=n;i++){
        if(B[i][0][x]<(-.00001)){
            if(B[i][0][x]<minact){
                minact=B[i][0][x];
                minrw=i;
            }
        }
    }
    return(minrw);
} /* end minrow */

/****************************************************/
lexmin(x,L)  /* Finds the minimum column */
/****************************************************/
int x,L;
{
    double minc;
    int j,E;
    minc=10e11;
    E=0;
    for(j=1;j<=p;j++){
        if(B[L][j][x]<(-.00001)){
            if(minc>fabs(B[0][j][x]/B[L][j][x])){
                minc=fabs(B[0][j][x]/B[L][j][x]);
                E=j;
            }
        }
    }
    return(E);
} /* end lexmin */

/****************************************************/
double ffchek(x)
/* Checks for an integer lower bound */
/****************************************************/
int x;
{
    int i;
    double ff;
    for(i=1;i<=p;i++){
        ff=(B[i][0][x])-floor(B[i][0][x]);
        if(ff>0.000000001 && ff<0.999999999) break;
    }
    if(ff<=0.000000001 || ff>=0.999999999) ff=0.0;
    return(ff);
} /* end ffchek */

/****************************************************/
dnbrn(n,x)  /* Adds Down Branch if it is Feasible */
/****************************************************/
int n,x;
{
    int i,j,k,coldn;
    double quo,fk;
    for(i=1;i<=p;i++){
        fk=(B[i][0][x])-floor(B[i][0][x]);
        if(fk>0.000000001 && fk<0.999999999){
            k=i;
            break;
        }
    }
    if(fk<=0.000000001 || fk>=0.999999999) return(1);
    quo=1e10;
    coldn=0;
    for(j=1;j<=p;j++){
        if(B[k][j][x]>0){
            if(quo>(B[0][j][x]/B[k][j][x])){
                quo=B[0][j][x]/B[k][j][x];
                coldn=j;
            }
        }
    }
    if(coldn!=0){
        n++;
        B[n][0][x]=(-fk);
        for(j=1;j<=p;j++){
            B[n][j][x]=(-B[k][j][x]);
        }
    }
    else{
        MENTER(s1,0)
        Z[x]=(-10e10);
        MEXIT(s1,0)
    } return(0);
} /* end downbranch */


/****************************************************/
upbrn(n,x)  /* Adds Up Branch if it is Feasible */
/****************************************************/
int n,x;
{
    double quo,fk;
    int i,j,k,colup;
    for(i=1;i<=p;i++){
        fk=(B[i][0][x])-floor(B[i][0][x]);
        if(fk>0.000000001 && fk<0.999999999){
            k=i;
            break;
        }
    }
    quo=1e10;
    colup=0;
    for(j=1;j<=p;j++){
        if(B[k][j][x]<0){
            if(quo>fabs(B[0][j][x]/B[k][j][x])){
                quo=fabs(B[0][j][x]/B[k][j][x]);
                colup=j;
            }
        }
    }
    if(colup!=0){
        n++;
        B[n][0][x]=fk-1;
        for(j=1;j<=p;j++){
            B[n][j][x]=B[k][j][x];
        }
    }
    else{
        MENTER(s1,0)
        Z[x]=(-10e10);
        MEXIT(s1,0)
    }
    return(0);
} /* End Up Branch */

/****************************************************/
printsol(x)  /* Prints the solution */
/****************************************************/
int x;
{
    int i,j;
    printf("\n");
    for(i=0;i<p+1;i++){
        printf("\n");
        printf("x(%d)= %f",i,B[i][0][x]);
    }
    printf("\n");
    printf("\n");
} /* end printsol */

