## CR-152287

## FEASIBILITY STUDY

FOR A
NUMERICAL AERODYNAMIC SIMULATION FACILITY

```
(NASA-CR-152287) FEASIBILITY STUDY FOR A          N79-26068
NUMERICAL AERODYNAMIC SIMULATION FACILITY.
VOLUME 1 Final Report (Control Data Corp.,
St. Paul, Minn.) 626 p HC A99/MF A01              Unclas
    CSCL 14B                            G3/09     28381
```

By: N. R. Lincoln

Contributions by: C. N. Arnold<br>R. O. Bergman<br>D. B. Bonstrom<br>T. W. Brinkman<br>S-H. J. Chiu<br>S. S. Green<br>S. D. Hansen<br>D. L. Klein<br>H. E. Krohn<br>R. P. Prow

MAY 1979
Distribution of this report is provided in the interest of information exchange. Responsibility for the contents resides in the authors or organization that prepared it.

Prepared under Contract No. NAS2-9896

CONTROL DATA CORPORATION
Research and Advanced Design Laboratory
4290 Fernwood Street
St. Paul, Minnesota 55112
for

AMES RESEARCH CENTER



## Volume 1 - Final Report


## PREFACE

This report consists of five volumes and a summary report. The summary gives an overview of the project, which is documented in detail in the report. Volume V contains Control Data proprietary information and, as such, is given limited distribution (NASA only). The five volumes are as follows:

Volume I - Final Report
Division 1 - General Narrative and Rationale
Division 2 - Implicit Method Description and Code
Division 3 - Explicit Method Description and Code
Division 4 - Weather/Climate Application Study
Division 5 - Technology Survey Update
Division 6 - NASF Reliability-Availability Evaluation
Division 7 - Maintenance Study for the NASF
Division 8 - Maintenance Software Alternatives
Division 9 - Installation Organization/Operation
Division 10 - NASF Physical Requirements Update
Division 11 - System Simulation Summary and Results

Volume II - Hardware Specifications/Descriptions
Division 1 - FMP Functional Specification
Division 2 - FMP Instruction Descriptions
Division 3 - System Hardware Descriptions
Division 4 - Loosely Coupled Network Description

Volume III - FMP Language Specification/User Manual

Volume IV - Simulation Model User Manuals

Volume V - Cost and Schedule Projections (Limited Distribution)

DIVISION 1 GENERAL NARRATIVE AND RATIONALE
1.0 INTRODUCTION
1.1 OBJECTIVES
1.2 INTERRELATIONSHIPS OF THE VARIOUS STUDIES
1.3 THIS REPORT AND THE OVERALL PROJECT
1.4 DEPTH OF STUDY
2.0 FMP DESIGN
2.1 HARDWARE
2.1.1 MEMORY SYSTEM
2.1.1.1 LEVEL 1 MEMORY
2.1.1.2 LEVEL 2 MEMORY
2.1.1.3 LEVEL 3 MEMORY
2.1.1.4 MEMORY TO MEMORY DATA FLOW
2.1.2 FUNCTIONAL PARALLELISM
2.1.3 THE MAP UNITS
2.1.4 THE SCALAR UNIT
2.1.5 THE VECTOR UNITS
2.2 THE FMP OPERATION
2.3 RATIONALE SUMMARY
2.3.1 RELIABILITY, AVAILABILITY, AND MAINTAINABILITY
2.3.1.1 EFFECT OF PARTS COUNT
2.3.1.2 TRANSIENT ERRORS
2.3.2 BUILDABILITY
2.3.3 PERFORMANCE
2.3.4 MAINTENANCE
2.3.5 MICROCODE CONTROL AND FAULT ISOLATION
2.3.6 VALIDITY AND VERACITY
2.3.6.1 ASSUMPTIONS
2.3.6.2 VALIDITY DERIVATION
3.0 SOFTWARE DESIGN
3.1 LANGUAGE ANALYSIS AND DEFINITION
3.1.1 EVOLUTION LEADING TO SPECIFICATION
3.1.2 OBSERVATIONS AND RATIONALE
3.2 FMP LANGUAGE DESCRIPTION
3.2.1 THE BASE DOCUMENT
3.2.2 THE ANSI 77 SPECIFICATION
3.2.3 THE CYBER 200 FORTRAN 77 ADDITIONS
3.2.4 FMP FORTRAN EXTENSIONS
3.2.4.1 THE LEVEL STATEMENT
3.2.4.2 DYNAMIC VARIABLES
3.2.4.3 EXPLICIT DEFINITION OF DYNAMIC VARIABLES
3.2.4.4 DYNAMIC ARRAYS
3.2.4.5 EXPLICIT DEFINITION OF DYNAMIC ARRAY ELEMENTS
3.2.4.6 SUBROUTINE COMMUNICATION OF DYNAMIC VARIABLES
3.2.5 COMPILER STRATEGY
3.2.5.1 DO LOOP "GET READY"
3.2.5.2 INFERRED TRANSPOSE OF MATRIX
3.3 COMPILER FUNCTIONAL CHARACTERISTICS
3.3.1 SOURCE CODE
3.3.2 OBJECT CODE
3.3.3 CONSTRUCTS
3.3.4 PERFORMANCE
3.3.5 OBJECT LIBRARY
3.3.6 LINKING AND LOADING
3.3.7 OPERATIONAL CHARACTERISTICS
3.4 OPERATING SYSTEM FUNCTIONAL CHARACTERISTICS
3.4.1 GENERAL
3.4.2 JOB FLOW
3.4.3 INTERACTIVE OR BATCH?
3.4.4 EXTERNAL CHARACTERISTICS
3.4.5 INTERNAL CHARACTERISTICS
3.4.6 MANAGEMENT TASKS
3.4.6.1 STORAGE
3.4.6.2 PROCESSING
3.4.7 PERFORMANCE CRITERIA
4.0 FLOW MODEL PROCESSOR SIMULATION AND ANALYSIS
4.1 FMP SIMULATOR
4.2 BENCHMARKS FOR THIS STUDY
4.3 FUTURE METRIC STRATEGY
4.4 BENCHMARK SPECIFICATION
5.0 PERFORMANCE ANALYSIS AND EVALUATION
5.1 THREE-DIMENSIONAL IMPLICIT CODE
5.2 THREE-DIMENSIONAL EXPLICIT CODE
5.3 SPECTRAL WEATHER MODEL
5.4 FINITE DIFFERENCE WEATHER MODEL
5.5 REFLECTIONS ON THE PERFORMANCE ANALYSIS
5.6 BOTTOM LINE
6.0 SYSTEM DESIGN
6.1 SYSTEM TRAFFIC FLOW
6.2 SYSTEM SOFTWARE
6.3 SYSTEM AVAILABILITY
7.0 FACILITIES STUDY
7.1 RISK ANALYSIS
7.2 LOGISTICS SUPPORT
7.3 PHYSICAL REQUIREMENTS
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F

DIVISION 2 THE THREE-DIMENSIONAL AERODYNAMIC IMPLICIT CODE
1.0 INTRODUCTION
2.0 CODING STRATEGY FOR THE 3-D IMPLICIT ALGORITHM
2.1 MEMORY HIERARCHY
2.1.1 MAIN MEMORY
2.1.2 INTERMEDIATE MEMORY
2.1.3 BACKING STORAGE
2.2 FUNCTIONAL PARALLELISM
3.0 THE STEP SUBROUTINE
3.1 SLABS
3.2 J-SWEEP DIRECTION
3.3 BTRI
3.4 K AND L SWEEPS
4.0 THE RIGHT-HAND SIDE COMPUTATION
5.0 RIGHT-HAND SIDE--VISCOSITY AND TURBULENCE COMPUTATIONS
5.1 VISRHS
5.2 MUTUR
APPENDIX A
APPENDIX B

DIVISION 3 THE THREE-DIMENSIONAL
1.0 OVERVIEW
2.0 OVERALL ANALYSIS
2.1 Data Handling
2.2 Speed Optimization
3.0 IMPLICIT METHOD
4.0 EXPLICIT METHOD
5.0 METHOD OF CHARACTERISTICS
6.0 IMPLICATIONS
APPENDIX A
APPENDIX B
APPENDIX C

DIVISION 4
1.0 INTRODUCTION
2.0 THE GISS MODEL
3.0 GISS MODEL VECTORIZATION
4.0 THE MIT MODEL
5.0 SPECTRAL CODE ANALYSIS
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D

DIVISION 5 TECHNOLOGY SURVEY UPDATE
Introduction
Critical Circuit Technologies
Auxiliary Memory Technologies

DIVISION 6 NASF RELIABILITY-AVAILABILITY EVALUATION
FMP Reliability Evaluation
NASF System Reliability Evaluation
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F
APPENDIX G
APPENDIX H

DIVISION 7 MAINTENANCE STUDY FOR THE NUMERICAL AERODYNAMIC SIMULATION FACILITY
Strategy Assumptions
Field Organization
Preventive Maintenance
Computer Aided Maintenance
Maintenance Software
Logistics
Technical Support
Maintenance, Non-CDC Equipment

DIVISION 8 MAINTENANCE SOFTWARE ALTERNATIVES FOR THE 1980s
1.0 Introduction
2.0 Existing Maintenance Software Overview
2.1 General Features
2.1.1 Maintenance Control Unit
2.1.2 CPU Off-line Diagnostics
2.1.3 CPU On-line Diagnostics
2.1.4 PDC Diagnostics
2.1.5 Fault Isolation
2.1.6 Error Logging and Recovery
2.2 Summary
3.0 Alternatives For The 1980s
3.1 System Recommendations
3.2 Hardware Recommendations
3.3 Future Maintenance Software Development
3.3.1 Fault Isolation
3.3.2 Operational Summation
3.3.3 Loosely Coupled Network I/O
3.3.4 Application of Gate Simulation Data Base
3.3.5 On-line
3.3.6 Error Logging
3.3.7 Recovery
3.3.8 Concurrent Maintenance
4.0 Implementation Goals and Strategy
4.1 Goals
4.2 Strategy
4.2.1 Hardware Strategy
4.2.2 Operating System Strategy
4.2.3 Maintenance Software Strategy
4.3 Attainability

DIVISION 9 INSTALLATION ORGANIZATION/OPERATION
Manpower Requirements
Data Center Supplies
Services
Summation

DIVISION 10 NASF PHYSICAL REQUIREMENTS UPDATE

DIVISION 11 SYSTEM SIMULATOR SUMMARY AND RESULTS
1.0 Definition of System to be Simulated
1.1 Stations
1.1.1 The SPS
1.1.2 The FMP
1.1.3 The
1.1.4 The GRF
1.2 The Loosely Coupled Network
1.3 System Philosophy, Principles, and Groundrules
2.0 Simulator Characteristics
2.1 Code Structure
2.1.1 Central Theme - LCN Mechanics
2.1.2 Code Modules for Device Classes
Macroscopic Assumptions: Special Notes Concerning the SPS
3.0 The Simulator as a Tool
3.1 Simulator Input
3.2 Two Simulators
3.3 Simulation Techniques
3.3.1 Full Run - Truncated Run
3.3.2 Light Load - Heavy Load
3.3.3 Key Diagnostics
3.4 Trial Runs on Three Different LCNs
4.0 Results From Simulating The "NASF Usage Model"
4.1 Translation of Usage Model into Workload Input for Simulator
4.2 An Example Simulation
4.2.1 Characteristics of the Run
4.2.1.1 Job Arrivals
4.2.1.2 SPS Utilization
4.2.1.3 The FMP Execution Queue
4.2.1.4 FMP Utilization
4.2.1.5 Throughput and Turnaround Statistics
4.2.2 General System Response
4.2.3 Variations on the Example
4.2.3.1 Different Arrival Statistics
4.2.3.2 Different LCN Geometry
4.2.3.3 Effect of Change in SPS Performance and Load
4.2.3.4 Effect of Changing the Priority of a Job Class
4.2.3.5 Simulation of the Night Shift Workload
4.3 Conclusions
APPENDIX A

REFERENCES

GENERAL NARRATIVE AND RATIONALE

### 1.0 INTRODUCTION

The technological advances that seem to create a new breakthrough in high speed computer development each passing day unquestionably excite the scientists whose investigations demand seemingly limitless amounts of computational power. Until recent developments in reliable production of high performance Large Scale Integration (LSI) and automated computer design methodology, such insatiable computational requirements had to be met, mainly, by manufacturers of 'standard product' computers. The goals of such standard product machines were necessarily linked to the business objectives of the producing manufacturer. These objectives, of necessity, have been the result of compromises made between many complex factors -- cost, performance, compatibility, software support, product line integration, and the realities of design, schedule, and manufacturability. On the surface, at least, the production of a 'special purpose' computer could avoid these numerous compromises, and thus achieve performance levels for a narrow range of problem characteristics substantially in excess of what the standard product machines could yield. This premise is based on the assumption that the special purpose machine and the standard product machine would be built from similar if not identical technologies, and with similar if not identical design techniques.

The reason such an approach has not been truly practical for a manufacturer until recent innovations in design and silicon technology have occurred is simply the high degree of risk involved in such a project. The risks are considerable -- cost overrun, schedule delays, reliability, maintainability, software development lead time, attaining performance objectives being just a few that haunt any prospective vendor of a massive central computer system. The risks to the consumer are equally great; however, a clever consumer can at least make the manufacturer assume the burden of financial risk for the hardware itself with judicious use of contract clauses. Despite the incredible risks, the potential for solving a heretofore unsolvable class of problems on such a computing ensemble may justify the challenge, particularly if the special purpose computing facility is successful, resulting in a clear-cut savings in time and dollars.

A particular class of problems has been identified as offering the potential for great gains in cost and time if the appropriate computing system can be found to house them. This set of problems is the simulation of fluid flow around three-dimensional bodies, both in wind tunnel environments and in free space. The application of numerical simulations to this field of endeavor promises to yield economies in aircraft design due to reductions in tunnel tests, model designs and construction, and various flight conditions. In addition, particularly in transonic flow analysis, numerical simulation may produce results that would be obscured in physical tunnel tests. This class of problems also exhibits other computational characteristics such as massive quantities of data required for three-dimensional meshes and extremely heavy arithmetic load for each solution. A large central processing system capable of crunching the Navier-Stokes solution seems to be called for in this case. Such a system must be capable of holding the data associated with these very large problems, achieving a problem solution in a reasonable amount of time (say about 10 minutes), and then ordering the results in a form that can be easily understood.

The question then arises, "Can a specially designed computer system be built which will provide the necessary power for this specific set of problems?" A corollary question is "Will a specially designed computer system for flow modeling yield performance substantially greater (a factor of at least 10 times) than a high-performance standard product available in the same time frame?"

It is this set of questions that has been raised by NASA, and submitted to those manufacturers who profess an interest in extremely high performance computer development. To answer these basic questions and those questions which derive from them, such as "What are the risks and costs of such a project?", NASA engaged Control Data Corporation to pursue a two-year study into the feasibility of construction of a centralized Numerical Aerodynamic Simulation Facility (NASF) in the time frame of 1980-1984. This study has been segmented into three parts -- documented in references 1 and 2, and in this report -- each of which addresses in increasing detail the characteristics and feasibility of a full scale NASF.

### 1.1 OBJECTIVES

As understood by Control Data, the ultimate goal of NASA is to create a facility for flow simulations that can cope with the volume of data needed for three-dimensional models and with complex computations necessary for a continually maturing mathematical solution to the flow equations. This goal is achievable to the extent that sufficient computer power is available to provide system throughput which can suffice for effective aircraft design as well as meaningful flow research. NASA-Ames researchers have determined that a maximum allowable compute time for the efficient conduct of aerodynamic design is on the order of ten minutes per full solution. In addition, the production models have been determined to need meshes on the order of $100 \times 100 \times 100$ data points. These two qualities immediately circumscribe the memory and arithmetic performance requirements for the computational portion of the NASF. The computer industry's best projections for the period 1980-1984 do not reveal any potential standard products that can achieve sustained performance of even one-tenth of the requisite calculational performance of the NASF, let alone the memory capacity. The overall objective of the NASF studies has been then to determine if and how a specialized computer system might be built in this time frame to meet NASA flow modeling goals. A somewhat invisible objective of these three study efforts has been to test the corporate willingness of candidate manufacturers to engage in this high risk activity. Throughout the remainder of this report, this last "hidden" objective must be kept in mind, since despite the best affirmations of feasibility and success for the NASF, the absence of vendor interest or support will guarantee that the endeavor will never be launched.
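As a rough illustration of how a mesh size and a time limit bound the machine, the sketch below turns a 100 x 100 x 100 mesh and a ten-minute solution time into a memory size and a sustained arithmetic rate. Only the mesh dimensions and the time limit come from the text. The words stored per mesh point, the word length, the operations per point per step, and the number of steps are hypothetical placeholders chosen to exercise the arithmetic; they are not values from this study.

```python
# Back-of-envelope sizing sketch. Only MESH and TIME_LIMIT_S come from the
# report; every other parameter below is a hypothetical placeholder.

MESH = (100, 100, 100)      # mesh points per dimension (from the text)
TIME_LIMIT_S = 10 * 60      # ten minutes per full solution (from the text)


def mesh_points(mesh):
    """Total number of mesh points."""
    nx, ny, nz = mesh
    return nx * ny * nz


def memory_bytes(mesh, words_per_point, bytes_per_word):
    """Storage needed to hold every mesh point at once."""
    return mesh_points(mesh) * words_per_point * bytes_per_word


def sustained_rate(mesh, ops_per_point_per_step, steps, time_limit_s):
    """Operations per second needed to finish within the time limit."""
    total_ops = mesh_points(mesh) * ops_per_point_per_step * steps
    return total_ops / time_limit_s


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the report.
    words_per_point = 30
    bytes_per_word = 8
    ops_per_point_per_step = 5_000
    steps = 100

    mem = memory_bytes(MESH, words_per_point, bytes_per_word)
    rate = sustained_rate(MESH, ops_per_point_per_step, steps, TIME_LIMIT_S)
    print(f"mesh points:    {mesh_points(MESH):,}")
    print(f"memory:         {mem / 1e6:.0f} million bytes")
    print(f"sustained rate: {rate / 1e6:.0f} million operations per second")
```

Both results scale linearly with the number of mesh points, so the mesh size fixes the memory requirement directly, and the mesh size together with the time limit fixes the required sustained rate once a per-point workload is assumed.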

Given the overall objectives, the first study period was commenced with the following objectives:
a) Assessment of architectural and technology alternatives to creation of the main computational component of the NASF, the Flow Model Processor (FMP).
b) Instructing NASA personnel in the implications of a) above.
c) Identifying computational characteristics of simulation codes.
d) Establishing at least one hardware model, using realistic technology that could achieve the goals of the NASF.
e) Preliminary risk analysis for the conduct of such a project.
f) Identifying the key software development considerations for such a large scale system.

At the conclusion of the first study it was determined that a computer system could be designed around the characteristics of the Navier-Stokes solutions employed by NASA-Ames researchers. To a minimal extent a machine structure was arrived at that could conceivably be constructed with technologies that should be available in the 1980-1984 period. An extension was then launched to the first study, intended to further refine the FMP architecture and to develop additional information for NASA planners who were then deeply involved in making the NASF a reality. The objectives of this study were thus similar in nature to those of the first period, with the exception that certain aspects were to be scrutinized in much greater detail:
a) Review of technological developments as they might apply to the construction of the FMP.
b) Detailed modeling of the FMP to provide structural simulation of candidate code sequences.
c) Analysis of the three-dimensional flow models as they would perform on the proposed FMP structure.
d) Development of detailed reliability data from more refined knowledge of the FMP design.
e) Specification of the general functional capabilities needed for the software systems for the FMP and their relationship to the NASF in which it is imbedded.

This second study concluded again that a machine of the essential power was buildable and could be made to meet acceptable standards of reliability, availability, and maintainability (RAM). Before such a large scale effort could be launched however, additional material needed to be developed. Hence the institution of the final "study" effort of the NASF project. The overall objective of this effort was to provide additional detail and additional answers to NASA scientists and planners so that they might begin the lengthy and arduous procurement process for such a system. The objectives of this final study in order of original importance were:
a) Derivation of detailed and reliable cost data for every segment of the project.
b) Validation of the FMP design for functionality and performance at a design level more detailed than the previous structural model.
c) Simulation and analysis of the data flow among the major FMP components.
d) Development of total NASF system load analytical techniques to provide for configuration evaluation.
e) Detailed specification of programming languages and operating system structure for the FMP.
f) Examination of two computational models with dissimilar characteristics to the aerodynamic codes, specifically a spectral weather code and a finite-difference weather code, to determine their performance on the special purpose flow model processor.
g) Simulation of the final FMP design in execution of the four identified performance metrics: the 3-D implicit and 3-D explicit Navier-Stokes solutions developed by Ames and the spectral and finite-difference weather models developed by other NASA agencies.
h) Development of probable system loads created by potential users of the NASF when it becomes fully operational.
i) Final update on technological alternatives in the 1980-1984 period for construction of an FMP.

### 1.2 INTERRELATIONSHIPS OF THE VARIOUS STUDIES

All study efforts under this NASF project since its inception have been cooperative and interactive in form and style as regards the relationship between NASA investigators and Control Data engineers. It is also impossible to discuss any reasonable conclusions in this final report without taking into account the other side of this study "triangle", the efforts of the alternate contractor, Burroughs Corporation, as they pursued the same objectives on behalf of NASA. The three-way interaction of these parties, Ames, Control Data, and Burroughs, has not only served NASA's aims well; at Control Data it is also believed that the final FMP and NASF structures of both vendors have benefited from the competitive emphasis that two parallel approaches have provided. Thus, once the first study was completed and published, the heavy reliability emphasis placed by Control Data was adopted in part by Burroughs' designers. In a similar way, Burroughs' continuing concentration on the problems of data flow and accessing in the Navier-Stokes codes was brought to Control Data's attention and affected many redesign decisions for the FMP.

It can then be seen that each study to date owes not only its objectives to the groundwork laid by the previous studies, but even more importantly, each subsequent study derives much material from the evaluation of the competitive report for the previous effort as well as from extensive critique of each study by NASA-Ames personnel. The result of this is that in many instances major structural changes have been made and remade as each study progressed. In addition, conclusions have been drawn and redrawn in several areas due to the interaction mentioned before and the changing perspective that comes with the passage of time. For example, the original choice for a bulk, random access memory (RAM) for the Control Data version of the FMP was designated as "bubble memory", with Charge Coupled Devices (CCD) being given second place in consideration. In the intervening two years of these studies, actual hardware has been constructed, certain componentry has come into production and newer components have reached unexpected cost levels. The result is that the intermediate storage for the Control Data FMP is now conceived as consisting of large scale RAM chips of moderate performance, in place of the CCD or bubble memory originally chosen.

This report cannot completely supplant the material developed in the previous study periods. Instead its contents may be said to selectively replace or update previously reported data or conclusions in addition to providing new material in those areas not covered in prior studies. Thus this report, combined with references 1 and 2, constitutes all the material developed by this contractor to assist and support the procurement of an NASF, as well as providing guiding information to aid NASA in its decision processes about the entire project. Given the dynamic nature of the technological evolution, and more importantly the state of the economic climate that directs major manufacturing decisions, many of the approaches and conclusions reached herein can be said to remain valid for a period of no more than a year. This does not mean that the recommendations and predictions given for, say, year 1983 will not prove to be correct. What it does mean is that if there is as much as a one year delay in initiating any of the next steps in the NASF procurement, design and construction, the choices for technology, architecture, and support processor systems might be radically altered to achieve better cost, performance, and reliability levels.

### 1.3 THIS REPORT AND THE OVERALL PROJECT

To provide as much quantitative assurance as possible that the proposed NASF project is feasible, it has been necessary to develop almost all of the hardware and software components to a relatively high degree of detail. In the case of the design chosen by Control Data, this has meant selecting a technology which is in existence and whose manufacturability and performance have already been proven. Using this technology a detailed architecture was developed and from that a design carried to enough detail that a reliable simulation could be produced for it, and relatively high-confidence component counts projected. In addition the support processing system needed to be sized and costed. What is represented in this report then is a model for a NASF/FMP ensemble using a possible approach to meeting NASF goals. If this specimen system is truly feasible, then it follows that there are other systems equally feasible, and NASA's concern about feasibility is satisfied. The candidate system offered in this report represents one which, at this point in development, Control Data considers the best possibility in terms of performance, reliability, and true buildability with minimum risk.

In no way should this candidate architecture and design become one that is specified for the final NASF, since there are still alternatives to be investigated. It should be emphasized that the structure and design numbers offered in this report are to support the possibility of a successful conclusion to NASA's search for an effective facility. The design and structure included here should be evaluated only in light of determining feasibility of the proposed NASF, and should not necessarily be considered as the candidate architecture for such a project to be compared with other competitive schemes, except where Control Data has called attention to the effects of architectural differences on some problem formulations.

Since this study, as well as others, is to form the basis of the procurement of the complete NASF it must necessarily provide as much detail as possible to support the many activities required of NASA and the NASF manufacturer. Given the broad scope of this study and limitations on the resources available, it was not entirely possible to pursue all aspects of the study to the same level of detail. The various tasks were thus met in a somewhat dynamically assigned priority order:
a) The need for detailed and exhaustive cost data by NASA in the summer of 1978 to assist the preparation of funding requests became the focal point of most of the project's technical resources during the early period of this study. Control Data attempted, within the tight time constraints, to conduct a cost analysis for production of the NASF similar to those analyses undertaken for its own product families. Though it was desired that a confidence factor of 10 percent be ascribed to this activity, the brief time available for full cost detailing made this goal almost impossible to achieve. Instead each major factor was given a separate confidence factor, with the expectation that as this study progressed some of the costing could be reevaluated and confidence improved. In fact, the degree to which some of the unknowns of the summer of 1978 were understood has not improved substantially, and probably will not until actual software design has been carried to completion. This is due to the fact that the major cost uncertainties revolve around the software implementation and maintenance strategies. A good deal of detailed hardware design had to be completed to provide the performance and cost data for this study. In the area of FMP hardware the confidence in the cost data has improved to where it is thought to be within the variance goal of 10 percent, for the most part.
b) Hardware redesign of the FMP was a continuing operation during this study period in response to criticisms from Ames, new aspects of the flow codes that were revealed, and the necessity for improving the performance of the FMP on the weather codes. A greater design concern was the reduction of component counts to improve the cost and reliability of the FMP. This led to the reduction of the number of vector pipelines to 5 (4 active and one spare) using a technological "trick" to double the processing bandwidth of the resulting pipelines.
c) Development of the FMP simulator as a reliable and useful tool for measuring code execution was a continuing task, as the changing characteristics of the machine design had to be injected into the simulator, and operational use of the simulator revealed diagnostic and analytical aids that needed to be added.
d) Development of the NASF system model simulator began in mid 1978 and, as more has become known about the probable environment of the Ames NASF, this simulator has become more important as an evaluative tool for both CDC and NASA researchers.
e) Development of an FMP programming language which was acceptable to potential NASF users, compiler writers and language standards specialists became an interactive exercise with many alternatives weighed, rejected, or criticized. The final outcome of this effort is given in the section on language analysis and design.
f) Encoding of the implicit 3-D flow model in this language was done to demonstrate the language and how it would be mapped into machine instructions for the FMP.
g) Encoding of portions of the explicit code was done to illustrate the operation of the FMP on code sequences not necessarily similar in computational characteristics to the implicit code.
h) Analysis of the mathematical and computational characteristics of the weather codes was done to determine what the effect of the FMP architecture would be on those models.
i) Operating system software for the FMP and the full NASF system was examined and functional characteristics defined for those components not already available in the standard software that will be available on the front-end machines (support processing systems).
j) A study of the reliability, availability, and maintainability of the FMP was conducted by Control Data Supercomputer Operations reliability specialists.
k) A review of previous technological projections and recommendations was conducted to provide update information on what is realistically available to NASF implementers in the 1980-1984 timeframe.

l) The cost data provided NASA in 1978 was reviewed and updated wherever possible with more recent projections.

A number of activities were not carried out to the extent desired at the outset of this study effort. In some cases resource and time limitations dictated this deficiency, in others changing priorities or interests led to truncating a particular study effort. Some examples of this are:
a) A full-fledged and detailed specification of the FMP operating system was not produced. The vestigial nature of this operating system (described in reference 2) truly eliminates the need for extensive operating system functions; however, the placement of some functions (data editing and analysis, for example) has not yet been decided for the NASF, and thus uncertainty remains as to the need for certain functions in the FMP.
b) A full coding and simulation of the weather models was not done by Control Data. Some portions of the spectral model were coded into FMP FORTRAN and results estimated. In addition, some investigation of the finite difference model was done. Discussion of these activities is presented in Division 4.
c) A full coding and simulation of the explicit 3-D flow model was not done. Instead, portions of this code which had characteristics dissimilar to the implicit code were vectorized in the most straightforward manner possible. This effort was conducted to resolve two questions:

1) If the FMP is designed primarily to be efficient for the implicit code, what is the degradation in performance to be expected of codes with different computational behavior, such as the explicit code?
2) What level of performance is achievable by a "first attempt" at utilizing the FMP on the part of new FMP programmers?

### 2.1 HARDWARE

The FMP that is described here is the result of an evolution in thinking and implementation since the first attempt to arrive at a sufficient machine structure for the flow model solutions in the first study period of this project. It is, admittedly, based on the processing concepts that have emerged from the Control Data STAR-100 and CYBER 200 computer systems. To improve the project's chances for success it has been alleged from the outset that a good deal of the design, implementation, and software development must be grounded on existing work, or work in progress; the risk of beginning literally "from scratch" on an effort of this magnitude is too great to tempt rational developers into making the attempt. The basic principles for creation of the Control Data Flow Model Processor are then:
a) A massive, centralized memory system which serves as the coordinating medium for data transfer and processing control, with sufficient porting provided in this memory for a multiplicity of concurrent processes to be carried out.
b) The maximization of functional parallelism which employs concurrency along functional lines rather than providing a multitude of concurrent but identical functional elements.
c) The minimization of the number of identical parallel processing elements through the use of the most aggressive technologies available. It is claimed that two processing elements operating at a clock cycle of two nanoseconds provide superior control, interconnection, and reliability characteristics to an ensemble of forty processors operating at a 40-nanosecond clock cycle, although on the surface both would seem to yield an effective processing rate of one step every nanosecond. (Appendix A provides additional information on clock rates as a measure of performance.)
d) A high bandwidth and multi-access I/O connection to all other processors and storage media in the system, to provide multipathing for system availability as well as for performance reasons.
e) The employment of a FMP-type processor as a computational engine only, leaving all tasks other than the mathematical solution of flow equations to other, conventional processors attached to the system.

Given these desirable principles, a hardware design for the FMP can be completed within the constraints of technology availability, reliability, and maintainability considerations, and tradeoffs involving physical dimensions, power, cooling, and interconnection limitations. The significance of these tradeoff considerations will now be examined for each major component of the FMP, discussing the rationale behind the design presented in detail in the FMP functional and instruction specifications, which can be found in Volume II of this report.

### 2.1.1 MEMORY SYSTEM

Far and away the most important part of the FMP is the memory system, both in impact on the entire design and in cost for the entire machine. Memory capacity is dictated by the requirements of current and projected production problems, and by the predicted needs of a class of research problems that may employ the FMP. If the nominal production problem is based on metric dimensions of $100 \times 100 \times 100$ elements, then the implicit code in its present form will require 9 million 64-bit words to retain just the flow variables. Another 5 million words are needed for temporary results generated by each sweep required of the "independent sweep method" of flow code solution. Another couple of million words will be needed to hold locally temporary vectors throughout the various subroutines in the flow codes. The nominal space requirements can then be roughly gauged at 16 million words.
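The sizing arithmetic above can be tallied in a few lines. This is a minimal sketch in Python using only the figures quoted in the text; the variable names are illustrative, the "couple of million" local temporaries are taken as exactly 2 million words, and the words-per-mesh-point value is implied by the text rather than stated in it.

```python
# Sizing sketch for the nominal production problem (figures from the text).
MESH = (100, 100, 100)          # nominal mesh dimensions
FLOW_WORDS = 9_000_000          # 64-bit words for the flow variables
SWEEP_TEMP_WORDS = 5_000_000    # temporaries generated by each sweep
LOCAL_TEMP_WORDS = 2_000_000    # "a couple of million" taken as 2 million (assumption)

mesh_points = MESH[0] * MESH[1] * MESH[2]
# Implied storage per mesh point; the report does not state this figure directly.
words_per_point = FLOW_WORDS // mesh_points
total_words = FLOW_WORDS + SWEEP_TEMP_WORDS + LOCAL_TEMP_WORDS
total_data_bits = total_words * 64   # data bits only, excluding SECDED check bits

print(mesh_points, words_per_point, total_words, total_data_bits)
```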

To make the FMP work most efficiently, the capability is needed to "stage-in" or "roll-in" one job while another is in process, so that no time is lost while transferring all the data to be used in and out of the FMP. A buffer space of from 9 to 16 million words seems to be indicated by this strategy.

The term 'roll-in/roll-out' is derived from the Control Data CYBER 70/170 scheme for memory management when a multiplicity of jobs are contending for the CPU. The term usually referred to the act of moving a job's entire CPU memory space onto disk to make room for another job. It was usually performed by hand on the CDC 6600, and invoked only when the job in memory had a probability of spending a protracted amount of time in an idle state (while an archived tape was being located by the operator, for example). A small but vital set of data about the job's status was retained by the operating system so that it could be 'rolled in' at a later time and restarted. In variants of this 'roll-in/roll-out' scheme the job, or portions of it, could be moved to extended memory, rather than to disk, to be restored later.

In the FMP, the normal mode of operation for batch jobs will consist of readying a complete image of the job on disk, transferring that image to the Backing Store (including code, data base, and supporting parameters), and, when space permits in the Intermediate Memory, rolling in the job from Backing Store to Intermediate Memory. At the completion of a job, its entire image is rolled out to backing storage and thence to disk, under some circumstances leaving the job of further data reduction of the rolled-out data to the SPS, or perhaps another job executed on the FMP itself.

The basic ability to perform a roll-in/roll-out operation opens a Pandora's box of possibilities for complicating the operating system. Performing checkpoint-restart images, for recovery in the event of a system failure, is one possible use of the roll-in/roll-out facility. Another would be 'interrupt roll-out' where, under special circumstances, the SPS (perhaps at the request of the user) interrupts the present job in the CPU. The job could be rolled out to backing storage or disk until the SPS performs either an ABORT or a CONTINUE function. Note that once these facilities are in place, the incentive to take the next potentially fatal step into time-sharing could become too enticing. It is at this point that the systems developers must exert some degree of discipline on FMP operating system design, so that the FMP doesn't become an abused, general-purpose, time-sharing machine, instead of a special-purpose, batch-oriented computational engine.

Finally, the research problems that are contemplated may require basic CPU-contained data bases of the order of 30 to 100 million elements that must be accessed at speeds higher than can be provided by existing rotating mass storage systems. Thus the apparent resulting requirement is for a production code memory of from 16 to 40 million words with expandability to about 200 million words.

Given the current technological predictions on componentry, it is not possible to construct a single, homogeneous memory out of one technology that would provide this range of memory capacity and still meet the bandwidth requirements of the concurrent functional elements of the FMP.

The overall block diagram (figure 1) shows the FMP to possess a three-level hierarchy of memory. Each level is designed with a particular set of bandwidth, access time, cost, and component counts commensurate with the volume of data contained. That is, in short, the larger the capacity the slower will be the access time and the lower the bandwidth, in exchange for a significant reduction in cost and failure rate on a per-bit basis. A fourth "invisible" level of memory exists which consists of extremely high performance components (effective access times on the order of 3-8 nanoseconds) which are used as register files and high speed buffers in the internal design of all functional units of the FMP.


Figure 1. Basic FMP Configuration

### 2.1.1.1 LEVEL 1 MEMORY

The first and most crucial memory is that called Main Memory (or LEVEL 1 memory). It is this memory that provides the effective bandwidth to supply operands to all parallel functional units. Not only is bandwidth a consideration, but single-element access time must be minimized in this level of memory so that those processes which are necessarily "purely" scalar can be carried out at the maximum rate, and essential "transpose" operations on single elements can be accomplished in minimum time. This level of memory then contains the most powerful and dense memory technology available. At this time the practical limits prescribe the use of a high speed bipolar Random Access Memory (RAM) part which is organized as a 4096 by 1-bit storage device with an access time of 16-20 nanoseconds.

The sheer cost and number of such devices needed to build a basic million-word unit of high performance memory make it impossible to use this technology uniformly throughout the machine. A reasonable limit, based on parts count reliability, power and cooling requirements, and the physical geometry for such a memory which affects access time, is the construction of 8 million 64-bit words employing this technology. If problems can be done in 32-bit mode, this memory could house 16 million 32-bit elements. The speed of this memory part makes it possible to organize the memory into four sets of eight banks which, when strobed in a systematic way, can deliver data (or accept data for storage) at rates up to 1024 bits per set every CPU clock cycle.

This memory is organized in modules and ports with a 32-bit half-word as the smallest writable segment. Thus the single error correction, double error detection (SECDED) system is organized in a similar manner, providing 7 bits of error correction/detection for every 32 bits of data stored in memory.
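The figure of 7 check bits per 32 data bits agrees with what a standard extended Hamming code would need. The sketch below is a hedged illustration in Python: the report does not say which SECDED code is used, so the Hamming construction and the function name `secded_check_bits` are assumptions. The same rule gives 8 check bits for a 64-bit word, the figure the report quotes for the LEVEL 2 memory.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SECDED) code over data_bits bits.

    Assumption: a standard Hamming code plus one overall parity bit;
    the report does not name the code actually used in the FMP.
    """
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional overall-parity bit provides double-error detection.
    return r + 1


for width in (32, 64):
    k = secded_check_bits(width)
    print(f"{width} data bits -> {k} check bits ({k / width:.1%} overhead)")
```

Organizing the correction on 32-bit half-words costs proportionally more check bits than a 64-bit organization (14 versus 8 per 64-bit word), which is the price of making the half-word independently writable.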

The minimum configuration of this Main Memory is 2 million words, a size necessary to preserve the banking relationships which support the large bandwidth of this memory system. Address trunks and other controls are provided for possible later technological extensions to memory chips of up to 65K bits. In this case the maximum memory available for LEVEL 1 could be 128 million words. It must be pointed out that such a component is not foreseen (at the requisite performance levels) for at least five or six years (well beyond the target time frame). Additionally, such parts will be more expensive than the current componentry, and thus might motivate a search for a more dense, and much less expensive part for the other memory hierarchies.

### 2.1.1.2 LEVEL 2 MEMORY

Since the maximum practical size of the high performance memory has been limited by engineering fiat to 8 million words, some means must be sought for holding the bulk of the nominal flow model data. There exists at present one memory system composed of medium performance 32K by 1-bit semiconductor devices (two 16K by 1-bit chips per package) which can provide a reasonably high bandwidth and single-element access times of approximately 125 nanoseconds. The projection by technological experts that a 131K by 1-bit device (two 64K by 1-bit chips per package) for this system will be available in the timeframe of FMP construction has been established with high confidence. A medium performance system can thus be configured for data storage of from 8 to 32 million 64-bit words (16 to 64 million 32-bit words) with peak bandwidths on the order of 20 billion bits per second. This memory would be organized on a 64-bit basis with 8 bits of SECDED for every 64-bit word.

Note that such a memory trades off an access speed 4 times slower than the Main Memory and a bandwidth 12 times lower for a parts count reduction of 8 times for the same volume of memory, and a probable cost ratio in favor of the LEVEL 2 memory of 4-6 to one.
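Two of these ratios can be cross-checked from figures given elsewhere in this section. The sketch below is an illustration in Python under stated assumptions: a 16-nanosecond CPU clock cycle (not stated here; it is the value implied by the roll-in timing quoted for the LEVEL 3 memory), the reading that all four Main Memory sets deliver 1024 bits each per clock, and the 21.3 gigabit-per-second Intermediate Memory bandwidth quoted in the data flow discussion.

```python
CLOCK_NS = 16                      # ASSUMED clock cycle in nanoseconds (implied, not stated)
MAIN_SETS = 4                      # four sets of eight banks
BITS_PER_SET_PER_CLOCK = 1024      # peak rate per set
LEVEL2_GBITS_PER_S = 21.3          # Intermediate Memory bandwidth quoted in the text

# bits per nanosecond is numerically the same as gigabits per second
main_gbits_per_s = MAIN_SETS * BITS_PER_SET_PER_CLOCK / CLOCK_NS
bandwidth_ratio = main_gbits_per_s / LEVEL2_GBITS_PER_S

MAIN_BITS_PER_PART = 4096          # 4096 by 1-bit bipolar RAM
LEVEL2_BITS_PER_PART = 2 * 16384   # two 16K by 1-bit chips per package
parts_ratio = LEVEL2_BITS_PER_PART / MAIN_BITS_PER_PART  # data bits only, SECDED ignored

print(f"Main Memory peak: {main_gbits_per_s:.0f} Gbit/s")
print(f"bandwidth ratio:  {bandwidth_ratio:.1f}")
print(f"parts ratio:      {parts_ratio:.0f}")
```

Under these assumptions the bandwidth ratio comes out close to 12 and the parts ratio to 8, matching the text.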

The very nature of Intermediate Memory (LEVEL 2) implies that the time required is greater to deliver data to other functional elements (Vector or Scalar Units, for example), thus some electronic delays are permissible in transmission lines between the Intermediate Memory and the other memory systems. This means that the LEVEL 2 memory can be engineered into a stand-alone unit which eases expandability (for the range of memory configurations) and improves accessibility for maintenance actions.

It is in LEVEL 2 memory that the bulk of all flow model data will reside for the large production problems. Smaller problems may, in fact, be totally contained in the 8 million-word Main Memory. The remainder of the LEVEL 2 memory will be used to stage other jobs in and out while the current job is in progress.

### 2.1.1.3 LEVEL 3 MEMORY

To simplify hardware scheduling of input and output and to provide a moderate performance memory for the large research problems, a third level of memory is shown on the block diagram (figure 1). This memory would be limited to block transfers only, of 32K elements each. By establishing this limitation, several high density, low cost, slow access technologies can be employed. This block transfer characteristic is particularly useful when considering the employment of charge coupled device (CCD) technology. Although the beginning of a particular block may take several milliseconds to reach the output port of the CCD shift register, this wait can be avoided by starting data transfer at any point in the block with the limitation to always transfer an entire block. At the cost of some counters in the CCD memory system and the Swap Unit to which it is attached, the access time to select a given block can be reduced to near zero.

The LEVEL 3 memory supplies or accepts data at an effective rate of 32 bits every clock cycle at its single data port. This data moves to/from Intermediate Memory (LEVEL 2) via the Swap Unit. If a 9 million-word job has been set up and held in this memory awaiting execution, it can be rolled in to Intermediate Memory in 18 million clock cycles, which is approximately 288 milliseconds, assuming no major conflicts in access to either the LEVEL 2 or LEVEL 3 memory. Since the expected length of execution for the nominal job is on the order of 5 to 10 minutes, there is obviously a large window in which the 288 milliseconds can be expended.
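The roll-in estimate can be reproduced as follows. This is a minimal Python sketch; the 16-nanosecond clock cycle is an assumption inferred from the quoted figures (288 milliseconds divided by 18 million cycles), not a value stated in this paragraph.

```python
JOB_WORDS = 9_000_000        # nominal job size in 64-bit words
BITS_PER_WORD = 64
PORT_BITS_PER_CLOCK = 32     # LEVEL 3 port rate
CLOCK_NS = 16                # ASSUMED clock cycle, inferred from 288 ms / 18 million cycles

cycles = JOB_WORDS * BITS_PER_WORD // PORT_BITS_PER_CLOCK
roll_in_ms = cycles * CLOCK_NS / 1e6

# Share of a 5-minute job that one roll-in would occupy if it were not overlapped.
share_of_5_min_job = (roll_in_ms / 1000) / (5 * 60)

print(cycles, roll_in_ms, f"{share_of_5_min_job:.3%}")
```

Even without overlap, the roll-in would amount to roughly a tenth of one percent of the shortest nominal run.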

The LEVEL 3 memory can be absent from the FMP configuration if initial installation requirements cannot justify its purchase; in this case, however, there will be some degradation in performance where "explicit" input and output are required by the executing code. The transfers to disk cannot be scheduled by the hardware, since a ready-resume LEVEL 3 memory is the normal I/O mechanism for the FMP. Therefore I/O transfers directly to rotating mass storage may involve many "lost revolutions" due to the priorities given the Map and Vector Units for memory access.

A better alternative for the initial configuration would be to install a minimum CCD memory system of 8 million words as the LEVEL 3 memory, even though this is a smaller capacity than the necessary 32 million words in LEVEL 2 memory (word = 64 bits plus SECDED). Software and hardware would thus operate exactly as they would in the final configuration. The LEVEL 3 memory is designed for a maximum of 256 million words, while current parts projections offer a practical limit of 128 million words for the 1980-1984 construction period.

### 2.1.1.4 MEMORY TO MEMORY DATA FLOW

Further knowledge of the memory hierarchy might be gained by following the movement of data through the FMP as it might occur for a large production problem. The initial flow field and mesh coordinates will have been stored on rotating mass storage (RMS) prior to initiating data transfer for a particular job.

1) While other jobs are in progress on the FMP, the incoming job's data is moved from RMS to the Backing Store through a buffer in Intermediate Memory. The I/O channel connections to the serial data trunk can provide 50 megabits per second of data at peak rate; however, the disk transfer rate is 38 megabits per second. With four channels transferring in parallel the rate is $4 \times 38 = 152$ megabits per second, but, on the average, half the time of each is spent reading and half is spent writing. Also, latency and gaps on disk reduce the effective rate approximately by a factor of 2. Therefore, the nominal 9,000,000-word data base (576,000,000 bits of data) can be moved in $576{,}000{,}000 / ((38{,}000{,}000 \times 4 / 2) / 2) = 15.16$ seconds, if no other memory conflicts or trunk conflicts exist. In reality the time required will be somewhat greater than this, as other demands for Intermediate Memory and Backing Store take priority over the I/O transfer. Sufficient bandwidth is provided to make this factor negligible. Intermediate Memory bandwidth is 21.3 gigabits per second, of which about one-fourth (or 5.3 gigabits per second) is available to I/O transfer.
2) Once the data is completely stored in the Backing Store (LEVEL 3 memory) the job is ready for staging into Intermediate Memory for execution. As stated previously, this staging process requires about 288 milliseconds for the entire data base.
3) During job execution the flow model program moves slabs of data from Intermediate Memory to Main Memory using the Map Unit. When processing is completed these slabs are moved back to Intermediate Memory. At the conclusion of the job, the resulting data base is swapped back to the Backing Store (or rolled out of Intermediate Memory). While the job is in execution, the program may call for the dumping of all or portions of the flow variables for output to the user. These I/O operations are normally staged to the Backing Store and then scheduled for transfer to RMS in block transfers at the convenience of the hardware system.
4) The final solution variables and intermediate ones dumped during execution are transferred back to RMS for further editing and evaluation by the support processing system.
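The disk transfer estimate in step 1 can be worked through numerically. The following Python sketch uses only the rates quoted in the steps above; the variable names are illustrative.

```python
DATA_BITS = 9_000_000 * 64        # nominal data base: 576,000,000 bits
DISK_BITS_PER_S = 38_000_000      # disk transfer rate per channel
CHANNELS = 4                      # channels transferring in parallel
READ_FRACTION = 0.5               # half of each channel's time is spent reading
LATENCY_FACTOR = 2                # latency and gaps roughly halve the effective rate

effective_bits_per_s = DISK_BITS_PER_S * CHANNELS * READ_FRACTION / LATENCY_FACTOR
transfer_s = DATA_BITS / effective_bits_per_s

# Intermediate Memory bandwidth available to I/O, in gigabits per second.
io_share_gbits = 21.3 / 4
headroom = io_share_gbits * 1e9 / effective_bits_per_s

print(f"effective disk rate: {effective_bits_per_s / 1e6:.0f} Mbit/s")
print(f"transfer time:       {transfer_s:.2f} s")
print(f"I/O share of memory: {io_share_gbits:.1f} Gbit/s ({headroom:.0f}x the disk rate)")
```

The memory bandwidth reserved for I/O exceeds the effective disk rate by more than two orders of magnitude, which is why the text treats memory contention as negligible for this transfer.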

Note that the Backing Store is not necessary for execution of the nominal job streams, but in the event that it is absent, the FMP may not be fully utilized. For example, some codes may require a high utilization of Intermediate Memory for data, leaving a small portion of memory available for staging operations. If one-third of the Intermediate Memory is reserved as a staging buffer for the next job, a minimum of 2 times 15.16 seconds will then be required to accomplish the transfer out of Intermediate Memory of a previous job's data and transfer into Intermediate Memory of the next job's data. This is a theoretical minimum of 30.32 seconds. It is possible that some jobs executing in the FMP will finish before the nominal data transfer can be completed and thus the FMP will become idle.
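The idle-time risk described above can be sketched with a simple model. This Python fragment is illustrative only: it assumes that the swap (previous job out, next job in) starts when the current job starts and fully overlaps its execution, and the function name `idle_seconds` is hypothetical.

```python
TRANSFER_S = 15.16                 # one-way disk transfer of the nominal data base
SWAP_S = 2 * TRANSFER_S            # previous job out plus next job in (theoretical minimum)


def idle_seconds(job_run_s: float, swap_s: float = SWAP_S) -> float:
    """Time the FMP sits idle if the swap outlasts the running job.

    Assumes the swap begins with the job and overlaps it completely.
    """
    return max(0.0, swap_s - job_run_s)


for run_s in (10, 30, 300):
    print(f"job of {run_s:>3} s -> idle {idle_seconds(run_s):.2f} s")
```

A nominal 5-minute job hides the swap entirely, while any job shorter than about 30 seconds leaves the FMP waiting on disk.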

For larger research problems where the data base cannot be completely held within the Intermediate Memory, a segment, or perhaps all, of the data would be held in the Backing Store, and transferred in "slices", "slabs", or "pencils" to the Intermediate Memory and thence dismembered by appropriate map operations. For such cases a Backing Store memory is required.

### 2.1.2 FUNCTIONAL PARALLELISM

One of the significant outcomes of the initial studies for the NASF was the unanimous conclusion that the computational power of the FMP would have to come from parallel processing techniques as well as from aggressive use of state-of-the-art technology. A major difficulty in the effective employment of parallel processes lies in the means of hardware and software control of such assemblies.

While good progress has been made in programming and management of parallel jobs and, in many cases, parallel tasks in large systems, the focusing of a multitude of identical processors on a single task has not yet reached practical utility in the field of general problem solving, particularly of the type seen in the NASA flow models. One current solution to the programming and control problem has been the evolution of "vector processing" which can perform arithmetic on a data stream (vector) at rates dependent on how many vector operands can be processed in a given clock cycle. This rate is a function of the bandwidth of the "vector unit" and the data ports that feed it. It is a reasonably simple design problem to provide a range of vector unit bandwidths, either by using extremely fast technology in the vector hardware, by pipelining the data through the units, or by providing several identical arithmetic elements in a pipeline all of which operate in "lock-step". The designer may also choose any combination or all of the above options to derive arithmetic performance. The key feature of this approach is that the programmer thinks and codes in terms of vectors without attempting to provide separate control to the system for each one of the vector processing elements. If this programming approach is carefully controlled, the user can remain unaware of the number of parallel arithmetic elements actually engaged at any one moment. The hardware control becomes one of identical operation of such parallel arithmetic units, each on a different portion of the incoming data stream.
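The point that the programmer need not know how many arithmetic elements are engaged can be illustrated with a small model. This Python sketch is hypothetical: the interleaved assignment of elements to pipelines and the name `vector_add` are assumptions made for illustration, not a description of the FMP hardware.

```python
def vector_add(a, b, n_pipes=4):
    """Add two vectors as if n_pipes identical units ran in lock-step.

    Each modeled pipeline handles every n_pipes-th element (an assumed
    distribution); the caller sees one vector operation either way.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have equal length")
    result = [None] * len(a)
    for pipe in range(n_pipes):
        for i in range(pipe, len(a), n_pipes):
            result[i] = a[i] + b[i]
    return result


a = [1, 2, 3, 4, 5, 6, 7]
b = [10, 20, 30, 40, 50, 60, 70]
print(vector_add(a, b, n_pipes=1))
print(vector_add(a, b, n_pipes=4))
```

Both calls produce the same result vector; only the modeled degree of parallelism differs.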

An examination of the flow codes shows that arithmetic processing is only a fragment of the total functional processing that must be done, since the data must be organized into vectors where necessary, or must undergo some form of scalar calculation in some cases. As is common in existing computer systems, direct methods exist for the parallel execution of arithmetic functions while data is being transferred concurrently to and from I/O channels; so too, the FMP can perform a variety of data movement activities, all simultaneous with vector arithmetic processing.

This concept of concurrent, asynchronous execution of different functions upon the data in memory is the major programming and hardware control principle of Control Data's proposed Flow Model Processor. The functions which can operate in parallel in this mode are:

1. Input/Output
2. Backing Store swaps
3. Intermediate Memory map operations
4. Main Memory map operations
5. Scalar processing, including management of the instruction stream
6. Vector processing

Input/output execution is similar to that in most other modern day processors. Control of the actual data transfers is handled by the Programmable Device Controllers (PDC) that attach the I/O channels to the network trunk. Requests to the PDC are found in the form of software "messages" which are stored in Intermediate Memory by the Operating System.

Backing Store swaps are handled by the Swap Unit, which behaves much the same as the PDC does in I/O transfers. Requests for swap operations are stored in Intermediate Memory by the Operating System and processed by "firmware" in the Swap Unit.

The previous functions are directed and controlled by software conventions established by developers of the Operating System and firmware. The remaining four parallel functions are hardware implemented processes that are initiated from the single FMP instruction stream. Once initiated, each separate function proceeds somewhat asynchronously until it is complete. Should one function depend on the data being processed by another function, the compiler can establish this relationship and ensure that hardware interlocks will prevent one operation from starting until its predecessor is complete, through the setting of dependency "keys". The coordination of Vector and Map Unit functions is handled by the Streaming Control Unit which resolves dependency conflicts, organizes and distributes the setup data, and initiates the appropriate streaming operation.
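The effect of dependency keys on concurrent units can be modeled in a few lines. This is a hedged Python sketch, not the FMP mechanism: the `Op` record, the `schedule` function, the unit names, and all cycle counts are hypothetical, and each unit is assumed to run one operation at a time in issue order.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    unit: str                 # functional unit that executes the operation
    cycles: int               # hypothetical duration
    waits_on: tuple = ()      # names of earlier operations (the dependency "keys")


def schedule(ops):
    """Return start and finish cycles for ops issued from one instruction stream."""
    unit_free, start, finish = {}, {}, {}
    for op in ops:  # issue order; waits_on must name previously issued ops
        ready = max((finish[d] for d in op.waits_on), default=0)
        begin = max(ready, unit_free.get(op.unit, 0))
        start[op.name] = begin
        finish[op.name] = begin + op.cycles
        unit_free[op.unit] = finish[op.name]
    return start, finish


ops = [
    Op("gather_slab", "main_map", 100),
    Op("vector_add", "vector", 50, waits_on=("gather_slab",)),
    Op("prefetch_next", "intermediate_map", 80),   # independent, so it overlaps
    Op("scatter_result", "main_map", 100, waits_on=("vector_add",)),
]
start, finish = schedule(ops)
for op in ops:
    print(f"{op.name:15s} start={start[op.name]:4d} finish={finish[op.name]:4d}")
```

The independent prefetch starts at cycle 0 alongside the gather, while the dependent operations wait for their predecessors, which is the behavior the interlocks are described as enforcing.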

Streaming operations involve the processing of sequentially stored (or vector) data. The optimal performance of the memory system of the FMP is achieved when all memory accesses are coordinated and a group of elements can be acquired at each access. The FMP memory design goal is to guarantee that groups of elements can be accessed and transferred to functional units on a regular and continuous basis, thus providing an unbroken stream of operands to all attached parallel processors. The Streaming Control Unit (see figure 2) delivers the appropriate setup data to the Map and Vector Units and then initiates the unit's activity. The scheduling of memory requests and the buffering of data are then handled by the specific functional unit (Intermediate Map Unit, Main Map Unit, Vector Streaming Unit).


Figure 2. Streaming Control Unit

Note: $\triangle$ denotes an undetermined number of bits.

The Map Units can operate independently with their own memory (Intermediate or Main Memory) and thus proceed concurrently, or they can be linked to perform transfer (map) operations between the two levels of memory. In all instances, the Scalar and Vector Units operate concurrently with any or all other functional units.

### 2.1.3 THE MAP UNITS

Block diagrams shown in figures 3 and 4 display the organization of the two Map Units. The specification and function of these systems are detailed in Volume II. The major function of the Map Units is to organize data from the original mesh structure into optimal length vectors for processing by the Vector Unit and then to reorganize the results into other mesh structures.

The various transformations that make up the mapping operations are described below.


Figure 3. Main Map Unit


Figure 4. Intermediate Map Unit

Note: $\triangle$ denotes an undetermined number of bits.

1. COMPRESS--A linearly stored vector of input elements is input to the Map Unit, along with a binary string of bits, one bit for each 64/32-bit data element. Each bit of the bit string is examined in order and if it is a one, the corresponding data element from the input vector is transmitted to the result vector. If the bit is a zero the corresponding element from the input stream is discarded (not transmitted to the result vector). This is illustrated in figure 5.


COMPRESS SOURCE WORD VECTOR A BY BIT VECTOR B GIVING RESULT WORD VECTOR R

COMPRESS ON "0"s IN B (SELECT ON "1"s IN B)

Figure 5. Example of Compress

2. MASK--This operation inputs two vectors of data elements and a single bit stream. As shown in figure 6, the input data streams could be labeled A and B. The bit stream is examined one bit at a time. If the examined bit is a one, the corresponding element from the A data stream is transmitted to the result vector, and the corresponding element from the B stream is discarded. If the bit is a zero, the corresponding element from the B stream is transmitted to the result vector, and the corresponding element of the A stream is discarded.


MASK WORD VECTOR A AND WORD VECTOR B UNDER CONTROL OF BIT VECTOR C TO GIVE RESULT VECTOR R

SELECT A STREAM ON "1"s OF BIT VECTOR (SELECT B STREAM ON "0"s OF BIT VECTOR)

Figure 6. Example of Mask

3. MERGE--This operation merges elements of two input data streams (A and B) according to a binary string of bits (see figure 7). If the examined bit of the string is a one, the next available element from the A stream is transmitted to the result vector; if the bit is a zero, the next available element of the B stream is transmitted to the result vector. No element is discarded from either stream. The effect is to combine all elements of the input streams into a single vector.


Figure 7. Example of Merge

A variant of this instruction (shown in figure 8) discards B stream elements if the bit is a one, but does not discard A stream elements under any circumstances. The effect of the operation is to simultaneously decompress the A stream and insert the decompressed elements into the corresponding positions of the B stream.

DECOMPRESS IS A COMBINED MERGE AND MASK FUNCTION

DECOMPRESS SOURCE WORD VECTOR A AND SOURCE WORD VECTOR B UNDER CONTROL OF BIT VECTOR C TO GIVE RESULT WORD VECTOR R

SELECT A STREAM ON "1"s IN C; SELECT B STREAM ON "0"s IN C

A STREAM IS MERGED; B STREAM IS MASKED

Figure 8. Example of Decompress

These three operations facilitate the selection of data from a matrix according to either preset criteria (a prestored bit vector) or data/execution dependent criteria (there are several instructions which generate the bit strings, called "control vectors", based on data comparisons by the Vector Unit). The compress, mask, and merge operations are not required for optimum performance of the 3-D implicit code, but are useful for the explicit codes and essential to weather and structures codes.
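The compress, mask, and merge operations, together with the decompress variant, can be expressed compactly as list operations. The following Python sketch follows the behavior described above; the function names and the sample data are illustrative and are not FMP instruction mnemonics or figure contents.

```python
def compress(a, bits):
    """Keep a[i] where bits[i] is 1; discard it where bits[i] is 0."""
    return [x for x, bit in zip(a, bits) if bit]


def mask(a, b, bits):
    """Position by position: take a[i] on a 1 bit, b[i] on a 0 bit."""
    return [x if bit else y for x, y, bit in zip(a, b, bits)]


def merge(a, b, bits):
    """Take the next unused element of A on a 1, of B on a 0; nothing is discarded."""
    ia, ib = iter(a), iter(b)
    return [next(ia) if bit else next(ib) for bit in bits]


def decompress(a, b, bits):
    """Variant: on a 1 take the next A element (B[i] is discarded), on a 0 keep B[i]."""
    ia = iter(a)
    return [next(ia) if bit else y for y, bit in zip(b, bits)]


bits = [1, 0, 1, 1, 0, 0]
a = [10, 11, 12, 13, 14, 15]
b = [20, 21, 22, 23, 24, 25]

print(compress(a, bits))                        # three selected elements of A
print(mask(a, b, bits))                         # same length as A and B
print(merge([1, 2, 3], [7, 8, 9], [1, 0, 0, 1, 1, 0]))
print(decompress([1, 2, 3], b, bits))           # A spread into the 1 positions of B
```

Note that `merge` and `decompress` assume the number of one bits equals the length of the A stream; a control vector generated by a Vector Unit comparison would supply the `bits` argument in the data-dependent case.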

The key instructions in solving the implicit code on the FMP, aside from the arithmetic operations, are:

4. GATHER--This operation is used to collect non-contiguous data elements into sequentially stored vectors which can then be processed efficiently by the Vector Units. The hardware can collect non-sequential records. A record is a group of sequentially stored elements which are accessed as a single entity by the Gather operation. Non-sequential single elements can also be collected by treating them as single-element records. Figure 9 illustrates the single-record Gather.

The elements (or records) to be moved are selected either by means of a list of indexes, each of which points to a record in memory, or by means of a "stride", which determines the number of elements to be skipped before another record is selected. In figure 9 a fixed stride of 10 was utilized to cause the movement of elements 00, 10, 20, 30, 40, 50, 60, 70, 80, and 90 to the sequential vector X. The vector Y in figure 9 was formed by using a list of indexes which pointed to elements 340, 630, 570, 493, 294, 596, 699, 798, 897, and 697 of the original mesh. Note that the selected elements need not appear in any particular order and thus can be essentially random. In place of a single element the indexed list could point to records of data. For example, the record length, RL, could be specified as 10 elements. The first record would begin at element 340 and would proceed sequentially through element 349. The next record would begin at the element pointed to by the next index, 630, and continue through element 639. Of course, in this case, result vector Y would be 10 times as long as in the single-element example.
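The two selection modes described above can be sketched as follows. This is an illustrative Python model, with memory represented as a flat list whose values equal their own addresses so the results are easy to check; the function names and the `rl` parameter are assumptions, not FMP instruction fields.

```python
def gather_stride(mem, start, stride, count, rl=1):
    """Collect `count` records of length rl, beginning at start, every `stride` elements."""
    out = []
    for k in range(count):
        addr = start + k * stride
        out.extend(mem[addr:addr + rl])
    return out


def gather_index(mem, indexes, rl=1):
    """Collect one record of length rl at each address in the index list."""
    out = []
    for addr in indexes:
        out.extend(mem[addr:addr + rl])
    return out


mesh = list(range(1000))   # element value == element address

x = gather_stride(mesh, start=0, stride=10, count=10)
index_list = [340, 630, 570, 493, 294, 596, 699, 798, 897, 697]
y = gather_index(mesh, index_list)
y_records = gather_index(mesh, index_list, rl=10)

print(x)
print(y)
print(len(y_records), y_records[:10])
```

With a record length of 10 the result vector is ten times as long, and its first record holds elements 340 through 349, as in the text.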


Figure 9. Examples of Gather with Fixed Stride = 10 and with Index List

In figure 10 the result vector is composed of records of length RL=10 taken from the mesh at stride intervals (ST=30) of 30 elements. The Map Unit takes the first record beginning at element 00, moving ten elements for that record; it then applies the stride of 30 to the first element address and begins the next record at element 30.


Figure 10. Gather Record with Fixed Stride = 30

The FMP provides for gather operations using strides in two directions (ST=n,m). Figure 11 shows the effect of using the two strides. First the hardware begins at element 00 and moves a record of length 1 (element). A stride of 10 is then applied (first stride) and the next record moved from element 10. Ten such records (NR=10) are moved, then the hardware restarts at the first element, applying the stride of 100 elements (second stride), and begins the process all over again until the specified vector length (VL=20) is filled. Strides may consist of any positive or negative integer value. It can be seen that with this instruction, two-dimensional and three-dimensional arrays can be transposed with a single gather instruction.
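The record and two-stride forms of the gather can be modeled with one function. The Python sketch below is a hedged illustration: the parameter names mirror the RL, ST, NR, and VL quantities in the text, the vector length is assumed to be counted in elements, and the function itself (`gather_two_strides`) is hypothetical.

```python
def gather_two_strides(mem, base, rl, st1, nr, st2, vl):
    """Gather records of length rl: nr records at stride st1, then restart at
    base plus a multiple of st2, until vl elements have been collected."""
    if rl < 1 or nr < 1:
        raise ValueError("record length and record count must be at least 1")
    out, outer = [], 0
    while len(out) < vl:
        for k in range(nr):
            addr = base + outer * st2 + k * st1
            if addr < 0 or addr + rl > len(mem):
                raise IndexError(f"record at {addr} falls outside memory")
            out.extend(mem[addr:addr + rl])
            if len(out) >= vl:
                break
        outer += 1
    return out[:vl]


mesh = list(range(1000))   # element value == element address

# Records of ten elements at a stride of 30 (the figure 10 case, first three records).
records = gather_two_strides(mesh, 0, rl=10, st1=30, nr=3, st2=0, vl=30)

# Two strides: ten single elements at stride 10, then restart 100 elements on.
two_strides = gather_two_strides(mesh, 0, rl=1, st1=10, nr=10, st2=100, vl=20)

# Transpose of a 3 x 4 row-major array with one gather.
rows, cols = 3, 4
matrix = list(range(rows * cols))
transposed = gather_two_strides(matrix, 0, rl=1, st1=cols, nr=rows, st2=1, vl=rows * cols)

print(records)
print(two_strides)
print(transposed)
```

The transpose follows from choosing the first stride equal to the row length and the second stride equal to one, so that each pass of the inner loop walks down one column.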


Figure 11. Gather Record with Two Strides
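A minimal Python sketch of the two-stride gather follows, assuming single-element records and non-negative addresses. The function name `gather_two_strides` and its argument names are hypothetical; only ST, NR, and VL come from the text. The second call shows how a row-major matrix could be transposed with one such gather, which is one reading of the transposition remark above.

```python
def gather_two_strides(mem, start, st1, st2, nr, vl):
    """Gather vl elements: nr elements spaced st1 apart, then restart at
    the first element offset by st2, and repeat until vl elements are moved."""
    if nr <= 0:
        raise ValueError("nr must be positive")
    out = []
    outer = 0
    while len(out) < vl:
        base = start + outer * st2          # second stride applied to the first element
        for r in range(nr):
            if len(out) == vl:
                break
            out.append(mem[base + r * st1])  # first stride applied within the pass
        outer += 1
    return out


# Figure 11 parameters: ST = 10,100; NR = 10; VL = 20.
mesh = list(range(1000))                    # stand-in mesh, value == address
fig11 = gather_two_strides(mesh, start=0, st1=10, st2=100, nr=10, vl=20)

# Transposing a 3 x 4 row-major matrix: step down a column (stride = 4 columns),
# then move to the next column (second stride = 1).
rows, cols = 3, 4
flat = list(range(rows * cols))
transposed = gather_two_strides(flat, start=0, st1=cols, st2=1, nr=rows, vl=rows * cols)
# transposed == [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```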
5. SCATTER-The scatter function operates as the inverse of the Gather in all of its options. Instead of collecting discontiguous elements however, this function distributes records (or elements) into memory according to the fixed stride or strides ( $\mathrm{ST}=\mathrm{n}, \mathrm{m}$ ) as specified in the instruction, or according to the list of indexes (for random storage of data). The scatter operation is illustrated in figure 12.


Figure 12 annotations: scatter word vector A to word vector R using index vector I, with vector A used in records of two words. Each word in I is added to the R base address to compute a store address in R for the first word of each record.

Figure 12. Example of Scatter
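The indexed scatter of figure 12 can be modeled as below. This is a sketch under stated assumptions: the function name `scatter_index`, the `r_base` parameter, and the sample data are invented for illustration, and the destination is assumed large enough to hold every record.

```python
def scatter_index(a, r, indexes, rl=1, r_base=0):
    """Store consecutive rl-word records of a into r; each index, added to
    r_base, gives the store address of the first word of a record."""
    for n, idx in enumerate(indexes):
        record = a[n * rl:(n + 1) * rl]
        dest = r_base + idx
        if dest < 0 or dest + len(record) > len(r):
            raise IndexError("record would fall outside the destination vector")
        r[dest:dest + len(record)] = record
    return r


# Hypothetical data: three records of two words each.
a = [1, 2, 3, 4, 5, 6]
indexes = [8, 0, 4]
r = scatter_index(a, [0] * 12, indexes, rl=2)
# r == [3, 4, 0, 0, 5, 6, 0, 0, 1, 2, 0, 0]

# Gathering with the same index list and record length recovers A,
# which is the sense in which scatter is the inverse of gather.
recovered = [w for i in indexes for w in r[i:i + 2]]
```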

These five operations can be performed one at a time independently and concurrently in each of the Map Units, one operating with the Main Memory (LEVEL 1) and one operating with the Intermediate Memory (LEVEL 2). The two Map Units can be combined to perform Gather or Compress from LEVEL 2 to LEVEL 1 or Scatter from LEVEL 1 to LEVEL 2.

### 2.1.4 THE SCALAR UNIT

The control of the entire FMP ensemble is accomplished by centralized decoding of a single instruction stream in the Scalar Unit and distribution of the controls for Vector, Map, and I/O operations to other separate units. The Scalar Unit is described in detail in the functional specification found in Volume II. Since this element is the executing heart of the FMP, it has been decided that this unit should consist of the most mature design and technology available. The CYBER 203 Scalar Unit was chosen since it meets the requirements for performance and architecture ("distributed operation"), and possesses extensive diagnostic programs for a large machine structure.

### 2.1.5 THE VECTOR UNITS

The major arithmetic processing by the FMP is performed by the Vector Unit assembly (Vector Ensemble), which consists of four identical, separately controlled units (pipelines), plus one additional unit which acts as an on-line spare. One Vector Unit is diagrammed in block schematic form and discussed in detail in the functional specification (see Volume II). The key features of the Vector Units are:

1) "Double Clocking"--The employment of pipelining of arithmetic operations makes it possible to use a faster clock cycle for these units than is required by the Scalar and Map Units. By using a clock period half that of the other FMP elements, the throughput of each unit can be doubled and the number of such units otherwise required reduced by half. This permits the designers to reduce the hardware components substantially over a normal "full clock" design. The reduction is not fully one-half, but rather closer to $35 \%$ since additional "latches" must be included in the design to hold operands between logic stages because of the speed of the faster clock.
2) "Variable Redundancy"--The reliability of the FMP is crucial to its success as a major facility. Since the Vector Units constitute a major portion of the non-memory hardware, the highest failure probabilities are found in these units. Various techniques have been suggested (and discussed in more detail in refs. 1 and 2) for ensuring the validity of results from the Vector Units. The two most prominent candidates were data parity (1 bit for every 8 bits of data) and modulus arithmetic. While these techniques provide proven validation of most of the data paths in an arithmetic unit, they are not effective for the high speed multiply networks desired for the FMP, and they provide no surveillance of the variety of control networks engaged in the Vector Units.

A very attractive option is to provide total redundancy of Vector Units, each with its own control and data paths. Then a pair of units could check each other. This approach is fraught with two difficulties. First, the amount of additional hardware more than doubles the volume of parts (and hence interconnections) needed for the Vector Units, and thus increases the likelihood of component failure to unacceptable levels. Second, the impact of the additional hardware is seen in much higher machine cost, and additional chassis volume which can affect vector startup timing.

An engineering compromise is possible for this dilemma. The system could provide the completely redundant hardware which could be used for validation of answers, or for improving performance. The extent to which the redundancy is controlled would be a function of the programmer or compiler's intent or capability. The block diagram of the Vector Unit shown in figure 13 displays the use of redundant hardware as there are duplicate frontend adders, duplicate multiply units, duplicate complement networks, and duplicate backend adders in each Vector Unit. By providing four Vector Units of this type, the memory bandwidth can be matched and a minimum operation rate for 64-bit add, subtract and multiply can reach a useful limit of 500 megaflops with fully redundant checking of all arithmetic except for Divide. Since the Divide operation uses the same hardware (except for the divide table) as the Add/Sub operation, the redundant checking of the Add and Subtract is relied upon to verify the probable reliability of the Divide operation.


Figure 13. One Vector Unit

In figure 13 four checking units (CHECK A, B, C, and D) are provided to compare the results from the various functional elements in the Vector Unit. Any combination of check units can be enabled for an operation, and if an error is detected by an enabled checker, a flag is sent to the Maintenance Control Unit (MCU) and the Vector Units are halted within six clock cycles of the failure detection. In the simplest case, where 500 megaflops is an adequate processing rate, data is fed from memory via buses SR1 and SR2 and selected through the corresponding trunks in the Vector Unit. This means that data on SR1 would be selected through TRUNK $A$ and TRUNK $C$ and data on SR2 would be selected on TRUNKS $B$ and $D$. The identical functions and path selections would then be enabled for both halves of the unit. The vector operation

$$
C=A+B
$$

with A coming from SR1, B coming from SR2, and C going to memory on AW1, would follow the path through FRONTEND ADDER 1 and FRONTEND ADDER 2 with the corresponding results being compared in CHECK A. The unnormalized add results would then be passed through the multiply network and the corresponding results of this "pass" operation compared in CHECK B. The final postnormalization is performed in the backend adders and those results compared in CHECK C. The data then passes through the SECDED generator which appends 14 bits of error correction code (per 64 bits of data) and is transmitted to memory. Note that the complement networks are not engaged in this particular operation; however, the data from trunks $B$ and $D$ (which should be identical) is automatically gated through these complement networks by the Vector Unit so that the results may be checked in CHECK D.

The above example demonstrates the use of total redundancy in the Vector Unit. To ensure that as much hardware as possible is being checked accurately, each element of the Vector Unit possesses its own control circuitry. The operation codes sent from the Scalar Unit carry a one-bit parity to each unit, which verifies its own operation code validity. All clocking, fan-out, fan-in, and microcode sequencing is then done entirely within that unit. This means that not only are the data paths verified, but the control sequencing and control fan-out are verified for each unit. This is a key part of the FMP Vector Unit design.
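The fully redundant path for C = A + B can be pictured with a toy software model in which both halves of a unit do identical work and a comparator sits after each stage. Everything here is an assumption made for illustration: integer arithmetic stands in for floating point, the "pass" and normalization stages leave the value unchanged, and the `fault` argument is a hypothetical way to corrupt one stage in one half.

```python
class CheckError(Exception):
    """Raised when an enabled checker sees its two inputs disagree."""


def stage(name, half, value, fault):
    # Hypothetical fault injection: corrupt one named stage in one half of the unit.
    return value + 1 if fault == (name, half) else value


def compare(checker, r1, r2):
    if r1 != r2:
        raise CheckError(f"{checker}: {r1} != {r2}")
    return r1


def checked_add(a, b, fault=None):
    """C = A + B with both halves of the unit performing the same operation."""
    result = []
    for x, y in zip(a, b):
        s = compare("CHECK A",
                    stage("frontend", 1, x + y, fault),
                    stage("frontend", 2, x + y, fault))
        p = compare("CHECK B",                     # "pass" through the multiply network
                    stage("multiply", 1, s, fault),
                    stage("multiply", 2, s, fault))
        n = compare("CHECK C",                     # post-normalization in the backend adders
                    stage("backend", 1, p, fault),
                    stage("backend", 2, p, fault))
        result.append(n)
    return result


c = checked_add([1, 2, 3], [10, 20, 30])           # no fault: [11, 22, 33]
```

Calling `checked_add([1], [2], fault=("multiply", 2))` raises `CheckError` from CHECK B in this model, which corresponds to the flag that would be sent to the MCU.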

In many instances however, the 500 megaflop rate is not satisfactory, particularly since the FMP objective is a sustained rate of at least 1000 megaflops. To achieve higher processing rates some of the additional hardware in each Vector Unit must be brought into play. Take as an example

$$
R=(A+B) * C
$$

with $A$ coming from SR1, $B$ coming from SR2, $C$ coming from SR3, and $R$ returning to memory on AW1.

The A and B operands are processed by FRONTEND ADDER 1 while the C operand passes through FRONTEND ADDER 2. Since the results coming from the two frontend adders cannot be identical in this case, CHECK $A$ is turned off. The sum $A+B$ is then fed to MULTIPLY 1 via MUX 1B and the multiplier $C$ is fed through MUX 1A. At the same time the frontend adder output is fed to MULTIPLY 2 via MUX 2A, and the multiplier is transmitted through MUX 2B. Thus the multiplier results can also be checked in CHECK B. By selecting SR2 through TRUNK D as well as TRUNK B, identical data will be fed through the complement network (although it has no use in this particular operation) and thus the results can be compared in CHECK D. The results of the multiply operation are post-normalized in BACKEND ADDER 1 and also BACKEND ADDER 2 so that they may be compared in CHECK C. The resulting normalized results are then sent to memory after SECDED has been generated.

In this example, three pairs of the four sets of arithmetic elements are checked for validity at each clock cycle. Depending on the operation desired and the processing rate required, the amount of checking can be varied from $100 \%$ to no worse than $25 \%$ of the actual hardware in the unit. Since an operation such as

$$
R=(A * B)+(C * D)
$$

will have different results emerging from the corresponding multiply elements, frontend adders, and complement networks, it might be desirable in some critical cases to force the object code to break up the operation into three parts, at a consequent loss in performance, in order to assure the validity of the answers:

$$
\begin{aligned}
& T 1=A * B \\
& T 2=C * D \\
& R=T 1+T 2
\end{aligned}
$$

This technique can be quite costly in storage space allocation however, and given a large mix of dyadic and triadic operations in the Vector Units, it can be expected that on a probabilistic basis the confidence in the results should be close to $100 \%$, if the variable redundancy technique is used. It is obvious that the FMP compiler should provide a compile time option which restricts the generation of object code to simple monadic operations where a programmer wishes to achieve $100 \%$ checking of results.
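The trade-off between the single-pass form of R = (A * B) + (C * D) and the three-part form can be illustrated with a small model. This is a sketch only: scalars stand in for vectors, the `fault_in_multiply_2` flag is a hypothetical fault injection, and only the multiply stage is modeled (the final add, which would also be duplicated and compared, is left unchecked here for brevity).

```python
class CheckError(Exception):
    """Raised when duplicated hardware produces differing results."""


def multiply(x, y, faulty=False):
    # A faulty multiplier returns a corrupted product in this model.
    return x * y + (1 if faulty else 0)


def single_pass(a, b, c, d, fault_in_multiply_2=False):
    """R = (A*B) + (C*D) in one instruction: the two multipliers compute
    different products, so their outputs cannot be compared."""
    t1 = multiply(a, b)                          # MULTIPLY 1
    t2 = multiply(c, d, fault_in_multiply_2)     # MULTIPLY 2
    return t1 + t2


def checked_multiply(x, y, fault_in_multiply_2=False):
    """One product computed on both multipliers and compared."""
    r1 = multiply(x, y)
    r2 = multiply(x, y, fault_in_multiply_2)
    if r1 != r2:
        raise CheckError(f"multiply mismatch: {r1} != {r2}")
    return r1


def three_part(a, b, c, d, fault_in_multiply_2=False):
    """T1 = A*B, T2 = C*D, R = T1 + T2, each product fully checked."""
    t1 = checked_multiply(a, b, fault_in_multiply_2)
    t2 = checked_multiply(c, d, fault_in_multiply_2)
    return t1 + t2


good = single_pass(2, 3, 4, 5)                                  # 26
silent = single_pass(2, 3, 4, 5, fault_in_multiply_2=True)      # 27, no warning
```

In this model the faulty multiplier silently yields 27 in the single-pass form, whereas `three_part(2, 3, 4, 5, fault_in_multiply_2=True)` raises `CheckError`, at the cost of three operations and two temporaries.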

The additional spare Vector Unit is physically connected to the data trunks at all times. The trunk network can be electronically switched by the Maintenance Control Unit (MCU) to extricate any failing Vector Unit, and reconfigure the system so that four non-failing units (including the spare) are placed back on-line. Since the spare unit is always connected to the trunk, it can be made to behave in identical fashion (function and data) with one of the other units, except for returning results to memory.

This unit is then checked in a continuous manner as are the other operating units. A comparator is also provided which compares the outputs of the spare unit and the unit which it is tracking; the MCU is notified of any non-compare. In addition, the spare unit can be logically isolated from the other units and then driven by the MCU (at a greatly reduced rate), returning results to the MCU, such that it can be at least partially diagnosed while the other units are performing useful work.

The Vector Units are capable of operating in 64-bit, 32-bit or mixed mode. When operating in 32-bit mode all arithmetic functions except divide can produce two floating-point results per cycle. Thus, a single Vector Unit can perform

$$
A *(B+C * D)
$$

in 32-bit mode and effectively yield 3 operations * 2 operands $=$ 6 results per cycle on one of two output ports. (The other output port can provide a partial result of that which appears at the first port, dependent on what the operations are; this is not considered in this example.) Using existing, high performance LSI technology, this cycle for the Vector Units can realistically be set at 8 nanoseconds. Four Vector Units would then produce 24 results every 8 nanoseconds, or 3 results every nanosecond, for a peak operations rate of 3000 megaflops.
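The peak-rate arithmetic above can be written out as a short calculation. The function and parameter names are assumptions chosen for the example; the inputs (3 operations, 2 results per operation in 32-bit mode, 4 Vector Units, 8-nanosecond cycle) are the figures quoted in the text.

```python
def peak_megaflops(ops_per_cycle, results_per_op, units, cycle_ns):
    """Peak rate in megaflops: floating-point results per nanosecond times 1000."""
    results_per_cycle = ops_per_cycle * results_per_op * units
    return results_per_cycle / cycle_ns * 1000.0


# 32-bit mode, A * (B + C * D): 3 operations, 2 results each, 4 units, 8 ns cycle.
rate_32 = peak_megaflops(ops_per_cycle=3, results_per_op=2, units=4, cycle_ns=8)  # 3000.0

# 64-bit mode, one operation per unit per cycle (the fully checked case).
rate_64 = peak_megaflops(ops_per_cycle=1, results_per_op=1, units=4, cycle_ns=8)  # 500.0
```

The second call reproduces the 500-megaflop figure quoted for fully redundant 64-bit add, subtract, and multiply.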

### 2.2 THE FMP OPERATION

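To illustrate the operation of the FMP a sequence of code is extracted from the three-dimensional implicit solution (appendix B):

$$
\begin{aligned}
& 930 \quad \mathrm{RJ}=\mathrm{Q}(2: \mathrm{KMAX}-1,\ \mathrm{L}: \mathrm{L}+\mathrm{LSM},\ 6,\ *) \\
& 940 \quad \mathrm{XKL}=\mathrm{X}(*,\ \mathrm{L}-1: \mathrm{L}+\mathrm{LSM}+1,\ 2: \mathrm{JMAX}-1) \\
& 950 \quad \mathrm{YKL}=\mathrm{Y}(*,\ \mathrm{L}-1: \mathrm{L}+\mathrm{LSM}+1,\ 2: \mathrm{JMAX}-1) \\
& 960 \quad \mathrm{ZKL}=\mathrm{Z}(*,\ \mathrm{L}-1: \mathrm{L}+\mathrm{LSM}+1,\ 2: \mathrm{JMAX}-1) \\
& 970 \quad \mathrm{XK}=(\mathrm{XKL}(3: \mathrm{KMAX},\ 2: \mathrm{LSL}+1,\ *)-\mathrm{XKL}(1: \mathrm{KMAX}-2,\ 2: \mathrm{LSL}+1,\ *)) * \mathrm{DY2} \\
& 980 \quad \mathrm{YK}=(\mathrm{YKL}(3: \mathrm{KMAX},\ 2: \mathrm{LSL}+1,\ *)-\mathrm{YKL}(1: \mathrm{KMAX}-2,\ 2: \mathrm{LSL}+1,\ *)) * \mathrm{DY2} \\
& 990 \quad \mathrm{ZK}=(\mathrm{ZKL}(3: \mathrm{KMAX},\ 2: \mathrm{LSL}+1,\ *)-\mathrm{ZKL}(1: \mathrm{KMAX}-2,\ 2: \mathrm{LSL}+1,\ *)) * \mathrm{DY2} \\
& 1000 \quad \mathrm{XL}=(\mathrm{XKL}(*,\ 3: \mathrm{LSL},\ *)-\mathrm{XKL}(*,\ 1: \mathrm{LSL}-2,\ *)) * \mathrm{DZ2} \\
& 1010 \quad \mathrm{YL}=(\mathrm{YKL}(*,\ 3: \mathrm{LSL},\ *)-\mathrm{YKL}(*,\ 1: \mathrm{LSL}-2,\ *)) * \mathrm{DZ2} \\
& 1020 \quad \mathrm{ZL}=(\mathrm{ZKL}(*,\ 3: \mathrm{LSL},\ *)-\mathrm{ZKL}(*,\ 1: \mathrm{LSL}-2,\ *)) * \mathrm{DZ2} \\
& 1030 \quad \mathrm{XX}(1)=(\mathrm{YK} * \mathrm{ZL}-\mathrm{ZK} * \mathrm{YL}) * \mathrm{RJ}(*,\ *,\ 2: \mathrm{JMAX}-1)
\end{aligned}
$$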

Found in lines 930 through 960 is a sequence of map operations which move data from Intermediate Memory (LEVEL 2) to Main Memory (LEVEL 1) as gather record operations. The stream of instructions is delivered to the Scalar Unit, which first performs whatever scalar setup of addresses and lengths is necessary for the first map function (line 930). In this case it consists of computing the length of the records to be transmitted, since the array $Q$ is dynamic and its dimensions must be computed at object time. The base addresses for the map operations must also be computed at the same time. The map operation is then sent to the Streaming Control Unit (SCU), which is assumed to be idle at the moment.

Since the map operation transmits data between the LEVEL 1 and LEVEL 2 memories, the Streaming Control Unit engages both Map Units, sets up the data paths, and transmits the appropriate internal functions to both units. While this process is underway the Scalar Unit continues executing scalar instructions which perform the interpretation of statement 940, computing record lengths and base addresses and transmitting the map instruction to the Streaming Control Unit. Since the Map Units are now busy moving data for the first operation, the incoming map instruction is queued in the Streaming Control Unit. The Scalar Unit continues instruction decode after this second map operation is accepted by the SCU. A third map operation is then set up and transmitted to the Streaming Control Unit. A fourth map operation is set up, transmitted, and queued by the SCU. The Scalar Unit continues processing the instruction stream, which probably contains other scalar setup operations, until the vector arithmetic called out by statement 970 is encountered. This vector instruction is set up and sent to the Streaming Control Unit. However, this instruction contains a dependency key of "01" which prohibits execution until the corresponding key becomes not busy. Since this key was assigned as a write key to the second map instruction, the vector instruction will wait in the SCU until this key becomes not busy, signifying that the corresponding data (Vector XKL) has been completely mapped into Main Memory.

The Scalar Unit proceeds to execute more instructions until the next vector operation (statement 980) is encountered. Since the first vector operation is held up and not yet in the streaming queue, the Scalar Unit cannot issue any more stream instructions to the SCU. The Scalar Unit then pauses until the previously issued vector instruction becomes free of its dependency key conflict. Upon completion of the map operation for Vector XKL, the next map operation is immediately commenced (for Vector YKL). The vector operation is then initiated on the XKL data, and the Scalar Unit can then issue the vector instruction for statement 980. Since the map operations proceed at a much slower rate than the vector operations, the vector operation at 970 will be done before the map operation which moves YKL data to Main Memory is complete. The next vector operation is then held up for its data (Vector YKL) by the same dependency key mechanism (although with a different key) as described previously. In this instance again, the Scalar Unit will pause until the dependency key becomes not busy. This same process continues for the Vector ZKL.
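The dependency-key interlock described above can be illustrated with a toy timeline. This is a simplified model under loud assumptions: the durations are invented, only key "01" is named in the text (the other key values are hypothetical), map operations are serialized on one resource and vector operations on another, and the Scalar Unit's own issue stalls are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    name: str
    kind: str                          # "map" or "vector"
    duration: int                      # hypothetical time units
    writes_key: Optional[str] = None   # key held busy until a map operation completes
    waits_on: Optional[str] = None     # key a vector operation needs to be not busy


def schedule(ops):
    """Return {name: (start, end)} for ops issued in program order."""
    map_free = 0        # time at which the Map Units become idle
    vec_free = 0        # time at which the Vector Units become idle
    key_free_at = {}    # time at which each dependency key becomes not busy
    times = {}
    for op in ops:
        if op.kind == "map":
            start = map_free
            end = start + op.duration
            map_free = end
            if op.writes_key is not None:
                key_free_at[op.writes_key] = end
        else:
            start = max(vec_free, key_free_at.get(op.waits_on, 0))
            end = start + op.duration
            vec_free = end
        times[op.name] = (start, end)
    return times


program = [
    Op("map 930 (RJ)", "map", 10, writes_key="00"),
    Op("map 940 (XKL)", "map", 10, writes_key="01"),
    Op("map 950 (YKL)", "map", 10, writes_key="10"),
    Op("map 960 (ZKL)", "map", 10, writes_key="11"),
    Op("vector 970", "vector", 2, waits_on="01"),
    Op("vector 980", "vector", 2, waits_on="10"),
    Op("vector 990", "vector", 2, waits_on="11"),
]
timeline = schedule(program)
```

With these made-up durations, statement 970 starts at time 20 when the XKL map finishes and completes at 22, but statement 980 cannot start until the YKL map finishes at 30, which is the pause the text describes.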

The sequence of pauses illustrated herein obviously affects the performance potential of the FMP. For that reason the actual code generated for this example includes prefetching the vectors XKL, YKL, and ZKL with the Map Units long before they are needed by the vector arithmetic. In actual fact, the fetching of the next set of data for these arrays is carried on concurrently with the vector arithmetic on the current XKL, YKL, and ZKL data.

Once the vector operation at statement 990 is initiated by the Vector Unit, because its dependency key has become non-busy, the remaining vector operations are set up and queued in the Streaming Control Unit without conflict since no dependency key is needed (the data is already known to be available).

Statements 970 through 1020 illustrate the use of two functional elements in each pipeline (a subtract operation followed by a multiply), which yields a floating-point rate of 1000 megaflops peak value. Statement 1030 performs two multiplies and one subtract in one instruction and thus yields 1500 megaflops peak rate computation in 64-bit mode. The remaining multiply by RJ is combined into a later vector arithmetic operation.

### 2.3 RATIONALE SUMMARY

The FMP hardware described in this report represents an example of a processor which, in the best judgment of RADL personnel, can be built and utilized in the period beginning in 1983-1984, and which will meet the performance and reliability goals (1 gigaflop, $95 \%$ availability) established as minimum by NASA-Ames. The final form represented here is necessarily the result of many complex tradeoffs involving schedule, timing, technology, manufacturing considerations, and the crucial reliability, availability, and maintainability (RAM) factors that must be taken into account for such a large assemblage of equipment. Some of the architectural decisions are obviously linked to Control Data's experience with the STAR-100 and the CYBER 200 family of Vector processors. A further, practical linkage involves the use of systems and diagnostic software for the FMP that can be derived at minimal risk from the CYBER 200 systems. The thought process that has been involved in these design decisions has extended through prior study periods into this present study with the appropriate rationale documented in previous reports (refs. 1, 2). For the most part these rationale remain unchanged from the previous studies but, since they are so critical to the outcome of the project, it is desirable to restate them here in order of priority to FMP success.

### 2.3.1 RELIABILITY, AVAILABILITY, AND MAINTAINABILITY

Although the FMP was founded on the premise of producing the fastest computer in existence in the 1980's in exchange for some "tailoring" of the hardware to match specific algorithms, the major consideration throughout the FMP design project has been that of reliability. The corollary issues of availability and maintainability were also included in this area of priority concerns. In another section of this report these topics are defined and discussed in more detail. Here the effect of these considerations on the FMP structure will be covered.

The overriding concern of designers for a system as large as the FMP is the parts count and the effect of this count on RAM. The hierarchical memory was arrived at (as previously discussed) by evaluating the performance and capacity requirements arising from the characteristics of the flow models. The high performance Main Memory was specified on the basis of the most powerful memory system currently available, the CYBER 200 two million word, 128 billion bits per second, central memory. This memory system, utilizing the recently available 4096-bit RAM chip, can hold 8 million words and achieve the same high performance levels as the 2 million word system. This memory requires over 175,000 parts. If a larger memory, say of 32 million words, were desired then over 700,000 parts would be required. Given a nominal failure rate goal of $0.01 \%$ of all parts per 1000 hours, that would mean 70 failures per 1000 hours or a failure every 14 hours. Even with single-error correction, double-error detection (SECDED) networks protecting the memory, the FMP would encounter an unacceptable level of interruptions as the probability of a double-bit error occurring becomes quite high.
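The failure-rate arithmetic in this paragraph can be checked with a short calculation. The function names are assumptions for the example; the parts counts and the 0.01% per 1000 hours goal are taken from the text.

```python
def failures_per_1000_hours(parts, rate_percent_per_1000h=0.01):
    """Expected part failures per 1000 hours at the stated nominal rate."""
    return parts * (rate_percent_per_1000h / 100.0)


def hours_between_failures(parts, rate_percent_per_1000h=0.01):
    """Mean hours between part failures implied by that rate."""
    return 1000.0 / failures_per_1000_hours(parts, rate_percent_per_1000h)


# 32-million-word memory built from 4096-bit chips: over 700,000 parts.
large = round(failures_per_1000_hours(700_000), 1)        # about 70 failures per 1000 hours
large_interval = round(hours_between_failures(700_000), 1)  # about 14.3 hours

# 8-million-word memory: over 175,000 parts.
small = round(failures_per_1000_hours(175_000), 1)        # about 17.5 failures per 1000 hours
small_interval = round(hours_between_failures(175_000), 1)  # about 57.1 hours
```

The same relation shows why parts count dominates the memory design: at a fixed per-part rate, the interval between failures shrinks in direct proportion to the number of parts.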

At the expense of lower performance, another level of memory could be built of 65K-bit chips to produce a memory system of 32 million words with only 38,000 parts (approximately).

Finally, if 128-256 million words are necessary for the solution of some research problems, the 65K part again becomes inadequate to the task, as so many parts are needed in the memory that the failure rate reaches intolerable levels, despite the employment of SECDED. Another storage hierarchy is thus indicated using higher capacity, lower performance memory components. If the predicted 256K-bit CCD memory part becomes available, the memory part count becomes approximately 36,000, with a concomitantly acceptable failure rate. The realistic production of 1 million bit bubble chips (which would reduce parts count for this level of memory to around 9000) appears to be possible in the 1982 timeframe. The design goals for LEVEL 3 bandwidth currently preclude the use of bubbles, however, because of the relatively slow serial transfer rate of bubble technology.
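A rough lower bound on these part counts follows from dividing the number of data bits by the chip capacity. The sketch below rests on several assumptions not stated in the text: a "million" words is taken as $2^{20}$, a word as 64 data bits, the backing store as 128 million words (the low end of the quoted range), and check bits and control chips are ignored. The approximate totals quoted in the report are somewhat higher than these data-only counts, presumably because they include SECDED check bits and control and interface parts.

```python
MILLION_WORDS = 2 ** 20   # assumption: binary "million"
WORD_BITS = 64            # assumption: 64 data bits per word, check bits excluded


def data_chips(words, chip_bits, word_bits=WORD_BITS):
    """Minimum number of memory chips needed to hold the data bits alone."""
    total_bits = words * word_bits
    return -(-total_bits // chip_bits)   # ceiling division


main_4k = data_chips(8 * MILLION_WORDS, 4096)                  # 131072
intermediate_65k = data_chips(32 * MILLION_WORDS, 65536)       # 32768
backing_256k = data_chips(128 * MILLION_WORDS, 256 * 1024)     # 32768
backing_bubble = data_chips(128 * MILLION_WORDS, 2 ** 20)      # 8192
```

The calculation makes the scaling argument concrete: each fourfold increase in chip density cuts the data-chip count by four for a given capacity, which is why each larger level of the hierarchy moves to a denser (and slower) part.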

The table below gives approximate chip counts for the major elements of the FMP. Chip types are as follows:

- CPU - LSI and Other (microcode memory/high speed buffers)
- Main Memory - 4K x 1-bit bipolar RAM
- Intermediate Memory - 128K x 1-bit MOS RAM
- Backing Store - 256K x 1-bit CCD

The memory chip counts include approximately $10 \%$ which are for control and interface; the balance is the indicated type of memory chip.

| Element | Chip Count |
| :---: | :---: |
| CPU | 11K LSI\* |
|  | 13K Other\* |
| Main Memory | 185K |
| Intermediate Memory | 20K |
| Backing Store | 40K |

\* LSI is the CDC LSI-168 gate array; other chips are for microcode memory and high speed buffers.

Overall reliability is determined largely by parts count and that effect influences design decisions about the memory. What are the other effects of parts count? First, the FMP logic should be examined to see how reliability can be affected by parts count in the remainder of the CPU. Control Data chose for the FMP the most technologically aggressive circuit family, in terms of density and speed, that can be expected to be available in the period identified for construction of the NASF. This family consists of Large Scale Integration (LSI) circuits of speed on the order of 700-900 picoseconds for typical logic elements. With this component the Scalar, Vector, Map, Memory Interface, and Streaming Control Units can be built using about 18,600 parts. This design would involve the building of nine Vector Units, each operating at a 16-nanosecond clock cycle. Although 18,600 parts are a relatively small number compared to the Main, Intermediate, and Backing Store Memories, it must be remembered that most of the memory parts are protected by SECDED, while a good deal of the CPU logic is not. By double-clocking the Vector Units (described previously) it is possible to construct the FMP with only five Vector Units (four operating units and one spare) with a consequent reduction in LSI parts of approximately 5400 or about $25 \%$.

A second effect of parts count is the need for interconnection of the parts involving solder and pressure connections, bonds, and metalized paths. The impact of interconnections is secondary to part failure rate in the reliability calculations, but still significant for a large ensemble such as the FMP (refer to Division 6).

A third effect of parts count is subtly linked to the availability and maintainability of a hardware network. Once a failure occurs, what is the probability that it can be corrected automatically, thus requiring no emergency maintenance activity? Further, if a maintenance action is required, what is the probability that the failing part can be isolated within a minimal time objective "TR" (Time to Repair)? These probabilities directly influence the probable RAM (reliability, availability, and maintainability) objectives for the FMP.

By basing the FMP on a large, homogeneous memory system, the designers have attempted to make maximum use of memory characteristics to affect the RAM. Memory, delightfully, consists of well ordered parts with limited interconnection. The use of SECDED on memory data not only provides a first level of defense against memory failures (by automatically correcting single errors), but provides information (via the check bits) which can be analyzed by the Maintenance Control Unit (MCU) to assist Service Engineers in the isolation of the failing components in minimum time. While there are failure modes in the memories that cannot be detected or corrected by SECDED (such as power bus, address line, and read/write strobe failures) over $98 \%$ of the memory parts are covered by SECDED.

SECDED should be carried throughout all data paths wherever possible to provide automatic correction, as well as detection, to the maximum extent. In the FMP, SECDED has been carried into the Vector Units up to the point where the data enters the unit and the check bits can no longer be retained (the data will be altered by arithmetic operations). SECDED is regenerated for results emerging from the Vector Units being transmitted back to memory. Double failures in these data paths (in the Map and Vector Units) will yield check bits that can aid the engineer in fault isolation in minimum time.
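How a SECDED code corrects single errors, detects double errors, and yields a syndrome that points at the failing bit can be shown with a small extended Hamming code. This sketch is illustrative only: it protects 4 data bits with 4 check bits, whereas the FMP code described above uses 14 check bits per 64 data bits, and the function names are assumptions.

```python
def secded_encode(nibble):
    """Encode 4 data bits as 8 bits: 3 Hamming check bits plus overall parity."""
    code = [0] * 8                  # code[0] = overall parity, code[1..7] = Hamming positions
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    for pos in range(1, 8):
        code[0] ^= code[pos]
    return code


def secded_decode(code):
    """Return (status, data, syndrome); data is None for an uncorrectable word."""
    code = list(code)
    syndrome = 0
    overall = 0
    for pos, bit in enumerate(code):
        overall ^= bit
        if bit:
            syndrome ^= pos         # XOR of the positions holding a 1
    if overall == 0 and syndrome == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1         # single error: the syndrome is the failing position
        status = "corrected"
    else:
        return "double", None, syndrome   # even parity but nonzero syndrome
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, data, syndrome


word = secded_encode(0b1011)
word[6] ^= 1                                      # inject a single-bit fault
status, data, syndrome = secded_decode(word)      # ("corrected", 0b1011, 6)
```

The syndrome returned alongside the corrected data is the kind of information a maintenance processor could log to localize a failing component, in the spirit of the MCU analysis described above.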

### 2.3.1.2 TRANSIENT ERRORS

A word must be said about the most infuriating culprit in large systems, transient errors in data and control. A totally failing component will generally make itself known rather quickly, either through the mechanism of the SECDED error detection system, on-line diagnostics, or abysmal failure of a "stable" production code. In large complexes of hardware such as the FMP, the possibility for transients occurring due to induced noise, bus fluctuations, marginal part operation, vibration, and perhaps such magical influences as gamma rays must be considered despite the best engineering efforts at shielding, margin testing and power system overdesign. Since a good portion of the system will be protected by SECDED, the effects of transients in the system will be invisible, except for a random, uncorrelated error report made by the SECDED networks to the MCU. There are times however, when a "hard" error in a data path, due to a component failure, will be correctable by SECDED except when a coincident transient appears. The probability of this situation occurring is dependent on the probability that at any one time there will be failing, but correctable, component errors in the system. This situation further depends on real failure rates and maintenance strategies.

Remembering that a double SECDED error will cause a system interruption, one must evaluate the frequency of maintenance replacement of components being compensated for by SECDED and the possible existence of transient errors occurring which could cause double errors to appear. A beginning maintenance strategy is therefore projected which minimizes the number of failed components being left in the machine in a given 24 hour period. As experiential data is accumulated, it may well be possible that the probabilities of transient errors, or coincident double component failures, in a given network may permit a more liberal maintenance policy, with consequent cost savings. At present, with the rate of transient failures impossible to determine until the hardware system is built, it seems necessary to specify the most conservative maintenance strategy with its attendant high costs (see maintenance study report and maintenance action assumptions in Divisions 6 and 7).

Another, and potentially more dangerous, consequence is the possibility of undetected transient errors occurring which affect critical result data. FMP users must be able to depend on their solutions without having to run a problem three times to get a majority vote on the most probable correct answer. As stated previously, a solid component failure should make itself evident during on-line diagnosis of the FMP, which is performed periodically during job execution. A solid, uncorrected, component failure might occur during a particular solution execution causing some or all results to be invalid. Generally, before these results can be propagated to other jobs or users, the on-line maintenance diagnostics would have found the error and warned the installation that jobs run since the last diagnostic pass are probably specious. While it is conceded that a totally redundant system would make the user instantly aware of the possibly invalid results, the cost and parts count for such a system make it prohibitive to build.

A transient error causes more havoc, however, since it may occur at a time when diagnostics are not being run, or at a place that cannot be checked by SECDED or by the Vector Unit's variably redundant comparators. Results from such runs would contain invalid data with no warning as to the fact that an error occurred which negates the particular run.

There are then two kinds of undetected errors that can yield bad results. The first kind, which can be diagnosed at a later time and hence causes the user to invalidate that set of answers, requires some degree of systems management and human action in evaluating the diagnostic messages to determine what, if any, results should be discarded. The second kind of undetected error will not be known to the user; it arises from the very small probability that a transient error occurs under undetectable circumstances. With over $95 \%$ of the hardware networks in the FMP being protected by SECDED and by the checking of the Vector Units, and with on-line diagnostics being employed on a regular basis to ensure a certain level of confidence is maintained in the machine, the probability of producing undetected errors in results is necessarily unknown but theoretically should be extremely low.

The effect that these types of failures (undetected by SECDED or vector checkers) have is to require a portion of the FMP power to be expended on a continuous basis throughout the operating day to establish a minimum confidence level. The interval and extent of diagnostic execution is determined by the maximum allowable period of time before results must be flagged invalid, and the time required for diagnostic operation to achieve a threshold of confidence. Thus, diagnostics become an additional "job load" on the FMP that must be taken into account when evaluating the total system throughput. (See discussion in System Simulation section later in this volume.)

### 2.3.2 BUILDABILITY

The second most important consideration in creating the FMP is ensuring that the machine can be built at all in the timeframe required. A major factor, of course, is the parts and interconnect count described previously. Obviously, it would be possible to conceive a machine that would entail the assembly of so many components that the sheer volume of soldering, bolting, and hookup exceeds the limit of errorless construction. The resulting chaos involved in removing fabrication errors (differentiated from component failures) might prove to overwhelm the manufacturing operation.

A second and equally important factor in determining buildability is the choice of component technology, not only electronic but mechanical, power, and cooling. The choice of circuit components naturally determines the requirements for cooling and power, while influencing packaging decisions meant to maximize the density of circuitry for performance and space reasons.

The original feasibility of the Control Data FMP was based on a postulated family of LSI circuits with densities of 500 gates and speeds under 500 picoseconds per gate. The painful process of creating a high technology, such as high speed LSI, and carrying it through volume production, made it clear by the second phase of this study that a 1982-1983 built FMP would have to be constructed out of extant technologies. Thus effort was applied to increase the parallelism of the architecture to achieve the gigaflop goal with existing circuit and packaging families. The culmination of this decision is found in the design described in this report. Certainly, if a family of logic could be found that was twice as fast, then the number of parallel units could be halved, with the very desirable reduction in parts. Likewise, if a family of memory components could be found with densities of 2 to 4 times that of the existing technologies, the parts counts, failure rates, and probably cost of the memory systems could be reduced accordingly.

It is the inevitable hope of designers and consumers of the FMP that the system could benefit from the most up-to-date, aggressive technologies available. To that end, each of the studies has examined technology futures with an eye to employing new developments in the FMP. Unfortunately, the choice of technologies cannot be delayed until after all design is complete and the machine is about to be constructed. The technological choices are integral to the initial architectural approach, the development of tools to support design and construction, design techniques, and the detailed design itself. Therefore, the FMP described in this report is heavily predicated on the use of the following family of parts: 4K bipolar RAM, 65K MOS RAM, 256K CCD memory, and Fairchild F200K logic. With this family, a machine can be built to meet the minimum reliability and performance goals established by NASA. Any major change (doubling speed or density, for example) in available technology would make a reassessment of the existing design from the ground up an essential task in an effort to reduce cost and parts counts.

The technological possibilities that appear to loom over the 1980 horizon are tantalizing to consider for the FMP if, and only if, the FMP were to be constructed, say, beginning in 1984-1985. Not only would solid progress be evident on high speed silicon parts, but the potential of the gallium arsenide technology should be proven (see technology update report, Division 5). It would then be conceivable that a machine with a two gigaflop computation rate and half the logic hardware might be built at very reasonable cost. The difficulty with proposing a delay in FMP development to await these "futures" to come to fruition is that there are too many unpredictable factors that can affect the outcome of such a strategy:
a) There is no doubt that the scientific problems presently known in development of either higher density/speed LSI or gallium arsenide logic components can be overcome. The essential question is -- in what time frame can they be solved?
b) The willingness of vendors such as Motorola, Fairchild or Texas Instruments to make the resource commitments and capital investments necessary to bring new, high technology devices into production is based only partly on technical feasibility. The projected profitability of a particular manufacturing line (based on volume and price expectations), the capital outlay requirements, and the general state of the economy are governing factors on the availability of the high performance, high risk, essentially low volume components that designers seek for FMP-type machines.

The effect of these issues on the architecture and design is obvious. A second, less visible, effect is the development time and resources needed to create and productize the tools necessary to utilize the new technologies. The time has passed when an engineer could gather a box of transistors (or small scale integration chips), sketch out a piece of design, and breadboard the affair with his own soldering iron. The very nature of LSI means that designers commit whole ensembles to single silicon slices requiring complex steps in manufacture. Design analysis and simulation software for a particular technology must be in place and operational before that technology can be effectively employed. There is a definite lead-time then between technology selection, useful design and simulation, and final circuit components that can be as long as five to seven years. The significance of these lead-times and uncertainties is that although there may be some technological "magic" awaiting the computer community in 1984-1985, it is not possible to consider using such components for the NASA FMP without incurring grave risk to schedule and buildability. Hence, the somewhat conservative choice of Fairchild LSI logic, for which a vast assemblage of design and manufacturing software and procedures is now available.

### 2.3.3 PERFORMANCE

Not last, or even least, in consideration is the element of performance for which the FMP project was originally created. At the outset, a minimum threshold of one billion floating-point operations per second was established for solutions of the Navier-Stokes equations. In theory, this level could be achieved by a single processor operating at a clock cycle of one nanosecond, or 1000 processors operating at a clock cycle of one microsecond. In practice, a one-nanosecond cycle time for floating-point operations on numbers with 48-bit coefficients is not yet achievable, while the harnessing of 1000 slow processors creates massive headaches in design of interconnection and control, not to mention programming. These issues have been discussed at length in preceding reports. The major tradeoffs in memory systems capability, number and speed of vector units, and performance of circuit technology have also been covered in previous reports as well as preceding discussion in this report. In the aggregate, then, the search for performance has involved:
a) determining the peak vector arithmetic rate to support a sustained solution rate of at least 1 gigaflop;
b) designing the minimum hardware conglomeration to provide the peak vector rate, and minimize vector startup;
c) isolating those functions in the implicit and explicit code which limit the FMP from maintaining the sustained rate;
d) designing a fully concurrent map unit system to perform those non-arithmetic tasks in parallel with computation;
e) designing a memory system that could supply the peak vector bandwidth plus the data rates needed by the map units;
f) testing the resulting structure with code sequences taken from the various flow metrics;
g) reworking the design to improve the performance of those limiting cases that are of significance;
h) testing the programmability of the flow codes with the resulting structure;
i) going back to step c) and trying again.

The conclusions reached for each of these items form the basic rationale for the design of the FMP as it now appears:
a) At the outset it was assumed that the arithmetic processing bandwidth of the FMP would have to be substantially greater than the minimum threshold of 1000 megaflops in order to arrive at a sustained rate of that minimum. This premise arises from the experience with existing high performance computers in actual use. The theoretical peak rate of the Control Data 7600 is limited to the issue rate of a new instruction every 27.5 nanoseconds, which yields at best 36 megaflops. The measured peak rate for memory-to-memory operations, however, is closer to 12-15 megaflops, due to the loads, stores, and interregister transfers. The nominal rate assigned to the 7600 for production codes is 3-5 megaflops in actual use. The STAR-100 possesses a peak vector rate of 50 megaflops in 64-bit mode. In certain large production codes, the sustained rate is shown to be 20 megaflops, which is a reduction of 2.5 to one over the peak rate. Other competitive machines demonstrate a similar degradation of 2-3 times from peak performance for a sustained rate.

These observations led the FMP architects to set vector processing rate goals of 2-3 gigaflops peak rate for initial design evaluation. If later it proved possible to improve the ratio between sustained and peak rates, the 2-3 gigaflop objective might be reduced somewhat.
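The arithmetic behind these figures can be checked directly. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the figures quoted above (a 27.5-nanosecond issue interval for the 7600, 50 megaflops peak versus 20 megaflops sustained for the STAR-100, and a 1000-megaflop sustained goal).

```python
def peak_rate_mflops(issue_interval_ns: float) -> float:
    """Peak megaflops if one floating-point operation issues per interval."""
    return 1000.0 / issue_interval_ns


def degradation(peak_mflops: float, sustained_mflops: float) -> float:
    """Ratio of peak rate to sustained rate."""
    return peak_mflops / sustained_mflops


def required_peak(sustained_goal_mflops: float, ratio: float) -> float:
    """Peak rate needed to sustain the goal at a given peak/sustained ratio."""
    return sustained_goal_mflops * ratio


if __name__ == "__main__":
    print(round(peak_rate_mflops(27.5), 1))  # Control Data 7600: about 36
    print(degradation(50.0, 20.0))           # STAR-100: 2.5 to one
    for ratio in (2.0, 3.0):                 # observed range of degradation
        print(required_peak(1000.0, ratio))  # 2000.0 and 3000.0 megaflops
```

Applying the observed 2-3 times degradation to the 1000-megaflop goal reproduces the 2-3 gigaflop peak target.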
b) The hardware necessary to meet the $2-3$ gigaflop peak rate could have been chosen from a variety of options, but pipelining was selected for two practical reasons:

1) Parts and interconnection count analysis indicated that pipelines could be built with fewer of these critical items than a multiplicity of processing units yielding the same compute power.
2) A substantial amount of design and development had already been completed on suitable pipeline elements for the Control Data CYBER 200 family of supercomputers.

An assemblage of 32 identical arithmetic pipelines, each operating at 16 nanoseconds to achieve a 2-gigaflop rate, is a perspicuously brute-force approach to the problem. The volume of hardware this would entail seriously strained the limits of buildability, which had been given higher priority than performance. The limit of pain in hardware seemed to indicate that eight pipelines, which would have the ability to perform more than one arithmetic process per clock cycle, demanded the maximum allowable componentry. The need for error checking led to the "variably" redundant structure that has been described previously, while the desire to have still fewer circuits motivated the design of "doubly" clocked units. As a result of these deliberations, the final Vector Units could deliver 64-bit result operands to memory at the rate of one per pipeline for each of four pipelines every eight nanoseconds.

Since this represents a result rate to memory of only 500 megaflops, how can the units be alleged to maintain rates commensurate with the 1-gigaflop requirement? Analysis of the three-dimensional implicit code showed that a majority of arithmetic expressions involved an average of three processes:

From the BTRI subroutine (see appendix B) line 4450:

$$
\mathrm{U25}=(\mathrm{B1}(2,5)-\mathrm{L21} * \mathrm{U15}) * \mathrm{L22}
$$

If each pipeline could perform all three operations on the data for each result (U25) returned to memory per clock cycle, then the arithmetic result rate could be bumped to $3 \times 500 = 1500$ megaflops. Using this technique as a basis, the implicit code was analyzed to determine if this "triadic" facility would be fully utilized. It was found by hand-calculated estimates that the FMP would probably operate at an average rate of 1200 megaflops for the whole code, assuming the data could be delivered to and removed from the pipeline to match that rate.
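The rate arithmetic and the shape of a triadic operation can be sketched as follows. This is an illustration under stated assumptions rather than a description of the hardware: the function names are hypothetical, the operands are treated as equal-length vectors, and one result per pipeline per clock is assumed as in the text.

```python
def result_rate_mflops(pipelines: int, clock_ns: float,
                       ops_per_result: int = 1) -> float:
    """Arithmetic rate when each pipeline returns one result per clock
    and each result embodies ops_per_result arithmetic operations."""
    return pipelines * ops_per_result * 1000.0 / clock_ns


def triadic(b, l21, u15, l22):
    """Element-wise (b - l21*u15)*l22, the form of BTRI line 4450.

    Each element costs three operations (multiply, subtract, multiply)
    but yields a single result to be returned to memory.
    """
    return [(bi - li * ui) * mi for bi, li, ui, mi in zip(b, l21, u15, l22)]


if __name__ == "__main__":
    print(result_rate_mflops(32, 16.0))     # brute-force design: 2000.0
    print(result_rate_mflops(4, 8.0))       # one operation per result: 500.0
    print(result_rate_mflops(4, 8.0, 3))    # triadic: 1500.0
    print(triadic([10.0, 8.0], [2.0, 1.0], [3.0, 2.0], [0.5, 2.0]))
```

The number of results delivered to memory is unchanged between the second and third cases; only the arithmetic work folded into each result differs.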

In some instances it is not possible to keep the maximum number of arithmetic elements in the pipeline busy with "triadic" or "dyadic" operations. For example, from lines 1070 and 1080 of the implicit code (see appendix B):

$$
\begin{array}{ll}
1070 & \mathrm{D}(1,2)=\mathrm{XX}(1) * \mathrm{HDX} \\
1080 & \mathrm{D}(1,3)=\mathrm{XX}(2) * \mathrm{HDX}
\end{array}
$$

two separate vector operations would normally be generated by the FORTRAN compiler, with a consequent result rate of 500 megaflops for each separate multiply. The FMP provides two separate data trunks to memory and four separate data trunks from memory, which permits a more intelligent approach: generating one instruction to perform both multiplies concurrently. In this case the four pipelines behave as if they were eight, and no data comparison is done by the CHECK networks in the pipelines.

Another use of the two separate result streams to memory is the storing of partial results on one trunk and full results on another. For example, take the sequence from the BTRI subroutine, lines 4720 and 4730:

```
4720 D1=L11*C1(1,M)
4730 D2=L22*(C1(2,M)-L21*D1)
```

wherein the result D1 could be stored to memory on the AW1 result trunk while the partial result (C1(2,M)-L21*D1) could be stored via AW2, for later use. Note that in this case the capability of the Vector Units for performing three arithmetic operations per clock cycle is achieved by a simple compiler technique of combining functions from "common subexpression analysis".
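The combination described above can be sketched in software terms. The following is a hedged illustration, not the compiler's actual output: the function names are hypothetical, the operands are assumed to be equal-length vectors, and the two returned lists merely stand in for the AW1 and AW2 result trunks.

```python
def first_pass(l11, l21, c1_row1, c1_row2):
    """One combined vector operation: three arithmetic operations per
    element, two result streams.

    aw1 carries D1 = L11*C1(1,M)                (line 4720)
    aw2 carries the partial result C1(2,M) - L21*D1, kept for later use
    """
    aw1, aw2 = [], []
    for a, b, x1, x2 in zip(l11, l21, c1_row1, c1_row2):
        d1 = a * x1                  # multiply
        aw1.append(d1)
        aw2.append(x2 - b * d1)      # multiply, then subtract
    return aw1, aw2


def second_pass(l22, partial):
    """Completes line 4730: D2 = L22 * (stored partial result)."""
    return [c * p for c, p in zip(l22, partial)]


if __name__ == "__main__":
    d1, partial = first_pass([2.0], [1.0], [3.0], [10.0])
    d2 = second_pass([0.5], partial)
    print(d1, partial, d2)
```

Because D1 appears in both source lines, computing it once and reusing it inside the same pass is what lets a single instruction carry three operations.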

The 1.2 gigaflop rate is, of course, not the 2-3 times safety factor the designers had been seeking. Achieving that would require doubling the hardware complex to 8 of the triadic, eight-nanosecond pipelines to yield 2*1.2=2.4 gigaflops. This would in turn require a doubling of the memory bandwidth at the cost of additional hardware and trunks that again strain the buildability constraints. Two other avenues were open to ameliorate this situation:

1) Use of 32-bit arithmetic--The CYBER 200 family provides a dual arithmetic system which is used for improving throughput, as well as data storage, for those problems that can tolerate the reduced significance. The design of pipelines for this dual mode demonstrates that the additional hardware needed to split a 64-bit pipeline into two 32-bit pipelines, dynamically and when the occasion warrants, is insignificant. In the CYBER 200 structure such pipelines can then deliver two 32-bit results for every one 64-bit result, in most cases doubling the result rate of the pipelines. If this technique is applied to the FMP, then four FMP pipelines could produce a peak rate of 8 result operands delivered to memory every 8 nanoseconds. This means that if a single arithmetic operation is performed for each result, a minimum of one gigaflop is reached in 32-bit mode. However, when the triadic capability of the pipelines is applied to 32-bit mode, a peak rate of 3000 megaflops is attainable. In the projected case for the implicit code this would mean a computation rate of $2 \times 1.2 = 2.4$ gigaflops. At least for 32-bit cases this architecture, with minimal hardware, provides the purported 3 gigaflop peak rate. The determination of whether or not certain computations can be done in 32-bit mode seems to elude even the most expert numerical analysts, and most probably must await empirical testing of algorithms in both modes. Since in all likelihood computations could most effectively be done in a mixture of 32-bit and 64-bit forms, depending on knowledge of the numerical behavior of a given calculation, the FMP Vector Units have included the ability to perform operations on mixed 32/64-bit data streams.
2) Maximize the utilization of the Vector Units--What if the computations must be done in 64-bit mode and latitude has been exhausted for adding more hardware for parallel processing? If the ensemble could be made to function at near $100 \%$ of its capacity (full triadic functions, full time for the whole code), then the 1.2 gigaflop rate would be maintained as the average rate thus achieving the "quintain". It was at this point that the concept of a "tailored" machine for a particular problem environment (wind tunnel flow models) became an aid. Using metrics provided by Ames which were chosen to reflect the desired characteristics, it was possible to determine what hardware emphasis was needed to maximize vector unit utilization. This leads to the next step.
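The figures in these two avenues follow from the pipeline parameters already given. The sketch below restates them; the constant and function names are hypothetical, and the utilization figure is simply the ratio of the estimated 1200-megaflop average to the 1500-megaflop triadic peak, not a number stated in the study.

```python
PIPELINES = 4
CLOCK_NS = 8.0


def peak_mflops(ops_per_result: int, results_per_clock: int) -> float:
    """Peak rate; results_per_clock is 1 in 64-bit mode and 2 in 32-bit
    mode, where each pipeline splits into two half-width pipelines."""
    return PIPELINES * results_per_clock * ops_per_result * 1000.0 / CLOCK_NS


def utilization(sustained_mflops: float, peak: float) -> float:
    """Fraction of the peak rate actually sustained."""
    return sustained_mflops / peak


if __name__ == "__main__":
    print(peak_mflops(1, 2))                      # 32-bit, one op per result
    print(peak_mflops(3, 2))                      # 32-bit, triadic
    print(2 * 1200.0)                             # projected 32-bit implicit code
    print(utilization(1200.0, peak_mflops(3, 1)))  # 64-bit implicit code
```

Under these assumptions the implicit code uses about four fifths of the 64-bit triadic peak, which is the gap that avenue 2) aims to close.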
c) Once a memory bandwidth and vector unit capability were established, it became necessary to prove that it could be utilized effectively or the design had to be changed accordingly. Beginning in the first study Control Data had concentrated primarily on the storage capacity of realistically assembled memory systems and the producibility of sufficient arithmetic power. Throughout these early study efforts, Ames personnel cautioned that the data flow required by the large flow model might dominate all other considerations. Using the Vector Units to maximum capacity requires the organization of data into linearly stored vectors of data in memory which can be transferred in groups of elements at each single memory request. Some means had to be found to perform this organization in parallel with computations and without impacting the vector rate.
d) The concept of independent Map Units which could operate concurrently with each other and with the Vector Units arose naturally from this analysis. An initial desire for these units to be as simple as possible had to be compromised with the activities that seemed to be natural to assign to them. If the Vector Units were to be solely concerned with arithmetic processing, then all other memory-to-memory vector operations would have to be performed by the Map Units. The functions to be executed in the Map Units were identified by reviewing STAR-100/CYBER 200 experience with vector processing and examination of the candidate metrics (aerodynamic and weather models).

At first a single Map Unit seemed to be sufficient for the known purposes, moving data only within Main Memory, with all other LEVEL 2 data being transmitted in block form. The split operator method of the implicit code gave rise to easily isolated "transpose" and "slicing" operations that could just as well be performed on the data as it is moved between the LEVEL 1 and LEVEL 2 memories. A pair of seemingly identical units (for programming flexibility) were then specified. Many programming alternatives become evident from this structure, none of which have been explored during this study. An example might be the performing of full transpose operations on meshes and mesh segments in Intermediate Memory while simultaneously performing different transposes on other data in Main Memory. As will be seen later from simulation results, the existing metrics do not fully exploit this concurrency to the degree that seems possible (see section 5).
e) The memory system then becomes the key to the entire design of the FMP, as it must supply continuous streams of data not only to the Vector Units but to the concurrently operating Map Units. This is accomplished by taking the identical memory units used in the CYBER 200 family and increasing the apparent number of memory modules by dividing an existing physical module into four separate accesses. Thus where the CYBER 200 memory can deliver a peak rate of 1024 bits every 20 nanoseconds (actually the memory system can provide data for a range of clock cycles from 10-20 nanoseconds), the FMP system delivers 4096 bits of data (ignoring SECDED) every 16 nanoseconds for a peak bandwidth of 256 billion bits per second. The block diagram shown in figure 14 displays the basic design of the Memory Interchange which manages the data and address streams for the memory system. It is this unit and the data buffering in the Vector and Map Units that guarantee sustained data rates to all elements in the Vector Unit, no matter what the degree of activity is for all components requesting memory. It should be pointed out that the number of LSI panels and cabling for this scheme are considered to be at the maximum allowable by current manufacturing technology. It is felt that no way exists in which this memory system could be extended realistically, and thus no way that a doubling of arithmetic units could be supported with current design and packaging techniques.


Figure 14. Memory Interchange
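The bandwidth figures quoted for the two memory systems can be verified with a short calculation. The function name is hypothetical; the inputs are the figures in the text (1024 bits every 20 nanoseconds for the CYBER 200, 4096 bits every 16 nanoseconds for the FMP, SECDED bits ignored).

```python
def bandwidth_gbits_per_s(bits_per_access: int, cycle_ns: float) -> float:
    """Peak bandwidth; one bit per nanosecond equals one gigabit per second."""
    return bits_per_access / cycle_ns


if __name__ == "__main__":
    cyber200 = bandwidth_gbits_per_s(1024, 20.0)
    fmp = bandwidth_gbits_per_s(4096, 16.0)
    print(cyber200)          # 51.2 billion bits per second
    print(fmp)               # 256.0 billion bits per second
    print(4096 // 64)        # 64-bit words delivered per 16 ns access: 64
```

The fivefold gain comes from two sources: four times as many bits per access, from dividing each physical module into four separate accesses, and a cycle shortened from 20 to 16 nanoseconds.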
f) The structure of the FMP as it has passed through its various gestative phases has been tested by estimating the performance of selected segments of the implicit, explicit and weather models which are being used as the analytical metrics for this study. More will be said about this matter in later discussions of the simulation systems for the FMP. The FMP design as it now stands is the culmination of this iterative structuring, testing, and restructuring effort, and does meet the performance requirements of at least the implicit code.
g) Primary emphasis in the analytical effort has always been focused on the three-dimensional implicit model. Only in the last weeks of the study, as CDC engineers had satisfactorily solved the problems in the buildability and RAM features of the FMP, was the full impact of the hardware architecture known with regard to the implicit code. Since that time the FMP design has been evaluated against the remaining three metrics. On a "first try" basis, each of the other metrics has yielded less than the desired 1000 megaflop computational rate. In some cases, the initial mapping of the scalar algorithms did not match the architectural strengths of the FMP. Complete reanalysis and recoding in these cases is called for. Shortcomings in the design have been unearthed also. At some point in the study, however, "tinkering" with the design had to end, so that a final report could be completed. Although no design changes are contemplated to optimize the other metric codes at this time, work is continuing on finding appropriate restatement of these codes in the proposed FMP FORTRAN which can approach the 1-gigaflop goal for each metric program.
h) Coupled with all of the above issues has been the problem of programmability, about which more will be stated in the "Software Section" of this report. As hardware features were added or modified in the basic FMP structure, their effect on a possible FORTRAN language system had to be assessed. The continuation of a basic single instruction stream, multiple data stream (SIMD) architecture obviated the need for many FORTRAN language changes, while the ability of a compiler to schedule, stack, and overlap map operations with vector operations impacted the design of the Map Unit and Streaming Control Unit, particularly with regard to the need for and implementation of the "dependency keys".
i) Needless to say, the process of steps c) through h) is iterative and will continue in some form even after submission of this report. Once a prototype FMP has been "poured in concrete" as the result of the need to freeze the design for this report, analysts can spend considerable time learning what the structure really holds in store for programmers, particularly of applications in the areas of the NASF metrics (aerodynamics and weather).

### 2.3.4 MAINTENANCE

The discussion of first priorities earlier covered the concept of RAM--Reliability, Availability, and Maintainability. The design considerations under that heading dealt primarily with probabilistic circumstances affecting RAM, particularly due to the volume of parts and interconnections. Another aspect of the hardware that must be considered in a different light is the marriage of hardware, firmware, software, and procedures to maximize the availability of the FMP. Supporting documents (see Divisions 7 and 8) have been prepared by Control Data specialists as "position papers" on the maintenance strategy and diagnostic strategy that should be pursued for FMP-scale systems.

A cornerstone of FMP availability is the ability for the system to recover, diagnose, isolate, and be repaired in the minimum interval of time lost to the consumer. In addition to the issues discussed in the position papers, certain aspects of the FMP impel offering some supplementary commentary.
a) The maintenance function as a hardware concept--

The magnitude of the NASF makes complex interactions of its constituents inevitable. The quantity and complexity of the elemental relationships in this system further ensure that even trained personnel will find the evaluation of many system failures difficult or nearly impossible in the brief time allowable for interruptions during operational hours. Computerized assistance is mandated in this situation; however, none of the potentially failing computers in the system can be expected to diagnose itself. A separate computer could be given this task, but if it becomes itself debilitated then it could become more of a debit than of value to the overall availability of the system as a useful resource. The need then is for an abstract concept that can be fleshed out, and farmed out to the appropriate programmable processors in the system. The reason for discussing the maintenance function in the abstract is that it first can be created and described as if it were a centralized function, despite the fact that in actual implementation it is distributed as widely as the functions it is trying to manage.

Philosophy--Each programmable entity in a system must possess means to disclose to the outside world the existence and nature of any failure that might occur, even when the flaw is correctable. A failing entity should not be required to diagnose itself, let alone cure itself, but should provide as much information about itself as possible, as would a patient in a clinic, to speed the diagnosis and treatment process. The maintenance function, like a conscientious physician or health service, must collect, collate, and analyze all physical symptoms in health as well as in illness. For like the physician to whom every clue and pattern could be significant, the contemporary system practitioner needs every allusion in pursuit of quick repair. Diagnosis and collection of maintenance clues must be included in the normal workload of each system component.

Implementation--From the outset of design all hardware and software components must become subscribers to the philosophy. "Hooks" must be built into all elements of the system where engineering analysis indicates potential need and practicability. What does this mean to the FMP?

First, some means of testing, measuring, and sampling activity must be engineered into all critical networks of the system. This implies that some systematic study should be done to identify the critical networks. In the FMP some obvious places to put the diagnostic "stethoscope" come to mind. The SECDED checking networks not only must supply error indicators, but also should provide "Polaroid" glimpses of the failing data segment and associated addresses to the maintenance monitoring system. The comparator networks in the Vector Units must also feed failure information. Not only should an error flag be recorded, but the actual data mismatch must be frozen and registered in the error logging of the maintenance system.

In the FMP, SECDED check bits are carried as far along data trunks as possible so that the trunks and intervening paths can be corrected, as well as the memory. This does not mean that the SECDED checking networks need be limited to the trunk ends, as isolation then becomes more difficult. Instead, the checking networks can be placed at strategic locations in the system. The current design calls for placement of SECDED checking in all hardware blocks where the symbol SECDED appears in figure 15. SECDED correction, however, is limited to the ends of the trunks only.


Figure 15. SECDED in the FMP

As can be seen in the block diagram of figure 15, Main Memory SECDED will be checked at the Memory Interchange as the data is transmitted to the Vector Unit. The SECDED is again checked within the Vector Unit, and any correction applied there, if needed. The major difference between these two SECDED networks is that since the first network does not actually modify bits in the data stream to perform correction, it can be an "out-of-line" circuitry which does not add time to the data path. The potential correction of data in the Vector Unit, however requires logic to be interposed in the data path at a consequent cost of several nanoseconds in transmission time. The isolation aid that the additional SECDED checking gives should be obvious from the diagram. A SECDED error reported by all port checking units in the Memory Interchange, for example, would point to a basic data failure in memory. A single port SECDED failure would indicate a problem in that particular port's trunk hardware. An error reported only by the Vector Unit SECDED network indicates a failure in the data trunk or the paths in the Vector Streaming Unit (which include high speed shift networks).

This simple example illustrates the complex combinations of symptoms that need to be evaluated in light of the architecture to arrive at a specific failure point where the maintenance engineer can begin his search. A computerized correlation and display of the possible failure points would surely speed up the isolation process.
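A computerized correlation of this kind might begin with rules as simple as the three cases just described. The sketch below is only an illustration: the function name, the port labels, and the wording of the verdicts are hypothetical, and a real correlator would need many more rules than the three the text gives.

```python
from typing import Mapping


def isolate_secded_fault(port_errors: Mapping[str, bool],
                         vector_unit_error: bool) -> str:
    """Map SECDED symptoms to a suspected failure region.

    port_errors       -- error flag from each Memory Interchange port checker
    vector_unit_error -- error flag from the Vector Unit SECDED network
    """
    failing = sorted(name for name, err in port_errors.items() if err)
    if not failing and not vector_unit_error:
        return "no SECDED error reported"
    if len(port_errors) > 1 and len(failing) == len(port_errors):
        # Every port checker sees the error: the data itself is bad.
        return "suspect basic data failure in Main Memory"
    if len(failing) == 1:
        return "suspect trunk hardware of " + failing[0]
    if not failing:
        # Only the Vector Unit network reported the error.
        return "suspect data trunk or Vector Streaming Unit paths"
    return "ambiguous symptoms: further correlation needed"


if __name__ == "__main__":
    print(isolate_secded_fault({"port0": True, "port1": True}, True))
    print(isolate_secded_fault({"port0": False, "port1": True}, False))
    print(isolate_secded_fault({"port0": False, "port1": False}, True))
```

Combinations outside the three documented cases are deliberately reported as ambiguous rather than guessed at.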

In addition to the SECDED and Vector Unit comparators, certain second order error detection and reporting schemes are employed in the FMP.

- Illegal function detection--Each of the independent units possesses fewer operating modes than the function codes transmitted to them are capable of encoding. For example, not all of the 256 potential operations available in the main instruction stream are legal. Thus all illegal functions, when detected, bring the FMP to a halt and send a flag to the Maintenance Control Unit (MCU).
- Time-out detection--Certain of the FMP units expect a maximum allowable time for response to function and memory access requests. If the maximum time is exceeded the FMP is halted and an error flag sent to the MCU.
- Monitoring counters--Counters internal to the FMP can sample various activities of the FMP, including the running clock, and can transmit the accumulated data to the MCU on a regular basis. In addition to the obvious utility in performance measurement and optimization, certain measurements may provide a clue to incipient failures in the system. For example, if a vector operation of length 1000 elements consistently runs slower than some nominal value, there may be a component failure in the memory request scheduling logic of the Vector Streaming Unit. Note that this failure is subtle, since all answers may be correct, but the program runs slower than the hardware design would normally permit.
- Self checking--Some networks may have a built-in ability to generate and check simple test cases during idle periods. For example, an address adder in the Vector Streaming Unit might have alternative input selects that permit the injection of all "zeroes" and all "ones" patterns with a built-in check for the correct carry, generate and overflow bits. Since the adders are highly underutilized, many idle cycles would be available for such checking.
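The self-checking idea can be illustrated with a software model of an adder. This is a sketch under stated assumptions, not the FMP circuit: the 24-bit width, the function names, and the stuck-at fault model are all hypothetical, and only the sum and carry-out are checked rather than the internal generate and overflow signals.

```python
def add_with_carry(a: int, b: int, width: int):
    """Reference adder: returns (sum truncated to width bits, carry-out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> width


def self_check(adder, width: int) -> bool:
    """Inject all-zeroes and all-ones patterns and compare with known answers."""
    ones = (1 << width) - 1
    cases = [
        (0, 0, 0, 0),               # no bits set, no carry
        (0, ones, ones, 0),         # every sum bit must pass through as 1
        (ones, 0, ones, 0),
        (ones, ones, ones - 1, 1),  # carry ripples through every stage
    ]
    return all(adder(a, b, width) == (s, c) for a, b, s, c in cases)


def stuck_at_zero_adder(a: int, b: int, width: int):
    """Faulty adder model: sum bit 3 is stuck at zero."""
    s, carry = add_with_carry(a, b, width)
    return s & ~(1 << 3), carry


if __name__ == "__main__":
    print(self_check(add_with_carry, 24))       # healthy adder passes
    print(self_check(stuck_at_zero_adder, 24))  # stuck bit is caught
```

Four patterns are enough to expose any single sum bit stuck at zero or one, which is why such checks fit comfortably into idle cycles.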

How does the FMP hardware gather all of this data from its various limbs and actually notify the appropriate maintenance function? Each separate network in the FMP that generates error data transmits this data to the I/O Unit, where special hardware in I/O PORT 0 (see figure 16) locks up the information in static registers and sets a flag indicating that maintenance data is present. Any PDC (programmable device controller) connected to this port can and will sample this flag and related registers, forming a message directed to the "maintenance function" (note this is the abstract of a specific hardware that used to be called the "maintenance unit"). Maintenance function messages are put on the network trunk where they are transmitted to all other attached processors. Any processor which recognizes itself as an "addressee" for maintenance functions will read the entire message and turn over the contents to stored programs which will analyze the maintenance information given.

Figure 16. FMP I/O Unit

The potential exists then for several computers on the network to possess all or part of the maintenance function software. Which computer provides the specific maintenance service depends on which processor has its "turn in the barrel", or is available to assist the FMP. In other circumstances the service may be performed by the lowest level processor that can handle the task. For example, a small miniprocessor such as the CYBER 18 might be attached to the trunk for system statistics gathering and initialization purposes. A correctable SECDED error might be reported on the trunk, and this small processor given the responsibility for collecting the data and storing it on disk for later analysis. A total failure of one Vector Unit, however, when reported by messages on the network trunk, might require the use of one of the major front-end processors to determine what strategy is to be followed (launch immediate repairs since the workload is light, swap out the unit and perform automatic on-line diagnostics, etc.). The content and not the addressee of maintenance messages determines which of the system resources will perform the maintenance action. In some cases the small CYBER 18 might be off-line for maintenance, and thus the SECDED error recording must be handled by another processor in the network. The system must not allow the information to be lost. Returning to the anthropomorphic simile of the physician and patient, the FMP puts out a cry for help and the most competent individual capable of rendering first aid attends the victim.
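The dispatch policy described here (the lowest level processor that can handle the task, with fallback when it is off-line) can be sketched as follows. Everything in the sketch is an assumption made for illustration: the class and field names, the numeric capability and severity scale, and the processor labels are hypothetical and do not come from the FMP design.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    capability: int       # hypothetical scale: higher handles harder tasks
    online: bool = True


@dataclass
class MaintenanceMessage:
    source: str
    kind: str             # e.g. "correctable_secded", "vector_unit_failure"
    severity: int         # hypothetical scale matched against capability


def select_handler(message: MaintenanceMessage, processors) -> Processor:
    """Choose by message content: the least capable on-line processor
    that can still handle the task."""
    able = [p for p in processors
            if p.online and p.capability >= message.severity]
    if not able:
        # The information must not be lost, so fail loudly instead.
        raise RuntimeError("no on-line processor can accept: " + message.kind)
    return min(able, key=lambda p: p.capability)


if __name__ == "__main__":
    cyber18 = Processor("CYBER 18", capability=1)
    front_end = Processor("front-end processor", capability=3)
    network = [cyber18, front_end]

    secded = MaintenanceMessage("FMP", "correctable_secded", severity=1)
    failure = MaintenanceMessage("FMP", "vector_unit_failure", severity=3)

    print(select_handler(secded, network).name)
    print(select_handler(failure, network).name)
    cyber18.online = False                       # off-line for maintenance
    print(select_handler(secded, network).name)
```

Selecting on severity rather than on a fixed addressee is what allows the logging task to migrate to another processor when the small one is unavailable.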

In like manner all other components of the NASF place messages on the trunks to provide operational and error information to be received and processed by the maintenance function. The PDCs themselves place error and operating data messages on their own trunks. The support processor system hardware may not provide specific electronic linkages to the network and attached maintenance functions; however, it should be clear that maintenance function messages can be generated by operating system software as well as by hardware.

The maintenance function need not, and must not, be purely passive in nature. Those elements charged with maintenance function responsibilities must be able to deadstart, restart, and recover any system component by transmitting special messages on the trunk. The maintenance entity must be able to make decisions about resource allocation and turn trunks, processors, peripheral devices, and remote accesses on or off as system failures make such actions necessary. Certain diagnostic sequences will have to be initiated, and often monitored, by the maintenance function.

b) The maintenance function as a software concept---

In the previous discussion on hardware there are many implied objectives for the designer. Some electronic "hook" must be designed into new components so that maintenance functions transmitted on the network can affect some hardware operation, as for example the concept of "SYSTEM MASTER CLEAR". The hardware hooks, however, can only provide information which must be collated and analyzed. The initiation of maintenance actions and the evaluation of maintenance data must be done by one or more software systems. Error data can be created by software also. The maintenance strategy must consider the relationship of software to the maintenance hardware system. Some points to be included in such ruminations:

- All maintenance software, with the exception of network trunk interface firmware, must be "transportable". This means that any one or any group of maintenance functions can be "farmed out" to any of the processors in the NASF. Transportability implies the use of a commonly available higher-level language that would be supported on all programmable processors in the system. The growing universality of PASCAL as a systems programming language makes this the prime candidate for maintenance software. In addition to the language, the structure of all maintenance software must be carefully specified, tightly controlled, and implemented in a disciplined manner. In specific terms this means that a small diagnostic monitor with standard software interfaces must be created in such a way that it can fit in the smallest processor that might be assigned maintenance functions. The standard interfaces necessary would consist of common message formats for communicating with the outside world (via the network trunk), a common set of diagnostic primitives (or calls to kernel subroutines), and a common set of definitions of data and procedures as well as common data descriptions which would be incorporated as declarations in each PASCAL program.
- All operating system software should provide maintenance oriented messages in the same manner as the hardware in the FMP and network trunk. These messages should provide normal operating information (such as "time-stamped" job start/stop messages) as well as warnings of exceptional conditions. Examples of this latter message traffic would be: abnormal job termination, failure to receive responses from other processors in the system, and impossible conditions detected within the operating system such as table overflows, missing pointers, etc. The concept of collecting, recording, and disseminating maintenance data to other processors should be fundamental in the design and implementation of all operating system components, and not merely a set of hooks to be emplaced in developing systems.
- Critical maintenance functions such as system shut-down, restart, and recovery must be provided in at least two different processors with appropriate interlocks so that only one of the pair will respond to emergency conditions.
- The maintenance software must provide additional security safeguards for access to maintenance data and for permission to invoke maintenance software functions. A powerful tool in the maintenance arsenal will be the remote access (via standard interactive terminals) to the FMP by specialists not necessarily resident at the NASF site. This can only be permitted if adequate protection is afforded the system from illicit use of the maintenance capabilities. The level of security must be greater than that which is nominally acceptable for commercially available computer based systems. The reason is obvious, since remote access could be obtained to functions which affect the actual hardware operation. Certainly the software system must be able to prevent catastrophic events from being initiated while the system is in production mode.
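The interlock requirement for redundant critical functions (only one of the pair responds to an emergency) can be illustrated with a small Python model. It is a single-process sketch under stated assumptions: in the NASF the interlock would span separate processors through hardware or trunk messages, and the `EmergencyInterlock` class, the event identifier, and the processor names are hypothetical.

```python
import threading


class EmergencyInterlock:
    """Lets exactly one of several redundant processors claim an emergency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # event id -> name of the processor that claimed it

    def try_claim(self, event_id, processor):
        """Return True only for the first processor to claim `event_id`."""
        with self._lock:
            if event_id in self._owner:
                return False
            self._owner[event_id] = processor
            return True

    def owner(self, event_id):
        """Return the claiming processor's name, or None if unclaimed."""
        with self._lock:
            return self._owner.get(event_id)


interlock = EmergencyInterlock()
first = interlock.try_claim("shutdown-001", "processor A")
second = interlock.try_claim("shutdown-001", "processor B")
```

Here the first claim succeeds and the second is refused, so only one processor proceeds with the shut-down while the other remains available as a standby.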
c) The maintenance function as a procedural concept---

In Division 7 the subject of maintenance strategy is discussed. In addition to the hardware and software that are designed into the NASF, a procedural strategy which concerns itself with manpower (numbers and qualifications), organization, economics, methodologies, and tradeoffs in the maintenance process must be designed as well. One of the major operating costs of the NASF will be that of the maintenance support function. Not only must a sufficient number of trained personnel be available for emergency service, but personnel, equipment, and parts must be on site to perform regular preventative maintenance. The costs of maintaining the maximum work force and parts inventory can be prohibitive and must be weighed against the alternative probabilities of system failures during time of operational use.

### 2.3.5 MICROCODE CONTROL AND FAULT ISOLATION

A powerful tool for computer designers, made possible by advances in memory technology, is the microcode control of logic networks. Although quite often stored in ROM (Read-Only-Memory), microcode can also be stored in standard RAM (Random Access Memory) and loaded into the CPU at every system "deadstart" (or power-up, or "coldstart"). The use of reloadable microcode permits the incorporation of design improvements in the control systems without changing components or rewiring logic in the computer at a field installation. A class of memory technology is now available that permits the use of microcode control for even the highest performance logic family, with a minimum of auxiliary hardwired control necessary to meet the speed requirements of most networks.

The Control Data FMP design is based on the use of this high performance microcode, which is distributed within each functional element rather than being centralized in a main control unit. In addition to the concept of "distributed microcode", the machine employs a "two-dimensional microcode" structure which is utilized for on-line diagnosis and fault isolation. Figure 17 sketches the concept as it appears to the hardware designers.


Figure 17. Microcode Memory

The microcode memory consists of a semiconductor system capable of storing $N$ words of $M$ bits in a high speed RAM technology package. Control is achieved by reading a word from this memory and distributing the bits to actual hardware gates in the CPU. In the example shown, bits $m$ through $n$ are used to control the next address to be read from the memory. Bit 7 is shown being sent directly to a hardware AND gate which controls some data path. Bits 8 and 9 are sent to a hardware 2-to-4 decode network which converts the two-bit code into four distinct control signals which are then transmitted to data path controls elsewhere in the network.

A typical modus operandi for such a microcode is for it to receive an initial starting address from an outside source (say, for example, a function code being sent from the Scalar Unit to the Vector Unit). This address points to one of three legal starting addresses in the microcode memory. The first control word is then read from that memory and its bits disseminated to the appropriate hardware. The next word to be read is then determined from the SEQ field of the first word, and the process continues until the microcode either shuts itself off and awaits a new external address or, perhaps, enters an "idle loop" (a sequence of do-nothing microcode instructions).
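The sequencing just described can be modeled in a few lines of Python. The sketch below is illustrative only. Bit 7 (gate control) and bits 8 and 9 (2-to-4 decode) follow the example in the text, but the placement of the SEQ field in bits 0 through 4, the `HALT_BIT` used to stop the sequence, the assignment of decoder outputs to codes, and all function names are assumptions made here.

```python
SEQ_MASK = 0x1F    # assumed: bits 0-4 hold the next address (32-word memory)
GATE_BIT = 7       # from the text: bit 7 drives a hardware AND gate
DECODE_SHIFT = 8   # from the text: bits 8 and 9 feed a 2-to-4 decoder
HALT_BIT = 10      # assumed: stop and await a new external address


def make_word(seq, gate=0, select=0, halt=0):
    """Pack the fields into one control word."""
    return seq | (gate << GATE_BIT) | (select << DECODE_SHIFT) | (halt << HALT_BIT)


def decode_2_to_4(select):
    """Convert a two-bit code into four one-hot control signals."""
    return tuple(1 if select == i else 0 for i in range(4))


def run(memory, start, max_steps=100):
    """Follow SEQ fields from `start`; return (address, gate, decoder lines)."""
    trace = []
    addr = start
    for _ in range(max_steps):
        word = memory[addr]
        gate = (word >> GATE_BIT) & 1
        select = (word >> DECODE_SHIFT) & 0b11
        trace.append((addr, gate, decode_2_to_4(select)))
        if (word >> HALT_BIT) & 1:
            break
        addr = word & SEQ_MASK
    return trace


# A three-word sequence entered at address 4 by an external function code.
memory = [0] * 32
memory[4] = make_word(seq=5, gate=1, select=2)
memory[5] = make_word(seq=6, gate=0, select=0)
memory[6] = make_word(seq=0, gate=1, select=3, halt=1)
trace = run(memory, start=4)
```

Each entry in `trace` records which word was read and which control signals it asserted, which is the information a designer would compare against the intended data path activity.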

Since memory systems are usually packaged in convenient sized units (for example, a single efficient block might be 32 words of 32 bits), the designer usually finds that after he has completed his control structuring there are unused memory locations and unused bits in each distributed memory (crosshatched areas in figure 17). The two dimensions of this microcode are evident from the diagram -- depth (number of words) and width (number of bits). The realistic designer will first set aside some of the spare depth and width for possible later design improvements. However, the need for reliability and availability dictates that a certain number of words and bits be allocated to the maintenance function. Some of the bits might be connected to gating networks which are not used during normal operation. Others might be sent as codes to the Maintenance Control Unit (MCU) as error flags.

The spare words in the memory might be employed in two ways:

- First the idle loop, which the microcode normally enters while awaiting a new function, might be extended to use these words to initiate self-checking actions in some of the networks. An example previously given was the forced generation of all zeroes to an adder network. At the same time, one of the spare bits might enable an all zeroes check to be performed at the output of the adder. If this check is not satisfied an error flag could be sent to the MCU.
- Second, an external source (such as the MCU) could initiate a microcode sequence at one of the spare words (labeled ON-LINE DIAGNOSTIC STARTING ADDRESS in the figure). This sequence could perform certain unique network control not available in normal operation. An example of this would be the defeating (shutting off) of the generated carries in an adder network so that only the partial sums are transmitted to the output trunks. In some cases this might aid isolation of failures a great deal. Once the sequence is completed the microcode might then return to its normal idle loop.
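Both uses of the spare words can be mimicked with a toy adder model in Python. This is a sketch under stated assumptions: the 16-bit width, the `fault_mask` used to simulate a stuck-at-one output bit, and the function names are invented for illustration. The one fixed fact it relies on is that a binary sum with all carries suppressed is the bitwise exclusive OR of the operands.

```python
WIDTH_MASK = 0xFFFF  # assumed 16-bit adder for illustration


def adder(a, b, fault_mask=0):
    """Adder model; `fault_mask` ORs stuck-at-one bits into the output."""
    return ((a + b) & WIDTH_MASK) | fault_mask


def idle_self_check(add):
    """Force all zeroes into the adder; return True if an error flag is raised."""
    return add(0, 0) != 0


def partial_sum(a, b):
    """Adder output with generated carries defeated (bitwise XOR)."""
    return (a ^ b) & WIDTH_MASK


healthy_flag = idle_self_check(adder)
faulty_flag = idle_self_check(lambda a, b: adder(a, b, fault_mask=0x0004))
```

In this model the healthy adder passes the all-zeroes check, while the adder with a stuck output bit raises the flag that would be sent to the MCU. Comparing `partial_sum` against the full sum separates faults in the sum logic from faults in the carry logic.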

A third diagnostic and isolation aid is the ability to load a completely different microcode into the memory from the maintenance station (this function is called "down-loading" or "downstream loading"). Down-loading of the normal operational microcode is always done at system startup time, since the RAM memory is volatile. The maintenance station can also down-load a different version of the microcode, called the fault isolation microcode, into any independent functional element in an off-line manner. It may consist of one or several loads of microcode which trigger and test the logic networks at the gate level. The intent of this system is to provide back to the maintenance station sufficient analysis to permit the determination of fault location to the smallest replaceable component level. Loading of such microcode into one Vector Unit does not affect any other unit; therefore it is possible for the MCU to switch into operation the spare Vector Unit, load isolation microcode into the failing Vector Unit, and perform isolation exercises in that unit while the remainder of the FMP continues normal operation.

It should be obvious that the designer must design fault isolation into the FMP from the start, and not just be content with using whatever spare words and bits may abound. The design philosophy for the FMP has been to add whatever extra bits and words are necessary to make achievement of the replaceable component isolation goal a reality, within the constraint that too many words and too many bits may take up too much room or too much access time to be feasible.

### 2.3.6 VALIDITY AND VERACITY

The primary objective of this study has been to arrive at a system design which can be built and utilized effectively within a very tight development time schedule. The willingness of NASA and manufacturers such as Control Data to launch into the actual construction of such a facility, with all its visibility and all its risks, will be determined not only by allegations of achievable goals but, more importantly, by the confidence with which all parties can rely on the cost, performance, schedule, and RAM goals.

A summary of statements regarding the validity or correctness of the conclusions on hardware design, as far as is known today, is presented here.

### 2.3.6.1 ASSUMPTIONS

1. Design would not commence until mid-1980 at the earliest.
2. A FORTRAN compiler can be produced which can generate and schedule machine instructions from the implicit metric as shown in appendices $B$ and $C$.
3. The entire system design and manufacture would be conducted by a single vendor, with choice of new equipments installed with this system left to that vendor.
4. The implicit metric reflects the most probable performance characteristics of the major load of production jobs on the FMP.
5. The system load data provided by Ames is at least within $20 \%$ of reflecting the 1984-1989 production environment.
6. Installation and operation of the NASF can be phased over a period of 12 months as more and larger elements are needed to support the increasing workload. (For example, the Backing Store need not be installed at the outset to meet initial program development and production operation; it could then be installed in increments.)
7. Sufficient funding and compute power will be available for the duration of the development project to support extensive simulation of the FMP and the system at all levels (system, block, and gate-level models).

### 2.3.6.2 VALIDITY DERIVATION

1. All circuit component costs, failure modes, quantities, and complexity are based on existing parts now in production, with the exception of the Backing Store technology.
2. The Backing Store estimates are based on MOS technologies now in production, and scaled for the more complex chips in that memory.
3. Where production volumes have not yet yielded sufficient experiential data (as in the case of the LSI-168 gate array and Intermediate Memory chip), conservative learning curves for failures and costs were utilized.
4. The packaging, power, and cooling technologies were taken directly from computer products already in existence.
5. Gate-level design of some of the key FMP elements, or portions of those elements (such as the Memory, PDC, Scalar Unit), has been completed and either put into practice or simulated in detail as part of standard product development on other CDC projects.
6. Design and simulation techniques are identical to those employed for other large scale computer development efforts.
7. To the extent that time and resources have permitted, the design of the FMP has been carried to the point where all manufacturing and performance aspects are expected to be within $10 \%$ of the estimates.
8. All other system components in the NASF are taken from the warehouse of existing system devices, for purposes of calibrating cost, performance, and physical requirements. Since forthcoming products will undoubtedly improve with respect to all three parameters, the current estimates are considered highly reliable and quite conservative.

Finally, as a bottom line, based on two decades of effort to build and support machines of this class, Control Data believes that if the requirements reflect reality and the system design truly meets the performance objectives of the projected production environment, then the entire NASF is producible in the specified time frame and will be effective in its given role.

## 3.0 SOFTWARE DESIGN

The success of the FMP rests not only on the ability of the hardware to attain its performance goals, but also on the effectiveness of the software system that supports the hardware. The first concern is that the problem statement be matched to the computer architecture to maximum effect. This is a function of the programming language characteristics as well as the capability of the compiler to make automatic decisions, transparent to the programmer, regarding the scheduling and optimization of the FMP hardware resources. The major effort in this study has been to attack these two aspects of FMP software design.

The overall FMP capability for initiating new jobs and scheduling system interactions such as I/O with a minimum of lost time to the production jobs is a second software design concern which has been addressed in these three NASF studies. This report will discuss the current conclusions on these issues, and will include updates of previous study findings in these areas.

### 3.1 LANGUAGE ANALYSIS AND DEFINITION

At the outset of the first NASF study Control Data concluded that a special purpose language would have to be developed for the FMP to ensure that the best match was made between language and problem algorithm, and between language and machine architecture. The pragmatics of the expected operating environment made it impossible to ignore the imperative of maintaining a FORTRAN-like language for most, if not all, applications programming. Language design for the FMP would focus entirely on the applications development aspect of program statement, leaving the operating system and support software development in the hands of the extant system language "fad".

### 3.1.1 EVOLUTION LEADING TO SPECIFICATION

Two rationales dominate the CDC recommendation for a special language for the FMP.

1. The possibility that compiler technology will develop "production" language processors that can unravel the existing scalar FORTRAN metrics into sequences suitable for parallel computational elements is an exciting object to pursue, but the risk of failure at this stage is quite high. This alternative must not be abandoned by researchers, however.
2. In the long run, applications programmers will benefit from conceiving of their problem solutions in parallel form, rather than letting a "smart" compiler invisibly transform their code. By making the inherent parallelism visible, the programmer can assist the compiler in the production of optimum machine instruction sequences.

Demonstrating the resolve and technical opinions of the RADL researchers, at least five different approaches were attempted at defining a programming language which met the criteria of compilability, consistency, clarity, and most of all, acceptability to the potential user community (NASA-Ames). The approaches began in the initial study period by adding a single language construct to FORTRAN and thence experimenting with the resulting language using the implicit and explicit metrics. As shortcomings were encountered in use, an additional construct was created and added to the language specification. In that manner, a draft language enhancement to FORTRAN was developed during most of the first study. In the main, these extensions were reasonably acceptable to those programmers who considered them; however, they did not meet the criteria of consistency and compilability, appearing much like a set of patches.

A quick remedy for this was to reduce the extensions to a manageable size and ignore some of the picky programming details (such as data movement) that arose; so evolved the CODO (concurrent DO) constructs described in reference 1. As the second study began, it became obvious that other considerations might weigh heavily on decisions regarding language.

1. The expected system development time for the NASF would severely strain the best efforts of any compiler development team. To reduce development risks an existing compiler would have to be adapted to the FMP usage. Otherwise, the first years of NASF production would be fraught with the problems inherent in any new, complex compiler system.
2. The acquisition of "parallel programming" insight is painful on first go for even the most highly skilled analysts. The process of learning new thought processes, researching and developing new parallel algorithms is, in its initial stages, quite lengthy and requires great diligence.
3. STAR-100 experience was beginning to show the direction that algorithms could take, and an inventory of useful programs, subprograms and functions was being developed that might be of great value, in their most mature forms by 1984, to a widely used system such as the NASF. If some means could be found to utilize all of this background derived from the STAR experience, much time and effort might be saved, hence costs and risks might be reduced.

This thinking led down the STAR-100 byway, in an attempt to directly apply the STAR-100 FORTRAN language extensions to FMP programming. The obvious advantage of this was that programs could be coded, reformatted, debugged, and even small production runs made on existing computing systems. The disadvantage was that although the language was consistent and obviously compilable, the user acceptance was near zero. The major objections by NASA were:

1. The need for explicitly describing vector data structure and vector operations made it difficult to develop algorithms without being intimately familiar with the internal hardware design. For three-dimensional simulation problems this was an undesirable requirement, since most creative resources are expected to be needed in just solving the physics and mathematics of the model solutions.
2. Some changes were needed in the STAR-100 FORTRAN to accommodate the differences in architecture between the FMP and STAR. This meant that the compiler could not be retained in its original form in any case.
3. Some notational forms were clumsy, but required by the compiler. There were too many additional constructs to learn besides the normal FORTRAN constructs.

4. The existing metric code would have to be severely disrupted to organize the code in a form suitable for optimum machine operation using the extensions.

Control Data was asked to go back to "square one" and attempt to provide a FORTRAN system where most of, if not all, the machine architecture could be "hidden from view" of the programmer, freeing him to deal with the mathematical and physical intricacies of his problem. Interaction with Ames programmers revealed the fact that they objected to the explicit knowledge and manipulation of vectors of the STAR form but had no reservations about dealing with an abstract collection of data called an array, or a subarray, as an entity in a single computation. Thus the programmers claimed they would be happy with the ability to state solutions like:

    PROGRAM TEST
    DIMENSION A(100,100,100), B(100,100,100)
    A=B*B

where the entire meshes $A$ and $B$ take part in the one operation. Further, the manipulation of subsets or subarrays of arrays was also acceptable, as long as the compiler worried about partitioning the actual arithmetic operations into sequences of suitable vector operations for the FMP. Thus:

    A(1:10,1,1)=B(1:10,1,1)

which causes the first ten elements of the first column of $B$ to be moved to the first ten elements of the first column of the array $A$.

It is easily apparent that this same operation could be restated as a conventional FORTRAN DO loop:

        DO 10 I=1,10
    10  A(I,1,1)=B(I,1,1)

but the clearly recognized movement of a column of data in the subarray form describes the desired action, while statement 10 above might be submerged in a morass of statements in a much larger DO loop in I. The clarity of intent is more open not only to the casual reader but also to the compiler itself, which has to deal with messy DO constructs containing IF-ELSE blocks and GOTO and CALL statements, any of which may make the simple array move operation impossible to determine when stated in DO loop form.
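The equivalence of the two forms can be checked with a small Python model. This is an analogy only, not FMP FORTRAN: the arrays are held as flat lists in FORTRAN column-major order, the mesh is shrunk to 12 points per side to keep the example light, and the `index` helper is a hypothetical name introduced here. A Python slice assignment stands in for the subarray statement.

```python
N = 12  # small stand-in for the 100 x 100 x 100 meshes in the text


def index(i, j, k, n=N):
    """Map 1-based FORTRAN subscripts (i, j, k) to a column-major offset."""
    return (i - 1) + n * ((j - 1) + n * (k - 1))


b = [float(x) for x in range(N ** 3)]

# DO-loop form:  DO 10 I=1,10  /  10 A(I,1,1)=B(I,1,1)
a_loop = [0.0] * N ** 3
for i in range(1, 11):
    a_loop[index(i, 1, 1)] = b[index(i, 1, 1)]

# Subarray form:  A(1:10,1,1)=B(1:10,1,1), written as one slice assignment
a_slice = [0.0] * N ** 3
lo, hi = index(1, 1, 1), index(10, 1, 1) + 1
a_slice[lo:hi] = b[lo:hi]

# Whole-array form:  A=B*B, one elementwise operation over the entire mesh
a_whole = [x * x for x in b]
```

Both forms leave identical contents in the destination array. The slice form states the whole transfer in a single statement, which is the property the text argues makes the intent visible to a compiler.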

Given this assistance and guidance, RADL set about reconciling those desires and previously stated constraints with another new set of considerations:

1. The final FMP form arrived at in this study deliberately separated the data movement (map) operations from arithmetic operations in terms of hardware and control. Some of the resulting concurrency could be discerned automatically by the compiler, but a modest amount of language assistance might make it possible to fully utilize the concurrency and maximize vector unit activity.
2. Tradeoffs in FMP architecture resulted in a three-level memory hierarchy which cannot be entirely hidden from the programmer due to the vast differences in performance levels and constraints on data storage (only BLOCK transfers are permitted to/from LEVEL 3, for example).
3. The ANSI FORTRAN 77 specification was adopted in final form, and commitments to implement this language as the new US standard were practically imposed on American computer manufacturers. Hence all CDC FORTRAN compilers were rapidly to turn into ANSI 77 processors.
4. Some of the non-vector extensions of the STAR-100 compiler proved to be quite valuable in programming for the STAR and CYBER 200 machines. The subarray reference notation, IMPLICIT, CHARACTER, and extended intrinsic functions have demonstrated high utility in production codes.
5. The need to base the FMP compiler on an existing compiler is still considered imperative to meet the NASF development schedule. This is particularly true if the language construct additions or changes can be minimized.

The outcome of these deliberations is offered in Volume III, FMP FORTRAN Language Specification. A more detailed discussion and demonstration of the language facilities is included in subsequent sections of this report.

### 3.1.2 OBSERVATIONS AND RATIONALE

The preceding discussion has concerned the evolution of thought and deed that led to the language specification given with this report; however, some additional commentary should be offered.

1. Programs and algorithms developed for the STAR/CYBER 200 family can be directly transferred to FMP FORTRAN by converting the explicit vectors to subarrays. This can be done easily with a mechanical "SIFT" (conversion) program. STAR "Descriptors" can be directly converted to the more flexible DYNAMIC variable.
2. Dynamic assignment of memory levels using the DEFINE statement makes it possible to use the identical production code for small problems (which would fit entirely in Main Memory and operate quite efficiently) and large codes (which need to be based in Intermediate Memory, with the slow map times thus incurred), without recompiling. The programmer may, if he wishes, take direct control of temporary space allocation for large arrays.
3. Scalar algorithms may be vectorized directly in many cases by redefining all scalars as arrays or subarrays with DYNAMIC statements.
4. In the examples in the implicit code report the full subarray reference notation is used. In many instances this would be unnecessary once the compiler could detect these cases from the normal array notation imbedded in a DO loop. The subarray references were retained since it was felt that they more clearly describe "what's going on".
5. Some form of dynamic redefinition of data structures is necessary in parallel systems to avoid being bound by the static characteristics imposed by DIMENSION, which make it difficult to store adjacent elements together, column-for-column and plane-for-plane. (This is, incidentally, an opinion stated by Dick McHugh of CDC in 1976, but not put into practice until now.)
6. Although compilers can now allegedly process and incorporate more than a single module at a time, and thus vectorize across CALL statements, the use of a single generalized subroutine (with its inevitable IF statements used to pick out special cases) is basically inimical to automatic, OPTIMUM vectorization (notice the emphasis on OPTIMUM!). The recoding of STEP brought the XXM, YYM, and ZZM subroutines in-line, not only to aid vectorization but to eliminate redundant data transfers (for example the use of the array RJ, lines 930, 1030, 1040 of appendix B). In a similar vein, VISMAT needs to be brought in-line so that the subroutine call to ZZM can be eliminated and the data previously computed reused.

### 3.2 FMP LANGUAGE DESCRIPTION

The base language chosen by Control Data for FMP applications programming is ANSI FORTRAN 77. A set of extensions to this basic language has been defined for the CYBER 200 family, and these extensions have been further augmented by specific FMP features intended to assist the compiler in producing optimum code for the FMP. The choice of ANSI FORTRAN 77 was based on the following considerations:
a) The FORTRAN language, imperfect as it may be, is commonly known, and has become the de facto "lingua franca" of the American computer community. It can be expected that all potential front-end or support processors will supply FORTRAN compilers as part of their standard software system. Thus most applications programming outside the FMP will most likely be done in FORTRAN.
b) The ANSI 77 version of FORTRAN will shortly become the official standard language specified in all government procurements (and thus quickly be required in all commercial computer purchases). Absolute requirements for vendor supply of ANSI FORTRAN 77 compilers are estimated to be imposed in the middle of 1980. It is expected that after some interim period following the introduction of the ANSI 77 requirement this updated FORTRAN will become the sole FORTRAN language supported as "standard" by the offerers of NASF processors other than the special purpose FMP.

Volume III of this report contains a specification of the proposed ANSI FORTRAN 77 language, as amended for the FMP, in the form of a programmers' reference manual. The specification consists of an original CYBER 200 (STAR) FORTRAN manual, with line and page changes incorporated as insertions to the original. This was done to make visible to potential users the differences between ANSI 66 (as represented by the CYBER 200 FORTRAN) and ANSI 77, as well as the differences between CYBER 200 FORTRAN vector features and FMP vectorization aids.

The language specification can be viewed as four distinct entities:

1. The basic reference manual.
2. Changes needed to make the language fully ANSI 77 compatible.
3. CYBER 200 FORTRAN 77 additions (beyond ANSI 77).
4. FMP FORTRAN extensions.

This language, as described in Volume III, was used for the recoding of the implicit and explicit aerodynamic flow codes as presented in this report.

### 3.2.1 THE BASE DOCUMENT

The reference manual (Volume III) provides a FORTRAN language originally based on ANSI FORTRAN 66. Extensions to permit implicit and explicit vectorization for the STAR computer were added to the language. Data types such as BIT and CHARACTER were included to support the string processing capabilities of the STAR architecture. Additional vector constructs were added, such as DESCRIPTOR and DOUBLE DESCRIPTOR, to provide a "shorthand" means of invoking standard and sparse vector operations. The explicit vector constructs that were added for STAR have been replaced by more flexible constructs for the FMP version of the FORTRAN language.

The majority of the dialect described in this base document is familiar to all FORTRAN programmers. One construct that was added to STAR FORTRAN has proven to be quite useful in the analysis and programming of FMP codes, and thus deserves some discussion here; that construct is SUBARRAY notation, and the rationale for it follows.

In the early developmental days of the first "vector" or "parallel" machines (ILLIAC IV, STAR, TI-ASC, and PEPE) compiler developers discovered that automatic vectorization of FORTRAN code was made difficult by common programming practices that incurred no penalties on scalar computers. Such practices could be represented by a DO loop of the form:

        DO 10 I=1,1000
        A(I)=B(I)*C(I)
        .
        CALL GLUMPF(A(I),B(I))
        .
    10  A(I)=A(I)**2

If the CALL statement were not present, even primitive compilers could automatically create vector operations for the arithmetic assignment statements shown. The introduction of discontinuities such as this, when all program modules cannot be compiled together, makes vectorization nearly impossible. If, in fact, the arithmetic could be vectorized, the programmer could restate his intentions as

```
    DO 100 I=1,1000
100 A(I)=B(I)*C(I)
    DO 101 I=1,1000
101 CALL GLUMPF(A(I),B(I))
    DO 102 I=1,1000
102 A(I)=A(I)**2
```

The compiler is thus assisted in its vectorization task by the programmer restating the data flow more explicitly, although a bit ponderously. To get optimum results on the new generations of vector machines, it has become necessary to make the programmer conscious of the "parallel" or "vector" nature of his programs so that problem restatements can be done intelligently. The consciousness must also extend to the data allocation schemes, since some processors possess memory hierarchies with significant performance differentials at each memory level, while other processors require linear, sequential storage of data for optimal vector performance. Additionally, some form of "shorthand" would be desirable to reduce the coding effort while improving readability.
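
The equivalence of the single loop and its three-loop restatement can be checked on a small case. The following Python sketch is illustrative only: `glumpf` is a hypothetical stand-in for the separately compiled subroutine (its real behavior is not given in this report), and it is assumed to touch only the two elements passed to it. Under that assumption both forms give the same result, and the first and third split loops contain nothing but arithmetic.

```python
def glumpf(a, b):
    """Hypothetical stand-in for the opaque subroutine GLUMPF.

    Assumed to read and modify only the two values it is handed.
    """
    return a + 1.0, b


def fused_loop(b, c):
    """Original form: arithmetic and the call share one loop."""
    b = list(b)
    a = [0.0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] * c[i]
        a[i], b[i] = glumpf(a[i], b[i])
        a[i] = a[i] ** 2
    return a


def split_loops(b, c):
    """Restated form: the call is isolated in its own loop."""
    b = list(b)
    a = [x * y for x, y in zip(b, c)]      # loop 100: pure arithmetic
    for i in range(len(a)):                # loop 101: scalar calls only
        a[i], b[i] = glumpf(a[i], b[i])
    return [x ** 2 for x in a]             # loop 102: pure arithmetic


B = [1.0, 2.0, 3.0]
C = [4.0, 5.0, 6.0]
print(fused_loop(B, C) == split_loops(B, C))
```

If the subroutine carried state from one call to the next, or touched neighboring elements, the two forms could differ, which is why the restatement has to come from the programmer rather than from the compiler.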

Early in the 1970s, several different computer developers proposed a set of extensions to the ANSI FORTRAN committee for the description of operations on whole arrays or portions of arrays called "subarrays". These recommendations, while not approved by the committee, appeared to have sufficient support that they were implemented in at least four different vendors' compilers, one of those being the STAR FORTRAN 66 compiler.

Basically, subarray notation can be thought of as another means for describing processing that is commonly assigned to DO loops. The simplest case is to perform operations over an entire array:

        DIMENSION A(100,100,100),B(100,100,100),C(100,100,100)
        DO 10 I=1,100
        DO 10 J=1,100
        DO 10 K=1,100
        A(I,J,K)=B(I,J,K)+C(I,J,K)
    10  CONTINUE

In this instance the programmer wishes to operate on the entire matrices A, B, and C. It is obvious that the order of the subscripts is not important, for

$$
A(J, K, I)=B(J, K, I)+C(J, K, I)
$$

will yield the same results as the previous assignment statement, provided the matrices $A, B$, and $C$ are not equivalenced or overlapped in some ridiculous COMMON block allocation.

The simplest subarray representation for this case would be

        DIMENSION A(100,100,100),C(100,100,100),B(100,100,100)
        .
        A(*,*,*)=B(*,*,*)+C(*,*,*)

A detailed specification of forms and uses for subarray notation can be found in chapter 10 of Volume III (FMP FORTRAN REFERENCE MANUAL). The case shown here uses the asterisk (*) to represent the fact that the full subscript range is to be used, hence the ADD operation will take place over the entire array. The compiler and hardware can perform this function according to whatever subscript order is optimal for that architecture.

An array variable may appear without any subscripts, in which case the entire array is processed in normal subscript order. Thus

$$
A=B+C
$$

is equivalent to

$$
A(*, *, *)=B(*, *, *)+C(*, *, *)
$$

in the example above.

What if the need were to process only a single column of the matrices?

        DIMENSION A(100,100,100),B(100,100,100),C(100,100,100)
        DO 10 I=1,100
    10  A(I,1,1)=B(I,1,1)+C(I,1,1)

This sequence would perform the single-column addition of $B$ and $C$. In subarray notation this could be stated

        DIMENSION A(100,100,100),B(100,100,100),C(100,100,100)
        .
        A(*,1,1)=B(*,1,1)+C(*,1,1)

A significant feature of this notation is that a programmer working on a large code can spot such a statement in the middle of many pages of source code and know instantly that a columnar operation is being invoked, without having to hunt upstream in the listing (perhaps for several pages) to find a related DO loop.

So much for the shorthand programming of DO loops provided by the asterisk operator, which can appear in any or all of the subscript positions in an array reference. What happens if one wishes to deal with portions of the array other than the commonly encountered:


What if only the interior points in the matrices were to be processed leaving all of the outside planes untouched? In normal FORTRAN this would become

        DIMENSION A(100,100,100),B(100,100,100),C(100,100,100)
        DO 10 I=2,99
        DO 10 J=2,99
        DO 10 K=2,99
    10  A(I,J,K)=B(I,J,K)+C(I,J,K)

Using subarray notation the three-level, nested DO loop could be replaced by

$$
A(2: 99,2: 99,2: 99)=B(2: 99,2: 99,2: 99)+C(2: 99,2: 99,2: 99)
$$

where the normal subscript is replaced by the construct

    n1:n2

- n1 = starting subscript value (may be an integer constant or variable expression), $n1 \leq n2$
- n2 = ending subscript value (integer constant or variable expression), $n1 \leq n2$

If every other element in the interior mesh were to be skipped, the optional construct

$$
\begin{aligned}
& A(2: 99: 2,2: 99: 2,2: 99: 2)=B(2: 99: 2,2: 99: 2,2: 99: 2)+C(2: 99: 2, \\
& 2: 99: 2,2: 99: 2)
\end{aligned}
$$

is permitted, which is represented in general by

$$
\mathrm{n1:n2:n3}
$$

where n3 is the increment value for that subscript, which may be an integer constant or integer variable expression, and must be greater than zero.
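
The subscripts selected by the n1:n2 and n1:n2:n3 constructs can be enumerated mechanically. The Python sketch below is an illustration, not part of the FMP language definition; the function names are hypothetical. It lists the 1-based, inclusive subscripts a triplet selects and gives the corresponding Python slice for a 0-based sequence, applying the two restrictions stated above (n1 not greater than n2, n3 greater than zero).

```python
def triplet_indices(n1, n2, n3=1):
    """Return the 1-based subscripts selected by the construct n1:n2:n3."""
    if n1 > n2:
        raise ValueError("n1 must not exceed n2")
    if n3 <= 0:
        raise ValueError("increment n3 must be greater than zero")
    return list(range(n1, n2 + 1, n3))


def triplet_to_slice(n1, n2, n3=1):
    """Equivalent Python slice for a 0-based sequence."""
    triplet_indices(n1, n2, n3)  # reuse the validity checks
    return slice(n1 - 1, n2, n3)


row = list(range(1, 101))                  # element value == its 1-based subscript
interior = row[triplet_to_slice(2, 99)]
every_other = row[triplet_to_slice(2, 99, 2)]
print(len(interior), len(every_other), every_other[-1])
```

For the 100-element dimension used in the examples, 2:99 selects 98 interior points and 2:99:2 selects the 49 even subscripts from 2 through 98.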

Now this latter example certainly doesn't appear to be a shorthand version of the desired processing, yet when DO loops must be inserted in "dusty deck" FORTRAN to assist in vectorization, this construct can be a better alternative. More importantly, it forces the programmer to think "parallel" in terms of "planes", "solids", and other forms of the subarray. Further, the pathological subarray case just given doesn't appear very often in real code and thus shouldn't frighten programmers from the use of the construct.

How is this notation used in practice? In the recoding of the implicit code, the original flow variable mesh is treated in "slabs" which can be fit into Main Memory. A slab could be several columns in the $L$ direction of the mesh, with each slab consisting of JMAX times KMAX elements. If static FORTRAN variables were used (and they were not used in the actual implicit recoding), the following might appear in a sample code.

        DIMENSION Q(100,100,6,100),X(100,100,10)
        DO 10 L=1,LMAX,10
        X(*,*,*)=Q(*,*,1,L:L+9)
        .
    10  CONTINUE

In this example, on every pass through the loop another slab of 10 columns is moved to the temporary matrix $X$. This case was given to show the admixture permitted of fixed subscripts (the ,1, in the $Q$ reference), asterisks, and subarray subscripting with integer variables (L:L+9).
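
The slab bookkeeping implied by the loop increment of 10 and the reference L:L+9 can be sketched in Python as follows. This is a minimal illustration with hypothetical names. Clamping the last slab to LMAX is an assumption added here for meshes whose extent is not a multiple of the slab width; the example above does not address that case.

```python
def slab_ranges(lmax, width=10):
    """Return (first, last) L subscripts of each slab, 1-based and inclusive.

    Clamping the final slab to lmax is an assumption; the report's example
    simply writes L:L+9.
    """
    return [(l, min(l + width - 1, lmax)) for l in range(1, lmax + 1, width)]


def copy_slab(columns, first, last):
    """Copy columns first..last into a temporary, as X = Q(...,L:L+9) does."""
    return [list(col) for col in columns[first - 1:last]]


q = [[l] * 4 for l in range(1, 31)]        # 30 short columns, each filled with its own L
slabs = slab_ranges(30)
x = copy_slab(q, *slabs[1])                # second pass through the loop
print(slabs, len(x), x[0][0], x[-1][0])
```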

The subarray notation is a powerful tool for assisting the compiler and for documentation purposes. It was used extensively in the implicit code vectorization to demonstrate its use in a variety of cases. Subarray notation is also key to the implicit and explicit definition of DYNAMIC variables, to be discussed in a later section of this language description.

### 3.2.2 THE ANSI 77 SPECIFICATION

Review of the reference manual in Volume III will show that each ANSI 77 revision appears on a page separate from the original base document, with letters keying the location of each insertion into the reference manual. The significant changes of the ANSI 77 effort worth noting are:

- Major revisions in the I/O forms, and use of internal files to eliminate ENCODE/DECODE statements.
- The "Zero-Trip" DO loop, which tests the DO variable at the beginning of the loop instead of the end.
- The addition of TYPE CHARACTER.


### 3.2.3 THE CYBER 200 FORTRAN 77 ADDITIONS

Incorporated in the revision pages, along with the ANSI 77 insertions, are several additions felt necessary to fully support the CYBER 200 computer family, and also the FMP. The most significant item in this category is the inclusion of a HALF PRECISION variable and array type plus half-precision implicit functions. Both the CYBER 200 and the FMP can utilize the 32-bit and 64-bit formats available for faster arithmetic throughput, as well as more compact storage for some arrays. The addition of HALF to FORTRAN 77 is essential to providing access to this feature.

### 3.2.4 FMP FORTRAN EXTENSIONS

Although the base document revised as stated above would offer a fairly rich programming language, it was found that a few additional extensions designed primarily for the FMP would improve the chances that compiled code would make optimum use of the FMP. These extensions consist of two declaratives, LEVEL and DYNAMIC, and one executable statement, DEFINE. Although these extensions are described in the FORTRAN reference manual (Volume III), they are crucial to the recoding performed on the three-dimensional implicit and explicit flow codes supplied by Ames. Some tutorial commentary thus seems necessary at this point to explain why the extensions have been created and how they were intended for use.

### 3.2.4.1 THE LEVEL STATEMENT

Rationale:
In any computer possessing a hierarchical memory system, where a performance differential exists in the use of each level of the hierarchy, the programmer is faced with the need to make judicious choices in the area of data allocation. It is true that compilers can attempt to automatically allocate data to the hierarchy, and in some virtual memory systems the hierarchy is managed by the operating system; however, a minor error in judgment as to where data should be placed (or which data should be paged) can have a major impact on system performance. For optimal use of the hardware resources, the programmer who knows the actual data flow must allocate and schedule the major data blocks in his program.

The use of the two-level memory in the CDC 7600/CYBER 176 has required that programmers deal with split data allocation in an explicit manner. Note that even with this facility, the compiler and operating system still make use of the second level memory for I/O buffers and subprogram "roll-in", quite independent of the programmer's actions. The concern is for an extension which forces the programmer to consider the data allocation issue directly and then assist the compiler by instructing it where data should be placed. This is even more true in the FMP, where the programmer is confronted with three levels of memory and problems that will consume almost all available memory space at each level. Hence, the inclusion of the LEVEL statement.

Form:

A description of the LEVEL statement can be found in chapter 6 (page 6-3.1A) of Volume III (FMP FORTRAN specification). The LEVEL statement is a declarative, used to assign variables and arrays to a specific level of memory. The default allocation of data, in the absence of a LEVEL statement, is always to Main Memory (level 1).

    LEVEL n Vnam1,Vnam2,...,Vnamm

where $n$ is an integer or integer PARAMETER whose values can be

    1 ----- Main Memory
    2 ----- Intermediate Memory
    3 ----- Backing Store

and Vnam1,...,Vnamm are symbolic names of variables, arrays, dynamic variables, or dynamic arrays (dynamic types will be discussed in a subsequent section).

Scalar variables may be allocated to either level 1 or level 2 memory but not to level 3 (Backing Store). This is because the Backing Store is only accessed in large blocks, and a single scalar reference would require moving an entire block to or from the Backing Store, creating a very inefficient use of that system.

Examples:

    LEVEL 1 A,B,C(100,100)

This statement would assign the scalar variables $A$ and $B$ to Main Memory and the 10,000-element array $C$ to Main Memory.

    DIMENSION X(100,100),Y(100,100)
    LEVEL 3 X,Y,Z(100,100)

This LEVEL statement would assign the three arrays $X, Y$, and $Z$ to the Backing Store.

    LEVEL P1 X,Y,Z

This statement would assign the variables $X, Y$, and $Z$ to either level 1 or level 2 memory depending on the value of the integer parameter P1 (see PARAMETER statement in Volume III). If the value of P1 is greater than 2 at compile time, a compiler diagnostic will be produced and the data will be assigned to level 1 memory by default.
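
The allocation rules just described (scalars limited to levels 1 and 2, with a diagnostic and a fallback to level 1 otherwise) can be summarized in a short Python sketch. The function and constant names are hypothetical, and the treatment of a level number outside 1 to 3 is an assumption, since the text above does not cover that case.

```python
MAIN_MEMORY, INTERMEDIATE_MEMORY, BACKING_STORE = 1, 2, 3


def assign_level(requested, is_scalar):
    """Return (level actually used, diagnostic text or None)."""
    if requested not in (MAIN_MEMORY, INTERMEDIATE_MEMORY, BACKING_STORE):
        # Assumed behavior: the report does not say what happens here.
        return MAIN_MEMORY, "unknown level; assigned to level 1"
    if is_scalar and requested == BACKING_STORE:
        # Backing Store is accessed only in large blocks, so scalars are refused.
        return MAIN_MEMORY, "scalar not allowed in Backing Store; assigned to level 1"
    return requested, None


print(assign_level(3, is_scalar=False))    # an array may go to the Backing Store
print(assign_level(3, is_scalar=True))     # a scalar falls back to Main Memory
```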

### 3.2.4.2 DYNAMIC VARIABLES

Rationale:
Two major stumbling blocks are encountered in an attempt to convert existing algorithms and programs to a vector machine of the CYBER 200 type. First, since it is most efficient to process whole meshes (or at the very least major subarrays of such meshes) to maximize utilization of the many parallel elements in today's machines, the language must provide facilities for dealing in the largest subarrays practicable. The fixed DIMENSION statement makes it difficult to move subarrays about while ensuring maximum contiguous storage of data. Second, the conversion of simple scalar variables to array (or 'slice') form requires converting all scalar references to array references. These two items need to be dealt with for an effective FMP implementation.

First difficulty, first: memory space is wasted when problem variable dimensions are less than the maximum specified by static variable DIMENSION statements.

Given the statements

```
      DIMENSION Q(100,100,6,100),X(100,100,100),Y(100,100,100),
     Z Z(100,100,100)
      .
      KMAX=50
      JMAX=50
      LMAX=50
      DO 10 J=1,JMAX
      DO 10 K=1,KMAX
      DO 10 L=1,LMAX
      READ(TAPE 1)(Q(J,K,1,L),X(J,K,L),Y(J,K,L),Z(J,K,L))
   10 CONTINUE
```

the data stored into each of the arrays $Q, X, Y$, and $Z$ will not be in contiguous locations. Arrays $X, Y$, and $Z$ will each have data stored in the first 50 elements (a half-column) of the first 50 columns of the first 50 planes. But, since the dimensions are $100 \times 100 \times 100$, each of the first 50 planes will have half-columns of data followed by half-columns of empty space for the first 50 columns, and this will be followed by a half-plane (5000 elements) of empty space before the data for the next plane begins. In addition, the last 50 planes will be totally empty (500,000 elements), making a total of 875,000 ($50 \times 50 \times 50+100 \times 50 \times 50+100 \times 100 \times 50$) empty elements scattered discontiguously through each of the $X, Y$, and $Z$ arrays (1,000,000 elements each). In a similar manner (but more complex because of a fourth dimension), the $Q$ array will have a total of 5,875,000 empty memory elements out of the total of 6,000,000. Again, these will be discontiguous, in sizes from 50 to 3,055,050 elements.
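
These counts can be checked by enumerating the storage offsets that the READ loop actually fills. The Python sketch below is an independent check, not code from the study; it assumes standard FORTRAN column-major storage and uses hypothetical function names. It reproduces the 875,000 and 5,875,000 empty elements quoted above, and the 50 to 3,055,050 range of gap sizes in $Q$.

```python
from itertools import product


def offset(subscripts, dims):
    """0-based column-major (FORTRAN order) offset of 1-based subscripts."""
    off, stride = 0, 1
    for s, d in zip(subscripts, dims):
        off += (s - 1) * stride
        stride *= d
    return off


def gap_summary(used_offsets, total):
    """Return (empty elements, smallest gap, largest gap)."""
    gaps, prev = [], -1
    for u in sorted(used_offsets):
        if u - prev > 1:
            gaps.append(u - prev - 1)
        prev = u
    if total - 1 > prev:
        gaps.append(total - 1 - prev)      # empty tail after the last datum
    return sum(gaps), min(gaps), max(gaps)


filled = range(1, 51)                      # JMAX = KMAX = LMAX = 50
x_used = [offset((j, k, l), (100, 100, 100))
          for j, k, l in product(filled, repeat=3)]
q_used = [offset((j, k, 1, l), (100, 100, 6, 100))
          for j, k, l in product(filled, repeat=3)]
x_stats = gap_summary(x_used, 100 ** 3)
q_stats = gap_summary(q_used, 100 * 100 * 6 * 100)
print(x_stats, q_stats)
```

The smallest gap in either array is the 50-element empty half-column, which is why no vector longer than 50 elements can be formed without a gather.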

This leads to memory space being wasted because of the 'static' definition imposed by the DIMENSION statement. In addition, the longest vector possible in this case (without performing a gather operation) would be 50 elements. If the data could be stored contiguously as though the DIMENSION statement in this instance were

    DIMENSION Q(50,50,6,50)

subarray operations of the form

$$
Q(*, *, 1,1)=Q(*, *, 1,1) * * 2
$$

would invoke a single vector operation of length $50 * 50$ elements.

Most FORTRAN compilers provide for variable dimensioning in subprograms as:

    DIMENSION A(L,M,6,N)

where L, M, and $N$ are normally passed as parameters. However, the dimensions normally cannot be changed during execution of the subroutine. A preferred method would be the ability to 'reshape' arrays dynamically during any subprogram execution to maximize the use of the FMP Vector Units, and to improve the data storage demands.

The second difficulty revolves around the desirability of transforming scalar algorithms as directly as possible, with little intervention. For example, the original code includes:

        DIMENSION A(100,100,100),B(100,100,100)
        .
        DO 10 J=1,100
        DO 10 K=1,100
        DO 10 L=1,100
        A11=A(J,K,L)
        A12=A(J,K-1,L)
        A13=A(J,K+1,L)
        A(J,K,L)=A11*(A12+A13)
    10  CONTINUE

This loop could be vectorized in $J$ but not in the $K$ direction because of the recursion there. For simplicity's sake it would be desirable to let the original scalar variables A11, A12, and A13 become vector variables of length 100, without the need to insert a new dimension statement as might be normally required:

```
      DIMENSION A(100,100,100),B(100,100,100)
      DIMENSION A11(100),A12(100),A13(100)
      .
      DO 10 K=1,100
      DO 10 L=1,100
      A11(*)=A(*,K,L)
      A12(*)=A(*,K-1,L)
      A13(*)=A(*,K+1,L)
      A(*,K,L)=A11(*)*(A12(*)+A13(*))
   10 CONTINUE
```
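
The claim that the $J$ direction can be vectorized while $K$ must remain sequential can be checked on a small mesh. The Python sketch below is an illustration with hypothetical names. It restricts $K$ to interior values so that the $K-1$ and $K+1$ references stay in bounds, which is an assumption made here because the listing above runs $K$ over the full range. It compares the element-by-element sweep with a sweep that processes a whole $J$ slice at a time.

```python
def make_mesh(n):
    """Small integer test mesh indexed as a[j][k][l] (0-based)."""
    return [[[(j + 2 * k + 3 * l) % 5 + 1 for l in range(n)]
             for k in range(n)] for j in range(n)]


def scalar_sweep(a):
    """Original ordering: J outermost, one element at a time."""
    n = len(a)
    for j in range(n):
        for k in range(1, n - 1):          # interior K only (assumption)
            for l in range(n):
                a11, a12, a13 = a[j][k][l], a[j][k - 1][l], a[j][k + 1][l]
                a[j][k][l] = a11 * (a12 + a13)
    return a


def slice_sweep(a):
    """Vector form: K and L loops remain, each statement covers all J."""
    n = len(a)
    for k in range(1, n - 1):              # K stays sequential: it is recursive
        for l in range(n):
            a11 = [a[j][k][l] for j in range(n)]
            a12 = [a[j][k - 1][l] for j in range(n)]
            a13 = [a[j][k + 1][l] for j in range(n)]
            for j, value in enumerate(x * (y + z) for x, y, z in zip(a11, a12, a13)):
                a[j][k][l] = value
    return a


print(scalar_sweep(make_mesh(5)) == slice_sweep(make_mesh(5)))
```

The two sweeps agree because every element still sees an already updated $K-1$ neighbor and a not yet updated $K+1$ neighbor; only the order in which independent $J$ values are visited has changed.
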
To alleviate the difficulty involved in a vectorization effort, a new data type was created--DYNAMIC--which represents arrays and subarrays whose dimensions are established at execution time instead of at compile time. The first usage is shown by restating the previous example:

        DIMENSION A(100,100,100),B(100,100,100)
        DYNAMIC A11,A12,A13
        .
        DO 10 K=1,100
        DO 10 L=1,100
        A11=A(*,K,L)
        A12=A(*,K-1,L)
        A13=A(*,K+1,L)
        A(*,K,L)=A11*(A12+A13)
    10  CONTINUE

Here the original scalar variables have been declared DYNAMIC, meaning that they become pointers to subarrays which will be allocated at execution time in the area of memory called 'DYNAMIC SPACE'. This memory area is what remains in each hierarchical level memory after all code, scalar variables, statically dimensioned arrays, buffers, and sundry system data are allocated.

The beginning of current dynamic memory is pointed at by a canonical register in the FMP register file called the Dynamic Space Pointer. As data space is needed for temporary variables and vectors by the object code, space is allocated and the pointer updated. In this example 100 words of dynamic space would be allocated for each of the dynamic variables A11, A12, and A13. The dynamic space pointer would be updated to point at the next free space, and the data movement from the array $A$ to each of the respective 'slices' would be initiated.

The variables A11, A12, and A13 would be assigned a 'shape' with a single dimension of length 100. (A dynamic variable may have up to seven dimensions ascribed to it to represent the 'shape' of the data area in dynamic space being pointed at.) Figure 18 gives a representation of the memory allocation of the pointers which are the DYNAMIC variables, and the data area being pointed at by the DYNAMIC variables.

Note that the shape of each of the DYNAMIC variables A11, A12, and $A 13$ is established implicitly by the variable appearing as the object of an arithmetic assignment statement, whose source is a subarray or subarray result. The shape can be changed implicitly as many times as the variable appears as an object of an assignment statement:

```
      DIMENSION A(10,10)
      DYNAMIC X
      .
      X=A(*,1)
      X=A(*,*)
      .
      X=A(2:5,2:5)
```

In the first appearance in this example, the variable $X$ becomes a pointer with a shape of one dimension and a length of 10.

At the next occurrence $X$ becomes 'reshaped' into a two-dimensional data space with dimensions 10 by 10 elements. Finally, the last subarray reference again reshapes $X$ as a two-dimensional data space of length 4 by 4 elements.


Figure 18. Memory Allocation and Assignment of DYNAMIC Variables

The implicitly defined shape can be 'passed' on to other DYNAMIC variables as in:

        DIMENSION A(10,10)
        DYNAMIC X,Y,Z
        .
        X=A(*,1)
        Y=X*X
        Z=Y**2

In this example $Y$ and $Z$ take on the same shape as $X$, and new data space is allocated for each in dynamic space before the calculation is performed.

Conformability:
Implicitly defined DYNAMIC variables as shown so far must obey the rules for conformability when appearing as the source operands for assignment statements, but obviously when they appear as objects of assignment statements they are always reshaped, and thus automatically obey the conformability rules for matrix operations:

        DIMENSION A(10,10)
        DYNAMIC X,Y,Z
        .
        X=A(*,1)
        Y=X*A(*,2)

The multiplication of the subarray $A(*, 2)$ by the DYNAMIC variable $X$ is conformable because all dimensions are congruent. The statement:

$$
Y=X * A(1, *)
$$

would also be conformable since the subarray $A(1, *)$ is a one-dimensional vector of length 10 (albeit requiring a gather operation to form the vector) as is the subarray $A(*, 1)$. However, the statement

$$
Y=X * A(2: 5,2: 5)
$$

is non-conformable because the array reference is static and cannot be reshaped, and does not match the shape of $X$ in dimensions or size. This occurrence would also cause a fatal object-time diagnostic.
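
The shape bookkeeping behind these conformability cases can be sketched in Python. This is a simplified model with hypothetical names, not the FMP run-time mechanism. It assumes that a fixed subscript removes its dimension from the resulting shape, which is consistent with $A(*, 1)$ and $A(1, *)$ both being described above as one-dimensional vectors of length 10.

```python
FULL = "*"


def section_shape(declared, *subscripts):
    """Shape of a subarray reference.

    declared   -- the DIMENSION extents, e.g. (10, 10)
    subscripts -- one per dimension: "*" (full range), an int (fixed
                  subscript, which drops that dimension), or a tuple
                  (n1, n2) or (n1, n2, n3)
    """
    shape = []
    for extent, sub in zip(declared, subscripts):
        if sub == FULL:
            shape.append(extent)
        elif isinstance(sub, tuple):
            n1, n2 = sub[0], sub[1]
            n3 = sub[2] if len(sub) == 3 else 1
            shape.append((n2 - n1) // n3 + 1)
    return tuple(shape)


def conformable(left, right):
    """Two operands conform when their shapes match dimension for dimension."""
    return left == right


A = (10, 10)
x_shape = section_shape(A, FULL, 1)                       # X=A(*,1)
print(conformable(x_shape, section_shape(A, FULL, 2)))    # Y=X*A(*,2)
print(conformable(x_shape, section_shape(A, 1, FULL)))    # Y=X*A(1,*)
print(conformable(x_shape, section_shape(A, (2, 5), (2, 5))))
```

In this model the third test fails for both reasons given above: the operand has two dimensions rather than one, and 16 elements rather than 10.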

### 3.2.4.3 EXPLICIT DEFINITION OF DYNAMIC VARIABLES

Rationale:
There are instances when the reshaping of DYNAMIC variables should be more controlled than that which is permitted by implicit definitions arising from arithmetic assignment statements with DYNAMIC variables as objects. In addition, many times it is desirable to neither allocate space in dynamic space nor to move data to a working space unnecessarily, despite the fact that it is dynamically structured. To provide more explicit control over DYNAMIC variable definitions the DEFINE statement is provided:

        DIMENSION A(10,10)
        DYNAMIC X,Y,Z
        .
        DEFINE (X,A(*,1))
        .
        X=X**2

This sequence causes the variable $X$ to become a pointer to the subarray $A(*, 1)$ which is actually in place in the array $A$, rather than in dynamic space. The DEFINE statement is an executable statement which may appear any place in a FORTRAN program where any other executable statement may appear. Upon execution, it explicitly shapes the object dynamic variable $X$ and assigns the address of the subarray within A. Figure 19 shows a representation of this definition in physical memory. No data motion takes place as a result of the DEFINE statement.

The subsequent arithmetic statement $X=X * * 2$ then performs the squaring of the subarray $A(*, 1)$ and replaces the original subarray with the result, no dynamic space being allocated. The statement

$$
X=A(*, 3)
$$

would replace the subarray $A(*, 1)$ with the subarray $A(*, 3)$.
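
The in-place behavior of an explicitly defined variable can be modeled as a view onto the parent array's storage. The Python sketch below is a loose analogy under stated assumptions: `Section` and `column` are hypothetical names, the array is held as a flat column-major list, and nothing here reflects the actual FMP pointer format.

```python
class Section:
    """A view of selected elements of a flat, column-major storage list."""

    def __init__(self, storage, offsets):
        self.storage = storage      # shared with the parent array, never copied
        self.offsets = offsets

    def values(self):
        return [self.storage[o] for o in self.offsets]

    def assign(self, new_values):
        new_values = list(new_values)
        if len(new_values) != len(self.offsets):
            raise ValueError("non-conformable assignment")
        for o, v in zip(self.offsets, new_values):
            self.storage[o] = v


def column(rows, col):
    """Offsets of A(*,col) for an array with `rows` rows (col is 1-based)."""
    start = (col - 1) * rows
    return list(range(start, start + rows))


A = [float(n) for n in range(100)]            # storage of A(10,10)
X = Section(A, column(10, 1))                 # DEFINE (X,A(*,1)): no data moves
X.assign(v ** 2 for v in X.values())          # X=X**2 squares column 1 of A in place
squared = A[:10]
X.assign(Section(A, column(10, 3)).values())  # X=A(*,3) overwrites column 1 of A
print(squared, A[:10])
```

Because the shape of the view is fixed, an assignment of any other length is rejected rather than reshaping the target, in line with the rule for explicitly defined variables.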


Figure 19. Alternative Memory Allocation Using DEFINE Statement

Explicitly defined DYNAMIC variables can only be reshaped by execution of another DEFINE statement, and not by appearing as the object of an assignment statement. Thus such explicitly defined variables must obey the conformability rules on both sides of the equal sign in an assignment statement:

        DIMENSION A(10,10)
        DYNAMIC X,Y,Z
        DEFINE (X,A(*,1))
        .
        Y=A(*,2)
        .
        X=A(*,3)
        .
        Y=A(2:5,2:5)
        .
        X=A(2:5,2:5)

The reshaping of $Y$ is permitted because it is implicitly defined; however, the reshaping of $X$ by $X=A(2: 5,2: 5)$ is not permitted since its shape has been fixed by an explicit DEFINE. The execution of another DEFINE statement:

        DEFINE (X,A(2:5,2:5))

would change the shape (and size) legally.

Forms:
The variety of permitted statement forms for DYNAMIC and DEFINE is detailed in the FORTRAN specification (Volume III, page 6-3.2A and page 10-3.1A, respectively). Basically the forms are

    DYNAMIC Vnam1,Vnam2,...,Vnamn

where Vnam1...Vnamn represent a list of variable or array names which are to be established as dynamic pointers.

    DEFINE (DVnam,subarray reference),(DVnam,.......
    DEFINE (DVnam,DVnam),(DVnam,.......
    DEFINE (DVnam,(subarray subscripts)),(.......
    DEFINE (DVnam,DAnam(subscripts))

where DVnam is any DYNAMIC VARIABLE name and DAnam is any DYNAMIC ARRAY name.

The first DEFINE form has already been illustrated. Note that the objects (left-hand member of each pair) must be a DYNAMIC variable or an element of a DYNAMIC ARRAY.

The second form is used to make one DYNAMIC variable synonymous with another. This means synonymous but not identical:

        DIMENSION A(10,10)
        DYNAMIC X,Y,Z
        .
        .
        DEFINE (X,A(*,1))
        DEFINE (Y,X)
        DEFINE (X,A(*,4))

In this example $X$ is first defined as pointing to the first column of $A$. $Y$ is then made synonymous to $X$, which means the pointers for $Y$ are set the same as those for $X$. However, the last statement redefines $X$ while the definition for $Y$ will still point to $A(*, 1)$.
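
The distinction between synonymous and identical amounts to copying the contents of a pointer rather than linking two names together. A minimal Python sketch, assuming a descriptor can be represented as a simple tuple (the `define` function and the tuple layout are hypothetical):

```python
def define(pointers, name, target):
    """Model of DEFINE (name, target).

    target is either a descriptor tuple such as ("A", "*", 1) or the name
    of another DYNAMIC variable whose descriptor is to be copied.
    """
    if isinstance(target, str):
        pointers[name] = tuple(pointers[target])   # copy the pointer contents
    else:
        pointers[name] = tuple(target)


pointers = {}
define(pointers, "X", ("A", "*", 1))   # DEFINE (X,A(*,1))
define(pointers, "Y", "X")             # DEFINE (Y,X)
define(pointers, "X", ("A", "*", 4))   # DEFINE (X,A(*,4))
print(pointers["Y"], pointers["X"])
```

After the third statement the two descriptors differ: the copy held by Y still describes the first column, as stated above.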

The third form of the DEFINE is used to 'fix' a dynamic allocation of data and definition of the DYNAMIC variable's 'shape'.

    DEFINE (DVnam,(subarray subscripts)),(....

In this case the DVnam may be the name of a DYNAMIC variable or a DYNAMIC ARRAY element (see next section for DYNAMIC ARRAYS). The subarray reference may include up to seven subscript expressions, any one of which may be a subarray form:

    DYNAMIC A,B
    DEFINE (A,(1:10,1:10,1:10))
    DEFINE (B,(I:J,1:100,1:100))

The first DEFINE statement causes an allocation of 1000 elements of dynamic space to the variable A, with the shape of $10 * 10 * 10$ elements. No data movement takes place. In this regard, DEFINE performs the same function as an 'implicit' definition but without the data movement. The resulting memory allocation and pointer setup is the same as shown in figure 18 for implicit definition. The major difference here is that the variable $A$ cannot be reshaped by appearing in an assignment statement. The statement

$$
A=X(1: 5,5: 10,10: 100)
$$

would therefore result in an object-time fatal diagnostic message.

The second DEFINE statement results in a similar allocation of dynamic memory and the shaping of the variable $B$, but the amount of space allocated and the shape will not be known until the object-time execution of the DEFINE is accomplished and the values of the variables $I$ and $J$ are known.

The remaining forms will be discussed with the final construct of these extensions--DYNAMIC ARRAYS.

### 3.2.4.4 DYNAMIC ARRAYS

Rationale:
As seen from the previous discussion and by a glance at the recoding of BTRI in the implicit code, much scalar code can be readily transformed into vector code while preserving its original appearance. There are cases that cause some amount of distress, however. Take the example:

        DIMENSION A(100,100),B(100,100),F(5)
        .
        DO 10 I=1,5
        DO 10 J=1,100
        DO 10 L=1,100
    10  F(I)=A(J,L)+B(J,L)

If the loop is to be vectorized in both $J$ and $L$, the variable $F$ must be redefined as a series of 5 vectors, each with length 10000. To define $F$ as a DYNAMIC variable, five occurrences of the variable F, one through five, must be provided. This creates the concept of an array of DYNAMIC variable pointers called a DYNAMIC ARRAY. This is accomplished by permitting the DYNAMIC statement forms

    DYNAMIC A,B,F(5)

or

    DIMENSION F(5)
    DYNAMIC A,B,F

This will create an array space for pointers which may be addressed by subscripting the variable name $F$. An implicit definition of a DYNAMIC array element can thus be done as

        DIMENSION A(10,10,10),B(10,10,10)
        DYNAMIC F(5)
        .
        F(3)=A(*,*,*)

The third element in the dynamic array $F$ will be allocated dynamic space and given the same shape as the static array A, $10 * 10 * 10$ elements. Note that each of the pointer elements in $F$ need not be related, spatially or logically, to any other element. Thus additional statements

        .
        F(1)=B(*,1,*)
        .
        F(2)=A(1:5,5:10,5:10)

would set up other pointers in $F$ to allocated space in dynamic memory that may not be contiguous--that is, the physical memory allocated for $F(1)$ may not be contiguous with the physical memory allocated for $F(2)$, since the allocation is done dynamically as the respective assignment statements are encountered during execution of the program.

### 3.2.4.5 EXPLICIT DEFINITION OF DYNAMIC ARRAY ELEMENTS

The forms of DEFINE statement given in the preceding section include the meta-symbol 'DVnam' representing a DYNAMIC variable name. In the preceding examples this was limited to simple DYNAMIC variables. It should be obvious that any DYNAMIC array element can be used in place of such simple DYNAMIC variables:

```
DYNAMIC A,B(10)
DEFINE (B(3),(1:10,1:10,1:10))
DEFINE (A,B(3))
DEFINE (B(I),(J:K,J:K,1:5))
```

In these examples the DYNAMIC array elements $B(n)$ can be used interchangeably with scalar DYNAMIC variables in the first three DEFINE forms shown.

The last (fourth) DEFINE form

    DEFINE (DVnam,DAnam(subscripts))

has a special function which requires that DVnam must be a simple DYNAMIC variable. The need arises because a subarray reference is often desired for a dynamically assigned variable:

        DIMENSION A(10,10,10)
        DYNAMIC X,Y
        DEFINE (X,A(*,1,1))
        .
        Y=X(1:5)

In this case, access is wanted to a subarray of the dynamically defined variable $X$ (which is itself a subarray of the static array A). If, however, access is desired to a subarray of a DYNAMIC array element, the construct would have to look like

$$
Y=Z(3)(1: 5)
$$

where the third element of the DYNAMIC array $Z$ is used instead of the simple DYNAMIC variable $X$. This construct is considered messy to read, and makes FORTRAN scanning and error detection quite difficult in the general case. Therefore the methodology for getting at subarrays of DYNAMIC array elements requires 'aliasing' the DYNAMIC array element to a simple DYNAMIC variable and then using the variable:

```
DIMENSION A(10,10,10)
DYNAMIC X,Y,Z(5)
DEFINE (Z(3),A(*,1,1))
DEFINE (X,Z(3))
.
Y=X(1:5)
```

In this case the variable $X$ was made synonymous with the shape and location of the third element of the DYNAMIC array Z. This 'aliased' variable $X$ is then used as the basis for the subarray reference in the assignment statement which moves data to $Y$.

This methodology makes some references somewhat cumbersome at first sight, but the usage is normally limited to several instances in a program and doesn't appear to create a great burden on the programmer.

An important sidelight to the use of DYNAMIC arrays was mentioned previously, that being the fact that no single DYNAMIC array element need be related to any other. Using DEFINE statements one can establish particular relationships between an entire DYNAMIC array and the object that its independent elements are describing. For example

        DIMENSION A(100,100,100)
        DYNAMIC F(6),B
        DEFINE (F(1),A(1,*,*))
        DEFINE (F(2),A(100,*,*))
        DEFINE (F(3),A(*,1,*))
        DEFINE (F(4),A(*,100,*))
        DEFINE (F(5),A(*,*,1))
        DEFINE (F(6),A(*,*,100))
        .
        B=F(4)*F(5)

This example assigns each of the exterior planes of the mesh to a different $F$ pointer. The arithmetic expression then performs operations on the planes as shown. An interesting side effect of this is a very powerful statement:

$$
F=F * * 2
$$

wherein all subarrays in $F$ are processed with a single arithmetic statement.

### 3.2.4.6 SUBROUTINE COMMUNICATION OF DYNAMIC VARIABLES

The processing of DYNAMIC variables and DYNAMIC array elements proceeds interpretively at execution time, using pointer information in a set of 16 words allocated to each variable or element. Fourteen words contain the shape of each of 7 dimensions (in subarray notation, the starting index, ending index, and increment); the remaining two words contain a count of the number of active dimensions, the base address of the beginning of the subarray, and the maximum space allocated for this variable's data in dynamic space. Thus, when a DYNAMIC variable appears in COMMON or as a parameter, 16 words are set aside for each DYNAMIC element. Obviously, such elements must be defined as DYNAMIC in all routines using them, and in the case of COMMON block transmission of data, any routine having a COMMON block containing a DYNAMIC variable must define the variable as DYNAMIC.
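
The information carried by such a pointer set can be pictured with the following Python sketch. It models only the logical fields named above (per-dimension start, end, and increment; the count of active dimensions; base address; maximum allocated space). The class and field names are hypothetical, and the way these values are packed into the 16 physical words is not modeled here.

```python
MAX_DIMENSIONS = 7


class DynamicDescriptor:
    """Logical contents of the pointer set for one DYNAMIC variable."""

    def __init__(self, base_address, max_space, dims):
        dims = [tuple(d) for d in dims]
        if len(dims) > MAX_DIMENSIONS:
            raise ValueError("a DYNAMIC variable may have at most 7 dimensions")
        for start, end, increment in dims:
            if increment <= 0 or start > end:
                raise ValueError("invalid subscript triplet")
        self.base_address = base_address
        self.max_space = max_space
        self.dims = dims                    # one (start, end, increment) per dimension

    @property
    def active_dimensions(self):
        return len(self.dims)

    def element_count(self):
        """Number of elements described by the current shape."""
        count = 1
        for start, end, increment in self.dims:
            count *= (end - start) // increment + 1
        return count


d = DynamicDescriptor(base_address=0, max_space=1000, dims=[(1, 10, 1)] * 3)
print(d.active_dimensions, d.element_count())
```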

DYNAMIC variable names may not appear in EQUIVALENCE statements. When such variables appear in INPUT/OUTPUT statements they point at the data to be transmitted; the pointers themselves are never output or input, since their life is limited to the dynamic execution environment of the operating program. Debugging tools permit the programmer to examine and modify the shape described by the DYNAMIC variables, however.

### 3.2.5 COMPILER STRATEGY

The FMP FORTRAN compiler will consist of all facilities already expected for FORTRAN compilers for large scale computers in the 1970s for the validation and evaluation of source language statements, generation and scheduling of object code, and production of diagnostics and debugging aids for the user's programs. In addition, the compiler must be able to accept and evaluate the several FMP language extensions discussed above. The most crucial objectives for the compiler revolve around its ability to optimize the parallel execution of the Map and Vector Unit instructions. Illustrations of the implicit code in this report have assumed that the compiler would be able to generate the appropriate object code to achieve the maximum overlap. It has been further stated that the compiler could not be expected to automatically recognize the optimum "slabbing" and create object code to perform the "slabbing" functions without the assistance of the programmer. To that end the language extensions were created to make the hierarchical memory and data structures visible to the programmer and to make the programmer responsible for the management of data mapping.

In exchange for this, the compiler is expected to organize and schedule the code and to maximize machine performance. The first part of the recoded STEP routine (appendix D) will now be examined to see what the compiler must do in code generation and to estimate the final code performance. In section 5 the actual simulation of this code will be covered.

### 3.2.5.1 DO LOOP "GET READY"

Most modern compilers generating code for multi-register machines are capable of generating "prefetch" (load from memory) instructions which are extracted from the DO loop. These instructions are scheduled to be issued before entering the DO loop. Then, at the top of the actual executing DO loop, another set of fetch operations is created whose intent is to load the data for the next pass through the loop. This technique reduces the wait time required while scalar data is being transferred from memory to the high speed registers.

The counterpart of this method for the FMP is to generate the map instructions for the first pass through the loop, then at the beginning of the actual loop to place a set of map operations for data to be used on the next pass through the loop. For example, lines 930 through 960 in appendix D are map operations which become gather record functions with record lengths of LSL*KMAX elements. JMAX records are gathered for each map operation to form the vectors that are processed in the balance of the metric computation (in-line XXM). The compiler will generate these four map operations with the destination data going into Main Memory at the locations designated by descriptors called, respectively, RJ, XKL, YKL, and ZKL. These instructions will be scheduled to be executed ahead of the executable DO loop that begins at statement 830.

In addition, another set of map operations will be generated and scheduled after statement 830. These map operations will perform the data transfers from LEVEL 2 storage for the next pass through the DO loop. The destination areas for the data from these map operations will be designated by a set of auxiliary pointers, invisible to the programmer, which could be called RJ', XKL', YKL', and ZKL'. At the end of a particular pass through the loop, these pointers are exchanged for those which point to data areas RJ, XKL, YKL, and ZKL. In this way, at the expense of a brief exchange of pointer data, the map operations can be overlapped.
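The pointer exchange can be sketched in Python as a two-buffer loop. This is a sequential illustration only: `map_slab` and `sweep` are hypothetical names, the "map operation" is a plain list slice, and the real overlap between the Map Unit and the Vector Unit is not modelled.

```python
def map_slab(source, start, length):
    """Stand-in for a map (gather) operation from LEVEL 2 storage."""
    return source[start:start + length]

def sweep(source, slab_len):
    """Process `source` slab by slab using two buffers whose roles are swapped."""
    results = []
    current = map_slab(source, 0, slab_len)           # issued ahead of the loop
    for start in range(0, len(source), slab_len):
        # Map operation for the NEXT pass, issued at the top of this pass.
        shadow = map_slab(source, start + slab_len, slab_len)
        # Vector work on the slab that is already resident.
        results.append(sum(x * x for x in current))
        # End of pass: exchange the pointers, not the data.
        current, shadow = shadow, current
    return results

print(sweep(list(range(6)), 2))
```

Only the two names are exchanged at the end of each pass; no slab data is copied, which is the point of the auxiliary pointers.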

Note that a total of 14 slabs of data have to be moved from Intermediate Memory to Main Memory, but there is no data interdependence among these various slabs. Thus each map operation would carry a different dependency key and thus each can be issued immediately to the Map Unit, up to the extent of the queueing buffer in that unit. Assume, for example, that the map operation at line 940

$$
\text{XKL=X(*,L-1:L+LSL+1,*)}
$$

is generated with a dependency key of "01". Instructions continue to be issued after this map operation until the first arithmetic instruction in the loop (line 970) is encountered. Since this instruction references array XKL it will also have a dependency key of "01" imbedded in it. This instruction would then be held up until the map operation with that key is complete. The next arithmetic operation would then be released to the Vector Processor and the process continued. In this case the map operation at line 950

$$
\text{YKL=Y(*,L-1:L+LSL+1,*)}
$$

might carry a dependency key of "02". Then the vector operation using YKL (line 980) would be held up until the corresponding map operation is complete. All of these dependencies would exist only during the first pass of the loop, since on subsequent passes the data would already have been mapped in by the overlapped map instructions to RJ', XKL', YKL', etc.
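The effect of the dependency keys on the first pass can be approximated with a small Python model. It is a deliberate simplification under stated assumptions: the function `schedule` is hypothetical, the Map Unit is assumed to work through its queue one operation at a time, Vector Unit busy time is ignored, and the cycle counts in the example are arbitrary.

```python
def schedule(instructions, map_cycles):
    """Return the earliest cycle at which each vector instruction may be released.

    instructions: list of (unit, name, key) in issue order; unit is "MAP" or "VEC".
    map_cycles:   cycles needed by each map operation, keyed by name.
    A vector instruction waits only on the map operation that carries its key.
    """
    map_done = {}       # key -> cycle at which the map with that key completes
    map_free_at = 0     # cycle at which the Map Unit finishes its queue so far
    release = {}
    for unit, name, key in instructions:
        if unit == "MAP":
            map_free_at += map_cycles[name]
            map_done[key] = map_free_at
        else:
            release[name] = map_done.get(key, 0)   # no matching map: no wait
    return release

program = [
    ("MAP", "XKL", "01"),
    ("MAP", "YKL", "02"),
    ("VEC", "XK", "01"),   # references XKL, so it carries key "01"
    ("VEC", "YK", "02"),   # references YKL, so it carries key "02"
    ("VEC", "S", "00"),    # no mapped operand
]
print(schedule(program, {"XKL": 300, "YKL": 300}))
```

In this model the instruction using XKL is released as soon as the first map completes, while the one using YKL waits for the second.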

With this form of code generation one can see that not only are almost all map operations overlapped, but due to the action of the compiler and the dependency keys, a good deal of the prefetch map operations have some of their execution overlapped, so that loop startup is minimized. For example, consider the previous sequence. Before the first arithmetic operation can begin, the entire slab for XKL must be mapped into Main Memory. This gather record operation will require JMAX\*KMAX\*(LSL+2)\*3/8 clock cycles to complete. Once this data has been moved the next map operation can begin for the YKL slab. Meanwhile the subtract and multiply operation

$$
\text{XK=(XKL(3:KMAX+2,2:LSL+1,*)-XKL(1:KMAX,2:LSL+1,*))*DY2}
$$

will be initiated. The time required for this instruction would be

$$
\text{JMAX*KMAX*LSL/8 clock cycles.}
$$

This is approximately 3 times faster than the concurrent map operation; thus the next arithmetic function:

$$
\text{YK=(YKL(3:KMAX+2,2:LSL+1,*)-YKL(1:KMAX,2:LSL+1,*))*DY2}
$$

must await the completion of the corresponding map operation. This wait for the dependency key to become free continues for statement 990 as well. Thus on the second through the last time through the loop, the execution of statements 970 to 990 requires

$$
\text{3*JMAX*KMAX*LSL/8 cycles,}
$$

but on the first pass (due to the wait for data from LEVEL 2 memory) these same statements require

$$
\text{3*JMAX*KMAX*(LSL+2)*3/8+JMAX*KMAX*LSL/8 clock cycles.}
$$

The statements at 1000 through 1020 also reference the already mapped arrays XKL, YKL, and ZKL and thus proceed at maximum rate. Meanwhile the map operation

$$
\text{RJ=Q(2:KMAX-1,L:LSL,6,*)}
$$

would be issued with a dependency key of, say, "04", and be fully overlapped with the operations at 1000 through 1020. If everything issues without hidden conflicts, the statements at 1030 through 1050 should be executed without waiting, since the data RJ should be completely mapped into Main Memory by the time the Vector Processor is ready to execute statement 1030. If an approximation of three vector instructions for each gather record (map) instruction is used, assuming equal vector lengths, it can then be seen that the map instruction for

$$
\text{RR=1./Q(2:KMAX-1,L:L+LSL,1,*)}
$$

will have been completed by the time the Vector Processor is ready to proceed with the divide operation. The same is true for the other map operations used to gather data for arithmetic in statements 1150 through 1210. It is possible then to compute the prefetch overhead for the first pass through the J sweep and L sweep loops. This becomes

$$
\text{JMAX*KMAX*(7*LSL+18)/8 + JMAX*LMAX*(7*KSL+18)/8 clock cycles.}
$$

The important thing to note here is that this overhead only occurs during the first trip through the first and third loops, and constitutes the only visible cost of performing map operations on the FMP; it results from the compiler scheduling map instructions so that the set of map data required for the next trip is available before being required. (The overhead for loop 2 is discussed in the next paragraph.) Thus in the example, where JMAX = KMAX = LMAX is used, the number of slabs (or trips through the loops) is 17. The overhead shown is therefore amortized over all of these trips through the loop to complete the sweep.

For the $K$ sweep direction, the single-element gather operation dominates the loop overhead to a much greater degree; all of the operations for statements 2090 through 2330 are completely constrained by their corresponding map operations. In this case the compiler will generate 23 arithmetic operations plus one divide, with the divide requiring the same number of cycles as two arithmetic operations. Thus the arithmetic time would be approximately

$$
\text{25*KMAX*LMAX*JSL clock cycles}
$$

while the Map Unit, in performing the corresponding 9 gather element operations, would take

$$
\text{9*KMAX*LMAX*(JSL+2)*6 clock cycles.}
$$

The overhead for the first pass would then be

$$
\begin{gathered}
\text{54*KMAX*LMAX*(JSL+2) - 25*KMAX*LMAX*JSL =} \\
\text{KMAX*LMAX*(29*JSL+108) clock cycles.}
\end{gathered}

Total overhead for all three sweep directions then becomes

$$
\begin{aligned}
& \text { OVERHEAD }=\text { KMAX*LMAX* }(29 * J S L+108)+\text { JMAX*KMAX* }(7 * L S L+18) / 8 \\
& + \text { JMAX*LMAX* }(7 * K S L+18) / 8 \text { clock cycles. }
\end{aligned}
$$
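The overhead terms can be checked numerically. The Python sketch below recomputes them from the per-statement cycle counts quoted in the text; the function names are hypothetical, exact fractions are used to avoid rounding, and the sample dimensions in the usage line (100 per side, slabs of 6) are illustrative values, not figures taken from the report.

```python
from fractions import Fraction

def slab_startup_overhead(n1, n2, slab):
    """First trip minus later trips for statements 970 to 990 (J and L sweeps)."""
    map_time = Fraction(n1 * n2 * (slab + 2) * 3, 8)   # one gather-record slab
    vec_time = Fraction(n1 * n2 * slab, 8)             # one vector instruction
    first_pass = 3 * map_time + vec_time
    later_pass = 3 * vec_time
    return first_pass - later_pass                     # equals n1*n2*(7*slab+18)/8

def k_sweep_overhead(kmax, lmax, jsl):
    """Nine gather-element maps minus 25 arithmetic-equivalent operations."""
    return 9 * kmax * lmax * (jsl + 2) * 6 - 25 * kmax * lmax * jsl

def total_overhead(jmax, kmax, lmax, jsl, ksl, lsl):
    """Sum over the three sweep directions, in clock cycles."""
    return (k_sweep_overhead(kmax, lmax, jsl)
            + slab_startup_overhead(jmax, kmax, lsl)
            + slab_startup_overhead(jmax, lmax, ksl))

print(total_overhead(100, 100, 100, 6, 6, 6))
```

The difference between the first-pass and later-pass counts reduces to the (7\*LSL+18)/8 form, and the K sweep difference reduces to the (29\*JSL+108) form, matching the terms of the OVERHEAD expression.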

### 3.2.5.2 INFERRED TRANSPOSE OF MATRIX

Aside from the judicious scheduling of map and vector instructions, the compiler has one additional burden placed upon it. The compiler must be able to evaluate loop 8 (statements 1900 through 2020) and generate the implied matrix transpose operations. The design of the Map Unit permits a single map instruction to perform the necessary transpose process. The compiler must be able to discern that construct from loops of the type shown in loop 8. The alternative is to add an extension which calls for the explicit transpose, but this is unnecessary in this case, since the loop describes the actions desired.

With these simple attributes the compiler should produce code that permits the FMP to achieve its maximum rate.

### 3.3 COMPILER FUNCTIONAL CHARACTERISTICS

Volume III contains a preliminary document describing the FMP FORTRAN language and its usage. In an attempt to match certain language features to the FMP architecture, the parallel nature of the processing was made visible to the programmer through the DYNAMIC and DEFINE statements and subarray references. The final responsibility for matching the problem statement to the FMP hardware rests with the language processor (compiler), however. To achieve the NASF effectivity goals, the characteristics of the compiler will have to be specified in detail by NASA. The following are suggested items that should be included for elaboration in any NASF compiler specification, or used in consideration of any compiler development proposal.

### 3.3.1 SOURCE CODE

The FMP compiler must be able to accept input, audit, and produce object code or diagnostics for the complete FMP FORTRAN language as described in Volume III. This language is based on ANSI FORTRAN 77, with extensions that Control Data feels should appear in standard compiler products, such as "hexadecimal" data types, plus extensions felt necessary for the FMP, such as subarray notation.

The compiler should provide a mode wherein all statements not conforming exactly to ANSI 77 standards will cause a warning message to be printed. In addition, the compiler may also provide a mode of operation wherein other extensions available in compilers operating at Ames (from DEC, IBM, and CDC) might be tolerated to provide internal programming compatibility at Ames. The extent of these latter augmentations is unknown at this time and they do not appear in the language description currently called FMP FORTRAN.

The 32/64-bit nature of the Control Data FMP is accommodated in two ways: through use of the HALF PRECISION data type and related FORTRAN supplied functions, and through the use of a compile-time option which can compile an entire program with full precision (normal mode) being 64-bit or 32-bit. In this latter case (32-bit = full precision) HALF PRECISION is considered 32-bit also (since there is no 16-bit floating-point format in the FMP).

The compiler must provide some form of "escape mechanism" to allow the programmer to invoke specific machine-language instructions (except for monitor mode instructions) when the occasion warrants. Although normal applications programs should not have to use such a facility, it is certain that some general applications modules will be "fine-tuned" by clever programmers and will need access to explicit instruction control. There are two means for this:
a) Standard subroutine or function CALL statements to machine-language subroutines--this is the conventional means for handling this problem; however, it implies access to a machine-language assembler by many programmers, and can cause havoc in a large system environment. A second drawback is the execution time cost involved in subroutine CALL sequences. If a programmer desires to invoke but a single peculiar machine function in his code, the encumbrances of writing two separate modules and taking on the subroutine switching overhead at "object time" may be excessive.
b) Imbedded, one-line machine-language statements in the FORTRAN source program, where the instruction uses the variable names and statement labels assigned by the programmer in his FORTRAN code--this approach was used in the STAR FORTRAN language employing the special call notation Q8xxxx where xxxx was a predefined machine-language instruction mnemonic. The parameters of the CALL were in fact the symbolic entries representing each field in an instruction. The machine-language instruction thus specified was then inserted directly into the FORTRAN object code at the point where it was invoked, without the need for an object-time call and return sequence to a separate subroutine. Although this has become a powerful tool in STAR-100 programming, there are several drawbacks:

- Possible misuse of machine resources--a programmer can unwittingly deadlock the FMP if allowed complete control over such instruction fields as the read and write dependency keys. By permitting access to register file oriented instructions, the programmer may accidentally overflow the available registers. Both of these difficulties can have disastrous consequences.
- Impeding the compiler's ability to generate optimum object code--injection of an "alien" instruction into a FORTRAN sequence may make it impossible for the compiler to automatically vectorize a particular code sequence. In other places such in-line invocation may disrupt the entire instruction scheduling process for scalar, map and vector operations.
c) A third, and the recommended, alternative is to provide a special call syntax similar to the Q8xxxx described above, but limit its application to invoking one of a set of predefined "special functions", which the user can specify at will, but which are imbedded in tables in the compiler, and totally under the compiler's control. An example might be

    PROGRAM DEMO
    DIMENSION A(100,100,100)
    DYNAMIC B
    .
    B=Q8XPOS(A)
    .

which would perform the transpose of the entire mesh A into the dynamic space $B$. The significant thing is that the method of implementation of the XPOS is left to the compiler, which might choose to use the map instruction, a series of vector operations, or even a scalar loop, depending on what else the compiler was scheduling in the FMP at that time.

The FMP compiler structure should permit the introduction of new functions of the Q8 type, by means of simple table entries that can be augmented as the user desires. The compiler would then attempt the in-line generation of appropriate code sequences from the table "skeletons".

A list of desired $Q 8$ functions has not been assembled at this time for the FMP, since it has sufficed, so far, to efficiently utilize the standard constructs and FMP extensions. The capability should be built into the compiler, however, along with a well defined procedure for requesting and specifying the desired Q8, FMP intrinsic function.

A most significant aspect of restricting the programmer's access to machine functions in this manner is that all FMP programs could be prototyped on machines other than the FMP, given that scalar sequences are implemented for $Q 8$ calls.
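A table-driven scheme of this kind can be sketched in Python. Everything here is hypothetical: the table `Q8_TABLE`, the dispatcher `q8`, and the `target` argument are invented for illustration, and the transpose is shown on a two-dimensional array rather than the full mesh. The sketch only shows how a scalar sequence registered for a Q8 name would allow the same source to be prototyped on a machine other than the FMP.

```python
def xpos_scalar(a):
    """Scalar-loop fallback for a hypothetical Q8XPOS on a 2-D array."""
    rows, cols = len(a), len(a[0])
    return [[a[r][c] for r in range(rows)] for c in range(cols)]

# Hypothetical compiler table: function name -> implementations keyed by target.
# An FMP entry would name a map-instruction skeleton instead of a Python function.
Q8_TABLE = {
    "XPOS": {"scalar": xpos_scalar},
}

def q8(name, *args, target="scalar"):
    """Expand a Q8 call from the table; names not in the table are rejected."""
    try:
        impl = Q8_TABLE[name][target]
    except KeyError:
        raise ValueError(f"Q8{name} is not defined for target {target!r}") from None
    return impl(*args)

print(q8("XPOS", [[1, 2, 3], [4, 5, 6]]))
```

Because every call goes through the table, a name the compiler does not know is refused rather than passed to the hardware, which is the restriction the recommended alternative relies on.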

### 3.3.2 OBJECT CODE

Specification of a compiler's object code by a customer is necessarily two-dimensional: volume and speed. For small-to-medium computer systems the amount of object code generated for a given application can significantly affect the space remaining for data. Even in these times of large, relatively inexpensive memory this is a continuing concern of many users. In the case of the FMP this is no longer true. Considering the amount of available Main Memory the expected object code is of such moderate proportions that this factor is not a consideration. Instead, on the FMP the concern is for:

1) maximization of concurrency,
2) use of memory space for problem data and temporary data.

To these issues a compiler specification might address itself with some of the standard verbiage:

- "The compiler will utilize the most advanced techniques for generation and scheduling of object code, including common subexpression analysis, invariant code removal, extended basic block optimization, and global analysis of all program modules submitted to a single compilation."
- "The compiler will minimize the amount of storage required to hold temporary vectors, and will optimize the utilization of the critical Main Memory resource."
- "The compiled object code should optimize throughput by maximizing the utilization of the Vector Units; optimization for other units is secondary."
- "The compiler should attempt to 'automatically vectorize' all DO loop constructs that do not include IF, GOTO, CALL, and function references, regardless of the presence of 'recursion' or non-unity increment values for the DO statement."
- "An option will be provided for the compiler to produce object code which uses entirely scalar sequences in place of either or both the Vector Units and Map Units."

The "God and Motherhood" nature of the preceding statements makes clear their purpose, and quite probable their inclusion, in any compiler requirement, proposal, and specification. What additional characteristics should be highlighted for the FMP FORTRAN compiler, however?

### 3.3.3 CONSTRUCTS

The compiler must be required to recognize certain source language constructs and from them derive FMP instructions. The simple DO loop construct

    DO 10 I=1,100
    10 A(I)=B(I)+C(I)

described above would yield a simple vector addition of the two arrays B and C in Main Memory, with the results going back to memory. By adding a simple statement

    DO 10 I=1,100
    C(I)=3.14159
    10 A(I)=B(I)+C(I)

the compiler should generate two concurrent operations - one map operation transferring the constant to Main Memory array $C$ and one vector add operation. Adding yet another statement

    DO 10 I=1,100
    D(I)=E(I)+F(I)
    C(I)=3.14159
    10 A(I)=B(I)+C(I)

the compiler should again produce two concurrent instructions: one map operation to broadcast the constant to the array C and a vector operation to perform the pair of vector adds simultaneously. A sequence such as

    DO 10 I=1,100
    D1(I)=E(I)*F(I)
    C(I)=3.14159
    10 A(I)=B(I)+C(I)*D1(I)

would also produce one vector instruction and one map instruction, with the vector instruction producing the result D1, which is stored to memory, then using that result internally to form the result A, all in one pass through the data.
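The last sequence can be mimicked in Python to show what "one pass through the data" means. The function `fused_pass` is a hypothetical stand-in: the broadcast of the constant plays the part of the map operation, and the single loop plays the part of the one vector instruction that both stores D1 and reuses it.

```python
def fused_pass(b, c_value, e, f):
    """One pass producing D1 = E*F and A = B + C*D1, with C broadcast."""
    n = len(b)
    c = [c_value] * n                 # the map operation: broadcast the constant
    d1, a = [], []
    for i in range(n):                # the single vector instruction
        prod = e[i] * f[i]            # D1(I), stored to memory ...
        d1.append(prod)
        a.append(b[i] + c[i] * prod)  # ... and reused without being reloaded
    return c, d1, a

print(fused_pass([1.0, 2.0], 2.0, [3.0, 4.0], [5.0, 6.0]))
```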

A more complex sequence must also be vectorized:

    J=1
    DO 10 I=1,100
    IF (A(I).GT.B(I)) GO TO 10
    C(J)=A(I)
    J=J+1
    10 CONTINUE

This construct and its variants should produce a map instruction which performs a vector compress operation, based on the stated conditions. Taking a key example from lines 1900 through 2020 of the implicit code in appendix B:

    DO 8 K=1,KMAX
    RJ(1:LMAX-2,1:JSL,K)=Q(K,2:LMAX-1,6,J:J+JSM)
    .
    8 CONTINUE

This sequence must produce a single map operation for each dynamic variable RJ, XJL, YJL, ZJL which performs the transpose of the arrays. The transpose is accomplished by a gather operation which, for each K, moves LMAX and JSL columns and rows into a new alignment in memory. (See implicit code writeup, Division 2.)
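The two constructs above, the conditional compress loop and the loop 8 transpose, can be written out in Python to make the intended data movement explicit. Both functions are hypothetical sketches with zero-based indexing; `transpose_gather` handles a single variable indexed (K, L, J), so the fixed variable index 6 of `Q` does not appear, and the slab bounds are passed in as plain arguments.

```python
def compress(a, b):
    """C receives, in order, every A(I) for which A(I).GT.B(I) is false."""
    return [x for x, y in zip(a, b) if not x > y]

def transpose_gather(q, k_max, l_max, j_lo, j_len):
    """Gather q[k][l][j] into rj[l][j][k], mirroring loop 8 for one variable.

    The interior range 2:LMAX-1 becomes l = 1 .. l_max-2 in zero-based form.
    """
    return [[[q[k][l][j] for k in range(k_max)]
             for j in range(j_lo, j_lo + j_len)]
            for l in range(1, l_max - 1)]

# Small illustrative mesh: the value encodes its own (k, l, j) position.
q = [[[100 * k + 10 * l + j for j in range(3)] for l in range(4)] for k in range(2)]
rj = transpose_gather(q, k_max=2, l_max=4, j_lo=0, j_len=2)
print(compress([1, 5, 2, 7], [3, 3, 3, 3]), rj[0][1][1])
```

In the result, K has become the fastest-varying index, which is the realignment the single map operation is expected to perform.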

Examples of the various forms of object code generated by an FMP-oriented compiler are given in appendix D. The object code lines are denoted by a comment card of the form:

    C# VEC nn op1:op2:op3 VL=n1; WK=km, RK=kn

or

    C# MAP nn op4:mm mR=m1, RS=m2, ST=m3/m4; WK=km, RK=kn

or

    C# MAP nn op5:mm CVL=m5, VL=n1; WK=km, RK=kn

where:

| Symbol | Meaning |
| :---: | :--- |
| nn | abbreviated symbolic name of destination vector |
| op1, op2, op3 | mnemonic vector operation codes |
| op4 | mnemonic code for gather (GTHR) or scatter (SCTR) |
| op5 | mnemonic code for compress (CMPS) |
| n1 | vector length in elements |
| km | write key |
| kn | read key |
| mm | memory option: MM = Main Memory to Main Memory; IM = Intermediate Memory to Main Memory; MI = Main Memory to Intermediate Memory; BI = Backing Store to Intermediate Memory; IB = Intermediate Memory to Backing Store |
| m2 | record size (for gather and scatter) |
| m3/m4 | length of stride in each stride direction |
| m5 | control vector length |

The read and write keys may be omitted; a key of 0 (no key) is assumed. If any specification field other than read or write key is omitted, a value of 1 is assumed. Only the number of open fields necessary to specify the required function should be used, e.g., MUL:ADD.

The comment line does not include all of the parameters needed for an actual machine instruction, such as addresses, but the code shown represents enough data to feed the FMP simulators. Where it was desired to show the operation relationships to source code symbols, a pair of brackets "<", ">" is used to surround a brief comment about the data used.

When a vector operation produces two outputs, one on AW1 and the other on AW2, two lines are used as in lines 4370 through 4383 of appendix $D$ :

```
      U13=B1(1,3)*L11
      U14=B1(1,4)*L11
C
C# VEC U13 MUL VL=SSL*SSMAX
C* $$$ U14 MUL
```

Note that C* indicates a continuation of the previous line, and \$\$\$ indicates a dual vector operation. Only one vector length is used for both operations and must appear on the first line.

The code sequences shown in appendix D do not include any of the scalar code, since the concern was primarily with the vector operation rates for simulation purposes. Analysis shows that all scalar operations can be "buried" under the vector execution umbrella IF THE COMPILER SCHEDULES THE CODE PROPERLY (see section 5).

Only one small instance of the compiler scheduling of vector operations is shown and this is critical to the performance of the implicit code. The execution of each sweep calculation in STEP is bound to the data transfers from Intermediate to Main Memory. The compiler must be able to automatically generate and schedule "look-ahead" or "fetch-ahead" code. For a common scalar example such as

```
      DO 10 I=1,100
      D=A(I)**2+B(I)**2
   10 E(I)=D*(A(I)-B(I))
```

most modern compilers for multi-register machines will generate a code sequence that looks like:

```
      FETCH A(1) to register A
      FETCH B(1) to register B
      I=1
LOOP: FETCH A(I+1) to register A'
      FETCH B(I+1) to register B'
      MULT  A*A to T1
      MULT  B*B to T2
      ADD   T1+T2 to D
      SUB   A-B to T1
      MULT  D*T1 to E
      STORE E to E(I)
      MOVE  A' to A
      MOVE  B' to B
      ADD   I=I+1
      TEST
      GOTO LOOP IF NOT DONE
```

This object code is necessitated by the time required in many machines to bring data from memory to a register. The "prefetching" operation helps overlap the time to get the next data from memory with the calculations on the current data. At the end of the computations the new data are then transmitted between registers (a very fast operation when compared with memory transfers).
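The same prefetching pattern can be followed in a Python rendering of the loop. This is a sequential sketch only, so no real overlap occurs; the function name `prefetch_loop` and the variable names are hypothetical, and a bounds check is added on the look-ahead fetch, which the pseudo object code does not show.

```python
def prefetch_loop(a, b):
    """E(I) = (A(I)**2 + B(I)**2) * (A(I) - B(I)) with next-iteration fetches."""
    n = len(a)
    e = [0.0] * n
    if n == 0:
        return e
    reg_a, reg_b = a[0], b[0]                 # fetches hoisted ahead of the loop
    for i in range(n):
        if i + 1 < n:                         # fetch for the NEXT pass
            next_a, next_b = a[i + 1], b[i + 1]
        d = reg_a * reg_a + reg_b * reg_b
        e[i] = d * (reg_a - reg_b)
        if i + 1 < n:
            reg_a, reg_b = next_a, next_b     # register-to-register moves
    return e

print(prefetch_loop([1.0, 2.0, 3.0], [0.0, 1.0, 1.0]))
```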

In a similar manner vectors can be "premapped" so that arithmetic can be overlapped with the next map operation. This is shown in lines 821 through 829 of appendix D where the first set of vectors is mapped into Main Memory before the loop starting at line 830 is initiated.

In the main, the remaining "pseudo object code" shown is left in place with the corresponding FORTRAN source statements, with small exceptions necessitated by the need to combine some operations into a single Vector Unit function. As an example, the object code for line 1150 is shown after line 1160, at line 1162, and is combined with the functions invoked by line 1160 and shown at line 1163:

    1150       U=RR*Q2
    1160       V=RR*Q3
    1161 C
    1162 C# VEC U MUL VL=(KMAX-2)*LSL*(JMAX-2)
    1163 C* $$$ V MUL

In actual practice the compiler must be able to shuffle the generated code around to assure maximum utilization of the Vector Units.

The compiler must provide an object code listing on request, and some method must be provided to key generated code to the source language statements that generate the code. This is essential because more than one source statement may be combined into a single vector operation and then that instruction rescheduled elsewhere in the instruction stream.

### 3.3.4 PERFORMANCE

The general statement that a compiler "must produce the most efficient object code possible" is not adequate to meet the needs of the NASF procurement. Some definitive and quantitative measures must be established and specified as minimum object code performance goals. The obvious performance goal is to have the compiler and FMP hardware marriage produce an object code execution speed that can complete specified metrics in a certain amount of time. The compiler, however, must be disengaged from the speed of the hardware if it is to be properly specified as a separate, procurable entity. Other measures that suggest themselves are percentage utilization of Vector Unit capacity, percentage used of available concurrency, and percentage vectorization. Each of these has some deficiencies. A compiler can generate a totally vectorized code which is terribly inefficient in use of the hardware. A compiler can also generate unneeded vector arithmetic (failing, for example, to eliminate common subexpressions) which keeps the Vector Units 100% busy to no benefit of the actual problem solution. Finally, in a similar manner, the compiler can generate three inefficient streams of code, one each for the Map, Scalar and Vector Units, which provide 100% concurrency.

The best alternative seems to be specifying one or more of the performance metrics (implicit, explicit, spectral weather, and finite difference weather codes) as a measure of the compiler capability. Given a coding of any of these metrics in the FMP FORTRAN dialect specified for the NASF, the object code must execute an entire solution without I/O calls in no greater than 120% of the theoretical execution time for the program. This means that a method must be derived for computing the theoretical time allowed.

If the peak rate of the FMP hardware is 1.5 gigaflops, then one can count all arithmetic computations in a metric and determine a best time for execution for a given set of parameters. Where data dependencies exist in the problem solution (as in the method of characteristics) some canonical value and associated parameters could be chosen for the total arithmetic load. Functions such as SQRT, SIN, COS... would each be assigned an equivalent floating-point operation count. For example, if a time is established in this manner for the subroutine BTRI in STEP, for three calls and mesh dimensions of 100 x 100 x 100, this might produce a theoretical execution time of 5.148 minutes.
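The bookkeeping behind such a theoretical time can be sketched in Python. The 1.5 gigaflop peak rate and the 120% allowance come from the text; the function names, the operation counts, and the weight of 15 equivalent operations for SQRT in the usage line are hypothetical placeholders, not values from the report.

```python
PEAK_FLOPS = 1.5e9      # peak rate quoted in the text, operations per second
ALLOWED_RATIO = 1.20    # object code may take at most 120% of the theoretical time

def theoretical_seconds(op_counts, weights):
    """op_counts: operation name -> count; weights: name -> equivalent flops.

    Operations without an entry in `weights` count as one floating-point operation.
    """
    total = sum(count * weights.get(name, 1) for name, count in op_counts.items())
    return total / PEAK_FLOPS

def meets_goal(measured_seconds, op_counts, weights):
    """True when the measured run time is within the 120% allowance."""
    return measured_seconds <= ALLOWED_RATIO * theoretical_seconds(op_counts, weights)

counts = {"ADD": 3_000_000_000, "SQRT": 100_000_000}   # placeholder counts
print(theoretical_seconds(counts, {"SQRT": 15}))
```

Separating the operation count from the hardware rate in this way is what allows the compiler to be judged independently of the machine speed.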

This approach on the part of the NASF customer is no different than the method for setting performance goals for standard product compiler improvements. A particular benchmark becomes crucial to a sale and the FORTRAN developers are launched forthwith to achieve some real timing goal for that benchmark. To be useful, meaningful segments of production codes must be used for this measurement. Note that the use of I/O was excluded, in an attempt to decouple the object code performance objectives from the operating system performance objectives. There is some danger in this, since a good deal of execution time is spent in that grey area called FORTRAN object-time I/O routines, which are neither generated by the compiler nor claimed by the operating system developers.

Instead the costs of I/O interface should be measured separately by creating a heavily I/O-oriented benchmark with all desired forms of I/O (formatted, unformatted, and direct) and establishing some measure of performance. This measure should include achieving at least 90% of the total I/O hardware bandwidth available, while reducing the execution rate of a fixed number of map and vector operations by no more than 5% (due either to memory interference or object library inefficiencies, or to object time call sequences).
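The two thresholds can be combined into a single acceptance check, sketched here in Python. The function `io_benchmark_passes` and its argument names are hypothetical; only the 90% and 5% limits are taken from the text, and the units of the bandwidth and rate arguments are left to the caller as long as each pair is consistent.

```python
def io_benchmark_passes(achieved_bw, peak_bw, rate_with_io, rate_without_io):
    """True when I/O reaches at least 90% of peak and compute slows by at most 5%."""
    bandwidth_ok = achieved_bw >= 0.90 * peak_bw
    slowdown = 1.0 - rate_with_io / rate_without_io
    return bandwidth_ok and slowdown <= 0.05

print(io_benchmark_passes(92, 100, 97, 100))
```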

### 3.3.5 OBJECT LIBRARY

The object code just discussed is that directly produced by the compiler from input source code. In order for the program to execute however, an array of support software is needed to provide the interface to the operating system, I/O system, exception condition hardware (data flag branch register), and the myriad of intrinsic and external FORTRAN functions such as ALOG, MAX, MIN, and the like. A compiler specification must include these items and should establish some minimum measurable goals for these system components.

Generally, object library routines will be manually fine-tuned using either the Q8xxxx FORTRAN extensions, a system programming language such as PASCAL or (ugh!) assembly language. They should therefore make the best use of the machine resources of any object modules executed in the NASF. For the FMP, four considerations should be taken into account in evaluating object library strategies:
1) The compiler should be able to optionally incorporate any of the FORTRAN supplied routines (see Volume III, FMP FORTRAN) in-line, to permit better overlap and optimization of functional unit usage.
2) When incorporated in subroutine form, a maximum allowable code space should be established for each named routine.
3) A minimum performance level should be established in terms of machine cycles per input argument.
4) A set of standard tests for function correctness should be established and verified on existing hardware systems. Thus a set of end-case operands would be set up for routines like ALOG and its vector counterpart.

All object library routines should use the data flag branch register for error flagging and the FORTRAN supplied data flag manager routine for reporting errors to the user. When vector functions encounter errors, an automatic system for rescanning the results to find the out-of-bounds results and corresponding input routines should be invoked so that the user is relieved from the burden of analysis.
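The rescanning idea can be illustrated with a short sketch. It assumes, purely for illustration, that a vector ALOG marks illegal operands with NaN and raises a single error flag (standing in for the data flag branch register); `vector_alog` and `rescan_for_errors` are hypothetical names, not routines from the FMP object library.

```python
import math


def vector_alog(xs):
    """Elementwise natural logarithm over a vector of operands.

    Illegal operands (zero, negative, NaN) produce NaN and set one error
    flag rather than stopping the whole vector operation.
    """
    results = []
    error_flag = False
    for x in xs:
        if x > 0.0:
            results.append(math.log(x))
        else:
            results.append(math.nan)
            error_flag = True
    return results, error_flag


def rescan_for_errors(inputs, results):
    """Return (index, input operand) for every out-of-bounds result."""
    return [
        (i, x)
        for i, (x, r) in enumerate(zip(inputs, results))
        if math.isnan(r) or math.isinf(r)
    ]
```

The rescan would be invoked only when the flag is set, so the common error-free case pays nothing beyond the flag test.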

The FORMAT, INPUT, OUTPUT, and DEVICE status routines usually involve a great deal of software "chit-chat" which implies many CALL sequence executions wherein no other useful work can be done. The FMP will utilize Backing Store to perform pseudo I/O for the normal production job. This will be accomplished with the concurrent Map Unit, and implied data moves using LEVEL statements. With the exception of FORMAT processing then, these I/O functions should be performed by in-line instruction sequences which can be scheduled among useful vector arithmetic operations.

Formatted Input/Output should be either performed on-line (that is by a function call to the cracking routines in the object code sequence) or off-line (transmission of data, pointers, and the raw format to the Backing Store where it is blocked up and sent to another processor in the system to be formatted in final form). In either case, formatted Input/Output requires further design analysis.

\subsection*{3.3.6 LINKING AND LOADING}

All programs submitted to the FMP for execution will be delivered in a complete, prelinked and loaded binary form. This block-loaded form is called a "controllee file" and contains, in addition to the complete set of binary modules, tables describing the regions in each level of memory assigned to the program and the beginning locations of the Main Memory, Intermediate Memory, and Backing Store dynamic spaces.

Integral to the compiling system then, must be a "loader" function which can gather separately compiled modules with selected object library modules from a variety of inventory caches (files), link the data and entry points together, establish local and global working spaces for each module, and generate initializing information for preset data areas.

The loader performance is only critical when a large number of "compile-and-execute" jobs are passing through the system (during debugging of new applications, for example). Of much more concern is the existence of extensive diagnostics which the user can readily understand both at load time and execution time. In case of catastrophic failures (where even the best program goes berserk) some degree of audit trail should be salvageable from the contents of the various memories to help the programmer find his error. Each module should therefore contain, in addition to the executable binary code and data constants, a series of tables which are used by the loader to link and map routines into the controllee file, and which may electively be retained in the binary controllee file as an aid to debugging or error recovery.

An example of this type of table system, called the MODULE HEADER TABLE, is shown on page E-3 (appendix E). Each table begins with the ASCII name of the table, in this case "MODULE", that can easily be found on visual scan of an ASCII dump, or by vector scanning memory. The module length in 64-bit words is found in word 2 of this table. In word 3 appears the ASCII name of the module, usually the PROGRAM, FUNCTION or SUBROUTINE name.
A time stamp for when the module was created appears in word 4 and the processor name and version number can be found in word 5. The header points to a series of additional tables (diagrammed in the remainder of appendix E), which supply loader information and debugging information for object-time debugging.
Most of the table functions are self-explanatory in their name (a tutorial on the loader methodology will not be given here), but two tables deserve a little discussion -- Interpretive Data Initialization and Relocation tables.
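A minimal sketch of reading the first five words of such a header follows. The word order comes from the description above. The byte order (big-endian), the blank padding of ASCII names to a full word, and the ASCII reading of word 5 are assumptions made only for illustration, and the function and class names are hypothetical.

```python
import struct
from dataclasses import dataclass

WORD_BYTES = 8            # one 64-bit word
TABLE_TAG = b"MODULE  "   # assumption: table name left-justified, blank-padded


@dataclass
class ModuleHeader:
    length_words: int   # word 2: module length in 64-bit words
    name: str           # word 3: PROGRAM, FUNCTION or SUBROUTINE name
    time_stamp: bytes   # word 4: kept raw, since its encoding is not given here
    processor: str      # word 5: processor name and version, read as ASCII


def parse_module_header(memory: bytes, offset: int = 0) -> ModuleHeader:
    """Decode the first five words of a MODULE HEADER TABLE at `offset`."""
    header = memory[offset:offset + 5 * WORD_BYTES]
    if len(header) < 5 * WORD_BYTES:
        raise ValueError("truncated header")
    words = [header[i:i + WORD_BYTES] for i in range(0, len(header), WORD_BYTES)]
    if words[0] != TABLE_TAG:
        raise ValueError("no MODULE tag at this offset")
    (length_words,) = struct.unpack(">Q", words[1])  # big-endian is an assumption
    return ModuleHeader(
        length_words=length_words,
        name=words[2].decode("ascii", errors="replace").rstrip(),
        time_stamp=words[3],
        processor=words[4].decode("ascii", errors="replace").rstrip(),
    )


def find_module_headers(memory: bytes) -> list:
    """Word-aligned scan of a memory image for the ASCII tag "MODULE"."""
    return [
        off
        for off in range(0, len(memory) - WORD_BYTES + 1, WORD_BYTES)
        if memory[off:off + WORD_BYTES] == TABLE_TAG
    ]
```

The scan in `find_module_headers` plays the role of the "vector scanning" of memory mentioned above: an analyzer program can locate every module in a dump without any other index.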

There are two techniques for performing data initialization and initialization of relocation pointers at LOAD time. When dynamic loading is called for (load occurs at time of CALL) as might occur in some system routines, the initialization is done by a sequence of generated object code which is called the "executable data initialization or relocation table". For normal static loading of FMP applications programs, the complexity of some initialization is better handled by the loader interpreting table entries one at a time. Thus constants and relocation pointers can be scatter-gunned around memory by the loader, or loaded in nice sequential streams, depending on the needs of the code. Relocation pointers are addresses in the code (relative to the beginning of the code) where non-relative branches have occurred or pointers into the register file where static addresses point to code segments or local data quantities. These must be updated as the module is placed in memory following another module, and all addresses thus relativized.
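The interpretive (static) technique can be sketched as a loop over table entries. The entry layout below, with `("DATA", offset, value)` and `("RELOC", offset)` tuples, is a hypothetical simplification and not the table format of appendix E; memory is modeled as a plain list of words.

```python
def load_module(memory, base, code, table):
    """Place `code` at `base` and interpret its initialization table.

    memory: list of words standing in for the target memory.
    table:  ("DATA", offset, value) stores a constant at a module-relative
            offset; ("RELOC", offset) adds the load base to the
            module-relative address already stored at that offset.
    Returns the first free address after the module.
    """
    end = base + len(code)
    if end > len(memory):
        raise MemoryError("module does not fit in memory")
    memory[base:end] = code
    for entry in table:  # entries are interpreted one at a time
        kind, offset = entry[0], entry[1]
        if kind == "DATA":
            memory[base + offset] = entry[2]
        elif kind == "RELOC":
            memory[base + offset] += base
        else:
            raise ValueError(f"unknown table entry {kind!r}")
    return end
```

Loading a second module at the address returned for the first shows the relocation effect: every address flagged by a RELOC entry is shifted by that module's load base, while the first module, loaded at address 0, is unchanged.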

The structure given here permits all of the object code to be agglutinated in a lump with only the two-word header "CODE" intervening between modules. The remainder of the tables may be kept in the Backing Store, with the CODE pointer set to point to its particular MODULE header in Backing Store. In the event of error conditions or debugging actions, the system can either locate appropriate tables by referring backward from the linked code, or locate the linked code by searching Backing Store for the MODULE header and then using the memory pointers to locate the needed information. This is particularly suited to the use of the DEBUG symbol table and SYMBOL definition table which are used by the symbolic FORTRAN DEBUG option on the FMP. With this option, execution can be breakpointed (halted on a particular form of reference to a symbol, including execution of labeled FORTRAN source statements), data can be examined or replaced, and formatted dumps with symbolic names produced. This feature is considered essential in a system as large as the NASF and must be integral to the design of the compiler and loading and linking system. Figure E-1 (appendix E) shows a mixed hexadecimal and ASCII dump of a small controllee file to demonstrate how the tables are usually allocated in memory and how data can be located by the programmer or an analyzer program.

\subsection*{3.3.7 OPERATIONAL CHARACTERISTICS}

Another aspect of the compiler must be specified in the NASF procurement. This involves the execution characteristics of the compiler itself, its performance, code space, and compiling features.

This leads to one of the most difficult questions that has been addressed by the series of NASF studies: where should the compiler reside and do its work?

The problem with situating the compiler within the NASF is one of strategy and not of technical capability. The following discussion will reexamine some of the issues that have been discussed in previous reports, and in meetings with Ames personnel.
1) The development of a complex and yet stable compiler, plus supporting object library, is a lengthy and consuming process. If at all possible, an existing compiling system should be used on which to base the FMP FORTRAN in order to reduce cost, schedule, and reliability risks.
2) Until an FMP is operational, potential users and software developers will have to rely on existing computer systems to support programming, compiling, and debugging. The availability of a complete FMP FORTRAN system on these "interim" mainframes is highly desirable for the total NASF success.
3) Compiling on a front-end processor instead of the FMP might be a more effective use of the FMP, which is of course designed first and foremost for high speed arithmetic processing. Certainly the turnaround of compiler detected errors and code listings would be quicker when processed by the front-end processor than the FMP.
4) The specific architecture and model of the front-end processors may not be under the control of the FMP developer and may not be identified until late in the development cycle. It can be expected that the number and qualities of the front-end processors may change over the lifetime of the FMP. Certainly NASA may want the option of varying those parameters of the system as interactive workloads and front-end software features change.
5) "Cross-compilers", which operate on one machine compiling for another machine, have suffered in the past from the need for two machines with which to experiment and develop highly refined optimization techniques that become ever more important in the maturing years of the target system.

At the outset of this project Control Data recommended that the compiler reside on the front-end processors and produce code for the FMP (also called the back-end processor). In addition, it was suggested that the loading and linking function also reside with the compiler. As time passed it became obvious that the development cycle for such a compiler pointed toward use of an existing Control Data compiler for either STAR or 7600 as a base compiler. The STAR compiler recommended itself because it was structured to support expanded automatic vectorization as well as vector extensions. One of the reasons for dabbling with the STAR language as the FMP language arose from this rationale. Retention of the STAR scalar instructions, addressing schemes and I/O interface schemes gave weight to the possibility that the STAR compiler could be used in toto, with only minor extensions being necessary. When the FMP FORTRAN (described in Volume III) finally became firm, RADL realized that major changes would have to be made in any compiler to meet the requirements for the FMP. The "almost" free solution of compiling on the FMP for the FMP became no longer free and the "siting" suggestion had to be reopened for the compiler again.

In the opinion of RADL, the optimum solution would be the development of a new compiler, written in a higher-level language such as PASCAL, designed to reside on more than one processor, and capable of cross-compiling. The compiler should produce object code which can be debugged and tested on either the front-end processor or the FMP. Full optimization for the multiple functional units would be a selectable option for the compiler. This would provide the flexibility of having compile capability on all processors in the system. Another advantage of this would be reducing the need for FMP availability during its early checkout period to assist in the object code generation checkout.

This optimum solution carries, as often stated, the risk of not meeting the system implementation schedule.

Compiler performance should also be specified in terms of compile rates, at least those of the CYBER 7600 or CYBER 200 family compilers. Statements per minute for an average FORTRAN program and for fully optimized output from the implicit and explicit metrics should be required of any proposed compiler. Compiler space is not a problem for the FMP but could present difficulties when used on a heavily loaded front-end processor. The compiler should be limited to a fixed residence no greater than that which the current 7600 and other large scale systems FORTRAN compilers require today.

The FMP compiler can easily rely on current compiler technologies, with some special emphasis placed on automatic vectorization, scheduling of vectors, and allocation of vector storage. The only factor in the compiler development that needs to be given considerable attention is that of the sheer manhours and elapsed time to provide the exposure and testing of the compiler and object code prior to the NASF going into full production status.

\subsection*{3.4 OPERATING SYSTEM FUNCTIONAL CHARACTERISTICS}

\subsection*{3.4.1 GENERAL}

In preceding studies the NASF system concept and structure were described and diagrammed. The network of hardware that has finally been arrived at as an evolution from those studies is shown in figures 20 and 21. Three fundamental ideas have formed the basis for the NASF configuration and system software analysis.
1) Distribution of function among a number of possibly dissimilar processors -- A philosophy governing this distribution is summarized in the precepts:
a) definition of system resource entities;
b) management of system resources by intelligent (programmable) processors;
c) proximity of resource and its managers should be as close as possible, physically, electronically, logically;


Figure 20. NASF Support Processing System (legend: CYBER 175 computer; card punch and controller; card reader and controller; train printer; channel transfer switch; extended core storage; display console; magnetic tape controller; disk controller with second-channel option; disk unit; mass storage controller)

Figure 21. NASF Trunk Network with FMP

d) resource management functions should be moved outward from a central computer toward the resource;
e) "form follows function";
f) processors with resource management functions have only knowledge (i.e., tables, pointers, etc.) of resources directly attached;
g) each processor possesses a unique catalog of functions it can perform, all others are passed on;
h) message discipline;
i) common sense and reality must dominate any design.
2) Flexible interconnection of all components -- Using a new Control Data system called Loosely Coupled Network (LCN), all system components can be connected to each other using high-bandwidth, bit-serial data trunks which can extend for great distances.
3) A "computational engine" or highly intelligent arithmetic element in the total system -- The FMP should behave as a "slave processor" and not perform any system control operations. Its internal software operating system must be absolutely minimal in order that:
a) FMP software development be minimized;
b) other system software have a minimum number of interfaces to cope with in the FMP;
c) the time needed in the FMP for system interaction processing can be reduced.

This last principle implies that the NASF will rely almost entirely on the operating system and system support software available on the front-end processors for management of the system resources. According to the "distributed system philosophy" the resources owned by the FMP are its functional units (Map Unit and Vector Unit) and its storage (Main Memory, Intermediate Memory, and Backing Store).

The FMP as a single entity is itself a resource that must be managed by some other processor. The management function has been delegated to the front-end processors or support processing system (hereafter called the SPS). The requirements for an operating system within the FMP are thus reduced to bare necessities. The extent of these necessities can be derived from an examination of how the FMP will be used in the NASF environment.

The following provides a recount of the probable sequence of events that will be involved in the processing of a flow model solution.
1) The user will, from an interactive terminal or batch input, initiate the execution of a model solution.
2) The SPS will select the already compiled, linked, and loaded solution program from the file system.
3) The contiguous binary stream representing the program and its locally defined data will be transferred to the 819 disks common to the FMP and the SPS.
4) The initial mesh data, which has also been biding its time on a disk or archival file belonging to the SPS, will then be selected and moved to a block of storage on the 819.
5) External data and parameters other than the mesh information will be transferred to another set of blocks on the 819s.
6) A "message" (see reference 2) will be transmitted to the FMP from the SPS via the LCN and thence stored in the Intermediate Memory of the FMP. This message will contain the job description and pointers to physical disk addresses for each of the job components, binary program, mesh data, parameters, and location where output data is to be transferred.
7) The FMP monitor, in its own good time (when in idle loop, or when performing other system tasks), will discover the message in its queue.
8) The monitor will then schedule the transfer of all input data into either the Backing Store (if present) or the Intermediate Memory.
9) Each transfer is directed by a message to an attached PDC (Programmable Device Controller, the key LCN interface hardware).
10) When the transfer is complete, the PDC responds to the message.
11) When all transfers are complete, the monitor marks the job "ready for execution".
12) When the FMP becomes free, the monitor scans its list of jobs "ready" and selects the highest priority job to begin execution.
13) The new job is rolled in by moving its binary code into Main Memory.
14) Program execution is begun.
15) The FORTRAN program then reads its parameters, which is accomplished by map operations from Intermediate Memory.
16) Input meshes are also read in this manner, wherein an unformatted FORTRAN READ operation becomes a simple set of block moves of data from Backing Store to either Intermediate or Main Memory.
17) Processing is begun.
18) Formatted I/O is performed by subroutines in the object program, and data is transferred by means of map operations to buffers that are "pseudo-files". That is, there are no file OPEN, or CLOSE operations, nor I/O activity implied by READ and WRITE operations.
19) Upon program termination, PAUSE, or interrupt, all data in the Backing Store and Intermediate Memory buffers is rolled out to fixed positions on the 819 disks (described by the job initiation message).
20) The SPS then associates each of the rolled out areas with actual SPS managed files to which the data is then moved.
21) The SPS processes the remaining job commands, either from the terminal or batch job, in order to reduce and display the result data, and to catalog or archive all other data that is to be retained.
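The job initiation message of step 6 can be pictured as a small record of physical extents. The sketch below is illustrative only: the field names, the `priority` field, and the helper functions are assumptions, and the actual message format is the one defined in reference 2.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Extent:
    """A region described by physical address and stream length, not a file."""
    address: int
    length: int


@dataclass(frozen=True)
class JobMessage:
    """Hypothetical shape of the SPS-to-FMP job initiation message."""
    job_name: str
    priority: int        # assumed field, used for selection in step 12
    program: Extent      # prelinked binary program
    mesh: Extent         # initial mesh data
    parameters: Extent   # external data and parameters
    output: Extent       # where output data is to be rolled out


def input_transfers(msg):
    """Input transfers the monitor schedules in step 8, one per component."""
    return [("program", msg.program), ("mesh", msg.mesh),
            ("parameters", msg.parameters)]


def extents_overlap(a, b):
    """True when two extents share at least one address."""
    return a.address < b.address + b.length and b.address < a.address + a.length


def overlapping_components(msg):
    """Names of component pairs whose disk extents collide."""
    parts = input_transfers(msg) + [("output", msg.output)]
    return [(n1, n2) for (n1, e1), (n2, e2) in combinations(parts, 2)
            if extents_overlap(e1, e2)]
```

A consistency check such as `overlapping_components` is one thing the SPS could run before sending the message, since the FMP itself has no file system to protect one job component from another.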

In this scenario the 819 disks, the Backing Store, and Intermediate Memory all exhibit one common characteristic: they are managed by physical address and stream lengths, not as files. Thus from a system standpoint, each storage medium is interchangeable with any other (albeit at significant performance differences). This makes it possible for the system to actually operate without two of the three bulk memories. In a minimum configuration for example, with constraints on problem sizes, the PDC connected to the FMP could intercept data storage and retrieval messages meant for the nonexistent 819s and convert them to block addresses and lengths in the Intermediate Memory.

This is possible only if the 819s are limited to containing transient data at all times and not permanent files. The advantage of such a scheme is obvious not only for purposes of degraded operation (while maintenance is being performed on the 819 or Backing Store, for example) but can be of great value during early system exercises where all the hardware need not be present. A phased installation of the NASF is thus facilitated.
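The minimum-configuration idea, in which a PDC redirects disk messages into Intermediate Memory, might look like the following sketch. The class, its sequential allocation policy, and its method names are hypothetical; the only property taken from the text is that requests are expressed as physical addresses and stream lengths, which is what makes the substitution possible.

```python
class RedirectingPDC:
    """Stands in for absent 819 disks by mapping disk extents to memory."""

    def __init__(self, intermediate_words, first_free):
        self.memory = [0] * intermediate_words  # models Intermediate Memory
        self.next_free = first_free             # first word not reserved
        self.extents = {}                       # disk address -> (start, length)

    def store(self, disk_address, words):
        """Handle a storage message addressed to the nonexistent disk."""
        length = len(words)
        if disk_address not in self.extents:
            # Problem-size constraint of the minimum configuration.
            if self.next_free + length > len(self.memory):
                raise MemoryError("problem too large for minimum configuration")
            self.extents[disk_address] = (self.next_free, length)
            self.next_free += length
        start, reserved = self.extents[disk_address]
        if length > reserved:
            raise ValueError("stream longer than the extent first stored here")
        self.memory[start:start + length] = words
        return start

    def retrieve(self, disk_address, length):
        """Handle a retrieval message for a previously stored extent."""
        start, reserved = self.extents[disk_address]
        if length > reserved:
            raise ValueError("request exceeds the stored extent")
        return self.memory[start:start + length]
```

Because the data held this way is transient, nothing is lost when the real disks are later installed and the PDC stops intercepting the messages.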

\subsection*{3.4.3 INTERACTIVE OR BATCH?}

The large amount of data usually ascribed to FMP jobs means that a single job roll-in/roll-out could require a significant amount of time. Interactive debugging usually implies many such roll-in/roll-out actions during the execution of a job. Should such activity be permitted in the FMP and supported by operating system functions? Despite the inherent inefficiencies that may arise from such a strategy, the potential impact on a high priority project that may need interactive debugging facilities could be enormous if this capability is not present. The "roll-in/roll-out" facility that must be provided in the FMP operating system to support the staging of data into and out of the FMP, can be used also to perform "checkpoint-restart" recovery dumps of the data base and code, when required by the system or the programmer. In addition, this same facility could serve in special circumstances to "roll-out" an executing job, when interrupted by the user, so that key problem parameters can be examined to determine if things are "going right" before allowing the job to consume a great deal of CPU time. Although the FMP is best used in a batch-mode only form, there must be some form of "escape-valve" permitted for privileged users (whose privileges are determined by the system-managing SPS) to STOP, PAUSE, and CONTINUE executing jobs. These privileges can have a serious effect on overall FMP efficiency, and therefore must not be granted lightly by the system manager or the operating system. The need for this capability cannot be overstated, based on the experience of the many large STAR-100 and 7600 user communities who make such demands of what would otherwise be a basically "batch-job" environment.

Another technique related to this question is that of a production job making intermediate result "dumps" which are to be sent to an output device (terminal, plotter, or printer) for evaluation while the program is still in progress. Although the system load model provided by Ames makes no mention of this requirement, experience with large codes indicates that such a capability is highly recommended in the production NASF.

\subsection*{3.4.4 EXTERNAL CHARACTERISTICS}

Given the basic requirements for job flow, what are the necessary external features as seen by the user from his terminal, and by the programmer in his FORTRAN code?

First, an aspect of the external characteristics of the FMP operating system must be discussed -- COMPATIBILITY. Since the majority of human interactions with the NASF will be with and through the SPS and other processors (such as graphics subsystems), the bulk of operating systems commands will be directed at those processors. If a function is to be invoked on the FMP, it should be described by the identical syntax and format as used on the SPS. More importantly, the relationship of such functions to the system should be the same. What does this mean? The Control Data CYBER operating systems are "file oriented". That is, data files -- raw binary and executable binary data -- are all retained in streams called "files". To invoke a system function or initiate execution of a program involves the naming of the file containing the desired program. That file is opened, brought into memory and execution is commenced. At the conclusion of execution, the space used by that file is overlaid by the next file invoked.

Management of program entities is thus through the same file mechanism as is used for data. A control statement, whether submitted via a job control card, terminal or an executing program, thus consists of a file name followed by appropriate parameters. If the FMP is to be front-ended by CYBER machines then, this same relationship and command format should be maintained. The user then has only one set of concepts to assimilate and one set of formats with which to deal. The problem with this is that two effects can destroy what seems to be a nice principle:
a) A NASA choice of some other SPS, at the outset or later in the project, which has a completely different philosophy of operating system relationships and commands.
b) A change in operating system philosophy for the same equipment, a possibility, quite frankly, demonstrated more than adequately by the two Control Data operating systems for the CYBER family, NOS and NOS/BE. Although both are basically "file oriented", one adheres to that philosophy more than the other.

Only one solution can prevent the FMP operating system from being bent in the winds of change throughout its lifetime. That solution consists of eliminating completely any ability in the FMP for interpreting commands from the job stream, or to have any knowledge of resources outside itself (no concept of disk storage or SPS hardware permitted to invade the FMP). This is consistent with the distributed system philosophy, but not consistent with current system practices as, for example, in the "Symmetric Link" facility of NOS/BE. In this case, the back-end processor (such as the FMP) must "drag out" its own data base from the SPS through commands such as GETPF (get permanent file).

All command language processing is then performed by the SPS. The FMP and its 819s appear to the rest of the system as a block of memory to be loaded with data and programs and told to execute a specified program. Further, the user has access to FMP resources when executing on the FMP and thus the only external interfaces on the FMP are the FORTRAN language constructs given in the FMP FORTRAN Manual (Volume III). The I/O statements, PAUSE and END, provide implicit linkage with the operating system but no others will be countenanced.

\subsection*{3.4.5 INTERNAL CHARACTERISTICS}

Having dismissed the subject of external characteristics so quickly, it would seem that the FMP operating system has vanished completely. There are, unfortunately, sufficient FMP management tasks that must be handled by the FMP to absorb the energies of a good sized development team. The FMP hardware contains some design features which are intended at once to constrain the creative expansiveness of such developers while also assisting their efforts in simplification.
a) Fixed memory allocations -- One 65,536-word block of Main Memory (beginning at address 0) and one 65,536-word block of Intermediate Memory (beginning at Intermediate Memory address 0) are set aside for the operating system. Job mode programs cannot access either of these two areas for program execution or data access. Monitor mode execution of code is limited to the 65,536 words of Main Memory, while the monitor is permitted access to all memories in their entirety.
b) Single interrupt -- The FMP can be interrupted, with control transferred to the monitor, by any one of the external PDC attachments to the I/O ports. While in monitor mode no other interrupts will be permitted.
c) In the event that a Backing Store block that is addressed is not present (because of reduced configuration) or the Backing Store is totally absent, an interrupt occurs, causing monitor to deal with the attempted Swap operation. If a Main Memory to Intermediate Memory map operation references addresses not present in the Intermediate Memory, a similar interrupt will occur. This provides a limited form of "virtual" memory to the FMP operation.
d) Monitor can establish upper and lower boundaries for data access to Intermediate Memory and Backing Store by the executing program.
e) PDCs have universal access to Intermediate Memory with the following restrictions:
- Once FMP execution has begun after a deadstart condition, the PDC cannot access addresses 0 through 32,767 in Intermediate Memory. This is to prevent inadvertent destruction of the monitor kernel.
- In deadstart mode the PDCs can write everywhere in Intermediate Memory.
f) All addresses in user programs for Intermediate and Backing Store Memory are relative to address 0 of the user space which begins at address ILBOUNDS (intermediate lower bounds) for Intermediate Memory and BLBOUNDS in the Backing Store. Both bounds are assigned by the operating system when the job is scheduled for execution. User addresses in Main Memory start at address 65,536 and are not relative.
g) Any attempt to read or write outside the prescribed bounds will result in an immediate interrupt to the monitor.
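Items d), f) and g) together describe a base-and-bounds scheme, which the following sketch illustrates for Intermediate Memory. ILBOUNDS is named in the text. The upper-bound name `iubounds`, the choice of an exclusive upper bound, and the exception standing in for the hardware interrupt are assumptions.

```python
MONITOR_WORDS = 65_536  # words reserved for the operating system from address 0


class BoundsInterrupt(Exception):
    """Stands in for the immediate interrupt to the monitor (item g)."""


def intermediate_physical(user_address, ilbounds, iubounds):
    """Translate a user-relative Intermediate Memory address.

    ilbounds: lower bound assigned by the operating system (ILBOUNDS).
    iubounds: first address past the job's space (hypothetical name).
    """
    physical = ilbounds + user_address
    if user_address < 0 or physical >= iubounds:
        raise BoundsInterrupt(f"user address {user_address} is out of bounds")
    return physical


def main_memory_access_ok(address):
    """User Main Memory addresses are absolute and start at 65,536."""
    return address >= MONITOR_WORDS
```

The same translation would apply to the Backing Store with BLBOUNDS in place of ILBOUNDS, whereas Main Memory needs only the fixed comparison because user addresses there are not relative.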

\subsection*{3.4.6 MANAGEMENT TASKS}

There are a number of tasks which must be undertaken solely by the FMP itself because it is in the best position, or only position, to make judgments about resource utilization.

\subsection*{3.4.6.1 STORAGE}
a) Allocation of one, two, or three job buffers in Intermediate Memory to permit the overlap of one job execution with the swap-out of another -- the allocation scheme must consider job size, priority, and amount of swapping required.
b) Allocation (and setting of bounds register) of Intermediate Memory for the job going into execution.
c) Presetting of Main Memory and Intermediate Memory to canonical NULLs (for security reasons as well as execution consistency).
d) Allocation of Backing Store Memory (and establishing operating bounds for the job in process) if that element is present.
e) Processing of illegal storage access requests by the user program, determining the nature of the request and transmitting the error information to the job's error processor.
f) Processing "storage not present" interrupts, moving data from alternate storage devices as required by the hardware instruction (map) that caused the interrupt.
g) Degrading the software storage maps when directed by the maintenance function to reduce memory so that maintenance actions can be performed on-line.

\subsection*{3.4.6.2 PROCESSING}
a) Entering JOB requests into internal queue in priority order.
b) Aborting an executing job or deleting from queue on demand from front-end processors.
c) Scheduling and execution of on-line "confidence" diagnostics.
d) Initiation of job.
e) Processing of job calls for systems services.
1) Transmit message to SPS.
2) Explicit Input/Output.
3) Wait on external message.
4) Terminate job normally.
5) Abnormal job termination.
6) Initiate checkpoint/restart.
7) Request job accounting information (date, time, time on, etc.).
8) Increase or decrease storage allocation in any level memory.
f) Trigger job roll-out by PDCs.
g) Linking FORTRAN "pseudo files" (TAPE1, TAPE2, etc.) with physical memory space in Backing Store, Intermediate Memory, or local RMS (rotating mass storage).
h) Executing special system functions requested by the Maintenance Control Unit.
i) Transmitting job statistical data to Maintenance Control Unit.
j) Transmitting exceptional condition status to Maintenance Control Unit.
k) Management, reorganization, and culling of processing queues.
l) Deadstart system initialization.

The fairly long list given here is not as complicated as it might seem since, in most instances, the FMP monitor consists of a set of privileged subroutines each of which is called either by the job in execution or by a job operating in the SPS. Thus the SPS message "ADD JOB", invokes a single monitor mode subroutine in the FMP which inserts the incoming job into the execution queue, with appropriate "time stamps" and then returns control to the executing job. "Time slicing" is not permitted for FMP jobs, and roll-in and roll-out are controlled by the SPS program called the FMP manager. The FMP manager has a total responsibility for the optimization of use of the FMP resource. Decisions regarding how much memory at each level to allocate for a given job are done in the SPS. The allocation message is then sent to the FMP monitor which performs the actual allocation act (by entering data in appropriate tables and setting internal registers when required). So the functions listed above are, in fact, very primitive "slave" operations which behave more like "drivers" of the hardware switches and registers in the FMP.
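How little the monitor has to do for "ADD JOB" can be seen in a sketch of the queue routines. The class and method names are hypothetical, and two points are assumptions: a larger number means higher priority, and jobs of equal priority run in arrival order.

```python
import heapq
import itertools


class FMPMonitorQueue:
    """Execution queue kept in priority order by the FMP monitor."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def add_job(self, name, priority, time_stamp):
        """Monitor-mode subroutine behind the SPS message "ADD JOB"."""
        entry = (-priority, next(self._arrival), name, time_stamp)
        heapq.heappush(self._heap, entry)

    def delete_job(self, name):
        """Delete a queued job on demand from the front-end processors."""
        self._heap = [e for e in self._heap if e[2] != name]
        heapq.heapify(self._heap)

    def select_next(self):
        """Pick the highest priority ready job, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, name, _ = heapq.heappop(self._heap)
        return name
```

Note that nothing here decides when a job runs or how much memory it gets; in keeping with the text, those decisions belong to the FMP manager in the SPS, and there is no time slicing to implement.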

The FMP "Operating System" therefore exists as a job on the front-end processors, which make all decisions about allocation and scheduling. This makes it possible to begin development of the FMP Operating System on an SPS as soon as the final configuration has been decided upon. The FMP manager would definitely be written in a higher level language (such as PASCAL) and would operate as a normal user job on the SPS. It would deal with the FMP as if it were a piece of peripheral equipment, whose controllable functions were only slightly more complex than start, stop, transfer data, report errors.

\subsection*{3.4.7 PERFORMANCE CRITERIA}

In the specification of an operating system for the FMP, some measurement points and criteria should be required for degree of functionality, storage utilization efficiency, and processing efficiency. RADL has already forced a limitation on space utilized by the FMP Operating System itself by hardware design "fiat". Some similar criteria should be established for the FMP manager on the front-end processors. Maximum memory residency on the SPSs should be limited to 32 K SPS words, so that sufficient space remains for the other system management functions on those machines. The FMP manager must have its critical kernels resident at all times in the FMP to handle the processing load expected and to meet the response time requirements.

If a full queue of completely vectorized jobs in the FMP is assumed, then it can be required that the FMP monitor perform the servicing of such a job stream with no more cost than a \(5 \%\) burden of additional elapsed (wall clock) time over the theoretical time to complete the same set of fully vectorized codes. This means that the Vector Units would be kept busy 95\% of the total machine hours (assuming the compiler has \(100 \%\) vectorized the codes) in an operating day.
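The relationship between the 5% burden and Vector Unit utilization can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical and it assumes the burden is expressed as a fraction of the theoretical run time, as the paragraph above states.

```python
def max_elapsed_time(theoretical_seconds, overhead_fraction=0.05):
    """Largest wall-clock time allowed when the monitor may add at most
    `overhead_fraction` on top of the theoretical run time."""
    return theoretical_seconds * (1.0 + overhead_fraction)


def vector_unit_utilization(overhead_fraction=0.05):
    """Fraction of elapsed time the Vector Units are busy, assuming the
    job stream is fully vectorized and the full burden is consumed."""
    return 1.0 / (1.0 + overhead_fraction)


if __name__ == "__main__":
    # A 5% burden leaves the Vector Units busy roughly 95% of the time.
    print(f"utilization: {vector_unit_utilization():.3f}")
    print(f"8-hour theoretical load may take up to "
          f"{max_elapsed_time(8 * 3600) / 3600:.2f} hours")
```

Under this reading, a 5% burden corresponds to a utilization slightly above 95%, which agrees with the figure quoted in the text.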

Minimum system response times for functions performed outside the FMP can be established for each message type listed above. For example, in the case of "explicit I/O" requests, the time taken away from the job for the monitor processing of the I/O message must be limited to some infinitesimal period like 20 microseconds (during which time the Vector Units could have been performing up to 60,000 calculations except for the interruption). Further, the time required to service the request must not be greater than \(5 \%\) of the theoretical service time. As an example, take a disk read request of 65,000 words from the 819. Given the average latency and transfer times, the operating system additional time for the transfer cannot exceed \(5 \%\) of the theoretical total of latency and transfer time.
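The two limits in this paragraph can be expressed as a small budget calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the latency and transfer times used in the example are placeholder values, not figures for the 819 disk.

```python
def monitor_overhead_budget(latency_s, transfer_s, allowance=0.05):
    """Maximum operating-system time that may be added to one I/O request:
    `allowance` times the theoretical latency plus transfer time."""
    return allowance * (latency_s + transfer_s)


def calculations_forgone(rate_per_second, interruption_s):
    """Calculations the Vector Units could have performed while the job
    was interrupted for monitor processing."""
    return rate_per_second * interruption_s


if __name__ == "__main__":
    # Placeholder timings (assumed for illustration only).
    budget = monitor_overhead_budget(latency_s=0.010, transfer_s=0.020)
    print(f"allowed added time: {budget * 1e3:.2f} ms")

    # The 60,000-calculation figure quoted for a 20-microsecond interruption
    # corresponds to a rate of 3e9 calculations per second.
    print(f"forgone: {calculations_forgone(3e9, 20e-6):.0f} calculations")
```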

In the same way, criteria can be established for response times from the SPS, maintenance units, and other attached processors, for each function listed in the preceding discussion.

Finally, the FMP manager must be bounded by minimum performance standards. Assuming an available backlog of jobs flowing into the system, the FMP manager must keep a minimum queue of 3 to 5 jobs built in the FMP (which then, of course, must guarantee efficient processing of that queue). Response times for servicing of critical messages, such as interactive display and interrogation of the FMP functions, must be established to keep the FMP fully busy. If the FMP manager in the SPS cannot respond quickly enough, then it must at least respond to the FMP with a "roll-out" function so that the FMP can begin arithmetic processing as soon as possible.

Final specification of these attributes of the FMP Operating System can be done as more experimentation with the system models and Ames load data point out the critical spots in the network that must be controlled during system development. It is expected that work with these models will continue into subsequent phases of this project so that a sufficiently rigorous specification can be produced to guide the implementation of the FMP Operating System.

\subsection*{4.0 FLOW MODEL PROCESSOR SIMULATION AND ANALYSIS}

\subsection*{4.1 FMP SIMULATOR}

Before construction of an FMP can begin, Control Data believes that the entire assemblage must be designed and simulated at the fundamental circuit level. This simulation would necessarily involve the execution of diagnostic sequences (to determine functional adequacy) and the execution of key portions of the performance metrics (to determine the execution speed). The design process for Control Data super-computers also includes simulation as a tool from the very inception of the effort through actual machine checkout. Even with the most powerful computers available on which to execute the simulators, the number of logic events simulated consumes more computer time than is reasonably available.

It becomes necessary then to establish a hierarchy of simulation systems, each one capable of a certain level of analysis of the design and FMP machine operation. The three levels identified are:
a) "Gate level" -- This simulator is used by the hardware designers to check the behavior of the actual circuits and wired interconnections with a very high resolution. In most design cases the resolution of the signal analysis is carried out to 50-picosecond intervals. This level is used to verify design before parts are ordered for the machine.
b) "Block level" -- This simulator represents aggregates of several individual components as "blocks" of logic, for example, an adder block capable of performing a two's complement add operation on input operand pairs of 16, 32, or 64 bits, or any increment between 16 and 64 bits. These blocks are then interconnected by the designer with the same diagnostic and performance sequences then run through them as for the gate level. The execution speed on current machines for this level can be 10-60 times faster than the gate level, thus larger assemblages of logic can be simulated over longer sequences of input instructions. The processing of a single vector operation on the FMP, however, still consumes extraordinary amounts of time even at this level of simulation.
c) "Functional block" -- Many events in the preceding simulation occur at the picosecond or nanosecond level. When one tries to examine the behavior of an overall performance metric though, the effects of memory data transfers become important to the final performance estimates of such a machine. In a simple case, the moving of a single slab of data from Intermediate to Main Memory using the Map Unit requires (for the implicit example) that 65,536 words be moved at the rate of eight words every clock cycle for a total of 8192 clock cycles. Performing analysis of this activity at the block or gate level is prohibitively expensive in terms of simulator time, and submerges all of the more refined circuit and block analysis, as far as elapsed time is concerned. A third level model is then desired which allows interaction of all functional units to be modeled at less resolution for longer periods of time.
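The slab transfer figure quoted for the Map Unit follows directly from the word count and the transfer width. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def map_transfer_cycles(words, words_per_cycle=8):
    """Clock cycles needed to move `words` at `words_per_cycle` per clock.
    Ceiling division covers a final, partially filled cycle."""
    return -(-words // words_per_cycle)


if __name__ == "__main__":
    # One slab of the implicit example: 65,536 words at eight words per clock.
    print(map_transfer_cycles(65_536))
```

A functional-level model only needs this cycle count for the transfer, whereas a block or gate level model would simulate every one of those cycles.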

This approach is quite similar to the actual design process. First the architect draws out the overall scheme and fragments it into logically separable entities (functional blocks). Then estimates are made as to the performance and functional requirements of each entity. A computerized model is then developed for each entity, and the lot subjected to some form of programmatic analysis. If the architecture holds up, the design is initiated. Each entity is then broken down into a set of constituent logical blocks and this breakdown modeled to determine if all data and control paths are in place and timed to the nearest clock cycle. Finally, each logic block is broken down to circuit elements and coax and foil interconnect, and the entire assembly is "tuned" to the nearest \(50-100\) picoseconds. Manufacturing documentation is then produced automatically from all levels of the simulation; the parts design is extracted from the gate level model, put on tape, and shipped to electronics vendors. The machine is then constructed.

At each stage of the design, the higher level model is verified against the continually refined, more detailed models. For example, the functional model of a vector unit would have a sequence of vector instructions put through it and timings would be analyzed. As the block level design is completed for a vector unit, the identical sequence is put through the more detailed model and the timings compared to the higher level model. Where necessary, the higher level model is adjusted to reflect the actual design. This is carried out once again as the gate level design and simulation is carried out, with the more gross timings of each higher level model matched against the actual hardware design. In a well structured design environment, all three models use the same interfaces, control cards, and file structures so that a "mixed" simulation might be executed where the memory system and scalar unit can be simulated at the functional level, the vector stream control at the block level, and a single vector unit simulated at the gate level. This interchangeability permits the designers and potential users to evaluate various critical networks and performance questions without having to run the entire ensemble at the gate level.
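One way to picture the "mixed" simulation is as a per-unit choice of model level. The sketch below is purely illustrative; the unit names, the `Level` type, and the dictionary layout are assumptions and do not describe the actual simulator control cards or file structures.

```python
from enum import Enum


class Level(Enum):
    """The three simulation levels described in this section."""
    FUNCTIONAL = "functional"
    BLOCK = "block"
    GATE = "gate"


# Hypothetical configuration matching the example in the text: memory system
# and scalar unit at the functional level, vector stream control at the
# block level, and a single vector unit at the gate level.
mixed_run = {
    "memory_system": Level.FUNCTIONAL,
    "scalar_unit": Level.FUNCTIONAL,
    "vector_stream_control": Level.BLOCK,
    "vector_unit_0": Level.GATE,
}


def units_at(config, level):
    """Names of the units simulated at the given level, in sorted order."""
    return sorted(name for name, lvl in config.items() if lvl is level)


if __name__ == "__main__":
    for level in Level:
        print(level.value, units_at(mixed_run, level))
```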

To carry out the FMP design for this study period of the NASF project to a degree that ensured the buildability and performance of the system, all three levels of design had to be explored to some degree. A modest amount of gate level design was undertaken to answer some design questions not addressed by LSI design already in progress at Control Data. A good deal of block level simulation was done for the memory system (since that is the crucial element in the FMP bandwidth) and the vector units (since they perform the useful work and take the major amount of logic in the FMP). To the extent that detailed design has been carried out, the block model represents the actual behavior of the FMP to the clock cycle level. This model was first developed for Ames in the second study period of this project and reflected the design of the FMP at that time. Throughout the summer of 1978 this model became known as the "Detailed FMP Simulator". As the FMP architecture changed from that described in reference 2, the simulator underwent major changes, some structural and some procedural (input, output, and use). Versions of this simulator have been delivered for use by Ames, and with this report (Volume IV) the simulator is formally delivered to NASA.

The functional level simulator has been called the FMP High-Level Simulator, and is documented in Division 4 of Volume IV of this report. For significantly long code sequences where vectors are long (100 or longer), the high-level simulator is an accurate representation of the current FMP design. Where short vectors or single element map operations are to be analyzed, the detailed simulator is required since only at that level are specific memory conflicts modeled. The current detailed model has not been completely verified against the latest design changes in the Map Unit, so it is possible for short vector data to be as much as \(10 \%\) in error from what the actual hardware would yield.

\subsection*{4.2 BENCHMARKS FOR THIS STUDY}

Divisions 2, 3, and 4 of this volume contain discussions of the four codes submitted by Ames to Control Data as metrics around which the FMP was to be designed and analyzed. Two characteristics of the implicit code have led it to be the primary focus for all hardware and software design efforts: the number of computations to be performed can be estimated exactly for any set of input parameters, and the code appears to be the best candidate for long term production on the FMP. In previous reports the characteristics of this code have been explored and the computational behavior analyzed to help direct FMP design efforts.

The weather codes were of great interest at the outset of this phase since the question had been raised as to the suitability of the FMP to broader applications areas. It was determined that for the two weather codes provided, the original FMP design (as documented in reference 2) would not achieve the 1000 megaflop goal for the FMP.

Four different approaches were used to pursue the benchmark analysis. This was due, in part, to the experience and specialties of the staff working on the codes, and in part because of the very nature of the investigation.
1. The implicit code became the central theme of the FMP design and the language and compiler specifications. As a change was introduced in the hardware or software, the implicit code, or a portion thereof, was used to test the idea. The simulators were first exercised with the implicit code. A decision was made to treat the entire implicit code with respect to language, compiling, and simulation in order to ensure that some design feature was not overlooked.
2. The explicit code employs solution techniques which differ in some ways from those of the implicit code. The methodology requires convergence of the solution, a data-dependent feature which controls the number of calculations, unlike the fixed number that are inherent in the implicit code. Apparent recursion relationships impact the amount of vectorization that can be readily attained without severely restructuring the code. It was decided to turn over this evaluation to a new group of analysts and have them attack the code without predisposition toward the FMP. They were instructed to vectorize the code as directly as possible from the statements in the original source language, and make the code operable on the STAR-100. The conversion of the STAR-100 code could then be done fairly mechanically to the FMP, since the primary functions used would be map and arithmetic operations. The answers obtained on the STAR-100 would be checked to make sure the algorithm conversion was "complete" and then simulator input for the FMP detailed and high-level models could be transliterated from the STAR code.

With the limited analyst resources available, it was decided to tackle only the LX, LYC, LYI, and CHARAC routines for the purposes of recoding and simulation. The LX and LYI subroutines were vectorized in a direct manner, rewriting local DO loops only. All data passed between these subroutines and the remainder of the code was left in the original scalar allocation so that the code could be run with and without each of the vectorized codes, on the STAR-100, to obtain correct answers. This technique led to object code for the FMP with small vector lengths, compared to the implicit code, but the exercise did provide a vehicle for testing the performance that could be gained by making simple first-attempts at vectorization.

The TURBDA subroutine and LYC turned out to be vectorizable without much thought or effort but CHARAC proved to be the challenge to vectorization. Initial efforts at converting the code left most operations still in scalar mode with a performance of only 27 megaflops. By calling CHARAC from LYC and LZC to process all data in the I-K plane at one time, CHARAC could be vectorized using the control vector capabilities of the FMP. Estimation of performance could not be done using the simulators, because the degree of sparsity of the permissive bits in the control vectors has a direct impact on the gigaflop rate. Making the assumption that for every pass of three there was a reduction by half in the active calls to CHARAC (represented by the control vector), performance of CHARAC was estimated for the \(100 \times 100 \times 100\) mesh case at about 200 megaflops.
3. The spectral code which originated at MIT was analyzed to determine what special machine characteristics were needed to make it meet the performance goals of the NASF. The FFT and SPCFOR (Legendre transform) routines were chosen from the spectral code for simulation because they constitute the computational heart of the spectral code. These routines were vectorized quite directly and simulated for two problem sizes. The first problem, with relatively low resolution, represented 25 layers of atmosphere, 6 wave numbers and 15 latitudes. A more reasonable form for useful production work, in the opinion of CDC investigators, would consist of 48 layers, 21 wave numbers and 53 latitudes. The two cases were simulated to show the range of performance for this code.
4. A shortcut method of handling the GISS model was tried in order to reduce the resource commitments for this secondary phase of the project. The GISS model had been fully vectorized for the STAR-100 and a copy of that model was available to us. If necessary any segment of that code could be transliterated from STAR to the FMP and the results subjected to either detailed or high-level simulation. Instead, Computer Sciences Corporation (CSC) continued to project FMP performance on the codes using the simulator delivered with reference 2. As a result of that study, the FMP was shown to be substantially lower in performance on the GISS model than RADL investigators assumed, based on the performance of the fully vectorized version on the STAR-100. One example unearthed was a routine called LINKHO in the GISS model which ran at rates far less than 100 megaflops, and was allegedly unvectorizable. Shortly after, a corresponding vector portion was extracted from a current STAR-100 version and simulated without change on the FMP, vintage 1979. The result rate was shown to be 700 megaflops for that one routine! The LINKHO and AVRX routines were thus chosen for further study, with a representative portion of LINKHO being simulated with the LVLI simulator for the FMP. AVRX was chosen for manual extrapolation because it represented the physics computations of the GISS code and because its vector characteristics could be estimated by hand.

Mention of this difference is not meant to reflect on CSC's effort, but highlights two important difficulties in this evolutionary effort. The FMP has been a moving target as far as design is concerned since the inception of the project, and thus estimates made in summer, fall and winter of 1978 would vary wildly. The second problem is that "vector programming" or "vector thinking" is still a young discipline and still functions more as an art than a science. In this case, the author of the original code was able to see and recognize the "parallel" nature of some of his own constructs and thus make certain vector judgments in the GISS recoding, whereas a new analyst, confronted with a code not of his own making and a machine which is a bit alien, would find great difficulty in his first attempts at mapping code.

Neither of the weather codes taxed the FMP memory system as did the implicit and explicit codes. In fact they can be completely contained in the Main Memory of the FMP. The major concern in achieving performance on these codes is the degree to which scalar operations have to be employed, where no overlap of scalar and vector can be done, and the lengths of vectors. Simulation results for some segments of the weather codes achieve only 100-300 megaflops because of the presence of short vectors and the effect of vector startup time. The hardware design attributes that contribute to startup time have been discussed previously (in the hardware design section of this report), and relate to memory access time, pipeline lengths, and interconnections. The detailed FMP simulator imposes an average of 6 clock cycles per startup of each arithmetic vector. (Design analysis to date indicates that 6 cycles is a reasonable expectation for average vector startup.)

It can be seen that if vector startup time were zero and vectors could be scheduled back-to-back by the compiler, then the actual computation rates would in fact be the same as the theoretical rates. Thus rates of 500, 1000, and 1500 megaflops could be achieved on the weather code. If vectors are processed which are equal in length to the number of levels in the model, then lengths of \(16-32\) would be expected. Given a 1000 megaflop peak rate for such lengths, a table of startup costs would show:

[Table: startup cycles of 1, 2, 3, 4, and 5; the corresponding rate entries are missing.]
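The effect of startup cycles on short vectors can be sketched with a simple efficiency model. This is an illustration under stated assumptions, not a reconstruction of the report's table: it assumes the achieved rate is the peak rate scaled by streaming cycles over streaming-plus-startup cycles, and `elements_per_cycle` is a hypothetical parameter that is not taken from the FMP design.

```python
def effective_rate(peak_mflops, vector_length, startup_cycles,
                   elements_per_cycle=1):
    """Achieved rate for one vector operation under an assumed model:
    peak * stream_cycles / (stream_cycles + startup_cycles).
    `elements_per_cycle` is an illustrative assumption."""
    stream_cycles = -(-vector_length // elements_per_cycle)  # ceiling division
    return peak_mflops * stream_cycles / (stream_cycles + startup_cycles)


if __name__ == "__main__":
    # Vector lengths of 16 and 32, matching the expected number of model
    # levels, with a 1000 megaflop peak rate.
    for startup in range(0, 6):
        rates = [round(effective_rate(1000, n, startup)) for n in (16, 32)]
        print(f"startup={startup}  length 16/32 -> {rates} megaflops")
```

Whatever the actual per-cycle throughput, the model shows the general trend: the shorter the vector, the larger the share of time lost to each startup cycle.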

In truth, the Control Data FMP can have zero vector startup for many types of vector operations, in particular when there is no "loop back" in the pipelines between functional units, or mode changes for the data paths in the Vector Unit between vector operations. It is possible then for a carefully coded (yet brute force) conversion of the weather codes to achieve close to the 1000 megaflop threshold.

\subsection*{4.3 FUTURE METRIC STRATEGY}

One of the significant outcomes of the FMP investigation has been the realization that the tradeoffs between memory hierarchies, processor organization and interconnection, and problem statement (including data structuring and movement scheduling) create complex ensembles about which some decisions must be made. In particular, the FMP structure proposed by Control Data offers three levels of memory, with consequent levels of bandwidth, combined with functional parallelism which provides concurrency of operation based on the separation of functional identity.

A brief examination of this multi-faceted architecture in the face of the problems expected to be applied to the NASF indicates the tremendous variety of performance that can be achieved. The small problem, which could be used as a research tool or for debugging a new set of parameters, will of course fit in the Main Memory and operate at much higher map rates while possessing shorter vectors that have commensurately lower throughput rates (due to the effect of the spectre of vector overhead). On the other hand, certain research codes may exceed the LEVEL 2 memory and need to operate at lower rates than 1 gigaflop, in order to achieve the goals of the research project. The NASF must be able to accommodate the entire range of programs in order to be a viable research and development center, as well as a high capacity production facility.

The meaning of these observations is clearly that the set of "benchmarks" or "metrics" that are used to evaluate prospective computer systems for the FMP must be variegated in quantitative as well as qualitative spectrum. Thus there should be a two-dimensional approach to the "metricization" of evaluation programs for the FMP.

One dimension of this measurement system should be the type of processing. Thus the implicit and explicit aerodynamic codes should be the primary evaluation tools for design, simulation and actual installation of the NASF. The interest and investment in the weather models should then, of course, require the inclusion of characteristic approaches to the weather projection problem. The two such models, spectral and finite difference, are offered by NASA as representative of two "poles of thought" in weather investigations. Two other important models that were not investigated, and to which it can easily be forecast the NASF will be subjected, are the "particle in cell", or in another context "Monte Carlo technique", and the "structural" codes.

Considering the massive investment in financial and human resources, as well as the extensions into other disciplines that are planned for this facility, it would seem that a rationalized and systematic method needs to be employed for the analysis and evaluation of the NASF system to be procured. A suggested methodology for this might be:
a. Identification of type

Implicit Navier-Stokes, time-averaged, algebraic turbulence;

Explicit Navier-Stokes, time-averaged, algebraic turbulence;

Microscopic, full Navier-Stokes turbulence model;

Finite difference weather circulation model, operational forecast;

Spectral solution, weather circulation model, operational forecast;

One weather model for research purposes (mathematical method is optional);

One existing structures code, finite difference;

One particle in cell model.
b. Identification of size

Configurations should be chosen for each type that represent
1) an existing configuration that can be run on and compared with numerical results and performance of existing systems (such as the CDC CYBER 76);
2) two typical configurations expected for everyday production work (the variance of any one of the three dimensions in the implicit code, for example, can have a substantial effect on performance);
3) a maximum production run that is conceivable for the 1984-1989 timeframe;
4) a realistic research model configuration.
(For example: The existing flow models are presently running on the CYBER 76. These should be the baseline mesh sizes for these models. The expected \(50 \times 80 \times 80\) mesh should represent the production data base sizing, and the \(100 \times 100 \times 100\) should represent the MINIMUM THRESHOLD of performance benchmark. Finally, a research model of at least \(500 \times 500 \times 500\) should be chosen as a "metric" configuration.)
c. Prioritization

The designers proposing a NASF system must be given as many degrees of freedom as is possible, given the massive national attention that will haunt this project. Therefore it is imperative that a system of evaluation be established, AND ADHERED TO...NO MATTER WHAT THE PRESSURE, which
1) picks the main emphasis of the NASF (thus the minimum performance should be pegged at one gigaflop for the 3-D implicit code);
2) establishes an acceptable mix of use of such an implicit code for 1) test cases, 2) production, 3) research models;
3) identifies the relative priority of interest in the other codes (explicit Navier-Stokes, weather, structural, etc.);
4) provides measurement specification.

The implementation of the NASF will proceed through three more distinct steps:
1. Proposal of candidate architectures.
2. Design of candidate architectures.
3. Construction of final architecture.

The set of metrics that are arrived at for evaluation must be subjected to performance analysis at each of these phases to determine the viability and reality of a given design. In the first instance (proposal), all parties offering candidates must be requested to provide estimates of performance broken down by section of probable machine code. In the second instance, the analysis should consist of systematic breakdown and simulation of the same portions of the code that were analyzed in the proposal phase. This simulation should be accomplished at the detailed level of the design to verify that the hardware to be built will, in fact, meet the proposed performance objectives. The cost of such simulation is such that a complete simulation of a metric run is impossible. Instead, the candidate design simulations should be allowed to make segmented analysis, without requiring all trips through all loops, with a specification as to how the single-pass results may be extrapolated.

The implicit model which has been discussed previously may be used as an example. A detailed design simulation is required of
1) the first trip through each sweep loop,
2) a subsequent pass through each sweep loop,
3) the last pass through each sweep loop (to cover any end cases); and
4) one trip through VISMAT,
in order to get the basis for detailed estimates of the left-hand side computation. Given the sets of KMAX, LMAX, and JMAX taken from the sizing task (b. above), these simulation results can be extrapolated for each sized case. The method of extrapolation should be specified by the consumer (Ames) so that a uniformity of approach is guaranteed from all candidate design offerings.
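A possible form for such an extrapolation is sketched below. The report leaves the method to be specified by Ames, so this is only one assumed rule: the first and last trips are counted once and every other trip is charged at the "subsequent pass" cost. The function name and the cycle counts in the example are hypothetical.

```python
def extrapolate_sweep(first_cycles, subsequent_cycles, last_cycles, trips):
    """Estimate total cycles for a sweep loop from three simulated passes.
    Assumed rule: first + (trips - 2) * subsequent + last."""
    if trips < 1:
        raise ValueError("a sweep loop must execute at least one trip")
    if trips == 1:
        # Assumption: a single-trip loop is charged the first-trip cost only.
        return first_cycles
    return first_cycles + (trips - 2) * subsequent_cycles + last_cycles


if __name__ == "__main__":
    # Placeholder cycle counts for one sweep loop (not simulator results).
    total = extrapolate_sweep(first_cycles=120, subsequent_cycles=100,
                              last_cycles=110, trips=50)
    print(f"estimated cycles for the sweep: {total}")
```

The trip count for each sweep would come from the KMAX, LMAX, and JMAX values of the sized case, and the single VISMAT trip would be added to the sum of the sweep estimates.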

Finally, each metric in each size must be prepared for execution in full on the final operational machine. In addition to the reliability, confidence, and other specified acceptance tests, this set of metrics should become the final acceptance test for performance of the FMP.

Note that for some of the metrics (particularly the Navier-Stokes research models), a data flow analysis between memory hierarchies is required for each phase of the design and construction effort, as well as the computational performance of the FMP. This data flow will include not only backing storage modeling, but I/O transfer and support processing system activity as well.

All evaluations of prospective FMP systems should then start with the absolute requirement to achieve the 1 gigaflop level for test cases, production runs, and \(100 \times 100 \times 100\) maximum configurations. The remaining cases must be coded, simulated and estimated for their individual performances without a mandatory performance level being required. All prospects then are given the optimum level they must obtain for the primary mission stated for the NASF, but also must provide data relevant to other possible uses.

After this phase is concluded, the advice of all contractors should be collated to form the quantitative basis for the other aspects of this evaluation.

A major dilemma facing Ames researchers is the decision that must be made in the event that one candidate could be shown to perform substantially greater (or substantially less) than all other candidates for all codes other than the implicit, 3-D, aerodynamic code. A weighting factor then must be applied to the performance of the NASF which not only includes probable mix results but also NASA-Ames priorities. It is folly to think that such priorities can be identified strictly by money or its equivalent in machine time. There are, for such a national facility, many more considerations which must be identified and prioritized by NASA before a meaningful competitive analysis can be commenced on candidate designs.

\subsection*{4.4 BENCHMARK SPECIFICATION}

In the preceding section there is mention of the problem of properly specifying the performance benchmarks to be used for procuring the NASF. The problem arises from the scenario which is believed to be the best approach to designing and evaluating the FMP:

1. Specification of at least two production codes representative of the anticipated NASF workload.
2. Specification of at least one code representing the research environment which will share the facility.
3. Specification of at least four additional codes which demonstrate computational behaviors different from those exhibited in the preceding three codes.
4. Specification of a user-produced full-function evaluation. (This would demonstrate to NASA that all specified functions are present in the FMP and operating according to the documentation. These would be in addition to the vendor produced functional diagnostics.)
5. Creation of the FMP and system, and production of high-level simulators for both.
6. Execution of 1 through 4 in their entirety on both the system and the FMP simulators.
7. Derivation of the block level design of the FMP, and production of the block simulator.
8. Execution of 1 through 4 on the block model to verify functionality and performance levels.
9. Detailed design of the FMP and production of the gate level simulation of the FMP from that design.
10. Execution of parts of 1 through 4 on the gate level model.
11. Negotiate price and schedule.
12. Order parts and build.
13. Demonstrate 1 through 4 on the actual hardware. Real execution must match the three levels of simulation performance within a prescribed allowance: no variance at the gate level permitted as measured by clock cycles, \(10 \%\) variance from block level model due to unseen conflicts, and \(15 \%\) variance from high-level model. These variances are similar to those that could be seen comparing the gate model with the block model and the high-level model.
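The acceptance allowances in step 13 can be written as a simple check. This sketch assumes the variance is measured as the absolute difference in clock cycles relative to the simulated figure; the report does not state the reference or the direction, and the names used here are hypothetical.

```python
# Allowed variance, in percent, between real execution and each model level.
ALLOWED_PERCENT = {"gate": 0, "block": 10, "high_level": 15}


def within_allowance(level, simulated_cycles, measured_cycles):
    """True if the measured cycle count is within the allowance for `level`.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    allowed = ALLOWED_PERCENT[level]
    difference = abs(measured_cycles - simulated_cycles)
    return difference * 100 <= allowed * simulated_cycles


if __name__ == "__main__":
    # Placeholder cycle counts (not simulator or hardware results).
    print(within_allowance("gate", 8192, 8192))
    print(within_allowance("block", 1000, 1080))
    print(within_allowance("high_level", 1000, 1200))
```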

This process would minimize the risks to both Control Data and NASA, and in effect, uses the simulation model as a "meta prototype", obviating the need for the construction and demonstration of an expensive and superfluous hardware prototype. Under no circumstances should NASA anticipate entering purchase agreements for a system of this magnitude without insisting upon a "fly before buy" demonstration. The demonstration can be valid using the "meta prototype" if the simulation systems can themselves be validated with real results. Each stage of the FMP development (high-level, block, and gate) can be evaluated before the next stage is initiated, thus reducing all partners' risks on entering the next stage.

As previously discussed, however, the perfect scenario is marred by the fact that all levels of simulation absorb more computer power than can be economically, or even practicably realized. The project is faced with the need to scale, first the actual production runs, and then the runs that can be made at each level of simulation. A summary of recommendations might be:
1. The base set of benchmarks should be codes now in existence, which are in operation on the 7600. Timings for each benchmark on the 7600 should become a baseline for performance comparisons.
2. The base set should be scaled in terms of mesh sizes, time steps, and iteration counts for at least three levels of probable production runs - small (perhaps 7600 size), medium, and large. The medium size might represent the normal workload parameters and the large might represent the extraordinary "research" models.
3. The base set should be representative of existing and projected computational requirements in functionality as well as complexity.
4. The entire base set will be run for all sets of parameters as the key acceptance test on the actual FMP hardware.
5. Each benchmark will have key segments extracted for simulation by the high-level simulator. These segments can be entire code sequences limited to one pass through a loop or nested loop. A multiplication factor for the number of trips can then be manually factored in at a later time. Where code sequences contain branches that cannot be vectorized, or where vector contents and lengths are dependent on the operating data, several different passes through the sequence exercising the strategic (in terms of performance effects) alternatives should be specified.
6. Sequences of code from these prescribed segments will be identified as "validation code". These sequences will be run on the block model to verify the timing assumptions used in the simpler high-level model. These will normally be limited to mixed strings of instructions up to 25 in length consisting of scalar, vector, and map operations. Vector lengths will be limited to 100 or less.
7. Sequences of the code extracted in 6 (above) will be further distilled to provide gate level validation code. This is done primarily by reducing vector lengths and permitting some manual multiplier to be used to extrapolate operation for lengths of 100, when needed.
8. These last sequences will be run at the gate level to validate the performance of individual units. Mixed simulation will be used: the unit under study runs in gate simulation mode while all other units operate at either the block or high level.

So it is then that not only must the codes be sized, but judicious choices must be made of selected segments to be used for validation of the various simulation levels. This process has already been engaged during this study period as it became evident that the GPSS models required considerable time to run. Code analysis has consisted of breaking up the programs into sequences and then manually computing the combined operation timing.

\subsection*{5.0 PERFORMANCE ANALYSIS AND EVALUATION}

This section addresses the methodology employed to evaluate the FMP design when subjected to the four metric codes --3-dimensional implicit, 3-dimensional explicit, spectral weather, and finite difference weather models. The major approach for all four codes was to derive performance data from runs of 'pseudo-compiled' code using one of the two simulators developed by Control Data for NASA-Ames. Several factors acted to limit the extent to which this was possible.
1. The machine time required to run large code sequences at the detailed (and more precise) level became prohibitive in some cases.
2. The number of FMP machine instructions which can be handled by the higher level (less detailed) simulator is limited to 2000 individual instructions. Loop structures are unrolled into linear code by the simulator and thus this limit of 2000 instructions constrained loop parameters in many cases.
3. The amount of code that could be pseudo-compiled exceeded the available time and resources allotted to this project, due primarily to the priority assigned by Control Data analysts to the detailed analysis of the three-dimensional implicit code. The effect of this decision was to absorb considerable time and talent in the reconfiguring of the implicit code, recompiling, and resimulating of the governing elements of the implicit code.

In many instances then, it became necessary to use extrapolation techniques based on simulator results and reasonable conclusions about the behavior of each unique code sequence. The manner in which the four codes were each addressed is discussed below.

\subsection*{5.1 THREE-DIMENSIONAL IMPLICIT CODE}

The routines BTRI, VISMAT, VISRHS, MUTUR, and the FILTRX/AMATRX, FILTRY/AMATRX, FILTRZ/AMATRX segments of the STEP routine were simulated and analyzed for a variety of vector lengths, to determine the sensitivity of floating point performance to the length of arithmetic vector operations. This was of paramount concern because of the well-known effect of 'vector startup' time on 'short vector' performance in machines like the CDC FMP.

It was discovered that detailed level simulation of BTRI would have to be limited to processing sequences of code taken from BTRI, summarizing the simulator results, and then manually computing the probable behavior of BTRI in its entirety.

The same was true for FILTRX, FILTRY, and FILTRZ when simulation at the lower (more detailed) level was desired. It became obvious that the detailed model of the FMP yielded results within \(1 \%\) of the high-level model when map operations (and hence unavoidable memory conflicts) were minimized. This was true in BTRI, and thus a simulator run was made for the complete BTRI using the high-level model; these figures are used in subsequent tables.

The heavy use of map operations to perform the transpose and gather operations from Intermediate Memory to Main Memory in FILTRX, FILTRY, and FILTRZ required one simulation pass of these routines at the detailed level, and a comparison of the result with a similar pass made at the higher level. For the vector lengths given later, the difference between simulator results in the high-level model and detailed model was ten percent or less. In all cases the detailed model (which simulates memory accesses and memory busy exactly) would run slower (more clock cycles for the same number of floating-point operations) than the higher-level model.

The data obtained from the individual sequence runs was summarized and used to compare against other forms of runs for the implicit code to validate the manual extrapolations that had to be made.

BTRI is broken into 4 different segments of code to facilitate simulation runs; these segments are identified as BTRI0, BTRI1, BTRI2, and BTRI3. The segments are then combined into code sequences representing the pseudo-compiled lines which can be found in the listings of simulator input in appendix F. The simulation results for each segment are then combined to form a total operation count and total clock cycle count for those sequences in BTRI. These can be found in the table below, where the first line for each segment represents the number of floating-point operations credited to that sequence. The second line for each segment gives the number of clock periods for that sequence. The floating-point result rate is given for each subsequence as an indication of how that particular form of code would perform on the FMP.


Note that lines 4820 through 5476 and lines 6020 through 6100 are executed IMAX times, where "IMAX" is the maximum length of the column vector being solved by the tridiagonal algorithm. Thus IMAX will become either JMAX, KMAX, or LMAX depending on the sweep direction. Summing the values for the two sets of subsequences:
\begin{tabular}{lrrrrrr}
4330-4815 and 5490-6015 & 76300 & 152600 & 305200 & 1220800 & 1220800 & 457800 \\
CLKPD & 5172 & 9124 & 16724 & 62324 & 31924 & 24324 \\
4820-5476 and 6020-6100 & 69400 & 138800 & 277600 & 1110400 & 1110400 & 416400 \\
CLKPD & 4628 & 8164 & 14964 & 55764 & 28564 & 21764
\end{tabular}

To find appropriate values for the execution of the full BTRI, the values of IMAX that are reasonable must be determined. This is attacked by establishing characteristic mesh sizes for
the entire implicit solution. Since the primary interest is in the sensitivity of the FMP to small vector sizes, parameters are chosen as follows:
\begin{tabular}{cl} 
VLENGTH & DIMENSIONS \\
100 & \(6 * 6 * 6\) \\
200 & \(8 * 8 * 8\) \\
400 & \(10 * 10 * 10\) \\
1600 & \(16 * 16 * 16\) \\
57624 & \(100 * 100 * 100\)
\end{tabular}
\begin{tabular}{crc} 
SLAB SIZE & IMAX & BTRI VL \\
2 & 6 & 12 \\
3 & 8 & 25 \\
4 & 10 & 40 \\
6 & 16 & 100 \\
6 & 100 & 600
\end{tabular}

The purpose of this relationship chart is to pick the lengths of operations in BTRI depending on the vector lengths used in FILTRX, FILTRY, RHS, etc. Obviously, the mesh dimensions chosen for all but the last case are smaller than would be normally used. However, it is equally obvious that if the FMP can perform well on these small meshes, and even better on larger meshes, then the threshold of one gigaflop can be attained by a broad range of problem models.

The number of planes appearing in a 'slab', a function in the program of the variables JSL, KSL, LSL, is based on the memory space available. In the case of the first meshes given, it can be seen that the entire mesh could be held in Main Memory and thus no 'slabbing' would be necessary. In addition all transposes would be performed in Main Memory instead of between Main and Intermediate Memories, with a consequent increase in Map Unit performance. For the purpose of this report however, all operations are performed as if the mesh variables had to be retained in Intermediate Memory, regardless of the "typical" dimensions chosen for the flow variable arrays. Using the values of IMAX and vector lengths given above:
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline \multicolumn{2}{|c|}{VLENGTH-IMAX} & 100-6 & 200-8 & 400-10 & 1600-16 64-MODE & 1600-16 32-MODE & 600-100 \\
\hline BTRI & TOTAL OPS & 492700 & 1263000 & 3081200 & 18987200 & 18987200 & 42097800 \\
\hline BTRI & TOTAL CLKS & 32940 & 74436 & 166364 & 954548 & 488948 & 2200724 \\
\hline BTRI & GFLOPS & 0.935 & 1.064 & 1.158 & 1.243 & 2.427 & 1.196 \\
\hline
\end{tabular}
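The totals in this table follow from the two sets of subsequences: the first set is executed once, and the second set is executed IMAX times. The sketch below reproduces two of the columns. It assumes that the 16E-09 factor appearing in the FPRATE expression later in this section is the clock period in seconds; the function and variable names are illustrative and do not come from the study.

```python
CLOCK_SECONDS = 16e-9  # assumed clock period, taken from the 16E-09 FPRATE factor


def btri_totals(once, per_pass, imax):
    """Combine (FLOPS, CLKPD) pairs for the once-only subsequences and the
    subsequences that are repeated IMAX times."""
    ops = once[0] + imax * per_pass[0]
    clocks = once[1] + imax * per_pass[1]
    gflops = ops / (clocks * CLOCK_SECONDS) / 1e9
    return ops, clocks, gflops


# VLENGTH-IMAX = 100-6: 492700 operations, 32940 clocks, about 0.935 GFLOPS
print(btri_totals((76300, 5172), (69400, 4628), 6))

# VLENGTH-IMAX = 1600-16, 64-bit mode: 18987200 operations, 954548 clocks,
# about 1.243 GFLOPS
print(btri_totals((1220800, 62324), (1110400, 55764), 16))
```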

Thirty-two-bit simulation was limited to the set of runs at vector lengths of 1600. This seems to represent a reasonable length for some problem solutions, and was chosen as a midpoint in the studies of FMP performance done for this report.

To provide data for analysis of the behavior of the three-dimensional implicit code, simulation was performed on the following:

\begin{tabular}{l}
FILTRX/AMATRX \\
BTRI \\
FILTRY/AMATRX \\
BTRI \\
FILTRZ/AMATRX \\
BTRI \\
RHS \\
\hline
VISMAT \\
VISRHS \\
MUTUR \\
\hline
Subtotal for implicit without viscosity \\
Subtotal for implicit with viscosity, laminar flow \\
Total for implicit
\end{tabular}

The pseudo-compilations of FILTRX, FILTRY, RHS, VISMAT, VISRHS, and MUTUR were translated into simulator input (see appendix F) and run for the vector lengths mentioned previously. As can be seen from the pseudo-compilation, there is a set of map operations that must be accomplished at the beginning of each sweep loop. The results of these map operations are interlocked to subsequent arithmetic operations by use of the read and write keys. The remaining map operations are performed concurrently with arithmetic operations during the remaining passes through the loop. Since each FILTRX/AMATRX sweep is simulated only down to the BTRI CALL statement, the times given by the simulator assume that all map and vector operations must be completed before entry into BTRI. This is not the actual case, so the timings given are considered worst case: in fact the last few vector and map operations can proceed concurrently with the initial functions appearing in BTRI.

The timings for FILTRX and FILTRZ are essentially the same, since they are governed by the "gather-record" operations. FILTRY requires "single element gather" operations however, and must be simulated separately.

All simulation of the implicit code assumes that the code and the flow variables have been staged to Intermediate Memory from either disk or backing storage, whilst a previous code is running on the FMP. Thus no explicit I/O is calculated into the analysis since all data transfers to the "outside world" are assumed to be overlapped with other program executions.

Thus, simulation results for a single run of BTRI for all desired vector lengths (described previously) must be multiplied by three. Likewise the FILTRX/AMATRX simulation results can be multiplied by two (to represent FILTRZ/AMATRX as well); the remaining runs are then factored in on a one-for-one basis. The following matrix summarizes the results from the set of runs for the implicit code that can be found in appendix F. The multiplication factor used to form the subtotal for the left-hand side appears in the left-hand column of the chart. The values given for each code segment represent the single-run counts for that code.

The data that appear in the chart for FILTRX and FILTRY are taken directly from the simulation output and represent two iterations through the slab processing loop, in order to amortize the startup costs of the initial map operations. Since the chart represents only one pass through each sweep for one single slab, the multiplication factor must be reduced by one-half for each of FILTRX, FILTRY, and FILTRZ. Since the data for FILTRX and FILTRZ are the same, the factor for that line is \(0.5 * 2\) (FILTRX + FILTRZ) \(= 1\), while the factor for FILTRY is \(0.5 * 1 = 0.5\).


As the chart shows, the minimal one-gigaflop rate is achieved at vector lengths of 400. The anomalous behavior of the long vector case is due to two effects. First, if one examines the FPRATES for FILTRX and FILTRY, the rate seems to drop off at the maximum vector length. This is due to the simplification used to run the codes. It turns out that the final map operation (line 800 in the FILTRX and FILTRY simulation input) dominates the Map Unit behavior. As was mentioned previously, the disengagement of the BTRI CALL from the FILTRX/AMATRX sequence for simulation purposes imposes an additional penalty at the end of the simulated sequence while the simulator is 'idling' down. This can be seen by executing the sequence with repeat factors of 1, 2, and 3, where the 'idling' will be amortized over the total execution.

The second effect is seen in BTRI. The vector lengths used in BTRI are actually only 600, since the iteration within BTRI is recursive in the sweep direction. Thus while all of the other data reflects increasing vector lengths from 100, 200, 400, 1600, and 60000, BTRI lengths are actually 100, 200, 400, 1600, and 600. This was an unfortunate choice of parameters, but was established early in the study to ease analytical evaluation of the implicit algorithm. The tapering of the "performance curve" which could be plotted from the above results is then due to two unnecessary and eliminable elements. The first can be resolved by resimulating the entire mass with other parameters; the second can be resolved by moving the gather operation (at line 800) earlier in the code (a task for the compiler), and including the first sequence from BTRI as part of FILTRX, FILTRY, and FILTRZ. The FPRATE shown for vector lengths of 57624 is not as high as the original one-gigaflop objective. This was mostly due to the effect of the slow Intermediate Memory to Main Memory transpose operations required by the data just described, which led to a new series of runs with a slightly revised version of STEP.

In this version the \(Q, X, Y\), and \(Z\) meshes are fully transposed into a form suitable for direct processing by FILTRY. This transposition is performed from Intermediate to Intermediate Memory during the BTRI computations in the FILTRX pass, and requires hand-coding of the gather operations immediately preceding the CALL to BTRI in FILTRX. Once this transpose is complete, FILTRX, FILTRY, and FILTRZ all execute in the same manner, and thus their rates will be identical.

To prove this was possible it was necessary to merge the simulation input files for BTRI and FILTRX, so that the code can be run as it would be executed on the FMP (with the shutdown times imposed by the simulation system being overlapped by following code).

Unfortunately this scheme generates too many simulator input cards for reasonable values of IMAX (in BTRI). Therefore, a set of runs was made with the revised FILTRX/AMATRX/TRANSPOSE/BTRI sequences agglutinated, and the following results were obtained:
\begin{tabular}{|c|c|c|c|c|c|}
\hline \multicolumn{2}{|l|}{VLENGTH-BTRIVL} & 400-40 & 1600-96 & 60000-600 & 1600-96 (32 BIT) \\
\hline IMAX=1 & FLOPS & 1.084E5 & 3.337E5 & 8.130E6 & 3.335E5 \\
\hline IMAX=1 & CLKPD & 1.08E4 & 2.83E4 & 7.29E5 & 1.578E4 \\
\hline IMAX=2 & FLOPS & 1.349E5 & 4.001E5 & 8.543E6 & 4.000E5 \\
\hline IMAX=2 & CLKPD & 1.36E4 & 3.29E4 & 7.51E5 & 1.877E4 \\
\hline IMAX=5 & FLOPS & 2.177E5 & 6.001E5 & 9.805E6 & 6.001E5 \\
\hline IMAX=5 & CLKPD & 2.17E4 & 4.68E4 & 8.16E5 & 2.775E4 \\
\hline IMAX=7 & FLOPS & 2.737E5 & 7.329E5 & 10.636E6 & 7.329E5 \\
\hline IMAX=7 & CLKPD & 2.72E4 & 5.60E4 & 8.60E5 & 3.373E4 \\
\hline
\end{tabular}

Extrapolated data was then determined for IMAX values of 10, 16, and 100 using the following extrapolation equations:

\[
\text{DELFLOPS} = \frac{\text{FLOPS}(\text{IMAX}=7) - \text{FLOPS}(\text{IMAX}=1)}{6}
\]

DELRATE \(=\) The asymptotic FPRATE as a function of IMAX for the additional FLOPS (determined graphically to 3 significant digits).

\[
\begin{aligned}
\text{XFLOPS}(\text{IMAX}>7) &= \text{DELFLOPS} * (\text{IMAX} - 7) \\
\text{XCLKPDS}(\text{IMAX}>7) &= \frac{\text{XFLOPS}(\text{IMAX})}{\text{DELRATE} * 16\text{E-}09} \\
\text{FPRATE}(\text{IMAX}>7) &= \frac{\text{FLOPS}(\text{IMAX}=7) + \text{XFLOPS}}{16\text{E-}09 * (\text{CLKPDS}(\text{IMAX}=7) + \text{XCLKPDS})}
\end{aligned}
\]

In the table of simulation results above, the pair of vector lengths shown represents the typical lengths in FILTRX, AMATRX, RHS, VISMAT, VISRHS, and MUTUR, and those appearing in BTRI, respectively. The purpose for using this range was to find the 'knee' of the curve of performance versus mesh sizes. The range of IMAX was dictated by the maximum number of input statements manageable by the simulator. Thus, the values had to be extrapolated for the entire code block for IMAX values of 10, 16, and 100, which represent the mesh sizes given in the previous table. It was determined from the runs that for any value of IMAX, the K direction transpose is completely covered by the BTRI calculations. This is due to the fact that the map operations in BTRI are Main Memory to Main Memory, while the transpose operation requires only the Intermediate Map Unit.
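The extrapolation equations above can be sketched as follows. This is a reading of the equations under stated assumptions, not the study's own program: DELFLOPS is taken to be the FLOPS added per unit increase of IMAX (the IMAX=1 to IMAX=7 difference divided by 6), XCLKPDS is taken to be the extra FLOPS divided by DELRATE and the clock period, and 16E-09 is taken to be the clock period in seconds. The DELRATE value used in the example is a placeholder, since the text reads it off a graph and does not quote it.

```python
CLOCK_SECONDS = 16e-9  # assumed meaning of the 16E-09 factor in FPRATE


def extrapolate(flops1, flops7, clk7, delrate, imax):
    """Extrapolate FLOPS, CLKPD and FPRATE for IMAX >= 7.

    flops1, flops7: simulated FLOPS at IMAX = 1 and IMAX = 7
    clk7:           simulated clock periods at IMAX = 7
    delrate:        asymptotic rate (FLOPS per second) for the added work
    """
    delflops = (flops7 - flops1) / 6              # FLOPS per unit of IMAX
    xflops = delflops * (imax - 7)                # extra FLOPS beyond IMAX = 7
    xclkpds = xflops / (delrate * CLOCK_SECONDS)  # extra clock periods
    flops = flops7 + xflops
    clkpds = clk7 + xclkpds
    fprate = flops / (CLOCK_SECONDS * clkpds)
    return flops, clkpds, fprate


# 60000-600 column of the table; DELRATE = 1.2e9 is a placeholder value.
flops, clkpds, fprate = extrapolate(8.130e6, 10.636e6, 8.60e5, 1.2e9, 100)
print(f"FLOPS = {flops:.3e}, CLKPD = {clkpds:.3e}, FPRATE = {fprate:.3e}")
```

By construction the extrapolated rate lies between the simulated IMAX=7 rate and DELRATE, and it approaches DELRATE as IMAX grows.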

The extrapolation was tested against the previously-run implicit data, which required hand-aggregated values for vector lengths of 57624, and was found to be within \(1 \%\) of that data. Taking the values of FLOPS and CLKPD for RHS, VISMAT-VISRHS, and MUTUR from the previous table for vector lengths of 400, 1600, and 60000 results in


The counts shown above are for one pass through each sweep direction (or one full slab processed). Extending these numbers for all slabs and 256 time steps gives the following:
\begin{tabular}{lllll}
TOTAL RUN FLOPS & 1.17E9 & 4.43E9 & 9.07E11 & 4.43E9 \\
TOTAL RUN CLKPD & 6.64E6 & 3.13E8 & 5.39E10 & 1.87E8
\end{tabular}

The final revision of the simulation input which yields the FLOP rate shown here is felt to be practicable in terms of both programmability and compilability. It takes into account the gather operations needed for collecting the data for each slab and includes three extra gather operations to represent the initial gather functions which must be accomplished before the first pass of the loop can be processed. If this sequence were to be extended to the proper number of passes, then the loop count would be 16, for the 16 slabs needed in the \(100 \times 100 \times 100\) mesh case. In one experiment with the simulator the three initial map operations were removed and the FILTRX, FILTRY, and FILTRZ timings were reevaluated. The difference in execution time of the entire ensemble due to the additional map operations turns out to be less than \(4 \%\), which is less than the probable accuracy of the total estimates.

From the several attempts at recoding and simulating the implicit code, it appears that all transpose and gather operations could be overlapped completely with careful 'hand-coding' or 'extremely sophisticated compiling'. In this instance the execution rate could become as high as the 1.12 gigaflops achieved by the unadorned BTRI routine by itself.

An important concern in any of these code simulations is the degree to which the system is balanced. That is, are there any major components idle or nearly idle while others are at maximum utilization? Examination of the simulation results will disclose two quantities of interest beyond the FPRATE and CLKPD: VECBZ and MAPBZ, which reflect the percentage utilization (degree to which the unit is busy) of the Vector and Map Units, respectively. In the results for the implicit code it can be seen that the Map Unit is occupied for at least half the time, and the Vector Unit is over 90 percent busy. As long as the Map Unit is never busier than the Vector Unit, and the Vector Unit is busy between 93 and 98 percent of the time, one can be confident that consumers are getting their money's worth.

Another observation of interest is that the curves of performance versus vector length (or mesh size) yield some data that points to an asymptotic behavior of the GIGAFLOP curve. Unfortunately project time hasn't permitted investigation of the location of the exact knee of the performance curve. It would seem that a simple graphing of the points for lengths of 400 (dimension \(10 \times 10 \times 10\) ), 1600 (dimension \(16 \times 16 \times 16\) ), and 60000 (dimension \(100 \times 100 \times 100\) ) will show that at vector lengths of 6400 (dimension \(24 \times 24 \times 24\) ) and greater, the FLOP rate of the proposed FMP will hover around 1.000 gigaflop. The implication is obvious that for the range of reasonable mesh sizes -- \(30 \times 30 \times 30\) through \(100 \times 100 \times 100\)-- the performance curve is relatively flat and close to one gigaflop (plus or minus \(10 \%\) ). This range is achieved even when all arrays are kept in Intermediate Memory throughout the computation!

As mesh sizes grow too large to be held entirely in Intermediate Memory they can be held in the Backing Store and swapped into Intermediate Memory, then transposed by the Intermediate Map Unit. Although this has not been simulated, the excess capability available in the current map operations points to
the likelihood that meshes on the order of \(200 \times 200 \times 200\) up to \(240 \times 240 \times 240\) could be processed at close to the one-gigaflop rate. It should be obvious, however, that the total problem solution time for the largest case will be greater by far than the acceptable threshold of \(10-15\) minutes CPU time for the nominal \(100 \times 100 \times 100\) problem.

The reality of the above conclusions rests on
1) the degree to which the simulation systems representing the CDC FMP are true and valid engineering models of the actual hardware design;
2) the sufficiency of the object code produced for simulation;
3) the validity of the extrapolations made based on the simulation results.

It is expected that analysts at NASA and CDC will continue to exercise these simulators for their own research to determine the quality of the aforementioned issues, and perhaps to locate more accurately the knee of the performance curve.

\subsection*{5.2 THREE-DIMENSIONAL EXPLICIT CODE}

As stated previously, the explicit code was considered in a different light than the implicit code. In particular, the vectorizations attempted were limited to those available on the STAR-100 computer. This permitted measuring the ability of programmer, compiler, and machine architecture while assuring that the metric conversions yielded the same answers as the original code. The characteristics of the explicit code of interest to this study have been
1) the implicit sections (LZI, LYI) which in mathematical form are similar to the implicit portions of the implicit code, but are constrained to smaller vector lengths;
2) the explicit solver schemes (LX, LY, and LZ) which limit the potential vector lengths because of \(J\) and \(L\) recursion;
3) the method of characteristics (CHARAC) which exhibits conditional processing of data, based on the data.

As in the case of the implicit code, the routines of interest were vectorized, then actually compiled (not pseudo-compiled as in the implicit code) using the STAR-100 compiler. The STAR object code was then transliterated directly into FMP object code and the results were simulated. The simulation involved fragmenting the sequences into smaller portions for processing
by the detailed level simulator (LVL2), then aggregating them into whole strings for simulation at the higher (LVL1) level. The first attempt at this led to FPRATES which were 100-300 megaflops for the \(30 \times 30 \times 30\) case. This resulted from the transliteration yielding very few dyadic and triadic operations in the pipelines. The simulation input was then subjected to a "sophisticated compiler" consisting of an experienced FMP programmer. The resulting input was resubmitted to the high-level simulator and the following results were obtained. (See appendix \(F\) for the simulator runs and FMP FORTRAN source code.)

The routines VLX and VLYI were run through the high-level simulator (LVL1) for a variety of vector lengths to demonstrate the performance range for those sequences. Runs for vector lengths above 100 were difficult to do for VLYI because of the overflow of the simulator instruction buffer. From this raw data a table of total performance is developed later.


The VLX rates given for vector lengths of 30 and 100 correspond to the mesh sizes \(30 \times 30 \times 30\) and \(100 \times 100 \times 100\), respectively. The FLOP and clock period counts for the corresponding VLYI simulation runs reflect the processing of planes of data in the J by K planes only. The repeat counters at lines 840, 1680, and 1900 of the VLX simulation input had to be scaled by one-fifth to make the simulation fit within the GPSS code limitation. The effect of this scaling is to reduce the effective FPRATE by \(0.01 \%\) from what should be the actual rate. In order to provide correct values of FLOPS and CLKPD for the \(100 \times 100 \times 100\) case, the quantities must be increased as follows: lines 840 through 950 yield a total of 182000 FLOPS for the VLENGTH=100 case in the simulation results. This number should be five times greater, due to the scaling used, or 910000 FLOPS. The total clock periods required for a 5X extension of these loops is 1.42E5, for a revised flop rate of 0.73E9.

The raw data above was further extended by the factors 30 and 100 respectively to reflect the number of operations needed for all three dimensions for that particular mesh. TURBDA, LYC, and CHARAC were estimated by hand. First, CHARAC could not be estimated with the simulator since the number of operations credited by the simulator is \(100 \%\) rather than the number of one-bits in the control vector. Second, TURBDA and LYC are obviously vectorizable entities whose performance can be estimated easily and directly from the code.


Extrapolation of full explicit performance is fraught with problems due to the lack of accounting for overlap of operations from the end of one routine into the beginning of another routine, or on the contrary, the conflicts that delay the start of a routine due to a previous routine. Time and resources didn't permit a complete evaluation via simulation. Instead, the simplification used for extrapolation was to assume that LY, LX, and LZ all execute about the same number of cycles for the same number of operations on a symmetrical mesh. The same simplification was used for LYC, LZC and for LYI, LZI. Grand total then consists of a linear combination of LX, TURBDA, LYC, and LYI routines as follows:
\[
6 * \mathrm{LX} + 2 * \mathrm{TURBDA} + 4 * \mathrm{LYC} + 4 * \mathrm{LYI}
\]

\begin{tabular}{lll}
GRAND TOTALS & & \\
FLOPS & 2.1E8 & \\
CLKPD & 1.97E7 & 3.19E9 \\
FPRATE & 0.666E9 & 2.34E8 \\
TOTAL RUN FLOPS & 5.38E10 & 0.85E9 \\
TOTAL RUN CLKPD & 5.06E9 & 8.17E11
\end{tabular}
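The grand total is a weighted sum of per-routine counts, and the first column of the table can be cross-checked from its own entries. The sketch below does both under two assumptions that are not stated for the explicit code in this section: a clock period of 16E-09 seconds (the factor used in the FPRATE expression for the implicit code) and 256 time steps (the step count quoted for the implicit runs). The helper name and dictionary layout are hypothetical.

```python
CLOCK_SECONDS = 16e-9  # assumed clock period
TIME_STEPS = 256       # assumed, borrowed from the implicit-code runs

# Weights of the linear combination given in the text.
WEIGHTS = {"LX": 6, "TURBDA": 2, "LYC": 4, "LYI": 4}


def grand_total(counts):
    """counts maps routine name -> (FLOPS, CLKPD) for one execution."""
    flops = sum(WEIGHTS[name] * counts[name][0] for name in WEIGHTS)
    clkpd = sum(WEIGHTS[name] * counts[name][1] for name in WEIGHTS)
    return flops, clkpd


# Cross-check of the first column of the table.
flops, clkpd = 2.1e8, 1.97e7
fprate = flops / (clkpd * CLOCK_SECONDS)
print(f"FPRATE          = {fprate:.3e}")              # table: 0.666E9
print(f"TOTAL RUN FLOPS = {flops * TIME_STEPS:.3e}")  # table: 5.38E10
print(f"TOTAL RUN CLKPD = {clkpd * TIME_STEPS:.3e}")  # table: 5.06E9
```

The small difference in TOTAL RUN CLKPD is consistent with the grand totals having been rounded to three digits before printing.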

\subsection*{5.3 SPECTRAL WEATHER MODEL}

The FFT and Legendre (SPCFOR) routines were simulated with the following mesh parameters:

\begin{tabular}{ll}
FFT length 32 & 0.518 GFLOP \\
FFT length 100 & 0.858 GFLOP \\
FFT length 200 & 0.973 GFLOP \\
FFT length 2000 & 1.145 GFLOPS \\
FFT 32-bit 100 & 1.326 GFLOPS \\
FFT 32-bit 500 & 2.027 GFLOPS \\
FFT 32-bit 2000 & 2.246 GFLOPS \\
Legendre transform -- 25 layers, 6 waves, 15 gaussian latitudes & 0.837 GFLOP \\
Spectral model overall estimate in its present configuration & 0.879 GFLOP
\end{tabular}

The spectral code was extrapolated by assuming:
The FFT and SPCFOR/FORSPC routines constitute the bulk of the processing;

The remaining code sequences are vectorizable to the same degree as the FFT and SPCFOR routines;

FORSPC data can be estimated from the simulated data for SPCFOR.

A formula from reference 3 was then used to estimate the total number of FLOPS. Two problem sizes were used to establish the performance range. The first -- 25 layers, 6 waves, and 15 gaussian latitudes -- was the scaling in the original spectral metric provided by Ames. The second size is one determined by CDC and the code originators to be reasonable for actual production runs of this type of research model -- 48 layers, 21 waves, and 55 gaussian latitudes. The following table summarizes the data and extrapolation.


\subsection*{5.4 FINITE DIFFERENCE WEATHER MODEL}

Only one routine from the GISS code was simulated. LINKHO had become the major concern of CDC analysts when original projections showed performance below 50 megaflops for that routine. The size of the GISS code had prohibited a complete analysis to the recoding, compiling, and simulation level. As a result, LINKHO and AVRX were studied to see if there were some fundamental problems in the code which reflected serious problems in the FMP architecture. The LINKHO subroutine was analyzed and the first computational section was chosen for simulation (due to the sheer volume of code in LINKHO), and a single page, felt to be representative of the majority of the routine, was subjected to the simulator. The simulation input data can be found in appendix F.

The AVRX routine was estimated by hand, with the inclusion of the effect of vector startup times. The extrapolation used is based on the code as presented in figures 7 and 8 of Division 4, the analysis of the weather codes. There are \(4 * 23 * 5 * 2+6 * 4 * 2=968\) floating point operations reflected in figure 7. There are \((23 * 5 * 2+6 * 2) * 2=484\) floating point operations reflected in figure 8. Assuming the compiler is capable of scheduling simultaneous scalar execution with the Vector Unit for some computations, \((13+6) * 2 * 2=76\) machine cycles would be required to complete the sequence in figure 7, for a rate of 0.8 gigaflop. The sequence in figure 8 would require \((3+6) * 2 * 2 * 2=72\) machine cycles, for a rate of only 0.42 gigaflop due to the inherently short vector lengths. Together, a total of 1452 floating point operations, requiring 148 machine cycles, yields a rate of 0.62 gigaflop. Note that these timings are for 64-bit mode, while 32-bit mode is sufficient for the mathematics in the AVRX code.

The rate given is for a relatively coarse grid; using the resolution present in the original Ames-supplied metric results in a grid of \(46 * 72\). This denser grid should require \(24 * 43 * 5 * 2+6 * 24 * 2=10608\) floating point operations based on the sequence in figure 7 and \((43 * 5 * 2+6 * 2) * 2=884\) operations in the sequence in figure 8. The length of vectors in this fine grid model in figure 7 is 1104 elements, and that in figure 8 is 43 elements.

With the longer vector lengths it is expected that the Scalar Unit will sustain parallel execution with the Vector Units. This last example would require \((138+6) * 2 * 2=576\) machine cycles for the sequence in figure 7 and \((6+6) * 2 * 2 * 2=96\) cycles for the sequence in figure 8. Thus, if the greater resolution given in the original model is permitted, the FMP could achieve
\[
(10608+884) /(576+96)=1.07 \text { gigaflops }
\]
for this portion of the GISS code.
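The hand estimate above can be checked with a few lines of arithmetic. The sketch below recomputes the operation counts, cycle counts, and rates for both grids. The clock period is an assumption: it is not stated in this section, but a value of 16 ns reproduces the quoted rates to within about 0.01 gigaflop. The function and variable names are illustrative only.

```python
# Assumption: a 16 ns clock period, inferred from the rates quoted in the text.
CLOCK_NS = 16.0


def gigaflops(flops, cycles, clock_ns=CLOCK_NS):
    """Floating point operations per nanosecond, i.e. gigaflops."""
    return flops / (cycles * clock_ns)


# Coarse grid: operation and cycle counts for the figure 7 and figure 8 sequences.
coarse_flops_7 = 4 * 23 * 5 * 2 + 6 * 4 * 2      # 968
coarse_flops_8 = (23 * 5 * 2 + 6 * 2) * 2        # 484
coarse_cycles_7 = (13 + 6) * 2 * 2               # 76
coarse_cycles_8 = (3 + 6) * 2 * 2 * 2            # 72

# Fine grid (46 * 72): vector lengths of 1104 and 43 elements.
fine_flops_7 = 24 * 43 * 5 * 2 + 6 * 24 * 2      # 10608
fine_flops_8 = (43 * 5 * 2 + 6 * 2) * 2          # 884
fine_cycles_7 = (138 + 6) * 2 * 2                # 576
fine_cycles_8 = (6 + 6) * 2 * 2 * 2              # 96

coarse_rate = gigaflops(coarse_flops_7 + coarse_flops_8,
                        coarse_cycles_7 + coarse_cycles_8)
fine_rate = gigaflops(fine_flops_7 + fine_flops_8,
                      fine_cycles_7 + fine_cycles_8)

print(f"coarse grid: {coarse_rate:.2f} gigaflops")
print(f"fine grid:   {fine_rate:.2f} gigaflops")
```

The comparison makes the dependence on vector length explicit: the operation count grows by roughly a factor of eight between the two grids while the cycle count grows by less than a factor of five.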

\subsection*{5.5 REFLECTIONS ON THE PERFORMANCE ANALYSIS}

The process of analyzing the performance of a candidate computer system for this report has become more involved than originally anticipated. This can be attributed to two factors:
1. The simulator system could not process a total program; programs had to be broken into pieces, and each piece run independently. Some runs had to be made with the overlapping code between pieces to determine whether the independent runs were valid. (The effect of splitting code is to create 'end-case' values for some cases, as the simulator counts all cycles needed to complete the last operations. In a non-split case this last operation may be overlapped with operations from a subsequent piece of code. The question then becomes how to compute the aggregate total of clock cycles in these 'end-case' situations.) In any event the composite timings must be computed by hand, with the attendant risk of manual error. The simulator should be modified to accept a much larger sequence of instructions, both by eliminating the expansion of input cards that now takes place when an \(R\) card is encountered and by enlarging the instruction buffer itself. Using the CYBER 175 revealed that the computer time required for the high-level (LVL1) simulator to process large programs is quite within reason (perhaps 1 to 2 minutes for the entire implicit code). In addition, the FPRATE, FLOPS, and CLKPD values must be output in some machine-readable form so that further extension or computation can be performed with a FORTRAN or BASIC program to produce final outputs for analysis.
2. Until the full data for a given code had been aggregated and analyzed, weak points in the hardware complex, or the coding or compiling scheme, could not be evaluated properly. The result of this was that after laborious simulation, aggregation, and projection, an area of the software/hardware was found to be eligible for more 'tuning'. Once the code was modified the same process was repeated again, and again, and again. In addition to the LVL1 runs, segments had to be submitted to the LVL2 simulator to make sure that the memory conflicts and scalar interactions of the detailed simulator produced consistent performance figures at both levels of simulation. Computer time did not prove to be the limiting factor in this process; instead the resources able to perform the analysis and system design modification became the scarce commodity on this program.
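The aggregation problem described in the first item can be illustrated with a short sketch. It combines per-piece FLOPS and CLKPD values into a composite rate and brackets the 'end-case' uncertainty: the lower bound assumes no overlap between pieces, and the upper bound assumes the trailing cycles of every piece except the last are fully hidden by the next piece. The `Piece` record, the `tail` field, the sample values, and the 16 ns clock are all hypothetical; this is not the procedure used for the report.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    name: str
    flops: int     # FLOPS reported by the simulator for this piece
    clkpd: int     # clock periods reported for this piece
    tail: int = 0  # hypothetical: end-case cycles that could overlap the next piece


def composite_rates(pieces, clock_ns):
    """Return (lower, upper) composite rates in gigaflops."""
    total_flops = sum(p.flops for p in pieces)
    # No overlap at all: simply add the per-piece cycle counts.
    worst_cycles = sum(p.clkpd for p in pieces)
    # Full overlap: every tail except the last piece's is hidden.
    best_cycles = worst_cycles - sum(p.tail for p in pieces[:-1])
    return (total_flops / (worst_cycles * clock_ns),
            total_flops / (best_cycles * clock_ns))


# Hypothetical sample values, for illustration only.
pieces = [Piece("piece_a", 1000, 100, tail=10),
          Piece("piece_b", 2000, 150, tail=20)]
low, high = composite_rates(pieces, clock_ns=16.0)
print(f"composite rate between {low:.3f} and {high:.3f} gigaflops")
```

Writing the per-piece values to a machine-readable file, as the text recommends, would let a script of this kind replace the hand computation.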

If time had permitted, the range of problem sizes should have been expanded to encompass the key areas of the performance curve (the area of the knee) so that the optimal problem sizing could be determined. It was felt that establishing the range
was a more important objective than proving the performance for a given problem statement. (Hence no attempt was made to make complete runs for the \(100 \times 200 \times 50\) grid sizes.) The symmetrical grids of \(30 \times 30 \times 30\) and \(100 \times 100 \times 100\) were used as end-points because they represented two worst-case situations -- one with extremely short vector lengths in all dimensions, the other with maximum data storage and mapping operations in all directions. In the CDC FMP, if all problems could be restricted to forms like \(100 \times 200 \times 50\), the flow mesh would be stored and processed differently, with worst-case map operations being performed along the short (50 element) axis only.

For future consideration, any data dependent features of metrics should be identified and some method of parameterization of the simulation and performance analysis developed. For example, in the CHARAC routine extrapolation, a simple-minded scheme for determining how many operations to CREDIT to the simulation was introduced.

Certainly a better set of criteria could be developed for later simulation runs. Similar parameters could be provided for routines such as MUTUR in the implicit code.

Some ambiguity exists in the manner in which FLOPS are counted. Some analysts will credit
\(A=-1\)
\(A=A B S(B)\)
as one floating point operation each; this has not been done in this report. Other vendors claim that
\(B=1 / C\)
is one floating point operation while
\[
B=A / C
\]
is counted as two operations. More complicated to analyze is the presence of a hardware function such as SQRT. How many floating point operations should be credited to this operation? Some analysts claim one, others claim 5, and some even claim 13. In the codes simulated for this report, the SQRT approximation took 11 vector operations (where the divide is counted as one operation). It can thus be argued that SQRT should be counted as 11 operations if a hardware SQRT were to be implemented. A standard set of criteria should be developed for this aspect of performance analysis for any future studies.
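The effect of the crediting convention on a reported total can be seen in a small sketch. The SQRT credits of 1, 5, 11, and 13 are the values mentioned above; the operation mix is hypothetical and chosen only for illustration, as are the function and variable names.

```python
def credited_flops(op_counts, sqrt_credit, divide_credit=1):
    """Total FLOPS credited to a code under one counting convention."""
    credits = {"add": 1, "mul": 1, "div": divide_credit, "sqrt": sqrt_credit}
    return sum(count * credits[op] for op, count in op_counts.items())


# Hypothetical operation mix for some kernel (not taken from the report).
mix = {"add": 400, "mul": 500, "div": 50, "sqrt": 50}

for sqrt_credit in (1, 5, 11, 13):
    total = credited_flops(mix, sqrt_credit)
    print(f"SQRT credited as {sqrt_credit:2d} operations: {total} FLOPS")
```

For this mix the credited total ranges from 1000 to 1600 FLOPS for identical work in identical time, so the quoted rate can differ by 60 percent on the convention alone.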

If a variety of simulators is to be used to evaluate a variety of candidate architectures, it would be desirable for NASA-Ames to develop a simple 'validation' routine which could be used to provide a measure of total FLOPS, FPRATE, and CLKPD (clock periods) for a variety of functions whose totals could be computed reliably and canonically. This would provide a form
of 'performance diagnostic' to verify certain characteristics of the simulation and extrapolation system being used. It is only after confidence can be established in the modeling system that the results of analyses such as these can be viewed realistically.

\subsection*{5.6 BOTTOM LINE}

A few general comments should be made about the results just presented.
1. The implicit and explicit codes run in about the same amount of time. For the 64-bit version, the implicit code would require 14.4 minutes for 256 time steps while the explicit version requires 15.9 minutes for 256 time steps.
2. The explicit code runs at substantially less than the one-gigaflop rate in the form simulated for this study. It was discovered that if a hardware square root were employed, the explicit code FPRATE could be improved to 0.93 gigaflop, as long as 11 FLOPS could be credited to that operation. If the explicit code were to be restructured to process slabs similar to those in the implicit code, yielding minimal vector lengths (excluding CHARAC) of 600, the overall rate for the explicit code could be elevated to 1.004 gigaflops (even without the hardware square root). This is due to the fact that the LX, LY, LZ, and TRIDIA vector lengths of 100 are below the critical point in the FMP performance curve, while lengths of 600 are at the 'knee' of that curve.
3. A wide range of problem sizes can be accommodated by the FMP with performance rates at or near the one-gigaflop threshold, for the implicit code.
4. The minimal rate for any of the 'small problems', regardless of whether the 'aero' codes or weather codes are being dealt with, is greater than 600 megaflops. This is for mesh sizes smaller than those expected to be employed on the actual FMP for production work.
5. The use of 32-bit forms for the codes can yield substantial benefits, not only a factor of almost 2 in performance, but also the ability to store larger problems in the available memory.
6. With some effort, even essentially 'non-vector' code forms such as MUTUR, CHARAC, and LINKHO can be structured to provide reasonably effective 'parallel' operations. The 'physics' solutions in this arena have not been attacked as yet, however.
7. The effective use of the hardware system involves a process of 'rethinking', 'restructuring', and code 'tuning' to achieve optimal results. The burden of achieving maximum performance with the current generation of technology must be shared by the software and hardware developers alike.
8. A reliable, approachable, valid, and credible computerized simulation system is essential to the successful evaluation, as well as implementation, of candidate architecture for the NASF.

\subsection*{6.0 SYSTEM DESIGN}

One of the few unchanging proposals by Control Data for the NASF is the overall system design. It is based on an interconnection scheme called the Loosely Coupled Network (LCN) which was described in detail in reference 2. A more recent discussion of the LCN and the hardware and software support using the Programmable Device Controller (PDC) may be found in Division 4, Volume II of this report. At the outset it was realized that the NASF would realistically involve an amalgam of dissimilar computing equipment, including equipment alien to Control Data Corporation. The fundamental underpinning of the LCN is its ability to interconnect a wide variety of equipment spread over a vast geography. It is this property that makes the LCN immediately attractive for the NASF. Figures 20 and 21, shown in a previous section, illustrate the overall system organization with the FMP represented as a single functional entity. The front-end processors, or Support Processing System (SPS), are attached to the network trunk and thus share a number of the global resources (such as the 819 disks) with the FMP. A certain number of peripherals are attached directly to each SPS as a locally managed resource. The LCN makes use of one or more serial data trunks which, using existing products, can transfer data at peak rates of 50 megabits per second.

The serial trunk can accommodate up to 32 attachments, so that all entities on the trunk become members of a "party line" and can thus communicate with complete flexibility, although trunk contention may reduce effective bandwidth. To address this problem the LCN has a unique contention resolution system which is discussed in Appendix A, Division 4, Volume II. Aside from the archival storage and graphics subsystems which are attached, the key system resource is the phalanx of high performance disks which are attached to the trunk and serve as the medium for staging jobs to/from the FMP from/to the SPS.

Specifications for the systems components in this design can be found in Volume II. The choice of SPS is highly dependent upon the amount of activity that is anticipated from interactive operation and upon the decision as to where the functions of mesh generation and compiling should be performed. For this report both functions are assumed to be done on the SPS. The sizing of the SPS at this point seems to require machines of the CYBER 175 class. Using standard software components the configuration of two such machines sharing an ECS memory for coordination and residence of system wide tables is called for. Perhaps later generations of hardware may be configured differently but, for purposes of costing and sizing the installation, it is expected that this level of SPS is necessary.

As there is much redundancy planned for the FMP, so too is the system designed around redundant trunks, processors, and disks so that it can survive one or more failures without losing function, or in most cases, without significant degradations in performance. The trunk capacity has deliberately been designed for excess performance. The major reason is the expectation that future peripheral and SPS technology will provide hardware capable of taxing the system.

Job flow has been discussed previously; however, the relationship of the redundant components was not considered. It is imperative that all components in such a complex system continually undergo strenuous exercise to make sure they are still viable. One important way to accomplish this is to rotate each redundant component into operational use on a regular basis or, better still, to have redundant components share the workload on a continuous basis. This latter system is employed in the Control Data NASF design. There are no "stand-by" components; instead, all elements of the system work on the problem at hand, with the option to shut down or be shut down and have the load automatically assumed by a partner. Thus all trunks will carry data over the execution of a job. When one trunk encounters a failure it goes off-line; without changing status tables, data transfers continue using the remaining trunks. The system can withstand the loss of an entire trunk and still maintain its throughput. In the case of an SPS going out of action, some loss in throughput may be sensed, but with proper sizing the loss of ability will be nearer \(10 \%\) than the apparent \(50 \%\).

The proposed configuration is certainly open to many alternative strategies, with the provision that everything must adapt to, and be harmonious with, the LCN. Several smaller processors may be indicated, rather than a pair of SPS processors. It is important to view the blocks in the diagram more as functions than as actual hardware components. For example, the FMP is provided with a CYBER 18 or similar class computer whose existence is dedicated to serving as the Maintenance Control Unit (MCU) for the FMP. If the ultimate NASF consisted only of an FMP provided by Control Data, with all other equipment being alien, then the CYBER 18 would be considered integral to the FMP design. However, the fact has been stressed that all maintenance functions are communicated to the FMP via standard LCN messages. Thus software that can be interfaced to such messages can reside anywhere in the system. In fact, even with a CYBER 18 stand-alone MCU present, certain functions such as deadstart and recovery must also be provided on the SPS as a backup.

In a similar manner, the function of FMP manager can reside on the SPS or be distributed among a number of processors. As an example, the FMP manager function, or critical portions thereof, like the job start and stop and memory allocator, might be provided in backup software form on the CYBER 18 MCU. Then in the event of a catastrophic failure of the SPS, or perhaps when the SPS is undergoing some special off-line tests, jobs could still be pushed through the FMP.

\subsection*{6.1 SYSTEM TRAFFIC FLOW}

To determine the efficacy of the system design in concert with the FMP, a simulation system was created to assist in the analysis of job flow through the entire ensemble. Again, as for the FMP, it was found to be advantageous to create two levels of simulation: one highly detailed in areas of design concern, the other intended to demonstrate the overall performance of the system. The simulators have been used to verify the sizing of the system components as well as the probable flow of control message and data traffic throughout the system due to workloads projected by Ames study personnel.

Divisions 1 and 2 of Volume IV contain descriptions of these simulators and reference information in the form of a "user's manual". As with the FMP, the more detailed model was used to validate certain assumptions used in the higher level model. Among the design conclusions drawn from this model were early decisions regarding the size of data buffers needed at each PDC node in the system (1536 words or three disk sectors), and the effective transfer rates of the major peripherals, with details such as latency, message turnaround, and trunk contention taken into account. Effective transfer rates were found to be 7.67 megabits per second for the 819 class disk to the SPS, about 1.94 megabits per second, at worst, for the graphics subsystem, and 17.4 megabits per second per disk controller for streaming to the FMP.
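To put these effective rates in perspective, the following sketch compares each one with the 50 megabit per second peak trunk rate quoted in section 6.0 and estimates the time needed to move a data set at each rate. The data set size is a hypothetical value chosen for illustration, and the function names are not taken from the simulators.

```python
PEAK_TRUNK_MBPS = 50.0  # peak serial trunk rate quoted in section 6.0

# Effective rates from the detailed model, in megabits per second.
EFFECTIVE_MBPS = {
    "819 disk to SPS": 7.67,
    "graphics subsystem (worst case)": 1.94,
    "disk controller to FMP (streaming)": 17.4,
}


def transfer_seconds(megabits, rate_mbps):
    """Time to move a data set of the given size at a sustained rate."""
    return megabits / rate_mbps


def trunk_fraction(rate_mbps, peak_mbps=PEAK_TRUNK_MBPS):
    """Fraction of the peak trunk rate consumed by one such stream."""
    return rate_mbps / peak_mbps


DATASET_MEGABITS = 1000.0  # hypothetical data set size

for path, rate in EFFECTIVE_MBPS.items():
    seconds = transfer_seconds(DATASET_MEGABITS, rate)
    percent = 100.0 * trunk_fraction(rate)
    print(f"{path}: {seconds:.1f} s, {percent:.1f}% of peak trunk rate")
```

Even the fastest path, streaming to the FMP, uses only about a third of one trunk's peak rate per disk controller.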

These numbers were used to construct the higher level model which was then subjected to the workload analysis provided by NASA-Ames (NASF Usage Model, Version 79.001). A discussion of the results of this simulation run will be found in Division 11 of this report. The bottom line of this analysis is that while the network trunk is underutilized, the SPS and the FMP are involved in a tight race to see which one will be the bottleneck in the system. This is a good sign because it indicates that the key resources are in balance in the system. A second, and obvious, observation from the simulation results is that a full 20 hours of actual FMP work cannot be done on the FMP during a continuous 20-hour interval, due to the variety of activities needed to start such a session into operation and to wind it down.

Another obvious observation is that the tradeoffs between turnaround and throughput are clearly evident in the workload simulations. As long as a queue can be maintained at the FMP, the FMP can be kept busy and throughput maximized. The very existence of such a queue precludes the FMP being able to produce fast turnaround in any given random case. In fact, with all the SPS and trunk activity requirements in this system, the overhead imposed on turnaround amounts to about \(50 \%\) of the job execution or more. Thus a 20-minute, large scale production run may require 10-15 minutes of additional processing, measured from startup in the system to emergence of the results at a user output device.

As noted in the writeup, the turnaround conclusions were reached without employing any form of priority algorithm, although the simulator is capable of handling priorities. It is expected that as the behavior of the system becomes understood, RADL and Ames will employ some of the other options and different workload schemes to test the system in more realistic and comprehensive ways.

\subsection*{6.2 SYSTEM SOFTWARE}

The resource commitments involved in even moderate software development projects preclude the intrusion of elegance or a NIH (Not Invented Here) attitude in the NASF system. This is particularly true in the area of system software, which includes all software elements not residing on the FMP except the FMP compiler, loader, and FMP manager (which will reside on the SPS).

A hardheaded approach must be pursued in the specification and development of the system software because of the magnitude of such efforts and the far reaching impacts on the NASF of software maintenance, compatibility, training, and effectivity. First, the identification and specification of any system software function that is not already in existence and proven in at least one operational environment must be avoided like "the plague". Given the scope of FMP oriented systems and applications software that must be developed in the short period of this development, the project can ill afford to experiment with "just one more good idea". Therefore it is strongly urged that:
1) functional requirements for the system software include no more, nor less, than those functions already demonstrated and available in standard computer software offerings;
2) the system software not be "based" on an existing system, but instead be limited to that system and its standard derivatives as provided by the system's manufacturer;
3) in particular, the CDC NOS (Network Operating System) which manages and controls a wide variety of large scale and super computers (the CYBER 170 and CYBER 200 families) provides a good model to be used to specify functional and performance criteria for the NASF.

The choice of NOS comes about because of the considerable number of man-years that have been invested in its development and maturation as a large scale operating system. Its high throughput in interactive mode makes it an excellent choice for SPS functions which must manage and direct the myriad of terminals which will ultimately be attached to the NASF. The NOS system will provide a standard software "package" which will include PDC and LCN support software for interconnection to a multiplicity of attached processors.

If the NASF is strictly limited to using an extant system, then many applications can be transferred directly from CYBER family systems to the NASF for execution. Users thus need to become familiar with only one set of functions and command language constructs. Debugging of NASF software (with the exception of actual FMP debugging) can take place on any system supporting NOS. The FMP manager then becomes just another job to be scheduled for execution under the NOS system, and no special interfaces need be written for the FMP and thus embedded in the basic operating system. Instead the software provides a standard communications methodology for the FMP manager and the FMP.

The specification of systems software functionality can then be drawn from the current Control Data NOS documentation with the following additions that are under development for release well in advance of the NASF availability:
1. Loosely Coupled Network system, permitting connection of a multiplicity of devices and processors.
2. Common data base manager which controls a common pool of shared mass storage devices on the network trunk.
3. A graphics support system similar in function to the standard SCOPE offering called GODAS.
4. An archival storage management system based on the Control Data MSS (tape library) concept.

Because of its conceptual position in the NASF structure, the heart of the system will be the SPS. Therefore, if more than one vendor is to be identified to supply hardware for the system, it is urged that the SPS vendor be held responsible for integration of all alien hardware, and adapting the alien hardware "drivers" (or local operating systems) into the system's software structure provided by the SPS.

\subsection*{6.3 SYSTEM AVAILABILITY}

Division 6 of this report provides an update of the analysis of Reliability, Availability, and Maintainability (RAM) factors that are expected to influence the operational use of the FMP. Some of this discussion has had further elaboration in the hardware design portion of this division of the report. There are three probabilities that are of interest to the consumer of such a large system.
1. The probability of a single failure occurring anywhere in the system during operational use time.
2. The probability of a failure occurring which causes an interruption in system service.
3. The probability that an undetected error will occur which will affect result data in a significant way.

As will be seen from the supplementary discussions on this subject, the first item is directly related to the total parts and interconnection count of the hardware. Since system failure can occur in software, some factor must be generated for that aspect which can be combined with the hardware factors. If a standard operating system such as NOS is used for extrapolation of the software error rate, it can be seen that from 100-200 hours of production time elapse between software errors, compared with the \(9-20\) hours for the hardware. Thus the hardware failure rate dominates in this arena.
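One way to combine the software and hardware factors is sketched below. It assumes independent failure sources with constant failure rates, so that the rates (the reciprocals of the mean times between failures) simply add; this is a standard simplification and not necessarily the method used in the RAM study. The function name is illustrative.

```python
def combined_mtbf(*mtbf_hours):
    """Mean time between failures of a system that fails when any source fails.

    Assumes independent sources with constant failure rates, so rates add.
    """
    return 1.0 / sum(1.0 / hours for hours in mtbf_hours)


# Ranges quoted in the text: hardware 9-20 hours, software 100-200 hours.
pessimistic = combined_mtbf(9.0, 100.0)
optimistic = combined_mtbf(20.0, 200.0)

print(f"combined MTBF: {pessimistic:.1f} to {optimistic:.1f} hours")
```

Under these assumptions the combined figure stays within about ten percent of the hardware figure alone, which is what is meant by the hardware failure rate dominating.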

The second probability is related directly to the first probability but includes the effects of redundant hardware components and other error controls such as SECDED. From the RAM study it can be seen that without SECDED the FMP is totally infeasible, for the MTBF is far too short to ensure a high system availability. Software and firmware error recovery and restart are essential also to prevent an unacceptable period of machine interruption. A system interruption is one that takes all the resources out of action for any job submitted. It is permissible then to abort a particular job because of the loss of a single disk, or an unrecoverable error in data trunks, and immediately start another job without calling the event a "system interruption".

This second probability is also highly dependent on the maintenance strategies employed, also discussed in the hardware design section of this report.

The third probability is the most difficult to measure, and is dependent on how much checking can be done in the total system. In the FMP, SECDED and the variably redundant Vector Units are attempts at providing checking in crucial areas. It cannot be expected that the pair of SPS machines could be checking each other, since they will both be kept quite busy managing the system. It is, however, worthy of consideration to postulate a more powerful pair of SPS processors which would each be able to perform redundant checking of the partner machine on a variable load basis (much as in the FMP Vector Unit). This has not been pursued further because of the implication of insidious change in the kernels of the otherwise standard software.

\subsection*{7.0 FACILITIES STUDY}

In previous reports (refs. 1, 2) the various aspects of the NASF design and construction were addressed in some detail. For this report an update of the schedule and cost information is provided. The scheduling methodology is PERT based and contains certain data relating to lead times of proprietary Control Data technologies. For this reason the PERT schedule and cost update information will be supplied under separate cover (as Volume V) for limited distribution as an adjunct to the body of this report.

The basic principles of the schedule still hold true from the previous studies; however, some adjustments to times have been made as continued investigation mandated.
a) A 30-month detailed design and simulation effort is needed before final commitments can be made on costs and reliability.
b) Approximately 19 months after design completion, the NASF should be available for limited production work. Four months later the entire NASF should be in full-time production.
c) Phased installation of NASF components is recommended and projected. A first installment of one SPS, a set of 819 disks, at least one archival storage, one LCN trunk, and one graphics support system should be installed at Ames for software development during the FMP design phase.
d) The FMP installation would also be phased, with one-half the Intermediate Memory and no Backing Store being in the first increment. The Backing Store would be added in two increments with the final increment of the Intermediate Memory being scheduled as needed. This permits spreading out costs and resources over several years while still having production capacity on-line as early as possible.
e) Software development is predicated on the strategies outlined previously, that is, the only developments being the compiler, loader, and FMP manager, all of which can be developed on the SPS. This effort must start at the same time as the FMP design.

The implication of these latter points is that the parallel procurement strategy being contemplated by NASA, which proceeds to the point where detailed FMP designs are pursued and a single one chosen for construction, must consider the lead time for software development; this must begin concurrently with the design phase. This means that not only must machine design be conducted and evaluated during the parallel phase, but software development will be under way as well. Who pays for this effort, with the attendant risk that the resulting work will be discarded, is a subject that needs to be fully explored before any further procurement action is undertaken.

\subsection*{7.1 RISK ANALYSIS}

As evidenced by the fairly conservative design approaches taken on the FMP during this phase and the restriction of software development for the NASF, Control Data has labored to reduce the risks of this project to a level acceptable to rational customer and manufacturer management. The only technological risk presently permitted has been in the Backing Store area where a component is not yet available, but for which two backup schemes are viable -- use of a less aggressive memory part that is now coming into production, with a consequent reduction in memory capacity or reliability; delay of delivery of the Backing Store until a mature part is available, since the Backing Store can be absent in the initial configuration without severe penalties to performance.

Production of the CYBER 200 family using the memory and logic technologies planned for the FMP has now given real data points for production costs and lead times for the central processor. Assuming that the Control Data schedule and cost objectives become the project goals, then it can be stated unequivocally that the risks would be below those for a new machine product line development. This is primarily due to the fact that \(90 \%\) of the technical effort will be based on existing systems, software and manufacturing techniques. The compiler, loader, and FMP manager development are the primary software risks, and the costs and schedules for these can be conservatively rated at par with risks undertaken on any Control Data project.

Note that this risk assessment is substantially more "upbeat" than past analyses which have reflected some degree of overly conservative "gloom and doom". The major reason is that the number one emphasis of this project phase has been to create an FMP and NASF that can be built with great confidence of meeting the cost, performance, and schedule objectives.

\subsection*{7.2 LOGISTICS SUPPORT}

The operation of a large complex of equipment that necessarily characterizes the NASF engages many disciplines and involves many cost factors that can easily be missed. Assistance was obtained for this report from the CYBERNET DATA CENTER specialists who have experience with large systems. In particular, advice was sought from the STAR data center managers who provide "computational engine" services on STAR via a set of CYBER 170 front-ends, much like the FMP will be made available to the NASF user. The resulting paper (Division 9) describes the major considerations and probable resources required for long term operation of an NASF-scale system.

The figures given in this report and in the report on maintenance strategy should be considered conservative in light of the fact that the availability and maintainability of the NASF are expected to be much greater than those of existing systems. Factors included for systems upkeep, therefore, may well be reduced in the final analysis to \(50 \%\) or less of the projected figures. It would be well to expect that at the outset, and for the first 24 months of NASF operation, the conservative approach to systems operation and support should be adhered to until enough experience is accumulated to guide the cost reductions which are sure to lurk in the projections offered here.

\subsection*{7.3 PHYSICAL REQUIREMENTS}

As part of the initial study (ref. 1) Control Data was asked to determine, to as great a level of detail as possible, the physical requirements for the actual installation site. The result of that effort was the prototype planning of an actual site containing the hardware projected for the NASF as considered in this study. Most of the equipment that absorbed the bulk of that installation has remained unchanged in quantity and configuration since that initial effort. The FMP has undergone the greatest change due to a reorganization of the memory systems, reduction of pipeline hardware and, quite significantly, a change in the packaging system for the FMP Main Memory and logic. Figures 3.1-2 and 3.1-3 of the functional specification (Division 1, Volume II) show the revised floor plan of the resulting FMP.

The overall power and cooling requirements are quite similar to those reported previously (ref. 1), and the floor plan for the proposed installation has been left unmodified, since the revised NASF will fit into essentially the same area originally reserved for it. The FMP power requirements differ somewhat from those proposed in the early study but the aggregate power for the installation is affected very little. Division 10 provides some detail and a summary of space, power, and cooling requirements for the proposed NASF.

\section*{CLOCK RATE AS A MEASURE OF COMPUTER PERFORMANCE}

The rapid tick-tick-tick of a wrist watch is certainly different from the slow TOCK --- TOCK of a grandfather clock. That is, the two different timepieces have different clock rates, yet when properly tuned to their respective rates, they both have the same performance: keeping accurate time, one revolution of the minute hand per hour. The performance of a timepiece cannot be judged, however, on the basis of its rate alone. It is necessary to have some other information, such as what is inside the timepiece (to know what is accomplished per tick or per TOCK), or what its output is (that is, whether its timekeeping is accurate). Without this additional information, a rapid rate cannot be judged good or bad since it may mean simply that the timepiece is not properly adjusted and therefore is not keeping accurate time.

In order to compare one computer with another, an oft-used approach is to compare their clock rates. The assumption is that if one computer has a clock rate, say, twice as fast as another, then it will produce results at twice the rate of the comparison machine. In fact, however, the performance of a computer should be measured as the number of basic functions accomplished per unit time. Deriving this solely by counting clock periods per unit time assumes that the number of basic functions per clock period is constant. It does not take much thought to see that this quantity is not constant, but rather a function of machine architecture, the inherent power of the chosen logic family, and the specific logic design rules used.
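Written out, the relationship described in the preceding paragraph is a product of two factors. The symbols below are introduced here purely for illustration and do not appear in the original study:

\[
R = F \times C, \qquad C = \frac{1}{\tau}
\]

where \(R\) is the result rate (basic functions per unit time), \(F\) is the number of basic functions accomplished per clock period, \(C\) is the clock rate, and \(\tau\) is the clock period. Comparing clock rates alone compares only \(C\) and tacitly assumes that \(F\) is the same for both machines.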

The following explanations and examples point out some differences in machines not reflected in clock rates.
1. Differences in Architecture
a) Size of machine: number of gates. No one expects an 8080 microprocessor to match the power of, say, a middle-level 370 machine, even if they have comparable clock rates (in fact, they do). (The number of gates in a machine is often very difficult to quantify, as will be discussed later.)
b) Concurrency: how much of the machine is in use, on average, during a small unit of time. For example, a machine with a 10-ns clock but only \(30 \%\) of the machine in use at any one time is very likely slower in overall throughput than a machine with a 20-ns clock but with \(70 \%\) of its gates in use at a time.
2. Differences in Logic Family

It may seem obvious that the clock rate of a computer should be set by the number of gates that must be traversed to accomplish some basic function. This is not true, and one of the reasons is the difficulty in determining what to call a gate. For example, most variations of ECL allow both the true and complement results to be generated by most gate elements at no cost in size or speed. This contrasts with other logic families, e.g., TTL, that require another gate function to produce the logical complement of a function. This extra gate generally appears in series with some signal and thus slows down the machine. Even if the gate can be put in a parallel structure to reduce the total delay, the machine may still be slower because the larger number of gates may increase the physical size of the machine.

Another feature of most ECL families is the ability to tie the outputs of gates together. This accomplishes an 'AND' or 'OR' function at virtually no cost in logic delay and no gate cost because the connection is just wire.

The point here is that, even with equal clock rates and with similar architectures and numbers of gates, one computer may have higher performance than another because it is able to do more in a clock cycle, no matter what the unit time, simply because of differences in the logical power of differing logic families. Many other attributes not mentioned here also affect the logical power available to a designer: fan-in and fan-out limits, packaging, etc.
3. Design Ground Rules

In the design of high speed computers, a conflict exists between fastest speed and fastest throughput. Fastest speed means the minimum time for completion of a function, a multiply for example. Fastest throughput means the most results for a function per unit time. These two requirements are not the same! To satisfy the first design requirement, a designer wants the clock cycle to be relatively long. This reduces the overhead caused by the addition of gates that do not contribute to computation but are required for shorter clock periods. The throughput of a set of logic can be increased by reducing the number of levels of logic allowed between clocked latches. For example, assume that to perform a particular function, such as an addition, 9 levels of logic are required. A designer can choose a clock period of 16 ns, say, and can perform the addition in one clock cycle. Another designer, using the same logic, may choose to limit the number of logic levels between clocked ranks to 7 in order to have a faster clock (say 13 ns). The adder design then requires a clocked latch rank somewhere within the adder. This results in two things:
a) Addition now takes two cycles: 26 ns.
b) The number of gates in the second design is greater, resulting in greater logic cost and design time.

Of course the second adder design has higher throughput, i.e., it can accept a new input every 13 ns instead of every 16 ns as in the first design.
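The trade-off between the two adder designs can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name and the assumption that each design accepts one new input every clock period are introduced here, while the 16 ns and 13 ns figures come from the text.

```python
def pipeline_figures(clock_ns, cycles_per_result):
    """Return (latency_ns, results_per_microsecond) for a unit that is assumed
    to accept one new input every clock period."""
    latency_ns = clock_ns * cycles_per_result   # time to finish one addition
    results_per_us = 1000.0 / clock_ns          # additions completed per microsecond
    return latency_ns, results_per_us


# First design: 9 logic levels in a single 16 ns clock period.
single_cycle = pipeline_figures(clock_ns=16, cycles_per_result=1)

# Second design: at most 7 levels between latches, 13 ns clock, two cycles per add.
two_stage = pipeline_figures(clock_ns=13, cycles_per_result=2)

if __name__ == "__main__":
    print("single-cycle adder:", single_cycle)
    print("two-stage adder:   ", two_stage)
```

Under these assumptions the second design has the longer latency (26 ns against 16 ns) but the higher result rate, which is the conflict between speed and throughput described above.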

The set of design ground rules chosen depends on many things; among them are: cost, size, power, speed requirements, etc.

As can be seen from the above, the raw comparison of clock rates is a vast oversimplification of the question 'how fast is it?' A whole array of other questions must be asked at the same time.
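The concurrency comparison in item 1(b) can be worked through numerically. This is a rough sketch under stated assumptions: both machines are taken to have the same number of gates, every active gate is taken to contribute equally, and the function name is hypothetical.

```python
def relative_throughput(clock_ns, fraction_in_use):
    """Useful work per nanosecond, measured in 'whole machines': the fraction
    of the gates doing work in each clock period divided by that period."""
    return fraction_in_use / clock_ns


# Machine with the faster clock but low concurrency (10 ns, 30% in use).
fast_clock = relative_throughput(clock_ns=10, fraction_in_use=0.30)

# Machine with the slower clock but high concurrency (20 ns, 70% in use).
slow_clock = relative_throughput(clock_ns=20, fraction_in_use=0.70)

if __name__ == "__main__":
    print("10 ns clock, 30% in use:", fast_clock)
    print("20 ns clock, 70% in use:", slow_clock)
    print("ratio:", slow_clock / fast_clock)
```

With these simplifications the machine with the slower clock delivers about one sixth more work per unit time, despite having half the clock rate.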
```

      SUBROUTINE STEP
      COMMON/BASE/NMAX,JMAX,KMAX,LMAX,JM,KM,LM,DT,GAMMA,GAMI,SMU,FSMACH,
     1 DX1,DY1,DZ1,ND,ND2,FV(5),FD(5),HD,ALP,GD,OMEGA,HDX,HDY,HDZ

      COMMON/READ/IREAD,IWRIT,NGRI
      COMMON/VIS/RE,PR,RMUE,RK
      COMMON/VARS/Q(24,30,6,30)
      COMMON/VAR0/S(24,30,5,30)
      COMMON/VAR1/X(24,30,30),Y(24,30,30),Z(24,30,30)
      COMMON/VAR3/P(120,30),XX(4),YY(4),ZZ(4)
      COMMON/COUNT/NC,NC1
        COMMON/BTRID/A (5,5),& (5,5),C(5,5),O(5,5),F
        LEVEL }2\mathrm{ Gisixiriz
        OYNAMIC A,E,C,D,S,SO,XX(4),YY(4),ZZ(4),XKL ,YKL
        DYNAMIC F,F1(5),F2,$1(5)
        DYNAMIC ZKL, YJN,ZJL,XKJ,YYKJ,ZKJ
        OYNAMIC XJL,,YJL,ZJL, XKJ.,YKJ,ZKJ
        OYNAMIC OTI,QTZ,QTJ.OTA,OTS.TV
        OYNAMIC RJ&RR,U,V,W,UU,UT,CI,C2,C3,C4,C5,C6,C7
        OYNAMIC RJ,RR,U,V,W,UU,UT,CL,CZ,C3,CA,CS,C6
        OYNAMIC RMN,RF,XK,YK,ZKK,XL,YL,ZL,XJ,YJ,ZJ
        OYNAMIC O1,02,Q3,04,Q5
        OYNAMIC XI,Y1,Z1
        INTEGER QINPOS,QOUTPOS,XYZPOS,FDESC(S),SDESC(5)
    c
CALL BC
CALL RHS
CALL SMOOTH
C     COMPUTE L2 RESIDUAL
C     THE CONDITIONAL BRANCH CODE
C***  IF(NC.EQ.1) GO TO 5
C***  5 CONTINUE
C     CAN BE REPLACED BY
      IF(MOD(NC,10).NE.0) GO TO 6
RESIO =0.0
KMH = KM
LMH = LM
LMH = LM = 1,5
DO 10N = 1,5
OO10K=1,KHH
00 10 J=2.JM
10 RESIO = RESID*S(K,W,N,J) = W2
RESID = RESIO/((JM-1)*(KMH-1)*(LMH-1))
RESID = SQRT(RESIO)/IOT*,00005)
C can ee vectorizeo automatically,however the throughput of this local

```

```

            WRITEIO.100) NC;RESIO
  100 FORMAT(1H0,3HN= ,I5,3X,13HL2 RESIDUAL= ,F16.8)
    6 CONTINUE
    c
C PM = SMU
RM = SMU
C8 = 10*2.*RM
GAM2 = 2,GGAMMA

```



```

      D(3,2)=D(1,2)*V-D(1,3)*C5
      D(3,3)=C4+D(1,3)*GAM2*V
      D(3,4)=-D(1,3)*C7+D(1,4)*V
      D(3,5)=D(1,3)*GAMI
      D(4,1)=D(1,4)*C1-W*UU
      D(4,2)=D(1,2)*W-D(1,4)*C5
      D(4,3)=D(1,3)*W-D(1,4)*C6
      D(4,4)=C4+D(1,4)*GAM2*W
      D(4,5)=D(1,4)*GAMI
      D(5,1)=(-C2+2.*C1)*UU
      D(5,2)=D(1,2)*C3-C5*UU
      D(5,3)=D(1,3)*C3-C6*UU
      D(5,4)=D(1,4)*C3-C7*UU
      D(5,5)=D(1,1)+GAMMA*UU
C
C******END OF AMATRX
RMJ*RM/RJ\**** ב:JMAX-1)
RR=RHJ*RJ(***\&I!JMAX=2)
RF=RMJ*RJ(*,0,3!JMAX)
DO 23 NE1.5
DO 22 M=1.5
DEFINE (B(N;M); {1!KMAX-2.1ILSH,1!JMAX-2)
OEFINE (DIOD(N,M))
A(N,M)==01(*,*+11 JMAX-2)
C(N;H)=01(*,*,3\JMAX)
B {N,M}=0
A(N,N) =A (N,N)=RR
B(N:N) = CB
C(N;N) =C(N,N)-RF
C3 Fl{N]=SI{N}
C*****END OF FIGTRX
S MUST EE ZSRO ON B,C.
CALL 日TRI(2.JMJ
00 24 N=1:5
SI(N)=F1(N)
CONTINUE
C
1000 CONTINUE
C
C******FILTRY
'KA = 2
KB = KMAX-1
KSM KMAX-1
JSM \#\#00 J=2,JMAX,JSL
00 2000 J=3
O06 N={,S
OEFINE (FI(N)\&F{Z:LMAX-1,JIJ*JSM\&N, Z:KMAX-1))
C
RJ = RJT(Z!LMAX=1,UtJ*JSH, 2\&KMAX)
XJL=XJT
YJL= YJLT
YN= YNLT
QU =0dT
      Q1 = Q1T
      Q2 = Q2T
      Q3 = Q3T

```





```
      L32=B1(3,2)-L31*U12
      U23=(B1(2,3)-L21*U13)*L22
      L33=1./(B1(3,3)-U13*L31-U23*L32)
      U24=(B1(2,4)-L21*U14)*L22
      U25=(B1(2,5)-L21*U15)*L22
      L41=B1(4,1)
      L42=B1(4,2)-L41*U12
      L43=B1(4,3)-L41*U13-L42*U23
      U34=(B1(3,4)-L31*U14-L32*U24)*L33
      L44=1./(B1(4,4)-U14*L41-U24*L42-U34*L43)
      U35=(B1(3,5)-L31*U15-L32*U25)*L33
      L51=B1(5,1)
      L52=B1(5,2)-L51*U12
      L53=B1(5,3)-L51*U13-L52*U23
      L54=B1(5,4)-L51*U14-L52*U24-L53*U34
      U45=(B1(4,5)-L41*U15-L42*U25-L43*U35)*L44
      L55=1./(B1(5,5)-L51*U15-L52*U25-L53*U35-L54*U45)
C     COMPUTE LITTLE R S
      D1=L11*F1(1)
      D2=L22*(F1(2)-L21*D1)
      D3=L33*(F1(3)-L31*D1-L32*D2)
      D4=L44*(F1(4)-L41*D1-L42*D2-L43*D3)
      D5=L55*(F1(5)-L51*D1-L52*D2-L53*D3-L54*D4)
C     COMPUTE BIG R S
      F1(5)=D5
      F1(4)=D4-U45*D5
      F1(3)=D3-U34*F1(4)-U35*D5
      F1(2)=D2-U23*F1(3)-U24*F1(4)-U25*D5
      F1(1)=D1-U12*F1(2)-U13*F1(3)-U14*F1(4)-U15*D5
C     COMPUTE C PRIME FOR FIRST ROW
      DO 12 M=1,5
      D1=L11*C1(1,M)
      D2=L22*(C1(2,M)-L21*D1)
      D3=L33*(C1(3,M)-L31*D1-L32*D2)
      D4=L44*(C1(4,M)-L41*D1-L42*D2-L43*D3)
      D5=L55*(C1(5,M)-L51*D1-L52*D2-L53*D3-L54*D4)
      B1(5,M)=D5
      B1(4,M)=D4-U45*D5
      B1(3,M)=D3-U34*B1(4,M)-U35*D5
      B1(2,M)=D2-U23*B1(3,M)-U24*B1(4,M)-U25*D5
```
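The unrolled statements in the listing above follow the pattern of an LU factorization without pivoting in which the diagonal of L is stored as reciprocals (L11, L22, ...) and U has an implied unit diagonal, followed by a forward substitution ("little R's") and a back substitution ("big R's"). The Python sketch below expresses that same pattern with loops for a general small matrix. It is an illustration written for this rewrite, not a transcription of the original routine, and the function and variable names are assumptions.

```python
def crout_solve(matrix, rhs):
    """Solve matrix * x = rhs by LU factorization without pivoting.

    L holds the non-unit diagonal (kept as reciprocals, like L11..L55 in the
    listing) and U has an implied unit diagonal (like U12..U45).
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    inv_diag = [0.0] * n

    for j in range(n):
        # Column j of L, e.g. L32 = B(3,2) - L31*U12.
        for i in range(j, n):
            lower[i][j] = matrix[i][j] - sum(
                lower[i][k] * upper[k][j] for k in range(j))
        inv_diag[j] = 1.0 / lower[j][j]
        # Row j of U, e.g. U23 = (B(2,3) - L21*U13) * L22.
        for c in range(j + 1, n):
            upper[j][c] = (matrix[j][c] - sum(
                lower[j][k] * upper[k][c] for k in range(j))) * inv_diag[j]

    # Forward substitution: D2 = L22 * (F(2) - L21*D1), and so on.
    d = [0.0] * n
    for i in range(n):
        d[i] = inv_diag[i] * (rhs[i] - sum(lower[i][k] * d[k] for k in range(i)))

    # Back substitution: F(4) = D4 - U45*F(5), and so on.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = d[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))
    return x


if __name__ == "__main__":
    print(crout_solve([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]],
                      [5.0, 6.0, 5.0]))
```

Unrolling the loops for a fixed 5 x 5 block, as the listing does, removes the inner loop overhead and leaves each scalar statement free to be applied across a long vector of grid points.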

```

    COMPUTE B PRIME*BIGR
        00 13 I=IS.IE
        OO2 Nal*5
        OEFINE (FI(N),F(N)(*,*,I)),(F2(N),F(N)(*,*,IOI))
        DO Z M=1,5
        OEFINE (A1 (N,M),A(N,H) (*,*,I))
        DEFINE {OI(N,M},B(N,M) (*,A,I)),{BZ(N,M),B{N,M)(O,N,I-I))
        OEFINE (CI (N,M),C(N,M) (*,*,J))
        OO 14 N=1,5
    FFl(N)*Fl(N)-Al(N,1)*F2(I)-Al(N,Z)*F2(2)-Al(N,3)*F2(3)
        - -A! (N,4)&FZ(4)-AI(N,5)\otimesF2(5)
            COMPUTE & PRIME
            00 11 N=1,5
            OO 11 N=115
    11
H(N,M)=81(N,M)*A1(N,H)<82(1,M)-\&1(N,Z)*日ट(2,M)=A1(N,3)*

* B2(3;M)=Al(N;A)*82(4,M)=Al(N;5)*82(5;M)
INSERT LUDEC AGAIN
L11E1,/H(1,1)
L21=H(2.])
U12=H(1,2)*LII
L22a1./(H(2:2)-12210U!2)
L22m1./(H(2,2)-L21-U12) 005010
U13=H(1.3)*LII

```

```

|  | 005040 |
| :---: | :---: |
| U15玉H（1．5）＊L11 | 005050 |
| L31 $=$ H $(3,1)$ | 005060 |
| L32xH（3．2）－L31＊U12 | 005070 |
|  | 005080 |
|  | 005090 |
| U24x（H\｛2，\％\}-L2I*U14)*L22 | 005100 |
| U25：（H（2．5）－－21＊（15）¢22 | 005110 |
|  | 005120 |
| 442mH（4＊2）－641＊U12 | 005130 |
|  | 005140 |
|  | 005150 |
|  | 005160 |
|  | 005170 |
| L51： ¢ $_{\text {（5．1\} }}$ | 005180 |
| L52＊H $(5,2)-L 51 * \pm 12$ | 005190 |
|  | 005200 |
|  | 005210 |
| U45a（H（4，5）－ 441 \＃U15－642＊U25－643＊U35）＊ 444 | 005220 |
|  | 005230 |
| COMPUTE LITTLE RIS | 005240 |
| O1×L11＊F1（1） | 005250 |
|  | 005260 |
| 03＝L33＊（F1（3）－L3 4 （0）－L32＊02） | 005270 |
| 04EL44＊（F｜（4）－L41＊Ul－642＊02－L43＊03） | 005280 |
|  | 005290 |
| COMPUTE BIG RIS | 005300 |
| F1（5）$\times 05$ | 005310 |
| $F 1(4)=04 \pm$ U45＊05 | 005320 |
| F1（3）$=03-U 34$ F $1(4)-U 35 * 05$ | 005330 |
|  | 005340 |
| $F\{(1)=01412+F 1(2)-413 * F 1(3)-U 14 * F 1(4)=U 15005$ | 005350 |
| COMPUTE C PRIMES | 005360 |
| 0015 M 515 | 005370 |
|  | 005380 |
| 02\％622＊（Cl（2，H）－L21＊01） | 005390 |
| $03=633 *(C 1(3, M)-L 31 * D 1-L 32 * D 2)$ | 005400 |
|  | 005410 |
|  | 005420 |
| E）（S．M）$=05$ | 005430 |
| EI $(4, M)=04-1445405$ | 005440 |
| 日1 $(3, M)=03-434 * 81(4, M)-435005$ | 005450 |
|  | 005460 |
|  | 005470 |
| CONTINUE | 005480 |
| I＝IU | 005490 |
| $003 \mathrm{~N}=1.5$ | 005500 |
|  | 005510 |
| $003 \mathrm{M}=1,5$ | 005520 |
| OEFINE（AI（N，H），A（N，M）（＊，＊IU， | 005530 |
|  | 005540 |
| COMPUTE 8 PRIME\＃GIG R FOR LAST ROW | 005550 |
| 0017 Nay 1，5 | 005560 |
| $F 1(N)=F 1(N)=A 1(N+1) \bullet F C(1)-A l(N, 2) * F C(2)-41(N, 3) *$ | 005570 |
| $1 \quad F 2(3)=A 1(N, 4) * F 2(4)-A 1(N+5) * F 2(5)$ | 005580 |
| COMPUTE A PRIME | 005590 |
| $0018 \mathrm{Nm} 1,5$ | 005600 |
| DO 16 Mmi． 5 | 005610 |
| $H(N, M)=81(N, M)-A 1(N, 1) 8 日 2(1, M) \rightarrow 41(N, 2) * B 2(2, M 4=A 1(N, 3) *$ | 005620 |
|  | 005630 |
| INSERT LUDEC AGAIN | 005640 |
| L11＝1．／H（1）1） | 005650 |
| L21）（2．1） | 005660 |

```
U12\＃H（1，2）＊11 005670
2Z＝1．／（H（2，Z）－L21＊U12 05680

    U13표 (1.3) WL11

    05690

    U14 \(=\) 林(1.4) 0L11

    U15=H(1,5) L. II

    L \(31=\mathrm{H}(3,1)\)

    05700

    005710

    \(L 31=H(3,1)\)
\(L 32=4(3,2)-631 \cdot U 12\)





    U24 7 (H(2.4) -121 U14) * 22

    U25= (H(2,5) - L2! ज15) ーL

    \(4+1=H(4,1)\)

    42\#H(4,2)=L41*U12

    \(643 x+(4,3)-641\) - U13-L420U23



    し44





    L51=H(5, 1)
\(652 \times H(5,2)=651 * U 12\)



    005840

    005850
005860







    COMPUTE ITTLE RIS

    OLFLII*FI(1)

    \(01=L 11 * F 1(1)\)
\(02=L 22^{*}(F 1(2)-421 * 01)\)

    02=L22* (Fi(2)-L21*01)

    \(03=633 *(F 1(3)-431 * 01-L 32 * 02)\)
\(04=644^{*}(f 1(4)-641 * 01-L 42 * 02 * L 43 * 03)\)



    compute eig Ris

    F1(5) =05

    F1(4) \(=04=1445 * D 5\)







    I \(=1 \mathrm{U}\)
I
I

    \(1 * 1-1\)
00 \&
\(0 \times 1.5\)



    CONTINUE





    IF (I.GF.IL) GOTOZO

    RETURN

    RETURN
ENO

    06110

006120

\section*{IMPLICIT RIGHT-HAND-SIDE SUBROUTINES}



```

    D(2,3)=C4*RR{***,1ILMAX=1) 002030
    OU(2,3) = C4*RR (*,* , 2fLMAX)
    O(2,4) # C5*RR(*,*,1:1,MAX-1)
    DU(2*4)*C5*RR(*;**Z!LMAX) 002060
    D(2.5) =0.0
    O(2%S)=0.0
    002070
    ```

```

        W{*,क, ItLMAX-1))*RR{&,#,1:LMAX=1}
    ```

```

    1W(*)*&Z!LMAX))*RK(*,* & 1 ILMAX-1)
    O(3*2) #C4*RR{*,*,1:LMAX-1)
    QU(3,2) = C4#RR(*,*,己!LMAX)
    0(3.3) = C2*RR(*)**I:LMAX-1)
    OU(3.3)=C2*RR(*,*,2;LMAX)
    D(3.4) = CGORR(E,**,1:LGMAX-1)
    D{3:4} = C6*RR(*,*:1:1LMAX-1)
    OU(3.4) = C6
    D(3.5) = 0.0
    O(4,1) =-(C3*W(*,*,1ILMAX*1)*C5*U(***,1!LMAX*1)*C6* 002210
    1V(*)*,1!LMMA=1)\*RR(**&#1:LMAX-1) 002220
    OU(4,1) =={C3*W(***,Z!LMAX)*C5*U(*,**Z:LMAX)*C6* 002230
    1 V(*)*,2ItMAX)I*RR{*,**1:LMAX-1) 002240
    D{4&2) = CS*RR(***,1ILHAX-1)}00225
    D{4*2) = CS*RR(*,**1ILHAX-1
    DU(4*2) = CS*RR{*,*,2ILMAX)
    DU(4.3)=CS*RR(*,*,21LMAX)
    D(4,4) =C30RR{*,**IILMAX-1)
    OU(4*4) = C3*RR(***,2!LMAX)
    D(4,5)=0.0
    DU(4,5) = 0.0
    ```



```

    3 2.*CG*V{*,*,I{LMAX-1)*W(*,*,IILMAX-1)****)
    3 2.*CG*V{*,*,I{LMAX-1)*W(*,*,IILMAX-1)****)
    ```



```

    1M(S*2) = (%,*,1:LMAX=1))*RR(IILMAX=1)
    1 NU(*)*,IILMAX=1)**RR(1ILMAX=1)
    ```

```

    002420
    ```



```

    1 W(*,A,2&LNAX)) #RR(%,*,1;LMAX-1) 002470
    ```

```

    D(S*&) # ((C3*CO)**(*,**1ILMAX-1)*CS*U{****1:LMAX* 1)*C6* 
    V(***IIILMAX=1) *RR(***+1ILMAX*1)
    M, CO*RR(*,*)1:LMAX-1)
    O(5,5)=CO*RR{*,*,1ILMAX-1)
    0 0 3 1 ~ N = 2 . 5
    00 31 M = 1%5
    O(5,5)=CO*RR{*,*,1ILMAX-1)
    002540
    OEFINE(DI,O(N,H)),(DU1:OU(N,M))
    002550
    002560
    A{N,M} =A (N,M) &ORE*OT (*,*, 2!LMAX~1) - 002570
    g(N,M)=g(N,H)-DRE*(D1(*,*,31LMAX) &DUL(*,**216*MAX - 1))}0002580
    ```

```

31 CONTINUE
RETURN
ENO

```

```

      COMMON/BASE/NMAX,JMAX,KMAX,LMAX,JM,KM,LM,DT,GAMMA,GAMI,SMU,FSMACH,
     1 DX1,DY1,DZ1,ND,ND2,FV(5),FD(5),HD,ALP,GD,OMEGA,HDX,HDY,HDZ,
     2 RM,CNBR,PI,ITR,INVISC,LAMIN,NP,INT1,INT2,INT3

```

```

    OU(5,4)=({C3*CO)*W *C5*U *C6*V }*RR(****2&LMAAX)
    002560
    RETURN (%O02610
    002650
            3,
                                    002660
                            18
    ```

\begin{tabular}{|c|c|c|}
\hline & T1 \(5 S 1(* * *, L+1) * S 1(0, *, L)\) & 003340 \\
\hline &  & 003350 \\
\hline & T3 \(=53(* * *, L+1) * S 3(* * * * L)\) & 003360 \\
\hline & \(T 4=S 4(* * * L+1) * S *(*, \theta, L)\) & 003370 \\
\hline & T5 \(=55(*, * * L+1) * 55(0, *, L)\) & 003380 \\
\hline & T6 = S6 ( \(0 \cdot *, L+1) * 56(* * * * L)\) & 003390 \\
\hline &  & 003400 \\
\hline &  & 003410 \\
\hline &  & 003420 \\
\hline &  & 003630 \\
\hline &  & 003440 \\
\hline &  & 003450 \\
\hline &  & 003460 \\
\hline &  & 003470 \\
\hline &  & 003480 \\
\hline &  & 003490 \\
\hline & OU = (U1 (*, * + ¢ + 1) - U \(1(4, *, L))\) & 003500 \\
\hline & DV \(=(Y(*, *+L+1)-V(*) *, L)\}\) & 003510 \\
\hline &  & 003520 \\
\hline &  & 003530 \\
\hline & \(F 2=T 1 め U+T 4 W 0 V+T 5 \pm\) Un & 003540 \\
\hline &  & 003550 \\
\hline & \(F 4=T 5 \times D U+T 6 * D V+T 30\) DW & 003560 \\
\hline &  & 003570 \\
\hline &  & 003580 \\
\hline & \(F(1)=00\) & 003590 \\
\hline & \(F(2)=F 2+R 2\) & 003600 \\
\hline & \(F(3)=F 30 R 3\) & 003610 \\
\hline & \(F(4)=F 4=84\) & 003620 \\
\hline & \(F(5)\) FS-R5 & 003630 \\
\hline & \(\mathrm{R2}=\mathrm{FL}\) & 003640 \\
\hline & R3 \(=53\) & 003650 \\
\hline & \(R 4=F 4\) & 003660 \\
\hline & RS \(=\) FS & 003670 \\
\hline & \(00-10 \times 1.5\) & 003680 \\
\hline 40 & S2 (N) S S2 (NJ ¢F (N) \#ORE & 003690 \\
\hline 1000 & CONTINUE & 003691 \\
\hline & RETURN & 003710 \\
\hline & END & 003720 \\
\hline
\end{tabular}

\section*{APPENDIX D}

\section*{IMPLICIT PSEUDO COMPILATION}




```
C*    VEC D31   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(3,2)=D(1,2)*V-D(1,3)*C5
C*    VEC D32   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(3,3)=C4+D(1,3)*GAM2*V
C*    VEC D33   MUL:MUL:ADD   VL=(KMAX-2)*LSL*(JMAX-2)
      D(3,4)=-D(1,3)*C7+D(1,4)*V
C*    VEC D34   MUL:MUL:ADD   VL=(KMAX-2)*LSL*(JMAX-2)
      D(3,5)=D(1,3)*GAMI
C*    VEC D35   MUL           VL=(KMAX-2)*LSL*(JMAX-2)
      D(4,1)=D(1,4)*C1-W*UU
C*    VEC D41   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(4,2)=D(1,2)*W-D(1,4)*C5
C*    VEC D42   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(4,3)=D(1,3)*W-D(1,4)*C6
C*    VEC D43   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(4,4)=C4+D(1,4)*GAM2*W
C*    VEC D44   MUL:MUL:ADD   VL=(KMAX-2)*LSL*(JMAX-2)
      D(4,5)=D(1,4)*GAMI
C*    VEC D45   MUL           VL=(KMAX-2)*LSL*(JMAX-2)
      D(5,1)=(-C2+2.*C1)*UU
C*    VEC D51   MUL:ADD:MUL   VL=(KMAX-2)*LSL*(JMAX-2)
      D(5,2)=D(1,2)*C3-C5*UU
C*    VEC D52   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(5,3)=D(1,3)*C3-C6*UU
C*    VEC D53   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(5,4)=D(1,4)*C3-C7*UU
C*    VEC D54   MUL:MUL:SUB   VL=(KMAX-2)*LSL*(JMAX-2)
      D(5,5)=D(1,1)+GAMMA*UU
C*    VEC D55   MUL:ADD       VL=(KMAX-2)*LSL*(JMAX-2)
C
C******END OF AMATRX
```
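Each annotation in the pseudo compilation above names the vector operations needed for one statement (for example MUL:MUL:SUB) and the vector length VL over which they run. The sketch below shows, in plain Python, what one such annotated statement amounts to. The helper names are hypothetical, LSL is used only as it appears in the listing (its definition is not given in this part of the report), and the numeric arguments are arbitrary illustrative values.

```python
def vector_length(kmax, lsl, jmax):
    """Vector length as written in the annotations: (KMAX-2)*LSL*(JMAX-2)."""
    return (kmax - 2) * lsl * (jmax - 2)


def mul_mul_sub(a, b, c, d):
    """Element-by-element a*b - c*d: two vector multiplies and one subtract,
    the MUL:MUL:SUB pattern of a statement such as
    D(3,2) = D(1,2)*V - D(1,3)*C5."""
    return [ai * bi - ci * di for ai, bi, ci, di in zip(a, b, c, d)]


if __name__ == "__main__":
    print(vector_length(kmax=30, lsl=4, jmax=24))
    print(mul_mul_sub([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]))
```

Each operand here stands for one array of VL values, so a single scalar-looking statement in the listing corresponds to three vector instructions over the whole slab.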





\begin{tabular}{|c|c|c|}
\hline &  & 003110 \\
\hline &  & 003120 \\
\hline &  & 003130 \\
\hline &  & 003140 \\
\hline &  & 003150 \\
\hline c & & 003160 \\
\hline \(C\) & & 003170 \\
\hline H1 & CONTINUE & 003180 \\
\hline & 0091 Naj 5 & 003190 \\
\hline & 0091 Lelolmax & 003200 \\
\hline & DEFINE（F2，F1（N）） & 003210 \\
\hline 91 & Fご＊＊＊， & 003220 \\
\hline &  & 003230 \\
\hline &  & 003240 \\
\hline &  & 003250 \\
\hline &  & 003260 \\
\hline &  & 003270 \\
\hline &  & 003280 \\
\hline &  & 003290 \\
\hline &  & 003300 \\
\hline & Z2（3）＝（XJ＊YK－YJ＊XK）＊RJ & 003310 \\
\hline &  & 003320 \\
\hline & 1 oYKJ（2iKMAX－1，2iJSM＊2：LBl＊2Z（3）） & 003330 \\
\hline & & 003360 \\
\hline & O（1，2）\(=22(1)\) H \({ }^{\text {（ }}\) & 003350 \\
\hline & O（1，3）＝Z2（2） HDZ & 003360 \\
\hline & D（1．4）\(=22(3)\)－HOZ & 003370 \\
\hline & D（1．1）＝22（4）＊ HOZ & 003380 \\
\hline c & & 003390 \\
\hline Com & ＊amatrax & 003400 \\
\hline & RR＝1．／O1 & 003410 \\
\hline & U＊FR＊Q2 & 003420 \\
\hline & \(V\)＊RR－03 & 003430 \\
\hline & WmRReQs & 003440 \\
\hline &  & 003450 \\
\hline & UT＝U＊＊2＊V＊＊2＊＊＊＊2 & 003460 \\
\hline & CI 3 gAMI＊UT＊＊5 & 003470 \\
\hline & CRwRR－O5＊GAMHA & 003480 \\
\hline & C3：C2－C1 & 003490 \\
\hline & C6EO（1，1）\({ }^{\text {duU }}\) & 003500 \\
\hline & CSmGAMI \({ }^{\text {d }}\) & 003510 \\
\hline & Cbegamioy & 003520 \\
\hline & C7＝GAMI＊W & 003530 \\
\hline &  & 003540 \\
\hline & \(0(1,5)=0\) 。 & 003550 \\
\hline & \(D(2,1)=0(1,2)-C 1-\) unu & 003560 \\
\hline & O（2．2）＝C4＊D（1，2）wGAM2＊U & 003570 \\
\hline & \(O(2,3)=\$ 0(1,2) * C 6 * D(1,3)<0\) & 003580 \\
\hline & \(0(2, b)=-0(1,2) * C 7+0(1,4) * U\) & 003590 \\
\hline & \(0(2,5)=0(1,2) * G A H T\) & 003600 \\
\hline & \(0(3,1)=0(1,3) *\) clavouu & 003610 \\
\hline & \(0(3,2)=O(1,2) * V \sim 0(1,3) * C 5\) & 003620 \\
\hline & \(0(3,3)=C 4+0(1,3) *\) AAM \(2 * V\) & 003630 \\
\hline & \(D(3,4)=-0(1,3) * C 7 * 0(1,6) * V\) & 003640 \\
\hline & \(0(3,5)=0(1,3)=3 A M T\) & 003650 \\
\hline & \(D(4,1)=D(1,4)=C 1-⿰ 幺 幺\)－ & 003660 \\
\hline & \(D(4,2)=O(1,2) * W \sim D(1,4) * C 5\) & 003670 \\
\hline & \(0(4,3)=O(1,3) \oplus W \sim D(1,4) * C 6\) & 003680 \\
\hline & \(D(494)=C 4 * D(1,4) * G A M 2 * H\) & 003690 \\
\hline & \(D(4,5\}=0(1,4)\) GAMI & 003700 \\
\hline &  & 003710 \\
\hline & \(D(5,3)=0(1+3)\)－c3－C6＊uU & 003730 \\
\hline & \(0(5,4)=D(1,4) *\) C3－c7＊ut & 003760 \\
\hline
\end{tabular}





\begin{tabular}{|c|c|c|c|c|c|}
\hline \multicolumn{5}{|c|}{L31) H (3, 1)} & \[
005060
\] \\
\hline \({ }_{\text {c }}\) & Map L31 & GTHRIMM NR & *RS:SSL*SSMAX & & 005062 \\
\hline \(c\) & & & & & 005063 \\
\hline \multicolumn{5}{|l|}{\multirow[t]{2}{*}{\[
\begin{aligned}
& L 32 \approx H(3,2)-631 * \cup 12 \\
& \cup 23=(H(2,3)-L, 21 \odot U 13) * 22
\end{aligned}
\]}} & 005070 \\
\hline & & & & & 005080 \\
\hline \multicolumn{5}{|l|}{c} & 005081 \\
\hline c* & VEC L32 & MULISUB & \multicolumn{2}{|l|}{LnsSL*SSHAX} & 005082 \\
\hline c* & sss 11 & MUL. & & <U23*L32> & 005083 \\
\hline \multirow[t]{3}{*}{\({ }_{c}^{\text {c }}\)} & vEC U23 & MUL \({ }^{\text {SUS:MUL }}\) & VLesSL*SSMAX & & 005084 \\
\hline & & & & & 005085 \\
\hline & \multicolumn{4}{|l|}{L33=1./if(3.3)-U13*L31-U23* L 32)} & 005090 \\
\hline \multicolumn{5}{|l|}{c} & 005091 \\
\hline C* & VEC L33 & Oiv & VLeSSL*SSHAX & & 005093 \\
\hline \multirow[t]{2}{*}{\(c\)} & & & & & 005094 \\
\hline & \multicolumn{4}{|l|}{U24x(H(2.4)-L21*U14)*L22} & 005100 \\
\hline c & & & & & 005101 \\
\hline \({ }^{\text {c* }}\) & VEC 324 & MUL: SUE:MUL & VLaSSL*SSM4X & & 005102 \\
\hline \multicolumn{5}{|l|}{\multirow[t]{2}{*}{6 U25* (H(2,5)-L21*U15)*L22}} & 005103 \\
\hline & & & & & 005110 \\
\hline \({ }^{\text {c }}\) & & & & & 005111 \\
\hline \({ }_{6}{ }^{*}\) & VEe Ult & MULISUE & \multirow[t]{2}{*}{VLESSL*SSMAX} & <H15*L11> & 005112 \\
\hline c & & & & & 005114 \\
\hline \multicolumn{5}{|c|}{L4l=H(4, 1)} & 005120 \\
\hline c & & & & \({ }^{-}\) & 005121 \\
\hline c* & \multicolumn{4}{|l|}{MAP LAI GTHRIMM NR=1,RSxSSL*SSMAX} & 005122 \\
\hline \multicolumn{5}{|l|}{\multirow[t]{2}{*}{C L42=H(4,2)-641*U12}} & 005123 \\
\hline & & & & & 005130 \\
\hline c & & & & & 005131 \\
\hline C* & VEC 442 & MULisua & \multicolumn{2}{|l|}{VLwSSL*SSMAX} & 005132 \\
\hline \({ }^{\text {c }}\) & \$5s T1 & MUL & & <6420U23> & 005133 \\
\hline \multicolumn{5}{|c|}{} & 005140 \\
\hline c & & & & & 005161 \\
\hline c* & vEC L¢3 & MUL:SUB:SUE & VL=SSL*SSMAX & <H63-T.1-L41~U13> & 005142 \\
\hline \multicolumn{5}{|l|}{\multirow[t]{2}{*}{}} & 005143 \\
\hline & & & & & 005150 \\
\hline c & \multicolumn{3}{|l|}{\multirow[b]{2}{*}{VEC II MULIMULIADO VLaSSLesSmax}} & & 005151 \\
\hline C* & & & & <6310U140L.32-U24) & 005152 \\
\hline \({ }^{\text {ct }}\) & VEC U34 & SUB:MUL & \multirow[t]{2}{*}{VLaSSL-SSMAX
VLISSL
SSMAX} & <(H34-71)*633> & 005153 \\
\hline \(c^{*}\) & \$53 71 & MUL & & <U340643> & 005154 \\
\hline c & & & & & 005155 \\
\hline \multicolumn{5}{|c|}{} & 005160 \\
\hline \multirow[t]{3}{*}{\({ }_{c}^{C}\)} & \multicolumn{2}{|l|}{\multirow[b]{2}{*}{VEC TE MULBMULIADD}} & & & 005161 \\
\hline & & & VLeSSL*SSMAX & くU14*L61*U24* 4 (2) & 005162 \\
\hline & VEC T1 & Sue: Su日 & \(V L=S S L 4 S S M A X ~\) & <H44-T1-72> & 005163 \\
\hline \multirow[t]{3}{*}{\({ }_{6}^{6}\)} & VEC L*\% & oty & VLaSSL*SSMAX & & 005164 \\
\hline & & & & & 005165 \\
\hline & \multicolumn{3}{|l|}{} & & 005170 \\
\hline c* & vEC U25 & MUL & VLensLossmax & <T90622> & 005171 \\
\hline c* & \multirow[t]{2}{*}{\$5S P1
VEC U3S} & MULISUB & VLesStesshax & <H35-L32-U25> & 005172 \\
\hline \multirow[t]{3}{*}{\[
{ }_{6}^{6}
\]} & & MUL;SUB:MUL & \(V L=S 5 L\) SSMAX & <(T1-L.31* \({ }^{\text {(15 }}\) ) \({ }^{\text {L33 }}\) & 005173 \\
\hline & & & & & 005174 \\
\hline & \multicolumn{2}{|l|}{L.51 \(=\mathrm{H}(5.1\) )} & & & 005180 \\
\hline \[
6
\] & & \multirow[t]{3}{*}{GTHRIMM N} & \multicolumn{2}{|l|}{\multirow[t]{2}{*}{- 1:RS*SSL*SSHAX}} & 005181 \\
\hline \multirow[t]{4}{*}{\[
\begin{gathered}
6 \\
6 \\
\hline
\end{gathered}
\]} & MAP L5l & & & & 005182 \\
\hline & \multicolumn{4}{|l|}{\multirow[t]{2}{*}{}} & 005183 \\
\hline & \multicolumn{4}{|l|}{\multirow[t]{2}{*}{\begin{tabular}{l}
\(1.52=H(5,2)=-51-12\) \\
L53*H(5.3)-6.51*U13-2.52*U23
\end{tabular}}} & 005190 \\
\hline & & & & & 005200 \\
\hline \({ }_{6}^{6}\) & & & & & 005201 \\
\hline C" & VEc Lse & MULISUB & VLaSSLessmax & & 005202 \\
\hline
\end{tabular}





``` 05at55＊（F1（5）－551＊01－L52＊02－L53003－L54004）
C．VEC T1 ，MULIMULIADO VLaSSLBSSMAX
© VEC T1 MULIMULIADO VL\＃SSLシSSMAX
YEC T1 MULISU日 \({ }^{\text {SUB VLESSLESSMAX }}\)
```



```
\({ }_{c}{ }^{\circ}\)
CH VEC T2 MULISUE
MULiMULIADD VLaSSLeSSMAX
VEC Ti MULISUE\＆SUB VLESSLESSMAX
C COMPUTE BIGRIS
\[
F I(5)=05
\]
\[
F 1(\$)=04=045 \times 05
\]
```




```
Fi（2）＝02aU23＊Fi（3）－U24－FI（4）－U25＾05
```



```
C＊S\＄S T1 MULISUB
VEC TE MULIAUL：ADO YLISSLकSSMAX
```




```
\(\begin{array}{lll}\text { VECF12 } & \text { SU8 } & \text { VLISSSL＊SSMAX } \\ \text { SSST．} & \text { MULISU8 } & \\ \text { VEC T2 } & \text { MULIMULIAOD } & \text { VLASSLWSSHAX }\end{array}\)
VEC FII MULISUBISUS VLISSL＊SSMAX
20
エロIU，
\(I=I-1\)
004 Na I． 5
DEFINE \((F I(N), F(N)(*, *, I)),(F Z(N), F(N)(*, *, I \bullet!))\)
－CONTINUE
\(0019 \mathrm{~N}=1+5\)
\(19 F 1(N)=F 1(N\}-F 2(1) * B 1(N, 1)-F 2(2) * B 1(N, 2)-F 2(3) \in E 1(N 43)\)
＊－Fद（4）＊日1（N，＊）－FZ（5）＊日1（N，5）
```

```
\(C\)
\(C\)
\(C W\)
\(C W\)
\(C W\)
```

```
C*    VEC T1    MUL:MUL:ADD   VL=SSL*SSMAX
C*    VEC T1    MUL:SUB:SUB   VL=SSL*SSMAX
C*    VEC F1N   MUL
C*    VEC F1N   SUB
      RETURN
      END
```


\section*{APPENDIX E}

\section*{LOADER CONVENTIONS}

This section contains formats for loader tables, some of which can provide information that could be useful for debugging. The following are the loader tables that the system uses during error processing:

Module Header Table
Code Block Table
External/Entry Table
Debug Symbol Table
Symbol Definition Table
Pseudo Address Vector Table

Error processing information is provided for every object module loaded to produce a controllee file, including object modules of user-specified files and required object modules for system library files. Figure E-1 is a dump of a typical controllee file, illustrating the error processing information area at the end of the dumped file. A pointer to the error processing information is placed in register \#D. The register contains the total word length of error information in its upper 16 bits and the starting address in its lower 48 bits.
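The packing of the register \#D pointer described above can be illustrated with a short sketch. It assumes a 64-bit register value held as a Python integer; the function name and the sample value are hypothetical, and only the 16-bit and 48-bit field widths come from the text.

```python
LENGTH_BITS = 16    # upper field: total word length of the error information
ADDRESS_BITS = 48   # lower field: starting address of the error information


def decode_error_info_pointer(register_d):
    """Split a 64-bit register value into (length_in_words, start_address)."""
    if not 0 <= register_d < (1 << (LENGTH_BITS + ADDRESS_BITS)):
        raise ValueError("register value must fit in 64 bits")
    length_words = register_d >> ADDRESS_BITS
    start_address = register_d & ((1 << ADDRESS_BITS) - 1)
    return length_words, start_address


if __name__ == "__main__":
    # Hypothetical value: 0x0123 words of error information at address 0x4E800.
    sample = (0x0123 << ADDRESS_BITS) | 0x4E800
    print([hex(field) for field in decode_error_info_pointer(sample)])
```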

\section*{GENERAL TABLE STRUCTURE}

The loader works with files that are composed of one or more object modules. Each object module consists of a number of standard tables; each table begins with a standard two-word header:

\begin{tabular}{|c|c|c|}
\hline Word 1 & \multicolumn{2}{|c|}{ASCII Table Name} \\
\hline Word 2 & Length & Address \\
\hline
\end{tabular}

Word 1 Name of the table in ASCII
Word 2 Length: Length of the table in full words
Address: Bit difference between the first word of the respective table and word 1 of the module header table; i.e.:

Back pointer (bits) + address of first word of respective table (bits) \(=\) address of word 1 of header table (bits)
[Figure: hexadecimal and ASCII dump of a controllee file, with the error processing information area marked at the end of the dump]

Figure E-1. Dump of a Controllee File

\section*{MODULE HEADER TABLE}

The module header table contains general information concerning the object module and provides a linkage to all the other tables in the module:
Word 3 Name of module in ASCII, expressed as 8 characters, left justified and blank filled

Word 4 Date and time module was created, in packed decimal with a positive sign. Date and time format is: year, year, month, month, day, day, hour, hour, minute, minute, second, second, millisecond, millisecond, millisecond

Word 5 Word length of tables, excluding code, followed by ASCII name of processor that created module

Word 6 Word length of code, followed by bit length of data base area. The maximum size of the data base is one large page.

Word 7 \& on Each word contains a table type and an address pointer to a table of that type. The pointer contains a bit address relative to the first word address of the header. By convention, the first table described is the code, and the second is the external/entry table. If the hexadecimal type is 0004, the pointer contains the bit address of the next module header table. Each table type is described in detail in this section.
\begin{tabular}{lll} 
Type & Module Name & Description \\
0001 & CODE & Code Block Table \\
0002 & EXT ENTR & External/Entry Table \\
0003 & REL CODE & Code Relocation Table \\
0005 & XFER SYM & Transfer Symbol Table \\
0006 & SYMB TAB & Debug Symbol Table \\
0101 & INT DATA & Interpretive Data Initialization Table \\
0201 & INT RELO & Interpretive Relocation Initialization Table \\
0301 & PAV & Pseudo Address Vector Table
\end{tabular}

Only types \(1,2,6\), and 301 appear in the error processing information area of an object module.
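
The date and time layout of word 4 in the module header table lists fifteen decimal digits plus a sign, which fills sixteen 4-bit positions. The following is a minimal sketch of unpacking such a word, assuming a 64-bit full word and assuming the sign occupies the last 4-bit position (neither detail is stated explicitly in the text). The function name `decode_creation_stamp` is hypothetical.

```python
def decode_creation_stamp(word: int) -> dict:
    """Split a 64-bit packed-decimal creation stamp into named fields."""
    if not 0 <= word < (1 << 64):
        raise ValueError("expected a 64-bit word")
    # Fifteen decimal digits, most significant first. The last 4-bit
    # position is ASSUMED to hold the sign code.
    digits = [(word >> shift) & 0xF for shift in range(60, 0, -4)]
    if any(d > 9 for d in digits):
        raise ValueError("non-decimal digit in packed field")

    def number(start: int, count: int) -> int:
        # Combine `count` consecutive decimal digits into one integer.
        value = 0
        for d in digits[start:start + count]:
            value = value * 10 + d
        return value

    return {
        "year": number(0, 2),
        "month": number(2, 2),
        "day": number(4, 2),
        "hour": number(6, 2),
        "minute": number(8, 2),
        "second": number(10, 2),
        "millisecond": number(12, 3),
        "sign_nibble": word & 0xF,
    }


stamp = decode_creation_stamp(0x790515143059123C)
print(stamp)
```

The sketch returns the sign position unchanged rather than interpreting it, since the code used for a positive sign is not given here.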

\section*{CODE BLOCK TABLE}

The code block table contains the executable code in the following format:


The code block table has a pointer in the error processing information area. In this capacity the table has the following format:


Word 1 Program name in ASCII.
Word 2 A pointer to the beginning of the error processing information area for that program.
Word 3 \& on The executable code.

\section*{CODE RELOCATION TABLE}

This table describes relocation in the code itself.
Word 3 nbi is the number of bits per index in the bit string starting in word 5; ni is the number of indexes in the string

Word 4 Current base: current bit address to which this module is relocated

Word 5 Bit string of indexes, each nbi bits long. Each index references a half word of code to be relocated relative to the base address of the code

When this table is processed, the bit base address of the code is added to the 48-bit fields pointed to by the indexes in the bit string.
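
The relocation step described above can be sketched in two parts: unpacking ni indexes of nbi bits each from the bit string, then adding the code base to each indexed 48-bit field. This is an illustration under stated assumptions, not the loader's actual code. The bit string is assumed to be packed left to right, the placement of the 48-bit field relative to the indexed half word is not modelled (code is represented as a mapping from half-word index to field value), and wrap-around at 48 bits is assumed. The names `unpack_indexes` and `relocate` are hypothetical.

```python
ADDRESS_MASK = (1 << 48) - 1  # 48-bit field, per the table description


def unpack_indexes(bit_string: bytes, nbi: int, ni: int) -> list[int]:
    """Extract ni indexes, each nbi bits long, packed left to right (assumed)."""
    total_bits = len(bit_string) * 8
    if nbi <= 0 or nbi * ni > total_bits:
        raise ValueError("bit string too short for ni indexes of nbi bits")
    as_int = int.from_bytes(bit_string, "big")
    mask = (1 << nbi) - 1
    return [(as_int >> (total_bits - (i + 1) * nbi)) & mask for i in range(ni)]


def relocate(fields: dict[int, int], indexes: list[int], code_base: int) -> dict[int, int]:
    """Add the bit base address of the code to each indexed 48-bit field."""
    relocated = dict(fields)
    for index in indexes:
        # Wrap-around at 48 bits is an assumption of this sketch.
        relocated[index] = (relocated[index] + code_base) & ADDRESS_MASK
    return relocated


# Three 5-bit indexes (1, 3, 5) packed into two bytes.
indexes = unpack_indexes(bytes([0x08, 0xCA]), nbi=5, ni=3)
fields = {1: 0x100, 3: 0x40, 5: 0x0}
print(indexes, relocate(fields, indexes, code_base=0x8000))
```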

\section*{EXTERNAL/ENTRY TABLE}

The external/entry table contains definitions for all entry points, external symbols, and common blocks.


\begin{tabular}{ll} 
Word 3 & \begin{tabular}{l}
\(m\) is number of entry point names in table \\
\(n\) is number of names in table
\end{tabular} \\
Word 4 through \(3+m\) & List of entry point names \\
Word \(4+m\) through \(3+n\) & List of external names \\
Word \(4+n\) through \(3+m+n\) & List of entry point descriptors \\
Word \(4+m+n\) through \(3+n+n\) & List of external descriptors
\end{tabular}
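
The word ranges in the table above follow directly from m and n. As a small sketch (the function name `external_entry_layout` and the dictionary keys are hypothetical), the inclusive first and last word numbers of each list can be computed as follows, with n counting all names and m counting the entry point names among them:

```python
def external_entry_layout(m: int, n: int) -> dict[str, tuple[int, int]]:
    """Return inclusive (first, last) word numbers for each list in the table.

    m is the number of entry point names, n the total number of names.
    A range whose first word exceeds its last word is empty.
    """
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    return {
        "entry_names": (4, 3 + m),
        "external_names": (4 + m, 3 + n),
        "entry_descriptors": (4 + n, 3 + m + n),
        "external_descriptors": (4 + m + n, 3 + n + n),
    }


layout = external_entry_layout(m=2, n=5)
print(layout)
```

With two entry points among five names, the names occupy words 4 through 8 and the descriptors occupy words 9 through 13, so each name has exactly one descriptor.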

Each descriptor is of the following form:
\begin{tabular}{|c|ccc|}
\hline Type & Value & \\
\hline
\end{tabular}

\begin{tabular}{lll}
Type Field & Symbol Type & Value Field \\
1 & Entry point in code & \\
2 & Entry point in data & \\
3 & Constant entry point & \\
14 & External procedure & 0 \\
15 & External datum & 0 \\
16 & Common block & Bit length of the common block
\end{tabular}

\section*{ENTRY POINTS}

An entry point is a named value defined in the procedure; it is to be referenced as an external by an external procedure. It may be an address in the code block, an address in the data base, or a constant value.

\section*{COMMON BLOCKS}

A common block is a named alterable space referenced by one or more procedures. A common block can be initialized with relocatable data. Blank common is a common block with name of eight blanks.

\section*{EXTERNAL PROCEDURE}

An external procedure reference is used in a call. Having a symbol doubly defined as a common block and external procedure is specifically allowed. All names are eight characters, left justified and blank filled.

\section*{EXTERNAL DATA}

An external datum is an external that is referenced by a method other than a procedure call.

\section*{INTERPRETIVE DATA INITIALIZATION TABLE}

When the loader processes information in this table, areas of static space are initialized.


Word 3 \& on Data item descriptor and item pairs, formatted as follows:
\begin{tabular}{|l|l|l|l|l|}
\hline ord1 & ord2 & Type & Mode & Chain \\
\hline
\end{tabular}
ord1 Pseudo address vector ordinal of static space to be initialized

ord2 Pseudo address vector ordinal relative to which relocation is to be done (relocation base)

Type Type of data item that follows

Mode 00 Values to destination
01 Values plus relocation base to destination
02 Destination plus relocation base to destination

When mode = 00, the values in the item are stored directly into the destination fields, and ord2 is ignored. When mode = 01, the relocation base is added to the values before they are stored in the destination fields. Halfword values are not defined for this case. When mode = 02, the relocation base is added to the destination fields. The value fields are absent in the various items in this case.
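
The three modes can be summarised in a short sketch. It is an illustration only: the function name `apply_mode` is hypothetical, destination fields are modelled as a list of integers, and truncation of sums to the field width (64 bits by default here) is an assumption not stated in the text.

```python
from typing import Optional


def apply_mode(mode: int, destination: list[int], values: Optional[list[int]],
               relocation_base: int, width_bits: int = 64) -> list[int]:
    """Return the new destination fields for one data item."""
    mask = (1 << width_bits) - 1  # truncation to the field width is assumed
    if mode == 0:
        # Values to destination; the relocation base (ord2) is ignored.
        if values is None or len(values) != len(destination):
            raise ValueError("mode 00 needs one value per destination field")
        return list(values)
    if mode == 1:
        # Values plus relocation base to destination.
        if values is None or len(values) != len(destination):
            raise ValueError("mode 01 needs one value per destination field")
        return [(v + relocation_base) & mask for v in values]
    if mode == 2:
        # Destination plus relocation base to destination; no value fields.
        return [(d + relocation_base) & mask for d in destination]
    raise ValueError("undefined mode")


print(apply_mode(0, [0, 0], [5, 6], 0x1000))
print(apply_mode(1, [0, 0], [5, 6], 0x1000))
print(apply_mode(2, [0x20, 0x40], None, 0x1000))
```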

Chain Relative full-word count to next data item descriptor in table

Data items may be stored in one of the following formats, depending on the type:

\section*{Data Items}

\section*{Item Format 1}
\begin{tabular}{|c|c|c|}
\hline Length & Relative Address & 48 \\
\hline & Value & 64 \\
\hline
\end{tabular}

\section*{Item Format 2}


\section*{Item Format 3}


The data item format corresponding to each type is as follows:
\begin{tabular}{clc} 
Type & Description & Data Item Format \\
1 & Full-Word Broadcast & 1 \\
2 & Half-Word Broadcast & 1 \\
3 & Full-Word Vector Transmit & 1 \\
4 & Half-Word Vector Transmit & 1 \\
5 & Full-Word Sparse Vector & 2 \\
6 & Half-Word Sparse Vector & 2 \\
7 & Full-Word Index List & 3 \\
8 & Half-Word Index List & 3 \\
9 & Byte String & 1 \\
A & Bit String & 1 \\
D & Nested List & Any
\end{tabular}

For each data item type, the appropriate format is applied as follows:

\section*{FULL WORD BROADCAST}

Data Item Type 1
Item Format 1
Length Full word vector length
Value A full word to be stored in consecutive full words starting at the relative address in the section of static space

\section*{HALF WORD BROADCAST}

Data Item Type 2
Item Format 1
Length Half-word vector length
Value A left justified half-word to be stored in consecutive half-word locations starting at the relative bit address

\section*{FULL WORD VECTOR TRANSMIT}

Data Item Type 3
Item Format 1
Length Full-word vector length
Value Full-word vector to be transmitted to the relative address in control section

\section*{HALF WORD VECTOR TRANSMIT}

Data Item Type 4
Item Format 1
Length Half-word vector length
Value Half-word vector to be transmitted to the relative address in control section

\section*{FULL WORD SPARSE VECTOR}

Data Item Type 5
Item Format 2
Length Number of values in item
Value Full-word values
Length2 Length of control vector
Bit String Control vector having a length specified by length2

\section*{HALF WORD SPARSE VECTOR}

Data Item Type 6
Item Format 2
Length Number of values in item
Value Left justified half word vector
Length2 Length of control vector
Bit String Left justified control vector
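
The sparse vector items pair a list of values with a control vector. The text does not spell out how the control vector selects destinations, so the sketch below assumes the usual sparse-vector convention: each 1 bit in the control vector marks a destination position that receives the next value, and positions with a 0 bit are left unchanged. The function name `expand_sparse` is hypothetical, and the control vector is modelled as a list of 0/1 integers.

```python
def expand_sparse(values: list[int], control_bits: list[int],
                  destination: list[int]) -> list[int]:
    """Scatter values into destination where the control vector has 1 bits.

    ASSUMPTION: the k-th 1 bit selects the position for the k-th value.
    """
    if sum(control_bits) != len(values):
        raise ValueError("number of 1 bits must equal the number of values")
    if len(control_bits) > len(destination):
        raise ValueError("control vector longer than destination")
    result = list(destination)
    remaining = iter(values)
    for position, bit in enumerate(control_bits):
        if bit:
            result[position] = next(remaining)
    return result


expanded = expand_sparse([7, 9], [0, 1, 0, 0, 1], [0, 0, 0, 0, 0])
print(expanded)
```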

\section*{FULL WORD INDEX LIST}

Data Item Type 7
Item Format 3
Length Number of values in item
Value Full word values
nbi Number of bits per index
ni Number of indexes
Bit String A bit string of ni indexes; each index is nbi bits long and contains a full-word count

\section*{HALF WORD INDEX LIST}

Data Item Type 8
Item Format 3
Length Number of values in item
Value A left justified half-word vector
nbi Number of bits per index
ni Number of indexes
Bit String

\section*{BYTE STRING}

Data Item Type 9
Item Format 1
Length Number of bytes in value field
Value A left justified byte string

\section*{BIT STRING}

Data Item Type A
Item Format 1
Length Number of bits in value field
Value

\section*{NESTED LIST}

\begin{tabular}{|c|c|}
\hline Ord1 & Pseudo address vector ordinal relative to the data area to be initialized \\
\hline Ord2 & Pseudo address vector ordinal relative to which relocation is to be done (relocation base) \\
\hline Type1 & D (nested list) \\
\hline \multirow[t]{3}{*}{Mode} & 00 Value to destination \\
\hline & 01 Value plus relocation base to destination \\
\hline & 02 Destination plus relocation base to destination \\
\hline Length & Number of nested item types that follow \\
\hline Rba & Relative bit address \\
\hline Ni1 & Nested data item \\
\hline Ni2 & Nested iteration start item \\
\hline Ni3 & Nested iteration end item \\
\hline Niter & Number of times data item/items associated with this iteration start item are to be repeated \\
\hline Type2 & Any initialization type. If more than one data item in an iteration, types may not be mixed \\
\hline
\end{tabular}
\begin{tabular}{ll}
Chain1 & Relative full word count to next data item in nested list \\
Chain2 & Length of data item in number of words \\
Chain3 & 0 No nested item types follow \\
 & More nested item types follow \\
Length2 & Value word vector length
\end{tabular}

\section*{INTERPRETIVE RELOCATION INITIALIZATION TABLE}


Word 3 \& on Relocation items; item formats are similar to data initialization table formats but do not contain values

\section*{TRANSFER SYMBOL TABLE}


Word 3 The symbol name of the entry point to which control is to be transferred at the start of execution. The name is left justified with blank fill.

\section*{DEBUG SYMBOL TABLE}

The debug symbol table, which contains the ASCII representation of symbols that appear in a program, allows a symbol to be referenced by name rather than by address. This table appears in the error processing information area if the compiler/assembler used is capable of generating the table and the appropriate option is selected and used during compilation/assembly.


Word 2 Length of table including the symbol definition table. Back Pointer is the bit difference between word 1 of this table and word 1 of the module header table.

Word 3 Number of symbols in this table.

Word 4 \& on Symbols, which can be any of the following:

Variable or array names in ASCII; must be left-justified with blank fill.

Statement line numbers in ASCII; must be hexadecimal values, right-justified with ASCII zero fill.

Statement labels in ASCII. Labels that are symbolic names are stored left-justified with blank fill; labels that are statement numbers are stored right-justified with ASCII zero fill.
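
The justification and fill rules for debug symbols can be illustrated with a few helper functions. This is a sketch only: the helper names are hypothetical, and the field width of eight characters is an assumption carried over from the eight-character names used elsewhere in the object module, since the width of a debug symbol is not stated here.

```python
FIELD_WIDTH = 8  # ASSUMED symbol width in characters


def _check(text: str) -> str:
    if len(text) > FIELD_WIDTH:
        raise ValueError("symbol longer than the assumed field width")
    return text


def format_name(name: str) -> str:
    """Variable or array name: left-justified with blank fill."""
    return _check(name).ljust(FIELD_WIDTH, " ")


def format_line_number(line: int) -> str:
    """Statement line number: hexadecimal, right-justified with ASCII zero fill."""
    return _check(format(line, "X")).rjust(FIELD_WIDTH, "0")


def format_label(label: str) -> str:
    """Statement label: numeric labels zero filled, symbolic labels blank filled."""
    if label.isdigit():
        return _check(label).rjust(FIELD_WIDTH, "0")
    return _check(label).ljust(FIELD_WIDTH, " ")


print(repr(format_name("XK")))
print(repr(format_line_number(26)))
print(repr(format_label("100")))
print(repr(format_label("LOOP1")))
```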

\section*{SYMBOL DEFINITION TABLE}

The symbol definition table is an extension to the debug symbol table. It provides further definition to the debug symbols including the type of symbol, address, and mode.
Word 14 SYMBADEF

Word 3 Symbol type:
0 = Unknown
1 = Half-word register variable name
2 = Full-word register variable name
3 = Variable or array name
4 = Line number
5 = Label

Location field for symbol type:
1 = Half-word address within register file; since half-word values may be stored in full-word registers, location can range up to hexadecimal 1FF
2 = Full-word register number
3 = Bit address relative to the start of the data base
4 = Bit address relative to the start of the code base
5 = Bit address relative to the start of the code base
Word 4 Mode. Symbol mode, consisting of three parts: precision, description, and data type. In the case of a descriptor, P and Dtype describe the contents of the referenced vector.
\begin{tabular}{|l|l|l|l|}
\hline\(P\) & Desc & & Dtype \\
1 & & 3 & \\
\hline
\end{tabular}

P Precision base indicator:
0 = Precision base is 32-bit, or irrelevant
1 = Precision base is 64-bit

Desc Descriptor indicator:
0 = Not a descriptor
1 = Vector descriptor
2 = Vector descriptor array
4 = Sparse vector descriptor
5 = Sparse vector descriptor array

Dtype Type of the referenced vector:
0 = Unknown
1 = Logical
2 = Integer
3 = Real
4 = Complex
5 = Double precision
6 = Character
7 = Bit

Ordinal: The pseudo address vector table of the data base or common block
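
The three parts of the symbol mode can be turned into a readable description with simple lookup tables. The sketch below takes the P, Desc, and Dtype values as already-extracted integers, because the exact bit positions of the fields within word 4 are not recoverable from the layout shown here. The function name `describe_mode` is hypothetical.

```python
PRECISION = {0: "32-bit or irrelevant", 1: "64-bit"}
DESCRIPTOR = {
    0: "not a descriptor",
    1: "vector descriptor",
    2: "vector descriptor array",
    4: "sparse vector descriptor",
    5: "sparse vector descriptor array",
}
DTYPE = {
    0: "unknown", 1: "logical", 2: "integer", 3: "real",
    4: "complex", 5: "double precision", 6: "character", 7: "bit",
}


def describe_mode(p: int, desc: int, dtype: int) -> str:
    """Describe a symbol mode from its already-extracted P, Desc, Dtype values."""
    try:
        return (f"{PRECISION[p]} precision base; "
                f"{DESCRIPTOR[desc]}; {DTYPE[dtype]}")
    except KeyError as missing:
        raise ValueError(f"undefined mode field value: {missing}") from None


print(describe_mode(1, 1, 3))
```

Desc value 3 is not defined in the list above, so the sketch rejects it rather than guessing a meaning.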

\section*{PSEUDO ADDRESS VECTOR TABLE}

\section*{(Ordinal Description)}

The table has the following format:


For common:
\begin{tabular}{|cc|cc|}
\hline 0 & & Address & \\
\hline 0 & 16 & & 48 \\
\hline & & & Bit Length \\
\hline
\end{tabular}

For external symbol, referencing entry point in code:
\begin{tabular}{|cc|cc|}
\hline 0 & & & \\
\hline Data Base Length & & & \\
\hline & & & \\
\hline
\end{tabular}

For external symbol, referencing entry point in data:
\begin{tabular}{|cc|cc|}
\hline 0 & & Entry in Data Base & 48 \\
\hline Data Base Length & & \\
\hline
\end{tabular}

For external symbol, referencing constant entry point:
\begin{tabular}{|cc|cc|}
\hline 0 & Constant Entry Value & 48 \\
\hline Data Base Length & 16 & & \\
\hline & & & \\
\hline
\end{tabular}

\section*{APPENDIX F}

\section*{SIMULATOR INPUTS AND RESULTS}
\begin{tabular}{|c|c|}
\hline & 0,1,5,1,100,600 \\
\hline M & 0,2,5,1,100,800 \\
\hline M & 0,3,5,1,100,800 \\
\hline M & 0,4,5,1,100,800 \\
\hline M & 0,5,5,1,98,600 \\
\hline M & 0,6,5,1,98,600 \\
\hline M & 0,7,5,1,98,600 \\
\hline & 0,8,5,1,100,600 \\
\hline M & 0,0,5,1,98,600 \\
\hline & 0,0,5,1,98,600 \\
\hline & 0,0,5,1,100,800 \\
\hline & 0,0,5,1,100,800 \\
\hline & 0,0,5,1,16,128 \\
\hline & 2,0,2,1,60000,12,3,1 \\
\hline & 3,0,2,1,60000,12,3,1 \\
\hline & 4,0,2,1,60000,12,3,1 \\
\hline & 0,0,2,1,60000,12,3,1 \\
\hline & 0,0,2,1,60000,12,3,1 \\
\hline & 0,0,2,1,60000,12,3,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,2,1,60000,9,4,2 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,2,1,60000,9,4,2 \\
\hline & 0,0,2,1,60000,9,2,2 \\
\hline & 0,0,2,1,60000,9,2,2 \\
\hline & 5,0,1,1,60000,18,2,2 \\
\hline & 0,0,0,1,60000,20,3,1 \\
\hline & 0,0,2,1,60000,9,3,2 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,15,4,2 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,2,1,60000,9,3,1 \\
\hline & 0,0,2,1,60000,15,3,1 \\
\hline & 0,0,2,1,60000,15,3,1 \\
\hline & 0,0,2,1,60000,9,4,2 \\
\hline & 0,0,2,1,60000,9,3,2 \\
\hline & 0,0,1,1,60000,9,2,1 \\
\hline & 0,0,2,1,1,60000 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,15,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,15,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,2,1,60000,9,4,2 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline V & 0,0,3,1,60000,15,4,1 \\
\hline V & 0,0,1,1,60000,9,2,1 \\
\hline & 0,0,3,1,60000,15,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline & 0,0,3,1,60000,9,4,1 \\
\hline V & 0,0,2,1,60000,9,3,1 \\
\hline V & 0,0,1,1,60000,20,2,1 \\
\hline V & 0,0,0,1,60000,18,3,1 \\
\hline & \\
\hline
\end{tabular}


\begin{tabular}{|c|c|}
\hline & 0,0,3,1,600,12,4,1 \\
\hline & 0,0,3,1,600,12,4,1 \\
\hline R & 5 \\
\hline V & 0,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,12,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,1,1,600,9,3,2 \\
\hline V & 0,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,2,1,600,12,3,1 \\
\hline C & \\
\hline R & 1 \\
\hline R & 5 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,1,1,600,9,2,1 \\
\hline & \\
\hline 8 & 25 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,1,1,600,9,2,1 \\
\hline & \\
\hline V & 0,0,1,1,600,18,2,2 \\
\hline V & 0,0,0,1,600,18,2,1 \\
\hline M & 0,1,1,1,1,600 \\
\hline V & 1,0,3,1,600,9,4,2 \\
\hline V & 0,0,1,1,600,18,2,2 \\
\hline V & 0,0,0,1,600,18,2,1 \\
\hline V & 0,0,2,1,600,9,3,2 \\
\hline V & 0,0,1,1,600,9,3,1 \\
\hline V & 0,0,1,1,600,9,2,1 \\
\hline V & 0,0,3,1,600,12,4,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,1,1,600,9,2,1 \\
\hline V & 0,0,1,1,600,18,2,2 \\
\hline V & 0,0,0,1,600,18,2,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline M & 0,2,1,1,1,600 \\
\hline V & 2,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,15,4,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,2,1,600,12,3,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,1,1,600,18,2,2 \\
\hline V & 0,0,0,1,600,18,2,1 \\
\hline V & 0,0,3,1,600,9,4,1 \\
\hline V & 0,0,2,1,600,12,2,1 \\
\hline M & 0,2,1,1,1,600 \\
\hline V & 2,0,3,1,600,12,4,2 \\
\hline V & 0,0,3,1,600,12,4,2 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|c|}
\hline \[
\begin{aligned}
& 4693 \\
& 4694
\end{aligned}
\] & T2
F11 & & \[
\begin{aligned}
& 001380 \\
& 001390 \\
& 001400 .
\end{aligned}
\] \\
\hline 4732 & 2 DI & & 001410 \\
\hline 4742 & D2 AND Ti & & 001420 \\
\hline 4744 & D3 & & 001430 \\
\hline 4762 & \(71^{\circ}\) & & 001440 \\
\hline 4762 & T1 & & 001450 \\
\hline 4764 & D4 AND T! & & 001460 \\
\hline 4766 & T2 & & 001470 \\
\hline 4767 & T1 & & 001480 \\
\hline 4772 & 05 AND 015 & & 001490 \\
\hline 4792 & 84M AND T1 & & 001500 \\
\hline 4794 & 83M & & 001510 \\
\hline 4802 & T1 & & 001520 \\
\hline 4803 & B2M & & 001530 \\
\hline 4812 & 11 & & 001540 \\
\hline 4813 & 72 & & 001550 \\
\hline 4814 & 81M & & 001560 \\
\hline END & 0012 & & 001570 \\
\hline DO I & IMAX TIMES & & 001580 \\
\hline 001 & 14 L00p & & 001590 \\
\hline 4922 & T1 & & 001600 \\
\hline 4923 & T2 & & 001610 \\
\hline 4924 & T1 & & 001620 \\
\hline 4925 & Fl & & 001630 \\
\hline END & DO LOOP 14 & & 001640 \\
\hline 00 L & LOOP 11 & & 001550 \\
\hline 4972 & T1 & & 001660 \\
\hline 4973 & 72 & & 001670 \\
\hline 4974 & 11 & & 001680 \\
\hline 4975 & HNM & & 001690 \\
\hline ENO & DO LOOP 11 & & 001700 \\
\hline 4330 & L11-1/8(191) & & 001710 \\
\hline & OIVIDE 2ND PASS & & 001720 \\
\hline 4340 & MAP: L21*81(2.1) & & 001730 \\
\hline 4350 & \& 4360 U12 \(=81(1,2)\) \#111 & ¢1 5 81*L21*U12 & 001740 \\
\hline 4360 & \[
L 22=1 . / T 1
\] & & 001750 \\
\hline & OIVIDE ZND PASS & & \[
002760
\] \\
\hline 4380 & U13, U14 & & 001770 \\
\hline 4390 & U15, 131 & & 001780 \\
\hline 4410 & L32 & & 001790 \\
\hline 4420 & U23 & & 001800 \\
\hline 4430 & TEMP1 & & 001810 \\
\hline 4430 & TEMPZ & & 001820 \\
\hline 4430 & DIVIDE IST PASS & & 001830 \\
\hline 4430 & L33 & & 001840 \\
\hline 4440 & U24 & & 001850 \\
\hline 4450 & U25 & & 001860 \\
\hline 4460 & L41 & & 001870 \\
\hline 4470 & L42, 71 & & 001880 \\
\hline 4480 & 143 & & 001890 \\
\hline 4490 & \[
\begin{aligned}
& \text { T1=L41*U13*L32*U24 } \\
& U 34=(81(3 \cdot 4) * T 1) * \operatorname{lon}
\end{aligned}
\] & & 001900
001910 \\
\hline 4500 & T1×U14*641*U24*642 & & 001920 \\
\hline 4500 & T1=81(4,4)=T1* (U34*L43) & & 001930 \\
\hline 4500 & L44*1*/T1 & & 001940 \\
\hline & OIVIDE 2NO PASS & & 001950 \\
\hline 4512 & T1 & & 001960 \\
\hline 4513 & 3 U35 & & 001970 \\
\hline 4522 & 151 & & 001980 \\
\hline 4542 & L52 ANO T1a 5 ( 1 ¢U13 & & 001990 \\
\hline 4544 & 4 L53 & & 002000 \\
\hline
\end{tabular}





```

V 0,0,3,1,60000,15,4,1
V 0,0,1,1,60000,9,2,1
V 0,0,3,1,600000.15,4,1
R }
V 0,0,2,1,60000,12,3,1
C
V 7.1.2.1.600000:12:3.-1
V a,2,2,1,60000,12,3,1
V 9,3,2,1,60000,12,3,1
0,4,2,1,60000,12,3,1
0,5,2,1,60000,12,3,1
0,6,2,1,60000,12,3,1
00,20,3,1,60000,15,4,1
0,21,3,1,60000,15,4,1
0,22,2,1,60000,9,3,2
0,23,3,1,60000,15,4,1
22,24,3,1,60000,15,4,1
0,0,2,1,60000,9,3,2
0,0,1,1,60000,18,2,2
0,0,0,1,60000,18,2,1
0,0,2,1,60000,9,4,2
0,0,3,1,60000,12,4,2
0,0,2,1,60000,15,4,1
0,0,3,1,60000,12,4,2
0,0,3,1,60000,15,4,1
0,0,3,1,60000,15,4,1
0,0,3,1,60000,15,4,1
0,0,3,1,60000,15,4,2
0,0,3,1,60000,15,4,1
0,0,3,1,60000,15,4,1
0,0,1,2,60000,9,2,1
0,0,3,1,60000,15,4,1
5
0,0,2,1,60000,12,3,1
C
640 FI(4)
650 {QS+PP)
00 20 LOOP - 000750
710 S2 % 000750
000765
420 XK 000770
430 YK . 000780
440 ZK 000790
450 XL 000800
460 YL 000810
470 ZL 000820
480 XX(1) 000830
490 TEMP2 000840
490 XX(I), XX(2) 000850
S00 TEMPI. 000860
510 TEMPZ 000870
510 XX(3). XX(4) 000880
550 RR=1./Q 000890
550 64 \&IT DIVIDE }000090
560 ANO S70 U ANO V 000910
590 W 000920
590 T 000930
590 FINISH QS,FORM FI(I) AND TI 0009\&0
600 U**Z*V**2 000950
600 W\$\&2+TI\&Q1 000960
600 P=GAMI*(Q5-.5*T1) . n00970
620 F1(2)
000980
630 F1(3)
640 F1(4) 001000
650 (Q5*PP) 001010
650 Fi(5) }00102
0 0 2 0 ~ L O O P ~ . ~ 0 0 1 0 3 0
71052 001040
001040
001050

```

\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 1.0333437718820E+000 & VECBZ & 9.0235653319400E-001 \\
\hline CLKPO & 772976 & & \\
\hline
\end{tabular}



\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 1.1846927223380E+000 & VECBZ & 9.9933263653100E-001 \\
\hline CLKPO & 930621 & & \\
\hline
\end{tabular}






NUMEER
VECFP
CLKPD

\begin{tabular}{|c|c|c|c|}
\hline & 32768,16,3116 & LX Routine & 000090 \\
\hline \(F\) & & :OTDX=DT*DX1 & 000100 \\
\hline R & 1 & 100 1 KmKS1.KE2 & 000110 \\
\hline R & 1 & 100 2 JaJSliJE2 & 000120 \\
\hline \(R\) & 5 & & 000130 \\
\hline M & 0.0.1.1.100.1 & :PROTCT 1 : & 000140 \\
\hline c & & & 000150 \\
\hline \(v\) & 0,0,2,1,100,15,2,1 & ; \(P=A * B * C\) & 000160 \\
\hline R & 2 & 3004 \(\mathrm{N}=1,2^{\circ}\) & 000170 \\
\hline F & & & 000180 \\
\hline F & & & 000190 \\
\hline \(F\) & & & 000200 \\
\hline \(v\) & 0,0.2.1,101.12,2.1 & UUII= ( \(A+B) * C\) & 000210 \\
\hline \(v\) & 0,1,2,1,101:9,2,1 & ITMP \(=(A * B-C)\) & 000220 \\
\hline \(\checkmark\) & 0,0,3,1,101,15,3,0 &  & 000230 \\
\hline \(v\) & 0,0,1+1,100,9,2,0 & 1U(I* 1\()=U(1)<0\)-CONTROL VECT & 000240 \\
\hline M & 0,0.401.100,25 & IIf \()\) UiIE & 000250 \\
\hline H & 0,0,4,10100,25 & IIF \({ }^{\text {f UIIE }}\) & 000260 \\
\hline F & & 14(1) = & 000270 \\
\hline \(F\) & & & 000280 \\
\hline \(v\) & 0,0,2,1,100,9,2,2 & ;RLMBDA*A*BSRK=C*O & 000300 \\
\hline \(F\) & & 1Cx.5*DY1 & 000340 \\
\hline \(v\) & 0,0,2,1:100.12,2:1 & ; \(0 Y X=(A+B) * C\) & 000350 \\
\hline \(y\) & 0,0,2,1,100,9,4,2 & \(\mathfrak{Z Y X}=A-B S V Y X=C-0\) & 000360 \\
\hline \(\checkmark\) & 0,0,3,1,100,12.2,1 & ( \((W-W)\) MOZ & 000370 \\
\hline & 0,0,3,1+100,12,2,1 & ( (U-U)*0X & 000380 \\
\hline \(v\) & 0.0.3,1.100.9.2.1 & IVXY*DY*T & 000390 \\
\hline \(v\) & 0,0,2,1,100,9,3,1 & IUXY*DY-T & 000400 \\
\hline \(v\) & 0,0,3,1.100,9,3.1 & HMBOA \(+2 * M U\) & 000410 \\
\hline \(v\) & 0,0,2,1,100,12,2.1 &  & 000440 \\
\hline \(v\) & 090,3,1+100,9,3,1 & it-Vyxaorx & 000450 \\
\hline \(v\) & 0,0,2,1,100,9,2,1 & TUYX*DY + T & 000460 \\
\hline \(v\) & 0,0,2,1,100,12,3,1 & 1. \((W-W) \subseteq D Y X\). & 000480 \\
\hline \(v\) & 0,0,3,1,100,12.2,1 & ( \((\mathrm{w}-\mathrm{w})\) ¢ DX & 000490 \\
\hline \(\checkmark\) & 0,0.3.1.100.12,2.1 & (U-U)*DZ & 000500 \\
\hline \(\checkmark\) & 0,0,2.1.100,15,3.1 & it+T-T & 000510 \\
\hline \(\checkmark\) & 4.0:0,3,1,100,12,2,1 &  & 000540 \\
\hline \(\checkmark\) & -0,0,2,1,100,12,3,1 & PRK* (T+T) & 000550 \\
\hline \(v\) & -0,0,3,1,100,15,4,1 & SSoUstaU*W & 000560 \\
\hline \(v\) & \(v 0,0,3,1,100,15,4,1\) & ¢DISX=TAU*V*T+T & 000570 \\
\hline \(v\) & -0,0,1,1,100,9,2,1 & iF(1) \(\times\) A* B & 000580 \\
\hline & R \({ }^{\text {a }}\) & & 000590 \\
\hline \(v\) & \(v 0,0,2,1,100,9: 3: 1\) & ; \(F(J)=A * B * C\) & 000600 \\
\hline c & c & & 00061.0 \\
\hline \(F\) & & 1F(ISMO=0) GO TO 25 & 000620 \\
\hline \(F\) & & 111=1I+1 & 000630 \\
\hline \(v\) & -0,0,3,1,98,15,3,1 & ; \(T=P \sim 2+P+P\) & 000640 \\
\hline \(v\) & \(\vee 0,0,3,1,98,15,3,1\) & ; \(T+2 * A B S(P) * T\) & 000660 \\
\hline & \(\vee 0,0,1,1,98,9,1,1\) &  & 000670 \\
\hline & \(\checkmark 0.000,1,98,20,2,2\) & 1 / PaSS 1 & 0006880 \\
\hline & \(v 0,0,1,1,98,18,3,1\) & 1 / Pass 2 & 000690 \\
\hline & \(v 0,0,1,1,98,9,1,1\) & IG*ABS (T) & 000700 \\
\hline & R 29 9 & CIIIESQRT(G*ABS) & S000710 \\
\hline & L & jrange reduction & 5000720 \\
\hline & R 5 & & \$000730 \\
\hline & F & ;SCALAR MANIPULATION OF EXPONENTS & 5000740 \\
\hline & c & & 5000750 \\
\hline & 6 & 1 & 5000760 \\
\hline & c & & S000770 \\
\hline & 83 & 18(2:1) = P/Q (3.660) & 5000780 \\
\hline & v 0,0.2.1.98,9.1.1 & PPxPOLY S Q=POLY & 5000790 \\
\hline & c & & 5000800 \\
\hline & \(v 0,0,0,1,98,20,2,2\) & ; R P P/Q & 5000810 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|}
\hline 0,0,1,1,98,18,3,1 & & 5000820 \\
\hline 2 & ITWO NEWTON ITERATIONS (14.60) & 5000830 \\
\hline \(\mathrm{v} 0,0,0,1,98,20,2,2\) & PX/Y(N) & S000840 \\
\hline v 0,0.1.1.98,18,3.1 & & 5000850 \\
\hline 0,0,2,1,98.12,2,1 & 3.5*(Y(N)*X/Y(N)) & 5000860 \\
\hline \(C\) C & & S000870 \\
\hline V. 0,0,1-1-98,9,2,1 & ;POST NORMALIZATION - ENO OF SQRT & S000880 \\
\hline \(\checkmark 0,0,2,1,98,12,3,1\) & - COEF= (T*T)*T & 000890 \\
\hline 5 & \$00 K6=1,5 & 000900 \\
\hline 0,0,3,1,98,1504*1 & iFz \((A * B) * C * 0\) & 000910 \\
\hline C & PEND OF FX SUBR & 000920 \\
\hline 5 & PPROICT : & 000930 \\
\hline V 0,0.3.1.98,15,3.1 & \((14+B) * C+D\) & 000940 \\
\hline 0,0,3,1,98,15,2,1 & ( \(A *\) ( \(T+C * 0)\) & 000950 \\
\hline \(C\) C & & 000960 \\
\hline \(v 0,0,0,1,98,20,2,2\) & \(1 /\) PaSS 1 & 000970 \\
\hline 0,0,1,1,98,18,3,1 & ( \(/\) PASS 2 RHOI=1/PRDICT & 000980 \\
\hline 0,0,2,1,98:9,3,2 &  & 000990 \\
\hline 0,0,2,1,98,9,3,2 & ; W=A"S S TaC* \({ }^{\text {c }}\) & 001000 \\
\hline 0,0,3,1,98,15,2,1 & ; USU+V"V & 001010 \\
\hline 0,0,2,1,98,9,2,1 & ; Wow-T & 001020 \\
\hline 0,0,2,1:98.9,2,1 & 1EI=0.50TbT & 001030 \\
\hline 0,0,2,1,98,15,2,1 & !P=A*B*EI & 001040 \\
\hline \(F\) F & IIF(J.NE,2) GO TO 100 & 001080 \\
\hline F & 3FF(U.LT.JE2) GO TO 35 & 001090 \\
\hline F & IFF(K.NE,KE2) GO TO 50 & 001160 \\
\hline c & [4 CONTINUE END OF N LOOP & 001190 \\
\hline c & 12 CONTINUE END OF J LOOP & 001230 \\
\hline c & ;1 CONTINUE END OF \(\times\) LOOP & 001240 \\
\hline E & IEND & 001350 \\
\hline
\end{tabular}

\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 7.7348901456800E-001 & VECBZ & 7.0556960045800E-001 \\
\hline CLKPO & 2398 & & \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER \\
\hline & 456436185400E-002 & MAPEZ 2.28517813860 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|c|c|}
\hline & \[
32768,16,3,16
\] & LYI & ROUTINE & \[
\begin{aligned}
& 000100 \\
& 000110
\end{aligned}
\] \\
\hline & 2 & & & 000120 \\
\hline \(F\) & & & IVIYI - SETUP PORTION & 000130 \\
\hline \(F\) & & & ISINGLE QUOTE KLEN & 000140 \\
\hline \(F\) & & & 1COLON = KLEN*JLEN/2 & \[
000150
\] \\
\hline \(V\) & 0.1,0.1.5000,20.1,2 & & \$0LD 008 LOOP & 000160 \\
\hline \(V\) & 1.2.1.1.5000,18,3.1 & & 1RHOI= & 000170 \\
\hline \(v\) & 2,0,3,1,5000,15.3.1 & & & 000180 \\
\hline \(V\) & 0,0,3,1,5000,15,3,1 & & & 000190 \\
\hline \(v\) & 0.0.2.1.5000:9,3,2 & & & 000200 \\
\hline \(v\) & 0,0.2,1,5000,9,2.2 & & & 000210 \\
\hline \(V\) & 0,0,3,1,5000,15,3,1 & & & 000220 \\
\hline \(V\) & 0,0,3,1,5000,15:4,1 & & 18 CONTINUE & 000230 \\
\hline R & \[
6
\] & & \[
3 M I S C
\] & 000240 \\
\hline M & 0,0.1.1.1.100 & & IUI*UI ETC & 000250 \\
\hline \(V\) & 0,0,1,1:100,9,1:1 & & & 000260 \\
\hline C & c 000101.100 .90101 & & & 000270 \\
\hline \(F\) & & & \(10 L O D O 10 L 00 P\) & 000280 \\
\hline \(F\) & & & ISUBR GI & 000290 \\
\hline M & 10,091:1.1:5000 & & 1 RMU & 000300 \\
\hline & ,0,0,2,1,5000,9,1,1 & & & 000310 \\
\hline & \(v 0,0,2,1,5000,12,2,1\) & & & 000320 \\
\hline & 0,0,3,1,5000,15,3,1 & & & 000330 \\
\hline & , 0,0.2,1,5000,9.2.1 & & & 000340 \\
\hline & \(v 0,0,3 \cdot 1,5000,15,3,1\) & & & 000350 \\
\hline & \(\vee 0,0,2,1,5000,12,3,1\) & & & 000360 \\
\hline & \(V 0,0,3,1,5000,15,3,1\) & & & 000370 \\
\hline & \(\vee 0,0,3,1,5000,15,3.1\) & & & 000380 \\
\hline & \(V 0,0,2,1,5000,12,3,1\) & & & 000390 \\
\hline & \(v 0,0,3,1,5000,15,3,1\) & & & 000400 \\
\hline & \(V 0.0,2,1,5000,9,3,1\) & & & 000410 \\
\hline & \(V 0,0,3,1,5000,15,3,1\) & & & 000420 \\
\hline & \[
R 3
\] & & * & 000430 \\
\hline & \(v 0.0,2,1,5000.12 .3 .1\) & & & \[
000440
\] \\
\hline & C \(V 0.0 .2,1.5000 .12 .3 .1\) & & & \[
000450
\] \\
\hline & \(v 0,0,2,1,5000,9,2,2\) & & ENO OF GI & 000460 \\
\hline & \[
R 3
\] & & & 000470 \\
\hline & V.0.0.2.1,5000,12,3.1 & & IOISX ETC & 000480 \\
\hline & \(V 0,0,3,1,5000,15,4.1\) & & & 000490 \\
\hline & \(v\) 0,0,3,1,5000,9,1,1 & & & 000500 \\
\hline & c \({ }^{\text {coses }}\) & & & 000510 \\
\hline & V 0,1,0.1,5000,2011.2 & & ; A/RHO & \[
000520
\] \\
\hline & \(v 1,0,1,1,5000,18,3.1\) & & & 000530 \\
\hline &  & & ISUBR DIAGON & 000540 \\
\hline & 84 & & & \\
\hline & \(\vee 0,0,2,1,5000,15,2,1\) & & & 000560 \\
\hline & C & & & 000570 \\
\hline & \(\vee 0,1,2,1,5000,9,4,2\) & & & \\
\hline & \(v 1,0,2,1,5000,9,2,2\) & & & 000590 \\
\hline & R 3 & & & 000600 \\
\hline & \(v 0.0 .3,1.5000,15.4 .1\) & & & 000610 \\
\hline & \(v 0,0,3,1,5000,15,3,1\) & & & \[
000620
\] \\
\hline & \(v 0\) (0,2,1,5000,9.3.1 & & & 000630 \\
\hline & \(C\) & & & 000640 \\
\hline & \(\vee 0.0 .3,1.5000 .15 .3 .1\) & , & & 000650 \\
\hline & \(V 0,0,2,1,5000,15,3,1\) & & & 000660 \\
\hline & \(v 0.0 .2 .1 .05000 .9 .3 .1\) & & & 000670 \\
\hline & F & & IEND OF DIAGON & \[
000680
\] \\
\hline & \(F\) & & 110 CONTINUE & 000690 \\
\hline & 96 & & & 000700 \\
\hline & \(v 0,0,2,1,100,9.3,1\) & & 1FFU= & 000710 \\
\hline &  & & & 000720 \\
\hline & \(\vee 0,0,3,1,100,15,3,1\) & & & 000730 \\
\hline & \(\vee 0,0,2,1,100,9,1,2\) & & & 000740 \\
\hline & F 0.0.2.1.100.9.1.2 & & 113 CONTINUE & 000750 \\
\hline & \(F\) & & IVLYI - SOLVER PORTION & 000770 \\
\hline & \(F\) & & 30OLLAR = JLEN/2 & 000780 \\
\hline & \(F\) & & ISINGLE QUOTE \(=\) KLEN & 000790 \\
\hline & \(F\) & & ;COLON = KLENAJLEN/2 & 000800 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|c|}
\hline & 7 & . CALL VTRIE (7 TIMES) & 000810 \\
\hline \(\cdots\) & 0.0.1.1.1.100 & & 000820 \\
\hline M & 0,0.1,1.1,100 & & 000830 \\
\hline R & 10 & 100 JmJSl, JE! & 000840 \\
\hline \(v\) & 0,1,2,1,100,9,3,1 & & 000850 \\
\hline \(v\) & 1,2,201,100.911,1 & & 000860 \\
\hline \(V\) & 2,3,0,1,100,20,1,2 & & 000870 \\
\hline \(v\) & 3,4,1,1,100,18,3.1 & & 000880 \\
\hline \(v\) & 4,0,2,1,100,15.2,2 & & 000890 \\
\hline \(Y\) & \(0,0,3,1,100,15,3,1\) & & 000900 \\
\hline c & & & 000920 \\
\hline \(R\) & 10 & \$00 J=JSl.JEl & 000930 \\
\hline \(F\) & & & 000940 \\
\hline \(V\) & 0,0,2,1,100,9,3,1 & & 000950 \\
\hline c & & & 000960 \\
\hline C & & IEND OF VTRI2 & 000970 \\
\hline \(v\) & 0,0,2,1,100,9,2,2 & & 000980 \\
\hline \(V\) & 0,0,1.1.100.9,1.1 & & 000990 \\
\hline \(\cdots\) & 0,0,1,1,1,100 & & 001000 \\
\hline M & 0,0,1,1,1,100 & & 001010 \\
\hline \(R\) & 3 & 10 O 0013 LOOP & 001020 \\
\hline \(v\) & 0,0,2,1,5000,15,2.1 & & 001030 \\
\hline \(v\) & 0,0,3,1,5000,15,3,1 & & 001040 \\
\hline \(V\) & 0,0,1,1,5000,9.1.1 & & 001050 \\
\hline \(v\) & 0,0,2,1,5000,9,2,1 & & 001060 \\
\hline c & & & 001070 \\
\hline \(V\) & 0,0,3,1,5000,15,3,1 & & 001080 \\
\hline \(v\) & 0,0.1,1,5000,9,201 & 313 CONTINUE & 001090 \\
\hline \(R\) & 2 & 30 O DO 14 LOOP & 001100 \\
\hline \(v\) & 0,0,3,1,5000,15,2,1 & & 001110 \\
\hline \(v\) & 0,0,3,1,5000,15,3,1 & & 001120 \\
\hline \(V\) & 0,0:3,1,5000,15,2.1 & & 001.130 \\
\hline \(v\) & 0,0,1,1,5000,9,1,1 & & 001140 \\
\hline C & & & 001150 \\
\hline \(V\) & 0,0,3,1,5000,15,3,1 & & 001160 \\
\hline \(V\) & 0,0,3.1.5000,15,2,1 & & 001170 \\
\hline \(V\) & 0,0.1.1.5000,901:1 & & 001180 \\
\hline V & 0,0,3,1,5000,1503,1 & & 001190 \\
\hline \(v\) & 0,0,3,1,5000,15,4,1 & & 001200 \\
\hline \(V\) & 0,0,3,1,5000,15,2,1 & & 001210 \\
\hline \(v\) & 0,0,3,1,5000,15,2,2 & ila continue & 001220 \\
\hline F & & ; A3*OY1 & 001240 \\
\hline F & & :-CRKNIS & 001250 \\
\hline \(R\) & 3 & & 001260 \\
\hline \(V\) & 0,1:3,1,100,15:3,1 & 1RMU*DY1*(UI-UI) & 001270 \\
\hline \(V\) & 1,0,2,1,100,12,2,1 & 1GUPK2: (A-B)*C ETC & 001280 \\
\hline c & & & 001290 \\
\hline \(R\) & 2 & & 001300 \\
\hline \(V\) & 0,1,2,1,100,9,2,1 & 10U-T*SV & 001310 \\
\hline \(v\) & 1,0,2,1,100,9,2,1 & :SECsSEC*A*B ETC & 001320 \\
\hline 6 & & & 001330 \\
\hline V & 0,0,1,1,100,9,2,1 & 1SBC(I,4) \(=A+B\) & 001340 \\
\hline \(F\) & & :FORTH*DY1 & 001350 \\
\hline \(v\) & \(0,1,3,1,100,15,3,1\) & ( \(\mathrm{VSQ}-\mathrm{VSQ}\) ) \({ }^{\text {c }}\) +FUSQ & 001360 \\
\hline \(v\) & 1,0,3.1.100,15,4,1 & IUSQ-USG+T*FVSQ & 001370 \\
\hline \(v\) & 0,1,3,1,100,15,3,1 & IDY \({ }^{\text {\# }}\) RMU* (WSQ=WSQ) & 001380 \\
\hline V & 0,0,3,1,100,15,3,1 & 1DYI*RK*(EII-EII) & 001390 \\
\hline \(F\) & & 1-COSTSG & 001400 \\
\hline V & 1,2.3.1.100,15.3.1 & : RMUO (-COSTSG)AT+S & 001410 \\
\hline V & 2,3,3,1,100,15:4,1 & : \(A+F W S Q=B+F\) & 001420 \\
\hline \(v\) & 3,0,2,1,100,9,2,1 & 1SBC(I,5)=A*日* \(C\) & 001430 \\
\hline \(F\) & & ; JADD: & 001440 \\
\hline \(F\) & & & 001450 \\
\hline V & 0,1,0,1,5000,20,1,2 & & 001460 \\
\hline \(V\) & 1,2,1,1,5000,18,3,1 & ( RHOI=1/PROICT & 001470 \\
\hline R & 3 & & 001480 \\
\hline \(V\) & 2,0,1,1,5000,9,2,1 & !U\#P*RHOI ETC & 001490 \\
\hline c & & & 001500 \\
\hline \(v\) & 0,1,3.1.5000,15,2,1 & ! \({ }^{* * 2 \text {-V**2 }}\) & 001510 \\
\hline \(\checkmark\) & 1,2,3,1,5000,15,2,1 & 1*.5*(T+W**2) & 001520 \\
\hline \(V\) & 2,3,2,1,5000,9,3,1 & 1EI=P*RHO*T & 001530 \\
\hline \(V\) & 3,0,2,1.5000.15.2.1 & 1PEA*B*C & 001540 \\
\hline
\end{tabular}


\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 1.0252380171560E+000 & VECBZ & 9.3682614968100E-001 \\
\hline CLKPD & 140233 & & \\
\hline MAPFP & 3.2089436825900E-005 & MAPBZ & 7.2137053984600E-002 \\
\hline
\end{tabular}


NUMBER ................... CONTENTS
CLKPD
IGIF

NUMBER ............... CONTENTS: CLKPO 16IF


\begin{tabular}{|c|c|}
\hline \multicolumn{2}{|l|}{M 0,0,2,1,23.48} \\
\hline M & 0.0.2.1.23.48 \\
\hline M & 0,0.2,1.23:96 ? \\
\hline M & 0i1,2,1.23,192 \\
\hline \(V\) & \(1 * 0,3,1,1344,9,4,1\) \\
\hline \(V\) & 0,0,3,1,1344,9,4,1 \\
\hline \(V\) & 0,0,2,1,1344,9,3+1 \\
\hline \(V\) & 0,0,2,1,1344,9,3,1 \\
\hline \(V\) & 0,0,2,1,1344,9,3.1 \\
\hline V & 0,0,2,1,1344,9,3,1 \\
\hline \(F\) & \\
\hline \multicolumn{2}{|l|}{\(F\)} \\
\hline 8 & 22 \\
\hline M & 0.0.2.1.11.48 \\
\hline M & 0,0.2.1,11,25 \\
\hline M & 0,0.2,1.11.96 \\
\hline M & \(0.2,2,1.11,768\) \\
\hline M & 3,0,1,1,1,400 \\
\hline F & \\
\hline \(V\) & 2,0,3,1:1344,9,4,1 \\
\hline \(V\) & 0,0ヶ3,1.1344,9.4.1 \\
\hline \(V\) & 0.0.3,1, 1344,9,4,1 \\
\hline \(V\) & \(0,0,3,1,1344,9,4,1\) \\
\hline \(V\) & 0,0,2:1.1344.9.3.1 \\
\hline \(V\) & \(0,0,2.1,1344.9 .3 .1\) \\
\hline \(V\) & \(0,0,2,1,1344,9,3,1\) \\
\hline \(V\) & 0,0,2:1.1344,9,3,1 \\
\hline \(V\) & \(0,3,2,1,1344,9,2,2\) \\
\hline \(V\) & 0,0,2,1,1344,9,2,2 \\
\hline \(V\) & 0,0,2,1,1344,9,2,2 \\
\hline \multicolumn{2}{|l|}{\(F\)} \\
\hline \(V\) & \(0,0,3,1,1344,9,4,1\) \\
\hline \(V\) & \(0,0,2,1,1344 \cdot 9,3,1\) \\
\hline \(V\) & 0,0,2,1,1344,9,3,1 \\
\hline \(F\) & \\
\hline \(V\) & 0,0,3,1,29568,9,4,1 \\
\hline \(V\) & \(0 \cdot 0 \cdot 3 \cdot 1 \cdot 29568 \cdot 9+4 * 1\) \\
\hline \(V\) & 0,4,2.1.29568:9,3,1 \\
\hline \(V\) & 0,0,2,1,29568,9,3,1 \\
\hline R & 22 \\
\hline M & 450.1 .1 .1 .200 \\
\hline V & 0:0:2,1:672,9*2:2 \\
\hline C & \\
\hline \(E\) & - \\
\hline
\end{tabular}


\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 1.1388200815660E+000 & VECBZ & 9.5469946957600E-001 \\
\hline CLKPD & 61590 & & \\
\hline MAPFP & 1.2203106020520E-003 & MAPBZ & 5.0836123116400E-001 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|c|}
\hline M & 32768.16.3.16 LINKHO & PARTIAL SIMULATION & 000090 \\
\hline F & & & 000100 \\
\hline \(F\) & LINGHO ROUTINE & & 000110 \\
\hline \(F\) & & & 000120 \\
\hline \(F\) & & & 000130 \\
\hline \(V\) & 0,1,1,1,3456,9,2,1. & & 000140 \\
\hline F* & -* & & 000150 \\
\hline \(F\) & & & 000160 \\
\hline \(F\) & & & 000170 \\
\hline \(v\) & 0,0.1:1.1152:9.1:1 & & 000180 \\
\hline \(F\) & & & 000190 \\
\hline \(F\) & & & 000200 \\
\hline \(F\) & & & 000210 \\
\hline \(F\) & . & & 000220 \\
\hline V & , 1,0,3,1,3456,15,4,1 & & 000230 \\
\hline \(F\) & & & 000240 \\
\hline \(F\) & - & & 000250 \\
\hline \(F\) & - & & 000260 \\
\hline \(F\) & & & 000270 \\
\hline \(v\) & 0,0,1,1,768,9,2,1 & & 000280 \\
\hline \(F\) & - & & 000290 \\
\hline \(F\) & - & & 000300 \\
\hline \(F\) & * & & 000310 \\
\hline \(F\) & - & & 000320 \\
\hline \(V\) & 0,0.1.1.384,9,2.1 & & 000330 \\
\hline \(F\) & & & 000340 \\
\hline \(\boldsymbol{F}\) &  & & 000350 \\
\hline \(F\) & - & & 000360 \\
\hline \(F\) & - & & 000370 \\
\hline \(v\) & 0,0,1,1,3456,9,2,1 & & 000380 \\
\hline \(F\) & - & & 000390 \\
\hline \(F\) & - & & 000400 \\
\hline \(E\) & & & 000405 \\
\hline \(F\) & & & 000410 \\
\hline \(F\) & - & & 000420 \\
\hline \(V\) & , 0,0,0.1,384,9,2,2 & & 000430 \\
\hline \(F\) & - & & 000440 \\
\hline \(F\) & 5. & & 000450 \\
\hline \(V\) & 0,0,0,1,384,9,1,1 & & 000460 \\
\hline \(F\) & F & & 000470 \\
\hline \(F\) & F & & 000480 \\
\hline \(E\) & E & & 000485 \\
\hline \(F\) & - & & 000490 \\
\hline \(F\) & - & & 000500 \\
\hline \(E\) & E & & 000505 \\
\hline \(v\) & \[
0,0,0,1,3456,20,2,2
\] & & 000510 \\
\hline V & v0,0.1,1.3456,18,3,1 & & 000520 \\
\hline \(F\) & - & & 000530 \\
\hline \(V\) & \[
v 0,0,2,1,4608,15,2,1
\] & & 000540 \\
\hline \(v\) & V0,0,1,1.4608,20,2,2 & & 000550 \\
\hline \(v\) & V0,0,0,1,4608,18,3,1 & & 000560 \\
\hline \(V\) & \(v 0,0,1,1,3456,20,2,2\) & & 000570 \\
\hline \(V\) & \(v 0,0,0,1,3456,18,3,1\) & & 000580 \\
\hline \(R\) & 88 & & 000590 \\
\hline \(F\) & - & & 000600 \\
\hline \(F\) & \(F\) & & 000510 \\
\hline \(F\) & F & & 000620 \\
\hline \(F\) & F & & 000530 \\
\hline \(F\) & \(F\) & & -000640 \\
\hline \(F\) & - & & 000650 \\
\hline \(F\) & F & & 000660 \\
\hline \(F\) & F & & 000670 \\
\hline \(F\) & F & & 000880 \\
\hline
\end{tabular}
\begin{tabular}{|c|c|c|}
\hline \(F\) & & 000490 \\
\hline \(F\) & & 000700 \\
\hline V & 0,0,3,1,384,15,1,1 & 000710 \\
\hline \(F\) & & 000720 \\
\hline F & & 000730 \\
\hline \(F\) & & 000740 \\
\hline \(F\) & & 000750 \\
\hline \(F\) & & 000760 \\
\hline \(V\) & 0,0.3.1.384.15.2.1 & 000770 \\
\hline \(F\) & & 000780 \\
\hline \(F\) & & 000790 \\
\hline \(V\) & 0,0.1,1.384,9,1.1 & 000800 \\
\hline \(V\) & 0.0.1.1.384,20.2,2 & 000810 \\
\hline \(V\) & 0,0,0,1,384,18,3,1 & 000820 \\
\hline C & & 000830 \\
\hline \(F\) & & 000840 \\
\hline \(F\) & & 000850 \\
\hline F & & 000860 \\
\hline \(V\) & 0,0:2,1,3072,12,3,1 & 000870 \\
\hline \(F\) & & 000880 \\
\hline \(V\) & 0,0,3,1,3072,15,4,2 & 000890 \\
\hline \(V\) & 0,0,2,1,3072,9,4, & 000900 \\
\hline \(F\) & & 000910 \\
\hline \(F\) & & 000920 \\
\hline \(V\) & 0,0,1,1,3072,20,2,2 & 000930 \\
\hline \(V\) & 0,0,0,1,3072,18,3,1 & 000940 \\
\hline \(F\) & & 000950 \\
\hline \(F\) & & 000960 \\
\hline \(F\) & & 000970 \\
\hline \(F\) & & 000980 \\
\hline \(V\) & 0,0,2,1,384,12,3,1 & 000990 \\
\hline \(F\) & & 001000 \\
\hline \(F\) & & 001010 \\
\hline \(F\) & & 001020 \\
\hline \(V\) & 0,0,3,1,384,12,2,1 & 001030 \\
\hline \(\checkmark\) & 0,0,2,1:384,12,3,1 & 001040 \\
\hline \(F\) & & 001050 \\
\hline F & & 001060 \\
\hline V & 0,0,2,1,384,12,3,1 & 001070 \\
\hline E & - & 001080 \\
\hline
\end{tabular}

\begin{tabular}{|c|c|c|c|}
\hline NUMBER & CONTENTS & NUMBER & CONTENTS \\
\hline VECFP & 1.1345783091280E+000 & VECBZ & 9.7103548979500E-001 \\
\hline
\end{tabular}


\section*{DIVISION 2}

THE THREE-DIMENSIONAL AERODYNAMIC IMPLICIT CODE


\subsection*{1.0 INTRODUCTION}

A final recoding of the Ames implicit code has been completed using the proposed extensions under consideration for the FMP; these extensions are included in the FMP FORTRAN Manual (Volume III) and are discussed tutorially in Division 1 of this volume. This report presents the coding strategy that was applied in recoding the implicit algorithm, to create an awareness of the factors that affect the performance of codes on the FMP. This is followed by a discussion of the recoding that was done. A performance analysis of the recoded version can be found in Division 1 of this volume.

To achieve the optimum performance of the FMP, the programmer must be aware of the two key characteristics of the machine architecture:
a. Memory hierarchy
b. Functional parallelism

The data content of the larger production runs of the 3-D model makes it necessary to allocate portions of the data base and working storage to each level of the memory hierarchy in the FMP. For example, a \(100 \times 100 \times 100\) mesh would entail the storage of 6 million flow variables ( \(Q\) matrix), 3 million coordinate variables ( \(X, Y\), and \(Z\) ), and 5 million intermediate results ( \(S\) matrix). The expansion of this data into working storage areas for the block tridiagonal solution requires \(25 \times 3 \times\) VL elements for the \(A, B\), and \(C\) arrays plus \(30 \times\) VL elements for the \(L, U\), and \(F\) arrays in the BTRI computation (VL = vector length). The engineering tradeoffs that have led to the current design of the FMP have dictated a maximum main memory configuration (using existing memory technologies) of 8 million 64-bit words. Decisions must be made as to where each of the major portions of the data in the implicit solution is to be retained. It does not seem feasible at this time for a compiler to make the allocation determinations automatically; thus the programmer must use the LEVEL statements to advise the compiler (and the system) as to the desired storage assignments for each block of data.
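The storage figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, the per-point counts (6 flow, 3 coordinate, 5 intermediate variables) and the BTRI factors are taken from the paragraph above, and the vector length used in the last line is an arbitrary assumption.

```python
def mesh_storage(n, flow=6, coord=3, interm=5):
    """Words needed for the Q, (X, Y, Z) and S arrays of an n x n x n mesh."""
    points = n ** 3
    return {"Q": flow * points, "XYZ": coord * points, "S": interm * points}


def btri_workspace(vl):
    """Working storage for the block tridiagonal solve at vector length vl."""
    abc = 25 * 3 * vl   # A, B and C arrays
    luf = 30 * vl       # L, U and F arrays
    return abc + luf


storage = mesh_storage(100)
total = sum(storage.values())
print(storage, total)           # 14 words per mesh point in total
print(btri_workspace(10_000))   # 105 words per vector element
```

The total of 14 words per mesh point and 105 temporaries per vector element are the same constants that appear in the slab sizing formulas of section 3.1.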

This becomes even more necessary as one contemplates the creation of even larger "research" codes with meshes on the order of tens of millions of points, since the third level of memory (LEVEL 3), which is made up of block-transfer-only Backing Storage, is then brought into play. The programmer must be aware not only of the relative storage capacities of each memory level, but also of the implications of using that level of memory during processing. The basic ground rules are:

\subsection*{2.1 MEMORY HIERARCHY}

\subsection*{2.1.1 MAIN MEMORY}
a. Arithmetic memory-to-memory operations can only be performed from Main Memory.
b. Effective rates for arithmetic access in 64-bit mode of Main Memory are four sets of eight operands transferred to the Vector Units every machine clock cycle, and one set of 8 results stored back to memory in the same clock cycle.
c. In 32-bit mode these rates are doubled (four sets of 16 operands input, one set of 16 results).
d. Depending on the operation being performed, one, two, or three arithmetic processes (ADD, SUBTRACT, MULTIPLY) can be accomplished per set of operands delivered to a Vector Unit, per clock cycle.
e. Concurrent with vector arithmetic the Main Map Unit can achieve a simultaneous processing rate of 8 64-bit input operands per clock cycle, while storing 8 operands in the same clock cycle.
f. Single-element access rates to Main Memory for scalar load/store or vector scatter/gather operations are one per clock cycle.

\subsection*{2.1.2 INTERMEDIATE MEMORY}
a. Data can be mapped in blocks to and from Intermediate Memory at a maximum rate of 8 elements every 3 clock cycles.
b. Single-word access rates are one element every 6 clock cycles.
c. No memory-to-memory arithmetic can be performed involving the Intermediate Memory.
d. Data transfers to and from Main Memory can proceed at the Intermediate Memory rates, and are fully concurrent with all Main Memory activities listed previously except other Main Map Unit operations.
e. The maximum configuration of Intermediate Memory is 32 million 64-bit words.

\subsection*{2.1.3 BACKING STORAGE}
a. Data can only be transmitted between Backing Storage and Intermediate Memory.
b. Data can only be accessed and transferred in integral 32,768-word blocks.
c. Access time per block is negligible since a data transfer begins as soon as a starting address (current address within the block) is transmitted to the backing storage controller (Swap Unit).
d. Transfer rates for Backing Storage can attain a maximum of 8 64-bit words every 16 clock cycles.
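The rates listed in 2.1.1 through 2.1.3 can be compared by asking how many clock cycles each path needs to move the same amount of data. The sketch below is a simplified model under stated assumptions: it ignores startup time and contention, and the dictionary keys and function name are hypothetical labels, not FMP terminology.

```python
from fractions import Fraction

# 64-bit words moved per machine clock cycle, as listed in 2.1.1 to 2.1.3.
WORDS_PER_CLOCK = {
    "main_map": Fraction(8, 1),             # Main Map Unit, block mapping
    "main_single": Fraction(1, 1),          # scalar load/store, scatter/gather
    "intermediate_block": Fraction(8, 3),   # 8 elements every 3 clocks
    "intermediate_single": Fraction(1, 6),  # 1 element every 6 clocks
    "backing_storage": Fraction(8, 16),     # 8 words every 16 clocks
}


def clocks_to_move(words, path):
    """Ideal clock cycles to move `words` words, ignoring startup time."""
    return Fraction(words) / WORDS_PER_CLOCK[path]


block = 32_768  # one Backing Storage block
for path in WORDS_PER_CLOCK:
    print(path, float(clocks_to_move(block, path)))
```

Exact fractions are used so that the ratios between levels stay visible: moving one block through Backing Storage takes 16 times as many clocks as mapping it within Main Memory.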

\subsection*{2.2 FUNCTIONAL PARALLELISM}

The memory system description above indicates the degree of concurrency available in the FMP. Maximum performance of the FMP is achieved by maximizing the overlap (or concurrency) of mapping operations (needed to organize data into efficient vectors) and the arithmetic processing. If one examines modern day scalar machines and FORTRAN object code, a "dual" can be found to this "scheduling" situation. Most high speed processors today possess the ability to engage in several simultaneous activities in order to attain high performance. In particular, the time required to access a data element in memory via a scalar load operation can be overlapped with the processing of other data that had been loaded previously. A great deal of work has been expended in compiler development to maximize the automatic scheduling of load, store, and arithmetic operations so that the arithmetic units are not left idle while waiting unnecessarily for the results of a load operation to be returned from memory.

This same approach is used by the programmer and compiler for the FMP. Data for one set of arithmetic operations can be "prepared" by the Map Units while the Vector Units are operating on a previously aggregated set of data. The degree to which these processes can be overlapped determines the extent to which the Vector Units can be kept busy. The objective for maximum FMP performance would be to keep the Vector Units \(100 \%\) active, performing triadic operations at every turn. This level of activity would yield an operation rate of 1.5 billion 64-bit floating-point operations per second, or 3 billion 32-bit floating-point operations per second. It follows that to attain a sustained rate of 1 billion floating-point operations per second in 64-bit mode, the combination of hardware, programmer, and compiler has to maintain a \(67 \%\) efficiency in the use of the Vector Units. Given the current state of the art of compilers, it can be stated firmly that the programmer must provide some assistance in the statement of codes in order to achieve the requisite efficiency.
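The peak rates and the \(67 \%\) figure can be reproduced from the operand rates given in 2.1.1. This is a rough sketch: the 16-nanosecond clock period is an assumption, inferred from the 48-nanosecond figure quoted in section 3.1 for a three-clock Intermediate Memory transfer, and the function names are hypothetical.

```python
def peak_flops(operand_sets_per_clock=8, ops_per_operand_set=3, clock_ns=16.0):
    """Peak rate when every operand set yields a triadic (three-op) result."""
    return operand_sets_per_clock * ops_per_operand_set / (clock_ns * 1e-9)


def required_efficiency(target_flops, peak):
    """Fraction of peak that must be sustained to reach target_flops."""
    return target_flops / peak


peak64 = peak_flops()                            # 8 operand sets per clock
peak32 = peak_flops(operand_sets_per_clock=16)   # rates double in 32-bit mode
eff = required_efficiency(1.0e9, peak64)
print(peak64, peak32, round(eff * 100))
```

Under these assumptions the 64-bit peak is 1.5 billion operations per second, so a sustained gigaflop corresponds to two-thirds utilization of the Vector Units.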

The coding strategy for the 3-D implicit code consists of the following general principles:
a. Allocation of all flow variables, coordinate arrays, and the intermediate (S) array to Intermediate Memory.
b. Reserving Main Memory for working storage and temporary holding areas for data being mapped to and from the Intermediate Memory.
c. Backing Storage usage to be invisible for this set of metrics, that is, relying on the Operating System to roll the entire job in and out of Backing Storage but no explicit data transfers during program execution.
d. Processing of "slabs", "chunks", or "pencils" of the data base at each step of the algorithm to maximize the vector lengths seen by the vector arithmetic operations.
e. Slab sizes to be limited by the available workspace in Main Memory.
f. Smaller problems can be run entirely in Main Memory as a single slab but the main program remains unchanged, provided declarations such as LEVEL are done dynamically.
g. When slabs or subsets of the major arrays are to be processed, they are explicitly described with subarray notation so that this usage is obvious to the reader and the compiler. This means that although the compiler may be able to discern a slab process from the construct
\[
\begin{aligned}
& \text { DO } 10 \mathrm{~J}=1, \mathrm{JMAX} \\
& \text { DO } 10 \mathrm{~L}=\mathrm{L} 1, \mathrm{~L} 2 \\
& \text { DO } 10 \mathrm{~K}=1, \mathrm{KMAX} \\
& 10 \quad \mathrm{RJ}=\mathrm{Q}(\mathrm{K}, \mathrm{L}, 6, \mathrm{J})
\end{aligned}
\]
the programmer should use the explicit notation
\[
\mathrm{RJ}=\mathrm{Q}(*, \mathrm{L}: \mathrm{LSL}, 6, *)
\]
which highlights the fact that a slab of \(Q\) is being used.

This notation not only makes the code clearer to the reader (particularly in a complex DO loop of several hundred lines), but also lets the compiler deal with this single statement as a single map function. Note the duality of this concept: in scalar mode the statement at 10 would result in a scalar load, while the subarray statement generates a map operation. Both the load and the map may be handled by the compiler in similar ways, in terms of scheduling the resulting object code for efficient execution.
h. Special-casing of subroutines is used rather than single, general-purpose routines. For example, the XXM, YYM, and ZZM subroutines contain data-dependent branch operations whose purpose is discernible at the time the program is being created. An in-line expansion of these routines in the STEP subroutine eliminates the need for these branches, since during STEP the execution of XXM, YYM, and ZZM is not data dependent.
i. Solution algorithms and methodology (other than the use of slabs) remain unchanged.
j. As much as possible, a line-by-line congruency is maintained between the original scalar coding and the FMP vectorized version. The major exceptions to this are the in-line incorporation of XXM, YYM, and ZZM, and the use of explicit notation for data structuring, restructuring, and transformation. The heavy use of the DEFINE and DYNAMIC FORTRAN extensions makes it possible to deal with the familiar scalar temporaries such as L11, L12, ..., and RJ, RR, U, ... as temporary vectors (or arrays).
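Principle g above can be illustrated by emulating FORTRAN column-major storage with a flat Python list. The sketch compares the scalar triple loop with a slab-style gather that copies one contiguous run per J plane. It assumes that \(Q\) is dimensioned (KMAX, LMAX, NVAR, JMAX) and that the slab spans planes L1 through L2; all function names are hypothetical, and the real map operation is performed by FMP hardware, not by a loop.

```python
def q_index(k, l, n, j, kmax, lmax, nvar):
    """Zero-based position of Q(K,L,N,J) in column-major (FORTRAN) order."""
    return (k - 1) + kmax * ((l - 1) + lmax * ((n - 1) + nvar * (j - 1)))


def gather_loop(q, kmax, lmax, nvar, jmax, l1, l2, n):
    """Scalar form: one load per trip through the triple DO loop."""
    out = []
    for j in range(1, jmax + 1):
        for l in range(l1, l2 + 1):
            for k in range(1, kmax + 1):
                out.append(q[q_index(k, l, n, j, kmax, lmax, nvar)])
    return out


def gather_slab(q, kmax, lmax, nvar, jmax, l1, l2, n):
    """Slab form: one contiguous run of KMAX*(L2-L1+1) words per J plane."""
    out = []
    run = kmax * (l2 - l1 + 1)
    for j in range(1, jmax + 1):
        start = q_index(1, l1, n, j, kmax, lmax, nvar)
        out.extend(q[start:start + run])
    return out


q = list(range(10 * 10 * 6 * 10))   # each element holds its own storage offset
loop_result = gather_loop(q, 10, 10, 6, 10, 2, 3, 6)
slab_result = gather_slab(q, 10, 10, 6, 10, 2, 3, 6)
print(loop_result == slab_result, len(slab_result))
```

Both forms touch the same elements in the same order; the slab form simply makes the block structure explicit, which is the property the subarray notation exposes to the compiler.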

\subsection*{3.0 THE STEP SUBROUTINE}

The most computationally intensive portion of the implicit code is found in STEP and the called subroutine BTRI. This set of programs has therefore received the most attention in developing language extensions and compiling strategies. The problem restatement will be examined as it impacts data movement and arithmetic in each of the three sweep directions.

\subsection*{3.1 SLABS}

The maximum vector processing is achieved by ensuring vector lengths of more than 1000 elements so that the effect of vector startup time is negligible. It is desirable to reiterate here that since there are effectively 8 Vector Units (four units, half-clocked), a vector of length 8 would be processed in one clock cycle, but with a startup time of six to nine clock cycles. The non-arithmetic overhead of such an operation would be \(600 \%-900 \%\) of the arithmetic processing time. At vector lengths of 1000 or greater this overhead amounts to \(.6 \%-.9 \%\), or less, of the arithmetic processing time.
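The relationship between vector length and startup overhead can be tabulated with a simple model. The sketch assumes, as stated above, that 8 elements are processed per clock cycle and that startup costs six to nine clock cycles; the function name is hypothetical and no other pipeline effects are modeled.

```python
def startup_overhead(vector_length, startup_clocks, elements_per_clock=8):
    """Startup time as a fraction of arithmetic time for one vector operation."""
    arithmetic_clocks = vector_length / elements_per_clock
    return startup_clocks / arithmetic_clocks


for n in (8, 1000, 8000, 60000):
    low = startup_overhead(n, 6)
    high = startup_overhead(n, 9)
    print(f"{n:6d} elements: {low:.2%} to {high:.2%}")
```

Under this deliberately simple model, the 0.6% to 0.9% range is reached once the arithmetic itself occupies about 1000 clock cycles (roughly 8000 elements), while a 1000-element vector comes out nearer 5% to 7%. Either way the trend is the same: the overhead falls in inverse proportion to the vector length.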

The first problem, then, is to systematically divide the mesh to be processed into chunks that can be fed to the arithmetic unit as long vectors. Obviously, if all flow variables could be held in Main Memory, the entire mesh could be processed as a single vector. For a cubic mesh of \(N \times N \times N\) dimensions, the data storage required would be
\(14 * N * * 3+105 * N * * 2\) elements
where:
\(N\) cubed is the number of mesh points, 14 is the number of variables to be stored per mesh point, 105 is the number of temporary variables per vector element, and \(N\) squared is the vector length (one plane of the cube).

Assuming an 8 million-word limit on contiguous high speed storage, then
\(14 * N * * 3+105 * N * * 2 \leq 8000000\)
and thus N can be approximately 80 and there will still be space left for incidental working storage for the solution process.
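The bound on \(N\) can be found by direct search. The sketch below uses the constants 14 and 105 and the 8-million-word limit from the text; the function names are hypothetical.

```python
def words_needed(n, per_point=14, temporaries=105):
    """Main Memory words for an n x n x n mesh processed one plane at a time."""
    return per_point * n ** 3 + temporaries * n ** 2


def largest_cube(limit=8_000_000):
    """Largest n whose storage requirement still fits within `limit` words."""
    n = 1
    while words_needed(n + 1) <= limit:
        n += 1
    return n


n_max = largest_cube()
spare = 8_000_000 - words_needed(n_max)
print(n_max, words_needed(n_max), spare)
```

The search stops at the last mesh size that fits, and `spare` is the space left over for incidental working storage.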

In such a case all memory map operations would proceed at a rate no slower than one element per clock cycle, and with a peak rate of 8 elements per clock cycle.

For meshes larger than \(80 \times 80 \times 80\), however, the computational variables must be held in Intermediate Memory rather than Main Memory because of the storage requirements for data and temporaries. In this case, any map operations involving Intermediate Memory would run slower than their counterparts in Main Memory. For example:
- Main Memory to Main Memory map operation, 8 64-bit words per clock
- Intermediate Memory to Main Memory map operation, 8 64-bit words per 3 clocks
- Intermediate Memory to Intermediate Memory operation, 4 64-bit words per clock.

In estimating performance, the problem arises as to what mix of data and memory should be used. That is, how much data should be stored in Main Memory and how much in Intermediate Memory? If for example, a mesh size of \(85 \times 85 \times 85\) were to be processed, all the flow variables could be retained in Main Memory and the \(X\), \(Y\), and \(Z\) matrices placed in Intermediate Memory. Then a process would be coded for 'slabbing' of the \(X, Y\), and \(Z\) from Intermediate to Main Memory as the computations proceeded. As the mesh dimensions change the amount of inter-memory mapping also changes, and thus the performance rates change.

To simplify the estimation process, it was decided to assume that all meshes are held in Intermediate Memory regardless of size. If the resulting simulations show that the performance of the FMP on these meshes is at least one gigaflop in 64-bit mode, it is obvious that the same problem, if held in Main Memory only, would run at least as fast.

To provide maximum overlap, space must be allocated in Main Memory not only for the slab being processed, but also the next slab of data being mapped in from Intermediate Memory. Thus the execution of line 930 (from the FORTRAN listing found in appendix B, Division 1)
\[
\mathrm{RJ}=\mathrm{Q}(2: \mathrm{KMAX}-1, \mathrm{L}: \mathrm{L}+\mathrm{LSL}-1,6, *)
\]
would initiate a map operation into a data area called RJ during its first pass. As soon as all flow variables have been mapped into Main Memory for the current value of \(L\), code generated by the compiler would perform the operation
\[
\mathrm{RJ}^{\prime}=\mathrm{Q}\left(2: \text { KMAX }-1, \mathrm{~L}+\mathrm{LSL}: \mathrm{L}+\left(2^{*} \mathrm{LSL}\right)-1,6, *\right)
\]
where RJ' is a data buffer area created by the compiler and invisible to the programmer. This map operation would be carried out during the arithmetic processing of the first batch of mapped data. At the next pass through the loop, the pointers to data area RJ would be modified (without programmer intervention) by the object code to point to the new data area RJ'. Likewise, the pointers to RJ' would be modified to point to the original data area RJ.
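The buffer exchange described above can be pictured with a small sketch. Python executes it sequentially, so the overlap of mapping and arithmetic is not modeled; only the exchange of buffer roles is. The names `process_slabs`, `map_in`, `current`, and `pending` are hypothetical stand-ins for RJ, RJ', and the compiler-generated map operation.

```python
def process_slabs(source, slab_len, compute):
    """Walk `source` slab by slab using two buffers that swap roles each trip."""

    def map_in(start):
        # Stands in for a map operation from Intermediate to Main Memory.
        return source[start:start + slab_len]

    results = []
    current = map_in(0)           # RJ: filled before the first trip
    start = slab_len
    while current:
        pending = map_in(start)   # RJ': on the FMP this overlaps the arithmetic
        results.append(compute(current))
        # Exchange the pointers; no data is copied between the two areas.
        current, pending = pending, current
        start += slab_len
    return results


sums = process_slabs(list(range(10)), 4, sum)
print(sums)
```

Only the first `map_in` call is exposed as startup cost; every later one is issued before the arithmetic on the current slab, which is where the hardware would hide it.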

This activity is similar to the technique used by multiregister machines and compilers that "prefetch" data into working registers during one trip through a DO loop, and move the data to a new register for processing during the next trip through the loop. This "invisible" allocation does impact the total main memory storage requirement however, and is a function of the slab size. Inversely, the slab size is a function of the available memory, the three mesh dimensions, and the sweep direction. The allocation of slab space is now examined for the J-sweep direction.

First a slab is made consisting of an integral number of planes (each consisting of JMAX*KMAX elements) as seen in figure 1. Vector lengths then are JMAX*KMAX*LSL elements where LSL is the number of planes in the \(L\) direction. Since working storage for the block tridiagonal is 105* (vector length), 105*JMAX*KMAX*LSL words of storage is required for temporary data.

Also, 14*KMAX*JMAX*LSL words of storage must be provided for the flow variables mapped to and from Intermediate Memory, plus an equal amount of storage for buffering the data for the next trip through the loop. The gross requirements for Main Memory are then

105*JMAX*KMAX*LSL+28*JMAX*KMAX*LSL=133*JMAX*KMAX*LSL
If for example, JMAX=KMAX=100, then \(1,330,000\) words of data would be required for each plane to be processed. Given a somewhat unintelligent and brute force data allocation to Main Memory, LSL would have to be about 6 in order to fit the problem within the 8 million words available in Main Memory. This would give vector lengths on the order of LSL*JMAX*KMAX \(=6 * 100 * 100=60,000\) elements, which is remarkably close to the maximum size of 65,536 elements for the FMP. This optimizes vector performance by minimizing startup time per element in the vector.
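The choice of LSL follows mechanically from the formula above. The sketch below takes the 105 temporaries, the doubled 14-variable buffers, the 8-million-word Main Memory, and the 65,536-element vector limit from the text; the function names are hypothetical.

```python
def words_per_plane(jmax, kmax, temporaries=105, flow_vars=14):
    """Main Memory words per plane: temporaries plus double-buffered variables."""
    return (temporaries + 2 * flow_vars) * jmax * kmax


def planes_per_slab(jmax, kmax, main_memory=8_000_000, max_vector=65_536):
    """Largest LSL that fits in Main Memory and within the vector length limit."""
    by_memory = main_memory // words_per_plane(jmax, kmax)
    by_vector = max_vector // (jmax * kmax)
    return min(by_memory, by_vector)


lsl = planes_per_slab(100, 100)
print(words_per_plane(100, 100), lsl, lsl * 100 * 100)
```

For the \(100 \times 100\) plane the memory limit and the vector length limit happen to give the same answer, which is why the resulting vector length sits so close to the hardware maximum.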

The corresponding slab sizes for the other sweep directions, JSL and KSL, are computed in the same manner. Note that the data allocation of arrays such as RJ differs in total size and dimensionality for each sweep direction (assuming JMAX, KMAX, and LMAX are not identical). Thus such array variables necessarily become DYNAMIC elements in this code.


Figure 1. Storage Allocation for the \(X\) Array in Intermediate Memory

The first discussion will be of the processing for the left-hand-side solution, followed by the subroutines RHS, VISRHS, and MUTUR.

The computation of the residue (lines 500 through 680) has been left intact, and in place. The compiler is capable of automatically vectorizing this segment of code. However, the processing rate is limited by the rate at which the \(S\) matrix elements can be transmitted from Intermediate Memory (hereafter called LEVEL 2). A better approach for this computation would be to embed the RESIDUE calculation within SMOOTH, where \(S\) array elements are mapped into Main Memory.

Processing of the flow variables in the J-sweep direction begins at line 730. Figure 1 gives the storage mapping of a specimen \(X\) array of dimension \(10 \times 10 \times 10\) to demonstrate the method of "slabbing" the data. Elements of the \(X\) array are stored sequentially in physical memory from 000 to 999. To improve vectorization, the metric computation XXM is recoded in-line from lines 930 through 1100. Three intermediate variables not found in the original XXM routine (XKL, YKL, and ZKL) are created to hold the mapped \(X\), \(Y\), and \(Z\) slabs. This is necessary in order to minimize the number of times map operations are performed, as well as to provide the buffer space which holds mesh data for the differencing operations (lines 970 through 1020).

The statements
\[
\begin{aligned}
& \mathrm{XKL}=\mathrm{X}(*, \mathrm{L}-1: \mathrm{L}+\mathrm{LSL}, 2: \mathrm{JMAX}-1) \\
& \mathrm{YKL}=\mathrm{Y}(*, \mathrm{L}-1: \mathrm{L}+\mathrm{LSL}, 2: \mathrm{JMAX}-1) \\
& \mathrm{ZKL}=\mathrm{Z}(*, \mathrm{L}-1: \mathrm{L}+\mathrm{LSL}, 2: \mathrm{JMAX}-1)
\end{aligned}
\]
produce map operations which are of the "prefetch and buffer" type described previously. This means that there are actually two buffer areas for each variable XKL, YKL, and ZKL, one of each "invisible" to the programmer. Note that the slab being moved for these variables is LSL+2 planes in size. This is due to the need for the adjacent differencing used in the metric computation. If LSL is 6, thus requiring 17 trips through the loop (16 trips for LSL planes and 1 trip for 4 planes), then 2 extra planes of data will be moved at each trip through the loop. The effect of this is moving \(2 * 17+\) LMAX planes of data.
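The trip count and the extra data movement can be worked out as follows. The sketch assumes LMAX = 100 (consistent with the 16 + 1 trips quoted above) and two extra planes per trip; the function name is hypothetical.

```python
def sweep_traffic(lmax, lsl, extra_planes=2):
    """Return (trips through the loop, total planes moved) for one sweep."""
    full_trips, leftover = divmod(lmax, lsl)
    trips = full_trips + (1 if leftover else 0)
    planes_moved = lmax + extra_planes * trips
    return trips, planes_moved


trips, planes = sweep_traffic(100, 6)
print(trips, planes, planes - 100)
```

The difference between `planes` and LMAX is the redundant traffic caused by re-fetching the neighboring planes needed for adjacent differencing.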

This extraneous movement could be reduced by explicitly programming the retention of the LSL+2 plane before commencing the next trip through the loop. This would require additional thought and data movement in the existing program. Instead, the brute force approach of moving the extra data has been taken to keep the program as similar to its scalar counterpart as possible. The overlapping of map operations ensures that this extra data motion is hidden by the computations being done in the loop. The only penalty for this technique then becomes the initial overhead in getting the first slabs of data ready for the first trip through the loop. In the worst case this additional burden becomes 2*KMAX*JMAX = 2000 elements, which are moved at a rate of 8 per 48-nanosecond period, or 12 microseconds per sweep per trip (time step) through the entire STEP subroutine.

Although a \(10 \times 10 \times 10\) mesh could fit in Main Memory, those dimensions will be used for this part of the discussion while referring to the FORTRAN listing (appendix B, Division 1) for the metric. Thus some of the figures which follow can be used to aid the illustration of the coding of the J-sweep.
Therefore:
JMAX = KMAX = LMAX = 10 and LSL = 2

Figure 2 shows the storage of the XKL array after executing the statement at line 940:
\[
\mathrm{XKL}=\mathrm{X}(*, \mathrm{L}-1: \mathrm{L}+\mathrm{LSL}, 2: \mathrm{JMAX}-1)
\]

This statement becomes a map operation which can be called Gather Record wherein LSL+2 columns of KMAX operands are gathered JMAX times. The resulting data is moved to Main Memory and appears as the slab in figure 2. Note that the dimension of this temporary mesh is KMAX*JMAX* \((2+\mathrm{LSL})\).

In statement 970
\[
\mathrm{XK}=(\mathrm{XKL}(3: \mathrm{KMAX}, 2: \mathrm{LSL}+1, *)-\mathrm{XKL}(1: \mathrm{KMAX}-2,2: \mathrm{LSL}+1, *)) * \mathrm{DY} 2
\]
the entire slab is processed by a dyadic arithmetic operation. The vector length for this operation is JMAX*(KMAX-2)*LSL elements. The compiler produces object code which computes the proper starting addresses for the offsets needed to compute the adjacent differences. A subsequent map operation is also generated which compresses the \(K=1\) and \(K=\mathrm{KMAX}\) elements from the array. This operation proceeds in parallel with the arithmetic statement 980. The result is shown in figure 3, where the final slab of dimension (KMAX-2)*LSL*JMAX is stored. From this point up to statement 1520 all slabs processed are conformal and have the same dimensions.
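One plausible reading of statement 970 is a single long vector subtraction on two offset views of the slab, followed by a compress that discards results whose operands straddle a column boundary. The sketch below models that reading on a flat list holding whole columns of KMAX values; the function name is hypothetical, and the actual address arithmetic generated by the compiler is not described in this report.

```python
def k_difference(slab, kmax, dy2):
    """Central difference along K over a flat, column-major slab."""
    # One long dyadic vector operation on two views offset by two elements.
    raw = [(slab[i + 2] - slab[i]) * dy2 for i in range(len(slab) - 2)]
    # Compress: keep only results whose two operands lie in the same column.
    return [v for i, v in enumerate(raw) if i % kmax < kmax - 2]


# Two columns of KMAX = 4 values each.
xkl = [0, 1, 4, 9, 10, 20, 40, 80]
xk = k_difference(xkl, 4, 0.5)
print(xk, len(xk))
```

Each column of KMAX values yields KMAX-2 differences, so the compressed result has the (KMAX-2) leading dimension that the later statements rely on.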


Figure 2. The J-sweep XKL, YKL, and ZKL Arrays in Main Memory


Figure 3. The J-sweep \(XK\), \(YK\), and \(ZK\) Arrays in Main Memory

Statements 930, 1140 through 1170, and 1210 perform gather operations on operands in the \(Q\) matrix. Figure 4 shows the storage layout of the \(Q\) matrix in physical memory. Note that element 600 of the array is the first element in the \(\mathrm{J}=2\) segment of the \(Q\) array. This is due to the storage allocation that makes \(Q(K, L, 1, J)\) the density component of the flow variables at the point (K,L,J) in the mesh. Thus the statement (line 930)
\[
\mathrm{RJ}=\mathrm{Q}(2: \mathrm{KMAX}-1, \mathrm{L}: \mathrm{L}+\mathrm{LSL}-1,6, *)
\]
results in a map operation that gathers KMAX-2 elements from each column for LSL columns, JMAX times. Figure 5 shows the resulting slab as it would appear in Main Memory, with each element labeled with its original sequential storage location within the \(Q\) array in LEVEL 2 memory. Thus the first element moved from the \(Q\) matrix would be
\[
\mathrm{RJ}(1)=\mathrm{Q}(2,2,6,1)
\]
or element 511 in the linear storage of the array. Note that the entire slab can be processed as a single vector by all statements down to statement 1520.
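The element numbers quoted above follow from column-major addressing of \(Q\) with dimensions \(10 \times 10 \times 6 \times 10\). The sketch below recomputes them and lists the storage locations that the line 930 gather would visit for L = 2 and LSL = 2. The function names are hypothetical, and the zero-based offset convention is an assumption chosen because it matches the figures 600 and 511 in the text.

```python
def q_offset(k, l, n, j, kmax=10, lmax=10, nvar=6):
    """Zero-based storage offset of Q(K,L,N,J) in column-major order."""
    return (k - 1) + kmax * ((l - 1) + lmax * ((n - 1) + nvar * (j - 1)))


def rj_labels(l, lsl, jmax=10, kmax=10):
    """Offsets gathered by RJ = Q(2:KMAX-1, L:L+LSL-1, 6, *), in gather order."""
    labels = []
    for j in range(1, jmax + 1):           # JMAX times
        for ll in range(l, l + lsl):       # LSL columns
            for k in range(2, kmax):       # KMAX-2 elements per column
                labels.append(q_offset(k, ll, 6, j))
    return labels


labels = rj_labels(2, 2)
print(q_offset(1, 1, 1, 2), labels[0], len(labels))
```

The list `labels` plays the role of the element numbers drawn in figure 5: the order of the list is the order in which the gathered slab is laid out in Main Memory.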

The effect of some of the FORTRAN extensions on this sequence of code is now examined briefly. Statements 930 through 1020 make use of the explicit subarray notation described in the extension specification to define not only the slab to be processed but also the offsets
\[
\begin{aligned}
& \text { (3:KMAX... } \\
& \text { (1:KMAX-2... } \\
& \text {.... } 3: L S L . . \\
& \text {....1:LSL-2.... }
\end{aligned}
\]
necessary for the adjacent differencing operation. The variables RJ, XKL, YKL, ZKL, XK, YK, ZK, XL, YL, and ZL are defined as DYNAMIC. Thus they take on the dimensionality of the right-hand expression, and their storage is allocated dynamically at object time.

Statements 1030 through 1060 then compute with these DYNAMIC variables producing arrays for each of the DYNAMIC ARRAY variables \(XX(1), XX(2), XX(3)\), and \(XX(4)\). In this case the DYNAMIC ARRAY XX will contain 4 sets of pointer information, with dimensionality established at object time and storage allocation also defined during execution of the statements. The DYNAMIC ARRAY variable \(D(i, j)\) is handled in much the same way.


Figure 4. Q Matrix


Figure 5. The J-sweep RJ Matrix in Main Memory

At statement 1270 it is necessary to insert an explicit DEFINE statement
```

DEFINE (D(1,5),(1:KMAX-2,1:LSL-2,1:JMAX-2))

```
for although all other pointer data in the DYNAMIC ARRAY D is implicitly defined by the associated arithmetic statements, \(D(1,5)\) is not. For example, in
\[
\mathrm{D}(1,2)=\mathrm{XX}(1) * \mathrm{HDX}
\]
\(XX(1)\) is a slab (from figure 5) of dimensions (KMAX-2), LSL, and (JMAX-2), so \(D(1,2)\) takes on these identical characteristics. \(D(1,5)\), however, has not been given any such characteristics, and the desire is to fill this variable with scalar zeroes. The DEFINE statement explicitly establishes the necessary relationships and then statement 1280
\[
D(1,5)=0
\]
performs the filling of a created slab with the needed zeroes.

Statements 1520, 1530, and 1540 perform arithmetic operations on the slab called RJ. By simply adjusting the starting address and field lengths at object time, these operations yield new slabs with dimensions (KMAX-2)*LSL*(JMAX-2) without the need for any intervening map operations. The DO loops commencing at 1550 and continuing through 1650 are a slight restatement of the original scalar loops. This restatement is partly for convenience and expedience, and partly to make visible the power of the DYNAMIC ARRAY and DEFINE constructs in the FORTRAN extensions.

The purpose of statement 1570 is the same as that of 1270: to assign a set of attributes to all DYNAMIC ARRAY elements \(B(n, m)\), since the dimensionality is not established implicitly.

The purpose of statement 1580
\[
\text { DEFINE (D1, D(N,M)) }
\]
is to establish the characteristics of a single DYNAMIC variable D1 so that they can be modified with offsets in statement 1590:
\[
\mathrm{A}(\mathrm{N}, \mathrm{M})=-\mathrm{D1}(*, *, 1: \mathrm{JMAX}-2)
\]

This is necessary since DYNAMIC POINTERS appearing in assignment statements such as 1590 cannot have any modifiers included, but DYNAMIC VARIABLES such as D1 can possess a modified set of subscript notation such as shown here.

The result of these loops is to establish 25 pointers in the DYNAMIC ARRAYS \(A, B\), and \(C\) which point to vectors of data (KMAX-2)*LSL elements in length. The BTRI subroutine then operates on these vectors as if each were in fact a scalar quantity in the original implicit code. In this manner the BTRI algorithm can be kept intact as provided by Ames.

The setting up of the input arrays in BTRI (line 4120) is discussed briefly next.

\subsection*{3.3 BTRI}
The DYNAMIC ARRAYS (pointer data) A, B, and C are passed to BTRI through COMMON. Notice that for every pointer in \(A\) a set of attributes exists which describes the vectors created in STEP. Each vector possesses three dimensions (K, L, J). At the beginning of the \(L-U\) decomposition the \(J=1\) element should be extracted from each vector. This is done as in statement 4300
\[
\text { DEFINE (B1(N,M), B(N,M)(*,*,1)) }
\]
where a dummy DYNAMIC ARRAY B1 is created whose pointers each point to the \(J=1\) element of the corresponding \(B(N, M)\). Note that this is the only construct wherein DYNAMIC ARRAY pointers can be redefined. The vector length of operations upon \(B(N, M)\) would be (KMAX-2)*LSL*(JMAX-2) elements whereas vector operations on B1(N,M) would involve lengths of (KMAX-2)*LSL elements. This is the effective length of most vector operations in the balance of the BTRI routine.

This technique is found again at statement 4870 and beyond:
\[
\text { DEFINE (A1(N,M), A(N,M)(*,*,I)) }
\]
In all of the computations in BTRI no map operations are required, just the recomputation of pointers and lengths at object time.
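The idea that a DEFINE of this kind only recomputes a pointer and a length, without moving data, can be sketched as follows. This is a hypothetical model, not the FMP implementation: the class `SlabPointer`, its fields, and the sizes KMAX = 100, LSL = 6, JMAX = 100 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlabPointer:
    """Hypothetical descriptor: start offset plus extents of a contiguous slab."""
    start: int    # zero-based offset of the first element in main memory
    k_len: int
    l_len: int
    j_len: int

    @property
    def length(self):
        """Vector length seen by an operation on the whole slab."""
        return self.k_len * self.l_len * self.j_len

    def plane(self, j):
        """Descriptor for the single plane J = j (1-based); no data is moved."""
        if not 1 <= j <= self.j_len:
            raise IndexError("J plane out of range")
        plane_size = self.k_len * self.l_len
        return SlabPointer(self.start + (j - 1) * plane_size,
                           self.k_len, self.l_len, 1)


KMAX, LSL, JMAX = 100, 6, 100   # assumed sizes, for illustration only
b = SlabPointer(start=0, k_len=KMAX - 2, l_len=LSL, j_len=JMAX - 2)
b1 = b.plane(1)                 # plays the role of B1(N,M) = B(N,M)(*,*,1)
print(b.length, b1.length)
```

Because each J plane of the slab is contiguous, selecting a plane changes only the start offset and the length, which is consistent with the statement that no map operations are needed inside BTRI.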

Other than the operation on vectors rather than scalars, BTRI appears almost identical to its scalar version in the original code.

On return from BTRI the elements of the \(S\) matrix are updated with the tridiagonal solution (see statement 1730)
\[
\begin{aligned}
& \text { DO } 24 \mathrm{~N}=1,5 \\
& \text { S1(N) = F1(N) } \\
& 24 \text { CONTINUE }
\end{aligned}
\]
where the pointer data in the DYNAMIC ARRAYS S1 and F1 were established in statements 870 and 880:
\[
\begin{aligned}
& \text { DO } 5 \mathrm{~N}=1,5 \\
& 5 \text { DEFINE (S1(N), S(2:KMAX-1, L:L+LSL-1, N, 2:JMAX-1)) }
\end{aligned}
\]

\subsection*{3.4 K AND L SWEEPS}

The memory mapping of arrays during the J-sweep has been diagrammed in what may seem excessive detail. This was done to show the basic organization of data that is efficient for processing by the heavily compute-bound BTRI subroutine. Since vectors are linearly stored, the statements
\[
\begin{aligned}
& \text { DIMENSION A(100,100,100) } \\
& \text { A(*,*,1) = 2*A(*,*,1) }
\end{aligned}
\]
would result in the generation of a vector multiply operation of length \(100 \times 100\), beginning at \(J=1\). The statement
\[
A(*, 1, *)=2 * A(*, 1, *)
\]
would result in a series of 100 vector operations (one for each \(J\)), each of length 100, and
\[
A(1, *, *)=2 * A(1, *, *)
\]
would result in a series of \(100 \times 100\) scalar operations for \(K=1\).

It can thus be seen for subscript notations of
\[
A(K, L, J)
\]
that processes applied one at a time to each J index would yield vectors of length KMAX*LMAX. The J-sweep direction can therefore be vectorized with ease since the "natural" vectors are already stored in physical memory in the most efficient way.
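The three cases above can be checked with a small sketch. It assumes a simplified rule for column-major storage: the leading run of `*` subscripts forms one contiguous vector, and every remaining `*` subscript multiplies the number of separate operations. The function name `vector_runs` is hypothetical and models only this counting, not the compiler.

```python
def vector_runs(dims, section):
    """Return (number of vector operations, length of each) for an array section.

    `section` holds "*" for a full subscript range or an integer for a fixed
    subscript, in FORTRAN (column-major) subscript order.
    """
    run_length = 1
    axis = 0
    # Leading full ranges are contiguous in memory and fuse into one vector.
    while axis < len(dims) and section[axis] == "*":
        run_length *= dims[axis]
        axis += 1
    # Any later full range repeats the operation once per index value.
    count = 1
    for ax in range(axis, len(dims)):
        if section[ax] == "*":
            count *= dims[ax]
    return count, run_length


DIMS = (100, 100, 100)   # A(K, L, J) as in the DIMENSION statement above
print(vector_runs(DIMS, ("*", "*", 1)))   # A(*,*,1)
print(vector_runs(DIMS, ("*", 1, "*")))   # A(*,1,*)
print(vector_runs(DIMS, (1, "*", "*")))   # A(1,*,*)
```

The results are one operation of length 10,000, one hundred operations of length 100, and 10,000 operations of length 1, matching the three cases described in the text.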

The K-sweep is not as efficient, since vectors are stored sequentially beginning with \(K=1, K=2, \ldots, K=\mathrm{KMAX}\); then for \(L=2\) they continue in storage for \(K=1, \ldots\), etc. In the \(K\)-sweep direction, however, the interest is not in vectors in the \(K\) direction; instead, for each \(K\), a vector of length at least LMAX*JMAX is desired. To achieve this a method must be contrived to "transpose" the data from the flow and coordinate meshes so that the LMAX*JMAX vectors result. Using figure 1 to represent the original LEVEL 2 storage of the \(X\) matrix, figure 6 then shows the desired storage of the XJL array (slab) in Main Memory.


Figure 6. The K-sweep Matrix XJL in Main Memory

Following the previous representation of physically sequential storage of data by vertical column, with each vertical column contiguous by plane, figure 6 shows the new positions of \(X\) array elements as they would appear in Main Memory. Each block in figure 6 contains the original sequential number of the data as it appeared in LEVEL 2 memory. Note that the subscript axis (upper left-hand corner of the figure) shows the new directions implied by the subscripts \(K, L\), and \(J\).
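The reason a transposition is needed for the K-sweep can be seen from the storage strides. The sketch below assumes illustrative extents KMAX = LMAX = 10 and zero-based column-major offsets; the helper `linear_offset` is hypothetical.

```python
def linear_offset(k, l, j, kmax, lmax):
    """Zero-based column-major offset of X(k, l, j) with 1-based subscripts."""
    return (k - 1) + (l - 1) * kmax + (j - 1) * kmax * lmax


KMAX, LMAX = 10, 10   # assumed illustrative extents

# J-sweep style access: one stored column, K varies fastest.
column = [linear_offset(k, 2, 1, KMAX, LMAX) for k in range(1, KMAX + 1)]
# K-sweep style access: K held fixed while L varies.
row = [linear_offset(3, l, 1, KMAX, LMAX) for l in range(1, LMAX + 1)]

column_strides = {b - a for a, b in zip(column, column[1:])}
row_strides = {b - a for a, b in zip(row, row[1:])}
print(column_strides, row_strides)
```

With K fixed, successive elements lie KMAX locations apart, so they cannot be fetched as one contiguous run; this is the access pattern that the transposed slab of figure 6 is meant to remove.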

To achieve this transposition, the map operations of data from LEVEL 2 memory must be described differently than were coded in the first (\(J\)-sweep) phase of this code. The transposition is explicitly described by statements 1900 through 2020, wherein RJ, XJL, YJL, ZJL, and the transposed arrays Q1 through Q5 are created. Taking the XJL map operation from statement 1920
\[
\mathrm{XJL}(1: \mathrm{LMAX}, 1: \mathrm{JSL}+1, \mathrm{K})=\mathrm{X}(\mathrm{K}, 1: \mathrm{LMAX}, \mathrm{J}-1: \mathrm{J}+\mathrm{JSL})
\]
it can be seen that for every trip through the loop, K will be advanced by one element. The statement shown becomes then:
"For each K, transfer a vector of length LMAX*(JSL+2) to the array XJL".

The map operation called for can be accomplished with a single map function which will not only move the requisite vector but will perform the move for KMAX times. Note that this map operation must make single element references to LEVEL 2 storage for each element of the vector in the L and J directions. This requires 6 clock cycles per reference instead of the \(3 / 8\) of a cycle used for the corresponding map operations in the J-sweep direction. The key to the FMP achieving its performance goals is tied directly to the ability of the compiler to overlap this "slow transpose" with computations. A rough estimate suffices to show that for STEP this overlap is achievable in principle:

Elements to be moved from LEVEL 2 per mesh point:
```
a. Q1 through Q5 = 5 elements
b. X, Y, Z = 3 elements
c. Additional data for adjacent differencing = 2/LSL = 2/6 = 1/3 element
d. S1 through S5 = 5 elements
e. Update of S1 through S5 = 5 elements
```


Six clock cycles are required per element moved, so moving the 18.3 elements takes \(6 * 18.3=110\) clock cycles per mesh point. The peak 64-bit arithmetic rate is 24 results per clock cycle; thus in 110 clock cycles the Vector Unit could produce \(24 * 110=2640\) results maximum, or \(8 * 110=880\) results minimum. To achieve complete overlap of the mapping functions necessary in the K-sweep direction, then, at least 880, or at most 2640, different arithmetic operations would have to be performed on each grid point of the whole mesh. The heart of the BTRI subroutine itself yields many more arithmetic operations than 2640, so that if the compiler can properly schedule the respective map operations, almost all transpose operations can be overlapped.
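The rough estimate above can be reproduced with exact fractions. The variable names are illustrative only; the figures (LSL = 6, six cycles per single-element reference, and 8 or 24 results per clock cycle) are those quoted in the text.

```python
from fractions import Fraction

LSL = 6   # planes per slab, as in the 2/6 term of the estimate

# Items a through e of the estimate, in elements per mesh point.
elements_per_point = 5 + 3 + Fraction(2, LSL) + 5 + 5   # 18 1/3 elements
cycles_per_element = 6                                  # single-element references

map_cycles = elements_per_point * cycles_per_element    # cycles spent mapping
results_max = 24 * map_cycles   # three arithmetic operations per cycle group
results_min = 8 * map_cycles    # one arithmetic operation per cycle group

print(map_cycles, results_min, results_max)
```

The values come out to 110 cycles, 880 results, and 2640 results, in agreement with the figures in the paragraph.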

Once the transpose operations are completed (by the time statement 2090 is reached) the resulting main memory slabs can be processed identically by the same AMATRX, FILTRY, FILTRZ, and BTRI functions as were used in the J-sweep direction. These functions are retained as in-line code to maintain congruency with the original scalar code, as well as to permit the compiler to perform more intelligent scheduling of the map operations.

The L-sweep direction is handled in exactly the same manner as the K-sweep with a transpose operation called out at the beginning of the loop and a transpose operation required for the updating of the \(Q\) matrix in LEVEL 2 storage after computations are complete. Figure 7 shows the corresponding storage of the XLJ array in Main Memory for this sweep direction. Note again the revised subscript directions in the upper left corner of the figure. As in the K-sweep direction, once the transpose has been completed the computations on the main memory slabs are identical to those in the \(J\)- and \(K\)-sweep portions of the STEP code.

Two major differences should be noted however. One is the retention of the originally mapped \(Q\) array data in a main memory buffer until it is updated in statements 3980 through 4080. The other is the retention of the Q1 through Q5, \(D(1,1)\) through \(D(1,4)\), and \(Z Z(1)\) through \(Z Z(4)\) temporary arrays for use in VISMAT so that no additional map operations are required in that routine.

The transpose operation in the L-sweep direction proceeds at the same rate as the simple map operations in the \(J\)-sweep portion of the program. At a maximum, 8 elements are moved every 3 clock cycles (see description of gather in section 2.1.3, Division 1, this volume). This is 16 times faster than the single-element transfers required for the K-sweep direction. Thus only 165 arithmetic operations are required to ensure overlap of processing in the Vector Unit and Map Unit for those sweeps. This number of vector operations is much less than the nearly 4000 operations that appear in each sweep direction in STEP.
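The factor of 16 and the figure of 165 operations follow directly from the two map rates. A minimal check, using only numbers quoted in the text (variable names are illustrative):

```python
from fractions import Fraction

single_element_cycles = Fraction(6)     # cycles per element, K-sweep transpose
gather_record_cycles = Fraction(3, 8)   # cycles per element at 8 elements per 3 cycles

speedup = single_element_cycles / gather_record_cycles
ops_needed = Fraction(2640) / speedup   # scale the K-sweep upper bound by the speedup

print(speedup, ops_needed)
```

This gives a ratio of 16 and an overlap requirement of 165 arithmetic operations per mesh point.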


Figure 7. The L-sweep Matrix XLJ in Main Memory

\subsection*{4.0 THE RIGHT-HAND SIDE COMPUTATION}

The "slab" technique used in the left-hand side computation is not as efficient in machine usage when employed for the right-hand side (RHS). The major reason for this is that the amount of arithmetic computation required for the right-hand side is substantially less than that needed in the left-hand solution, and data mapping for the right-hand side becomes the governing commodity in the calculation rate.

As an example, assume that the throughput rate for the right-hand side is limited to the map rate for the data to be used for computations. The data to be moved would be
```
a. 3 X, Y, Z quantities
b. 5 Q mesh quantities
c. 5 S matrix quantities (moved back to Intermediate Memory)
```
for the first (L direction) pass. In order to provide the data for the adjacent differencing needed in metric calculations, 2 additional planes of data must be moved for each slab. Thus if a slab consists of 6 planes, then \(2 / 6\) more data is moved than is required for the final computation. This extra data is moved from the \(X, Y\), and \(Z\) meshes, so that \(3 * 2 / 6=1\) more element of data on the average needs to be moved per mesh point. In the first pass then, \(3+6+5+1=15\) elements are moved per mesh point calculated on. The time needed to move this data from Intermediate Memory is
\[
15 * 3 / 8 \text { clock cycles }=6 \text { clock cycles per mesh point. }
\]
For the second pass (K direction) the 15 data elements already described must be moved plus retrieving the intermediate \(S\) mesh elements for a total of \(15+5=20\) elements. The rate for this move is the single-element map rate of one element every 6 cycles. Thus the per-point cost of mapping is
\[
20 * 6=120 \text { clock cycles. }
\]

The third pass (J direction) requires the transfer of 20 elements at the gather-record rate of \(3 / 8\) clock cycle per element. The cost of this map per mesh point then is
\[
20 * 3 / 8=8 \text { clock cycles. }
\]

The total number of cycles committed to data mapping for the right-hand side (excluding viscosity calculations) is
\[
6+120+8=134 \text { clock cycles per mesh point. }
\]
A quick count of the floating-point operations for the right-hand side shows that a total of 221 operations are performed per mesh point. If 64-bit mode only is assumed, the rates for the Vector Units are
\[
\begin{aligned}
& 1 \text { arithmetic operation }=8 \text { ops per clock cycle }=.5 \text { gigaflop, } \\
& 2 \text { arithmetic operations }=16 \text { ops per clock cycle }=1 \text { gigaflop, } \\
& 3 \text { arithmetic operations }=24 \text { ops per clock cycle }=1.5 \text { gigaflops. }
\end{aligned}
\]

The number of clock cycles needed to process the 221 operations required per mesh point for the various gigaflop rates are
\[
\begin{aligned}
& 221 / 8=28 \text { cycles for a } .5 \text { gigaflop rate, } \\
& 221 / 16=14 \text { cycles for a } 1 \text { gigaflop rate, } \\
& 221 / 24=10 \text { cycles for a } 1.5 \text { gigaflop rate. }
\end{aligned}
\]

Note that it takes 134 cycles just to move the data, but only 28 cycles (at worst) to compute with it. The right-hand side is thus Map Unit bound. The gigaflop rate for this processing method is therefore
\[
\text { GIGAFLOP RATE }=\text { NUMBER OF OPS/(NUMBER OF CYCLES*16.) }
\]
for a 16-nanosecond clock cycle.
\[
\text { RATE }=221 /(134 * 16)=.103 \text { gigaflops. }
\]
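The map-bound estimate for the right-hand side can be worked through as below. The sketch assumes, as the text appears to, that each per-pass cost is rounded up to a whole clock cycle; the names are illustrative and the inputs are the element counts and rates quoted above.

```python
import math
from fractions import Fraction

CLOCK_NS = 16                    # clock period in nanoseconds
OPS_PER_POINT = 221              # floating-point operations per mesh point
GATHER_RECORD = Fraction(3, 8)   # cycles per element (8 elements every 3 cycles)
SINGLE_ELEMENT = Fraction(6)     # cycles per element for single-element references


def map_cycles(elements, cycles_per_element):
    """Map cost per mesh point, rounded up to a whole cycle (an assumption)."""
    return math.ceil(elements * cycles_per_element)


passes = {
    "L": map_cycles(15, GATHER_RECORD),
    "K": map_cycles(20, SINGLE_ELEMENT),
    "J": map_cycles(20, GATHER_RECORD),
}
total_map_cycles = sum(passes.values())

# Cycles needed for the arithmetic alone at 8, 16, and 24 ops per cycle.
compute_cycles = {ops: math.ceil(Fraction(OPS_PER_POINT, ops)) for ops in (8, 16, 24)}

# Operations per nanosecond is the same thing as gigaflops.
rate_gigaflops = OPS_PER_POINT / (total_map_cycles * CLOCK_NS)
print(passes, total_map_cycles, compute_cycles, round(rate_gigaflops, 3))
```

The mapping total of 134 cycles dominates even the slowest arithmetic case of 28 cycles, which is why the computed rate stays near 0.1 gigaflop.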

This is clearly much less than the 1 gigaflop rate objective for the FMP, and even when combined with the left-hand side computation, the RHS acts as a major constraint on the implicit code performance.

The best solution to this dilemma is to recode the right-hand side to reduce the amount of data mapping necessary. The key to this approach is that there is no recursive relationship between the data used at each pass. Therefore it is possible to map a complete slab from Intermediate Memory (see figure 8) and process all three sweeps for that slab. Not only does this reduce the number of data transfers, but the single-element gather operation is eliminated.


Figure 8. Right-hand Side \(X\) Array in Intermediate Memory Matrix XJL Shown Before Move to Main Memory

Lines 260 through 1430 of the RHS listing (found in appendix \(C\), Division 1) give the major DO loop wherein the slabs are moved along the J direction. Note that 9 elements are moved to Main Memory and the slab of the \(S\) array is retained there until all updates are complete, then mapped back to LEVEL 2 memory at the completion of each pass through the RHS loop. To reduce map operations the choice was made to process all data for points at \(K=1\), \(K=\mathrm{KMAX}\) and \(L=1\), \(L=\mathrm{LMAX}\) even though the operations are meaningless. Thus in a \(100 \times 100 \times 100\) mesh this method will have passed over 40,000 extra data elements that need not be processed. This is an added burden of \(4 \%\) on the number of computations against a benefit of more straightforward programming which results in fewer (slower) map operations. Note that when the \(S\) matrix is mapped back to LEVEL 2 memory, the unneeded data points at \(K=1\), KMAX and \(L=1\), LMAX are discarded.
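The quoted overhead can be checked by counting the boundary points in each J plane. This is a simple count under the stated \(100 \times 100 \times 100\) mesh; the variable names are illustrative.

```python
KMAX = LMAX = JMAX = 100

points_per_plane = KMAX * LMAX
interior_per_plane = (KMAX - 2) * (LMAX - 2)   # excludes K=1, KMAX and L=1, LMAX

extra_points = (points_per_plane - interior_per_plane) * JMAX
extra_fraction = extra_points / (KMAX * LMAX * JMAX)

print(extra_points, extra_fraction)
```

The exact count is 39,600 points, or 3.96 percent of the mesh, which the text rounds to 40,000 and 4 percent.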

The computation rate for this approach is:
- 9 elements mapped into Main Memory
- 5 S elements returned to LEVEL 2 memory
- 1 element per mesh point moved as an extra for adjacent differencing

In total, 15 elements at \(8 / 3\) elements moved per cycle \(=6\) cycles per mesh point.

A glance at the arithmetic rates given previously for the FMP Vector Units shows that if a way can be found to achieve the maximum of 1.5 gigaflops, 10 clock cycles per mesh point will be required for computation. Obviously the 6-cycle map time can be completely overlapped by the computation (assuming a smart compiler can schedule the object code).

\subsection*{5.0 RIGHT-HAND SIDE--VISCOSITY AND TURBULENCE COMPUTATIONS}

The right-hand side calculations are complicated somewhat by the inclusion of two subroutine calls which have been retained in the recoding for the FMP--VISRHS and MUTUR.

The restructuring of this code needed for the FMP consists of bringing the large slabs from Intermediate Memory in the most efficient form possible (minimal memory conflicts during the gather operation).

\subsection*{5.1 VISRHS}

With the storage algorithm chosen for the flow variables and \(X, Y, Z\) data (the \(L\)-sweep direction proceeds along the FORTRAN-stored "rows" of data), the maximum length of vectors for many calculations in VISRHS is no more than the column length of the flow mesh, KMAX elements.

To improve the processing rate of the FMP for this computation, VISRHS (viscosity calculations) contains a transpose operation on the \(U, V\), and \(W\) arrays that were generated in the RHS subroutine. The time required for the three key transpose operations is offset by the ability to process vectors of KMAX*JSL*LMAX elements. With no recursion in VISRHS, this technique permits maximum vector performance for the FMP.

If the parameter LAMIN is not equal to zero, the turbulence model MUTUR is called. This routine offers some programming of interest since there are both explicit recursion and data dependent IF branches involved in that routine.

\subsection*{5.2 MUTUR}

The computations found in MUTUR are a "mixed bag" for the effective use of the FMP. It was decided that transposition operations on all the necessary meshes in MUTUR (ZZ, XX, YY, RR, Q1...Q6) were too expensive to undertake for the performance returned in the vector arithmetic operations. Thus the meshes were left in their original orientation. This means that vector lengths will be as small as KMAX elements. The first approach to conversion consisted of changing the program into a rational form for the STAR compiler. This involved the replacement of the three-dimensional array references with four-dimensional array references. The use of the compound variable KL is thus eliminated. Appendix \(A\) is a listing of the original MUTUR code but with KL eliminated.

Once this variable was removed the intent of the code became more obvious to the reader and to the compiler. Evaluation of the resulting program by an FMP FORTRAN compiler could possibly yield optimal results. However, to make clear the probable restructuring done by either the programmer or the compiler, the choice was made to recode the problem using extensions provided by the proposed FMP FORTRAN.


The next step was to recode as many statements as possible in the subarray form. It became obvious that the subscripts and loops at the beginning of MUTUR could be replaced by an "unrolled" form found in lines 430 through 990 of appendix \(B\), the recoded MUTUR. These statements yield vectors of length KMAX*LMAX*JSL, despite the fact that there are offsets in the subscripts of +1 and -1 for all three dimensions. A feature of the FMP is its ability to discard results under the guidance of a bit-by-bit control vector. In this case the compiler is capable of discerning the fact that Q3(3:KMAX+2,*,*) will involve only adjustment of starting addresses and discarding of results later. Thus the length of the operation will be KMAX*LMAX*JSL although the useful results will, in the final output, be (KMAX-2)*(LMAX-2)*JSL elements.

The presence of IF statements in original FORTRAN source code usually causes some difficulty in the vectorization process. In MUTUR there were four different classes of IF statements, each of which has to be resolved by analyzing the intent of the code.
1) The first class of IF statement in MUTUR was apparently used to reduce source and object code computation of arrays TA, TB, TC, ... through manipulation of subscripts and a one-time looping back to recompute a new set of values. This was resolved, as discussed previously, by unrolling the loop into statements 430 through 990. Although unrolling of the loop requires a greater allocation of object code, the intent of that sequence seems to be clearer in the restructured version (see appendix B).
2) The second class of IF statement is used to identify and locate the maximum and minimum elements of arrays \(UU\) and YDU. The replacement construct shown in lines 1260 and 1270
\[
\begin{aligned}
& \text { DO } 21 \mathrm{~L}=2, \mathrm{LM} \\
& 21 \text { IF (UU(*,L,*) .LT. UMIN(*,1,*)) UMIN(*,1,*) = UU(*,L,*) }
\end{aligned}
\]
creates object code wherein a control vector is generated for every \(J, K\) element at the position \(L\) in the matrix, with a one representing the fact that the particular K, L, J element of UU is less than the corresponding element in UMIN. The entire statement 21 then causes a replacement (using the Map Unit) of all UMIN elements by new minimum values where indicated by the control vector. A slight modification in the source statement would have instead generated the machine instruction Q8SMIN (minimum of vector X):

DO 21 L = 2, LM
where the minimum would be evaluated for vectors of length KMAX. If the position of the minimum element is desired, the programmer must invoke the 'Q8 IN-LINE' construct that permits direct access to the hardware from the FORTRAN compiler. Thus the previous example would become

In this case the parameter MINPOS will be returned with the index of the minimum element in array UU.

Examination of the MUTUR code showed that the maximum length for the minimum function under these circumstances would be KMAX elements. By using the control vector and processing planes of data, the maximum and minimum functions can also be accomplished, but at a much higher performance level.

In lines 1280 through 1330 the control vector is generated for a plane of the meshes at each \(L\) and then used to suppress or permit the movement or insertion of data into the arrays YDUM, YM, and KM. This control vector use is performed solely by the Map Unit, while the generation of the control vector requires the use of the Vector Units to accomplish the arithmetic compare operations.

Note that in line 1330 a single scalar is broadcast to all elements of KM2 where permitted by the control vector. The construct

IF (BIT)......
where BIT is defined to be of type BIT, will normally generate control vector operations.

3) The third class of IF statement appears to have the same function as that of the second, that is, the selective insertion of data into meshes under the control of some conditions of interest. This class becomes converted to control vector operations as shown in statements 1490 through 1560 and also in statements 1570 and 1580 (although less obvious in this latter situation). A bit vector BIT is generated at line 1380. It is then combined in a logical operation with another bit vector generated at line 1490, resulting in a new BIT at line 1500. This new BIT is then used to control the modification of corresponding elements of YMM and YDU in statements 1510 and 1520. Note that in this case the arithmetic is performed for all elements of YM, AM, BM, and YDUM but results are stored into the result vector only where permitted by the control vector. Since this is a data dependent operation, estimation of processing rates is difficult. In the analysis of the simulation results (see Division 1) this will be dealt with for MUTUR by providing three possibilities for the contents of the control vectors: \(0 \%, 50 \%\), and \(100 \%\) density of permissive bits. The processing rate range can then be determined for this portion of the code.
4) The final IF statement class is similar also to the last two, but has one additional area for caution and analysis. In this case (lines 1840 through 1860) elements of TURMU are to be set depending upon the position of a crossover point in the values of TMO and TMI. Until the crossover point is reached data is to be moved from TMI, and after the crossover, data is to be moved from TMO. In this particular situation, analysis shows that all elements of TMO and TMI have values which are on the proper side of the crossover point. That is, elements 1 through \(n\) of TMI will all be less than their counterparts in TMO. If \(n\) is chosen as the crossover point then it can be said that all elements from \(n\) to LMAX of TMO will be less than those in TMI. It is this fact that allows simply replacing the IF statements with a control vector operation which will place the correct TMO and TMI elements in TURMU.

If, however, the positional relationship of TMO elements and TMI elements did not hold on both sides of the crossover point, a form of recursion exists which requires manipulating the control vectors to produce the proper 'mask' for movement of data.
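The crossover argument above can be illustrated with a small sketch. The scalar routine is only an assumed reading of the original IF logic (switch permanently from TMI to TMO at the first crossover), and the function names and data are hypothetical. The control vector version selects element by element.

```python
def scalar_crossover(tmi, tmo):
    """Assumed scalar logic: take TMI until the first crossover, TMO afterwards."""
    out, crossed = [], False
    for inner, outer in zip(tmi, tmo):
        if not crossed and inner >= outer:
            crossed = True          # state carried forward: the recursion
        out.append(outer if crossed else inner)
    return out


def control_vector_merge(tmi, tmo):
    """Element-wise select under a control vector; no state is carried."""
    control = [inner < outer for inner, outer in zip(tmi, tmo)]
    return [inner if bit else outer for bit, inner, outer in zip(control, tmi, tmo)]


# Values stay on the proper side of the crossover: both forms agree.
tmi_ok, tmo_ok = [1.0, 2.0, 3.0, 5.0, 8.0], [4.0, 4.0, 4.0, 4.0, 4.0]
agree = scalar_crossover(tmi_ok, tmo_ok) == control_vector_merge(tmi_ok, tmo_ok)

# TMI dips back below TMO after the crossover: the two forms differ.
tmi_bad, tmo_bad = [1.0, 5.0, 2.0, 6.0], [4.0, 4.0, 4.0, 4.0]
differ = scalar_crossover(tmi_bad, tmo_bad) != control_vector_merge(tmi_bad, tmo_bad)
print(agree, differ)
```

When the ordering holds on both sides of the crossover the simple mask is sufficient; when it does not, the mask itself would have to be adjusted to reproduce the scalar result, which is the case the preceding paragraph warns about.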

Because of recursions like that in lines 1920 through 1950 it has been necessary to retain the DO loops in \(L\) and to process planes of data at each point L. The maximum vector length in all of the recursive portions is thus KMAX, although the compiler does generate all of the necessary vector operations in the J direction, automatically. No gather operations are indicated, as the Map Unit expense would overshadow the value that longer vectors in arithmetic might have.
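The plane-at-a-time processing with the L loop retained can be sketched for the minimum search of the second IF class. This is an illustration only: each plane is modeled as a flat list, the function name `running_minimum` is hypothetical, and the position array stands in loosely for the index data that the text describes being inserted under control of the same bit vector.

```python
def running_minimum(planes):
    """Track the minimum over L for every point of a plane, one plane per step.

    planes[i] is the flattened plane of values at L = i + 1.
    Returns the minima and the 1-based L index at which each was found.
    """
    minimum = list(planes[0])
    position = [1] * len(minimum)
    for l_index, plane in enumerate(planes[1:], start=2):
        # One compare over the whole plane produces the control vector.
        control = [value < current for value, current in zip(plane, minimum)]
        # Replacement and scalar broadcast happen only where the bit is set.
        minimum = [value if bit else current
                   for bit, value, current in zip(control, plane, minimum)]
        position = [l_index if bit else old
                    for bit, old in zip(control, position)]
    return minimum, position


uu_planes = [[3.0, 1.0, 4.0], [1.0, 5.0, 9.0], [2.0, 6.0, 0.0]]
umin, lmin = running_minimum(uu_planes)
print(umin, lmin)
```

The loop over L remains sequential because each step depends on the previous minimum, but every step works on a whole plane at once, which is the trade described in the paragraph above.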

\section*{APPENDIX A}

\section*{ORIGINAL MUTUR ROUTINE}




\section*{APPENDIX B}

\section*{RECODED MUTUR ROUTINE}
\begin{tabular}{|c|c|c|}
\hline & SUBROUTINE MUTUR & 000100 \\
\hline & COMMON/BASE/NMAX,JMAX,KMAX,LMAX,JM,KM,LM,DT,GAMMA,GAMI,SMU,FSMACH & 000110 \\
\hline & 1,DX1,DY1,DZ1,ND,ND2,FV(5),FD(5),HD,ALP,GD,OMEGA,HDX,HDY,HDZ & 000120 \\
\hline & 2,RM,CNBR,PI,ITR,INVISC,LAMIN,NP,INT1,INT2,INT3 & 000130 \\
\hline & COMMON/GEO/NB1,NB2,RFRONT,RMAX,XR,XMAX,DRAD,DXC & 000140 \\
\hline & COMMON/READ/IREAD,IWRIT,NGRI & 000150 \\
\hline & COMMON/VIS/RE,PR,RMUE,RK & 000160 \\
\hline & COMMON/VARS/Q(720,6,30) & 000170 \\
\hline & COMMON/VAR0/S(720,5,30) & 000190 \\
\hline & COMMON/VAR1/X(720,30),Y(720,30),Z(720,30) & 000190 \\
\hline & COMMON /VAR3/ P,XX,YY,ZZ & 000200 \\
\hline & LEVEL 2,Q,S,X,Y,Z & 000210 \\
\hline & COMMON/COUNT/NC,NC1 & 000220 \\
\hline & COMMON /BTRID/ A,B,C,D,F & 000230 \\
\hline & COMMON TURMU & 000240 \\
\hline & DYNAMIC P,XX,YY,ZZ,A,B,C,D,F,F1 & 000244 \\
\hline & DYNAMIC TAS,TMO,TMI,S5,S6,U,V,W,E,RR & 000250 \\
\hline &  & 000260 \\
\hline & DYNAMIC BIT,DYZ,SY3,AM,BM,YMM,BITI & 000262 \\
\hline & DYNAMIC ZZ1,ZZ2,ZZ3,ZZ4,ZZ5,RL1,RL & 000270 \\
\hline & DYNAMIC T1,T2,T3,S1,S2,S3,TA,TB,TC & 000280 \\
\hline & DYNAMIC TD,TE,TF,TM,TAS,UTOT,UU,TURMU & 000290 \\
\hline & DYNAMIC TAU,WMU,RA,SNOR,SNORA, & 000300 \\
\hline & DATA F27,LEDGE/1.6,25/ & 000310 \\
\hline & DATA FK,FKK,YDUMF/0.4,0.0168,1.0/ & 000320 \\
\hline & DATA FKLEB/0.3/ & 000330 \\
\hline c & & 000331 \\
\hline &  & 000332 \\
\hline & DEFINE (TMO,P(1:60,4)),(TMI,P(1:60,5)),(S5,P(1:60,6)) & 000333 \\
\hline & DEFINE (S6,P(1:60,7)),(U,P(1:60,8)),(V,P(1:60,9)) & 000334 \\
\hline & DEFINE (W,P(1:60,10)),(E,P(1:60,11)),(RR,P(1:60,12)) & 000335 \\
\hline & & 000340 \\
\hline C & CALCULATE TURBULENT VISCOSITY & 000350 \\
\hline c & & 000360 \\
\hline &  & 000370 \\
\hline & OEFINE (225,22T(5) ) & 000380 \\
\hline C & ZZM COMPUTATIONS ELIMINATED, ZZ RETAINED FROM RHS CALCULATIONS & 000390 \\
\hline c & & 000400 \\
\hline & calculate vorticity tas (l) and total velocity uu(l) & 000410 \\
\hline & & 000420 \\
\hline &  & 000430 \\
\hline &  & 000440 \\
\hline &  & 000450 \\
\hline & 2 *8L1-03(*)*, +1) & 000460 \\
\hline &  & 000470 \\
\hline &  & 000480 \\
\hline & 2 *(Q4 (*, 2 LLMAX \(+1, *) * R L 1=Q 4(* * *, *) * R L)\) ) & 000490 \\
\hline &  & 000500 \\
\hline &  & 000510 \\
\hline &  & 000520 \\
\hline & S1 = 0.0 & 000530 \\
\hline & S2 = 0.0 & 000540 \\
\hline & S3 = 0.0 & 000550 \\
\hline C & XXM COMPUTATIONS ELIMINATED DUE TO RETENTION OF XX RESULTS & 000560 \\
\hline C & YYM COMPUTATIONS ELIMINATED, YYM RETAINED FROM RHS CALCULATIONS & 000570 \\
\hline &  & 000580 \\
\hline &  & 000590 \\
\hline &  & 000600 \\
\hline &  & 000610 \\
\hline &  & 000620 \\
\hline &  & 000630 \\
\hline &  & 000640 \\
\hline & DEFINE (RL),RR(3:KMAX,*,*),RL(RRIIIKMAX-2,**) & 000650 \\
\hline
\end{tabular}

``` 1, © 5*YY(3)*(03(JiKMAX*2,*,*) *RL1-03(*)*,*)*RL) \(T E=.5 * Y Y(3) *\{Q 2\{3 \mid K M A X * 2, *, *\}\) RL. \(1=02(*, *, *) * R L\}\)
\(1-5 * Y Y(1) *(Q 4(31 K M A X+2, *, *) * R L I-G 4(*, *+*) * R L)\)
TF \(=5\) FYY( 1 ) \((03(3\) KMAX \(* 2, *+*) * R L I=03(\%, *, *) * R L)\)
\(1-5 * Y Y(2) *(02(3: K M A X+2, * *) * R L 1-Q 2(* * * *) * R L)\) S1 : S1*.50イTA0TD
```



```
S3 = S3**5*(TC*TF)
```



``` KMZ \(=\) KMAX \(\rightarrow\) 2 KMLEKMAX+2
LH2*LMAX*2
```



```
\(10,5 \pi x \times(3) *(Q 3(\%, 31 L M+, 31 J S L+2) \oplus R L\}-03(*, 31 L M 2,6\} * R L)\)
```







``` RL = 1:/O1 (*)3ILM2**)
```








```
S1 = S1*. S*(TA*TO)
\(\mathrm{SZ} \neq \mathrm{S2**5*(TB+TE)}\)
S3 = 53*-5*iTC*TF)
\(T K=(T 1+S 1) * * 2 *(T 2 * S 2) * * 2 *(T 3 * S 3) * * 2\)
TAS \(=\) SQRT(TW)
UTOT = Q2**2*Q3**2*ク4**2
UU \(=\) SGRT (UTOT)/QI
TURMU(*,**JIJSL) \(=0\)
COMPUTE RA
WMU \(=1\).
TAU \(=A B S(T A S(*, 1, *))\)
```


COMPUTE NORMAL OISTANCE SNOR(L) AND YDUM


SNOR(1) $=0$. KMZ $=1$
YOUH $=1 . E-3$
YM = ,5/SURT(ABS $(Z \angle 1(*, 1, *) * 2 Z 1(*, Z, *)+2 Z 2(*, 1, \theta) * Z 22(0,2, *)$


YOUS $=0.0$


$1+223(*, *, *)=223(*+2$ (LMAX $+1+\infty)$
SCAL $=1.0 / 50 R T$ (SCIS)
OO 19 L $=2, \operatorname{LMAX}-1$
SNOR(L) = SNOR(L.-1) - SCAL
SNORA $=0.5^{\circ}\left(\right.$ SNOR $^{(L)}$ • SNOR (L-1))

- 001230


1 DO 21 Li.EXP
IF (UU(*) Li, *) 1



\section*{DIVISION 3}

THE THREE-DIMENSIONAL AERODYNAMIC EXPLICIT CODE

\subsection*{1.0 OVERVIEW}

The second metric provided to Control Data analysts was the Hung-MacCormack 'explicit' code which used meshes of \(31 \times 31 \times 31\) to model three-dimensional corner flows. This metric exhibited several characteristics of interest which differ from the 'implicit' form of solution previously discussed (Division 2). Instead of restructuring the entire code as was done for the Steger-Pulliam code, those routines of greatest interest were essentially dealt with independently, within the code itself. The routines chosen were LX, LYI, and LYC/CHARAC because they constituted the major part of the computation in the explicit code, and because they demonstrate the essentially different characteristics needed for this study.

The LX and LYI subroutines were each vectorized first for the STAR-100 because the programming techniques used are expected to be common for the STAR-100 and the Control Data FMP. Further, this permitted verifying answers against the scalar code, and permitted analysis of the code using the STAR-100 performance counters.

The major points of interest in this section are the findings regarding:
- short vector lengths (compared to the counterpart vectors in the implicit code);
- 'local' vectorization instead of 'global' vectorization;
- data-dependent branching, fundamental to the method of characteristics.

In a later section the effects of these factors on the performance of the FMP will be discussed. What follows are the basic programming considerations employed in vectorizing the key parts of the three-dimensional aerodynamic explicit code.

The explicit code is a numerical scheme for solving the three-dimensional Navier-Stokes equations for a supersonic, laminar flow over a compression corner with sidewall effects (ref. 4). There are three directional operators (Lx, Ly, Lz) corresponding to the three coordinate directions. The Ly and Lz directional operators are split into suboperators; each of these uses a different method to solve the appropriate equations on the appropriate grid. The three methods are the explicit method, the implicit method, and the method of characteristics.

The computational procedure for each time step can be outlined in this way:
(1) The Ly- directional operator is executed
(a) LYC - characteristic method on fine mesh
(b) LYI - implicit method on fine mesh
(c) LY - explicit method on coarse mesh
(2) The Lz- directional operator is executed
(a) LZC - characteristic method on fine mesh
(b) LZI - implicit method on fine mesh
(c) LZ - explicit method on coarse mesh
(3) The Lx- directional operator is executed
(a) LX - explicit method on whole grid
(b) TURB - turbulence model
(c) LX - explicit method on whole grid
(4) The Lz- directional operator is again executed
(a) LZI - implicit method on fine mesh
(b) LZC - characteristic method on fine mesh
(c) LZ - explicit method on coarse mesh
(5) The Ly- directional operator is again executed
(a) LYI - implicit method on fine mesh
(b) LYC - characteristic method on fine mesh
(c) LY - explicit method on coarse mesh
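
The outline above can be written down as a small driver table. This is only a sketch: the names `TIME_STEP` and `run_time_step` are illustrative and do not appear in the metric code, and each routine is replaced by a stub that merely records its name.

```python
# Sketch of the operator sequence for one time step, following the outline.
# TIME_STEP and run_time_step are hypothetical names; the stub only logs calls.

TIME_STEP = [
    ("Ly", ["LYC", "LYI", "LY"]),
    ("Lz", ["LZC", "LZI", "LZ"]),
    ("Lx", ["LX", "TURB", "LX"]),
    ("Lz", ["LZI", "LZC", "LZ"]),
    ("Ly", ["LYI", "LYC", "LY"]),
]

def run_time_step(call):
    """Invoke call(name) for every routine in the order of the outline."""
    for _operator, routines in TIME_STEP:
        for name in routines:
            call(name)

log = []
run_time_step(log.append)
operators = [op for op, _ in TIME_STEP]
print(" ".join(log))
print(operators == operators[::-1])   # operator order reads the same both ways
```

At the operator level the sequence Ly, Lz, Lx, Lz, Ly is symmetric about the Lx step, while the order of the implicit and characteristic suboperators is swapped in the second half.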

To determine the relative effect of vectorization on each of these components, relative timings of the different methods on the 7600 for a \(31 \times 31 \times 31\) grid were collected:

\begin{tabular}{llrl}
Method & Routine & Share of time & Note \\
Implicit method & LYI & \(24.53 \%\) & \\
 & LZI & \(24.81 \%\) & \\
 & Total & \(49.34 \%\) & \\
Explicit method & LX & \(23.58 \%\) & \\
 & LY & \(5.76 \%\) & \\
 & LZ & \(6.12 \%\) & \\
 & Total & \(35.46 \%\) & \\
Method of characteristics & LYC & \(3.29 \%\) & Excluding subroutine CHARAC \\
 & LZC & \(3.33 \%\) & Excluding subroutine CHARAC \\
 & CHARAC & \(6.64 \%\) & \\
 & Total & \(13.26 \%\) & \\
Miscellaneous other operations & Total & \(1.94 \%\) & \\
Overall total & & \(100.00 \%\) &
\end{tabular}

\subsection*{2.1 Data Handling}

The model for the FMP version of the code is to use the same algorithm, but expand the problem to a \(100 \times 100 \times 100\) grid. The basic variables associated with each grid point are RHO, RHOU, RHOV, RHOW, E, EI, U, V, W, RMUL, giving a basic storage requirement of \(10 \times 10^{6}\) data points.

In the proposed FMP the Intermediate Memory contains \(32 \times 10^{6}\) words, which is sufficient to hold one program of the \(10 \times 10^{6}\) size while permitting the staging of the next job to Intermediate Memory during current job execution.

The Main Memory has \(4 \times 10^{6}\) words (with an option of \(8 \times 10^{6}\) words), a size not large enough to store the whole problem, thus forcing the algorithm to incorporate some sort of "circular buffering" scheme. The Intermediate Map Unit must be used to read new data into Main Memory and write old data out to Intermediate Memory while the Vector Unit is processing the "current" data.
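
The storage argument can be checked with a few lines of arithmetic. The sketch below assumes the memory sizes are in units of \(10^{6}\) words and, purely for illustration, measures Main Memory in whole grid planes carrying all ten variables; the plane-sized unit is an assumption, not something the text specifies, and temporary storage is ignored.

```python
# Storage arithmetic for the 100 x 100 x 100 problem.
# ASSUMPTIONS: sizes are in units of 10**6 words; the "plane" unit is illustrative.

GRID = 100                      # points per coordinate direction
VARIABLES = 10                  # RHO, RHOU, RHOV, RHOW, E, EI, U, V, W, RMUL
INTERMEDIATE_WORDS = 32 * 10**6
MAIN_WORDS = 4 * 10**6          # 8 * 10**6 with the memory option

problem_words = VARIABLES * GRID**3
plane_words = VARIABLES * GRID**2       # one grid plane, all variables

fits_in_main = problem_words <= MAIN_WORDS
jobs_in_intermediate = INTERMEDIATE_WORDS // problem_words
planes_in_main = MAIN_WORDS // plane_words

print(problem_words)          # 10000000
print(fits_in_main)           # False
print(jobs_in_intermediate)   # 3
print(planes_in_main)         # 40
```

Even before temporaries are counted, fewer than half of the 100 planes fit in Main Memory at once, which is why the data must stream through it in a circular buffer.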

In addition to the circular buffer, the algorithms must also reorder (or rotate) the data, so that the vectors for the Vector Unit are stored in sequential locations. One can do this in several ways. First, one can do gathers and scatters from/to Intermediate Memory. Second, one can also do a "transposition" within Main Memory or Intermediate Memory. A third option depends upon the groupings of the operators in each time step. One can initially store the data indexed by (k,j,i), then perform the 3 methods of the Ly operator vectorized on "k". The data is then stored back to, or fetched from, the Intermediate Memory in (j,i,k) order for the Lz operator, which is vectorized on "j", etc. Stated differently, it is not required that the data be rotated between each of the subroutines. In any case, it is clear that the data handling between the Intermediate and Main Memory is an integral part of the problem.

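The reordering described above can be illustrated with an index-list gather on a flattened array. This is a sketch under an assumed storage convention (the first index listed varies fastest in memory); the helper names `offset_kji` and `offset_jik` are hypothetical, and the tiny grid is chosen only so the result can be inspected.

```python
# Reordering a flattened 3-D array so that a chosen index runs fastest.
# ASSUMPTION: in "(k,j,i) order" the first index listed, k, varies fastest.

KL, JL, IL = 4, 3, 2     # tiny illustrative grid

def offset_kji(k, j, i):
    return k + KL * (j + JL * i)

def offset_jik(j, i, k):
    return j + JL * (i + IL * k)

# Data stored in (k,j,i) order; each value encodes its own (k, j, i).
data_kji = [None] * (KL * JL * IL)
for i in range(IL):
    for j in range(JL):
        for k in range(KL):
            data_kji[offset_kji(k, j, i)] = (k, j, i)

# Index list for a gather that produces (j,i,k) order.
gather_index = [0] * (KL * JL * IL)
for k in range(KL):
    for i in range(IL):
        for j in range(JL):
            gather_index[offset_jik(j, i, k)] = offset_kji(k, j, i)

data_jik = [data_kji[src] for src in gather_index]   # the gather itself

# A j-vector for fixed (i, k) is now contiguous.
i, k = 1, 2
start = offset_jik(0, i, k)
j_vector = data_jik[start:start + JL]
print(j_vector)
```

Once the index list has been built it can be reused every time step, so only the gather itself has to be overlapped with arithmetic.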
\subsection*{2.2 Speed Optimization}

When launching into a vectorization effort on a new piece of production code, the programmer/analyst is well advised to get reacquainted with the basic tradeoffs inherent in the FMP architecture. The basic execution rate of the Map Units and the degree to which they can be made to operate concurrently with the Vector Unit is a major factor in the vectorization process. In the case of the implicit and explicit codes, it has been determined that the map operations (scatter, gather, transpose) can be almost totally overlapped with other functional unit execution. The 'second order effects' of the FMP architecture then become of great concern.

In particular, the effectiveness of the Vector Unit becomes the real measure of the machine performance. The efficiency of operation is tied to the amount of time the unit can be kept productively busy throughout a code execution. The sole factor that keeps the unit's full capacity from being utilized is vector startup time. This concept should be reviewed here. Startup time is a function of the pipeline configuration for the previous operation, the function now being initiated, and the interdependency of one vector operation with another. Thus a short vector operation whose results must feed another vector operation requires that all the data be stored in memory before the subsequent operation can be started. The time required for data to clear the pipelines and be stored back to memory, then brought back from memory to refill the pipelines, is in effect what is called startup. Startup time is a fixed overhead assigned to a particular sequence of instructions; thus the longer the vector to be processed, the less effect startup time has on the overall rate of computation.

Tables I and II are provided to illustrate the effect of this architectural characteristic. They give the relative megaflop rate R(N,f,d) of the FMP pipelines for vector length N and dependency delay d. Table I compares each rate to that for vectors of length 30; Table II compares each rate to that of a theoretical sequence of vector operations which has zero delay between operations due to the dependency of one vector on another.

Table I. \(R(N, f, d) / R(30, f, d)\)
\begin{tabular}{|c|c|c|c|c|c|}
\hline & \(N=30\) & \(N=100\) & \(N=500\) & \(N=1000\) & \(N=10000\) \\
\hline \(d=0\) & 1.0 & 1.66 & 2.16 & 2.24 & 2.324 \\
\hline \(d=5\) & 1.0 & 2.04 & 3.16 & 3.395 & 3.638 \\
\hline \(d=10\) & 1.0 & 2.27 & 4.03 & 4.46 & 4.94 \\
\hline
\end{tabular}

Table II. \(R(N, f, d) / R(N, f, 0)\)
\begin{tabular}{rlll}
 & \(d=0\) & \(d=5\) & \(d=10\) \\
\(N=30\) & 1.0 & .63 & .46 \\
\(N=100\) & 1.0 & .77 & .63 \\
\(N=500\) & 1.0 & .93 & .87 \\
\(N=1000\) & 1.0 & .96 & .93
\end{tabular}

From these tables one can conclude that long vectors are much faster: even with no dependency delay (d = 0), vectors of length 10000 run at 2.324 times the rate of vectors of length 30. Next, it can be seen that for shorter vector lengths (30-50), reducing the average dependency delay can produce the same increase in speed as going to much longer vectors (500-1000).
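
The text does not give the formula behind R(N,f,d). As an illustration only, a simple fixed-overhead model, rate proportional to N/(N + startup), matches the tabulated ratios to within 0.01 when the startup is taken as 40 + 8d element times. Both constants are fitted here to the tables and are assumptions, not values stated in the text; the function names are hypothetical and the parameter f is ignored.

```python
# A startup-overhead model that matches Tables I and II to within 0.01.
# ASSUMPTION: the constants 40 and 8 are fitted to the tables, not documented values.

BASE_STARTUP = 40     # overhead, in element times, with no dependency delay (fitted)
PER_DELAY = 8         # extra overhead per unit of dependency delay d (fitted)

def rate(n, d):
    """Relative result rate for vector length n and dependency delay d."""
    return n / (n + BASE_STARTUP + PER_DELAY * d)

def table_one(n, d):
    """Rate relative to a vector of length 30 at the same delay."""
    return rate(n, d) / rate(30, d)

def table_two(n, d):
    """Rate relative to zero dependency delay at the same length."""
    return rate(n, d) / rate(n, 0)

for d in (0, 5, 10):
    print(d, [round(table_one(n, d), 3) for n in (30, 100, 500, 1000, 10000)])
for n in (30, 100, 500, 1000):
    print(n, [round(table_two(n, d), 2) for d in (0, 5, 10)])
```

Under this model the penalty for a dependency delay is (N + 40)/(N + 40 + 8d), so it matters most for short vectors and nearly vanishes for N in the thousands.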

The reduction in dependency delay can be accomplished by two techniques. The compiler can schedule a sequence of interdependent vector operations far enough apart, and with other independent operations interspersed, so as to obviate the need for the dependency key being used. This is analogous to the current scalar compilers scheduling a number of interdependent scalar operations by interspersing other operations between the dependent ones, to maximize the use of the floating point bandwidth of the Scalar Unit. The second approach is to maximize the use of long vectors, for the purpose of eliminating the dependency keys. If a vector is sufficiently long, its first elements will be well settled in memory long before the last elements have emerged from the pipelines. These first elements can be 'fetched' from memory for the next vector operation while the current one is still in progress. In this instance the compiler (if it knows at compile time that the vectors are long enough) can eliminate the dependency keys, and the consequent delay.

To achieve long vector operations for the majority of the explicit code execution requires a recoding of the explicit portions such that they no longer compute successive displacements (that is, the next value at the current point is based on the just-computed new value for the adjacent points), but instead use a process of simultaneous displacements. The metric, as provided, requires the use of successive displacements, however, and this fact is reflected in the short vector lengths that were used in its simulation.
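
The difference between the two kinds of sweep can be seen on a one-dimensional toy problem. The averaging update below is a stand-in chosen for illustration and is not one of the LX, LY or LZ equations; the function names are hypothetical.

```python
# Successive versus simultaneous displacements on a 1-D sweep.
# The averaging update is illustrative only, not the explicit-code equations.

def sweep_successive(u):
    """Each point uses the value just computed for its left neighbour."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (new[i - 1] + u[i + 1])   # new[i-1] is already updated
    return new

def sweep_simultaneous(u):
    """Every point uses only old values, so the whole sweep is one vector operation."""
    left, right = u[:-2], u[2:]
    interior = [0.5 * (a + b) for a, b in zip(left, right)]
    return [u[0]] + interior + [u[-1]]

u0 = [1.0, 0.0, 0.0, 0.0, 0.0]
print(sweep_successive(u0))     # [1.0, 0.5, 0.25, 0.125, 0.0]
print(sweep_simultaneous(u0))   # [1.0, 0.5, 0.0, 0.0, 0.0]
```

The successive form carries new information across the whole line in one sweep but is recursive in `i`; the simultaneous form has no recursion and vectorizes directly, at the cost of using older neighbour values.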

\subsection*{3.0 IMPLICIT METHOD}

The two routines that use the implicit method are LYI and LZI; these routines account for approximately half the compute time required on the CDC 7600. They are basically the same, except that LZI operates in the Z direction. The LYI routine is discussed below.

The LYI routine operates on "j-pencils" and takes two "half steps" for each pencil. Each half step requires setting up and solving seven tridiagonal systems of equations. The j-pencils are solved successively by a pair of outer DO loops on "i" and "k". The j-pencils only extend over the fine mesh region, which is about half of the total grid.

There are several approaches that may be used to vectorize the LYI routine. One could run the vectors in the "j" direction. This runs head on into the problem of the inherently sequential nature of solving tridiagonal systems by the standard Gaussian elimination scheme; each computation depends on the result of the previous computation. Several algorithms for solving tridiagonal systems on vector machines are given in reference 5.

Their analysis indicates that the best algorithm is cyclic reduction. However, the timing comparison (albeit for the STAR-100) shows no significant improvement over the standard Gaussian elimination until the vector lengths are on the order of 250. Since the j-pencil will only have a length of approximately 50 for a \(100 \times 100 \times 100\) mesh, there is little promise in vectorizing on the "j" direction.

One can run the vectors in the "k" direction. In this case all problems with solving the tridiagonal systems disappear. In addition, the vectors are now of length 100 instead of 50. There is, however, a problem associated with computing the j-pencils simultaneously, rather than successively as in the original scalar code: any terms involving the subscript "k-1" will be using values from the previous time step, rather than the newly computed values from the (k-1) j-pencil.
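
The reason the tridiagonal difficulty disappears can be sketched as follows. Standard elimination is applied to many independent systems at once: the recursion still runs over j, but each statement inside the j loops is a whole-vector operation over k. The function and variable names are illustrative and are not taken from VLYI or TRIDIAG.

```python
# Tridiagonal elimination applied to many independent systems at once.
# Sequential in j; every list comprehension stands for one vector operation over k.

def solve_tridiagonal_batch(a, b, c, r):
    """a, b, c, r are lists over j of lists over k (sub, main, super, rhs)."""
    nj, nk = len(b), len(b[0])
    cp = [[0.0] * nk for _ in range(nj)]
    rp = [[0.0] * nk for _ in range(nj)]
    cp[0] = [c[0][k] / b[0][k] for k in range(nk)]
    rp[0] = [r[0][k] / b[0][k] for k in range(nk)]
    for j in range(1, nj):                       # forward elimination, sequential in j
        den = [b[j][k] - a[j][k] * cp[j - 1][k] for k in range(nk)]
        cp[j] = [c[j][k] / den[k] for k in range(nk)]
        rp[j] = [(r[j][k] - a[j][k] * rp[j - 1][k]) / den[k] for k in range(nk)]
    x = [[0.0] * nk for _ in range(nj)]
    x[-1] = rp[-1]
    for j in range(nj - 2, -1, -1):              # back substitution, sequential in j
        x[j] = [rp[j][k] - cp[j][k] * x[j + 1][k] for k in range(nk)]
    return x

# Three systems (k = 0, 1, 2) of order four, diagonally dominant.
NJ, NK = 4, 3
a = [[-1.0] * NK for _ in range(NJ)]
c = [[-1.0] * NK for _ in range(NJ)]
b = [[4.0 + k for k in range(NK)] for _ in range(NJ)]
r = [[float(j + k) for k in range(NK)] for j in range(NJ)]
x = solve_tridiagonal_batch(a, b, c, r)

def residual(j, k):
    lo = a[j][k] * x[j - 1][k] if j > 0 else 0.0
    hi = c[j][k] * x[j + 1][k] if j < NJ - 1 else 0.0
    return lo + b[j][k] * x[j][k] + hi - r[j][k]

print(max(abs(residual(j, k)) for j in range(NJ) for k in range(NK)))
```

The vector length equals the number of systems solved together, which is why it is bounded by the extent of the "k" direction rather than by the length of the pencil.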

One could also run the vectors in the "i" direction; however, the two variables "UP" and "VP" are indexed by "i" masquerading under the names of "K3", "K4", and "K5", and the computations depend heavily on "UP" and "VP", i.e. there is "recursion" on "i".

As a final alternative, one could try to vectorize the problem by treating the whole "k" x "i" plane as a vector. This approach would run into the same problem that the "i" vector approach had with the variables "UP" and "VP". In addition, there would be a problem with temporary storage, since the temporary vectors used in the scalar code would now have to be three-dimensional matrices, each of order \(100 \times 100 \times 50\).

A combination of this last alternative and the second method (running the vectors in the "k" direction) was implemented (see appendix A) and run on the STAR-100 for ten time steps.

There are differences between the scalar version and the vector version, as expected. The timings per time step on a \(31 \times 31 \times 31\) grid are shown below.


The recoding of the VLYI subroutine (as the vector version of LYI is called) was kept as straightforward as possible, and every attempt was made to make the statements as similar as possible to the original scalar code. VLYI permits a high degree of local vectorization compared to the implicit code. In this instance the variables to be processed were retained in the same form as in the scalar code, and were transposed within the VLYI subroutine itself, where necessary. This permitted direct replacement of the LYI subroutine with the VLYI subroutine on the STAR-100 to verify the correctness of the final results. To accomplish this a small routine called ROTATE was created for the STAR-100 version. This routine is replaced on the FMP by a vector map operation which performs the rotation while gathering data from Intermediate Memory for the current slab.

In a totally vectorized version of the explicit code these map operations would be moved outside the VLYI subroutine and 'hidden' under the arithmetic operations of routines like TURBDA (which is called before the Y operations are initiated). Local optimization then consisted of further local vectorization, almost on a DO loop by DO loop basis. The rotation process arranged the data such that all arithmetic operations could proceed over vectors of at least KMAX length. Each of the DO loops in J was analyzed to determine if there were recursions in J, and if not, the loop was vectorized for the J direction also, yielding vectors of length KMAX*JMAX/2. For mesh sizes of \(100 \times 100 \times 100\) this would provide vectors of about 5000 elements, which is considerably shorter than the 60000 elements possible in the implicit code.

In the vectorization of the tridiagonal routine, the maximum vector length can only be KMAX since the solution is recursive in J. The maximum vector length is then 100 in TRIDIAG, which is the same as that in BTRI in the implicit code.

To make the vectorization easier, routines such as GI and DIAG were incorporated in-line, rather than as subroutines. Although it is expected that the compiler will be able to include out-of-line subroutines into in-line object code automatically, the programmer's help is still desirable.

\subsection*{4.0 EXPLICIT METHOD}

The three routines that use the explicit method are LX, LY and LZ. These routines are basically the same except for the direction of the operator, and the fact that the LY and LZ routines are restricted to the coarse mesh portion of the grid. The LX routine is discussed below.

The LX routine operates on i-pencils and takes two half steps for each pencil. Each half step solves the explicit equations for the LX operator. The i-pencils are solved successively by a pair of outer DO loops on "k" and "j".

Again there are several ways in which to vectorize the LX routine. The most straightforward way is to run the vectors in the "i" direction. This can be done since the equations are explicit equations. There is some apparent recursion on "i" in the "DO 5 I=2,IE" loop, but a careful examination of the FX subroutine shows that it is, in fact, not recursive on "i". This was implemented (see appendix B) and run for ten time steps on the STAR-100. The timings per time step on a \(31 \times 31 \times 31\) grid are shown below:
\begin{tabular}{lrrr}
 & 7600 scalar & STAR scalar & STAR vector \\
LX (seconds) & 8.2 & 16.7 & 3.6
\end{tabular}

Running the vectors in the "j" or "k" direction will run into the rate of convergence question arising from solving simultaneous pencils, rather than successive pencils. The equations for SIGX, TAUXY, TAUXZ, and DISX show a greater dependence on the "j" index than the "k" index, which implies that it would be preferable to vectorize on "k" rather than "j". This gives the second option of vectorizing the routine on both "i" and "k". This would give a vector length of up to 10,000 rather than 100 and a corresponding speed increase of up to approximately \(80 \%\) (subject to the dependency delays and size restrictions of Main Memory).

\subsection*{5.0 METHOD OF CHARACTERISTICS}

There are two routines LYC and LZC that use the method of characteristics.

The original scalar subroutine CHARAC is computationally very efficient. One can attribute its efficiency to the scheme used to generate the mesh points. The mesh points are generated in such a manner that the solution of the characteristic equations involves simple additions and subtractions. The mesh generation scheme involves some very tricky logic that does not readily lend itself to machines with a pipeline architecture.

Examination of the routine LYC shows that except for the call to CHARAC, the remainder of the routine is easily and directly vectorized in the same local fashion as were LX and LYI. Assuming that all vectorized routines improve uniformly in performance from the 7600 to the FMP, it is necessary then to focus on the potential 'weak spot' in the vectorization: the CHARAC routine itself. This routine, quite obviously, cannot remain scalar in nature if all other routines become highly vectorized, for although it takes up only \(6.64 \%\) of the compute time on the 7600, it could be the bottleneck in the FMP performance on the explicit code.
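
The size of this potential bottleneck can be estimated with Amdahl's law. The sketch below uses the CHARAC share from the 7600 timing breakdown; the speedup factors tried for the rest of the code are arbitrary illustrations, not FMP predictions, and `overall_speedup` is a hypothetical helper.

```python
# Bound on overall speedup if CHARAC stayed scalar (Amdahl's law).
# The rest_speedup values are arbitrary illustrations, not measured figures.

CHARAC_SHARE = 0.0664            # fraction of 7600 time spent in CHARAC

def overall_speedup(rest_speedup, charac_speedup=1.0):
    rest = 1.0 - CHARAC_SHARE
    return 1.0 / (rest / rest_speedup + CHARAC_SHARE / charac_speedup)

for s in (10, 50, 100):
    print(s, round(overall_speedup(s), 2))
print(round(1.0 / CHARAC_SHARE, 2))     # limit as rest_speedup grows without bound
```

However fast the other routines become, the whole code cannot run more than about 15 times faster than on the 7600 unless CHARAC is also accelerated.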

One method of attack in vectorization is given in the routine VCHARAC, which is shown in appendix C. The key feature in this approach is the processing of entire planes of data of length IL*KL/2 by both LYC and CHARAC. The problem with this is that in CHARAC each element of the plane may be handled differently from any other element depending upon the data itself.

As in MUTUR (in the implicit code), the 'control vector' is used here; this is a bit-string of ones and zeroes which controls the storage of data, depending on the presence (or absence) of one-bits in the string. In the CHARAC routine the control vector has as many bits in it as the plane has elements (IL*KL/2). Elements in this plane are updated depending upon whether the corresponding element of the control vector is a one or not. The control vectors may be manipulated and combined using the bit logical operations .NOT., .AND., .OR., etc.
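
A control vector can be modelled as a list of booleans, one per element of the plane. The sketch below is not STAR FORTRAN; `controlled_store` and the variable names are hypothetical and only mimic the behaviour described above.

```python
# Control vectors modelled as lists of booleans, one per element of the plane.
# controlled_store and the variable names are illustrative, not STAR FORTRAN.

def controlled_store(control, target, source):
    """Store source into target only where the control bit is one."""
    return [s if bit else t for bit, t, s in zip(control, target, source)]

y = [10.0, 20.0, 30.0, 40.0]
z = [1.0, 2.0, 3.0, 4.0]

bit_a = [v > 15.0 for v in y]                     # conditional test -> control vector
bit_b = [v < 35.0 for v in y]
both = [p and q for p, q in zip(bit_a, bit_b)]    # .AND.
either = [p or q for p, q in zip(bit_a, bit_b)]   # .OR.
neither = [not p for p in either]                 # .NOT.

y = controlled_store(both, y, z)
print(both)    # [False, True, True, False]
print(y)       # [10.0, 2.0, 3.0, 40.0]
```

Note that the arithmetic producing `z` is carried out for every element; the control vector only decides which results are kept.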

A brief glance at the listing of CHARAC in appendix C will show that, for the most part, the scalar variable names have been retained from the original CHARAC but have become DYNAMIC variables representing arrays (usually of IL*KL/2 size). This can create some confusion on the part of analysts familiar with the original version, and thus some care must be used in reading the listing to remember that almost all operations shown are array operations and not scalar operations.

There are two FORTRAN constructs used here that do not appear in the implicit code:

IF(BITO) Y(*,*,*)=Z(*,*,*)

and

Y(*,*)=Z(JLIST(*,*))

The first construct moves elements from the array Z to the array Y depending upon the presence of a one-bit in the control vector BITO. The second construct subscripts the array Z with an array JLIST. This operates as follows:

The first element of JLIST, JLIST(1,1), is used as an integer subscript for the array Z. The element at that position in Z is then moved to the first element in the array Y, Y(1,1). The next element of JLIST, JLIST(2,1), is then used as a subscript of Z and the data moved to Y(2,1). This continues until all elements of JLIST and Y are processed. Obviously JLIST and Y must be conformal, but Z need not be. If a subscript is out of range of Z, no error message is created.
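
The subscripting construct is an index-list gather. A minimal sketch follows, with flat lists standing in for the arrays and 1-based subscripts kept as in FORTRAN; the `gather` helper is hypothetical.

```python
# Sketch of Y = Z(JLIST): every entry of the index list selects one element of Z.
# Subscripts are 1-based as in FORTRAN; flat lists stand in for the arrays.

def gather(z, jlist):
    """Return y with y[n] = z[jlist[n] - 1] for each entry of jlist."""
    return [z[j - 1] for j in jlist]

z = [100.0, 200.0, 300.0, 400.0, 500.0]
jlist = [3, 1, 1, 5]          # need not be ordered or distinct
y = gather(z, jlist)
print(y)                      # [300.0, 100.0, 100.0, 500.0]
```

The result has the shape of the index list, not of `z`. Python applies its own subscript checking here, which differs from the unchecked behaviour described above.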

A third construct employs the Q8xxxx in-line call for machine language instructions, which is referenced but not illustrated in the FORTRAN specification. In the CHARAC routine the call

Q8NOBITS(BITO)

determines if there are any one-bits in the string BITO. If there are none, the condition is TRUE and a branch may be triggered by that condition. This call is used to 'bail out' of the DO loops when all elements in the I-K plane have become inactive, to prevent unnecessary passes through the DO loops.

Examining one example of the vectorization technique illustrates how these constructs help in 'parallelizing' CHARAC.

DO loop 10, lines 1730 through 2060 of VCHARAC, shows the means used to process each element in the plane separately, while still performing vector operations.

First, the starting value of the DO loop index could be different for each element in the plane depending on values calculated for elements of JLIST in the preceding loop. Thus the DO 10 loop must be viewed as a parallel set of DO 10 loops (as a matter of fact, a whole plane's worth of DO loops), each with a potentially different starting index. The DO 10 shown is then set up to start at the earliest index found in JLIST (via the function Q8SMIN, which returns the scalar minimum of the entire vector JLIST), and the loop 10 could potentially end at JL, unless ended by a 'bail out' earlier.

Once launched into the loop, the immediate need is to find out which elements in the I-K plane to process: first of all, which elements have indexes in JLIST that have not yet reached the limit JL. This is accomplished by the statement at 1870
\[
\text{BIT4} = (\text{JLIST.LE.JL})
\]
which forms a control vector at vector rates from the conditional test shown. In statement 1880 any elements (bits) from this control vector are eliminated if they have already been 'deactivated' in previous loops. This information is carried in the bit-string BIT5 (and BIT3), and the logical .AND. operation thus provides a BIT4 which represents all active elements at this time.

The 'bail out' check is then made at statement 1890, to skip processing entirely if all elements are inactive.

Statements 1900 through 1930 deal with temporary data areas where there is no need to worry about controlling the data storage. Likewise the updating of YJK2 temporary data need only be controlled by the condition YJK2.GT.Y2. Note that the form shown in lines 1940 and 1950 could be replaced by the original

IF (YJK2.GT.Y2) YJK2=Y2

However, the desire was to show explicitly how the hardware actually performs the operation at this point in the listing. In later examples the original scalar form is preserved and the programmer must be aware of its vector/control vector nature.

Lines 1960 through 2010 then update key data arrays based on the contents of the control vector. Finally at line 2030, a test is made to determine if other elements should become inactive. For all remaining active elements, the individual indices are then updated in JLIST, and a return is made to the beginning of the loop. It can be seen that elements can become 'deactivated' in this loop by having their individual index reach the limit JL, or by having the computed value of YJK2 for that particular element become equal to the value of Y2.

The computations in lines 1960 through 2010 involve the use of the list of indexes JLIST rather than the scalar DO variable. Thus a potentially different value of \(Y\) may be used in the processing for each element of the I-K plane. The operation implied by this involves performing gather operations from the array \(Y\). The Map Unit can perform both gather operations for a single statement such as 1980:

IF (BIT4) VIJK2=WT1*HYV(JLIST)+WT2*HYV(JLIST+1)

In this example one map and one vector operation will be generated by the compiler.
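
The structure of such a loop (per-element starting indexes, an active mask, a bail-out test, gathers through the index list, and controlled stores) can be sketched as below. This is a simplified analogue and not a transcription of loop 10: the search condition, the names `masked_search`, `scalar_search` and `ygrid`, and the 0-based indexing are all assumptions made for illustration.

```python
# Simplified analogue of a "plane's worth" of DO loops run as one masked loop.
# Each element n looks for the first j >= start[n] with ygrid[j + 1] >= target[n].

def masked_search(start, target, ygrid):
    jl = len(ygrid) - 2                  # last admissible index
    jlist = list(start)                  # individual loop indexes
    done = [False] * len(start)          # elements already deactivated
    found = [None] * len(start)
    for _ in range(min(jlist), jl + 1):  # start at the earliest index
        active = [(j <= jl) and not d for j, d in zip(jlist, done)]
        if not any(active):              # 'bail out' when nothing is left
            break
        upper = [ygrid[min(j, jl) + 1] for j in jlist]             # gather
        hit = [a and u >= t for a, u, t in zip(active, upper, target)]
        found = [j if h else f for h, j, f in zip(hit, jlist, found)]  # controlled store
        done = [d or h for d, h in zip(done, hit)]
        jlist = [j + 1 if (a and not h) else j
                 for j, a, h in zip(jlist, active, hit)]
    return found

def scalar_search(start, target, ygrid):
    """Reference: one ordinary loop per element."""
    jl = len(ygrid) - 2
    out = []
    for j0, t in zip(start, target):
        res = None
        for j in range(j0, jl + 1):
            if ygrid[j + 1] >= t:
                res = j
                break
        out.append(res)
    return out

ygrid = [0.0, 1.0, 2.0, 3.0, 4.0]
start = [0, 2, 1, 0]
target = [2.5, 2.5, 0.5, 9.0]
print(masked_search(start, target, ygrid))   # [2, 2, 1, None]
print(scalar_search(start, target, ygrid))   # [2, 2, 1, None]
```

Every pass operates on the full plane even when only one element is still active, which is the source of the inefficiency discussed in the next section.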

The use of index lists and control vectors throughout the other loops in CHARAC follows the same pattern described here. In some cases the DO loop limits rather than the DO indexes themselves are separate entities, and in the case of LOOP 38 both the starting and ending conditions are individualized for every element in the I-K plane.

\subsection*{6.0 IMPLICATIONS}

The method of vectorization shown here appeared to be the most straightforward one available. The use of control vector techniques is not without its penalties, however: as more passes are made through the loops, and more elements are deactivated, the efficiency of this scheme degenerates quickly. NASA and Control Data mathematicians firmly believe that in the domain of real problem solutions, the number of iterations through each loop would be from 3 to 5, as the 'waves' tend to travel closely together in real physical solutions. In this instance then, the unused calculations in the I-K plane would be discarded, but would still take processing time. If a \(1 / 2\) reduction in elements is assumed for each pass until the last, then the number of useful operations would be
```

Pass 1---IL*KL/2
Pass 2---IL*KL/4
Pass 3---IL*KL/8
Pass 4---IL*KL/16
Pass 5---IL*KL/32

```
while the number of actual operations would be constant for each pass at IL*KL/2 operations. In five passes 5*IL*KL/2 operations would have been done, but only 31*IL*KL/32 operations would have contributed to actual results. This reduces the floating point rate potential of the vectorization to \(31 / 80\) (about 39 percent) of its full value.
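
The 31/80 figure follows directly from the pass counts. A short check with exact fractions, working in units of IL*KL and using the halving assumption stated above:

```python
# Useful versus actual operations over five passes, in units of IL*KL.
from fractions import Fraction

PASSES = 5
plane = Fraction(1, 2)   # operations per pass: IL*KL/2

useful = sum(plane / 2**p for p in range(PASSES))   # 1/2 + 1/4 + ... + 1/32
actual = PASSES * plane
efficiency = useful / actual

print(useful, actual, efficiency)   # 31/32 5/2 31/80
```

With only three passes the same arithmetic gives (7/8)/(3/2) = 7/12, so the loss is smaller when the waves resolve quickly.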

If it turns out that the number of passes required to resolve all the waves is quite large and the number of residual active elements remaining from pass to pass is quite small, then another approach would be called for. In this instance, the structure of the program would remain the same, but instead of performing controlled storage operations, the choice would be to perform compress and merge operations using the control vector to squeeze the I-K plane down to only the active elements each pass.
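
The compress and merge alternative can be sketched as follows. The helper names and the stand-in computation (multiplying by ten) are hypothetical; the point is only that the arithmetic runs over the active elements alone.

```python
# Compress the plane to its active elements, work on the short vector, merge back.
# compress/merge are illustrative helpers, not the STAR or FMP instructions themselves.

def compress(control, values):
    """Keep only the elements whose control bit is one."""
    return [v for bit, v in zip(control, values) if bit]

def merge(control, updated, original):
    """Put updated values back at the one-bit positions, keep the rest."""
    it = iter(updated)
    return [next(it) if bit else o for bit, o in zip(control, original)]

plane = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
active = [False, True, False, False, True, True]

short = compress(active, plane)            # [2.0, 5.0, 6.0]
short = [10.0 * v for v in short]          # arithmetic only on active elements
plane = merge(active, short, plane)
print(plane)    # [1.0, 20.0, 3.0, 4.0, 50.0, 60.0]
```

The trade-off is that each pass now pays for a compress and a merge and works on shorter vectors, so the approach only wins when few elements remain active over many passes.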

In the section on performance evaluation of the metric codes (see Division 1) the impact of this vectorization technique can be seen more clearly.
```
*DECK VLYI
      SUBROUTINE VLYI
      COMMON /A11/ RHO(31,31,31), RHOU(31,31,31), RHOV(31,31,31)
      COMMON /A13/ U(31,31,31), V(31,31,31), W(31,31,31)
      COMMON /A8/ ISMTHX,ISMTHY,ISMTHZ,LYICNT,LYCCNT,LZCCNT,LZICNT,
     1 NLYI,NLZI,BETA,BETAI,CRKNIS
      DIMENSION UP(31,23,3),VP(31,23,3),
     1 UI(31,23),VI(31,23),WI(31,23)
      DIMENSION ETA(31,23), RKAPPA(31,23), RLMBDA(31,23)
      DIMENSION DYI(31,23),DZI(31,23), DTORDY(31,23)
      DIMENSION GUPK2(31), GVPK2(31), GWPK2(31)
      DIMENSION DYSELL(31,23)
      DATA FORTH/1.3333333333333/
      CALL ROTATE (RHO ,IL, JL, KL)
      CALL ROTATE (RHOU,IL, JL, KL)
      CALL ROTATE (RHOV,IL, JL, KL)
      CALL ROTATE (RHOW,IL, JL, KL)
      CALL ROTATE (E   ,IL, JL, KL)
      CALL ROTATE (U   ,IL, JL, KL)
      CALL ROTATE (V   ,IL, JL, KL)
      CALL ROTATE (W   ,IL, JL, KL)
      CALL ROTATE (RMUL,IL, JL, KL)
      LYICNT = LYICNT + 1
      K4=2
C     DX2 = .5*DXI
C     SETUP VECT EQUIV OF DYI,DZI, AND DYCELL

C
C*** NOTE - THE INDEX ON DELTA V AND DELTA U IS OUT OF BOUNDS
C

      ETA(*,J1:J2) = TANTH*RMU(*,J1:J2)*DYI(*,J1:J2)

C
C* NEED TO MODIFY OUTPUT OF GI FOR THE J=1 CASE
C
      FVSQ(*,1)=0.0
      J1=JS1
      J1M=J1-1
      J2M=J2-1
      J2P=J2+1
      J1P=J1+1

C
C***** CALL DIAGON
C
      CRKNIS = 1./NLYI

      CCE(*,J1:J2) = -DTORDY(*,J1:J2)*RKAPPA(*,J1P:J2P)*CRKNIS

      J2 = JE1
      J1P = J1+1
C     CRKNIS = 1./NLYI
```

\begin{tabular}{|c|l|c|}
\hline C & DEFINE K2 & 003880 \\
\hline & K2 = JE1 & 003890 \\
\hline & CONTINUE & 003910 \\
\hline C & & 003920 \\
\hline C & SPECIAL B.C. AGAIN & 003930 \\
\hline C & & 003940 \\
\hline & J=JE1 & 003950 \\
\hline & GVPK2(*) = (A3*RMU(*,J)*(VI(*,J-1)- & 004020 \\
\hline & SBC(*,I,3) = SBC(*,I,3) + (GVPK2(*) & 004080 \\
\hline & SBC(*,I,4) = SBC(*,I,4) + GWPK2(*) & 004110 \\
\hline & (USQ(*,J-1) - USQ(*,J)) * FORTH - & 004140 \\
\hline C & & 004210 \\
\hline C & & 004230 \\
\hline C & & 004250 \\
\hline C & ********** CALL PRSETY (I,K,JS1,JE1,2) & 004260 \\
\hline C & & 004270 \\
\hline & 2 - W(*,JS1:JE1,I)**2) & 004380 \\
\hline & IF(JS1.LE.2) P(*,1)=P(*,2) & 004410 \\
\hline C & & 004420 \\
\hline C & ********** CALL BCY (K,I,1,JS1,JE1) & 004430 \\
\hline C & & 004440 \\
\hline & U(*,1,I) = U(*,2,I) & 004450 \\
\hline & EI(*,1,I) = EI(*,2,I) & 004480 \\
\hline & IF(I.LT.ILE) GO TO 253 & 004490 \\
\hline & W(*,1,I) = -W(*,2,I) & 004500 \\
\hline
\end{tabular}










\section*{VECTORIZED VCHARAC ROUTINE}
SUM1=0.0 000460

DEFINE (JLIST,(1:IL,1:KL)) 000470

DO 5 JC=2,JCL 000490

BITO=(YJCB.LT.YL.AND.YJCT.GT.YL) 000500

C 000510

C THIS STATEMENT GENERATES ONE VECTOR OPERATION WHICH CREATES 000520

C TWO BIT STREAMS REPRESENTING THE CONDITIONS BEING TESTED FOR 000530

C THE ENTIRE I*K PLANE AT JC. 000540

C THE BIT STRINGS ARE THEN ANDED TOGETHER BY THE SCALAR UNIT 000550

C AT THE RATE OF 64 BITS EVERY FOUR CLOCK CYCLES, AND THE 000560

C CONSEQUENT BRANCH INSTRUCTION EXECUTED. 000570

C WHERE Q8NOBITS IS A MACHINE LANGUAGE CALL TO THE DATA FLAG 000610

C BRANCH TEST. ANY FUNCTION BEGINNING WITH THE SYMBOLS Q8 000620

C IDENTIFIES AN IN-LINE MACHINE LANGUAGE INSTRUCTION, WHICH 000630

C CANNOT BE INVOKED BY NORMAL FORTRAN 000640

C THIS CONSTRUCT GENERATES THE MACHINE LANGUAGE CONTROL VECTOR 000680

C OPERATION WHEREIN THE ARITHMETIC STATEMENT IS EXECUTED 000700

C FOR EVERY ELEMENT OF THE I*K PLANE WHERE THE CORRESPONDING 000710

C BIT IN THE STRING BITO IS A ONE. 000720
\begin{tabular}{|c|c|c|}
\hline C & & 000730 \\
\hline & DEFINE (JLIST,(1:IL,1:KL)) & 000740 \\
\hline & JLIST=0 & 000750 \\
\hline C & & 000760 \\
\hline C & JLIST WILL BE USED AS AN INDEX VECTOR DEPENDING ON THE VALUES & 000770 \\
\hline C & GENERATED BY THE FOLLOWING: & 000780 \\
\hline C & & 000790 \\
\hline & DO 3 JJ=1,JL & 000800 \\
\hline &  & 000810 \\
\hline & IF (Q8NOBITS(BIT1)) GOTO 4 & 000820 \\
\hline & J=J+1 & 000830 \\
\hline & BIT1=BIT1.AND.BIT0 & 000840 \\
\hline C & & 000850 \\
\hline C & WE ARE ONLY GOING TO PROCESS ELEMENTS IN THE PLANE THAT ARE & 000860 \\
\hline C & STILL ACTIVE & 000870 \\
\hline C & & 000880 \\
\hline &  & 000890 \\
\hline C & & 000900 \\
\hline C & WHEREVER THE CONDITION IS MET, JLIST WILL BE UPDATED & 000910 \\
\hline C & & 000920 \\
\hline 3 & CONTINUE & 000930 \\
\hline 4 & CONTINUE & 000940 \\
\hline C & & 000950 \\
\hline C & WE NOW HAVE A LIST OF INDEXES FOR EVERY I-K ELEMENT THAT IS & 000960 \\
\hline C & ACTIVE. THIS LIST CAN BE USED AS A MEANS TO GATHER & 000970 \\
\hline C & THE CORRESPONDING ELEMENTS FROM VARIOUS FLOATING POINT ARRAYS & 000980 \\
\hline C & & 000990 \\
\hline &  & 001000 \\
\hline C & & 001010 \\
\hline C & THIS OPERATION GENERATES ONE GATHER OPERATION USING JLIST INDICES & 001020 \\
\hline C & INTO THE ARRAY Y & 001030 \\
\hline C & & 001040 \\
\hline & WT2=1.0-WT1 & 001050 \\
\hline & RI(JC)=WT1*HYRHO(JLIST)+WT2*HYRHO(JLIST+1) & 001060 \\
\hline & PI(JC)=WT1*HYP(JLIST)+WT2*HYP(JLIST+1) & 001070 \\
\hline &  & 001080 \\
\hline &  & 001090 \\
\hline & DYC=DT*CI(JC) & 001100 \\
\hline C &  & 001110 \\
\hline C & REMEMBER AT THIS POINT THAT RI, PI, VI, AND CI CONSIST OF & 001120 \\
\hline C & PLANES OF DATA IL BY KL IN SIZE, WITH A PLANE FOR & 001130 \\
\hline C & EVERY ACTIVE VALUE OF JC & 001140 \\
\hline C & & 001150 \\
\hline & YJCB=YJCT & 001160 \\
\hline & YJCT=YJC+0.5*DYC & 001170 \\
\hline & DYI(JC)=YJCT-YJCB & 001180 \\
\hline & BIT2=(JC.LE.NDT+1) & 001190 \\
\hline & IF (BIT2) SUM1=SUM1+PI(JC) & 001200 \\
\hline & IF (BIT2) SUM3=SUM3+VI(JC) & 001210 \\
\hline 5 & CONTINUE & 001220 \\
\hline 6 & CONTINUE & 001230 \\
\hline & DEFINE (JCADD,(1:IL,1:KL)) & 001240 \\
\hline & JCADD=0 & 001250 \\
\hline &  & 001260 \\
\hline & IF (BIT3) JCADD=(NDT+3-2*JCLN) & 001270 \\
\hline & JCMAX & 001280 \\
\hline & & 001290 \\
\hline C & WRITE STATEMENT ELIMINATED HERE TEMPORARILY & 001300 \\
\hline C &  & 001310 \\
\hline & IF (Q8NOBITS(.NOT.BIT2)) GOTO 12 & 001320 \\
\hline C & & 001330 \\
\hline C & BIT2 INDICATES WHERE THERE WERE VALUES GREATER THAN ZERO IN & 001340 \\
\hline C & JCADD. IF THE ENTIRE PLANE OF JCADD IS ZERO SKIP THE & 001350 \\
\hline
\end{tabular}
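The listing above leans on three vector idioms: a bit string (control vector) that switches an arithmetic statement on or off element by element, a test that lets the code branch when no element of the plane is active, and a gather through an index list. The following is a minimal sketch of those idioms in plain Python, not STAR FORTRAN; the function names and sample data are hypothetical and chosen only for illustration.

```python
def masked_add(values, increment, bits):
    """Control-vector update: add `increment` only where the bit is set."""
    return [v + increment if b else v for v, b in zip(values, bits)]


def no_bits(bits):
    """True when no element is active, so the caller can skip the work."""
    return not any(bits)


def gather(source, index_list):
    """Gather: pick source[i] for each i in index_list (0-based here)."""
    return [source[i] for i in index_list]


def interpolate(source, index_list, wt1):
    """Blend each gathered element with its next neighbour (weights wt1, 1-wt1)."""
    wt2 = 1.0 - wt1
    lower = gather(source, index_list)
    upper = gather(source, [i + 1 for i in index_list])
    return [wt1 * a + wt2 * b for a, b in zip(lower, upper)]


# Illustrative data only: a "plane" of four elements, two of them active.
yjc = masked_add([0.0, 1.0, 2.0, 3.0], 0.5, [True, False, True, False])
ri = interpolate([10.0, 20.0, 30.0, 40.0, 50.0], [0, 2, 3], 0.25)
```

In this sketch the mask and the index list are ordinary Python lists; on a vector machine they would be a bit string and an index vector processed by single instructions.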





\section*{DIVISION 4 \\ WEATHER/CLIMATE APPLICATION STUDY}

\subsection*{1.0 INTRODUCTION}

A portion of this study was devoted to analyzing operation of the FMP on codes other than aerodynamic flow simulations. For this purpose, weather/climate modeling was selected as the application, and NASA provided two codes for study. One was developed by the Goddard Institute for Space Studies (GISS); the other was developed at the Massachusetts Institute of Technology (MIT). Sections 2 and 4 present specific aspects and salient points of the GISS model and the MIT model, respectively. Section 3 briefly discusses the investigation of portions of the GISS code, and section 5 provides an analysis of the MIT (spectral) code. Background and significant events leading to these two models were supplied to NASA-Ames as a separate, incidental report.

These two models represent, to some extent, the current state of the art in dynamical forecasting with numerical methods. They differ chiefly in approach: the GISS model employs the finite difference method, whereas the MIT model employs the spectral method. Both are designed more for long-term prediction than for day-to-day weather forecasting. However, there is no fundamental reason why they could not be extended to predict daily weather. In the MIT model, the quasi-geostrophic approximation is imposed to reduce computation. For day-to-day global prediction, the quasi-geostrophic approximation is undesirable and would have to be relaxed. The Australian meteorological community, for its part, has successfully put the spectral method, without the quasi-geostrophic approximation, into operation for predicting weather.

Although both are more for climate than weather, the GISS model is mainly concerned with the dynamical development in the troposphere, the climate/weather in which man lives and that which is commonly discussed. The MIT model, on the other hand, is geared to the state of the upper atmosphere which certainly affects the troposphere below. Consequently, in the GISS model the effect of radiation, such as ozone absorption, is parameterized in some tractable manner for the purpose of calculating the circulation. In the MIT model, the production and destruction, as well as transportation, of ozone are computed explicitly using the dynamical model as a vehicle. Table 1 tabulates some of their differences for a closer comparison.

Symbols used in this report are listed and defined in table 2.

TABLE 1. Characteristics of the GISS and MIT Models
\begin{tabular}{|c|c|c|}
\hline Characteristic & GISS Model & MIT Model \\
\hline Method & Finite difference & Spectral \\
\hline Prediction time scale & Medium (wks/mos) & Very long (yrs) \\
\hline Dynamic system & Primitive & Quasi-geostrophic \\
\hline Time step ( \(\Delta t\) ) & 5 minutes & 1 hour \\
\hline Vertical coordinate & \(\sigma\) & \(\ln P\) \\
\hline \multicolumn{3}{|l|}{Number of layers} \\
\hline Total & 9 & 25 \\
\hline In troposphere & 8 & 6 \\
\hline In stratosphere & 1 & 19 \\
\hline Pressure at top & 10 mb & 0.04 mb \\
\hline Height at top & 30 km & 72 km \\
\hline \multicolumn{3}{|l|}{Horizontal resolution} \\
\hline Grid (lat \(\times\) long, degrees) & \(4 \times 5\) & \(\sim 15 \times 15\) \\
\hline Waves & \(\sim 20\) & 6 \\
\hline
\end{tabular}

TABLE 2. Symbol Definitions
\begin{tabular}{|c|c|}
\hline a & radius of earth \\
\hline C & rate of condensation \\
\hline \(c_{p}\) & specific heat at constant pressure \\
\hline E & rate of evaporation \\
\hline F & horizontal frictional force \\
\hline f & Coriolis parameter \((=2 \omega \mu)\) \\
\hline G & production/destruction rate of ozone \\
\hline \(G^{\prime}\) & deviation from horizontal average \\
\hline \(\mathrm{H}_{0}\) & scale height \\
\hline \(\mathrm{h} \nu\) & an ultraviolet photon \\
\hline \(h(Z)\) & an empirical infrared cooling coefficient \\
\hline I & solar flux incident on the top of the atmosphere \\
\hline J & meridional wave number resolution \\
\hline k & vertical unit vector \\
\hline \(K_{d}\) & vertical diffusion coefficient \\
\hline \(k_{0}, k_{1}, k_{3}, k_{4}, k_{5}\) & parameters of chemical reaction \\
\hline M & planetary wave number resolution \\
\hline N & number of ozone molecules in the column \\
\hline P & pressure \(\div 1000 \mathrm{mb}\) \\
\hline \(\mathrm{P}_{\mathrm{m}, \mathrm{n}}\) & associated Legendre polynomial \\
\hline \(p\) & pressure \\
\hline \(p_{t}\) & pressure at top of model atmosphere, constant \\
\hline \(p_{s}\) & pressure at bottom of model atmosphere \\
\hline Q & heating rate per unit mass \\
\hline q & water vapor mixing ratio or specific humidity \\
\hline R & gas constant \\
\hline S & stability \\
\hline T & temperature \\
\hline T* & an "equilibrium" temperature \\
\hline t & time \\
\hline \(T^{\prime}, \phi^{\prime}\) & deviation of temperature and geopotential from the standard atmosphere distributions \\
\hline V & horizontal velocity \\
\hline W & \(\mathrm{d} Z / \mathrm{dt}\) \\
\hline \(\chi_{j}\) & number mixing ratio of species j \(\left(\mathrm{j}=\mathrm{O}_{3}, \mathrm{OH}, \mathrm{HO}_{2}, \mathrm{NO}_{2}\right)\) \\
\hline \(\chi_{\mathrm{O}_{3}}\) & ozone number mixing ratio \\
\hline \(\bar{\chi}_{\mathrm{O}_{3}}\) & horizontal average \\
\hline \(\chi_{\mathrm{O}_{3}}^{\prime}\) & deviation from horizontal average \\
\hline
\end{tabular}
\begin{tabular}{|c|c|}
\hline \(Y_{m, n}\) & spherical harmonics of order \(m\), degree n \\
\hline Z & vertical coordinate \((=-\ln P)\) of the MIT model \\
\hline \(\alpha\) & absorption coefficient \\
\hline \(\beta\) & mass of an average molecule \\
\hline \(\gamma\) & catalyst \\
\hline \(\epsilon\) & energy of a photon of wavelength \(\Lambda\) \\
\hline \(\zeta\) & solar zenith angle \\
\hline \(\eta\) & number density of the neutral atmosphere \\
\hline \(\theta\) & potential temperature \\
\hline \(\Lambda\) & optical wavelength \\
\hline \(\lambda\) & longitude (counted eastward from Greenwich) \\
\hline \(\mu\) & sine of latitude \\
\hline \(\xi\) & vorticity \\
\hline \(\pi\) & \(\mathrm{p}_{s}-\mathrm{p}_{\mathrm{t}}\) \\
\hline \(\rho\) & density \\
\hline \(\sigma\) & vertical coordinate \(\left\{=\left(p-p_{t}\right) /\left(p_{s}-p_{t}\right)\right\}\) of the GISS model \\
\hline \(\tau\) & index for simulated time \\
\hline \(\Phi\) & geopotential \\
\hline \(\psi\) & stream function for horizontal velocity \\
\hline \(\psi_{m}, \psi_{m, n}\) & Fourier and spectral coefficients of \(\psi\) \\
\hline \(\psi_{m}^{*}, \psi_{m, n}^{*}\) & complex conjugate \\
\hline \(\omega\) & angular velocity of the earth \\
\hline \(\partial \mathrm{X} / \partial \mathrm{P}\) & velocity potential for horizontal velocity \\
\hline \(\nabla\) & spherical gradient operator \\
\hline
\end{tabular}

\subsection*{2.0 THE GISS MODEL}

The GISS model is a 9-layer primitive equation model. Its evolution began with models developed at UCLA by Arakawa and Mintz, in particular their 3-level model. Consequently, the GISS model shares the overall structure of the UCLA model and retains the vertical coordinate formulation, the Arakawa scheme with advective quasi-conservation of important quadratic quantities, as well as much of the UCLA representation of physical processes occurring at or near the lower boundary of the atmosphere.

It differs from the UCLA model, however, not only in having finer vertical resolution, but also in its treatment of four crucial areas of physical processes: moist convection, turbulent subgrid-scale processes, solar radiation, and long-wave (terrestrial) radiation. Because the GISS model was used for observing-system simulation experiments, asynoptic data assimilation studies, and experimental long-range forecasting, it was bound neither by the very high horizontal resolution nor by the stringent real-time requirements of operational numerical weather prediction. Also, aspects that matter only on very long time scales are simplified, because the GISS model is intended for medium-range forecasting. The model was verified by short-range forecasts and by integrations intermediate in length between the few days of operational prediction and the seasonal, annual, or multi-annual periods of climate simulation.

It includes a realistic distribution of continents, oceans, and topography. Detailed calculations of energy transfer by solar and terrestrial radiation make use of cloud and water vapor fields calculated by the model. The hydrological cycle of the model includes two precipitation mechanisms: large-scale supersaturation and a parameterization of subgrid-scale cumulus convection.

Numerical integration requires about 70 minutes of IBM 360/95 computer time for each simulated day. In the interest of computational economy, most of the model's representations of physical processes (other than advection) are calculated only every half-hour of simulated time. These processes include surface interaction and hydrology. Solar and terrestrial radiation calculations are performed less often still, only every 2 hours of simulated time.

The system of governing equations for the GISS model is displayed in figure 1. It consists of the following:
(1) the equation of motion,
(2) the equation of continuity,
(3) the equation of state,
(4) the first law of thermodynamics,
(5) the hydrostatic approximation,
(6) the conservation equation of water vapor.

The behavior of the atmosphere is represented primarily by the following dependent variables:
- the temperature,
- the specific humidity (mixing ratio) of water vapor,
- the difference of surface pressure and pressure at model top,
- the horizontal velocity with the components zonal wind and meridional wind.

The basic independent variables are:
- time,
- latitude,
- longitude,
- the vertical coordinate.

The atmosphere is modeled with a vertical resolution of 9 evenly divided layers. The top of the atmosphere is taken as the 10 mb isobaric surface. It should be noted, however, that in the treatment of terrestrial radiation the number of layers is doubled and, in addition, 2 more layers are introduced to extend the model atmosphere from 10 mb to 1 mb, in order to obtain better estimates of the radiative fluxes.
\[
\begin{align*}
& \frac{\partial}{\partial \mathrm{t}} \mathrm{~V}=-\left\{(\mathrm{V} \cdot \nabla) \mathrm{V}+f \mathrm{k} \times \mathrm{V}+\nabla \Phi+\frac{\sigma}{\rho} \nabla \pi+\dot{\sigma} \frac{\partial}{\partial \sigma} \mathrm{V}\right\}+\mathrm{F}  \tag{1}\\
& \frac{\partial}{\partial \mathrm{t}} \pi=-\left\{\nabla \cdot(\pi \mathrm{V})+\frac{\partial}{\partial \sigma}(\pi \dot{\sigma})\right\}  \tag{2}\\
& \frac{\mathrm{p}}{\rho}=\mathrm{RT}  \tag{3}\\
& \frac{\partial}{\partial \mathrm{t}} \theta=-\left\{(\mathrm{V} \cdot \nabla) \theta+\dot{\sigma} \frac{\partial}{\partial \sigma} \theta\right\}+\frac{1}{\mathrm{c}_{\mathrm{p}}} \frac{\theta}{\mathrm{~T}} \mathrm{Q}  \tag{4}\\
& \frac{1}{\pi} \frac{\partial}{\partial \sigma} \Phi=-\frac{1}{\rho}  \tag{5}\\
& \frac{\partial}{\partial \mathrm{t}} \mathrm{q}=-\left\{(\mathrm{V} \cdot \nabla) \mathrm{q}+\dot{\sigma} \frac{\partial}{\partial \sigma} \mathrm{q}\right\}+(\mathrm{E}-\mathrm{C}) \tag{6}
\end{align*}
\]

Figure 1. System of Equations for GISS Model
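The vertical structure described before figure 1 can be made concrete with the definition of the vertical coordinate given in table 2, \(\sigma=(p-p_t)/(p_s-p_t)\). The sketch below computes layer-edge pressures for 9 layers and a 10 mb top. It assumes that "evenly divided" means equal increments of \(\sigma\) and uses a surface pressure of 1000 mb purely as an illustrative value; neither assumption is confirmed by the report.

```python
def sigma_layer_edges(n_layers):
    """Edges of n_layers equal sigma intervals, from 0 (top) to 1 (surface)."""
    return [k / n_layers for k in range(n_layers + 1)]


def pressure_from_sigma(sigma, p_surface, p_top):
    """Invert sigma = (p - p_top) / (p_surface - p_top) for p."""
    return p_top + sigma * (p_surface - p_top)


P_TOP_MB = 10.0        # model top, from the text
P_SURFACE_MB = 1000.0  # assumed surface pressure, illustration only

edges = [pressure_from_sigma(s, P_SURFACE_MB, P_TOP_MB)
         for s in sigma_layer_edges(9)]
```

Under these assumptions each layer spans the same pressure interval at a given grid point, and the interval changes from point to point as the surface pressure changes.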

The horizontal grid of the model has the configuration of a Mercator projection with increments in latitude of 4 degrees and in longitude of 5 degrees. With the 9 layers, this results in a total of 29,160 grid points, each having 4 points associated with it as drawn in figure 2. This way, variables of identical location in true physical space are attached to different points on the grid mesh. This placement is of great importance in maintaining the numerical stability of the integration, as well as simulating planetary flows.
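The grid-point total quoted above follows directly from the stated increments. The short sketch below reproduces the arithmetic; the function name is hypothetical, and counting 180/4 latitude intervals (rather than latitude rows including both poles) is an assumption made so that the product matches the quoted figure.

```python
def grid_point_count(dlat_deg, dlon_deg, n_layers):
    """Return (latitude intervals, longitude points, total grid points)."""
    n_lat = round(180 / dlat_deg)
    n_lon = round(360 / dlon_deg)
    return n_lat, n_lon, n_lat * n_lon * n_layers


counts = grid_point_count(4, 5, 9)
```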










Figure 2. Finite Difference Grid for GISS Model

In older versions, the grid of the model contained an equal number of longitudinal points at all latitudes. This means that a grid square near the pole occupies less area than a grid square near the equator. The grid was later modified so that the relative areas could be readjusted. The result is known as a split grid, in which the number of longitudinal points can be selected appropriately at each latitude.

In the prediction equations (1), (2), (4), and (6), the terms on the right-hand side fall into two categories: dependent terms and source terms (refer to figure 1). Specifically, \(Q\) in (4) and \(C\) and \(E\) in (6) are source terms. They arise from physical processes and are generally small in magnitude relative to the dependent terms, which arise from the conservation and continuity properties of the atmosphere. Consequently, dependent terms are evaluated at every time step but source terms are not. Source terms that come from radiation are smaller still. To conserve computation, source terms other than radiation are evaluated once every 6 time steps and radiation once every 24 time steps. The time step is chosen to be 5 minutes.
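The update intervals implied by these step counts can be checked with a few lines of arithmetic. This is a sketch only; the function and key names are hypothetical.

```python
def update_schedule(dt_minutes=5, source_every=6, radiation_every=24):
    """Intervals and per-day counts implied by the time step and step counts."""
    steps_per_day = (24 * 60) // dt_minutes
    return {
        "steps_per_day": steps_per_day,
        "source_interval_min": dt_minutes * source_every,
        "radiation_interval_min": dt_minutes * radiation_every,
        "source_updates_per_day": steps_per_day // source_every,
        "radiation_updates_per_day": steps_per_day // radiation_every,
    }


schedule = update_schedule()
```

With a 5-minute step this gives source-term updates every 30 minutes and radiation updates every 2 hours of simulated time, in agreement with the half-hour and 2-hour figures quoted in section 2.0.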

For time integration at each time step, a first estimate \(B\), serving as a predictor, is obtained by forward differencing in time
\[
B(n)=A(n-1)+\Delta t * D(A(n-1))
\]
where \(A\) represents the appropriate physical quantity, such as one of the primary dependent variables; \(n\) is the index of a particular time step; and \(D(\cdot)\) represents the derivative, that is, the contribution from the appropriate terms on the right-hand side of the prediction equation. With such a first estimate \(B(n)\), a second, refined estimate \(A(n)\) is then obtained by backward differencing in time
\[
A(n)=A(n-1)+\Delta t * D(B(n))
\]
to serve as a corrector. This predictor-corrector cycle completes one time step of integration. The predictor and corrector are signified by the flags \(\mathrm{MRCH}=1\) and \(\mathrm{MRCH}=2\), respectively (see figure 3). In both cases, the space differencing is centered. For the first 2 time steps, however, \(D(\cdot)\) is evaluated as follows: for \(n=1\), by up-right uncentered space differencing, flagged as \(\mathrm{MRCH}=3\); and for \(n=2\), by down-left uncentered space differencing, flagged as \(\mathrm{MRCH}=4\).
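The two-stage time step above (a forward predictor followed by a backward corrector, a combination often called the Matsuno or Euler-backward scheme) can be sketched on a scalar test problem. The decay equation \(dA/dt=-A\) used here is an illustrative stand-in for the model's tendency terms, not part of the GISS code, and the function names are hypothetical.

```python
def predictor_corrector_step(a_prev, dt, deriv):
    """One time step: B = A + dt*D(A), then A_new = A + dt*D(B)."""
    b = a_prev + dt * deriv(a_prev)   # predictor (forward in time)
    return a_prev + dt * deriv(b)     # corrector (backward in time)


def integrate(a0, dt, n_steps, deriv):
    """Apply the predictor-corrector step n_steps times."""
    a = a0
    for _ in range(n_steps):
        a = predictor_corrector_step(a, dt, deriv)
    return a


def decay(a):
    """Illustrative tendency D(A) = -A."""
    return -a


one_step = predictor_corrector_step(1.0, 0.1, decay)
ten_steps = integrate(1.0, 0.1, 10, decay)
```

For this test problem each step multiplies the value by \(1-\Delta t+\Delta t^{2}\), so the scheme damps the solution slightly less than the exact exponential would over the same interval.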

Every sixth cycle of this marching process, an additional prediction cycle is imposed upon the already predicted value, with \(D(\cdot)\) now representing only the contribution of the physical processes (excluding radiation), as indicated by the box labeled COMP3 and COMP4 in figure 3. Every 24th cycle, the radiational contribution is included with the physical processes mentioned above. The initial quantities, represented by \(n=0\), are given as input data to start the marching process, which is summarized in figure 3.
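The marching sequence can be laid out step by step. The sketch below is one reading of the description above, not a transcription of figure 3: it assumes that steps 1 and 2 use only the uncentered flags (3 and 4), that every later step runs the predictor and corrector (flags 1 and 2), and that physics and radiation are appended on multiples of 6 and 24. All names are hypothetical.

```python
def marching_plan(n_steps, physics_every=6, radiation_every=24):
    """List (step, actions) pairs under the assumptions stated above."""
    plan = []
    for n in range(1, n_steps + 1):
        if n == 1:
            flags = [3]        # up-right uncentered differencing
        elif n == 2:
            flags = [4]        # down-left uncentered differencing
        else:
            flags = [1, 2]     # predictor, then corrector
        actions = ["MRCH=%d" % f for f in flags]
        if n % physics_every == 0:
            actions.append("physics")
        if n % radiation_every == 0:
            actions.append("radiation")
        plan.append((n, actions))
    return plan


plan = marching_plan(24)
physics_steps = [n for n, actions in plan if "physics" in actions]
```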


Figure 3. Marching Sequence for GISS Model

In the actual computation, the primary dependent variables do not stand by themselves. Rather, they appear in the form of pressure-area-weighted variables such as \(\pi A\). Accordingly, they have to be scaled by the factor \(\pi(\tau)\) at the beginning of each time step and de-scaled by the factor \(\pi(\tau+\Delta t)\) after the integration of the time step. Furthermore, in order to avoid reducing the time step in high latitudes, the zonal mass flux and zonal pressure force are smoothed longitudinally.
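The scaling and de-scaling described above amounts to a multiply before the step and a divide after it. The sketch below shows only the \(\pi\) weighting at a handful of grid points; the area factor is omitted, the tendency step is left out, and the names and sample values are hypothetical.

```python
def to_weighted(values, pi_start):
    """Scale each value by pi at the start of the step."""
    return [p * v for p, v in zip(pi_start, values)]


def from_weighted(weighted, pi_end):
    """De-scale each weighted value by pi at the end of the step."""
    return [w / p for p, w in zip(pi_end, weighted)]


temperature = [280.0, 290.0]   # illustrative values only
pi_tau = [990.0, 980.0]        # pi at the beginning of the step
pi_next = [992.0, 979.0]       # pi after the step

weighted = to_weighted(temperature, pi_tau)
descaled = from_weighted(weighted, pi_next)
```

Because the two factors differ, a variable whose weighted form is left unchanged over the step still changes after de-scaling, in inverse proportion to the change in \(\pi\).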

The distribution of variables over horizontal grid points adopted in the GISS model, shown in figure 2, is suitable for simulating the geostrophic adjustment. The space differencing for the nonlinear advective (dependent) terms in the equation of motion is constructed so as to maintain a constraint analogous (but not strictly equivalent) to mean square vorticity (enstrophy) conservation. In fact, the differencing for these terms reduces to the enstrophy- and kinetic-energy-conserving Jacobian scheme when the mass flux is non-divergent. The resulting integral quasi-conservation of enstrophy as well as kinetic energy is an effective aid to preserving the shape of the energy spectrum and the area enclosed by it.

\subsection*{3.0 GISS MODEL VECTORIZATION}

The GISS code is conceptually divided into two parts: the physics portion and the dynamics portion. The physics portion of the code is found in subroutine COMP3, while the dynamics is found in subroutines COMP1 and COMP2. The division of the dynamics into two separate routines was done mainly as a matter of convenience, to reduce the length of the source programs and to divide the workload among the team members. A byproduct of this division, however, is that COMP1 contains all of the data-dependent branches (which impact the vectorization process) that appear in the dynamics calculations.

COMP1 and COMP2 are readily vectorizable routines, yielding vectors of moderate length for the model in its present state of resolution. COMP3, however, consists of a collection of physical processes which are programmed in composites of many imbedded short loops (yielding very short, inefficiently processed vectors).

The COMP3 routine in the original scalar model was called once for every grid point, as opposed to being called to process a plane or subplane of grid points. This was due to the concern for storage of temporary data on the 7600 class machines that were being used for the model. Given a massive amount of memory, COMP3 can be directly recoded to process all grid points in a single call. Thus the short DO loops, induced by the inherent physics, will be applied to groups of data points rather than to a single point. This form of vectorization leaves the original computations almost exactly as they were in the original scalar model. However, the sheer size of the GISS model, and the amount of resources needed to recode it in the way the 3-D implicit code was recoded, make the effort impractical under the current study contract.
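The restructuring described above, from one call per grid point to one call covering all grid points, can be sketched as a loop interchange. The per-column computation below (a simple layer average) is a hypothetical stand-in for the physics in COMP3, chosen only so that the two forms can be compared; it is not taken from the GISS code.

```python
def column_physics_point(column):
    """Stand-in for per-point physics: a short loop over the layers."""
    total = 0.0
    for value in column:
        total += value
    return total / len(column)


def physics_per_point(grid):
    """Original structure: one call per grid point, short inner loops."""
    return [column_physics_point(column) for column in grid]


def physics_all_points(grid):
    """Restructured form: the short layer loop is outermost and each
    statement inside it spans every grid point (one long vector)."""
    n_layers = len(grid[0])
    total = [0.0] * len(grid)
    for k in range(n_layers):
        layer = [column[k] for column in grid]
        total = [t + v for t, v in zip(total, layer)]
    return [t / n_layers for t in total]


grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

The arithmetic performed for each point is unchanged; only the loop order differs, at the price of temporary storage for every grid point at once.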

Control Data was fortunate to possess a version of the GISS model which had been vectorized for the STAR-100 by David Soll and his team at the Goddard Institute for Space Studies. Since the basic programming characteristics of the STAR family and the FMP are the same, it was felt that the STAR version could be used for purposes of this report. In this model the physics processing in COMP3 is recoded to process multiple grid points together, as was discussed previously. Since the model and results of its operation have been the subject of several public disclosures (e.g. ref. 6) a decision was made to limit this investigation to two key routines which illustrate the behavior of the FMP when challenged by the GISS code.

It should be noted that although the STAR-100 version of the GISS code solves the same problem as the metric provided by Ames, and retains the same basic structure, some of the mathematics had to be revised to achieve effective vectorization of the model. This difference will become evident on examination of the AVRX and LINKHO routines, whose listings appear in appendices A and B. The other major difference between the metric provided by Ames and the STAR-100 model is the degree of resolution. The STAR-100 version uses a \(24 \times 16\) grid, while the Ames version uses a \(46 \times 72\) grid. As a result, vector lengths in the model studied by Control Data are shorter (384 versus 3312 in most instances). The shorter vector length happens to fall below the knee in the theoretical performance curve of the FMP, and thus yields considerably less than optimum results. Consequently, any extrapolation of FMP performance on the GISS model made from these results should be considered essentially 'worst case' for comparison with the model provided by Ames.
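The vector lengths follow from the product of the stated grid dimensions, and the effect of falling below the knee of a performance curve can be illustrated with the common half-performance-length model, in which a vector of length n reaches the fraction n/(n + n_half) of the peak rate. The value of n_half used below is hypothetical; the report's actual FMP performance curve is not reproduced here.

```python
def vector_length(n_rows, n_cols):
    """Length of a vector spanning one horizontal plane of the grid."""
    return n_rows * n_cols


def relative_rate(n, n_half):
    """Fraction of peak rate for vector length n (half-performance model)."""
    return n / (n + n_half)


N_HALF = 500  # hypothetical half-performance length, illustration only

star_len = vector_length(24, 16)
ames_len = vector_length(46, 72)
star_rate = relative_rate(star_len, N_HALF)
ames_rate = relative_rate(ames_len, N_HALF)
```

Whatever the true curve, a model of this shape shows why results obtained on the coarse grid understate what the finer grid would achieve.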

Figure 4 displays the original code for the routine AVRX, as it appeared in the Ames version of the model. Figure 5 is a listing of a version of AVRX for a climatology simulation with a coarse grid. This version was then adopted by David Soll for the STAR-100, the coding for which is found in figure 6, restated in figures 7 and 8 for the purposes of this discussion.

Figure 7 contains loops in \(J=2,5\) and \(J=12,15\), with 7(a) showing an imbedded loop on \(K=2,24\), while 7(b) shows a sequence solely for the elements at \(K=1\) and \(K=25\). The sequence in 7(b) can conceivably be executed in parallel with that in 7(a). If the double loop in 7(a), \(K=2,24\), is extended to \(K=1,26\), the whole double loop can be addressed with a single vector statement of length \(26 \times 4=104\), while the loop in 7(b) is processed by the Scalar Unit. The cost of this technique is that, for each J, elements \(K=1\), \(K=2\), \(K=24\), \(K=25\), and \(K=26\) must be saved beforehand, and afterwards \(K=1\) and \(K=25\) must be updated and \(K=26\) restored once the vector operation has finished. This saving and updating requires the use of gather and scatter operations.
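The idea of replacing an interior loop plus special end-point code by one uniform statement over a padded row can be sketched as follows. The first function follows the two-part form of figure 7 for one value of J; the second pads the row with wraparound neighbours so that every point is handled by the same expression. The sketch assumes that the updates use the values from before the pass (vector semantics) and that K=1 and K=25 hold the same longitude; the padding layout is an illustrative choice, not the exact K=1,26 arrangement discussed above.

```python
def smooth_row_two_part(p, a):
    """Figure 7 form: interior K=2..24, then the end points K=1 and K=25.
    p holds 25 values; p[0] (K=1) and p[24] (K=25) are the same longitude."""
    old = list(p)
    new = list(p)
    for k in range(1, 24):  # K = 2..24
        new[k] = ((old[k + 1] - old[k]) - (old[k] - old[k - 1])) * a + old[k]
    z = ((old[1] - old[0]) - (old[24] - old[23])) * a
    new[0] = z + old[0]
    new[24] = z + old[24]
    return new


def smooth_row_padded(p, a):
    """Uniform form: pad with wraparound neighbours, then apply one
    expression to all 25 points."""
    x = [p[23]] + list(p) + [p[1]]
    return [(x[i + 1] - 2.0 * x[i] + x[i - 1]) * a + x[i] for i in range(1, 26)]


row = [float((k * k) % 7) for k in range(24)]
row.append(row[0])  # duplicate wraparound point
two_part = smooth_row_two_part(row, 0.1)
padded = smooth_row_padded(row, 0.1)
```

The two forms agree to rounding error whenever the duplicated end points hold equal values, which is the property the saved and restored elements are there to maintain.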

\begin{tabular}{|c|c|c|c|}
\hline & SUBROUTINE AVRX & OAVRX & 2 \\
\hline C**** & & OAVRX & 3 \\
\hline C**** & THIS SUBROUTINE SMOOTHES THE ZONAL MASS FLUX AND GEOPOTENTIAL & OAVRX & 4 \\
\hline C**** & GRADIENTS NEAR THE POLES TO HELP AVOID COMPUTATIONAL INSTABILITY. & OAVRX & 5 \\
\hline C**** & & OAVRX & 6 \\
\hline C**** & COARSE NINE LAYER CDE BLOCK FEBRUARY 1976 & OAVRX & 7 \\
\hline & REAL KAPA,LAT & OAVRX & 8 \\
\hline & COMMON JM,IM,NLAY,PTROP,JMM1,FIM,NLAYM1,NLAYP1,DLAT,DLON, & OAVRX & 9 \\
\hline & - ISTART,KH,TAUT,IROT,MROT, & OAVRX & 10 \\
\hline & - DT,TAU,ITAU,XINT,IDAY,JDAY,TOFDAY,JDATE,JMONTH(2),JYEAR,NSTEP, & OAVRX & 11 \\
\hline & - NCYCLE,NCOMP3,NHOGAN,TAUP,TAUI,TAUE,TAUO,MRCH, & OAVRX & 12 \\
\hline & - PI,GRAV,RGAS,KAPA,PSL,ED,FMU,AFLW,PSF,RSDIST,SIND,COSD,RMMAX, & OAVRX & 13 \\
\hline &  & OAVRX & 14 \\
\hline & - XLABEL(20),SIG(20),DSIG(20),SIGE(21),DSIGO(19), & OAVRX & 15 \\
\hline & - LAT(16),SINL(16),COSL(16),DXU(16),DXP(16),DYU(16),DYP(16), & OAVRX & 16 \\
\hline & - DXYP(16),F(16),DUMMY(24), & OAVRX & 17 \\
\hline & - TS(16,24),SHS(16,24),GT(16,24),GW(16,24),PHIS(16,24), & OAVRX & 18 \\
\hline & - TOPOG(16,24) & OAVRX & 19 \\
\hline &  & OAVRX & 20 \\
\hline & - C(300) & OAVRX & 21 \\
\hline &  & OAVRX & 22 \\
\hline & COMMON U,V,T,SH,P & OAVRX & 23 \\
\hline & COMMON/WORK/PU(16,24) & OAVRX & 24 \\
\hline & DO 40 J=1,JMM1 & OAVRX & 25 \\
\hline & DRAT=DYP(2)/DXP(J) & OAVRX & 26 \\
\hline & IF(DRAT.LT.1.) GO TO 40 & OAVRX & 27 \\
\hline &  & OAVRX & 28 \\
\hline & NM=DRAT & OAVRX & 29 \\
\hline & FNM=NM & OAVRX & 30 \\
\hline & ALPH=ALP/FNM & OAVRX & 31 \\
\hline & DO 30 N=1,NM & OAVRX & 32 \\
\hline & IM1=IM-1 & OAVRX & 33 \\
\hline & I=IM & OAVRX & 34 \\
\hline & DO 10 IP1=1,IM & OAVRX & 35 \\
\hline &  & OAVRX & 36 \\
\hline & IM1=I & OAVRX & 37 \\
\hline 10 & I=IP1 & OAVRX & 38 \\
\hline & DO 20 I=1,IM & OAVRX & 39 \\
\hline 20 & PU(J,I)=DUMMY(I) & OAVRX & 40 \\
\hline 30 & CONTINUE & OAVRX & 41 \\
\hline 40 & CONTINUE & OAVRX & 42 \\
\hline & RETURN & OAVRX & 43 \\
\hline & END & OAVRX & 44 \\
\hline
\end{tabular}

Figure 5. Version of AVRX for Climatology Simulation with a Coarse Grid
```
      SUBROUTINE AVRX( PU )
      DIMENSION PU(4,6),NM(16),ALPHA(16),X(26),Y(26)
      DATA NM/0,3,1,1,1,0,0,0,0,0,0,1,1,1,3,0/
      DATA ALPHA/0.,1.180572E-1,1.208591E-1,4.513013E-2,9.563327E-3,
     *  0.,0.,0.,0.,0.,0.,9.563327E-3,4.513013E-2,1.208591E-1,
     *  1.196572E-1,0./
C     SMOOTHES THE ZONAL MASS FLUX AND GEOPOTENTIAL GRADIENTS
C     NEAR THE POLES TO HELP AVOID COMPUTATIONAL INSTABILITY
C     NOTE. THIS ROUTINE HAS BEEN SLIGHTLY ALTERED
      DO 40 J=2,15
      IF (NM(J).LE.0) GO TO 40
      J1=26*(J-1)+1
      J2=J1+1
      NMJ=NM(J)
      DO 30 N=1,NMJ
      X(2;24) = PU(J2;24) - PU(J1;24)
      X(1)=X(25)
      X(26)=X(2)
      Y(1;25) = X(2;25) - X(1;25)
      Y(1;25)=Y(1;25) * ALPHA(J)
      PU(J1;25) = PU(J1;25) + Y(1;25)
   30 CONTINUE
   40 CONTINUE
      RETURN
      END
```

$$
\begin{aligned}
& \mathrm{J}=2,5(\text { or } \mathrm{J}=12,15) \\
& \text { DO } 20 \quad \mathrm{~K}=2,24 \\
& \mathrm{P}(\mathrm{~K}, \mathrm{~J})=[\{\mathrm{P}(\mathrm{~K}+1, \mathrm{~J})-\mathrm{P}(\mathrm{~K}, \mathrm{~J})\} \\
& \qquad-\{\mathrm{P}(\mathrm{~K}, \mathrm{~J})-\mathrm{P}(\mathrm{~K}-1, \mathrm{~J})\}] \\
& \qquad * \mathrm{~A}+\mathrm{P}(\mathrm{~K}, \mathrm{~J}) \\
& 20 \quad \text { CONTINUE }
\end{aligned}
$$

(a)

$$
\begin{aligned}
& \mathrm{Z}= {[\{\mathrm{P}(2, \mathrm{~J})-\mathrm{P}(1, \mathrm{~J})\}} \\
&-\{\mathrm{P}(25, \mathrm{~J})-\mathrm{P}(24, \mathrm{~J})\}] \\
& * \mathrm{~A} \\
& \mathrm{P}(1, \mathrm{~J})=\mathrm{Z}+\mathrm{P}(1, \mathrm{~J}) \\
& \mathrm{P}(25, \mathrm{~J})=\mathrm{Z}+\mathrm{P}(25, \mathrm{~J})
\end{aligned}
$$

$P(26, J)$
(b)

Figure 7. Restated AVRX Loops in $\mathrm{J}=2,5$ and $\mathrm{J}=12,15$

The code in figure 8 is executed twice. The $J$ index takes the values 2 and 15 during these executions. This fact is highlighted in figure 8 by underlining the indices $J=2$, the first of which can be found to have a 15 underneath it. As in the previous example, figure 8 consists of two parts, (a) and (b). The loop shown in $8(a)$ can be put in a vector form with length of only 23 elements. There is no need in this case, however, to gather and scatter data in order to save and restore elements, since the large register file of the FMP can be scheduled to handle the necessary sequence of scalar load and store operations.
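The split between figures 7 and 8 can be read off the NM values in the DATA statement of the STAR-100 AVRX listing: rows with a count of at least 1 receive the first smoothing pass, and rows with a count of 3 receive two further passes. The sketch below tabulates this; it assumes that the count gives the number of passes per row, which is how the inner loop over N in the listing appears to use it, and the function name is hypothetical.

```python
# NM values as they appear in the DATA statement of the AVRX listing
NM = [0, 3, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 0]


def rows_for_pass(nm, pass_number):
    """Return the 1-based J indices that are smoothed on the given pass."""
    return [j for j, count in enumerate(nm, start=1) if count >= pass_number]


first_pass = rows_for_pass(NM, 1)
extra_passes = [rows_for_pass(NM, n) for n in (2, 3)]
```

Under this reading the first pass covers J=2 to 5 and J=12 to 15, and the two extra passes touch only J=2 and J=15.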

In addition to sequences like AVRX, which cause concern for analysts wishing to optimize code for the STAR-100/FMP, another subroutine loomed as a mighty challenge for FMP vectorization. The subroutine LINKHO in the original GISS model was recoded for the Control Data FMP and yielded a sparse 0.020 gigaflop performance. No amount of 'local' vectorization, as was applied to the other metrics, seemed to yield any better performance. The LINKHO routine developed by David Soll was then acquired, and its STAR-100 listing is included in appendix B. The methodology employed in this vectorization is amply discussed in reference 6.

$$
\begin{aligned}
& \mathrm{K}=2,24 \\
& \mathrm{P}(\mathrm{~K}, \underset{15}{\underline{2}})=[\{\mathrm{P}(\mathrm{~K}+1, \underline{2})-\mathrm{P}(\mathrm{~K}, \underline{2})\} \\
& \qquad-\{\mathrm{P}(\mathrm{~K}, \underline{2})-\mathrm{P}(\mathrm{~K}-1, \underline{2})\}] \\
& \qquad * \mathrm{~A}+\mathrm{P}(\mathrm{~K}, \underline{2})
\end{aligned}
$$

(a)

$$
\begin{aligned}
& \mathrm{Z}=[\{\mathrm{P}(2, \underline{2})-\mathrm{P}(1, \underline{2})\} \\
& \qquad-\{\mathrm{P}(25, \underline{2})-\mathrm{P}(24, \underline{2})\}] \\
& \qquad * \mathrm{~A} \\
& \mathrm{P}(1, \underline{2})=\mathrm{Z}+\mathrm{P}(1, \underline{2}) \\
& \mathrm{P}(25, \underline{2})=\mathrm{Z}+\mathrm{P}(25, \underline{2})
\end{aligned}
$$

(b)

Figure 8. Restated AVRX Executed for $\mathrm{J}=2$ and $\mathrm{J}=15$

## 4.0 THE MIT MODEL

The MIT model was authored principally by Cunnold, Alyea, Phillips, and Prinn and was supported as part of the Climatic Impact Assessment Program by the U.S. Department of Transportation. Its objective is to simulate the distribution of ozone as affected by a simple but reasonably realistic dynamical model with more up-to-date photochemistry. The dynamics is simplified by the quasi-geostrophic approximation so that a time step of one hour can be used; this matters because the pronounced seasonal variations in ozone require several years of integration before a statistically steady state can be expected.

Although the heating due to the absorption of solar radiation by ozone is computed explicitly and precisely for the stratosphere, empirical representation is employed for the lower atmosphere. The latter is not appropriate for daily prediction of tropospheric weather, nor is it satisfactory as a means of predicting in a fundamental sense such properties as the typical pole-to-pole temperature difference in the troposphere. In spite of its simplicity, it fulfills its purpose to create zonal flows, large-scale eddies, and meridional circulations in the troposphere which are realistic in a statistical sense, and which can be affected by stratospheric ozone at higher elevations and can redistribute that ozone in a natural manner.

The model was used to simulate stratospheric motion patterns, meridional circulations, and ozone density over a 3-year period as a function of height and latitude, as well as eddy transports of ozone, surface destruction of ozone, correlations of ozone with other variables, and the annual cycle of columnar ozone in high latitudes. Its mission is to predict the ozone distribution under perturbed conditions such as those which could conceivably be caused by $\mathrm{NO}_{x}$ from aircraft engines.

For the MIT model, the system of governing equations, shown in figure 9, is basically similar to that of the GISS model shown in figure 1. Comparing the two figures, a few differences can be noted. First, because the emphasis of the MIT model is on developments in the upper atmosphere, where water vapor is virtually non-existent, equation (6) in figure 1 for the conservation of specific humidity is replaced by a corresponding one for the ozone mixing ratio, denoted (6') in figure 9. The heating term $Q$ in equation (4) consequently no longer has any contribution from precipitation; radiation is the only process that contributes to it.

Secondly, the quasi-geostrophic approximation necessitates some reformulation of the equation of motion for prediction. A vorticity equation is first derived from the equation of motion by cross-differentiation to cancel the pressure term. The vorticity can then be represented in terms of the geostrophic wind and, in turn, the balanced pressure, which can be taken as the stream function of the horizontal velocity. Thus, equation (1) in figure 1, the prediction equation of motion, is replaced by the vorticity equation in terms of the stream function, equation (1'), together with the geostrophic balance condition, equation (1'a), both in figure 9.

The vertical coordinate is $Z=-\ln P$, where $P$ is pressure divided by 1000 mb, the value taken to represent the surface pressure. In this vertical coordinate, the continuity equation takes the form shown as equation (2'), and the hydrostatic equation is given in equation (5').

Equation (3), the equation of state, remains the same and is duplicated as (3'). The thermodynamic equation (4) becomes (4') in figure 9, where $Q$ contains the absorption of solar radiation by ozone as well as infrared cooling.

$$
\begin{align*}
& \nabla^{2} \frac{\partial \psi}{\partial \mathrm{t}}=-\mathbf{k} \times \nabla \psi \cdot \nabla\left(f+\nabla^{2} \psi\right)-\nabla \cdot f \nabla \frac{\partial \mathrm{X}}{\partial \mathrm{P}}-\mathrm{F}  \tag{1'}\\
& \nabla^{2} \phi^{\prime}=2 \omega \nabla \cdot \mu \nabla \psi  \tag{1'a}\\
& \mathrm{W}=\nabla^{2} \frac{\partial \mathrm{X}}{\partial \mathrm{P}}  \tag{2'}\\
& \frac{\mathrm{p}}{\rho}=\mathrm{RT}  \tag{3'}\\
& \frac{\partial \mathrm{T}^{\prime}}{\partial \mathrm{t}}=-\mathbf{k} \times \nabla \psi \cdot \nabla \mathrm{T}^{\prime}-\mathrm{WS}+\frac{\mathrm{Q}}{\mathrm{c}_{\mathrm{p}}}  \tag{4'}\\
& \mathrm{RT}^{\prime}=\frac{\partial \phi^{\prime}}{\partial \mathrm{Z}}  \tag{5'}\\
& \frac{\partial \chi_{\mathrm{O}_{3}}}{\partial \mathrm{t}}=-\mathbf{k} \times \nabla \psi \cdot \nabla \chi_{\mathrm{O}_{3}}^{\prime}-\mathrm{W} \frac{\partial}{\partial \mathrm{Z}} \overline{\chi}_{\mathrm{O}_{3}}+\frac{1}{\mathrm{H}_{\mathrm{o}}^{2} \mathrm{P}} \frac{\partial}{\partial \mathrm{Z}}\left(\mathrm{K}_{\mathrm{d}} \mathrm{P} \frac{\partial \chi_{\mathrm{O}_{3}}^{\prime}}{\partial \mathrm{Z}}\right)+\mathrm{G}^{\prime}  \tag{6'}
\end{align*}
$$

Figure 9. System of Equations for MIT Model.

The derivation of this set of equations goes back to Lorenz' work in 1960 on an energetically consistent formulation of the quasi-geostrophic system (ref. 7). In order to have suitable energy invariants, all terms in the vorticity equation which involve both the rotational and the divergent part of the wind field should be omitted.

Peng in 1965 (ref. 8) was the first one to successfully perform a simple numerical experiment concerning the general circulation in the lower stratosphere with Lorenz' dynamical system. In addition, he chose the spectral method on the grounds that
i) the variation of the Coriolis parameter can be handled easily;
ii) there is no distortion of mass distribution;
iii) no artificial lateral boundary conditions are necessary.

For simplicity, only wave numbers 2 and 6, in addition to wave number 0, are included to represent the extra-long wave in the stratosphere and the dominant unstable wave in the troposphere, respectively.

Following in Peng's footsteps, Clark used the same dynamical framework in 1970 (ref. 9) to model the radiative and photochemical processes suggested by Lindzen and Goody (ref. 10) for the winter stratospheric circulation with waves 0, 1, 2, 3, and 6. Clark did not include odd nitrogen but increased his ozone absorption coefficients artificially by 40%. The integration was carried out to only 230 simulated days, which is not long enough to show a complete annual cycle.

The MIT model continues the work of Clark to predict ozone by simulating a three-year period with a 25-layer dynamic model. The distributions of $\mathrm{NO}_{2}$ and odd hydrogen deduced by McConnell and McElroy (ref. 11) are used to incorporate in a simple way the chemical effect of these species. The dynamics are essentially the same as those of Clark and Peng, except that all waves up to wave number 6 are included.

Because the number of waves resolved is as low as 6, taking advantage of the tendency for larger-scale motion in the stratosphere, use of the classical interaction coefficient method is justified to handle the nonlinear terms. The exceptions are the ozone heating term $Q$ of (4') and the photochemical term $G^{\prime}$ in the rate of change of the ozone mixing ratio (6').

For convenience, the two hemispheres are assumed to be geographically similar. This has the advantage that the approach to statistically stationary behavior of ozone can be examined at intervals of 6 months rather than 12 months in the presence of a pronounced annual cycle at a fixed latitude. The difference between hemispheres is a matter which the authors of the code chose to postpone. Accordingly, the orographic height used in the model is defined such that the southern hemisphere is a mirror image of the north.

The vertical domain of the integration extends from $Z=0$ at the surface to $Z=10.13675$ at the model top, which corresponds to an isobaric lid on the model at a pressure of about 0.04 mb, a standard atmospheric height of 71.6 km. This height was chosen to be suitably far above the main ozone layer, high enough to include the photochemical equilibrium region and to minimize the mechanical effects of the lid on the motions below. The range in $Z$ is divided evenly into 25 layers, each of thickness $\Delta Z=0.40547$. The values of the stream function are defined at the midpoints of these layers, while vertical motion and temperature are defined at the interfaces. The coordinate $Z$ varies almost linearly with height according to the hydrostatic relation, so that $\Delta Z$ corresponds almost uniformly to a height increment of 2.89 km. This choice gives good resolution in the stratosphere.
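The numbers in this paragraph can be checked directly from the definition $Z=-\ln P$. The following is a minimal sketch, not part of the MIT code; the variable names are illustrative, and only the three constants are taken from the text. Dividing the quoted model top by 25 layers gives a thickness of 0.40547, which is very nearly $\ln 1.5$, so each layer spans a pressure ratio of about 1.5.

```python
import math

P_SURFACE_MB = 1000.0   # reference pressure used to normalize P (from the text)
Z_TOP = 10.13675        # model top in the Z = -ln(P) coordinate (from the text)
N_LAYERS = 25           # number of layers (from the text)

def pressure_mb(z):
    """Pressure in mb at log-pressure coordinate z, inverting Z = -ln(p / 1000 mb)."""
    return P_SURFACE_MB * math.exp(-z)

dz = Z_TOP / N_LAYERS
interfaces = [k * dz for k in range(N_LAYERS + 1)]      # vertical motion, temperature
midpoints = [(k + 0.5) * dz for k in range(N_LAYERS)]   # stream function

print(f"layer thickness dZ       = {dz:.5f}")
print(f"pressure at model top    = {pressure_mb(Z_TOP):.4f} mb")
print(f"pressure ratio per layer = {math.exp(dz):.4f}")
```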

The spectral method calls for expansion of the variables in terms of spherical harmonics. Thus the representation of the stream function, for example, takes the form of equation (7) in figure 10. Equation (8) gives the structure of the spherical harmonics and (9) gives the associated Legendre polynomial. Note that (8) is in complex form; to ensure that the fields are real, it is necessary to impose equations (10) and (10a), in which the asterisk denotes the complex conjugate. Equation (11) gives the vorticity.

The methodology is illustrated with a shorter version of the vorticity equation (1') having only two terms, namely the Coriolis term and the advection term, as in equation (12). Inserting the appropriate expansions into equation (12) and integrating the whole equation with respect to $\mu$ and $\lambda$ yields the transformed form, equation (13). The truncation used to date has $M=J=6$. This provides 79 degrees of freedom in each variable at each vertical level.

$$
\begin{align*}
& Y_{m, n}(\mu, \lambda)=P_{m, n}(\mu) e^{i m \lambda}  \tag{8}\\
& P_{m, n}(\mu)=\left[(2 n+1) \frac{(n-m)!}{(n+m)!}\right]^{1 / 2} \frac{\left(1-\mu^{2}\right)^{m / 2}}{2^{n} n!} \frac{d^{n+m}}{d \mu^{n+m}}\left(\mu^{2}-1\right)^{n}  \tag{9}\\
& \psi_{-m}(\mu)=\psi_{m}^{*}(\mu)  \tag{10}\\
& \psi_{-m, n}=(-1)^{m} \psi_{m, n}^{*}  \tag{10a}\\
& \xi=\nabla^{2} \psi=-\sum_{m=-M}^{M} \sum_{n=|m|}^{|m|+J} \frac{n(n+1)}{a^{2}} \psi_{m, n} Y_{m, n}  \tag{11}\\
& \frac{\partial \xi}{\partial t}=-\frac{2 \omega}{a^{2}} \frac{\partial \psi}{\partial \lambda}+\frac{1}{a^{2}}\left[\frac{\partial \psi}{\partial \mu} \frac{\partial \xi}{\partial \lambda}-\frac{\partial \psi}{\partial \lambda} \frac{\partial \xi}{\partial \mu}\right]+\cdots  \tag{12}\\
& \frac{d \psi_{m, n}}{d t}=\frac{2 \omega m}{n(n+1)} i \psi_{m, n}-\frac{a^{2}}{n(n+1)} F_{m, n}+\cdots  \tag{13}\\
& L_{n n_{1} n_{2}}^{m m_{1} m_{2}}=0 \quad \text { for } m \neq m_{1}+m_{2}
\end{align*}
$$

Figure 10. Some Algebra of Spectral Model

The interaction coefficients are pre-calculated and stored. To extrapolate in time (prediction), the "4-cycle" version of the time-differencing scheme formulated by Lorenz (ref. 12) is used for step-wise numerical integration. Taking advantage of the quasi-geostrophic approximation, a time step of one hour is used in each cycle. Using this model, four hours of computer time were required on the IBM 360/95 to integrate one simulated year.
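The recurrence behind the "4-cycle" scheme can be sketched as follows. This is a hedged illustration, not the model code: the coefficients are taken from the PRDICT listing later in this section as read here (A = -(N-1)/(NCYC\*DT), B = 1/(1/DT + A), Z = B\*(A\*Z + F), Y = Y + Z), and the linear test equation and its parameters are hypothetical. For a linear equation, one complete cycle of four steps reproduces the Taylor series of the exact solution through fourth order, which the check below confirms.

```python
import math

def lorenz_n_cycle(f, y, dt, ncyc=4):
    """Advance y through one complete cycle of ncyc steps of size dt."""
    z = 0.0
    for n in range(1, ncyc + 1):
        a = -(n - 1) / (ncyc * dt)      # A in the PRDICT listing
        b = 1.0 / (1.0 / dt + a)        # B in the PRDICT listing
        z = b * (a * z + f(y))          # Z = B*(A*Z + FY)
        y = y + z                       # Y = Y + Z
    return y

# Hypothetical test problem: dy/dt = lam * y, exact solution y0 * exp(lam * t).
lam, dt, y0 = -0.5, 0.2, 1.0
y4 = lorenz_n_cycle(lambda y: lam * y, y0, dt)

h = 4 * lam * dt                        # lam times the length of the full cycle
taylor4 = sum(h**k / math.factorial(k) for k in range(5))

print(y4, taylor4, math.exp(h))
```

The first step of each cycle has A = 0 and B = DT, so it is a plain forward step; the later steps reuse the stored increment Z, which is why only one extra array is needed.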

The ozone heating term and the photochemical term are transcendentally nonlinear, and quasi-linearization to put them in a quadratic form did not seem to be accurate enough in some cases because some reaction rates vary rapidly with temperature. These terms were therefore evaluated at each time step by the transform method as follows:
i) transform the wave number representation of variables to physical space at points of a fixed longitude-latitude grid;
ii) compute the above terms at these physical points;
iii) transform these computed physical point values back into the spectral form;
iv) add these contributions (now in spectral form) to the remaining terms (that were calculated by the interaction coefficient method).

The longitude-latitude grid of the physical space employed in this transform method had 16 points in longitude and 15 in latitude. The latter provides an exact quadrature for the quadratic product.
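The four steps of the transform method can be sketched in one dimension. This is a simplified illustration under stated assumptions: only the longitude direction is treated (a plain Fourier series rather than spherical harmonics), the function `hypothetical_rate` is a made-up stand-in for a temperature-dependent term, and the spectral coefficients are invented. Only the wave-number cutoff of 6 and the 16 longitude points come from the text.

```python
import cmath
import math

M = 6        # highest zonal wave number retained (from the text)
NLON = 16    # longitude points of the transform grid (from the text)

def to_grid(coeffs):
    """Step i: evaluate sum_m c_m exp(i m lambda) at NLON equally spaced longitudes."""
    values = []
    for j in range(NLON):
        lam = 2.0 * math.pi * j / NLON
        total = sum(c * cmath.exp(1j * m * lam) for m, c in coeffs.items())
        values.append(total.real)   # real because c_{-m} = conj(c_m)
    return values

def to_spectral(values):
    """Step iii: project grid values back onto wave numbers -M..M."""
    coeffs = {}
    for m in range(-M, M + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * m * j / NLON)
                  for j, v in enumerate(values))
        coeffs[m] = acc / NLON
    return coeffs

def hypothetical_rate(t):
    """Stand-in for a temperature-dependent reaction rate (not the model's)."""
    return math.exp(-1.0 / (1.0 + 0.1 * t))

temperature = {0: 1.0 + 0j, 2: 0.25 - 0.10j, -2: 0.25 + 0.10j}

grid_t = to_grid(temperature)                        # step i
grid_rate = [hypothetical_rate(t) for t in grid_t]   # step ii
rate_coeffs = to_spectral(grid_rate)                 # step iii

# Step iv: add to the terms from the interaction-coefficient method
# (represented here by a zero placeholder).
other_terms = {m: 0j for m in range(-M, M + 1)}
tendency = {m: other_terms[m] + rate_coeffs[m] for m in range(-M, M + 1)}
```

With 16 points, the round trip from spectral to grid and back is exact for wave numbers up to 6, so the only approximation in this sketch is the truncation of the nonlinear term itself.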

The heating scheme consists of a relatively precise evaluation of heating in the stratosphere combined with the more empirical Newtonian cooling in the lower atmosphere (see figure 11). The contribution due to the absorption of solar radiation by ozone is computed explicitly by evaluating the integral of equation (17) in figure 11.

$$
\begin{align*}
& \mathrm{Q} 1=\left(\chi_{\mathrm{O}_{3}} / \beta\right) \mathrm{Q}\left(\mathrm{N} \sec \zeta^{\prime}\right)-\mathrm{c}_{\mathrm{p}} \mathrm{h}(\mathrm{Z}) \mathrm{T}^{\prime}  \tag{16}\\
& \text { where } \mathrm{Q}\left(\mathrm{N} \sec \zeta^{\prime}\right)=\int \alpha I \epsilon \exp \left(-\alpha \mathrm{N} \sec \zeta^{\prime}\right) \mathrm{d} \Lambda  \tag{17}\\
& \mathrm{Q} 2=\mathrm{c}_{\mathrm{p}} \mathrm{h}(\mathrm{Z})\left(\mathrm{T}^{*}-\mathrm{T}^{\prime}\right)  \tag{18}\\
& \mathrm{Q}=\mathrm{Q} 1+\mathrm{Q} 2 \tag{19}
\end{align*}
$$

Figure 11. Heating/Physics for MIT Model

The same kind of integral is also employed in the computation of the photochemistry, as in equation (21) of figure 12. For speed of computation, a table was evaluated initially by numerical integration with respect to $\Lambda$ over a large range of $N \sec \zeta$ in equations (17) and (21). In the course of the model simulation, a table look-up is then performed wherever the integral is needed, in place of actually carrying out the integration.
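The precompute-then-look-up idea can be sketched as below. Everything numerical here is an assumption made for illustration: the four wavelength bins, their absorption coefficients and flux weights, the table range, and the use of linear interpolation are not taken from the MIT code, which is not shown at this level of detail. The sketch only demonstrates that an integral of the form of equation (17) depends on the single argument $N \sec \zeta$ and can therefore be tabulated once.

```python
import math

# Hypothetical spectral bins: (absorption coefficient alpha, flux weight I * dLambda).
WAVELENGTH_BINS = [(0.5, 1.0), (1.0, 0.8), (2.0, 0.5), (4.0, 0.2)]

def heating_integral(path):
    """Quadrature of  integral alpha * I * exp(-alpha * path) dLambda,  path = N sec(zeta)."""
    return sum(alpha * weight * math.exp(-alpha * path)
               for alpha, weight in WAVELENGTH_BINS)

PATH_MAX = 10.0      # hypothetical upper end of the tabulated range
N_ENTRIES = 2001     # hypothetical table size
STEP = PATH_MAX / (N_ENTRIES - 1)

# Evaluated once, before the simulation starts.
TABLE = [heating_integral(i * STEP) for i in range(N_ENTRIES)]

def lookup(path):
    """Linear interpolation in the precomputed table, clamped to the table range."""
    x = min(max(path, 0.0), PATH_MAX) / STEP
    i = min(int(x), N_ENTRIES - 2)
    frac = x - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

for path in (0.0, 0.37, 2.5, 9.99):
    print(path, heating_integral(path), lookup(path))
```

The trade is the usual one: the table costs memory and a one-time evaluation, while each look-up replaces a full quadrature inside the time loop.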

The model photochemistry includes the Chapman reactions, the NO and $\mathrm{NO}_{2}$ catalytic cycle, and several reactions between hydrogen and atomic oxygen. Figure 12 includes a list of the reactions used in the model.

$$
\begin{equation*}
\mathrm{G}=0.42 J_{\mathrm{O}_{2}}-\frac{\left(\chi_{\mathrm{O}_{3}} J_{\mathrm{O}_{3}}+\chi_{\mathrm{NO}_{2}} J_{\mathrm{NO}_{2}}\right)}{0.21 \eta \mathrm{k}_{0}}\left[2\left(\mathrm{k}_{1} \chi_{\mathrm{O}_{3}}+\mathrm{k}_{3} \chi_{\mathrm{NO}_{2}}\right)+\left(\mathrm{k}_{4} \chi_{\mathrm{OH}}+\mathrm{k}_{5} \chi_{\mathrm{HO}_{2}}\right)\right] \tag{20}
\end{equation*}
$$

$$
\text { where } \quad J_{i}=\int \alpha_{i}(\Lambda) \mathrm{I}(\Lambda) \exp \left(-\alpha_{i} \mathrm{N}_{i} \sec \zeta\right) \mathrm{d} \Lambda, \quad i=\mathrm{O}_{3}, \mathrm{NO}_{2}, \mathrm{O}_{2} \tag{21}
$$

Reaction

1. $\mathrm{O}_{2}+\mathrm{h} \nu \rightarrow 2 \mathrm{O}$
2. $\mathrm{O}+\mathrm{O}_{2}+\gamma \rightarrow \mathrm{O}_{3}+\gamma$
3. $\mathrm{O}_{3}+\mathrm{h} \nu \rightarrow \mathrm{O}_{2}+\mathrm{O}$
4. $\mathrm{O}+\mathrm{O}_{3} \rightarrow 2 \mathrm{O}_{2}$
5. $\mathrm{NO}+\mathrm{O}_{3} \rightarrow \mathrm{NO}_{2}+\mathrm{O}_{2}$
6. $\mathrm{NO}_{2}+\mathrm{O} \rightarrow \mathrm{NO}+\mathrm{O}_{2}$
7. $\mathrm{NO}_{2}+\mathrm{h} \nu \rightarrow \mathrm{NO}+\mathrm{O}$
8. $\mathrm{OH}+\mathrm{O} \rightarrow \mathrm{O}_{2}+\mathrm{H}$
9. $\mathrm{HO}_{2}+\mathrm{O} \rightarrow \mathrm{O}_{2}+\mathrm{OH}$

### 5.0 SPECTRAL CODE ANALYSIS

The MIT code consists of two programs, namely STRAT1 and STRAT2. The primary purpose of STRAT1 is, among other things, to establish an appropriate "initial" condition for the long-term climate simulation, with full dynamics and chemistry, which is coded in STRAT2. The main concern in this analysis will therefore be with the program STRAT2 only.

The code is, in general, quite well structured in that it is very modular. As can be seen in figure 13, the execution of the program STRAT2 is essentially a double loop in which one subroutine is called after another. Hence, figure 13 can also be taken as a flowchart in the sense that each subroutine represents a block for a task.

Much of the code is structured in loops. In fact, a few subroutines are composed of only one single, simple loop, as displayed in appendix C. To vectorize these is a rather trivial matter.

The subroutines in appendix D have been recoded, from the standpoint of syntax only, so that they can be easily followed as well as verified. This could be considered a prevectorization pass, showing an intermediate step on the way to complete recoding and restructuring of data, which is discussed at greater length later in this report. In the original code the indices of arrays are manipulated explicitly by the programmer. In the recoded version, use has been made of the array structure implied by the FORTRAN DIMENSION declarations in terms of the ordering of elements. This brings the logic of most routines into much better perspective, i.e., into simpler structures of loops whose vectorization can be effected readily. As a matter of fact, it exposed not only the feasibility but also the advantage of restructuring the data block in the entire program.

Figure 13. Program Flow of MIT Model

In the process of recoding, an effort was made to pull out portions of code that have to be executed (initialized) only once. These are grouped together under an entry point with the same name as that of the routine, with the digit 0 appended. In addition, the letter V was added as a prefix to the original name of each routine that was recoded, to form the name of the recoded routine. However, the recoded version is not necessarily in vectorized form.

The discussion of the spectral code will start with the simplest routine, OZERO.

```
      SUBROUTINE OZERO(A)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,NRTP,
     1 LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      DIMENSION A(1)
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      DO 100 I=IL,IH
  100 A(I)=0.D0
      RETURN
      END
```

This routine is used to zero out a block in the memory for initialization. The construct is straightforward; it consists of a single, simple loop 100 which can easily be put in one vector instruction as

$\mathrm{AD}=0$

where AD is the descriptor representing the vector A(I), I=IL,IH. However, since each of the FMP vector pipes has two paths, it would be wise to break the vector AD into two halves, recoded as follows:

```
      SUBROUTINE VOZERO(A)
      COMMON /CONSTS/ INDEX, . . .
      DIMENSION A(1)
      DYNAMIC AD1,AD2
 9001 AD1=0.
      AD2=0.
      RETURN
      ENTRY OZERO0
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      L=IH-IL+1
      L2=L/2
      DEFINE (AD1,A(IL:IL+L2-1))
      IL2=IL+L2
      LL=L-L2
      DEFINE (AD2,A(IL2:IL2+LL-1))
      RETURN
      END
```

In the program, ILEV1=2, ILEV2=25, and NVREAL=79. Consequently, here $L=1896$, $L2=948$, and $LL=948$. With the rate of 8 results (in 64-bit mode) per cycle, it takes $119=(948+4)/8$ cycles for both vectors AD1 and AD2 to pass through the pipes. With the start-up of 6 cycles, 15.17 results per cycle are obtained, or 0.95 gigaflop. This takes advantage of the multiple paths of the FMP vector pipes, but not the capability of performing multiple operations in each vector pipe. The latter is exploited in the following.

```
      SUBROUTINE ADD2A
      COMMON P(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,NRTP,
     1 LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      K=ILEV2
      DO 200 ITMS=ILEV1,ILEV2
      IH=K*NVREAL
      IL=(K-1)*NVREAL+1
      DO 100 I=IL,IH
      J=I-NVREAL
      P(I)=0.5*(P(J)+P(I))
  100 CONTINUE
      K=K-1
  200 CONTINUE
      RETURN
      END
```

This is another example of straightforward vectorization. The vector length is much shorter. The vector code is as follows:

```
      SUBROUTINE VADD2A
      COMMON P(2366)
      COMMON /CONSTS/ INDEX, . . .
      DYNAMIC QD,PD(26),PD1,PD2
      DO 150 K=1,KK,2
      K1=K+1
      K2=K+2
 9002 QD=PD(K1)+PD(K)
      PD(K1)=PD(K2)+PD(K1)
      PD(K)=QD
  150 CONTINUE
 9003 PD1=.5*PD1
      PD2=.5*PD2
      RETURN
      ENTRY ADD2A0
      IL=(ILEV2-1)*NVREAL+1
      KK=ILEV2-ILEV1+1
      DO 50 K=1,KK
      DEFINE (PD(K),P(IL:IL+NVREAL-1))
      IL=IL-NVREAL
   50 CONTINUE
      L=ILEV2*NVREAL
      L2=L/2
      DEFINE (PD1,P(1:L2))
      IL1=L2+1
      LL=L-L2
      DEFINE (PD2,P(IL1:IL1+LL-1))
      RETURN
      END
```

Notice that in loop 150 there seems to be a temporary QD. In fact, this temporary is not necessary during execution, as the FMP is equipped with 4 read ports and 2 write ports which can operate simultaneously. The temporary QD therefore appears merely for convenience of presentation, to avoid any ambiguity. Another point worth mentioning here is that in loop 150, the FMP is processing 2 vectors at a time, taking advantage of the parallel nature of the FMP vector pipes in performing multiple operations.
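Why this routine vectorizes cleanly can be seen from a small model of the original double loop. The sketch below is illustrative only: the level count and vector length are shrunk to hypothetical small values, and the second function is a plain restatement of the arithmetic, not a transcription of the FMP vector code. Because the levels are visited from the top down, each update reads the level below before that level is itself modified, so every level is a function of old values only and there is no recurrence between levels.

```python
NVREAL = 4            # coefficients per level (79 in the model; small here)
ILEV1, ILEV2 = 2, 5   # first and last level updated (hypothetical small case)

def add2a_scalar(p):
    """Loops 100/200 as listed: levels are visited from ILEV2 down to ILEV1."""
    p = list(p)
    for k in range(ILEV2, ILEV1 - 1, -1):
        for i in range((k - 1) * NVREAL, k * NVREAL):   # 0-based slice of level k
            p[i] = 0.5 * (p[i - NVREAL] + p[i])
    return p

def add2a_by_level(p):
    """Whole-level form: every updated level uses only old values."""
    old = [p[(k - 1) * NVREAL:k * NVREAL] for k in range(1, ILEV2 + 1)]
    out = []
    for k in range(1, ILEV2 + 1):
        if k < ILEV1:
            out.extend(old[k - 1])                       # level left untouched
        else:
            out.extend(0.5 * (a + b) for a, b in zip(old[k - 2], old[k - 1]))
    return out

p0 = [float(i * i) for i in range(ILEV2 * NVREAL)]
print(add2a_scalar(p0) == add2a_by_level(p0))
```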

For NVREAL=79, it takes $16=6+(79+4)/8$ cycles to complete the vector statement 9002. Also, 2 cycles are required for the Scalar Unit to update the indices K1 and K2. With KK=25, loop 150 takes $234=18*13$ cycles. On the other hand, with $L=1975$ and $LL=988$, statement 9003 takes $130=6+(988+4)/8$ cycles. There are a total of $6050=(3*79+5)*25$ flops. This therefore achieves 1.04 gigaflops.
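The timing estimates quoted so far follow one simple pattern, which the sketch below reproduces. Two points are assumptions rather than statements from the text: the quoted cycle counts are consistent with truncating the quotient (length+4)/8, and the conversion from results per cycle to gigaflops implies a clock period of about 16 ns (15.17 results per cycle corresponding to 0.95 gigaflop). The function and variable names are illustrative.

```python
STARTUP_CYCLES = 6      # vector start-up, as quoted in the text
RESULTS_PER_CYCLE = 8   # 64-bit results per cycle, as quoted in the text
CLOCK_NS = 16.0         # assumption: inferred from 15.17 results/cycle = 0.95 gigaflop

def stream_cycles(length):
    """Cycles for `length` elements to stream through the pipes (truncating division)."""
    return (length + 4) // RESULTS_PER_CYCLE

def gigaflops(flops, cycles):
    """Flops per nanosecond, which is numerically the rate in gigaflops."""
    return flops / (cycles * CLOCK_NS)

# VOZERO: two halves of 948 elements pass through the two paths together.
vozero_cycles = STARTUP_CYCLES + stream_cycles(948)
vozero_rate = gigaflops(2 * 948, vozero_cycles)

# VADD2A: 13 trips of loop 150 (statement 9002 plus 2 scalar cycles each),
# followed by statement 9003 on two halves of at most 988 elements.
loop150_cycles = 13 * (STARTUP_CYCLES + stream_cycles(79) + 2)
stmt9003_cycles = STARTUP_CYCLES + stream_cycles(988)
vadd2a_rate = gigaflops(6050, loop150_cycles + stmt9003_cycles)

print(vozero_cycles, round(vozero_rate, 2))
print(loop150_cycles, stmt9003_cycles, round(vadd2a_rate, 2))
```

The fixed start-up cost is what penalizes short vectors: for the 79-element statements of loop 150, start-up and index updates account for 8 of the 18 cycles per trip.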

```
      SUBROUTINE PRDICT(Y,Z,FY,N)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,NRTP,
     1 LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      DIMENSION Y(1),Z(1),FY(1)
      DATA IFLG/0/
      IF (IFLG.GT.0) GO TO 100
      DTINV=1.D0/DT
      DDTINV=1.D0/(NCYC*DT)
      IFLG=100
  100 CONTINUE
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      A=-(N-1)*DDTINV
      B=DTINV+A
      B=1.D0/B
      IF (N.GT.1) GO TO 300
      DO 150 I=IL,IH
  150 Z(I)=B*FY(I)
      GO TO 400
  300 CONTINUE
      DO 350 I=IL,IH
  350 Z(I)=B*(A*Z(I)+FY(I))
  400 DO 450 I=IL,IH
  450 Y(I)=Y(I)+Z(I)
      RETURN
      END
```

The main code consists of loops 150, 350, and 450. All three are simple loops and can easily be put in vector form. For example, loop 450 will take the form

$$
\mathrm{YD}=\mathrm{YD}+\mathrm{ZD}
$$

where YD and ZD represent the vectors Y(I) and Z(I), respectively, as initialized in the following. Since the FMP vector pipes are capable of performing multiple operations in parallel, it is not advantageous to separate loop 150 from loop 450 for the case of N=1. Therefore it should be recoded as follows:

```
      SUBROUTINE VPRDICT(Y,Z,FY,N)
      COMMON /CONSTS/ INDEX, . . .
      DIMENSION Y(1), . . .
      DIMENSION BB(4),CC(4)
      DYNAMIC ZD,YD,FYD,ZD1,ZD2,YD1,YD2
      IF (N.LE.1) GO TO 200
 9004 ZD=CC(N)*ZD+BB(N)*FYD
 9005 YD1=YD1+ZD1
      YD2=YD2+ZD2
      RETURN
  200 CONTINUE
 9006 ZD=BB(1)*FYD
      YD=YD+ZD
      RETURN
      ENTRY PRDICT0
      DTINV=1.D0/DT
      DDTINV=1.D0/(NCYC*DT)
      DO 1 I=1,NCYC
      AA=-(I-1)*DDTINV
      BB(I)=1.D0/(AA+DTINV)
    1 CONTINUE
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      L=IH-IL+1
      DEFINE (ZD,Z(IL:IL+L-1))
      DEFINE (YD,Y(IL:IL+L-1))
      DEFINE (FYD,FY(IL:IL+L-1))
      L2=L/2
      DEFINE (YD1,Y(IL:IL+L2-1))
      DEFINE (ZD1,Z(IL:IL+L2-1))
      IL1=IL+L2
      LL=L-L2
      DEFINE (YD2,Y(IL1:IL1+LL-1))
      DEFINE (ZD2,Z(IL1:IL1+LL-1))
      RETURN
      END
```

For ILEV2=25 and NVREAL=79, the vector length is $L=1975$, and it will take $247=(1975+4)/8$ cycles to pass through the pipes. With a start-up of 6 cycles, it takes 383 cycles for 7910 flops, or 1.29 gigaflops, in the case of $N$ not equal to 1. When $N=1$, 0.98 gigaflop is attained, as there are 3960 flops for 253 cycles.
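The totals in this paragraph can be unpacked as follows. This is a reading of the arithmetic, not a statement from the report: it assumes that statement 9005 is timed on its longer half (988 elements) and that the quotient is truncated, which together reproduce the quoted 383 cycles.

$$
\begin{aligned}
\text{statement 9004:}\quad & 6+\lfloor(1975+4) / 8\rfloor=6+247=253 \text{ cycles} \\
\text{statement 9005:}\quad & 6+(988+4) / 8=6+124=130 \text{ cycles} \\
N \neq 1:\quad & 7910 /(253+130)=7910 / 383 \approx 20.65 \text{ flops per cycle} \\
N=1:\quad & 3960 / 253 \approx 15.65 \text{ flops per cycle}
\end{aligned}
$$

With a clock period of about 16 ns, which is inferred here from the quoted rates rather than stated in this section, these correspond to roughly 1.29 and 0.98 gigaflops, respectively.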

One of the important points is that in the case of $N=1$, the whole routine can be completed in one single pass of the pipes, thereby doubling the efficiency. Since $N$ will be less than or equal to 4 , the saving should be quite noticeable.

Statement 9004 is seen to take full advantage of the multiple-operations capability of the FMP vector pipes.

```
      SUBROUTINE RTECON(LEV)
      COMMON /CHEM/ XNEUT(26),TEMP(6240),XK1(240),XK3(240),DOHL(240),
     1 TABEXP(5000,2)
      COMMON /FTCST/ NLON,NLAT,NGRID
      COMMON /CHMCON/ XK1P(26),XK3P(26),ACTEN1,ACTEN2,XDOHP(26)
      DIMENSION T(570)
      TLOW=.25605/6.5536
      J=NGRID*(LEV-1)
      JP=J+1
      K=0
      DO 100 LAT=1,NLAT
      DO 100 LON=1,NLON
      J=J+1
      K=K+1
      T(K)=TEMP(J)
      IF (T(K).LT.TLOW) T(K)=TLOW
  100 CONTINUE
      CALL EXPT(TABEXP,T,NGRID)
      L=0
      K=0
      DO 200 LAT=1,NLAT
      DO 200 LON=1,NLON
      L=L+1
      K=K+1
      XK1(K)=XK1P(LEV)*T(L)
      L=L+NGRID
      XK3(K)=XK3P(LEV)*T(L)
      DOHL(K)=XDOHP(LEV)*T(L)
      L=L-NGRID
  200 CONTINUE
      RETURN
      END
```

In both loops 100 and 200, the formal indices are LON and LAT but the true indices are J, K, and L. This fact is not detected by the compiler, and therefore these loops are considered uncollapsible for the purpose of vectorization. Incidentally, this type of loop appears throughout the entire program. It has the virtue of ease in managing the loop.
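The point about formal versus true indices can be made concrete with a small model of loop 100. This is a sketch under stated assumptions: NGRID is taken to equal NLON\*NLAT = 240 (consistent with the array sizes in the listing, but not stated outright), the input data are invented, and the functions are illustrative rather than transcriptions of the FORTRAN. The nested form uses LAT and LON only to count iterations, so the same work can be written as a single pass over a contiguous slice.

```python
NLON, NLAT = 16, 15
NGRID = NLON * NLAT        # assumption: one level occupies NLON*NLAT entries
TLOW = 0.25605 / 6.5536    # floor value, taken from the listing

def clamp_nested(temp, lev):
    """Loop 100 as written: LAT/LON only count trips; J and K do the indexing."""
    t = [0.0] * NGRID
    j = NGRID * (lev - 1)
    k = 0
    for _lat in range(NLAT):
        for _lon in range(NLON):
            t[k] = temp[j] if temp[j] >= TLOW else TLOW
            j += 1
            k += 1
    return t

def clamp_collapsed(temp, lev):
    """Same work as one pass over a contiguous slice of length NGRID."""
    start = NGRID * (lev - 1)
    return [x if x >= TLOW else TLOW for x in temp[start:start + NGRID]]

temp = [(i % 7) * 0.01 for i in range(3 * NGRID)]   # hypothetical data
print(clamp_nested(temp, 2) == clamp_collapsed(temp, 2))
```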

```
      SUBROUTINE VRTECON(LEV)
      COMMON /CHEM/ . . .
      COMMON /FTCST/ . . .
      COMMON /CHMCON/ . . .
      DIMENSION T(570)
      BIT C(4),CD
      DYNAMIC CD,TEMPD(25),TD,XK1D,XK3D,DOHLD,TDL
 9011 TD=TEMPD(LEV)
      CD=TEMPD(LEV).LT.TLOW
 9012 IF (CD) TD=TLOW
      CALL EXPT(TABEXP,T,NGRID)
 9013 XK1D=XK1P(LEV)*TD
 9014 XK3D=XK3P(LEV)*TDL
      DOHLD=XDOHP(LEV)*TDL
      RETURN
      ENTRY RTECON0
      TLOW=.25605/6.5536
      MGRID=NLAT*NLON
      DEFINE (TD,T(1:MGRID))
      DEFINE (TDL,T(MGRID+1:2*MGRID))
      DEFINE (XK1D,XK1(1:MGRID))
      DEFINE (XK3D,XK3(1:MGRID))
      DEFINE (DOHLD,DOHL(1:MGRID))
      J=1
      DO 1 I=ILEV1,ILEV2
      DEFINE (TEMPD(I),TEMP(J:J+MGRID-1))
      J=J+NGRID
    1 CONTINUE
      RETURN
      END
```

There are $1680=7*16*15$ flops. The statements 9011, 9012, 9013, and 9014 are vector instructions of length 240. Each of these will need $36=6+(240+4)/8$ cycles. Therefore, the rate is 1.47 gigaflops.

Worth noting here is that statements 9013 and 9014 cannot be combined since the FMP vector pipes cannot perform more than 2 multiplications in parallel. Half of statement 9014 could be moved to statement 9013 but in that case, all 4 read ports would be put to use, and the total system would be busier. The way the above is coded, only 2 read ports would be used in statement 9013 and 3 read ports in statement 9014.

The data in the vector TEMP are prepared in the routine O3HEAT, which calls DX3CHM, which calls O3CHEM, which calls RTECON. Therefore, statements 9011 and 9012 could have been rescheduled.

The routine RTECON is called by the routine O3CHEM in a loop. For greater efficiency, this structure should be modified.

```
      SUBROUTINE CHEMEQ(DUMMY,LEV)
      COMMON /SPECIE/ XO3(6240),CNO3(6240),XNO2(330)
      COMMON /FTCST/ NLON,NLAT,NGRID
      COMMON /WORKBK/ AVJO3(5280),AVJO2(5280),AVQO3(5280)
      COMMON /CHEM/ XNEUT(26),TEMP(6240),XK1(240),XK3(240),DOHL(240)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2
      COMMON /O3OX/ O3XFAC(2400),O3XCON(2400)
      DIMENSION DXO3DT(1),DUMMY(1)
      EQUIVALENCE (AVJO3(1),DXO3DT(1))
      J=(LEV-1)*NGRID
      INO2=(LEV-1)*NLAT
      K=0
      FAC=9.1E-12*XNEUT(LEV)*6856.8
      DO 300 LAT=1,NLAT
      INO2=INO2+1
      DO 300 LON=1,NLON
      J=J+1
      K=K+1
      IF (AVJO3(J).GT.1.E-20) GO TO 200
      O3XFAC(J)=1.
      O3XCON(J)=0.
      DXO3DT(J)=0.
      GO TO 300
  200 A=XK1(K)
      B=DOHL(K)+XNO2(INO2)*XK3(K)
      C=AVJO2(J)
      D=(B-1.)*C
      B=B*AVJO3(J)+A*C
      C=D
      A=A*AVJO3(J)
      O3XFAC(J)=1.+XK3(K)*AVJO3(J)/FAC
      O3XCON(J)=AVJO2(J)*XK3(K)/FAC
      D=B*B-4.*A*C
      IF (D.LE.0.) D=0.
      X=(-B+SQRT(D))/(2.*A)
      XO3(J)=XO3(J)*O3XFAC(J)+O3XCON(J)
      Y=XO3(J)-2.5*DUMMY(K)
      IF (Y.GT.0.) GO TO 299
      XO3(J)=X*O3XFAC(J)+O3XCON(J)
      DXO3DT(J)=0.
      GO TO 300
  299 DXO3DT(J)=AVJO2(J)-DUMMY(K)
  300 CONTINUE
  500 CONTINUE
      RETURN
      END
```

This is another routine that has a structure similar to that of the subroutine RTECON and is also called by O3CHEM in a loop, just like RTECON. Here, the vector temporaries A, B, C, and D are implemented as dynamic arrays to streamline the vectorization.

```
      SUBROUTINE VCHEMEQ(DUMMY,LEV)
      COMMON /SPECIE/ . . .
      COMMON /FTCST/ . . .
      COMMON /WORKBK/ . . .
      COMMON /CHEM/ . . .
      COMMON /CONSTS/ . . .
      COMMON /O3OX/ . . .
      DIMENSION DXO3DT . . .
      EQUIVALENCE (AVJO3 . . .
      DYNAMIC A,B,C,D,X,Y
      BIT E(4),ED,Z(4),ZD,Z1(4),Z1D
      DYNAMIC DOHLD,XK3D,AVJO2D,AVJO3D,O3XFACD,O3XCOND,DUMMYD,XO3D,
     1 DXO3DTD
      DYNAMIC XXNO2D,ED,ZD,Z1D
      DIMENSION XXNO2(16,15),XFAC(25),O3XFACD(25),O3XCOND(25),
     1 DXO3DTD(25)
      DIMENSION AVJO3D(25),AVJO2D(25),XO3D(25)
      INO2=(LEV-1)*NLAT
      DO 49 I=1,NLAT
      XXNO2(*,I)=XNO2(I+INO2)
   49 CONTINUE
      Z1D=AVJO3D(LEV).LE.1.E-20
      Y=2.5*DUMMYD
      B=DOHLD+XXNO2D*XK3D
      C=B*AVJO2D(LEV)
      O3XFACD(LEV)=1+XK3D*AVJO3D(LEV)*XFAC(LEV)
      B=B*AVJO3D(LEV)+XK1D*AVJO2D(LEV)
      A=XK1D*AVJO3D(LEV)
      C=(C-AVJO2D(LEV))*A
      O3XCOND(LEV)=AVJO2D(LEV)*XK3D*XFAC(LEV)
      D=B*B-4*C
      ED=D.LT.0.
      X=XO3D(LEV)*O3XFACD(LEV)+O3XCOND(LEV)
      ZD=X.GT.Y
      DXO3DTD(LEV)=0
      IF (ED) D=0
      X=(SQRT(D)-B)*.5
      X=X/A
      IF (ZD) DXO3DTD(LEV)=AVJO2D(LEV)-DUMMYD
      X=XO3D(LEV)
      IF (Z1D) O3XFACD(LEV)=1
      O3XCOND(LEV)=0
      IF (.NOT.Z1D) XO3D(LEV)=X*O3XFACD(LEV)+O3XCOND(LEV)
      RETURN
      ENTRY CHEMEQ0
      MGRID=NLAT*NLON
      DO 24 I=ILEV1,ILEV2
      XFAC(I)=9.1E-12*6850.8*XNEUT(I)
      J=(LEV-1)*NGRID+1
      DEFINE (O3XFACD(I),O3XFAC(J:J+MGRID-1))
      DEFINE (O3XCOND(I),O3XCON(J:J+MGRID-1))
      DEFINE (DXO3DTD(I),DXO3DT(J:J+MGRID-1))
      DEFINE (AVJO3D(I),AVJO3(J:J+MGRID-1))
      DEFINE (AVJO2D(I),AVJO2(J:J+MGRID-1))
      DEFINE (XO3D(I),XO3(J:J+MGRID-1))
   24 CONTINUE
      DEFINE (XK3D,XK3(1:MGRID))
      DEFINE (XXNO2D,XXNO2(*,1))
      DEFINE (XK1D,XK1(1:MGRID))
      DEFINE (DUMMYD,DUMMY(1:MGRID))
      DEFINE (DOHLD,DOHL(1:MGRID))
      DEFINE (ZD,Z(1:MGRID))
      DEFINE (Z1D,Z1(1:MGRID))
      DEFINE (ED,E(1:MGRID))
      RETURN
      END
```

This routine is relatively lengthy, and a number of things can be observed in it. The use of dynamic arrays is well illustrated. In fact, the condition vectors ED, ZD, and Z1D may also be put in the form of dynamic arrays to save some syntactical handling.

Loop 49 is an example of broadcasting that creates a longer vector from a shorter one so that the former is compatible with other vectors.
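The broadcast performed by loop 49 can be sketched as follows. The values are hypothetical; only the grid dimensions (16 longitudes by 15 latitudes) and the storage order implied by the declaration XXNO2(16,15), with longitude varying fastest, are taken from the listing.

```python
NLON, NLAT = 16, 15

def broadcast_latitude(per_lat):
    """Repeat each latitude value NLON times, longitude varying fastest."""
    return [value for value in per_lat for _ in range(NLON)]

xno2_lat = [1.0 + 0.1 * lat for lat in range(NLAT)]   # hypothetical values
xxno2 = broadcast_latitude(xno2_lat)

print(len(xxno2), xxno2[0], xxno2[NLON])
```

The resulting vector has the same length (240) and ordering as the other grid-point vectors, so it can enter element-by-element vector operations directly.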

In addition to vectorization, the order of the computations has been somewhat rearranged to avoid delay/conflict as necessitated by waiting for operands which would have been the result of the previous instruction. This is, however, not completely successful as can be seen in the computation of the quadratic root $X$. Since 8 results can be produced per cycle, and it takes 30 cycles for the first result to come out of the pipe, the vector length has to be longer than $240=8 * 30$ to avoid any wait. Interestingly, in this case, NLON=16 and NLAT=15, the vector length is $240=15 * 16$ which barely avoids a wait.

A square root function is included in this routine; it will not be discussed except to remark that it can be approximated with some rational function. For the purpose of timing, it will be treated as a known quantity.

There are $7221=6+15*(1+30*16)$ flops. It takes 16 passes of the FMP vector pipes. With the vector length of 240, a rate of 0.78 gigaflop is achieved in this routine.

In this routine, there are two data-dependent paths. A strategy has been adopted to perform all the operations but to store only the selected components. Another approach is to compress out the appropriate components before the operations and insert them back after the computations. When the vector length is rather short, the latter is not used.
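The two strategies can be compared in a short sketch. The function names, data, and the operation applied are hypothetical; the sketch only shows that both approaches give the same result while doing different amounts of arithmetic.

```python
def update_masked(old, x, cond, f):
    """Compute f on every element, then store only where cond is true."""
    computed = [f(v) for v in x]                    # full-length operation
    return [c if m else o for o, c, m in zip(old, computed, cond)]

def update_compressed(old, x, cond, f):
    """Compress the selected elements, operate on the short vector, re-insert."""
    packed = [v for v, m in zip(x, cond) if m]      # compress
    results = iter([f(v) for v in packed])          # shorter operation
    return [next(results) if m else o for o, m in zip(old, cond)]  # expand

x = [4.0, -1.0, 9.0, 0.25, -3.0, 2.0]               # hypothetical data
cond = [v >= 0.0 for v in x]
old = [0.0] * len(x)

def op(v):
    return 2.0 * v + 1.0                            # hypothetical operation

print(update_masked(old, x, cond, op) == update_compressed(old, x, cond, op))
```

The masked form does arithmetic on all elements but needs no data movement; the compressed form does less arithmetic at the price of the compress and expand steps, which is why it pays off only for long vectors with few selected components.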

Worth noting however, is that things may be different if the loop in which this routine is called is expanded. This will be pursued further in the following.

```
      SUBROUTINE O3CHEM
      COMMON /SPECIE/ XO3(6240),CNO3(6240),XNO2(330),O4(5340),
     1 XNEVEN(176),XNODD(176)
      COMMON /FTCST/ NLON,NLAT,NGRID
      COMMON /QUBLK/ NZJ,L103,COLO3(26),LEVPCM,LEVDYN
      COMMON /O3OX/ O3XFAC(2400),O3XCON(2400)
      COMMON /CHEM/ XNEUT(26),TEMP(6240),XK1(240),XK3(240),DOHL(240)
      COMMON /WORKBK/ AVJO3(5280),AVJO2(5280),AVQO3(5280)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,NRTP,
     1 LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /MNCON/ SD,CD,SDXN
      DIMENSION DUMMY(240),DXO3DT(1)
      EQUIVALENCE (AVJO3(1),DXO3DT(1))
      NH=(NLAT+1)/2
      J=0
      L=-NLAT
      DO 650 LEV=1,NZJ
      L=L+NLAT
      DO 640 LAT=1,NH
      I=L+LAT
      J=J+1
      IP=L+NLAT-LAT+1
      A=XNODD(J)*SDXN
      XNO2(I)=XNEVEN(J)+A
      XNO2(IP)=XNEVEN(J)-A
  640 CONTINUE
  650 CONTINUE
      CALL AVQJ
      ILEV1=LEVDYN
      ILEV2=NZJ
      J=(ILEV1-1)*NGRID
      INO2=(ILEV1-1)*NLAT
      DO 500 LEV=ILEV1,NZJ
      K=0
      CALL RTECON(LEV)
      DO 300 LAT=1,NLAT
      INO2=INO2+1
      DO 300 LON=1,NLON
      J=J+1
      K=K+1
      X=XK1(K)*XO3(J)+XK3(K)*XNO2(INO2)+DOHL(K)
      Y=AVJO3(J)*XO3(J)+AVJO2(J)
  300 DUMMY(K)=X*Y
      IF (LEV.GT.LEVPCM) GO TO 295
      CALL CHEMEQ(DUMMY,LEV)
      GO TO 500
  295 J=J-NGRID
      DO 299 K=1,NGRID
      J=J+1
  299 DXO3DT(J)=AVJO2(J)-DUMMY(K)
  500 CONTINUE
      RETURN
      END
```

A number of observations are in order. As noted before, in loop 500, routines RTECON and CHEMEQ are called. It would be desirable to incorporate these routines into the loop to avoid any subroutine call. In addition, the loop should be broken into two parts, namely, one for LEV=ILEV1,LEVPCM and one for LEV=LEVPCM+1,NZJ. In fact, this division can be implemented only for the second half to avoid a conditional branch.
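The proposed division of loop 500 is an instance of splitting a loop at the point where its branch condition changes. The sketch below is illustrative only: the two worker functions are hypothetical stand-ins for the CHEMEQ path and the direct DXO3DT path, and the level limits are invented.

```python
def chem_equilibrium(lev):
    """Hypothetical stand-in for the path taken when LEV <= LEVPCM."""
    return ("equilibrium", lev)

def explicit_tendency(lev):
    """Hypothetical stand-in for the path taken when LEV > LEVPCM."""
    return ("explicit", lev)

def loop_with_branch(ilev1, nzj, levpcm):
    """One loop with a test on LEV in every trip, as in the original loop 500."""
    out = []
    for lev in range(ilev1, nzj + 1):
        if lev > levpcm:
            out.append(explicit_tendency(lev))
        else:
            out.append(chem_equilibrium(lev))
    return out

def loop_split(ilev1, nzj, levpcm):
    """Two branch-free loops covering LEV=ILEV1..LEVPCM and LEV=LEVPCM+1..NZJ."""
    lower = [chem_equilibrium(lev) for lev in range(ilev1, levpcm + 1)]
    upper = [explicit_tendency(lev) for lev in range(levpcm + 1, nzj + 1)]
    return lower + upper

print(loop_with_branch(2, 25, 12) == loop_split(2, 25, 12))
```

Once split, each part applies the same operations to every level it covers, which is what allows the levels to be concatenated into one long vector.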

The broadcast which was done in the routine CHEMEQ is seen here too. The double loop 640 and 650 seems to be needed only once.

In fact, the mission of routine O3CHEM is to obtain the vector DXO3DT. In the process, intermediates such as X, Y, and DUMMY are computed; they actually play the role of dynamic temporaries. To conserve storage space, they were originally dimensioned with a moderate length suited to the available computing machinery. It has been said that one can buy computer time but not computer memory. The FMP offers a memory size that is unthinkable to the "old timers", in addition to its revolutionary computational capability. Therefore, it is not necessary to continue being stingy in the use of memory space, though there is no point in wasting it either. As longer vectors are preferred by the FMP for performance, it is wiser to increase the dimensions of the intermediates X, Y, DUMMY, etc. in order to avoid calling routines RTECON and CHEMEQ in loop 500.

```
      SUBROUTINE VO3CHEM
      COMMON /SPECIE/ ...
      COMMON /FTCST/ ...
      COMMON /OJBLK/ ...
      COMMON /OZOX/ ...
      COMMON /CMEM/ ...
      COMMON /WORKBK/ ...
      COMMON /CONSTS/ ...
      COMMON /MNCON/ ...
      DIMENSION ...
      EQUIVALENCE ...
      DYNAMIC ZK1,ZK3,TOH,DUM,ZNO
      DIMENSION YNO(16,15),ZK1P(240,25),ZK3P(240,25),ZTOH(240,25),
     1 ZFAC(240,25)
      DIMENSION ZNO(240,25)
      DYNAMIC A,B,C,D,X,Y
      CALL AVQJ
      T=TEMPD
      E=TEMPD.LT.TLOW
      YY=AVJO3D2*XO3D2+AVJO2D2
      IF (E) T=TLOW
      CALL EXPT(TABEXP,T, ...
 9020 ZK1=ZK1P*T
      ZK3=ZK3P*TL
      TOH=ZTOH*TL
      DUM=ZK1*XO3D2+ZK3*ZNO
      DUM=YY*(DUM+TOH)
 9021 DXO3DTD1=AVJO2D1-DUMMYD1
      DXO3DTD3=AVJO2D3-DUMMYD3
    Z1*AVJO3D.LE.1.E=20
    Y=2.5*DUM
    B=TOH+ZNO*2K3
    C=8*AVJO20
    03XFACO=1*2K3%AVJ030*2FAC
    8=8*AVJO3U*ZK1*AVJO2D
    A=ZK]*AVJO3D
    C=(C-AVJOCD)*A
    03XCOND=AVJO2D*2K3*LFAC
    D=8*B-4*C
    E=0.LT.0
    K=XO3D*03XFACD*03XCOND
    ZEX.GT.Y
    DX03DTD=0
    IF (E) D=0
    X=(SQRT(D) -B)*.5
    X=X/A
    IF (Z) DXO30TD=AVJO2D=DUM
    X=X030
    IF (Z1) 03XFACD=1
    O3XCONO=0
    IF (.NOT.Z) XO3D=X*O3XFACD*O3XCOND
    RETURN
      ENTRY O3CHEMO
      J=(LEVDYN-1)*NGRID+1
      L2=(NZJ-LEVDYN+1)*NGRID
      DEFINE (TEMPD,TEMP(J:J+L2-1))
      DEFINE (XO3D2,XO3(J:J+L2-1))
      DEFINE (AVJO3D2,AVJO3(J:J+L2-1))
      DEFINE (AVJO2D2,AVJO2(J:J+L2-1))
      L1=(LEVPCM-LEVDYN+1)*NGRID
      DEFINE (AVJO2D,AVJO2(J:J+L1-1))
      DEFINE (AVJO3D,AVJO3(J:J+L1-1))
      DEFINE (O3XFACD,O3XFAC(J:J+L1-1))
      DEFINE (O3XCOND,O3XCON(J:J+L1-1))
      DEFINE (DXO3DTD,DXO3DT(J:J+L1-1))
      DEFINE (XO3D,XO3(J:J+L1-1))
      LL=(NZJ-LEVPCM)*NGRID
      JJ=(LEVPCM-1)*NGRID+1
      LL2=LL/2
      DEFINE (DXO3DTD1,DXO3DT(JJ:JJ+LL2-1))
      DEFINE (AVJO2D1,AVJO2(JJ:JJ+LL2-1))
      DEFINE (DUMMYD1,DUM(JJ:JJ+LL2-1))
      LL3=LL-LL2
      JJ3=JJ+LL2
      DEFINE (DXO3DTD3,DXO3DT(JJ3:JJ3+LL3-1))
      DEFINE (AVJO2D3,AVJO2(JJ3:JJ3+LL3-1))
      DEFINE (DUMMYD3,DUMMY(JJ3:JJ3+LL3-1))
      NH=(NLAT+1)*.5
      J=0
      L=-NLAT
      DO 651 LEV=1,NZJ
      L=L+NLAT
      DO 641 LAT=1,NH
      J=J+1
      INX(J)=L+LAT
      IPX(J)=L+NLAT-LAT+1
  641 CONTINUE
  651 CONTINUE
      LO=NH*NZJ
      XNO2(INX)=XNEVEN(1:LO)+XNODD(1:LO)*SDXN
      XNO2(IPX)=XNEVEN(1:LO)-XNODD(1:LO)*SDXN
      INO2=(LEVDYN-1)*NLAT
      DO 41 I=1,NLAT
      YNO(*,I)=XNO2(I+INO2)
   41 CONTINUE
      DO 42 I=LEVDYN,NZJ
      ZK1P(*,I)=XK1P(I)
      ZK3P(*,I)=XK3P(I)
      ZTOH(*,I)=XDOHP(I)
      ZFAC(*,I)=
      ZNO(*,I)=YNO(*)
   42 CONTINUE
      RETURN
      END
```



The statement 9021 replaces the loop 299. Since it consists of one single operation, it is broken into 2 halves for parallel processing. The vector length in this case is $1920=16 * 15 * 16 / 2$ and it takes $246=6+(1920+4) / 8$ cycles for statement 9021 to complete.

Statement 9020 can be considered similarly. The vector length here is $3000=16 * 15 * 25 / 2$; it takes $381=6+(3000+4) / 8$ cycles to complete.

The code after statement 9021 comes from the routine CHEMEQ, except that the vectors are 9 times longer. Each statement therefore needs $276=6+(16 * 15 * 9+4) / 8$ cycles. Except for statement 9020, the vector lengths for lines before statement 9021 are $6000=16 * 15 * 25$.

The code after statement 9021 needs $4140=276 * 15$ cycles, while the code before needs $4161=756 * 5+381$ cycles. A grand total of 138,042 flops is required. With 8547 cycles, it yields 1.01 gigaflops.
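The cycle counts quoted above can be checked with a short Python sketch. The timing rule $6+(n+4)/8$ is taken from the text with integer division; the parameter names `startup`, `drain`, and `lanes` are labels chosen here, and the 16 ns cycle time is an assumption inferred from the quoted gigaflop rate rather than a figure stated in this section.

```python
CYCLE_NS = 16.0  # assumption: inferred from the quoted 1.01 gigaflops


def vector_cycles(length, startup=6, drain=4, lanes=8):
    """Cycles for one vector statement of the given length,
    following the 6 + (n + 4) / 8 rule used in the text."""
    return startup + (length + drain) // lanes


c_9021 = vector_cycles(16 * 15 * 16 // 2)   # statement 9021, half length
c_9020 = vector_cycles(16 * 15 * 25 // 2)   # statement 9020, half length
c_after = vector_cycles(16 * 15 * 9)        # each statement after 9021
c_before = vector_cycles(16 * 15 * 25)      # each statement before 9021

# 15 statements after 9021, 5 full-length statements plus 9020 before it
total_cycles = c_9021 + 15 * c_after + 5 * c_before + c_9020
gigaflops = 138042 / (total_cycles * CYCLE_NS)
```

Under these assumptions the individual counts come out as 246, 381, 276, and 756 cycles, and the total of 8547 cycles reproduces the quoted rate of about 1.01 gigaflops.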

Loops 41 and 42 are for broadcasting. Loops 641 and 651 prepare the index vectors INX and IPX which are used to scatter XNO2 using indirect addressing.
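The scatter through the index vectors can be sketched in Python as follows. This is a zero-based illustration, not the report's code: `build_index_vectors` and `scatter_hemispheres` are hypothetical names, and `scale` stands in for the factor SDXN.

```python
def build_index_vectors(nlat, nlev):
    """Zero-based analogue of loops 641/651: for each level, pair each
    latitude in the first half with its mirror image in the second half."""
    nh = (nlat + 1) // 2
    inx, ipx = [], []
    for lev in range(nlev):
        base = lev * nlat
        for lat in range(nh):
            inx.append(base + lat)
            ipx.append(base + nlat - 1 - lat)
    return inx, ipx


def scatter_hemispheres(even, odd, nlat, nlev, scale=1.0):
    """Scatter even+odd to one hemisphere and even-odd to the other,
    using the index vectors for indirect addressing."""
    inx, ipx = build_index_vectors(nlat, nlev)
    field = [0.0] * (nlat * nlev)
    for j in range(len(inx)):
        field[inx[j]] = even[j] + odd[j] * scale
    for j in range(len(ipx)):
        field[ipx[j]] = even[j] - odd[j] * scale
    return field
```

When the number of latitudes is odd, the middle latitude appears in both index vectors, so the second scatter overwrites the first there, just as the second vector statement does in the listing.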

```
      SUBROUTINE FFTFOR(DATARL,DATAIM)
      COMMON /FFT/ WP(7,7,15),W(2,7),NTRANS(16),NNN,NN,LR1,NLATHF,
     1 NCPAR(7),LOGN
      COMMON /FTCST/ N
      DIMENSION DATARL(1),DATAIM(1)
      ISIGN=1
    1 DO 91 INN=N,NNN,N
      KNN=INN-N
      DO 12 J=1,N
      TEMPR=DATARL(J+KNN)
      TEMPI=DATAIM(J+KNN)
      DATARL(J+KNN)=DATARL(NTRANS(J)+KNN)
      DATAIM(J+KNN)=DATAIM(NTRANS(J)+KNN)
      DATAIM(NTRANS(J)+KNN)=TEMPI
   12 DATARL(NTRANS(J)+KNN)=TEMPR
      NSS=N/2
      DO 15 J=1,NSS
      L=2*J-1+KNN
      M=L+1
      TEMPR=DATARL(L)+DATARL(M)
      TEMPI=DATAIM(L)+DATAIM(M)
      DATARL(M)=DATARL(L)-DATARL(M)
      DATAIM(M)=DATAIM(L)-DATAIM(M)
      DATAIM(L)=TEMPI
   15 DATARL(L)=TEMPR
      IF (N-2) 91,91,20
   20 DO 90 I=2,LOGN
      NUM=2**I
      NUMHF=NUM/2
      NSS=N/NUM
      DO 90 J=1,NSS
      NUMJK=NUM*(J-1)+KNN
      L=1+NUMJK
      M=L+NUMHF
      TEMPR=DATARL(L)+DATARL(M)
      TEMPI=DATAIM(L)+DATAIM(M)
      DATARL(M)=DATARL(L)-DATARL(M)
      DATAIM(M)=DATAIM(L)-DATAIM(M)
      DATARL(L)=TEMPR
      DATAIM(L)=TEMPI
      DO 90 K=2,NUMHF
      L=K+NUMJK
      M=L+NUMHF
      MM=NSS*(K-1)
      W2=W(2,MM)
      IF (ISIGN.GT.0) GO TO 80
      W2=-W2
   80 CROSSR=DATARL(M)*W(1,MM)+DATAIM(M)*W2
      CROSSI=DATAIM(M)*W(1,MM)-DATARL(M)*W2
      DATARL(M)=DATARL(L)-CROSSR
      DATAIM(M)=DATAIM(L)-CROSSI
      DATARL(L)=DATARL(L)+CROSSR
      DATAIM(L)=DATAIM(L)+CROSSI
   90 CONTINUE
   91 CONTINUE
      IF (ISIGN.LT.0) GO TO 99
      DO 97 I=1,NNN
      DATARL(I)=DATARL(I)/N
   97 DATAIM(I)=DATAIM(I)/N
   99 RETURN
      ENTRY FFTREV
      ISIGN=-1
      GO TO 1
      END
```

Routine FFTFOR performs the Fast Fourier Transform as a step in the transformation between the physical space variables and their spectral coefficients. First, the branch to statement 80 can be avoided if the variable $W 2$ is properly set upon entry. This is very little pre-processing but would allow having a branch-free code sequence. Another branch after statement 15 can be removed as it serves no real purpose in practice.
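The removal of the branch to statement 80 can be illustrated with a small Python sketch. It assumes, as the listing suggests, that the only effect of the direction flag inside the loop is the sign of the second twiddle component; `preset_w2` and the other names are hypothetical.

```python
def cross_terms_branchy(re, im, w1, w2, isign):
    """Inner-loop shape of the original: test the direction every time."""
    if isign <= 0:
        w2 = -w2
    return re * w1 + im * w2, im * w1 - re * w2


def preset_w2(w2_table, isign):
    """Done once on entry: fold the direction into the table."""
    sign = 1.0 if isign > 0 else -1.0
    return [sign * w for w in w2_table]


def cross_terms(re, im, w1, w2_signed):
    """Branch-free inner-loop shape using the pre-signed value."""
    return re * w1 + im * w2_signed, im * w1 - re * w2_signed
```

The sign decision costs one pass over a table of at most a few entries, after which every butterfly in the transform runs without a conditional.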

Besides the pre-process setting, the code can be divided into 2 parts. The first part consists of loop 12 only; it rearranges the input sequence in support of the branch-free computation that follows as the second part. The code is constructed to transform, at one time, a single input sequence whose elements are consecutive in memory. Since the length of the sequence, $N=16$, is a small number in the program, there is very little point in vectorizing the transform itself. A better way of utilizing the FMP is to perform the transform in the same manner over many input sequences together. To do this, the sequences must be aligned in such a way that all the first elements of the input sequences are consecutive in memory, followed by all the second elements, etc. This involves a transpose operation, which is time consuming in that one can expect as little as one single word/result per cycle.
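The idea of transforming many sequences together can be sketched in Python. This is a generic radix-2 decimation-in-time transform, not the report's routine: the element-major layout `rows[i][s]` (element `i` of sequence `s`), the function names, and the twiddle sign convention are all assumptions made for the illustration, and the number of elements is assumed to be a power of two of at least 2.

```python
import cmath


def bit_reverse_permutation(n):
    """Bit-reversed ordering of 0..n-1 for n a power of two (n >= 2)."""
    bits = n.bit_length() - 1
    return [int(format(i, "0{}b".format(bits))[::-1], 2) for i in range(n)]


def batched_fft(rows):
    """rows[i][s] is element i of sequence s (element-major layout).
    Every butterfly acts on whole rows, i.e. across all sequences."""
    n = len(rows)
    perm = bit_reverse_permutation(n)
    x = [list(rows[perm[i]]) for i in range(n)]
    half = 1
    while half < n:
        step = 2 * half
        for start in range(0, n, step):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / step)
                lo = x[start + k]
                hi = x[start + k + half]
                cross = [w * v for v in hi]          # one long vector multiply
                x[start + k + half] = [a - c for a, c in zip(lo, cross)]
                x[start + k] = [a + c for a, c in zip(lo, cross)]
        half = step
    return x
```

The control flow depends only on the element index, never on the sequence index, so each arithmetic statement becomes one vector operation whose length is the number of sequences.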

```
      SUBROUTINE VFFTFOR(DATARL,DATAIM)
      COMMON /FFT/ ...
      COMMON /FTCST/ ...
      DIMENSION DATARL(...
      DIMENSION RL(NN,1),YM(NN,1)
      EQUIVALENCE (RL,DATARL),(YM,DATAIM)
      DYNAMIC PR,PI,TR,TI
      DYNAMIC RLD,RLDO,YMD,YMDO
      ISIGN=0
      W2=WO
    9 CONTINUE
      CALL REARR
      DO 15 L=1,N,2
      M=L+1
      PR=RLD(L)+RLD(M)
      RLD(M)=RLD(L)-RLD(M)
      RLD(L)=PR
      PI=YMD(L)+YMD(M)
      YMD(M)=YMD(L)-YMD(M)
      YMD(L)=PI
   15 CONTINUE
      NUM=2
      NSS=N*.5
      DO 89 I=2,LOGN
      NUMHF=NUM
      NUM=NUM*2
      NSS=NSS*.5
      NUMJK=-NUM
      DO 89 J=1,NSS
      NUMJK=NUMJK+NUM
      L=1+NUMJK
      M=L+NUMHF
      PR=RLD(L)+RLD(M)
      RLD(M)=RLD(L)-RLD(M)
      RLD(L)=PR
      PI=YMD(L)+YMD(M)
      YMD(M)=YMD(L)-YMD(M)
      YMD(L)=PI
      MM=0
      DO 89 K=2,NUMHF
      MM=MM+NSS
      L=K+NUMJK
      M=L+NUMHF
      TI=YMD(M)*W(1,MM)-RLD(M)*W2(MM)
      TR=RLD(M)*W(1,MM)+YMD(M)*W2(MM)
      YMD(M)=YMD(L)-TI
      YMD(L)=YMD(L)+TI
      RLD(M)=RLD(L)-TR
      RLD(L)=RLD(L)+TR
      CONTINUE
   89 CONTINUE
      IF (ISIGN.NE.-1) RETURN
      RLDO=RLDO*FN
      YMDO=YMDO*FN
      RETURN
      ENTRY FFTREV
      ISIGN=-1
      W2=-WO
      GO TO 9
      ENTRY FFTO
      DO 110 I=1,N
      DEFINE (RLD(I),RL(1:NN,I))
      DEFINE (YMD(I),YM(1:NN,I))
  110 CONTINUE
      DEFINE (RLDO,RL(1:NNN,1))
      DEFINE (YMDO,YM(1:NNN,1))
      FN=1./N
      DO 111 I=1,7
      WO(I)=W(2,I)
  111 CONTINUE
      RETURN
      END
```

This routine is one that cannot be vectorized from the point of view of the algorithm. There are 496 flops in each transform. Assuming no instruction has to wait for the previous result, each transform will need 496 cycles to get out of the Scalar Unit. Each time an FFT subroutine is called, it has to perform (15*NLEV+1)/2 transforms, where NLEV is the number of levels involved.

NLEV is less than or equal to 25. If NLEV = 25, for example, there will be 188 transforms and 93,248 flops altogether.

If all transforms are performed through the pipeline as coded above, the number of cycles needed for NN transforms is $126M + 867$, where $M = (NN + 4)/8$.

For NN = 188, 3891 cycles are required, which yields 1.5 gigaflops. This is for the computations only.
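The transform counts and rates above can be reproduced with a few lines of Python. The formulas are those of the text with integer division; the 16 ns cycle time is an assumption inferred from the quoted 1.5 gigaflops, and the function name is hypothetical.

```python
CYCLE_NS = 16.0            # assumption: inferred from the quoted rate
FLOPS_PER_TRANSFORM = 496  # from the text


def pipelined_fft_cycles(num_transforms):
    """Cycles for NN transforms done together: 126*M + 867, M = (NN+4)/8."""
    m = (num_transforms + 4) // 8
    return 126 * m + 867


nlev = 25
transforms = (15 * nlev + 1) // 2            # transforms per FFT call
flops = FLOPS_PER_TRANSFORM * transforms     # total floating point operations
vector_cycles = pipelined_fft_cycles(transforms)
scalar_cycles = FLOPS_PER_TRANSFORM * transforms  # one flop per cycle at best
gigaflops = flops / (vector_cycles * CYCLE_NS)
```

On these figures the pipelined form needs roughly one twenty-fourth of the cycles of the best-case scalar form, which is where the estimate of about 1.5 gigaflops comes from.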

```
      SUBROUTINE SPCFOR(ASPEC,AGRID,AI,NVERT)
      REAL ARSP(30),AGRID,AI
      COMMON /FFT/ WP(7,7,15),W(2,7),NTRANS(16),NNN,NN,LR1,NLATHF,
     1 NCPAR(7),LOGN
      COMMON /CONSTS/ J1(2),LR,J2(10),NREAL,NZONE
      COMMON /CGBLK/ J4(86),NCOMP(12)
      COMMON /FTCST/ N,NLAT,J6,ARSP
      COMMON /GLOP/ P(7,7,15),WT(50),AR(50)
      DIMENSION ASPEC(1),AGRID(1),AI(1)
      DO 5 I=1,NNN
      AGRID(I)=0.
    5 AI(I)=0.
      DO 30 K=1,NVERT
      M=(K-1)*NLAT*N
      MM=(K-1)*NREAL
      DO 30 J=1,NLATHF
      JJ=NLAT+1-J
      ODDR=0.
      EVENR=0.
      LJO=M+(J-1)*N
      LJJO=M+(JJ-1)*N
      DO 10 I=1,NZONE,2
   10 EVENR=EVENR+ASPEC(MM+I)*P(1,I,J)
      IF (LJO.NE.LJJO) GO TO 11
      AGRID(LJO+1)=EVENR
      GO TO 16
   11 DO 15 I=2,NZONE,2
   15 ODDR=ODDR+ASPEC(MM+I)*P(1,I,J)
      AGRID(LJO+1)=EVENR+ODDR
      AGRID(LJJO+1)=EVENR-ODDR
   16 ICE=NZONE+1
      ICO=NZONE+3
      DO 30 L=2,LR1
      ODDR=0.
      ODDI=0.
      EVENR=0.
      EVENI=0.
      IEND=NCOMP(L)
      LJ=LJO+L
      LJJ=LJJO+L
      DO 20 I=1,IEND,2
      EVENR=EVENR+ASPEC(MM+ICE)*P(L,I,J)
      ICE=ICE+1
      EVENI=EVENI+ASPEC(MM+ICE)*P(L,I,J)
   20 ICE=ICE+3
      IF (LJ.NE.LJJ) GO TO 21
      AGRID(LJJ)=EVENR
      AI(LJJ)=EVENI
      GO TO 26
   21 DO 25 I=2,IEND,2
      ODDR=ODDR+ASPEC(MM+ICO)*P(L,I,J)
      ICO=ICO+1
      ODDI=ODDI+ASPEC(MM+ICO)*P(L,I,J)
   25 ICO=ICO+3
      AGRID(LJ)=EVENR+ODDR
      AI(LJ)=EVENI+ODDI
      AGRID(LJJ)=EVENR-ODDR
      AI(LJJ)=EVENI-ODDI
      LLJ=LJO+N+2-L
      AGRID(LLJ)=AGRID(LJ)
      AI(LLJ)=-AI(LJ)
   26 LLJJ=LJJO+N+2-L
      AGRID(LLJJ)=AGRID(LJJ)
      AI(LLJJ)=-AI(LJJ)
      IF (NCPAR(L).EQ.0) GO TO 30
      ICK=ICO
      ICO=ICE
      ICE=ICK
   30 CONTINUE
      RETURN
      ENTRY FORSPC
      DO 80 I=1,NVERT
      M=(I-1)*NLAT*N
      MM=(I-1)*NREAL
      DO 70 J=1,NZONE
      R=0.
      DO 60 K=1,NLAT
      LL=M+(K-1)*N+1
   60 R=R+AGRID(LL)*WP(1,J,K)
   70 ASPEC(MM+J)=R
      IC=NZONE
      DO 80 J=2,LR1
      IEND=NCOMP(J)
      DO 80 JJ=1,IEND
      R=0.
      C=0.
      IC=IC+1
      DO 75 K=1,NLAT
      LL=M+(K-1)*N+J
      R=R+AGRID(LL)*WP(J,JJ,K)
   75 C=C+AI(LL)*WP(J,JJ,K)
      ASPEC(IC+MM)=R
      IC=IC+1
      ASPEC(IC+MM)=C
   80 CONTINUE
      RETURN
      END
```

There are two parts in this subroutine to perform the Legendre transform in both directions. SPCFOR transforms the spectral coefficients into Fourier coefficients and FORSPC transforms the Fourier coefficients into the spectral coefficients. They are really independent of each other in instructions, although they share the same data block.

The data is structured in such a way that the spectral coefficients of a given level are grouped together and followed by those of the next level, and so on. The Fourier coefficients of a sequence are grouped together and followed by those of another sequence. This is a good structure for most conventional non-vector computers to process data in vector fashion, mainly to maximize the data flow rate. In fact, this entire program was coded originally in this manner to achieve vector-like performance with non-vector computers.

For the FMP, it is wiser to structure the data another way, namely, to group coefficients of same index but for all different levels and latitudes. This way, every instruction in the transform can be performed on an array of elements that are of the same index, rather than one single element. The algorithm can then be kept virtually unchanged. As a matter of fact, a closer look at the code will reveal that the algorithm is basically scalar in nature and vectorization of it is almost self-destructive.

One of the fundamental operations here is the inner product between two vectors. One machine instruction on the FMP will do this, but it is designed to serve long vectors. In fact, for a vector length of 4, nothing will be gained. In the routine SPCFOR, the vector length will range from 3 to 4; in FORSPC, the vector length is 15. In both parts, the inner products are in the innermost loop. To avoid the drawback, the program is recoded in such a way that the inner product is broken into fundamental arithmetic operations and the loop for it is turned inside out as follows:
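Before the recoded listing, the "inside out" transformation can be sketched in Python. The sketch is simplified: it uses one coefficient vector `p` shared by all sequences, whereas the routine's coefficients also depend on latitude, and the function names are hypothetical.

```python
def short_inner_products(a, p):
    """Original shape: for each sequence s, a short inner product over i.
    a[s][i] is sequence-major; the inner loop has only len(p) terms."""
    return [sum(a_s[i] * p[i] for i in range(len(p))) for a_s in a]


def inside_out_inner_products(a_by_index, p):
    """Recoded shape: the short loop over i is outermost and each step is a
    multiply-add on one long vector spanning all sequences.
    a_by_index[i][s] is index-major."""
    acc = [0.0] * len(a_by_index[0])
    for i, coeff in enumerate(p):
        row = a_by_index[i]
        acc = [acc[s] + coeff * row[s] for s in range(len(acc))]
    return acc
```

The arithmetic is identical; only the loop order and the data layout change, so the vector length becomes the number of sequences instead of 3, 4, or 15.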

```
      SUBROUTINE VSPCFOR(ASPEC,AGRID,AI,NVERT)
      .
      .
      DO 30 K=1,NVERT
      MM=(K-1)*NREAL
      DO 30 J=1,NLATHF1
      L=1
      EVENR(K,J,L)=ASPEC(MM+1)*P(L,1,J)+ASPEC(MM+3)*P(L,3,J)
      ODDR(K,J,L)=ASPEC(MM+2)*P(L,2,J)+ASPEC(MM+4)*P(L,4,J)
   10 EVENR(K,J,L)=EVENR(K,J,L)+ASPEC(MM+5)*P(1,5,J)
   15 ODDR(K,J,L)=ODDR(K,J,L)+ASPEC(MM+6)*P(1,6,J)
      EVENR(K,J,L)=EVENR(K,J,L)+ASPEC(MM+7)*P(L,7,J)
      AGRID(K,J,L)=EVENR(K,J,L)+ODDR(K,J,L)
      AGRID(K,NLAT+1-J,L)=EVENR(K,J,L)-ODDR(K,J,L)
   16 ICE=NZONE-11
      DO 30 L=2,LR1
      ICE=ICE+12
      EVENR(K,J,L)=ASPEC(MM+ICE)*P(L,1,J)+ASPEC(MM+ICE+4)*P(L,3,J)
      EVENI(K,J,L)=ASPEC(MM+ICE+1)*P(L,1,J)+ASPEC(MM+ICE+5)*P(L,3,J)
      ODDR(K,J,L)=ASPEC(MM+ICE+2)*P(L,2,J)+ASPEC(MM+ICE+6)*P(L,4,J)
      ODDI(K,J,L)=ASPEC(MM+ICE+3)*P(L,2,J)+ASPEC(MM+ICE+7)*P(L,4,J)
      EVENR(K,J,L)=EVENR(K,J,L)+ASPEC(MM+ICE+8)*P(L,5,J)
      EVENI(K,J,L)=EVENI(K,J,L)+ASPEC(MM+ICE+9)*P(L,5,J)
      CONTINUE
      AGRID(K,NLAT+1-J,N+2-L)=AGRID(K,NLAT+1-J,L)
      AI(K,NLAT+1-J,N+2-L)=-AI(K,NLAT+1-J,L)
   30 CONTINUE
      DO 80 K=1,NVERT
      MM=(K-1)*NREAL
      L=1
      EVENR(K,NLATHF,L)=ASPEC(MM+1)*P(L,1,NLATHF)+ASPEC(MM+3)
     1 *P(L,3,NLATHF)
   60 EVENR(K,NLATHF,L)=EVENR(K,NLATHF,L)+ASPEC(MM+5)*P(1,5,NLATHF)
      AGRID(K,NLATHF,L)=EVENR(K,NLATHF,L)+ASPEC(MM+7)*P(L,7,NLATHF)
   66 ICE=NZONE-11
      DO 80 L=2,LR1
      ICE=ICE+12
      EVENR(K,NLATHF,L)=ASPEC(MM+ICE)*P(L,1,NLATHF)+ASPEC(MM+ICE+4)
     1 *P(L,3,NLATHF)
      EVENI(K,NLATHF,L)=ASPEC(MM+ICE+1)*P(L,1,NLATHF)+ASPEC(MM+ICE+5)
     1 *P(L,3,NLATHF)
      CONTINUE
      AGRID(K,NLATHF,L)=EVENR(K,NLATHF,L)+ASPEC(MM+ICE+8)*P(L,5,NLATHF)
      AI(K,NLATHF,L)=EVENI(K,NLATHF,L)+ASPEC(MM+ICE+9)*P(L,5,NLATHF)
      CONTINUE
      AGRID(K,NLATHF,N+2-L)=AGRID(K,NLATHF,L)
      AI(K,NLATHF,N+2-L)=-AI(K,NLATHF,L)
   80 CONTINUE
      RETURN
      ENTRY FORSPC
      DO 180 I=1,NVERT
      MM=(I-1)*NREAL
      J=1
      DO 170 JJ=1,NZONE
      DO 160 K=1,NLATHF1
      KK=16-K
      R(I,JJ,J,K)=AGRID(I,K,J)*WP(J,JJ,K)+AGRID(I,KK,J)*WP(J,JJ,KK)
  160 CONTINUE
      R(I,JJ,J,1)=AGRID(I,8,J)*WP(J,JJ,8)+R(I,JJ,J,1)
      R(I,JJ,J,3)=R(I,JJ,J,2)+R(I,JJ,J,3)
      R(I,JJ,J,5)=R(I,JJ,J,4)+R(I,JJ,J,5)
      R(I,JJ,J,7)=R(I,JJ,J,6)+R(I,JJ,J,7)
      R(I,JJ,J,1)=R(I,JJ,J,7)+R(I,JJ,J,1)
      R(I,JJ,J,3)=R(I,JJ,J,5)+R(I,JJ,J,3)
      ASPEC(MM+JJ)=R(I,JJ,J,3)+R(I,JJ,J,1)
  170 CONTINUE
      IC=NZONE+1
      DO 180 J=2,LR1
      DO 180 JJ=1,IEND
      DO 174 K=1,NLATHF1
      KK=16-K
      R(I,JJ,J,K)=AGRID(I,K,J)*WP(J,JJ,K)+AGRID(I,KK,J)*WP(J,JJ,KK)
      C(I,JJ,J,K)=AI(I,K,J)*WP(J,JJ,K)+AI(I,KK,J)*WP(J,JJ,KK)
  174 CONTINUE
      R(I,JJ,J,7)=R(I,JJ,J,7)+AGRID(I,8,J)*WP(J,JJ,8)
      C(I,JJ,J,7)=C(I,JJ,J,7)+AI(I,8,J)*WP(J,JJ,8)
      DO 175 K=1,5,2
      R(I,JJ,J,K)=R(I,JJ,J,K)+R(I,JJ,J,K+1)
      C(I,JJ,J,K)=C(I,JJ,J,K)+C(I,JJ,J,K+1)
  175 CONTINUE
      DO 176 K=1,5,4
      R(I,JJ,J,K)=R(I,JJ,J,K)+R(I,JJ,J,K+2)
      C(I,JJ,J,K)=C(I,JJ,J,K)+C(I,JJ,J,K+2)
  176 CONTINUE
      ASPEC(MM+IC)=R(I,JJ,J,1)+R(I,JJ,J,5)
      ASPEC(MM+IC+1)=C(I,JJ,J,1)+C(I,JJ,J,5)
      IC=IC+2
  180 CONTINUE
      RETURN
      ENTRY INT
      NLATHF1=NLATHF-1
      IEND=NCOMP(2)
      RETURN
      END
```

The intermediates R, C, EVENR, EVENI, ODDR, and ODDI have been made vectors, and the variables AGRID and AI have been re-dimensioned. The new dimensional structure is compatible with its counterpart in the routine FFTFOR/FFTREV. However, the structure of the array ASPEC is for the moment left untouched. In fact, the code above is intentionally far from vectorized, because the purpose was to retain the main code structure so that it is easily followed. Although it is not in vector form, it can be put in vector form in a straightforward manner, which the reader can see without its actually being carried out. The key is to make the loop over the levels the innermost loop. The dimensional structure reflects the order of the loops, and therefore the form of the vectors.

SPCFOR is broken into two parts to avoid branches, leaving branch-free code.

The performance is very good; nearly the full speed of the FMP can be realized because many of the instructions are 3-operation combinations. The only consideration remaining is to restructure ASPEC before SPCFOR and after FORSPC. This is slow, but a proper segmentation can be arranged so that the computation would be done for free.

The index of ASPEC takes the form MM+K, where MM provides the starting point of the level L and K gives the index of the spectral coefficient. Conceivably, the data of the entire program could have been restructured with ASPEC(L,K) in place of ASPEC(MM+K). In that case, no further restructuring such as pre- or post-processing is required.

Most subroutines of the original code not discussed above were recoded using multiple indices, i.e., replacing Z(MM+K) with Z(K,L) in syntax. These are found in appendix D, where each subroutine appears followed by the recoded version, which has the same name except prefixed with a V. It can be noticed that one of the most common structures is the following double loop, identified by brackets in the left margin of appendix D:

DO 200 K=K1,K2
DO 100 L=L1,L2
Z(K,L) = ... X(K,L) ...
100 CONTINUE
200 CONTINUE (22)

A double loop in this very form is not collapsible for vectorization because the elements of the arrays Z and X are not visited consecutively. However, if the data array is restructured in such a way that the indices K and L interchange their positions, the result is

DIMENSION Z(LL,KK), X(LL,KK)
DO 200 K=1,KK
DO 100 L=1,LL
Z(L,K) = ... X(L,K) ...
100 CONTINUE
200 CONTINUE (23)

This same double loop is then readily reducible to one single vector instruction of length LL*KK.
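The difference between the two loop forms can be made concrete with a small Python sketch of storage offsets. It assumes Fortran's column-major storage (first index varies fastest) and zero-based offsets; the function names are hypothetical.

```python
def offset(i, j, dim1):
    """Zero-based storage offset of element (i, j) in a column-major array
    whose first dimension is dim1 (Fortran storage order)."""
    return i + j * dim1


def visit_offsets_original(kk, ll):
    """Z(K,L) with K as outer loop and L as inner loop: K is the first
    index, so consecutive inner iterations are kk words apart."""
    return [offset(k, l, kk) for k in range(kk) for l in range(ll)]


def visit_offsets_transposed(kk, ll):
    """Z(L,K) with the same loop order: L is now the first index, so the
    whole double loop walks memory one word at a time."""
    return [offset(l, k, ll) for k in range(kk) for l in range(ll)]
```

With the transposed layout the visit order is simply 0, 1, 2, ... up to LL*KK-1, which is why the double loop can be issued as one vector instruction of that length.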

Simple loops are identified by an arrow in the left margin of appendix $D$; these are, of course, readily vectorizable. Other loops which are not quite in the form of (22) are marked with brackets in the right margin of appendix $D$. However, these can be converted to the form of (22) by merely interchanging the order of the double loop; these, in turn, can be changed to the vectorizable form of (23) by restructuring of data.

What this amounts to is essentially that each data array not only can be, but also should be, restructured into its transpose. In other words, elements of the same index (spectral or physical) but of different levels should be grouped together, followed by the group for another index, etc. This will render virtually the entire program instantly vectorizable without change to the program logic. In this sense, the code in its original form is suitable for the FMP.

Some comments are in order regarding the streamlining of the code for high performance. Although these considerations have to be treated on an individual basis, they are by no means obscure and should be rather obvious to a reasonably well-trained programmer.

It can be noted that level delimiters, say L1 and L2, frequently appear in the code such that most loops run from L1 to L2. Data should be structured so that the loop can be transformed into a single long vector from L1 to L2. Elements for L < L1 and L > L2 can be grouped separately and processed differently. Conceivably, these are few and could be processed by the Scalar Unit in parallel with the vector operations.

The 79 components of the spectral coefficients for a 4-level case are originally ordered as shown in figure 14. Elements of the same index but different level should be grouped together and, in addition, the data should be structured in three portions as shown in figures 15a, 15b, and 15c. This avoids the need for Compress in VCORFOR and for Compress/Merge in VSPCFOR, etc.

In the physical domain, the data block should be structured as A(LONG1:LONG2, LAT1:LAT2, LVL1:LVL2). This change of the data structure will not affect the above analysis for O3CHEM, RTECON, and CHEMEQ if the code is integrated so as to remove the subroutine calls.

Figure 14. Ordering of Spectral Components in the Original Code/Data for a 4-Level Case as an Example.

Figure 15. A Restructure and Reordering of the Data Block for Spectral Components in Figure 14.

With the restructuring of data just discussed, the transforms SPCFOR and FFTFOR are free from the time-consuming task of data shuffling. As a matter of fact, REARR can be removed from VFFTFOR if loop 110 under ENTRY FFTO is changed:

DO 110 I=1,N
ITEMP=NTRANS(I)
DEFINE (RLD(I),RL(1:NN,ITEMP))
DEFINE (YMD(I),YM(1:NN,ITEMP))
110 CONTINUE
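The effect of this change can be sketched in Python: instead of moving data into bit-reversed order, the descriptors are bound to the rows in permuted order once. The names `define_views` and `ntrans` are hypothetical, the indices are zero-based, and a Python list of row references stands in for the DEFINE descriptors.

```python
def rearrange_copy(rows, ntrans):
    """REARR-like shape: physically build the rows in permuted order."""
    return [list(rows[ntrans[i]]) for i in range(len(rows))]


def define_views(rows, ntrans):
    """DEFINE-like shape: bind view i to the existing row ntrans[i].
    No element is copied; only the references are permuted."""
    return [rows[ntrans[i]] for i in range(len(rows))]
```

Both forms present the same values to the butterflies that follow. With views, results written through view `i` land in the storage of row `ntrans[i]`, so later code would need to read them through the same descriptors.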

Total analysis and evaluation of the spectral code was not completed; time and resource limitations precluded carrying it to a point where it could be run on a benchmark basis. Relative importance in execution time of each routine is not obvious. An educated estimate of performance can be extrapolated from the analysis which was completed as being one gigaflop or better for the spectral code.

## APPENDIX A <br> AVRX ROUTINE FROM GISS MODEL

```
      SUBROUTINE AVRX(PU)
      DIMENSION PU(416),NM(16),ALPHA(16),X(26),Y(26)
      DATA NM/0,3,1,1,1,0,0,0,0,0,0,1,1,1,3,0/
      DATA ALPHA/0.,1.186572E-1,1.208591E-1,4.513013E-2,9.563327E-3,
     - ...
     - 1.186572E-1,0./
C     SMOOTHES THE ZONAL MASS FLUX AND GEOPOTENTIAL GRADIENTS NEAR THE
C     POLES TO HELP AVOID COMPUTATIONAL INSTABILITY
C     NOTE. THIS ROUTINE HAS BEEN SLIGHTLY ALTERED
      DO 40 J=2,15
      IF (NM(J).LE.0) GO TO 40
      J1=26*(J-1)+1
      J2=J1+1
      NMJ=NM(J)
      DO 30 N=1,NMJ
      X(2;24)=PU(J2;24)-PU(J1;24)
      X(1)=X(25)
      X(26)=X(2)
      Y(1;25)=X(2;25)-X(1;25)
      Y(1;25)=Y(1;25)*ALPHA(J)
      ...
   30 CONTINUE
   40 CONTINUE
      RETURN
      END
```


## LINKHO ROUTINE FROM GISS MODEL



```
    REAL XX1(3072),XX2(3072),PLKE(3072),TH(3456)
    REAL EUPCN(384),RUPCN(384),X3(384)
    OIMENSION UNH2O(4608), UNCO2(4608),UNOZ (4606), P(4608),
    * E(4608) &CLOUD(4608), TE(5376) , dTOP(5376).FE(4992),
    - TAUN(13824), FKGAS (1152) &FKGAS2(1152).EUP (4608).
    - EDN(4608), EUPC(4608), EDNC(4608) / TDF(4608) TDOFC(4608).
    * REF{4608},RDNC{460B}
    EQUIVALENCE (ITY(1),RITY(1)),(II(1),RII(1))
    EQUIVALENCE (FKGAS(1),EUP(1)),(FKGAS2(1),EUP(1153))
    EQUIVALENCE (XX1(1),TAUN(1)),(XX2(1),TAUN(4609)),
    * (PLKE(1),TAUN(9217)),(TH(1),EUP(1))
    EQUIVALENCE (A(1) EONCN(1)) (AAA(1),TDFCN(1). (SE{I),RDNCN(1)),
```



```
    * (TAUT{!):TY(1)),(AA(I),ITY(I)):(AERI(I),BE(I)),(AERZ(I),CC(I)).
    * (TNSQ(i),AERA(1))
    EGUTVALENCE (LI(1).ILI(1)).(L2(1).IL2(1)).(L3(1).IL3(1))
    EQUIVALENCE (CLOFLG(I).ICLD(I)):(AERFLG(I).IAER(1))
C*****
C*****SCALAR ARRAYS (TABLES OR USED FOR INITIIALIZATION}
    OIMENSION CQ{12,IZ)*PIZ(12,12),TA(12112),PF1(12),PF2{12):
        * TEMP{23},YE3(301),OV(11),PIAERO(12,21,NAERO(12).
        * ACOSBR(12,2),AEREKT(12,2),ATAU55(Z),PICIRO(12).
    - CIREXT(12),CCOSBM(12), COELAM(12), COEK(3)
        DIMENSION SH2O(3.3), ENEO(3.3),WK(5,3),A1(12,3), AZ(12.3),A3(I2,3),
        DIMENSION SH2O(3,3), BH2O(3,3),NK(5,3),A1(12,3),AZ(12,3), &3(12,3)
C=***DATA V/240,360,480,560,680,760,840,960,1050,1160.1320.1560/
        0ATA DV/2*60.F40.160.92440.460..45.455.,80.,120./
C****20 MICRON WATER VAPOR CONTINUUM
    DATA #K<0/2.651日E=03.7.2321E-04*6.1875E-02.4.0982E-02/
C***EIRRUS CLOUO OROPERTIES
    DATA CTAN55/1.EO/
    DATA CCOSRR/0.827220.0.812128.0.770656.0.787898.0.884853.0.906536
    * 0.92,219.0.936090.0.941993.0.937202.0.945603.0.963118/
    OATA GIREXT/1.291097,1.431162,1.025916.0.861792,0.783619.0.763708
    0.743796,0.770507,0.790540,0.742040,0.688896.0.643580/
        DATA PICIRO/0.510481,0.747140,0.680009,0.524641,0.278540,0.249016,00079191/INKHO
        * 0.219492,0.349411,0.446850,0.353371,0.263841,0.161871/ 00079201/INXHO
c*****AEROSOL PROPERTIES
    DATA ATAU55/2*0.0/
    OATA AJAERO/12*O/
    DATA ACOSBR/0.00331,0.00703,0.01093.0.01589.0.02011,0.00000.
    * 0.0.4478.0.00000,0.04604,0.04875.0.05902.0.092997.
                    0.28075,0.33668,0.35990.0.46584,0.59244,0.00000,
    * 0.55580.0.00000*0.39450.0,69517.0.71047,0.69187/
    0ATA PIAERO/0.00248,0.00678,0.00798,0,01180,0.01817,0,00000,
    0.05839,0.00000,0.04052,0.03904,0.01792,0.09674,
    * 0.05839.0.0000,0,0.04%78,0.00798,0.01180.0.01817,0.00000%
                0.05839.0.00000.0.04052,0.03904.0.01792.0.09674/
    * 0.05839,0.00000,0.04052,0.03904*0.01792.0.09674/
    * 0.02652,0.00000*0.13152.0.18389,0.10690.0.04951,
    * 0.03343,0.05528,0.08571,0.07775,0.05821,0.00000.
        0.07908,0.00000,0.15930,0.08634,0.06592,0.07536/
C*****PLANCK FUNCTION COEFFICIENT AT V
    DATA PF1/1.06671,3.60014,8.53366,13.5511,24.2626,33.8729,
    DATA PF2/345.319,517.979,690.638,805.745,978.404,1093.51,
    * 1208.62,1381.28,1510.77,1669.04,1899.26,2244.57/
C****QUADRATURE FIT COEFFICIENTS OF S.E OF WATER VAPOR AT 680,760,1050
```



``` - \(=3.9394 \mathrm{E}=04,1,4498 \mathrm{E}=02.6,1017 \mathrm{E}-03,2.7344 \mathrm{E}-041\)
OATA BNZO/-1.0739E-02.1.2100E-0149.6612E*03:5,8873E-02,-0.2536E 0000079446INKHO 126
```




```
C****EQUADRATURE FIT COEFFICIENTS OF WATER VAPOR CONTINUUM AT 840.10SO. 0007946LINKNO
C****ANO 1160 CM-1
```



```
    *00.133841;00.1138839,=0.080575%=0.077208*0.1093719,0.045289, 0007949LINKHO
    0.039877,0.027158,0.026355/
CW**PQUADRATURE FIT COEFFICIENTS OF WATER VAPOR SELECTIVE ABSORP
    DATA A1/5.2200E=02,3.7670E-01;6.7870E-01,5.6300E-02.1.6690E=02.
    * 4.7784E=02.2.9330E=03.0.0000E 00.
                    7.9000E=04.3.4790E=03.5.0278E=01,3.1559E 01.
    1.8971E 01,=6.390E 00,*1.880E 00,=1.930E-01,-5.131E-02*
    -7.4814E=07&-7.795E=03.0.0000E 00,
                    -2.156E-03.-1.044E*02,*1.541E 00,-3.3418E 01,
    9.5713E 00.1.0424E 01.1.3858E 00.1.9100E-01.4.1490E=02.
    2.9366E-02.5.2520E-03.0.0000E 00,
                            1.4930E-03.8.2320E=03.1.3180E 00.1.4540E 01/
        OATA AZ/1.9924,1,3814,00.6081,1,0755,=0,6334,8,5722,00.0990,0,000000007960LINKHO
```



```
    *3.271,014,4357,1:854,0,0000.2,0357.0.1680,*21,886, 2. 2.0612.1,0139.0007963L, INKH0
```



```
    *8.722E-02.9.9179.1.0342/
```



```
    *0.000E 00.2.686E=03.2.684E-03,2.605E=03.2.614E=03.3.1.875E=02,
    -10883E-02.1.890E-02+1,878E-02,1.884E-02,0.000E 00.1.880E-02,
    *1.881E-02,1.826E-02,1.833E-02+1201.01
        OATA A4/0.02042.0.02050.0.02051,0.02114.0.02246,0.02061,0.02149.
        0,0000,0.01716,0.02114,0.01798,0.01825,2*0.0866,0.0868.0.0887.
    0.009089,0.08739.0.08918,0.0000,0.0876,0.0881,0.0791.0.07993.
    *0.09089,
CO***OQUADRATURE FIT COEFFICIENT OF CARBON DIOXIDE AT 6BO 760 CM-I
        DATA B1/1.2797,6.8537,205.7428,-1.8824.-12.8905,-277.6561.1.64.15.
        110.0278.294.3491%
    1.3940E=02,9.7310E*02.4.0701E 00.-2.0.477E-02,-1.4362E-01,
    *-5,9580E 00,7,6840E=03,5,3619E-02.2.2095E 00/
    OATA B2/0.8390,0.7830.6.3778,-0.03698,-0.7746,-3.2630,00.03424,
        OATA B2/0.8390.0
    11.0889,12.1420,
    *4.2260E=02,1.2300E=01,8,4814E 01.1,8314E-01,9.6351E-01,=1.2326E 0
    **04.6549E-02%-2.9198E-01,5.0527E 01/
        DATA E3/0.1639,0.9301,1.3578,00.2338,0.06693,00.3961,0.2574.
        1=0.02676.0.2506,
        -4.9551E=01,-5.2184E-02,1.3029E-01,1.0379E 00.1.1386E-01.
    *-1.6563E-01,3.7517E-01,9.2239E=01,1.2554E 00/
C****OUAORATURE FIT COEFFICIENT OF OZONE AT 1050 CM-1
    OATA C1/14.1003.89.8407.-14.3659,*91.9237,9.7780.59.5051/
        DATA C2/1.0952,2.9766.1.3823.11.92584*0.1894.03.2240/
        OATA IOATA/O/
C FTEMP (X;Y,Z,T) #X*Y*T*Z*T*F
C*****
G*E***G LAYERS DYNAHICAL MODEL IS USED HITH 3 EXTRA LAYER AT O-IO ME
C*ree*
    IF(IDATA) 2000.2000:2001
2000 CONTINUE
NLAY=9
C****GROUND ALBEDO
    AGRND=O.
```



```
    NLAY1 ENLAY + I 
        NLAYRS=NLAY**
        NGI=NG*I
C*****CM-STP/MG S.11E-4*2.24E4/44/G
    CO2CM=2.65287E-0I
COSE*SSTRAPOSPHERIC WATER VAPOR MIXING RATIO
    H2OMIX=3.E=06
    COEK(1) #.293478
```




```
    COEK(2) =.413043 00080094INKH0.191
    COEK(3) =COEK(1)
    COELAM(1)=0V(1)*120.
    COELAM(12)=OV(11)*&OO.
    DO 43 LAH=2. I1
    43 COELAM (LAM) =OV (LAM) & DV (LAM-1)
C**** CALCULATE COSEAR ANO TAUAER\trianglePIAERO
    00 51 N=I;NLAYRS
    O1 LAMEl,12
    GO 10 (52,52,52.53.52.52,52,52,52,52,52,54),N
    TA(LAM,N)=AEREXT(LAM,1)*ATAU55(1)
    NN=1
    GO TO 58
54 TA(LAM,N) #AEREXT(LAM:Z)*ATAUS5 (2)
    NN=2
    GOTO 58
52 TA(LAM,N)=0.
```



```
C NN UNOEFINEL, FOR THIS G्GANCH IN ORIFINAL
C SET IT TO I TO PREVENT MACHINE PRORLEMS
O**)
    NN=!
S8 CB(LAM,N) = (ACOSGR{LAM,NN) #AEREXT(LAM,NN) ©CCOSBR{LAM}`GIREXT(N})/
    * (AEREXT(LAM*NN) & CIREXT (N)*1.E*40)
    PIZ(LAM,N)=TA(LAM,N) &PIAERO(LAM,NN)
    CONTINUE
    H2OFAC=H2OMIX/(H2OMIX*I.)*I.27E3
    ZERO(17384) =0.0
    ONE(11384) = 1.0
C.....FILL UP TSTR FIRST TIME AROUND ONLY
    TSTR(1;1152)=200.0
    IDATA=I
2001 CONTINUE
C****
C**** CHANGE GRIO TO I2 LAYERS
C**** COMPUTE LAYER THICXNESS*-STORED IN UNCOZ
C*** UNCO2(1)=FE(Z) =PE(1) x 2. -0,*2.
C**** UNCO2(2) =PE (3) -PE(2) =5, -2, =3.
C**** UNCOZ(3)=PE(4)-PE (3) =10.-5,=5.
    UNCO2(1;384)=2.
    UNCO2(385;384)=3.
C UNCO2(N)@PLE{N-Z}-PLE{N-3) *. N##,NLAYRS
    UNCO2(1153;3456)=PLE (385\3456)-PLE(113456)
C**** COMPUTE H2O PARTIAL PRESSURE
C FOR GAYERS I TO 3 UNH2OFH2OHIX/(1)H2OMIX)*1.27E3* (LAYER THICKNESS)
C FOR LAYERS & TO 12 UNH20= (SHL{N-3}*1,E=20)*1.27E3* (LAYER THICKNESS)
    UNH2O(1;1152)=H2OFAC*UNCO2(1;1152)
    UNH2O(1153:3456) =SHL{1{3456} +1.E-20
    UNH2O(115313456) #NNCO(1153:3456) #UNCO2(115313456)
C****
    UNO3(1;384)=OZALE(4225;384)-OZALE(3841;384)
    UNO3(385;384)=OZALE(4609;384)-OZALE(4225;384)
    UNO3(769;384)=OZALE(1;384)-OZALE(4609;384)
    UNO3(1153;3456)=OZALE(385;34
C**** COMPUTE
C PE(I)=0.
    PE(2)mz*
    PE(3)=5.
    PE(4)=10.4.3
    O0 &8 NE1:3 (N) QRE(N+1))\(2
```
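The listing addresses its data with vector descriptors of the form `A(first;length)` and stores each of the 12 layers as 384 consecutive grid points, so layer `N` starts at `NPTR=(N-1)*384+1`. The sketch below mirrors that layout in Python for readers unfamiliar with the notation. The helper names `layer_start`, `vec`, and `store` are hypothetical and do not appear in the report; the thickness values 2., 3., and 5. are the ones quoted in the listing's comments.

```python
NPTS = 384  # grid points per layer, as in the listing


def layer_start(n, npts=NPTS):
    """1-based index of the first point of layer n: NPTR=(N-1)*384+1."""
    return (n - 1) * npts + 1


def vec(a, first, length):
    """Read A(first;length): `length` elements starting at 1-based `first`."""
    return a[first - 1:first - 1 + length]


def store(a, first, length, values):
    """Write a scalar or a sequence into A(first;length)."""
    if isinstance(values, (int, float)):
        values = [float(values)] * length
    assert len(values) == length
    a[first - 1:first - 1 + length] = values


# Thicknesses of the three stratospheric layers, stored layer after
# layer in one long array, the way UNCO2 is laid out in the listing.
unco2 = [0.0] * (3 * NPTS)
for n, thickness in enumerate((2.0, 3.0, 5.0), start=1):
    store(unco2, layer_start(n), NPTS, thickness)
```

With this layout a descriptor such as `UNCO2(1153;3456)` covers the nine layers that follow the three stratospheric ones in a single vector operation.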

```
C  48 P(N)=PSTR(N)/1013.25
      P(1;384)=.98692323E-3
      P(385;384)=.34542314E-2
      P(769;384)=.74019209E-2
      P(1153;3456)=PL(1;3456)/1013.25
C**** COMPUTE E (H2O ABSORPTION PARAMETER)
C     E(N)=UNH2O(N)*2.38E-4/UNCO2(N)*P(N)
      E(1;4608)=2.38E-4*UNH2O(1;4608)
      E(1;4608)=E(1;4608)/UNCO2(1;4608)
      E(1;4608)=E(1;4608)*P(1;4608)
C**** COMPUTE TE (LOGARITHMIC INTERPOLATION)
      TH(1;3456)=TL(1;3456)/PLK(1;3456)
C
C     USE EXPBYK APPROXIMATION TO GET **.286
      DO 66 L=2,NLAY
      KL=(L-1)*384+1
      KL1=KL-384
      CALL EXPBYK2(PLKE(KL1),PLE(KL),L)
   66 CONTINUE
      XX1(1;3072)=PLKE(1;3072)-PLK(385;3072)
      XX1(1;3072)=XX1(1;3072)*TH(1;3072)
      XX2(1;3072)=PLK(1;3072)-PLKE(1;3072)
      XX2(1;3072)=XX2(1;3072)*TH(385;3072)
      XX1(1;3072)=XX1(1;3072)+XX2(1;3072)
      XX1(1;3072)=XX1(1;3072)*PLKE(1;3072)
      XX2(1;3072)=PLK(1;3072)-PLK(385;3072)
      TE(1537;3072)=XX1(1;3072)/XX2(1;3072)
C**** TE FOR STRATOSPHERE
      TE(385;384)=TSTR(1;384)+TSTR(385;384)
      TE(385;384)=TE(385;384)/2.
      TE(769;384)=TSTR(385;384)+TSTR(769;384)
      TE(769;384)=TE(769;384)/2.
      TE(1;384)=2.*TSTR(1;384)
      TE(1153;384)=TSTR(769;384)+TL(1;384)
C**** TE NEAR GROUND
      TE(4609;384)=TS(1;384)
      TE(4993;384)=TG(1;384)
C****
C     CO2 PARTIAL PRESSURE FROM DP
      UNCO2(1;4608)=CO2CM*UNCO2(1;4608)
C****
C     CLOUD ARRAY
      NCLOUD(1;1152)=0
      NCLOUD(1153;3456)=CLOUD(1;3456)
C****
C**** BAND LOOP
C****
      FE(1;4992)=0.
      FLXDNG(1;384)=0.
    OO द00 LAHEI,12
C**** CALCULATE OPTICAL THICKNESS TAU(N,K)
      DO 2 N=1,NLAYRS
      NPTR=(N-1)*384+1
      IF (N.LE.3) TN(1;384)=TSTR(NPTR;384)/273.
      IF (N.GE.4) TN(1;384)=TL((N-4)*384+1;384)/273.
      TNSQ(1;384)=TN(1;384)*TN(1;384)
C     A=FTEMP(A1(LAM,1),A1(LAM,2),A1(LAM,3),TN)
      A(1;384)=A1(LAM,3)*TNSQ(1;384)
      X1(1;384)=A1(LAM,2)*TN(1;384)
      A(1;384)=A(1;384)+X1(1;384)
      A(1;384)=A(1;384)+A1(LAM,1)
C     AAA=FTEMP(A2(LAM,1),A2(LAM,2),A2(LAM,3),TN)
```

```
      AAA(1;384)=A2(LAM,3)*TNSQ(1;384)
      X1(1;384)=A2(LAM,2)*TN(1;384)
      AAA(1;384)=AAA(1;384)+X1(1;384)
      AAA(1;384)=AAA(1;384)+A2(LAM,1)
      GO TO (3,3,3,3,20,20,3,3,20,3,3,3),LAM
C**** OVERLAPPING REGION
C**** WATER VAPOR SELECTIVE ABSORPTION
   20 JJ=LAM-4
      IF (LAM.EQ.9) JJ=LAM-6
C     SE=FTEMP(SH2O(JJ,1),SH2O(JJ,2),SH2O(JJ,3),TN)
      SE(1;384)=SH2O(JJ,3)*TNSQ(1;384)
      X1(1;384)=SH2O(JJ,2)*TN(1;384)
      SE(1;384)=SE(1;384)+X1(1;384)
      SE(1;384)=SE(1;384)+SH2O(JJ,1)
```
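Each scalar call of the form `X=FTEMP(C1,C2,C3,TN)` in the original code is replaced in the vector version by four whole-vector statements. The sketch below shows that correspondence in Python, assuming (as the expanded statements and the commented formula `SE1=12.7892-14.3649*TN+7.3921*TN*TN` suggest) that `FTEMP` is a quadratic in the normalized temperature `TN=T/273`. The function names are hypothetical, and the coefficients are the ones quoted in that comment.

```python
def ftemp(c1, c2, c3, tn):
    """Scalar form: quadratic fit in normalized temperature (assumed)."""
    return c1 + c2 * tn + c3 * tn * tn


def ftemp_vector(c1, c2, c3, tn):
    """Same fit written as the four whole-vector steps used in the listing."""
    tnsq = [t * t for t in tn]               # TNSQ=TN*TN
    acc = [c3 * q for q in tnsq]             # X=C3*TNSQ
    x1 = [c2 * t for t in tn]                # X1=C2*TN
    acc = [p + q for p, q in zip(acc, x1)]   # X=X+X1
    return [p + c1 for p in acc]             # X=X+C1


temps_kelvin = [200.0, 250.0, 273.0, 300.0]
tn = [t / 273.0 for t in temps_kelvin]
coeffs = (12.7892, -14.3649, 7.3921)
se = ftemp_vector(*coeffs, tn)
```

Splitting the polynomial into single-operation statements matches the one-operation-per-vector-instruction style of the recoded routine, at the cost of the temporary vector `X1`.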



```
      BE(1;384)=BH2O(JJ,3)*TNSQ(1;384)
      X1(1;384)=BH2O(JJ,2)*TN(1;384)
      BE(1;384)=BE(1;384)+X1(1;384)
      BE(1;384)=BE(1;384)+BH2O(JJ,1)
      BE(1;384)=BE(1;384)*P(NPTR;384)
      X1(1;384)=SE(1;384)*UNH2O(NPTR;384)
      X1(1;384)=X1(1;384)/BE(1;384)
      L1(1;384)=X1(1;384).LT.1.E-2
C     IF(NOT L1) TAU=2.*BE*(SQRT(1+X1)-1)
      TAU(1;384)=2.*BE(1;384)
      X1(1;384)=X1(1;384)+1.
      X1(1;384)=VSQRT(X1(1;384);X1(1;384))
      X1(1;384)=X1(1;384)-1.
      TAU(1;384)=TAU(1;384)*X1(1;384)
C     IF(L1) TAU=SE*UNH2O(N)
      X1(1;384)=SE(1;384)*UNH2O(NPTR;384)
      TAU(1;384)=Q8VCTRL(X1(1;384),L1(1;384);TAU(1;384))
C****
      GO TO (6,6,21),JJ
C  21 SE1=12.7892-14.3649*TN+7.3921*TN*TN
   21 SE(1;384)=7.3921*TNSQ(1;384)
      X1(1;384)=14.3649*TN(1;384)
      SE(1;384)=SE(1;384)-X1(1;384)
      SE(1;384)=SE(1;384)+12.7892
C     BE1=(1.0635+1.9570*TN-.3227*TN*TN)*PN
      BE(1;384)=-.3227*TNSQ(1;384)
      X1(1;384)=1.9570*TN(1;384)
      BE(1;384)=BE(1;384)+X1(1;384)
      BE(1;384)=BE(1;384)+1.0635
      BE(1;384)=BE(1;384)*P(NPTR;384)
      UN1(1;384)=UNO3(NPTR;384)
      GO TO 7
C****
C     CARBON DIOXIDE SELECTIVE ABSORPTION
    6 SE(1;384)=4.2328*TNSQ(1;384)
      X1(1;384)=6.9225*TN(1;384)
      SE(1;384)=SE(1;384)-X1(1;384)
      SE(1;384)=SE(1;384)+7.4197
C     BE1=(0.1697-0.1734*TN+0.2410*TN*TN)*PN
      BE(1;384)=.2410*TNSQ(1;384)
      X1(1;384)=.1734*TN(1;384)
      BE(1;384)=BE(1;384)-X1(1;384)
      BE(1;384)=BE(1;384)+.1697
      BE(1;384)=BE(1;384)*P(NPTR;384)
      IF (LAM.NE.6) GO TO 166
      SE(1;384)=.0815*TNSQ(1;384)
      X1(1;384)=.1113*TN(1;384)
      SE(1;384)=SE(1;384)-X1(1;384)
```
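The scalar `IF` tests on the strong-line and weak-line limits are replaced in the vector code by computing both branches for every grid point and then merging them under a bit vector with `Q8VCTRL`. The Python sketch below illustrates that pattern for `TAU`, assuming `Q8VCTRL(src,ctl;dest)` stores `src` where the control bit is set and leaves `dest` unchanged elsewhere, which is how the commented scalar statements read. The function `tau_selective` and its argument names are hypothetical.

```python
import math


def q8vctrl(src, ctl, dest):
    """Controlled store: take src where ctl is true, else keep dest (assumed)."""
    return [s if c else d for s, c, d in zip(src, ctl, dest)]


def tau_selective(se, be, u):
    """Optical thickness from line strength se, width term be, absorber u."""
    x = [s * w / b for s, w, b in zip(se, u, be)]       # X1=SE*UNH2O/BE
    weak = [xi < 1.0e-2 for xi in x]                    # L1=X1.LT.1.E-2
    # Branch for .NOT.L1: TAU=2.*BE*(SQRT(1+X1)-1), computed everywhere.
    tau = [2.0 * b * (math.sqrt(1.0 + xi) - 1.0) for b, xi in zip(be, x)]
    # Branch for L1: TAU=SE*UNH2O, also computed everywhere.
    linear = [s * w for s, w in zip(se, u)]
    return q8vctrl(linear, weak, tau)


tau = tau_selective(se=[1.0, 1.0], be=[1.0, 1.0], u=[0.001, 3.0])
```

Evaluating both branches over the full vector does redundant arithmetic, but it keeps the vector pipeline busy and avoids any data-dependent branching.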


```
      SE(1;384)=SE(1;384)+.0392
      BE(1;384)=.0297*TNSQ(1;384)
      X1(1;384)=.2077*TN(1;384)
      BE(1;384)=BE(1;384)+X1(1;384)
      BE(1;384)=BE(1;384)-.0651
      BE(1;384)=BE(1;384)*P(NPTR;384)
  166 CONTINUE
      UN1(1;384)=UNCO2(NPTR;384)
C   7 XX=SE1*UN1/BE1
    7 X1(1;384)=SE(1;384)*UN1(1;384)
      X1(1;384)=X1(1;384)/BE(1;384)
C     IF(.NOT.L1)
      TAU1(1;384)=2.*BE(1;384)
      X1(1;384)=X1(1;384)+1.
      X1(1;384)=VSQRT(X1(1;384);X1(1;384))
      X1(1;384)=X1(1;384)-1.
C     IF(L1)
```



```
      TAU1(1;384)=Q8VCTRL(X1(1;384),L1(1;384);TAU1(1;384))
      TAU(1;384)=TAU(1;384)+TAU1(1;384)
      DO 14 K=1,3
C
      KPTR=(K-1)*384+1
      NKPTR=(K-1)*4608+NPTR
    AA(1;384)=A(I:384) कA3(LAN:K)
    BE(1:384)=AAA(1:384) EA4 (LAH,K)
C     FKGAS(K)=AA*PN/(1.+BB*PN)
      X1(1;384)=BB(1;384)*P(NPTR;384)
      X1(1;384)=X1(1;384)+1.
      FKGAS(KPTR;384)=AA(1;384)*P(NPTR;384)
      FKGAS(KPTR;384)=FKGAS(KPTR;384)/X1(1;384)
      IF (LAM.EQ.9) GO TO 15
```



```
      AA(1;384)=B1(K,3,JJ)*TNSQ(1;384)
      X1(1;384)=B1(K,2,JJ)*TN(1;384)
      AA(1;384)=AA(1;384)+X1(1;384)
      AA(1;384)=AA(1;384)+B1(K,1,JJ)
      FKGAS2(KPTR;384)=AA(1;384)*P(NPTR;384)
C     BB=FTEMP(B2(K,1,JJ),B2(K,2,JJ),B2(K,3,JJ),TN)
      BB(1;384)=B2(K,3,JJ)*TNSQ(1;384)
      X1(1;384)=B2(K,2,JJ)*TN(1;384)
      BB(1;384)=BB(1;384)+X1(1;384)
      BB(1;384)=BB(1;384)+B2(K,1,JJ)
C     CC=FTEMP(B3(K,1,JJ),B3(K,2,JJ),B3(K,3,JJ),TN)
      CC(1;384)=B3(K,3,JJ)*TNSQ(1;384)
```



```
      X1(1;384)=B3(K,2,JJ)*TN(1;384)
      CC(1;384)=CC(1;384)+X1(1;384)
      CC(1;384)=CC(1;384)+B3(K,1,JJ)
C     FKGAS2(K)=AA*PN/(1.+BB*PN**CC)
      X1(1;384)=X1(1;384)*BB(1;384)
      X1(1;384)=X1(1;384)+1.
      FKGAS2(KPTR;384)=AA(1;384)*P(NPTR;384)
      FKGAS2(KPTR;384)=FKGAS2(KPTR;384)/X1(1;384)
      GO TO 161
   15 GO TO (23,23,17),K
C  23 AA=FTEMP(C1(K,1),C1(K,2),C1(K,3),TN)
   23 AA(1;384)=C1(K,3)*TNSQ(1;384)
      X1(1;384)=C1(K,2)*TN(1;384)
      AA(1;384)=AA(1;384)+X1(1;384)
      AA(1;384)=AA(1;384)+C1(K,1)
```





```
C     SET PI0 TO ZERO FOR DARK CLOUDS (OLD TO
      X1(1;384)=TAUN(NKPTR;384)+1.E-40
      PI0(1;384)=X2(1;384)/X1(1;384)
      IF(N.LE.3) TN(1;384)=TSTR(NPTR;384)/273.
      IF(N.GE.4) TN(1;384)=TL((N-4)*384+1;384)/273.
C     IF(TN.GE. %5348.AND.NCLOUD(N).
      L2(1;384)=NCLOUD(NPTR;384).GT.0
      L1(1;384)=L1(1;384).AND.L2(1;384)
      PI0(1;384)=Q8VCTRL(ZERO(1;384),L1(1;384);PI0(1;384))
C**** EMISSION CALCULATIONS FOR HAZE LAYER,
C**** EXACT IN THE SENSE OF ISOTROPIC SCATTERING
C**** EXACT SOLUTION=TWO-STREAM SOLUTION*FORGE FACTOR(PI0,TAU0)
      LOUT=Q8SCNT(L1(1;384))
      IF (LOUT.EQ.0) GO TO 165
      PI0C(1;LOUT)=Q8VCMPRS(PI0(1;384),L1(1;384);PI0C(1;LOUT))
      TAUNC(1;LOUT)=Q8VCMPRS(TAUN(NKPTR;384),L1(1;384);TAUNC(1;LOUT))
```



```
      BTOPNP(1;LOUT)=Q8VCMPRS(BTOP(NPTR+384;384),L1(1;384);
     *   BTOPNP(1;LOUT))
      X1(1;LOUT)=PI0C(1;LOUT)*CB(LAM,N)
      AER2(1;LOUT)=ONE(1;LOUT)-X1(1;LOUT)
      X1(1;LOUT)=AER1(1;LOUT)/AER2(1;LOUT)
      AERA(1;LOUT)=VSQRT(X1(1;LOUT);AERA(1;LOUT))
      AERU(1;LOUT)=ONE(1;LOUT)-AERA(1;LOUT)
      AERU(1;LOUT)=AERU(1;LOUT)/2.
      AERV(1;LOUT)=AERV(1;LOUT)/2.0
      AERC(1;LOUT)=3.0*AER1(1;LOUT)
      AERC(1;LOUT)=AERC(1;LOUT)*AER2(1;LOUT)
      AERC(1;LOUT)=VSQRT(AERC(1;LOUT);AERC(1;LOUT))
      X1(1;LOUT)=AERC(1;LOUT)*TAUNC(1;LOUT)
      X1(1;LOUT)=-X1(1;LOUT)
C.... TEMPORARY TRAP FOR UNDERFLOWS (AS IN SCALAR CODE)
      TSTEXP(1;LOUT)=X1(1;LOUT).LT.-180.218
      EX1(1;LOUT)=Q8VCTRL(ZERO(1;LOUT),TSTEXP(1;LOUT);EX1(1;LOUT))
      TSTEXP(1;LOUT)=EX1(1;LOUT).LT.1.E-30
      EX1(1;LOUT)=Q8VCTRL(ZERO(1;LOUT),TSTEXP(1;LOUT);EX1(1;LOUT))
```



```
C**** FORGE FACTOR FOR ISOTROPIC SCATTERING
      X1(1;LOUT)=AERV(1;LOUT)*AERV(1;LOUT)
      X2(1;LOUT)=AERU(1;LOUT)*AERU(1;LOUT)
      X2(1;LOUT)=X2(1;LOUT)*EX2(1;LOUT)
      DENO(1;LOUT)=X1(1;LOUT)-X2(1;LOUT)
      DNM0(1;LOUT)=BTOPN(1;LOUT)-BTOPNP(1;LOUT)
      DNM0(1;LOUT)=DNM0(1;LOUT)/TAUNC(1;LOUT)
      DNM0(1;LOUT)=DNM0(1;LOUT)/AERC(1;LOUT)
    X!(1;LOUT)=AERV (1;LOUT)=xC(1;LOUT)
      X2(1;LOUT)=AERA(1;LOUT)*EX1(1;LOUT)
      X1(1;LOUT)=X1(1;LOUT)-X2(1;LOUT)
      DNM0(1;LOUT)=DNM0(1;LOUT)*X1(1;LOUT)
      X1(1;LOUT)=AERU(1;LOUT)*EX2(1;LOUT)
      DNM1(1;LOUT)=AERV(1;LOUT)-X1(1;LOUT)
C     EUP(N)=(BTOP(N)*DNM1-DNM0-BTOP(N+1)*EX1)/DENO*FTWO*AERA
C**** USE TAUNC FOR TEMPORARY STORAGE
```


```
      X1(1;LOUT)=BTOPN(1;LOUT)*DNM1(1;LOUT)
      X1(1;LOUT)=X1(1;LOUT)-DNM0(1;LOUT)
      X2(1;LOUT)=BTOPNP(1;LOUT)*EX1(1;LOUT)
      TAUNC(1;LOUT)=X1(1;LOUT)-X2(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)/DENO(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)*AERA(1;LOUT)
      EUP(NPTR;384)=Q8VXPND(TAUNC(1;LOUT),L1(1;384);EUP(NPTR;384))
C     EDN(N)=(BTOP(N+1)*DNM1+DNM0-BTOP(N)*EX1)/DENO*FTWO*AERA
      TAUNC(1;LOUT)=BTOPNP(1;LOUT)*DNM1(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)+DNM0(1;LOUT)
      X1(1;LOUT)=BTOPN(1;LOUT)*EX1(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)-X1(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)/DENO(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)*AERA(1;LOUT)
      EDN(NPTR;384)=Q8VXPND(TAUNC(1;LOUT),L1(1;384);EDN(NPTR;384))
C**** REF(N),TDF(N) BASED ON TWO STREAM SOLUTION
C     REF(N)=AERU*AERV*(1.-EX2)/DENO
      TAUNC(1;LOUT)=AERU(1;LOUT)*AERV(1;LOUT)
      X1(1;LOUT)=ONE(1;LOUT)-EX2(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)*X1(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)/DENO(1;LOUT)
      REF(NPTR;384)=Q8VXPND(TAUNC(1;LOUT),L1(1;384);REF(NPTR;384))
C     TDF(N)=(AERV-AERU)/DENO*EX1
      TAUNC(1;LOUT)=AERV(1;LOUT)-AERU(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)/DENO(1;LOUT)
      TAUNC(1;LOUT)=TAUNC(1;LOUT)*EX1(1;LOUT)
      TDF(NPTR;384)=Q8VXPND(TAUNC(1;LOUT),L1(1;384);TDF(NPTR;384))
  165 CONTINUE
C****
C**** DARK CLOUDS
C****
C     NEXT TEST ON CLOUDS
C     BUT FIRST TEST EXCLUDES ALL OTHERS
      L1(1;384)=.NOT.L1(1;384)
      L3(1;384)=NCLOUD(NPTR;384).GT.0
C     L2=SECOND TEST=.NOT.FIRST.AND.SECOND
      L2(1;384)=L1(1;384).AND.L3(1;384)
C     L1=.NOT.FIRST.AND..NOT.SECOND
      L3(1;384)=.NOT.L2(1;384)
      L1(1;384)=L1(1;384).AND.L3(1;384)
      TDF(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);TDF(NPTR;384))
      EDN(NPTR;384)=Q8VCTRL(BTOP(NPTR+384;384),L2(1;384);EDN(NPTR;384))
      EUP(NPTR;384)=Q8VCTRL(BTOP(NPTR;384),L2(1;384);EUP(NPTR;384))
C****
C**** THICK LAYER
C****
      L2(1;384)=TAUN(NKPTR;384).GT.15.
      L2(1;384)=L2(1;384).AND.L1(1;384)
      TDF(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);TDF(NPTR;384))
      EXTAU(1;384)=Q8VCTRL(ZERO(1;384),L2(1;384);EXTAU(1;384))
C****
C**** TRANSPARENT LAYER
C****
      L2(1;384)=TAUN(NKPTR;384).LT.1.E-4
      L2(1;384)=L2(1;384).AND.L1(1;384)
```
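The haze-layer calculation is only needed at grid points where the control vector `L1` is set, so the vector code counts the set bits (`Q8SCNT`), compresses the operands to short vectors of length `LOUT` (`Q8VCMPRS`), does the arithmetic on the short vectors, and expands the results back to full length (`Q8VXPND`). The Python sketch below illustrates that count, compress, compute, expand sequence. The semantics assumed here, in particular that the expand step leaves unselected elements of the destination unchanged, are an interpretation of how the calls are used in the listing and are not taken from a STAR reference; the doubling step simply stands in for the real arithmetic.

```python
def q8scnt(ctl):
    """Count the set bits of a control vector."""
    return sum(1 for c in ctl if c)


def q8vcmprs(src, ctl):
    """Compress: keep only the elements of src whose control bit is set."""
    return [s for s, c in zip(src, ctl) if c]


def q8vxpnd(short, ctl, dest):
    """Expand: scatter `short` back to the set positions of ctl (assumed)."""
    it = iter(short)
    return [next(it) if c else d for c, d in zip(ctl, dest)]


def compute_where(src, ctl, dest, fn):
    """Apply fn only at selected points, as the haze-layer block does."""
    lout = q8scnt(ctl)
    if lout == 0:                 # IF (LOUT.EQ.0) GO TO 165
        return list(dest)
    short = q8vcmprs(src, ctl)    # length LOUT
    short = [fn(v) for v in short]
    return q8vxpnd(short, ctl, dest)


result = compute_where([1.0, 2.0, 3.0, 4.0],
                       [True, False, True, False],
                       [0.0, 0.0, 0.0, 0.0],
                       lambda v: 2.0 * v)
```

Compression pays off when the selected points are sparse and the per-point work (square roots, exponentials) is expensive; for cheap operations the listing uses the controlled store on full-length vectors instead.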



```
      REF(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);REF(NPTR;384))
      EXTAU(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);EXTAU(NPTR;384))
      EUP(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);EUP(NPTR;384))
      EDN(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);EDN(NPTR;384))
C****
C**** INTERMEDIATE RANGE
```


```
C****
      L2(1;384)=TAUN(NKPTR;384).LE.15.
      L3(1;384)=TAUN(NKPTR;384).GE.1.E-4
      L2(1;384)=L2(1;384).AND.L3(1;384)
      L2(1;384)=L2(1;384).AND.L1(1;384)
      X1(1;384)=-TAUN(NKPTR;384)
      EXTAU(1;384)=VEXP(X1(1;384);EXTAU(1;384))
      TY(1;384)=20.*TAUN(NKPTR;384)
C**** PREVENT TABLE OVERFLOW BY STORING 1'S IN LOOK-UP VECTOR
      ITY(1;384)=1
      RII(1;384)=TY(1;384)+1.0
      RITY(1;384)=Q8VCTRL(RII(1;384),L2(1;384);RITY(1;384))
      X1(1;384)=Q8VGATHR(TE3(1;301),ITY(1;384);X1(1;384))
      ITY(1;384)=ITY(1;384)+1
      X2(1;384)=Q8VGATHR(TE3(1;301),ITY(1;384);X2(1;384))
C     TDF(N)=X2-X1
      X3(1;384)=X2(1;384)-X1(1;384)
      X2(1;384)=TY(1;384)-ITY(1;384)
      X2(1;384)=X2(1;384)+2.
C     TDF(N)=TDF(N)*X2
      X3(1;384)=X3(1;384)*X2(1;384)
C     TDF(N)=TDF(N)+X1
      X3(1;384)=X3(1;384)+X1(1;384)
C     CONTROLLED STORE INTO TDF(N)
```
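For the intermediate range the transmission is read from the 301-entry table `TE3`, tabulated at 20 steps per unit optical thickness, with linear interpolation between neighbouring entries. The sketch below restates the gather-based vector code as a scalar Python function for a single grid point, following the formula `TE3(ITY)+(TY-ITY+1)*(TE3(ITY+1)-TE3(ITY))` that appears as a comment later in the listing. The table contents used here are a hypothetical stand-in (the report's actual `TE3` values are not reproduced), and the index clamp is an added safeguard that is not in the listing.

```python
import math


def transmission_lookup(te3, tau, steps_per_unit=20.0):
    """Linear interpolation in a table with 1-based index ITY=INT(20*TAU+1)."""
    ty = steps_per_unit * tau              # TY=20.*TAUN
    ity = int(ty + 1.0)                    # 1-based index of lower entry
    ity = min(ity, len(te3) - 1)           # keep ITY+1 inside the table
    lower = te3[ity - 1]                   # TE3(ITY)   (0-based list)
    upper = te3[ity]                       # TE3(ITY+1)
    return lower + (ty - ity + 1.0) * (upper - lower)


# Hypothetical stand-in table: exp(-tau) sampled at tau = 0, 1/20, ..., 15.
te3_demo = [math.exp(-k / 20.0) for k in range(301)]
tdf = transmission_lookup(te3_demo, 0.33)
```

In the vector code the same result is built from two `Q8VGATHR` calls (one for `TE3(ITY)`, one after incrementing `ITY`), and the look-up vector is preset to 1 at points outside the range so that the gather never indexes past the table.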



```
C****
C**** CALCULATIONS COMMON TO INTERMEDIATE AND HIGH RANGE
C****
      L2(1;384)=L3(1;384).AND.L1(1;384)
      REF(NPTR;384)=Q8VCTRL(ZERO(1;384),L2(1;384);REF(NPTR;384))
C     DFB=(BTOP(N)-BTOP(N+1))*6.6667E-01
      X1(1;384)=BTOP(NPTR;384)-BTOP(NPTR+384;384)
      X1(1;384)=X1(1;384)*6.6667E-01
C     FGRAD=DFB*((1.0-EXTAU)/X-TDF(N))
      X2(1;384)=ONE(1;384)-EXTAU(1;384)
      X2(1;384)=X2(1;384)/TAUN(NKPTR;384)
      X2(1;384)=X2(1;384)-TDF(NPTR;384)
      X2(1;384)=X2(1;384)*X1(1;384)
C     ANS=1.0-TDF(N)
      X1(1;384)=ONE(1;384)-TDF(NPTR;384)
C     EDN(N)=BTOP(N+1)*ANS+FGRAD
      X3(1;384)=BTOP(NPTR+384;384)*X1(1;384)
      X3(1;384)=X3(1;384)+X2(1;384)
      EDN(NPTR;384)=Q8VCTRL(X3(1;384),L2(1;384);EDN(NPTR;384))
C     EUP(N)=BTOP(N)*ANS-FGRAD
      X3(1;384)=BTOP(NPTR;384)*X1(1;384)
      X3(1;384)=X3(1;384)-X2(1;384)
      EUP(NPTR;384)=Q8VCTRL(X3(1;384),L2(1;384);EUP(NPTR;384))
C****
C**** FORM TOP COMPOSITE LAYER (ADDITION)
C****
C 109 DENO=1.0-RDNCN*REF(N)
  109 X1(1;384)=RDNCN(1;384)*REF(NPTR;384)
      DENO(1;384)=ONE(1;384)-X1(1;384)
C     EUPCN=EUPCN+(EUP(N)+EDNCN*REF(N))*TDFC(N)/DENO
C     EDNCN=EDN(N)+(EDNCN+EUP(N)*RDNCN)*TDF(N)/DENO
      X1(1;384)=EUP(NPTR;384)*RDNCN(1;384)
      EDNCN(1;384)=EDNCN(1;384)+X1(1;384)
      EDNCN(1;384)=EDNCN(1;384)*TDF(NPTR;384)
      EDNCN(1;384)=EDNCN(1;384)/DENO(1;384)
      EDNCN(1;384)=EDNCN(1;384)+EDN(NPTR;384)
C     IF(NCLOUD(N).GT.0) CLDFLG=.TRUE.
      CLDFLG(1;384)=CLDFLG(1;384).OR.L1(1;384)
C     SET AEROSOL FLAG IF CIRRUS CLOUDS (HIGH ALBEDO)
C     IF (CLDFLG.AND.PI0.GE.1.E-4) AERFLG=.TRUE.
      L1(1;384)=L1(1;384).AND.CLDFLG(1;384)
      AERFLG(1;384)=AERFLG(1;384).OR.L1(1;384)
C     TRANSMISSION COMPUTED DIFFERENTLY FOR 3 CASES
C     IF (CLDFLG.OR.AERFLG) GO TO 125
C**** CASE 1. ATMOSPHERE HAS NO AEROSOLS OR CLOUDS THRU HERE
C**** USE EXPONENTIAL INTEGRAL APPROXIMATION
      X3(1;384)=TAU(1;384)+TAUN(NKPTR;384)
      TAU(1;384)=Q8VCTRL(X3(1;384),L1(1;384);TAU(1;384))
C     IF (TAU.GT.15.) GO TO 124 / TDFCN=0.
      L2(1;384)=X3(1;384).GT.15.
      TDFCN(1;384)=Q8VCTRL(ZERO(1;384),L2(1;384);TDFCN(1;384))
      L1(1;384)=L1(1;384).XOR.L2(1;384)
      LOUT=Q8SCNT(L1(1;384))
      TY(1;LOUT)=20.*TY(1;LOUT)
      TY(1;LOUT)=TY(1;LOUT)+1.
      ITY(1;LOUT)=TY(1;LOUT)
C     TDFCN=TE3(ITY)+(TY-ITY+1)*(TE3(ITY+1)-TE3(ITY))
      ITY(1;LOUT)=ITY(1;LOUT)+1
      X1(1;LOUT)=X1(1;LOUT)-X2(1;LOUT)
      X3(1;LOUT)=TY(1;LOUT)-ITY(1;LOUT)
      X3(1;LOUT)=X3(1;LOUT)+2.
      X3(1;LOUT)=X3(1;LOUT)+X2(1;LOUT)
C**** CASE 2. SIGNIFICANT ABSORPTION, (IF AERFLG)
C     RDNCN=REF(N)+TDF(N)*RDNCN*TDF(N)/DENO
      X3(1;384)=RDNCN(1;384)*TDF(NPTR;384)
      X3(1;384)=X3(1;384)*TDF(NPTR;384)
      X3(1;384)=X3(1;384)/DENO(1;384)
      X3(1;384)=X3(1;384)+REF(NPTR;384)
      RDNCN(1;384)=Q8VCTRL(X3(1;384),AERFLG(1;384);RDNCN(1;384))
C     TDFCN=TDFCN*TDF(N)/DENO
      X3(1;384)=TDFCN(1;384)*TDF(NPTR;384)
      X3(1;384)=X3(1;384)/DENO(1;384)
      TDFCN(1;384)=Q8VCTRL(X3(1;384),AERFLG(1;384);TDFCN(1;384))
C 130 IF (NCLOUD(N).EQ.0.OR.PI0.GE.1.E-4) GO TO 140
  130 L1(1;384)=NCLOUD(NPTR;384).GT.0
      L2(1;384)=PI0(1;384).LT.1.E-4
      L3(1;384)=L1(1;384).AND.L2(1;384)
C**** CASE 3. HEAVY CLOUD COVER
      TDFCN(1;384)=Q8VCTRL(ZERO(1;384),L1(1;384);TDFCN(1;384))
      RDNCN(1;384)=Q8VCTRL(ZERO(1;384),L1(1;384);RDNCN(1;384))
      TAU(1;384)=Q8VCTRL(ZERO(1;384),L1(1;384);TAU(1;384))
  140 CONTINUE
C     EUPC(N)=EUPCN
      EDNC(NPTR;384)=EDNCN(1;384)
      TDFC(NPTR;384)=TDFCN(1;384)
      RDNC(NPTR;384)=RDNCN(1;384)
  101 CONTINUE
C**** ADDING GROUND LAYER
      RUPCN(1;384)=AGRND
C     EUPCN=(1.0-RUPCN)*BTOP(NG1)
```

```
      EUPCN(1;384)=ONE(1;384)-RUPCN(1;384)
      EUPCN(1;384)=EUPCN(1;384)*BTOP(NG*384+1;384)
C     DENO=1.0-RUPCN*RDNCN
      X1(1;384)=RUPCN(1;384)*RDNCN(1;384)
      DENO(1;384)=ONE(1;384)-X1(1;384)
C     PEFUP=(EUPCN+EDNCN*RUPCN)/DENO
      X1(1;384)=EDNCN(1;384)*RUPCN(1;384)
      PEFUP(1;384)=EUPCN(1;384)+X1(1;384)
      PEFUP(1;384)=PEFUP(1;384)/DENO(1;384)
C     PEFDN=(EDNCN+EUPCN*RDNCN)/DENO
      X1(1;384)=EUPCN(1;384)*RDNCN(1;384)
      PEFDN(1;384)=EDNCN(1;384)+X1(1;384)
      PEFDN(1;384)=PEFDN(1;384)/DENO(1;384)
C     FLXDNG=FLXDNG+CKLAM*PEFDN
      X1(1;384)=CKLAM*PEFDN(1;384)
      FLXDNG(1;384)=FLXDNG(1;384)+X1(1;384)
C     FE(NG)=FE(NG)+CKLAM*(PEFUP-PEFDN)
      X1(1;384)=PEFUP(1;384)-PEFDN(1;384)
      X1(1;384)=X1(1;384)*CKLAM
      FE((NG-1)*384+1;384)=FE((NG-1)*384+1;384)+X1(1;384)
C**** FORM BOTTOM COMPOSITE LAYER (ADDITION)
C****
      DO 118 N=2,NG
      M=NG-N+1
      MPTR=(M-1)*384+1
      X1(1;384)=RUPCN(1;384)*REF(MPTR;384)
      DENO(1;384)=ONE(1;384)-X1(1;384)
C     EUPCN=EUP(M)+(EUPCN+EDN(M)*RUPCN)*TDF(M)/DENO
      X1(1;384)=EDN(MPTR;384)*RUPCN(1;384)
      X1(1;384)=X1(1;384)+EUPCN(1;384)
      X1(1;384)=X1(1;384)*TDF(MPTR;384)
      X1(1;384)=X1(1;384)/DENO(1;384)
      EUPCN(1;384)=EUP(MPTR;384)+X1(1;384)
      IF (M.EQ.1) GO TO 115
      L=M-1
      LPTR=MPTR-384
C     RUPCN=REF(M)+TDF(M)*TDF(M)*RUPCN/DENO
```
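The ground-layer step combines the composite atmosphere above (downward emission `EDNCN`, downward-facing reflectivity `RDNCN`) with the surface (upward emission `EUPCN`, reflectivity `RUPCN`) and sums the multiple reflections between them through the factor `1/(1-RUPCN*RDNCN)`. The Python sketch below works one grid point through that step. The plus signs are assumed: the operators in the scanned listing are partly illegible, and the form used here is the one for which the two fluxes satisfy the reflection balance `PEFUP=EUPCN+RUPCN*PEFDN` and `PEFDN=EDNCN+RDNCN*PEFUP`. Function names and the numerical values are hypothetical.

```python
def interface_fluxes(eupcn, edncn, rupcn, rdncn):
    """Upward and downward fluxes at an interface with multiple reflections."""
    deno = 1.0 - rupcn * rdncn                 # DENO=1.0-RUPCN*RDNCN
    pefup = (eupcn + edncn * rupcn) / deno     # PEFUP
    pefdn = (edncn + eupcn * rdncn) / deno     # PEFDN (sign assumed)
    return pefup, pefdn


def net_flux_increment(cklam, pefup, pefdn):
    """Band contribution added to FE: CKLAM*(PEFUP-PEFDN)."""
    return cklam * (pefup - pefdn)


# Illustrative numbers only.
pefup, pefdn = interface_fluxes(eupcn=300.0, edncn=120.0,
                                rupcn=0.1, rdncn=0.25)
dfe = net_flux_increment(0.5, pefup, pefdn)
```

With these inputs the balance can be checked by hand: `300 + 0.1*200 = 320` and `120 + 0.25*320 = 200`.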



```
      X1(1;384)=X1(1;384)*RUPCN(1;384)
      X1(1;384)=X1(1;384)/DENO(1;384)
      RUPCN(1;384)=REF(MPTR;384)+X1(1;384)
C     DENO=1.0-RDNC(L)*RUPCN
      X1(1;384)=RDNC(LPTR;384)*RUPCN(1;384)
      DENO(1;384)=ONE(1;384)-X1(1;384)
C     PEFUP=(EUPCN+EDNC(L)*RUPCN)/DENO
      X1(1;384)=EDNC(LPTR;384)*RUPCN(1;384)
      PEFUP(1;384)=EUPCN(1;384)+X1(1;384)
      PEFUP(1;384)=PEFUP(1;384)/DENO(1;384)
      X1(1;384)=EUPCN(1;384)*RDNC(LPTR;384)
      PEFDN(1;384)=EDNC(LPTR;384)+X1(1;384)
      PEFDN(1;384)=PEFDN(1;384)/DENO(1;384)
      GO TO 120
  115 PEFUP(1;384)=EUPCN(1;384)
      PEFDN(1;384)=0.
C****
C 120 FE(M)=FE(M)+CKLAM*(PEFUP-PEFDN)
  120 X1(1;384)=PEFUP(1;384)-PEFDN(1;384)
      X1(1;384)=CKLAM*X1(1;384)
      FE(MPTR;384)=FE(MPTR;384)+X1(1;384)
C**** CONTINUE
  100 CONTINUE
  200 CONTINUE
C****
C     SAVE STRATOSPHERIC FLUXES
      RESTR(1;1152)=FE(1;1152)
      FE(1;3840)=FE(1153;3840)
      RETURN
      END
```


```
                                    APPENDIX C
                                    SINGLE, SIMPLE LOOPS OF SPECTRAL MODEL
      SUBROUTINE XIGENR
      DOUBLE PRECISION VDERIV,XDERIV,W,DXO3DT
      COMMON /CONSTS/ INDEX,NR,LR,INS,INS2,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT,YRLAG,TIME
      COMMON /QJELK/ NZJ,LIO3
      COMMON /DERIV/ VDERIV(2366),XDERIV(2366),W(2366)
      COMMON /GENER/ DXO3DT(2366)
      IL=(LIO3-1)*NVREAL+1
      IH=NZJ*NVREAL
      DO 200 I=IL,IH
  200 XDERIV(I)=XDERIV(I)+DXO3DT(I)
      RETURN
      END
      SUBROUTINE OXT003(I1,I2)
      DOUBLE PRECISION P,Z,Z1,T,Z2,X3
      COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366),X3(2366)
      COMMON/O3OX/ O3XFAC(2400),O3XCON(2400)
      COMMON/SPECIE/X3GRD(6240)
      COMMON/FTCST/NLON,NLAT,NGRID
      COMMON/CONSTS/L(13),NVREAL
      DIMENSION DATAIM(2400)
      NLEV=I2-I1+1
      ILSPC=(I1-1)*NVREAL+1
      ILGRD=(I1-1)*NGRID+1
      CALL SPCGD1(X3(ILSPC),X3GRD(ILGRD),DATAIM,NLEV)
      N=I2*NGRID
      DO 100 J=ILGRD,N
  100 X3GRD(J)=(X3GRD(J)-O3XCON(J))/O3XFAC(J)
      CALL GDSP1(X3(ILSPC),X3GRD(ILGRD),DATAIM,NLEV)
      RETURN
      END
```
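The single loops above sweep a contiguous slab of a flat, level-major array, with bounds of the form `IL=(L1-1)*NVREAL+1` and `IH=L2*NVREAL`. A minimal Python sketch of that index arithmetic follows. The function names are hypothetical, and the values 91 and 26 are illustrative only: they were chosen because 26 × 91 = 2366 matches the declared array size, not because the listing gives the run-time values of `NVREAL` or `NVERT`.

```python
def level_slab(first_level, last_level, nvreal):
    """Return the 1-based inclusive bounds (il, ih) of the flat-array slab
    that holds levels first_level..last_level, nvreal values per level."""
    il = (first_level - 1) * nvreal + 1
    ih = last_level * nvreal
    return il, ih


def flat_index(j, level, nvreal):
    """1-based flat index of coefficient j (1..nvreal) at the given level."""
    return (level - 1) * nvreal + j


# Illustrative sizes only (assumed): 26 levels of 91 values fill 2366 slots.
il, ih = level_slab(1, 26, 91)
print(il, ih)
```

With this layout, a loop such as `DO 200 I=IL,IH` touches every coefficient on every level in the range without needing a second subscript.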

PARTIALLY RECODED SUBROUTINES OF SPECTRAL MODEL

```
      SUBROUTINE CORFOR
      DOUBLE PRECISION P,Z,Z1,T,Z2,CF,XL
      COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /CGBLK/ KD(43),CG(43),NCOMP(12),LWAVE(12),NV(43),LV(43)
      COMMON /DERIV/ CF(2366)
      DIMENSION DUM(43)
      DATA NW /0/
      NZ1=NVZON+1
      DO 300 JJ=NZ1,NVECT
      J=2*JJ+NVREAL*(ILEV1-2)-NZ1
      XL=LV(JJ)
      DO 200 I=ILEV1,ILEV2
      J=J+NVREAL
      J1=J+1
      CF(J)=XL*P(J1)
      CF(J1)=-XL*P(J)
  200 CONTINUE
  300 CONTINUE
      IF (NW.EQ.0) RETURN
    !LEV=(ILEVでILEV!)/て*1
      JMP=NVREAL*(ILEV-1)
      WRITE (6,1000) ILEV
 1000 FORMAT (1H0,10X,'TEST CORFOR ENERGY CONSERVATION FOR LEVEL ',I3/)
      DO 510 J=1,NVZON
      JL=J+JMP
  510 DUM(J)=-.5*P(JL)*CF(JL)
      DO 530 J=NZ1,NVECT
      JR=2*J-NZ1+JMP
      JI=JR+1
  530 DUM(J)=-P(JR)*CF(JR)-P(JI)*CF(JI)
      SUM=0.
      DO 540 J=1,NVECT
  540 SUM=SUM+DUM(J)
      WRITE (6,1010)
 1010 FORMAT (1H0,10X,'PSI(I),CF(I),DKE(I) '/)
      DO 550 I=1,NVZON
      J=I+JMP
      WRITE (6,1015) I,P(J),CF(J),DUM(I)
 1015 FORMAT (5X,I5,E15.6,15X,E15.6,15X,E15.6)
  550 CONTINUE
      K=NVZON
      DO 600 I=NZ1,NVREAL,2
      II=I+JMP
      J=II+1
      K=K+1
      WRITE (6,1020) I,P(II),P(J),CF(II),CF(J),DUM(K)
 1020 FORMAT (5X,I5,5E15.6)
  600 CONTINUE
      WRITE (6,1030) SUM
 1030 FORMAT (1H0,10X,'TOTAL DKE = ',E15.6)
      RETURN
      END
```


          DO 300 JJ=NZ1,NVECT
          J=J+2
          XL=LV(JJ)
          DO 200 I=ILEV1,ILEV2
          CF(J,I)=XL*P(J+1,I)
          CF(J+1,I)=-XL*P(J,I)
      200 CONTINUE
      300 CONTINUE
          RETURN
          ENTRY CORFORO
          NZ1=NVZON+1
          RETURN
          END

```
      SUBROUTINE FRICTN
      DOUBLE PRECISION P,Z,FJ,CF
      COMMON P(2366),Z(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /CGBLK/ KD(43),CG(43),NCOMP(12),LWAVE(12),NV(43),LV(43)
      COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26),DZ,RV
      COMMON /DERIV/ CF(2366)
      COMMON /BARBLK/ TBAR(26),SIGMA(26),XIBAR(26),DIFFM(26),DIFFX(26)
      DIMENSION FJ(26)
      R2=RV-1.
      R2=1./R2
      R1=RV*R2
      IL1P1=ILEV1+1
      IL2P1=ILEV2+1
      FJ(1)=0.D0
      DO 300 J=1,NVREAL
      JJ=(ILEV1-1)*NVREAL+J
      J1=JJ
      DO 100 I=IL1P1,ILEV2
      J2=J1
      J1=J1+NVREAL
  100 FJ(I)=DIFFM(I)*(Z(J1)-Z(J2))
```



```
      JJ=JJ-NVREAL
      DO 200 I=ILEV1,ILEV2
      JJ=JJ+NVREAL
      CF(JJ)=CF(JJ)+R1*FJ(I+1)-R2*FJ(I)
  200 CONTINUE
  300 CONTINUE
      RETURN
      END
```

```
      SUBROUTINE VFRICTN
      .
      FJ(1)=0.
      DO 300 J=1,NVREAL
      DO 100 I=IL1P1,ILEV2
      FJ(I)=DIFFM(I)*(Z(J,I)-Z(J,I-1))
  100 CONTINUE
      FJ(IL2P1)=-DIFFM(IL2P1)*Z(J,ILEV2)
      DO 200 I=ILEV1,ILEV2
      CF(J,I)=CF(J,I)+R1*FJ(I+1)-R2*FJ(I)
  200 CONTINUE
  300 CONTINUE
      RETURN
      ENTRY FRICTNO
      R2=RV-1.
      R2=1./R2
      R1=RV*R2
      IL1P1=ILEV1+1
      IL2P1=ILEV2+1
      RETURN
      END
```
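The recoding from `FRICTN` to `VFRICTN` replaces running offsets into a flat array (`J1=J1+NVREAL`) with an explicit level subscript `(J,I)`. The Python sketch below, offered only as an illustration under stated assumptions, writes the same arithmetic both ways so the two forms can be compared. The function names, the zero-initialised `fj` work list, and the small test sizes are assumptions made for the sketch; the bottom-boundary term follows the `FJ(IL2P1)` line of the vector listing.

```python
def friction_flat(z, cf, diffm, rv, ilev1, ilev2, nvreal):
    """Running-offset form: z and cf are flat, level-major lists (0-based)."""
    r2 = 1.0 / (rv - 1.0)
    r1 = rv * r2
    for j in range(nvreal):
        fj = [0.0] * (ilev2 + 2)            # fj[i] for 1-based level i
        j1 = (ilev1 - 1) * nvreal + j       # offset of level ilev1
        for i in range(ilev1 + 1, ilev2 + 1):
            j2 = j1                         # level i-1
            j1 = j1 + nvreal                # level i
            fj[i] = diffm[i] * (z[j1] - z[j2])
        fj[ilev2 + 1] = -diffm[ilev2 + 1] * z[j1]   # j1 now points at ilev2
        jj = (ilev1 - 2) * nvreal + j
        for i in range(ilev1, ilev2 + 1):
            jj += nvreal
            cf[jj] += r1 * fj[i + 1] - r2 * fj[i]


def friction_2d(z, cf, diffm, rv, ilev1, ilev2):
    """Two-subscript form: z[j][i] and cf[j][i], level index i is 1-based."""
    r2 = 1.0 / (rv - 1.0)
    r1 = rv * r2
    for j in range(len(z)):
        fj = [0.0] * (ilev2 + 2)
        for i in range(ilev1 + 1, ilev2 + 1):
            fj[i] = diffm[i] * (z[j][i] - z[j][i - 1])
        fj[ilev2 + 1] = -diffm[ilev2 + 1] * z[j][ilev2]
        for i in range(ilev1, ilev2 + 1):
            cf[j][i] += r1 * fj[i + 1] - r2 * fj[i]
```

In the two-subscript form the inner loop over `I` carries no address bookkeeping, and the constants `R1`, `R2` and the loop limits can be set once in a separate entry point, as `ENTRY FRICTNO` does.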

```
        SURROUTINE MJAB (P,T,DER)
        DOUGLEE PRECISION P,T,A,COFIR,FII,CIND,FR,FI,FIA,FIB,FIG,FID
        DOUBLE PRECISION DER
        COMMON /COFBLK/ C(3800),IS(1500)
        COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEVI,ILEVZQNVERT.
    &' NRPP,LRTP,NTYPE,NVECT,NVREAL,NVZONONEYCODT
        COMMON/PKBLK/NS,N1,N2.N3.N4,LSODIOL゙ZOG3OL$
        COMMON /WORKBK/ A(2366)
        DIMENSION P(1),T'(1):DER(1)
        DIMENSION FR(26),FI(26)
    GALL DZERO. (A)
    INDEX=0
    KlON#(LLEVI-1)*KINT
    INSZ1=INSZ*!
    00400 K=1,INSZ
    LS*IS(K)
    CALL UPACK
    NAI=N1+N1-NR=1
    NAR=NAI-1
    NBI=N2+N2-NR-1
    NER:N8!mb
    KIBKLOW
    DO 5 JxgLEVI,ILEVZ
    NAR`NAR&K!
    NAI=NAI&KL
    NBRzN&R&K!
    NBI=NBIकK!
    FR(J) & P(NAR)*T(NBI)-P(NAI) #P(NBR) &P(NBR)&T(NAI) -P(NBI) #T(N゙AR)
    Kl=K!NT
    S CONPINUE
    OO 100 ImN3,N4,Z
    INDEX=INDEX*1
    G%NDaC(INDEX)
    Kl=KbOW
    00 75 J&ILEVI,ILEVZ
    A(I+KI)=A(I+KI)+FR(J)*CIND
    KI BKI+KINT
    75 CONIINUE
100 CONTINUE
400 CONTINUE
    DO 500 KaINSZI.INS
    LS=1S(K)
    GALL UPACK
    KI=KLOW
    IF (NI-LEONR+1) GO TO 40
    IF (N2.LE.NR+1) GO TO 50
    NAI&NI*NIONR*I
    NAR=NAI-1
    NBI=N2+N2-NR-1
    NBRmNBI-1
    00 35 J=ILEV1.0ILEV2
    NARENAR+KI
    NAI=NAI.*KI
    NBRxNBR+K!
    NBI=NBI*KI
    F.fA=P(NAR)*T(NBI)=P(NBI), FT(NAR)
    F18=P(NAL)*T(NBR)=P(NBR)*T(NAI)
    F1C=P(NAR)*T'(NBR) -P(NBR) aT (NAR).
    F1D=P(NAI)#T(NBI) P(NBI)*T(NAI)
```

    ```
    IF (NS-1) 10,20,30
    10 F1R=-FIA-F1B
    FII=FICmFlD
    GO TO 33
    20 FIR=-F1A*F1B
    FII=FIC+F1D
    GO TO 33
    30 FlR=FjAOFIB
    FlI=F1C*F1D
    GO TO 33
    33 CONTINUE
    FR(J)xFIR
    FI(J)=FII
    Kl=KINT
    35 CONTINUE
    gO TO 160
    40 NBI=N2+N2-NR-1
    NBR=NBI-1
    NAI=NI
    NAR=N1
    GO TO 60
    50 NAR=N1+N1-NR-1
    NAI=NAR-1
    NBR=N2
    NBI*N2
    GO TO 60
.60 CONTINUE
    DO 150 J=ILEVI.gILEV2
    NAR=NAR+Kl
    NAI=NAI +KI
    NBR=NBR+KI
    NBI=NBI*KI
    F\R=mP(NAR)&T(NBI)&P(NBI)&T(NAR)
    FII*P(NAI)*T(NBR)mP(NBR)*T(NAI)
    GO TO 80
    80 CONTINUE
    FR(J)=FIR
    FI(J) mFII
    KI=KINT
250 CONTINUE
160 CONTINUE
    NGI#N3*N3-NR-1
    DO 200 I=N3.N4,2
    NGR=NGI-1
    INDEX=INDEX+1
    CIND=C(INDEX)
    KI=KLOW
    DO 175 JEILEVIOILEVZ
    A(NGR+K1)=A(NGR*K1)*FR(j)*CINO
    A(NGI*KI)=A(NGI*KI)*FI(J)*CIND
    KI=Kl'&KINT
175 CONTINUE
    NGI FNGI*4
        CONTINUE
500 CONTINUE
    IL=(IL.EVI=1)*NVREAL+1
    IH=ILEVZ#NVREAL
    DO 600 FEIL.IH
600 DER(I)=DER(I) & A(I)
    RETURN
    END
```

    ```
    SUBROUTINE VMJAB(P,T,DER)
        O
        \circ
        CAblo OZERO(A)
        INDEX=0
        `DO400 Kml.INSZ
        LSEIS(K)
        CALL UPACK
        0O 5 j=ILEVI&ILEVZ
        FR(J) mP(NAR,J)*T(NBI,J)=P(NAI!J)#T(NBR,J)
        1 *P(NBR;J)*T(NAI;J) =P (NBI!J)*T(NAR,J)
        5 CONTINUE
        DO 100 I#N3.N4,Z
        INDEX=INDEX+1
        DO }95\mathrm{ JmELEVI,ILEVZ
        A(!&J) &A(I:J) &FR(J)*C(INDEX)
        CONTINUE
        CONTINUE
        CONPINUE
        DO 500 KwINSZ1.INS
        LS=IS(K)
        CALL UPACK
        DO 35 Jm&bEVIgILEVZ
        FlA(J)=P(NAR,J)*T(NBI:J)=P(NRg%J)#P(NAR.J)
        F18(J)=P(NAI,J)*T(NBR,J) OP(NRR,J)&T}(NAI,J
        FlC(J)=P(NAR,J)#T(NBR,J)*P(NBR,J)*T(NAR,J)
        FlO(J) =P(NAI;J)*P(NBI:J)mP(NEI;J)#T(NAI;J)
        dO CONTINUE
        FR(j)=-FIA(J)-F1B(J)
        FI(J)=F1C(U)-F10(J)
        CONTINUE
        NGI=NGIO
        OO 200 I=N3,N4:2
        NGR&NGI=1
        INOEX=INOEX+1
        DO 175 J=ILEVI., ILEVZ
        A(NGR,J) #A(NGR,J)&FD(J)*C(INDEX)
        A(NGI,J) =A (NGI,J)&FI:(J)*C(INDEX)
        175
        CONTINUE
        NGIRNGI%4
        CONTINUE
        CONTINUE
        00 600 IxloNVREAL
        DO 600 J※{LEVI,ILEV2
        DER(I,J)=DER(I,J)+A(I,J)
        CONTINUE
        RETURN
    ENTRY MJABO
    INSZ1=INSZ*I
    NAI=N1*NI-NR-1
    NAR=NAI-1
    NBI=N2*N2-NR-1
    NBRzNBI-1
    NGIO=N3+N3-NR-1
    RETURN
    END
```

```
    SUBROUTINE RGAMMA (N)
    DOUSLE PRECISION P,LETA,ZI,T,ZZ,DG,EG,DGCG,EGCG
    DOUBLE PRECISION EVECT,XI.IVET.EVAL,VDERIV,TDERIV,R,AG
    DOUBLE PRECISION A
    COMMON P(2366),ZETA(2366),Z1(2366),T(2366).,Z2(2366)
    COMMON /CONSTS/. INDEX,NR,LRR,INS,INSZ,KINT,ILEVI,ILEVZ,NVERT.
    I NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC.DT
    COMMON /CGBLK/ KD(43),CG(43).NCOMP(12),LWAVE(12),NV(43),LV(43)
    COMMON /DERLK/ DG(43):EG(43).DGCG(43).EEGCG(43).
    COMMON NVRTBLKY ZVAL(26),PVAL(26),VWT(26).DZ,RV
    COMMON /LERIV/ VOERIV{2366),TOERIV(2366);W(?366)
    COMMON /WORKBK/ R(2366),AG(2366).
    . DATA NW /O/
        A==(N-1.DO)/(NCYC*DT)
        RVM1mRV-1.
        IH=ILEV2*NVREAL
    IL=(ILEVI-1)*NVREAL+1
    IF (N.GT.ll) GO TO 75
    DO 50 I=IL,IH
    JEI=NVREAL
    AG(I) mVDERTV(J)-VDERIV (I)
    R(I) =-TDERIV(I)*DZ
    5 0
    7 5
    CONTINUE
    CONTINUE
    DO 100 IEIL.IH
    J=I=NVREAL
    AG(I)=VDERIV(J)-VDERIV(I)*A&(ZI(J)=ZI(I))
    R(I)=-(TDERIV(I)*A*ZZ(I))*DZ
    CONTINUE
    CONTINUE
    NVZ1=NVZON+1
    DO 200 JJ=2,NVZON
    JSVaJJ+NVREAL*(ILEVI-2)
    IF (JJ.EQ.2) GO TO }15
    F=-DGCG(JJ)/CG(JJ)
    JmJSV
    DO 125 I=%LEVIनILEVZ
    J=J*NVREAL
    Jl=j=1
    R(J)#R(J) +F*AG(JI)
    CONTINUE
125 CONTINUE
    IF (JJ-EQ DNVZON) GO TO 200
    J=JSV
    F=EGCG(JJ)/CG(JJ)
    DO 175 I*ILEVIOILEV2
    J=J&NVREAL
    Jl=J+1
    R(J) =R(J) &F*AG(JI)
175 CONTINUE
200 CONTINUE
```


```
    JJFNVZON
    00 500 LmI:LR
    NCFNCOMP (b*1)
    IF (NC.LE.O) GO TO 500
    DO 450 NN=1:NC
    J\&\J*1
    USV=2*UJ*NVREAL*(HLEV1-2) -NVZON*!
    IF (NN.EQ.1) GO TO 300
    FEODGCG(JJ)/CG(JJ)
    JREJSV
    00 250 I*ILEVI,ILEVZ
    JR:JR*NVREAL
    J! =JR&1
    JIRxJR=?
    \!!x\lR!1
    R(JR) #R(JR) *F*AG(J\R)
    R(JI) mR(JI)*F*AG(JlI)
550 CONTINUE
300 CONTINUE
    IF (NN-EQ.NC) GO TO 500
    JREJSV
    F=EGCG(JJ)/CG(JJ)
    00 400 I=ILEEVIGILEV2
    JRxJRONVREAL
    JI=JR+l
    JIREJR*2
    NlIENIR+l
    R(JR)=R(JR) +F*AG(J\R)
    R(JI)=R(UI) +F*AG(JII)
400 CONTINUE
450 CONTINUE
500 CONTINUE
    0O600 I=Ib,IH
    R(I)&RVMI*R(I)
600 CONTINUE
    IF (NW.EG.O) RETURN
    WRITE (60.1000) ILEVI,ILEV2
```



```
    JI=NVREAL*(ILEVI-1)
    J3=NVREAL*(ILEV2-1)
    DO 700 I=1,NVZON
    J1=\1 +1
    J3&J3+1
    WRITE (6.1.010) I. K(JI):R(J3)
1010 FORMAT (1X.I10.020.10.20X.020.10)
    700 CONTINUE
    NZ1mNVZON+I
    DO 800 I =NZ1,NVREAL,2
    \:=\1*1
    J2=\1*1
    J3=\3&1
    J4=\33+1
    WRITE (6,1020) I,R(J1):R(J2),R(J3),R(J4)
1020 FORMAT (1X,I10,4020.10)
    J1=|1+1
    J3=\.3+1
    800 CONTINUE
    RETURN
    END
```

```
      SUBROUTINE VRGAMMA(N)
      .
      .
      .
      IF (N.GT.1) GO TO 75
      DO 50 J=1,NVREAL
      DO 50 I=ILEV1,ILEV2
      AG(J,I)=VDERIV(J,I-1)-VDERIV(J,I)
      R(J,I)=-TDERIV(J,I)*DZ
   50 CONTINUE
      GO TO 110
   75 CONTINUE
      DO 100 J=1,NVREAL
      DO 100 I=ILEV1,ILEV2
      AG(J,I)=VDERIV(J,I-1)-VDERIV(J,I)+A(N)*(Z1(J,I-1)-Z1(J,I))
      R(J,I)=-(TDERIV(J,I)+A(N)*Z2(J,I))*DZ
  100 CONTINUE
  110 CONTINUE
      DO 200 JJ=2,NVZON
      DO 165 I=ILEV1,ILEV2
      R(JJ,I)=R(JJ,I)+F(JJ)*AG(JJ-1,I)
  165 CONTINUE
  200 CONTINUE
      DO 500 JJ=NZ1,NVECT
      J=2*JJ-NZ1
      DO 350 I=ILEV1,ILEV2
      R(J,I)=R(J,I)+F(JJ)*AG(J-2,I)
      R(J+1,I)=R(J+1,I)+F(JJ)*AG(J-1,I)
  350 CONTINUE
  500 CONTINUE
      DO 600 I=IL,IH
      R(I)=R(I)*RVM1
  600 CONTINUE
      RETURN
      ENTRY RGAMMAO
      NZ1=NVZON+1
      F(2)=EGCG(2)/CG(2)
      DO 1 JJ=3,NVZON-1
      F(JJ)=(EGCG(JJ)-DGCG(JJ))/CG(JJ)
    1 CONTINUE
      F(NVZON)=-DGCG(NVZON)/CG(NVZON)
      JJ=NVZON
      DO 3 L=1,LR
      JJ=JJ+1
      F(JJ)=EGCG(JJ)/CG(JJ)
      DO 2 NN=2,NC-1
      JJ=JJ+1
      F(JJ)=(EGCG(JJ)-DGCG(JJ))/CG(JJ)
    2 CONTINUE
      JJ=JJ+1
      F(JJ)=-DGCG(JJ)/CG(JJ)
    3 CONTINUE
      DO 4 I=1,NCYC
      A(I)=(1.-I)/(NCYC*DT)
    4 CONTINUE
      RVM1=RV-1.
      IH=ILEV2*NVREAL
      IL=(ILEV1-1)*NVREAL+1
      NC=NCOMP(2)
      RETURN
      END
```

```
        SUBROUTINE WFIELD
        DOUBLE PRECISION DG,EG,DGCG,EGCG,W,WTERM,EI,E2,F2,F2,F3,F4.
        DOUBLE PRECISION VDERIV,TDERIV
        COMMON /CONSTS/ INDEX,NR,LR,INS,INSZOKINT&ILEVI,ILEVZ,NVERT,
        1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYCODT
        COMMON /CGELK/ KD(43),CG(43),NCOMP(12) LHAVE(12), NV (43).0LV(43)
        COMMON /DEBLK/ DG(43).,EG(43),DGCG(43);EGCG(43)
        COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26),OZ,RV
        COMMON /OERIV/ VOERIV(2366),TDERIV(2366),W(2366)
        COMMON /WORKBK/ WTERM(2366).
        CALL DZERO (WTERM)
        E2=1.00/(RV-1.00)
        El%RV*ER
        NYZ1=NVZON+1
        DO 200 JJ=2,NVZON
        JSV=JJ+NVREAL*(ILEVI-Z)
        IF (JJ.EQ.2) GO T0 100
        F1mEZ*OGCG(JJ)
    F2#E1*DGCG(JJ)
    J=JSV
    00 50 I=ILEVI,ILEVZ
    J:J&NVREAL
    J1\approx\=1
    J2*J1&NVREAL
    HTERM(J)=FI*W(Jl)-F2*W(Jこ)
    CONTINUE
    CONTINUE
    IF (JJ.EQ.NVZON) GO TO 200
    J=JSV
    F1=E2*EGCG(JJ)
    F2=E1*EGCG(JJ)
    DO 150 IFILEVI|ILEV2
    J=J&NVREAL
    Jl=J&l
    J2*Jl&NVREAL
    WTERM (J) =WTERM(J) = F-l*W(Jl)*F?*W(JZ)
    CONTINUE
    CONTINUE
    JJmNVZON
    DO 500 L=1.LR
    N=ŃCOMP(L+1)
    IF (N.LE.O) GO TO 500
    DO 450 NNEION
    JJ=\J+1
    JSV=2*JJ*NVREAL*(ILEV1-2) -NVZONO1
    IF (NN.EQ.1) GO TO 300
    F1=E2*DGCG(Jv)
    F2=El*DGCG(JJ)
    JR#JSV
    DO. 250 I=ILEV1,ILEVZ
    JR=JR&NVREAL
    JI=JR+b
    J1REJR\infty}
    JII=JJR+I
    J2R*JIR*NVREAL
    J2ImJZR&1
    WTERM(JR) =F 1*W(J\R) =FZWW(JRR)
    WTERM(UI) #FI*W(U1I)*F2*W(J2I.)
    CONTINUE
250 CONTINUE
```


```
    IF (NN.EQ.N) GO TO 500
    JR=JSV
    Fl=Eट\EGCG(JJ)
    F2=ElAEGCG(JJ)
    DO 400 I*ILEVI.ILEV2
    JR=JR कNVREAL
    JI=JR+1
    JlR=JR+Z
    JIImJlR+1
    J2R=JIR+NVREAL
J2I=J2R+1
WTERM(JR)xWTERM(JR) - FI*W(JlR) +FR*W(JCR)
WTERM(JI)=WTERM(JI)-FI*W(JiI) +F2*W(JII)
400 CONTINUE
450 CONTINUE
500 CONTINUE
    ILF(ILEVI-1)*NVREAL*1
    IH=ILEV2*NVREAL
    DO 600 I=IL+IH
600 VDERIV(I) =VDERIV(I)कWTERM(I)
    RETURN
    END
```


```
    SUBROUTINE VWFIELD
            \bullet
        *
        0
        00'200 JJ=2,NVZON
        DO }165\mathrm{ ImILEVI,ILEV2
```



```
        CONTINUE
        CONTINUE
        OO 500 JJ&NVZ1,NVECT
        Jæ2#JJ=NVZ1
        DO 360 I=ILEVI,ILEV2
```



```
      WTERM(J+1,I)=F1(JJ)*W(J-1,I)-F2(JJ)*W(J-1,I+1)
  360 CONTINUE
  500 CONTINUE
      DO 600 I=IL,IH
      VDERIV(I)=VDERIV(I)+WTERM(I)
  600 CONTINUE
      RETURN
      ENTRY WFIELDO
      NVZ1=NVZON+1
      E2=1./(RV-1.)
      E1=RV*E2
      F1(2)=-E2*EGCG(2)
      F2(2)=-E1*EGCG(2)
      DO 1 J=3,NVZON-1
      F1(J)=E2*(DGCG(J)-EGCG(J))
      F2(J)=E1*(DGCG(J)-EGCG(J))
    1 CONTINUE
      F1(NVZON)=E2*DGCG(NVZON)
      F2(NVZON)=E1*DGCG(NVZON)
      JJ=NVZON
      DO 3 L=1,LR
      JJ=JJ+1
      F1(JJ)=-E2*EGCG(JJ)
      F2(JJ)=-E1*EGCG(JJ)
      DO 2 NN=2,N-1
      JJ=JJ+1
      F1(JJ)=E2*(DGCG(JJ)-EGCG(JJ))
      F2(JJ)=E1*(DGCG(JJ)-EGCG(JJ))
    2 CONTINUE
      JJ=JJ+1
      F1(JJ)=E2*DGCG(JJ)
      F2(JJ)=E1*DGCG(JJ)
    3 CONTINUE
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      RETURN
      END
```
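The entry point `WFIELDO` moves the level-independent factors out of the main loops by tabulating `F1` and `F2` once per spectral index: the first index of a block carries only the `EGCG` term, the last only the `DGCG` term, and interior indices carry the difference. The Python sketch below illustrates that table construction for the zonal block only. It is a sketch under assumptions: the helper names are hypothetical, lists are 1-based with index 0 unused, and the nonzonal blocks (which the listing fills with the same pattern per wavenumber) are omitted.

```python
def fill_block(f1, f2, dgcg, egcg, first, last, e1, e2):
    """Fill f1/f2 for spectral indices first..last (1-based, inclusive).

    As in the listing, the first entry is written before the last one,
    so a one-element block ends up holding the DGCG form.
    """
    f1[first] = -e2 * egcg[first]
    f2[first] = -e1 * egcg[first]
    for j in range(first + 1, last):
        d = dgcg[j] - egcg[j]
        f1[j] = e2 * d
        f2[j] = e1 * d
    f1[last] = e2 * dgcg[last]
    f2[last] = e1 * dgcg[last]


def wfield_tables(dgcg, egcg, rv, nvzon):
    """Build F1 and F2 for the zonal block 2..nvzon."""
    e2 = 1.0 / (rv - 1.0)
    e1 = rv * e2
    f1 = [0.0] * (nvzon + 1)
    f2 = [0.0] * (nvzon + 1)
    fill_block(f1, f2, dgcg, egcg, 2, nvzon, e1, e2)
    return f1, f2
```

Because the tables depend only on `RV`, `DGCG`, and `EGCG`, they need to be rebuilt only when those inputs change, not on every call to the field routine.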


```
      SUBROUTINE STABLE
      DOUBLE PRECISION W,STABW,F,VDERIV,TDERIV
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26),DZ,RV
      COMMON /BARBLK/ TBAR(26),SIGMA(26),XIBAR(26)
      COMMON /DERIV/ VDERIV(2366),TDERIV(2366),W(2366)
      COMMON /WORKBK/ STABW(2366)
      DO 200 I=ILEV1,ILEV2
      JJ=(I-1)*NVREAL
      F=-SIGMA(I)
      DO 100 J=1,NVREAL
      K=JJ+J
      STABW(K)=F*W(K)
  100 CONTINUE
  200 CONTINUE
      IL=(ILEV1-1)*NVREAL+1
      IH=ILEV2*NVREAL
      DO 300 I=IL,IH
  300 TDERIV(I)=TDERIV(I)+STABW(I)
      RETURN
      END
```

          SUBROUTINE VSTABLE
          .
          .
          DO 300 I=ILEV1,ILEV2
          F=-SIGMA(I)
          DO 300 J=1,NVREAL
          STABW(J,I)=F*W(J,I)
          TDERIV(J,I)=TDERIV(J,I)+STABW(J,I)
      300 CONTINUE
          RETURN
          END
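`STABLE` makes two passes over the data (one to form `STABW`, one to add it to `TDERIV`), whereas `VSTABLE` fuses them into a single loop nest. The Python sketch below is an illustration of that loop fusion, not the report's code: the function names are hypothetical, the arrays are flat 0-based lists, and `sigma` is a 1-based list with index 0 unused.

```python
def stable_two_pass(w, tderiv, sigma, ilev1, ilev2, nvreal):
    """Form STABW for every level first, then add it to TDERIV in a second sweep."""
    stabw = [0.0] * len(w)
    for i in range(ilev1, ilev2 + 1):
        f = -sigma[i]
        base = (i - 1) * nvreal
        for j in range(nvreal):
            stabw[base + j] = f * w[base + j]
    for k in range((ilev1 - 1) * nvreal, ilev2 * nvreal):
        tderiv[k] += stabw[k]
    return stabw


def stable_fused(w, tderiv, sigma, ilev1, ilev2, nvreal):
    """Same result, with the accumulation folded into the first loop nest."""
    stabw = [0.0] * len(w)
    for i in range(ilev1, ilev2 + 1):
        f = -sigma[i]
        base = (i - 1) * nvreal
        for j in range(nvreal):
            k = base + j
            stabw[k] = f * w[k]
            tderiv[k] += stabw[k]
    return stabw
```

The fused form reads `W` and writes `TDERIV` in one sweep, which removes a full pass over the work array.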

```
      SUBROUTINE DIFFXI
      DOUBLE PRECISION P,Z,Z1,T,Z2,X3,Z3
      DOUBLE PRECISION VDERIV,XDERIV,W,GJ
      DOUBLE PRECISION DUM
      COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366),X3(2366),Z3(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /BARBLK/ TBAR(26),SIGMA(26),XIBAR(26),DIFFM(26),DIFFX(26)
      COMMON /DERIV/ VDERIV(2366),XDERIV(2366),W(2366)
      COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26),DZ,RV
      COMMON /QJBLK/ NZJ,LIO3
      DIMENSION GJ(26)
      R2=RV-1.
      R2=1./R2
      R1=RV*R2
      LIO3M1=LIO3-1
      IF (LIO3M1.LE.0) LIO3M1=1
      DO 100 I=1,LIO3
  100 GJ(I)=0.D0
      DO 300 J=1,NVREAL
      JJ=(LIO3M1-1)*NVREAL+J
      J2=JJ
      DO 200 I=LIO3M1,ILEV2
      J1=J2+NVREAL
      GJ(I)=DIFFX(I)*(X3(J1)-X3(J2))
      J2=J1
  200 CONTINUE
      JJ=(LIO3-2)*NVREAL+J
      DO 250 I=LIO3,ILEV2
      JJ=JJ+NVREAL
      DUM=R1*GJ(I)-R2*GJ(I-1)
      XDERIV(JJ)=XDERIV(JJ)+DUM
  250 CONTINUE
  300 CONTINUE
      RETURN
      END

      SUBROUTINE VDIFFXI
      .
      .
      DO 100 I=1,LIO3
      GJ(I)=0.
  100 CONTINUE
      DO 300 J=1,NVREAL
      DO 200 I=LIO3M1,ILEV2
      GJ(I)=DIFFX(I)*(X3(J,I+1)-X3(J,I))
  200 CONTINUE
      DO 250 I=LIO3,ILEV2
      XDERIV(J,I)=XDERIV(J,I)+R1*GJ(I)-R2*GJ(I-1)
  250 CONTINUE
  300 CONTINUE
      RETURN
      ENTRY DIFFXIO
      R2=RV-1.
      R2=1./R2
      R1=RV*R2
      LIO3M1=LIO3-1
      IF (LIO3M1.LE.0) LIO3M1=1
      RETURN
      END
```
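The diffusion routines above work in flux-difference form: a flux `GJ(I)` is built from the difference between adjacent levels, and the tendency at level `I` is `R1*GJ(I)-R2*GJ(I-1)`. The following Python sketch applies that arithmetic to a single column as a worked illustration. It is a sketch under assumptions: the function name is hypothetical, the lists are 1-based with index 0 unused, and `x3` is assumed to extend one level past `ilev2` so that the `I+1` reference is defined.

```python
def diffuse_column(x3, diffx, rv, lio3, ilev2):
    """Return the diffusion tendency dx[i] for levels lio3..ilev2 of one column."""
    r2 = 1.0 / (rv - 1.0)
    r1 = rv * r2
    lio3m1 = max(lio3 - 1, 1)
    gj = [0.0] * (ilev2 + 1)              # fluxes, zero above the first level
    for i in range(lio3m1, ilev2 + 1):
        gj[i] = diffx[i] * (x3[i + 1] - x3[i])
    dx = [0.0] * (ilev2 + 1)
    for i in range(lio3, ilev2 + 1):
        dx[i] = r1 * gj[i] - r2 * gj[i - 1]
    return dx


# Worked case with rv = 2 (so r2 = 1, r1 = 2) and unit diffusivity:
# fluxes are 1, 2, 4, giving tendencies 2*2-1 = 3 and 2*4-2 = 6.
print(diffuse_column([0.0, 1.0, 2.0, 4.0, 8.0], [0.0, 1.0, 1.0, 1.0], 2.0, 2, 3))
```

A vertically uniform column produces zero fluxes and therefore zero tendency, which is a quick sanity check on any recoded version.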

```
        SUBROUTINE WAOVXI
    DOUBLE PRECISION P,Z,Z1,T,Z2,X3,Z3
    DOUBLE PRECISION VDERIV,XDERIV,W,WDX3DZ,WXJBAR
    COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366),X3(2366),Z3(2366)
    COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEVI,ILEVZ,NVERT,
    1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,OT
    COMMON /DERIV/ VDERIV(2366),XDERIV(2366),W(2366).
    COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26):DZ,RV
    COMMON /QJBLK/\cdotNZJPL103
    COMMON /WORKBK/WOX3DZ(2366)
    DIMENSION WXJBAR(26)
    DZ2=2.*OZ
    R2aRV-1.
    R2=1./R2
    R1=RV*R2
    DO 200 I=L103,ILEV2
    JJ=(1-1)*NVREAL
    IL=JJ=NVREAL+1
    IH=\J*NVREAL*I
    F=(X3(IH)-X3(IL))/0Z2
    DO 100 J=2,NVREAL
    KmJ\+J
    WDX3DZ(K)mF*W(K)
100 CONTINUE
200 CONTINUE
    NZ1=NVZON+1
    IL2PI=ILEV2*1
    DO 250 {=1,IL2P1
250 WXJBAR(I)=0.DO
    ILZMI=ILEV2*1
    DO 400 IFLIO3.ILEV2
    JJ={1=1) *NVREAL
    DO 300 J=2,NVZON
    K#JJ*J
300 WXJBAR(I) mWXJBAR(I)+W(K)*X3(K)
    DO 350 JmNZI;NVREAL,Z
    K=JJ*J
    K1*K+1
    *WXJBAR(I)=WXJBAR(I)&2.*(W(K)*X3(K)*W(K1)*X3(K1))
    CONTINUE
400 CONTINUE
    ILZP1mILEV2*1
    CALL XDERSP (WXJBAR,LIO3,IL2PI)
    DO 500 I*L103,ILEV?
    JJ=(I-1)*NVREAL*1
    WDX3DZ(JJ) =WXJBAR(I)
500 CONTINUE
    IL=(LIO3-1)*NVREAL*1
    IH*ILEV2*NVREAL
    DO.600 I=IL.IH
600 XDERIV(I)=XDERIV(I)*WDX3DZZ(I)
    RETURN
    END
```

    ```
    SUEROUTINE VHADVXE
        -
        -
    00 200 ImL103,ILEVZ
    F*(X(1,1+1)-X(1,I-1)) &EZ2
    00 100 J#Z,NVREAL
    WOX3DZ(J,I)&FOW(J,I)
    100
    200
    CONTINUE
        DO 250 & 1.ILZPI
        WXJBAR(I)=0.
        CONTINUE
        DO $00 1:L803,ILEVZ
        OQ 300 JMZNVZON
        WXJBAR(I)$WXJBAR(I) कW(N.I)$X3(NOI)
    300. CONTINUE
    00 350 J&NZI,NVREAboZ
    WXJBAR(I) =WXJ&AR(I)&20*(W(J.I)#x3(J.I)+W(J&I|I)$\times3(J*I:I))
    350 CONTINUE
$00 CONTINUE
    CALL XOERSP{WXJBAROL103.ILZPI)
    00 500 I=6103,ILEVE
    WDX30Z(1,I) =WXJBAR(T)
    CONTINUE
    DO 600 ImILFIH
    XOERIV(I)*XOERIV(I) &WDX3OZ (I)
    GOO CONTINUE
    RETURN
    ENTRY HADVXIO
    DZZ=E&*O2
    EZ251./DZ2
    RZERV-1.
    RE=1./R2
    R1mRV*R2
    NZ1:NVZON&1
    IL2PI=1LEV2+1
    IL{MI2ILEVZ-1
    Il*(LIO3-1)*NVREAL*1
    IHmILEV2@NVREAL
    RETURN
    END
```


```
      SUBROUTINE O3SURF
      DOUBLE PRECISION P,Z,Z1,T,Z2,X3,Z3
      COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366),X3(2366),Z3(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /BARBLK/ TBAR(26),SIGMA(26),XIBAR(26),DIFFM(26),DIFFX(26)
      DSURF=DIFFX(NVERT)
      JJ=(NVERT-1)*NVREAL
      II=JJ-NVREAL
      DO 100 I=1,NVREAL
      JJ=JJ+1
      II=II+1
  100 X3(JJ)=DSURF*X3(II)
      RETURN
      END

      SUBROUTINE VO3SURF
      .
      .
      DO 100 I=1,NVREAL
      X3(I,NVERT)=DSURF*X3(I,NVERT-1)
  100 CONTINUE
      RETURN
      ENTRY O3SURFO
      DSURF=DIFFX(NVERT)
      RETURN
      END
```

    ```
    SUBROUTINE STREAM (N)
    DOUBLE PRECISION Z.P.X
    COMMON P(2366),Z(2366)
    COMMON / GBLK/ KD(43),CG(43),NCOMP(12).LWAVE(12),NV(43),LV(43)
    COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEVI,ILEVZ,NVERT,
    1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
    DATA NW /O/
    NVZ1=NVZON+1
    DO 200 JJ=2,NVZON
    X=-1.DO/CG(JJ)
    JEJJ~NVREAL
    DO 100 I=ILEVI,ILEVZ
    J=J&NVREAL
    P(J)=X*Z(J)
CONTINUE
```

100


```
200 CONTINUE
    DO 400 JJ=NVZ1,NVECT
    X=-1.00/CG (JJ)
    Jx2$JJ&NVREAL"(ILEVI-Z)=NVZON=I
    DO 300 I=ILEVI:ILEVZ
    JaJ*NVREAL
    P(J) xX&Z(J)
    \l=J+1
    P(N1) =x&Z(J.)
    300 CONTINUE
    400 CONTINUE
    NW=O
    IF (NoEQ.1)'NWEI
    IF (NW-EQ.0) RETURN.
    00 600 J=1.92
    GO TO (450.460):J
    450 WRITE (6,1000) ILEVI
```



```
    JJ&NVREAL*(ILEVI-1)
    GO TO 470
    460. WRIPE (6.1000) ILEVZ
    JJ*NVREAL*(ILEVZ-1)
    GO TO 470
670 CONTINUE
    DO 500 I2I,NVZON
    IR=I\bulletJJ
    WRITE (6,1010) IOZ(IR),P(IR)
1010 FORMAT (1X,110,020%-0,20X,D20.10)
    500 CONTINUE
        DO 550 I=NVZL,NVREAL.Z
    1RxI+JJ
    If*!R&1
    WRITE (6,1020) I,Z(IR),Z(II),P(IR),P(II)
1020 FORMAT (IX,I\0,4020.10)
    550 CONTINUE
    600 CONTINUE
        RETURN
        END

      SUBROUTINE VSTREAM
      .
      DO 401 J=2,NVREAL
      DO 401 I=ILEV1,ILEV2
      P(J,I)=X(J)*Z(J,I)
  401 CONTINUE
      RETURN
      ENTRY STREAMO
      DO 200 JJ=2,NVZON
      X(JJ)=-1./CG(JJ)
  200 CONTINUE
      NVZ1=NVZON+1
      DO 400 JJ=NVZ1,NVECT
      J=2*JJ-NVZ1
      X(J)=-1./CG(JJ)
      X(J+1)=-1./CG(JJ)
  400 CONTINUE
      RETURN
      END
```
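The `STREAMO` entry relies on the packing used throughout these listings: zonal components `1..NVZON` each occupy one real slot, while each nonzonal component `JJ` occupies the pair `J=2*JJ-NVZ1` and `J+1`. The Python sketch below spells out that mapping and the per-slot factor table. It is illustrative only: the function names are hypothetical, and the relation `nvreal = 2*nvect - nvzon` is inferred from the index formula rather than stated in the listing.

```python
def real_slots(jj, nvzon):
    """Map 1-based spectral index jj to its 1-based slot(s) in the real array."""
    if jj <= nvzon:
        return (jj,)                      # zonal: a single real value
    j = 2 * jj - (nvzon + 1)              # nonzonal: real part, then imaginary
    return (j, j + 1)


def stream_factors(cg, nvzon, nvect):
    """Tabulate X = -1/CG per real slot (1-based lists, index 0 unused).

    Slot 1 is left at zero because the listing starts its loops at index 2.
    """
    nvreal = 2 * nvect - nvzon            # inferred from the last slot used
    x = [0.0] * (nvreal + 1)
    for jj in range(2, nvect + 1):
        for j in real_slots(jj, nvzon):
            x[j] = -1.0 / cg[jj]
    return x
```

With the factors expanded to one entry per real slot, the main loop of `VSTREAM` reduces to a single multiply, `P(J,I)=X(J)*Z(J,I)`, with no distinction between zonal and nonzonal components.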

```
    SUBROUTINE THWIND
    DOUBLE PRECISION P,ZETA,ZI;T,ZZ;DG,EG
    COMMON P(2366),ZETA(2366),Z1(2366),T(2366),22(2366)
    COMMON / ONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEVI,ILEVZ,NVERT,
    l NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
    COMMON /CGBLK/ KD(43).CG(43),NCOMP(12),LWAVE(12),NV(43),LV(43)
    COMMON /DEBLK/ DG(43),EG(43)
    COMMON /VRTBLK/ ZVAL(26),PVAL(26),VWT(26),DZ,RV
    COMMON /BARBLK/ TBAR(26):SIGMA(26):XIBAR(26)
    NW=0
    IF (ILEVI.EQ.0) NW:I
    ILEV1*2
    ILEVZ\approxNVERT-1.
C
    MEAN T COMPUTATION.
    J=(ILEV1-2)बNVREAL$!
    DO 25 I■ILEVI*ILEVZ
    J=J+NVREAL
    25T.(J)=TBAR(I)
C T FROM THERMAL WIND COMPONENTS.
    N=NCOMP (1)
    JB=1
    IF (N.LT.2) GO TO 250
    00 200 JJ=2,N
    JB=JB+1
    JェJB*(ILEV1-2)*NVREAL
    DUM=DZ*CG(JJ)
    Xl:=DG(JJ)/DUM
    X2=EG(JJ)/DUM
    DO 100 I=ILEVI,ILEV2
    JxJ+NVREAL
    T ( J ) = 0 . 0 0
    IF (JJ.GE.N) GO TO 50
    J2xJ+1
    Jl=\2-NVREAL
    T(J)=X2*(P(JI)-P(J2))
    5 0
    NNTINUE
    J2=J=1
    \l=J2-NVREAL
    T(J)=T(J) - X % (P(J1) - P(J2))
    CONTINUE
    CONTINUE
    CONTINUE
    JJ#N
    JB=J8-1
    DO 500 L*1,LR
    N=NCOMP(L+1)
    IF {N.LT.1) GO TO 500
    00.400 JJJ=1,N
    JJm\J+1
    JB=JB+2
    JR=JB+(ILEVI=2)*NVREAL
    DUM=DZ*CG(JJ)
    Xl=DG(\J)/DUM
    X2=EG(JJ)/DUM
    DO 300 I=ILEV1,ILEV2
    JR=JR*NVREAL
    JI=JR+I
    T(JR)=0.00
    T(JI)=0.DO
    IF (JJJ.GE.N) GO TO 260.
    J2=JR+2
    J1=J2-NVREAL
```

```
        T(JR) =x2*(P(JL) -p (J2))
        J2=\2*1
        Nl=\I+1.
        P(JI) =X 2*(P(JI) =P(J2))
    260
    CONTINUE
    IF (JJJ.EG.1) GO TO 280
    J2#JR-2
    Ul=J2-NVREAL
    T(JR)&T(JR)-Xl*(P(J1)=P(J2))
    \2m, 22+1
    \l=dl+l
    T(JI)=T(JI) -XI*(P(J!)*P(J2))
    280 CONTINUE
    300. CONTINUE
    4OO. CONPINUE
    500 CONTINUE
    IF (NH-EG00) RETURN
    WRITE (601000) ILEV1, PLEV2
```



```
    JI &NVREAL*(ILEVI*I)
    J3&NVREAL*(ILEVZ-1)
    DO 600 IEIgNVZON
    \1m\!1
    \3%J3+1
    WRITE (6,1010) I,T(Jb),T(J3)
1010 FORMAT (1X.I10,D20.10,20X,020.10)
    600 CONTINUE
        NZ1ENVZON+1
        DO 700 I =NZI,NVREAL,2
        \l=\l|l
        J2m.jlol
        」3=\3*1
        」4mJ3+1
        WRITE (6,1020) I,T(J1),T(J2),T(J3),T(J4)
1020. FORMAT (1X,I10.4D20.10)
        J{x|l+1
        J3xJ3+1
    700 CONTINUE
        RETURN
        END
```

```
      SUBROUTINE VTHWIND
      .
      .
      DO 25 I=ILEV1,ILEV2
      T(1,I)=TBAR(I)
   25 CONTINUE
      DO 501 J=2,NVREAL
      DO 501 I=ILEV1,ILEV2
      T(J,I)=X2(J)*(P(J+1,I-1)-P(J+1,I))-X1(J)*(P(J-1,I-1)-P(J-1,I))
  501 CONTINUE
      ENTRY THWINDO
      ILEV1=2
      ILEV2=NVERT-1
      N=NCOMP(1)
      DO 200 JJ=2,N
      DUM=DZ*CG(JJ)
      DUM=1./DUM
      X1(JJ)=DG(JJ)*DUM
      X2(JJ)=EG(JJ)*DUM
  200 CONTINUE
      JJ=N
      X2(N)=0
      JR=N-1
      DO 500 L=1,LR
      DO 400 JJJ=1,NCOMP(L+1)
      JJ=JJ+1
      JR=JR+2
      JI=JR+1
      DUM=DZ*CG(JJ)
      DUM=1./DUM
      X1(JR)=X1(JI)=DG(JJ)*DUM
      X2(JR)=X2(JI)=EG(JJ)*DUM
  400 CONTINUE
      X2(JR)=X2(JI)=0.
  500 CONTINUE
      RETURN
      END
```


```
        PHASE=ANGVEL*(TIME*YRLAG*TROPLG)
    PHASE=SIN(PHASE)
    TIME=TIME*DT
C
C 2. DETERMINE HEATING IN ZONAL COMPONENTS.
    ILFMERGE2+1
    IF (MERGEZ.EQ.0) ILEIITSZ
    IH=I2TSZ
    00.230 I=1,7:2
    230 X(I)*1.
    00`240 I=2.7.2
    240 X(I)=PHASE
        N=NCOMP (1)
        NH:5
        IF (NHOGT.N) NH:N
        DO 300 ILEVEIL,IH
        YEH(ILEV)
        I=(ILEV*I)*NVREAL
        DO 260 Jx2,NH
        JJ=(J-2)*26*ILEV
    260 Q(I&J)#Y*(X(J)*TSZON(JJ)=T(I*J).)
        IF (NH.G .N) GO TO 280
        NHI=NH+1
        00.270 J=NH1,N
    270 Q(I*J)z=Y&T(I*J)
    280 CONTINUE
    300 CONTINUE
C
    3. DETERMINE NONZONAL HEATING.
    ILFMERGE2*1
    IHEIITSW-I
    IF (IITSWOLT.MERGEZ) IHaNVERT-I
    DO 330 ILEVEIL,IH
    NZ1=NVZON+1
    YEH(ILEV
    KK=(ILEV-1)SNVREAL*NVZON
    DO 330 JENZI,NVREAL
    KK=KK$1
330 Q(KK) #-Y*T(KK)
    ILEIITSW
    IHEI2TSW
    IF (IITSW.EQ.0) GO TO-500
    DO 450 ILEVaIL.IH
    Y=H(ILEV)
    KK#(ILEV-1)*NVREAL$NCOMP(1)
    IJ=0
    DO 400 LLE1,LR
    N=NCOMP (LL*.1)
    IF (N.EQ.O) GO TO 400
    NHI=O
    IF (LL.GT.2) GO TO 360
    NH=4
    IF (NH.GT.N) NHEN
```

    ```
        DO 350 J=19NH
        KKmKK$1
        IJ=IJ&1
        Q(KK)=Y*(X(J)*TSEDDY(IJ) Of(KK):)
        KK#KK&l
        iJ&IJol
    350 Q(KK)=Y*(X(J)*TSEDUY(Id)mT(KK))
        IF (NH.GE.N) GO TO &OO
    3 6 0
    NH{=NH*I
        DO 370 JWNHI.ON
        KK\equivKKK+1
        Q(KK) =>Y%T(KK)
        KK#KK+!
```



```
    $00 CONTINUE
    $50 CONTINUE
    500 CONTINUE
C
```



```
C
C DETERMINE HEATING IN OVERLAP AREA BY GINEARLY WEIGHTED
C
    COMBINATION OF METHOD I AND METHOD II.
    IF (MERGE1.EQ.O) GO. TO 610
    NENCOMP (1)
    NHES
    IF (NHEGT.N) NHEN
    IJ=0
    DO 600 ILEVFMERGEI,MERGEZ
    {ve!J+!
    K*(ILEV-1)#NVREAL
    YEH(ILEV)
    0O 530 J=2,NH
    JJ*(J-2)*26*ILEV
    Q(K+J)=ZWTI(IU)*(Q(K&J)-Y*T(K&J))*ZWTZ(IJ)*Y*(X(J)*TSZON(JJ)
    1- - ( (k+ل))
530 CONPINUE
    IF (NH.GE.N) GO TO 550
    NHI|NH+1
    DO 540 J=NHI %N
    Q(K*J)=ZWTI (IJ)*(Q(K+J)-Y*T(K+J))=ZWTZ(IJ)*Y*T(K*J)
5$0 CONTINUE
5 5 0 ~ C O N T I N U E ~
    IF (N%LT.3) GO TO 555
    Q(K+3)EQ(K+3)&ZWTI(IJ)*Q3WR(IJ)
555 CONTINUE
    K=K+N
    DO 590 LLElgLR
    NN*NCOMP bloll)
    IF (NN.EQ.0) GO TO 590
    00 560 J=1,NN
    KaK+1
    Q(K)=ZWTI(IJ)*(Q(K)-Y*P(K))-ZWTZ(IJ)*Y*T(K)
    KinK+1
    Q(K):ZWTI(IJ)*(Q(K)-Y#T(K))-ZWTZ(IJ)*Y*T(K)
    560 CONTINUE
5 9 0 ~ C O N T I N U E ~
6 0 0 ~ C O N T I N U E ~
610 CONTINUE
```

    ```
        IL=(ILEVI-1)*NVREAL*I
        IH=ILEV2*NVREAL
        DO 630 I=ILIIH
    630 TDERIV(I)=TDERIV(I)*Q(I)
C
    SAVE HEATING COEFFICIENTS IN HTSVE FOR OUTPUT.
    00640 IEIL.IH
    HTSVE(I)EQ(I)
    IF (NW.EQ.O) RETURN
    ILI=ILEVI
    IL2=MERGEI
    IL3mMERGE2
    IL4EILEV2
    IF (ILZ.GT.ILI) GO TO 650
    IL2*(ILI*IL4)/2
    IL3=1L2+1
650 CONTINUE
    WRITE (6,1000) ILI,IL2,IL3,IL4
2000 FORMAT (IHO,10X,'DIABATIC HEATING AT LEVELS ':4I5)
    Kl=(ILI-1)*NVREAL
    K2=(ILZ-1)&NVREAL
    K3=(IL_3-1)$NVREAL
    K4E(IL4-1)*NVREAL
    N=NCOMP (1)
    DO 700 I=1.N
    -K1mK1+1
    K2mK2+1
    K3=K3+1
    K4:K4+1
    WRITE (6,1010) I,Q(K1),O(K2),0(K3),Q(K4)
1010 FORMAT (2X,I5,E15,7,15X,E15.7,15X,E15,7,15X,E15.7)
    700 CONTINUE
    K=N
    OO 800 LLEl|LR
    N=NCOMP(LL+1)
    IF (N.EQ.O) GO TO 800
    DO 750 IEI,N
    K=K+1
    Kl#K1+2
    K2=K2*2
    K3=K3+2
    K4mK4*2
    K1RaK1-1
    K2RmK2-1
    K3R=K3-1
    K4R=K4-1
    WRITE (6,1020) K,Q(K1R),Q(K1),Q(K2R),Q(K2),Q(K3R),Q(K3),Q(K4R),
    1O(K4)
1020 FORMAT (2X,I5,8E15.7)
    750 CONTINUE
800 CONTINUE
    RETURN
    END
```

    ```
    SUBROUTINE VDBHEAT
    -
    \bullet
    CALL OJHEAY
    00 100 ILEV:ILEVI,MGMI
    00 75 I=2,NVREAL
    Q(IOILEV) =Q(I.ILEV)=H(ILEV)*P(IPILEV)
    CONTINUE
100 CONPINUE
    PHASEmSIN(ANGVEL.(TIME+YRLAG&TROPLGG))
    TIME&TIME&OT
    DO 230 [x].7.2
    X(I)=1.
230 CONTINUE
    DO 2&0 I:2.7.2
    X(!) =PHASE
240 CONPINUE
    DO 300 ILEVmMERGE2*IDI2TSZ
    00 260 J=2,5
    Q(J.ILEV) =H(ILEV)*(X(J)*TSZON(ILEV,J-2)-T(J.ILEV))
    CONIINUE
    DO द70 J=6.NCOMP(1)
    Q(JOILEV):mH(ILEEV草(J,ILEV)
270 CONTINUE
300 CONTINUE
    DO 330 ILEV\MERGE2*1.IITSW*I
    DO 330 J#N21, NVREAL
    O(JOILEV) E=H(ILEV)*T(J.ILEV)
330 CONTINUE
    DO &50 ILEV&IITSW&IZTSW
    $J%1
    DO 400 Lbal.2
    DO 350 Jalg%.
    JJ=2%JONZ1
    Q(JJOILEV) =H(ILEV)*(X(J)*TSEDDY(IJ )=T(JJ.ILEV))
    Q(JJ&!&ILEV)=H(ILEV)*(X(J)*TSEDOY(IJ*I)-T(JJ*IOILEV))
    IJ#IJ&2
350 CONTINUE
    00 370 Ja5.NCOMP(LL+1)
    J\=2*J*NZ1
    Q(JJ.ILEV)=-H(ILEV)-T(JJ,ILEV)
    Q(JJ*I*ILEV) = =H(ILEV)*T(JJ*IOILEV)
370. CONTINUE
400 CONTINUE
    DO $0& LL=3,LR
    00 401 JeloNCOMP (LL+1)
    JJ=2*J=NZ1
    O(JJ.ILEV)=-H(ILEV)*T(JJ.ILEV)
    Q(JJ+1|ILEV)=~H(ILEV)*T(JJ*1,ILEV)
40! CONTINUE
450 CONTINUE
```

    ```
        IJ=0
        DO 600 ILEVEMERGEI,MERGE2
        IJ=1J+1
        DO 530 J=2.5
        Q(J.ILEV)*ZWTI(IJ)*(O(J.ILEV)SH(ILEV)*T(J,ILEV))
    1
    5 3 0
        CONTTNUE
        DO 540 J=G:NCOMP(1)
        Q(J,ILEV) aZWTI(IJ)*(Q(J,ILEV)mH(ILEV)&T(N,ILEV))
    l -ZWTZ(IJ)*H(ILEV)*T(J,ILEV)
        Q(3:ILEV) =Q(3.ILEV) & ZWTI (IJ)*Q3WR(IJ)
        DO 590 LLE1,LR
        DO 560 Jxl,NCOMP (LL+1)
        JJ=2*J-NZ1
        Q(JJ.ILEV) =ZWTI(IJ)*(Q(JJ.ILEV))#H(ILEV)*T(JJ;ILEV))
        1-ZWTZ(IJ)*H(ILEV)*T(JJ,ILEV)
        Q(JJ+I*ILEV)=ZWTI(IJ)*(Q(JJ*I,ILEV)OH(ILEV)*T(JJ&I,ILEV))
    1-ZWTZ(IJ)*H(ILEV)*T(JJ+I;ILEV)
    560 CONTINUE
    590' CONTINUE
    600 CONTINUE
        DO 630 I=ILIIH
        TDERIV(I)=TDERIV(I) &Q(I)
        HTSVE(I)zO(I)
        CONTINUE
        RETURN
    ENTRY DRHEATO
    NZlmNVZON+1
    MGM1 =MERGE1-1
    ANGVEL*!./720.
    PI=4.*ATAN(1.)
    TROPLGa-30.04.sPI
    IL=(ILEVI-1)*NVREAL*1
    ILEVZONVREAL
    RETURN
    END
```

```
      SUBROUTINE O3HEAT (I1,I2)
      DOUBLE PRECISION P,Z,Z1,T,Z2,X3,Z3
      DOUBLE PRECISION XOX
      DOUBLE PRECISION TOPO,QSV,QT,SPACE,DXO3DT
      COMMON P(2366),Z(2366),Z1(2366),T(2366),Z2(2366),X3(2366),Z3(2366)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT,YRLAG,TIME
      COMMON /OROGRA/ TOPO(91),QSV(105)
      COMMON /SPECIE/ X3GRD(6240),CNO3(6240),XNO2(330)
      COMMON /QJBLK/ NZJ,LIO3,COLO3(26),LEVPCM,LEVDYN
      COMMON /WORKBK/ QT(2366),SPACE(5554)
      COMMON /CHEM/ XNEUT(26),TEMP(6240)
      COMMON /FTCST/ NLON,NLAT,NGRID
      COMMON /BARBLK/ TBAR(26),SIGMA(26),DNDTBR(26),DIFFM(26),DIFFX(26)
      COMMON /GENER/ DXO3DT(2366)
      DIMENSION DATAIM(6240),DX3GRD(1),QTGRD(1)
      DIMENSION XOX(1)
      EQUIVALENCE (TEMP(1),XOX(1))
      EQUIVALENCE (QT(1),DX3GRD(1)),(SPACE(29:5),QTGRD(1))
      IL1SV=ILEV1
      IL2SV=ILEV2
C
C     T, X3 SPECTRAL FIELDS FOR LEVELS LIO3 THRU NZJ TO GRID FIELDS
C     TEMP, X3GRD.
C
      ILEV1=LIO3
      ILEV2=NZJ
      NLEV=NZJ-LIO3+1
      ILSPC=(LIO3-1)*NVREAL+1
      ILGRD=(LIO3-1)*NGRID+1
      CALL SPCGD1(T(ILSPC),TEMP(ILGRD),DATAIM,NLEV)
      CALL SPCGD1(X3(ILSPC),X3GRD(ILGRD),DATAIM,NLEV)
      IHGRD=ILGRD-1+NLEV*NGRID
      DO 50 I=ILGRD,IHGRD
      IF (X3GRD(I).LT.0.) X3GRD(I)=0.
   50 CONTINUE
      CALL DX3CHM
C
C     DX3GRD, QTGRD GRID FIELDS FOR LEVELS LIO3 THRU NZJ (DX3GRD) AND
C
      IF (I2.GT.NZJ) I2=NZJ
      ILEV2=NZJ
      NLEV=NZJ-LIO3+1
      CALL GDSPC1(DXO3DT(ILSPC),DX3GRD(ILGRD),DATAIM,NLEV)
      NLEV=1+LEVPCM-LIO3
      CALL GDSPC1(XOX(ILSPC),X3GRD(ILGRD),DATAIM,NLEV)
      ILEV2=I2
      NLEV=I2-LIO3+1
      CALL GDSPC1(QT(ILSPC),QTGRD(ILGRD),DATAIM,NLEV)
C
C     ADD CONSTANT ZONAL HEATING FOR TOP LEVELS.
C
      ILEV1=IL1SV
      ILEV2=IL2SV
      NZ1=NVZON+1
      K=(ILEV1-1)*NVREAL
      KK=(ILEV1-1)*NVZON
      LEV2=LIO3-1
      IF (LEV2.LT.ILEV1) GO TO 300
      ANGVEL=1./720.
      PHASE=ANGVEL*(TIME+YRLAG)
      PHASE=SIN(PHASE)
      DO 200 LEV=ILEV1,LEV2
      K=(LEV-1)*NVREAL
      KK=(LEV-1)*NVZON
      DO 100 J=1,NVZON,2
      K2=KK+J
      K1=K+J
  100 QT(K1)=QSV(K2)
      DO 125 J=2,NVZON,2
      K2=KK+J
      K1=K+J
  125 QT(K1)=PHASE*QSV(K2)
      DO 150 J=NZ1,NVREAL
      K1=K+J
  150 QT(K1)=0.00
  200 CONTINUE
  300 CONTINUE
C
C     PLACE MEAN (1/XN)*(DN/DT) VALUES AT EACH LEVEL IN DNDTBR(LEV)
C     FOR OUTPUT.
C
      DO 400 J=LIO3,NZJ
      K=(J-1)*NVREAL+1
  400 DNDTBR(J)=DXO3DT(K)
      RETURN
      END
```


```
      SUBROUTINE VO3HEAT
      .
      .
      COMMON /HEATBK/ . . . ,MERGE2, . . .
      CALL SPCGD1(T(ILSPC),TEMP(ILGRD),DATAIM,NLEV)
      CALL SPCGD1(X3(ILSPC),X3GRD(ILGRD),DATAIM,NLEV)
      DO 50 I=ILGRD,IHGRD
      IF (X3GRD(I).LT.0.) X3GRD(I)=0.
   50 CONTINUE
      IL1SV=ILEV1
      IL2SV=ILEV2
      ILEV1=LIO3
      ILEV2=NZJ
      CALL DX3CHM
      CALL GDSPC1(DXO3DT(ILSPC),DX3GRD(ILGRD),DATAIM,NLEV)
      CALL GDSPC1(XOX(ILSPC),X3GRD(ILGRD),DATAIM,NLEVPCM)
      CALL GDSPC1(QT(ILSPC),QTGRD(ILGRD),DATAIM,NLEVMGM2)
      ILEV1=IL1SV
      ILEV2=IL2SV
      PHASE=SIN(ANGVEL*(TIME+YRLAG))
      DO 200 LEV=ILEV1,LEV2
      DO 100 J=1,NVZON,2
      QT(J,LEV)=QSV(J,LEV)
  100 CONTINUE
      DO 125 J=2,NVZON,2
      QT(J,LEV)=QSV(J,LEV)*PHASE
  125 CONTINUE
      DO 150 J=NZ1,NVREAL
      QT(J,LEV)=0.
  150 CONTINUE
  200 CONTINUE
      DO 400 J=LIO3,NZJ
      DNDTBR(J)=DXO3DT(1,J)
  400 CONTINUE
      RETURN
      ENTRY O3HEATO
      NLEV=NZJ-LIO3+1
      ILSPC= . . .
      ILGRD= . . .
      IHGRD= . . .
      NLEVPCM=LEVPCM-LIO3+1
      NLEVMGM2=MERGE2-LIO3+1
      NZ1=NVZON+1
      LEV2=LIO3-1
      ANGVEL=1./720.
      IF (LEV2.LT.ILEV1) CALL SOS
      RETURN
      END
```

```
      SUBROUTINE O3COL
C
C     COMPUTATION OF OZONE COLUMN DENSITY IN
C     CM-2 (1 CM COL=2.682*10**29 CM-2)
      COMMON /CONSTS/ INDEX,NR,LR,INS,INSZ,KINT,ILEV1,ILEV2,NVERT,
     1 NRTP,LRTP,NTYPE,NVECT,NVREAL,NVZON,NCYC,DT
      COMMON /VRTBLK/ ZVAL(26),PVAL(26),PP(26),DZ,RV
      COMMON /SPECIE/ XO3(6240),CNO3(6240)
      COMMON /FTCST/ NLON,NLAT,NGRID,AR(30),BR(30)
      DIMENSION CP(3)
C
      IF (ILEV1-2) 10,20,30
   10 CP(1)=12.*PP(1)/DZ
      I=0
      AXO31=0.
      AXO32=0.
      DO 300 LAT=1,NLAT
      DO 100 LON=1,NLON
      I=I+1
      AXO31=XO3(I)+AXO31
      AXO32=XO3(NGRID+I)+AXO32
  100 CONTINUE
      CHECK=RV*AXO32*0.95
      IF (CHECK.LE.AXO31) GO TO 200
      RATIO=AXO32/AXO31
      DM=ALOG(RATIO)/DZ
      CONST=CP(1)/(DM+1.0)
      I=I-NLON
      DO 150 LON=1,NLON
      I=I+1
  150 CNO3(I)=CONST*XO3(I)
      GO TO 299
  200 I=I-NLON
      DO 250 LON=1,NLON
      I=I+1
      CNO3(I)=0.0
  250 PRINT 5,I,XO3(I)
    5 FORMAT (10X,'OZONE SCALE HEIGHT IS TOO SMALL AT LEVEL',I5,5X,
     1 'XO3 = ',E10.3)
  299 CONTINUE
  300 CONTINUE
C
C     COMPUTATION OF OZONE COLUMN DENSITY ABOVE OTHER LEVELS USING
C     QUADRATIC CURVE FIT TO THE VERTICAL PROFILE OF OZONE
C     BETWEEN LEVELS.
C
   20 I=NGRID
      CP(1)=5.*PP(2)
      CP(2)=8.0*PP(1)
      DO 400 L=1,NGRID
      I=I+1
      I1=I-NGRID
  400 CNO3(I)=CP(1)*XO3(I)+CP(2)*XO3(I1)+CNO3(I1)
      IL1=3
      GO TO 35
   30 I=NGRID*(ILEV1-1)
      IL1=ILEV1
   35 DO 600 J=IL1,ILEV2
      J1=J-1
      J2=J1-1
      CP(1)=5.0*PP(J)
      CP(2)=8.0*PP(J1)
      CP(3)=PP(J2)
      DO 500 L=1,NGRID
      I=I+1
      I1=I-NGRID
      I2=I1-NGRID
  500 CNO3(I)=CP(1)*XO3(I)+CP(2)*XO3(I1)-CP(3)*XO3(I2)+CNO3(I1)
  600 CONTINUE
      RETURN
      END
```
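
The 5, 8 and -1 weights in the column recurrence of the listing above match the standard rule for integrating a parabola through three equally spaced samples over its last interval, h/12 (5 f2 + 8 f1 - f0). The following Python sketch illustrates that rule only. It assumes uniform spacing `h` and does not model whatever scaling the listing folds into `PP`, which is not stated here; the function names and the test profile are hypothetical.

```python
from fractions import Fraction


def quadratic_step(f_two_back, f_one_back, f_here, h):
    """Integral, over the interval between the two most recent levels,
    of the parabola through three equally spaced samples."""
    return h * (5 * f_here + 8 * f_one_back - f_two_back) / 12


def running_column(samples, h, first_two):
    """Accumulate the integral level by level, in the same pattern as the
    CNO3 recurrence. `first_two` holds the already known column values
    at the first two levels (assumed given)."""
    column = list(first_two)
    for j in range(2, len(samples)):
        step = quadratic_step(samples[j - 2], samples[j - 1], samples[j], h)
        column.append(column[j - 1] + step)
    return column


# Hypothetical quadratic profile f(x) = x**2, whose integral from 0 is x**3 / 3.
h = Fraction(1)
xs = [Fraction(k) for k in range(5)]
profile = [x * x for x in xs]
exact = [x ** 3 / 3 for x in xs]
column = running_column(profile, h, exact[:2])
print(column == exact)
```

Because the rule is exact for quadratics, the accumulated column reproduces the analytic integral for this test profile; for a real ozone profile it would only be an approximation.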

```
      SUBROUTINE VO3COL
      .
      .
      DO 400 K=1,NLAT
      DO 400 I=1,NLON
      CNO3(I,K,2)=CP1*XO3(I,K,2)+CP2*XO3(I,K,1)+CNO3(I,K,1)
  400 CONTINUE
      DO 600 J=3,ILEV2
      DO 500 K=1,NLAT
      DO 500 I=1,NLON
      CNO3(I,K,J)=CP(J,1)*XO3(I,K,J)+CP(J,2)*XO3(I,K,J-1)
     1 -CP(J,3)*XO3(I,K,J-2)+CNO3(I,K,J-1)
  500 CONTINUE
  600 CONTINUE
      RETURN
      ENTRY O3COLO
      DO 601 J=ILEV1,ILEV2
      CP(J,1)=5.*PP(J)
      CP(J,2)=8.*PP(J-1)
      CP(J,3)=PP(J-2)
  601 CONTINUE
      CP1=5.*PP(2)
      CP2=8.*PP(1)
      RETURN
      END
```

## DIVISION 5

## TECHNOLOGY SURVEY UPDATE

## INTRODUCTION

This segment of the report updates the estimates and prognostications provided in reference 1 on the general subject of circuit technologies, primarily semiconductor. A number of other bodies of technology, which might be called "System Technologies", are also critical to the FMP design, and their significance should not be lost. Their rate of change, however, tends to be less dramatic and less public; for this reason the following material dwells on circuitry, emphasizing the changes perceived since the previous report.

These can be summarized as follows:

- In the mainline circuitry (ECL) the scales of integration and possibilities have become more defined. There is, of course, still considerable margin in the time dimension, but the picture is sharper than a year ago.
- Of the long-shot logic technologies, Gallium Arsenide (GaAs) is benefitting from increased investment. As such it is worthy of somewhat closer attention. Cryogenic options remain too esoteric to warrant serious concern.
- Of the potential auxiliary memory technologies, CCDs are still the most plausible, although their availability for the target time frame has become more questionable. The availability of 64K dynamic RAMs (DRAMs), on the other hand, is more certain, albeit at costs which may be higher than is acceptable.
- An "intermediate" memory has been postulated for the proposed FMP. It will utilize only established low-cost technologies.

Some elaboration on these points follows.

## CRITICAL CIRCUIT TECHNOLOGIES

The previous studies (refs. 1, 2) developed the decision that the FMP schedule is best served by use of high-speed ECL logic such as is now (1979) coming into production. The Fairchild version is called F200K, and the internal CDC designation is LSI-168.

As this technology matures, natural improvements in cost, reliability and speed will evolve. The next most likely major change would be to a considerably larger scale of integration with some modest initial speed improvement. A change such as this, however, requires a major overhaul of the CAD/CAM (computer aided design/computer aided manufacture) support system plus a step up in semiconductor technology. To the extent such techniques and circuitry are available for use, they may be invoked in critical areas. However, serious consideration for their use cannot be prudently planned in the time scales proposed for the FMP development. Table 1 collects the LSI products in the high speed (ECL) technology, and projects some estimates of availability of the next most significant steps. The final entry (availability 1985) is quite conjectural at this stage. Semiconductor technologists tend to think in terms of three-year increments, so data gathered from them often shows this sort of expected cycle. In point of fact the technology does not move in sudden jumps, but inches along on a broad front. Enough of these breakthroughs are known to be pending, however, that the progress has reasonable predictability. The time scales have a way of stretching out, particularly as reasonably large-scale manufacturability is required.

TABLE 1
ECL LSI RELATED PRODUCTS

| PRODUCT | YEAR INTRODUCED* | EQUIVALENT GATES/CHIP | STAGE DELAY (ps) | STATUS |
| :---: | :---: | :---: | :---: | :---: |
| Amdahl Gate Array | 73 | 100 | 750 | Custom |
| CDC MOT Gate Array | 75 | 190 | 900 | Discontinued |
| CDC/F200K Gate Array | 77 | 250 | 650 | In proto production |
| Siemens F100K Gate Array | 77 | 500 | 750 | Final development |
| Motorola 10K Macro Cell Array | (79) | 750 | 1200 | In development |
| FSC 8-Bit Slice Set | (79) | 2K-8K | 650 | Custom set of four types (not an array) |
| Next Generation Gate Array | (82) | (1.5K-2K) | (500) | In exploration stages at FSC, MOT, NATL. |
| Follow-On Gate Array | (85) | (5K-7K) | (250) | Prediction by suppliers |

*Numbers in parentheses are anticipated.

Suppliers such as Motorola, Fairchild, and National presently see 1000-2000 gate equivalents being practical in preproduction quantities by 1982. Although processing differences exist, all project use of some form of oxide isolated, walled emitter ECL process on a die size of less than 8 mm , the current practical limit for projection step and repeat photolithography. All agree that additional advances in photolithography to 1 micron or less and improved metalization processes are required to achieve the 1985 objectives of some 6000 equivalent gates per die.

Other forms of logic have demonstrated subnanosecond performance and must be given consideration. One form, Josephson switches, has demonstrated sub-100 ps delays but requires a super-cooled (4-5 degrees Kelvin) ambient environment. CDC, to date, has monitored the progress of this technology only via periodical reviews. The feasibility of conducting actual Josephson switch experiments is currently under investigation. A second form of subnanosecond delay technology, demonstrated in R&D facilities at Rockwell, Hewlett-Packard, RCA, TRW, Motorola, and others, utilizes MOS technology and Gallium Arsenide (GaAs) material. Because of the superior mobility of GaAs, 5-6 times that of silicon, delays of 75-150 ps have been demonstrated in ring counter form. The device operates with low power (microamperes of current) and at normal ambient temperatures. GaAs devices, in fact, have superior quality to silicon at high temperatures. Material uniformity difficulties, as well as the difficulty in growing the necessary oxide coatings over the wafer surface, have hampered progress in this technology's growth. Recent advances in ion implantation have helped, and densities up to 100 gates/die are considered achievable in CY79. Because of the similarities to silicon, the projected superior performance while operating at lower power, the MOS-like circuit packaging densities, and the recent interest shown by major suppliers, efforts to watch this technology more seriously are warranted.

Recent projections for NMOS, CMOS and I²L technologies might suggest that these logic families are to be considered legitimate high performance candidates in the near future. However, they tend to "bottom out" speed-wise in the 1-3 ns range. For applications which demand the ultimate speed, they cannot be seriously considered unless some major breakthroughs occur in architecture which can overcome this handicap. Because the low speed-power product of these technologies enables a remarkable scale of integration, and because of their value in memory products, it is felt proper to spend some time on them here.

The present candidates include: one bipolar prospect (I²L), improved NMOS technology in two forms titled HMOS and VMOS, and the CMOS-SOS, or CMOS-SOIS, technologies. I²L is a form of inverted transistor technology utilized by several suppliers, including Signetics in a general gate array of some 400 gates, and FSC in the 9440 16-bit microprocessor family (7-8K gates/chip in production). CDC also manufactures some arrays of this type internally. HMOS is a form of scaled geometry and scaled processing NMOS technology aimed at high performance and high density applications. Intel utilizes this process in their sub-50 ns static RAM (SRAM) products and their 8086 16-bit microprocessor. VMOS is an alternative to HMOS which etches pits into the silicon to form fine gate geometries by implantations/diffusions, rather than by surface geometries as done in HMOS. AMI utilizes this process in SRAM products for high density as well as performance. CMOS refers to complementary MOS. This implies that the circuitry always has one "off" device in the ground-to-power-bus chain. The switches (ignoring minor leakage currents) dissipate power only during logic switching periods. The device gate-to-gate on-chip capacitance is reduced further by utilizing either sapphire as the insulating substrate (silicon on sapphire) or oxide isolation between switches (silicon on insulated substrate).

I²L and HMOS/VMOS also utilize oxide insulated device separation for reduced capacitance.

Conservative projections show 10K-50K gate equivalents becoming reasonable by the early 80's with moderate improvement in photolithography (1 micron spacings and widths). In addition, stage delays are projected to approach, if not improve below, 1 ns. Presently, these technologies in R&D design applications (not perfectly designed ring counters for optimum performance) offer impressive 2-4 ns stage delays.

Figures 1 and 2 illustrate the expected trends in two "best-bet" technologies (ECL and CMOS/I²L) plus the possible evolution of what CDC considers to be the best wild-card technology, GaAs.

## AUXILIARY MEMORY TECHNOLOGIES

To review, the term auxiliary memory technologies has been used to denote the various storage technologies which promise significant cost improvements relative to RAMs, yet with considerably improved access time vis-a-vis rotating magnetic storage. The candidates have been considered to be charge coupled devices (CCD), magnetic bubbles (MBM), and electron beam storage (EBAM).


Figure 1. LSI Stage Delay Performance


Figure 2. LSI Chip Density

None of these has kept pace with its protagonists' hopes or projections over the past five years. Commercial products have appeared for both CCDs and MBMs. In both cases the products, while quite capable, have proved to be considerably more limited in application than hoped for. Nevertheless they have found some valuable use, and as such must be judged to have been "born". Table 2 summarizes the known extant examples, as well as some current predictions.

It is quite clear that auxiliary memory technologies must achieve enough volume to drive their cost below the DRAM competition, a classic chicken and egg situation. If this is accomplished for CCDs they clearly are the best bet for the FMP Backing Store, a memory which is a significant requirement in order for the FMP to achieve its expected system performance. The probability of suitable CCDs being available to build the Backing Store is quite acceptable, although not completely assured; at this point the cost may prove to be a larger hurdle.

Magnetic bubble technology is driving increasingly toward lower cost, at the expense of speed. This comes about because of the need to take advantage of MBMs' special properties, e.g., non-volatility, as well as the particular markets these open up. Several semiconductor houses have made significant investments in MBM recently, indicating a growing appreciation for the potential markets. This type of investment is needed to drive the costs down. So in the case of bubbles, the availability and part costs look reasonably promising for the FMP Backing Store, but the speed expectations pretty well rule them out.

EBAMs continue to attract some research investment as a long shot possibility. The DARPA-sponsored interest in this area has, however, been abandoned as of this fiscal year.

TABLE 2
AUXILIARY MEMORY TECHNOLOGY

Columns: Charge Coupled Devices, Magnetic Bubble Devices

*Volume cost advantage occurs after the steep portion of the learning has been completed.
**Cost/price estimates are highly speculative until technology markets are established.

## DIVISION 6

## NASF RELIABILITY-AVAILABILITY EVALUATION

This evaluation incorporates observed MTBF for standard Control Data computing equipment based upon most recently filed data, and estimates of inherent MTBF for the FMP based upon the most recent reliability evaluation of the CYBER 203 (STAR-100A). Several assumptions have been made in the FMP evaluation and the total system evaluation to provide meaningful structure to the reliability models and to simplify the computations. Assumptions always introduce error; however, where assumptions and/or estimates were made, the effort was to bias the decisions toward the worst-case condition, which should yield conservative MTBF and availability estimates.

## FMP Reliability Evaluation

The failure rate and MTBF estimates of the FMP and its units are shown in table 1. Two types of failure rates, elemental and functional, are derived since the memory units and a large part of the data transfer paths utilize single-error correction/double-error detection (SECDED) logic. The definitions of these two types of failure rates are given in CDC-STD 1.12.999 Glossary of Reliability, Availability and Maintainability Terms (included as appendix A).
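
To illustrate why SECDED logic separates the two failure rates, here is a minimal Python sketch of a toy SECDED code: an extended Hamming (8,4) code. This is an assumption chosen for brevity; the word widths and the actual code used in the FMP memories are not given in this section, and the function names are hypothetical. A single flipped bit (an elemental failure) is corrected and the data survives, while two flipped bits are detected but not corrected.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # Hamming positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)     # Hamming positions that carry check bits


def encode(data4):
    """Encode 4 data bits into 8 bits: index 0 is overall parity,
    indices 1..7 are the Hamming(7,4) positions."""
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data4):
        word[pos] = bit
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 8):
            if pos & p and pos != p:
                parity ^= word[pos]
        word[p] = parity
    overall = 0
    for pos in range(1, 8):
        overall ^= word[pos]
    word[0] = overall
    return word


def decode(word):
    """Return (status, data4), status in {'ok', 'corrected', 'double-error'}."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    overall = 0
    for bit in word:
        overall ^= bit
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: uncorrectable.
        status = "double-error"
    return status, [fixed[pos] for pos in DATA_POSITIONS]


word = encode([1, 0, 1, 1])
single = list(word)
single[5] ^= 1                   # one failed bit: masked by correction
double = list(word)
double[2] ^= 1
double[6] ^= 1                   # two failed bits: detected only
print(decode(word)[0], decode(single)[0], decode(double)[0])
```

In the terms used above, the single-bit case counts toward the elemental failure rate (a part needs replacing) but not the functional rate, since the unit keeps delivering correct data.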

The FMP logic units are assumed to be similar in structure and logic design to the CYBER 203, and therefore the reliability model uses the same methods as that of the CYBER 203. The CYBER 203 reliability prediction is based upon the detailed final equipment configuration. The reliability model is composed of reliability modules of the basic configuration building blocks. These modules are derived from the CYBER 203 structure in a way which combines the physical entities into a functional module. For example, the LSI device reliability module is composed of the LSI device and its associated capacitors, solder joints, connector and proportionate number of terminators. In a like manner, reliability modules are defined for the LSI boards, F100K circuits, and so on. The proportionate number of parts, such as terminators, vias, coax connectors, and the like, are allocated on the basis of the average distribution in the CYBER 203. Appendix B defines the reliability modules. The component part failure rates for some of the major parts (e.g., the LSI device, the storage device, etc.) have changed since the last report, and their new failure rates are given in appendix C. Failure rates for other parts are taken from CDC-STD 1.12.020 Component/Piece Part Failure Rates (included as appendix D).

Appendix E is the derivation of the elemental failure rates of the FMP units utilizing engineering design estimates and the reliability modules of appendix B. The functional failure rates of the logic units are derived by the method described in appendix F using the SECDED model described in appendix G.

Table 1
FMP Reliability Summary

| Unit | Elemental Failure Rate* | Functional Failure Rate* |
| :---: | :---: | :---: |
| Scalar | 665.5 | 665.5 |
| Swap | 70.8 | 16.7 |
| Intermediate Map | 83.6 | 56.6 |
| Main Map (118.4), Memory Interchange (424.6), Vector Streaming (154.3) | 697.3 | 425.3 |
| Vector | 1,848.0 | 1,848.0 |
| Stream Control | 61.0 | 61.0 |
| Input/Output | 167.1 | 167.1 |
| Main Memory | 14,298.5 | 1,924.3 |
| Intermediate Memory | 41,666.7 | 1,518.8 |
| Backing Store (262K device) | 36,624.6 | 1,845.6 |
| Refrigeration | 409.0 | 409.0 |
| Power | 438.2 | 438.2 |
| Total FMP failure rate* | 97,030.3 | 9,376.1 |
| FMP MTBF | 10.3 hrs | 106.7 hrs |

FMP Reliability with 65K CCD

| | Elemental Failure Rate* | Functional Failure Rate* |
| :---: | :---: | :---: |
| Total with 262K | 97,030.3 | 9,373.8 |
| Less Backing Store (262K) | 36,624.6 | 1,845.6 |
| | 60,405.7 | 7,528.2 |
| Plus Backing Store (65K) | 159,185.4 | 8,576.0 |
| Total FMP failure rate* | 219,591.1 | 16,106.5 |
| FMP MTBF | 4.6 hrs | 62.1 hrs |

*Per $10^{6}$ device hours ($10^{-6}$ failures per device hour)

The Main Memory is based on the design of a memory now under development utilizing a 1 x 4096 ECL memory device. Its elemental and functional failure rates and MTBFs are derived in appendix F, as are those of the CCD Backing Store. The failure rates for Intermediate Memory (shown in table 1) were provided by the Control Data division which is currently responsible for memories of the type on which its design is based.
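
The MTBF figures in table 1 follow from the total failure rates by taking the reciprocal, since the rates are quoted per million device hours. A short Python sketch of that arithmetic, using the totals from table 1 (the function name and dictionary labels are illustrative only):

```python
def mtbf_hours(failures_per_million_hours):
    """Mean time between failures for a rate quoted per 10**6 hours."""
    return 1.0e6 / failures_per_million_hours


# Total FMP failure rates taken from table 1.
totals = {
    "262K backing store, elemental": 97030.3,
    "262K backing store, functional": 9376.1,
    "65K backing store, elemental": 219591.1,
    "65K backing store, functional": 16106.5,
}
for label, rate in totals.items():
    print(f"{label}: {mtbf_hours(rate):.1f} hrs")

# Swapping the backing store: remove the 262K-device contribution and
# add the 65K-device contribution (elemental rates from table 1).
elemental_65k = 97030.3 - 36624.6 + 159185.4
print(f"{elemental_65k:.1f}")
```

Rounded to one decimal place, these reproduce the 10.3, 106.7, 4.6 and 62.1 hour entries and the 219,591.1 elemental total.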

## NASF System Reliability Evaluation

The NASF system, composed of the FMP and standard system components configured in a redundant manner, is summarized in table 2. Because of the system complexity and redundancy, and because utilization of the system does not always require every computational and data handling function to be available simultaneously, a meaningful overall reliability figure of merit (from a user viewpoint) cannot be stated. However, the availability and reliability, as may be seen by a user, can be derived if the required resources and period of use are known. An example is used to provide an estimate of what might be expected. The detailed assumptions and derivations are presented in appendix H; a summary of the assumptions and results is presented here.

## Assumptions

1) A remote user uses the following resources for a two-hour period:
   - 2551 Communications Controller
   - CYBER 175 Computer
   - PDCs for system interconnection
   - FMD Disk Subsystem
   - ECS Subsystem
   - 819 Disk Subsystem
   - FMP
2) The FMD Disk Subsystem uses three of the four disk units.
3) The four 819 disk units assigned to the user, for practical purposes, have no immediate back-up.

## Results

1) User availability (the probability that the resources are available on demand) is 98.89%.
2) User reliability (the probability of completing the task in two hours) is 98.24%.

Other performance characteristics of the system are:

1) The operating system critical components (the FMP, the ECS, and the CYBER 175s) may cause a system interruption on the average of once every three to four weeks. In these cases the operating system must be reloaded and all users will have to reinitiate their jobs or tasks. (See operating system critical MTBF derivation in appendix H.)
2) Something in the system fails approximately every six hours, but because more than 50% of the failures are correctable, an operator or user may be inconvenienced only about every thirteen hours.
3) Assuming the example (appendix H) represents a typical use of the system, the applicable failure rate apparent to a user or class of users is 8804 failures per million hours or 113.6 hours MTBI. Stated another way, once every four or five days a given user may be required to restart or reinitiate the job or task being performed. Eighty-two percent of these failures are local with respect to the system, meaning only specific users are affected. The other eighteen percent are a result of a system failure affecting all current users.
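
The figures in the list above can be cross-checked with a few lines of Python. The sketch assumes a constant failure rate (exponential model) for the two-hour survival probability, which is a simplification and not the appendix H derivation; the 50% correctable fraction is the round lower bound quoted above, and the function names are hypothetical.

```python
import math


def mtbi_hours(rate_per_million_hours):
    """Mean time between interruptions for a rate quoted per 10**6 hours."""
    return 1.0e6 / rate_per_million_hours


def survival_probability(rate_per_million_hours, hours):
    """Probability of no failure in `hours`, assuming a constant rate."""
    return math.exp(-rate_per_million_hours * 1.0e-6 * hours)


def visible_interval(mean_interval_hours, correctable_fraction):
    """Mean time between failures that are not masked by correction."""
    return mean_interval_hours / (1.0 - correctable_fraction)


user_rate = 8804.0                      # failures per million hours (item 3)
mtbi = mtbi_hours(user_rate)
print(f"MTBI: {mtbi:.1f} hours, or {mtbi / 24.0:.1f} days")
print(f"Two-hour survival: {survival_probability(user_rate, 2.0):.4f}")
print(f"Visible interval: {visible_interval(6.39, 0.5):.1f} hours")
```

Under these assumptions the two-hour survival probability comes out close to the 98.24% user reliability quoted under Results, and a 6.39-hour system MTBF with half of the failures corrected gives roughly the thirteen-hour interval of item 2.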

Table 2
NASF Reliability Summary

| System Component | Qty. | Elemental MTBF | Elemental Failure Rate | Functional MTTR* |
| :---: | :---: | :---: | :---: | :---: |
| CDC FMP | 1 | 10.3 | 97030.3 | 1.5 |
| CYBER 175 | 2 | 367 | 2725.0 | 1.7 |
| CYBER 18 | 1 | 362 | 2762.0 | 2.4 |
| (2551) Network Processor | 2 | 1846 | 541.7 | 1.6 |
| Programmable Device Controller | 16 | 10000 | 100.0 | 1.5 |
| 7030 ECS | 1 | 630 | 1587.3 | 1.8 |
| 677 Tape Unit | 2 | 1022 | 978.5 | 1.6 |
| 679 Tape Unit | 2 | 1022 | 978.5 | 2.2 |
| 7021 Tape Controller | 1 | 3200 | 312.5 | 1.8 |
| 885 FMD Disk | 4 | 5000 | 200.0 | 2.0 |
| 7155 FMD Controller | 2 | 8000 | 125.0 | 3.0 |
| 7881 Cartridge Storage Unit | 8 | 1730 | 578.0 | 1.5 |
| 7882 Mass Storage Transport | 16 | 960 | 1041.7 | 1.0 |
| 7880 Mass Storage Controller | 3 | 1970 | 507.6 | 4.0 |
| 819 Disk Drive | 16 | 2800 | 357.1 | 2.2 |
| 7639 Disk Controller | 4 | 5000 | 200.0 | 1.2 |
| 580 Train Printer | 4 | 442 | 2262.0 | 2.2 |
| 405 Card Reader | 2 | 1091 | 981.4 | 1.5 |
| 3447 Card Reader Controller | 2 | 24000 | 41.7 | 4.0 |
| 415 Card Punch | 1 | 1091 | 981.4 | 3.4 |
| 3446 Card Punch Controller | 1 | 14400 | 69.4 | 2.9 |
| 8271 Transfer Switch | 2 | 72000 | 13.9 | 0.5 |
| 3270 Switch Controller | 1 | 12000 | 83.3 | 2.0 |
| NASF Totals |  | 6.39 | 156393.2 |  |

*Because of system redundancy, the MTTR should not, except in the case of the FMP, be taken to be the time that the system is down when the associated equipment fails. Even in the event of an FMP failure, if the failure is in a vector pipeline or in I/O (about 20% of the time) the system can be back up in minutes.

## GLOSSARY OF RELIABILITY, AVAILABILITY AND MAINTAINABILITY TERMS

### 1.0 SCOPE

1.1 Purpose - This standard delineates a list of terms and their definitions as used in CDC on the subjects of Reliability, Availability and Maintainability (RAM). These definitions are intended to reduce inconsistencies and confusion in nomenclature.

1.2 Applicability - This standard applies to other standards in the CDC-STD 1.12.000 Series.

1.3 Effectivity - This standard is effective immediately upon release.

1.4 Authority - The enforcement of this standard is in accordance with CDC-Policy 10:04:00. Waivers from this standard are only granted via the controlling document. See CDC-Policy 10:04:30. The interpreting authority for this standard is the General Manager of CDC Technical Standards.

### 2.0 APPLICABLE DOCUMENTS

2.1 Referenced Documents

- CDC-Policy 10:04:00 CDC Technical Standards
- CDC-Policy 10:04:30 Deviations or Waivers from CDC Technical Standards
- CDC-STD 1.12.000 Reliability, Maintainability and Availability Standards

2.2 Related Documents

None

### 3.0 GLOSSARY

Not Applicable

### 4.0 REQUIREMENTS

Many of the terms are composed of two or more words. These terms appear with the noun first, followed by a comma and all modifiers. Thus, "software installation aids" would appear as "aids, software installation" and would be found with the terms starting with the letter "a".
action, repair -
A single maintenance procedure or step designed to completely correct a failure. Examples of repair actions are "replace module at Location B13" or "perform procedure 54 to re-align the read head".
availability -
The probability that an item will perform its specified operation under stated conditions at any given time.
availability, basic -
The fraction of time that a system or product is not being repaired. It is also called intrinsic availability and inherent availability. Basic Availability in fractional notation is:

Basic Availability (Ab) $=\frac{\text { Measurement Interval }-\text { Active Repair Time }}{\text { Measurement Interval }}$


CONTROL DATA CORPORATION SYSTEM STANDARD: STD 1.12.999, REV A, DATE August 1978
availability, net -
The fraction of a system's designed and expected (potential) throughput achievable on demand during a given calendar time period. Potential throughput is that which can be achieved in the absence of interruptions and scheduled maintenance. Net availability is reduced by all lost time, both system down and degraded, during scheduled operating time and scheduled maintenance time.

Net Availability (An) $=\frac{\text { Calendar Time }-\text { Lost Time }-\text { Scheduled Maintenance Time }}{\text { Calendar Time }}$

availability, user -
The fraction of a system's designed and expected (potential) throughput achievable on demand during scheduled operating time. Potential throughput is that which can be achieved in the absence of interruptions. User availability is similar to net availability except that only events occurring during scheduled operating time are counted. User availability is reduced by all lost time, both system down and degraded, during scheduled operating time caused by an interruption.

User Availability (Au) $=\frac{\text { Scheduled Operating Time }-\text { Lost Time }}{\text { Scheduled Operating Time }}$
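
The three availability ratios defined above (basic, net and user) differ only in which time base and which losses they count. A minimal Python sketch, with entirely hypothetical figures for one 30-day month, shows how the same operating record yields three different numbers:

```python
def basic_availability(measurement_interval, active_repair_time):
    """Fraction of the interval during which the item is not under repair."""
    return (measurement_interval - active_repair_time) / measurement_interval


def net_availability(calendar_time, lost_time, scheduled_maintenance_time):
    """Fraction of calendar time left after lost time and scheduled maintenance."""
    return (calendar_time - lost_time - scheduled_maintenance_time) / calendar_time


def user_availability(scheduled_operating_time, lost_time):
    """Fraction of scheduled operating time not lost to interruptions."""
    return (scheduled_operating_time - lost_time) / scheduled_operating_time


# Hypothetical month, all values in hours (not taken from the report).
calendar = 720.0
scheduled_maintenance = 20.0
scheduled_operating = calendar - scheduled_maintenance
lost = 7.0            # down plus degraded time during scheduled operation
active_repair = 5.0

print(f"Ab = {basic_availability(calendar, active_repair):.4f}")
print(f"An = {net_availability(calendar, lost, scheduled_maintenance):.4f}")
print(f"Au = {user_availability(scheduled_operating, lost):.4f}")
```

Net availability is the lowest of the three here because scheduled maintenance counts against it, whereas user availability is measured only against the scheduled operating time.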
average, running -
Current month running average $=\frac{\text { Current month average }+2 \times \text { previous month's running average }}{3}$

where the running average of the first 3 months is defined as the average for the months being measured. This is done so as not to overemphasize the first month data. In cases where the current month's value is very large compared to the previous month's running average (e.g., no failures for the month) the following equation should be used:

$$
\text { Current month running average }=\frac{3}{1 / \text { current month average }+2 / \text { previous month's running average }}
$$
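
A short Python sketch of the running-average definition above, reading the weighted form as (current + 2 x previous) / 3 and the alternate form as its harmonic counterpart 3 / (1/current + 2/previous). The divisor and numerator of 3 are inferred from the 1:2 weighting and are an assumption, as are the function names and sample values. Representing a failure-free month by `math.inf` is also an assumption made for illustration.

```python
import math


def running_average(current, previous_running):
    """Weighted mean: the previous running average counts twice."""
    return (current + 2.0 * previous_running) / 3.0


def running_average_harmonic(current, previous_running):
    """Harmonic form for months whose value is very large,
    e.g. an unbounded MTBF in a month with no failures."""
    return 3.0 / (1.0 / current + 2.0 / previous_running)


# Hypothetical monthly MTBF values in hours.
print(running_average(120.0, 90.0))
print(running_average_harmonic(120.0, 90.0))
print(running_average_harmonic(math.inf, 100.0))
```

With the arithmetic form, a failure-free month would make the running average unbounded; the harmonic form caps the result at 1.5 times the previous running average.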

call, service -
The response to a request for remedial maintenance that is attended to at the user location by one or more persons from the maintenance organization. A service call is normally occasioned by one or more incidents such as failures, misuse, or media caused failures. Service call-backs are included. Activities such as preventive maintenance and installation of modifications are excluded.
configuration, target -
The configuration which approaches the predicted typical field application of the product.
controlware -
A processor program for a particular processing unit integral to a product that provides the product a set of functional operating characteristics. Controlware is supplied as part of a product in accordance with applicable Control Data policies and procedures and is necessary for proper product operation. The programs are considered as programmed functions which may be analogous to hardware logic and are documented, maintained and supported in a similar manner.
deadstart -
The initial action taken to start a computer system when no software is resident or active in that system. Deadstart normally occurs after a total system shutdown or an unrecoverable system error and causes system initialization and operation initiation.


degradation, graceful -

The automatic and/or orderly removal of some of a system's capabilities due to a failure. This is accomplished in a manner which allows continued system operation, but with reduced capability.
detection, data error -
The process (whether by software and/or by hardware means) of recognizing that one or more bits are incorrectly transferred, stored, read or manipulated.

diagram, reliability block -
A pictorial arrangement of parts (functional blocks) which describes the separably identifiable functions of a product, equipment or system and their reliability relationships.

DPSR - Diagnostic Programming System Report -
A form, AA 4329, used by CDC maintenance software users to report maintenance software failures. The DPSR applies to released products. DPSRs are classified the same as PSR (for classification definition see PSR).
effectiveness, maintenance software -
Used with maintenance software as a specification of performance which indicates the degree with which maintenance software produces its desired result of detecting and isolating failures.
error -
The difference between an observed or calculated value and a true value, as in data error; something produced by mistake, as in design error.
error, system recovered -
The encountering of a failure or error which is detected and recovered from without 1) manual intervention, 2) loss of any system resources, 3) producing incorrect results, and 4) termination of any application abnormally. The encountering of a failure which does not result in a system down or degraded interruption.
factor, degradation -
The average percentage throughput capability lost with the specific class of failures or interruptions.

factor, duty -
The duty factor is the percentage of time an item is used. That is:
Duty Factor $=\frac{\text{Actual Usage Time}}{\text{Scheduled Operating Time}}$

## failure -

A state of inability of an item to perform its intended function. The cause may be breakage or deterioration beyond specified limits or design errors. A failure may be a data error rate which has deteriorated beyond specified limits. Multiple encounters of the same fault/failure are a single failure. (See Interruption.)
failure, elemental -
A component failure requiring replacement or adjustment within a unit whether or not the unit ceases performing. For example, a single bit failure in a single error correcting double error detecting (SECDED) memory.
failure, functional -
Any failure or combination of failures which causes a system, product or equipment to cease performing a specified function.
failure, maintenance software -
Each of the following are maintenance software failures:

- The inability to complete testing due to an error in the maintenance software program itself.


- The report of a hardware fault when no fault exists
- The incorrect identification of a hardware fault even though another fault is present in the hardware
- The inability to perform an auxiliary function due to a design error in the maintenance software
fault -
The cause of failure.
FCO -
Field Change Order. The directive to install changes to equipment after the normal manufacturing process in order that the equipment will perform to its written or implied specification.


## firmware -

A physical electronic component in which a program resides that is incorporated in a product to provide a programmable mode of operation defining the product's functional characteristics. Firmware is not self-modifiable, and is subject to change or modification only by physical modification or replacement.
incident, failure -
An occurrence requiring remedial maintenance to correct a single failure.
installability -
A characteristic of design and environment which expresses an item's ability to be configured into a system in the manner specified by a customer on his location.

interruption -
The cessation of productive processing due to the encountering of a failure. An interruption is not ended until it is followed by 15 minutes or longer of productive processing.
interruption, system degraded -
The encountering of a failure which does not result in a system down interruption or a system recovered error. A system degraded interruption 1) results in an application program terminating abnormally or producing an incorrect result, or 2) requires some manual intervention or the downing of some system capability to allow recovery.
interruption, system down -
The encountering of a failure which results in none of the user applications being correctly processed. This can be recognized by a deadstart or manual intervention being required to return the system from a down state to a productive state.
isolation, error/failure -
The process by either software and/or hardware means of localizing an error or failure to a specified level.
K-factor -
A translation modifier which relates various reliability parameters. For example: (failures) (K-factor) $=$ interruptions and (Inherent MTBF) (K-factor) $=$ observed MTBF. Specific K-factors require that the parameters being related be identified.
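
The two K-factor relations quoted in this entry can be sketched as below. This is a minimal illustration only: the function names and the numeric values are hypothetical, and the standard does not prescribe any particular K-factor.

```python
def interruptions_from_failures(failures: float, k_factor: float) -> float:
    """(failures) x (K-factor) = interruptions."""
    return failures * k_factor


def observed_mtbf_from_inherent(inherent_mtbf_hours: float, k_factor: float) -> float:
    """(Inherent MTBF) x (K-factor) = observed MTBF."""
    return inherent_mtbf_hours * k_factor


# Hypothetical values, chosen only to show that each K-factor is tied to the
# specific pair of parameters it relates.
K_FAILURES_TO_INTERRUPTIONS = 1.5   # assumption, not from the standard
K_INHERENT_TO_OBSERVED_MTBF = 0.5   # assumption, not from the standard

print(interruptions_from_failures(10, K_FAILURES_TO_INTERRUPTIONS))
print(observed_mtbf_from_inherent(2000.0, K_INHERENT_TO_OBSERVED_MTBF))
```

Keeping one named constant per relation reflects the requirement that the parameters being related by a given K-factor be identified.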
life -
The actual use time before a product will be scrapped or require a major refurbishment.
mainframe -
An organized collection of directly connected hardware products, equipments, parts and accessories consisting of a single central memory, one or more central processors, peripheral processors, channels and control consoles.


## maintainability -

A characteristic of design and environment which is expressed as the probability that an item will be retained in or restored to a specified condition within a given period of time, when the maintenance is performed in accordance with prescribed procedures and resources.
maintenance -
Any activity to repair a product or correct supporting documentation in order to eliminate or prevent errors or failures in a post-release product.
maintenance, preventive -
A procedure of periodically checking and/or re-conditioning a system or unit to prevent or reduce the probability of failure or deterioration while in service.
maintenance, remedial -
Those activities where a technician is working on a unit or system on a customer installation to make the unit or system operational except those activities that are considered preventive maintenance; associated repair or check-out.
management, reconfiguration -
A procedure that manipulates the organization of system resources such that the system can continue to perform useful work when one or more system elements are unavailable to the user.
margin selection, maintenance software -
The capability within maintenance software to run with hardware margins selected either manually or under control of the maintenance software.

MLT - Mean Lost Time -
The average lost time per interruption or class of interruption over the time period being measured.
modularity, spares -
Packaging replaceable hardware sub-assemblies in a manner that minimizes the sum of per unit manufacturing cost plus field replacement costs.
modularity, system -
The organization of system elements such that logical functions with specified interfaces can be easily distinguished and, when necessary, logically and physically isolated from one another.

MTBF - Mean Time Between Failures -
The average time from the start of one fault or failure to the start of the next. The specific time base used is to be indicated in the context of the MTBF usage.

MTBF is Mean Time Before Failure in some reliability prediction equations. In this usage MTBF is the average time from the end of a failure to the beginning of the next failure. This meaning of MTBF is not used in the RAM standards. When non-operable time is only a few percent or less of the total time, then the two uses of MTBF are approximately equal.
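
The difference between the two usages can be illustrated with a short sketch. It assumes, for illustration only, that each cycle consists of one operable interval followed by one non-operable interval; the function names and hour values are hypothetical.

```python
def mtbf_start_to_start(mean_operable_hours: float, mean_non_operable_hours: float) -> float:
    """MTBF as used in the RAM standards: start of one failure to start of the next."""
    return mean_operable_hours + mean_non_operable_hours


def mtbf_end_to_start(mean_operable_hours: float) -> float:
    """'Mean Time Before Failure': end of one failure to the beginning of the next."""
    return mean_operable_hours


def relative_difference(mean_operable_hours: float, mean_non_operable_hours: float) -> float:
    """Fraction by which the two usages differ, relative to the start-to-start value."""
    total = mtbf_start_to_start(mean_operable_hours, mean_non_operable_hours)
    return (total - mtbf_end_to_start(mean_operable_hours)) / total


# Hypothetical cycle: 490 operable hours and 10 non-operable hours.
print(mtbf_start_to_start(490.0, 10.0))
print(mtbf_end_to_start(490.0))
print(relative_difference(490.0, 10.0))
```

With non-operable time at 2 percent of the total, the two values differ by that same 2 percent, which is the sense in which the usages are approximately equal.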

MTBF, Inherent -
An MTBF number derived from component failure rates (anticipated stress levels should be considered). No considerations are made for possible poor design or poor manufacturing or inadequate service.

MTBI - Mean Time Between Interruptions -
The average time from the start of one interruption to the start of the next. The specific time base used is to be indicated in the context of the MTBI usage.


```
MTBSC - Mean Time Between Service Calls -
    The average time between service calls initiated by customer request (i.e., system
    failures, service call backs, misuse, media caused failures, etc., excluding P.M.
    and installation of modifications); obtained by dividing total time by the number
    of service calls.
MTTR - Mean Time to Repair -
    The overall average time it takes to diagnose a machine fault, repair it, and
    adequately verify the operation after repair; obtained by dividing the total
    unscheduled repair time by the number of repair incidents. This calculation
    average does not include associated repair, travel time or wait time.
product -
    A hardware, software, or supply item that is saleable to a customer.
product set -
    The complement of software (excluding the Operating System) supplied by the vendor
    for use by the customer in writing application programs, storing and manipulating
    data, e.g., language processors, SORT/MERGE, Data Management.
program, application -
    Software written by a user or supplied by a vendor to solve a particular problem
    related to the user's business, e.g., payroll, linear programming package.
program, maintenance software -
    A software program that detects and/or isolates, that facilitates repair, that
    aids in adjustment, or that confirms repair of a hardware failure.
PSR - Program System Report -
    A report used by CDC software users to report software failures and errors. PSRs
    are classified into categories of criticality. They are critical, urgent, serious,
    minor, and informational. (Similar to TAR for hardware, firmware and controlware).
PSR, critical -
    A PSR category where the reported failure results in frequent (1 or more per day)
    system downs and/or a major project stalled through software problems.
PSR, informational -
    A PSR category in which errors in comments, coding techniques, or documentation are
    reported. Code change is not required.
PSR, minor -
    A PSR category where reported failures result in inconsistencies or irregularities
    that require a code correction. (The category refers only to the urgency of the
    need for software maintenance). Items of inconvenience or of minor or local
    consequence should preferably be in this category.
PSR, serious -
    Problems that definitely need to be fixed at once, but for some reason are below
    urgent or critical. For example, a PSR belongs in this category if the problem can
    be circumvented, if a local or temporary fix is available, or if it is an urgent
    problem that only occurs rarely or under unusual circumstances.
PSR, urgent -
    Regular system crashes (more than 1 per week); substantial user difficulties. High
    probability of serious problems (such as bugs in error recoveries, etc.).
RAM -
    Reliability, Availability and Maintainability.
```
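
The MTBSC and MTTR entries above each define a simple quotient. A minimal sketch follows; the function names and sample figures are hypothetical, and the exclusions named in the definitions (P.M., installation of modifications, associated repair, travel and wait time) are assumed to have been removed from the inputs beforehand.

```python
def mtbsc(total_hours: float, service_calls: int) -> float:
    """Mean Time Between Service Calls: total time / number of service calls."""
    if service_calls <= 0:
        raise ValueError("at least one service call is needed to compute MTBSC")
    return total_hours / service_calls


def mttr(total_unscheduled_repair_hours: float, repair_incidents: int) -> float:
    """Mean Time to Repair: total unscheduled repair time / number of repair incidents."""
    if repair_incidents <= 0:
        raise ValueError("at least one repair incident is needed to compute MTTR")
    return total_unscheduled_repair_hours / repair_incidents


# Hypothetical month: 720 hours, 4 customer-initiated service calls,
# 6 hours of unscheduled repair spread over 4 repair incidents.
print(mtbsc(720.0, 4))
print(mttr(6.0, 4))
```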



## recovery, data error -

The process of amending or repeating a data transfer, store, read or manipulation, which resulted in a data error, to produce the correct result.

release, field -
A term used in the RAM standards to indicate the point in an item's life when it has been certified.

## reliability -

The characteristics of an item expressed as the probability that it will perform a required function under stated conditions for a stated period of time.
repair, associated -
The repair of a replaceable module, subassembly, or product after it has been replaced by a like module in the user's system. Example, the action of repairing spare parts after the product has been returned to service.
software, fail-soft maintenance -
Maintenance software which is an integral part of the operating system software and other system software and which provides failure management capabilities; i.e., dynamic hardware failure detection, error logging or recovery activities.
software, hardware checkout -
Software designed for and used only during the engineering or manufacturing checkout of various hardware devices. It may be required when the checkout requirements cannot be satisfied by Hardware Design Verification Software.
software, hardware design verification -
Software designed for and used in the process of hardware or microcode design verification testing. It may be required where design verification requirements cannot be met by off-line Maintenance Software.

## software, in-line maintenance -

Maintenance software designed for use in field maintenance of hardware which operates within a subsystem independent of the operating system and which may be used concurrently with customer use of the subsystem.
software, maintenance -
Any computer program code and associated documentation, used for maintenance of released products, that detects and/or isolates failures, facilitates repair, aids in adjustment, and confirms normal operation of hardware.

software, off-line maintenance -
Maintenance software designed for use in the field maintenance of hardware and which does not operate concurrent with customer operations.
software, on-line maintenance -
Maintenance software designed for use in field maintenance of hardware and which operates under control of the operating system concurrent with customer operations.
subsystem -
An organized collection of hardware together with any necessary software, controlware, and/or firmware components operating within a system and performing functions assigned to it by the system. For example, a collection of tape devices, controller, controlware, and software devices is a magnetic tape subsystem. A processor and memory without the coded instructions necessary to process data would not be a subsystem.
system, computer -
An organized collection of interrelated software and hardware products, accessories and parts that are directly interconnected, contain only one mainframe under control of a single copy of an operating system, and are designed to perform data processing functions.
systems, network of -
An organized connection of computer systems, software and hardware products, accessories and parts interconnected or interrelated in such a manner as to perform data processing functions.


```
system, operating -
    Software which guides a processing system in the performance of its tasks by controlling
    the execution of computer programs and by providing support services to programs
    and programmers, e.g., scheduling, debugging, input-output control, etc.
TAR - Technical Action Request -
    A report used by CDC hardware, firmware, and controlware users to report product
    design failures and errors. (Similar to PSR for software).
test, margin -
    Test performed to provide information relative to a system's (unit's) ability to
    operate under the full range of design parameters. Normally accomplished by varying
    voltages and/or frequency.
test, product verification -
    A test of a product or equipment in its operating (system) environment to determine
    that its operation, maintainability, and reliability meet the design criteria. The
    test includes the use of operating system and application programs, the use of
    maintenance procedures and diagnostic programs and operation over a prolonged
    period of time. The product verification test is generally performed on a
    preproduction or production unit.
testers, maintenance -
    Equipment external to the system that initiates and performs fault detection and
    isolation and facilitates repair and adjustment.
time, active repair -
    The interval during which activities occur at the user's location that are associated
    with implementing corrective or avoidance actions. Only those activities required
    to return the system or products to an operational state following the failure are
    included. Sometimes referred to as unscheduled repair time. For software this
    includes such things as dumping files, analysis, PSR documentation, installing
    corrective code, etc. The deferred installation of corrective code or PSR is
    considered scheduled maintenance.
time, actual usage -
    The interval or accumulation of intervals during which an item is performing one
    or more of its intended functions.
time, administrative and logistic wait -
    The interval during which support personnel or materials are not available.
time, calendar -
    Calendar time is the elapsed interval of time during the measurement period, expressed
    in hours, days or months, as appropriate.
time, down -
    The sum of active repair time, analysis time and administrative and logistic wait
    time which takes the system down during scheduled operating time. The interval
    during scheduled operating time when the item is inoperative.
time, analysis -
    The interval the user spends determining that maintenance service is required.
time, lost -
    The effective time that a system is in a totally unacceptable state for productive
    work as a result of interruptions. Lost time is the actual time lost to the user
    due to total or partial loss of system processing capability plus compensatory time
    for any reprocessing necessitated by interruptions. The following times, if
    present, shall be included in the lost time calculation.
        - analysis time
        - administrative and logistics wait time
        - active repair time
        - reprocessing time
```




Under degraded conditions, elapsed time does not represent lost time since some useful work was processed during this interval. Lost time due to degradation is the product of the degradation factor and the time the system was in that degraded condition -- i.e., the time between the detection of an interruption and the point in time the user's work has been restored to the state it would have been in had the interruption not occurred. NOTE: Time in a degraded condition need not be contiguous -- e.g., rerun may be delayed, resulting in a fully acceptable productive state between the interruption and the commencing of rerun.

The interval that service personnel are not allowed access to the item needing service is not included in the lost time calculation. The effect of system recovered errors on system throughput is not included in lost time.
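
The degraded portion of lost time described above can be sketched as a sum over degraded intervals, each weighted by its degradation factor. This is an illustration only: the class and function names are hypothetical, and the degradation factor is expressed here as a fraction (0.25 for 25 percent) rather than a percentage.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DegradedInterval:
    hours: float               # elapsed time spent in the degraded condition
    degradation_factor: float  # fraction of throughput capability lost (0.0 to 1.0)


def degraded_lost_time(intervals: Iterable[DegradedInterval]) -> float:
    """Lost time due to degradation = degradation factor x time in that condition.

    The intervals need not be contiguous, so each one is weighted separately
    and the results are summed.
    """
    total = 0.0
    for interval in intervals:
        if not 0.0 <= interval.degradation_factor <= 1.0:
            raise ValueError("degradation factor must be between 0.0 and 1.0")
        total += interval.hours * interval.degradation_factor
    return total


# Hypothetical case: 2 hours at 25 percent loss, then a delayed 1-hour rerun
# during which half of the throughput capability is lost.
intervals = [DegradedInterval(2.0, 0.25), DegradedInterval(1.0, 0.5)]
print(degraded_lost_time(intervals))
```

In this hypothetical case three elapsed hours count as one hour of lost time, which is why elapsed time alone does not represent lost time under degraded conditions.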
time, reprocessing -
The sum of restoration time and rerun time.
time, rerun -
The amount of time required to return work in process to the state that it should have been in when the failure which caused the interruption or erroneous result was detected.
time, restoration -
The amount of time required to restore the operating system, and auxiliary subsystems, because of an interruption or erroneous result. The following activities, if present, shall be included in restoration time.

- restoration of remote devices
- system reinitialization
time, scheduled maintenance -
The dedicated system time to perform preventive and remedial maintenance and to install corrective PSRs and FCOs. Where this scheduled maintenance activity is performed concurrently, it is handled like degraded lost time with the use of a degradation factor. The installation of new features or products is not considered scheduled maintenance time.
time, scheduled operating -
The interval allocated in advance for the system to be operational for the user.
unit, accounting -
A single item in which the resources used by a job or required for a terminal session are combined including memory field length, CPU time, mass storage usage, magnetic tape usage, permanent file usage and unit record usage. The accounting unit of the NOS operating system is the System Resource Unit (SRU) and its specific definition is contained in CDC document 60435700 "NOS Installation Handbook".


## Appendix B <br> Reliability Module Failure Rates

II. LSI Half Pack Module

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| Half Pack | 1 | 0.0412 | 0.0412 |
| Terminators | 3 | 0.001 | 0.003 |
| Ceramic Capacitor | 1 | 0.002 | 0.002 |
| Half Pack Connector (26 pin) | 1 | 0.048 | 0.048 |
| Solder Joints | 2 | 0.0003 | 0.0006 |
| TOTAL |  |  | 0.0948 |

III. LSI Board Module

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| Coax Connections | 1,260 | 0.0028 | 3.528 |
| Vias | 18,500 | 0.00005 | 0.925 |
| Ceramic Capacitors | 330 | 0.002 | 0.660 |
| Solder Joints | 660 | 0.0003 | 0.198 |

IV. F100K Module

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| F100K Device | 1 | 0.0240 | 0.024 |
| Terminators | 3 | 0.001 | 0.003 |
| Vias | 30 | 0.00005 | 0.0015 |
| Solder Joints | 24 | 0.0003 | 0.0072 |
| TOTAL |  |  | 0.0357 |

V. Average Auxiliary Board Module

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| Edge Connectors | 338 | 0.002 | 0.676 |
| Ceramic Capacitors | 150 | 0.002 | 0.30 |
| Solder Joints | 300 | 0.0003 | 0.09 |
| F100K/Board (weighted avg.) | 45 | 0.0357 | 1.607 |
| TOTAL |  |  | 2.673 |

VI. Bus Board Assembly

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| Tantalum Capacitors | 150 | 0.014 | 2.10 |
| Solder Joints | 300 | 0.0003 | 0.09 |
| TOTAL |  |  |  |

VII. RAM Module (4096-bit ECL)

|  | Qty. | Failure Rate ($\times 10^{-6}$) | Total |
| :--- | ---: | :--- | ---: |
| RAM Device | 1 | 0.07 | 0.07 |
| Solder Joints | 18 | 0.0003 | 0.0054 |
| Vias | 22 | 0.00005 | 0.0011 |
| Connector Pins | 0.94 | 0.002 | 0.0019 |
| TOTAL |  |  |  |
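
Each module total in this appendix is the sum of quantity times failure rate over the module's line items. The sketch below recomputes the three totals that are legible in the tables (modules II, IV and V) from their line items. The data structure, function names and tolerance are assumptions made for illustration, and the tolerance allows for rounding in the printed figures.

```python
# Line items as (quantity, failure rate x 10^-6), followed by the stated total.
MODULES = {
    "II. LSI Half Pack Module": (
        [(1, 0.0412), (3, 0.001), (1, 0.002), (1, 0.048), (2, 0.0003)],
        0.0948,
    ),
    "IV. F100K Module": (
        [(1, 0.0240), (3, 0.001), (30, 0.00005), (24, 0.0003)],
        0.0357,
    ),
    "V. Average Auxiliary Board Module": (
        [(338, 0.002), (150, 0.002), (300, 0.0003), (45, 0.0357)],
        2.673,
    ),
}


def module_total(items):
    """Sum of quantity x failure rate over a module's line items."""
    return sum(quantity * rate for quantity, rate in items)


def agrees(computed, stated, tolerance=1e-3):
    """True when the recomputed total matches the stated total within rounding."""
    return abs(computed - stated) <= tolerance


for name, (items, stated) in MODULES.items():
    total = module_total(items)
    print(f"{name}: computed {total:.4f}, stated {stated}, agrees: {agrees(total, stated)}")
```

Note that the F100K Module total (0.0357) reappears as the per-device rate in the Average Auxiliary Board Module, so lower-level module totals feed the higher-level ones.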

Appendix C
Component Elemental Failure Rates
Used in FMP Evaluation


## COMPONENT PIECE PART FAILURE RATES


| STD | 1.12.020 |
| :--- | :--- |
| REV | A |
| DATE | August 1978 |
| PAGE | 1 of 12 |

PREDICTING
RELIABILITY, AVAILABILITY AND MAINTAINABILITY PARAMETER VALUES IN HARDWARE AND SOFTWARE
1.0 SCOPE
1.1 Purpose - This standard defines methods of establishing Reliability, Availability and Maintainability (RAM) parameter values required by CDC Std 1.12.006 - Specifying and Measuring RAM, by using reliability prediction techniques. The use of the types of predictions described will provide a consistency and commonality for evaluating the predicted RAM performance of a product during its evolution. Use of the rates and factors contained in Appendices A and B will provide a common base of design information for performing MTBF predictions.
1.2 Applicability - This standard applies to all products intended to be offered for sale or lease by CDC unless specifically excluded by customer contractual conditions. The use of "products" in this document refers to modules, equipments, software, products, subsystems and systems.
1.3 Effectivity - This standard is effective immediately upon its release. This standard supersedes Standard Bulletin DO03.
1.4 Authority - The enforcement of this standard is in accordance with CDC Policy 10:04:00. The interpreting authority for this standard is the General Manager, CDC Technical Standards.
2.0 APPLICABLE DOCUMENTS
2.1 Referenced Documents
CDC-POLICY 10:04:00-CDC Technical Standards
CDC-STD 1.12.006 - Specifying and Measuring RAM (Not yet released)
CDC-STD 1.12.999 - Glossary of RAM Terms (Not yet released)
MIL-HDBK-217B - Reliability Prediction of Electronic Equipment
MIL-STD 756A - Reliability Prediction
CDC-Tech Memo 19 - Reliability Growth Prediction Procedure
Proceedings of 1968 Symposium on Reliability
2.2 Related Documents
CDC-Tech Memo 6 - Reliability Goals
MIL-HDBK-472 - Maintainability Prediction
CDC Pub-60435200 - Investment Decision Model
3.0 GLOSSARY
Refer to CDC-STD 1.12.999 - Glossary of RAM Terms.

### 4.0 REQUIREMENTS

Expected RAM parameter values, when specified in the following controlling documents, shall be determined (predicted) using the types of predictions and failure rates established by this standard (See Figure 1). Also, where predictions are required as a part of a design evaluation or system evaluation, the types established by this standard shall be used so consistency is maintained in comparative situations. Note: Design and Release predictions are only applicable to hardware MTBF predictions. (See Expository Remarks no. 2 and no. 3)



Figure 1 - TYPES OF RAM PREDICTION ASSOCIATED WITH VARIOUS CONTROLLING DOCUMENTS

4.1 Strategy and Marketing Documents - Reliability parameters to be specified in Strategy and Marketing Requirement Documents shall be based on a Market Analysis. (See 6.1)
4.2 Design Objectives Documents - Reliability parameters to be specified in Design Objectives Documents shall be based on a tradeoff between capabilities of the design and the market needs as expressed in the strategies and marketing requirements. The RAM values specified in Design Objectives Documents will give priority to market needs versus capabilities of design. (See 6.1, 6.2, and 6.3)
4.3 Design Requirements Documents - RAM parameters to be specified in Design Requirements Documents shall be based on the design and/or extrapolated predictions (see 6.2 and 6.4).
4.4 Engineering Specifications Documents - RAM parameters to be specified in Engineering Specification documents shall be design or release predictions. Should the design or release RAM parameter values not meet the Design Requirement values (see 6.2 and 6.4), a decision must be made to either hold the manufacturing release and continue the design activity or change the Design Requirement values to agree with the release values by formal DR revision and approval.
4.5 Certification - MTBF predictions used for validating hardware for release shall be Release Predictions. (See 6.5)

### 5.0 RESPONSIBILITIES

As with other standards, responsibility for implementation and enforcement rests with division management. Responsibility for updating failure rates and application factors is defined in the last paragraph of the Preface to Appendix A.

### 6.0 PROCEDURE

6.1 Market Analysis - The market analysis approach of establishing a RAM requirement is based on an analysis of market need and competition. Analysis of CDC and competitive RAM performance trends and expected technological advancements are also to be used in this prediction.
6.2 Extrapolated Predictions - Extrapolated predictions are projections based on historical data on similar Control Data and competitive products. Known RAM data on similar or predecessor products and the growth characteristic of such data are used to extrapolate or "predict" the RAM values for the proposed product. The extrapolated prediction is not based on a compilation of component/part failure or repair rates.
6.3 Allocated Predictions - Allocated predictions are RAM requirements assigned to individual products to attain a desired overall system RAM. These types of predictions are suitable where the overall system RAM and some product RAM requirements are specified and the remaining product RAM requirements are to be determined.
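
One way to picture an allocated prediction is sketched below. The standard does not give an allocation rule, so the sketch rests on two labeled assumptions: the products form a series system with constant failure rates (so failure rates add and the system MTBF is the reciprocal of their sum), and the remaining failure rate budget is split equally among the products not yet specified. The function name and numbers are hypothetical.

```python
def allocate_remaining_mtbf(system_mtbf_hours, specified_mtbfs_hours, unspecified_count):
    """Return the MTBF each unspecified product must meet.

    Assumptions (not from the standard): series system, constant failure rates,
    equal split of the remaining failure rate budget.
    """
    if unspecified_count <= 0:
        raise ValueError("there must be at least one product left to allocate")
    system_rate = 1.0 / system_mtbf_hours
    specified_rate = sum(1.0 / mtbf for mtbf in specified_mtbfs_hours)
    remaining_rate = system_rate - specified_rate
    if remaining_rate <= 0.0:
        raise ValueError("specified products already use up the system requirement")
    per_product_rate = remaining_rate / unspecified_count
    return 1.0 / per_product_rate


# Hypothetical system: 1000-hour system MTBF requirement, two products already
# specified at 4000 hours each, two products still to be allocated.
print(allocate_remaining_mtbf(1000.0, [4000.0, 4000.0], 2))
```

Other allocation rules, such as weighting by product complexity, fit the same pattern by changing how the remaining rate is divided.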
6.4 Design Predictions - (Applicable only to hardware MTBF predictions) Design predictions are based upon a design strategy as represented by a reliability block diagram. Design predictions produce inherent MTBF values which must be translated into expected observed predictions by use of K factors. The specific K factor used and the rationale for its selection should be documented as part of the prediction. (See Expository Remark 1) The procedure, using MIL-STD 756A - Reliability Prediction, as a guideline for preparing a design prediction is as follows:
6.4.1 Product Definition - The product for which the prediction is being made is defined in terms of:

- functional and physical boundaries
- conditions which constitute failures
- conditions under which the product is to operate
- required maintenance conditions
6.4.2 Reliability Block Diagram and Reliability Model - From the descriptions of the product definition, above, a reliability block diagram is constructed. Each block of the diagram is identified and any assumptions and simplifications are clearly stated. A mathematical equation (model) is derived based upon the relationships described by the block diagram. (see 6.4.4 for applicable assumptions)
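
A minimal sketch of a design prediction follows, for the simplest reliability model only: every block in series with a constant failure rate, which is an assumption made here and not a requirement of the standard. Block failure rates are taken in failures per million hours, the unit used in Appendix A. The function names, block rates and K factor value are hypothetical.

```python
HOURS_PER_MILLION = 1_000_000.0


def inherent_mtbf_hours(block_rates_per_million_hours):
    """Series model: block failure rates add; inherent MTBF is the reciprocal of the sum."""
    total_rate = sum(block_rates_per_million_hours)
    if total_rate <= 0.0:
        raise ValueError("total failure rate must be positive")
    return HOURS_PER_MILLION / total_rate


def expected_observed_mtbf_hours(inherent_hours, k_factor):
    """Translate an inherent MTBF into an expected observed MTBF with a K factor."""
    return inherent_hours * k_factor


# Hypothetical two-block product: 4.0 and 6.0 failures per million hours.
inherent = inherent_mtbf_hours([4.0, 6.0])
observed = expected_observed_mtbf_hours(inherent, 0.5)  # K = 0.5 is an assumed value
print(inherent, observed)
```

A block diagram with redundant (parallel) paths would need a different equation; the series case is shown only because it is the simplest model to derive from a block diagram.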



# APPENDIX A <br> COMPONENT/PIECE PART FAILURE RATES

## PREFACE

The failure rates listed in the following table reflect current capabilities of the individual components/piece parts under nominal stress levels. These failure rates are to be used in predictions as required and discussed in the prediction standard to which this appendix is attached. Prior to the use of the following tables, it is strongly recommended that the preface be read in full in order to obtain a clear understanding of the basis and underlying assumptions to the failure rate data.

## Failure Definition/Units

When using the enclosed failure rates, note that the term "failure" is defined as an open, short, or parameter change greater than specified tolerance. These rates are based primarily on solid failures and do not necessarily include the effect of intermittents and transients. The failure rates are inherent failure rates for each generic part type. The term "inherent" is defined to mean the reliability that will be observed on a mature component in a mature application. Both the component and the application have had sufficient power-on time to have passed "infant mortality". When calculating observed failure rates to compare to these inherent numbers, the calculation should be done to a $60 \%$ confidence level.

The unit of measurement for failure rate is "failures per million hrs."

## Stress Levels

The inherent failure rates are defined for nominal stress conditions. Assumptions include a junction temperature of $45^{\circ} \mathrm{C}$ and unless otherwise specified all semiconductor packages are hermetically sealed. All components are assumed to be in a ground benign environment which is defined by nearly zero environmental stress with optimum engineering operation and maintenance.

## Source of Data

The failure rate source codes are A - CDC data; B - Other Manufacturer's data; C - Component Industry data; D - Defense (MIL-HDBK-217B)/NASA; and E - Engineering judgment. They are listed in their order of precedence. The most accurate data applicable to the types of components and equipment that CDC uses and produces is, naturally, data from existing CDC equipment. Military data, generally being compiled from environments and equipment different from those of CDC and manufacturers of equipment similar to CDC's, carries somewhat less weight. When no data exists to support an inherent failure rate for a component, an engineering judgment must be made. It is based on a reliability comparison with a component which has a known failure rate. Factors which influence this comparison include electrical complexity, power dissipation, technology employed with its associated strengths and weaknesses, and materials.

## Updating Responsibilities

As a result of manufacturers continuously improving their products (components/piece parts) and users becoming more sophisticated in the application of those products, a continual change in the failure rates of those components/piece parts can be expected. In order to stay abreast of these changes, it's necessary to implement a mechanism for providing periodic updates to this appendix. The primary input for this mechanism will be the users of the data contained herein. All such users are strongly urged to submit recommended changes which they believe would improve the validity of the tables' contents. Such changes should be sent to CDC Technical Standards in care of J. E. Mikkonen, HQWllH, with a discussion of the recommended changes and supportive data on the change. No more frequently than quarterly, all such changes will be reviewed by Reliability Engineers from various CDC divisions. At the completion of a successful review process, an updated failure rate table will be published and distributed.



## COMPONENT/PART FAILURE RATES

| Component/Part Description | Failures Per Million Hours | Source Code | Change Date |
| :---: | :---: | :---: | :---: |
| Section 1 Microcircuits |  |  |  |
| ECL 10K SSI | .01 | A/C | 1/3/77 |
| ECL 10K MSI | .01 | A/C | 1/3/77 |
| ECL 10K Transmitters/Receivers/Interface Circuits | .02 | A/C | 1/3/77 |
| ECL 10K |  |  |  |
| 10101 | .0094 |  |  |
| 10102 | .0094 | D | 3/7/77 3/7/77 |
| 10105 | .00080 | D | 3/7/77 3/7/77 |
| 10109 |  |  |  |
| 10110 | .0065 | D | 3/7/77 |
| 10114 | .0065 | D | 3/7/77 |
| 10117 | .0094 | D | 3/7/77 3/7/77 |
| 10121 | .0094 | D |  |
| 10124 | .0094 | D | 3/7/77 |
| 10125 | .0094 | D | 3/7/77 |
| 10330 | .0312 | D | 3/7/77 |
|  | .012 | D | 3/7/77 |
| 10133 | .013 | D | 3/7/77 |
| 10135 | .019 | D | 3/7/77 |
| 10136 | .035 | D | 3/7/77 |
| 10141 | .032 | D | 3/7/77 3/7/77 |
| 10145 | .01 | D | 3/7/77 5/15/78 |
| 10160 | .015 |  |  |
| 10161 | .017 | D | 3/7/77 |
| 10165 | .017 | D | 3/7/77 |
| 10166 | .023 | D | 3/7/77 5/15/78 |
| 10173 | .021 |  |  |
| 10176 | .018 | D | 5/15/78 |
| 10179 | .017 | D | 3/7/77 |
| 10181 | .045 | D | 3/7/77 |
| 10192 | .03 | D | 3/7/77 |
| 10800 Micro Processor | .15 | D/E |  |
| 10803 Interface Memory | .83 | D/E | 11/10/77 11/10/77 |
| ECL 10K RAMs |  |  |  |
| 256 bits | .035 | C/D |  |
| 1024 bits | .07 | C/D | 1/3/77 |




## Section 3 Resistors

## Section 4 Capacitors

Ceramic
Electrolytic, Aluminum
Electrolytic, paper
Mica dipped
Mica molded
Mica button
Mylar
Paper/plastic
Paper/Plastic
Tantalum
Variable Air

## Section 5 Inductive Devices



## Section 6 Connectors/Connections

Edgeboard Connector

Mainframe Environment
Peripheral Environment
PC board conn. (3500 style)
PC board conn. ( 6000 style)
PC board conn. (7000 style)
Conn. Pins, Cable Connector
coax Connector (includes inner and outer contact)

Power Connector
DIP Sockets, gold
single contact
dual contact
Multiple contact
Solder Joints
Plated thru hole .00015
Surface/lap
Surface/lap
Non-plated thru hole
Other hand solder
Taper Pins
Wire wraps
Crimp Joints
.002 per pin
.006 per pin
.0036 per pin
.002 per pin
.0017 per pin
.00013 per pin
. 0014
.002 per pin
.003 per pin
.002 per pin
.001 per pin
.00044
.0044
.00017
.0000037
.000132

A 1/3/77
1/3/77
1/3/77
1/3/77
1/3/77

1/3/77
1/3/77
1/3/77
1/3/77
1/3/77

1/3/77
1/3/77
1/3/77
1/3/77
1/3/77
1/3/77

## Section 7 Refrigeration and Cooling

| Component/Part Description | Failures Per Million Hours |
| :--- | ---: |
| Regulator, Hot Gas Bypass | 2.650 |
| Valve, Water Regulating | 2.650 |
| Valve, Expansion | .589 |
| Valve, Angle, Refrigeration |  |
| Valve, Solenoid (MB1452) | 1.990 |
| Valve, Solenoid (MB952) | 1.990 |
| Condenser | 2.650 |
| Compressor, 2-Ton | 1.330 |
| Filter, Drier | .300 |
| Fitting, Fusible Half-Union | 2.650 |
| Gauge, Pressure | 1.300 |
| Control, Dual Pressure | .320 |
| Eliminator, Vibration | .039 |
| Joints, Flare | .040 |
| Joints, Threaded | .040 |
| Quick Disconnect | .800 |


| Source Code | Change Date |
| :--- | ---: |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| C | 5/15/78 |
| A/E | 5/15/78 |
| A/E | 5/15/78 |
| C |  |
| C |  |
| C |  |
| C |  |
| C |  |




## APPENDIX B

COMPONENT APPLICATION FACTORS


#### Abstract

The standard component failure rates in Appendix A are established in consideration of standard application environments. Application in either a more relaxed or more severe operating environment normally will affect the failure rate. The presentation of this data assumes that the majority of components in any given equipment will be operated at or below these conditions. New input to keep this data current with the state of the art is solicited for consideration in future revisions.

General - Cooling Air Temperature: $25^{\circ} \mathrm{C}$

Semiconductors


Integrated Circuits

| Operating Junction Temperature | $45^{\circ} \mathrm{C}$ |
| :--- | :--- |
| Operating Voltage - Digital Circuits | $\pm 5 \%$ of mfg. nominal |
| Operating Voltage - Linear Circuits | $75 \%$ of maximum rating |

Transistors, Silicon

| Operating Junction Temperature | $25^{\circ} \mathrm{C}$ below maximum rating |
| :--- | :--- |
| Voltage and Current Ratings | $75 \%$ of maximum rating |

Diodes \& Rectifiers, Silicon

| Operating Junction Temperature | $25^{\circ} \mathrm{C}$ below maximum rating |
| :--- | :--- |
| Voltage and Current Ratings | $75 \%$ of maximum rating |

Resistors

Carbon Composition; Carbon Film; Metal Film; and Wire Wound, Power

| Power Dissipation | $50 \%$ of maximum rating |
| :--- | :--- |
| Operating Voltage | $75 \%$ of maximum rating |

## Capacitors

Ceramic, Glass, Paper, Mylar, Mica

| Operating Voltage | $75 \%$ of maximum rating |
| :--- | :--- |

Electrolytic

| Operating Voltage | $90 \%$ of working voltage |
| :--- | :--- |
| Ripple Current Effect | maximum temperature rise of $15^{\circ} \mathrm{C}$ above ambient |

Tantalum

| Operating Voltage | $75 \%$ of working voltage |
| :--- | :--- |

Transformers \& Inductors

| Operating Temperature | $15^{\circ} \mathrm{C}$ below maximum insulation rating |
| :--- | :--- |
| Voltage Rating | $75 \%$ of working dielectric rating |

Fuses

| Operating Current | $75 \%$ of nominal rating |
| :--- | :--- |

Switches \& Relay Contacts (other than dry circuit conditions)

$75 \%$ of nominal rating

FMP Logic Unit Elemental Failure Rates
I. Scalar

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 45 | 121.19 |
| LSI Boards | 16 | 84.98 |
| LSI Arrays | 1967 | 403.24 |
| Half Packs | 17.0 | 16.12 |
| Clock Oscillator | 1 | 4.97 |
| Bus Board Assemblies | 16 | 35.04 |
| TOTAL --> |  | 665.54 |

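II. Swap

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| LSI Boards |  |  |
| LSI Arrays | 272 |  |
| Bus Board Assemblies | 2 |  |

III. Intermediate Map

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 10 |  |
| LSI Boards | 2 |  |
| LSI Arrays | 204 |  |
| Bus Board Assemblies | 2 |  |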

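IV. Main Map

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 10 | 26.73 |
| LSI Boards | 3 | 15.93 |
| LSI Arrays | 304 | 69.70 |
| Bus Board Assemblies | 3 | 6.57 |
| TOTAL --> |  | 118.93 |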

V. Memory Interchange

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| LSI Boards | 12 | 63.73 |
| LSI Arrays | 1632 | 334.56 |
| Bus Board Assemblies | 12 | 26.28 |
| TOTAL --> |  | 424.57 |

VI. Vector Streaming

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 10 | 26.73 |
| LSI Boards | 4 | 21.25 |
| LSI Arrays | 476 | 97.58 |
| Bus Board Assemblies | 4 | 8.76 |
| TOTAL --> |  | 154.32 |

VII. Vector

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 200 | 534.6 |
| LSI Boards | 45 | 239.00 |
| LSI Arrays | 4760 | 975.80 |
| Bus Board Assemblies | 45 | 98.55 |
| TOTAL --> |  | 1847.95 |

VIII. Streaming Control

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 20 | 53.46 |
| LSI Boards | 1 | 5.31 |
| Bus Board Assembly | 1 | 2.19 |
| TOTAL --> |  | 60.96 |

IX. I/O

| Modules | Quantity | Failure Rate |
| :---: | :---: | :---: |
| Auxiliary Boards | 20 | 53.46 |
| LSI Boards | 4 | 21.25 |
| LSI Arrays | 408 | 83.64 |
| Bus Board Assemblies | 4 | 8.76 |
| TOTAL --> |  | 167.11 |

## Unit Functional Failure Rate

A unit which has fault correcting capability will have a functional failure rate different from its elemental failure rate (see appendix A, CDC-STD 1.12.999, Glossary of Reliability, Availability, and Maintainability Terms).

The functional failure rate for the unit will be the sum of the elemental failure rate of that portion not included within the fault correcting part plus the functional failure rate of the fault correcting part.

To determine a unit's functional failure rate, the elemental failure rate is first computed by summing the products of the part type failure rates times the number of parts of each type (see appendix E). From this is subtracted the elemental failure rate of the fault correcting part of the unit. To the remainder is added the functional failure rate of the fault correcting part as determined by the SECDED model. Three examples of these computations are given here. It should be noted that scheduled maintenance is assumed to be once per week for all units except Backing Store, for which it is assumed to be once per day.

Backing Store Unit (262K-bit array)

| Component | Quantity | Unit Failure Rate | Total Failure Rate |
| :---: | :---: | :---: | :---: |
| Storage Device | 36,864 | .926 | 34,136.06 |
| Storage Device Connector | 36,864 | .048 | 1,769.47 |
| TTL Support Circuits | 14,400 | .02 | 288.00 |
| Board Connectors | 144 | .284 | 40.90 |
| Capacitors | 11,520 | .014 | 161.28 |
| Solder joints | 1,107,856 | .00015 | 166.18 |
| Vias | 1,254,628 | .00005 | 62.73 |
| Unit Total Elemental Failure Rate |  |  | 36,624.62 |

The configuration of the Backing Store is four data bits per replaceable module (board), arranged so that each data bit is in a different SECDED sector. Because of this arrangement, the total failure rate of the board is divided into four parts, each essentially totally corrected by SECDED. The elemental data-bit failure rate used in the SECDED model is the total elemental failure rate divided by four times the number of boards (144) in the memory. This yields a SECDED sector elemental data-bit failure rate of 63.58 failures per $10^{6}$ hours.
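
As a cross-check on the Backing Store table and the sector figure, the following Python sketch recomputes the unit elemental failure rate as the sum of quantity times per-part rate, then divides by four times the number of boards. The dictionary and function names are illustrative and are not part of the original study; the quantities and rates are copied from the table, and all rates are taken to be failures per million hours.

```python
# Backing Store parts as (quantity, failures per 10^6 hours per part),
# copied from the Backing Store Unit table. Names here are illustrative.
BACKING_STORE_PARTS = {
    "Storage Device": (36_864, 0.926),
    "Storage Device Connector": (36_864, 0.048),
    "TTL Support Circuits": (14_400, 0.02),
    "Board Connectors": (144, 0.284),
    "Capacitors": (11_520, 0.014),
    "Solder joints": (1_107_856, 0.00015),
    "Vias": (1_254_628, 0.00005),
}


def elemental_failure_rate(parts):
    """Sum of quantity x per-part failure rate (failures per 10^6 hours)."""
    return sum(qty * rate for qty, rate in parts.values())


def sector_bit_rate(unit_rate, boards, bits_per_board=4):
    """Elemental rate attributed to one data bit of one board."""
    return unit_rate / (bits_per_board * boards)


unit_total = elemental_failure_rate(BACKING_STORE_PARTS)
bit_rate = sector_bit_rate(unit_total, boards=144)
print(f"unit elemental failure rate: {unit_total:.2f}")
print(f"sector data-bit failure rate: {bit_rate:.2f}")
```

Both results agree with the figures quoted in the text (36,624.62 and 63.58) to the precision shown.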


The SECDED arrangement of the memory is eight sectors in parallel, each sector having a single rank of 72 data bits. The SECDED model derives a failure rate of 230.7, so the total Backing Store has a functional failure rate of 8 times this, or 1845.6, and a functional MTBF of 541.8 hours.

Main Memory Unit

| Component | Quantity | Unit Failure Rate | Total Failure Rate |
| :---: | :---: | :---: | :---: |
| Storage Device | 159,744 | .078 | 12,380.2 |
| F100K Device | 29,120 | .0357 | 1,039.6 |
| Memory Interface | (1) |  | 302.2 |
| Cabinet | (2) |  | 576.5 |
| Elemental Failure Rate |  |  | 14,298.5 |
| Less Storage Elemental Failure Rate |  |  | 12,380.2 |
| Remainder |  |  | 1,918.3 |
| Storage Functional Failure Rate |  |  | 6.0 |
| Main Memory Functional Failure Rate |  |  | 1,924.3 |
| Main Memory Functional MTBF |  |  | 519.67 |
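
The bottom half of the Main Memory table applies the three-step rule stated earlier: subtract the elemental rate of the fault-corrected part, add back its functional rate, and invert to get MTBF. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the rates are assumed to be in failures per million hours so that MTBF comes out in hours.

```python
def functional_failure_rate(unit_elemental, corrected_elemental, corrected_functional):
    """Unit functional rate: the uncorrected remainder plus the functional
    rate of the fault-corrected (SECDED-protected) part."""
    remainder = unit_elemental - corrected_elemental
    return remainder + corrected_functional


def mtbf_hours(rate_per_million_hours):
    """Mean time between failures for a rate given per 10^6 hours."""
    return 1e6 / rate_per_million_hours


# Values from the Main Memory Unit table.
main_memory_rate = functional_failure_rate(
    unit_elemental=14298.5,
    corrected_elemental=12380.2,
    corrected_functional=6.0,
)
main_memory_mtbf = mtbf_hours(main_memory_rate)
print(f"functional failure rate: {main_memory_rate:.1f}")
print(f"functional MTBF (hours): {main_memory_mtbf:.2f}")
```

The sketch reproduces the tabulated 1,924.3 and 519.67.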

The Main Memory Unit will utilize the CYBER 203 memory interface with new storage units which are now in development. Therefore all but the storage device elements are estimates based on preliminary design configurations.

1) The failure rate for the interface unit is that determined for the CYBER 203 ($254 \times 10^{-6}$) plus the failure rate of 156 extra LSI devices ($48.2 \times 10^{-6}$).
2) The failure rate for the cabinet (power connections, filter capabilities, etc.) is twice that of the CYBER 203 since the Main Memory will use twice the number of storage devices as for a 1 million word CYBER 203 using 1 K chips. (As mentioned before, a new memory chassis is currently being designed, thus the inability to count the expected number of memory chassis for the FMP.)

## Transfer Units (Main Map, Memory Interchange, Vector Streaming)

| Main Map Unit elemental failure rate | 118.3 |
| :--- | ---: |
| Memory Interchange elemental failure rate | 424.6 |
| Vector Streaming Unit elemental failure rate | 153.7 |
| Total elemental failure rate | 696.6 |

The transfer path elemental failure rate is based upon the following assumptions:

1) The transfer path is made up of 14 devices in series.
2) There are 6 parallel data transfer bits per device.
3) 0.8 of a device comprise the transfer paths; the other 0.2 is in control logic (not corrected by SECDED).
4) There are 16 39-bit SECDED units comprising the total transfer path (512 information bits).

The equivalent failure rate of a transfer bit within a device is one sixth of 0.8 of the LSI device module failure rate plus the proportionate failure rate of coax connections and vias. (See appendix B, LSI Board Module. Use the failure rate of coax and vias divided by 150, the number of LSI devices per board.)

$$
\text { device bit failure rate }=(0.205+0.030) \times 0.8 / 6=0.0313
$$

The total bit transfer path failure rate is 14 times this value, or 0.439. (This value is used in the SECDED model.) There are 39 bits in a SECDED sector, and 16 sectors make up the width of the transfer path.

$$
\text { Transfer path total elemental failure rate }=0.439 \times 39 \times 16=273.7
$$

The functional failure rate of the transfer path is calculated from the SECDED model with one rank ($n=1$) and an elemental failure rate of 0.439. This calculation yields a failure rate of 2.73.
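
The elemental part of the transfer path arithmetic can be retraced in a few lines of Python. This is only a sketch: the constant names are invented for illustration, the numeric values are those quoted in the text, and the functional rate of 2.73 is not recomputed here.

```python
# Values quoted in the text; names are illustrative only.
LSI_DEVICE_RATE = 0.205    # LSI device module, failures per 10^6 hours
COAX_VIA_SHARE = 0.030     # per-device share of coax connections and vias
TRANSFER_FRACTION = 0.8    # portion of a device in the transfer path
BITS_PER_DEVICE = 6        # parallel data transfer bits per device
DEVICES_IN_SERIES = 14
BITS_PER_SECTOR = 39
SECTORS = 16

device_bit_rate = (LSI_DEVICE_RATE + COAX_VIA_SHARE) * TRANSFER_FRACTION / BITS_PER_DEVICE
path_bit_rate = DEVICES_IN_SERIES * device_bit_rate
path_total_rate = path_bit_rate * BITS_PER_SECTOR * SECTORS

print(f"device bit failure rate:    {device_bit_rate:.4f}")
print(f"transfer path bit rate:     {path_bit_rate:.3f}")
print(f"transfer path total (elem): {path_total_rate:.1f}")
```

Carrying the unrounded bit rate through the multiplication gives 273.7, which matches the total in the text; multiplying the rounded 0.439 by 624 would give 273.9 instead.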

The Transfer Units functional failure rate is:

| Total elemental failure rate | 696.6 |
| :--- | ---: |
| Less the Transfer Path elemental failure rate | - 273.7 |
| Remainder | 422.9 |
| Plus the Transfer Path functional failure rate | + 2.7 |
| Transfer Units functional failure rate | 425.6 |

In a similar manner, the functional failure rates of the other transfer paths are computed using the following assumptions:

1) The Swap Unit transfer path is 512 data bits wide and is composed of eight 72-bit (including check bits) wide SECDED units in parallel. The SECDED unit is three LSI devices deep (that is, the transfer path has three devices in series) but for model computation purposes it is treated as a single rank with each device having a failure rate of three times that of a single LSI device.
2) The Intermediate Map Unit transfer path is 256 data bits wide and is composed of four 72-bit wide SECDED units in parallel. The transfer path has three devices in series but for model computation purposes it is treated as a single rank with each device having a failure rate of three times that of a single LSI device.
3) A translation from a 72-bit SECDED sector to two 39-bit SECDED sectors, and vice versa, is considered to take place at the interface of the Intermediate Map Unit and the Main Map Unit. The translation is accomplished by check and generation circuits.

The SECDED Model
The model for computing the functional failure rate of a SECDED unit is developed from the basic reliability formulas:

$$
1=R+Q \text { and } R=e^{-\lambda t}
$$

where $R$ is the probability of success, that is, the probability of no failures; $Q$ is the probability of encountering a failure; $\lambda$ is the failure rate of an element; and $t$ is the time interval in question. The probability of success or failure for a rank of $c$ elements is

$$
(R+Q)^{c}=R^{c}+c R^{(c-1)} Q+\frac{c(c-1) R^{(c-2)} Q^{2}}{2!}+\ldots+Q^{c}
$$

Since the first term is the probability of no failures occurring and the second term is the probability of exactly one failure occurring (which is correctable and therefore not a functional failure), the probability of no functional failures in the rank of elements within a SECDED sector of $c$ bits is

$$
R^{c}+c R^{(c-1)} Q
$$

The probability of no functional failures occurring within a SECDED unit of $n$ ranks is

$$
P=\left(R^{c}+c R^{(c-1)} Q\right)^{n}
$$

This equation is solved arithmetically with the values for $\lambda$ and $t$ (in the equations $R=e^{-\lambda t}$ and $Q=1-R$) set to the failure rate of the component and the maintenance interval, respectively.

If $P$ is the probability of no failures in a SECDED unit and $F$ is the functional failure rate of the unit, then

$$
\begin{aligned}
P &= e^{-F t} \approx 1-F t \quad(\text { for } F t<0.05) \text { and } \\
F &= \frac{1-P}{t}
\end{aligned}
$$
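
The model can be evaluated directly. The Python sketch below is one possible rendering of the formulas above, not the program used in the study; the function name and argument names are assumptions. It is exercised with the Backing Store inputs given earlier: a data-bit rate of 63.58 per million hours, a rank of 72 bits, one rank, and a maintenance interval of 24 hours (scheduled maintenance once per day).

```python
import math


def secded_functional_failure_rate(rate_per_million_hours, c, n, t_hours):
    """Functional failure rate (per 10^6 hours) of a SECDED unit.

    rate_per_million_hours: elemental failure rate of one element
    c: elements (bits) per rank, n: ranks, t_hours: maintenance interval
    """
    lam = rate_per_million_hours * 1e-6
    q = -math.expm1(-lam * t_hours)   # Q = 1 - e^(-lambda t)
    r = 1.0 - q                       # R = e^(-lambda t)
    # No failure, or exactly one (correctable) failure, in a rank of c.
    p_rank = r**c + c * r**(c - 1) * q
    p_unit = p_rank**n
    # F = (1 - P) / t, valid while F*t stays small (text: F*t < 0.05).
    return (1.0 - p_unit) / t_hours * 1e6


sector_rate = secded_functional_failure_rate(63.58, c=72, n=1, t_hours=24)
backing_store_rate = 8 * sector_rate          # eight sectors in parallel
backing_store_mtbf = 1e6 / backing_store_rate
print(f"sector functional rate:  {sector_rate:.1f}")
print(f"Backing Store rate:      {backing_store_rate:.1f}")
print(f"Backing Store MTBF (hr): {backing_store_mtbf:.1f}")
```

With these inputs the sketch lands within rounding of the 230.7 per sector and 1845.6 per unit quoted for the Backing Store, which suggests the reading of $t$ as the 24-hour maintenance interval is the intended one.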

(Throughout this study the worst case condition of a whole chip failing has been used for memory failures since the predominant modes of partial chip failure are not conclusively known. If it is desired to consider partial chip failures, the component failure rate should be multiplied by the value of the average or major mode of partial chip failure and the number of ranks divided by that value.)

The functional availability-reliability of the NASF system can be determined for a user if the following use and system parameters are known.

1. Run or use time of the user program.
2. The system components required by the user program and the portion of time the program is "in" each component.
3. The amount of time of system overhead (e.g. operating system or controlware) for each of the programmable components.
4. The functional failure rate and mean down time (MDT) for each system component. The MDT for a non-redundant component is its MTTR. The MDT for a redundant component is the switch time (between the redundant components), if the switch time is extremely small compared to the component's MTBF.

An example is developed here to show how the above information is reduced to an availability and a reliability figure for a user. Figure 1 is the reliability configuration for a given user job or task and table 1 shows the values for the components.


Figure 1. Reliability Model Configuration

Table 1
System Component Parameters

| System Component | Functional Failure Rate | User Time | Overhead | (AFR) Applicable Failure Rate (1) | Mean Down Time (MDT) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 2551 | 541.7 | 0.1 | 0.1 | 108 | 1.6 |
| CYBER 175 | 2725 | 0.3 | 0.25 | 1500 | 0.2 (3) |
| PDC | 100 | 0.2 (2) / 0.1 (2) | 0.1 | 30 (2) / 20 (2) | 0.1 (3) |
| FMD | 200 | 0.15 | 0 | 30 | 0.25 (3) |
| FMD Control | 125 | 0.15 | 0 | 19 | 0.1 (3) |
| ECS | 1587.3 | 0.15 | 0 | 238 | 1.8 |
| FMP | 9373.8 | 0.65 | 0.05 | 6565 | 1.5 |
| 819 Disk | 357.1 | 0.1 | 0 | 36 | 2.2 |
| 819 Control | 100 (4) | 0.1 | 0.1 | 10 | 0.1 (3) |

NOTES:
(1) Applicable failure rate is the system component functional failure rate times the sum of the user time and overhead.
(2) PDCs associated with the 175 have a 0.2 user time factor. Those with the 819 s have 0.1 user time factor.
(3) Mean down time is switch time between redundant components.
(4) The total failure rate of the dual controller is divided evenly between the two halves.
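
Note (1) defines the applicable failure rate as a simple product. The Python sketch below applies it to four rows of Table 1; the function and dictionary names are hypothetical, and the input values are as read from the table (functional failure rate, user time, overhead).

```python
def applicable_failure_rate(functional_rate, user_time, overhead):
    """AFR = functional failure rate x (user time + overhead), per note (1)."""
    return functional_rate * (user_time + overhead)


# (functional failure rate, user time, overhead) as read from Table 1.
TABLE_1_ROWS = {
    "2551": (541.7, 0.10, 0.10),
    "CYBER 175": (2725.0, 0.30, 0.25),
    "FMD": (200.0, 0.15, 0.0),
    "ECS": (1587.3, 0.15, 0.0),
}

afr = {name: applicable_failure_rate(*row) for name, row in TABLE_1_ROWS.items()}
for name, value in afr.items():
    print(f"{name:10s} {value:8.1f}")
```

The computed values round to the tabulated 108, 30, and 238; the CYBER 175 product is 1498.75, which the table carries as 1500.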

Further assumptions regarding the use of the system are:

1. Three of the four FMD's are required for system operation.
2. The probability of a failure in the tape and system disk subsystems is negligible (0.0004) during the loading of an alternate system disk required upon a disk failure (a ten minute period of time).
3. The 819 disks have no back-up (for this particular case).
4. The PDC networks associated with the 175 s and the 819 subsystem are simplified (for ease of computation) into the configurations shown in figure 2.
5. Intuitively, it can be shown that the functional failure rate of redundant components with very short switch times is the same as the failure rate of one of the components. (A rigorous proof exists.)
6. The user time required is two hours.


Note: $\lambda$ is the failure rate for a system component.

Figure 2. Simplified Reliability Configuration


The Simplified Reliability Configuration (figure 2) is derived in the following manner:

175/PDC Component

PDC failure rate $=2 \times \mathrm{AFR}=2 \times 30=60$ (for the network of 4 PDCs in figure 1; see assumption 5)
PDC MDT $=0.1$
175/PDC failure rate $=1500+60=1560$
175/PDC MDT $=\frac{\Sigma(\mathrm{AFR})(\mathrm{MDT})}{\Sigma \mathrm{AFR}}=\frac{0.2 \times 1500+0.1 \times 60}{1560}=0.196$

## System Disk Component

Controller failure rate $=19$
Controller MDT $=0.1$
Disk failure rate $=3 \times \mathrm{AFR}=3 \times 30=90$
Disk MDT $=0.25$
System Disk failure rate $=19+90=109$
System Disk MDT $=\frac{\Sigma(\mathrm{AFR})(\mathrm{MDT})}{\Sigma \mathrm{AFR}}=\frac{19 \times 0.1+90 \times 0.25}{109}=0.224$


819 Disk Component

819 failure rate $=36$
819 MDT $=2.2$
Controller (1/2) failure rate $=10$
Controller MDT $=0.1$
Controller/819 failure rate $=36+10=46$
Controller/819 MDT $=\frac{\Sigma(\mathrm{AFR})(\mathrm{MDT})}{\Sigma \mathrm{AFR}}=\frac{36 \times 2.2+10 \times 0.1}{46}=1.74$
Four controller/819 units have a failure rate of $4 \times 46=184$ and an MDT of 1.74.

PDC failure rate $=2 \times \mathrm{AFR}=2 \times 20=40$
PDC MDT $=0.1$
819 Disk failure rate $=184+40=224$
819 Disk MDT $=\frac{\Sigma(\mathrm{AFR})(\mathrm{MDT})}{\Sigma \mathrm{AFR}}=\frac{184 \times 1.74+40 \times 0.1}{224}=1.41$
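
Each reduction above uses the same two rules: failure rates of combined components add, and the combined MDT is the failure-rate-weighted average of the component MDTs. A small Python helper makes the pattern explicit; `combine_series` is a name chosen here for illustration and does not appear in the study.

```python
def combine_series(parts):
    """Combine components given as (AFR, MDT) pairs.

    Returns (total AFR, AFR-weighted mean down time), i.e.
    sum(AFR) and sum(AFR * MDT) / sum(AFR).
    """
    total_afr = sum(afr for afr, _ in parts)
    weighted_mdt = sum(afr * mdt for afr, mdt in parts) / total_afr
    return total_afr, weighted_mdt


# Inputs taken from the reductions in the text.
pdc_175 = combine_series([(1500, 0.2), (60, 0.1)])
system_disk = combine_series([(19, 0.1), (90, 0.25)])
controller_819 = combine_series([(36, 2.2), (10, 0.1)])

for label, (afr, mdt) in [
    ("175/PDC", pdc_175),
    ("System disk", system_disk),
    ("Controller/819", controller_819),
]:
    print(f"{label:15s} AFR = {afr:5d}  MDT = {mdt:.3f}")
```

The helper returns 1560 with an MDT of 0.196 for the 175/PDC component and 46 with an MDT of 1.74 for one controller/819 pair, as in the text.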

NASF user availability-reliability is derived from the overall failure rate and MDT for the system for the user job configuration. The pertinent relationships are:

1. Failure rate ( $\lambda$ ) of the system is the sum of component failure rates (see Reduced Reliability Configuration). $\lambda=\Sigma A F R$ (of each component)
2. System unavailability is the sum of each component unavailability which is the product of each component's failure rate and MDT.

$$
\lambda M D T=\Sigma(A F R)(M D T) \text { (of each component) }
$$

3. System availability is 1 minus the system unavailability.

$$
A=1-\lambda M D T \text { (of system) }
$$

4. The system reliability (for very small $\lambda t$ ) is

$$
R=1-\lambda t \text {, where } t=\text { user required time. }
$$

For this example:

```
System λ            = 8804 x 10^-6 failures per hour (MTBI = 113.6 hrs)
System λ·MDT        = 0.0111
System Availability = 0.9889 or 98.89%
System Reliability  = 1 - 0.0088 x 2 = 0.9824 or 98.24%
```
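
The four relationships can be applied to the example in a few lines of Python. The component list below is an assumption: the figures are not reproduced here, so the six series components were inferred from the reductions in the text, and their applicable failure rates do sum to the quoted 8804. The system disk MDT of 0.224 is the weighted value (19 x 0.1 + 90 x 0.25) / 109; the other values are those stated in the text and Table 1.

```python
# (AFR in failures per 10^6 hours, MDT in hours) for each series component.
# The grouping into these six components is inferred, not quoted.
COMPONENTS = {
    "2551 network processor": (108, 1.6),
    "175/PDC": (1560, 0.196),
    "System disk": (109, 0.224),
    "ECS": (238, 1.8),
    "FMP": (6565, 1.5),
    "819 disk": (224, 1.41),
}
USER_HOURS = 2.0  # assumption 6: the user time required is two hours

system_lambda = sum(afr for afr, _ in COMPONENTS.values()) * 1e-6
unavailability = sum(afr * mdt for afr, mdt in COMPONENTS.values()) * 1e-6
availability = 1.0 - unavailability
reliability = 1.0 - system_lambda * USER_HOURS  # valid for small lambda*t
mtbi_hours = 1.0 / system_lambda

print(f"system lambda    = {system_lambda * 1e6:.0f} x 10^-6 per hour")
print(f"MTBI             = {mtbi_hours:.1f} hours")
print(f"lambda x MDT     = {unavailability:.4f}")
print(f"availability     = {availability:.4f}")
print(f"reliability (2h) = {reliability:.4f}")
```

Under these assumptions the sketch reproduces the quoted 8804, 113.6 hours, 0.0111, 0.9889, and 0.9824.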

The operating system critical MTBF is derived on the assumption that the operating system works non-concurrently $25 \%$ of the time in the 175 s and $5 \%$ of the time in the FMP, and that ECS must be operable during this time.

| $0.25 \times 2725.0$ | $=681.25$ |
| :--- | :--- |
| $0.05 \times 9373.8$ | $=468.69$ |
| $0.30 \times 1587.3$ | $=476.19$ |
| O.S. Critical Failure rate | $=1626.13$ |
| O.S. Critical MTBF | $=615$ hours $=3.7$ weeks |

## DIVISION 7

MAINTENANCE STUDY FOR
THE NUMERICAL AERODYNAMIC SIMULATION FACILITY

## STRATEGY ASSUMPTIONS

The maintenance strategy for the NASF system is based on the assumption that there are two categories of equipment: system critical and system non-critical. System critical devices are those which must be operational before useful work can be accomplished. They consist of the FMP, the Network Processors (2551-1), and the ECS. System non-critical devices are those redundant equipments that need not all be operational before useful work can be accomplished. They consist of all equipments except those processors listed above.

Equipment was designated system critical based on how much the loss of its function would impair the system's usefulness. Without the FMP the system could continue to queue jobs and do support processing activities; however, no jobs could be run and the system's useful output would rapidly diminish. Loss of a network processor in the system would mean half the interactive users could not access their data base. ECS is system critical because standard software for the Support Processing System (SPS) depends on this equipment to coordinate the two SPS processors.

System non-critical equipment includes the two CYBER-175 processors, each of which is capable of doing the entire SPS task during a temporary interruption in the other. Mass storage subsystems (disk and cartridge) each have redundant capabilities which eliminate the possibility of a single interruption disabling a significant portion of the system.

The operating software developed for the NASF system has to support this categorization of the hardware configuration to minimize system interruptions. Failsoft and reconfiguration capabilities need to be an integral part of the software in order to take advantage of the hardware system redundancy.

These conditions will enhance the operation and maintenance of the system.

## FIELD ORGANIZATION

The Engineering Services field organization will be operating out of a local service center by the time this system is installed. This type of organization allows efficient distribution of mobile service personnel among local installations. The service options under this organization vary from totally on-call, where the customer engineer (CE) is called when needed, to on-site coverage, where CEs are assigned to one installation.

Examination of the maintenance activity required for the NASF system indicates that optimum service can be obtained by assigning CEs to provide immediate response to interruptions which cause the system to be down during the normal work week (24 hours per day, Monday-Friday). This will cover $70 \%$ of the system down interruptions. The remaining $30 \%$ and all interruptions which do not cause the system to be down can be handled on-call from a local service center.

Response time for on-call service typically averages two hours or less. All of the PM and scheduled maintenance actions would be handled by the service center.

Estimated system maintenance cost for this option is $\$ 80,000$ per month. Other options are available. One would increase the immediate response to interruptions from $70 \%$ to $100 \%$ and would increase the cost by approximately $40 \%$ to $\$ 112,000$ per month. Another would decrease the immediate response capability to $50 \%$ and decrease the cost to approximately $\$ 70,000$ per month. Initial spare parts cost for the total NASF is estimated to be \$175,000, including \$115,000 for initial FMP spare parts.

## PREVENTIVE MAINTENANCE

A program of preventive maintenance (PM) will be implemented to minimize system interruptions. There are two categories of PM: dedicated and concurrent. Dedicated PM implies that a significant portion of the system will be used for this purpose and other useful work is not practical during this period. Concurrent implies that PM will be performed while the rest of the system is doing useful work.

Weekly dedicated PM is expected for the FMP. During this period single solid Main Memory failures may be removed and diagnostics will be run with margins on the rest of the CPU. This is expected to take four hours per week.

Daily concurrent PM requiring less than an hour is expected for the FMP. This will be required to remove single solid failures from the Intermediate Memory and Backing Store.

PM on all other equipment will be performed on a unit basis concurrently with system operation. This requires that the system be reconfigured so it doesn't use the equipment on which PM is being performed. For the CYBER-175 equipment, a 3-hour period of concurrent PM will be required weekly for each unit.

It is expected that Intermediate Memory and Backing Store solid failures will be repaired concurrently with system operation. This will require reconfiguring the memory so the system doesn't use the portion (512K) being repaired.

PM tasks such as periodic replacement of filters and measurement of voltages and waveforms will be performed concurrently without affecting the operation of the system.

## COMPUTER AIDED MAINTENANCE

All PM will be scheduled with the aid of a Computer Aided Maintenance Scheduler (CAMS) program. CAMS uses system configuration and error log information to optimize the PM schedule. This program can be run on any CYBER 170 system.

An error log for the FMP, the PDCs and the 819s will be maintained on the Maintenance Control Unit (MCU) disk. The MCU will analyze this log and provide reports on errors that occurred during system operation, as well as diagnostic errors.

A similar log for the CYBER-175 systems and the peripherals attached to the SPS will be stored on one of the system FMD disk units. The CYBER-175s will provide error reports similar to the one provided by the MCU.

## MAINTENANCE SOFTWARE

Maintenance software for the FMP, Loosely Coupled Network (LCN), and attached peripherals will reside on a disk in the Maintenance Control Unit. On-line maintenance software will be available for confidence testing and to support concurrent PM, as well as for emergency maintenance activities. Off-line maintenance software will be available to run margins and to support dedicated PM.

Maintenance software for the SPS will reside on magnetic tape and its organization will be similar to the FMP maintenance software.

## LOGISTICS

Spare parts for the FMP and other system critical equipment will be stocked on site and in Minneapolis. Other high failure rate parts may also be stocked on site. Storage space for these parts and their cabinets is estimated to require 40 square feet.

System non-critical equipment parts will be stocked at one or more of the following locations: local service center, regional warehouse, Minneapolis distribution center. Distribution and quantities will be determined by part density and usage.

Emergency parts are generally available from the local service center in less than two hours and from the regional warehouse in less than four hours. Emergency orders for parts at the Minneapolis distribution center are filled within four hours; however, actual response is determined by airline schedules.

To minimize downtime and achieve the availability goals, the replaceable part will be the pluggable subassembly. To reduce costs, some of these assemblies may be repaired on site or at a local service center. The rest will be returned to a central repair facility in Minneapolis.

## TECHNICAL SUPPORT

Technical support for the FMP will be supplied by the design and manufacturing division. The primary method of support will be through remote technical assistance (RTA). The RTA capability will be implemented through the use of data communication technologies for both on-line and off-line maintenance. Through RTA the supporting engineer will be able to manipulate diagnostics from a remote console. The remote console will be connected via phone lines to the NASF Maintenance Control Unit. All of the maintenance capabilities of the MCU will then be under control of the remote console. In this way problems requiring technical assistance can be analyzed directly by the supporting engineer.

Technical support for the SPS will be supplied in a similar manner. A remote console will be connected via phone lines to a multiplexer in the CYBER-175. Through this remote link assistance can be provided to the local customer engineer. Backup support will also be supplied by regional and headquarters support groups.

## MAINTENANCE, NON-CDC EQUIPMENT

Maintenance of non-CDC equipment can also be supplied in most cases. This maintenance is under control of each individual region within Engineering Services. Maintenance for each equipment is subject to local availability.

## DIVISION 8

MAINTENANCE SOFTWARE ALTERNATIVES
FOR THE 1980s

### 1.0 INTRODUCTION

Reliability, Availability, Maintainability (RAM) needs for computer systems in the 1980s will focus on a reduction in the number of system interruptions as compared with today's systems and an overall lessening of impact on users when interruptions do occur. A requirement of one interruption per month is frequently stated for systems of the 80s as compared to several interruptions per day for current systems.

It can certainly be debated whether it is economically feasible to reach a maximum interruption rate of one per month for medium size computer systems; super computer systems, designed for performance, present many more challenging problems. The key element is what the system does to modify or ease the effect on the user when an interruption arises. Suggestions are made to increase hardware/software self-monitoring, expand automatic (no manual intervention) system re-initialization and/or reconfiguration, enhance checkpoint/restart, and decrease dedicated system time needs for hardware and software maintenance.

With a decided trend toward more system time being available to the user and less system time being available for maintenance, it will be necessary to improve reliability through the techniques of fault-tolerant design, such as redundancy and self-diagnosis. These fault-tolerant techniques appear to be the key to the achievement of total on-line hardware and software maintainability so that a system would not need to be down for maintenance of any kind.

In short, computer system maintenance in the 1980s will need to be performed without significantly disrupting the system activity. It may be acceptable for the system and its users if response times degrade, but total unavailability of the system will not be acceptable.

Before discussing future alternatives, existing features of CDC super computer maintenance software will be addressed. The basic architecture includes a super computer, a maintenance station (control unit) as a focal point for all system maintenance, and a loosely coupled network (LCN) for I/O containing Programmable Device Controllers (PDCs).

### 2.1 General Features

The current maintenance software system supports off-line and on-line hardware maintenance. The major components of the maintenance software system are:

- Maintenance Control Unit
- CPU (Central Processing Unit) off-line diagnostics
- CPU on-line diagnostics
- PDC diagnostics
- Fault isolation
- Error logging and recovery


### 2.1.1 Maintenance Control Unit

The MCU (Maintenance Control Unit), which is built around a CYBER 18-20 Computer, provides a common user control point for all system maintenance activities. The MCU supports the following on-line and off-line maintenance activities through a local and/or remote terminal:

| On-line | Off-line | Activity |
| :---: | :---: | :---: |
| X | X | Remote Maintenance |
| X | X | Display of MCU, PDC, and CPU Memory |
| X | X | Entry of MCU, PDC, and CPU Memory |
| X | X | Monitoring and Control of MCU and CPU Maintenance Lines |
| X | X | Logging of Memory SECDED Errors |
| X |  | Logging of Operating System Detected Errors |
| X | X | PDC Autoload |
| X |  | System Initialization |
| X |  | System Recovery |
|  | X | Loading, Displaying, and Modifying CPU Microcodes |
|  | X | Control of Off-line CPU Diagnostics |
|  | X | Down-loading of Off-line PDC Diagnostics |

### 2.1.2 CPU Off-line Diagnostics

The CPU off-line diagnostic system supports manufacturing checkout and field maintenance of the CPU. A structured set of diagnostics is available to provide error detection and analysis. Most CPU diagnostics have a built-in test mode to assist with remedial maintenance.

The object code of each diagnostic resides on MCU mass storage. The MCU loads each diagnostic into the CPU, MCU, or PDC memory as needed. Because the CPU diagnostics require a dedicated CPU, only one diagnostic can run at one time.

CPU off-line diagnostics consist of two types: fault detection diagnostics and utility diagnostics. The former test instructions and CPU resources, and form a testing hierarchy of the CPU. The latter support maintenance activities other than fault detection, such as test point simulators and error file analyzers.

### 2.1.3 CPU On-line Diagnostics

The CPU on-line diagnostic system supports confidence testing of the CPU without shutting down the system. These diagnostics run as normal jobs. On-line diagnostics consist of all CPU based off-line diagnostics which do not execute monitor mode instructions. Also available for on-line use are test point simulators and error file analyzers.

### 2.1.4 PDC Diagnostics

The PDC off-line diagnostics test PDC components while the PDC controlware is inactive. The user can load these diagnostics from a portable maintenance console or can down-load them from the MCU to a PDC. Control of the PDC off-line diagnostics can come from the MCU or a portable maintenance console. PDC on-line diagnostics are controlled through the operating system. PDC functions are disabled while on-line testing occurs.

### 2.1.5 Fault Isolation

Isolation provides a definite means to reduce checkout time and MTTR. Fault detection is required while fault isolation is optional and depends on cost effectiveness. There are three major categories of isolation applied to diagnostics: physical isolation to the failing component, function isolation to the failing group of components, and unit isolation generally to the failing board.

Fault isolation to the failing memory chip for single solid faults is provided for all memory resources of the CPU. This includes central memory, register file, instruction stack and microcode memories. No isolation is provided for LSI arrays. Unit isolation is provided for critical units while functional isolation has rarely been used. Some unit isolation is provided in the LCN. The structure of the FMP is such as to provide very good unit isolation.

The MCU serves as a local focal point for logging of all system errors. Separate files exist on the MCU for on-line and off-line errors. Time/date data is included in the on-line error file. Central memory SECDED errors are sent by hardware to the MCU for logging. Microcode memory parity errors are detected at the MCU. All errors other than memory errors are passed by the operating system to the MCU for logging.

Central memory can be degraded and reconfigured, through manual intervention from the MCU, when a solid fatal error occurs. A failing Vector Unit can be idled and replaced by a spare through MCU action. Any other fatal system error must be analyzed and corrected before system activity resumes. There are no provisions for reconfiguration of FMP hardware other than memory and the Vector Units.

### 2.2 Summary

In the 1970s the primary emphasis on maintenance software has been toward development of fault detection tests. Increased complexity of computer systems has led to development of the maintenance station, which is a local focal point for all system maintenance activities. With the development of complex LSI circuitry in mainframes and memory, fault isolation has begun to appear.

Maintenance software in general, and fault detection, fault isolation, error retry, and system recovery in particular, have been given low priority in design of system hardware and software.

In the 1980s, hardware, operating systems, and maintenance software must be treated equally if the number of system interruptions is to be reduced. RAM requirements must enter into system design at the earliest stage, sharing the spotlight with cost and performance.

### 3.0 ALTERNATIVES FOR THE 1980s

To meet the challenge and requirements of improved RAM for super computers of the 1980s, new alternatives must be considered. These alternatives must involve the entire system architecture, hardware, operating system, and maintenance software.

Ways must be found to minimize the mean time to detect and diagnose a system malfunction, the mean time to repair a detected malfunction, and the mean time to restart a system. Of high importance will be the ability to perform maintenance on, and repair of, a portion of the system without shutting down the entire system.

### 3.1 System Recommendations

The various levels of software and hardware must be designed with the following system goals in mind to minimize rerun time when system failures occur.

- Minimize the components required to continue processing -- i.e., minimize system critical hardware and software.
- Maximize system flexibility. Provide as much redundancy and alternate routes to accomplish the same functions as possible.
- Minimize the restart/recovery operations. Where possible, utilize the flexibility of a fall-back position by disconnecting system non-critical items and continue processing.

Actions taken for system error recovery are shown in Figure 1. Physical recovery is completely hardware dependent. All other actions involve use of on-line or off-line maintenance software, and interaction from a maintenance station to recover from an error. In cases where the system is reconfigured or degraded, concurrent maintenance software will support system repair.


Figure 1. System Error Recovery

### 3.2 Hardware Recommendations

In the 1980s computer systems will have incorporated into their designs many of the hardware techniques mentioned below. At a minimum, 10% of the system hardware will be devoted to improved RAM.

- Semiconductor memories will continue to make use of single error correction, double error detection circuitry. Check bits will be carried along with data on all major trunks. Errors will be collected at a maintenance station such that preventive maintenance and degradation or reconfiguration can occur.
- Develop hardware maintenance features for high-level self-checking of semiconductor memories under control of the MCU. Memory testing would be independent of CPU instruction testing. Memory errors would be found faster under hardware control.
- Microcode control logic will be designed such that it is a useful maintenance tool. Microcode diagnostics will use added hardware features to assist maintenance personnel in isolating failures to the replaceable circuit level as quickly as possible.
- Major control paths may be encoded in error detecting codes to provide continuous fault diagnosis while executing programs. Error codes would aid development of fault isolation techniques.
- Transient faults may be identified by error detecting codes and their effects corrected by rollback. Permanent faults may be corrected by replacement of faulty units. Replacement may be automatic or under operator control from the maintenance console.
- Fault tolerant hardware involving the use of redundancy, with replication of individual circuits or subsystems, will appear. Fault tolerant hardware will mask the occurrence of random errors as they occur and provide error-free operation for large periods of time.
- Registers and latches in the CPU could be scanned by the maintenance station such that fault isolation to the chip is possible.
- Hardware maintenance features will appear in logic circuits, as more gates are placed inside circuits. The wafer probe and vendor testing problem will accelerate internal circuit testing needs.
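
The single error correction, double error detection (SECDED) scheme named in the first recommendation above can be illustrated in software. The following is a minimal sketch only, assuming an extended Hamming (8,4) code over 4 data bits; actual memory hardware uses wider words and dedicated logic, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2, and 4 hold the
    Hamming check bits; indices 3, 5, 6, and 7 hold the data bits.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double'."""
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    overall_parity = sum(code) & 1
    word = list(code)
    if overall_parity == 1:
        # Odd overall parity: exactly one bit flipped; the syndrome is its
        # position (0 means the overall parity bit itself was hit).
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return "double", None
    else:
        status = "ok"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble
```

A maintenance station collecting the `corrected` and `double` outcomes per memory location would have the raw data needed for the preventive maintenance and degradation decisions described above.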


### 3.3 Future Maintenance Software Development

The demands of the computer industry are for improved mean time between interruptions and a reduction in lost time. Maintenance software must be treated as a system working in conjunction with hardware and the operating system to meet these demands.

To achieve these goals maintenance software development emphasis will be placed in the following areas:

- Fault isolation
- Operational summation
- Loosely coupled network I/O
- Application of gate simulation data base
- On-line
- Error logging
- Recovery
- Concurrent maintenance


### 3.3.1 Fault Isolation

Isolation must be specified as part of the hardware design requirements. Error detection codes, microcode hardware features, and multiplexing of registers to the MCU must be considered if fault isolation diagnostics to the function and circuit are to be developed.

### 3.3.2 Operational Summation

Without changes in hardware design philosophy, operational summation can go far to improve MTTR. With a status of failing and operational CPU instructions, CPU resources, and I/O resources, maintenance personnel can draw upon their knowledge of the system to localize the fault. Maintenance action would be based on operational summation or decision logic tables.
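
A decision logic table of the kind mentioned above might be sketched as follows. This is an illustration only: the test names, the rules, and the suspect areas are hypothetical and are not taken from any CDC diagnostic.

```python
# Hypothetical decision logic table. Each rule lists the tests that must be
# failing, the tests that must be operational, and the area to suspect.
DECISION_TABLE = [
    ({"memory_pattern"}, {"scalar_add"}, "central memory"),
    ({"vector_add", "vector_multiply"}, {"scalar_add", "memory_pattern"}, "vector unit"),
    ({"pdc_loopback"}, {"scalar_add", "memory_pattern"}, "PDC or I/O trunk"),
]


def summarize(results):
    """Split test results (name -> passed?) into failing and operational lists."""
    failing = sorted(name for name, passed in results.items() if not passed)
    operational = sorted(name for name, passed in results.items() if passed)
    return failing, operational


def suspects(results):
    """Return the suspect areas whose rule matches the operational summation."""
    failing, operational = (set(names) for names in summarize(results))
    return [
        area
        for must_fail, must_pass, area in DECISION_TABLE
        if must_fail <= failing and must_pass <= operational
    ]
```

For example, if both hypothetical vector tests fail while the scalar and memory tests pass, `suspects` returns only the vector unit, which is the kind of localization a maintainer would otherwise perform from experience.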

### 3.3.3 Loosely Coupled Network I/O

The MCU must be capable of testing selected PDCs and associated I/O devices concurrently with CPU testing. Communications within the I/O network should also be testable from the MCU. In view of the sophisticated protocol between PDCs, special maintenance software considerations must be made.

### 3.3.4 Application of Gate Simulation Data Base

Hardware modeling and simulation is a requirement for super computer design; therefore, a gate model of the CPU is available for maintenance software applications. The following lists possible applications:

- Routing/placement information. . . the user could query the model from the MCU display.
- Simple simulations could be performed by the MCU to generate test point states based on input operands.
- LSI fault isolation. . . compare the hardware to the model.


### 3.3.5 On-line

The following improvements can be made to existing fault detection diagnostics:

- Reduce the risk of diagnostics failing in test condition setup, or in result verification by developing standard coding methods.
- Expand on diagnostic data base concepts where a test condition data base can be controlled from either the CPU or MCU. This method of testing should ultimately replace the existing computer command tests.
- Develop a high-level test of the basic housekeeping instructions used to control CPU based diagnostics, including mixed instruction testing. This level of testing should be controlled from the MCU.
- For the LCN, provide common maintenance software and procedures for the entire network, including super computer, front-end systems and peripherals.
- Develop common maintenance procedures for all field sites, particularly the procedure to follow when a fatal system error occurs.


### 3.3.6 Error Logging

Error messages should be organized as 4 basic types: (1) MCU detected errors (CPU hardware fault detection), (2) CPU errors detected by the operating system, (3) LCN and I/O device errors, and (4) logging of software system errors. Common error files should support error logging for multi-CPU systems.
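
The four message types and the common error file could be represented along the following lines. This is a sketch under stated assumptions: the class and field names are hypothetical, and an in-memory list stands in for the MCU-resident file.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(Enum):
    MCU_DETECTED = 1     # CPU hardware fault detected by the MCU
    OS_DETECTED_CPU = 2  # CPU error detected by the operating system
    LCN_IO = 3           # LCN and I/O device errors
    SOFTWARE = 4         # software system errors


@dataclass
class ErrorRecord:
    cpu_id: int
    error_type: ErrorType
    message: str
    timestamp: float


@dataclass
class CommonErrorFile:
    """One log shared by all CPUs, so multi-CPU systems need no per-CPU files."""
    records: list = field(default_factory=list)

    def log(self, cpu_id, error_type, message, timestamp):
        self.records.append(ErrorRecord(cpu_id, error_type, message, timestamp))

    def counts_by_type(self):
        return Counter(record.error_type for record in self.records)

    def for_cpu(self, cpu_id):
        return [record for record in self.records if record.cpu_id == cpu_id]
```

Keeping the CPU identifier in each record, rather than in the file name, is one way to let a single file serve several CPUs while still allowing per-CPU analysis.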

### 3.3.7 Recovery

More emphasis should be placed on automatic recovery and error logging for deferred maintenance. Since most system errors are transient or intermittent, recovery with degradation is a viable alternative to increase system availability. System software must be enhanced to promote automatic restart. Only fatal errors in the operating system should force emergency maintenance.

Automatic degradation and reconfiguration should be provided for central memory, I/O, and hardware pipelines. The CPU could be temporarily stopped by the MCU until degradation/reconfiguration takes place. The following degradation alternatives are offered:

- Page flawing for central memory
- Central memory degradation by physical sections
- Dynamic testing/flawing of 819 tracks, particularly on initial file creation
- Provide a spare pipeline which can be switched to an active pipe by the MCU
- Spare PDCs between the trunk and central memory
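
The first alternative above, page flawing, amounts to removing a failing page from the pool the system allocates from, so that the system continues with slightly less memory. A minimal sketch follows, assuming a simple page allocator; the `PagePool` class and its methods are hypothetical and do not describe any existing operating system interface.

```python
class PagePool:
    """Tracks which central memory pages are free, in use, or flawed."""

    def __init__(self, page_count):
        self.free = set(range(page_count))
        self.in_use = set()
        self.flawed = set()

    def allocate(self):
        """Hand out the lowest-numbered free page; flawed pages are never free."""
        if not self.free:
            raise MemoryError("no usable pages left")
        page = min(self.free)
        self.free.remove(page)
        self.in_use.add(page)
        return page

    def release(self, page):
        """Return a page to the free set unless it has been flawed."""
        self.in_use.discard(page)
        if page not in self.flawed:
            self.free.add(page)

    def flaw(self, page):
        """Remove a failing page from service.

        Returns True if the page was in use, in which case the owning job
        must be relocated or restarted by the caller.
        """
        was_in_use = page in self.in_use
        self.free.discard(page)
        self.in_use.discard(page)
        self.flawed.add(page)
        return was_in_use

    def usable_pages(self):
        return len(self.free) + len(self.in_use)
```

The return value of `flaw` separates the two recovery cases: a flawed free page costs only capacity, while a flawed page that was in use also requires job-level recovery such as checkpoint/restart.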


### 3.3.8 Concurrent Maintenance

Before concurrent maintenance of CPU units (memory, pipelines) can become a reality, hardware must be designed such that maintenance actions can occur on degraded units without affecting operating units.

Concurrent maintenance of degraded PDCs interfaced with the CPU from the MCU seems realistic. Concurrent maintenance of critical I/O devices would have the greatest impact but unless the system can tolerate downed devices, this alternative is unlikely.

### 4.0 IMPLEMENTATION GOALS AND STRATEGY

The following goals and strategies will be pursued to improve super computer RAM in the 1980s.

### 4.1 Goals

- For system availability, improvements to MTBI are deemed more effective than improvements to MTTR; therefore, super computers will stress improvements to MTBI.
- MTTR of less than 0.5 hours for all levels of memory.
- MTTR of less than 1.0 hour for CPU (excluding memory) and I/O.
- Achieve a system MTBI of greater than 100 hours.
- Fault isolation.

| Hardware | \% of Single Solid Faults Isolatable | Isolation Level |
| :---: | :---: | :---: |
| Memories | 100\% | Board \& Circuit |
| PDC | 90\% | Board |
| LSI Logic | 50\% | 16 or less Circuits |
| LSI Logic | 90\% | Functional Unit |

- Fault detection, on-line.
  - 99% of solid hardware faults are detectable by the operating system, maintenance software, and hardware self-checking features.
  - 50% of transient faults are detectable by the operating system and hardware self-checking features.
- Fault detection, off-line.
  - 99.5% of solid faults are detectable by maintenance software and hardware self-checking features.
  - 85% of intermittent faults are detectable by maintenance software and hardware self-checking features.
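
As a rough check on the MTBI and MTTR goals above, the conventional steady-state availability ratio MTBI / (MTBI + MTTR) can be evaluated. This formula, the 720-hour (30-day) month, and the function names are assumptions introduced for illustration; the report itself does not state an availability figure.

```python
def availability(mtbi_hours, mttr_hours):
    """Fraction of time the system is up, assuming each interruption costs one MTTR."""
    return mtbi_hours / (mtbi_hours + mttr_hours)


def interruptions_per_month(mtbi_hours, mttr_hours, hours_per_month=720.0):
    """Expected interruptions in a month of continuous operation (30 days assumed)."""
    return hours_per_month / (mtbi_hours + mttr_hours)


if __name__ == "__main__":
    # Goal values (100 h MTBI, 1.0 h MTTR), then MTBI doubled, then MTTR halved.
    for mtbi, mttr in [(100.0, 1.0), (200.0, 1.0), (100.0, 0.5)]:
        print(
            f"MTBI={mtbi:6.1f} h  MTTR={mttr:4.1f} h  "
            f"availability={availability(mtbi, mttr):.4f}  "
            f"interruptions/month={interruptions_per_month(mtbi, mttr):.1f}"
        )
```

In this simple model, doubling MTBI and halving MTTR yield the same availability, but only the MTBI improvement also halves the number of interruptions that users experience, which is consistent with the emphasis the first goal places on MTBI.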


### 4.2 Strategy

The following strategy will be followed to satisfy the previous goals. New development projects will emphasize, but not be limited to these areas, and efforts must be made to improve existing products in these areas.

### 4.2.1 Hardware Strategy

- Develop hardware maintenance features for fast, high-level self-testing of large semiconductor memories.
- Design microcode control such that it is a useful maintenance tool placing emphasis on fault isolation.
- Develop hardware to multiplex CPU registers to the MCU.
- Develop fault detection techniques in CPU and LCN hardware modules and all data paths.
- Configure CPU and LCN hardware modules such that it is possible to remove faulty modules from operation while minimizing impact to the operational system.


### 4.2.2 Operating System Strategy

- Improve error detection and error logging techniques.
- Improve system error recovery techniques, stressing fail-soft recovery with system reconfiguration.

### 4.2.3 Maintenance Software Strategy

- Develop fault isolation diagnostics, using added hardware features.
- Improve on-line maintenance software.
- Develop concurrent maintenance diagnostics.
- Improve remote maintenance capabilities.
- Reduce time required to verify system operation.


### 4.3 Attainability

A high priority effort in the design of new hardware systems (including the FMP) is the provision for integral maintenance "hooks" throughout such machines. Cost and design resource factors have taken this feature into account. The degree to which these facilities will be exploited will, however, depend on the resources committed to sophisticated maintenance strategies by the various software developers. The inclusion of features in the various levels of software (from device driver all the way up to applications program instrumentation) is feasible and within the range of known software techniques. The probability that the NASF will possess the maximum of these capabilities rests solely on management commitment by NASA and its contractors to assigning these facilities a high priority in the implementation program.

## DIVISION 9

INSTALLATION ORGANIZATION/OPERATION


The information in this division of the report contains a recommended data center organization for NASA planners to aid in the formulation of an accurate life-cycle model for the NASF installation. Data presented in this division was gathered from the personnel of Control Data's STAR-100 Data Center in Arden Hills, Minnesota, a data center possessing similar characteristics to the proposed NASF facility.

In setting up an effective new data center a key point is the pursuit of extensive research on the users, types and frequency of jobs, and projected increases in work. Once the user base has been established, as well as whether remote stations will be employed, data center personnel can more accurately plan the installation.

The following information, which is provided to assist in the planning of NASF Operations, is broken down into manpower, supplies, services, and an overall scenario that will attempt to show typical considerations in data center management. It is hoped that this information will enable NASA planners to set up a NASF Data Center with an eye for efficiency, reliability, and stability.

## MANPOWER REQUIREMENTS

Based on information obtained from Control Data large system data center activities, it is recommended that NASA adopt an organization similar to the one shown in figure 1. This suggested organization has four managers controlling the operations, system support, techniques support, and administrative and technical support -- all reporting to the overall NASF center manager.


Figure 1. Suggested NASF Operations Organization

The operations manager is responsible for ensuring the day-to-day flow of customer jobs through the data center in a timely manner. The main duties of the operations manager are:

- to set up operating procedures for efficient data center operation;
- to schedule addition of local modifications or Field Change Orders with customer engineering staff;
- to schedule customer or user support;
- to provide customer service, dealing with customer problems related to operations;
- to monitor daily operations attempting to foresee both hardware and software problems;
- to deal with vendor customer engineers.

Within the operations organization, the manager must provide leadership to the following subordinates.

Shift Leader -- The shift leaders (three in the organization chart) are the most experienced computer operators on the shift. They are usually at higher pay-grade levels than the other operators on the shift and could act as assistant operations manager because of the amount of experience they have.

A good shift leader's experience would allow him/her to be familiar with all aspects of the various jobs run in the center and he/she should have extensive training on the center's equipment.

Computer Operator -- The operators (eight in the organization chart) should be trained on the NASF and be able to complete the tasks related to efficient customer job throughput.

Process Control Clerk -- The process control clerk is responsible for keeping track of the input and output of customers' jobs within the data center. Oftentimes the process control clerk, in addition to scheduling duties, will function as the requisitioner of supplies needed within the data center for its day-to-day operations, i.e., tab cards, line printer paper, office supplies, etc.

Librarian -- The librarian for the data center sets up and maintains a filing system for magnetic tapes, disk files, and card decks that are often used, thus freeing user time normally spent in this function.

Customer Service Analyst -- Two to four analysts should be available to troubleshoot problems that customers may have in getting their jobs successfully completed. The number of analysts employed at the data center will be determined by whether or not several remote sites are used and whether the major part of the customer input will be at the data center itself.

The customer service analyst is the first person in the data center the customer should contact if problems arise. These analysts should be of several grade levels so that more complex problems can be addressed by more senior, experienced analysts.

## The Systems Support Manager

The systems support manager oversees high-level analyst support of the system and, along with his systems analysts, gets the system operational and solves problems the customer may have that go beyond those encountered in operation.

Responsibilities of the systems support manager include:

- building and maintaining the operating system, or systems, used in both the FMP and SPS sections of the system;
- ensuring that Program Trouble Reports (PTRs) have been properly resolved;
- isolating problems in the system so that Customer Engineering or an analyst can correct them;
- providing vendor analyst support if needed.

In order to effectively deal with the analysts reporting to him/her, the systems support manager should be software knowledgeable.

The systems support manager must supervise the activities of:

FMP Analyst -- These analysts should be familiar with the total system but should be most familiar with the FMP. They would deal with customer job problems of an intermediate nature that have been localized in the FMP. They would also build and maintain the FMP operating system and be familiar with the SPS operating system as backup.

SPS Analyst -- These analysts should be familiar with the total system but should be most familiar with the SPS. They would deal with customer job problems of an intermediate nature that have been localized in the SPS. They would also build and maintain the SPS operating system and be familiar with the FMP operating system as backup.

Programming Aide -- The programming aide assists the analysts in the successful completion of their jobs.

## Techniques Support Manager

The techniques support manager should be a high-level analyst fully capable of solving the most complex problems the customer could have. Likewise, his crew of consultants should be top-notch people of the highest level.

The techniques group is responsible for:

- working with customers on setting up their basic software;
- acting in a quality assurance function for customer's software to make it more efficient if necessary;
- providing education so that customers may make efficient use of the system;
- setting up benchmarks for vendors;
- generating an algorithm base.

Consultants -- Reporting to the techniques manager are four consultants who should be experts in at least one area of responsibility of the techniques group, e.g., physics, aerodynamics, structures, meteorology. Generally, the techniques consultants provide direct support to the user groups.

## Administrative, Technical Support Manager

The administrative and technical support manager is basically the business manager for the data center. Functions that fall into this area of responsibility include:

- initial configuration and revision of existing configurations;
- reports and statistics needed for various management reports;
- billing and other transactions with accounting personnel;
- technical support -- primarily for evaluation and purchase of new equipment.

Reporting to this manager are the following personnel:

Technical Consultant (Hardware) -- This consultant must be familiar with all facets of hardware so as to be the hardware resource person for data center management. The ability to configure and reconfigure a wide variety of hardware is important, especially at the time of the initial data center setup. If it is expected that the data center will be static as far as new equipment is concerned, this function may be eliminated. However, if several remote locations are to be tied into the data center, this technical expert perhaps should at least be familiar with telecommunications, which will become increasingly more important as the amount and sophistication of the remote equipment grows. In addition, this consultant should be able to develop both short- and long-range budgets, and deal with the technical aspects of moves in the data center.

Administrator -- The administrator would probably have direct responsibility for ensuring that reports, billing, and other tasks get completed. This person is often the interface between the data center and other supporting departments such as plant maintenance or accounting.

Statistics Clerk -- This person would generate reports based on data furnished by others within the data center.

The job functions previously discussed are not meant to be limited to exactly what is described. The exact functions should be flexible enough so that they overlap. All data center personnel should complement each other to assure a smoothly functioning organization.

The cost of personnel to adequately staff a data center of the size proposed to NASA will be dependent on NASA's pay structures. A recent study of the data processing industry has shown that personnel costs are as high as 50 percent of a data center operating budget.

Increases to personnel wages are constrained by U.S. Government wage guidelines; hardware costs are decreasing; and increasing shortages of good, high-quality people will make accurate forecasting difficult or impossible for the near future.

## DATA CENTER SUPPLIES

Any discussion of supply usage will, of course, depend on the way the data center is used and placed among its users. If it is to be in or near the greatest number of users, more supplies such as printer paper, punched cards, and magnetic tapes will be used. If the center will be separated from the bulk of users, more remote stations will be utilized thus reducing supply expenditures at the center itself.

In general, Control Data recommends stocking an initial amount of magnetic tapes. After the data center is functioning efficiently, CDC's data center staff estimates that a monthly tape replacement of about 10 to 15 tapes would be likely.

Control Data's Mass Storage System (MSS) is recommended for the NASF installation. Use of the MSS results in:

- decreased use of disk space,
- decreased operator time, or possibly decreased operator requirements,
- decreased tape expenditure.

The MSS concept is relatively new and CDC has not yet compiled enough information on its data center utilization to accurately predict cartridge replacement frequency; as with magnetic tape, an initial supply should be stocked.

The other big supplies expenditures would be in line printer paper, printer ribbons, and punch cards. The following chart lists NASF installation projected monthly usage of supplies for both a center without many remote stations and a typical remote station that orders its own stock of supplies.

| Item | Data Center Use/Month | Remote Station Use/Month |
| :---: | :---: | :---: |
| Printer Ribbons | 25 | 7 |
| Printer Paper | 225 boxes | 50 boxes |
| Punch Cards | 120 boxes <br> (5 boxes per case) | 30 boxes <br> (5 boxes per case) |
| Magnetic Tapes | 10 to 15 | N/A |
| Mass Storage Cartridge | Data not available | Data not available |

Another major expenditure for a functioning data center would be capital equipment replacement and purchase. This will include the purchases of new updated equipment to make the data center more efficient as well as replacement of the often-used equipment.

Other projected needs of the users that may be provided by the data center include:

- user communications newsletter
- keypunching, interpreting, etc.
- monitoring of remote sites
- special delivery services to include:
  - personal delivery to the airport
  - packaging for shipment
  - interfacing with shippers
- listing, binding, collating, and bursting
- microfilm/replication services


## DATA CENTER SCENARIO

The planning of the data center staff must include a mix of position levels, as indicated previously, to allow growth from within the organization. Also, a good mix of low-level to high-level technical content positions gives room for personal growth.

One can expect impact on the daily functioning of the data center due to vacation, sickness, tardiness, etc. A backup strategy can be accomplished in one of two ways.

1) Staff "lean-and-thin" which will require overtime by others during times of personnel shortages.
2) Over-staff so that absence of a couple of people will not noticeably impact schedules.

In place of distinct eight-hour shifts, it is possible to define a variety of shifts, allowing the data center to attract computer operators who prefer to work unusual hours. This method also provides better transition of data center operators throughout the day. Refer to figure 2 for an example.


Figure 2. Data Center Operator's Schedules

By use of overlapping shifts, operators are more flexible and the data center can be more efficiently run with less difficulty adjusting for absences. Also, the overlapping of computer operator hours allows for more communications between the operators than does a static eight-hour shift.

Computer operators at such a center should be considered as professional or semiprofessional -- not as blue collar automation. Thus the level of system responsibility and understanding is higher.

Reliability and stability data must be provided via the operating system software. Since this will be an ongoing concern, this should be addressed immediately upon startup of the installation.

The design of the NASF center with regard to storage and office space will have to be considered in terms of NASA policies. CDC recommends an operator's desk be in the center and that one desk be supplied for every two operators on a shift. Care must be taken in designing adequate, well-lighted space for visitors to work on their jobs as well.

Controlled storage space, preferably within the range of the computer room air-conditioning, should also be provided. Supplies, especially paper products, need to be stored in a constant temperature, low humidity environment.

Other considerations for space allocations that CDC believes are important are:

- Limited- or controlled-access storage for cards, tapes, etc.
- Room for several users to lay out their work and use terminals.
- Adequate counter space for several users to discuss jobs with center personnel.
- Keypunches for quick "fixes" of user jobs.
- Offices for the support or customer services analysts that open into the center itself.
- Local terminal capabilities so that users may be able to make minor modifications to their work.

More definitive information on floor layouts and environmental considerations will be found in Division 10.

In addition to the data already presented here, the following points should also be considered by NASA.

1) A hardware calendar clock is recommended by CDC. This is an aid in reducing purged files caused by an operator mistakenly typing in the wrong date. Several large Government Data Centers have this clock installed in their systems.
2) Because of the complexity of the NASF system, short-term, part-time operators are discouraged; the return on the training effort is most often poor. For this reason, any part-time operators who are employed should be long-term ones. Part-time operators do, however, lend a good deal of flexibility to manpower scheduling in the data center.

These points, plus the other information in this division, should enable NASA planners to begin to visualize the structure of the NASF facility. As the NASF plan further materializes, more specific recommendations and estimates can be made by Control Data.

## DIVISION 10

NASF PHYSICAL REQUIREMENTS UPDATE

The report for the first study of this project (ref. 1) presented size, power, and cooling requirements for the NASF. This report provides an update of those requirements; it consists primarily of two tables. Table 1 shows a detailed listing of all the elements that make up the SPS, the disk station, and the MCU, including all the anticipated peripheral equipment. All the equipment shown are products today. Table 2 includes all the separate parts that make up the FMP computer system. The motor-generator sets and the condensing units exist today, but the numbers for the parts of the FMP -- computer bay, Main Memory, Intermediate Memory, and Backing Store -- are today's best estimates since these items have yet to be built.

All the numbers in the tables are subject to change as design proceeds and details of the FMP, as well as the system, become more clearly defined.

Table 1. Standard Product Physical Requirements

| CONTROL DATA CORPORA DOCUMENT NO. RUN DAYF 03/2z/79 cUSTOMER: NASA AME | QATION | tota | 1 Srste | Ems | MACHINE | F UNIT | spfalf | ICATION |  | COMPUTER |  | IIITY P | Landine a |  | NNSTOHCTI | $10 \mathrm{~N}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  |  |  |  |  |  |  |  |  | Unit Cit | rcult nre | EAKER | REDUTREM | HENTS |
|  |  |  |  | -PHYCI | Cal prop | ERTIES | ---- | UNIT |  |  | $T$ | (n2) |  | $r$ |  | c |
| CABINET NAME |  |  |  |  | UNIT | INNIT | UNIT | HEAT |  |  | P | 4 OOHz | 50/60Mz | $\bigcirc$ | 50,form | \% |
| MODULE/MODULES | PRODuct | Sty |  | DEPTH | AREA | HFISHT | WEIGHT | NTSSTP | $\cdots-$-INIT | r KVn--- | W | 2nav.3p | 3nBV.3P | N | $115 \mathrm{~V} \cdot 10$ |  |
|  | Proouch |  |  |  | (SO FT) | (YN.) | (LRS) | (STU/HR) | 40 BHZ | 50/60H7 | - | AMPERES | amperes | $N$ | ambfags |  |
| .CENT Computer bay |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| cent computer bay | ${ }_{2}^{175-116}$ | 2 | 93.70 | 35.00 | 45.55 | 79.80 | 4500 | 3660 | 10.8 | - | A | 50 |  |  |  | , |
| CONSOLE | 175-116 | 2 | 101.90 | 35.00 | 49.53 | 79.80 | 36no | 5860 | 17.3 | - |  | 70 |  |  |  |  |
| CONDENSING UNIT | 175-116 | 2 | 32.50 | 47.00 | 21.22 | 48.50 | 390 | $303 n$ | n.R | 0.3 |  | 15 |  |  | 15 (178) |  |
| mag tapf Controa | 175-116 | 2 | 72.00 | 26.00 | 26.00 | 48.00 | 14 AO | 17400n(13) | - | 14.4 |  |  | 70 (0') | (02) | : |  |
| REMOTE PROCESSOR C | $7021-32$ | 1 | 29.30 | 30.00 | 6.10 | 60.00 | ? 50 | 2750 | 0.6 | $0 \cdot ?$ |  | 15 |  |  | 15(0) | (n) |
|  | 7021-32 | 1 | 29.30 | 30.00 | G. 10 | 60.00 | 750 | 2750 |  |  |  |  |  |  |  |  |
| mag tapf transport |  |  |  |  |  | no.no | 78 | 2750 | 0.6 | 0.2 |  | 15 |  |  | 15(118) | (19) |
| mag tapf thansport | 677-4 | 2 | 30.50 | 30.00 | 12.71 | 63.50 | 900 | 7980 | - | 2.9 |  |  | 150.) | (03) | : |  |
| FXTENDED CORE GYORA | $679-7$ | 2 | 30.50 | 30.00 | 12.71 | 43.50 | 900 | 7920 | - | 7.9 |  |  | 15(0\%) | (03) | : |  |
| PERIPHFRAL CONTPOL | LFR P CAB |  | - |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | 7030-104 | 1 | 42.00 | 20.50 | 5.9A | 56.90 | 675 | 3 AO | n.9 | 0.4 |  | 15 |  |  |  |  |
| Storage cabiney : |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 15 | (n) |
| Dop Controller | 7030-1046 | 1 | 70.00 | 40.80 | 19.83 | 72.50 | 180n | 344003121 | 7.? | 3.6 |  | 30 | 20 | (02) |  |  |
| POECC CONFIG. | 7030-1048 | 1 | 29.00 | 25.00 | 5.03 | 6fi. 00 | 500 | 1400 |  |  |  |  |  |  |  |  |
| CHANNE CONVERTFR (30nB) |  |  | , | ?.¢. | 5.03 | AR.00 | ano | 1400 | 0.3 | n. 1 |  | 15 |  |  | 15 | (n) |
| ChANNEL. CONVFRTFR |  | 4 | - | - | - | - | - | - | - | - | A |  |  |  |  |  |
| (30n) ${ }^{\text {(3) }}$ | 10315:2 | 4 | - | - | - | - | - | - |  |  | ค |  |  |  |  |  |
| transffe Sw cont | - |  |  |  |  |  | - |  | - | - | Q |  |  |  |  |  |
| TRANSFFP SWITCH: | 32704 | 1 | 22.90 | 20.50 | 3.26 | 56.90 | 450 | 2000 | - | $n .6$ | A |  |  |  | 15 | (n) |
| In 327na/b | AP71-2 | 2 | - | - |  |  |  |  |  |  |  |  |  |  |  |  |
| NETWORK PRECESSER |  | 2 | - | - | - | - | - | - | - | 0.6 | 9 |  | Pup Fom | 2270 |  |  |
| card pinen | 2551-1 | 2 | 24.00 | 34.50 | 11.50 | 75.00 | 700 | 6490 | - | 1.7 |  |  |  |  | ? | (72) |
| 250 CDM | 415 | 1 | 21.50 | 39.50 |  | 45.00 | 55.8 | 3000 |  |  |  |  |  |  |  |  |
| PERIPHFRAL CONTROL. |  |  |  |  |  | 45.00 |  |  | - | 1.1 |  |  |  |  | 15 | (07) |
| CARD READER/CONYRI | 3446-? <br> LIER | 1 | 42.00 | $20.5 n$ | 5.98 | 56.90 | 6F0 | 2700 | 0.7 | 0.1 | 9 | 15 |  |  | 15 | (n) |
| 1200 rPM | 405 | 2 | 57.00 | 33.00 | 26.17 | 46.10 |  |  |  |  |  |  |  |  |  |  |
| CARD READER COnt |  |  | s7.00 | 33.0 | 26.17 | 46.00 | 1200 | 9000 | - | 3.4 | n |  | 15 | 103) |  |  |
| 1 N 405 | 3447-2 | $?$ | - | - | - | - | - | - | - | - | R |  |  |  |  |  |
| line Printer/cont |  |  |  |  |  |  |  |  |  |  |  |  | P., 19 fo m | 405 |  |  |
| 2000 IPM | 580-200 | 4 | 62.00 | 31.50 | 54.25 | 51.50 | 1500 | 1500 n | - | $5 . ?$ | ค |  | 70 (0.) | (02) |  |  |
| MOTOR GENERATOR an kVa |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| an kVa (0l) | 65045-1kC | 1 | 32.00 | 37.00 | 7.11 | 60.30 | $3 \times 75$ | $47 \mathrm{AOn}(11)$ | - | - |  |  | ivonutur | CONN | in wh mod | jut |

Table 1. Standard Product Physical Requirements (Continued)


| CABINET NAMF MOBILEAMODULES PRODUCT | nTy | InsIT WIDIH (IN.) | $\begin{aligned} & \text {-PHYG } \\ & \text { INIIT } \\ & \text { DFPKH } \\ & \text { (IN.) } \end{aligned}$ | CAL. PRO UNIT ARFA (50 FT) | $\begin{aligned} & \text { PERTIFS } \\ & \text { UNIT } \\ & \text { HFIGHT } \\ & \text { ITN.) } \end{aligned}$ | $\begin{aligned} & \text { UNTY } \\ & \text { WETGH } \\ & \text { (LAG) } \end{aligned}$ | $\begin{gathered} \text { UNIT } \\ \text { HFAT } \\ \text { ПISSIP } \\ \text { QTIS/HR } \end{gathered}$ |  |  | JNIT C <br> $T$ (02) <br> P. 4 nOH 7 <br> w enobv,3p <br> R AMPERE |  | $\begin{aligned} & \text { IRCUIT MR } \\ & \text { 5n/6niz } \\ & \text { PARV, PP } \\ & \text { SMPEDFS } \end{aligned}$ | $\begin{array}{cc} \text { REAKER } \\ & C \\ Z & 0 \\ 0 & 1 \\ \text { S } & N \end{array}$ | REOUTRENEVTE |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  |  |  | 1151.10 ampFRE | $\begin{array}{ll} \mathrm{O} & \mathrm{~N} \\ \mathrm{~S} \end{array}$ |  |  |
| mg contanl cabinet |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 65045-1k\% | 1 | 56.00 | 23.00 | R. $0_{4}$ | 74.00 | 1050 | - (11) | - | - |  |  |  |  | SITF FnR | 175 H | 0/03k 4 M |  |
| TERMINATOR DOWFR (WALL MOUNT) |  |  |  |  |  |  |  |  |  |  |  |  |  | Stres | 17 | ,o3k mot |  |
| DEW POINT RECORT (WALL MOUNT) | 1 | 24.00 | 6.an | 1.13 | 12.00 | =n | - | - | - |  |  |  | 15 |  |  |  |  |
| 53370000 | 1 | 15.50 | 7.50 | 0.81 | 36.00 | an | - | - | - |  |  |  |  |  |  |
| TEMP MAN/PWR COHT (WaLL MOIJNT) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | 1 | 75.00 | $7.0 n$ | . 1.22 | 16.00 | 40 | - | $\cdots$ | - |  |  | PRYMADV | Pup $G$ | D no EM. | OfF |
| 53369800 | 1 | >150f | 7.90 | 1.15 | 12.0n | 35 | - | - | - |  |  | 151 |  |  |  |
| Centrar processin ia-zo | 1 | 61.00 |  |  |  |  |  |  |  |  |  |  |  |  |  |
| MOS MAIN MEYORY | 1 | 6,.00 | 31.00 | 13.37 | 79.00 | 475 | 782n | - | 1.1 |  |  |  |  | 15 | (n4) |
| storagf orive jmprf | ? | - | - | - | - | - | - | - | - |  |  | 14CP4 |  |  |  |
| CARD RALINF PR CONT ${ }^{1833-3}$ | 1 | - | - | - | - | - | - | - | - |  |  |  |  |  |  |
| COMM t.ine adapifr ${ }^{\text {1828-1 }}$ | 1 | $\square$ | - | - | - | - | - | - | - |  |  | Iv repil |  |  | - |
|  |  |  |  |  |  |  |  |  |  |  |  | , |  |  |  |
| MAG TADF Sursya 1R43-1 | 1 | - | - | - | - | - | - | - | - |  |  | IV CP, |  |  |  |
| mag tapf transport | 1 | 22.50 | 32.00 | 5.00 | 68.00 | 425 | 1640 | - | $n .6$ |  |  |  |  | 15 | (n4) |
| NStaliation Kit | 1 | 19.00 | $18.0 n$ | 2.39 | 24.00 | 325 | 1640 | - | 0.6 |  |  |  |  |  | (1)3) |
| MAG TAPF TRANSPORT ${ }^{\text {la60-201 }}$ | 1 | 22.50 | 79.50 | 4.61 | 68.00 | 3 no | - | - | $\cdots$ |  |  |  |  | 151051 | $(04)$ |
|  |  |  |  |  |  |  |  |  |  |  | - |  |  | 15(0) |  |
| mag tape transport ${ }^{\text {377-4 }}$ | 1 | 30.50 | 30.80 | 6.35 | 67.50 | 900 | 7920 | $\geq$ | P.9. |  |  | 1510.1 | 1031 | : |  |
| CARD RFADER | 1 | 30.50 | 30.00 | 6.35 | 63.50 | 900 | 7920 | - | 2.9 |  |  | 15(0) | (07) | ; |  |
| INE PRINTER IR29-60 | 1 | 14,20 | 19.00 | 1.87 | 16.50 | 55 | 1319 | - | 0.5 |  |  |  |  | 15(08) | (04) |
| DISPLAY TERMINA, 1A27-60 | 1 | 34:00 | 26.50 | 6.26 | 44.50 | 300 | 3275 | - | 1.2 |  | - |  |  | 35(n6) |  |
|  | 1 | 21.60 | 20.40 | 3.06 |  |  |  |  |  |  |  |  |  |  |  |
| disk stopage unit lall-z |  |  | 20.40 | 3.inn | 15.20 | a | 430 | - | 0.1 |  |  |  |  | 15 | ( $\mathrm{n}_{4}$ ) |
| MASS STORAGE CONT | 16 | 77.00 | 45.00 | 135.00 | 45.00 | ano | 5120 | - | 1.5 |  |  | 70 | 1031 |  |  |
| STORAGF MODULE ORTVE ${ }^{\text {769-PC }}$ | 4 | 29.00 | 25.00 | 20.14 | 66.00 | 450 | 2840 | 0.5 | 0.7 |  | 15 |  |  | 15(08) | (n>) |
| storagr drive cont | 2 | 22.00 | 36.00 | 11.00 | 36.20 | 218 | ? 355 | - | 0.9 |  |  |  |  | 15 | (94) |
| IN FIRST DRIVF 1833-3 | 1 | 17.70 | 24.00 | 2.95 | 12.00 | 91 | 1030(11) |  | 0.3 |  |  |  |  |  |  |
| NON-NUMERIC DATA |  |  |  |  |  |  | 103017 | - | 1.3 |  |  |  |  | 15 | (04) |
|  | 4 | 33.30 | 42.00 | 38.85 | 44.40 | 10 AO | - | 1.6 | 2.2 |  | 15 | 15(0) | (10) | ( 18 ) |  |


Table 1. Standard Product Physical Requirements (Continued)
NOTES:

- ALL SPECIFICATIONS ARE ON A PER UNIT BASIS WITH EXCEPTION OF UNIT AREA (REFLECTS TOTAL UNIT AREA)
- (A) INTERNAL TERMINATOR POWER SUPPLY
- (A) EXTERNAL TERMINATOR POWER SUPPLY
- (01) MAXIMUM CAPACITY SHOWN. ACTUAL LOAD MAY BE LESS.
- (02) TERMINAL STRIP.
- (03) LOCKING CONNECTOR.
- (04) STANDARD 60HZ PLUGS.
- (05) 380V - 3PH. FOR 50HZ VERSION.
- (06) 220V - 1PH. FOR 50HZ VERSION.
- (07) 380/415V - 3 PHASE + N FOR 50HZ OPERATION
- (08) 220/240V - 1 PHASE 50HZ OPERATION
- (09) RACK MOUNTABLE EQUIPMENT
- (11) INDICATED EQUIPMENT IS NOT INCLUDED IN THE SUMMARY TOTALS FOR AREA, HEAT AND POWER
- (12) WATER COOLED - 90 PERCENT OF HEAT REJECTED TO WATER. 10 PERCENT OF HEAT REJECTED TO ROOM.



- rNLFT YEMt * FLDA PATE HFA: LOSTA
- REGF DEG CA GPM L/MTNA PSI KPA 4
* An 27 \# $4 . ? 16.7 \$ 5.7$ 30
* 70 21 $30.0111 .5 * ? .5$ 12

HEAN LDSS DROP FIR ODRN COMDFASERD ONI Y. MIVIMUM NPERATING PRFCSURF RIFFFFFVIIH aT UMIT wATED COVMFCT IS In DGT/AMKD. MAXIMUM NATFD PRFGGIJOE 100 PGT/AOt KDN.
(13) WATER CNOLFD-16100n GTH/HD (471 OWW) RE IFCTED TO WATER,

* inlat temp a Flon RATF \# hFAT LOQ
* NFGF DEA CA TPM L/MFNA PC! KPA

10 TOM

- 80 27 \#12.0 46.0 0.565
* $70 . \begin{array}{lllll}27 & 21 & 12.0 & 45.0 & 0.3 \\ 35.3 & 0.3 & 65 \\ 47\end{array}$


| $\#$ | $7 n$ | 14 | $\# 7.6$ | 28.4 | 4.3 | $7 n$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| \# | 0 | 10 | 4 | 4.2 | 73.5 | 4 |

OREN CONDFMED ONH
M OPERATIN. PRFGGURE DIFFEDFNTI:
 MAXIMIJM WATER PRESSIIRF IAO PCI/A4. K2A.

Table 2. FMP Physical Requirements

| Qty | Cabinet Name | Unit Width, inches (cm) | Unit Depth, inches (cm) | Unit Height, inches (cm) | Unit, 60 Hz | Unit Dissipated Heat, BTU/hr (kcal/hr) | Notes |
| :---: | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | FMP Computer Bay | 146 (370) | 102 (259) | 76 (193) |  | 331K (83K) | (1) |
| 1 | FMP Main Memory | 111 (282) | 63 (160) | 76 (193) |  | 576K (145K) | (1) |
| 1 | FMP Intermediate Memory -- 32M words (65k DRAM) | 200 (508) | 130 (330) | 76 (193) |  | 140K (35K) | (1) |
| 2 | FMP Backing Store 65M words 256k CCD | 65 (165) | 29 (74) | 76 (193) |  | 26.8K (6.8K) | (1) |
| 22 | Prog. Device Controller | 18 (45) | 13 (33) | 18 (46) |  | 1.2K (0.3K) |  |
| 5 | Condensing Unit 30 ton | 90 (229) | 34 (86) | 48 (122) | 40 | 290K (73K) | (2,3) |
| 4 | Condensing Unit 5 ton |  |  |  | 7 | 55K (13K) | (2,3) |
| 3 | Motor-Gen 250 KVA | 93 (236) | 41 (104) | 43 (109) | 160 | 133K (34K) |  |
| 3 | MG Control | 56 (142) | 20 (51) | 78 (198) |  |  |  |

Notes for Table 2

1) The Freon-cooled units all dissipate about 10-15% of their total power into the room ambient. The balance is dissipated into the Freon refrigerant. The portion dissipated into the Freon is included in the total for the condensing units.
2) The condensing units dissipate about 5% to the ambient and the balance to water. The total heat dissipated includes internal losses as well as heat taken from the Freon system.
3) The condensing units are normally in a separate room. The maximum length of the refrigerant lines is 100 ft (30 m).

- Total input power of approximately 1000 kVA.
- Total floor space requirements of approximately:
  - 6250 square feet for Computer Room.
  - 450 square feet for Compressor Room.
  - 400 square feet for Motor-Generator Room.
- Total heat dissipation of approximately:
  - 715,000 BTU/hr (180,000 kcal/hr) to Computer Room ambient air.
  - 115,000 BTU/hr (29,000 kcal/hr) to Compressor Room ambient air.
  - 445,000 BTU/hr (112,000 kcal/hr) to Motor-Generator Room ambient air.
  - 1,775,000 BTU/hr (447,000 kcal/hr) to water.
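The BTU/hr and kcal/hr figures in the heat-dissipation summary can be cross-checked with a short calculation. The following is a minimal sketch, not part of the original study; the conversion factor (about 0.252 kcal per BTU) is a standard value supplied here, and the function and dictionary names are illustrative.

```python
# Standard conversion factor (not taken from the report): 1 BTU ~ 0.251996 kcal.
KCAL_PER_BTU = 0.251996

# Heat-dissipation totals quoted in the summary list, in BTU/hr.
HEAT_BTU_PER_HR = {
    "Computer Room ambient air": 715_000,
    "Compressor Room ambient air": 115_000,
    "Motor-Generator Room ambient air": 445_000,
    "water": 1_775_000,
}


def to_kcal_per_hr(btu_per_hr: float) -> int:
    """Convert BTU/hr to kcal/hr, rounded to the nearest 1000 as in the summary."""
    return int(round(btu_per_hr * KCAL_PER_BTU, -3))


def total_btu_per_hr() -> int:
    """Sum of all quoted heat loads (room air plus water), in BTU/hr."""
    return sum(HEAT_BTU_PER_HR.values())


if __name__ == "__main__":
    for sink, btu in HEAT_BTU_PER_HR.items():
        print(f"{sink}: {btu:,} BTU/hr = {to_kcal_per_hr(btu):,} kcal/hr")
    print(f"total: {total_btu_per_hr():,} BTU/hr")
```

Rounded to the nearest thousand, each converted value agrees with the kcal/hr figure quoted in the list.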


# DIVISION 11 SYSTEM SIMULATOR

## SUMMARY AND RESULTS

### 1.0 DEFINITION OF SYSTEM TO BE SIMULATED

The Numerical Aerodynamic Simulation Facility (NASF) as studied by CDC is a set of five (5) stations tied together by a communications trunk system called a Loosely Coupled Network (LCN). Figure 1 shows the generalized system.


Figure 1. Generalized NASF

One constraint has guided the design of the proposed NASF system: to meet the user's computational need, a computing engine (FMP) should be dedicated to executing flow code at a sustained rate of 1 billion floating-point operations per second (1 gigaflop). It follows that significant portions of the problem need to be carried out elsewhere. Typical job runs have been dissected to find basic functions that lend themselves to modularization. With the job types so specialized and similar, such modularization can result directly in corresponding modular hardware. Jobs may now run "in parallel": when a job occupies a given module, say the FMP, the other modules are busy working on other jobs (i.e., setup and post-processing tasks). With proper balancing of module tasks, FMP execution can "hide" all other station functions. If this is true, the FMP would be 100% utilized, and job throughput would equal FMP throughput (neglecting the system's startup to steady state).
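The "hiding" argument can be illustrated with a small pipeline calculation. This is a minimal sketch under stated assumptions, not the simulator described later in this division: each job passes through three stages (setup, FMP execution, post-processing), each stage handles one job at a time, buffers between stages are unlimited, and all stage times are hypothetical constants.

```python
def pipeline_stats(setup, fmp, post, n_jobs):
    """Return (FMP utilization, makespan) for n_jobs identical jobs.

    setup, fmp, post are per-job stage times in arbitrary units (hypothetical).
    Utilization is measured from the first FMP start to the last FMP finish,
    i.e. neglecting startup, as the text does.
    """
    setup_done = fmp_done = post_done = 0.0
    first_fmp_start = None
    for _ in range(n_jobs):
        setup_done += setup                    # setup station works back to back
        fmp_start = max(setup_done, fmp_done)  # FMP waits for input or for itself
        if first_fmp_start is None:
            first_fmp_start = fmp_start
        fmp_done = fmp_start + fmp
        post_done = max(fmp_done, post_done) + post
    utilization = (n_jobs * fmp) / (fmp_done - first_fmp_start)
    return utilization, post_done


if __name__ == "__main__":
    # Balanced case: the other stages are no slower than the FMP.
    print(pipeline_stats(setup=40, fmp=60, post=50, n_jobs=10))
    # Unbalanced case: setup is the bottleneck, so the FMP idles between jobs.
    print(pipeline_stats(setup=90, fmp=60, post=50, n_jobs=10))
```

When neither setup nor post-processing takes longer than FMP execution, the FMP never waits once the first job arrives. When setup is slower, the FMP idles and throughput is set by the slower station instead.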

Now in addition to improved throughput, the distributed module concept allows designers to make some or all of the stations particularly useful entities on their own. In other words, aside from increasing processing efficiency and bandwidth, modules can provide storage capability, workstation processing, and/or significant CPU power, etc.

Though these principles were formed on the assumption of a specialized job, the hardware modules can match an apparently general job. For example, the following run

a) job setup
   1) input setup
   2) code compilation
   3) pre-processing manipulation of data to generalized coordinate space
   4) transfer load module to computing engine
b) execution of code and storage of memory snapshots
c) post-execution functions
   5) store temporary results
   6) store long term results
   7) process result files for analysis
   8) produce graphical files

can be satisfied by a very versatile set of stations.

These hardware modules have been assigned the following rather canonical functions:
a) The SPS is the system controlling processor. It interfaces with the user community as a timesharing device with a powerful processing unit, and sets up and post-processes jobs for the FMP. It is expected that such a device will require the capability of between one and two times that of a CDC 7600.
b) The DSK serves as the data transfer pipe between stations. These disks serve as temporary storage and as a large buffer for the FMP engine.
c) The FMP is the computation engine for flow code or other lengthy sets of calculations that the SPS cannot reasonably handle. Code compilation and data manipulation may also occur here.
d) The GRF is the display station for user interaction. Both graphical input and output are manipulated here. Limited processing power is available locally, plus a tie to the system network.

The actual hardware contents of these modules depend upon many things: job sizes, turnaround needs, storage requirements, etc. Simulation then becomes a powerful design tool: a workload is outlined from a job scenario, and the parameterized system model then simulates the job. Simulation results may then be used to determine improvements to the hardware or software. Ideally, such a process converges to a reliable design.

### 1.1 Stations

A station, or device class, is a set of hardware (processors, disks, memory, etc.) which acts as a functionally unique unit within the system. The classes are:

1) Support Processor System (SPS)
2) Flow Model Processor (FMP)
3) Remote Storage Disks (DSK)
4) High Speed Graphics (GRF)
5) Maintenance Control Unit (MCU)

Details of the stations as presently proposed follow.
Note: The MCU is not included in the simulation, and will not be referenced again in this simulator discussion.

### 1.1.1 The SPS

The SPS represents the general purpose manager of the system. Composed of two CDC CYBER 175/116 computers and supporting an extended set of peripheral devices (local disks, communication controllers, mass storage system, etc.), it controls the flow of jobs among all stations. Basic responsibilities assigned to the SPS are:

1) Control and execute NASF operating system.
2) Compile FMP code.
3) Massage grid and configuration data for the FMP.
4) Process result files for display output.
5) Supply intermediate and long term storage.
6) Control all LCN file transmissions.
7) Timeshare local users.
8) Interface remote users to NASF.

One CDC CYBER device acts as the leader, holding and managing the mass storage directories. Otherwise each mainframe acts as a stand-alone processor working on its private job load. The CDC CYBER devices must contend for shared local hardware. Specifications for the proposed SPS are summarized below. Note that word sizes (in the SPS) are 60 bits.

1) Two mainframe processors, each with 262 K words of central memory, 20 PPUs, 24 I/O channels, and 3-MFLOP sustained computation rate.
2) Four FMD disks--0.5 billion words total, 6-Mbit/sec transfer rate sustained. Each disk cabinet contains two disk spindles.
3) ECS--0.5 million words.
4) MSS--8 cartridge storage units, each with two cartridge tape transports, holding a total of 16 billion words active; cartridges removable; access time in seconds; 16-Mbit/sec transfer rate.
5) 12-Mbit/sec effective channel rate to LCN.

The main responsibility of the SPS station is to keep the FMP well fed with new jobs. All computations done in the SPS, e.g., code compilation, coordinate space transformations (grid and patch data), and post-processing, are done with the intent that the FMP need not be weighed down with extra work. In addition, each job can spend a large part of its processing or active life "near" the user. This is important for the interactive user both during code and input debugging and during post-processing analysis.

### 1.1.2 The FMP

The huge computational load of the fluid dynamics problem (or other numerical tasks which lend themselves to vectorization) is assigned to the FMP. With an estimated computation rate of 1 gigaflop sustained, keeping the FMP waiting for other stations is expensive. Up to 14 I/O channels (each 50 Mbit/sec) may be connected to the LCN. The FMP acts as a slave to the SPS and has no system responsibilities other than:

1) receiving files from DSK,
2) job computation,
3) sending results back to the DSK station.

The FMP might also have the capability to compile source code and do complicated pre-flow-code processing tasks. For example, some grid generation computations are too lengthy to assign to the SPS. In fact, simulation may show that the FMP has time for considerable non-flow-code processing, if the SPS is too heavily loaded.

### 1.1.3 The DSK

The DSK acts as a storage buffer available to all system stations. The sixteen 819-21 disks presently proposed provide 1 billion words of storage, or a total of about one day's worth of input/output for the FMP. Thus as a relatively large, fast access, fast transfer storage station, the DSK's responsibilities are to:

1) act as a backup queue to the FMP;
2) hold snapshots of intermediate solutions of large problems;
3) hold runs for possible same day restart;
4) transfer files to/from all other stations;
5) relieve other stations of current jobs' data bases.
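For a sense of scale, the quoted DSK capacity can be restated per disk and in transfer blocks. This is a rough sketch only: it treats "1 billion words" as exactly 10^9, and it borrows the 32,768-word block size that the report gives elsewhere for FMP Backing Store blocks. Whether DSK files are actually allocated in whole blocks of that size is an assumption made here for illustration.

```python
DISKS = 16                   # sixteen 819-21 disks, as proposed
TOTAL_WORDS = 1_000_000_000  # "1 billion words", taken as exactly 10**9 (assumption)
WORDS_PER_BLOCK = 32_768     # block size quoted for FMP Backing Store blocks


def words_per_disk(total_words=TOTAL_WORDS, disks=DISKS):
    """Average capacity of one disk, in words."""
    return total_words // disks


def whole_blocks(words, words_per_block=WORDS_PER_BLOCK):
    """Number of complete blocks that fit in the given number of words."""
    return words // words_per_block


if __name__ == "__main__":
    print("words per disk:", words_per_disk())
    print("blocks per disk:", whole_blocks(words_per_disk()))
    print("blocks in whole station:", whole_blocks(TOTAL_WORDS))
```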

The 819 disks are assigned to a number of dual-headed disk controllers (two channels per controller). Each half of a controller acts independently (with interlocks) allowing for increased bandwidth.

### 1.1.4 The GRF

The GRF station allows visual setup and solution assessment at a sophisticated graphics terminal. These terminals are tied into individual minicomputers with processing capability. Several of these terminals are concentrated together by a higher bandwidth computer. Each concentrator is a GRF device with one channel interfacing the LCN. The GRF responsibilities are to

1) allow user construction of configurations (aeronautical test vehicles or parts thereof);
2) allow user to specify and construct flow fields;
3) analyze post-execution graphics frames;
4) be used interactively in the system;
5) transfer data to/from SPS and DSK.

Because this class of computer is much smaller than that of the FMP or SPS, data is sent in quarter-blocks, which use less of its central memory, instead of full 32K-word blocks (64-bit words). This station is intended mainly for the interactive aspect of the aerodynamics input preparation/result analysis.

### 1.2 The Loosely Coupled Network

The Loosely Coupled Network (LCN) is a communications scheme which allows both large data bandpass and versatile station-to-station connectability. The LCN centers on a number of bit-serial data trunks, each with 50 Mbits/sec transfer capability. Programmable Device Controllers (PDCs) communicate with each other via the trunk system. Each PDC can connect with up to four trunks (each connection to a trunk by a PDC is called a drop). Each PDC connects to one device (via the PDC's backend). In this way a geometry configuration can be set up to suit nearly any designer's system philosophy. Figure 2 summarizes the hardware involved.

The responsibility of the LCN is straightforward, i.e., to handle all the communications load within NASF. For the job load involved, this is a major task. The usage scenario dictates typical runs of very large data base jobs originating away from the computational engine (the FMP). In many cases significant portions of a job are passed between the GRF and SPS for setup or post-execution analysis. Long-term storage also resides far from the FMP in keeping with the FMP philosophy. Thus, to meet turnaround requirements and to keep the FMP "well fed", the LCN must be somewhat robust. For a detailed description of the LCN hardware please refer to Division 4 of Volume II.


NOTE:

ONE TO FOUR DROPS PER PDC
A MAXIMUM OF 16 DROPS PER TRUNK

Figure 2. NASF Loosely Coupled Network
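The two connection limits noted with Figure 2 (one to four drops per PDC, at most 16 drops per trunk) can be checked mechanically for any candidate connection geometry. The sketch below is illustrative only: the dictionary layout and the function names are assumptions introduced here, not structures from the CDC design.

```python
from collections import Counter

MAX_DROPS_PER_PDC = 4     # "one to four drops per PDC"
MAX_DROPS_PER_TRUNK = 16  # "a maximum of 16 drops per trunk"


def check_topology(pdc_trunks):
    """Return a list of limit violations for a proposed LCN geometry.

    pdc_trunks maps a PDC name to the list of trunk ids it has drops on
    (a hypothetical representation chosen for this sketch).
    """
    problems = []
    drops_per_trunk = Counter()
    for pdc, trunks in pdc_trunks.items():
        if not 1 <= len(trunks) <= MAX_DROPS_PER_PDC:
            problems.append(f"{pdc}: {len(trunks)} drops, allowed 1 to {MAX_DROPS_PER_PDC}")
        drops_per_trunk.update(trunks)
    for trunk, count in sorted(drops_per_trunk.items()):
        if count > MAX_DROPS_PER_TRUNK:
            problems.append(f"trunk {trunk}: {count} drops, allowed {MAX_DROPS_PER_TRUNK}")
    return problems


def shared_trunks(pdc_trunks, a, b):
    """Trunks over which PDC a and PDC b can reach each other directly."""
    return sorted(set(pdc_trunks[a]) & set(pdc_trunks[b]))
```

A geometry in which two PDCs share more than one trunk keeps a path between them after a single trunk failure, which is the kind of redundancy the reliability groundrule calls for.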

### 1.3 System Philosophy, Principles, and Groundrules

The NASF is a distributed system with major functional capability in 4 discrete stations. The basic principle of using this design effectively is to perform parallel operations in these stations, i.e., simultaneous use of large amounts of hardware.

Processing may be done simultaneously in the FMP, SPS, and numerous stand-alone workstations. Parallel data transfers can potentially include

- FMP $\rightarrow$ DSK
- SPS $\rightarrow$ DSK
- GRF $\rightarrow$ SPS
- GRF $\rightarrow$ DSK

such that from 1 to 12 (if the LCN has 12 trunks) transfers can be moving on the LCN at the same time; six transfers at a time, for an effective bandpass of about $1.2 \times 10^{8}$ bits/sec, would not be unusual during peak demand. It should be noted that data is organized into large blocks, matching the block structure of the FMP Backing Store. Presently, such blocks are envisioned to have 32,768 64-bit words (plus SECDED).
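The block size and rates quoted above imply a per-block transfer time, worked out in the sketch below. It is a back-of-envelope calculation only: SECDED bits and protocol overhead are ignored, and the per-transfer effective rate is simply the quoted peak bandpass divided evenly among six transfers, which is an assumption made here rather than a figure from the report.

```python
WORDS_PER_BLOCK = 32_768   # words per block, as quoted
BITS_PER_WORD = 64         # data bits only; SECDED bits are ignored here
TRUNK_RATE_BPS = 50e6      # raw trunk rate, 50 Mbits/sec
PEAK_EFFECTIVE_BPS = 1.2e8 # effective bandpass quoted for six concurrent transfers
PEAK_TRANSFERS = 6

# Assumed even split of the effective bandpass across the six transfers.
PER_TRANSFER_BPS = PEAK_EFFECTIVE_BPS / PEAK_TRANSFERS


def block_bits():
    """Data bits in one block."""
    return WORDS_PER_BLOCK * BITS_PER_WORD


def transfer_seconds(n_blocks, rate_bps):
    """Time to move n_blocks at the given bit rate, ignoring overhead."""
    return n_blocks * block_bits() / rate_bps


if __name__ == "__main__":
    print("bits per block:", block_bits())
    print("one block at raw trunk rate (s):", transfer_seconds(1, TRUNK_RATE_BPS))
    print("one block at assumed effective rate (s):", transfer_seconds(1, PER_TRANSFER_BPS))
```

One block is about 2.1 million bits, so it occupies a trunk for roughly 42 milliseconds at the raw rate and about 105 milliseconds at the assumed effective rate.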

A simple groundrule which allows this effectiveness is the lack of a central supervisor or controller on whose word everything awaits. Instead, each station governs its own action via its private "operating system". There is no need for one station to know what is happening in any other station. Data transfers between stations are preceded by short request messages. The FMP relies on messages from the SPS as to file locations and destinations but the operations are asynchronous and independent. Therefore devices, designed to be independent, see only their respective PDC channels and the messages which go through them.

System control, which resides in the leading SPS device, assigns and schedules jobs as they enter the NASF. After this point the job flows from one station to the next when the job is ready to proceed and the receiver permits it. The control is clearly not centralized. A system such as this, with much concurrency and capability, can still be overloaded by huge data base problems all crashing in the FMP, or by mean loads whose data transfer characteristics change suddenly. In principle the system can be configured to handle nearly any workload as long as the usage is reasonably understood and generally predictable. In practice it would be desirable to have a workload with an assortment of data transfer and processing time characteristics spread out uniformly through the day (i.e., the prime shift). In any case, the more accurate the usage model, the more appropriate the resulting system configuration is likely to be; and the more simulation at different workloads, the more reliable the system design.

System reliability against breakdown is the last basic groundrule for the NASF, though not the least important. With significant amounts of hardware in each station and the vitality of each to the whole system, hardware redundancy is required. Except for the FMP, each station necessary for system operation has redundant devices, e.g., 2 CDC CYBER 175s, 4 disk controllers, etc. LCN breakdown must not be fatal to the system. Redundancy of device-to-device connections through the LCN can easily be three deep. The connection geometry must be designed to cover any single hardware failure (device, trunk, PDC, drop), and will, in fact, usually cover several multiple-breakdown conditions.

The loss of one support processor, 2 disks, and a PDC, for example, should not exclude the use of a healthy FMP. A good system design must continue to service the users, though perhaps at a diminished effectiveness, under such conditions.

### 2.0 SIMULATOR CHARACTERISTICS

The system simulators (SYS82 and SYS83) are written in a traffic-oriented simulation language, General Purpose Simulation System V (GPSS), in which transactions are serviced by the functional blocks through which they pass. Statistics may travel with the individual transactions; they may be compiled within each block; and/or they may apply to the whole system, varying only with the flow of time. GPSS is appropriate here because 1) the programs can be developed to nearly any desired degree of detail and 2) the language is a standard simulation tool that can be used by Ames or nearly anyone with a CDC 6600 class computer. Simulation is non-continuous at time resolutions near 1 clock period or less. For system simulation, a clock unit of 10 microseconds is used. This simulation should appear very smooth and continuous to the user, accustomed to experiencing the passage of time in seconds. Details of how to use the simulators are found in Volume IV, Division 1 and 2 (Simulator User Manuals); examples of using them as an analysis tool are shown in sections 3 and 4 below.

### 2.1 Code Structure

### 2.1.1 Central Theme - LCN Mechanics

The GPSS source code is centered on the operations of the LCN. The code is generalized so that the number of trunks, PDCs, and drops per PDC, and the network connection scheme are all parameterized. These parameters are initialized at the beginning of the run and cannot be changed during simulation.

A message transfer begins by entering the sending PDC if available. The PDC, probably one of many on a given trunk, waits its turn and then seizes the trunk. While the trunk is taken, no other PDC can use it. After the message is sent on the trunk, the trunk is released. Before the trunk is dropped the receiving PDC answers "message received", "busy", or nothing. No reply is sent if the PDC is out of operation or if there was an error in the message transmission. Messages receiving busy responses are retried next time around unless preempted by a higher priority message in the sending PDC. Transmissions that receive no reply are tried several times and if still unsuccessful the system is notified of a perceived hardware breakdown. This should not occur in simulation, though statistically it could.
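The reply handling described above can be sketched as a small retry loop. This is an illustration in Python, not the GPSS code of the simulators. The retry limit of three is a hypothetical stand-in for "several times", no-reply attempts are counted cumulatively (an assumption), and preemption by higher priority messages is left out.

```python
from enum import Enum


class Reply(Enum):
    RECEIVED = "message received"
    BUSY = "busy"
    NONE = "no reply"  # receiver out of operation, or transmission error


MAX_NO_REPLY_TRIES = 3  # hypothetical value; the text says only "several times"


def send_message(receiver_reply, max_no_reply_tries=MAX_NO_REPLY_TRIES, max_turns=1000):
    """Try to deliver one message; return (status, attempts).

    receiver_reply is a callable returning the Reply seen on each attempt.
    Each loop pass stands for one turn on the trunk: seize the trunk, send,
    read the reply, release the trunk.
    """
    no_reply_count = 0
    for attempt in range(1, max_turns + 1):
        reply = receiver_reply()
        if reply is Reply.RECEIVED:
            return "delivered", attempt
        if reply is Reply.NONE:
            no_reply_count += 1
            if no_reply_count >= max_no_reply_tries:
                return "hardware breakdown reported", attempt
        # Reply.BUSY: simply retry on the next turn.
    return "still pending", max_turns


if __name__ == "__main__":
    replies = iter([Reply.BUSY, Reply.BUSY, Reply.RECEIVED])
    print(send_message(lambda: next(replies)))
```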

Heavy loads may create a queue of messages for a given sending PDC. This queue is ordered according to message priority, and is first-in, first-out within a given priority. GPSS "keeps the books" for the queues and for hardware contention; no accuracy is lost as the run becomes complicated (though CPU time can grow).
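The queue discipline described above (priority order, first-in first-out within a priority) can be sketched with a heap. This is an illustrative Python model rather than the GPSS mechanism; the class name `PdcQueue` is hypothetical, and it is assumed that a larger number means a higher priority.

```python
import heapq
import itertools


class PdcQueue:
    """Message queue for one sending PDC (illustrative sketch)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # arrival order breaks priority ties

    def put(self, message, priority):
        # Assumption: larger number = higher priority, so negate for the min-heap.
        heapq.heappush(self._heap, (-priority, next(self._arrival), message))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PdcQueue()
q.put("block transfer A", priority=1)
q.put("FMP control message", priority=5)
q.put("FMP status message", priority=5)
q.put("block transfer B", priority=1)
print([q.get() for _ in range(len(q))])
```

The arrival counter is what keeps equal-priority messages in first-in, first-out order.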

PDC clocks are simulated at 10-microsecond increments. This represents the typical time increment for the internal PDC pointer that counts whose turn it is to take the trunk. If a particular trunk goes inactive, the appropriate clocks are turned off until needed, saving GPSS a lot of needless simulation effort.

### 2.1.2 Code Modules for Device Classes

Each device class is coded as a module. The modules contain a library of functions and capabilities, which define the internal workings of a class. This includes reactions of a class to the requests of other classes.

The DSK, FMP, and GRF modules are single, or nearly single, function stations.

The DSK stores and transfers data. The simulation module allows for one to eight dual-head disk controllers, each with a stipulated number of attached disks. Disk specifications are initialized at the beginning of the code. Each disk must be reserved for a specific transfer and then will allow only that data transfer. Other requests will receive "function busy" responses until the awaited disk transfer is completed. Contention for the disks can get quite complicated for heavy loads. The "disk access time" graph and table (see appendix A, pages 123 and 131 respectively, for examples) are sensitive indicators of the level of contention.

The FMP receives jobs, processes them, and returns them to the DSK. Limited communications between the FMP and SPS are also carried out. This module sets up the parallel transfers to/from the DSK. Maximum parallelism is exercised. This means that if there are three I/O channels between the FMP and DSK, file transfers are split up into three groups. The FMP processor facility is monitored automatically by GPSS. A queue buffers jobs awaiting processing, i.e., a backing store. Only one job is processed at a time.
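As an illustration of the "maximum parallelism" rule above, the sketch below divides a file's blocks among the available I/O channels. The report does not say how the groups are sized, so an as-even-as-possible split is assumed here; the function name `split_blocks` is hypothetical.

```python
def split_blocks(n_blocks, n_channels):
    """Divide a file of n_blocks into one group per FMP <--> DSK channel.

    Assumption: groups are made as even as possible; the report states only
    that a transfer is split into as many groups as there are channels.
    """
    if n_channels < 1:
        raise ValueError("at least one channel is required")
    base, extra = divmod(n_blocks, n_channels)
    # The first `extra` channels each carry one additional block.
    return [base + 1 if i < extra else base for i in range(n_channels)]


print(split_blocks(92, 3))  # [31, 31, 30]
```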

The GRF acts very nearly like the FMP. It transfers files and can be seized for processing. The GRF can transfer files to/from the SPS in addition to the DSK. In processing mode it can timeshare among users. The main difference is that a block transfer is done in pieces instead of all at once. This is simply because a 16-bit GRF concentrator most likely cannot hold an entire data block in its memory. The sizes of the transfers are stipulated and the appropriate transfer times calculated. There can be from 1 to 4 GRF concentrators.
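A piecewise GRF block transfer can be sketched as below. The block size (64 sectors of 512 64-bit words) and the 2 Mbits/sec GRF channel rate are the values quoted in section 3.4; the piece size of 4096 16-bit words is purely hypothetical, since the report says only that the size is stipulated, and 1 Mbit is taken as 10^6 bits.

```python
BLOCK_BITS = 64 * 512 * 64   # 64 sectors x 512 words x 64 bits (section 3.4)
GRF_RATE = 2_000_000         # 2 Mbits/sec, taking 1 Mbit = 10**6 bits


def grf_pieces(block_bits, piece_bits):
    """Sizes, in bits, of the pieces that one block is cut into."""
    full, rest = divmod(block_bits, piece_bits)
    return [piece_bits] * full + ([rest] if rest else [])


def transfer_seconds(bits, rate_bits_per_sec):
    """Raw channel time for a transfer, ignoring any protocol overhead."""
    return bits / rate_bits_per_sec


# Hypothetical concentrator buffer: 4096 words of 16 bits each.
pieces = grf_pieces(BLOCK_BITS, piece_bits=4096 * 16)
times = [transfer_seconds(p, GRF_RATE) for p in pieces]
print(len(pieces), "pieces,", round(sum(times), 3), "seconds per block")
```

Under these assumptions one block moves as 32 pieces and occupies the GRF channel for about one second.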

The SPS code is not nearly so modular. The SPS functions mingle with the system's operating system, and so characteristically the respective code fills the spaces between the LCN code and the stations' code. The basic functions are modularized: seizing a percentage of a mainframe processor and setting up data transfers to the GRF or DSK. The details of the data transfers are carried out in the "driven" devices, the GRF and DSK. Control responsibilities which permeate the entire source code include all look up directories (e.g., message path connections, disk allocation directory, etc.) and assignment of jobs to specific hardware devices.

### 2.2 Macroscopic Assumptions: Special Notes Concerning the SPS

Contention for shared hardware within the SPS is not modeled. Such a model is a separate project that eventually needs to be done. In the meantime the results of such a simulation have been estimated (carefully guessed), and used for input to the general system. These are called macroscopic assumptions. A four parameter family then becomes the performance capability of the SPS station.

1) Flop (computation) rate--This is the effective computation speed of each processor. For CYBER 175s a rate of 3 Mflops is assumed. Compilation speed is also included here. An accepted speed is 100,000 lines per minute.
2) Effective channel rate--The PPUs have well known data rates, but under a scheme of shared local disks, ECS, and MSS, determining the effective channel rate for large data blocks is not an easy problem. The contention scheme is comparable in principle to the total system contention scheme modelled in the LCN and DSK. The PPU I/O rate is assumed to be the effective channel rate; thus it is assumed that most of the overhead will be hidden.
3) Program or task load--Each SPS processor is a time sharing device. The order of task servicing is a complicated question. Jobs get rolled in and out depending on job memory and processing needs and a list of other state-of-the-system (SPS) factors. The macroscopic assumption is vastly simplified. Parameterized separately are the continuous load due to timesharing (workshop) users, the load due to a data block transfer, and the load due to program compilation or execution. These loads are percentages of each mainframe's capability. For example, the simulations in section 4 assign $20 \%$ for continuous workshop and operating system load, $10 \%$ for data block transfer load when such transfers occur, and from $15 \%$ to $70 \%$ for the various compilation and execution loads. Demand on memory is the main factor used in estimating these loads.
4) SPS response time--This represents the time it takes the SPS as a system to respond to internal device requests, and to communicate the external requests to its internal hardware. Thus, it is a macroscopic representation of SPS overhead time to requests and commands. It has not been determined if such overhead is of a significant time span, and it has been assumed to be zero.

Clearly the simulation results rest heavily on these assumptions. If later, more detailed SPS simulation shows estimates to be incorrect it is reasonable to expect that changes could be made in the SPS design to compensate.
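The four-parameter family can be pictured as a single record of assumptions, as in the sketch below. The default values are those quoted in this report (the 12 Mbits/sec figure is the CYBER PP rate listed in section 3.4); the names `SpsAssumptions` and `job_fits`, and the rule that a job is held out when the combined loads would exceed 100%, are illustrative assumptions and not taken from the GPSS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpsAssumptions:
    """Macroscopic SPS parameters for one mainframe (illustrative record)."""
    flop_rate: float = 3.0e6               # 1) computation rate, flops/sec
    compile_lines_per_min: int = 100_000   # 1) compilation speed
    channel_bits_per_sec: float = 12.0e6   # 2) effective channel rate (PPU I/O rate)
    workshop_load_pct: int = 20            # 3) continuous timesharing load
    transfer_load_pct: int = 10            # 3) load while a block transfer is active
    response_time_sec: float = 0.0         # 4) SPS response time, assumed zero


def job_fits(a, job_load_pct, transferring=False):
    """Assumed admission rule: total load on the mainframe may not exceed 100%."""
    used = a.workshop_load_pct + (a.transfer_load_pct if transferring else 0)
    return used + job_load_pct <= 100


sps = SpsAssumptions()
print(job_fits(sps, 70, transferring=True))   # 20 + 10 + 70 = 100, fits
print(job_fits(sps, 80, transferring=True))   # 20 + 10 + 80 = 110, held out
```

Keeping the loads as whole percentages avoids rounding surprises when they are summed.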

### 3.0 THE SIMULATOR AS A TOOL

This section describes the different levels at which the simulator can analyze a system configuration alternative.

### 3.1 Simulator Input

Three sets of data can be manipulated.

1) The system hardware, via a deck of card images, as described in the User Manual (Volume IV)--this deck allows the user to describe communications between stations, network versatility, and hardware duplication.
2) A set of input commands, another card deck, which represents a job workload--each system job is constructed from the following commands:
a) Transfer a file of blocks.
b) Send a very short message ( $=20$ usec).
c) Send a random length message (0 to 1000 usec).
d) Seize FMP for processing.
e) Seize a percentage of SPS or GRF device for processing.
f) Change priority of input which follows.
g) Assign arrival time for the next job of the same class.

Typically from 10 to 20 input commands describe each job. Up to 99 jobs can be input in one simulation run.
3) A set of system parameters found at the beginning of the GPSS source code:
a) Hardware parameters - DSK specifications, channel rates, buffer sizes within the PDCs, and guidelines for hardware duplication.
b) Software and firmware assumptions - block size, interstation block transfer time statistics, workshop load on SPS and GRF devices, load demanded from SPS and GRF for block transfers.

Data sets 1) and 3) represent a system design to be tested. The user should be careful to understand that the simulated results represent the input designs and assumptions made in these data sets. Thus a large responsibility rests with the user: to understand and note the assumptions made, and to acknowledge the inherent shortcomings and analytical shortcuts that result. For example, assigning one number, say $10 \%$, to represent the load put upon an SPS processor to transfer a 32K-word data block is an oversimplification. The sensitivity of the results to such a macroscopic parameter should be understood. Extrapolation of results can only be suggested after several similar simulations.

### 3.2 Two Simulators

There are two levels of simulation detail (see Volume IV). Source code SYS83 is a general purpose simulator which studies the whole system. In this code, system functions are modeled macroscopically: block transfers, seizing processors for seconds, contention by several/many jobs for the various system resources. Functional detail in the SPS, FMP, and GRF is no better than tenths of seconds. Detail in the DSK station is at the millisecond level (seek times, rotation rates).

Source code SYS82 models message transfers in much greater detail. Block transfers are done at a higher level of detail, i.e., each sector is modeled. Buffering of sectors in the PDCs is performed when station channel rates do not match, e.g., CYBER 175 and 819 disks. Thus the detail of the transfers is nearly at the 10-microsecond level within the LCN. This simulator yields accurate macroscopic specifications used in SYS83 for block transfer times. In addition, disk assignment algorithms within the DSK can be compared.

Thus though the two simulators are used in tandem, the responsibility of system analysis rests within SYS83.

### 3.3 Simulation Techniques

### 3.3.1 Full Run - Truncated Run

Simulation runs are performed in two modes.
Mode A: Simulation to completion of all jobs. This acts as a full master listing of an entire simulation. The system is potentially seen under several conditions: start up, smooth throughput, backed up, and clearing out at the end. Because of the random nature of job arrivals, the end of such a run can be greatly extended by a few jobs which arrived late. This situation distorts the mean statistics of the simulation -- in some cases quite badly ( $-20 \%$ ). Yet such an overview of the run is a good start.

Mode B: Simulating until an initially stipulated number of GPSS transactions have been sent. This allows the user to stop the simulation while the system is in a specific condition. For example, with the help of the first master run (a Mode A simulation), the user can stop the simulation at virtually the minute chosen -- perhaps after $85 \%$ of the jobs have run and another $10 \%$ are at various stages of completion. In this case, the mean statistics are more valid. See User's Manual (Volume IV, Division 1) for implementation.

In some cases, there may be an anomaly in stopping the system in full flight in that some "facility" output may be incorrect, i.e., "average utilization" and "average time per transaction" for the FMP and SPSs may be wrong. In this case a simple hand calculation from the individual job histories should yield both the average seize time per transaction (A) and the number of transactions (N). Then, with the simulator stop time (S) from the chronological history, the facility utilization is easily determined.

Average utilization $=(A \times N) / S$

A scan of the simulation history should make clear whether this correction is needed.
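The hand calculation is small enough to sketch directly. The function below applies the formula above; the sample figures (100 transactions averaging 60 seconds each, stop time 7650 seconds) are invented for illustration and are not results from the report.

```python
def average_utilization(avg_seize_time, n_transactions, stop_time):
    """Facility utilization = (A * N) / S.

    avg_seize_time (A): mean seize time per transaction, in seconds
    n_transactions (N): number of transactions that seized the facility
    stop_time (S):      simulator stop time, in seconds
    """
    if stop_time <= 0:
        raise ValueError("stop time must be positive")
    return avg_seize_time * n_transactions / stop_time


# Illustrative numbers only.
print(f"{average_utilization(60.0, 100, 7650.0):.1%}")
```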

### 3.3.2 Light Load - Heavy Load

Experience has shown that the first step in an analysis of a given NASF design should be a set of simple problems. A general analysis involves an understanding of a complicated set of coupled resources and is simply not possible without first understanding some of the basic principles that come out of each system design. The light load - heavy load technique isolates the interconnection design. It assesses the potential of the trunk system and the conflicts therein.
A) Light Load

A light load is defined as a set of data transfers which take less time than the mean time between transfer requests. This can occur either because of a generally light workload, or because the workload is dominated by processing time. In these cases disk conflicts, PDC busy replies, and trunk busy replies are rare. The user sees the network reaction in best case mode.

The relative interstation bandwidths, trunk loads, and PDC loads show clearly. Communications balancing problems should then become evident.

Appropriate data for these runs are parts of the workload to be modeled later, e.g., Model \#1 of the Ames Usage Model, alone. These runs can also be used to debug the workload data.
B) Heavy Load

A heavy load is a workload dominated by data transfers numerous enough to saturate at least part of the LCN hardware. For example, take the above light load input and decrease all the job interarrival times. Communications hardware with average utilizations of $70 \%$ or more is usually considered saturated or partially saturated. If potential bottlenecks in the LCN were not apparent before, they surely show here. More importantly though, the system shows just how heavy a load it can stand. The user should note decreased communication efficiency through a range of loads.

At this point system reconfiguration may be deemed necessary. If so, the light load - heavy load technique is begun again. By the time this process is complete the user should have a good understanding of a proposed system.
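When scanning a heavy-load run, the 70% rule of thumb above can be applied mechanically to the utilization table. The sketch below is a hypothetical helper, not part of the simulator; the sample utilizations are illustrative values.

```python
SATURATION_THRESHOLD = 0.70  # "70% or more" is treated as (partially) saturated


def saturated(utilizations, threshold=SATURATION_THRESHOLD):
    """Return the names of LCN hardware at or above the saturation threshold."""
    return sorted(name for name, u in utilizations.items() if u >= threshold)


# Illustrative average utilizations for one run.
run = {"trunk 1": 0.45, "trunk 5": 0.12, "trunk 6": 0.71, "PDC 14": 0.48}
print(saturated(run))  # ['trunk 6']
```

Repeating the check over a range of interarrival times shows where each piece of hardware crosses the threshold.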

### 3.3.3 Key Diagnostics

A completed simulation yields a history of all the jobs executed. File transfers, processing times, start job and end job annotations are listed. A wealth of information is provided in these listings. These provide a feeling for how the system progresses through its various states. This running time history helps the designer isolate particular resource demand problems. System wide software algorithms, e.g., job priority assignments, can be assessed. When all the information is digested, the simulation can be a powerful tool.

The general simulation results are summarized by a few key statistics. These can be scanned and digested easily.
a) FMP and SPS utilizations. Hopefully the utilization of the FMP is quite high which, as noted in section 1, is "the whole idea" of the NASF system. On the other hand, the SPS should not be over-saturated. Overhead time due to job conflicts would slow down the feeding and finishing of jobs; system throughput would diminish.
b) FMP and SPS queues. The FMP may build up a significant job queue depth as long as the depth is not monotonically increasing in time. A steady state FMP queue will not slow down throughput. This is not true of the SPS queues. Such a queue represents an inability for the SPSs to keep up with the system demands; thus slower system startup, finish, turnaround, and throughput.
c) Trunk utilization. This should show if the LCN had any trouble keeping up with the message and data traffic workload.
d) Disk access time. The greater the number of cylinder seeks required, the more the data transfer overhead, i.e., the less efficient the DSK station. Such seeks are due to file request conflicts to the disks, a characteristic of a heavy communications load.
e) Mean block transfer time. Again this shows overhead due to heavy data transfer loads. In this case transfer time increases as a block awaits an open line on the network. File transfer times also show backups in extreme cases.
f) Throughput statistics. This is the bottom line of performance to every simulation. Did the system finish jobs as quickly as it gets them?

### 3.4 Trial Runs on Three Different LCNs

Three different NASF configurations, all with the same device class hardware, have been tested. Figures 3, 4, and 5 show the connection schemes for LCN1, LCN2, and LCN3 respectively. Trunks 1 through 4 are dedicated to FMP <--> DSK transfers (if 4 are used). This bandwidth is a response to the principle that a pipe to/from the FMP always be clear. This includes the peak loads of checkpoint dumps, typically millions of words. Trunk 5 is for the high priority SPS <--> FMP messages and SPS <--> GRF data transfers (relatively rare). Trunks 6 and 7 (when used) are for SPS <--> DSK and GRF <--> DSK data transfers. Clearly not all the network hardware is used in all three cases. The differences are simple: LCN2 has twice the SPS <--> DSK and GRF <--> DSK bandwidth of LCN1. LCN3 has the same SPS <--> DSK and GRF <--> DSK improvement, but half the FMP <--> DSK bandwidth.


Figure 3. LCN1



Figure 5. LCN3

| Parameter | Value |
| :--- | :--- |
| Block size | 64 sectors (each 512 words), i.e., 32K words (64-bit) |
| CYBER PP rate | 12 Mbits/sec |
| Disk (819) transfer rate (sustained) | 21 Mbits/sec |
| Trunk rate | 50 Mbits/sec |
| GRF channel rate | 2 Mbits/sec |
| PDC buffer size for data transfers | 3 sectors |
| % load on SPS device for data transfers | 10% |
| % load on GRF device for data transfers | 80% |
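To give a feel for these parameters, the sketch below computes the raw time needed to move one block at each of the listed rates. It assumes 1 Mbit = 10^6 bits and ignores all protocol, seek, and buffering overhead, so the figures are lower bounds rather than simulator output; the names are hypothetical.

```python
BLOCK_BITS = 64 * 512 * 64  # 64 sectors x 512 words x 64 bits = 2,097,152 bits

# Channel rates in bits/sec, taking 1 Mbit = 10**6 bits.
RATES = {
    "CYBER PP": 12e6,
    "819 disk (sustained)": 21e6,
    "trunk": 50e6,
    "GRF channel": 2e6,
}


def block_seconds(rate_bits_per_sec):
    """Raw time to move one block at the given rate, with no overhead."""
    return BLOCK_BITS / rate_bits_per_sec


for name, rate in RATES.items():
    print(f"{name:22s} {block_seconds(rate) * 1000:8.1f} ms per block")
```

At these rates a block occupies a trunk for roughly 42 ms when the trunk itself is the limit, but a CYBER PP needs about 175 ms to source or sink the same block, which is why PDC buffering matters when station rates do not match.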

## LCN 1

Under relatively light loads (input $=1.4 \times 10^{8}$ words/hr) it was immediately obvious that the SPS <--> DSK bandwidth is not balanced with that of the FMP <--> DSK, which is 5 times faster. Utilization of PDCs 14 and 18, the SPS's prime channels to DSK, is about twice that of all other PDCs. PDCs 1-7 and 21-29 and trunks 1-4 all have very comparable utilization averages.

Under heavier loads the SPS <--> DSK imbalance relative to the FMP <--> DSK grows, until trunk 6 begins to saturate at around $70 \%$. At this stage the SPSs are utilized $19 \%$ and $14 \%$ respectively. With mean block transfer times doubled, the SPS station is clearly I/O bound by trunk 6. At $4.8 \times 10$ words/hr all LCN hardware is less than $50 \%$ utilized except trunk 6 ($71 \%$). This data load exceeds the needs stated in the Ames Usage Model. Though trunk 6 is almost saturated, the effective transfer rate is well within data throughput goals.

## LCN 2

Now trunks 6 and 7 share the SPS <--> DSK and GRF <--> DSK load originally on trunk 6. Now SPS <--> DSK is only 2.5 times slower than FMP <--> DSK. This alleviates much of the excessive load seen on trunk 6 of LCN1. Otherwise system balance is identical to LCN1. At an input load of $4.3 \times 10$ words/hr (output $=1 \%$ of input; FMP processing times $=60$ sec/job; Usage Model - Model \#2), no part of the network has begun to saturate! Trunk 6 is about $50 \%$ utilized; trunk 7 is about $30-35 \%$ utilized. SPS utilizations have dropped to $7 \%$ and $3 \%$ respectively. Job throughput and turnaround times improve by about $10 \%$ for heavy workloads.

## LCN 3

Cutting the FMP <--> DSK bandwidth in half (otherwise identical to LCN2) does not degrade system throughput or job turnaround at all. The LCN hardware now looks well balanced with the FMP and SPS having comparable bandpasses into the DSK. Even so, extra system reliability, versatility, and the possibility of larger checkpoint dumps prompt a preference for LCN2 over LCN3, though at slight extra cost.

Two to three hour slices of the NASF Usage Model (version 79.001, from NASA-Ames) have been simulated on the three proposed systems described in section 3.4. The multi-job, high-level simulator, SYS83, has been used.

Below, one example is presented in detail to show how simulation analysis may proceed. Several other examples are referenced in passing to help clarify basic points. It is important, though, to understand that these represent first passes at using simulation tools for the NASF. Much time and effort will be required for a complete analysis.

### 4.1 Translation of Usage Model into Workload Input for Simulator

The Usage Model translates fairly easily into simulation input. Model guidelines are followed as closely as possible with one exception. Model \#4, "Complex Design Simulation", jobs are assigned entirely to the prime shift instead of $1 / 2$ prime, $1 / 2$ night. This choice is made with the intent of loading toward worst-case simulation.
... eight Model \#4 jobs were simulated. This represents the average job load of 2 hours during the prime-time shift. Specifications are summarized below. Within each class of jobs the major commands that represent that class are listed with letter headings. For some letter headings job scenarios branch out into one or more variations. Such branching is represented by a multi-level number sequence added onto that letter (similar to section or paragraph heading numbers).

The letters represent major steps in a job flow. The first digit, if present after a letter, represents a branch or split of jobs in that class. The second digit, if present, indicates a sequence within the leg of the branch represented by the first digit.

The flow of a job through the system is actually very similar for all jobs. The variation is in the number and sizes of the files involved, and the processing time required. Figure 6 summarizes the basic scenario.


Notes:
(1) "Job execute" request enters NASF through SPS or GRF.
(1a) GRF files may go to SPS.
(2) SPS compiles source code and/or preprocesses input data.
(2a) Continue to (3).
(3) SPS and/or GRF files transfer to DSK.
(4) DSK stores input files; load module and input data.
(5) Input files sent from DSK to FMP.
(6) FMP executes job.
(7) Result files sent from FMP to DSK.
(8) DSK stores result files.
(9) Some/all result files transfer from DSK to the SPS and/or GRF.
(10) SPS post-processes result files for user analysis.
(10a) Post-processing results may go to GRF for display/analysis.
(10b) Continue to (11).
(11) Job returns to user at workstation; further analysis may proceed on graphics hardware, a local processor, or the SPS.

Figure 6. Typical Job Flow through NASF.

Model \#1 Method Development (59 jobs). These jobs have small processor demands (both SPS and FMP) but relatively large data bases. Typically this is a three million word job which runs in the FMP for 10 seconds, followed by the retrieval of a dump file or diagnostics.
a) Job execution request arrives; interarrival mean of 120 sec.
b) SPS compiles prepared source code. $30 \%$ of one CYBER mainframe seized for 1 sec; then
c.1) No wait - go to step d) (14 of 59 jobs).
c.2.1) Wait 10 minutes ("think time") due to compilation error and recompile as in b) (for 45 of the 59 jobs); then
c.2.2) No further wait - go to step d) (30 of 45 jobs); wait 5 minutes ("think time") due to recompilation error and recompile as in b) (for 15 of the 45 jobs); then
d) Send File1 (load module - 1 block) to DSK; then
e) Send File1 from DSK to FMP, and
f) Send File2 (configuration - 1 block) from GRF to DSK; then
g) Send File2 from DSK to FMP, then,
h) Request FMP for execution. The FMP checks for arrival of files 1 and 2 and processes them for 10 sec as soon as allowable.
i.1.1) Send File9 (debug dump file - 90 blocks) to DSK (for 20 of the 59 jobs), then
i.1.2) Send File9 from DSK to SPS (same 20 jobs).
i.2.1) Send File8 (edited results file - 2 blocks) to DSK (remaining 39 of the 59 jobs), then
i.2.2) Send File8 from DSK to SPS (same 39 jobs).
i.2.3) Request SPS processor for output/pictorial postprocessing. Seize $25 \%$ for 40 sec (same 39 jobs).
j) End job.

Model \#2 Code Development (23 jobs). Similar to Model \#1, but the processor demand and data base are larger. Half of the jobs require post-processing for graphics use. Typically an eight million word job which may run in the FMP for 60 seconds.
a) Job execution request arrives; interarrival mean of 300 sec.
b) SPS compiles prepared source code. $50 \%$ of one CYBER mainframe seized for 1 second; then
c.1) No wait - go to step d) (6 of 23 jobs).
c.2.1) Wait 10 minutes ("think time") due to the compilation error and recompile as in b) (for 17 of the 23 jobs); then
c.2.2) No further wait - go to step d) (11 of 17 jobs); wait 5 minutes ("think time") due to recompilation error and recompile as in b) (for 6 of the 17 jobs).
d) Send File1 (load file - 2 blocks) to DSK; then
e) Send File1 from DSK to FMP, and
f.1.1) Send File2 (configuration data - 2 blocks) from GRF to DSK (19 of 23 jobs), then
f.1.2) Send File2 from DSK to FMP (same 19 jobs)
f.2.1) Send File2 from GRF to SPS (4 of 23 jobs), then
f.2.2) Request SPS device for configuration manipulation. Seize $30 \%$ of one CYBER for 70 seconds; then
f.2.3) Send File2 (resulting transformed configuration data - 2 blocks) to DSK; then
f.2.4) Send File2 from DSK to FMP (same 4 jobs).
g) Send File3 (grid data - 19 blocks) from SPS to DSK, then
h) Send File3 from DSK to FMP.
i) Request FMP for execution; FMP checks for arrival of files 1, 2, and 3 and processes them for 50 sec., as soon as possible.

j.1.1) Send File9 (debug dump - 125 blocks) from FMP to DSK (for 8 of 23 jobs), then
j.1.2) Send File9 from DSK to SPS, and
j.1.3) Send the rest of File9 (125 blocks more) from FMP to DSK (same 8 jobs), and
j.1.4) Send this part of File9 from DSK to SPS.
j.2.1) Send File8 (edited result files - 6 blocks) from FMP to DSK (12 jobs of 23), then
j.2.2) Send File8 from DSK to SPS (same 12 jobs).
j.2.3) Request SPS for output/pictorial post-processing. Seize $70 \%$ of one CYBER for 120 sec. (same 12 jobs);
j.2.4) Send File7 (output file - 4 blocks) from SPS to GRF (same 12 jobs);
j.3.1) Send File8 (edited result files - 12 blocks) from FMP to DSK (remaining 3 of 23 jobs), then
j.3.2) Send File8 from DSK to SPS (same 3 jobs).
j.3.3) Request SPS for output/pictorial post-processing. Seize $70 \%$ of one CYBER for 240 seconds (same 3 jobs), then
j.3.4) Send File7 (output file - 8 blocks) from SPS to GRF (same 3 jobs).
k) End job.

Model \#3 Simple Design Simulation (6 jobs). Engineer's simulation job. $20 \%$ of the jobs have full blown configuration and grid files. Some pre-processing. Short FMP run - 60 seconds. $20 \%$ restart. Heavy post-processing for graphical use.
a) Job execution request arrives; interarrival mean of 24 minutes.
b.1) Proceed to step c) (5 of 6 jobs).
b.2) SPS compiles prepared source code. $70 \%$ of one CYBER seized for 2 seconds (1 of 6 jobs).
c) Send File1 (load file - 4 blocks) to DSK, then
d) Send File1 from DSK to FMP.
e.1.1) Send File2 (configuration patch - 4 blocks) from the GRF to the SPS (3 of 6 jobs).
e.1.2) Request SPS for configuration and grid manipulation. Seize $70 \%$ for 240 sec (same 3 jobs), then
e.1.3) Send File3 (transferred patch and grid data - 92 blocks) from SPS to DSK (same 3 jobs); then
e.1.4) Send File3 from DSK to FMP.
e.2.1) Send File2 (patch and grid setup - 4 blocks) from SPS to DSK (3 of 6 jobs); then
e.2.2) Send File2 from DSK to FMP (same 3 jobs); then
e.2.3) Request FMP processing of patch and grid manipulation. Seize FMP for 10 seconds (same 3 jobs).
f) Request FMP processing for flow code for 60 seconds.
g.1) No File9, go to step h) (3 of 6 jobs).
g.2) Send File9 (restart file - 220 blocks) to DSK for temporary storage (1 job of 6).
g.3.1) Send File9 (raw results - 150 blocks) to DSK (2 of 6 jobs), then
g.3.2) Send File9 from DSK to SPS (1 of 6 jobs).
h) Send File8 (edited results file - 3 blocks) to DSK, then
i) Send File8 from DSK to SPS.
j) Request SPS processing of file 8 for pictorial results. Seize $60 \%$ of one CYBER for 200 sec .
k) Null - reserved for Model \#4.
l) Send File7 (display file - 2 blocks) from SPS to GRF for graphics study.
m) End job.

Model \#4 Complex Design Simulation (7 jobs). Engineer's simulation jobs requiring significant processing time -- 10 minutes in the FMP, several minutes in the SPS for pre-processing or post-processing or both. This model has the heaviest total demand of time on the FMP ( $6.5 \mathrm{hrs} / \mathrm{day}$ ).

Steps are identical to Model \#3 except as noted below.

| Step | Change from Model \#3 |
| :---: | :--- |
| a) | Mean job interarrival time: 15 min. (900 sec). |
| f) | Request FMP processing flow code for 600 seconds. |
| g.2) | Restart file - 310 blocks. |
| h) | Edited results file - 8 blocks. |
| j) | Seize $60 \%$ of one CYBER for 240 sec. |
| k) | Seize $60 \%$ of one CYBER for 240 sec. again. |
| l) | Display file - 10 blocks. |

The above represents two simulated hours of input during prime shift. Mean job interarrival times are based on the assumption that the prime shift is 10 hours long. Estimates for SPS processing times have been made for code compilation, pre- and post-processing tasks. The main difficulty is counting the number of calculations required. An estimate of compilation rates from past experience with the CYBER 170 family is 100,000 lines of code/min. For grid and/or patch generation/ 10
transformation, 10 calculations per 10 points is assumed (quoted from Ames' Usage Model). For post-processing of result files, $6 \times 10^{7}$ calculations per 4000 point contour plot (i.e., one frame) is assumed (again from Ames' Usage Model). Each CYBER 175 is estimated to run at $3 \times 10^{6}$ calculations/sec when the entire job is in its central memory. A $50 \%$ reduction in speed is assumed if only $1 / 2$ the job is in the central memory at a given time.
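The compilation and post-processing estimates above reduce to simple arithmetic, sketched below. The exponents are read as $6 \times 10^{7}$ calculations per frame and $3 \times 10^{6}$ calculations/sec, the latter agreeing with the 3 Mflops quoted in section 2.2; the function names are hypothetical, and the half-speed case is modeled as a simple flag.

```python
COMPILE_LINES_PER_MIN = 100_000   # CYBER 170 family compilation rate
CALCS_PER_FRAME = 6e7             # one 4000 point contour plot
CYBER_175_CALCS_PER_SEC = 3e6     # entire job resident in central memory


def compile_seconds(lines_of_code):
    """Estimated compile time for a source of the given length."""
    return lines_of_code / COMPILE_LINES_PER_MIN * 60.0


def postprocess_seconds(frames, fully_resident=True):
    """Estimated SPS time to produce contour-plot frames.

    A 50% speed reduction is applied when only half the job is in memory.
    """
    rate = CYBER_175_CALCS_PER_SEC if fully_resident else 0.5 * CYBER_175_CALCS_PER_SEC
    return frames * CALCS_PER_FRAME / rate


print(compile_seconds(100_000))                       # 60.0 seconds
print(postprocess_seconds(1))                         # 20.0 seconds per frame
print(postprocess_seconds(1, fully_resident=False))   # 40.0 seconds per frame
```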

At several junctures of the input design, estimates were assigned. Examples are $20 \%$ continuous load on each SPS mainframe for workshop timesharing, high estimates for SPS computation times, and more restart and raw result files than expected on average (by $60 \%$ ). Thus, it appears likely that a heavier than average prime-shift workload is being simulated, though not an unlikely load.

### 4.2 An Example Simulation

A truncated run (Mode B), lasting 7650 seconds, is described here as a typical example of a system evaluation via simulation. LCN2 was used as the baseline configuration. The run follows all specifications listed in section 3.4 and section 4.1 with one exception: SPS compiling and processing times have been lengthened by $50 \%$ as a tradeoff for demanding half as great a processor resource load. For example, a 120 second job requiring $70 \%$ of an SPS processor is converted to a 180 second job requiring $35 \%$ of the processor. This allows much greater processing versatility within the SPS, but at the same time implies a much greater SPS computation power. Specifically, ... section 1.1.1, now a pair of 4.5 to 5-megaflop/sec machines are being modeled. This SPS "horsepower" parameter is varied throughout the family of simulations from this high rate down to a minimum of 2 megaflops/sec.

This simulation took approximately 700 CPU seconds on a CYBER 175. For a full listing of the results see appendix A. The amount of information contained in a run like this is quite large, though not unmanageable. With time it is reasonably digestible, as hopefully the reader will see in the following presentation.

### 4.2.1 Characteristics of the Run

### 4.2.1.1 Job Arrivals

Though job arrival specifications are quoted in the usage model, actual job arrival times have a random nature. Thus for a given run these arrival times are unique. Figure 7 shows the job arrival rate distribution throughout the 2 hour simulation. This distribution is plotted by taking mean arrival rates each 15 minutes. The mean of 20.2 jobs per 30 minutes is $16 \%$ less than the 24 jobs expected; but by far the most demanding job class, Model \#4 "Complex Design Simulation", was fully represented. Perhaps a more characteristic number is the actual load demand for the FMP versus the expected demand: $93 \%$. This is not to be confused with the FMP utilization. Arrivals of these demanding and important Model \#4 jobs are marked by arrows in figure 7. Note 1) the close arrival of two of these jobs near clock time 1100 seconds, and 2) the arrival of three of these jobs within 10 minutes of each other at 1 hour wall clock time.


Figure 7. Job Arrival Rate

### 4.2.1.2 SPS Utilization

Figures 8 and 9 summarize the utilization of the SPS resources due to program compilation and execution. Three characteristics of interest are clear:

1) The spacings in time of general loads shows the results of the job to device assignment algorithm. This algorithm is quite simple: When a job enters the NASF it is assigned to the SPS device whose central processor and memory are least loaded. Under equal loads SPS1 wins the honor. So, in general, without deference to looking ahead, the SPSs ping-pong the responsibility back and forth. In this run the SPSs complement each other well.
2) The sharp upward spikes seen, especially in the SPS1 utilization graph, show the quick response it has to demands. The general lack of broad blocks along the time axis for large loads represents very good CPU availability. Arrows point to times when a job enters the processor after having been blocked out. They are rare and short lived.

The continuous block of time at $20 \%$ represents the timesharing load.
3) The required load assumed for block data transfers, when they occur, is $10 \%$. The near absence of SPS utilization greater than $90 \%$ shows that file transfers are rarely held up by the CPUs.
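
The assignment rule described in item 1 can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's code: the `SpsDevice` record, the `assign_job` function, and the equal weighting of processor and memory load are all assumptions, since the report does not say how the two loads are combined.

```python
from dataclasses import dataclass


@dataclass
class SpsDevice:
    name: str
    cpu_load: float     # fraction of the central processor in use, 0.0 to 1.0
    memory_load: float  # fraction of memory in use, 0.0 to 1.0


def combined_load(device: SpsDevice) -> float:
    # Assumption: processor and memory loads are weighted equally.
    return device.cpu_load + device.memory_load


def assign_job(devices: list[SpsDevice]) -> SpsDevice:
    # min() returns the first minimal element, so listing SPS1 first
    # makes it win whenever the loads are equal.
    return min(devices, key=combined_load)


devices = [SpsDevice("SPS1", 0.40, 0.20), SpsDevice("SPS2", 0.40, 0.20)]
print(assign_job(devices).name)
```

Because no look-ahead is involved, each assignment raises the load on the chosen device, which is what makes the next job tend to go to the other one.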


Figure 8. Utilization of SPS1.
Arrows represent jobs which were originally blocked out of the processor.


Figure 9. Utilization of SPS2.
Arrows represent jobs which were originally blocked out of the processor.

### 4.2.1.3 The FMP Execution Queue

The driving philosophy for a well-used NASF is to keep the FMP busy. A view of the FMP execution queue is a good diagnostic for seeing if the workload is going to adhere to this point. Figure 10 shows the queue for this run. In general, the queue is well fed with a mean depth of 5.33 jobs.

The graph shows that, on this job mix, the FMP can keep up with the job load by emptying the job queue several times.

The driving force in building up the FMP queue is undoubtedly the beginning of an FMP execution on a Model \#4 job. Arrows in figure 10 show these execution start times. The executions are 10 minutes long, and can easily cause the queue to grow by 10 jobs. Notice also how these Model \#4 jobs have now spread out from one another in time. This is inevitable, though the actual spacings can be considerably widened or partially thinned by using the job priority cards mentioned in section 3.1.2.

### 4.2.1.4 FMP Utilization

As the reader should be able to predict from the FMP queue results, utilization of the FMP processor should be relatively high. Utilization is, in fact, a healthy $86 \%$ (see figure 11). Startup time accounts for $5 \%$ of the unused resource; this overhead need not occur again during the day. Of the FMP time not used, $7 \%$ is due to the less than expected demand caused by the job arrival statistics ($93 \%$). Thus at "simulation stop" the FMP is about $7 \%$ behind. Such a number is somewhat sensitive to stop time; at 7000 seconds the number is about $4 \%$. Only during the period from 5400 to 5700 seconds does the FMP load get "seriously" behind.

The reader should note that in appendix A (page 120) the FMP utilization statistic has been corrected by hand. This simulation run stopped in "full flight" (i.e., a truncated run) and thus the $46 \%$ FMP utilization printed seemed suspect. After checking individual job histories as described in section 3.3.1, this utilization was appropriately corrected. Under such circumstances other facility statistics must also be checked for consistency. In the case cited only the FMP utilization statistic was spurious.
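
One plausible way to cross-check a utilization figure for a truncated run is to sum FMP busy time directly from the individual job histories, clipping any execution still in progress at "simulation stop". The sketch below assumes the FMP executes one job at a time and that each history yields a (start, end) pair in seconds, with `None` for a job still executing at the stop time. The function name and the sample intervals are hypothetical and are not taken from the appendix A listing.

```python
def fmp_utilization(intervals, stop_time, start_time=0.0):
    """Fraction of [start_time, stop_time] during which the FMP was executing.

    intervals: iterable of (begin, end) FMP execution times in seconds;
    end may be None when the job was still running at simulation stop.
    Assumes the intervals do not overlap (one FMP job at a time).
    """
    busy = 0.0
    for begin, end in intervals:
        if end is None or end > stop_time:
            end = stop_time              # clip a job caught in "full flight"
        begin = max(begin, start_time)
        if end > begin:
            busy += end - begin
    return busy / (stop_time - start_time)


# Hypothetical history: one finished job and one still running at the stop.
history = [(0.0, 600.0), (900.0, None)]
print(fmp_utilization(history, stop_time=1200.0))
```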

### 4.2.1.5 Throughput and Turnaround Statistics

Figure 12, showing the throughput graph, reflects the results of the previous five graphs. In addition, it shows vividly the one discouraging point in this example: job turnaround times are neither consistent nor very fast. With the throughput rates varying as they do here, it must seem at times as though the system has gone to sleep. The fact is that at these lull times in the throughput (clock time from 4000 to 6000 seconds), the FMP, as shown in the FMP queue, is very effectively running at full tilt. Throughput has slowed because three Model \#4 jobs, 1800 FMP seconds, are being processed. This blocks out many jobs, as manifested by the growing queue. Near clock times 3000 and 7000 seconds notice how quickly the system finishes jobs after the FMP has stopped queuing them. This shows the rapid response of the communications network, DSK and SPS, and is very encouraging.

Figure 10. FMP Execution Queue. Arrows Mark the Beginning of FMP Execution on a Model 4 Job.

Figure 12. Job Throughput. Arrows Mark the Completion of Model 4 Jobs.

Arrows in figure 12 denote job completion times for the Model 4 jobs. Comparison with figure 7 shows the turnaround time for these jobs: $34.3 \pm 8.0$ minutes. This time includes 10 minutes of FMP processing and 13 minutes of SPS pre- and post-processing. Such an impressive turnaround statistic is dearly paid for by the turnaround rate of the other three job classes. Note table 1 below.

Table 1. Turnaround Statistics

| Job Class | Turnaround time $\pm$ sample standard deviation | CPU time required (FMP and SPS) |
| :--- | :--- | :--- |
|  | $18.5 \pm 8.7$ min |  |

The final throughput mean for the run, 16.5 jobs/30 min (or 33 jobs/hr; see last page of listing in appendix A), is somewhat misleading for two rather coupled reasons. First, some ten jobs are well into the system and are partially completed. They are not included in this statistic, nor in the last point on the throughput graph. Second, a Model \#4 job finished FMP execution at clock time 7533 seconds, 2 minutes before "simulation stop", and has left a building FMP execute queue behind. Thus, just as an FMP blocking condition was showing its bad side, i.e., diminished system throughput, the simulator stopped.

### 4.2.2 General System Response

During 2 hours of simulation forty-four Model 1, fifteen Model 2, four Model 3, and seven Model 4 jobs were begun and completed. An additional fourteen jobs had begun, of which ten were well on their way to completion. A total of approximately $1.5 \times 10^{8}$ 64-bit words traveled within the LCN; in most cases they traveled twice, both to and from the DSK. Over $6 \times 10^{12}$ floating-point operations were performed in the FMP and another $5 \times 10^{10}$ floating-point operations within the SPS. These numbers, according to the Ames Usage Model, represent a typical load during a two-hour period.
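
To put these totals in perspective, they can be converted to mean rates. The sketch below takes the quoted totals as $1.5 \times 10^{8}$ words, $6 \times 10^{12}$ FMP operations, and $5 \times 10^{10}$ SPS operations, assumes a nominal run length of exactly two hours, and uses the $86 \%$ FMP utilization from section 4.2.1.4. Since the FMP figure is quoted as "over" $6 \times 10^{12}$, the FMP rates are lower bounds. The variable names are illustrative only.

```python
RUN_SECONDS = 2 * 3600      # assumption: nominal two-hour run
FMP_UTILIZATION = 0.86      # from section 4.2.1.4

lcn_words = 1.5e8           # 64-bit words moved within the LCN
fmp_flops = 6e12            # floating-point operations in the FMP (lower bound)
sps_flops = 5e10            # floating-point operations in the SPS

lcn_bits_per_sec = lcn_words * 64 / RUN_SECONDS
fmp_rate_wall = fmp_flops / RUN_SECONDS           # averaged over wall-clock time
fmp_rate_busy = fmp_rate_wall / FMP_UTILIZATION   # averaged over FMP busy time
sps_rate_wall = sps_flops / RUN_SECONDS

print(f"LCN mean traffic : {lcn_bits_per_sec / 1e6:.2f} Mbit/sec")
print(f"FMP, wall clock  : {fmp_rate_wall / 1e6:.0f} megaflops/sec")
print(f"FMP, while busy  : {fmp_rate_busy / 1e6:.0f} megaflops/sec")
print(f"SPS, wall clock  : {sps_rate_wall / 1e6:.1f} megaflops/sec")
```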

Aside from the FMP, no system hardware is backed up. The SPS mean utilizations are $47 \%$ for SPS1 and $45 \%$ for SPS2. GRF utilization is entirely due to workshop usage. The bit-serial data trunks are in no case more than $10 \%$ utilized. PDCs ranged from $2 \%$ to $9 \%$ usage. The majority of disk seeks are sector selects or minimum cylinder selects. In short, there were no traffic complications due to data transfers. Files of up to 10 million words were transferred in 14 seconds (e.g., Job \#97; FMP --> DSK). The longest time for a file transfer was 100 seconds for 5 million words from the DSK to SPS (Job \#96).

### 4.2.3 Variations on the Example

### 4.2.3.1 Different Arrival Statistics

Simulations were run with a different set of actual arrival times; all interarrival mean times were left unchanged. Though basic results were the same, one point was greatly clarified. The arrival profile of Model 4 jobs is the driving force in the system utilization. When these jobs are sparse, the FMP utilization drops to a value necessary to meet needs. For example, in one case an arrival rate for Model 4 jobs, 3.0 jobs/hr (compared to $4.0 / \mathrm{hr}$ in the above example), resulted in an FMP utilization of only $70 \%$. In that case the FMP queue averaged only 2.3 jobs, and the system never had trouble keeping up - nor should it have.
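
A quick estimate suggests why the Model 4 arrival profile dominates. Taking the 10 minutes of FMP execution per Model 4 job quoted in section 4.2.1.3, the share of FMP time claimed by Model 4 work follows directly from the arrival rate. The function name below is illustrative, and the estimate ignores every other job class.

```python
MODEL4_FMP_SECONDS = 600.0   # 10 minutes of FMP execution per Model 4 job


def model4_fmp_share(jobs_per_hour, fmp_seconds=MODEL4_FMP_SECONDS):
    """Fraction of each hour the FMP spends on Model 4 jobs."""
    return jobs_per_hour * fmp_seconds / 3600.0


share_base = model4_fmp_share(4.0)     # arrival rate in the main example
share_sparse = model4_fmp_share(3.0)   # sparser arrival profile
drop = share_base - share_sparse

print(f"4.0 jobs/hr -> {share_base:.1%} of FMP time")
print(f"3.0 jobs/hr -> {share_sparse:.1%} of FMP time")
print(f"difference  -> {drop:.1%}")
```

The difference of about 17 percentage points is close to the observed fall in FMP utilization from $86 \%$ to $70 \%$, which is consistent with the lost Model 4 work accounting for most of the change.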

### 4.2.3.2 Different LCN Geometry

Comparable, in some cases identical, workloads and arrival times were run through LCN1 and LCN3. Since the LCN is so lightly utilized, no appreciable difference in system response is expected (see light-load/heavy-load description in sections 3.3 and 3.4). LCN3 showed no difference in general system utilization. Throughput, turnaround, and processor utilization were virtually identical. Utilization of Trunks 1 and 2 doubled to $4 \%$ along with the utilization of PDCs 4 and 5. LCN1 showed slight degradation in throughput and in file transport times between the SPS and DSK. These transfer times were increased because Trunk 5 has to service both the SPS1 and SPS2 paths to the DSK. Trunk 6 utilization went up to $17 \%$. General system response was still very similar.

### 4.2.3.3 Effect of Change in SPS Performance and Load

The SPS processing load was arbitrarily decreased by $50 \%$. This led to an interesting result. The only differences in system response were:

1) SPS utilization diminished by the appropriate amount,
2) turnaround times were slightly decreased.

Throughput, FMP utilization, and the FMP queue profile were essentially unchanged. This shows that the SPS setup and post-processing work is entirely hidden by the FMP workload. Support work done elsewhere in the NASF does not degrade the efficiency of the system, except during system startup. Thus, in this case, system efficiency is synonymous with FMP utilization.

SPS performance was then degraded by varying amounts: 2 devices with computation rates of 3 megaflops/sec each, then computation rates of 2.5 megaflops/sec each and 2.0 megaflops/sec each. Though the demands on the SPS devices increased, the SPS did not saturate. For 2.0-megaflops/sec devices the utilization bulged to $71 \%$ for the lead SPS device and $66 \%$ for the backup. The average depths of the SPS execution queues were 1.1 and 0.7 jobs, respectively. The job throughput and FMP utilization were essentially unchanged. Average turnaround per job was longer due to the slower pre- and post-processing times, as expected. The interesting point is that no added system overhead was injected. The SPS work is still almost entirely hidden under the FMP work. As the SPS computational horsepower falls below 2 megaflops/sec this will no longer be true, for at that point saturation is about to set in.

### 4.2.3.4 Effect of Changing the Priority of a Job Class

One simulation was done with Model \#4 jobs given lowest priority relative to the other prime-shift users. Two obvious results were noticed. Total job throughput went up by some $30 \%$ at the expense of Model \#4 throughput, which decreased by about $20 \%$. Turnaround times for jobs of Models \#1, \#2, and \#3 were also improved significantly at the expense of Model \#4 jobs. Demands on the SPS data trunks, PDCs, and disks were unchanged. But the FMP utilization, and thus the use of the system, decreased to $75 \%$ (down from $86 \%$). Whether this trend is general or is a statistical fluctuation that would disappear with several runs is simply not clear at this time. The tradeoffs between the Model \#4 jobs and the other jobs are clear, though more runs should yield more accurate performance tradeoffs. In any case, such hypotheses are easily tested via liberal use of the simulator.

### 4.2.3.5 Simulation of the Night Shift Workload

A two-hour simulation of the night shift (Usage Model job classes 5, 6, and 7) has also been done. All data transfers were modeled (e.g., 75 million-word restart files). Postprocessing of the Model \#6 and Model \#7 "contour movies" was left out. A simple calculation shows that during a 10-hour night shift, the movie processing demand on the SPS is simply too great:

$$
\frac{\text{movie flops}}{\text{night}} = \frac{7500 \text{ frames}}{\text{night}} \times \frac{10,000 \text{ pts}}{\text{frame}} \times \frac{6 \times 10^{7} \text{ flops}}{4,000 \text{ pts}} \times \frac{\text{night}}{10 \text{ hr}} = 1.125 \times 10^{12} \text{ flops}/10 \text{ hr}
$$

or 31.3 megaflops/sec.

Obviously this user demand causes concern, for it outbalances any other SPS demand stated by the Usage Model by at least an order of magnitude. A system that meets this demand outside of the FMP will have excessive horsepower for the remainder of the day. And it is not clear at this time that such a task does not belong in the FMP.
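
The arithmetic behind the movie-processing estimate is easy to re-check. The short script below multiplies out the same factors, reading them as 7500 frames per night, 10,000 points per frame, and $6 \times 10^{7}$ flops per 4,000 points over a 10-hour shift. The variable names are illustrative.

```python
frames_per_night = 7500
points_per_frame = 10_000
flops_per_block = 6e7        # flops needed per block of points
points_per_block = 4_000
shift_hours = 10

flops_per_night = (frames_per_night * points_per_frame
                   * flops_per_block / points_per_block)
required_rate = flops_per_night / (shift_hours * 3600)   # flops per second

print(f"{flops_per_night:.3e} flops per night")
print(f"{required_rate / 1e6:.2f} megaflops/sec sustained")
```

The sustained rate works out to 31.25 megaflops/sec, which the text rounds to 31.3.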

With this one point aside, the system had no difficulty with the traffic flow.

| FMP Utilization | $87 \%$ |
| :--- | :---: |
| SPS1 Utilization | $12 \%$ |
| SPS2 Utilization | $2 \%$ |
| Trunk 6 Utilization | $11 \%$ |
| All other trunks | $1 \%$ |

In other words the system was wide open and the FMP was well stocked with work.

### 4.3 Conclusions

This family of examples addresses only the first pass of system evaluation. The results pose new questions. For example, at what cost can turnaround times for the more numerous "simple" problems be shortened? Can throughput be more consistent in time? How heavily utilized can the SPSs be before system efficiency is significantly diminished? Clearly, the system designers could simulate for years, playing with job priority algorithms and workload arrival scenarios while keeping the number of jobs executed unchanged. At some juncture the tradeoffs demanded by the needs of different job classes will be assessed, invariably yielding some new questions and tradeoffs.

Some truths have come from this first-pass simulation which are maxims to the Ames concept of the task to be done and the machine that is to do it.

1) The NASF as proposed in this paper has no system bottlenecks other than the FMP. Job functions peripheral to the FMP processing are entirely hidden by the FMP operations so long as the workload is sufficiently heavy to keep it busy. Thus, for the present Usage Model, system efficiency is the FMP utilization.
2) Twenty hours of FMP computations cannot, in practice, be done in twenty hours. Aside from system startup and wind-down time, and aside from breakdown possibilities, if the FMP ever goes idle during the running day that time is lost, never to be recaptured. According to the present Usage Model a certain random nature in job arrivals seems evident. This simply implies a high probability that at some time(s) during the day the job arrival profile and the load within the system will not keep the FMP busy.
3) To obtain high throughput the FMP must be highly utilized. In practice this requires substantial queuing activity for the FMP, i.e., a healthy FMP execute queue. Thus for a system workload which demands high FMP utilization, there is a tradeoff between throughput and job turnaround. Simulation shows that the greater the extent to which the FMP queue is kept nonempty, the greater the system throughput. Such queuing clearly comes at the expense of job turnaround.

A particular class of jobs may occupy the queue for relatively long periods of time (a good possibility) or the general user community (average) job may occupy it for a somewhat shorter period. Unfortunately, high throughput for one class of jobs may create long turnaround for jobs of another class. Conversely, a large volume of short-turnaround jobs decreases FMP utilization and throughput.
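
The coupling between job classes can be seen in a toy model. The sketch below is not the NASF simulator: it models the FMP as a single non-preemptive server and compares first-come-first-served ordering with a policy that runs Model 4 jobs only when nothing else is waiting, in the spirit of section 4.2.3.4. The job list is invented for illustration, and only FMP waiting and execution time is counted (no SPS or transfer time).

```python
def run_fmp(jobs, model4_last=False):
    """Return the mean FMP turnaround (seconds) for each job class.

    jobs: list of (arrival_s, fmp_seconds, job_class) tuples.
    model4_last: if True, class-4 jobs run only when no other job is waiting.
    """
    pending = sorted(jobs)               # order by arrival time
    queue, turnaround = [], {}
    clock, i = 0.0, 0
    while i < len(pending) or queue:
        # Admit every job that has arrived by the current clock time.
        while i < len(pending) and pending[i][0] <= clock:
            queue.append(pending[i])
            i += 1
        if not queue:                    # FMP idle: jump to the next arrival
            clock = float(pending[i][0])
            continue
        if model4_last:
            # Stable sort: other classes first, arrival order kept within each.
            queue.sort(key=lambda job: job[2] == 4)
        arrival, seconds, job_class = queue.pop(0)
        clock += seconds                 # run the job to completion
        turnaround.setdefault(job_class, []).append(clock - arrival)
    return {c: sum(t) / len(t) for c, t in turnaround.items()}


# Hypothetical mix: two 600-second Model 4 jobs and five 10-second Model 1 jobs.
jobs = [(0, 600, 4), (100, 600, 4)] + [(t, 10, 1) for t in (60, 180, 300, 420, 540)]
fifo = run_fmp(jobs)
low_priority = run_fmp(jobs, model4_last=True)
print("first come, first served:", fifo)
print("Model 4 lowest priority :", low_priority)
```

In this made-up case the mean Model 1 turnaround falls from 810 to 330 seconds when Model 4 jobs are given lowest priority, while the mean Model 4 turnaround rises from 855 to 875 seconds. The numbers only illustrate the direction of the tradeoff, not its size in the actual facility.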

This simulation has also identified some possible system difficulties. Though the system concept seems very encouraging, playing the devil's advocate is always fun: two thorns are foreseeable. First, results to date are based on a very sketchy understanding of the capabilities of the SPS system and the load it must carry. Assumptions and guesses have been clearly stated in sections 2, 3, and 4, but it is conceivable that they are not better than $50 \%$ accurate and perhaps worse. Second, what if one SPS device should break down? This is certain to occur on occasion. According to present simulation runs, one SPS could not handle the entire load. It could, however, do a reasonable job if the workload were diminished somewhat. Simulation of such a situation remains to be done.


SUMMARY OF SYSTEM CONFIGURATION INPUT DATA

| 1. | 01,14,1,23,1,6,1 | FE1/DISK 1 |
| :---: | :---: | :---: |
| 2. | 01,13,2,23,2,7,2 |  |
| 3. | 01,13,1,22,2,5,3 |  |
| 4. | 02,14,1,25,1,0,1 | FE1/DISK 2 |
| 5. | 02,13,2,25,2,7,2 |  |
| 6. | 02,13,1,24,2,5,3 |  |
| 7. | 03,14,1,27,1,6,1 | FE1/DISK 3 |
| 8. | 03,13,2,27,2,7,2 |  |
| 9. | 03,13,1,26,2,5,3 |  |
| 10. | 04,14,1,29,1,6,1 | FE1/DISK 4 |
| 11. | 04,13,2,29,2,7,2 |  |
| 12. | 04,13,1,28,2,6,3 |  |
| 13. | 01,18,1,23,2,7,1 | FE2/DISK 1 |
| 14. | 01,17,2,23,1,0,2 |  |
| 15. | 01,17,1,22,2,5,3 |  |
| 16. | 02,18,1,25,2,7,1 | FE2/DISK 2 |
| 17. | 02,17,2,25,1,6,2 |  |
| 18. | 02,17,1,24,2,5,3 |  |
| 19. | 03,18,1,27,2,7,1 | FE2/DISK 3 |
| 20. | 03,17,2,27,1,0,2 |  |
| 21. | 03,17,1,26,2,5,3 |  |
| 22. | 04,18,1,29,2,7,1 | FE2/DISK 4 |
| 23. | 04,17,2,29,1,6,2 |  |
| 24. | 04,17,1,28,2,5,3 |  |
| 25. | 11,04,1,22,1,1,1 | FMP/DISK 1 |
| 26. | 11,02,1,22,2,5,2 |  |
| 27. | 11,02,2,23,2,7,3 |  |
| 28. | 12,05,1,24,1,2,1 | FMP/DISK 2 |
| 29. | 12,02,1,24,2,5,2 |  |
| 30. | 12,02,2,25,2,7,3 |  |
| 31. | 13,06,1,26,1,3,1 | FMP/DISK 3 |
| 32. | 13,02,1,26,2,5,2 |  |
| 33. | 13,02,2,27,2,7,3 |  |
| 34. | 14,07,1,28,1,4,1 | FMP/DISK 4 |
| 35. | 14,02,1,28,2,5,2 |  |
| 36. | 14,02,2,29,2,7,3 |  |
| 37. | 10,13,1,01,1,5,1 | FE1/FMP |
| 38. | 10,13,1,02,1,5,2 |  |
| 39. | 10,14,1,01,2,6,3 |  |
| 40. | 10,17,1,01,1,5,1 | FE2/FMP |
| 41. | 10,17,1,02,1,5,2 |  |
| 42. | 10,10,1,02,2,7,3 |  |
| 43. | 21,37,2,23,2,1,1 | GRF1/DISK 1 |
| 44. | 21,37,1,22,2,5,2 |  |
| 45. | 22,37,2,25,2,7,1 | GRF1/DISK 2 |
| 46. | 22,37,1,24,2,5,2 |  |
| 47. | 23,37,2,27,2,7,1 | GRF1/DISK 3 |
| 48. | 23,37,1,26,2,5,2 |  |
| 49. | 24,37,2,29,2,7,1 | GRF1/DISK 4 |
| 50. | 24,37,1,28,2,5,2 |  |
| 51. | 21,38,2,23,1,6,1 | GRF2/DISK 1 |
| 52. | 21,38,1,22,2,6,2 |  |
| 53. | 22,38,2,25,1,6,1 |  |
| 54. | 22,38,1,24,2,5,2 |  |
| 55. | 23,38,2,27,1,6,1 |  |
| 56. | 23,38,1,26,2,5,2 |  |
| 57. | 24,34,2,29,1,6,1 |  |
| 58. | 24,34,1,20,2,5,2 |  |
| 59. | 20,13,1,37,1,5,1 | FE1/GRF1 |
| 60. | 20,13,2,37,2,7,2 |  |
| 61. | 20,17,1,37,1,5,1 | FE2/GRF1 |
| 62. | 20,14,1,37,2,7,3 |  |
| 63. | 20,13,1,38,1,5,1 | FE1/GRF2 |
| 64. | 20,14,1,38,2,6,3 |  |
| 65. | 20,17,1,38,1,5,1 | FE2/GRF2 |
| 66. | 20,17,2,38,2,6,2 |  |

SUMMARY OF MESSAGE TRAFFIC INPUT DATA
$10,0,120,2,0,0$
－10，61，1，15，0，－600
$10,61,1,15,0,-1$
$11,10,0,1,1=0$
12，21，0，0，1，－1
12，10，0，1，1，w1
$10,60,10,0,2,-1$
18，10，0，0，21－1
$18,1 \cdot 0,1 \cdot 2,-1$
$10,61,60,12,110$
$10,0,0,0,010$
$10,0,0,0,010$
$20,0,120,1,0$
20，61，1，15，0，0000
20．61，1，15，0，－300
16．20．61，1，15，0，－1
17．21．1．0．0．1．－1
18．21．100011．1．0
19．22，21：0，0，1，－1
21：26，10．0．1，10－1
22． $24,10,0,0,2,-1$
23．28．1，0．1．2，－1
24． $20.61,60,12,1,=1$
25．20．0，0，0，0．0
26． $30,0,120,1,0,0$
27． $30,61,1,15,0,-1$
29．31，10，0，1，101
29．31，10，001，1，0
31．32，10，0；1：1：－1
32．30，60．10．0，2，－1
33． $34,10,0,0,2,-1$
34．$\quad 38,1,0,1+2,-1$
35． $30,61,60,12,1,-1$
36． $40,0,120,1,0$.
38．40，61，1，15，0，－600

MODEL I－INTERARRIVAL TIME MEAN OF 120 SHCOODIOO ISEC ABORTED SPS COMPIL＇N WAIT 10 MIN．AOMIIN SPS COMPILATION OF SOURCE CODE－1 SFC．DOMI20 FILEI DSK $\rightarrow \rightarrow$ FMP． 000130 IIE OSK＜－F FMP：
FILE2 DSK＜－－＞FMP． FMP FLOW CODE PROCESSING－ 10 SFC．noajin
 SPS PUST PRUCESSINGI40 SEC－25\％IGAM． 000200 nun 190 JUR RUN COMPLETE： ISEC ABOHED SPS COMPIL＇N－WAIT 10 MIY．OOOZ30 ABORTED CUMPILATION AGAIN－WAIT 5 MTN．DOOP40 SPS COMPILATION OF SOURCE CODE－I SEG． 000250 FILEI－LOAD MODULE 15K：SPS 《－－＞DSK． 000260 FILEI DSK＜$\rightarrow$ F FMP
FILEZ－CONFIGATION iOK：GRF＜n－＞DSK．OOD280 FILE？DSK 《－．${ }^{\circ}$ FMP．
FMP FLUW CODE PROCESSING 10 SEC．
FILEB－RESULTS FILE 60K：FMP＜w－＞OSK． 000300
FILER DSK＜－m＞SPS． 000320
SPS POST－PROCESSINGi40 SEC－258 LOAD．DOD330
JOA RUN COMPLETE， 000340 MODEL I－INREHARRIVAL TIME MEAN OF 120 SF．COONB50 SPS COMP ILATION OF SOURCE COOE AF SFC．OOO360 FILEI－EOAD MODULE 15K：SPS＜－＞＞DSK． 000370
 FILE2 DSK＜－＞＞FMP． 000400 FMP FLOW CODE PROCESSING－ 10 SEC． 000410 FILEg－RESULTS FILE GOK：FMP 《－－＞DSK． 000420 FILEA DSK＜－M＞SPS， 000430 SPS PUST－PKOCESSING140 SEC－25\％LOAD． 000440 HODEL I－INTERARRIVAL YIME MEAN OF 120 SECOOO460 1 SEC ABORTEO SPS COMPIL＇N－WAIT 10 MIN． 000470

SPS CUMPILATION OF SOURCE CODE－1 SEC ..... 00048000490 FILEI DSK＜－${ }^{\circ}$ FMP． ILE CONFIGATION JOK：GRF 《nc OSK． FILEL OSK 〈ー－＞FMP． FMP FLON CODE PROCESSING－ 10 SEC， FILE9－DEBUG DUMP 3MI FMP 《－N． $\begin{array}{ll}\text { FILES DSK } \\ \text { SPS POST－PROCESSINGi40 SEC－25\％LOAD．} & 000550 \\ 000560\end{array}$ 000520
000530 000530
000540 000540 JOB RUN COMPLETE．$\quad 000570$ ODEL I－INTERARRIVAL TYME MN OF 120 SEC000SB PS RECOMPILATION OF SOURCE CODE＊I SEC． 000600 FILEI－LOAD MODULE $15 K$ SPS 《－＊＞DSK． 000610 FILE1 DSK 《＊～＞FMP．
ILEZ－CONFIGATION IOKI GRF $\rightarrow>$ DSK． FILET DSK＜－ $\cos$ FMP．
MP FLOW CODE PROCESSING－ 10 SEC． FILEQ $\rightarrow$ RESULTS FILE $60 K 1$ FMP 10 SEC． 0 OSK， 000650 FILEB DSK 《－W）SPS． SPS POST－PROCESSINGI4O SEC－25\％LOAD． JOB RUN COMPLETE．
RRIVAL 000690
1 SEC AHORTED SPS COMP MEAN OF 120 SEC000700 BORTED COMPILIN AGAIN．WAIT 5 MIN．YHOOO710 SPS RECOMPILATION OF SOURCE CODE－1 SEC． 000730 FILEI－LOAD MODULE 15K！SPS 《－ल＞DSK． 000740 FILE1 DSK＜－$\rightarrow$ FMP． FILEL－CONFIGATION 10K：GRF 《～n＞OSK． FMP FLOW CODE PROCESSING－ 10 SEC ． 000740 ILE8 OSK＜＜m＞，SPS．
SPS POST－PROCESSING：40 SEC－25\％LOAD． JOB RUN COMPLETE． hodel l－interarrival time nean of 120 SECOOOB30 SPS CUMPILATION OF SOURCE CODE－1 SEC． 000840 ILEL－LOAD MODULE 15K：SPS 《－m DSK． 000850 ILEI DSK 《—O FMP．
ILE2－CONFIGATION IOK：GRF＜－－＞DSK． MP FLOW CODE PROCE
FILEB－RESUL TS FILE 60KI FMP $\langle-\rightarrow$ DSK FILEB DSK＜－－＞SPS．
SPS POST－PFOCESSING140 SEC－25\％LOAD． 000920 JOR RUN COMPLETE，$\quad 000930$ MODEL I－INTERARRIVAL TIME MEAN OF 120 SEC000940 I SEC ABORTED SPS COMPILIN－WAIT 10 MI 000950 SPS RECOMPILAIION OF SOURCE CODF－I SEC． 000960 FILEI OSK 《CO FMP． 000980
000990 FILE2－CONFIGATION IOK：GAF＜－A DSK． ILEZ DSK 《－C＞FMP． 001000 FMP FLOW CODE PROCESSING－ 10 SEC． 001010 FILE9－DEEUG DUMP FILE $3 M I$ FMP $\Leftrightarrow$ DSK． 001020 FILE9 DSK＜－C）SPS． JOB RUN COMPLETE． 001050 MODEL I－INTERARRIVAL TIME MEAN OF 120 SECOOIO60 1 SEC ABORTED SPS COMPILIN－WAIT 10 MIOO1070 $\begin{array}{lll}\text { SPS RECOMPILATION OF SOURCE CODE－1 SEC．} & 001080 \\ \text { FILEI－LOAD MOOULE } 15 K: ~ S P S ~<~ & \text { OS OSK．} & 001090\end{array}$ ILEI－LOAO MOOULE 1SK；SPS 《－－＞DSK． 001090
F：LEZ－CONFIGATION IOKI ORF＜－－＞DSK．

| 103. | 92，10，001：1：－1 |
| :---: | :---: |
| 104. | 90，60，10，0，2，－1 |
| 105. | 90，10．0．0，20－1 |
| 106. | 98，800，1，2：－1 |
| 107. | 90，61，60，12，1，－1 |
| 108． | 90，0．0．0，0，0 |
| 109． | 100．0．120．1．000 |
| 110. | 100，61．1．15．0．－600 |
| 111. | 100，61，1，15，0，－300 |
| 112. | 100，61，1，15，0，－1 |
| 113. | 101．10000．1．－1 |
| 114. | 101：1000，1：1，0 |
| 115. | 102，21，000，1，－1 |
| 116. | 102，10，0，1，1，01 |
| 117. | 100，60，10，0，2：－1 |
| 118. | 108，10．0：0．2：－1 |
| 119. | 109．1－0，1，2，－1 |
| 120. | 100．61，60．12，10－1 |
| 121. | 1000000000 |
| 122． | 11000，120．1．000 |
| 123. | 110，61，1，15，0．－1 |
| 124. | $11101 / 000110-1$ |
| 125. | 111010．0．1．1，0 |
| 126. | 112，21．0．0．1，－1 |
| 127. | 112，10，0，1：1：－1 |
| 128. | 130，60，10，0，20－1 |
| 129. | 1191000．0，901－1 |
| 130. | 119，1，0，1，90，－1 |
| 131. | 110，61，60，12．1，－1 |
| 132. | 110．0．0．0．0．0 |
| 133. | 120，0．120，1，0，0 |
| 134. | 120，61，1，15，01－600 |
| 135. | 120，61，1，15，0，－1 |
| 136. | 121，10000，1，－1 |
| 137. | 121．100011．1．0 |
| 138． | 122．21：0，0．1，－1 |
| 139. | 122，10，0，1，1，－1 |
| 140. | 120，60，10，0，20w1 |
| 141. | 129010．0，0，90：－1 |
| 142． | 1290100，1，90，－1 |
| 143. | 120，61，60，12，1，－1 |
| 144. | 120，0，0，0，0，0 |
| 145. | 130．0．120．1．0．0 |
| 146. | 130，61，1，15，0，－600 |
| 147. | 130．61．1．15．0．－1 |
| 148. | 13101000010－1 |
| 149. | 131．10，0．1．1．0 |
| 150. | 132，21：0，0．1，－1 |
| 151. | 132，10，0．1．10－1 |
| 152． | 130，60，10，0，20－1 |
| 153. | 130，10，0，0，2，－1 |
| 154. | 138，1－0，1，2，－1 |
| 155. | 130，61，60，12，1，－1 |
| 156. | 130，0，0，0，0，0 |
| 157. | 14000．120．1．0．0 |
| 158. | 140，61．1，15，0，－600 |
| 159. | 140，61，1，15，0：－300 |
| 160. | 140．61，1，15，0．e1 |
| 161. | 141，100，0010－1 |
| 162. | 141，100011，100 |
| 163. | 142，21，0，0，1，m1 |
| 164. | 142010．0．1，1，－1 |
| 165. | 140．60，10，0，20－1 |

FILEZ DSK 《＊$\rightarrow$ FMP． FMP FLOW CODE PROCESSING－ 10 SEC． FILEEBRESULTS FILE GOK：FMP＜n－s DSK． FILEB DSK＜－O＞SPS． JOB RUN COMPLETE．

001120

G40 SEC－25\％LOAD． 001170 MODEL I－INTERARRIVAL TIME MEAN OF 120 SE001180 1 SEC ABORTED SPS COMPIL＇N．WAIT 10 MOO1190 ABURTED COMPILIN AGAIN－WAIT 5 MIN：YOO 1200 SPS RECOMPILATION OF SOURCE CONE－ 1 SEC．OOI210 FILEI－LOAD MODULE 15K！SPS 〈m＂＞OSK
FILEL DSK＜－M FMP
FILE2－CONFIGATION IO
FILEZ OSK＜$\rightarrow$ ㄱ FMP．
001250 FMP FLOW CODE PHOCESSING－ 10 SEC． FILEBARESULTS FILE $60 K 1$ FMP＜$\rightarrow \infty$ DSK． FILEA OSK＜－$->$ SPS．
SPS POST－PROCESSING：40 SEC－25\％LOAD． 008 RUN COMPLETE．
MODEL I－INTERARRIVAL TIME MEAN OF MODEL I－INTERARRIVAL TIME MEAN OF 120 SEOOI310 FILEI－LOAD MODULE SOURCE CODEWI SEC， 001320 FILEI DSK＜－W力 FMP．SK 《O－＞DSK。 FILER－CONFIGAYION IOK：GRF 《－m DSK。 FILEE DSK＜AS FMP．
HP FLOW CODE PROAESSINO 001350 MP FLOW CODE PROCESSING＝ 10 SEC． 001360 FILE9－DEBUG DUMP FILE 3M；FMP 《－－＞OSK． 001380 $\begin{array}{ccc}\text { FILE9 OSK 《－－＞SPS．} & & 01390 \\ \text { SPS POST－PROCESSING } 40 \text { SEC－25\％LUAD．} & 001400\end{array}$ JOB RUN COMPLETE． 001410 MODEL 1－INYERARRIVAL TIME MEAN OF 120 SE001420 1 SEC ABORTED SPS COMPILIN－WAIT 10 MOO1430 SPS RECOMPILATION OF SOURCE CODE－I SEC． 001440 FILEEL－LOAD MODULE 15K：SPS 《－＞＞DSK 001450 FILEL DSK＜－$->$ FHP． 001460 FILER－CUNFIGATION 10K：GRF $\Leftrightarrow->$ DSK． 001470 EMP FLOW CODE PHOCESSING－ 10 SEC． 001490 FILE9－DEBUG DUMP FILE 3MI FMP＜－m＞DSK． 001500 FILE9 DSK 《m－＞SPS．， 001510 SPS POST－PHOCESSING14O SEC－25\％LOAD． 001520 JOB RUN COMPLETE． 001530 MODEL I－INTERARRIVAL TIME MEAN OF 120 SEOOI540 SPS REC ABORTED SPS COMPILTN－WAIT 10 MOOI550 FILEI－LOAD MODULE 15K；SPS \＆w OP DSK． 001570 FILEI DSK＜－W＞FMP．
FILE2－CONFIGATION IOKI GRF 《－m＞USK。 001590 FILEZ DSK＜wa＞FMP． FMP FLOW CODE PROCESSING－ 10 SEC． FICRE CODE 001610 FILEE－RESULTS FILE 6OK：FMP 《－m DSK． 001620 SPS POST－PHOCESSING14O SEC－25\％LOAD． 00163 JOB HUN COMPLETE． 001650 MODEL I－INTERARRIVAL TIME MEAN OF 120 SEQOIG60 1 SEC ABORTED SPS COMPIL＇N－WAIT 10 MOO1670 ARURYED COMPIL＇N AGAIN ．WAIT S MIN，R001680 SPS HECOMPILATION OF SOURCE CUDE－I SEC． 001690 FILEI－LOAD MODULE 1SKI SPS＜m＝＞OSK． 001700 FILEA DSK＜－c＞FMP．
FILEZ DSK＜－w FMP 001720
FMP FLOW CODE PHOCESSING－ 10 sEC ． 001740

ILEE－RESULTS FILE GOK：FMP 《－－＞DSK ..... 001750 FILEE DSK＜$\rightarrow$ SPS． JOB HUN COMPLETE． 40 SEC－25\％LOAD． 001770 MODEL I－INTERARRIVAL．TIME MEAN OF 120 SEOO 1740 SPS COMPILATION OF SOURCE CODE－1 SEC． 001800 FILEI－LOAQ MODULE 15KI SPS 《 $\rightarrow-\rightarrow$ DSK． 001 DIO FILEI DSK＜－w FMP．n01日20 FILEE－CONFIGATION IOK；GRF＜－M＞USK． 001830 ILE2 OSK 《－W＞FMP．
CESSING－ 10 SEC． FILE日－RESULTS FILE SOKI FMP＜－－＞OSK，NO1850 FILEB DSK＜－T＞SPS， SPS POST－PHOCESSING：40 SEC－25\％LGAD．nO1ABO JOB KUN COMPLETE．nOIB90 MODEL L－INTERAARIVAL IIME MEAN OF 120 SE001900 1 SEG．ABOHFED SHS COMPILIN－WAIT 10 MOO1910 SPS HECOMPILAYION OF SUUHCE CUNE－1 SFC．n01920 FILEI DSK＜$\rightarrow$ FMH． FILFZ－CONFIGATION IOK：GRF＜－－＞DSK， 001940 FILEC DSK 《－－＞FMP． 001960 FMP FLOn CUDE PHOCESSING＊ 10 SEC， 001970 FILE9－DEEUG DUMP FILF 3MI FMP＜－ल＞DSK．001940 FILEQ OSK＇＜－～＞SPS． 001990 SPS POST－PROCESS KUN COHPLETE SECOSW LJAD． 002000 MOD RUN COMPLETE INA IMME MEAN OF 120 SORODIO 1 SEC AGUHTED SPS COMPILIN－MAIT 10 MOOZO3 SpS hecumplaation of source conewl ser． 002040 FILEI－LOAO MOOULE 15K：SPS＜－＊＞DSK，n0205 FILFI OSK＜－－＞FMP．NOZ060 002070 CDSK．〈Na＞FMP FILEQUHESULTS FILE GOKI FMP 10 SEC． 002090 FILE日 DSK＜n－＞SPS． 002110 SPS PUST－pHUCESSING\＆40 SEC－258 LUAD． 002120 JOB RUN COMPLETE， MODEL I－INTENARRIVAL TIME MEAN OF 120 SEOO2140
1 SEC ABORTED SPS COMPIL．＇N－WATT 10 MOO2150 SPS KECOMPILATION OF SOURCE CONE－1 SEC．DO2170 FILEI－LUAD，MUDULE 15KI SPS 《－－＞DSK． 002180 FILEI OSK＜nw FMP． FILEZ－CONFIGATION LOKI GHF 《－－＞DSK． 002200 FILFE OSK＜．．．＞FMP．
FMLF FLOW CUDE FMP．
FMP FLOW CUDE PHOCESSING 10 SEC ． 002220 FILEHAHESULTS FILE 6OKI FMP＜n－＞DSK．nO2230 SPS POST－PHOCESSINGi40 SEC－25\％LUAD． 002240 PO HUT－PMPLETE MODEL L－INTEAARKIVAL TIME MEAN OF 120 SE002270 SPS COMPILATION OF SOUACE CODE－1 SEC． $0022 B 0$ FILEI－LOAD MODULE $15 K$ ；SPS＜mo＞OSK． FILEI OSK 〈－～＞FMP．
ATION $10 K$ ：GRF 《w－s DSK FILEE DSK＜A－＞FMP． 002290 $023!$ 002330 FILE9－DEGUG DUMP FILE 3M；FMP $\rightarrow$ DSK． 002340 FILEG DSK＜－＞SPS． SPS POST－PROCESSINGi4O SEC～25x LOAD． 00236 MODEL l－iNTERARRIVAL TIME MEAN OF izo sE002380

| 230. | $\begin{aligned} & 200,61,1,15,0,=60 \\ & 200,61,1,15,0,=1 \end{aligned}$ |
| :---: | :---: |
| 232. | 201．100，0．1，－1 |
| 233. | 201．10i0．1．1，0 |
| 234. | 202．2100011．－1 |
| 235． | 202010；0101．01 |
| 236. | 200，60，10，0．20－1 |
| 237. | 209，10，0，0．90，－1 |
| 238. | 209．1．0．1．90 |
| 239. | 200，61：60，12，10－1 |
| 240. | 200，0000，0：0 |
| 241. | 21000．120，100，0 |
| 242. | 210，61，1，15，0，－1 |
| 243. | 211，1，0，0．1，－1 |
| 244. | 211：10，0810100 |
| 245. | 212，21，0，0，1，－1 |
| 246. | 212，10，0，1，1，－1 |
| 247. | 210，60，10，0，2，－1 |
| 248. | 218010．0．0．2，－1 |
| 249． | 218：1：0，1，2：－1 |
| 250. | 210．61．60．12．1．－1 |
| 251. | 21000000．0．0 |
| 252． | 220，0，120．1．0．0 |
| 253. | 220，61，1．15，0，01 |
| 254. | 221．1．000．1．－1 |
| 255. | 221010，011，1，0 |
| 256. | 222，21，0．0．1，－1 |
| 257. | 222，10，0，1：1－01 |
| 258. | 220．60，10，0，2，m1 |
| 259. | 228，10，0，0．20－1 |
| 260. | 228，100，1，2，－1 |
| 261. | 220，61：60，12，10－1 |
| 262. | 220：000，0，0，0 |
| 263. | 230，0．120．1．0．0 |
| 264 | 230．61．1．15．0，－1 |
| 265. | 231，1＊010．1．－1 |
| 266. | 231，10．0．1．1，0 |
| 267. | 232021000．1：－1 |
| 268. | 232，10．0．101，－1 |
| 269． | 230，60，10，0，2，－1 |
| 270． | 238．10．0．0．2．－1 |
| 271． | 238．1，0，1，2，－1 |
| 272. | 230，61．60，12．1．01 |
| 273. | 230．0．0．0．0．0 |
| 274. | 240，0．120．1．0．0 |
| 275. | 240，61，1，15，0，m1 |
| 276． | 2410100001，－1 |
| 277． | 241．10，0．1．1．0 |
| 278． | 242，21：000．1．－3 |
| 279. | 242：10，0．1．1．－1 |
| 280. | 240，60，10，0，20－1 |
| 281. | 249，10，0．0．90，al |
| 282． | 2490100．1，90，－1 |
| 283. | 240，61，60，12，1，－1 |
| 284. | 240，0，0，0，0．0． |
| 285. | 250，0120．1．0．0 |
| 286． | 250．61，1，15，0，－600 |
| 287. | 250，61，1，15，00w1 |
| 288． | 251010000．1．01 |
| 289. | 251．10．0．1．1．0 |
| 290. | 2520100．0．1．－1 |
| 291. | 252，10．0，1．01，－1 |
| 292. | 250，60，10，0，20－1 |
| 293. | 258，10，0，0，2，－1 |

1 SEC ABORTED SPS COMPIL＇N－WAIT 10 M002390 SPS RECOMPILATION OF SOURCE CODE－1 SEr． 002400 FILEI－LOAD MODULE 15K：SPS＜－w＞DSK．n02410 FILEI DSK 《－B FMP．
FILEZ－CONFIGATION IOKI ORF 002420 FILEZ DSK＜－w FMP． SPS COMPILAYION OF SOURCE CODEWI SEC． 002510 FILEE－LOAD MODULE 15K：SPS 《－a＞DSK． 002520 FILEZ－CONFIGATION IOK：GRF 《－G DSK． 00253 FILER DSK＜＊－＞FMP． 002550 FMP FLOW CODE PROCESSING－ 10 SEC． 002560 FILE日－RESULTS FILE 60KI FMP 《－C＞DSK． 002570 FIUE OSK 《－AC SPS． 00258 SPS POST－PROCESSINGI40 SEC－25® LUAD． 002590 JOB RUN COMPLETE 002600 MODEL 1－INTERARRIVAL TIME MEAN OF 120 SE 002610
SPS COMPILATION OF SOURCE CODE－1 SEC． FILEI－LOAD MODULE $15 K 1$ SPS＜－- －S SEC． 002620 ILEI DSK＜$\rightarrow \infty$ FMP FILEZ－CONFIGATION IOK，GRF 《－E）DSK．
FILEZ OSK＜$-\omega$ F FMP． FMP FLOW CODE PROCESSING－ 10 SEC．
FILEB RESULTS FILE GOK：FMP FILEB－RESULTS FILE GOK：FMP 《w－＞DSK 0026670 FILEB DSK＜－SPS．SSK． 002680 SPS POST－PROCESSING：40 SEC－25\％LOAD． 002690 JOB HUN COMPLETE． SPS COMPILATION OF SOURCE CODE－1 SEC． 002730 FILEI－LOAD MODULE 15 K ；SPS 《一－＞OSK． FILE DSK＜－m＞FMP：
FILEEZCONFIGATION LOKI GAF＜－m＞DSK． 002750
FILER DSK＜n－＞FMP．ORF＜－m＞DSK FMP FLOW CODE PROCESSING－ 10 SEC． FILEBARESULTS FILE GOKI FMP 《m－＞DSK． FILEB DSK＜$-\infty$ SPS．
SPS POST－PROCESSING， 002790 SPS POST－PROCESSING：40 SEC－25\％LOAD． 002810 JOB RUN COMPLETE． MODEL I－INTERARAIVAL TIME MEAN OF 120 SE 002820 SPS COMPILATION OF SOURCE CODE－1 SEC． 002840 FILEI OSK MOD FMP $15 K 1$ SPS 《－n DSK． 002850 FILER－CONFIOATION IOKI GRF＜u－＞DSK． 00286 FILEL DSK＜－a＞FMP． FMP FLOW CODE PROCESSING－ 10 SEC． FILEG－DERUG DUMP 3M：FMP 002890 FILEG－DEBUG DUMP 3M．FMP 《－T＞DSK． 002900
FILE9 OSK SPS POST－PROCESSINGI4O SEC＝25\％LOAD． 002910 JOB RUN COMPLETE． 002930 MODEL I－INTERAARIVAL TIME HEAN OF 120 SE 002940
SEC ABORTEO SPS COMPILDN－WAIT 10 HOO2950 SPS RECOMPILATION OF SOURCE CODENI SEC． 002960 FILEI－L．OAD MODULE 15K，SPS 《＝＂＞DSK． 002970 FILEI OSK 〈－Wン FHP： 02980 FILE2 DSK＜－n＞FMP．GRF＜－－＞DSK． 00299 FMP FLOW CODE PRCCESSING－ 10 SEC． 003010 FILEB－RESULTS FILE GOKI FMP 《一円 DSK． 003020

| 294. | 258,1,0,1,2,-1 |
| :---: | :---: |
| 295. | 250,61,60,12,1,-1 |
| 296. | 250,0,0,0,0,0 |
| 297. | 260,0,120,1,0,0 |
| 298. | 260,61,1,15,0,-600 |
| 299. | 260,61,1,15,0,-300 |
| 300. | 260,61,1,15,0,-1 |
| 301. | 261,1,0,0,1,-1 |
| 302. | 261,10,0,1,1,0 |
| 303. | 262,21,0,0,1,-2 |
| 304. | 262,10,0,1,1,-1 |
| 305. | 260,60,10,0,2,-1 |
| 306. | 268,10,0,0,2,-1 |
| 307. | 268,1,0,1,2,-1 |
| 308. | 260,61,60,12,1,-1 |
| 309. | 260,0,0,0,0,0 |
| 310. | 270,0,120,1,0,0 |
| 311. | 270,61,1,15,0,-1 |
| 312. | 271,1,0,0,1,-1 |
| 313. | 271,10,0,1,1,0 |
| 314. | 272,21,0,0,1,-1 |
| 315. | 272,10,0,1,1,-2 |
| 316. | 270,60,10,0,2,-1 |
| 317. | 278,10,0,0,2,-1 |
| 318. | 278,1,0,1,2,-1 |
| 319. | 270,61,60,12,1,-1 |
| 320. | 270,0,0,0,0,0 |
| 321. | 280,0,120,1,0,0 |
| 322. | 280,61,1,15,0,-600 |
| 323. | 280,61,1,15,0,-1 |
| 324. | 281,1,0,0,1,-1 |
| 325. | 281,10,0,1,1,0 |
| 326. | 282,21,0,0,1,-1 |
| 327. | 282,10,0,1,1,-1 |
| 328. | 280,60,10,0,2,-1 |
| 329. | 289,10,0,0,90,-1 |
| 330. | 289,1,0,1,90,-1 |
| 331. | 280,61,60,12,1,-1 |
| 332. | 280,0,0,0,0,0 |
| 333. | 290,0,120,1,0,0 |
| 334. | 290,61,1,15,0,-600 |
| 335. | 290,61,1,15,0,-1 |
| 336. | 291,1,0,0,1,-1 |
| 337. | 291,10,0,1,1,0 |
| 338. | 292,21,0,0,1,-1 |
| 339. | 292,10,0,1,1,-1 |
| 340. | 290,60,10,0,2,-1 |
| 341. | 298,10,0,0,2,-1 |
| 342. | 298,1,0,1,2,-1 |
| 343. | 290,61,60,12,1,-1 |
| 344. | 290,0,0,0,0,0 |
| 345. | 300,0,120,1,0,0 |
| 346. | 300,61,1,15,0,-600 |
| 347. | 300,61,1,15,0,-300 |
| 348. | 300,61,1,15,0,-1 |
| 349. | 301,1,0,0,1,-1 |
| 350. | 301,10,0,1,1,0 |
| 351. | 302,21,0,0,1,-1 |
| 352. | 302,10,0,1,1,-1 |
| 353. | 300,60,10,0,2,-1 |
| 354. | 309,10,0,0,90,-1 |
| 355. | 309,1,0,1,90,-1 |
| 356. | 300,61,60,12,1,-1 |


MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
ABORTED COMPIL'N AGAIN - WAIT 5 MIN. T
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.

| 422. | 362,21,0,0,1,-1 |
| :---: | :---: |
| 423. | 362,10,0,1,1,-1 |
| 424. | 360,60,10,0,2,-1 |
| 425. | 369,10,0,0,90,-1 |
| 426. | 369,1,0,1,90,-1 |
| 427. | 360,61,60,12,1,-1 |
| 428. | 360,0,0,0,0,0 |
| 429. | 370,0,120,1,0,0 |
| 430. | 370,61,1,15,0,-600 |
| 431. | 370,61,1,15,0,-1 |
| 432. | 371,1,0,0,1,-1 |
| 433. | 371,10,0,1,1,0 |
| 434. | 372,21,0,0,1,-1 |
| 435. | 372,10,0,1,1,-1 |
| 436. | 370,60,10,0,2,-1 |
| 437. | 378,10,0,0,2,-1 |
| 438. | 378,1,0,1,2,-1 |
| 439. | 370,61,60,12,1,-1 |
| 440. | 370,0,0,0,0,0 |
| 441. | 380,0,120,1,0,0 |
| 442. | 380,61,1,15,0,-600 |
| 443. | 380,61,1,15,0,-300 |
| 444. | 380,61,1,15,0,-1 |
| 445. | 381,1,0,0,1,-1 |
| 446. | 381,10,0,1,1,0 |
| 447. | 382,21,0,0,1,-1 |
| 448. | 382,10,0,1,1,-1 |
| 449. | 380,60,10,0,2,-1 |
| 450. | 388,10,0,0,2,-1 |
| 451. | 388,1,0,1,2,-1 |
| 452. | 380,61,60,12,1,-1 |
| 453. | 380,0,0,0,0,0 |
| 454. | 390,0,120,1,0,0 |
| 455. | 390,61,1,15,0,-1 |
| 456. | 391,1,0,0,1,-1 |
| 457. | 391,10,0,1,1,0 |
| 458. | 392,21,0,0,1,-1 |
| 459. | 392,10,0,1,1,-1 |
| 460. | 390,60,10,0,2,-1 |
| 461. | 399,10,0,0,90,-1 |
| 462. | 399,1,0,1,90,-1 |
| 463. | 390,61,60,12,1,-1 |
| 464. | 390,0,0,0,0,0 |
| 465. | 400,0,120,1,0,0 |
| 466. | 400,61,1,15,0,-600 |
| 467. | 400,61,1,15,0,-1 |
| 468. | 401,1,0,0,1,-1 |
| 469. | 401,10,0,1,1,0 |
| 470. | 402,21,0,0,1,-1 |
| 471. | 402,10,0,1,1,-1 |
| 472. | 400,60,10,0,2,-1 |
| 473. | 409,10,0,0,90,-1 |
| 474. | 409,1,0,1,90,-1 |
| 475. | 400,61,60,12,1,-1 |
| 476. | 400,0,0,0,0,0 |
| 477. | 410,0,120,1,0,0 |
| 478. | 410,61,1,15,0,-1 |
| 479. | 411,1,0,0,1,-1 |
| 480. | 411,10,0,1,1,0 |
| 481. | 412,21,0,0,1,-1 |
| 482. | 412,10,0,1,1,-1 |
| 483. | 410,60,10,0,2,-1 |
| 484. | 418,10,0,0,2,-1 |
| 485. | 418,1,0,1,2,-1 |

FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
ABORTED COMPIL'N AGAIN - WAIT 5 MIN. T
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP 3M; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
ABORTED COMPIL'N AGAIN - WAIT 5 MIN. T
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.

FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
ABORTED COMPIL'N AGAIN - WAIT 5 MIN. T
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP 3M; FMP <--> DSK.
FILE9 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
SPS COMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE1 DSK <--> FMP.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.

FMP FLOW CODE PROCESSING - 10 SEC.
FILE9-DEBUG DUMP FILE 3M; FMP <--> DSK.
SPS POST-PROCESSING;40 SEC-25% LOAD.
JOB RUN COMPLETE.

MODEL 1-INTERARRIVAL TIME MEAN OF 120 SE
1 SEC ABORTED SPS COMPIL'N-WAIT 10 M
SPS RECOMPILATION OF SOURCE CODE-1 SEC.
FILE1-LOAD MODULE 15K; SPS <--> DSK.
FILE2-CONFIGURATION 10K; GRF <--> DSK.
FILE2 DSK <--> FMP.
FMP FLOW CODE PROCESSING - 10 SEC.
FILE8-RESULTS FILE 60K; FMP <--> DSK.
FILE8 DSK <--> SPS.
SPS POST-PROCESSING;40 SEC-25% LOAD.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FILE9-DEBUG DUMP 4M; FMP --> DSK.
FILE9 DSK --> SPS.
FILE9 - ANOTHER 4M WORDS; FMP --> DSK.
FILE9 DSK --> SPS.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
SPS COMPILATION OF SOURCE CODE - 1 SEC
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> SPS.
SPS CONFIGUR'N MANIPULA'N;70 SEC-30% LOAD
FILE2-CONFIGURATION 50K; SPS --> DSK.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
ABORTED COMPILATION AGAIN-WAIT 5 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 360K; FMP --> DSK.
FILE7-OUTPUT FILE 240K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC

| 746. | 640,61,1,25,0,-600 |
| :---: | :---: |
| 747. | 640,61,1,25,0,-1 |
| 748. | 641,1,0,0,2,-1 |
| 749. | 641,10,0,1,2,0 |
| 750. | 642,21,0,0,2,-1 |
| 751. | 642,10,0,1,2,0 |
| 752. | 643,1,0,0,19,-1 |
| 753. | 643,10,0,1,19,-1 |
| 754. | 640,60,60,0,3,-1 |
| 755. | 648,10,0,0,6,-1 |
| 756. | 648,1,0,1,6,-1 |
| 757. | 640,61,180,35,1,-1 |
| 758. | 647,20,0,0,4,-1 |
| 759. | 640,0,0,0,0,0 |
| 760. | 650,0,300,1,0,0 |
| 761. | 650,61,1,25,0,-600 |
| 762. | 650,61,1,25,0,-1 |
| 763. | 651,1,0,0,2,-1 |
| 764. | 651,10,0,1,2,0 |
| 765. | 652,21,0,0,2,-1 |
| 766. | 652,10,0,1,2,0 |
| 767. | 653,1,0,0,19,-1 |
| 768. | 653,10,0,1,19,-1 |
| 769. | 650,60,60,0,3,-1 |
| 770. | 658,10,0,0,6,-1 |
| 771. | 658,1,0,1,6,-1 |
| 772. | 650,61,180,35,1,-1 |
| 773. | 657,20,0,0,4,-1 |
| 774. | 650,0,0,0,0,0 |
| 775. | 660,0,300,1,0,0 |
| 776. | 660,61,1,25,0,-600 |
| 777. | 660,61,1,25,0,-1 |
| 778. | 661,1,0,0,2,-1 |
| 779. | 661,10,0,1,2,0 |
| 780. | 662,21,0,0,2,-1 |
| 781. | 662,10,0,1,2,0 |
| 782. | 663,1,0,0,19,-1 |
| 783. | 663,10,0,1,19,-1 |
| 784. | 660,60,60,0,3,-1 |
| 785. | 669,10,0,0,125,-1 |
| 786. | 669,1,0,1,125,0 |
| 787. | 669,10,0,0,125,-1 |
| 788. | 669,1,0,1,125,-1 |
| 789. | 660,0,0,0,0,0 |
| 790. | 670,0,300,1,0,0 |
| 791. | 670,61,1,25,0,-1 |
| 792. | 671,1,0,0,2,-1 |
| 793. | 671,10,0,1,2,0 |
| 794. | 672,20,0,1,2,-1 |
| 795. | 670,61,80,20,1,-1 |
| 796. | 672,1,0,0,2,-1 |
| 797. | 672,10,0,1,2,0 |
| 798. | 673,1,0,0,19,-1 |
| 799. | 673,10,0,1,19,-1 |
| 800. | 670,60,60,0,3,-1 |
| 801. | 679,10,0,0,125,-1 |
| 802. | 679,1,0,1,125,0 |
| 803. | 679,10,0,0,125,-1 |
| 804. | 679,1,0,1,125,-1 |
| 805. | 670,0,0,0,0,0 |
| 806. | 680,0,300,1,0,0 |
| 807. | 680,61,1,25,0,-600 |
| 808. | 680,61,1,25,0,-300 |
| 809. | 680,61,1,25,0,-1 |
| 810. | 681,1,0,0,2,-1 |


1 SEC ABORTED COMPILN-WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE9-DEBUG DUMP 4M; FMP --> DSK.
FILE9 DSK --> SPS.
FILE9 - ANOTHER 4M WORDS; FMP --> DSK.
FILE9 DSK --> SPS.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
SPS COMPILATION OF SOURCE CODE - 1 SEC
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> SPS.
SPS CONFIGUR'N MANIPULA'N;70 SEC-30% LOAD
FILE2-CONFIGURATION 50K; SPS --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE9-DEBUG DUMP 4M; FMP --> DSK.
FILE9 DSK --> SPS.
FILE9 - ANOTHER 4M WORDS; FMP --> DSK.
FILE9 DSK --> SPS.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
ABORTED COMPILATION AGAIN-WAIT 5 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.

| 811. | 681,10,0,1,2,0 |
| :---: | :---: |
| 812. |  |
| 813. | 682,10,0,1,2,0 |
| 814. | 683,1,0,0,19,-1 |
| 815. | 683,10,0,1,19,-1 |
| 816. | 680,60,60,0,3,-1 |
| 817. | 688,10,0,0,6,-1 |
| 818. | 688,1,0,1,6,-1 |
| 819. | 680,61,180,35,1,-1 |
| 820. | 687,20,0,0,4,-1 |
| 821. | 680,0,0,0,0,0 |
| 822. | 690,0,300,1,0,0 |
| 823. | 690,61,1,25,0,-600 |
| 824. | 690,61,1,25,0,-1 |
| 825. | 691,1,0,0,2,-1 |
| 826. | 691,10,0,1,2,0 |
| 827. | 692,21,0,0,2,-1 |
| 828. | 692,10,0,1,2,0 |
| 829. | 693,1,0,0,19,-1 |
| 830. | 693,10,0,1,19,-1 |
| 831. | 690,60,60,0,3,-1 |
| 832. | 698,10,0,0,6,-1 |
| 833. | 698,1,0,1,6,-1 |
| 834. | 690,61,180,35,1,-1 |
| 835. | 697,20,0,0,4,-1 |
| 836. | 690,0,0,0,0,0 |
| 837. | 700,0,300,1,0,0 |
| 838. | 700,61,1,25,0,-600 |
| 839. | 700,61,1,25,0,-1 |
| 840. | 701,1,0,0,2,-1 |
| 841. | 701,10,0,1,2,0 |
| 842. | 702,21,0,0,2,-1 |
| 843. | 702,10,0,1,2,0 |
| 844. | 703,1,0,0,19,-1 |
| 845. | 703,10,0,1,19,-1 |
| 846. | 700,60,60,0,3,-1 |
| 847. | 708,10,0,0,12,-1 |
| 848. | 708,1,0,1,12,-1 |
| 849. | 700,61,360,35,1,-1 |
| 850. | 707,20,0,0,8,-1 |
| 851. | 700,0,0,0,0,0 |
| 852. | 710,0,300,1,0,0 |
| 853. | 710,61,1,25,0,-600 |
| 854. | 710,61,1,25,0,-1 |
| 855. | 711,1,0,0,2,-1 |
| 856. | 711,10,0,1,2,0 |
| 857. | 712,21,0,0,2,-1 |
| 858. | 712,10,0,1,2,0 |
| 859. | 713,1,0,0,19,-1 |
| 860. | 713,10,0,1,19,-1 |
| 861. | 710,60,60,0,3,-1 |
| 862. | 719,10,0,0,125,-1 |
| 863. | 719,1,0,1,125,0 |
| 864. | 719,10,0,0,125,-1 |
| 865. | 719,1,0,1,125,-1 |
| 866. | 710,0,0,0,0,0 |
| 867. | 720,0,300,1,0,0 |
| 868. | 720,61,1,25,0,-1 |
| 869. | 721,1,0,0,2,-1 |
| 870. | 721,10,0,1,2,0 |
| 871. | 722,20,0,1,2,-1 |
| 872. | 720,61,80,20,1,-1 |
| 873. | 722,1,0,0,2,-1 |
| 874. | 722,10,0,1,2,0 |
| 875. | 723,1,0,0,19,-1 |



FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
ABORTED COMPILATION AGAIN-WAIT 5 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FILE8-RESULTS FILE 360K; FMP --> DSK.
FILE8 DSK --> SPS.
FILE7-OUTPUT FILE 240K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED COMPILN-WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FILE8-RESULTS FILE 180K; FMP --> DSK.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI

| 941. | 769,1,0,1,125,0 |
| :---: | :---: |
| 942. | 769,10,0,0,125,-1 |
| 943. |  |
| 944. |  |
| 945. | 770,0,300,1,0,0 |
| 946. | 770,61,1,25,0,-1 |
| 947. | 771,1,0,0,2,-1 |
| 948. | 771,10,0,1,2,0 |
| 949. | 772,20,0,1,2,-1 |
| 950. | 770,61,80,20,1,-1 |
| 951. | 772,1,0,0,2,-1 |
| 952. | 772,10,0,1,2,0 |
| 953. | 773,1,0,0,19,-1 |
| 954. | 773,10,0,1,19,-1 |
| 955. | 770,60,60,0,3,-1 |
| 956. | 778,10,0,0,6,-1 |
| 957. | 778,1,0,1,6,-1 |
| 958. | 770,61,180,35,1,-1 |
| 959. | 777,20,0,0,4,-1 |
| 960. | 770,0,0,0,0,0 |
| 961. | 780,0,300,1,0,0 |
| 962. | 780,61,1,25,0,-600 |
| 963. | 780,61,1,25,0,-300 |
| 964. | 780,61,1,25,0,-1 |
| 965. | 781,1,0,0,2,-1 |
| 966. | 781,10,0,1,2,0 |
| 967. | 782,21,0,0,2,-1 |
| 968. | 782,10,0,1,2,0 |
| 969. | 783,1,0,0,19,-1 |
| 970. | 783,10,0,1,19,-1 |
| 971. | 780,60,60,0,3,-1 |
| 972. | 788,10,0,0,6,-1 |
| 973. | 788,1,0,1,6,-1 |
| 974. | 780,61,180,35,1,-1 |
| 975. | 787,20,0,0,4,-1 |
| 976. | 780,0,0,0,0,0 |
| 977. | 790,0,300,1,0,0 |
| 978. | 790,61,1,25,0,-600 |
| 979. | 790,61,1,25,0,-1 |
| 980. | 791,1,0,0,2,-1 |
| 981. | 791,10,0,1,2,0 |
| 982. | 792,21,0,0,2,-1 |
| 983. | 792,10,0,1,2,0 |
| 984. | 793,1,0,0,19,-1 |
| 985. | 793,10,0,1,19,-1 |
| 986. | 790,60,60,0,3,-1 |
| 987. | 798,10,0,0,6,-1 |
| 988. | 798,1,0,1,6,-1 |
| 989. | 790,61,180,35,1,-1 |
| 990. | 797,20,0,0,4,-1 |
| 991. | 790,0,0,0,0,0 |
| 992. | 800,0,300,1,0,0 |
| 993. | 800,61,1,25,0,-600 |
| 994. | 800,61,1,25,0,-1 |
| 995. | 801,1,0,0,2,-1 |
| 996. | 801,10,0,1,2,0 |
| 997. | 802,21,0,0,2,-1 |
| 998. | 802,10,0,1,2,0 |
| 999. | 803,1,0,0,19,-1 |
| 1000. | 803,10,0,1,19,-1 |
| 1001. | 800,60,60,0,3,-1 |
| 1002. | 808,10,0,0,6,-1 |
| 1003. | 808,1,0,1,6,-1 |
| 1004. | 800,61,180,35,1,-1 |
| 1005. | 807,20,0,0,4,-1 |

800,0,0,0,0,0

FILE9 DSK --> SPS.
FILE9 - ANOTHER 4M WORDS; FMP --> DSK.
FILE9 DSK --> SPS.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
SPS COMPILATION OF SOURCE CODE - 1 SEC
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> SPS.
SPS CONFIGUR'N MANIPULA'N;70 SEC-30% LOAD
FILE2-CONFIGURATION 50K; SPS --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED COMPILN-WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED COMPL'N - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.


MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE9-DEBUG DUMP 4M; FMP --> DSK.


JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
SPS COMPILATION OF SOURCE CODE - 1 SEC
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> SPS.
SPS CONFIGUR'N MANIPULA'N;70 SEC-30% LOAD
FILE2-CONFIGURATION 50K; SPS --> DSK.
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING-60SEC;AWAIT 3 FI
FILE8-RESULTS FILE 180K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-120 SEC;70% LOAD.
FILE7-OUTPUT FILE 120K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 2-INTERARRIVAL TIME MEAN OF 300 SEC
1 SEC ABORTED SPS COMPILN - WAIT 10 MIN.
ABORTED COMPILATION AGAIN-WAIT 5 MIN.
SPS RECOMPILATION OF SOURCE CODE - 1 SEC.
FILE1-LOAD MODULE 60K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-CONFIGURATION 50K; GRF --> DSK.
FILE2 DSK --> FMP.
FILE3-GRID 600K; SPS --> DSK.
FILE3 DSK --> FMP.
FILE8-RESULTS FILE 360K; FMP --> DSK.
FILE8 DSK --> SPS.
SPS POST PROCESSING-240 SEC;70% LOAD.
FILE7-OUTPUT FILE 240K; SPS --> GRF.
JOB RUN COMPLETED.

MODEL 3-INTERARRIVAL TIME MEAN OF 20 MIN.
FILE1-LOAD MODULE 120K; SPS --> DSK.
FILE1 DSK --> FMP.
FILE2-PATCH AND GRID SETUP 100K;GRF-->SPS 010660
SPS PATCH AND GRID PROCESSING-240S,70% 010670
FILE3-RESULTING TRANSFORMED FILE SPS-->DS 010680
REST OF FILE3- 3M TOTAL; SPS-->DSK. 010690
REST OF FILE3 DSK --> FMP. 010710
FMP FLOW CODE PROCESSING - 60 SEC. 010720
FILE8-RESULTS FILE 90K: FMP --> DSK. 010730
FILE8 DSK --> SPS
SPS POST-PROCESSING:200 SEC,60%LOAD. 010750
FILE7-DISPLAY FILE 50K; SPS --> GRF. 010760
MODEL 3-INTERARRIVAL


FILE1-LOAD MODULE 120K; SPS --> DSK. 010790
 FILE2 DSK AND GRID SETUP 100 KISPS -NDSKOI0日I0 FMP PATCH AND GRID PROCESSING - 10 sEC. 010830 $\begin{array}{lll}\text { FMP FLOW CODE PROCESSING }-60 \\ \text { SEC. } & 010830 \\ 010840\end{array}$ FILESORAW RESULTS FILE SMI FHP ت-PDSK, 010840 010860 FILEGーRESULTS FILE وOK; FMP $\rightarrow$ OS OSK. FILEB DSK $-\infty$ SPS.

1086 010880 FILET-DISPLAYESSINGJ200 SEC,60\%LOAD. 010890 JOB RUN COMPY FILE 5OKI SPS $\rightarrow$ - GRF. 010900 JOB RUN COMPLETE. 010910 MODEL $3=I N T E R A R R I V A L$ TIME MEAN OF 20 MIN. OIO920 $\begin{array}{lll}\text { FILEL-LOAD MODULE } 120 K: S P S ~ & \rightarrow> \\ \text { FILE DSK. } & 010930 \\ 010940\end{array}$ FILE2-PATCH AND GRID SETUP $100 \mathrm{~K}:$ GRF $m>S P S O 10950$ SPS PATCH AND GRID PROCESSINO1240S,70\% 010960 FILESMPATCH AND GRID TRANSFORMED: $3 M$. TO DOIO970 FILE3 DSK - F-> FMP. $\begin{array}{lll}\text { FHP FLOW CODE PROCESSING - } 60 \text { SEC. } & 010990 \\ \text { FILER-RESULTS FILE 90K: FMP } & -2 \text { DSK. } & 011000\end{array}$ FILER-RESULTS FILE 90K: FMP $->$ DSK. 011000 FILEE DSK MOS SPS:
SPS POST-PROCESSING;200 SEC,60%LOAD.
FILE7-DISPLAY FILE 50K; SPS --> GRF.
 MODEL 3-INTERARRIVAL TIME MEAN OF 20 MIN 011040 FILEL-L.OAD MODULE IZOKB SPS $\rightarrow$ OS DSK MIN. 011050 FILEI DSK $\rightarrow$ - FMP. 011070 FILE2-PATCH AND GRID LOOK: SPS --2 DSK. Ol1080 FILEE DSK $\rightarrow \underset{\text { FMP. }}{ }$ FMP PATCH AND GRID PROCESSING- 10 SEC. O 011100 FMP FLOW CODE PROCESSING - 60 SEC, FILE9-RESTART FILE 7 II FMP $\cdots>$ DSK. 011120 FILEB-RESULTS FILE $90 K 1$ FMP $\rightarrow \rightarrow$ DSK. 011130 $\begin{array}{lll}\text { FILEA OSK - }->\text { SPS. } & 011140 \\ \text { SPS POST-PRUCESSING:200 SEC.bOXLOAD. } & 011150\end{array}$
 JOB RUN COMPLETE. 0111170 MODEL 3-INTERARRIVAL TIME MEAN OF $20 \mathrm{MIN}, 01 I 180$ SPS COMPILATION OF SOURCE COOE-Z SEC. 011190 FILEI-LOAD MODULE L2OK: SPS $-\rightarrow$ DSK. 011200
 FILEZ DSK $\rightarrow$ FMP. 011230 FMP FLOW CODE PROCESSING - 60 SEC. 011240 FILEA-HAW RESULTS SMI FMP $\rightarrow-7$ DSK. 011250 FILEB-RESULTS FILE 90K; FMP - $->$ OSK. 011260 FILEB DSK SPS POST-PHOCESSINGI200 SEC,60XLOAD. FILET-DISPLAY FILE $50 K I ~ S P S ~$ OP GRF. 011290

011260
011270 011270
011280 MODEL 3-INTERARRIVAL 71 ME MEAN OF 20 MIN.OII310 FILEL-LOAD MODULE I2OK; SPS -m DSK. NII O2n FILEI DSK $\rightarrow$ FMP.

011320
011330 FILEZ-PATCH AND GHID SETUP 100 O GGFF-NPSPSDII340 SPS PATCH AND GRIO PHOCESSINGI24OS,70\% 011350 FILEBWPARCH AND GRID TRANSFORMED: $3 M$, TO DOII 360 FMP FLOW COOE PROCESSING - 60 SFC .
FILEARESESULTS FILE 90KI FMP -n> OSK. 011380 $\begin{array}{lll}\text { FILEA-RESULTS FILE 90KI FMP } & -\cdots> \\ \text { FILEA DSK. } & 011390 \\ \text { DSK SPS. } & 011400\end{array}$
SPS POST-PROCESSING;200 SEC,60%LOAD. 011410
$900,0,0,0,0,0$
1135.
1136.
$120200900,2,1,0$
$\begin{array}{ll}1136 . & 921+1,0+0,4,-1 \\ 1137 . & 921,10,0,1+4,0\end{array}$
$\begin{array}{ll}1137 . & 921,10,0,1,4,0 \\ 1130 . & 922,20,0,1,4,-1\end{array}$
1139. $920,61,360,35,2,-1$
1140. 923.1.0.0.92,-1
1141.
1142.
$1230,10,0,1092,-1$
$920,600,0.20-1$
$1143 . \quad 928,10,0,0,8,-1$
$1144^{\circ} \mathrm{F} 928,1,011.8,-1$
$\begin{array}{ll}1450, & 920.61+360 \cdot 3011+1 \\ 1146 . & 920.61 .360 .30 .01-1\end{array}$

| 1147. $927,20,0,0,3,-1$ |
| :--- |
| 114 B. |
| $220,0,0,0,0,0$ |

1149. 930.0.900.1,0.0
1150. 931,1,0,0,4:=
1151, $931+10,0,1,4,0$
$\begin{aligned} & 1152 . \\ & 1153 .\end{aligned} 932 \cdot 1,0+0,4 \cdot=1$
1151. $932,10,0,1: 4,-1$
$\begin{array}{ll}1155 . & 9130600600+0,0,-1 \\ 1156 . & 939,10000+310,-1\end{array}$
1152. $938010,0.0 .8,-1$
1153. 938.1,0,1,8,-1
$\begin{array}{ll}1159, & 930,61,360,3011,-1 \\ 1160 . & 930,61,360,30.0,-1\end{array}$
1154. 957,20,0,0,3,-1
$\begin{array}{ll}1166, & 930,0,0101010 \\ 1163 . & 940,0,900.1,0 .\end{array}$
1164 . $940.61+3,35,0,-1$
1155. 941,100.0.4,-1
$1165{ }^{\circ}$
$1160^{\circ}$
1156. 
1157. 
1158. 
    \(942,10,0,1 ; 4,0\)
    \(942,100,0,4,-1\)
    $942010,0,1,4,-1$
940,60,600:0,2:-1
$949,10+0,0160.0$
948,10,0,0,8,-1
$94 \mathrm{~A}, 1,0 \cdot 1,8,-1$
$940,61,360+3011+-1$
$940.61,360 \cdot 30+0,-1$
947.20.0.0.3.-1
$950.0 \cdot 90010$
$95010190011,0,1$
$951,1+0,0,41-1$
$951+1010,1140$
$952,20,0,1,4,-1$
$950,61,360,35,2,-1$
$953.100,0,46,-1$
$953.10 .0+1,4600$
$953: 1: 0,0,46,-1$
$953,10,0,1,460-1$
$950,60,600,0,2,-1$
95B,10,0.0.B.m1
95B.1,0.1.8.-1
950.61,360,30.1,
$950,61,360,30,0,-1$
957,20,0,0,3,-1
$950,010 \cdot 0,0$
$960,0,900,1,0,0$
$961+10=0,1=4,0$

FILE7-DISPLAY FILE 50K; SPS --> GRF.
MODEL 4-INTERARRIVAL TIME MEAN OF 15 MIN 011430
FILE1-LOAD MODULE 120K: SPS --> DSK. 011450
FILE1 DSK --> FMP.
FILE2-PATCH AND GRID SETUP 100K;GRF-->SPS 011470
SPS PATCH AND GRID PROCESSING;240S,70% 011480
FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING- 600 SEC. 011510

SPS POST-PROCESSING:240 SEC,60%LOAD. 011540
JOB RUN COMPLETE. 011560
MODEL 4-INTERARRIVAL TIME MEAN OF 15 MIN 011580
FILE1-LOAD MODULE 120K: SPS --> DSK. 011590
FILE2-PATCH AND GRID 100K; SPS --> DSK. 011610
FILE2 DSK --> FMP. 011620
FMP PATCH AND GRID PROCESSING 10 SEC. 011630
FMP FLOW CODE PROCESSING 600 SEC. 011640
FILE8-RESULTS FILE 225K: FMP --> DSK. 011660
SPS POST-PROCESSING:240 SEC,60%LOAD.
" " " " " 011680
FILE7-DISPLAY FILE 75K; SPS --> GRF. 011700
JOB RUN COMPLETE.
MODEL 4-INTERARRIVAL TIME MEAN OF 15 MIN 011720
SPS COMPILATION OF SOURCE CODE-2 SEC
FILE1-LOAD MODULE 120K; SPS --> DSK. 011730
FILE1 DSK --> FMP. 011750
FILE2-PATCH AND GRID SETUP 100K;SPS-->DSK 011760
FILE2 DSK --> FMP.
FILE9-RAW RESULTS 5M; FMP --> DSK. 011780
SPS POST-PROCESSING;240 SEC,60%LOAD. 011820
FILE7-DISPLAY FILE 75K; SPS --> GRF. 011840
JOB RUN COMPLETE.
MODEL 4-INTERARRIVAL TIME MEAN OF 15 MIN 011860
FILE1-LOAD MODULE 120K; SPS --> DSK. 011870
FILE2-PATCH AND GRID SETUP 100K;GRF-->SPS 011890
SPS PATCH AND GRID PROCESSING-240S,70% 011900
FILE3-RESULTING TRANSFORMED FILE SPS-->DS 011910
FILE3 DSK --> FMP.
REST OF FILE3-3M TOTAL SPS-->DSK 011920
REST OF FILE3 DSK --> FMP.
FMP FLOW CODE PROCESSING 600 SEC. 011940
FILE8-RESULTS FILE 225K: FMP --> DSK.
FILE8 DSK --> SPS. 011960
SPS POST-PROCESSING;240 SEC,60%LOAD. 011970
" " " " " 011990
FILE7-DISPLAY FILE 75K: SPS --> GRF. 012000
JOB RUN COMPLETE.
MODEL 4-INTERARRIVAL TIME MEAN OF 15 MIN 012020
FILE1-LOAD MODULE 120K; SPS --> DSK. 012030

ARRIVAL TIME FOR INDIVIDUAL JOBS


| JOB NO. |  | ARRIVAL TIME (SEC) |
| :---: | :---: | :---: |
| 20 | LEAD | 2262 |
| 21 | LEAD | 2646 |
| 22 | LEAD | 2831 |
| 65 | LEAD | 2882 |
| 23 | LEAD | 2958 |
| 66 | LEAD | 2965 |
| 24 | LEAD | 3031 |
| 67 | LEAD | 3077 |
| 25 | LEAD | 3208 |
| 95 | LEAD | 3227 |
| 26 | BACKUP | 3231 |
| 68 | BACKUP | 3280 |
| 69 | BACKUP | 3338 |
| 70 | BACKUP | 3530 |
| 86 | BACKUP | 3682 |
| 71 | BACKUP | 3729 |
| 27 | BACKUP | 3750 |
| 96 | BACKUP | 3754 |
| 72 | LEAD | 3799 |
| 97 | LEAD | 3926 |
| 28 | LEAD | 3932 |
| 73 | LEAD | 4152 |
| 29 | LEAD | 4532 |
| 87 | LEAD | 4576 |
| 88 | BACKUP | 4889 |
| 98 | BACKUP | 4900 |
| 30 | LEAD | 4992 |
| 31 | LEAD | 5035 |
| 32 | LEAD | 5038 |
| 33 | LEAD | 5102 |
| 34 | LEAD | 5257 |
| 35 | LEAD | 5366 |


| JOB NO. |  | ARRIVAL TIME (SEC) |
| :---: | :---: | :---: |
| 36 | LEAD | 5518 |
| 37 | LEAD | 5523 |
| 38 | LEAD | 5590 |
| 39 | LEAD | 5656 |
| 40 | LEAD | 5690 |
| 74 | LEAD | 5879 |
| 41 | LEAD | 5880 |
| 99 | BACKUP | 5944 |
| 42 | BACKUP | 6078 |
| 43 | BACKUP | 6088 |
| 75 | BACKUP | 614 A |
| 76 | BACKUP | 6215 |
| 44 | BACKUP | 6357 |
| 77 | BACKUP | 6486 |
| 78 | BACKUP | 6522 |
| 45 | BACKUP | 6530 |
| 79 | BACKUP | 6767 |
| 46 | LEAD | 6926 |
| 47 | LEAD | 7089 |
| 48 | LEAD | 7148 |
| 80 | LEAD | 7702 |
| 49 | LEAD | 7347 |
| 50 | LEAD | 7354 |
| 89 | LEAD | 7357 |
| 51 | BACKUP | 7358 |
| 52 | LEAD | 7535 |
| 81 | LEAD | 7595 |
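
The arrival listing can be turned into interarrival gaps with a few lines of Python. The sketch below is illustrative only: it uses the first six arrival times of the table above (jobs 36 through 74), and the function name `interarrival_gaps` is an assumption, not part of the original simulation. Because the listing mixes jobs from several models, the resulting mean is not expected to match any single model's stated interarrival mean.

```python
from statistics import mean

# Arrival times in seconds for jobs 36, 37, 38, 39, 40 and 74,
# copied from the arrival listing above.
arrivals = [5518, 5523, 5590, 5656, 5690, 5879]


def interarrival_gaps(times):
    """Return the differences between consecutive arrival times."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


gaps = interarrival_gaps(arrivals)
print(gaps)        # [5, 67, 66, 34, 189]
print(mean(gaps))  # 72.2
```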




## SUMMARY OF JOB # 4



JOB # 6 COMPLETE AT RELATIVE CLOCK = 2641 SEC.

SUMMARY OF JOB \# 7

SUMMARY OF JOB # 8

JOB # 8 COMPLETE AT RELATIVE CLOCK = 2581 SEC.

## SUMMARY OF JOB # 9



SUMMARY OF JOB # 11

JOB # 12 COMPLETE AT RELATIVE CLOCK = 2749 SEC.

SUMMARY OF JOB # 13



## SUMMARY OF JOB # 15





## SUMMARY OF JOB # 18

JOB # 18 COMPLETE AT RELATIVE CLOCK = 3108 SEC.

SUMMARY OF JOB # 19

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2572 SEC., AND HAS SEIZED IT AT REL. CLK. 2572 FOR 60 SECONDS.

JOB # 19 COMPLETE AT RELATIVE CLOCK = 2633 SEC.

JOB # 20 COMPLETE AT RELATIVE CLOCK = 2974 SEC.

## SUMMARY OF JOB # 21

| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC) |
| :---: | :---: | :---: | :---: | :---: | :---: |

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2646 SEC., AND HAS SEIZED IT AT REL. CLK. 2646 FOR 1 SECONDS.

JOB # 21 COMPLETE AT RELATIVE CLOCK = 2777 SEC.

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2831 SEC., AND HAS SEIZED IT AT REL. CLK. 2831 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2833 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2834 | 0 |
| 2 | 1 | GPH1 | DSC | 2835 | 1 |
| 2 | 1 | DSC | FMP | 2837 |  |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 2838 SEC., AND HAS SEIZED IT AT REL. CLK. 2838 FOR 10 SECONDS.

8 FMP DSC 2849 0

JOB # 22 COMPLETE AT RELATIVE CLOCK = 2912 SEC.




## SUMMARY OF JOB # 26

SUMMARY OF JOB # 27

## SUMMARY OF JOB # 31



REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 5035 SEC., AND HAS SEIZED IT AT REL. CLK. 5035 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 5037 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 5038 | 0 |
| 2 | 1 | GPH2 | DSC | 5040 | 1 |
| 2 | 1 | DSC | FMP | 5041 | 0 |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 5042 SEC., AND HAS SEIZED IT AT REL. CLK. 5975 FOR 10 SECONDS.

| $\mathbf{y}$ | FMP DSC | 5987 | 0 |
| :--- | :--- | :--- | :--- |


| 2 | DSC SPS 6004 | 1 |
| :--- | :--- | :--- |

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6005 SEC., AND HAS SEIZED IT AT REL. CLK. 6085 FOR 60 SECONDS.

JOB # 31 COMPLETE AT RELATIVE CLOCK = 6126 SEC.



## SUMMARY OF JOB # 34

JOB # 34 COMPLETE AT RELATIVE CLOCK = 6901 SEC.

SUMMARY OF JOB # 35

JOB # 35 COMPLETE AT RELATIVE CLOCK = 6730 SEC.

## SUMMARY OF JOB # 38

SUMMARY OF JOB # 39

JOB # 39 COMPLETE AT RELATIVE CLOCK = 6811 SEC.

## SUMMARY OF JOB # 40




TRANSFER COMPLETED AT



| 1 | 1 | SPS | DSC | 6080 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | DSC | FMP | 6082 | 0 |
| 2 | 1 | GPH1 | DSC | 6083 | 1 |
| 2 | 1 | DSC | FMP | 6084 | 0 |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 6085 SEC., AND HAS SEIZED IT AT REL. CLK. 6785 FOR 10 SECONDS.

| 8 | 2 | FMP | DSC | 6797 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 8 | 2 | DSC | SPS | 6798 | 0 |

REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 6799 SEC., AND HAS SEIZED IT AT REL. CLK. 6799 FOR 60 SECONDS.

JOB # 42 COMPLETE AT RELATIVE CLOCK = 6860 SEC.
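
As a worked check on these summaries, the completion time can be compared with the end of the last processor hold. The sketch below is a hedged illustration: the helper `completion_lag` is hypothetical, and the figures are taken from the job 42 summary above (SPS2 seized at 6799 for 60 seconds, complete at 6860) and from the job 19 summary (SPS1 seized at 2572 for 60 seconds, complete at 2633). The listing does not say what the remaining interval represents.

```python
def completion_lag(seized, hold, complete):
    """Seconds between the end of the last processor hold and job completion."""
    return complete - (seized + hold)


# (job number, last seize time, hold time, completion time), all in seconds
records = [
    (42, 6799, 60, 6860),
    (19, 2572, 60, 2633),
]

lags = {job: completion_lag(seized, hold, complete)
        for job, seized, hold, complete in records}
print(lags)  # {42: 1, 19: 1}
```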

SUMMARY OF JOB \# 43


## SUMMARY OF JOB # 46

SUMMARY OF JOB # 45

## SUMMARY OF JOB # 47

SUMMARY OF JOB # 49

## SUMMARY OF JOB # 52

SUMMARY OF JOB # 61

## SUMMARY OF JOB # 62








## SUMMARY OF JOB # 69

## SUMMARY OF JOB # 70

SUMMARY OF JOB # 71

## SUMMARY OF JOB # 72

SUMMARY OF JOB # 73

## SUMMARY OF JOB # 75

SUMMARY OF JOB # 76

REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 6215 SEC., AND HAS SEIZED IT AT REL. CLK. 6215 FOR 1 SECONDS.

REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 6816 SEC., AND HAS SEIZED IT AT REL. CLK. 70pl FOR 1 SECONDS.

| 1 | 2 | SPS | DSC | 7024 | 0 |
| :--- | :---: | :---: | :---: | :---: | :---: |
| 1 | 2 | DSC | FMP | 7025 | 0 |
| 2 | 2 | GPH | DSC | 7028 | 2 |
| 2 | 2 | DSC | FMP | 7029 | 0 |
| 3 | 19 | SPS | DSC | 7041 | 5 |
| 3 | 19 | DSC | FMP | 7042 | 0 |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 7043 SEC., AND HAS SEIZED IT AT REL. CLK. 7585 FOR 60 SECONDS.
125 FMP DSC 7651
7658

SUMMARY OF JOB * 77






## SUMMARY OF JOB # 86

SUMMARY OF JOB # 89

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 7357 SEC., AND HAS SEIZED IT AT REL. CLK. 7357 FOR 3 SECONDS.

| 1 | 4 | SPS | DSC | 7363 | 1 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 4 | DSC | FMP | 7364 | 0 |
| 2 | 4 | SPS | DSC | 7366 | 1 |
| 2 | 4 | DSC | FMP | 7367 | 0 |

SUMMARY OF JOB # 92


| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 4 | SPS | DSC | 1104 | 1 |
| 1 | 4 | DSC | FMP | 1105 | 0 |
| 2 | 4 | SPS | DSC | 1106 | 1 |
| 2 | 4 | DSC | FMP | 1107 | 0 |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 1108 SEC., AND HAS SEIZED IT AT REL. CLK. 1108 FOR 10 SECONDS.

REQUESTED THE FMP PROCESSOR AT REL. CLK. 1119 SEC., AND HAS SEIZED IT AT REL. CLK. 1119 FOR 600 SECONDS.

| 310 | FMP | DSC |
| ---: | ---: | ---: |
| 8 | FMP | DSC |
| 8 | DSC | SPS |

1731
10

8
REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1737
REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2098 SEC., AND HAS SEIZED IT AT REL. CLK. 2098 FOR 360 SECONDS. 7

3
SPS GPH
2461
1
JOB # 93 COMPLETE AT RELATIVE CLOCK = 2462 SEC.


SUMMARY OF JOB # 95

SUMMARY OF JOB # 96

SUMMARY OF JOB # 97

| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 4 | SPS | DSC | 3927 | 1 |
| 1 | 4 | DSC | FMP | 3929 | 0 |
| 2 | 4 | SPS | DSC | 3930 | 1 |
| 2 | 4 | DSC | FMP | 3931 | 0 |

REQUESTED THE FMP PROCESSOR AT REL. CLK. 3932 SEC., AND HAS SEIZED IT AT REL. CLK. 4375 FOR 10 SECONDS.

REQUESTED THE FMP PROCESSOR AT REL. CLK. 4386 SEC., AND HAS SEIZED IT AT REL. CLK. 5295 FOR 600 SECONDS.
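
The request lines carry enough information to recover how long a job queued for a processor: the wait is the seize time minus the request time. The following sketch parses such lines with a regular expression and computes that wait. It assumes cleaned, upper-case text in the form used by the job 97 FMP requests (3932 to 4375, and 4386 to 5295); the pattern and the name `parse_requests` are illustrative and were not part of the original simulation program.

```python
import re

# Pattern for one cleaned "REQUESTED ... SEIZED" log sentence.
REQUEST = re.compile(
    r"REQUESTED THE (\w+) PROCESSOR AT REL\. CLK\. (\d+) SEC\., "
    r"AND HAS SEIZED IT AT REL\. CLK\. (\d+) FOR (\d+) SECONDS\."
)


def parse_requests(text):
    """Return one dict per request sentence, including the queueing wait."""
    results = []
    for match in REQUEST.finditer(text):
        requested, seized, hold = (int(match.group(i)) for i in (2, 3, 4))
        results.append({
            "processor": match.group(1),
            "requested": requested,
            "seized": seized,
            "wait": seized - requested,  # seconds spent queued
            "hold": hold,                # seconds the processor is held
        })
    return results


sample = (
    "REQUESTED THE FMP PROCESSOR AT REL. CLK. 3932 SEC., "
    "AND HAS SEIZED IT AT REL. CLK. 4375 FOR 10 SECONDS. "
    "REQUESTED THE FMP PROCESSOR AT REL. CLK. 4386 SEC., "
    "AND HAS SEIZED IT AT REL. CLK. 5295 FOR 600 SECONDS."
)

for entry in parse_requests(sample):
    print(entry["processor"], entry["wait"], entry["hold"])
# FMP 443 10
# FMP 909 600
```

Garbled lines in the scanned listing would not match this pattern without first being cleaned by hand.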

| 9 | 310 | FMP | DSC | 5911 | 14 |
| :--- | ---: | :--- | :--- | :--- | :--- |
| 8 | 8 | FMP | DSC | 5912 | 0 |
| 8 | 8 | DSC | SPS | 5918 | 4 |

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 5919 SEC., AND HAS SEIZED IT AT REL. CLK. 5919 FOR 360 SECONDS.

REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6280 SEC., AND HAS SEIZED IT AT REL. CLK. 6280 FOR 360 SECONDS. 7

SPS GPH2
6643
1

## SUMMARY OF JOB 98



SUMMARY OF FILE TRANSFER REQUESTS

JOB # 2 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 251 SEC., AND HAS SEIZED IT AT REL. CLK. 251 FOR 1 SECONDS.

JOB # 3 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 354 SEC., AND HAS SEIZED IT AT REL. CLK. 354 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 356 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 357 | 0 |
| 2 | 1 | GPH1 | DSC | 359 | 1 |
| 2 | 1 | DSC | FMP | 360 | 0 |


8 2 FMP DSC

372
374
0
JOB # 4 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 390 SEC., AND HAS SEIZED IT AT REL. CLK. 390 FOR 1 SECONDS.

| 92 | SPS | DSC |
| ---: | :--- | :--- |
| 4 | SPS | DSC |
| 92 | DSC | FMP |


| 393 | 27 |
| :--- | ---: |
| 397 | 1 |
| 398 | 3 |

FILE TRANSFER SUMMARY - PAGE 2

JOB # 6 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 775 SEC., AND HAS SEIZED IT AT REL. CLK. 775 FOR 1 SECONDS.
85
85
85
85

JOB # 7 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 940 SEC., AND HAS SEIZED IT AT REL. CLK. 940 FOR 1 SECONDS.

| 7 | 1 | 1 | SPS | DSC | 942 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 7 | 1 | 1 | DSC | FMP | 943 | 0 |
| 7 | 2 | 1 | GPH2 | DSC | 945 | 1 |

file transfer summary - page 3

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 7 | 2 | 1 | DSC | FMP | 946 | 0 |

JOB # 4 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 991 SEC., AND HAS SEIZED IT AT REL. CLK. 991 FOR 1 SECONDS.

| 4 | 1 | 1 | SPS | DSC | 993 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 4 | 1 | 1 | DSC | FMP | 994 | 0 |
| 4 | 2 | 1 | GPH1 | DSC | 996 | 1 |
| 4 | 2 | 1 | DSC | FMP | 997 |  |

JOB # 1 REQUESTED THE FMP PROCESSOR AT REL. CLK. 766 SEC., AND HAS SEIZED IT AT REL. CLK. 999 FOR 10 SECONDS.

| 92 | 8 | 8 | FMP | DSC | 1000 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 92 | 8 | 8 | DSC | SPS | 1004 | ? |

JOB # 92 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 1005 SEC., AND HAS SEIZED IT AT REL. CLK. 1005 FOR 360 SECONDS.

JOB # 85 REQUESTED THE FMP PROCESSOR AT REL. CLK. 796 SEC., AND HAS SEIZED IT AT REL. CLK. 1009 FOR 60 SECONDS.

| 1 | 8 | 2 | FMP | DSC | 1010 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 8 | 2 | DSC | SPS | 1012 | 0 |

JOB # 1 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1013 SEC., AND HAS SEIZED IT AT REL. CLK. 1013 FOR 60 SECONDS.

JOB # 8 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1032 SEC., AND HAS SEIZED IT AT REL. CLK. 1032 FOR 1 SECONDS.

JOB # 7 REQUESTED THE FMP PROCESSOR AT REL. CLK. 947 SEC., AND HAS SEIZED IT AT REL. CLK. 1069 FOR 10 SECONDS.

| 85 | 8 | 3 | FMP | DSC | 1070 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 85 | 8 | 3 | DSC | SPS | 1072 | 1 |

JOB # 85 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 1073 SEC., AND HAS SEIZED IT AT REL. CLK. 1073 FOR 300 SECONDS.

JOB # 1 COMPLETE AT RELATIVE CLOCK = 1074 SEC.

JOB # 4 REQUESTED THE FMP PROCESSOR AT REL. CLK. 998 SEC., AND HAS SEIZED IT AT REL. CLK. 1079 FOR 10 SECONDS.

| 7 | 8 | 2 | FMP | DSC | 1080 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 7 | 8 | 2 | DSC | SPS | 1082 | 0 |

JOB # 7 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 1083 SEC., AND HAS SEIZED IT AT REL. CLK. 1083 FOR 60 SECONDS.

FILE TRANSFER SUMMARY - PAGE 4

| NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: |


| 9 | 90 | FMP | DSC |
| :--- | ---: | :--- | :--- |
| 2 | 4 | SPS | DSC |
| 1 | 4 | DSC | FMP |
| 2 | 4 | SPS | DSC |

1093
3
JOB # 93 REQUESTED THE FMP PROCESSOR AT REL. CLK. 1119 SEC., AND HAS SEIZED IT AT REL. CLK. 1119 FOR 600 SECONDS.

JOB # 7 COMPLETE AT RELATIVE CLOCK = 1144 SEC.

2
2
2
2
4

JOB # \& REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 1171 SEC., AND HAS SEIZED IT AT REL. CLK. 1171 FOR 60 SECONDS.

JOB # 94 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1171 SEC., AND HAS SEIZED IT AT REL. CLK. 1171 FOR 3 SECONDS.

| 94 | 1 | 4 | SPS | DSC | 1177 | 1 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 94 | 1 | 4 | DSC | FMP | 1178 | 0 |
| 94 | 2 | 4 | SPS | DSC | 1179 | 1 |
| 94 | 2 | 4 | DSC | FMP | 1180 | 0 |

job w complete at relative clock = 1232 sec.

File transfer summary - page 5

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |

JOB # 11 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1297 SEC., AND HAS SEIZED IT AT REL. CLK. 1297 FOR 1 SECONDS.

JOB # 5 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1346 SEC., AND HAS SEIZED IT AT REL. CLK. 1346 FOR 1 SECONDS.

JOB # 62 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1349 SEC., AND HAS SEIZED IT AT REL. CLK. 1349 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 1349 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 1350 | 0 |
| 2 | 1 | GPH2 | DSC | 1351 | 1 |
| 1 | 2 | SPS | DSC | 1351 | 0 |
| 2 | 1 | DSC | FMP | 1352 | 0 |
| 1 | 2 | DSC | FMP | 1353 | 0 |
| 2 | GPH | SPS | 1356 | 1 |  |

JOB # 92 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 1366 SEC., AND HAS SEIZED IT AT REL. CLK. 1366 FOR 360 SECONDS.

JOB # 6 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 1376 SEC., AND HAS SEIZED IT AT REL. CLK. 1376 FOR 1 SECONDS.

JOB # 85 COMPLETE AT RELATIVE CLOCK = 1376 SEC.

12

SPS DSC
1417
0




File transfer summary - page

JOB NO.
file no. $\qquad$
FROM glocks transferred

TAANSFER COMPLETED AT
RELATIVE CLOCK TIME (SEC)....
2013
GPHI DSC

DSC
FMP
2015
DURATION OF TRANSFER (SEC.)

JOB # 64 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2014 SEC., AND HAS SEIZED IT AT REL. CLK. 2014 FOR 1 SECONDS.

1

2
1
0
JOB # 16 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2039 SEC., AND HAS SEIZED IT AT REL. CLK. 2039 FOR 1 SECONDS.

JOB # 17 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2051 SEC., AND HAS SEIZED IT AT REL. CLK. 2051 FOR 1 SECONDS.

JOB # 12 REQUESTED THE SPS PROCESSOR AT REL. CLK. 2088 SEC., AND HAS SEIZED IT AT REL. CLK. 2088 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2090 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2091 | 0 |
| 2 | 1 | GPH2 | DSC | 2093 | 1 |
| 2 | 1 | DSC | FMP | 2094 | 0 |

JOB # 19 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2223 SEC., AND HAS SEIZED IT AT REL. CLK. 2223 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2225 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2226 | 0. |
| 2 | 1 | GPH2 | DSC | 2228 | 1 |
| 2 | 1 | DSC | FMP | 2229 | 0 |



| 2 | SPS | DSC |
| :--- | :--- | :--- | :--- |
| 2 | DSC |  |

2260 -
0
file transfer summary - page 9


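## FILE TRANSFER SUMMARY - PAGE 10

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |

JOB # 11 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2373 SEC., AND HAS SEIZED IT AT REL. CLK. 2432 FOR 60 SECONDS.

JOB # 5 COMPLETE AT RELATIVE CLOCK = 2433 SEC.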

JOB 93 COMPLETE AT RELATIVE CLOCK 2462 SEC.
job " b requested the fmp processor at rel. clek. 1640 sec.. and has sexzed it at rel. clek. 2469 for 10 seconds.

| 8 | 6 | FMP | DSC | 2471 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 6 | DSC | SPS | 2475 | 2 |




| 9 | 125 | DSC | SFS | 2485 | 69 |
| :--- | ---: | :--- | :--- | :--- | ---: |
| 9 | 90 | FMP | DSC | 2486 | 5 |



| 8 | 2 | FMP | DSC | 2490 | 0 |
| :--- | ---: | :--- | :--- | :--- | :--- |
| 9 | 125 | DSC | SPS | 2490 | 67 |

JOB # 6) COMPLETE AT RELATIVE CLOCK = 2491 SEC.
JOB # 11 COMPLETE AT RELATIVE CLOCK = 2493 SEC.


| 8 | 2 | FHP | DSC | 2500 |
| :---: | :---: | :---: | :---: | :---: | 10 seconos.


| 8 | 2 | FMP | DSC | 2510 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 8 | 2 | DSC | SPS | 2512 | 0 |

JOB # 15 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2513 SEC., AND HAS SEIZED IT AT REL. CLK. 2513 FOR 60 SECONDS.

FILE TRANSFER SUMMARY - PAGE 11

TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC)... DURATION OF TRANSFER (SEC.)

JOB # 8 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2519 SEC., AND HAS SEIZED IT AT REL. CLK. 2520 FOR 60 SECONDS.
8
FMP DSC
2520
0
JOB # 19 REQUESTED THE FMP PROCESSOR AT REL. CLK. 2230 SEC., AND HAS SEIZED IT AT REL. CLK. 2529 FOR 10 SECONDS.
JOB # 63 REQUESTED THE FMP PROCESSOR AT REL. CLK. 2273 SEC., AND HAS SEIZED IT AT REL. CLK. 2539 FOR 60 SECONDS.

| 9 | 90 | FMP | DSC | 2543 | 3 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 9 | 90 | DSC | SPS | 2571 | 26 |

JOB # 19 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2572 SEC., AND HAS SEIZED IT AT REL. CLK. 2572 FOR 60 SECONDS.

JOB # 15 COMPLETE AT RELATIVE CLOCK = 2574 SEC.

JOB # 6 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2520 SEC., AND HAS SEIZED IT AT REL. CLK. 2580 FOR 60 SECONDS.

JOB # 8 COMPLETE AT RELATIVE CLOCK = 2581 SEC.

JOB # 13 REQUESTED THE FMP PROCESSOR AT REL. CLK. 2294 SEC., AND HAS SEIZED IT AT REL. CLK. 2599 FOR 10 SECONDS.

| 8 | 12 | FMP | DSC | 2601 | 0 |
| :--- | ---: | :--- | :--- | :--- | :--- |
| 8 | 2 | FMP | DSC | 2610 | 0 |



| 64 | 1 | 2 | SPS | DSC | 2618 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 64 | 1 | 2 | DSC | FMP | 2619 | 0 |
| 64 | 2 | 2 | GPH2 | DSC | 2622 | 2 |
| 64 | 2 | 2 | DSC | FMP | 2623 | 0 |
| 64 | 3 | 19 | SPS | DSC | 2628 | 0 |
| 64 | 3 | 19 | DSC | FMP | 2630 | 0 |

JOB # 19 COMPLETE AT RELATIVE CLOCK = 2633 SEC.

JOB # 16 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2640 SEC., AND HAS SEIZED IT AT REL. CLK. 2640 FOR 1 SECONDS.

JOB # 9 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2521 SEC., AND HAS SEIZED IT AT REL. CLK. 2640 FOR 60 SECONDS.

JOB # 6 COMPLETE AT RELATIVE CLOCK = 2641 SEC.

FILE TRANSFER SUMMARY - PAGE 13

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |

JOB # 64 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2696 SEC., AND HAS SEIZED IT AT REL. CLK. 2696 FOR 180 SECONDS.

JOB # 94 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2697 SEC., AND HAS SEIZED IT AT REL. CLK. 2697 FOR 360 SECONDS.

JOB # 9 COMPLETE AT RELATIVE CLOCK = 2701 SEC.

JOB # 14 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2703 SEC., AND HAS SEIZED IT AT REL. CLK. 2703 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2706 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2708 | 0 |
| 2 | 1 | GPH | DSC | 2710 | 2 |

| 2 | 1 | DSC | FMP | 2711 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 8 | 2 | FMP | DSC | 2712 | 0 |
| 8 | 2 | DSC | SPS | 2715 | 1 |


JOB # 10 COMPLETE AT RELATIVE CLOCK = 2722 SEC.

JOB # 63 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 2665 SEC., AND HAS SEIZED IT AT REL. CLK. 2724 FOR 360 SECONDS. 17

DSC SPS
2725
JOB # 13 COMPLETE AT RELATIVE CLOCK = 2725 SEC.
JOB # 17 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2726 SEC., AND HAS SEIZED IT AT REL. CLK. 2726 FOR 60 SECONDS.

| 8 | 2 | FHP | DSC | 2732 | 0 |
| :--- | ---: | :--- | :--- | :--- | :--- |
| 8 | 2 | DSC | SPS | 2734 | 0 |
| 9 | 90 | DSC | SPS | 2736 | 30 |

JOB # 14 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2735 SEC., AND HAS SEIZED IT AT REL. CLK. 2736 FOR 60 SECONDS.

FILE TRANSFER SUMMARY - PAGE 14

JOB NO. FILE NO. NUMBER OF BLOCKS TRANSFERRED

JOB # 12 COMPLETE AT RELATIVE CLOCK = 2749 SEC.

JOB # 16 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2737 SEC., AND HAS SEIZED IT AT REL. CLK. 2776 FOR 60 SECONDS.

JOB # 21 COMPLETE AT RELATIVE CLOCK = 2777 SEC.

JOB # 10 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2749 SEC., AND HAS SEIZED IT AT REL. CLK. 2786 FOR 1 SECONDS.

JOB # 17 COMPLETE AT RELATIVE CLOCK = 2787 SEC.

JOB # 14 COMPLETE AT RELATIVE CLOCK = 2797 SEC.

JOB # 22 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2831 SEC., AND HAS SEIZED IT AT REL. CLK. 2831 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2833 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2834 | 0 |
| 2 | 1 | GPH1 | DSC | 2835 | 1 |
| 2 | 1 | DSC | FMP | 2837 |  |

JOB # 16 COMPLETE AT RELATIVE CLOCK = 2837 SEC.

JOB # 22 REQUESTED THE FMP PROCESSOR AT REL. CLK. 2838 SEC., AND HAS SEIZED IT AT REL. CLK. 2838 FOR 10 SECONDS.

| 8 | 2 | FMP | DSC | 2849 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 8 | 2 | DSC | SPS | 2850 | 0 |

JOB # 22 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2851 SEC., AND HAS SEIZED IT AT REL. CLK. 2851 FOR 60 SECONDS.

JOB # 20 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 2863 SEC., AND HAS SEIZED IT AT REL. CLK. 2863 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 2866 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | DSC | FMP | 2867 | 0 |
| 2 | 1 | GPH1 | DSC | 2868 | 1 |
| 2 | 1 | DSC | FMP | 2870 | 0 |

JOB # 20 REQUESTED THE FMP PROCESSOR AT REL. CLK. 2871 SEC., AND HAS SEIZED IT AT REL. CLK. 2871 FOR 10 SECONDS.

FILE TRANSFER SUMMARY - PAGE 15

JOB # 24 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3038 SEC., AND HAS SEIZED IT AT REL. CLK. 3038 FOR 10 SECONDS.

JOB # 23 COMPLETE AT RELATIVE CLOCK = 3040 SEC.
file transfer summary - page 16

JOB NO.

FILE NO.

FROM
number transfer
FMP
TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC)... DURATION OF TRANSFER (SEC.)

3053

3

JOB # 94 COMPLETE AT RELATIVE CLOCK = 3061 SEC.


| 1 | 2 | SPS | DSC | 3080 | 1 |
| :--- | ---: | :--- | :--- | :--- | ---: |
| 9 | 90 | DSC | SPS | 3081 | 26 |
| 1 | 2 | DSC | FMP | 3081 | 0 |

JOB # 24 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3082 SEC., AND HAS SEIZED IT AT REL. CLK. 3082 FOR 60 SECONDS.

JOB # 67 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3083 SEC., AND HAS SEIZED IT AT REL. CLK. 3083 FOR ro SECONDS.

JOB # 18 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3087 SEC., AND HAS SEIZED IT AT REL. CLK. 3087 FOR 1 SECONDS.

| 1 | 1 | SPS | DSC | 3089 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 7 | 8 | SPS | GPH2 | 3090 |  |
| 1 | 1 | DSC | FMP | 3090 | 0 |
|  | co | ative | CK |  |  |
| 2 | 1 | GPH2 | DSC | 3091 | 1 |
| 2 | 1 | DSC | FMP | 3092 | 0 |



| 0 | 2 | FMP | DSC | 3105 |
| :--- | :--- | :--- | :--- | :--- |
| 0 | SPS | 3106 | 0 |  |

JOB # 18 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3107 SEC., AND HAS SEIZED IT AT REL. CLK. 3107 FOR 60 SECONDS.

JOB # 24 COMPLETE AT RELATIVE CLOCK = 3143 SEC.

FILE TRANSFER SUMMARY - PAGE 17

FILE TRANSFER SUMMARY - PAGE 18

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 65 | 3 | 19 | SPS | DSC | 3496 | 5 |
| 65 | 3 | 19 | DSC | FMP | 3498 | 0 |

JOB # 65 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3499 SEC., AND HAS SEIZED IT AT REL. CLK. 3499 FOR 60 SECONDS.

JOB # 70 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 3530 SEC., AND HAS SEIZED IT AT REL. CLK. 3530 FOR 1 SECONDS.

| 65 | 8 | 6 | FMP | DSC | 3560 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 65 | 8 | 6 | DSC | SPS | 3563 | 2 |

JOB # 65 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3564 SEC., AND HAS SEIZED IT AT REL. CLK. 3564 FOR 180 SECONDS.

JOB # 66 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 3566 SEC., AND HAS SEIZED IT AT REL. CLK. 3592 FOR 1 SECONDS.

| 66 | 1 | 2 | SPS | DSC | 3595 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 66 | 1 | 2 | DSC | FMP | 3596 | 0 |
| 66 | 2 | 2 | GPH1 | DSC | 3599 | 3 |
| 66 | 2 | 2 | DSC | FMP | 3601 | 0 |
| 66 | 3 | 19 | SPS | DSC | 3612 | 1 |
| 66 | 3 | 19 | DSC | FMP | 3614 | 1 |

JOB # 66 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3615 SEC., AND HAS SEIZED IT AT REL. CLK. 3615 FOR 60 SECONDS.

| 95 | 3 | 46 | SPS | DSC | 3615 | 21 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 95 | 3 | 46 | DSC | FMP | 3619 | 2 |
| 95 | 3 | 46 | SPS | DSC | 3630 | 14 |
| 95 | 3 | 46 | DSC | FMP | 3633 | 1 |

JOB # 95 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3634 SEC., AND HAS SEIZED IT AT REL. CLK. 3675 FOR 600 SECONDS.

| 66 | 9 | 225 | FMP | DSC | 3681 | 4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 86 | 1 | 4 | SPS | DSC | 3684 | 2 |
| 86 | 1 | 4 | DSC | FMP | 3686 | 0 |
| 86 | 2 | 4 | SPS | DSC | 3687 | 1 |


JOB # 26 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 4133 SEC., AND HAS SEIZED IT AT REL. CLK. 4133 FOR 1 SECONDS.

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 68 | 1 | 2 | SPS | DSC | 4185 | 0 |
| 68 | 1 | 2 | DSC | FMP | 4186 | 0 |
| 68 | 2 | 2 | GPH1 | DSC | 4189 | 3 |
| 68 | 2 | 2 | DSC | FMP | 4190 | 0 |
| 68 | 3 | 19 | SPS | DSC | 4195 | 4 |
| 68 | 3 | 19 | DSC | FMP | 4197 | 0 |

JOB # 66 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3690 SEC., AND HAS SEIZED IT AT REL. CLK. 4275 FOR 10 SECONDS.
95

5


JOB # 96 REQUESTED THE FMP PROCESSOR AT REL. CLK. 3760 SEC., AND HAS SEIZED IT AT REL. CLK. 4zos FOR 10 SECONDS.

JOB NO.

〕 0 )
JOB " 95 REG
9
J0e 15
4 7 SPS GPH2 7 - 7 SPS OPHZ

TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) ... DURATION OF TRANSFER (SEC.)

JOB # 68 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. $45 \Omega 1$ SEC., AND HAS SEIZED IT AT REL. CLK. 4631 FOR 180 SECONDS.

4634



160

| FHPL. CL |
| :--- |

DSC
DSC
4642
SSl PROCESSOR AT RE
3
(EC.: AND HAS SEIZED
ED IT AY REL. CLK. 46AZ FOR
5

| FMP | DS |
| :--- | :--- |
| DSC | SPS |



JOB # 86 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 4691 SEC., AND HAS SEIZED IT AT REL. CLK. 4811 FOR 300 SECONDS.
68
7

JOB # 68 COMPLETE AT RELATIVE CLOCK = 4815 SEC.
SPS GPH1
4888
4
JOB # 70 COMPLETE AT RELATIVE CLOCK = 4889 SEC.

| FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: |
| SPS | DSC | 4890 | 1 |
| DSC | FMP | 4892 | 0 |
| SPS | DSC | 4892 | 1 |
| DSC | FMP | 4894 | 0 |
| SPS | DSC | 4901 | 1 |
| DSC | FMP | 4902 | 0 |
| GPH2 | SPS | 4905 | 2 |

JOB # 98 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 4906 SEC., AND HAS SEIZED IT AT REL. CLK. 4906 FOR 360 SECONDS.

JOB # 73 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 4753 SEC., AND HAS SEIZED IT AT REL. CLK. 4942 FOR 1 SECONDS.


| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 73 | 1 | 2 | SPS | DSC | 5246 | 0 |
| 73 | 1 | 2 | DSC | FMP | 5247 | 0 |
| 96 | 8 | 8 | DSC | SPS | 5249 | 4 |

JOB # 96 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 5250 SEC., AND HAS SEIZED IT AT REL. CLK. 5250 FOR 360 SECONDS.

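| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 2 | 2 | GPH1 | DSC | 5254 | 6 |
| 2 | 2 | DSC | FMP | 5255 | 0 |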

JOB # 34 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 5257 SEC., AND HAS SEIZED IT AT REL. CLK. 5257 FOR 1 SECONDS.

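| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 3 | 19 | SPS | DSC | 5262 | 6 |
| 3 | 19 | DSC | FMP | 5264 | 1 |
| 3 | 46 | SPS | DSC | 5295 | 28 |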



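| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 98 | 3 | 46 | DSC | FMP | 5303 | 4 |
| 71 | 9 | 125 | FMP | DSC | 5309 | 12 |
| 71 | 9 | 125 | FMP | DSC | 5320 | 9 |
| 98 | 3 | 46 | SPS | DSC | 5337 | 40 |
| 98 | 3 | 46 | DSC | FMP | 5343 | 4 |
| 96 | 9 | 160 | DSC | SPS | 5343 | 99 |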



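| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 35 | 1 | 1 | SPS | DSC | 5368 | 0 |
| 35 | 1 | 1 | DSC | FMP | 5369 | 0 |
| 35 | 2 | 1 | GPH1 | DSC | 5373 | 4 |
| 35 | 2 | 1 | DSC | FMP | 5375 | 0 |
| 71 | 9 | 125 | DSC | SPS | 5390 | 87 |
| 71 | 9 | 125 | DSC | SPS | 5402 | 80 |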


JOB # 71 COMPLETE AT RELATIVE CLOCK = 5403 SEC.

JOB # 37 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 5523 SEC., AND HAS SEIZED IT AT REL. CLK. 5923 FOR 1 SECONDS.





| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 42 | 2 | 1 | GPH1 | DSC | 6083 | 1 |
| 42 | 2 | 1 | DSC | FMP | 6084 | 0 |
| 43 | 1 | 1 | SPS | DSC | 6090 | 0 |
| 43 | 1 | 1 | DSC | FMP | 6092 | 0 |
| 43 | 2 | 1 | GPH2 | DSC | 6093 | 1 |
| 43 | 2 | 1 | DSC | FMP | 6094 | 0 |
| JOB # 31 COMPLETE AT RELATIVE CLOCK = 6126 SEC. |  |  |  |  |  |  |
| 73 | 8 | 6 | DSC | SPS | 6127 | 1 |

JOB # 36 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6119 SEC., AND HAS SEIZED IT AT REL. CLK. 6127 FOR 1 SECONDS.

JOB # 37 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6124 SEC., AND HAS SEIZED IT AT REL. CLK. 6128 FOR 1 SECONDS.

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 36 | 1 | 1 | SPS | DSC | 6129 | 0 |
| 37 | 1 | 1 | SPS | DSC |  | 0 |
| 36 | 1 | 1 | DSC | FMP | 6130 | 0 |
| 37 | 1 | 1 | DSC | FMP |  | 0 |
| 36 | 2 | 1 | GPH2 | DSC | 6132 | 0 |
| 37 | 2 | 1 | GPH1 | DSC | 6132 | 1 |
| 36 | 2 | 1 | DSC | FMP | 6133 | 0 |
| 37 | 2 | 1 | DSC | FMP | 6133 | 0 |



JOB # 76 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 6215 SEC., AND HAS SEIZED IT AT REL. CLK. 6215 FOR 1 SECONDS.




JOB # 73 COMPLETE AT RELATIVE CLOCK = 6463 SEC.

JOB # 74 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6480 SEC., AND HAS SEIZED IT AT REL. CLK. 6480 FOR 1 SECONDS.

| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 2 | SPS | DSC | 6482 | 0 |
| 1 | 2 | DSC | FMP | 6483 | 0 |
|  |  |  |  | 6486 | 2 |



JOB # 97 COMPLETE AT RELATIVE CLOCK = 6644 SEC.


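| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 8 | 8 | FMP | DSC | 6857 | 0 |
| 8 | 8 | DSC | SPS | 6660 | 2 |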













JOB # 34 COMPLETE AT RELATIVE CLOCK = 6901 SEC.

JOB # 36 COMPLETE AT RELATIVE CLOCK = 6911 SEC.

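| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 43 | 9 | 90 | DSC | SPS | 6913 | 53 |
| 44 | 9 | 90 | DSC | SPS | 6915 |  |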



a 12
FMP DSC

6917
0


JOB # 46 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 6926 SEC., AND HAS SEIZED IT AT REL. CLK. 6926 FOR 1 SECONDS.




JOB # 74 COMPLETE AT RELATIVE CLOCK = 1290 SEC.

JOB # 49 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 7347 SEC., AND HAS SEIZED IT AT REL. CLK. 7347 FOR 1 SECONDS.




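| FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 2 | SPS | DSC | 7573 | 0 |
| 1 | 2 | DSC | FMP | 7574 | 0 |
| 2 | 2 | GPH | DSC | 7577 | 7578 |
| 2 | 2 | DSC | FMP | 758 |  |
| 3 | 19 | SPS | DSC | 758 | 0 |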

JOB # 16 REQUESTED THE FMP PROCESSOR AT REL. CLK. 7043 SEC., AND HAS SEIZED IT AT REL. CLK. 7585 FOR 60 SECONDS.

JOB # bi REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 7595 SEC., AND HAS SEIZED IT AT REL. CLK. 7595 FOR 1 SECONDS.

9 125 FMP DSC 9651

JOB # 45 REQUESTED THE FMP PROCESSOR AT REL. CLK. 7135 SEC., AND HAS SEIZED IT AT REL. CLK. 7655 FOR 10 SECONDS.

JOB # 47 REQUESTED THE SPS1 PROCESSOR AT REL. CLK. 7660 SEC., AND HAS SEIZED IT AT REL. CLK. 7660 FOR hO SECONDS.

JOB # 75 REQUESTED THE FMP PROCESSOR AT REL. CLK. 7284 SEC., AND HAS SEIZED IT AT REL. CLK. 7665 FOR 60 SECONDS.

| JOB NO. | FILE NO. | NUMBER OF BLOCKS TRANSFERRED | FROM | TO | TRANSFER COMPLETED AT RELATIVE CLOCK TIME (SEC) | DURATION OF TRANSFER (SEC.) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 8 | 8 | FMP | DSC | 7667 | 0 |

JOB # 45 REQUESTED THE SPS2 PROCESSOR AT REL. CLK. 7671 SEC., AND HAS SEIZED IT AT REL. CLK. 7671 FOR 60 SECONDS.

THE FOLLOWING STATISTICS SUMMARIZE THE UTILIZATION OF THE PROPOSED NASF mahiname.


QUEUING STATISTICS FOR THE FMP AND SPS'S

| QUEUE | MAXIMUM CONTENTS | AVERAGE CONTENTS | TOTAL ENTRIES | ZERO ENTRIES | PERCENT ZEROS | AVERAGE TIME/TRAN | \$AVERAGE TIME/TRAN |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| FMPQ | 18 | 5.265 | 86 | 13 | 15.1 | 47169699.837 | 56341585.917 |
| SPS1Q | 2 | 0.067 | 123 | 113 | 91.9 | 417593.496 | 5136400.000 |
| SPS2Q | 3 | 0.154 | 95 | 78 | 82.1 | 1246734.474 | 6967045.588 |

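The last two time columns of the queue table are related: in GPSS queue reports the `$AVERAGE TIME/TRAN` figure is the average time per transaction with zero-wait entries excluded. A minimal Python sketch of that relation follows, checked against the two SPS queue rows as printed; the function names are hypothetical and not taken from the simulation model.

```python
def percent_zeros(total_entries, zero_entries):
    """Share of queue entries that passed through without waiting, in percent."""
    return 100.0 * zero_entries / total_entries

def average_time_excluding_zeros(average_time, total_entries, zero_entries):
    """Rescale the all-entries average so it covers only entries that actually waited."""
    waited = total_entries - zero_entries
    if waited == 0:
        return 0.0
    return average_time * total_entries / waited

# Values copied from the two SPS queue rows (time in simulation clock units).
sps1 = average_time_excluding_zeros(417593.496, 123, 113)
sps2 = average_time_excluding_zeros(1246734.474, 95, 78)

print(round(percent_zeros(123, 113), 1), round(sps1, 3))
print(round(percent_zeros(95, 78), 1), round(sps2, 3))
```

Both SPS rows agree with the printed figures to within rounding, which gives some confidence in the transcription of those two rows.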
UTILIZATION OF THE FMP

| FACILITY | AVERAGE UTILIZATION | NUMBER ENTRIES | AVERAGE TIME/TRAN |
| :---: | :---: | :---: | :---: |
| FMP | 0.859 | 81 | 8140479.2 |

UTILIZATION OF THE SPS AND GRF DEVICES
CONTENTS ARE PERCENTAGES

| STORAGE | AVERAGE UTILIZATION | ENTRIES | AVERAGE TIME/TRAN | CURRENT CONTENTS | MAXIMUM CONTENTS |
| :---: | :---: | :---: | :---: | :---: | :---: |
| SPS1 | 0.469 | 3508 | 10294041.105 | 32 | 100 |
| SPS2 | 0.454 | 3049 | 11472069.764 | 82 | 100 |
| GRF1 | 0.054 | 1535 | 2726590.160 | 5 | 35 |
| GRF2 | 0.053 | 1295 | 3173179.564 | 5 | 35 |

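The utilization figures can be cross-checked from the entry counts and average holding times: utilization is roughly entries times average time per transaction, divided by capacity times elapsed time. The Python sketch below rests on assumptions that are not stated in the listing: that one simulation clock unit is 10 microseconds (inferred because it makes the figures agree), that the run length is the final relative clock value of 7671 seconds, and that each SPS and GRF storage has a capacity of 100 (contents are given as percentages). The name `utilization` is hypothetical.

```python
TICKS_PER_SECOND = 100_000   # assumption: clock unit of 10 microseconds
RUN_SECONDS = 7671           # final relative clock value in the listing
ELAPSED = RUN_SECONDS * TICKS_PER_SECOND

def utilization(entries, average_time, capacity=1):
    """Fraction of capacity in use, averaged over the whole run."""
    return entries * average_time / (capacity * ELAPSED)

# (printed utilization, entries, average time/tran, assumed capacity)
rows = {
    "FMP":  (0.859, 81, 8140479.2, 1),
    "SPS1": (0.469, 3508, 10294041.105, 100),
    "SPS2": (0.454, 3049, 11472069.764, 100),
    "GRF1": (0.054, 1535, 2726590.160, 100),
    "GRF2": (0.053, 1295, 3173179.564, 100),
}

for name, (printed, entries, avg_time, capacity) in rows.items():
    print(name, printed, round(utilization(entries, avg_time, capacity), 3))
```

Under these assumptions every row lands within about 0.002 of the printed utilization; the small residual is expected because transactions still in service at the end of the run are not fully counted in the averages.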

utilization of the poc units
HISTOGRAM OF DISC ACCESS TIMES

NOTE: ABOUT 50 PERCENT OF THIS TABLE'S ENTRIES ARE CONTAINED IN THE FIRST BIN (.LT. .02 SEC). THEY ARE THE SHORT MESSAGE REQUESTS AND RESPONSES AND ARE THEREFORE LEFT OFF THIS GRAPH.

HISTOGRAM OF TRANSFER TIMES FOR SMALL FILES (LESS THAN 10 BLOCKS) BETWEEN THE FMP AND DSC

HISTOGRAM OF TRANSFER TIMES FOR ALL FILES BETWEEN THE FMP AND DSC

NOTE: THIS GRAPH IS THE APPROPRIATE SUM OF TABLES "FMPA", "FMPB", AND "FMPC".

HISTOGRAM OF TRANSFER TIMES FOR SMALL FILES (LESS THAN 10 BLOCKS) BETWEEN THE SPS AND DSC

HISTOGRAM OF TRANSFER TIMES FOR ALL FILES BETWEEN THE SPS AND DSC

NOTE: THIS GRAPH IS THE APPROPRIATE SUM OF TABLES "SPSA", "SPSB", AND "SPSC".


TURNAROUND STATISTICS FOR MESSAGES AND FILES

TABLE TOTGP

| ENTRIES IN TABLE | MEAN ARGUMENT | STANDARD DEVIATION | SUM OF ARGUMENTS |
| :---: | :---: | :---: | :---: |
| 1107 | 130.061 | 122.197 | 143977.000 |

| CUMULATIVE PERCENTAGE | CUMULATIVE REMAINDER | MULTIPLE OF MEAN | DEVIATION FROM MEAN |
| :---: | :---: | :---: | :---: |
| 46.97 | 53.03 | 0.769 | -0.245 |
| 70.10 | 29.90 | 1.538 | 0.572 |
| 95.93 | 4.07 | 2.307 | 1.391 |
| 98.28 | 1.72 | 3.075 | 2.209 |
| 99.46 | 1.54 | 3.844 | 3.027 |
| 99.64 | 0.36 | 4.613 | 3.846 |
| 99.73 | 0.27 | 5.382 | 4.664 |
| 99.82 | 0.18 | 6.151 | 5.482 |
| 100.00 | 0.00 | 6.920 | 6.302 |

| ENTRIES IN TABLE | MEAN ARGUMENT | STANDARD DEVIATION | SUM OF ARGUMENTS |
| :---: | :---: | :---: | :---: |
| 19189 | 131.626 | 258.653 | 2525771.000 |

| PER CENT OF TOTAL | CUMULATIVE PERCENTAGE | CUMULATIVE REMAINDER | MULTIPLE OF MEAN | DEVIATION FROM MEAN |
| :---: | :---: | :---: | :---: | :---: |
| 42.93 | 42.93 | 57.07 | 0.076 | -0.766 |
| 0.07 | 42.99 | 57.01 | 0.152 | -0.703 |
| 0.10 | 43.10 | 56.90 | 0.228 | -0.640 |
| 0.09 | 43.19 | 56.81 | 0.304 | -0.577 |
| 0.06 | 43.24 | 56.74 | 0.380 | -0.513 |
| 0.07 | 43.32 | 56.68 | 0.456 | -0.450 |
| 0.05 | 43.36 | 56.64 | 0.532 | -0.387 |
| 0.08 | 43.45 | 56.55 | 0.608 | -0.324 |
| 0.07 | 43.52 | 56.48 | 0.684 | -0.261 |
| 0.62 | 44.14 | 55.86 | 0.760 | -0.198 |
| 5.83 | 49.97 | 50.03 | 0.836 | -0.135 |
| 7.01 | 56.98 | 43.02 | 0.912 | -0.072 |
| 8.57 | 65.55 | 34.45 | 0.988 | -0.009 |
| 3.01 | 68.56 | 31.44 | 1.064 | 0.053 |
| 2.03 | 70.59 | 29.41 | 1.140 | 0.116 |
| 0.83 | 71.42 | 28.58 | 1.216 | 0.179 |
| 0.30 | 71.72 | 28.28 | 1.292 | 0.242 |
| 0.21 | 71.93 | 28.07 | 1.368 | 0.305 |
| 0.16 | 72.09 | 27.91 | 1.443 | 0.368 |
| 0.10 | 72.19 | 27.81 | 1.519 | 0.431 |
| 0.09 | 72.29 | 27.71 | 1.595 | 0.494 |
| 0.09 | 72.37 | 27.63 | 1.671 | 0.557 |
| 0.10 | 72.47 | 27.53 | 1.747 | 0.620 |
| 0.09 | 72.56 | 27.44 | 1.823 | 0.683 |
| 0.03 | 72.59 | 27.41 |  |  |

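The derived columns of these frequency tables follow from the summary line: the mean is the sum of arguments divided by the number of entries, MULTIPLE OF MEAN is the class upper limit divided by the mean, and DEVIATION FROM MEAN is the upper limit minus the mean, expressed in standard deviations. A minimal Python sketch for table TOTGP is given below. The upper-limit column did not survive in the scan, so the class limits of 100, 200 and 300 are an assumption inferred from the printed multiples; the function names are hypothetical.

```python
ENTRIES = 1107
SUM_OF_ARGUMENTS = 143977.0
STANDARD_DEVIATION = 122.197

MEAN = SUM_OF_ARGUMENTS / ENTRIES   # printed as 130.061

def multiple_of_mean(upper_limit):
    """Class upper limit expressed as a multiple of the table mean."""
    return upper_limit / MEAN

def deviation_from_mean(upper_limit):
    """Distance of the class upper limit from the mean, in standard deviations."""
    return (upper_limit - MEAN) / STANDARD_DEVIATION

# Assumed class upper limits (not legible in the scan).
for upper in (100, 200, 300):
    print(upper,
          round(multiple_of_mean(upper), 3),
          round(deviation_from_mean(upper), 3))
```

The computed values match the first three printed rows to within about 0.001, which supports the assumed class width of 100 for this table.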





FULLWORD SAVEVALUES

| NUMBER | CONTENTS | NUMBER | CONTENTS | NUMBER | CONTENTS | NUMBER | CONTENTS | NUMBER | CONTENTS |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 17 | 63 | 18 | 7 | COMNT | 9 | 22 | 45 | 23 | 7671 |
| 24 | 7671 | 25 | 60 | 26 | 2 | CYLMN | 1000 | CYLAV | 5500 |
| CYLDV | 4500 | ROTMN | 1 | ROTAV | 1251 | ROTDV | 1250 | BLOK1 | 1150n |
| ROTMX | 2501 | TRKRT | 65 | CLOCK | 48943742 | FMPRT | 65 | BFFMP | 3 |
| SPSRT | 273 | BFSPS | 3 | DSCRT | 156 | BFDSC | 3 | END | 1000n000nn |
| BLOK0 | 27000 | DBLK0 | 1500 | DBLK1 | 1500 | COUNT | 1 | CARD | 125! |
| DSCN | 1 | DRIVE | 1 | PRIOR | 62 | SLOP | 200 | 10 PEC | in |
| LAST | 2 | BLOK2 | 42000 | BLOK3 | 108000 | DBLK2 | 10000 | DBLK3 | 10000 |
| IOPGR | 30 | GPHRT | 1638 | GPHWT | 5000 | DWAIT | 500 | BLK2A | 10500 |
| BLK3A | 27000 | DBK2A | 2500 | DBK3A | 2500 | ILSPS | 20 | ILGPH | 5 |
| GPHS2 | 16 | COPYS | 4 | BSIZE | 64 |  |  |  |  |

THROUGHPUTS BY JOB SIZE.

FOR JOBS WITH FMP TIME LESS THAN OR EQUAL TO 120 SEC, THROUGHPUT IS 29.44 PER HOUR

FOR JOBS WITH FMP TIME BETWEEN 120 SEC AND 20 MIN, THROUGHPUT IS 3.27 PER HOUR

FOR JOBS WITH FMP TIME BETWEEN 20 MIN AND ONE HOUR, THROUGHPUT IS 0.00 PER HOUR

FOR JOBS WITH FMP TIME MORE THAN ONE HOUR, THROUGHPUT IS 0.00 PER HOUR

1. NASA CR-152059, Preliminary Study for a Numerical Aerodynamic Simulation Facility - Final Report. Oct. 1977.
2. NASA CR-152108, Preliminary Study for a Numerical Aerodynamic Simulation Facility - Final Report - Phase 1 Extension. Feb. 1978.
3. Machenhauer, B.; and Rasmussen, E.: On the Integration of the Spectral Hydrodynamical Equations by a Transform Method. Institut for Teoretisk Meteorologi, Kobenhavns Universitet, Copenhagen, 1972.
4. Hung, C. M.; and MacCormack, R. W.: Numerical Solution of Supersonic Laminar Flow Over a Three-Dimensional Compression Corner. AIAA 10th Fluids and Plasmadynamics Conference, June 1977.
5. Lambiotte, J. S., Jr.; and Voigt, R. G.: The Solution of Tridiagonal Linear Systems on the CDC STAR-100 Computer. ACM Transactions on Mathematical Software, Dec. 1978.
6. Soll, D. B.; Habra, N. R.; and Russell, G. L.: Experience with a Vectorized General Circulation Climate Model on STAR-100. High Speed Computer and Algorithm Organization, Academic Press, Inc. Nov. 1977, pp. 311-312.
7. Lorenz, E. N.: Energy and Numerical Weather Prediction. Tellus, vol. 12, no. 4, Nov. 1960, pp. 364-373.
8. Peng, L.: A Simple Numerical Experiment Concerning the General Circulation in the Low Stratosphere. Pure and Applied Geophysics, vol. 61, no. 2, May 1965, pp. 191-218.
9. Clark, J. H. E.: A Quasi-Geostrophic Model of the Winter Stratospheric Circulation. Monthly Weather Review, vol. 98, no. 6, June 1970, pp. 443-461.
10. Lindzen, R. S.; and Goody, R.: Radiative and Photochemical Processes in Mesospheric Dynamics: Part 1, Models for Radiative and Photochemical Processes. Journal of the Atmospheric Sciences, vol. 22, no. 4, July 1965, pp. 341-348.
11. McConnell, J. C.; and McElroy, M. B.: Odd Nitrogen in the Atmosphere. Journal of the Atmospheric Sciences, vol. 30, no. 8, Nov. 1973, pp. 1465-1480.
12. Lorenz, E. N.: An N-Cycle Time-Differencing Scheme for Stepwise Numerical Integration. Monthly Weather Review, vol. 99, no. 8, Aug. 1971, pp. 644-648.

[^0]:    3\*JMAX\*KMAX\*(LSL+2)\*3/8 - (2\*JMAX\*KMAX\*LSL)/8 = (7\*LSL+18)\*JMAX\*KMAX/8 clock cycles, loop overhead.

[^1]:    JOB # 26 COMPLETE AT RELATIVE CLOCK = 4520 SEC.

