The architecture of a video image processor for the space station by Carpenter, T. et al.
The Architecture of a Video Image Processor
for the Space Station
S. Yalamanchili, D. Ice, K. Fritze, T. Carpenter, and K. Hoyme
Honeywell Systems and Research Center
Minneapolis, MN 55418
N. Murray
NASA Langley Research Center
Hampton, VA 23665-5225
Abstract
This paper describes the architecture of a video image processor for space station applications. The architecture was derived from a study
of the requirements of algorithms that are necessary to produce the desired functionality of many of these applications. Architectural options
were selected based on a simulation of the execution of these algorithms on various architectural organizations. A great deal of emphasis was
placed on the ability of the system to evolve and grow over the lifetime of the space station. The result is a hierarchical parallel architecture
that is characterized by high level language programmability, modularity, and extensibility, and that can meet the required performance goals.
1 Introduction
A major goal in the design and deployment of the NASA space station is to enable crew members to effectively and efficiently use the resources
of the space station. The number of anticipated scientific and commercial missions will place a heavy demand on these resources, one of which
is crew time. Thus, facilities that enable crew members to perform their tasks efficiently, effectively, and safely are critical to the success of
the space station. This paper describes the utility and feasibility of providing crew members with one such facility - a video image processor (VIP).
Initially, a crew member will directly control and be interactively involved with most activities, such as inspection, docking, experiment
monitoring, and control. One of the problems with the man-in-the-loop scenario is that the human operator is frequently performing repetitive
observations and control functions that do not exploit the unique capabilities of a human in space, namely, decision making, supervision, and
creative thinking. Repetitive tasks are ideally suited for automation by machine. Such automation would free crew members for more demanding
tasks as well as make more efficient use of their time. An important technology central to automation and robotics is image processing. Video
images may be processed to improve the quality for viewing purposes, provide cues in assisting an operator in some task, or provide some
information to control other devices or alert the crew when necessary, as in automatic experiment monitoring.
Early on in the study [1], a surprisingly large number of applications were found that could benefit from the availability of an on-board VIP.
The required algorithms and architectures necessary to support a VIP were found to be mature enough to make the concept of a VIP feasible.
An examination of the functional requirements of image processing algorithms and the capabilities of current and future processors resulted in
the conceptual design of a hierarchically structured, parallel architecture for a VIP. This paper reports the results of an effort to refine this
conceptual view via simulation. The simulation studies were based on the requirements derived from an analysis of algorithms for space station
applications. The design and validation of these algorithms are discussed in a companion paper [2].
2 Role of a VIP in the Space Station
The principal goal of a VIP is to increase the efficiency with which the space station resources are used. This would be achieved by automating
tasks with the VIP as well as making some tasks more efficient. Although detailed requirements for various systems on the space station
are still being developed, it is evident that a number of existing requirements support the concept of an on-board VIP. One requirement is that
missions should be performed in a timely and safe manner. Missions include user experiments, user production activities, satellite servicing, and
housekeeping tasks. A VIP may enable a crew member to perform more activities from a central location, such as a workstation. This would
mean fewer extravehicular activities (EVAs), which in turn makes the crew member more efficient and, in many instances, considerably safer.
In addition, there are tasks that can be performed by a VIP that would result in faster execution (for example, providing automatic camera
control), make the task easier (filtering of transmitted video containing noise), or eliminate the active participation of a crew member altogether
(automatic experiment monitoring).
Another requirement relates to space station autonomy. Autonomy may be interpreted in two ways. The first interpretation is that an
activity can proceed autonomously from human interaction. A VIP helps in this case by performing functions relating to machine vision,
freeing the user from constant interaction. The second is that the space station should operate as autonomously from ground support as feasible.
In this instance, a VIP may be used in several ways. It can compress image data, allowing the relatively limited on-board storage capability to
be used more efficiently. Thus, requests for data are more likely to be satisfied at the station without accessing ground-based archives. In this
manner, a VIP can increase crew efficiency, making it less likely that ground-based personnel would be required to support the workload.
A related requirement is that the crew time necessary for housekeeping tasks should be minimized. As shown in the space station mission
requirements report, one of the constraints on the number of active payloads will be the number of crew hours available to perform payload-related
tasks. The assumption was made that with a crew of six, the equivalent of one person would be required just to perform housekeeping
chores. A VIP could help reduce this time by automating or supporting housekeeping activities, such as rendezvous and docking, space station
inspection, and maintenance. During our examination of potential applications of a VIP [1], we generated the following set of tasks that could
potentially utilize a VIP for increased safety, autonomy and efficiency.
• Construction
• Satellite Servicing
• Rendezvous and Proximity Operations
• Docking
• Inspection
• Maintenance and Repair
• Payload Delivery and Retrieval
• Experiment Monitoring
• Data Management and Communications
• Training
A more detailed discussion of what role a VIP might play in each of these tasks can be found elsewhere [1]. Finally, a VIP is impacted by a
need to evolve with the space station, primarily since it will not be possible to plan and accommodate all future processing needs. It also has
to be compatible with the size, weight and power budgets that are constrained by the capabilities of the power generation subsystem and the
payload capacity of the space transportation system.
3 VIP Algorithms
There were two goals for the algorithm selection process. First, the image processing techniques required must be mature and reliable, enabling
a high degree of confidence in obtaining the desired functionality. This is particularly important due to the unique set of environmental, lighting
and imaging constraints under which space imagery is acquired. Secondly, the algorithm suite should benefit a large number of applications. A
cross reference between six major classes of image processing algorithms and the eight generic classes of space station applications is shown in
Figure 1.
These six families do not represent the entire breadth of the state of the art in image processing, but most of the image processing algorithms
required for the automation of space station tasks belong to one of these families. In addition, algorithms in each of these categories are
sufficiently mature for the design and build of a prototype system. This prototype system could be semiautonomous in that it could perform the
majority of the data reduction necessary for a specific task and an operator would be required for verification/confirmation of the actions of the
VIP. Typical applications in which a VIP may perform a task in semiautonomous mode are image enhancement/filtering, intelligent bandwidth
reduction, and object velocity estimation for proximity operations. A more detailed discussion is presented in a companion paper [2].
4 VIP Architecture
As a result of the VIP I study, we recommended an architecture for the VIP, shown in Figure 2. It consists of a multiple-SIMD organization (level
1) followed by a multiprocessor organization (level 2). The multiprocessor will be designed to allow for the addition of processors specifically
suited for symbolic processing, e.g., rule-based inference processing. The level-1 system may be viewed as a sequence of array processors. The
first processor receives video data from the network interface. Each array processor stage may implement a specific image function, such as
detector compensation, gray scale stretch, or digital filtering. The processed image may then be transferred to an image memory, which forms the
input to another array processor implementing another image function, or may be transferred to the image memory of processors that perform
the next higher level of processing (level 2 and level 3). They consist of more flexible multiprocessor systems that can be used to compute
descriptions of areas of the image, e.g., regions of interest, boundary codes, statistics, etc. Processing at the higher levels may primarily deal
with arrays of real or integer data (e.g., tracking and position estimation) or symbolic data (e.g., relational descriptions). These two types of
processing are fundamentally different, but both are required at this level of processing. Our approach is to efficiently accommodate both modes
of processing, loosely coupled through the use of a partitioned global address space. Architectures specifically suited for artificial intelligence
execution are not considered at this time because of the immaturity of the concept. However, as research continues in this area and in artificial
intelligence algorithms for image understanding, the addition of such processors may be allowed during the growth phase of the VIP.
The specific choice of a building block for each level must result in an overall organization that satisfies the constraints identified in the
VIP I study. One set of constraints is due to the requirements of the Initial Operating Configuration (IOC). Depending upon the technology
freeze date for IOC, current hardware, software, system and algorithm technology may not allow the development of a fully functional VIP
within the anticipated size, weight and power constraints. The issue therefore is in planning: being able to use that part of VIP that is useful
and currently feasible, while provisions are made to allow it to evolve into a fully functional VIP. Certain critical portions (such as the level-1
architecture) may be included at IOC to perform those functions that are useful for the man-in-the-loop scenario. Then, during the growth of
the space station, additional functionality could be provided using more advanced and stable technology, algorithms, and perhaps architectures.
Phased implementation requires features of programmability, modularity, and field expandability. The latter includes provisions for integrating
special-purpose devices into the architecture as their need becomes evident and their implementation becomes viable. Further, the implementation
technology and architecture must be sufficiently mature to be considered for deployment. The VIP architecture proposed in this program
is amenable to all of these constraints.
4.1 Level 1 Architecture
The VIP I study called for a synchronous parallel architecture that operated in single instruction multiple data stream (SIMD) mode and
delivered in excess of 600 million operations per second (MOPS) performance. Furthermore, each processor would have access to neighboring
processors' memories and would be microcoded. The rationale for these constraints can be found in the VIP I final report [1].
Our choice of the basic building block for the level-1 architecture is the Electro-Optical Signal Processor (EOSP) developed by Honeywell
[3]. Its organization satisfies all of the above mentioned constraints. In addition, it possesses a number of other important features that make
it a good choice for a VIP. The EOSP architecture was derived in a top-down manner from the requirements of real-time image processing
algorithms. The result is a very high speed integrated circuit (VHSIC) chip set that can deliver up to 25 MOPS per processing element (PE) for
low-level image processing tasks. Thirty-two PEs constitute a single stage of the EOSP, resulting in a computation rate of 800 MOPS per stage.
It is optimized for image processing functions that are characterized by large volumes of data and repetitive arithmetic and logical operations
over small neighborhoods of an image. Unlike many early image processing architectures, issues concerning the interface to different sensors and
the implementation of image input/output (I/O) were also addressed early in the development. Thus, the EOSP is optimized to provide high
throughput for raster scan imaging devices. Any algorithm that exhibits concurrency at the pixel level can be efficiently implemented on the
EOSP.
The organization of the EOSP is illustrated in Figure 3. The architecture consists of a linear array of identical PEs, each with its own
memory, controlled by a single common controller. This SIMD architecture minimizes the control overhead per PE, thus achieving extremely
high computational rates within a very compact processor. In its current form, each PE has a 128-byte input buffer and a 128-byte output
buffer. Local memory consists of 512 bytes accessed by a 16-bit arithmetic logic unit (ALU). Each of the I/O buffers is externally clocked. Thus,
it is possible for data transfer into the input buffer, data transfer out of the output buffer, and processing of local memory contents all to be
occurring simultaneously. This allows for a pipelined mode of operation in which images may be processed in real time with storage requirements
independent of image size. The EOSP architecture operates on an image on a line-by-line basis. Each image line is evenly distributed among the
input buffers of the PEs and transferred to local memory. A sufficient number of consecutive image lines is stored to enable one line of image
output to be computed. In the configuration of the above example, one line of a K x K window function can be computed by all the processors
in parallel. Each processor computes M pixels of the output line. Next, a new input line can be acquired, and one line of the computed image
can be output. In this manner, one line worth of results is computed and output for every input image line from the sensor. This has the effect
of "sliding" a K x K window over the input image. Data from the input buffer are transferred to the individual PE memories in parallel at
the end of the scan line. Processed results in the PE memory, computed during the previous line input, are transferred simultaneously to the
corresponding output buffers. Buffered results are read out synchronously with the input data entering during the next scan line. Input and
output can be double-buffered for sensors that possess no dead time between lines (e.g., retrace time). This provides a great deal of flexibility in
interfacing the EOSP to different types of sensors and architectures. Such a feature is especially attractive for the VIP since the functionality of
the video network interface (VNI) (input to level-1 architecture) and the details of the level-2 architecture (output from the level-1 architecture)
are subject to change. This feature is even more important if the VIP is to be deployed as the level-1 architecture only and is to subsequently
evolve to include the level-2 architecture later in the life of the space station.
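The line-by-line operation described above can be illustrated with a small software sketch. This is not EOSP microcode; the function name, the default mean operator, and the clamping of the window at image borders are assumptions made here for illustration. Only K lines are buffered at any time, and each PE is assigned a contiguous segment of M pixels of the output line.

```python
from collections import deque


def sliding_window_lines(lines, k, num_pes, op=lambda w: sum(w) // len(w)):
    """Yield one output line per input line once k lines are buffered.

    lines   : iterable of equal-length lists of pixel values (raster order)
    k       : odd window size (K x K neighborhood)
    num_pes : number of processing elements sharing each line
    op      : function applied to the K*K window values (default: integer mean)
    """
    half = k // 2
    buf = deque(maxlen=k)            # storage is K lines, independent of image height
    for line in lines:
        buf.append(line)
        if len(buf) < k:
            continue                 # pipeline is still filling
        width = len(line)
        m = -(-width // num_pes)     # ceiling division: pixels per PE ("M" in the text)
        out = [0] * width
        for pe in range(num_pes):    # in hardware these segments run in parallel
            for x in range(pe * m, min((pe + 1) * m, width)):
                window = [
                    row[min(max(x + dx, 0), width - 1)]  # assumption: clamp at borders
                    for row in buf
                    for dx in range(-half, half + 1)
                ]
                out[x] = op(window)
        yield out


# Hypothetical 3-line image with a single bright pixel, 3 x 3 mean, 2 PEs.
example = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
smoothed = list(sliding_window_lines(example, k=3, num_pes=2))
```

Because the generator emits a result as soon as the K-th line arrives and then one result per new line, its memory use depends on K and the line width only, which mirrors the storage property claimed for the pipelined mode.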
Sizing an EOSP system is done with respect to three features - processing throughput requirements, memory requirements and the I/O
requirements. Whichever feature is the most demanding in terms of the number of processors required dictates the size of the EOSP system.
This essentially accounts for the fact that some applications may be throughput bound versus I/O bound or memory bound. Several examples
are illustrated in Table 1. All pixel and neighborhood operations will be implemented in the level-1 architecture. This includes the color image
enhancement algorithms [2]. Functioning brassboard versions of the EOSP PEs are available today. The technology and architecture can be
considered to be mature by any future technology freeze date. Moreover, a great deal of familiarity with the EOSP systems has been obtained,
establishing a degree of confidence in the ability to meet the projected performance goals. Experience has been gained and lessons learned in
the design of the EOSP. For this and the reasons cited above, the EOSP architecture is an excellent choice as the building block for the VIP
level-1 architecture.
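The sizing rule can be written down as a short calculation. The per-PE figures (25 MOPS, 512 bytes of local memory, 128-byte input buffer) are taken from the text; how a workload's memory and I/O needs translate into a PE count is an assumption of this sketch, and the workload numbers in the example are hypothetical rather than entries from Table 1.

```python
import math

PE_MOPS = 25           # peak MOPS per PE (from the text)
PE_LOCAL_BYTES = 512   # local memory per PE (from the text)
PE_INPUT_BYTES = 128   # input buffer per PE (from the text)


def size_eosp(required_mops, required_memory_bytes, line_bytes):
    """Return (number of PEs, limiting feature) for a hypothetical workload.

    Assumption: each requirement is divided by the per-PE capacity and
    rounded up; the largest of the three counts dictates the system size.
    """
    needs = {
        "throughput": math.ceil(required_mops / PE_MOPS),
        "memory": math.ceil(required_memory_bytes / PE_LOCAL_BYTES),
        "io": math.ceil(line_bytes / PE_INPUT_BYTES),
    }
    bound = max(needs, key=needs.get)
    return needs[bound], bound


# Hypothetical workloads: one throughput bound, one memory bound.
throughput_case = size_eosp(600, 3 * 512, 512)
memory_case = size_eosp(100, 20000, 512)
stage_mops = 32 * PE_MOPS   # one 32-PE stage, as quoted in the text
```

Under these assumptions the first workload needs 24 PEs because of its throughput requirement, while the second needs 40 PEs because of memory.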
4.2 Level 2 Architecture
Our earlier studies indicated that this level would require an 8-16 processor system delivering about 100-200 MOPS with distributed task
allocation, scheduling and synchronization. To understand the characteristics of the level-2 architecture, one needs to understand the algorithms
that will be executed. The granularity of parallelism is relatively large (compared to those executed in the level-1 architecture), resulting in a
number of concurrently executing tasks. The processing within a task is highly data dependent. As a result, interactions between tasks should
be asynchronous. The volume of intertask communication is highly variable and can become the principal determinant of performance [3-4].
Thus, the first issue is the choice of interconnection topology. Once this has been chosen based on the requirements of the VIP algorithms,
the architecture may be examined in greater detail to address issues of protocols, processor-specific features, and operating system features.
The choices are limited only by one's imagination. However, we chose topologies that, in some sense, occur at extreme points in the spectrum
of performance that interconnection networks can provide. At the same time, the choices were filtered by factors such as maturity, available
experience with them, and how well we understood them. Our choice of families of topologies to investigate were multiple buses, hypercubes,
and braided rings. These topologies are illustrated in Figure 4.
The next issue is one of analysis techniques. The level-1 architecture exploited fine grain parallelism in a synchronous mode of operation.
Further, the algorithms are largely data independent. With such a fine understanding of the implementation of the computations, it is possible to
analytically evaluate the architectural options. That is not the case with the level-2 architecture and algorithms. The high degree of variability
in the processing and communication requirements indicates that simulation is an appropriate means to determine the proper topologies. The
Architecture Design and Assessment System (ADAS) tool set developed at Research Triangle Institute was used for this purpose. The tool
set includes facilities for constructing models of communicating parallel tasks and parallel architectures. Further, tools are available to map
communicating sequential tasks onto specific architectures and evaluate the performance of such a hardware/software system.
4.2.1 Simulation
The objectives for performing the simulation are multifold. First, we would like to verify that the proposed architecture design can meet the
system throughput requirements, and that the specified image processing algorithms can be executed within the given time frame. Second,
we want to compare the performance of several proposed architectures and topologies and analyze how they perform in executing the different
algorithms. This would then provide guidance in selecting the appropriate architecture approach for the VIP design. Finally, we would like to
use simulation as a tool to refine the architecture design. By varying the system size and characteristics, one can perform tradeoffs not only
between interconnect topologies, but also in the number of processors and buses, processor speeds, and bus bandwidths. The ultimate objective
is to enable us to select and derive a suitable architecture for the VIP design. The parameters we have chosen to study for the VIP simulation
effort are categorized as follows.
• Network topology - Six interconnect network topologies were simulated: multiple buses with one, two, and three buses, hypercube, and
unidirectional and bidirectional braided rings.
• Communication bandwidth - Three separate bus speeds were used in the simulation: 2, 5, and 10 Mbytes/sec.
• Processor throughput - Three separate processor throughputs were used in the simulation: 2, 5, and 10 million instructions per second
(MIPS).
• System size - System sizes of 4, 8 and 16 processors were considered.
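The four parameter lists above define a full factorial design space. A minimal sketch of its enumeration follows; the topology labels and variable names are chosen here for illustration and are not ADAS identifiers.

```python
from itertools import product

TOPOLOGIES = [
    "1 bus", "2 buses", "3 buses",
    "hypercube",
    "unidirectional braided ring", "bidirectional braided ring",
]
BUS_MBYTES_PER_SEC = [2, 5, 10]
PROCESSOR_MIPS = [2, 5, 10]
SYSTEM_SIZES = [4, 8, 16]

# Every combination of topology, size, processor speed, and bus speed.
design_space = list(
    product(TOPOLOGIES, SYSTEM_SIZES, PROCESSOR_MIPS, BUS_MBYTES_PER_SEC)
)
num_configurations = len(design_space)   # 6 x 3 x 3 x 3
```

Each tuple in `design_space` corresponds to one simulation run, giving 162 candidate configurations in total.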
In the description of the simulation, buses will be used to refer to both the multiple-access shared media, such as time shared buses, as well
as point-to-point links, such as those used in the hypercube and ring organizations. The level 2 implements the components of the tracking and
bandwidth reduction algorithm. Each of the computationally intensive components of this algorithm was studied in greater detail and parallel
versions of these algorithms were derived and modelled with ADAS. These were:
• Monochrome Segmentation
• Boundary Tracing
• Linearity Filter
• Connected Components
• Silhouette Matching
For each of the above algorithms, conservative requirements on image resolution and other algorithm-specific parameters (e.g., size and
number of objects) have been assumed in constructing the software graphs. Both of the above software and hardware systems are modelled in
ADAS with directed graphs consisting of nodes interconnected by directed arcs. Nodes represent individual software operations or hardware
functional elements, while arcs represent data flow between software operations or hardware components. The presence or absence of data or
control is represented by tokens on the arcs. When an input condition is satisfied by the presence of specific patterns of tokens on the input
arcs, a node "fires". It fires for some period of time, after which tokens may be placed on some output arcs, possibly enabling another node.
Once software and hardware graphs have been developed, the software graph is mapped onto the hardware graph to produce a constrained
software graph. Since the software graph represents the algorithm executed by the hardware, the order in which the software graph nodes fire
is determined by the structure of the underlying hardware graph. In particular, software nodes mapped onto the same hardware nodes can
only be executed one at a time. Nodes represent the execution of a computation (transfer of data). The firing delays are therefore functions
of the volume of computation (data) and the processor speed (link or bus bandwidth). The simulation sequence considers the range of values
for processor speeds (link or bus bandwidths). Some examples of hardware and software graphs are shown in Figure 5. The simulation sequence
proceeds as follows.
1. Construct software and hardware graphs. The software graphs represent the image processing algorithms to be executed, while the
hardware graphs represent the architectures and constraints of the hardware system.
2. Place appropriate weights on software nodes. These weights include the various assumed characteristics, such as delays, amount of
processing requirements, processor throughputs, and network link bandwidths.
3. Constrain the software graph execution by mapping the software graph to the hardware graph. This involves assigning various software
modules (algorithms) to the different hardware modules (nodes).
4. Execute the constrained software graph and collect execution statistics.
5. Modify the weights in step 2 to effect a change in the parameters of interest and repeat the sequence.
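The five steps above can be sketched as a small token-driven simulation. This is not the ADAS tool set; it is a minimal model written for illustration, and the function name, the example graph, and all weights are hypothetical. It assumes that a node fires when tokens from all of its predecessors are present, that the firing delay is the node's volume divided by the rate of the hardware resource it is mapped to, and that nodes mapped to the same resource fire one at a time.

```python
def simulate(volumes, arcs, mapping, rates):
    """Execute a constrained software graph; return (latency, busy time per resource).

    volumes : {software node: work volume (Minstructions or Mbytes)}
    arcs    : [(source node, destination node)] of an acyclic graph
    mapping : {software node: hardware resource}
    rates   : {hardware resource: MIPS or Mbytes/sec}
    """
    preds = {n: set() for n in volumes}
    succs = {n: [] for n in volumes}
    for src, dst in arcs:
        preds[dst].add(src)
        succs[src].append(dst)

    arrived = {n: set() for n in volumes}                 # tokens on input arcs
    enabled = {n: 0.0 for n in volumes if not preds[n]}   # node -> time it became enabled
    free_at = {r: 0.0 for r in rates}                     # resource -> time it becomes idle
    busy = {r: 0.0 for r in rates}
    finish = {}

    def earliest_start(n):
        return max(enabled[n], free_at[mapping[n]])

    while enabled:
        # Fire the enabled node that can start soonest (name breaks ties).
        node = min(enabled, key=lambda n: (earliest_start(n), n))
        res = mapping[node]
        start = earliest_start(node)
        del enabled[node]
        delay = volumes[node] / rates[res]
        finish[node] = free_at[res] = start + delay
        busy[res] += delay
        for dst in succs[node]:
            arrived[dst].add(node)                        # place a token on the arc
            if arrived[dst] == preds[dst]:
                enabled[dst] = max(finish[p] for p in preds[dst])
    return max(finish.values()), busy


# Steps 1-3: hypothetical fork-join graph, weights, and mapping.
volumes = {"split": 10, "segA": 20, "xferB": 5, "segB": 20, "xferBack": 5, "merge": 10}
arcs = [("split", "segA"), ("split", "xferB"), ("xferB", "segB"),
        ("segB", "xferBack"), ("segA", "merge"), ("xferBack", "merge")]
mapping = {"split": "P0", "segA": "P0", "merge": "P0",
           "segB": "P1", "xferB": "bus", "xferBack": "bus"}
rates = {"P0": 10.0, "P1": 10.0, "bus": 5.0}   # MIPS, MIPS, Mbytes/sec

# Step 4: execute and collect statistics.
latency, busy = simulate(volumes, arcs, mapping, rates)
```

Step 5 then amounts to changing an entry of `rates` (for example the bus bandwidth) and calling `simulate` again. For the weights shown, the latency is 6 seconds, with P0 busy for 4 seconds and P1 and the bus busy for 2 seconds each.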
For the purpose of evaluating the results, the following performance measures were generated by the simulation.
• Latency - This is the time for one execution of the complete software graph (algorithm).
• Average processor utilization - This is the average percent of execution time the processors are busy.
• Maximum processor utilization - This measure is the maximum percent of execution time that a particular processor is busy. It identifies
the presence of bottlenecks.
• Variance of processor utilization - This provides a measure of balance in processor utilization and thus, the distribution of the computation
load.
• Average bus utilization - This is the average percent of execution time the buses are being used.
• Maximum bus utilization - This identifies the presence of communication bottlenecks.
• Variance of bus utilization - This measure indicates the distribution of the communication load.
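The utilization measures listed above follow directly from per-resource busy times and the latency of one run. A minimal sketch is given below; the function name and the sample numbers are hypothetical, and the use of the population variance is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import mean, pvariance


def utilization_stats(busy_times, latency):
    """Average, maximum, and variance of percent utilization.

    busy_times : busy time of each processor (or each bus), in seconds
    latency    : time for one execution of the complete software graph
    """
    util = [100.0 * b / latency for b in busy_times]
    return {
        "average": mean(util),
        "maximum": max(util),
        "variance": pvariance(util),   # assumption: population variance
    }


# Hypothetical run: two processors busy for 3 s and 1 s of a 4 s execution.
processor_stats = utilization_stats([3.0, 1.0], 4.0)
```

The same function applies unchanged to bus busy times; a large maximum together with a large variance points to a bottleneck on one resource.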
To control the ADAS simulation sequence and facilitate the generation of the performance measure statistics, a simulation manager was
developed. The simulation manager is the core of the simulation management facility. It essentially controls the iterative execution of the
simulation. The details of the simulation management facility can be found in [6].
4.2.2 Simulation Analysis
To facilitate the analysis of the simulation data in selecting a suitable VIP architecture, we decided to evaluate the performance of the various
designs based on the following performance metrics.
• Low latency - Latency is the major criterion in evaluating the performance of a design. The system throughput must be above some
minimum threshold in order to satisfy the basic timing and processing requirements. Beyond that threshold, low latency may be traded
off against other considerations.
• Balanced processor utilization - The preference here is to evenly distribute the processing load among the processors as much as possible,
thereby avoiding the presence of bottlenecks and reducing the severity of single-point failures. This can also serve as an indication of how
easily growth and fault tolerance can be achieved with the design.
• Balanced bus utilization - The preference here is to avoid communication bottlenecks and reduce the severity of single-point failures. Again, this can
serve as an indication of the ease with which fault tolerance and future growth may be accommodated.
• Latency and utilization improvement - This is the differential of the latency or utilization as a function of some architectural parameter.
This measure is used to identify points of diminishing returns. For example, a doubling of processor speed may produce only a 2% decrease
in latency. In that case, the cost of designing a faster processor may not be worth the added speedup. A similar argument can be made
for utilization and, in fact, for most parameters. Another view is that this measure indicates the sensitivity of the latency and utilization
metrics to various architectural parameters.
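The improvement metric can be made concrete as a finite difference over successive parameter values. The sketch below is illustrative only; the function name is an assumption and the latency figures are hypothetical, chosen so that the last step reproduces the 2% case mentioned in the text.

```python
def latency_improvement(param_values, latencies):
    """Percent decrease in latency between successive parameter settings.

    Returns a list of (old value, new value, percent decrease) tuples.
    """
    pairs = list(zip(param_values, latencies))
    return [
        (p0, p1, 100.0 * (l0 - l1) / l0)
        for (p0, l0), (p1, l1) in zip(pairs, pairs[1:])
    ]


# Hypothetical latencies (arbitrary time units) at 2, 5, and 10 MIPS.
steps = latency_improvement([2, 5, 10], [100.0, 50.0, 49.0])
```

In this made-up example, moving from 2 to 5 MIPS halves the latency, while doubling again from 5 to 10 MIPS yields only a 2% decrease, which marks a point of diminishing returns.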
In addition to the above performance metrics, we also made the following empirical assumptions concerning the VIP design requirements.
• The design shall provide a processing throughput margin of 100%.
• The design shall provide a communication bandwidth margin of 100%. These first two assumptions allow for growth in algorithmic
requirements and other unexpected overheads.
• The design shall allow the presence of spare processors and spare buses. This enables the design to provide for fault tolerance as well as
growth capabilities.
• The VIP design shall execute the tracking and bandwidth reduction algorithm at the rate of about one image per second. This assumption
is more of a desire than a requirement. In reality, considering the anticipated applications of VIP in the space station, a processing rate
of one image every few seconds may even be acceptable for most applications.
With the above initial assumptions and performance metrics in mind, the simulation data were analyzed and evaluated. A software graph
for the tracking and bandwidth reduction algorithm was constructed, and its execution was simulated on the various architectural organizations.
This includes parallelized versions of the selected components. The size of the search space of the architectural alternatives is fairly large. There
are six organizations - three for the bus-based systems, one for the hypercubes, one for the unidirectional rings, and one for the bidirectional
rings. For each organization, there are three system sizes (4, 8, and 16 PEs), three processor speeds (2, 5, and 10 MIPS), and three bus
bandwidths (2, 5, and 10 Mbytes/sec). Thus, there are (6 x 3 x 3 x 3) or 162 distinct possible architectural solutions in this formulation. For each
possible architectural solution, the parameters of interest are measured and tabulated. These results were examined manually to apply the
chosen metrics and select acceptable solutions. The result reveals that a configuration with 16 processors, each with a processing speed of 10
MIPS, and a dual-bus network with bus speeds of 5 Mbytes/sec, comes closest to meeting all of the empirical assumptions and performance
metrics mentioned above. The simulation performance data for this architecture are summarized in Table 2.
The simulation data also indicate that the hypercube configuration (N = 4), with 16 processors at 10 MIPS each and bus speeds of 5
Mbytes/sec, is also a viable alternative. The final latency value for configurations with the hypercube design is shown in Table 3. Currently, the
bus-based approach is preferable to the hypercube approach mainly because it is a relatively more mature and well-understood architecture. In
this respect, the bus-based approach represents a low-risk approach. While the hypercube technology has now become a commercially viable
product, improvements are rapid and continuous. The network is inherently fault tolerant through the presence of multiple paths between nodes,
but it is not immediately obvious how that feature may be efficiently exploited. The area that needs the most attention is operating system
support. Efficient internode communication and global resource allocation strategies are lacking and are the focus of several research efforts by
both commercial and academic organizations. By comparison, software in general, and operating systems in particular, are much more mature
in bus-based systems. Further, increases in performance by the addition of one, two, or a small number of modules are straightforward in
bus-based systems. In a hypercube, the number of modules is generally doubled to maintain the connectivity of the network; the addition of a smaller
number of processors is not straightforward. Thus, while simulation experiments indicate that the hypercube is an acceptable solution, practical
considerations indicate that bus-based solutions are preferable. Hence, the 16-processor, two-bus system is the choice for a VIP.
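The growth constraint of the hypercube follows from its addressing scheme. In the standard binary hypercube of dimension N, each of the 2^N nodes is linked to the N nodes whose addresses differ from its own in exactly one bit. The short sketch below illustrates this; the function names are chosen here for illustration.

```python
def hypercube_size(dim):
    """Number of nodes in a binary hypercube of the given dimension."""
    return 1 << dim


def hypercube_neighbors(node, dim):
    """Addresses of the nodes directly linked to `node` (one bit flipped each)."""
    return [node ^ (1 << bit) for bit in range(dim)]


nodes_n4 = hypercube_size(4)             # the N = 4 configuration in the text
nodes_n5 = hypercube_size(5)             # next complete hypercube
links_of_node_5 = hypercube_neighbors(5, 4)
```

With N = 4 there are 16 nodes, each with four links; the next complete hypercube has 32 nodes. Growing the system while keeping the topology regular therefore means doubling the processor count, whereas a bus-based system can accept one module at a time.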
4.3 System Issues
System issues can now be addressed in more detail with respect to this specific organization. System issues relate to three aspects of the VIP.
The first is the interaction of the VIP with its environment. This is defined by the functionality of the VNI. The second concerns the software
requirements, and the third, the hardware requirements.
4.3.1 Interaction with Environment
The VIP is intended to support bidirectional transfer of video data to and from devices on the space station. The VIP processes raw video
data from a variety of video sources - including video cameras, video storage devices, and uplink video - and transfers processed, filtered, and
enhanced images to various sink devices on the space station. In order to specify the functionality of the VNI, it is necessary to make
some assumptions about the operating environment. For example, what is the nature and frequency of traffic to and from the VIP? It is
clearly infeasible to consider all possibilities. Therefore, we focus on what we feel will be the most prevalent scenario for the use of a VIP:
a crew member controlling and using the VIP from a multipurpose applications console (MPAC). For example, cameras possibly mounted
outside the space station may transmit images to the MPAC. These images may be redirected from the MPAC to the VIP for enhancement
for viewing purposes. Alternatively, the VIP could receive images directly from cameras (under MPAC control) and relay results to the MPAC
on detection of a specific event, e.g., in automatic experiment monitoring. In such a scenario, the functionality of the VNI would be determined
by the nature of the interaction with the MPAC and by the operation and type of communications media between the MPAC and the VIP.
The MPAC will be one of the primary interactive display devices on the space station. Images will be displayed in the video/graphic/text
application display area of the MPAC, and the console will present a mixture of information types, such as graphic, tabular, textual, video,
discrete, etc. The advanced Work Package 2 implementation guidelines [7] demonstrate a preference for the display of color image data. However,
the capability must exist for handling both color and monochrome video data formats. From the point of view of the interaction with a VIP, it
is assumed that the MPAC will provide for the buffering of processed images, since most functions are not processed at the image data rate of
the MPAC, and that graphics and image database functions will not be provided by the VIP.
The space station data management system (DMS) can support the requirements for bidirectional communication between the VIP and
MPACs. Data transfers between the VIP and the MPAC involve the transfer of commands and images from the MPAC to the VIP and status
from the VIP to the MPAC. Commands take the form of enable/disable for the VIP, diagnostic commands, as well as a selection of algorithms.
The volume of communications for such a transfer is expected to be low, 200 to 300 bytes every I/lgth second. Thus, tolerable network latencies
are determined by the interactive nature of the processing. It is the image transfers that place demands on the bandwidth of the DMS. These are
high in volume and place stringent demands on whatever communication network is available. Two options may be considered in determining
how this traffic may best be handled. The first is to use all digital transmission and the space station DMS. The second is to use a separate
analog network and retain the images in analog video form. Both options are viable and possess advantages and disadvantages. However, it
should be noted that the choice of one or the other does not impact the functionality or operation of the VIP, but only affects the VNI.
4.3.2 Hardware Issues for VIP
Several distinct hardware issues arise in the organization of the VIP. These are related to the four principal components of the architecture: the
VNI, the level-1 PEs, the interface between the level-1 and level-2 architectures, and the level-2 PEs. The functionality of the VNI and issues
related to it have been discussed in the previous subsection. The EOSP architecture is an existing system, and most, if not all, hardware issues
relevant to the VIP have been resolved. The operation of, and interface requirements to, an EOSP architecture are defined in [3]. Issues related to
the remaining two components are discussed in this subsection.
The interface between the level-1 and level-2 architectures is physically a bus that can accommodate data transfers at least at the sensor rate.
Operation of this bus is embedded in the functionality of the VNI and the bus interface units of the level-2 PEs. This bus, in addition to serving
as the physical interface between the level-1 and level-2 architectures, is also the interface between the EOSP and the VNI for output of image
data to the network. This bus is interfaced to the output buffers of the EOSP PEs. Since these buffers are externally clocked, some degree of
freedom is available in designing the bus to interconnect the EOSP stages, the PEs at the next level, and the VNI. This sensor rate bus provides
a parallel, multidrop data and message transfer medium and is a custom bus defined to meet the requirements of the VIP. The data transfer bus is
a 16-bit parallel bus and thus is matched to the word width of the EOSP I/O data paths. The 16-bit bus also provides sufficient bandwidth for the anticipated data
transfers. The control bus portion must provide signal lines for interrupts, bus arbitration, and broadcast. Given the block structured nature of
data transfers, multicycle arbitration schemes with timeouts are probably preferable since the control overhead will be amortized over the size
of the data transfers. In addition, with proper design of the flow of control, it is unlikely that all three components would be simultaneously
requesting the bus. The interrupt facility would also be used to synchronize the transfers between the EOSP and the level-2 architecture. Use
of a command facility for the sensor rate bus could eliminate the need for an address bus at the level-1 interface. The majority of data transfers
across the sensor rate bus are block oriented rather than byte or word oriented. The EOSP output data is transferred across the sensor rate bus
on a horizontal scan basis. Data transferred from the EOSP to the VNI is also based on the scan line as the unit of data transfer. A command
code may be active during the beginning of a block transfer or for the duration of a transfer depending on the command type, e.g., beginning
of a scan, EOSP microcode start address, end of scan, etc.
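The claim that arbitration overhead is amortized over block transfers can be illustrated with a small calculation. This is a sketch under stated assumptions, not a model of the actual sensor rate bus: it assumes one arbitration of a fixed number of cycles per block and one 16-bit word transferred per cycle. The 4-cycle arbitration cost and the 512-word scan line are hypothetical values chosen only for illustration.

```python
def bus_efficiency(block_words: int, arbitration_cycles: int) -> float:
    """Fraction of bus cycles that carry data.

    Assumes one arbitration phase per block, followed by one word per cycle.
    """
    if block_words <= 0:
        raise ValueError("block_words must be positive")
    if arbitration_cycles < 0:
        raise ValueError("arbitration_cycles must be non-negative")
    return block_words / (block_words + arbitration_cycles)


if __name__ == "__main__":
    ARBITRATION_CYCLES = 4  # hypothetical multicycle arbitration cost
    for words in (1, 16, 512):  # single word, short block, hypothetical scan line
        eff = bus_efficiency(words, ARBITRATION_CYCLES)
        print(f"{words:4d} words per block: {eff:.3f} of cycles carry data")
```

With these illustrative numbers, a single-word transfer uses only a fifth of the bus cycles for data, while a scan-line-sized block uses more than 99 percent, which is why a slower multicycle arbiter is tolerable for block-oriented traffic.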
The two principal components of the level-2 architecture are the PEs and the multibus system interconnecting them. The architecture and its
interface to the EOSP are illustrated in Figure 6. Each PE consists of a processor module with local memory, a global memory module, and bus
interface units. The processor (with local memory) interfaces to the sensor rate bus and accesses the two inter-PE buses through the associated
global memory element. Such an organization has several advantages. From the point of view of developing a testbed, all of the components
and interfaces can be implemented with available standardized commercial components. This could actually continue to be the case, with some
modifications, for a deployed version of the VIP. From a performance viewpoint, interspersing the processor between the global memory element
and the sensor rate bus is crucial. This global memory element can provide performance equivalent to a locally accessible private memory for
the local processor. At the same time, this element is available as globally accessible shared memory via the dual buses, and thus functions as a
true shared memory since the local processor is not in the path for global memory accesses from remote PEs. The price paid for this generality
is that the processor interface unit is in the path for data transfers from the EOSP, and the processor and memory share interfaces to the two
inter-PE buses. Considering the synchronous, predictable, block structured nature of communication between the level-1 architecture and the
PEs, this is not considered a significant disadvantage. Each memory element consists of fast access, static, random access memories (RAMs).
Single-port access to the memory is provided by the local bus interface and the two level-2 global bus interfaces. All three bus interfaces would
contend for port access on an equal priority basis.
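Equal-priority contention among the three bus interfaces for the single memory port could be realized in several ways. The sketch below assumes a round-robin policy, which is one common way to obtain equal priority. The text does not specify the arbitration scheme, and the class and port names here are hypothetical.

```python
from collections import deque


class RoundRobinArbiter:
    """Grants a single-port resource to requesters in rotating order.

    After a port is considered, it moves to the back of the rotation, so no
    interface can be starved by the other two.
    """

    def __init__(self, ports):
        self._order = deque(ports)

    def grant(self, requests):
        """Return the next requesting port in rotation, or None if idle."""
        for _ in range(len(self._order)):
            port = self._order[0]
            self._order.rotate(-1)  # move the considered port to the back
            if port in requests:
                return port
        return None


if __name__ == "__main__":
    # Hypothetical names for the local bus interface and the two level-2 buses.
    arbiter = RoundRobinArbiter(["local", "bus1", "bus2"])
    everyone = {"local", "bus1", "bus2"}
    print([arbiter.grant(everyone) for _ in range(4)])
```

When all three interfaces request continuously, each receives every third grant. A fixed-priority scheme would be simpler in hardware but would not satisfy the equal-priority requirement stated above.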
Each PE consists of a processor, bus interface units, local bus systems, and a global memory element, as illustrated in Figure 6. The processor
is a generic, 32-bit, single-chip computing element, such as the Motorola MC68030. Elements such as this can provide the computing
power necessary to satisfy the throughput requirements determined by the VIP ADAS simulations. A complete set of software development tools,
such as compilers, assemblers, and debuggers, is also typically available for such elements. The availability of such mature hardware/software
environments is particularly advantageous for the testbed development phase.
The processor bus interface unit controls data transfers between the sensor rate bus, the PE and local memory, and the global memory
element. Data may be transferred directly from the sensor rate bus to the global memory element. Data transfer may also occur between the
local program memory and the global memory element. Each hardware node interfaces with a number of bus structures. The first is the sensor
rate bus interface. The second is the intraprocessor bus system between the processor, local memory, and the bus interface unit. This would
likely be a generic asynchronous bus interface well suited to interconnection between the generic processor and local memory. Finally, there is
the local bus system between the processor unit and global memory element. A standard, 32-bit, asynchronous bus architecture, such as the
VME bus [8], would suffice for this latter bus. An asynchronous bus structure for this local bus simplifies the bus protocol and allows for fast
arbitration and capture of the system bus. This feature lowers PE dead time during a bus arbitration phase for single-word and short block
transfers. Use of block transfers after the bus arbitration phase supports block-level direct memory access between the sensor rate bus and the
global memory element.
A two-bus system architecture has been suggested for the VIP level-2 architecture. Global memory interconnection to the level-2 buses 1
and 2 is depicted in Figure 6. There are a number of important qualities that the level-2 bus should possess. From the simulation studies, this
bus system should provide a minimum average data transfer bandwidth of $ Mbytes/sec. This performance figure is not difficult to achieve with
many standardized bus architectures. A 32-bit data transfer bus width is preferred. This avoids the packing and unpacking of 32-bit data that
would otherwise typically be required. Further, the bus architecture should be processor independent and should allow a fairly large number of
modules to interconnect to the buses. The current system calls for 16 PEs. In addition to the processor hardware nodes, there may be communications
controllers that connect the VIP to the space station DMS via the level-2 bus.
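The preference for a 32-bit bus width can be illustrated by the extra work a narrower bus would impose. The sketch below assumes a hypothetical 16-bit bus on which each 32-bit word is sent as two half-words, high half first. The transfer order and the function names are assumptions made for illustration and are not taken from the VIP design.

```python
def split_word32(value: int) -> tuple[int, int]:
    """Unpack a 32-bit word into (high, low) 16-bit half-words for a narrow bus."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def join_word32(high: int, low: int) -> int:
    """Repack two 16-bit half-words into the original 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


if __name__ == "__main__":
    word = 0x12345678
    high, low = split_word32(word)  # two bus cycles instead of one
    print(hex(high), hex(low), hex(join_word32(high, low)))
```

On a 32-bit bus neither step is needed, and each word moves in a single transfer cycle.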
A standard bus architecture that could meet the requirements for a VIP level-2 bus architecture is the Multibus II [9] bus structure. The
Multibus II system bus is a high-performance, 32-bit bus capable of supporting up to 20 independent modules. This bus system is synchronous,
supports processor independence, and supports block-level data transfers.
4.3.3 Software Issues
Currently under development are a microcode compiler for the EOSP and Distributed ADA [10], a candidate for the level-2 architecture.
Capabilities have been successfully demonstrated on restricted problem sets. When finished, they will enable the full VIP (levels 1 and 2) to be
programmed in ADA. With respect to operating system issues, we feel that a modification of an existing operating system such as Hunter and
Ready's VRTX system will provide the functionality required of the VIP. Schemes for distributed task allocation and scheduling are either handled
within Distributed ADA or have been developed [6]. Finally, existing schemes are applicable for handling cache coherence and other problems
that may arise. This is primarily due to the embedded nature of the VIP applications.
5 Concluding Remarks
Overall, a VIP will serve as a valuable utility to crew members on the space station, enabling them to efficiently accomplish their mission
objectives and improve use of the space station resources, especially crew time. The architecture of the VIP is based on relatively mature technology,
one that will be stable before any future technology freeze date. Many of the systems issues can be resolved with existing hardware and software
technology. The overall effect is one of comparatively low risk with the prospect of increased efficiency in many space station applications.
6 References
1. Honeywell Systems and Research Center, Space Station Video Image Processor Concept Development, Final Report, 1985.
2. P. Symosek et al., Knowledge-Based Vision for Space Station Object Motion Detection, Recognition, and Tracking, Proceedings of the NASA
Workshop on Space Telerobotics, Pasadena, CA, January 1987.
3. Honeywell Systems and Research Center, Electro-Optical Signal Processor User Manual, 1986.
4. B. Lint and T. K. Agerwala, Communication Issues in the Design and Analysis of Parallel Algorithms, IEEE Transactions on Software
Engineering, vol. SE-7, March 1981, pp. 174-188.
5. S. H. Bokhari, On the Mapping Problem, IEEE Transactions on Computers, vol. C-30, March 1981, pp. 207-214.
6. Honeywell Systems and Research Center, Video Image Processor for the Space Station, Final Report, November 1986.
7. Space Station Work Package 2, Definition and Preliminary Design Phase, Habitability/Man Systems Report, vol. 13, NAS9-17365 DR-02,
1986.
8. Signetics, VME Bus Manufacturers Group, VME Bus Specification Manual, rev. B, August 1982.
9. Intel, Multibus II Bus Architecture Specification Handbook, 1984.
10. Honeywell Systems and Research Center, Honeywell Distributed ADA Project, Status Report, 1985.
Figure 1. Cross reference between applications and algorithms
[Table entries are not recoverable from the scanned source. Applications (rows): Construction; Satellite Servicing; Rendezvous and Proximity
Operations; Inspection; Payload Deployment/Retrieval; Experiment Monitoring; Data Management and Communications; Training. Algorithm
headings (columns), as far as legible: Color, Image Enhancement, Tracking, Surveillance, Identification, Proximity Operations, Bandwidth Reduction.]
Figure 2. VIP Conceptual Architecture
[Diagram not reproduced; legible labels: Control Bus, Control/Sync Bus.]
Figure 4. Bus-oriented, hypercube, and braided ring organizations
Figure 5. ADAS graph of the tracking and bandwidth reduction algorithm before parallelizing the components
Figure 5 cont. Example ADAS software graph corresponding to a parallelized component
