Method and systems for a radiation tolerant bus interface circuit by Kinstler, Gary A.
(12) United States Patent
Kinstler
(10) Patent No.: US 7,228,442 B2
(45) Date of Patent: Jun. 5, 2007
(54) METHOD AND SYSTEMS FOR A 
RADIATION TOLERANT BUS INTERFACE 
CIRCUIT 
(75) Inventor: Gary A. Kinstler, Torrance, CA (US) 
(73) Assignee: The Boeing Company, Chicago, IL 
(US) 
( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 464 days. 
(21) Appl. No.: 10/813,152 
(22) Filed: Mar. 30, 2004 
(65) Prior Publication Data 
US 2005/0223260 A1 Oct. 6, 2005
(51) Int. Cl.
G06F 1/00 (2006.01)
G06F 13/14 (2006.01)
(52) U.S. Cl. ....................................... 713/300; 710/305
(58) Field of Classification Search ................. 713/300
See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
4,013,938 A * 3/1977 McCoy .................... 363/56.01 
5,548,467 A * 8/1996 Heaney et al. ............. 361/93.7 
6,064,554 A * 5/2000 Kim ............................ 361/64 
6,067,628 A * 5/2000 Krithivas et al. ........... 713/340 
6,483,317 B1 * 11/2002 Floro et al. ................. 324/537 
6,516,418 B1 * 2/2003 Lee ............................ 713/320 
6,937,454 B2 * 8/2005 Mikolajczak et al. ....... 361/111 
6,963,985 B2 * 11/2005 Stachura et al. ............ 713/310 
2004/0229478 A1 * 11/2004 Chen ........................... 439/11 
2005/0060587 A1 * 3/2005 Hwang et al. .............. 713/300 
OTHER PUBLICATIONS 
Tai et al., COTS-Based Fault Tolerance in Deep Space: Qualitative 
and Quantitative Analyses of a Bus Network Architecture, 4th IEEE 
Intl. Symposium on High Assurance Systems Engineering, Nov. 
1999, pp. 1-8. 
* cited by examiner 
Primary Examiner-Chun Cao
Assistant Examiner-Ji H. Bae 
(74) Attorney, Agent, or Firm-Lee & Hayes, PLLC 
(57) ABSTRACT 
A bus management tool that allows communication to be 
maintained between a group of nodes operatively connected 
on two busses in the presence of radiation by transmitting 
periodically a first message from one to another of the nodes 
on one of the busses, determining whether the first message 
was received by the other of the nodes on the first bus, and 
when it is determined that the first message was not received 
by the other of the nodes, transmitting a recovery command 
to the other of the nodes on a second of the busses.
Methods, systems, and articles of manufacture consistent 
with the present invention also provide for a bus recovery 
tool on the other node that re-initializes a bus interface 
circuit operatively connecting the other node to the first bus 
in response to the recovery command. 
23 Claims, 11 Drawing Sheets 
https://ntrs.nasa.gov/search.jsp?R=20080009479 2019-08-30T03:38:23+00:00Z
U.S. Patent Jun. 5, 2007 Sheet 1 of 11 US 7,228,442 B2
[FIG. 1: drawing not recoverable from the extracted text]
[FIG. 3 (Sheet 3 of 11): bit assignments of control message 300]
Bit No.  Meaning
1        1 = Power Down Command, Channel A Link
2        1 = Power Down Command, Channel A PHY
3        1 = Power Down Command, Channel B Link
4        1 = Power Down Command, Channel B PHY
5        1 = Activate Current Surge Test for the Enabled Outgoing UART Source
6        0 = Enable Channel A for Outgoing UART Data, 1 = Enable Channel B
7        0 = Enable Monitoring of Link Current for Outgoing UART Data,
         1 = Enable PHY Current Monitoring
8        Reserved - Set to 0 in Control Latch
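
The bit assignments listed for FIG. 3 can be illustrated with a minimal Python sketch of how a control byte might be packed and unpacked. The sketch assumes that Bit No. 1 is the least significant bit, which the figure does not state, and the names ControlBits, build_control_message, and decode_control_message are hypothetical helpers introduced only for illustration.

```python
from enum import IntFlag


class ControlBits(IntFlag):
    """FIG. 3 bit assignments; Bit No. 1 is ASSUMED to be the LSB."""
    POWER_DOWN_A_LINK = 1 << 0  # Bit 1
    POWER_DOWN_A_PHY = 1 << 1   # Bit 2
    POWER_DOWN_B_LINK = 1 << 2  # Bit 3
    POWER_DOWN_B_PHY = 1 << 3   # Bit 4
    SURGE_TEST = 1 << 4         # Bit 5
    SELECT_CHANNEL_B = 1 << 5   # Bit 6: 0 = Channel A, 1 = Channel B
    MONITOR_PHY = 1 << 6        # Bit 7: 0 = Link current, 1 = PHY current
    RESERVED = 1 << 7           # Bit 8: set to 0 in the control latch


POWER_DOWN_MASK = 0x0F  # Bits 1-4 carry the four power down commands


def build_control_message(power_down=0, surge_test=False,
                          channel="A", monitor="LINK"):
    """Pack the requested settings into one control byte (hypothetical helper)."""
    if int(power_down) & ~POWER_DOWN_MASK:
        raise ValueError("power_down may only set Bits 1-4")
    if channel not in ("A", "B") or monitor not in ("LINK", "PHY"):
        raise ValueError("channel must be 'A' or 'B'; monitor 'LINK' or 'PHY'")
    message = int(power_down)
    if surge_test:
        message |= ControlBits.SURGE_TEST
    if channel == "B":
        message |= ControlBits.SELECT_CHANNEL_B
    if monitor == "PHY":
        message |= ControlBits.MONITOR_PHY
    return int(message)  # Bit 8 (reserved) is never set


def decode_control_message(message):
    """Unpack a control byte into named settings (hypothetical helper)."""
    if not 0 <= message <= 0xFF:
        raise ValueError("control message must fit in 8 bits")
    flags = ControlBits(message)
    return {
        "power_down": ControlBits(message & POWER_DOWN_MASK),
        "surge_test": bool(flags & ControlBits.SURGE_TEST),
        "channel": "B" if flags & ControlBits.SELECT_CHANNEL_B else "A",
        "monitor": "PHY" if flags & ControlBits.MONITOR_PHY else "LINK",
    }


if __name__ == "__main__":
    # Power down the Channel A PHY and select PHY current monitoring.
    msg = build_control_message(power_down=ControlBits.POWER_DOWN_A_PHY,
                                monitor="PHY")
    print(f"{msg:08b}")  # 01000010
```

Under the stated bit-order assumption, a request to power down the Channel A PHY while monitoring PHY current sets only Bits 2 and 7. If the hardware instead treats Bit No. 1 as the most significant bit, the shift amounts would need to be mirrored.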
[FIG. 4 (Sheet 4 of 11): timing diagram of a frame of messages running from Time = 0 to Time = t; legible message labels include S=1, S=2, S=3, S=4 and D=62, D=1. Drawing not recoverable from the extracted text.]
[FIG. 5 (Sheet 5 of 11): flow diagram; drawing not recoverable from the extracted text. Legible box labels include:]
Transmit “heartbeat” message on at least one bus to at least one other node
Transmit second “heartbeat” message to non-responsive other node on at least one bus
Transmit Recovery Command to the non-responsive other node on a second bus
[FIG. 6 (Sheet 6 of 11): drawing not recoverable from the extracted text]
[FIG. 7 (Sheet 7 of 11): drawing not recoverable from the extracted text]
[FIG. 8 (Sheet 8 of 11): flow diagram; drawing not recoverable from the extracted text. Legible box labels include:]
Reinitialize the bus interface circuit (e.g., PHY or Link controller) corresponding to upset bus for the node
Transmit Recovery Command to the non-responsive other node on a second bus
[FIG. 9 (Sheet 9 of 11): flow diagram; drawing not recoverable from the extracted text. Legible box labels include:]
Sense current level of corresponding bus interface circuit
Re-initialize the bus interface circuit
[FIG. 10: block diagram; drawing not recoverable from the extracted text. Legible block labels include: DC Isolation, Enable Logic, Latch, Selector, A/D, UART, Test Enable Logic, Current Sensor, PHY DC Regulator, Power Control, Processing Computer.]
[FIG. 11 (Sheet 11 of 11): drawing not recoverable from the extracted text]
METHOD AND SYSTEMS FOR A
RADIATION TOLERANT BUS INTERFACE
CIRCUIT
The invention described herein was made in the performance
of work under NASA Contract No. NAS8-01099 and is subject
to the provisions of Section 305 of the National Aeronautics
and Space Act of 1958 (72 Stat. 435; 42 U.S.C. 2457).
This application relies upon and incorporates by reference
U.S. patent application Ser. No. 10/813,296, entitled
“Methods and Systems for a Data Processing System Having
Radiation Tolerant Bus,” and filed on the same date herewith.
BACKGROUND OF THE INVENTION
The present invention relates to communication networks,
and, more particularly, to systems and methods for recovery
of communication to a node on a high speed serial bus.
High speed serial bus networks are utilized in automotive,
aircraft, and space vehicles to allow audio, video, and data
communication between various electronic components or
nodes within the vehicle. Vehicle nodes may include a
central computer node, a radar node, a navigation system
node, a display node, or other electronic components for
operating the vehicle.
Automotive, aircraft, and space vehicle manufacturers
often use commercial off-the-shelf (COTS) parts to implement
a high speed serial bus to minimize the cost for
developing and supporting the vehicle nodes and the serial
bus network. However, COTS parts for implementing a
conventional high speed serial bus network in a home to
connect a personal computer to consumer audio/video
appliances (e.g., digital video cameras, scanners, and
printers) are susceptible to errors induced by radiation,
which may be present in space (e.g., proton and heavy ion
radiation) or come from another vehicle having a radar
device (e.g., RF radiation).
Conventional methods of shielding high speed serial bus
networks and COTS parts from radiation do not adequately
protect against proton and heavy ion radiation. In
addition, conventional shielding may be damaged (e.g.,
during repair of a vehicle), permitting a radiation induced
latch-up error or upset error to occur. A COTS part
experiencing a radiation induced latch-up error typically
does not operate properly on the associated high speed bus
network. A COTS part experiencing a radiation induced upset
error typically communicates erroneous data to the
associated node or on the high speed bus network. Thus,
vehicles that use COTS parts to implement a conventional
high speed serial bus network are often susceptible to
radiation induced errors that may interrupt communication
between vehicle nodes, creating potential vehicle
performance problems.
For example, a conventional high-speed serial bus following
the standard IEEE-1394 (“IEEE-1394 bus”) allows a
personal computer to be connected to consumer electronics
audio/video appliances, storage peripherals, and portable
consumer devices for high speed multi-media communication.
However, when a conventional IEEE-1394 bus is
implemented in a vehicle using COTS parts, radiation from
another vehicle’s radar or radiation present in space may
cause a latch-up or upset error on the conventional
IEEE-1394 bus that often renders one or more of the
vehicle’s nodes inoperative.
Some conventional vehicles employ a second or redundant
high-speed serial bus to allow communication between
vehicle nodes to be switched to the redundant bus when a
“hard fail” (e.g., vehicle node ceases to communicate on the
first bus) occurs on the first bus. Radiation induced latch-up
errors often cause “hard fails” when COTS parts are used in
the vehicle nodes to implement the first and redundant
busses. For example, the U.S. Advanced Tactical Fighter
(ATF) aircraft has a redundant IEEE-1394 high-speed serial
bus network. But the ATF and other conventional vehicles
employing a redundant high-speed serial bus implemented
using COTS components are still typically susceptible to
radiation latch-up or upset errors and do not allow for
recovery of the primary bus when a “hard fail” occurs on
that bus.
Therefore, a need exists for systems and methods that
overcome the problems noted above and others previously
experienced for error recovery on a high speed serial bus.
SUMMARY OF THE INVENTION
In accordance with methods consistent with the present 
invention, a method in a data processing system is provided. 
The data processing system has a plurality of nodes opera- 
tively connected to a network having a plurality of busses 
and one of the nodes has a bus management tool. The 
method comprises: transmitting periodically a first message 
from one of the plurality of nodes to another of the nodes on 
a first of the plurality of busses of the network, determining 
whether the first message was received by the other of the 
nodes on the first bus, and when it is determined that the first 
message was not received by the other of the nodes, trans- 
mitting a recovery command to the other of the nodes on a 
second of the plurality of busses. 
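
The method summarized above (a periodic first message on a first bus, a check for receipt, and a recovery command on a second bus) can be illustrated with a minimal Python simulation. This is a sketch under stated assumptions, not the patented implementation: receipt is assumed to be detected by an acknowledgement, and SimulatedNode, heartbeat_cycle, the message dictionaries, and the bus names "primary" and "secondary" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedNode:
    """Stand-in for a node with one interface per bus (hypothetical model)."""
    node_id: int
    upset: set = field(default_factory=set)  # busses whose interface is upset

    def receive(self, bus, message):
        """Return an acknowledgement, or None if the interface on `bus` is upset."""
        if bus in self.upset:
            return None  # an upset interface neither receives nor replies
        if message["type"] == "recovery":
            # Re-initialize the interface named in the recovery command.
            self.upset.discard(message["target_bus"])
        return {"type": "ack", "node": self.node_id}


def heartbeat_cycle(nodes, first_bus="primary", second_bus="secondary"):
    """Run one heartbeat period; return ids of nodes sent a recovery command."""
    recovered = []
    for node in nodes:
        # Transmit the first message on the first bus.
        reply = node.receive(first_bus, {"type": "heartbeat"})
        # ASSUMPTION: a missing acknowledgement means "not received".
        if reply is None:
            # Transmit the recovery command on the second bus.
            node.receive(second_bus,
                         {"type": "recovery", "target_bus": first_bus})
            recovered.append(node.node_id)
    return recovered


if __name__ == "__main__":
    nodes = [SimulatedNode(2), SimulatedNode(3, upset={"primary"})]
    print(heartbeat_cycle(nodes))  # [3]
    print(heartbeat_cycle(nodes))  # []
```

In the simulation the second heartbeat period finds every node responsive because the recovery command, delivered over the unaffected bus, cleared the upset interface. A real system would also need to handle the case where both interfaces of a node are upset, which this sketch does not model.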
In accordance with articles of manufacture consistent with 
the present invention, a computer-readable medium contain- 
ing instructions causing a program in a data processing 
system to perform a method is provided. The data processing 
system has a plurality of nodes operatively connected to a 
network having a plurality of busses. The method comprises: 
transmitting periodically a first message from one of the 
plurality of nodes to another of the nodes on a first of the 
plurality of busses of the network, determining whether the 
first message was received by the other of the nodes on the 
first bus, and when it is determined that the first message was 
not received by the other of the nodes, transmitting a 
recovery command associated with the first bus to the other 
of the nodes on a second of the plurality of busses. 
In accordance with systems consistent with the present 
invention, a data processing apparatus is provided. The data 
processing apparatus comprises: a plurality of network inter- 
face cards operatively configured to connect to a network 
having a plurality of busses, each network interface card 
having a bus interface circuit operatively configured to 
connect to a respective one of the plurality of busses; a 
memory having a program that transmits periodically a first 
message to at least one of a plurality of nodes operatively 
connected to a first of the plurality of busses of the network, 
determines whether the first message was received by the 
other of the nodes on the first bus, and transmits a recovery 
command associated with the first bus to the other of the 
nodes on a second of the plurality of busses in response to 
determining that the first message was not received by the 
other of the nodes; and a processing unit for running the 
program. 
In accordance with systems consistent with the present 
invention, a network interface apparatus is provided. The 
network interface apparatus comprises: a bus interface cir- 
cuit for operatively connecting the network interface card to 
a bus; a power controller operatively connected to the bus 
interface circuit; a current sensor operatively connected to 
the bus interface circuit to sense a current level in the bus
interface circuit; and means for determining whether the
sensed current level exceeds a predetermined level and for
causing the power controller to cycle power to the bus
interface circuit in response to determining that the sensed
current level exceeds the predetermined level.
In accordance with methods consistent with the present
invention, a method in a data processing system is provided.
The data processing system includes a network having a bus.
The method comprises: sensing a current level in a bus
interface circuit operatively connecting a node on the
network to the bus; determining whether the sensed current
level exceeds a predetermined level; and re-initializing the
bus interface circuit in response to determining that the
sensed current level exceeds the predetermined level.
In accordance with articles of manufacture consistent with
the present invention, a computer-readable medium containing
instructions causing a program in a data processing
system to perform a method is provided. The data processing
system includes a network having a bus. The method
comprises: sensing a current level in a bus interface circuit
operatively connecting a node on the network to the bus;
determining whether the sensed current level exceeds a
predetermined level; and re-initializing the bus interface
circuit in response to determining that the sensed current
level exceeds the predetermined level.
Other systems, methods, features, and advantages of the
present invention will be or will become apparent to one
with skill in the art upon examination of the following
figures and detailed description. It is intended that all such
additional systems, methods, features, and advantages be
included within this description, be within the scope of the
invention, and be protected by the accompanying claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in
and constitute a part of this specification, illustrate an
implementation of the present invention and, together with
the description, serve to explain the advantages and
principles of the invention. In the drawings:
FIG. 1 depicts a block diagram of a vehicle data processing
system having a bus management tool and a bus recovery
tool suitable for practicing methods and implementing
systems consistent with the present invention;
FIG. 2 depicts an exemplary block diagram of a bus
interface recovery circuit suitable for use with methods and
systems consistent with the present invention;
FIG. 3 depicts an exemplary control message that may be
sent from the bus recovery tool of FIG. 1 to a bus interface
recovery circuit of a node to control the operation of the bus
interface recovery circuit;
FIG. 4 depicts an exemplary timing diagram for a frame
of messages generated by nodes in the data processing
system of FIG. 1;
FIG. 5 depicts a flow diagram illustrating an exemplary
process performed by the bus management tool in FIG. 1 to
detect a bus interface circuit of a node that is experiencing
a radiation induced latch-up or upset error on a bus and to
recover communication on the bus to the node;
FIG. 6 depicts another exemplary timing diagram for a
frame of messages generated by nodes in the data processing
system of FIG. 1 in which the bus management tool
selectively transmits a “heartbeat” message to nodes of the
system;
FIG. 7 depicts an exemplary timing diagram of a frame on
a bus in which the bus management tool transmits a recovery
command in a message to a node experiencing a radiation
induced latch-up or upset error on another bus;
FIG. 8 depicts a flow diagram illustrating an exemplary
process performed by the bus recovery tool in FIG. 1 to clear
a radiation induced latch-up or upset error detected by the
bus management tool in FIG. 1;
FIG. 9 depicts a flow diagram illustrating another
exemplary process performed by the bus recovery tool of a node
to detect a bus interface circuit of the node that is
experiencing a radiation induced latch-up or upset error on a
bus and to clear the detected latch-up or radiation induced
upset condition;
FIG. 10 depicts an exemplary block diagram of another
bus interface recovery circuit suitable for use with methods
and systems consistent with the present invention; and
FIG. 11 depicts a block diagram of another vehicle data
processing system having a bus management tool and a bus
recovery tool suitable for practicing methods and
implementing systems consistent with the present invention.
DETAILED DESCRIPTION OF THE
INVENTION
Reference will now be made in detail to an implementation
in accordance with methods, systems, and products
consistent with the present invention as illustrated in the
accompanying drawings. The same reference numbers may
be used throughout the drawings and the following
description to refer to the same or like parts.
FIG. 1 depicts a block diagram of a data processing
system 100 implemented in a vehicle, such as an automotive,
aircraft or space vehicle, and suitable for practicing
methods and implementing systems consistent with the
present invention. The data processing system 100 includes
a plurality of nodes 102a-102n operatively connected to a
network 104 having a primary bus 106 and a secondary bus
108. In one implementation, each node 102a-102n corresponds
to a separate electronic component within the vehicle. As
explained in detail below, one of the nodes 102a is a data
processing apparatus operatively configured to manage
communication between the nodes 102a-102n and to detect
and recover from a radiation-induced bus error, such as a
node experiencing a latch-up or radiation induced upset
condition, on the network 104.
Each node 102a-102n has at least two bus interface
circuits (e.g., circuits 110 and 112) to operatively connect the
respective node 102a-102n to both the primary bus 106 and
the secondary bus 108. In the implementation shown in FIG.
1, each node 102a-102n has a physical layer (PHY)
controller 110 operatively connected to the primary bus 106 and
a PHY controller 112 operatively connected to the secondary
bus 108. Furthermore, each node 102a-102n has a link layer
(LINK) controller 114 or 116 operatively connected to a
respective PHY controller 110 or 112. The PHY controller
and the LINK controller for each bus (e.g., circuits 110, 114
for the primary bus and circuits 112, 116 for the secondary
bus) may be incorporated into a single bus interface circuit
(not shown in figures). The PHY controllers 110 and 112 and
the LINK controllers 114 and 116 are configured to support
known protocols for open system architecture or
interconnection of applications performed on or by the respective
nodes 102a-102n. The protocols may follow the established
Open Systems Interconnect (OSI) seven-layer model for a
communication network defined by the International
Standards Organization (ISO) to allow heterogeneous products
(e.g., vehicle nodes) to exchange data over a network (e.g.,
network 104).
In particular, each PHY controller 110 and 112 may be
operatively configured to send and receive data packets or
messages on the respective bus 106 or 108 of the network
104 in accordance with the communication protocol of the
bus (e.g., IEEE-1394b cable based network protocol)
and the physical characteristics of the bus, such as fiber
optic or copper wire. Each PHY controller 110 and 112 may
also be configured to monitor the condition of the respective
bus 106 or 108 as needed for determining connection status
and for initialization and arbitration of communication on
the respective bus. Each PHY controller 110 and
112 may be any COTS PHY controller, such as a Texas
Instruments 1394b Three-Port Cable Transceiver/Arbiter
(TSB81BA3) configured to support known IEEE-1394b
standards.
Each LINK controller 114 and 116 is operatively configured
to encode and decode data into meaningful data packets or
messages and to handle frame synchronization for the
respective node 102a-102n. Each LINK controller 114 and 116
may be any COTS LINK controller, such as a Texas
Instruments 1394b OHCI-Lynx Controller (TSB82AA2)
configured to support known IEEE-1394b standards.
Each node 102a-102n also has a data processing computer
118, 120, and 122 operatively connected to the two bus
interface circuits (e.g., circuits 110, 112, or circuits 110, 114
and 112, 116) via a second network 124. The second network
124 may be any known high speed network or backplane
capable of supporting audio and video communication as
well as asynchronous data communication within the node
102a-102n, such as a compact peripheral component
interconnect (cPCI) backplane, local area network (“LAN”),
WAN, Peer-to-Peer, or the Internet, using standard
communications protocols. The second network 124 may
include hardwired as well as wireless branches.
Each node 102a-102n also has a bus interface recovery
circuit 126 and 128 operatively connected between the data
processing computer 118, 120, and 122 and a respective bus
interface circuit (e.g., circuits 110 and 112, or circuits
110, 114 and 112, 116). In one implementation, one bus
interface recovery circuit (e.g., 126) may be operatively
connected to both bus interface circuits of the node
102a-102n. In another implementation, the PHY controller
110 or 112, the LINK controller 114 or 116, and the bus
interface recovery circuit 126 or 128 may be incorporated
into a single network interface card 127 or 129.
As explained in detail below, each bus interface recovery 
circuit 126 and 128 is configured to sense a radiation 
induced glitch or current surge (e.g., a short circuit condi- 
tion) on a respective interface circuit 110, 112, 114, or 116, 
which may cause the bus interface circuit that is operatively 
connected to the respective bus to latch-up (such that the bus 
interface circuit may no longer properly communicate on the 
bus 106 or 108) or experience a radiation induced upset 
(such as a single event functional interrupt which may 
disrupt a control register) where the bus interface circuit 
may no longer communicate on the bus 106 or 108. Each bus 
interface recovery circuit 126 and 128 may automatically 
re-initialize the bus interface circuit or report the radiation 
induced error to the data processing computer 118, 120, and 
122 for further processing. 
As shown in FIG. 1, each data processing computer 118,
120, and 122 includes a central processing unit (CPU) 130,
a memory 132, 134, or 136, and an I/O device 138. Each
I/O device 138 is operatively configured to connect the
respective computer 118, 120, and 122 to the second
network 124 and to the respective bus interface recovery
circuits 126 and 128 of the node 102a-102n. Each data
processing computer 118, 120, and 122 may also include a
secondary storage device 140 to store data packets or
applications accessible by CPU 130 for processing in
accordance with methods and systems consistent with the
present invention.
Memory in one of the data processing computers (e.g.,
memory 132 of data processing computer 118) stores a bus
management program or tool 142. As described in more
detail below, the bus management tool 142 in accordance
with systems and methods consistent with the present
invention detects a bus interface circuit 110, 112, 114, or 116
of a node 102a-102n that is experiencing a latch-up or
radiation induced upset condition on a bus 106 or 108 and
causes the corresponding bus interface recovery circuit 126
or 128 to clear the latch-up or radiation induced upset
condition so that communication on the bus 106 or 108 via
interface circuit 110, 112, 114, or 116 to the node 102a-102n
is maintained or re-established. The same memory 132 that
stores the bus management tool 142 may also store a
recovery command 143. As described herein, the bus
management tool 142 may transmit the recovery command 143
in a message on one bus (e.g., either the primary bus 106 or
the secondary bus 108 not affected by radiation) to another
node 102b-102n to cause the other node to clear the
radiation induced latch-up or upset condition associated with
its bus interface circuit (e.g., circuits 110, 114, or both) so
that the other node can maintain communication on both
busses 106 and 108.
Memory 132, 134, and 136 in each of the data processing
computers 118, 120, and 122, respectively, stores a bus
recovery program or tool 144 used in accordance with
systems and methods consistent with the present invention
to respond to a recovery command 143 and to allow the bus
management tool 142 to communicate with the bus interface
recovery circuit 126 and 128 for each node 102a-102n as
described herein.
Bus recovery tool 144 is called up by each CPU 130 from
memory 132, 134, and 136 as directed by the respective
CPU 130 of nodes 102a-102n. Similarly, bus management
tool 142 and the recovery command 143 are called up by the
CPU 130 of node 102a from memory 132 as directed by the
CPU 130 of node 102a. Each CPU 130 operatively connects
the tools and other programs to one another using a known
operating system to perform operations as described below.
In addition, while the tools or programs are described as
being implemented as software, the present implementation
may be implemented as a combination of hardware and
software or hardware alone.
Although aspects of methods, systems, and articles of
manufacture consistent with the present invention are
depicted as being stored in memory, one having skill in the
art will appreciate that these aspects may be stored on or
read from other computer-readable media, such as secondary
storage devices, including hard disks, floppy disks, and
CD-ROM; or other forms of ROM or RAM either currently
known or later developed. Further, although specific
components of data processing system 100 have been described,
one skilled in the art will appreciate that a data processing
system suitable for use with methods, systems, and articles
of manufacture consistent with the present invention may
contain additional or different components.
FIG. 2 depicts an exemplary block diagram of the bus
interface recovery circuit 126 for node 102a. The
components of bus interface recovery circuits 126 and 128 for
each node 102a-102n suitable for implementing the methods
and systems consistent with the present invention may be the
same. Thus, for the sake of brevity, only the components of
bus interface recovery circuit 126 depicted in FIG. 2 shall be
discussed in detail, as one having skill in the art will
appreciate.
As shown in FIG. 2, the bus interface recovery circuit 126
includes a terminal 202 for data communication connection
to the data processing computer 118 of node 102a, a current
sensor 204, and a power controller 206. Both the current
sensor 204 and the power controller 206 are operatively
connected to the terminal 202 and to at least one interface
circuit (e.g., PHY controller 110). The current sensor 204
may be any known current sensing device including a
current sensing resistor (e.g., a 0.1 ohm series resistor) or
any sensor measuring current based on the magnetoresistive
effect.
In the implementation shown in FIG. 2, the bus interface
recovery circuit has a second current sensor 208 and a
second power controller 210 that are both operatively
connected to the terminal 202. Each current sensor 204 and 208
is operatively configured to sense a current level in or to the
respective bus interface circuit, PHY controller 110 and Link
controller 114, and to report the current level to the data
processing computer 118 via the terminal 202. Each power
controller 206 and 210 is operatively configured to switch
power on or off to the respective bus interface circuit, PHY
controller 110 and Link controller 114, in response to a
corresponding signal 212 and 214 received from the data
processing computer via terminal 202. Each power
controller 206 and 210 may source up to 1000 mA.
Thus, bus interface recovery circuits 126 and 128 allow
the bus recovery tool 144 of each data processing computer
118, 120, and 122 to sense or monitor the current level on
(e.g., current drawn by or through) PHY controller 110 and
Link controller 114 of the nodes 102a-102n. In addition,
when the sensed current level exceeds a predetermined level
(e.g., 200 milliamps corresponding to a radiation-induced
glitch or short circuit), the bus interface recovery circuit 126
and 128 allows the bus recovery tool 144 to re-initialize or
cycle power to the respective bus interface circuit, PHY
controller 110 and Link controller 114. The bus recovery
tool may sense a current level, determine that the current
level exceeds a predetermined level, and cycle power to the
respective bus interface circuit in a period equal to or greater
than 10 milliseconds in accordance with methods consistent
with the present invention. The period is based on, among
other things, power ramp up and down time constraints of
the power controllers 206 and 210.
FIG. 3 depicts an exemplary assignment of bits in a
control message 300 that may be sent by the bus recovery
tool 144 of the data processing computer 118 to the bus
interface recovery circuit 126 via terminal 202 for control- [...]
In the implementation shown in FIG. 2, terminal 202 is
adapted for serial data communication connection, such as
RS-232, RS-485, or I2C, to data processing computer 118 or
to the bus management tool 142. In this implementation, the
bus interface recovery circuit 126 further comprises a
Universal Asynchronous Receiver-Transmitter (UART) 218.
The UART 218 is operatively connected between the
terminal 202 and the latch 216 such that bits in the control
message 300 in FIG. 3 are received serially by the UART
from the data processing computer 118 via an input serial
bus 148 and then separately latched or stored in the latch
216.
In an alternative implementation, a multi-drop bus, such
as an I2C bus, creates the second bus that is used to connect
the bus management tool 142 in node 102a to a plurality of
or all other nodes 102b-102n.
As shown in FIGS. 1 and 2, each data processing
computer 118, 120, and 122 may control respective bus interface
recovery circuits 126 and 128 (configured as Channel A and
B, or vice versa) via the same input serial bus 148.
The bus interface recovery circuit 126 may also include a
switch or multiplexer 220 having an input 222 and
operatively connected between the UART 218 and the current
sensors 204 and 208. The multiplexer 220 is operatively
configured to selectively allow one of the current sensors
204 or 208 to report the respective sensed current level to the
data processing computer 118 via UART 218 based on input
222. Input 222 may be operatively connected to latch 216 so
that an enable signal transmitted by bus recovery tool 144,
such as Bit 7 in control message 300 in FIG. 3, causes
multiplexer 220 to select one of the current sensors 204 or
208.
In one implementation, the UART 218 is configured to
read latch 216 and report the current control message 300
stored in latch 216 as well as report the sensed current level
from the selected current sensor 204 or 208 via an output
serial bus 146. As shown in FIGS. 1 and 2, each data
processing computer 118, 120, and 122 may receive the
sensed current level from respective bus interface recovery
circuits 126 and 128 (configured as Channel A and B, or vice
versa) via the same output serial bus 146.
The bus recovery tool 144 of the data processing
computer 118 may provide a second enable signal 224 (e.g., Bit
6 in FIG. 3 to identify the channel for the network interface
card 127) to the bus interface recovery circuit 126 to
selectively cause the bus interface recovery circuit 126 to
report the sensed current level from the selected current
sensor 204 or 208 via terminal 202.
In the implementation shown in FIG. 2, the bus interface 
recoverv circuit 126 also includes a tri-state controller 226 
ling operation of the bus interface recovery circuit. In the operatively connected between the terminal 202 and the 
implementation shown in FIG. 3, Bits 1 and 2 of control UART 218 and operatively configured to selectively allow 
message 300 correspond to respective signals 214 and 212 either bus interface circuit 126 or 128 to apply its output data 
received by Link controller 114 and PHY controller 110 on the shared output serial bus 146. 
when the bus interface recovery circuit 126 is configured to 55 The bus interface recovery circuit 126 may also include 
connect to channel A or the primary bus 106 of the network an output enable logic 228 circuit and a switch 232 having 
104. Bits 3 and 4 of the control message 300 may correspond an output 234 that identifies whether the bus interface 
to respective signals 214 and 212 received by Link control- recovery circuit 126 is to operate on a “Channel A” (e.g., 
ler 114 and PHY controller 110 when the bus interface primary bus 106), or on a “Channel B” (e.g., secondary bus 
recovery circuit 126 is configured to connect to channel B or 60 108) in the data processing system 100. The output enable 
the secondary bus 108 of the network 104. logic 228 is operatively connected to trigger tri-state con- 
Returning to FIG. 2, the bus interface recovery circuit 126 troller 226 to allow UART 218 to report the sensed current 
may include a latch 216 operatively connected between the based upon the output 234 of switch 232 and a state 
terminal 202 and the power controllers 206 and 210. The associated with enable signal 224 (e.g., Bit 6 in FIG. 3). For 
latch 216 is adapted to latch or store the bits of the control 65 example, the bus recovery tool 144 may transmit the enable 
message 300. The control message 300 may be received 224 signal in an active low state as an indication to enable 
either serially or in parallel via terminal 202. output of UART 218 if the output 234 of switch 232 reflects 
“Channel A.” The bus recovery tool 144 may then transmit the enable signal 224 in an active high state as an indication to enable output of UART 218 if the output 234 of switch 232 reflects “Channel B.”
Returning to FIG. 2, the bus interface recovery circuit 126 may also include a bus switch 236, such as a Texas Instruments switch SN74CBTLV16211, that allows the data processing computer 118, 120, and 122 to isolate the bus interface circuits 110 and 112 when a current surge is detected in one or both of these circuits 110 and 112. In the implementation shown in FIG. 2, the bus switch is operatively connected to the signal 214 used to turn power on or off to the Link controller 114, such that Link controller 114 and PHY controller 110 are isolated from the data processing computer 118, 120, and 122 when power is turned off to the Link controller 114.
In addition, the bus interface recovery circuit 126 or the network interface card 127 may include a first bus isolation device 238 operatively connecting the PHY controller 110 to the Link controller 114 and a second isolation device 240 operatively connecting the PHY controller 110 to the bus 106. The bus isolation devices 238 and 240 may be capacitors in series with data lines corresponding to bus 106. The bus isolation devices 238 and 240 inhibit a current from Link controller 114 or bus 106, which could otherwise maintain a latch-up condition in PHY controller 110.
The bus interface recovery circuit 126 also may include a test enable logic 242 circuit that receives a test enable signal 244 from the bus recovery tool 144 of the respective data processing computer 118, 120, or 122 via latch 216. Test enable logic 242 has a first output 246 operatively connected to the current sensor 208 and a second output 248 operatively connected to the current sensor 204. Test enable logic 242 is operatively configured to send a test signal, such as a ground signal, on the first output 246 and/or the second output 248 to cause the respective current sensor 208 to report a current surge or short circuit in the respective bus interface circuit, Link controller 114 and PHY controller 110. In one implementation, test enable signal 244 may comprise a collection of signals corresponding to Bits 5 and 7 of Command 300 in FIG. 3. In this implementation, test enable logic 242 sends a test signal on the first output 246 to current sensor 208 when Bit 5 is set to enable a current surge test and Bit 7 is set to select receiving the sensed current level of the Link controller 114. Similarly, test enable logic 242 sends a test signal on the second output 248 to current sensor 204 when Bit 5 is set to enable a current surge test and Bit 7 is set to select receiving the sensed current level of the PHY controller 110. Thus, the bus recovery tool 144 of each data processing computer 118, 120, and 122 is able to perform a test on whether each current sensor 204 and 208 as well as upstream hardware and software components are operative for identifying a radiation-induced error.
Turning to FIG. 4, an exemplary timing diagram 400 is depicted for a frame 402 of messages generated by nodes 102a-102n under the supervision of bus management tool 142 using methods and systems consistent with the present invention. Messages in the frame 402 are generated following the communication protocol of busses 106 and 108, such as the IEEE-1394b standard protocol. As shown in FIG. 4, the data processing system 100 is operatively configured to allow nodes 102a-102n to generate isochronous messages 404, 406 (e.g., for transfer of video or audio up to a predetermined bandwidth) and asynchronous messages 408, 410 within each frame 402. Nodes 102a-102n may be configured to provide a handshake acknowledge message (not shown in frame 402 of FIG. 4) in response to each of the asynchronous messages 408, 410 directed to and received by the respective node 102a-102n. In one implementation, nodes 102a-102n do not provide a handshake acknowledge message in response to an asynchronous message 408, 410 when the asynchronous message 408, 410 is transmitted using a broadcast channel number as discussed below.
Within data processing system 100, each node 102a-102n is assigned a respective one of a plurality of channel numbers so that each node 102a-102n may selectively direct a message in frame 402 to another node 102a-102n. In the implementation shown in FIG. 4, data processing system 100 has 4 nodes (e.g., nodes 102a-102n) that are each assigned a different channel number. Each message of frame 402 has a header (not shown in FIG. 4) including a destination channel number reflecting the destination of the respective message. For example, message 412 of frame 402 has a header that includes a destination channel number 414 that indicates message 412 is directed to channel number “1,” assigned to node 102a. The header of each message of frame 402 may also include a source channel number reflecting the source of the respective message. Continuing with the example depicted in FIG. 4, message 412 of frame 402 has a source channel number 416 indicating that message 412 was transmitted by the node 102b-102n assigned to channel number “2” (e.g., node 102b).
Any channel number not assigned to nodes 102a-102n may be assigned as a broadcast channel to direct a message to each node in data processing system 100 other than the node transmitting the message. For example, in the implementation shown in FIG. 4, data processing system 100 is configured such that channel number 62 is assigned as a broadcast number and node 102a transmits message 418 with channel number 62 as the destination channel number, directing other nodes 102b-102n to respond to message 418.
As shown in FIG. 4, the data processing system 100 may be further configured so that each frame 402 has a duration of time t corresponding to a nominal refresh rate for all nodes 102a-102n to generate the messages in frame 402, such as 10 ms duration for a 100 Hz refresh rate. Frame 402 may be subdivided into a number of minor frames 420, 422 of a duration that is an integral multiple of the cycle period or length for the busses 106 and 108. For example, in one implementation in which the communication protocol of bus 106 and 108 corresponds to IEEE-1394 standard protocol, the cycle length is 125 microseconds. In this implementation, the frame 402 may have ten minor frames 420, 422 and each minor frame 420, 422 may have eight cycles (e.g., cycles 424, 426, and 428) having a cycle length of 125 microseconds such that each minor frame has a duration of 1 millisecond.
Each node 102a-102n may be assigned one or more minor frame numbers in which it is authorized to arbitrate for the bus 106 and 108 to transmit an asynchronous message 408 and 410. For example, in the implementation shown in FIG. 4, node 102a is assigned channel number “1” and assigned to arbitrate for the bus 106 and 108 in minor frames 420 and 422 to transmit message 418 and message 440, respectively. In addition, multiple nodes may be assigned to any minor frame 420, 422 or in any cycle 424, 426, and 428 in accordance with a predetermined amount of messages to be transmitted by the nodes 102a-102n on the bus 106 or 108.
The bus management tool 142 may be configured to authorize the allocation of bandwidth to any node 102a-102n requesting to transmit an isochronous message 404 or 406, to transmit a synchronization message (not
shown in FIG. 4) at the beginning of each frame, and to 
transmit a cycle start message (not shown in FIG. 4) at the 
beginning of each minor frame. 
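The frame arithmetic and channel addressing described for FIG. 4 can be checked with a short sketch. It is only an illustration under stated assumptions, not the patented implementation: the 125 microsecond cycle, eight cycles per minor frame, ten minor frames per frame, and broadcast channel number 62 come from the text, while the Node type, the helper function names, the node labeled 102c, the numbering of minor frames as indices 0 through 9, and the minor frame assignments other than two minor frames for node 102a are hypothetical.

```python
from dataclasses import dataclass, field

# Values stated in the text for the IEEE-1394 example.
CYCLE_US = 125
CYCLES_PER_MINOR_FRAME = 8
MINOR_FRAMES_PER_FRAME = 10
BROADCAST_CHANNEL = 62

# Derived durations: 8 x 125 us = 1 ms per minor frame, 10 x 1 ms = 10 ms per frame.
MINOR_FRAME_US = CYCLE_US * CYCLES_PER_MINOR_FRAME
FRAME_US = MINOR_FRAME_US * MINOR_FRAMES_PER_FRAME
REFRESH_HZ = 1_000_000 // FRAME_US


@dataclass(frozen=True)
class Node:
    """Hypothetical record of a node's channel number and assigned minor frames."""
    name: str
    channel: int
    minor_frames: frozenset = field(default_factory=frozenset)


def minor_frame_at(t_us: int) -> int:
    """Index (0-9, an assumed numbering) of the minor frame containing time t_us."""
    return (t_us % FRAME_US) // MINOR_FRAME_US


def may_arbitrate(node: Node, t_us: int) -> bool:
    """True if the node is assigned the minor frame that contains t_us."""
    return minor_frame_at(t_us) in node.minor_frames


def recipients(nodes, source: Node, dest_channel: int):
    """Nodes addressed by a message header's destination channel number."""
    if dest_channel == BROADCAST_CHANNEL:
        # A broadcast reaches every node other than the one transmitting.
        return [n for n in nodes if n.channel != source.channel]
    return [n for n in nodes if n.channel == dest_channel]


# Four nodes with channel numbers 1 through 4; the minor frame sets are assumed.
nodes = [
    Node("102a", 1, frozenset({0, 1})),
    Node("102b", 2, frozenset({2})),
    Node("102c", 3, frozenset({3})),
    Node("102n", 4, frozenset({4})),
]

print(MINOR_FRAME_US, FRAME_US, REFRESH_HZ)
print([n.name for n in recipients(nodes, nodes[0], BROADCAST_CHANNEL)])
```

Deriving the frame length from the cycle length, rather than hard-coding 10 ms, keeps each minor frame an integral multiple of the bus cycle period, as the description requires.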
Turning to FIG. 5, a flow diagram is shown that illustrates a process performed by the bus management tool 142 of node 102a to detect a bus interface circuit of a node 102a-102n that is experiencing a latch-up or radiation-induced upset error on a bus 106 or 108 and to recover communication on the bus 106 or 108 to the respective node 102a-102n. Initially, the bus management tool 142 of node 102a transmits a “heartbeat” or first message on one or both of the busses 106 and 108 to at least one other node 102b-102n. (Step 502) The “heartbeat” message is at least one of the plurality of messages (e.g., isochronous messages 404, 406 and asynchronous messages 408, 410) transmitted by the nodes 102a-102n in frame 402. The bus management tool 142 may transmit the “heartbeat message” 418 once each frame 402 or once each minor frame 420 and 422 to one node or to all nodes (e.g., via a broadcast message). For example, the bus management tool 142 of node 102a may transmit the “heartbeat” message as broadcast message 418 of frame 402 so that each other node 102b-102n may be expected to respond to the “heartbeat” message on one or both busses 106 and 108 during its response period within each frame. In the implementation shown in FIG. 4, nodes 102b-102n are assigned channel numbers “2” through “4” and are configured to respond to the “heartbeat” message 418 by transmitting a handshake acknowledge message or a respective message (e.g., messages 412, 442, and 444) in the minor frame 420, 422 assigned to each node 102b-102n.
Alternatively, the bus management tool 142 of node 102a may individually transmit the “heartbeat message” to other nodes 102b-102n in the data processing system 100. For example, in the implementation shown in FIG. 6, the bus management tool 142 is configured to transmit separate “heartbeat messages” (e.g., collectively referenced as 602) on bus 106 or 108 to nodes 102b-102n in the frame 604. Each of the nodes 102b-102n receiving the “heartbeat message” 602 may subsequently respond by transmitting a respective handshake acknowledge message (e.g., messages 608, 610, and 612) to the bus management tool 142 hosted on node 102a.
Returning to FIG. 5, after transmitting the “heartbeat” message, the bus management tool 142 determines whether the “heartbeat” message was received by the other of the nodes on the first bus (e.g., bus 106 or 108). (Step 504) If the “heartbeat” message has been transmitted on both busses 106 and 108, the bus management tool may determine whether the “heartbeat” message was received by the other of the nodes on each of the busses 106 and 108. As shown in FIG. 4, the bus management tool 142 may determine that the “heartbeat” message (e.g., 418) was not received by the other nodes 102b-102n if the other nodes 102b-102n fail to transmit the respective reply message (e.g., messages 412, 442, and 444) in the response period or minor frame assigned to each node 102b-102n. Alternatively, the bus management tool 142 may determine that the “heartbeat” message was not received, if the other nodes 102b-102n fail to respond to a respective “heartbeat message” (e.g., respective one of “heartbeat” messages 602 in FIG. 6) within a predetermined period. The bus management tool 142 may also determine that the “heartbeat” message was not received if the handshake acknowledge message or respective reply message (e.g., messages 412, 442, 444, 608, 610, and 612) identifies a communication error has occurred in association with the “heartbeat” message, such as a checksum error.
If the “heartbeat” message was received, the bus management tool 142 may continue processing at step 502. Thus, the bus management tool 142 is able to continually monitor for any node 102a-102n experiencing a latch-up or radiation induced upset condition on bus 106 or 108 by periodically transmitting a “heartbeat” message to each node 102b-102n on busses 106 and 108.
If the “heartbeat” message was not received, the bus management tool 142 may transmit a second “heartbeat” message to the non-responsive node on the first and/or second bus (e.g., bus 106 or 108). (Step 506) In one implementation, the bus management tool 142 waits until the next frame 402 to transmit the second “heartbeat” message. Alternatively, the bus management tool 142 may transmit the second “heartbeat” message when node 102a or the node hosting the bus management tool 142 is able to gain access to bus 106 or 108.
Next, the bus management tool 142 determines whether the second “heartbeat” message was received by the non-responsive nodes on the first bus (e.g., bus 106 or 108). (Step 508) The bus management tool 142 may determine that the second “heartbeat” message was received using the same techniques discussed above for the first “heartbeat” message.
If the second “heartbeat” message was received, the bus management tool 142 may continue processing at step 502.
If the second “heartbeat” message was not received, the bus management tool 142 transmits a recovery command to the non-responsive other node on a second of the plurality of busses. (Step 510) The bus management tool 142 may have previously performed the process 500 to verify that the other node is not experiencing a radiation induced error on the second bus. For example, assuming frame 402 in FIG. 4 is transmitted on primary bus 106 and node 102b (assigned to channel number “2” in this example) fails to transmit message 412 in response to “heartbeat” message 418 or transmits message 412 with an indication that a communication error occurred with “heartbeat” message 418, then the bus management tool 142 may transmit recovery command 143 in a message 702 in a frame 704 on the secondary or unaffected bus 108 as shown in FIG. 7. The message 702 may be transmitted by the bus management tool 142 when the node 102 is next granted access to the secondary or unaffected bus 108. As discussed in further detail below, the non-responsive other node (e.g., node 102b) is configured to re-initialize or cycle power to a bus interface circuit (e.g., PHY controller 110 and/or Link controller 114) operatively connecting the other node to the first bus (e.g., the bus 106 on which node 102b is experiencing a radiation induced error) in response to receiving the recovery command on the second bus (e.g., the bus 108 on which node 102b is not experiencing a radiation induced error).
After transmitting the recovery command to the non-responsive other node, the bus management tool 142 may then terminate processing. The bus management tool 142 may continue to perform the process depicted in FIG. 5 to verify communication is re-established with the non-responsive other node (e.g., node 102b) on the first bus (e.g., the primary bus 106) and to maintain communication on both busses 106 and 108 for all nodes 102a-102n.
FIG. 8 depicts a flow diagram illustrating an exemplary process performed by the bus recovery tool 144 of a node (e.g., node 102b) to clear a bus interface circuit of the node that is experiencing a radiation induced latch-up or upset
error on a bus 106 or 108 as detected by the bus management tool 142. Initially, the bus recovery tool 144 of the node determines whether a recovery command 143 has been received on one of the busses 106 or 108. (Step 802) If a recovery command 143 has not been received on one of the busses 106 or 108, the bus recovery tool 144 may end processing. Alternatively, in one implementation, the bus management tool 142 is configured to thread or perform processes in parallel, and thus may continue processing at step 802.
In the example shown in FIG. 7, the bus recovery tool 144 of node 102b may determine that the recovery command 143 was received in message 702 in frame 704 on the secondary bus 108 after the bus management tool 142 has performed the process in FIG. 5 to detect that PHY controller 110 of node 102b, Link controller 114 of node 102b, or both are experiencing a radiation induced latch-up or upset error on primary bus 106.
If a recovery command 143 has been received on one of the busses 106 or 108, the bus recovery tool 144 re-initializes or cycles power to the bus interface circuit (e.g., PHY controller or Link controller) corresponding to the second or other bus of the node experiencing a radiation induced error. (Step 804) Continuing with the example of FIG. 7, the bus recovery tool 144 of node 102b may re-initialize the PHY controller 110, the Link controller 114, or both that are operatively connected to the primary or affected bus 106 in response to receiving the recovery command 143 on the secondary or unaffected bus 108. To re-initialize the PHY controller 110 and the Link controller 114, the bus recovery tool 144 of node 102b may transmit one or more control messages 300 in FIG. 3 to the respective bus interface recovery circuit 126 or 128 of the node 102b so that power controllers 206 and 210 re-cycle power to the PHY controller 110 and the Link controller 114 as discussed above in reference to FIG. 2.
Next, the bus recovery tool 144 transmits a message on the second or unaffected one of the busses 106 or 108 indicating communication has been restored. (Step 806) In the implementation in FIG. 7, to indicate that communication has been restored for node 102b on the primary bus 106, the bus recovery tool 144 transmits the message 710 to the bus management tool 142 of node 102a in frame 704. Alternatively, the bus recovery tool 144 may transmit the message 412 on the primary bus 106 in the next frame 402 in response to receiving the “heartbeat” message 418 from the bus management tool 142 as discussed above. To ensure communication has been restored on the first or affected one of the busses 106 and 108, bus recovery tool 144 may read the current level via the respective current sensors 204 and 208 of the node 102b to determine whether the current level is below the predetermined level (e.g., 200 milliamps or more) corresponding to a radiation-induced glitch or short circuit. After transmitting the message 710 or 412 indicating communication has been restored, the bus recovery tool 144 may end processing as shown in FIG. 8.
FIG. 9 depicts a flow diagram illustrating an exemplary process 900 performed by the bus recovery tool 144 of each node 102a-102n to detect a bus interface circuit of the node that is experiencing a radiation induced latch-up or upset error on a bus 106 or 108 and to clear the detected latch-up or upset error. Thus, by performing process 900, each node 102a-102n may automatically recover from a latch-up or single event functional interrupt caused by a radiation induced glitch or current surge on a bus interface circuit 110, 112, 114, or 116 operatively connected to respective bus 106 or 108. Initially, the bus recovery tool 144 of a respective node 102a-102n senses a current level on a bus interface circuit (e.g., PHY controller 110 or 112, or Link controller 114 or 116). (Step 902) As discussed above, the bus recovery tool 144 may provide an enable signal 224 (e.g., Bit 6 of control message 300 in FIG. 3) to the bus interface recovery circuit 126 and 128 to selectively cause the bus interface recovery circuit to report the sensed current level of PHY controller 110, 112 or the sensed current level of Link controller 114, 116 when the output signal 234 of switch 232 is set to correspond to the channel designated by enable signal 224. The bus recovery tool 144 provides a second enable signal (e.g., Bit 7 of control message 300) to select receiving the sensed current level of the PHY controller 110, 112 or the Link controller 114, 116.
Next, the bus recovery tool 144 of the node 102a-102n determines whether the sensed current level on the bus as received by the corresponding bus interface circuit (e.g., PHY controller 110 or 112, or Link controller 114 or 116) exceeds a predetermined level, such as that corresponding to a radiation induced glitch or surge. (Step 904) If the sensed current level does not exceed a predetermined level, the bus recovery tool 144 ends processing. If the sensed current level on the bus corresponding to the bus interface circuit 110, 112, 114, or 116 exceeds the predetermined level, the bus recovery tool 144 of the node 102a-102n re-initializes or cycles power to the respective bus interface circuit 110, 112, 114, or 116. (Step 906) For example, assuming that the bus recovery tool 144 of node 102a determines that the sensed current level on the primary bus 106 corresponding to the PHY controller 110 in FIG. 1 exceeds the predetermined level corresponding to a radiation induced surge on the primary bus 106, the bus recovery tool 144 of node 102a may automatically re-initialize the PHY controller 110 of node 102a by toggling bit 2 in one or more control messages 300 to bus interface recovery circuit 126 of node 102a so that power is cycled to PHY controller 110. One skilled in the art would appreciate that the bus recovery tool 144 may detect and clear a radiation induced latch-up or upset on PHY controller 112 and Link controllers 114 and 116 in a like manner via corresponding power enable signals (e.g., Bits 4, 1 and 3 of control message 300).
In one implementation, each bus interface recovery circuit 126 and 128 may have a dedicated bus recovery tool 144 suitable for use with methods and systems consistent with the present invention to allow automatic recovery from a radiation induced latch-up or upset condition detected by the dedicated bus recovery tool 144 on a bus 106 or 108. In this implementation, each bus interface recovery circuit 126 and 128 has a CPU 1002 and a memory 1004 containing the bus recovery tool 144 as shown in FIG. 10. The CPU 1002 is operatively connected to memory 1004, latch 216, and multiplexer 220 so that bus recovery tool 144 residing in memory 1004 may perform process 900 as described above to automatically detect and clear a radiation induced latch-up or upset condition associated with bus interface circuit 110, 112, 114, or 116. In this implementation, the bus recovery tool 144 may send a control message 300 directly to latch 216 and monitor a sensed current level directly from multiplexer 220. As shown in FIG. 10, the CPU 1002 may also be operatively connected to the backplane or second network 124 so that the bus recovery tool 144 may perform process 800 and respond to a recovery command 143 from the bus management tool 142 on the bus 106 or 108.
FIG. 11 depicts a block diagram of another vehicle data processing system 1100 suitable for practicing methods and implementing systems consistent with the present invention. The data processing system 1100 also includes a plurality of
nodes 102a-102n operatively connected to a network 1102 having a primary bus 106 and a secondary bus 1104. In this implementation, the secondary bus 1104 is a different type of bus than the primary bus 106. For example, the primary bus 106 may be configured to implement a first communication protocol such as a IEEE-1394b cable based network protocol and the secondary bus 1104 may be a multi-drop bus, such as an Inter-IC or I2C bus. In this implementation, the secondary bus 1104 connects the bus management tool 142 in node 102a to a bus interface recovery circuit 126 in each of the nodes 102a-102n of the data processing system 1100, such that the bus management tool 142 and the bus interface recovery tool 144 of node 102a may control the respective bus interface recovery circuit 126 of each node 102a-102n in accordance with methods consistent with the present invention.
As shown in FIG. 11, each node 102a-102n has at least one bus interface circuit (e.g., a PHY controller 110 and/or a Link controller 114) to operatively connect a data processing computer 118, 120, and 122 of the respective node 102a-102n to the primary bus 106. Each data processing computer 118, 120, and 122 is operatively connected to the bus interface circuit via a second network 124 as described above for data processing system 100. In one implementation, the PHY controller 110, the Link controller 114, and the bus interface recovery circuit 126 or 128 may be incorporated into a single network interface card 127.
In this implementation, when performing the process depicted in FIG. 5, the bus management tool 142 may detect a bus interface circuit (e.g., circuit 110 or 114) of a node that is experiencing a radiation induced latch-up or upset error on the primary bus 106 and send a recovery command to recover communication on the primary bus 106 to the unresponsive node on the secondary bus 1104 so that the bus recovery tool 144 may perform the process depicted in FIG. 8 to recover communication on the primary bus 106 for the unresponsive node.
Since the secondary bus 1104 connects the bus management tool 142 to the bus interface recovery circuit 126 of each node 102a-102n, the bus management tool 142 may, in lieu of or in response to sending a recovery command on the secondary bus, cause the bus recovery tool 144 of node 102a to re-initialize or cycle power to the bus interface circuit (e.g., PHY controller or Link controller) of the node experiencing a radiation induced error. To re-initialize the PHY controller 110 and the Link controller 114, the bus recovery tool 144 of node 102a may transmit one or more control messages 300 in FIG. 3 via bus 1104 to the respective bus interface recovery circuit 126 of the unresponsive node 102a-n so that power controllers 206 and 210 re-cycle power to the PHY controller 110 and the Link controller 114 as discussed above in reference to FIG. 2. In one implementation, the recovery command may comprise the one or more control messages 300 for effecting the re-initialization of the bus interface circuit of the unresponsive node 102a-n.
The foregoing description of an implementation of the invention has been presented for purposes of illustration and description. It is not exhaustive and does not limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing of the invention. Additionally, the described implementation includes software, such as the bus management tool, but the present invention may be implemented as a combination of hardware and software or in hardware alone. Note also that the implementation may vary between systems. The invention may be implemented with both object-oriented and non-object-oriented programming systems. The claims and their equivalents define the scope of the invention.
What is claimed is:
1. A network interface apparatus comprising:
a bus interface circuit for operatively connecting a network interface card to a data bus, wherein the bus interface circuit is a physical layer controller;
a power controller operatively connected to the bus interface circuit;
a first current sensor operatively connected to the bus interface circuit to sense a first current level in the bus interface circuit; and
means for determining whether the first sensed current level exceeds a predetermined level and for causing the power controller to cycle power to the bus interface circuit in response to determining that the first sensed current level exceeds the predetermined level;
a link layer controller operatively connected to the data bus via the physical layer controller;
a second power controller operatively connected to the link layer controller; and
a second current sensor operatively connected to the link layer controller to sense a second current level in the link layer controller;
wherein the means for determining and for causing further comprises means for determining whether the second sensed current level exceeds the predetermined level and for causing the power controller to cycle power to the physical layer controller and the second power controller to cycle power to the link layer controller in response to determining that one of the first sensed current level or the second sensed current level exceeds the predetermined level.
2. A network interface apparatus of claim 1, further comprising a switch operatively configured to select one of the first and second current sensors to report the selected sensed current level to the means for determining whether the selected sensed current level exceeds the predetermined level and for causing the power controller to cycle power to the bus interface circuit in response to determining that the selected sensed current level exceeds the predetermined level.
3. A network interface apparatus of claim 2, further comprising
a terminal operatively connected between another data bus and the switch; and
a latch operatively configured to receive an enable signal via the terminal and to provide the enable signal to the switch to allow one of the first and second current sensors to continuously report the selected sensed current level on the other data bus.
4. A network interface apparatus of claim 3 further comprising a bus switch operatively connected to the power controller and between the bus interface circuit and the terminal, the bus switch being operatively configured to isolate the bus interface circuit when the power controller cycles power to the bus interface circuit.
5. A network interface apparatus of claim 4, wherein re-initializing the bus interface circuit comprises inhibiting a current from the physical layer controller from reaching the link layer controller.
6. A network interface apparatus of claim 4, wherein re-initializing the bus interface circuit comprises inhibiting a current from the link layer controller from reaching the physical layer controller.
7. A network interface apparatus of claim 1, further
comprising means for selectively causing the current sensor
to sense a current level that exceeds the predetermined level.
8. A network interface apparatus of claim 1, further
comprising:
a second bus interface circuit for operatively connecting
the network interface card to a second data bus;
a second power controller operatively connected to the
second bus interface circuit;
a third current sensor operatively connected to the second
bus interface circuit to sense a third current level in the
second bus interface circuit; and
means for determining whether the sensed third current
level exceeds a second predetermined level and for
automatically causing the second power controller to
cycle power to the second bus interface circuit in
response to determining that the sensed third current
level exceeds the second predetermined level.
9. A network interface apparatus of claim 8, wherein the
current level in the bus interface circuit and the second
current level in the second bus interface circuit are sensed
substantially simultaneously.
10. A method in a data processing system including a
network having a data bus, the method comprising:
sensing a current level in a bus interface circuit
operatively connecting a node on the network to the data bus;
determining whether the sensed current level exceeds a
predetermined level; and
re-initializing the bus interface circuit in response to
determining that the sensed current level exceeds the
predetermined level,
wherein the bus interface circuit is one of a plurality of
bus interface circuits of the node operatively connecting
the node to the data bus and sensing a current level
comprises selecting the sensed current level associated
with one of the plurality of bus interface circuits;
wherein a first one of the plurality of bus interface circuits
is a physical layer controller;
wherein re-initializing the first one of the plurality of bus
interface circuits comprises inhibiting a current from a
power bus from reaching the physical layer controller;
and
wherein a second of the plurality of bus interface circuits
is a link layer controller and re-initializing the second
of the plurality of bus interface circuits comprises
inhibiting a current from the link layer controller from
reaching the physical layer controller.
11. A method of claim 10, further comprising providing a
test signal to one of the plurality of bus interface circuits to
cause the sensed current level to exceed the predetermined
level.
12. A method of claim 10 wherein re-initializing one of
the plurality of bus interface circuits is completed within a
period equal to or greater than 10 milliseconds.
13. A method of claim 10, wherein the current levels in the
first and second of the plurality of bus interface circuits are
sensed substantially simultaneously.
14. A computer-readable medium containing instructions
causing a program in a data processing system to perform a
method, the data processing system including a network
having a data bus, the method comprising:
sensing a current level in a bus interface circuit
operatively connecting a node on the network to the data bus;
determining whether the sensed current level exceeds a
predetermined level; and
re-initializing the bus interface circuit in response to
determining that the sensed current level exceeds the
predetermined level,
wherein the bus interface circuit is one of a plurality of
bus interface circuits of the node operatively connecting
the node to the data bus and sensing a current level
comprises selecting the sensed current level associated
with one of the plurality of bus interface circuits;
wherein a first one of the plurality of bus interface circuits
is a physical layer controller;
wherein re-initializing the first one of the plurality of bus
interface circuits comprises inhibiting a current from a
power bus from reaching the physical layer controller;
and
wherein a second of the plurality of bus interface circuits
is a link layer controller and re-initializing the second
of the plurality of bus interface circuits comprises
inhibiting a current from the link layer controller from
reaching the physical layer controller.
15. A computer-readable medium of claim 14, further
comprising providing a test signal to one of the plurality of
bus interface circuits to cause the sensed current level to
exceed the predetermined level.
16. A computer-readable medium of claim 14 wherein
re-initializing one of the plurality of bus interface circuits is
completed within a period equal to or greater than 10
milliseconds.
17. A computer-readable medium of claim 14, wherein the
current levels in the first and second of the plurality of bus
interface circuits are sensed substantially simultaneously.
18. A network interface apparatus, comprising:
a bus interface circuit for operatively connecting a
network interface card to a first bus;
a power controller operatively connected to the bus
interface circuit;
a first current sensor operatively connected to the bus
interface circuit to sense a first current level in the bus
interface circuit; and
means for determining whether the first sensed current
level exceeds a predetermined level and for causing the
power controller to cycle power to the bus interface
circuit in response to determining that the first sensed
current level exceeds the predetermined level,
wherein the bus interface circuit is a physical layer
controller and the network interface apparatus further
comprises:
a link layer controller operatively connected to the bus via
the physical layer controller;
a second power controller operatively connected to the
link layer controller; and
a second current sensor operatively connected to the link
layer controller to sense a second current level in the
link layer controller,
wherein the means for determining and for causing further
comprises means for determining whether the second
sensed current level exceeds the predetermined level
and for causing the power controller to cycle power to
the physical layer controller and the second power
controller to cycle power to the link layer controller in
response to determining that one of the sensed current
level or the second sensed current level exceeds the
predetermined level.
19. A network interface apparatus of claim 18, further
comprising a switch operatively configured to select one of
the first and second current sensors to report the selected
sensed current level to the means for determining whether
the sensed current level exceeds the predetermined level and
for causing the power controller to cycle power to the bus
interface circuit in response to determining that the sensed
current level exceeds the predetermined level.
20. A network interface apparatus of claim 19, further
comprising
a terminal operatively connected between a second bus
and the switch; and
a latch operatively configured to receive an enable signal
via the terminal and to provide the enable signal to the
switch to allow one of the first and second current
sensors to continuously report the selected sensed
current level on the second bus.
21. A network interface apparatus of claim 20 further
comprising a bus switch operatively connected to the power
controller and the bus upstream from the bus interface
circuit, the bus switch being operatively configured to
isolate the bus interface circuit when the power controller
cycles power to the bus interface circuit.
22. A network interface apparatus of claim 21, wherein
re-initializing the bus interface circuit comprises inhibiting
a current from the physical layer controller from reaching
the link layer controller.
23. A network interface apparatus of claim 21, wherein
re-initializing the bus interface circuit comprises inhibiting
a current from the link layer controller from reaching the
physical layer controller.
* * * * *
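By way of illustration only, the sense, compare, and re-initialize steps recited in the method claims can be sketched in software. The sketch below is not the patented circuit. It assumes a simulated controller in place of real current sensors and power controllers, and every name in it (SimulatedController, check_and_recover, inject_test_current) is hypothetical. The threshold and current values are arbitrary, and the 10 millisecond default off-time is only loosely modeled on the period recited in claims 12 and 16.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimulatedController:
    """Hypothetical stand-in for a PHY or Link controller and its current sensor."""
    name: str
    current_ma: float       # current drawn right now (elevated during a latch-up)
    nominal_ma: float       # current drawn after a clean power-up
    powered: bool = True
    power_cycles: int = 0

    def sense_current(self) -> float:
        # An unpowered controller draws no current.
        return self.current_ma if self.powered else 0.0

    def inject_test_current(self, extra_ma: float) -> None:
        # Models a test signal that pushes the sensed level over the threshold.
        self.current_ma += extra_ma

    def cycle_power(self, off_time_s: float, sleep: Callable[[float], None]) -> None:
        # Inhibit current for off_time_s, then restore power at the nominal draw.
        self.powered = False
        sleep(off_time_s)
        self.powered = True
        self.current_ma = self.nominal_ma
        self.power_cycles += 1


def check_and_recover(
    controllers: List[SimulatedController],
    threshold_ma: float,
    off_time_s: float = 0.010,  # assumed value, not taken from the description
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Select each sensed level in turn and cycle power where it exceeds the threshold."""
    recovered = []
    for controller in controllers:
        if controller.sense_current() > threshold_ma:
            controller.cycle_power(off_time_s, sleep)
            recovered.append(controller.name)
    return recovered
```

This sketch cycles power only to the controller whose sensed level exceeds the threshold. The apparatus of claims 1 and 18 instead cycles power to both controllers when either level is exceeded, which would call for a small change to the loop. Passing a no-op function as sleep allows the logic to be exercised without waiting.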
