Express lanes modification to the data vortex photonic all-optical path interconnection network by Bozek, Matthew Peter



























In Partial Fulfillment 
Of the Requirements for the Degree 
Master of Science in the  













































     Approved by:  
 
 
     Dr. D. Scott Wills, Advisor 
     School of Electrical and Computer Engineering 
     Georgia Institute of Technology 
 
     Dr. Sudhakar Yalamanchili 
     School of Electrical and Computer Engineering 
     Georgia Institute of Technology 
 
     Dr. David C. Keezer 
     School of Electrical and Computer Engineering 
     Georgia Institute of Technology 
 
 












 This research could not have been conducted without the efforts of others.  
Appreciation goes out to Dr. Cory Hawkins, whose research and code formed the basis 
of this research; to Dr. Qimin Yang, Keren Bergman and her students for their previous 
research; to Dr. D. Scott Wills for introducing me to this topic and for his advisement 
through the thesis process; and to PICA members Krit Athikulwongse and Senyo 


































LIST OF TABLES
 




CHAPTER 1: INTRODUCTION
1.1 Express Lane Addition
1.2 Semi-Express Lane Addition
1.3 Express Output Lane Addition
1.4 Summary of Research Contributions

CHAPTER 2: ORIGIN AND HISTORY
2.1 Developing Optical Technology
2.2 Express Lane Modification
2.3 Optical Interconnection Networks and “Express Lanes”
2.4 Data Vortex Interconnection Network
 




CHAPTER 4: TOPOLOGY ENHANCEMENT STUDY
4.1 Express Lane Modification
4.2 Semi-Express Lane Modification
4.3 Express Output Addition
 






































Table 1. Data Vortex parameters for express lane performance study

Table 2. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express lane data vortex for a height of 256

Table 3. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express lane data vortex for an angle count of 6

Table 4. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express lane data vortex under conditions of locality for a height of 256

Table 5. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express lane data vortex under conditions of locality for an angle count of 6

Table 6. Accepted traffic and average latency comparisons between the unmodified data 
vortex and semi-express lane data vortex for a height of 256

Table 7. Accepted traffic and average latency comparisons between the unmodified data 
vortex and semi-express lane data vortex for an angle count of 6

Table 8. Accepted traffic and average latency comparisons between the unmodified data 
vortex and semi-express lane data vortex under conditions of locality for a height of 256

Table 9. Accepted traffic and average latency comparisons between the unmodified data 
vortex and semi-express lane data vortex under conditions of locality for an angle count 
of 6

Table 10. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express output data vortex for a height of 256

Table 11. Accepted traffic and average latency comparisons between the unmodified data 
vortex and express output data vortex for an angle count of 6
 
 
Table 12. Accepted traffic and average latency comparisons between the unmodified data 





Table 13.  Accepted traffic and average latency comparisons between the unmodified 

















































Figure 1. Flow map of two main sources of research

Figure 2. An 8-ary 2-cube (torus) [18]

Figure 3. k-ary 1-cube with express channels, n=node, I=Interchange [19]

Figure 4. Two-level hierarchical express cube [19]

Figure 5. Distributed Crossbar Switch Hypermesh [20]

Figure 6. An example ShuffleNet connecting eight I/O nodes using four wavelengths [16]

Figure 7. This directed graph of a binary shift register is an example of a 
de Bruijn graph [16]

Figure 8. The Manhattan street network [16]

Figure 9. The RAPID network shown in a) architectural view and b) conceptual 
diagram [16]

Figure 10. Diagram illustrating the data vortex topology [2]

Figure 11. Technology independent 2x2 Switch [16]

Figure 12. A Data Vortex with five angles [16]

Figure 13. Data Vortex with express angle

Figure 14. Random express lane angle comparison

Figure 15. Random express lane height comparison

Figure 16. Locality express lane angle comparison

Figure 17. Locality express lane height comparison

Figure 18. Semi-Express Lane modification

Figure 19. Random Semi-express lane angle comparison
 




Figure 21. Locality Semi-express lane angle comparison

Figure 22. Locality Semi-express lane height comparison

Figure 23. Express Outputs modification

Figure 24. Random express output angle comparison

Figure 25. Random express output height comparison

Figure 26. Locality express output angle comparison
 






































 Today’s supercomputers require interconnection networks with high bandwidth 
and low latency to exploit parallelism.  The data vortex is an all-optical path 
interconnection network that was defined in [3,5] and later shown to achieve high levels 
of message acceptance and low levels of message latency [16].  In this thesis research, 
three enhancements to the data vortex are defined and tested for performance. They are 
compared to an unmodified data vortex using average latency and offered traffic 
acceptance rate as metrics.  Minimum angle counts are established at which the express 
lane enhancements become beneficial.  An express lane enhancement allows exploitation 
of locality, yielding an 8% to 12% reduction in average latency and a 4% to 6% increase 
in message acceptance.  Semi-express lanes cannot effectively exploit locality but still 
yield a 20% increase in message acceptance and a 4% decrease in average latency. 
Express outputs can exploit locality for a 28% to 32% increase in message acceptance 
and a 12% to 15% decrease in average latency.
 
CHAPTER 1: INTRODUCTION 
 
 Today’s supercomputers solve computationally demanding problems through 
massive parallelism. A critical factor in parallelism is interconnection latency.  An optical 
interconnection network employing wavelength division multiplexing (WDM) offers 
high bandwidth and low latency.  However, due to the lack of random-access optical 
memory, optical interconnection networks do not have access to traditional buffering 
methods. 
The data vortex is designed to overcome this limitation.  The data vortex [3,5] is 
an all-optical path technology that bypasses the need for standard buffering by using 
deflection routing around concentric cylinders to provide non-blocking communications 
and virtual buffering. 
  This thesis presents enhancements of the data vortex topology. The enhancements 
include three different variations of express lane. The improvements presented are 
increases in offered traffic acceptance rate and reductions in packet latency under a heavy 
random traffic load and under conditions exploiting locality. 
 
 
1.1 Express Lane Addition 
 
The first modification explored in this research is the express lane modification. 
One angle of the model is altered from the base (patented) version to enhance its 
performance.  This angle of the network model now contains an express lane, a direct 
connection from the input node in the outermost cylinder to the output node in the 
innermost cylinder at the same height and angle. This allows packets to bypass all 
intermediate cylinders and pass directly to the output, protecting them from all 
deflections except at the output node. All nodes on this angle not in the I/O cylinders are 
removed.  This direct connection increases sensitivity to locality, which can be exploited. 
The performance is then tested by simulation for varying network angles, heights, traffic 
loads and locality, and is compared to the message acceptance and average latency of an 
unmodified data vortex. The simulator used is a custom, cycle-accurate simulator used in 
previous data vortex research [16], modified to support express lane enhancements. 
The number of angles in the data vortex is found to have a large impact on 
performance; too few angles lead to performance loss. Benefits in latency are seen 
starting at four angles. A trade-off in benefits develops after this point: more angles 
improve packet acceptance but increase average latency.  In a random traffic pattern no 
benefits are seen in message acceptance, but baseline performance (as seen in an 
unmodified data vortex) is approached at nine angles.  The express lane modification can 
yield a 4% reduction in average latency for systems not utilizing locality. Yields in 
systems utilizing locality are an 8% to 12% reduction in average latency and a 4% to 6% 
increase in message acceptance.  This topology change illustrates that the data vortex, 
while an indirect network, can be modified to exploit locality if the user’s application 
warrants. 
 
1.2 Semi-Express Lane Addition 
 
 The second modification explored in this research is the semi-express lane 
modification.  Similar to the express lane enhancement, it provides advancement only 
for packets at the correct height and angle.  However, the semi-express lane provides 
nodes at every cylinder.  This reduces the benefit to packets that use the semi-express 
lane, as more hops are required to reach the output.  The advantage that the semi-express 
lane provides is increased opportunities for packets to utilize it: it can be utilized at any 
cylinder, not just the outermost. Performance is tested by simulation under the same 
circumstances as the express lane. 
The number of angles again has a powerful impact on performance, as too few 
cause large penalties in packet acceptance.   At five angles large increases can be seen in 
packet acceptance; beyond this point, additional angles grant further benefits to packet 
acceptance at the expense of benefits in average latency. Semi-express lanes yield a 20% 
increase in packet acceptance; improvements can also be seen in average latency, with 
reductions reaching 4%.   
Unlike the express lane, no enhancements in performance are seen in traffic 
affected by locality, as the semi-express lane does not protect packets utilizing it from 
deflection any more than the unmodified angles do.  Average latency rises in 
semi-express lanes under conditions of locality, as the penalties of extra hops from 
packets unable to use the lane dominate. 
 
 
1.3 Express Output Lane Addition 
 
 The last modification explored in this research is the express output 
modification.  While the previous express lanes concentrated on getting messages at the 
right angle and height to the innermost (output) cylinder, the express output lane adds 
additional outputs for immediate egress of such packets.  This adds a considerable cost 
due to the current high cost of I/O nodes compared to non-I/O nodes.  Having multiple 
outputs to the same destination also adds some hardware cost for output buffering to 
avoid contention; slight latency penalties may also be added due to the additional 
buffering. 
For their additional cost the express outputs show considerable improvements in 
performance.  After four angles are added, improvements are seen in performance, 
yielding an average latency decrease of 10% and a message acceptance increase of 24%.  
Like the express lane, the express output lane shows sensitivity to locality, showing 
additional improvements in performance over an unmodified data vortex and yielding a 
28% to 32% increase in packet acceptance and a 12% to 15% decrease in average latency.  
 
1.4 Summary of Research Contributions 
 
 Key contributions to knowledge made by this thesis research are summarized 
here. 
• Definition and evaluation of three different data vortex enhancements 
• Determination that better performance can be obtained with a moderate to 
large number of angles in an express lane enhanced data vortex 
• Determination that express lane and express output lane enhancements allow 




CHAPTER 2:  ORIGIN AND HISTORY 
  
 Today’s supercomputers use large-scale parallelism to solve computationally 
complex problems.  In order to allow the processors to coordinate their efforts on a single 
problem, a capable interconnection network is needed.  With the trend of increasing 
processor count to increase performance (all of the current top 25 supercomputers in the 
world have more than a thousand processors according to the Top500 supercomputer 
sites website [23]), the pressure on the performance of the interconnection network grows.  
For such large high-performance supercomputers, an interconnection network that offers 
high bandwidth and low packet latency while scaling to increasing I/O is needed.  
To ensure scalability in large-scale systems a multi-hop network serves best.  A 
single-hop network (such as a bus or star network) offers sufficiently low latency only 
for smaller networks, and its poor scalability hinders use in large-scale systems. 
 The optical domain provides several significant advantages over the electrical 
domain. Optical fibers do not need signal regeneration as often as wires over long 
distances and can employ wavelength division multiplexing (WDM), giving 
multiple data channels for use.  Messages can be transmitted in a parallel form in WDM 
packets [1], reducing transit time.  In some network implementations, multiple nodes can 
also share a single link by utilizing different wavelengths.   
 
2.1 Developing Optical Technology 
 
 Optical communications have been part of human history for thousands of years. 
Reflected and generated light have been used to communicate over distance in air using a 
wide variety of methods, including reflecting sunlight off glass and light generated by 
signal lamps and fire. Optical-fiber guided networks were introduced in 1966 by Kao and 
Hockham [24], utilizing optical fibers and the recently proposed laser [25] to bolster the 
British telephony system against high demand.  However, impurities in the fiber used 
prevented it from being suitable for communication systems.  Advances in fiber 
technology eventually overcame this, and in the early 1970s new, purer fibers were being 
used in trials of optical telephone systems. 
 Many improvements have been made since the early days of optical networking.  
Fiber purity has improved, lowering message attenuation. Advances in lasers and optical 
receivers improved message generation and acceptance.  Optical components have 
achieved higher and higher rates and quality as the growing world telecom industries 
push for rapid reliable communication. 
 Optical networking for interconnection networks has different needs than those 
of the telecom industry, meaning many of the new developments in optical networking 
have no apparent impact in the context of this research.  While the result of this has been 
a slow start in optical networking for multicomputer interconnection networks, many 
ongoing improvements have been made.  New optical topologies and lower-cost 
switching elements like semiconductor optical amplifiers (SOAs) increase the feasibility 
of using optics in a parallel computer interconnection network.  Wavelength division 
multiplexing (WDM) grants optics a massive bandwidth advantage over wired electronics.  
Fiber optics has a large data capacity, and adding multiple channels to the fiber increases 
it many fold.  Current technology allows for 60 channels per fiber [26] and 
promises thousands more in the near future [27].  
 For all the advantages that can be provided by optical interconnection systems, the 
lack of optical random access memory means that most electronic interconnection 
networks cannot be directly utilized by simply transplanting optical technology.  Any 
manner of standard buffering in modern optical technology requires optical-to-electrical 
conversion to buffer and electrical-to-optical conversion back to the network.  This 
causes undesirable increases in latency, power use and hardware cost.  To avoid this, 
topologies that rely on store-and-forward packet switching are not preferred for use with 
optical interconnections.  Topologies that use deflection routing, sometimes referred to 
as hot potato routing [28,29], utilizing alternate paths always open for deflection avoid 






Figure 1.  Flow map of two main sources of research leading to this concept, data 
vortex performance analysis and express paths added and their effects on otherwise 
uniform topologies. 
 
 In this thesis research, enhancements to a network specifically designed to 
utilize deflection routing in a photonic environment are examined.  As seen in Figure 1, 
this research flows directly from the study of this network (known as the data vortex), 
enhanced with a concept known as express channels.  
 
2.2 The “Express Lane” Modification: 
 A uniform topology is simple in layout, easy to expand and often well focused 
around limiting factors that affect critical metrics in network design.  However, it does not 
take advantage of locality and has defining limitations in its design.  Non-uniform 
additions that can bypass clusters of nodes in a uniform design can improve the 
exploitation of locality and help lessen these limitations.   
 
 
Figure 2. An 8-ary 2-cube (torus) [18] 
 
 
For example, k-ary n-cubes were described in a 1990 performance analysis as “cubes with 
n dimensions and k nodes in each dimension” [18] (see Figure 2).  As wiring density is a 
cost-limiting factor, the cubes were studied under the assumption of constant bisection 
width, and the analysis concluded that networks of lower dimensionality with wider 
channels are superior in terms of latency and congestion. 
However, the k-ary n-cube has to deal with nodal delays and other nodal effects that 
introduce latency; a solution to this, express channels, was introduced in 1991 [19]. 
 
 





While express channels require the addition of interchanges (see Figure 3) and 
increase bisection width, they greatly improve nonlocal message latency and cut 
congestion by letting nonlocal messages bypass the slower node-to-node travel at the 
local level. 
 Little improvement is seen in local-area message latency, as the nodal effects on 
latency still dominate. 
Extra express lanes can be added up to the bisection width limit.  If desired, each 
interchange can have connections to each express lane; however, certain benefits can be 
seen from a hierarchical approach. 
 
 
Figure 4. Two-level hierarchical express cube [19] 
 
 
By using several hierarchical express lanes with extra interchanges, latency is improved 
at both local and nonlocal communication levels.  Local messages benefit from more 
interchanges and more express channels, which reduce latency in local routing.  
Nonlocal messages have unique ascent phases, reaching higher levels of the express 
cube the farther they need to travel and leaving lower levels open for local travel. 
It is important to note that these express channels do not need to be uniform in nature. 
The interchanges do not discriminate by the channel from which messages arrive, which 
allows express channels to be added in local areas where repeat traffic is expected, such 
as system resources with spatial locality. 
An express lane provides the k-ary n-cube with partial bypass of clusters of nodes within 
a dimension; if the express lanes are expanded to such an extent that all nodes have a 
direct path to one another, the network can be said to have total bypass within a 
dimension [20].  Such a system would actually be a hypermesh such as the Distributed 




Figure 5. Distributed Crossbar Switch Hypermesh [20] 
 
 The powerful nature of the DCSH is enabled by its crossbar, which functions as a 
multidirectional express channel.  Express channels are a boon for locality as they 
directly connect nearby nodes; the DCSH gives all elements within the dimension the 
benefits of these connections. 
 When compared to the torus, the benefits of the DCSH in supporting a wide range 
of traffic patterns can be seen [21].  The local switching delay means that even with very 
high degrees of locality the switching time involved will end up dominating and latency 
will increase. 
 
2.3 Optical Interconnection Networks and “Express Lanes” 
 
 Applications of optical networking in supercomputing have created a new type of 
supercomputer [16]. A grid type, also known as a transistor type or “type-T”, connects 
many stand-alone systems together in a LAN (local area network), adding additional 
computers to the system until it is capable of handling the problem.  While these systems 
are cheaper (especially in terms of cost per performance) than traditional systems, 
they do not address the issue of high message latency, as communication between 
stand-alone systems is output through slower network ports.  An optical interconnection 
network in a more traditional supercomputer architecture (a “type-C” [30]) allows 
transmission much closer to the processor.   Since the interconnection of type-T networks 
works like other LANs, scaling the system means the addition of switches and other 
networking devices.  The communication at the inner node level gives type-C systems a 
marked edge when scaled to very large node counts. 
 Several topologies can be useful as a type-C all-optical interconnection network.  
Several are similar in application to the data vortex [3,5] and may become competition 
for development. These networks have followed a pattern of study and enhancement as 
summarized in the performance enhancement of the data vortex [16] by Dr. Cory 
Hawkins. 
 These topologies vary in terms of sensitivity to locality and not all are good 
candidates for express lane modification.  ShuffleNet [31,36] is a family of networks 
based on the perfect shuffle [37] graph arrangement.  Physical topology is flexible in this 
system, as routing is performed at the wavelength level via WDM of fiber lengths as seen 
in Figure 6.  At this wavelength routing level the perfect shuffle is as an indirect network 





Figure 6. An example ShuffleNet connecting eight I/O nodes using four wavelengths [16]. 
 
 De Bruijn graph networks [32,33] are based on a family of graphs and are similar to 
ShuffleNet.  Nodes are connected based on a left shift or right shift of the source node’s 
address as seen in Figure 7.  As with ShuffleNet, the indirect network setup makes express 
link enhancement doubtful. 
 
Figure 7.  This directed graph of a binary shift register is an example of a de Bruijn 
graph. [16] 
 
 MSN (Manhattan Street Network) [34] is a regular mesh structure similar to a 
hypercube or torus [38,39] with wrap-around unidirectional links as seen in Figure 8.  
Having a grid structure makes MSN a great candidate for express link enhancement.  
Placement of multilevel express lanes could be done in the same manner as in k-ary 
n-cubes, providing bypass of nodes.  However, the complex routing used in MSN is a 
hurdle to be overcome, as optical systems do not have much time to make routing 
decisions before the packet must be forwarded. 
 
Figure 8. The Manhattan street network.[16] 
 
 RAPID [35] has been designed specifically for a DSM (distributed shared 
memory) machine.  The network proposed by the designers [40] is an entirely 
technology-specific system.  Its topology is a wavelength-dependent crossbar as seen in 
Figure 9.  The interconnect topology is as connected as it can get; as a crossbar any attempt at 





















2.4 Data Vortex Interconnection Network 
  
 
Figure 10. Diagram illustrating the data vortex topology. A data vortex of C = 3, 
H = 4, and A = 3 (top), with height crossing patterns of the three cylinders 
(bottom). Curved lines are deflection fibers, straight lines are ingression fibers, 
and dotted lines are electronic deflection signal control cables.[2] 
 
 
 The Data Vortex was first introduced in a patent by Coke Reed of the National 
Security Agency in 1993 [3] (refined in 2001 [5]). 
 The patent states that “[the] interconnect structure operates as a “deflection” or “hot 
potato” system in which processing and a storage overhead at each node is 
minimized” [5].  
 As seen in Figure 10, the data vortex consists of concentric cylinders (which form 
“stages” of the network, like the stages of a butterfly network), all of which incorporate 
deflection routing.  Each cylinder carries 2x2 switching elements (nodes), which are 
arranged in columns around the cylinder.  Each column of nodes is specified as an 
‘angle’; the height of each column is also specified. The total number of nodes is the 
number of angles multiplied by the height multiplied by the number of cylinders. 
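
As a quick check of the node count relation stated above, the following minimal sketch evaluates it for the parameters given in the Figure 10 caption (C = 3, H = 4, A = 3). The function name `total_nodes` is only illustrative and is not taken from the simulator.

```python
def total_nodes(angles: int, height: int, cylinders: int) -> int:
    """Return the number of switching nodes: angles x height x cylinders."""
    if min(angles, height, cylinders) < 1:
        raise ValueError("angles, height and cylinders must all be at least 1")
    return angles * height * cylinders


# Parameters taken from the Figure 10 caption: A = 3, H = 4, C = 3.
print(total_nodes(angles=3, height=4, cylinders=3))  # 36
```

This relation describes the unmodified data vortex only.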
Deflection decisions are all made locally with no central control or buffering.  If 
the packet is at the correct height, ingression occurs and the packet moves into the next 
cylinder at the same angle and height. If not, it is deflected to another height at the next 
angle within the same cylinder.  The lengths of fiber between the nodes are sufficient 
to “virtually buffer” packets, so no electrical buffering or OEO conversion takes place. 
Following the patent, research continued at Columbia University on the data 
vortex [1, 4]. By 2003 work had been done on the optical physical components of the data 
vortex [2, 6, 8] in a physical laboratory setting. Later, routing function and testing at the 
current technology level were carried out [8,11,12,13]. 
Prior to 2003, bursty synthetic data traffic was only simulated on data vortexes of 
fixed small size [1,4]; this work was extended in a collaborative effort with Georgia 
Tech [22]. 
The first large-scale performance simulation over a large range of angles, heights 
and random traffic patterns was done at the Georgia Institute of Technology and 
presented by Cory Hawkins in 2007 [16]. 
Three major studies were completed.  The first compared the data vortex against other 
optical networks, such as the perfect shuffle and butterfly, under random synthetic load.  
The data vortex was shown to be very capable in comparison with the other networks, 
especially in terms of total packets accepted out of attempted injections as the network 
grows to large (512 nodes) size. 
In the parameter adjustment study, the metrics of latency and successful injection 
attempts were again measured on data vortexes of varying height and total angle count.  
Also of focus were the effects of injection angles [15], that is, whether angles on the outer 
cylinder are used for input or for simple routing. 
This study demonstrated how critical it is for the system to maintain adequate 
acceptance of messages (as every rejected message requires a total retransmission) and a 
low total latency. Often there was a trade-off between these metrics, as a balance 
had to be found between providing sufficient angles to buffer packets and allow message 
acceptance and over-buffering, which causes latency [16]. 
The network topology study verified the patented data vortex [5].  Intercylinder 
link arrangements were adjusted to butterfly and perfect shuffle; both were found to harm 
performance [16]. A hierarchical layering and clustering was defined and found to 











CHAPTER 3: RESEARCH METHODOLOGY 
 
  In this research, all enhancements are measured by relative comparison to 
an unmodified data vortex by means of a custom data vortex simulator written in 
C++.  The simulator was written by Dr. Cory Hawkins and used to extract all data for 
his research into the data vortex [16].  The simulator has been modified to support all 
three varieties of express lane enhancement.  The simulator is cycle-accurate and models 
the data vortex at the whole-network system level.  The focus of this research is on 
performance inherent to network topology, not node and link technology.  As the exact 
workings of physical nodes and properties of optical fiber are technology dependent, a 
sub-system view of their workings is not needed.   Optical switches have made the 
change in technology from lithium niobate [41] to lower-cost semiconductor optical 
amplifiers (SOAs) [12].  As technology progresses, SOAs will improve in performance or 
be replaced by new technology.  Switches of the vortex are modeled as simple 2x2 
switches to remain technology independent. 
 As seen in Figure 11, the 2x2 switch has inputs from the outer cylinder and the same 
cylinder.  Outputs are likewise to the same cylinder and the inner cylinder.  A control 
input from the inner cylinder notifies the node when a packet arriving within that 
cylinder blocks the output to the inner cylinder; the node also has a control output to 
notify of its own deflections. At the beginning of each cycle, any packet present at an 
input of a node is output and the packet’s hop count is incremented.  Routing decisions 
are made depending on the packet header and the node’s height in the vortex. A packet at 
the correct height can advance one cylinder as long as there is no deflection bit active. 
Different conditions must be met for nodes that make up an express lane, which will be 
covered later. Otherwise the packet must output to the same cylinder, activating its 
deflection bit in the process. 
 
Figure 11. Technology independent 2x2 Switch [16] 
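
The routing rule of an ordinary (non-express) node can be sketched in C++ as follows.  This is a minimal illustration and not the simulator's code: the names `Decision`, `height_bit_matches` and `route` are hypothetical.  It also assumes that a packet is at the "correct height" in cylinder c when the bit of the node height selected by H/2^(c+1) matches the same bit of the destination height, which is the usual data vortex convention but is not spelled out in this chapter.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not taken from the simulator.
enum class Decision { InnerCylinder, SameCylinder };

// Assumption: "correct height" in cylinder c means the bit selected by
// H / 2^(c+1) is equal in the node height and the destination height.
constexpr bool height_bit_matches(std::uint32_t node_height,
                                  std::uint32_t dest_height,
                                  std::uint32_t c, std::uint32_t H) {
    return (node_height & (H >> (c + 1))) == (dest_height & (H >> (c + 1)));
}

// A packet advances inward only if its height bit matches and the inner
// cylinder has not raised its deflection signal; otherwise it stays on
// the same cylinder (and would raise its own deflection signal).
constexpr Decision route(std::uint32_t node_height, std::uint32_t dest_height,
                         std::uint32_t c, std::uint32_t H,
                         bool deflection_signal) {
    return (height_bit_matches(node_height, dest_height, c, H) &&
            !deflection_signal)
               ? Decision::InnerCylinder
               : Decision::SameCylinder;
}
```

For example, with H = 8 a packet destined for height 5 that sits at node height 4 in cylinder 0 may advance, because both heights have the most significant bit set; the same packet is held on the same cylinder whenever the deflection signal is active.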
 
  The scope of system most likely to employ a photonic network is hundreds of meters wide, meaning that travel time in fiber would outweigh switching time [42].  With this factor dominating, and with technology independence in mind, the simulator does not model switching time.  Instead, a straight count of hops (defined as fiber lengths encountered between switches) is maintained to measure system latency.   
 Like cycles in electronic networks, the data vortex uses slots to hold messages.  In the simulator each slot is modeled as a single cycle, and cycle/slot accuracy is maintained.  Each message is assumed to be contained in one packet and in one slot, keeping each message in one switching node at the start of every cycle. Links are assumed to be one cycle in length.  This keeps the simulation results easy to interpret for future use, as hop counts can be multiplied by the fiber length in terms of optical packet time slots.  For example, if a system has fiber lengths that require 15 optical packet time slots to traverse, the hop count can be multiplied by 15 to find the correct latency. 
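
The conversion from hop count to latency in time slots is a single multiplication.  The helper below is a small sketch of it; the function name `latency_in_slots` is hypothetical, and the hop count of 10 used in the example is an arbitrary illustration, not a simulation result.

```cpp
#include <cstdint>

// Latency in optical packet time slots, assuming every link has the same
// length expressed in slots (slots_per_link), as the simulator assumes.
constexpr std::uint64_t latency_in_slots(std::uint64_t hop_count,
                                         std::uint64_t slots_per_link) {
    return hop_count * slots_per_link;
}
```

With links that take 15 slots to traverse, a packet that needed 10 hops would have a latency of `latency_in_slots(10, 15)`, that is, 150 slots.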
 The packets themselves only need to be routed, so they are modeled as a WDM header with no payload.  The simulator injects packets with either randomly generated destination heights and angles or with a local angle and height relative to the packet's input.  The probability of injection and the percentage of local destinations generated can be specified by the user through command line arguments at simulator execution.  Also controlled by command line input are the height, the number of angles and the addition of express lane enhancements to the data vortex.   
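
One way to picture the injection step is the sketch below.  It is an assumption-laden illustration rather than the simulator's code: `Address` and `offer_packet` are hypothetical names, the load and locality are taken as probabilities between 0 and 1, and a "local" destination is assumed to be the output at the same height and angle as the input.

```cpp
#include <cassert>
#include <random>

// Hypothetical address type: a height and an angle in the vortex.
struct Address {
    unsigned height;
    unsigned angle;
};

// Returns true if a packet is offered at `src` this cycle; on success
// `dest` holds its destination. H and A are the total height and angles.
template <class Rng>
bool offer_packet(Address src, unsigned H, unsigned A, double load,
                  double locality, Rng& rng, Address& dest) {
    assert(H > 0 && A > 0);
    assert(load >= 0.0 && load <= 1.0);
    assert(locality >= 0.0 && locality <= 1.0);

    // Injection is attempted with probability `load`.
    if (!std::bernoulli_distribution(load)(rng)) return false;

    if (std::bernoulli_distribution(locality)(rng)) {
        dest = src;  // assumed meaning of a local destination
    } else {
        dest.height = std::uniform_int_distribution<unsigned>(0, H - 1)(rng);
        dest.angle = std::uniform_int_distribution<unsigned>(0, A - 1)(rng);
    }
    return true;
}
```

Passing a seeded generator such as `std::mt19937` makes a run repeatable, which is convenient when a modified vortex is compared against the unmodified one under the same offered traffic.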
 In this thesis research, simulations are used to compare different express lane enhancements to the unmodified data vortex.  For each case, different heights, angles, loads and traffic localities are examined. 
 The Enhancement command changes the node linking process in the simulator to construct a modified data vortex.  Different criteria must also be met before packets in a node on the angle of the express lane are allowed to advance to the next cylinder; the specifics of each modification are covered in chapter 4. 
 
A list of total script inputs and output data follows: 
• Input Script 
1. Height 
2. Angles 
3. % Load 
4. % Locality 







3. % Load 
4. % Locality 
5. Attempted injections 
6. Successful injections 


















CHAPTER 4: TOPOLOGY ENHANCEMENT STUDY 
 
 The physical topology of the Data Vortex can be modified to improve performance.  To determine under what conditions improvement occurs, and the extent of improvement, a series of simulations is run.  The data vortex simulator is altered to include an express lane on one angle.  Three different express lane modifications are discussed in this chapter.  Each provides a different extent of bypass for packets that are at the correct height and angle for output but not in the output cylinder.  The express lane's primary purpose is to exploit locality to decrease latency and to improve network acceptance of offered packets.  Indirect networks are classically locality insensitive.  The unmodified Data Vortex, while an indirect network, is somewhat locality sensitive.  This sensitivity can be exploited by express lane modifications. 
 
 
4.1 Express Lane Modification 
 When the data vortex was invented and patented in [5], a specific link arrangement between nodes was defined: 
Each 2x2 node N(a,c,h), at angle (a), cylinder (c) and height (h), has one output connected to a node within the same cylinder and one connected to a node within the next (c+1) cylinder, except in the innermost cylinder, N(a,C-1,h), where the output is located. 
The inner cylinder output node is N(a+1 mod A, c+1, h). 
The same cylinder output node is N(a+1 mod A, c, T[h]). 
T[h] is defined as a transformation of the height address, h, as in the pseudocode [16] that follows: 
bitmask = H/(2^(c+1));              //H = total height size; c = current cylinder 
                                    //initialize bitmask 
if (c == (C - 1))                   //node is in innermost cylinder 
{ 
    T[h] = h;                       //outputs are of same height 
} 
else if ((h AND bitmask) == 0)      //first bit is zero - just flip the one bit 
{ 
    T[h] = h XOR bitmask; 
} 
else 
{ 
    T[h] = h;                       //init to h for transformation 
    do {                            //loop 
        T[h] = T[h] XOR bitmask;    //flip a bit 
        bitmask = bitmask / 2;      //move to next less significant bit 
    } while ((h AND (2*bitmask)) != 0); //stop when a zero is reached 
} 
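
The height transformation can also be written as a small compilable C++ function, which makes it easy to check by hand.  This is a sketch of the pseudocode's logic, not the simulator's code: the name `height_transform` is hypothetical, the branch for a zero bit is taken from its comment ("just flip the one bit"), and cylinder 0 is assumed to be the outermost cylinder with C - 1 the innermost.

```cpp
// T[h] for a node at height h in cylinder c, for total height H and
// C cylinders (c == C - 1 is the innermost cylinder).
constexpr unsigned height_transform(unsigned h, unsigned c,
                                    unsigned H, unsigned C) {
    if (c == C - 1) return h;          // innermost: height is unchanged

    unsigned bitmask = H >> (c + 1);   // H / 2^(c+1)
    if ((h & bitmask) == 0) {
        return h ^ bitmask;            // bit is zero: flip that one bit
    }

    unsigned t = h;
    do {
        t ^= bitmask;                  // flip a bit
        bitmask /= 2;                  // move to next less significant bit
    } while ((h & (2 * bitmask)) != 0);  // stop once a zero bit was flipped
    return t;
}
```

Assuming four cylinders for H = 8, the outermost cylinder (c = 0) visits the heights in the order 0, 4, 2, 6, 1, 5, 3, 7 and then returns to 0, so repeated same cylinder hops eventually pass through every height.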
 
The outermost cylinder has input nodes from the input buffers. 
 
As seen below in figure 12, the inner cylinder links resemble those of a butterfly, at least in its outer cylinders.  This arrangement allows any input to reach any output. 
 
Figure 12. A Data Vortex with height H=8 and angles A=5; the number of cylinders is determined by the height. [16] 
 
 The express lane modification, as seen in figure 13, modifies one angle to provide a direct link from the outermost (input) cylinder to the innermost (output) cylinder, bypassing all other cylinders. The angle before the express lane must be modified as well, to keep all nodes as simple 2x2 switches and keep standard routing intact.  Except for I/O links, all same cylinder links now bypass the express lane, and all next cylinder links bypass it as well. 
 
Figure 13. The same Data Vortex as Figure 12 but with an express angle (angle 1 in 
red).  Inner cylinder links from express angles stay at the same height. 
 
Packets that do use the express lane will be only one hop from exit, quickly removing them from the Data Vortex and decreasing congestion within the vortex. 
 Express lanes forward only under exact circumstances, correct height and angle, rather than under the standard forwarding rules of normal angles.  Because of this, express lanes will not alter the height of any packet, since a packet could be at the correct standard height for forwarding at its level but not at the exact height or angle for express lane use. 
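
The stricter forwarding test of an express lane node can be sketched as a single predicate.  The names `Address` and `may_use_express_lane` are hypothetical, and the sketch assumes that "exact height and angle" means the packet's destination equals the height and angle of the express lane node, and that a deflection signal from the output node still blocks the link.

```cpp
// Hypothetical address type: a height and an angle in the vortex.
struct Address {
    unsigned height;
    unsigned angle;
};

// True if a packet at express lane node `node` may take the direct link
// to the output cylinder. Unlike a normal node, every destination bit
// must match, not only the bit belonging to the current cylinder.
constexpr bool may_use_express_lane(Address node, Address dest,
                                    bool deflection_signal) {
    return dest.height == node.height && dest.angle == node.angle &&
           !deflection_signal;
}
```

A packet that fails this test is sent on the same cylinder link with its height unchanged, which is why the express lane never alters packet height.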
 In terms of resources the express lane modification reduces nodes by removing all 




4.1.2 Enhancement Study Parameters and Method 
 To study the performance of express lane enhancements, a series of data vortex configurations is simulated.  The configurations can be seen below in table 1. 
 
 
Table 1. Data Vortex parameters for express lane performance study. (H = Height, A = Total angles) 
H           A        Workload      Locality 
4 - 4096    3 - 9    40% - 80%     Random - 80% 
 
 
 All systems are simulated using a custom data vortex simulator written in C++ by Dr. Cory Hawkins for the performance analysis of the data vortex [16].  The simulator is modified to support express lane enhanced data vortexes.  The entire network is simulated, with packets injected in the first 40,000 time slots followed by 1,000 non-injection time slots to clear the network of all data.  It is assumed that all packets are one cycle in length (they occupy only one node per cycle) and that each message consists of one packet.  It is also assumed that each link has the same physical latency, defined as one hop.  All angles can be injected upon in every cycle.  For each angle, once per cycle, the likelihood of injection is given by the workload percentage (for example, a 60% workload has a 60% chance of injection in each cycle). All traffic is random except in the case of locality testing.  A variable locality factor has been added to test the impact of express lanes in environments that exhibit locality. A locality percentage refers to the probability that a generated message will have its destination at the most local node.  With express lane angles this is the output at the same height and angle as the input, directly connected by the express lane. 
 Metrics for comparison are total packets injected relative to packets offered, and average latency once injected.  Latency is measured in hops from the cycle of packet injection to packet exit from the data vortex. 
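
The two metrics, and the comparison columns used in the tables of this chapter, can be expressed as a few small functions.  This is a sketch under stated assumptions: the function names are hypothetical, the "Difference" column is assumed to be the modified value minus the unmodified value, and the percentage column is assumed to be that difference relative to the unmodified value.

```cpp
#include <cstdint>

// Fraction of offered packets that were actually injected.
constexpr double acceptance_ratio(std::uint64_t successful_injections,
                                  std::uint64_t attempted_injections) {
    return attempted_injections == 0
               ? 0.0
               : static_cast<double>(successful_injections) /
                     static_cast<double>(attempted_injections);
}

// Average latency in hops over all packets that left the vortex.
constexpr double average_latency_hops(std::uint64_t total_hops,
                                      std::uint64_t delivered_packets) {
    return delivered_packets == 0
               ? 0.0
               : static_cast<double>(total_hops) /
                     static_cast<double>(delivered_packets);
}

// Assumed definition of a "Difference" column: modified minus unmodified.
constexpr double difference(double modified, double unmodified) {
    return modified - unmodified;
}

// Assumed definition of a percentage column: the difference relative to
// the unmodified value, in percent.
constexpr double percent_difference(double modified, double unmodified) {
    return 100.0 * (modified - unmodified) / unmodified;
}
```

Under these definitions a negative latency difference is an improvement, while a negative accepted traffic difference is a penalty.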
 
4.1.3 Performance with random traffic 
 
 The express lane additions are primarily intended to exploit locality, but the system should still perform adequately with random traffic so that it can be used with a wide variety of workloads.  Before locality is investigated, all tests are run with purely random traffic.   
 The effects of the express lane modification differ depending on angles, height and load, as figure 14 and table 2 indicate.  Low angle vortexes show significant penalties in traffic acceptance and slighter penalties in latency.  The express lane will only forward packets with exact height and angle destinations. In this heavily loaded vortex, with all angles injecting, packets sent on the same cylinder link will often cause the rejection of an incoming packet.  Because express lanes have strict rules for allowing advancement to the next cylinder, more packets are forced to stay in the same cylinder than in the unmodified version; in the outermost cylinder this can block messages from input and lower packet acceptance.  Packets that are not able to advance also have to take at least one additional hop to advance, increasing average latency. Approaching six angles, the packet injection penalty diminishes and a latency reduction is seen.  Quickly removing packets reduces overall congestion within the data vortex, allowing more packets to be injected overall.  This benefit increases with additional angles until the penalties of the express lane are negated. More angles further reduce the accepted traffic penalty until, at nine angles, it is reduced to near zero.  The latency reduction is still present but slightly diminished. Latency benefits are seen due to the one hop ability of the express lane.  Higher angle vortexes have more angles forwarding under normal conditions, one cylinder at a time, which reduces the average benefit seen from the express lane. To maintain an acceptable balance with the unmodified data vortex in a random traffic environment, a greater number of angles must be present: enough to bring the packet acceptance reduction to a suitable level while maintaining enough of a benefit in latency.   




























[Plot legend: Unmodified 0.4 Load, Express Lane 0.4 Load, Unmodified 0.6 Load, Express Lane 0.6 Load]
 
Figure 14. Comparison of a) accepted traffic and b) average latency between an 











Table 2. Accepted traffic and average latency comparisons between the unmodified 




(A, Load)   Accepted traffic                 Average latency 
            (Difference)   (% Difference)    (Average hops Difference)   (% Difference) 
(3,0.4)     -14.88%        -27.6%            0.6552                      3.0% 
(3,0.6)     -10.09%        -27.8%            0.2847                      1.2% 
(3,0.8)     -7.59%         -27.8%            0.1497                      0.6% 
(6,0.4)     -2.24%         -5.7%             -1.5298                     -4.5% 
(6,0.6)     -1.44%         -5.5%             -1.541                      -4.5% 
(6,0.8)     -1.13%         -5.8%             -1.7964                     -5.1% 
(9,0.4)     -0.01%         -0.0%             -1.7215                     -3.8% 
(9,0.6)     -0.03%         -0.1%             -1.7729                     -3.8% 
(9,0.8)     -0.03%         -0.1%             -1.7929                     -3.8% 
 
Larger height values see a small degradation in accepted packet rates and latency 
as seen in figure 15 and table 3.  The more height a data vortex has the less of a chance 































[Plot legend: Unmodified 0.4 Load, Express Lane 0.4 Load, Unmodified 0.6 Load, Express Lane 0.6 Load]
 
Figure 15. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express lane enhanced data vortex for an angle 











Table 3. Accepted traffic and average latency comparisons between the unmodified 




H       Accepted traffic               Average latency 
        (Difference)   (% Increase)    (Average hops Difference)   (% Increase) 
256     -1.44%         -5.53%          -1.541                      -4.50% 
1024    -1.57%         -6.08%          -1.6692                     -4.13% 




4.1.4: Performance with traffic exhibiting locality 
 
 The types of locality of interest in this study are spatial locality of data reference and network locality.  Spatial locality is the observation that a program is more likely to reference a memory location if a nearby location was just referenced.  An application with strong network locality communicates with its nearest neighbors more often than with distant neighbors. An example of an application that exhibits both is the Ocean program from the SPLASH benchmark suite [43], as are programs that model particle dynamics and force interactions.   
Distributed computers often display a level of network locality in which processors frequently communicate with their neighbors.  The standard Data Vortex is somewhat sensitive to locality.  Outputs can be more local to inputs if no height changes are required to reach them from the input, as there is then a direct advancement through the cylinders to the inner cylinder unless a packet is deflected.  The express link modification can exploit this locality by allowing direct access to the output cylinder with a greatly reduced chance of the packet being deflected (deflections occur only at the output node of the express lane).    
With locality, significant improvements are seen, as depicted in figure 16 and table 4. Not surprisingly, having more packets able to meet the strict forwarding rules of the express lane clears them quickly, lowering average latency, and clears congestion throughout the vortex, raising packet acceptance. Once again a low angle system shows very heavy reductions in packet acceptance rates, though not as severe as for an express lane under random traffic, and latency is significantly reduced.  Increasing angles will reduce the effect of the express lane sending one hop packets to their destination, but enough angles are needed to ensure 







































[Plot legend: Express Lane 0% Locality, Express Lane 40% Locality, Express Lane 60% Locality, Express Lane 80% Locality, Unmodified]
 
Figure 16. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express lane enhanced data vortex under conditions 
















Table 4. Accepted traffic and average latency comparisons between the unmodified 













(3,0.4) -6.70% -18.47% -2.2574 -9.69% 
(3,0.6) -4.81% -13.25% -3.4377 -14.75% 
(3,0.8) -2.78% -7.66% -4.5560 -19.55% 
(6,0.4) -0.30% -1.14% -2.7715 -8.10% 
(6,0.6) 0.38% 1.46% -3.4014 -9.94% 
(6,0.8) 1.08% 4.16% -4.1820 -12.22% 
(9,0.4) 0.59% 3.04% -2.6314 -5.69% 
(9,0.6) 0.85% 4.41% -3.7363 -8.09% 
(9,0.8) 1.17% 6.08% -4.7951 -10.38% 
 
 
The effects of changing height differ based on locality, as seen in figure 17 and table 5. High locality (0.8) shows further reductions in latency as height increases.  Lower locality values show slight reductions in latency improvement.  Greater height increases the number of cylinders, allowing greater bypass when the express lane is used.  However, unless very high locality is present, the reduced probability of a packet being at the exact height for express lane use dominates, reducing performance.  All levels of locality see 

































[Plot legend: Express Lane 0% Locality, Express Lane 40% Locality, Express Lane 60% Locality, Express Lane 80% Locality, Unmodified]
 
Figure 17. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express lane enhanced data vortex under conditions 












Table 5. Accepted traffic and average latency comparisons between the unmodified 
data vortex and express lane data vortex under conditions of locality for an angle 













(256,0.4) -0.30% -1.14% -2.7715 -8.10% 
(256,0.6) 0.38% 1.46% -3.4014 -9.94% 
(256,0.8) 1.08% 4.16% -4.182 -12.22% 
(1024,0.4) -0.43% -1.68% -3.0441 -7.52% 
(1024,0.6) 0.20% 0.77% -3.7902 -9.37% 
(1024,0.8) 0.84% 3.24% -5.7515 -14.22% 
(4096, 0.4) -0.50% -1.96% -2.6179 -5.67% 
(4096, 0.6) 0.05% 0.21% -3.3891 -7.35% 





4.2 Semi-Express Lane Modification 
 The second modification presented is a semi-express lane, seen below in figure 18.  It differs from the true express lane in that, rather than a single direct link from the outermost cylinder to the innermost cylinder, nodes now exist on every cylinder.  Links advance only one cylinder at a time and are subject to possible deflection at each cylinder.  However, this also gives multiple opportunities to use the express lane for a direct path to the output for packets that find themselves at the correct height and angle.  Changes are also made on the angle immediately preceding the semi-express lane angle: same cylinder output links now link to the express lane angle.  As in the true express lane, next angle links will bypass the semi-express lane. In terms of resources the semi-express 




Figure 18. The Semi-Express lane modification (angle in red) to the data vortex. 
 
 The Semi-express lane will be simulated under the same configurations as the true 
express lane as seen in table 1. 
 
4.2.1 Performance with random traffic 
 The semi-express lane, like the true express lane, shows poor performance in a low angle vortex, for most of the same reasons as the true express lane. The low angle count causes packet rejections from the strict forwarding rules in the semi-express lane to dominate packet acceptance and latency, as seen in figure 19 and table 6.  At five or six angles packet acceptance rises dramatically and reductions in latency are seen.  The multiple entrances to the semi-express lane allow congestion to clear in multiple cylinders by forwarding packets down at least part of the semi-express lane.  Latency reductions are not as large as with the true express lane, as additional hops in the semi-express lane reduce its impact.  As angles increase packet acceptance 

























[Plot legend: Unmodified 0.4 Load, Semi-Express Lane 0.4 Load, Unmodified 0.6 Load, Semi-Express Lane 0.6 Load]
 
Figure 19. Comparison of a) accepted traffic and b) average latency between an 











Table 6. Accepted traffic and average latency comparisons between the unmodified 




(A, Load)   Accepted traffic               Average latency 
            (Difference)   (% Increase)    (Average hops Difference)   (% Increase) 
(3,0.4)     -12.37%        -23.02%         2.8684                      13.20% 
(3,0.6)     -8.16%         -22.50%         2.683                       11.51% 
(3,0.8)     -6.08%         -22.28%         2.6342                      10.98% 
(6,0.4)     4.26%          10.93%          -1.4543                     -4.41% 
(6,0.6)     2.91%          11.21%          -1.081                      -3.16% 
(6,0.8)     2.17%          11.13%          -1.0896                     -3.12% 
(9,0.4)     5.63%          19.54%          -0.266                      -0.59% 
(9,0.6)     3.74%          19.43%          -0.2303                     -0.50% 
(9,0.8)     2.82%          19.52%          0.067                       0.14% 
 
Increasing height slightly improves packet acceptance but has a negative effect on 
latency as seen in figure 20 and table 7.  Additional height provides more cylinders 
increasing chances for express lane use.  However the total length (in hops) of the express 

































[Plot legend: Unmodified 0.4 Load, Semi-Express Lane 0.4 Load, Unmodified 0.6 Load, Semi-Express Lane 0.6 Load]
 
Figure 20. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and a semi-express lane enhanced data vortex for an angle 
count of 6. 
 
 
Table 7. Accepted traffic and average latency comparisons between the unmodified 




H       Accepted traffic               Average latency 
        (Difference)   (% Increase)    (Average hops Difference)   (% Increase) 
256     2.91%          11.21%          -1.081                      -3.16% 
1024    3.02%          11.70%          -0.2173                     -0.54% 




4.2.2 Performance with traffic exhibiting locality 
 
 As seen in figure 21 and table 8, the semi-express lane shows no improvement in a locality environment over a random environment.  The increased length and the possibility of deflection along the semi-express lane negate any advantage that locality could provide, making the semi-express lane no better than standard links in locality terms.  In fact, increasing locality increases average latency, indicating that few packets can traverse the entire semi-express lane.  Many packets are forced to leave the semi-express lane and route normally.  While traffic congestion is lowered, increasing packet acceptance, the extra hops forced on all packets not able to advance on the semi-express 


































[Plot legend: Semi-Express Lane 0% Locality, Semi-Express Lane 40% Locality, Semi-Express Lane 60% Locality, Semi-Express Lane 80% Locality, Unmodified]
 
Figure 21. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and a semi-express lane enhanced data vortex under 














Table 8. Accepted traffic and average latency comparisons between the unmodified 
data vortex and semi-express lane data vortex under conditions of locality for a 













(3,0.4) -8.25% -22.74% 4.2212 18.11% 
(3,0.6) -8.46% -23.33% 5.0333 21.60% 
(3,0.8) -8.74% -24.08% 5.8461 25.08% 
(6,0.4) 2.80% 10.76% -0.4091 -1.20% 
(6,0.6) 2.78% 10.68% -0.1042 -0.30% 
(6,0.8) 2.73% 10.51% 0.1535 0.45% 
(9,0.4) 3.77% 19.57% 0.3157 0.68% 
(9,0.6) 3.79% 19.70% 0.5062 1.10% 




The extra cylinders (from higher height) do not help the semi-express lane as seen in 
figure 22 and table 9.  True-express lanes will bypass these extra cylinders, semi-express 


































[Plot legend: Semi-Express Lane 0% Locality, Semi-Express Lane 40% Locality, Semi-Express Lane 60% Locality, Semi-Express Lane 80% Locality, Unmodified]
 
Figure 22. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express lane enhanced data vortex under conditions 











Table 9. Accepted traffic and average latency comparisons between the unmodified 
data vortex and semi-express lane data vortex under conditions of locality for an 













(256,0.4) -0.30% -1.14% -2.7715 -8.10% 
(256,0.6) 0.38% 1.46% -3.4014 -9.94% 
(256,0.8) 1.08% 4.16% -4.182 -12.22% 
(1024,0.4) -0.43% -1.68% -3.0441 -7.52% 
(1024,0.6) 0.20% 0.77% -3.7902 -9.37% 
(1024,0.8) 0.84% 3.24% -5.7515 -14.22% 
(4096, 0.4) -0.50% -1.96% -2.6179 -5.67% 
(4096, 0.6) 0.05% 0.21% -3.3891 -7.35% 




4.3 Express Output Addition 
 
 The third modification presented is the express output lane, seen below in Figure 23. The structure of the modification is very similar to the semi-express lane, but rather than forwarding packets at the right height and angle for output to the next cylinder, it immediately outputs them to the output buffer.  However, the cost of I/O links relative to non-I/O links is considerable.  A purely-routing node currently costs only about 1/10th of the price of an I/O node when utilizing SOAs, due to the expensive modulators (about $1000 each) necessary for each input wavelength and the expensive optical receivers (about $2000 per wavelength) necessary for output, versus the relatively inexpensive SOAs (about $1000 each) for switching [12,16,44]. Also, as there are multiple outputs to the same destination, there may be an increased cost in extra hardware and a slight addition in latency for buffering. However, changes in technology may allow for cheaper I/O nodes 




Figure 23.  A data vortex with six angles and a height of eight, angle one (in red) has 
express outputs. 
 
4.3.1 Performance with random (no locality) traffic 
 
 Once again poor performance is seen in low angle vortexes.  When angles increase, large improvements are seen in both accepted packets and latency, as seen in Figure 24 and table 10.  As in both previous modifications, packet acceptance improves as the angles increase but the latency improvement shrinks.  Vortexes with larger heights show slightly greater packet acceptance but decreasing latency improvements.  Express outputs combine the benefits of both express lanes and semi-express lanes. They allow immediate egress of packets in all cylinders, decreasing congestion, which in turn increases packet acceptance.  Lower latency is seen from the number of packets that 



























[Plot legend: Unmodified 0.4 Load, Express Output Lane 0.4 Load, Unmodified 0.6 Load, Express Output Lane 0.6 Load]
 
 
Figure 24. Comparison of a) accepted traffic and b) average latency between an 














Table 10. Accepted traffic and average latency comparisons between the unmodified 




(A, Load)   Accepted traffic               Average latency 
            (Difference)   (% Increase)    (Average hops Difference)   (% Increase) 
(3,0.4)     -11.69%        -21.75%         1.5718                      7.23% 
(3,0.6)     -7.60%         -20.95%         1.3103                      5.62% 
(3,0.8)     -5.61%         -20.55%         1.2251                      5.11% 
(6,0.4)     5.81%          14.91%          -3.3161                     -10.05% 
(6,0.6)     4.00%          15.39%          -2.8833                     -8.43% 
(6,0.8)     3.00%          15.38%          -2.8712                     -8.23% 
(9,0.4)     6.77%          23.47%          -1.7022                     -3.78% 
(9,0.6)     4.48%          23.30%          -1.5884                     -3.44% 
(9,0.8)     3.38%          23.45%          -1.5627                     -3.35% 
 
 As seen in figure 25 and table 11 minimal increases are seen in packet acceptance 
and larger increases in average latency.  As seen before, increasing height decreases the 
odds that a packet will be at the exact height when at an express output.  Extra cylinders 
are needed for greater height, meaning more express outputs are added. Clearing at 
multiple cylinders decreases overall congestion and increases packet acceptance.  
































[Plot legend: Unmodified 0.4 Load, Express Output Lane 0.4 Load, Unmodified 0.6 Load, Express Output Lane 0.6 Load]
 
 
Figure 25. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express lane enhanced data vortex for an angle 















Table 11. Accepted traffic and average latency comparisons between the unmodified 




H       Accepted traffic               Average latency 
        (Difference)   (% Increase)    (Average hops Difference)   (% Increase) 
256     4.00%          15.39%          -2.8833                     -8.43% 
1024    4.03%          15.63%          -2.3875                     -5.90% 




4.3.2 Performance with traffic exhibiting locality 
 
 Express outputs share the express lane's sensitivity to locality, as seen in figure 26 and 
table 12.  Under very strong locality (0.8) they even show improvements in both packet 
acceptance and latency with as few as three angles.  Increasing angles increases packet 
































[Plot legend: Express Output Lane 0% Locality, Express Output Lane 40% Locality, Express Output Lane 60% Locality, Express Output Lane 80% Locality, Unmodified]
 
Figure 26. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express output enhanced data vortex under 
















Table 12. Accepted traffic and average latency comparisons between the unmodified 













(3,0.4) -3.10% -8.54% -1.4249 -6.11% 
(3,0.6) -1.01% -2.79% -2.3041 -9.89% 
(3,0.8) 0.85% 2.34% -2.8865 -12.38% 
(6,0.4) 5.62% 21.64% -4.0669 -11.89% 
(6,0.6) 6.40% 24.62% -4.5664 -13.35% 
(6,0.8) 7.16% 27.56% -5.0218 -14.68% 
(9,0.4) 5.38% 27.94% -3.1729 -6.87% 
(9,0.6) 5.82% 30.25% -3.8904 -8.42% 
(9,0.8) 6.20% 32.19% -5.4281 -11.75% 
 
 
 Increasing height has a negligible effect on traffic acceptance but latency 
decreases as seen in table 13 and figure 27.  The combination of effects seen in express 
lanes and semi-express lanes allow for benefits in both packet acceptance and average 




































[Plot legend: Express Output Lane 0% Locality, Express Output Lane 40% Locality, Express Output Lane 60% Locality, Express Output Lane 80% Locality, Unmodified]
 
Figure 27. Comparison of a) accepted traffic and b) average latency between an 
unmodified data vortex and an express output enhanced data vortex under 












Table 13. Accepted traffic and average latency comparisons between the unmodified 
data vortex and express output data vortex under conditions of locality for an angle 












(256,0.4) 5.62% 21.64% -4.0669 -11.89% 
(256,0.6) 6.40% 24.62% -4.5664 -13.35% 
(256,0.8) 7.16% 27.56% -5.0218 -14.68% 
(1024,0.4) 5.58% 21.66% -3.6774 -9.09% 
(1024,0.6) 6.34% 24.59% -4.2251 -10.44% 
(1024,0.8) 7.06% 27.41% -4.7266 -11.68% 
(4096, 0.4) 5.56% 21.69% -2.814 -6.10% 
(4096, 0.6) 6.29% 24.54% -3.4148 -7.40% 



















CHAPTER 5: SUMMARY AND FUTURE WORK 
 
 
 To achieve higher performance in supercomputing, processor and memory counts are increasing.  The interconnection network is a critical factor in this, as new systems demand high bandwidth and low latency.  An optical network allows massive bandwidth, but current technology lacks optical buffering.  Without optical buffering, optical-electrical conversion must be included to resolve contention, a process expensive in both latency and hardware costs.  The Data Vortex network, however, circumvents the need for standard buffering by use of an all-optical path employing deflection routing to provide non-blocking communications and virtual buffering. In previous research [16] it was shown to be a viable option for large scale supercomputer interconnection network implementation.  In this thesis research, enhancements in the form of three different express lanes were introduced to improve message acceptance and reduce latency.  They are shown to improve latency and message acceptance under most circumstances when compared to the original data vortex.  Two enhancements, the express lane and the express output, allowed the Data Vortex to exploit its sensitivity to locality for further reductions in latency and gains in message acceptance. Another enhancement, the semi-express lane, was shown to have powerful effects on message acceptance rates and to reduce latency without the benefits of locality. 
 A few interesting items of future research can be drawn from the research performed.  This research concentrated on a proof of concept for the express lane enhancements and on measuring the extent of improvement in an extremely traffic-heavy environment.  Future work could focus on designing with the results.  Placing multiple express lanes, spaced out in a data vortex with sufficient angles, could produce useful results (interesting results could also be had from mixing express lanes and semi-express lanes in one data vortex).  Adding buffering angles (as seen in previous research [16]) allows up to 100% message acceptance; express lanes that increase message acceptance could be used to reduce the number of buffering angles needed.  Another concept from the previous research is the use of express lanes in clustered vortexes.  
Clustering allows for the exploitation of some locality, express lanes used in conjunction 

















1. Qimin Yang, Keren Bergman, Gary D. Hughes, and Frederick G. Johnson, 
“WDM Packet Routing for High-Capacity Data Networks,” Journal of Lightwave 
Technology, vol. 19, num. 10, pp. 1420-26, Oct. 2001. 
2. A. Shacham, B.A. Small, O. Liboiron-Ladouceur, K. Bergman, "A Fully 
Implemented 12x12 Data Vortex Optical Packet Switching Interconnection 
Network," Journal of Lightwave Technology, vol. 23, no. 10, pp. 3066-3075, Oct 
2005.  
3. Reed, Coke S., “Multiple level minimum logic network,” U.S. Patent 5,996,020, 
Nov. 30, 1999. 
 
4. Qimin Yang and Keren Bergman, “Performances of the Data Vortex Switch 
Architecture under Nonuniform and Bursty Traffic,” Journal of Lightwave 
Technology, vol. 20, no. 8, pp. 1242-47, Aug 2002. 
5. Reed, Coke S., “Multiple level minimum logic network,” U.S. Patent 6,272,141, 
Aug. 7, 2001. 
6. Qimin Yang and Keren Bergman, “Traffic Control and WDM Routing in the Data 
Vortex Packet Switch,” IEEE Photonics Technology Letters, vol. 14, no. 2, pp. 
236-38, Feb 2002. 
7. B.A. Small, J.N. Kutz, W. Lu, K. Bergman, "Characterizing and Simulating the 
Performance of the Physical Layer of Data Vortex Switching Nodes," LEOS 
2003, MF5, pp. 59-60, Oct 2003. 
8. W. Lu, K. Bergman, Q. Yang, "WDM Routing with Low Cross-Talk in the Data 
Vortex Packet Switching Fabric," OFC 2003, vol. 2, FS4, pp. 795-97, Mar 2003. 
9. Macias, M.I.; Turkiewicz, J.P.; Vegas Olmos, J.J.; Koonen, A.M.J.; Tafur 
Monroy, I., “High-throughput, self-routing, optical switch for photonic slot 
routing,” Proceedings London Communications Symposium 2003, 8-9 September 
2003, ISBN 0-0538863-2-6; Communications Engineering Doctorate Centre, 
University College London, pp. 249-53, ECO-3, 2003. 
10. Macias, M.I.; Turkiewicz, J.P.; Vegas Olmos, J.J.; Koonen, A.M.J.; Tafur 
Monroy, I., “A Novel Data Vortex Switch for Photonic Slot Routing,” 
Proceedings European Conference on Optical Communication 2003, 21-25 
September 2003, Rimini, Italy, Tu1. 4.2., ECO-3, 2003. 
11. W. Lu, B.A. Small, K. Bergman, L. Leng, "Ultra-high Capacity WDM Optical 
Packet Routing through an 8-Node Data Vortex Sub-network," OFC 2004, MF94 
(poster), pp. 281-83, Mar 2004. 
12. W. Lu, O. Liboiron-Ladouceur, B.A. Small, K. Bergman, "Cascading switching 
nodes in data vortex optical packet interconnection network," Electron. Lett., vol. 
40, no. 14, pp. 895-96, 8 Jul 2004. 
13. W. Lu, B.A. Small, J.P. Mack, L. Leng, K. Bergman, "Optical Packet Routing and 
Virtual Buffering in an Eight-Node Data Vortex Switching Fabric," IEEE 
Photonics Technol. Lett., vol. 16, no. 8, pp. 1981-83, Aug 2004. 
14. Cory Hawkins, B.A. Small, D.S. Wills, K. Bergman, "The Data Vortex, an All 
Optical Path Multicomputer Interconnection Network," IEEE Transactions on 
Parallel and Distributed Systems (TPDS), submitted in Feb. 2005 for review, 
accepted April 2006 for publication. 
 
15. Cory Hawkins and D.S. Wills, "Impact of Number of Angles on the Performance 
of the Data Vortex Optical Interconnection Network," IEEE/OSA Journal of 
Lightwave Technology (JLT), vol. 24, no. 9, Sept. 2006. 
16. Cory Hawkins, “Performance Analysis, Comparisons, and Proposed 
Modifications to the Data Vortex Photonic All-Optical Path Interconnection 
Network for Next-Generation Supercomputers,” Doctoral Dissertation, May 2007. 
 
17. Kim Hongkyu, “Architectural Enhancements for Efficient Operand Transport in 
Multimedia Systems,” Doctoral Dissertation, May 2007. 
 
18. William J. Dally, “Performance Analysis of k-ary n-cube Interconnection 
Networks,” IEEE Transactions on Computers, vol. 39, no. 6, June 1990. 
 
19. William J. Dally, “Express Cubes: Improving the Performance of k-ary n-cube 
Interconnection Networks,” IEEE Transactions on Computers, vol. 40, no. 9, September 
1991. 
 
20. Loucif, S., Mackenzie, L.M., Ould-Khaoua, R., “The ‘express channel’ concept in 
hypermeshes and k-ary n-cubes,” Parallel and Distributed Processing, 1996. 
Eighth IEEE Symposium on, pp. 566-569, 23-26 Oct. 1996. 
 
21. Ould-Khaoua, M., Sotudeh, R., “Communication locality in hypermeshes and 
tori,” Algorithms and Architectures for Parallel Processing, 1996. ICAPP '96. 
1996 IEEE Second International Conference on, pp. 256-262, 11-13 June 1996. 
 
22. B.A. Small, A. Shacham, K. Bergman, K. Athikulwongse, C. Hawkins, D.S. 
Wills, "Emulation of Realistic Network Traffic Patterns on an Eight-Node Data 
Vortex Interconnection Network Subsystem," Journal of Optical Networking, vol. 
3, no. 11, pp. 802-09, Nov 2004. 
 
23. TOP500.org, “lists for November 2007”, http://www.top500.org/lists/2007/11 
 
24. K.C. Kao and G.A. Hockham, "Dielectric-Fiber Surface Waveguides for Optical 
Frequencies," Proceedings of the Institution of Electrical Engineers, vol. 113, 
pp. 1151-1158, July 1966. 
 
25. T.H. Maiman, "Stimulated Optical Radiation in Ruby," Nature, vol. 187, 
pp. 493-494, August 1960. 
 
26. Libatique, N.J.C.; Jain, R.K., “Large channel count (~60) wavelength-selectable 1.5 
µm laser for 50 GHz WDM applications,” IEEE Lasers and Electro-Optics 
Society 2000 Annual Meeting, LEOS 2000, vol. 2, pp. 403-04, Nov. 13-16, 
2000.  
 
27. Rigby, Pauline, “Essex Claims 4000-Channel DWDM,” Light Reading Online, 
http://www.lightreading.com, December 5, 2000.  
 
28. P. Baran, “On Distributed Communications Networks,” IEEE Transactions on 
Communications Systems, pp. 1-9, March 1964.  
 
29. Acampora, A.S. and Shah, S.I.A., “Multihop lightwave networks: a comparison of 
store-and-forward and hot-potato routing,” Proceedings of the Joint Conference of 
the IEEE Computer and Communications Societies (INFOCOM) '91, vol. 1, pp. 
10-19, April 7-11, 1991. 
 
30. Burton J. Smith, “Redressing the Balance,” technical presentation, Cray Inc., 
available online: http://www.lanl.gov/orgs/ccs/salishan02/burton.ppt 
 
31. A. S. Acampora, M.J. Karol, and M.G. Hluchyj, “Terabit Lightwave Networks: 
The Multihop Approach,” AT&T Technical Journal, vol. 66, no. 6, pp. 21-34, 
November/December 1987. 
32. D. K. Pradhan and S. M. Reddy, “A fault-tolerant communication architecture for 
distributed systems,” IEEE Transactions on Computers, vol. C-31, pp. 863-870, 
Sept. 1982. 
33. Samatham, M.R. and Pradhan, D.K., “The de Bruijn multiprocessor network: a 
versatile parallel processing and sorting network for VLSI,” IEEE Transactions 
on Computers, vol. 38, no. 4, pp. 567–581, Apr. 1989. 
34. N. F. Maxemchuk, “The Manhattan Street Network,” Proceedings of IEEE Global 
Telecommunications Conference (GLOBECOM) ’85, New Orleans, LA, pp. 255-
261, Dec. 1985. 
35. Kodi, A.K. and Louri, A., “RAPID: reconfigurable and scalable all-photonic 
interconnect for distributed shared memory multiprocessors,” IEEE/OSA Journal 
of Lightwave Technology, vol. 22, no. 9, pp. 2101-2110, Sept. 2004. 
36. A. S. Acampora, “A Multichannel Multihop Local Lightwave Network,” 
Proceedings of the Global Telecommunications Conference (GLOBECOM) ’87, 
pp. 1459-1467, November 1987. 
37. H.S. Stone, “Parallel processing with the perfect shuffle,” IEEE Transactions on 
Computers, vol. 20, no. 6, pp. 57-65, June 1975. 
38. Louri, A. and Hongki Sung, “A hypercube-based optical interconnection network: 
a solution to the scalability requirements for massively parallel computers,” 
Proceedings of the First International Workshop on Massively Parallel Processing 
Using Optical Interconnections (MPPOI) ’94, pp. 81–93, April 26-27, 1994. 
39. Hayes, J.P. and Mudge, T., “Hypercube supercomputers,” Proceedings of the 
IEEE, vol. 77, no. 12, pp. 1829-1841, Dec. 1989. 
40. A.K. Kodi and A. Louri, “A scalable architecture for distributed shared memory 
multiprocessors using optical interconnects,” 18th International Parallel and 
Distributed Processing Symposium 2004, pp. 11-21, Apr. 26-30, 2004. 
41. Murphy, E.J.; Kemmerer, C.T.; Moser, D.T.; Serbin, M.R.; Watson, J.E.; and 
Stoddard, P.L., “Uniform 8×8 lithium niobate switch arrays,” IEEE/OSA Journal 
of Lightwave Technology, vol. 13, no. 5, pp. 967-970, May 1995. 
42. B.A. Small and K. Bergman, "Slot Timing Considerations in Optical Packet 
Switching Networks," IEEE Photon. Technol. Lett., vol. 17, no. 11, pp. 2478-2480, 
Nov 2005. 
43. J. P. Singh, W. Weber, and A. Gupta, “SPLASH: Stanford Parallel Applications 
for Shared Memory,” Technical Report, Computer Systems Laboratory, Stanford 
University, 1991. 
44. B.A. Small, O. Liboiron-Ladouceur, A. Shacham, J.P. Mack, and K. Bergman, 
"Demonstration of a Complete 12-Port Terabit Capacity Optical Packet Switching 
Fabric," OFC 2005, OWK1, Mar. 2005. 
 
