An Efficient and Low Density Crossbar Switch Design for NoC by SATYANARAYANA, DONGA  PURNA & SRIVIDYA, S.
Donga Purna Satyanarayana* et al. 
 (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH 
Volume No.7, Issue No.6, October-November 2019, 9348-9356.  
2320 –5547 @ 2013-2019 http://www.ijitr.com All rights Reserved. Page | 9348 
An Efficient and Low Density Crossbar Switch Design for NoC 
DONGA  PURNA SATYANARAYANA 
M.Tech Scholar, Department of Electronics and 
Communications Engineering, Chaitanya Institute of 
Science & Technology, India. 
 
S.SRIVIDYA 
Assistant Professor, Electronics & Communications 
Engineering, Chaitanya Institute of Science & 
Technology, India. 
Abstract: Code Division Multiple Access (CDMA) is a form of multiplexing that allows several signals
to occupy a single transmission channel. In a CDMA interconnect, the medium is shared in the code space by assigning
a limited number of N-chip-long orthogonal spreading codes to the processing elements (PEs) sharing the
interconnect. Serial and parallel overloaded CDMA interconnect (OCI) architecture variants are
presented to meet different area, delay, and power requirements. Compared with the conventional
CDMA crossbar, on a Xilinx Artix-7 AC701 FPGA kit, the serial OCI crossbar achieves 100%
higher bandwidth, 31% less resource utilization, and 45% power saving, while the parallel OCI crossbar
achieves N times higher bandwidth compared with the serial OCI crossbar at the expense of increased
area and power consumption. A 65-node OCI-based star NoC is implemented, evaluated, and compared
with an equivalent space division multiple access based torus NoC for various synthetic traffic patterns.
The evaluation results in terms of the resource utilization and throughput highlight the OCI as a
promising technology to implement the physical layer of NoC routers.
Terms: Overloaded CDMA Interconnect (OCI); Network on Chip (NoC); 
I. Introduction 
A network on a chip or Network-on-Chip (NoC) is a 
network-based communications subsystem on an 
integrated circuit ("microchip"), most typically 
between modules in a system on a chip (SoC). NoC 
technology applies the theory and methods of 
computer networking to on-chip communication and 
brings notable improvements over conventional bus 
and crossbar communication architectures. A crossbar 
switch is a shared communication medium adopting a 
multiple access technique to enable physical packet 
exchange. The main resource sharing techniques 
adopted by existing NoC crossbars are time-division 
multiple access (TDMA), where the physical link is 
time shared between the interconnected PEs, and 
space-division multiple access (SDMA), where a 
dedicated link is established between every pair of 
interconnected PEs. The physical layer of a NoC 
router also contains buffering and storage devices. 
Code-division multiple access (CDMA) is another 
medium sharing technique that leverages the code 
space to enable simultaneous medium access. In 
CDMA channels, each transmit–receive (TX-RX) pair
is assigned a unique bipolar spreading code, and the
data spread from all transmitters are summed in an
additive communication channel. CDMA has been
proposed as an on-chip interconnect sharing
technique for both bus and NoC interconnect
architectures. The advantages of using CDMA for
on-chip interconnects include reduced power
consumption, fixed communication latency, and 
reduced system complexity. A CDMA switch has less 
wiring complexity than an SDMA crossbar and less 
arbitration overhead than a TDMA switch, and thus 
provides a good compromise of both. 
Overloaded CDMA is a well-known medium access 
technique deployed in wireless communications 
where the number of users sharing the communication 
channel is boosted by increasing the number of usable 
spreading codes at the expense of increasing multiple-
access interference (MAI). The overloaded CDMA 
concept can be applied to on-chip interconnects to 
increase the interconnect capacity. 
In this paper, we apply the overloaded CDMA 
concept to NoCs and advance a novel overloaded 
CDMA interconnect (OCI) crossbar architecture to 
increase the CDMA router capacity by 100% at 
marginal cost. Crossbar overloading relies on 
exploiting special properties of the used orthogonal 
spreading code set, namely, Walsh–Hadamard 
codes, to add a set of nonorthogonal spreading
codes that can be uniquely identified on the receiver 
side. 
The contributions of this paper are as follows. 
1) Introduce two novel approaches that can be 
deployed in CDMA NoC crossbars to increase 
the router capacity and, consequently, 
bandwidth by 100% at marginal cost. 
2) Present the OCI mathematical foundations, 
spreading code generation procedures, and 
OCI-based router architectures. 
3) Develop and evaluate the OCI-based routers 
built on a Xilinx Artix-7 AC701 evaluation kit 
and using a 65nm ASIC technology for several 
synthetic traffic patterns and compare their
latency, bandwidth, and power consumption 
with the basic CDMA and SDMA switching 
topologies. 
 
II. Related work 
 
Typical examples are the 48-core SCC
processor [1] and the 64-core Tile64 [2] chip
multiprocessor. Both of these designs use
packet-switched (PS) NoCs, which bring the advantages of
high flexibility and high bandwidth to
communications. However, this benefit is
achieved by using a complex router
pipeline, which leads to high latency
and power consumption. In
comparison with PS NoCs, circuit switching [3] can
significantly lower the communication
latency and power consumption, because
routing and arbitration are not required once
circuits are set up. Circuit switching, however, lacks
flexibility. If several communications
compete for a common physical channel, the circuits are set
up in turn, and the long setup
time decreases the overall performance. To
solve the problems of circuit switching and packet
switching, a hybrid switching mechanism that combines
packet switching and circuit switching has been
proposed. It not only provides high
flexibility to communications but also
improves the timing of NoCs by setting up circuit-switched
(CS) connections between communication pairs.
Setting up CS connections on a PS
network can also reduce communication power. Before
the circuits have been set up, packets are transmitted on
PS connections to offset the long
setup delay of the circuits. However, the fact
that CS connections are not allowed to share a
common physical channel limits the number of CS
connections. If several packet
transmissions compete for a common physical channel,
only one packet transmission can be carried out by
circuit switching, and the other packets must travel on
PS connections. In this paper, we propose a novel
hybrid scheme called virtual circuit switching that is
combined with circuit switching and packet switching.
In virtual circuit switching, virtual channels are
exploited to form multiple virtual CS (VCS)
connections by storing the interconnect information in the
routers. The main advantage of this mechanism is
that it has a router pipeline similar to that of
circuit switching while allowing multiple VCS
connections to share a common physical channel. To
support the presented hybrid scheme, a modified router
architecture is implemented based on the baseline
router, and the corresponding switching mechanism is
presented.
III. Design Implementation
 This section presents overloaded CDMA in wireless
communications, the requirements of its on-chip
interconnect counterpart, and the preliminaries of the
classical on-chip CDMA switch.
A. Overloaded CDMA in Wireless Communications 
Direct sequence spread spectrum CDMA 
(DSSS-CDMA) is a leading approach for medium 
sharing in wireless communications where a set of 
orthogonal spreading codes composed of a stream of 
chips of length N are multiplied by the transmitted 
data bits such that each data bit is spread in N cycles. 
A unique spreading code is assigned to every TX-RX 
pair sharing the communication channel. Data streams 
of users sharing the channel are spread and 
simultaneously transmitted to an additive 
communication channel. Despreading is achieved by 
applying the correlation operation to the received 
sum, where each receiver can extract its data by 
correlating it with the assigned spreading code. 
Orthogonality between spreading codes guarantees 
unique identification of every code received in the 
channel sum by exploiting the associative and 
distributive properties of the addition operation 
carried out by the communication channel. In wireless 
communications, random effects such as noise, 
fading, and multipath arising in the communication 
channel affect proper identification of the received 
sum, which increases the bit error rate (BER) of the 
received data. 
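The spreading and despreading steps described above can be illustrated with a minimal sketch. It assumes an ideal noiseless additive channel, bipolar (+1/-1) data symbols, and Walsh–Hadamard codes built with the Sylvester construction; the function names are illustrative and do not come from the paper.

```python
def hadamard(n):
    """Build an n x n Walsh-Hadamard matrix with +1/-1 entries (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def spread(symbol, code):
    """Multiply one bipolar data symbol (+1 or -1) by every chip of its code."""
    return [symbol * chip for chip in code]


def channel_sum(streams):
    """Additive channel: chip-wise sum of all simultaneously transmitted streams."""
    return [sum(chips) for chips in zip(*streams)]


def despread(received, code):
    """Correlate the received sum with one code and take the sign of the result."""
    correlation = sum(r * c for r, c in zip(received, code))
    return 1 if correlation > 0 else -1


N = 8
codes = hadamard(N)
data = [1, -1, -1, 1, 1, -1, 1, -1]          # one symbol per TX-RX pair
received = channel_sum([spread(d, c) for d, c in zip(data, codes)])
decoded = [despread(received, c) for c in codes]
```

Because the codes are mutually orthogonal, every cross-correlation term vanishes and each correlator output equals N times the symbol sent on that code, so `decoded` matches `data` in this noiseless setting.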
Unfortunately, the number of orthogonal codes in a
spreading code set is usually limited to the spreading
code length N, which reduces the channel utilization
efficiency. Overloaded CDMA has been proposed in
the wireless communication literature to increase the
number of spreading codes by adding nonorthogonal
codes that can be identified on the receiver side [13].
Increasing the channel utilization comes at the 
expense of relaxing the orthogonality requirements of 
the spreading codes and increasing MAI, which 
consequently increases the BER. The proposed 
overloaded CDMA spreading codes in wireless 
communications are accompanied with complicated 
receiver structures making use of multiuser detection 
instead of the simple correlator or matched filter 
receiver employed in basic DSSS-CDMA. 
 
Fig. 1 CDMA NoC Router Architecture 
In this paper, we apply the overloaded CDMA 
concepts developed in the wireless communication 
field to on-chip interconnects to increase the CDMA-
based NoC capacity.  
 
Fig. 2 Classical CDMA Crossbar 
However, on-chip interconnects are significantly 
different from wireless communication channels on 
both the characteristic and requirement levels. In the 
following, basic features of overloaded CDMA will 
be enumerated from the on-chip interconnect
perspective to sum up the OCI design considerations. 
1) Overloaded CDMA is a medium access technique 
deployed in wireless communications based on DSSS-
CDMA. 
2) The complexity of wireless overloaded CDMA 
limits its applicability for on-chip interconnects, 
which require simple communication schemes to meet 
the performance requirements. 
3) Although wireless CDMA is usually adopted in
conjunction with other modulation techniques, only
baseband binary CDMA is considered for on-chip 
interconnects, which can be directly implemented in 
digital platforms such as FPGAs. 
4) Because only digital on-chip interconnects are 
considered, random effects arising in analog 
communication channels such as noise, fading, and 
MAI can be efficiently mitigated using error detection 
and correction techniques. Therefore, such random 
effects are neglected in this paper. 
5) Consequently, due to the last two assumptions, the 
complexity of the CDMA receivers can be 
significantly reduced to fit the on-chip interconnect 
requirements. 
IV. Overloaded CDMA Interconnect
Fig. 1 illustrates the high-level architecture of the
CDMA-based NoC router. The CDMA router has M
transmit/receive ports. The main difference between
the overloaded and classical CDMA routers is that M
> N − 1 for the former due to channel overloading. Each
PE is connected to two network interfaces (NIs), the
transmit and receive NI modules. During packet
transmission from a PE, the packet is divided into flits
that are stored in the transmit NI first-input first-output
(FIFO). The router arbiter then selects at most M winning flits
from the top of the NI FIFOs to be transmitted
during the current transaction. The selected flits must
all have distinct destination addresses to prevent
conflicts, and the winner between two conflicting flits is
selected according to the router’s priority scheme. The
employed scheme is a fixed winner-takes-all priority
scheme: only one of the conflicting transmitters
is given a spreading code and is acknowledged to start
encoding. Once arbitration is done, the router assigns CDMA codes
to each transmit and receive NI. NIs with empty
FIFOs or conflicting destinations are assigned all-zero
CDMA codes such that they do not contribute MAI to
the CDMA channel sum. Afterward, flits from each
NI are spread by the CDMA codes in the encoder
module.
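The arbitration step can be sketched as follows. This is an illustration under assumptions not stated in the text: the fixed priority is taken to be the port index (lower index wins), code index 0 stands for the all-zero idle code, and the function name `arbitrate` is hypothetical.

```python
def arbitrate(head_destinations):
    """Pick the winning flits for one crossbar transaction.

    head_destinations[p] is the destination port of the flit at the head of
    transmit FIFO p, or None if that FIFO is empty.
    Returns a dict mapping each winning source port to a spreading-code index.
    """
    taken = set()                 # destinations already granted in this transaction
    grants = {}
    next_code = 1                 # assumption: index 0 denotes the all-zero (idle) code
    for port, dest in enumerate(head_destinations):   # assumption: lower index = higher priority
        if dest is None or dest in taken:
            continue              # empty FIFO or conflict: port keeps the all-zero code
        taken.add(dest)
        grants[port] = next_code
        next_code += 1
    return grants


# Ports 0 and 2 both target destination 2; port 1 has nothing to send.
grants = arbitrate([2, None, 2, 0, 3])
```

In this example port 0 wins the conflict over destination 2, so ports 0, 3, and 4 receive codes while ports 1 and 2 stay idle for the transaction.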
The data are spread into N chips, where N is the 
CDMA code length that equals the number of clock 
cycles in a single crossbar transaction. Spread data 
chips from all encoders are summed by the CDMA 
crossbar adder and the sum is sent out serially to all 
decoders. The encoding/decoding process lasts for N 
clock cycles synchronized via a counter. At each 
decoder, the assigned code is cross correlated with the 
received sum to decode the data from the summed 
chips. The decoded flits are stored in the receive NI 
FIFOs until they are read by the PEs. In this paper, we 
focus on the high-level architecture and 
implementation details of the overloaded CDMA 
crossbar represented by the gray block in Fig. 1. 
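A cycle-by-cycle sketch of one classical crossbar transaction is given below. It assumes binary Walsh codes, XOR encoders, and an accumulator decoder that adds the channel sum when the code chip is 0 and subtracts it when the chip is 1; the sign convention and function names are assumptions chosen for illustration.

```python
def walsh(n):
    """Binary Walsh codes: chip i of code j is the parity of (j AND i)."""
    return [[bin(j & i).count("1") % 2 for i in range(n)] for j in range(n)]


def transaction(data_bits, codes):
    """Run one N-cycle crossbar transaction and return the decoded bits."""
    n = len(codes[0])
    accumulators = [0] * len(codes)
    for cycle in range(n):                                        # shared chip counter
        chips = [d ^ c[cycle] for d, c in zip(data_bits, codes)]  # XOR encoders
        total = sum(chips)                                        # crossbar adder
        for k, c in enumerate(codes):                             # accumulator decoders
            accumulators[k] += total if c[cycle] == 0 else -total
    # Each accumulator ends at +N/2 for a "1" and -N/2 for a "0".
    return [1 if acc > 0 else 0 for acc in accumulators]


N = 8
codes = walsh(N)[1:]          # skip the all-zero code, leaving N - 1 codes
sent = [1, 0, 0, 1, 1, 0, 1]
received = transaction(sent, codes)
```

Each decoder needs only an up/down accumulator and the shared counter, which is the simplicity the accumulator-based design relies on.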
A store and forward flow control and a deterministic 
routing algorithm are employed in the OCI router. 
The routing algorithm lies at the network layer, which 
is a higher layer than the physical layer containing the 
crossbar switch. According to the OSI model design 
principles, each layer of the model exists as an 
independent layer. Theoretically, one can substitute 
one protocol for another at any given layer without 
affecting the operation of layers above or below. 
Thus, using the same flow control protocol and 
routing algorithm enables comparing the OCI-based 
router with SDMA- and TDMA-based routers. 
A. OCI Crossbar High-Level Architecture 
The main objective of this paper is increasing the 
number of ports sharing the ordinary CDMA crossbar 
presented in [17], while keeping the system 
complexity unchanged using simple encoding 
circuitry and relying on the accumulator decoder with 
minimal changes. To achieve this goal, some 
modifications to the classical CDMA crossbar are 
advanced. Fig. 2 depicts the high-level architecture of 
the OCI crossbar for a single-bit interconnection. The 
same architecture is replicated for a multibit CDMA 
router. M TX-RX ports share the CDMA router, 
where spread data from the transmit ports are added 
using an arithmetic binary adder having M
binary inputs
and an m-bit output, where m = ⌈log2 M⌉. The adder is
implemented in both the reference and pipelined
architectures. A controller block is used for code
assignment and arbitration tasks. Each PE is
interfaced to an encoder/decoder wrapper enabling
data spreading/despreading.
Unlike orthogonal spreading codes, which are XORed 
with the binary data bit, an AND gate is utilized to 
spread data using nonorthogonal spreading codes. The 
AND gate encoder works as follows: if the transmitted 
data bit is “0,” it sends a stream of zeros during the 
whole spreading cycle, which does not cause MAI on 
the channel; if the transmitted data bit is “1,” the 
encoder sends a nonorthogonal spreading code. 
Therefore, the additional MAI spreading code will 
either contribute an MAI value of one or zero each 
clock cycle because the encoder is an AND gate. The 
XOR encoder of the ordinary CDMA crossbar cannot 
be used to encode the OCI codes because it only 
complements the spreading code chips, so an XOR 
gate will cause MAI to the crossbar whether the data 
bit is  “0”  or “1.” A hybrid encoder is developed for 
both orthogonal and nonorthogonal spreading with an 
XOR gate, an AND gate, and a multiplexer unit, as 
shown in Fig. 2. Two decoder types are implemented 
for orthogonal and nonorthogonal data. More details 
about each component of the OCI crossbar will be 
presented in Section IV-C after describing the OCI 
code design procedures and decoding scheme in 
Section IV-B. 
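The hybrid encoder described above reduces to one XOR gate, one AND gate, and a two-input multiplexer per chip. The following sketch models a single chip of that logic; the function and parameter names are illustrative.

```python
def hybrid_encode(data_bit, code_chip, orthogonal):
    """One chip of the hybrid encoder.

    orthogonal=True selects the XOR path (orthogonal spreading);
    orthogonal=False selects the AND path (nonorthogonal spreading).
    """
    xor_out = data_bit ^ code_chip      # complements the code chips when data_bit is 1
    and_out = data_bit & code_chip      # passes the code only when data_bit is 1
    return xor_out if orthogonal else and_out   # multiplexer


orth_code = [0, 1, 0, 1]
nonorth_code = [0, 0, 1, 0]

orth_one = [hybrid_encode(1, c, True) for c in orth_code]
nonorth_zero = [hybrid_encode(0, c, False) for c in nonorth_code]
nonorth_one = [hybrid_encode(1, c, False) for c in nonorth_code]
```

With the AND path, a data bit of "0" produces an all-zero stream and therefore adds nothing to the channel sum, whereas the XOR path always emits either the code or its complement.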
B. OCI Code Design 
The Walsh–Hadamard spreading code family has 
a featured property that enables CDMA interconnect 
overloading. The difference between any
consecutive channel sums of data spread by the
orthogonal spreading codes for an odd number of
TX-RX pairs M is always even, regardless of the
spread data. This property means that for the N − 1
TX-RX pairs using the Walsh orthogonal codes,
one can encode an additional N − 1 data bits in the
consecutive differences between the N chips
composing the orthogonal code. Thus, exploiting
this property enables adding 100% more spreading codes
in the form of nonorthogonal codes, which can double the capacity of
the ordinary CDMA crossbar. In this section, the
code design methodology, mathematical 
foundations, and the decoding details of the OCI 
codes are provided. The notations used throughout 
this paper are listed in Table I. 
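The even-difference property can be checked exhaustively for a small code length. The sketch below assumes that all N − 1 non-constant binary Walsh codes are in use with XOR encoders, which is the case where the property is easy to verify; it is a numerical check, not a proof.

```python
from itertools import product


def walsh(n):
    """Binary Walsh codes: chip i of code j is the parity of (j AND i)."""
    return [[bin(j & i).count("1") % 2 for i in range(n)] for j in range(n)]


def consecutive_differences(data_bits, codes):
    """Differences between consecutive channel sums of XOR-spread data."""
    n = len(codes[0])
    sums = [sum(d ^ c[i] for d, c in zip(data_bits, codes)) for i in range(n)]
    return [b - a for a, b in zip(sums, sums[1:])]


N = 8
codes = walsh(N)[1:]                       # the N - 1 = 7 non-constant codes
all_even = all(
    diff % 2 == 0
    for bits in product((0, 1), repeat=N - 1)       # every possible data word
    for diff in consecutive_differences(bits, codes)
)
```

For N = 8, `all_even` holds for all 128 data words, which means the parity of the channel sum stays constant over the spreading cycle and can therefore carry extra information.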
 
 
Fig. 2 High level Architecture and building blocks of 
the OCI Crossbar  
 
a) T-OCI nonorthogonal decoder
b) P-OCI nonorthogonal decoder
c) T-OCI pipelined crossbar tree adder, N-times of
P-OCI.
d) P-OCI orthogonal decoder.
e) T-OCI orthogonal decoder.
 
Fig. 3. Encoding/decoding of three orthogonal codes and two 
T-OCI codes. 
An AND gate encoder is used to encode data with 
nonorthogonal spreading codes as shown in Fig. 2(a). 
Therefore, for a nonorthogonal encoder, if the data bit to
transmit is one, a single spreading chip at a specific
time slot in the spreading cycle is added to the
channel sum, which causes the consecutive sum
difference to deviate from its even value. The nonorthogonal codes
imitate the TDMA signaling scheme as each code is
composed of a single chip of “1” sent in a specific
time slot. The encoding/decoding scheme presented in
this paper provides a novel approach that enables
coexistence between CDMA and TDMA signals in
the same shared medium. Therefore, the developed
encoder is called TDMA overloaded on CDMA
interconnect (T-OCI). Fig. 3 shows an
encoding/decoding example of two T-OCI codes for
a spreading code of length N = 8. An odd number of
orthogonal codes must be used simultaneously to
preserve the even difference property of Walsh codes.
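An end-to-end T-OCI transaction can be sketched as below. Several points are assumptions made for illustration rather than details confirmed by the text: all N − 1 non-constant Walsh codes are active, the first time slot carries no TDMA chip, TDMA bits are recovered by comparing the parity of each slot with the first slot, and the orthogonal decoder decides "1" when its accumulator is nonnegative.

```python
def walsh(n):
    """Binary Walsh codes: chip i of code j is the parity of (j AND i)."""
    return [[bin(j & i).count("1") % 2 for i in range(n)] for j in range(n)]


def t_oci_channel(orth_bits, tdma_bits, codes):
    """Channel sum for one transaction.

    orth_bits[j] is spread by codes[j] through an XOR encoder.
    tdma_bits[t] is sent as a single chip in time slot t + 1 (AND encoder);
    slot 0 is assumed to carry no TDMA chip.
    """
    n = len(codes[0])
    sums = []
    for slot in range(n):
        orthogonal = sum(d ^ c[slot] for d, c in zip(orth_bits, codes))
        tdma = tdma_bits[slot - 1] if slot >= 1 else 0
        sums.append(orthogonal + tdma)
    return sums


def decode_tdma(sums):
    """A TDMA bit is 1 when its slot's sum differs in parity from slot 0."""
    return [(s - sums[0]) % 2 for s in sums[1:]]


def decode_orthogonal(sums, code):
    """Accumulate +sum on chip 0 and -sum on chip 1, then take the sign."""
    acc = sum(s if chip == 0 else -s for s, chip in zip(sums, code))
    return 1 if acc >= 0 else 0


N = 8
codes = walsh(N)[1:]                       # the N - 1 non-constant codes
orth_bits = [1, 0, 1, 1, 0, 0, 1]
tdma_bits = [0, 1, 1, 0, 1, 0, 0]
sums = t_oci_channel(orth_bits, tdma_bits, codes)
decoded_orth = [decode_orthogonal(sums, c) for c in codes]
decoded_tdma = decode_tdma(sums)
```

Under these assumptions both `decoded_orth` and `decoded_tdma` reproduce the transmitted bits, so 2(N − 1) = 14 bits cross the channel in the N = 8 cycles that previously carried 7.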
TDMA codes cause MAI to the sum of 
CDMA spread data. The equation of the crossbar sum 
for both CDMA and TDMA encoded data can be 
written as 
 
where S is the N-cycle waveform of the channel
sum, dC(j) is the orthogonal CDMA data bit sent by
the jth user, dT(j) is the nonorthogonal TDMA data
bit sent by the jth user, Co(j) is the orthogonal code
assigned to the jth user, and T(j − N + 1) is the
TDMA code assigned to the jth user. The TDMA
code T(i) is a single chip of “1” assigned at the ith
time slot. The TDMA term of the equation is the sum
of products of TDMA chips and their corresponding
data bits. This term can be viewed as another N-chip
spreading code added to the orthogonal spread data
represented by the first term of the equation. It should
be noted that the first chip of the TDMA MAI
code is always set to zero (T(1) = 0), and the
remaining N − 1 chips are assigned according to the
encoded data bits; this observation is the key to properly
decoding both orthogonal and nonorthogonal spread
data. Equation (6) can be rewritten as follows
 
 
where Cn(dT) is the TDMA MAI code as a function
of the nonorthogonal data. The number of the
crossbar adder output bits is m = log2 N + 1 even
though the number of adder inputs is 2(N − 1), which
is the total number of orthogonal and
nonorthogonal TX-RX pairs sharing the OCI
crossbar. This is because at any time instance, at most
N inputs can have a value of “1” in the T-
OCI encoding scheme. The number of the adder
output bits is specifically important because it
directly determines the crossbar wiring density.
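The adder width can be checked by brute force for a small code length. The sketch assumes the same encoding model as the text (XOR encoders for the N − 1 orthogonal codes, one TDMA chip per time slot after the first) and enumerates every data combination for N = 4; names are illustrative.

```python
from itertools import product


def walsh(n):
    """Binary Walsh codes: chip i of code j is the parity of (j AND i)."""
    return [[bin(j & i).count("1") % 2 for i in range(n)] for j in range(n)]


N = 4
codes = walsh(N)[1:]                        # N - 1 orthogonal codes
max_sum = 0
for bits in product((0, 1), repeat=2 * (N - 1)):      # all 2(N - 1) adder inputs
    orth_bits, tdma_bits = bits[:N - 1], bits[N - 1:]
    for slot in range(N):
        orthogonal = sum(d ^ c[slot] for d, c in zip(orth_bits, codes))
        tdma = tdma_bits[slot - 1] if slot >= 1 else 0   # slot 0 has no TDMA chip
        max_sum = max(max_sum, orthogonal + tdma)

adder_output_bits = max_sum.bit_length()    # bits needed to represent max_sum
```

The largest channel sum is N, not 2(N − 1), so the adder output needs log2 N + 1 bits (3 bits for N = 4).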
Orthogonal spread data can still be decoded properly
using the accumulator-based correlator. Despreading
of the kth orthogonal spread data is achieved by
multiplying the crossbar sum by the kth orthogonal
spreading code as follows:
 
The first term of (8) is the autocorrelation term, which 
is equal to ±N/2 according to the data spread dC , 
while the second term is the cross correlation between 
the orthogonal spreading code Co(k) and the 
nonorthogonal MAI TDMA code Cn (dT). The 
maximum MAI value contributed by the second term 
is ±N/2 because the MAI code is correlated with a 
balanced orthogonal code, where the number of “1” 
chips is equal to the number of “0” chips and equals 
N/2. This case can only occur if the MAI TDMA code 
constructed by the nonorthogonal encoded data is
identical to Co(k) or to its complement, which
yields ±N/2, respectively.
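One way to write the channel sum and the correlator output consistently with the symbol definitions above is sketched here. It is a reconstruction under assumptions and not necessarily the authors' exact notation: the index sets \mathcal{O} and \mathcal{T} (orthogonal and TDMA users) are introduced only for this sketch, and the equation numbers follow the references in the text.

```latex
\begin{align}
S &= \sum_{j \in \mathcal{O}} d_C(j) \oplus C_o(j)
   \;+\; \sum_{j \in \mathcal{T}} d_T(j)\, T(j-N+1) \tag{6}\\
S &= \sum_{j \in \mathcal{O}} d_C(j) \oplus C_o(j) \;+\; C_n(d_T) \tag{7}\\
S \cdot C_o(k) &= \Big(\sum_{j \in \mathcal{O}} d_C(j) \oplus C_o(j)\Big) \cdot C_o(k)
   \;+\; C_n(d_T) \cdot C_o(k) \tag{8}
\end{align}
```

In (8) the first term is the autocorrelation term of magnitude N/2 and the second is the cross correlation with the TDMA MAI code, bounded in magnitude by N/2 as discussed above.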
V. Implementation
In this section, the performance evaluation results of 
the developed OCI crossbars are presented. 
A. OCI Crossbar Evaluation 
In this section, a comparison among the conventional 
CDMA, T-OCI, and the P-OCI crossbars is drawn. A 
crossbar containing a number of TX-RX ports is built 
with full capacity, i.e., the number of ports is the 
maximum number offered by the crossbar. All CDMA 
crossbar architectures in both the reference and 
pipelined variants are implemented and validated on 
an Artix-7 AC701 evaluation kit. The developed 
crossbars are evaluated for different spreading code 
lengths N = {8, 16, 32, 64}. To establish a fair 
comparison among different crossbar architectures 
with different numbers of ports, all utilization metrics 
are normalized to the number of crossbar ports M. The 
evaluation results, including the resource utilization 
expressed in the number of lookup tables (LUTs) and 
FFs per port, maximum crossbar frequency, dynamic 
power consumption per port, and crossbar bandwidth, 
are illustrated in Fig. 4. 
As depicted in Fig. 4(a), for a spreading code of length 
N, the resource utilization per port of the T-OCI 
crossbar is lower than that of the ordinary CDMA 
crossbar by 31%. This salient reduction in the 
normalized resource utilization is due to the 
significant increase in the CDMA interconnect 
capacity compared with the marginal overhead added 
by the crossbar circuitry. On the other hand, the P-OCI 
crossbar is 400% larger than the conventional CDMA 
crossbar due to the parallel crossbar adders. Increasing 
the spreading code length N increases the resource 
utilization per port, due to the increasing crossbar 
complexity. Specifically, with increasing N , the size 
of the crossbar adder and accumulator decoder 
circuitry increases. The resource utilization of all 
crossbar pipelined variants is always larger than that 
of the basic architectures due to the additional 
nonarchitectural pipelining registers. 




where W is the port width in bits, fc is the crossbar
clock frequency, M is the number of crossbar ports,
and Γ is the number of cycles to encode 1 bit of data
from all ports. The T-OCI crossbar bandwidth
demonstrates a significant increase over the ordinary
CDMA crossbar as it has an overloading ratio of
M/N = 2 compared with the basic CDMA crossbar
ratio of M/N = 1 for the same Γ = N for both
crossbars. For the P-OCI crossbar, however, Γ = 1,
and therefore, the bandwidth of the P-OCI crossbar is
N times that of the T-OCI crossbar and 2N times that
of the conventional CDMA crossbar. Fig. 4(d) depicts
the bandwidth-to-resource  ratio; the T-OCI and P-
OCI crossbars offer higher ratios compared with the 
conventional CDMA crossbar due to the significant 
bandwidth enhancement compared with the induced 
marginal resource overhead. 
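The bandwidth relations stated above can be reproduced numerically. The sketch assumes the bandwidth takes the form W · fc · M / Γ, which is inferred from the symbol definitions rather than quoted from the paper, and uses a hypothetical 16-bit port and 100-MHz clock shared by all three crossbars.

```python
def crossbar_bandwidth(port_width_bits, clock_hz, ports, cycles_per_bit):
    """Assumed form: bandwidth = W * fc * M / Gamma, in bits per second."""
    return port_width_bits * clock_hz * ports / cycles_per_bit


N = 8                      # spreading code length
W = 16                     # hypothetical port width in bits
FC = 100_000_000           # hypothetical clock frequency in Hz

cdma = crossbar_bandwidth(W, FC, ports=N, cycles_per_bit=N)        # M/N = 1, Gamma = N
t_oci = crossbar_bandwidth(W, FC, ports=2 * N, cycles_per_bit=N)   # M/N = 2, Gamma = N
p_oci = crossbar_bandwidth(W, FC, ports=2 * N, cycles_per_bit=1)   # M/N = 2, Gamma = 1

ratios = (t_oci / cdma, p_oci / t_oci, p_oci / cdma)
```

At equal clock frequency the ratios come out as 2, N, and 2N, matching the stated relations; measured ratios would differ when the maximum frequencies of the crossbars differ.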
As illustrated in Fig. 4(e), for a spreading code of 
fixed length N , the dynamic power dissipation per 
port, estimated by the Xilinx Vivado tool for a single 
crossbar transaction, is decreased by 45% for the T-
OCI crossbar due to the offered capacity 
enhancement. However, due to the increased area and 
parallel encoding–decoding of the P-OCI crossbar, its 
dynamic power dissipation is 133% higher than that 
of the conventional CDMA crossbar. With increasing 
N, power dissipation per port increases for all CDMA 
crossbars due to the increased size and complexity of 
the crossbar components. 
 
B. OCI Communication Reliability Considerations: 
Since the OCI scheme relies on adding detectable 
interference to the interconnect, the robustness of the 
OCI  crossbar  to noise may be raised as a concern; 
would the added MAI reduce the robustness of the 
OCI compared with that of the conventional CDMA 
interconnect? Although full-swing digital
implementations have typically been able to
assume BER values less than 10^−15 over the
operating range of voltages and frequencies, this
assumption does not hold true for custom low-swing
interconnect implementations and modern deep
submicrometer circuits. Indeed, in wireless
communication channels, overloaded CDMA would 
increase the BER compared with the classical 
CDMA because of overloading the channel with 
MAI. Wireless channels are purely analog exposing 
them to all random effects such as noise. On the 
other hand, the OCI crossbar adopts binary signaling 
to carry the crossbar sum instead of multilevel or 
analog signaling. The binary nature of the OCI 
interconnect enables enhancing its robustness by 
employing error detection and correction techniques 
to mitigate such random effects. 
To empirically test the robustness of the OCI
crossbar on FPGA platforms, a testbench was
applied to an N = 16 OCI crossbar implemented on a
Zedboard FPGA evaluation kit with a 100-MHz
clock frequency and a 1 V core voltage. Zynq’s
embedded processor runs a program generating 10^6
consecutive crossbar transactions and compares the
decoded output with the input data. Zero errors were
detected during the experiment, which lasted for 27
h.
On the other hand, to study the reliability of OCI and
conventional CDMA links in the presence of error
sources such as noise, the BER of the overloaded and
classical CDMA links subject to additive white
Gaussian noise (AWGN) with a variable signal-to-
noise ratio (SNR) was computed using MATLAB
simulation. In this test, a total of 10^7 test vectors
are randomly generated per SNR
value, which ranges from 0 to 20. The BER–SNR
curves shown in Fig. 5 depict an increase in the BER
of overloaded CDMA compared with that of classical
CDMA in a digital communication channel subject to
AWGN. The BER increase is no greater than 72% and
its average is 35%.
C. OCI for NoCs: Analytical Evaluation 
Table III provides an analytical comparison 
between the OCI crossbars and some existing bus 
and NoC interconnection techniques. The 
comparison is established for an interconnect 
 
of M TX-RX pairs representing the number of ports in 
an NoC router. The compared metrics are the 
interconnect complexity normalized to the port width 
in bits W , interconnect latency  in clock cycles, and 
the interconnect bandwidth normalized to the crossbar 





Fig. 3. (a) CONNECT torus topology (b) versus the 
OCI star topology 
As a bus, the OCI crossbar provides a higher 
bandwidth than the CDMA peripheral bus [6]. The 
CDMA peripheral bus interfaces multiple peripherals 
to multiple PEs on a shared CDMA bus. The OCI 
technique can be applied to the peripheral bus to 
increase the number of interconnected PEs and 
peripherals without degrading the transaction latency. 
In the CDMA parallel transfer wrapper of  [7]  and  
[8],  the number of parallel transfer lines is reduced by 
bundling data using spreading codes. The OCI 
spreading codes can be used to bundle more data bits 
on the same number of wires. Therefore, the OCI 
crossbar can provide higher bandwidth than the 
CDMA peripheral bus and the CDMA parallel transfer 
wrapper of the same complexity due to crossbar 
overloading. The CDMA encoding–decoding scheme 
presented in [5] is based on the standard basis TDMA 
codes, which replace the orthogonal Walsh codes. The
encoders are consequently replaced by an AND gate,
the bus adder is reduced to a single XOR gate, and the
channel wires are reduced to one wire per bit because
no two TDMA chips are simultaneously sent in the
same clock cycle. This scheme resembles TDMA
signaling but adopts the CDMA arbitration procedures
where the code assignment is done once every N
encoding–decoding bus cycles. On the other hand, our
proposed OCI technique enables coexistence between 
both CDMA and TDMA codes on a single channel, 
providing double bandwidth, while utilizing less area 
than two independent TDMA and CDMA crossbars. 
The data transfer latency of the CDMA NoC router in 
[10] is equal to the best case latency of a PTP 
network. This data transfer latency of the CDMA 
router can be reduced using fewer chips per spreading 
code while keeping the number of PEs unchanged 
through utilizing the OCI technique. The CDMA NoC 
router in utilizes the orthogonal Walsh code set to 
interconnect a maximum of N network nodes, where N 
is the number of chips in a spreading code. The 
presented routers can exploit the OCI schemes to 
double the number of ports of the network router 
without increasing the spreading code length and 
hence without increasing the hop latency.  
The multicast router of interconnects four ports and 
four PEs. The OCI technique can double the capacity 
of the switch without increasing the hop latency, and 
therefore, each PE can multicast more packets 
through the router in one hop. 
The modules of the MPEG-2 encoder in are
interconnected using PTP, NoC, and TDMA bus
topologies to evaluate these three different 
interconnects. The NoC is shown to have a close 
bandwidth to a PTP at fewer logic resources and 
wiring area and much higher bandwidth than the 
TDMA bus. The conventional parallel CDMA buses 
of demonstrate equal bandwidth to the best case 
bandwidth of mesh NoCs [8], in addition to the fixed 
latency, due to the simultaneous medium access by 
the interconnected PEs. The P-OCI crossbar can 
provide a higher bandwidth and a lower latency than 
the conventional parallel CDMA buses by 
simultaneously transmitting all the N chips of the 
spreading code in parallel due to overloading. This 
analytical discussion highlights the OCI capability of 
substituting the classical CDMA interconnect in any 
CDMA-based bus or NoC architecture while 
providing higher bandwidth at the same latency or 
interconnecting the same number of ports at lower 
latency per transaction. 
 
D. OCI for NoCs: Experimental Evaluation 
To study the effectiveness of the OCI crossbar in a fully 
working NoC, a 65-node star topology is built using 
five OCI routers. Each OCI router connects 13 PEs 
with N = 8, and the five OCI routers are 
interconnected by an SDMA central router. Both 
T-OCI- and P-OCI-based NoCs are compared with a 
64-node, 16-bit flit, 8-ary 2-cube torus SDMA-based 
NoC generated by the CONNECT tool [3]. The 
CONNECT NoC employs simple input queued routers 
with peek flow control. Fig. 6 illustrates the torus 
topology employed by the CONNECT NoC versus the 
star topology adopted by the OCI NoC. The star 
topology is chosen for the OCI NoC because the 
complexity advantage of the OCI router over the 
SDMA router grows with the number of ports: the 
OCI crossbar area increases linearly, whereas the 
SDMA crossbar area increases quadratically. Similarly, 
the torus topology was chosen for the CONNECT NoC 
since the torus SDMA crossbars have a low number of 
ports, which translates to lower complexity. Since each 
router in a torus network accommodates five buffers, 
the buffer space offered in the CONNECT NoC is 
64 × 5, while the buffer space of the OCI-based NoC is 
equal to the number of PEs plus the number of buffers in 
the central router, which equates to 65 + 5. Therefore, to 
equalize the buffer space in the compared NoCs, the 
OCI buffer width is sized four times the CONNECT 
buffer width. Consequently, the flit size of the 
OCI-based NoC is 64 bits. Table IV lists the 
implementation results of the three NoCs on the 
65-nm ASIC technology. The area of the T-OCI NoC is 
45% less than that of the CONNECT NoC, while the 
area of the P-OCI NoC is 30% less than that of the 
CONNECT NoC, despite the larger flit size, with a 
reduction in latency due to their lower complexity. 
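
The buffer sizing argument above can be checked with a short calculation. The Python sketch below is only a worked restatement of the figures quoted in the text, reading the two buffer counts as 64 × 5 and 65 + 5. The variable names, and the choice to round the ratio down to a power of two to arrive at the stated factor of four, are assumptions.

```python
# Figures quoted in the text.
OCI_ROUTERS = 5
PES_PER_OCI_ROUTER = 13
TORUS_ROUTERS = 64
BUFFERS_PER_TORUS_ROUTER = 5
CENTRAL_ROUTER_BUFFERS = 5
CONNECT_FLIT_BITS = 16

# Node count of the star NoC: five routers with 13 PEs each.
oci_nodes = OCI_ROUTERS * PES_PER_OCI_ROUTER

# Number of buffers in each NoC.
connect_buffers = TORUS_ROUTERS * BUFFERS_PER_TORUS_ROUTER
oci_buffers = oci_nodes + CENTRAL_ROUTER_BUFFERS

# Ratio of buffer counts (about 4.57).
ratio = connect_buffers / oci_buffers

# Assumption: the width multiplier is the ratio rounded down to a power of two.
width_scale = 1 << (int(ratio).bit_length() - 1)
oci_flit_bits = CONNECT_FLIT_BITS * width_scale

if __name__ == "__main__":
    print("OCI nodes:", oci_nodes)
    print("CONNECT buffers:", connect_buffers, "OCI buffers:", oci_buffers)
    print("buffer count ratio: %.2f" % ratio)
    print("OCI flit size (bits):", oci_flit_bits)
    print("total buffer bits (CONNECT, OCI):",
          connect_buffers * CONNECT_FLIT_BITS, oci_buffers * oci_flit_bits)
```

Under these assumptions the two networks end up with total buffer capacities of the same order (5120 bits versus 4480 bits), which is consistent with the stated aim of equalizing the buffer resources.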
 
The performance comparisons of the T-OCI and 
P-OCI NoCs versus the CONNECT NoC are depicted 
in Fig. 7 for six synthetic traffic patterns and for the 
same packet width of 256 bits. The uniform, 
hotspot, and tornado traffic patterns are employed 
with two variants: local and global traffic. In the 
global traffic, the traffic pattern is applied to the 
entire network, while in the local traffic, the traffic 
pattern is applied to separate clusters. For the OCI 
network, there are five clusters corresponding to the 
five OCI routers. On the other hand, the 64 nodes of 
the torus network are divided in the network layer 
into five clusters according to the proximity of the 
routers. The experiment is conducted by subjecting 
the NoCs to each traffic pattern for 500 clock 
cycles. The latency per packet is then computed 
by dividing the total number of clock cycles (500) by 
the total number of packets that arrive successfully at 
their target PEs in each traffic pattern. Additionally, 
the throughput is calculated as follows: 

$$\text{Throughput} = \frac{N_p \times N_b}{N_c \times t_c}$$

where Nc is the number of the simulation clock cycles 
(500), Nb is the number of bits per packet (256), Np is 
the number of packets received by the target PEs, and 
tc is the clock period. As illustrated by Fig. 7(a), the 
latency in clock cycles per packet of the T-OCI is 
higher than that of the CONNECT NoC in most 
traffic patterns due to the serial spreading of packets. 
However, the latency is lower in the hotspot traffic 
pattern due to the smaller number of hops needed to 
reach the hotspot node. Additionally, the P-OCI NoC 
offers lower packet latency compared with the 
CONNECT NoC for all traffic patterns except for the 
uniform pattern since torus NoCs are better at 
balancing the injected load than star NoCs. 
Consequently, the P-OCI throughput shown in Fig. 
7(b) is higher than that of the CONNECT NoC for all 
traffic patterns due to its lower clock period. 
Moreover, the improvement in throughput and area of 
the T-OCI and P-OCI over those of the CONNECT 
NoC appears in the throughput-to-area ratio (TPA) 
comparison in Fig. 7(c). However, as illustrated in 
Fig. 7(d), the dynamic and static power consumption 
of the OCI-based NoCs is larger than that of the 
CONNECT NoC for all traffic patterns except the 
uniform pattern, despite the P-OCI’s higher clock 
period. Therefore, the improvement in the TPA of the 
T-OCI and P-OCI routers comes at the expense of 
increasing power consumption. Resource replication 
and adapting the clock speed can be employed to 
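
The latency and throughput metrics defined earlier in this section can be sketched in Python as follows. The throughput expression is a reconstruction from the symbol definitions given in the text (Nc, Nb, Np, tc). The function names, the sample packet count of 100, and the 5-ns clock period are hypothetical values chosen only to exercise the formulas and are not results from the evaluation.

```python
def latency_per_packet(n_cycles, n_packets):
    """Average latency in clock cycles per packet: Nc / Np."""
    if n_packets <= 0:
        raise ValueError("at least one packet must be received")
    return n_cycles / n_packets


def throughput_bps(n_packets, bits_per_packet, n_cycles, clock_period_s):
    """Throughput in bits per second: (Np * Nb) / (Nc * tc)."""
    if n_cycles <= 0 or clock_period_s <= 0:
        raise ValueError("cycle count and clock period must be positive")
    return (n_packets * bits_per_packet) / (n_cycles * clock_period_s)


if __name__ == "__main__":
    NC = 500          # simulation clock cycles, from the text
    NB = 256          # bits per packet, from the text
    NP = 100          # hypothetical number of received packets
    TC = 5e-9         # hypothetical clock period of 5 ns

    print("latency (cycles/packet):", latency_per_packet(NC, NP))
    # With these sample values the throughput is about 10.24 Gbit/s.
    print("throughput (bit/s):", throughput_bps(NP, NB, NC, TC))
```

Because the clock period appears in the denominator, a design with a shorter clock period reports a higher throughput for the same number of delivered packets, which is the effect the text attributes to the P-OCI NoC.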





[1] K. Asanovic et al., “The landscape of parallel 
computing research: A view from Berkeley,” Dept. 
EECS, Univ. California, Berkeley, CA, USA, Tech. 
Rep. UCB/EECS-2006-183, 2006. 
[2] P. Bogdan, “Mathematical modeling and control of 
multifractal workloads for data-center-on-a-chip 
optimization,” in Proc. 9th Int. Symp. Netw.-Chip, 
New York, NY, USA, 2015, pp. 21:1–21:8. 
[3] Z. Qian, P. Bogdan, G. Wei, C.-Y. Tsui, and R. 
Marculescu, “A traffic-aware adaptive routing 
algorithm on a highly reconfigurable network-on-chip 
architecture,” in Proc. 8th IEEE/ACM/IFIP Int. Conf. 
Hardw./Softw. Codesign, Syst. Synth., New York, 
NY, USA, Oct. 2012, pp. 161–170. 
[4] Y. Xue and P. Bogdan, “User cooperation network 
coding approach for NoC performance improvement,” 
in Proc. 9th Int. Symp. Netw.-Chip, New York, NY, 
USA, Sep. 2015, pp. 17:1–17:8. 
[5] T. Majumder, X. Li, P. Bogdan, and P. Pande, 
“NoC-enabled multicore architectures for stochastic 
analysis of biomolecular reactions,” in Proc. Design, 
Autom. Test Eur. Conf. Exhibit. (DATE), San Jose, 
CA, USA, Mar. 2015, pp. 1102–1107. 
[6] S. J. Hollis, C. Jackson, P. Bogdan, and R. 
Marculescu, “Exploiting emergence in on-chip 
interconnects,” IEEE Trans. Comput., vol. 63, no. 3, 
pp. 570–582, Mar. 2014. 
[7] S. Kumar et al., “A network on chip architecture 
and design methodology,” in Proc. IEEE Comput. 
Soc. Annu. Symp. (VLSI), Apr. 2002, pp. 105–112. 
[8] T. Bjerregaard and S. Mahadevan, “A survey of 
research and practices of network-on-chip,” ACM 
Comput. Surv., vol. 38, no. 1, 2006, Art. no. 1. 
[9] Y. Xue, Z. Qian, G. Wei, P. Bogdan, C. Y. Tsui, 
and R. Marculescu, “An efficient network-on-chip 
(NoC) based multicore platform for hierarchical 
parallel genetic algorithms,” in Proc. 8th IEEE/ACM 
Int. Symp. Netw.-Chip (NoCS), Sep. 2014, pp. 17–24. 
[10] D. Kim, K. Lee, S.-J. Lee, and H.-J. Yoo, “A 
reconfigurable crossbar switch with adaptive 
bandwidth control for networks-on-chip,” in Proc. 
IEEE Int. Symp. Circuits Syst. (ISCAS), May 2005, 
pp. 2369–2372. 
