EFASBRAN: Error Free Adaptive Shared Buffer Router Architecture for Network on Chip  by Prasad, E. Lakshmi et al.
 Procedia Computer Science  89 ( 2016 )  261 – 270 
Available online at www.sciencedirect.com
1877-0509 © 2016 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of IMCIP-2016
doi: 10.1016/j.procs.2016.06.056 
Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016)
EFASBRAN: Error Free Adaptive Shared Buffer Router
Architecture for Network on Chip
E. Lakshmi Prasad (a, ∗), A. R. Reddy (b) and M. N. Giri Prasad (a)
(a) JNTUA, Anantapuramu (A.P.), India
(b) MITS, Madanapalle (A.P.), India
Abstract
Routers are mainly used to control the data flow in a Network on Chip (NoC), and each router must reliably manage the traffic throughout the
network. When a router handles heavy data traffic, errors may be introduced into the data. To avoid such errors
inside the shared buffer router, a single-bit error correction module is added externally. The main objective of this paper
is therefore to present an error-free, low-power and low-latency shared buffer router architecture for NoC. The improvements
of the proposed work are evaluated in terms of area, power and delay. The entire design was simulated and
synthesized using Xilinx tools.
© 2016 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of organizing committee of the Twelfth International Multi-Conference on Information
Processing-2016 (IMCIP-2016).
Keywords: Error Correction; Network on Chip; Router; Traffic Congestion; XY Routing.
1. Introduction
Network on Chip (NoC) is a recent trend in multiprocessor design. Multiprocessors contain a wide range of IP (Intellectual
Property) cores integrated on a single chip. With rapid technological growth, more functions are added through more
IP cores, and as the number of IP cores in a bus-based architecture increases, so does the design complexity [1, 2].
Consider, for example, the IBM Cell architecture of the Sony PlayStation 3 gaming console, shown in Fig. 1. The IBM
Cell follows a regular bus-based architecture, and with this approach, accommodating so many hardware units on a
small chip becomes tedious. Moreover, in bus-based designs, while a master and a slave are communicating, the
remaining processing units must wait until the transfer completes. This drawback prevents parallel tasks [3].
Suppose two device units communicate directly with a large amount of data. This raises several problems: data may
never reach its proper destination (starvation), deadlock may occur when three or more units are connected in a
circular fashion, and so on. These problems arise from direct communication among the IP cores in multiprocessors.
In this regard, Network on Chip plays an excellent role in multiprocessors.
∗Corresponding author. Tel.: +91 8522920646.
E-mail address: lakshmi−prasad2@yahoo.com
Fig. 1. IBM Cell Sony PlayStation 3 MPSoC.
Therefore, rather than following regular bus-based communication in multiprocessors, NoC is the preferred solution.
In an NoC design, each router is connected to one IP core and to one or more neighbouring routers. The router plays
a prominent role because it manages the traffic throughout the network. Several types of router designs are available,
such as the wormhole router, the virtual channel router and the shared buffer router. Routers are discussed in detail in
a later section.
1.1 Objective
The major parts of an NoC design are the router, the Network Interface (NI), the topology and the routing algorithm.
The main objective of this paper is to propose an error-free, low-power and low-latency adaptive router architecture
for NoC. The job of an NoC router is to buffer the data and control the traffic in and around the network. However,
while data is buffered inside the router, it may become corrupted. Therefore, in this paper the router design is not the
only goal; the ultimate objective is to detect and correct such errors. To this end, a standard Hamming coding
technique is added externally to the router to correct single-bit errors.
This section has described the problem in multiprocessors and its solution, and has stated the objective of addressing
errors inside the NoC router. The remainder of the paper is organized as follows: Section 2 reviews related work,
Section 3 describes the existing system, Section 4 presents the proposed approach, Section 5 reports the
implementation results, and Section 6 presents the conclusion and future scope.
2. Related work
William J. Dally et al. [4] proposed a well-controlled architecture for network on chip. This network structure has
several advantages, such as well-controlled electrical parameters, latency, reliable flow control and a robust topology.
The network parameters also balance power consumption against wire utilization. It supports a wide variety of
higher-level protocols and a wide variety of data widths with low overhead. Finally, the design was implemented with
reduced latency.
Shashi Kumar et al. [5] proposed a packet-switched platform for NoC architecture and introduced a two-phase NoC
design methodology. In the first phase, an M × N mesh of M × N routers is introduced, in which each router is
physically connected to its neighbouring routers. In the second phase, each router is physically interfaced with a local
IP (intellectual property) core.
Shan Yan et al. [6] proposed four distinct algorithms for NoC, namely Decompose, Cluster, Perturb and R-perturb.
Multicast and unicast traffic flows are used as inputs for testing these designs. A rectilinear Steiner tree algorithm is
used to generate the interconnection topology. These solutions are designed to keep the network free from deadlock.
The entire design and the algorithms are evaluated in terms of delay, hop count and power consumption.
Vassos Soteriou et al. [7] proposed a Distributed Shared Buffer (DSB) router micro-architecture for NoC. The paper
presented innovative techniques, such as efficient pipelining, to achieve high throughput. As a result, the DSB design
improves bandwidth by up to 20% and throughput by up to 94% relative to the Output Buffered Router (OBR).
Table 1. Review Comparison of Related Work in Network on Chip.
Sergio V. Tota et al. [8] presented a deflection routing algorithm for NoC-based MPSoCs. The system was designed
with 16 processing elements for a ray graphics accelerator. The whole design was implemented in FPGA under 90 nm
technology with a 500 MHz clock frequency, and the resulting design achieved high performance.
Rather than describing each work at length, Table 1 compares the related works with respect to various metrics, such
as the method of data transmission, network computation, power, area, delay and the tools used.
3. Existing System
The router plays a significant role in Network on Chip. It has many responsibilities, such as controlling the traffic,
avoiding deadlock and routing packets to the proper destination. The router micro-architecture must therefore be
robust and able to avoid deadlock and congestion. An example virtual channel router micro-architecture is shown in
Fig. 2 [9, 10, 13].
A router normally has five input and five output ports, namely the North, South, East, West and Local ports. The local
port connects to one PE, and the remaining ports connect to neighbouring routers. The router contains several
sub-blocks, such as the crossbar network, FIFOs and the Switch Allocator (SA), as shown in Fig. 2.
The working principle of the router is as follows. First, each packet is buffered in a FIFO, where it is split into flits.
Each packet has several flits, such as the source and destination flits, the payload and the tail flit. The crossbar
network receives the packets from the FIFOs, and each packet is driven to the selected output port based on switch
allocation and switch traversal. The switch allocator schedules the crossbar network to avoid contention and
congestion; based on its grants, the transfers are enabled in the next cycle. The routing logic determines the states of
the input and output ports and selects the correct path to the destination.
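The switch allocation step described above can be sketched as follows. The text does not state which arbitration policy the switch allocator uses, so this sketch assumes a round-robin arbiter per output port (a common choice in NoC routers, not confirmed by the source) and assumes each input port presents at most one request per cycle, produced by the routing logic. The names `RoundRobinArbiter` and `SwitchAllocator` are hypothetical.

```python
from typing import Dict, List, Optional

# Port names follow the five-port router described above.
PORTS = ["North", "South", "East", "West", "Local"]


class RoundRobinArbiter:
    """Grants one requester per cycle; the winner gets lowest priority next time."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # start so that requester 0 has the highest priority

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search starting just after the previous winner, wrapping around.
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None  # no requests this cycle


class SwitchAllocator:
    """Hypothetical separable allocator: one arbiter per output port."""

    def __init__(self) -> None:
        self.arbiters = {out: RoundRobinArbiter(len(PORTS)) for out in PORTS}

    def allocate(self, requests: Dict[str, str]) -> Dict[str, str]:
        """requests maps input port -> output port chosen by the routing logic.

        Returns output port -> winning input port. The grants enable crossbar
        traversal in the next cycle; losing inputs simply request again.
        """
        grants: Dict[str, str] = {}
        for out, arbiter in self.arbiters.items():
            wants = [requests.get(inp) == out for inp in PORTS]
            winner = arbiter.grant(wants)
            if winner is not None:
                grants[out] = PORTS[winner]
        return grants


sa = SwitchAllocator()
reqs = {"North": "East", "West": "East", "Local": "South"}
print(sa.allocate(reqs))  # {'South': 'Local', 'East': 'North'}
print(sa.allocate(reqs))  # {'South': 'Local', 'East': 'West'}
```

North and West both request East; the round-robin pointer lets North win in the first cycle and West in the second, so neither input starves.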
In this example, there is no priority based on the density of the data: whether the data density is low or high, the
buffer allocation is the same. Moreover, if the output data contains an error, there is no way to correct it. The next
sub-section presents another important part of NoC, the network interface.
Fig. 2. Virtual Channel Router Micro-Architecture.
Fig. 3. Router to Router Network Interface.
3.1 Network Interface
The network interface (NI) is an important part of Network on Chip (NoC). It is associated with two kinds of links:
router-to-router links and router-to-local-processing-element (PE) links. The NI acts as the backbone of the NoC
because it transfers the data packets to the proper destination. The packet format is created by the router and
contains the source address, the destination address, the payload and an optional tail. The router-to-router
communication process model is shown in Fig. 3 [9].
In the first stage, the packet is split into flits (flow control units): header, payload and end (tail). During core-to-router
(C2R) or router-to-router (R2R) communication, the packet is converted into a flitized format, i.e., it is split into
sub-flits. This conversion is needed for the packet to reach the correct destination.
Similarly, during router-to-core (R2C) communication, the flitized packet is converted back into a deflitized packet
format. This removes flits that are no longer needed, such as the source and destination addresses. An example of the
network interface packet format process is shown in Fig. 4.
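The flitizing and deflitizing steps described above can be sketched as a pair of functions. This is a minimal illustration under stated assumptions, not the paper's network interface: the flit width `FLIT_BYTES`, the `Flit` record and the function names are hypothetical. Here the head flit carries the source and destination addresses, body flits carry payload slices, and the tail flit carries the last slice and closes the packet.

```python
from dataclasses import dataclass
from typing import List, Optional

FLIT_BYTES = 4  # hypothetical payload bytes per flit; the paper does not fix a width


@dataclass(frozen=True)
class Flit:
    kind: str                  # "head", "body" or "tail"
    src: Optional[int] = None  # source address, carried only by the head flit
    dst: Optional[int] = None  # destination address, carried only by the head flit
    data: bytes = b""          # payload slice carried by body and tail flits


def flitize(src: int, dst: int, payload: bytes) -> List[Flit]:
    """C2R/R2R direction: split a packet into head, body and tail flits."""
    chunks = [payload[i:i + FLIT_BYTES] for i in range(0, len(payload), FLIT_BYTES)]
    if not chunks:
        chunks = [b""]  # an empty packet still needs a tail flit to close it
    flits = [Flit("head", src=src, dst=dst)]
    flits += [Flit("body", data=c) for c in chunks[:-1]]
    flits.append(Flit("tail", data=chunks[-1]))
    return flits


def deflitize(flits: List[Flit]) -> bytes:
    """R2C direction: drop the head flit (source/destination) and rebuild the payload."""
    if len(flits) < 2 or flits[0].kind != "head" or flits[-1].kind != "tail":
        raise ValueError("malformed packet: expected head ... tail")
    return b"".join(f.data for f in flits[1:])


flits = flitize(src=1, dst=4, payload=b"HELLO NOC!")
print([f.kind for f in flits])  # ['head', 'body', 'body', 'tail']
print(deflitize(flits))         # b'HELLO NOC!'
```

Only the head flit needs the addresses, which is why the body and tail flits can follow the head along the same route, and why deflitizing can discard them at the destination core.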
4. EFASBRAN: Error Free Adaptive Shared Buffer Router Architecture for Network on Chip
In the proposed approach, we improve several aspects of the shared buffer router: the design is extended with an error
correction module, a density identifier module and an I/O state identifier. The bypass shared buffer router was
introduced by Anh T. Tran and Bevan M. Baas [14]. EFASBRAN (Error Free Adaptive Shared Buffer Router
Architecture for Network on Chip) can correct a single-bit error in the output data. The Error Free Adaptive Shared
Buffer Router Architecture is shown in Fig. 5.
Fig. 4. Network Interface Processing Structure.
Fig. 5. Error free Adaptive Shared Buffer Router Architecture.
The EFASBRAN flow chart, together with the pipelined packet data path format, is shown in Fig. 6; Figs. 6a and 6b
express the router design process. The router design includes two important modules: the single-bit error correction
module and the data density identifier module.
First, an n-byte packet arrives at the input port; while it is written into the queue, parity bits are encoded and the result
is treated as a flit. Each flit is divided into sub-flits, as shown in Fig. 6b. If the packet data density is low, the bypass
queue is selected; if it is high, the shared queue is selected.
Fig. 6. EFASBRAN flow chart with Pipeline Packet Data Path Format.
When the packet data density is low, the router uses a five-stage pipeline. In the first cycle, the input flits arrive at the
input port (input data). In the second cycle, the parity bits are encoded; this stage is called Input Queue Encoded Data
(IQED). In the third cycle, three operations are performed simultaneously: Look-ahead Routing Computation (LRC),
output port arbitration by the output port arbiter (OPA) and shared queue allocation by the Shared Queue Allocator
(SHQA). In this cycle, LRC uses the destination information contained in the header flit, and if the flit wins OPA,
credits are granted to the crossbar network. In the fourth cycle, the output switch or crossbar traversal (OS/CT) stage
receives the grants from OPA and selects the desired output port. In the fifth cycle, the received data is decoded at the
selected output port, a stage called Output Queue Decoded Data (OQDD), together with link traversal towards the
next router. Finally, the body and tail flits follow the same route, because the destination information is already
contained in the header flit.
If the data density is high, the router uses an eight-stage pipeline. When the OPA grant fails, the flit obtains a grant
from the shared queue and traverses the crossbar into the shared queue network. OPA then releases the grants for the
shared queue crossbar network, and the flit moves on to the next cycle. In the following cycle, the output switch or
crossbar traversal (OS/CT) stage selects the desired output port. The remaining process is the same as in the
low-density case. While the output data is decoded at the final stage, it is checked for errors and corrected using a
Hamming code; Hamming error detection and correction are explained with an illustration in the next subsection.
Figs. 6a and 6b give a clear view of the router process.
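The queue selection and pipeline depths described above can be summarized in a small sketch. The text does not define how the density identifier classifies data as low or high, so the sketch assumes a simple packet-size threshold (`DENSITY_THRESHOLD_BYTES`, hypothetical). It also follows only the density rule; the fallback to the shared queue when the OPA grant fails is not modelled. The stage names of the five-stage bypass path are taken from the text, while the eight stages of the shared path are not itemized here.

```python
from typing import List, Tuple

# Five-stage low-density (bypass) pipeline, one entry per cycle, as described above.
BYPASS_STAGES: List[str] = [
    "IN",            # cycle 1: flit arrives at the input port
    "IQED",          # cycle 2: parity bits encoded into the flit
    "LRC+OPA+SHQA",  # cycle 3: look-ahead routing, output port arbitration, shared queue allocation
    "OS/CT",         # cycle 4: output switch / crossbar traversal
    "OQDD+LT",       # cycle 5: decode and correct, then link traversal to the next router
]
SHARED_PIPELINE_DEPTH = 8  # high-density path: eight stages according to the text

DENSITY_THRESHOLD_BYTES = 16  # hypothetical cut-off between "low" and "high" density


def select_path(packet_bytes: int, threshold: int = DENSITY_THRESHOLD_BYTES) -> Tuple[str, int]:
    """Return (queue, minimum pipeline depth in cycles) for a packet of the given size."""
    if packet_bytes <= threshold:
        return "bypass", len(BYPASS_STAGES)
    return "shared", SHARED_PIPELINE_DEPTH


for size in (8, 64):
    print(size, select_path(size))
# 8 ('bypass', 5)
# 64 ('shared', 8)
```

The returned depths match the minimum per-router cycle counts of 5 and 8 reported in the performance analysis.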
4.1 Error detection and correction
Error detection and correction are performed using a Hamming code. At the transmitting side, p parity bits are
appended to a data word of m bits (of any length), so the transmitted codeword is m + p bits long. At the receiver, the
p check bits must be able to point to any one of these m + p bit positions as erroneous, or indicate that no error
occurred: m + p + 1 different states in total. Since p bits can represent 2^p different states, 2^p must be greater than
or equal to m + p + 1:
$2^p \geq m + p + 1$
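A small sketch of the inequality: the loop below searches for the smallest p that satisfies 2^p ≥ m + p + 1. The function name `min_parity_bits` is hypothetical. For the 8-bit data word used in this section it gives p = 4, i.e. the 12-bit codeword described next; the 16- and 32-bit lines assume the whole word is protected by one Hamming code, which the paper does not state.

```python
def min_parity_bits(m: int) -> int:
    """Smallest number of parity bits p with 2**p >= m + p + 1 for m data bits."""
    if m < 1:
        raise ValueError("m must be positive")
    p = 0
    while 2 ** p < m + p + 1:
        p += 1
    return p


for m in (8, 16, 32):
    p = min_parity_bits(m)
    print(f"m = {m}: p = {p}, codeword length = {m + p} bits")
# m = 8: p = 4, codeword length = 12 bits
# m = 16: p = 5, codeword length = 21 bits
# m = 32: p = 6, codeword length = 38 bits
```

For m = 8, p = 3 fails (2^3 = 8 < 12) while p = 4 succeeds (2^4 = 16 ≥ 13).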
The values of the parity bits are found by XOR operations over selected data bits. For example, we tested the input
m = 10101011 at the transmitting side, as shown in Fig. 7; the parity bits are determined as follows:
p1 = d1 xor d2 xor d4 xor d5 xor d7
p2 = d1 xor d3 xor d4 xor d6 xor d7
p4 = d2 xor d3 xor d4
p8 = d5 xor d6 xor d7 xor d8
Fig. 7. Error Detection and Correction process of EFASBRAN.
Substituting the given data bits into the above equations yields the parity bits p1 = 1, p2 = 1, p4 = 0 and p8 = 0.
After these values are appended to the data bits, the final transmitted codeword is 12 bits long, as shown in Fig. 7.
These bits pass through the shared buffer router, and the output of the router is given to the error correction module.
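The parity equations above can be sketched in code. This is a minimal illustration of a standard Hamming(12,8) encoder, not the authors' hardware; the function name `hamming12_encode` and the codeword layout are assumptions. The layout places the parity bits at the power-of-two positions (1, 2, 4, 8) and the data bits at the remaining positions: p1 p2 d1 p4 d2 d3 d4 p8 d5 d6 d7 d8. In this layout d8 sits at position 12 (binary 1100), so it is covered by both p4 and p8; the sketch therefore includes d8 in p4, a term the p4 equation above omits. Without it, an error in d8 and an error in p8 would give the same check word and could not be told apart.

```python
from typing import List


def hamming12_encode(d: List[int]) -> List[int]:
    """Encode data bits d = [d1, ..., d8] into a 12-bit Hamming codeword.

    The result lists codeword positions 1..12 in order:
    p1 p2 d1 p4 d2 d3 d4 p8 d5 d6 d7 d8
    """
    if len(d) != 8 or any(b not in (0, 1) for b in d):
        raise ValueError("expected a list of 8 bits")
    d1, d2, d3, d4, d5, d6, d7, d8 = d
    p1 = d1 ^ d2 ^ d4 ^ d5 ^ d7  # covers positions 3, 5, 7, 9, 11
    p2 = d1 ^ d3 ^ d4 ^ d6 ^ d7  # covers positions 3, 6, 7, 10, 11
    p4 = d2 ^ d3 ^ d4 ^ d8       # covers positions 5, 6, 7, 12 (d8 included)
    p8 = d5 ^ d6 ^ d7 ^ d8       # covers positions 9, 10, 11, 12
    return [p1, p2, d1, p4, d2, d3, d4, p8, d5, d6, d7, d8]
```

Each parity bit makes the XOR over its own group of positions (itself included) equal to zero, which is exactly the property the receiver's checking step relies on.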
The output of the router then undergoes a checking process, which is also based on the Hamming code. Determining
that the received bits contain an error is called error detection, and restoring the erroneous bit is called error
correction. For example, the check bits for the 12-bit codeword are computed as follows:
c1 = p1 xor d1 xor d2 xor d4 xor d5 xor d7
c2 = p2 xor d1 xor d3 xor d4 xor d6 xor d7
c4 = p4 xor d2 xor d3 xor d4
c8 = p8 xor d5 xor d6 xor d7 xor d8
After the checking process, if the output data contains an error, its location must be found before the error can be
corrected. To do so, the check bits are arranged from MSB to LSB as c8, c4, c2, c1; read as a binary number, they
give the position of the erroneous bit, and a value of 0 means that no error was detected. The same process applies
to any bit length. For example, if c8 = 0, c4 = 0, c2 = 1 and c1 = 1, the check word is 0011, so the error is at
position 3. The entire process is illustrated in Fig. 7. Experimental results are discussed in the next section.
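The checking and correction step can be sketched in the same way. This is a minimal illustration, not the authors' correction module: `hamming12_correct` is a hypothetical name, and it assumes the standard layout p1 p2 d1 p4 d2 d3 d4 p8 d5 d6 d7 d8, in which c4 also covers d8 (position 12). Each check bit c_k is the XOR of all bits whose position has bit k set, so reading c8 c4 c2 c1 as a binary number gives the error location directly.

```python
from typing import List, Tuple


def hamming12_correct(codeword: List[int]) -> Tuple[List[int], int]:
    """Check a 12-bit codeword (positions 1..12) and fix a single-bit error.

    Returns (corrected codeword, error location); location 0 means no error found.
    """
    if len(codeword) != 12:
        raise ValueError("expected 12 bits")
    c1 = c2 = c4 = c8 = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            # Check bit c_k covers every position whose binary index has bit k set.
            c1 ^= pos & 1
            c2 ^= (pos >> 1) & 1
            c4 ^= (pos >> 2) & 1
            c8 ^= (pos >> 3) & 1
    location = (c8 << 3) | (c4 << 2) | (c2 << 1) | c1  # c8 c4 c2 c1, MSB to LSB
    fixed = list(codeword)
    if 1 <= location <= 12:
        fixed[location - 1] ^= 1  # flip the erroneous bit back
    # Locations 13-15 cannot come from a single-bit error; they are left uncorrected.
    return fixed, location


sent = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # a valid 12-bit codeword
received = list(sent)
received[2] ^= 1                               # corrupt position 3 (d1)
fixed, location = hamming12_correct(received)
print(location)       # 3
print(fixed == sent)  # True
```

The corrupted codeword yields c8 = 0, c4 = 0, c2 = 1, c1 = 1, the same check word as in the example above, and flipping position 3 restores the transmitted bits.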
5. Experimental Results
EFASBRAN is tested in a 2 × 2 mesh network, illustrated in Fig. 8. This NoC architecture contains routers, network
interfaces and Intellectual Property (IP) cores. Of these components, we designed the router and the network
interface, both of which were discussed in previous sections. A real IP core is not designed; instead, the mesh NoC is
tested with hexadecimal data of 8, 16 and 32 bits. Packets are routed along a shortest path using the XY routing
algorithm.
The XY routing algorithm is commonly used for mesh networks. It is a simple shortest-path algorithm for
communication between routers, in which the X-plane is the horizontal plane and the Y-plane is the vertical plane. A
packet is first routed along the X dimension until it reaches the column of the target router, and only then along the Y
dimension to the target [16].
Fig. 8. 2 × 2 MESH NoC.
Let Xs, Ys be the source coordinates, Xt, Yt the target coordinates, and Xo, Yo the X and Y offsets.
Dimension ordered XY routing algorithm
    Xo = Xt − Xs
    Yo = Yt − Ys
    if Xo = 0 and Yo = 0 then
        routing = local
    else if Xo > 0 then
        routing = east
    else if Xo < 0 then
        routing = west
    else if Xo = 0 and Yo > 0 then
        routing = north
    else if Xo = 0 and Yo < 0 then
        routing = south
    end if
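A runnable sketch of the routing decision above. Each router applies the same decision with its own coordinates as (Xs, Ys), so a complete path is obtained by repeating the decision hop by hop. The coordinate convention (y increasing northward) and the placement of R1 to R4 in the usage line are assumptions, since the text gives neither; `xy_route` and `xy_path` are hypothetical names.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y); y is assumed to grow northward

STEP: Dict[str, Coord] = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_route(cur: Coord, dst: Coord) -> str:
    """Output port chosen by dimension-ordered XY routing at router `cur`."""
    xo = dst[0] - cur[0]  # X offset
    yo = dst[1] - cur[1]  # Y offset
    if xo == 0 and yo == 0:
        return "local"
    if xo > 0:
        return "east"
    if xo < 0:
        return "west"
    return "north" if yo > 0 else "south"  # X offset is zero here; correct Y next


def xy_path(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited from src to dst, applying xy_route hop by hop."""
    path = [src]
    cur = src
    port = xy_route(cur, dst)
    while port != "local":
        dx, dy = STEP[port]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
        port = xy_route(cur, dst)
    return path


# Hypothetical placement of the 2 x 2 mesh: R1=(0,0), R2=(1,0), R3=(0,1), R4=(1,1).
print(xy_path((0, 0), (1, 1)))  # [(0, 0), (1, 0), (1, 1)]
```

Because X is always corrected before Y, a packet never turns from the Y dimension back into the X dimension, which is what keeps dimension-ordered routing deadlock-free on a mesh.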
The above XY routing algorithm is applied to the mesh in Fig. 8. All experimental work was carried out using
Xilinx 14.2, targeting a Virtex-7 FPGA (Xc7vx330t-3ffg1157).
5.1 Performance analysis
As per the packet data format, when the data density is low, the router takes a minimum of 5 cycles to execute the
process; when the data density is high, it takes a minimum of 8 cycles. Changing the bit length does not change the
required number of clock cycles, although the clock frequency may vary with the bit length. For the 8-, 16- and
32-bit 2 × 2 low-density EFASBRAN, communication from R1 to R4 (Fig. 8) takes 13 clock cycles in the worst
case; if the data density is high, it takes 19 clock cycles in the worst case. The synthesis and power results are
tabulated in Tables 2 and 3 and represented graphically in Figs. 9 and 10.
Table 2. Synthesis Report (2 × 2 EFASBRAN).
Data width (bits) | Slice registers | Slice LUTs | Unused flip-flops | Unused LUTs | LUT-FF pairs | Bonded I/O | Delay (ns) | Memory (MB)
8                 | 224             | 93         | 31                | 162         | 62           | 194        | 0.622      | 220
16                | 448             | 167        | 52                | 333         | 115          | 354        | 0.689      | 222
32                | 896             | 311        | 92                | 677         | 219          | 674        | 0.837      | 228
Table 3. Power Report.
Data width (bits) | Static power | Dynamic power | Total power
8                 | 0.325        | 0.145         | 0.470
16                | 0.555        | 0.146         | 0.701
32                | 0.725        | 0.147         | 0.872
Fig. 9. Synthesis Report.
Fig. 10. Power Report.
6. Conclusions & Future Scope
In this paper, errors occurring inside the shared buffer router are eliminated using a single-bit-error-correcting
Hamming code. The main objective of this paper was to present an error-free, low-power and low-latency shared
buffer router architecture for NoC. The error-free adaptive shared buffer router was tested with various bit lengths on
a 2 × 2 mesh NoC. As the bit length increases, the design area and delay increase, but the number of cycles needed
to communicate between the routers does not. The improvements of the proposed work are evaluated in terms of
area, power and delay. In future, we would like to extend this work to 3D mesh NoCs with real-time IP cores, and to
improve the error correction method to handle burst errors.
Acknowledgment
The authors would like to thank the Principal and Management of Madanapalle Institute of Science & Technology for
their kind support in carrying out this research work, and for the financial support provided under TEQIP-II
(World Bank).
References
[1] Everton Alceu Carara and Ney Laert Vilar Calazans, Differentiated Communication Services for NoC-Based MPSoCs, IEEE Transactions
on Computers, vol. 63, no. 3, March (2014).
[2] Sudeep Pasricha and Nikil Dutt, On-Chip Communication Architectures: System on Chip Interconnect, Morgan Kaufmann Publications,
Elsevier, (2008).
[3] IBM Cell Project, http://www.research.ibm.com/cell.
[4] William J. Dally and Brian Towles, Route Packets, Not Wires: On-Chip Interconnection Networks, Proc. Design Automation Conference
(DAC 2001), June 18–22, (2001).
[5] Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Jhony Oberg, Kari Tiensyrja and Ahmed Hemani,
A Network on Chip Architecture and Design Methodology, IEEE Computer Society, (2002).
[6] Shan Yan and Bill Lin, Custom Networks-on-Chip Architectures With Multicast Routing, IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 17, no. 3, March (2009).
[7] Vassos Soteriou, Rohit Sunkam Ramanujam, Bill Lin and Li-Shiuan Peh, A High-Throughput Distributed Shared-Buffer NoC Router, IEEE
Computer Architecture Letters, vol. 8, no. 1, January–June (2009).
[8] Sergio V. Tota, Mario R. Casu, Massimo Ruo Roch, Luca Macchiarulo and Maurizio Zamboni, A Case Study for NoC-Based Homogeneous
MPSoC Architectures, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 3, March (2009).
[9] David Atienza, Federico Angiolini, Srinivasan Murali, Antonio Pullini, Luca Benini and Giovanni De Micheli, Network-on-Chip Design and
Synthesis Outlook, Integration, The VLSI Journal, vol. 41, pp. 340–359, (2008).
[10] Mehdi Modarressi, Arash Tavakkol and Hamid Sarbazi-Azad, Virtual Point-to-Point Connections for NoCs, IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, June (2010).
[11] Rohit Sunkam Ramanujam, Vassos Soteriou, Bill Lin and Li-Shiuan Peh, Extending the Effective Throughput of NoCs with Distributed
Shared-Buffer Routers, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 4, April (2011).
[12] Mohammad Abdullah Al Faruque, Thomas Ebi and Jörg Henkel, AdNoC: Runtime Adaptive Network-on-Chip Architecture, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, February (2012).
[13] En-Jui Chang, Hsien-Kai Hsin, Chih-Hao Chao, Shu-Yen Lin and An-Yeu (Andy) Wu, Regional ACO-Based Cascaded Adaptive Routing
for Traffic Balancing in Mesh-Based Network-on-Chip Systems, IEEE Transactions on Computers, doi: 10.1109/TC.2013.2296032, (2013).
[14] Anh T. Tran and Bevan M. Baas, Achieving High-Performance On-Chip Networks With Shared-Buffer Routers, IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 22, no. 6, June (2014).
[15] Guoyue Jiang, Zhaolin Li, Fang Wang and Shaojun Wei, A Low-Latency and Low-Power Hybrid Scheme for on-Chip Networks. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 4, (2015).
[16] C. J. Glass and L. M. Ni, The Turn Model for Adaptive Routing, Proc. 19th International Symposium on Computer Architecture (ISCA).
