Abstract-We present comparative reliability analysis between shared-bus AMBA and network-on-chip (NoC) in the presence of single-event upsets (SEUs) using MPEG-2 video decoder as a case study. Employing SystemC-based cycle-accurate fault simulations, we investigate how the decoder reliability is affected when SEUs are injected into the computation cores and communication interconnects of the decoder. We show that for a given soft error rate, AMBA-based decoder experiences higher SEUs during computation due to higher execution time. On the other hand, NoC-based decoder experiences higher SEUs during inter-core communication due to higher channel latency and resource usage in the interconnects. Furthermore, we evaluate the impact of total SEUs at application-level for NoC-and AMBA-based decoders.
I. INTRODUCTION
Comparative analyses in terms of performance, latency, scalability, power, area and throughput between AMBA and network-on-Chip (NoC) have recently been reported, such as [1] . Reliability against soft errors is an emerging design objective for these systems, particularly due to exacerbation of single-event upsets (SEUs) with technology scaling [2] . To this end, a number of studies have recently been reported showing NoC fault tolerant architectures and techniques, examples [3] , [4] . To our knowledge, no comparative studies between sharedbus and NoC has yet been reported considering reliability using real-application, which is the main aim of this paper. Using SEU as fault model in simulated fault injection environment, we investigate the number of SEUs experienced during computation and communication in shared-bus AMBAand NoC-based decoder and evaluate the impact of SEUs at application-level. The rest of the paper is organized as follow. Section II describes system and fault injection setup used in this work. Section III compares between AMBA-and NoC-based decoder in terms injected SEUs and its impact on MPEG-2 video decoder. Section IV concludes the paper.
II. SYSTEM SETUP
In this section, AMBA-and NoC-based MPEG-2 video decoder and fault injection technique are explained briefly.
A. MPEG-2 Video Decoder Cores
MPEG-2 video decoder constitutes a major component of MPSoC applications and is chosen as an application case study. Fig. 1(a) shows block diagram of the MPEG-2 video decoder with four processing cores used in this work. Behavioral modeling was used to design the decoder cores. Partitioning and allocation of application are performed arbitrarily to reflect MPSoC. Fig. 1(b) shows a block diagram of a processing core used in Fig. 1(a) . Each core consists of 32-bit input/output Fig. 1(b) ). The memory size is chosen to give high availability for data processing and storage within processing cores. Table I shows four video bitstreams with different resolutions and sizes, which are used for comparisons in Section III.
B. Shared-bus AMBA
Advanced high-performance bus (AHB) is used as sharedbus AMBA in this work due to its high performance [7] . A single-layer central multiplexor configuration with pipelined single-burst transfer and no waiting states are used to maximize throughput. MPEG-2 video decoder cores ( Fig. 1(b) ) are configured by using the 32-bit input port as slave port (for memory interface) and 32-bit output port as master port (for PE interface) as shown in Fig. 2 . The cores share bus access in the sequence of cores VLD, ISQ and IDCT ( Fig. 2) with each core holding the interconnect access until the current macroblock is processed and stored in the memory of the next core. To facilitate AMBA-based cycle-accurate simulations, we use Synposys Designware SystemC libraries 1 .
C. Network-on-Chip
Network-on-Chip (NoC) incorporates packet-based on-chip communication with a large design space arising out of various options with routing techniques and topologies [3] . In this work, we use a mesh-based NoC topology with with deterministic XY routing and single-flit-packet wormhole communication due to simplicity of switch design, performance and scalability [8] , as shown in Fig. 3(a) . As can be seen, NoC architecture is made up of tiles ( Fig. 3(a) ) consisting of MPEG-2 decoder cores, switch for communication and network interface (NI) for packets-based interface with 46-bit overhead (i.e. size of NoC packet is 32+46=78 bits). Fig. 1 (b) shows block diagram of switch architecture used with five inbound and outbound ports. Four ports connect with other switches with buffer for eight packets on channels and one port is laid out between PE and NI with buffer for four packets ( Fig. 3(b) ). Virtual channel (VC) provides buffering for eight incoming packets and router selects output port based on routing technique used ( Fig. 1(b) ). To facilitate NoC simulations, we use SystemC-based cycle-accurate simulator NIRGAM [9] .
D. Fault Injection
In this work, fault injection is carried out using SEUbased fault model (as also used in [6] ) employing technique proposed in [10] . The fault injection is initiated by replacing all original data/signal types to fault injection enabler types. Such data/signal types facilitate a centralized list of register space for fault injections. For a given soft error rate (SER, in number of SEUs per bit per cycle), the number of SEUs to be injected within this register space is found and their locations are determined by Poisson distribution. Using simulationspecific monitor modules, total register usage and number of faults injected can be found. More details of fault injection can be found in [10] .
III. COMPARATIVE RELIABILITY ANALYSIS
Reliability of an application against SEUs is related to the total number of SEUs experienced over a given time [6] . Our aim in this work is to analyze how the reliability of MPEG-2 video decoder is affected by the choice of AMBA and NoC on-chip communication architecture. To this end, we carry out the following investigations:
• SEUs experienced during computation, F comp , to show how computation is affected, • SEUs experienced during inter-core communication,
F comm , to show how inter-core communication is affected, and • the impact of total SEUs, F = F comp + F comm , at an application-level. In the following (Section III-A and III-B), F comp and F comm of AMBA-and NoC-based decoder are investigated and compared. Later (Section III-C), the impact of F is evaluated at application-level. 
A. SEUs Experienced During Computation
The SEUs in registers of processing cores manifest themselves in different ways during computation, as shown in Fig. 4 . As can be seen, SEUs extending between two idle cycles (instance 3) do not affect computation process, while SEUs that are injected between busy cycles (instance 2) or between busy and idle cycles (instances 1 and 4) are likely to affect computation process (Fig. 4) . Hence, for a given SER of λ, effective F comp in MPEG-2 video decoder with C processing cores can be given as
where T i is the execution time (in clock cycles), T
I−I i
is the number of idle-to-idle transitions within T i and R i gives a measure of actual register usage of the application (since SEUs in other registers have no impact [6] ) and is found by
In (2), R i,t is the instantaneous number of registers (in bits) used or occupied by MPEG computation process at t-th clock cycle in i-th processing core. Table II shows execution time, T i , idle-idle transition cycles, T
and register usage, R i of each processing core in AMBA- (Fig. 2) and NoC-based decoder ( Fig. 3(a) ) obtained through simulations. As can be seen from Table II (row 2) , AMBA-based decoder has similar register usage, R i , as NoC for all four cores but 2.08, 1.98, 2.15 and 2.18 times higher execution time, T i , in processing cores VLD, ISQ, IDCT and MC, respectively, due to time-sharing of bus access among processing cores while decoding test1.m2v. The time sharing of bus access causes more idle cycles in AMBA-based decoder, resulting in 5.7, 4.6, 2.8 and 6.9 times higher T I−I i compared to NoC-based decoder for processing cores VLD, ISQ, IDCT and MC, respectively (row 2, Table II ). With increased video sizes (test2.m2v, test3.m2v and test4.m2v, Table II ), T i and T
increase but similar trend continues between AMBAand NoC-based decoder for R i , T i and T
values. This results in higher F comp in AMBA compared to NoC for decoding video different bitstreams (Table I) , as shown in Fig. 5 (found using (1) with T i , T
and R i values from Table II and for an arbitrary SER of 10 −9 SEUs/bit/cycle). As expected, AMBA-based decoder experiences approximately 83% higher F comp compared to NoC, while decoding video bitstream test3.m2v due to higher execution time (Fig. 5 ). Higher F comp in AMBA-based decoder is expected to perturb MPEG-2 computation more than NoC-based decoder, which will be examined at application-level in Section III-C.
B. SEUs Experienced During Communication
The SEUs experienced during communication, F comm , depends on how data transfer is carried out between communicating cores and is given by
where M is the number of inter-core communication links within the MPSoC (M = 4, Fig. 1(a) ), N j is the total number of data transaction units (DTUs) between cores, L ch j is the channel latency (in clock cycles) and R comj is the average register usage in communication components during transfer of DTUs on j-th link. Channel latency, L ch j in (3) gives a measure of contention of DTUs within the channels and is given by the time (in cycles) required for a DTU to be transferred from the output port of a processing core to an input port of target processing core as 
where R n,l is the instantaneous register usage on j-th link during inter-core communication of n-th DTU at l-th clock cycle (l=1:L chj ). For NoC-based decoder, R n,l in (5) includes registers used in packet overheads and buffers in NI interfaces, channels, VCs, and routers as packet is communicated between cores. For AMBA-based decoder, R n,l includes the registers used in address (HADDR), control signals (RD and WR), decoder and arbiter as DTU is communicated between cores. Using (5), R com j in NoC-based decoder obtained from simulation logs is approximately 212 bits per data transfer cycle and that in AMBA-based decoder is approximately 87 bits per transfer cycle (details not shown due to space constraint). Table III shows the number of DTUs, N i (N 1 for VLD-MC link, N 2 for VLD-ISQ link, N 3 for ISQ-IDCT link, and N 4 for IDCT-MC link, Fig. 1(a) ), recorded from simulation logs. Note that N values do not change for AMBA-and NoC-based decoder for a given video bitstream due to similar architecture for processing cores ( Fig. 1(a) ). As video sizes increase, N also increases for a given link. For example, 108 × 10 3 DTUs are transferred from core ISQ to IDCT, while decoding test1.m2v compared to 364×10
3 DTUs on the same link, while decoding test2.m2v (column 4, Table III ). Fig. 6 shows comparative F comm for an arbitrary SER of 10 −9 , while decoding different video bitstreams (Table I ) using above L ch and R com values for AMBA-and NoC-based decoder (with links having 2, 3 or 4 intermediate switches). As expected, due to higher register usage (R com j ) and channel latency (L ch ), NoC-based decoder links with 2 intermediate switches ( Fig. 3(a) ) suffer from 11 times higher F comm compared to AMBA, which worsens to 18 and 24 times higher F comm as number of intermediate switches increase to 3 and 4, while decoding test1.m2v (Fig. 6) . Similar trends between AMBA-and NoC-based decoder in terms of F comm are also observed with other video bitstreams (Fig. 6) . Although, F comm F comp (Fig. 5 and 6 ), F comm leads to high-level faults including misrouting or loss of DTUs and are of higher significance [5] . Next, the impact of injected SEUs is evaluated at application-level.
C. Impact of SEUs at Application-Level
In Section III-A and III-B, NoC-and AMBA-based decoders have been compared in terms of total SEUs experienced at architectural-level during computation (F comp ) and communication (F comm ). In this section, we evaluate the impact of total SEUs (F = F comp + F comm ) at application-level using peak signal-to-noise ratio (PSNR) as the application fidelity metric. PSNR is used to characterize subjective quality of video in the presence of SEUs [6] (also used in [6] ) and is defined as P SNR = 10 log 10
where P is the number of frames, Q is number of pixels per frame, x p,q and y p,q are the q-th pixels in p-th reference and decoded frames. PSNR describes fidelity of video decoder but it does not reflect temporal fidelity [6] . Hence, to evaluate temporal fidelity, we use frame error ratio (FER), defined as
where x is the number of lost frames out of P frames. Fig. 7 shows PSNR (in dB) and FER (in %) of decoded video frames found by using (6) and (7), while decoding video bitstream test4.m2v in AMBA- (Fig. 2) and NoC-based decoder (with links having 2 intermediate switches, Fig. 3(a) ). The PSNR and FER results were obtained by varying SER Fig. 2) and NoC ( Fig. 3(a) ) with varying SERs (Fig. 2) and NoCbased decoder (with 2, 3 and 4 int. switches) at SER of 10 −9 from 10 −10 to 10 −6 in simulated fault injection environment (Section II-D). As expected, NoC outperforms with upto 5dB higher PSNR (Fig. 7) due to lower F comp (Section III-A). However, due to higher F comm (Section III-B), NoC-based decoder has upto 3% higher FER compared to AMBA-based decoder. To investigate and compare the impact of NoC routing or floor mapping (as demonstrated in Section III-B), Fig. 8 shows the FERs for AMBA-and NoC-based decoder with routing of 2 ( Fig. 3(a) ), 3 and 4 intermediate switches for a given SER of 10 −9 , while decoding test4.m2v. As expected, AMBA-based decoder has the lowest FER of 3.6% (Fig. 8) . Due to higher F comm , NoC with 2 intermediate switches ( Fig. 3(a) ) has higher FER of 6.5%, which increases to 9.5% and 11% as number of intermediate switches increases to 3 and 4 (Fig. 8) . The comparative FER (Fig. 8) indicates that NoC-based decoder is more susceptible to SEUs during communication compared to AMBA-based decoder. We expect that the observation made about AMBA-and NoC-based decoders with respect to SEUs experienced to follow similar trend if the number of computation cores increase.
IV. CONCLUSIONS
Using MPEG-2 video decoder traffic in simulated fault injection environment, we presented a comparative reliability analysis between shared-bus AMBA and NoC using four processing cores. We showed that for a given soft error rate, AMBA-based decoder experiences higher single-event upsets (SEUs) during computation due to higher execution time (Section III-A). On the other hand, NoC-based decoder experiences higher SEUs during inter-core communication (Section III-B). Considering the impact of total SEUs at applicationlevel using peak signal-to-noise ratio (PSNR) as a reliability metric, we showed that NoC-based decoder gives higher PSNR than AMBA-based decoder. However, due to higher SEUs experienced by NoC-based decoder interconnects, it suffers from higher frame error ratio than AMBA-based decoder.
