In heterogeneous systems, reconfigurable computing often serves as a high-performance platform for a broad range of computationally challenging problems. However, exploiting the full potential of these reconfigurable systems is difficult without understanding their performance characteristics. This work proposes an analytic performance model using Petri Nets (PN) for a Reconfigurable OR1200 (ROR1200) soft-core processor, together with model validation and verification. By modeling the ROR1200 system using Petri Nets, both the behavioral and the structural properties found in parallel systems are analyzed. A Bound Level Analysis with respect to the level of data dependency is also performed on three Soft Core Processors (SCP): the ROR1200, the OR1200 and the MicroBlaze.
INTRODUCTION
The analysis of a system starts with developing a model of that system, which supports further examination and the discovery of the system's behaviors. Petri Net theory is one of the most promising tools for the design and analysis of such system models (Marcin Radom et al., 2015).
Petri Net Performance Model
In any computing system, the performance of the system model can be analyzed using two approaches: the analytical model and the simulation model. Prominent analytical modeling techniques include Markov chain models, Semi-Markov models, Queuing Network models, Petri Net models and Stochastic Process Algebra models. Markov chains are a conventional analytical technique for expressing random processes. Because of their memoryless property, Markov chains are used to model systems whose future behavior depends only on their current state (John and Laurie, 1960). A queuing model is one of the most powerful techniques for modeling hardware contention and various scheduling strategies, and it admits many efficient analysis techniques. One drawback of a queuing model is that it is difficult to model blocking, synchronization, mutual exclusion and software-based contention. Stochastic process algebra (SPA) has emerged as a functional analysis modeling technique for concurrent systems. Its major limitation is its weakness in expressing the system with respect to time distributions (Gilmore et al., 1996).
The performance of hardware systems such as field programmable gate arrays (FPGA) and application-specific integrated circuits (ASIC) is analyzed with respect to their qualitative and quantitative properties. For example, when analyzing the fault tolerance of a hardware system, the feasibility of recovering from an error is a qualitative property, whereas the amount of time required by the error recovery routine is a quantitative property (Falko Bause et al., 2005). The Petri Net model is well suited to both qualitative (time-less) and quantitative (time-dependent) analysis (Monika et al., 1997). It supports the modeling of blocking, synchronization, mutual exclusion and software-based contention (Jennings et al., 2000). The Petri Net model is used to analyze the properties and issues associated with parallel systems, and is therefore mainly used for modeling the dynamic concurrent actions of such systems. Moreover, the PN model helps to guarantee the completeness of the design and permits improvement of the correctness of the designed system.

Compared to the state of the art, the proposed methodology provides novel contributions. These contributions use a reconfigurable framework for real-time multimedia datasets that describes the different entities executing in the high performance reconfigurable computing (HPRC) and Reconfigurable Register File (RRF) stack to reduce temporal isolation. This reconfiguration technique increases the system performance and also the output quality experienced by the user without any temporal deviation (Wang et al., 2009). Analytical modeling is widely used in performance evaluation owing to its effectiveness and flexibility (Kant, 1992). The performance of the reconfigurable ROR1200 with the HPRC system has been successfully modeled using Petri Nets (Lotfifar et al., 2008).
The Petri Net methodology has proven able to model the most important characteristics of modern computer systems that exploit parallelism and concurrency (Gaubert et al., 1997). For reconfigurable computing, the Petri Net model and its associated analytical processes provide a promising modeling tool for system evaluation and validation (Hadis He Idari, 2013). The Petri Net model offers excellent analysis of both the qualitative and the quantitative behavior of hardware systems (Maciel et al., 1998).
For a given system model in Petri Net analysis, qualitative analysis identifies the feasible behaviors of the system without considering time, whereas quantitative assessment analyzes those behaviors as a measure of occurrence probability (Franco Cicirelli et al., 2015). After this introduction to the Petri Net Performance Model in Section 1, Section 2 describes the methodology and the architectural view of the ROR1200. Section 3 analyzes the qualitative behavioral properties of PN models: reachability, boundedness, liveness, coverability analysis, reversibility, persistence, synchronic distance form and the fairness analyzer. The performance analysis factors and the Petri Net based analysis used to evaluate and validate the ROR1200 are discussed in Section 3 and Section 4. Section 5 concludes the work by summarizing the contributions and findings of the research conducted on the proposed ROR1200 system.
METHODOLOGY
The Petri Net model offers an excellent analytical technique, in both qualitative and quantitative terms, for studying the behavior of hardware systems (Maciel et al., 1998). To enhance the performance of the open core processor, the ROR1200 was designed with a reconfiguration technique (Maheswari et al., 2013).
Architectural View
The ROR1200 is a five-stage pipeline soft-core processor. The CPU system design of ROR1200 with reconfigurable HPRC (High Performance Reconfigurable Computing) unit, RRF (Reconfigurable Register File) and the Hazard Controller unit is shown in Figure 1 and the internal architectural view of ROR1200 is shown in Figure 2 . 
EXPERIMENTAL OUTPUT OF ANALYZING ROR1200 USING PETRI NET
A Petri Net is a bipartite model that offers two forms of formal description, graphical and mathematical, and captures the logical interactions and dynamics of complex designs, including embedded systems. The graph-based Petri Net model consists of three kinds of components: places ($P_i$), transitions ($T_i$) and directed arcs ($A_i$). Formally, the Petri Net is expressed as $P_n = \{P_i, T_i, A_i\}$. A Petri Net model executes by firing transitions ($T_i$), which moves tokens from the input places of a transition to its output places. The complete executable PN model is a five-tuple $(P, T, F, W, M_0)$, where $P$ is a finite set of places, $T$ is a finite set of transitions, $F$ is the finite set of arcs from places to transitions and from transitions to places, $W$ assigns a weight to each arc and $M_0$ is the initial marking, i.e. the number of tokens initially present in each place. Graphically, a place is drawn as a circle and a transition as a bar. An arc from a condition (place) to an event (transition) belongs to the input function, and an arc from an event to a condition belongs to the output function (Murata, 1989). In system modeling, all system conditions are represented by places and their events are mapped to transitions.
ROR1200 Petri Net Model
Parallel processing systems can be modeled as Petri Nets by mapping processes to transition nodes and their inputs/outputs to place nodes (Murata, 1984). A PN illustrates the functionality of all the subsystems of the model and describes the interaction between those subsystems through token exchange. The PN model for the execution unit of the ROR1200 is designed using the HiPS (Hierarchical Petri Net Simulator) tool and is shown in Figure 3; the token transition between the places is shown in Figure 4. The PN model is driven by two execution rules, enabling and firing. A transition is enabled when every one of its input places holds at least one token. When the transition fires, the token at the input place is moved to the output place, so that the place ($P_i \in PN$) has a new marking as shown in Equation 1.
(1)
The places and transitions of each input node and the corresponding output transitions, with the initial marking, are given as $M_0 = \{1, 0, 0, 0, 0, 0, 0, 0, 0, 0\}$, where $P$ is a place, $T$ is a transition, $I$ is the input function, $O$ is the output function and $M$ is the token marking. Source transitions and sink transitions are special cases of firing: a source transition has no input place and is therefore always enabled, whereas a sink transition consumes tokens but does not generate any. The average firing delay of a transition for a token at a given marking is expressed through $d^{-1}$, where $d$ is the delay.
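The enabling and firing rules can be illustrated with a short Python sketch. It uses a hypothetical three-place cycle rather than the ROR1200 net, and the names `PRE`, `POST`, `enabled` and `fire` are illustrative assumptions. The update it applies is the standard textbook firing rule (remove the input tokens, add the output tokens), which may differ in notation from the authors' Equation 1.

```python
from typing import List, Tuple

Marking = Tuple[int, ...]

# Hypothetical three-place cycle (NOT the ROR1200 net):
#   p0 --t0--> p1 --t1--> p2 --t2--> p0
# PRE[t][p]  = tokens transition t consumes from place p (input function)
# POST[t][p] = tokens transition t deposits into place p (output function)
PRE: List[Marking] = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
POST: List[Marking] = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]


def enabled(marking: Marking, t: int) -> bool:
    """Enabling rule: every input place holds at least the required tokens."""
    return all(m >= w for m, w in zip(marking, PRE[t]))


def fire(marking: Marking, t: int) -> Marking:
    """Firing rule: remove the input tokens, then add the output tokens."""
    if not enabled(marking, t):
        raise ValueError(f"transition t{t} is not enabled at {marking}")
    return tuple(m - i + o for m, i, o in zip(marking, PRE[t], POST[t]))


m0: Marking = (1, 0, 0)   # one token in p0, as in a single-token initial marking
m1 = fire(m0, 0)          # the token moves from p0 to p1
m2 = fire(m1, 1)          # the token moves from p1 to p2
```

Storing the input and output functions as one vector per transition keeps both rules to a single comparison or a single element-wise update.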
Behavioral Properties
A property of the PN model that depends on the initial marking $M_0$ is known as a behavioral property. Formalizing and analyzing the behavioral properties of the Petri Net makes it possible to identify errors at an early stage of the design. The behavioral properties analyzed in the PN model are reachability, boundedness, liveness, coverability analysis, reversibility, persistence, synchronic distance form and the fairness analyzer.
Reachability
The reachability property of the Petri Net model helps to analyze the dynamic behavior of parallel systems. In a bounded system, a marking $M_n$ is reachable from the initial marking $M_0$ if there is a firing sequence that transforms $M_0$ into $M_n$; in that case the reachability property holds for the given model. Figure 5 shows the reachability outcome of the PN model. For an unbounded net, the reachability tree is infinitely large. To obtain a finite tree, a special symbol $\omega$ is defined such that $\omega + n = \omega$ for any integer $n$.
Figure 5 Reachability graph
Thus the reachability analysis ($R_A$) shown in Figure 5 demonstrates that the reachability property holds for the PN model.
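For a bounded net, the set of reachable markings can be enumerated by a breadth-first search over firing sequences. The sketch below is a minimal illustration on a hypothetical three-place cycle, not the ROR1200 net. The helper names and the `limit` safeguard are assumptions introduced here, and the HiPS tool may build its reachability graph differently.

```python
from collections import deque

# Hypothetical three-place cycle (NOT the ROR1200 net): p0 -> p1 -> p2 -> p0
PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]


def successors(marking, pres, posts):
    """Yield every marking obtained by firing one enabled transition."""
    for pre, post in zip(pres, posts):
        if all(m >= w for m, w in zip(marking, pre)):
            yield tuple(m - i + o for m, i, o in zip(marking, pre, post))


def reachable(m0, pres, posts, limit=10_000):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen = {tuple(m0)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for nxt in successors(current, pres, posts):
            if nxt not in seen:
                if len(seen) >= limit:
                    # An unbounded net never terminates; give up explicitly.
                    raise RuntimeError("state space exceeds limit; net may be unbounded")
                seen.add(nxt)
                queue.append(nxt)
    return seen


markings = reachable((1, 0, 0), PRE, POST)
```

A target marking is reachable exactly when it appears in the returned set, so `(0, 0, 1) in markings` answers the reachability question for this small net.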
Boundedness
The PN model is said to be k-bounded when the number of tokens in each place $P$ never exceeds the bound $k$ in any reachable marking $M$. Figure 6 shows the boundedness analysis for a depth value of 99, which also satisfies the reachability property of the system. Since the net is 1-bounded, the target system is said to be safe. The general condition for boundedness is given in Equation 2. Ensuring that the net is bounded, and in particular safe, guarantees that no buffer or register overflows in the designed system, irrespective of the firing sequence of the tokens. The PN $(N, M_0)$ with initial marking $M_0$ is safe when no place holds more than one token in any marking reachable from $M_0$ (Saúl-Alonso et al., 2016).
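The bound $k$ can be read off the reachable markings: it is the largest token count seen in any place. The following sketch shows this on a hypothetical three-place cycle, not the ROR1200 net. The function names are illustrative assumptions, and the check follows the standard textbook definition rather than the authors' Equation 2.

```python
from collections import deque

# Hypothetical three-place cycle (NOT the ROR1200 net): p0 -> p1 -> p2 -> p0
PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]


def reachable(m0, pres, posts, limit=10_000):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen = {tuple(m0)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for pre, post in zip(pres, posts):
            if all(m >= w for m, w in zip(current, pre)):
                nxt = tuple(m - i + o for m, i, o in zip(current, pre, post))
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space exceeds limit; net may be unbounded")
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def bound(m0, pres, posts):
    """Smallest k such that the net is k-bounded from m0."""
    return max(max(marking) for marking in reachable(m0, pres, posts))


def is_safe(m0, pres, posts):
    """A net is safe when it is 1-bounded."""
    return bound(m0, pres, posts) <= 1


k_single = bound((1, 0, 0), PRE, POST)   # one circulating token
k_double = bound((2, 0, 0), PRE, POST)   # two circulating tokens
```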
Liveness
Satisfying the liveness property ensures that the designed model operates in a deadlock-free mode, irrespective of the firing sequence. The execution result for liveness is given in Figure 7. In general, liveness is defined in Equation 3.
The liveness property holds if every transition of the PN is transparent.
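Deadlock freedom and liveness can both be checked by exhaustive search on a small bounded net. The sketch below uses hypothetical nets, not the ROR1200 model, and its function names are assumptions. It takes "live" in the usual textbook sense: from every reachable marking, every transition can still be enabled by some further firing sequence.

```python
from collections import deque


def enabled(marking, pre):
    return all(m >= w for m, w in zip(marking, pre))


def reachable(m0, pres, posts, limit=10_000):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen = {tuple(m0)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for pre, post in zip(pres, posts):
            if enabled(current, pre):
                nxt = tuple(m - i + o for m, i, o in zip(current, pre, post))
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space exceeds limit; net may be unbounded")
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_deadlock_free(m0, pres, posts):
    """No reachable marking leaves every transition disabled."""
    return all(any(enabled(m, pre) for pre in pres)
               for m in reachable(m0, pres, posts))


def is_live(m0, pres, posts):
    """From every reachable marking, each transition can become enabled again."""
    for marking in reachable(m0, pres, posts):
        future = reachable(marking, pres, posts)
        for pre in pres:
            if not any(enabled(f, pre) for f in future):
                return False
    return True


# Hypothetical cycle p0 -> p1 -> p2 -> p0: the token circulates forever.
CYCLE_PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
CYCLE_POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]
# Hypothetical dead end p0 -> p1: nothing can fire once the token reaches p1.
DEAD_PRE = [(1, 0)]
DEAD_POST = [(0, 1)]

cycle_live = is_live((1, 0, 0), CYCLE_PRE, CYCLE_POST)
dead_end_live = is_live((1, 0), DEAD_PRE, DEAD_POST)
```

The liveness check recomputes the reachable set for every marking, which is quadratic in the state space and only suitable for small nets.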
Coverability analysis
The coverability property is also referred to as potential fireability. A transition $t$ at a given marking $M'$ is said to be coverable if the PN model satisfies the condition given in Equation 4.
If $M'$ is not coverable by the net, then the transition $t$ is said to be dead. Figure 8 shows the coverability of the PN model from the initial marking $M_0$ to $M_7$. The coverability graph constructed for the ROR1200 has 243 vertices and 877 edges.
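A coverability graph stays finite even for unbounded nets because token counts that can grow without limit are replaced by the symbol $\omega$. The sketch below is a Karp-Miller style construction written for illustration only. The names, the use of `float("inf")` for $\omega$ and the two tiny example nets are assumptions, and the construction is not claimed to match the algorithm inside HiPS. A marking $M'$ is treated as coverable when some node of the graph is greater than or equal to $M'$ in every place.

```python
OMEGA = float("inf")  # stands for the symbol omega: "arbitrarily many tokens"


def coverability_set(m0, pres, posts):
    """Markings (possibly containing OMEGA) of a Karp-Miller style graph."""
    seen = set()
    stack = [(tuple(m0), ())]            # (marking, ancestors on the path to it)
    while stack:
        marking, ancestors = stack.pop()
        if marking in seen:              # duplicate node: do not expand again
            continue
        seen.add(marking)
        path = ancestors + (marking,)
        for pre, post in zip(pres, posts):
            if not all(m >= w for m, w in zip(marking, pre)):
                continue
            nxt = [m - i + o for m, i, o in zip(marking, pre, post)]
            for anc in path:
                # If nxt strictly dominates an ancestor, the growth can repeat
                # forever, so the growing places are set to OMEGA.
                if all(a <= n for a, n in zip(anc, nxt)) and any(a < n for a, n in zip(anc, nxt)):
                    nxt = [OMEGA if a < n else n for a, n in zip(anc, nxt)]
            stack.append((tuple(nxt), path))
    return seen


def coverable(target, m0, pres, posts):
    """True if some node of the coverability graph dominates target."""
    return any(all(n >= t for n, t in zip(node, target))
               for node in coverability_set(m0, pres, posts))


# Hypothetical unbounded net: a source transition keeps adding tokens to p0.
SRC_PRE, SRC_POST = [(0,)], [(1,)]
# Hypothetical bounded cycle p0 -> p1 -> p2 -> p0 with a single token.
CYC_PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
CYC_POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]

source_nodes = coverability_set((0,), SRC_PRE, SRC_POST)
cycle_nodes = coverability_set((1, 0, 0), CYC_PRE, CYC_POST)
```

For a bounded net such as the cycle, no $\omega$ appears and the coverability set coincides with the reachability set.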
Reversibility
The reversibility property ensures that the system exhibits cyclic behavior. The PN system model is reversible if, from every reachable marking, the net can return to its initial marking $M_0$ (or to a designated state $M$), as given in Equation 5. Figure 9 shows the reversibility analysis of the PN with an invariant activation. The reversibility property thus ensures that the system can recover even after a failure.
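Reversibility can be tested directly from its definition: the initial marking must be reachable again from every reachable marking. The sketch below is a brute-force check on hypothetical nets, not the ROR1200 model, with illustrative function names. It follows the standard definition rather than the authors' Equation 5.

```python
from collections import deque


def reachable(m0, pres, posts, limit=10_000):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen = {tuple(m0)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for pre, post in zip(pres, posts):
            if all(m >= w for m, w in zip(current, pre)):
                nxt = tuple(m - i + o for m, i, o in zip(current, pre, post))
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space exceeds limit; net may be unbounded")
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_reversible(m0, pres, posts):
    """True if m0 can be reached again from every reachable marking."""
    home = tuple(m0)
    return all(home in reachable(m, pres, posts)
               for m in reachable(home, pres, posts))


# Hypothetical cycle p0 -> p1 -> p2 -> p0: always returns to the start.
CYCLE_PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
CYCLE_POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]
# Hypothetical one-way net p0 -> p1: the token can never come back.
ONEWAY_PRE = [(1, 0)]
ONEWAY_POST = [(0, 1)]

cycle_reversible = is_reversible((1, 0, 0), CYCLE_PRE, CYCLE_POST)
oneway_reversible = is_reversible((1, 0), ONEWAY_PRE, ONEWAY_POST)
```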
Persistence
The persistence property plays a very important role in analyzing parallel processing. If, for any two enabled transitions ($t_1$ and $t_2$) in the net, firing $t_1$ does not disable $t_2$, then the PN model is persistent. A persistent PN model is also referred to as a conflict-free Petri Net model. In the given PN model, all the marked graphs $(N, M_n)$ have the persistence property.
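The persistence condition can be checked by testing every ordered pair of simultaneously enabled transitions at every reachable marking. The sketch below uses hypothetical nets, not the ROR1200 model, and its names are illustrative assumptions. The second net contains a deliberate conflict (two transitions competing for one token) to show a violation.

```python
from collections import deque
from itertools import permutations


def enabled(marking, pre):
    return all(m >= w for m, w in zip(marking, pre))


def fire(marking, pre, post):
    return tuple(m - i + o for m, i, o in zip(marking, pre, post))


def reachable(m0, pres, posts, limit=10_000):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen = {tuple(m0)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for pre, post in zip(pres, posts):
            if enabled(current, pre):
                nxt = fire(current, pre, post)
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space exceeds limit; net may be unbounded")
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_persistent(m0, pres, posts):
    """Firing one enabled transition never disables another enabled one."""
    for marking in reachable(m0, pres, posts):
        active = [t for t in range(len(pres)) if enabled(marking, pres[t])]
        for t1, t2 in permutations(active, 2):
            after = fire(marking, pres[t1], posts[t1])
            if not enabled(after, pres[t2]):
                return False
    return True


# Hypothetical conflict-free cycle p0 -> p1 -> p2 -> p0.
CYCLE_PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
CYCLE_POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]
# Hypothetical conflict: t0 and t1 both need the single token in p0.
CONFLICT_PRE = [(1, 0, 0), (1, 0, 0)]
CONFLICT_POST = [(0, 1, 0), (0, 0, 1)]

cycle_persistent = is_persistent((1, 0, 0), CYCLE_PRE, CYCLE_POST)
conflict_persistent = is_persistent((1, 0, 0), CONFLICT_PRE, CONFLICT_POST)
```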
Synchronic distance form and fairness analyzer
The degree of mutual dependence between transitions is identified by the synchronic distance form. The synchronic distance form of the PN model is shown in Figure 10a. Global fairness with an infinite transition is indicated by the purple transition ($t_{37}$), which represents the general transition of an unbounded net, as shown in Figure 10b; the transition table is shown in Figure 10c.
Structural Properties
Properties that depend on the topological structure of the PN model are known as structural properties. The structural analyzer tool in HiPS is used to analyze these properties. The outcome, shown in Figure 11, indicates that the given PN model is structurally bounded and partially conservative.
Incidence Matrix
The dynamic behavior of a parallel system is analyzed using an incidence matrix. The transition table and the incidence matrix of the PN model are shown in Figure 12. The equation for the incidence matrix $A$ of the PN model is given in Equation 6.
where $P$ is a place with $n_P$.

$(1\ 0\ 0\ 1\ 0\ 0\ 0)^T$

Necessary condition of reachability

By modeling the ROR1200 using the Petri Net, both the behavioral and the structural properties found in a parallel system are analyzed. The design of the PN model for the ROR1200 allows net level metrics to be translated into system level metrics. At the end of modeling, the performance indices that need to be mapped to the designed system are the mean sojourn time ($S_t$), the average number of tokens ($T_{avg}$), the place idle time ($P_{idle}$), the total number of transitions ($N_t$) and the transition effective firing rate ($E_t$).

Sojourn time is the average time spent by the system at a marking $M$. The mean sojourn time is shown in Equation 8.
(8)
where $N$ is the total number of tokens at the initial clock cycle, $n$ is the number of transitions, the interval of the cycle also enters the expression, and $N_t$ is the total number of tokens that occurred at place $P_i$ from the initial clock cycle until the current cycle.

The average number of tokens is shown in Equation 9.
(9)
where $N_s$ is the total simulation time.
The transition effective firing rate is shown in Equation 10.
(10)
where $N_f$ is the total number of firing transitions.

The simulation result of the PN gives a 96.7% confidence interval; the mean sojourn time ranges between 1% and 6%, and the average error rate falls between 1.91% and 2.28%. The average number of tokens is identified as 1.83, with a reduced place idle time of 0.08 ns. The properties analyzed and the model outcomes, with the stated equations for the Petri Net model, are shown in Table 1.
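The performance indices named above can be estimated from a simulation trace. The sketch below uses common time-weighted definitions, which are assumptions and not a reconstruction of Equations 8 to 10. The trace is invented for illustration: each entry is a marking together with the time the system stayed in it.

```python
from collections import defaultdict

# Hypothetical trace (NOT ROR1200 data): (marking, time spent in that marking)
TRACE = [
    ((1, 0, 0), 2.0),
    ((0, 1, 0), 1.0),
    ((0, 0, 1), 1.0),
    ((1, 0, 0), 2.0),
    ((0, 1, 0), 2.0),
]


def mean_sojourn(trace):
    """Average time spent per visit at each marking."""
    total = defaultdict(float)
    visits = defaultdict(int)
    for marking, dt in trace:
        total[marking] += dt
        visits[marking] += 1
    return {m: total[m] / visits[m] for m in total}


def average_tokens(trace):
    """Time-weighted average token count of every place."""
    span = sum(dt for _, dt in trace)
    places = range(len(trace[0][0]))
    return [sum(m[p] * dt for m, dt in trace) / span for p in places]


def idle_fraction(trace):
    """Fraction of the simulated time each place holds no token."""
    span = sum(dt for _, dt in trace)
    places = range(len(trace[0][0]))
    return [sum(dt for m, dt in trace if m[p] == 0) / span for p in places]


def firing_rate(trace):
    """Firings per unit time; each change of marking counts as one firing."""
    span = sum(dt for _, dt in trace)
    return (len(trace) - 1) / span


sojourn = mean_sojourn(TRACE)
tokens = average_tokens(TRACE)
idle = idle_fraction(TRACE)
rate = firing_rate(TRACE)
```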
Property: Reachability, Boundedness
Inference: The PN of the ROR1200 has 165 reachable markings, of which 133 are legal markings. The PN is k-bounded, therefore the net is safe.

Property: Liveness, Coverability
Inference: The PN guarantees a deadlock-free model. Minimal covering is ensured.

Property: Reversibility
Inference: The output markings of the incidence matrix ensure an over-approximation of the actual reachable markings.
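The role of the incidence matrix can be illustrated with the standard state equation $M = M_0 + A^T x$, where $x$ counts how often each transition fires. A solution of this equation is necessary for reachability but not sufficient, which is why it over-approximates the reachable markings. The sketch below uses a hypothetical three-place cycle, not the ROR1200 net, and its names and matrix orientation (rows are transitions, columns are places) are assumptions that may differ from the authors' Equation 6.

```python
# Hypothetical three-place cycle (NOT the ROR1200 net): p0 -> p1 -> p2 -> p0
PRE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
POST = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]


def incidence(pres, posts):
    """A[t][p] = tokens added to place p minus tokens removed when t fires."""
    return [[o - i for i, o in zip(pre, post)] for pre, post in zip(pres, posts)]


def state_equation(m0, matrix, counts):
    """Marking predicted by M = M0 + A^T x for the firing-count vector x."""
    return tuple(
        m + sum(matrix[t][p] * counts[t] for t in range(len(matrix)))
        for p, m in enumerate(m0)
    )


A = incidence(PRE, POST)
after_two = state_equation((1, 0, 0), A, (1, 1, 0))    # fire t0 and t1 once each
after_cycle = state_equation((1, 0, 0), A, (1, 1, 1))  # fire every transition once
```

Firing every transition of the cycle once returns the initial marking, so the vector `(1, 1, 1)` is a transition invariant of this small net.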
BOUND LEVEL ANALYSIS
From a technological perspective, further research is required to explore applications that meet high performance standards at lower power and reduced cost. Performance analysis is carried out for the two extreme cases, upper bound analysis and lower bound analysis, which are distinguished by the level of data dependency. In the upper bound (parallel) analysis, none of the $N$ input operations has a data dependency, whereas in the lower bound (serial) analysis, each of the $N$ input operations depends on its predecessor (Ling-Pei, 2002). These two analyses provide the information required to verify whether a given task is memory bound, which in turn limits the speedup potential of that region whenever the obtained memory bandwidth is greater than the upper bound. Depending on these values, the clock period required to speed up the process is estimated.
Upper Bound Analysis
For a dataset without dependencies, a parallel computational unit targeting high performance requires the minimum number of cycles to complete execution. This case is characterized by a task that has no data dependencies, with all operations running in parallel and the pipeline fully utilized, as expressed in Equation 11.
where $N_{op}$ indicates the number of operations, $i_n$ is the number of inputs and $N$ is the number of functional units connected to the reconfigurable register file.
The latency of the upper bound level is calculated as shown in Equation 12.
where $L_u$ indicates the latency of the upper bound level and $I_l$ represents the initiation interval of the $i$-th functional unit with latency $L$.
Lower Bound Analysis
The lower bound analysis applies to pipeline processing when the data dependency is maximal, which gives the longest execution. The throughput for this lower bound is obtained by sequencing all executions of the functional units.
The latency of lower bound level is calculated as in Equation 13.
where $L_l$ indicates the latency of the lower bound level and $T_l$ represents the smallest, least expensive simple instruction task of the $i$-th functional unit with latency $L$.
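The contrast between the two extremes can be made concrete with generic pipeline arithmetic. The sketch below is not a reconstruction of Equations 11 to 13. It assumes identical pipelined functional units, and all parameter names and the sample numbers are hypothetical. With no dependencies the operations are spread over the units and overlap in the pipeline; with full dependency each operation must wait for its predecessor to finish.

```python
import math


def upper_bound_cycles(n_ops, n_units, latency, initiation_interval=1):
    """Best case: independent operations spread over pipelined units.

    Each unit processes ceil(n_ops / n_units) operations; the first result
    appears after `latency` cycles and each further one after one more
    initiation interval.
    """
    per_unit = math.ceil(n_ops / n_units)
    return latency + (per_unit - 1) * initiation_interval


def lower_bound_cycles(n_ops, latency):
    """Worst case: every operation depends on its predecessor (fully serial)."""
    return n_ops * latency


# Hypothetical workload: 8 operations, 4 functional units, 3-cycle latency.
best = upper_bound_cycles(n_ops=8, n_units=4, latency=3)
worst = lower_bound_cycles(n_ops=8, latency=3)
ratio = worst / best
```

In this toy setting the serial case takes six times as many cycles as the parallel case, which shows how strongly the dependency level controls the achievable latency.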
Bound Ratio
The bound ratio $B_r$ gives the ratio of the upper and lower bound levels.
Table 2 presents the performance analysis of the upper and lower bounds using a Media Benchmark on three Soft Core Processors (SCPs): the OR1200, the MicroBlaze and the ROR1200. From Table 2 it is inferred that the upper and lower bound values are directly proportional to the value of $N$. The lower bound analysis is used to perform the task load test by measuring the instruction size $N$ and the task $T$. Similarly, the upper bound analysis supports the characteristics test through the distribution of functional units between the tasks. The bound level analysis graph is shown in Figure 14.
Figure 14 Bound analysis graph (axis: Soft-core Processor using FPGA)
CONCLUSION
As a proof of concept, the Petri Net model for the ROR1200 was designed to verify various behavioral and structural properties: reachability, boundedness, liveness, coverability, reversibility, persistence, the synchronic distance form and the fairness analyzer for safeness, and an incidence matrix with the stated equations. The Bound Level Analysis metrics, namely upper bound analysis, lower bound analysis and the bound ratio with respect to the level of data dependency and its latency, were evaluated to validate the system. The results obtained for the ROR1200 were an upper bound latency ($L_u$) of 28 ns, a lower bound latency ($L_l$) of 40 ns and a bound ratio ($B_r$) of 1.42, which show improved performance compared to the OR1200 and MicroBlaze soft-core processors.
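The reported figures can be cross-checked with a short calculation. Assuming the bound ratio is taken as the lower bound latency divided by the upper bound latency (an assumption, since the defining equation is not reproduced here), the values quoted above give:

```latex
B_r = \frac{L_l}{L_u} = \frac{40\ \text{ns}}{28\ \text{ns}} \approx 1.4286
```

This agrees with the reported value of 1.42 when the result is truncated to two decimal places. The inverse ratio, 28/40 = 0.70, does not match the reported figure.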
