Interrupt service routines are a key technology for embedded systems. In this paper, we introduce the standard approach of using Generalized Stochastic Petri Nets (GSPNs) as a high-level model for generating Continuous-Time Markov Chains (CTMCs), and then use Markov Reward Models (MRMs) to compute performance measures for embedded systems. This framework is employed to analyze two low-cost, high-performance embedded controllers, ARM7 and Cortex-M3. Cortex-M3 is designed with a tail-chaining mechanism that improves on the performance of ARM7 when a nested interrupt occurs. The Platform Independent Petri net Editor 2 (PIPE2) tool is used to model and evaluate the controllers in terms of power consumption and interrupt overhead. The numerical results show that Cortex-M3 performs better than ARM7 in both power consumption and interrupt overhead.
Introduction
Advances in technologies and design methodologies are enabling the integration of complex embedded systems on system-on-chip (SoC) devices, especially since Advanced RISC Machines (ARM) Limited introduced the ARM7 core in 1993 [1]. ARM7 is a Reduced Instruction Set Computer (RISC) processor core designed for low cost and high performance. Many products have been designed using this core, including digital still cameras, mobile telephone handsets, Bluetooth controllers and personal digital assistants. 32-bit embedded cores have better performance than 8-bit embedded cores, but 8-bit cores still ship in billions of microcontrollers each year. Although 8-bit cores meet specific real-time performance requirements, they are limited by performance, software, and development tool constraints [2]. In 2004, ARM announced Cortex-M3, the first of the new Cortex family of processor cores. Cortex-M3, designed for cost-sensitive microcontroller applications, removes the constraints of 8-bit cores. Many 8-bit cores limit direct addressing to a maximum of only 64 KB, whereas 32-bit cores have a 4 GB address space. One constraint of 8-bit cores is therefore that too little memory is available once the application code grows very large to meet more requirements. The performance of Cortex-M3 is higher than that of many 32-bit cores because Cortex-M3 delivers 1.2 Dhrystone MIPS per MHz, as reported in [2]. The embedded controller market is huge, with over $26bn in annual revenue [3]. Hence, designing embedded systems using embedded controllers has become very important. High performance and low power consumption must be carefully considered in embedded product designs. Chen et al. [4] in 2005 focused on reducing the number of memory accesses in embedded systems to improve performance and save power. In 2007, Jun et al. [5] presented a latency-aware bus arbitration scheme for improving the performance of real-time embedded systems. Embedded software is defined as special-purpose software built into embedded systems. Embedded software accelerates practical system functions in cars, mobile communications, robots, cameras, information appliances, etc. [6]. In 2007, Liu et al. implemented a hybrid operating system with two-level interrupts to improve interrupt latency [7]. Song et al. [8] in 2008 presented an analytical model using Lyapunov stability in a linear matrix inequality framework for a real-time scheduler. Gendy and Pont in 2008 [9] used an interrupt-driven technique to automatically configure time-triggered schedulers for single-processor embedded systems.
In the last few years, several articles have been devoted to the study of instruction-level power consumption for embedded systems. Tiwari et al. [10] in 1994 introduced a power analysis technique applied to two commercial microprocessors, the Intel 486DX2 and the Fujitsu SPARClite 934. The technique includes software energy consumption estimation, instruction reordering, and generation of energy-efficient code. Nikolaidis et al. [11] in 2005 proposed instruction-level energy modeling to extract the actual energy components for pipelined processors such as the ARM7 (three-stage pipeline). These components are used to calculate the energy consumption of complete software programs. Junior et al. [12] in 2006 used the Coloured Petri Net (CPN) modeling language to analyze the performance and energy consumption of embedded software. Callou et al. [13] in 2008 presented a formal model and a platform for estimating the energy consumption and execution time of embedded biomedical applications. These papers can be viewed as instruction-level estimation schemes. In this paper, we focus on task-level estimation, aiming to reduce interrupt latency in order to improve performance and save power when a nested interrupt occurs in an embedded system.
As discussed above, embedded systems and software have become more complex. Many tools and methodologies have been developed for designing embedded systems [14]-[17]. In this work, a performance evaluation framework is developed using three system models, namely Generalized Stochastic Petri Nets (GSPNs), Continuous-Time Markov Chains (CTMCs), and Markov Reward Models (MRMs), to evaluate interrupt overhead and power consumption. Both the ARM7 and Cortex-M3 cores are considered as modeling targets, and the framework is used to evaluate their nested interrupt performance. This paper is organized as follows. Section 2 presents the three system models: the GSPN model is introduced first, followed by the CTMC and MRM models. The nested interrupt problem is formally introduced in Sect. 3, together with the models of the ARM7 and Cortex-M3 cores. Section 4 uses the PIPE2 tool and several equations described in Sect. 3 to carry out our experiments on power consumption and interrupt overhead. Concluding remarks are presented in Sect. 5.
Copyright © 2010 The Institute of Electronics, Information and Communication Engineers
System Models
In this section, we introduce three analysis models for performance evaluation. The GSPN model for generating Markov chains is presented first. The CTMC model is presented next; its steady-state probabilities are found using a linear equation and a normalization condition. Finally, rewards are assigned to the states to obtain numerical performance results using the MRM model.
Generalized Stochastic Petri Nets
Petri nets (PNs), introduced by C.A. Petri in 1962, are a graphical formalism for modeling concurrency and synchronization in order to evaluate the behavior of a system [18]. A PN is a bipartite graph denoted by $PN = \{P, T, D^-, D^+, m_0\}$. It consists of two types of nodes, called places, $P$, and transitions, $T$. The arcs that connect places and transitions are divided into two categories: input, $D^+$, and output, $D^-$. The initial marking $m_0$ is a member of the marking set $M$ and gives the number of tokens in each place in the initial state.
GSPNs provide two types of transitions [20]. The first type, timed transitions, has exponentially distributed firing times. The second type, immediate transitions, has priority over timed transitions. Priority levels can also be assigned to immediate transitions when required. The GSPN model also provides inhibitor arcs. An inhibitor arc, drawn with a small circle instead of an arrowhead, disables the transition when the connected place contains a token.
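The enabling rules described above can be illustrated with a minimal Python sketch. The `Transition` class, the helper names, and the three-transition toy net are illustrative assumptions, not part of the models in this paper; the sketch only encodes the rules that an inhibitor arc blocks a transition when its place is marked and that immediate transitions take precedence over timed ones.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    inputs: dict                         # place -> tokens consumed
    outputs: dict                        # place -> tokens produced
    inhibitors: frozenset = frozenset()  # places that must be empty
    immediate: bool = False              # False means exponentially timed
    priority: int = 0                    # used only among immediate transitions


def has_concession(marking, t):
    """Input places hold enough tokens and every inhibitor place is empty."""
    enough = all(marking.get(p, 0) >= n for p, n in t.inputs.items())
    free = all(marking.get(p, 0) == 0 for p in t.inhibitors)
    return enough and free


def enabled(marking, transitions):
    """Immediate transitions pre-empt timed ones; among immediate
    transitions only those with the highest priority stay enabled."""
    candidates = [t for t in transitions if has_concession(marking, t)]
    immediate = [t for t in candidates if t.immediate]
    if immediate:
        top = max(t.priority for t in immediate)
        return [t for t in immediate if t.priority == top]
    return candidates


def fire(marking, t):
    """Return the marking reached by firing t (the input is not modified)."""
    new = dict(marking)
    for p, n in t.inputs.items():
        new[p] = new.get(p, 0) - n
    for p, n in t.outputs.items():
        new[p] = new.get(p, 0) + n
    return new


# Hypothetical toy net: a request arrives, and the processor may only
# go to sleep while no request is pending (inhibitor arc on "request").
net = [
    Transition("arrive", {"source": 1}, {"request": 1}),
    Transition("sleep", {"idle": 1}, {"asleep": 1},
               inhibitors=frozenset({"request"})),
    Transition("start", {"request": 1, "idle": 1}, {"running": 1},
               immediate=True, priority=1),
]
m0 = {"source": 1, "idle": 1}
m1 = fire(m0, net[0])
before = [t.name for t in enabled(m0, net)]
after = [t.name for t in enabled(m1, net)]
```

In the toy net, both timed transitions are enabled in `m0`; once a request token is present, the inhibitor arc blocks `sleep` and the immediate transition `start` is the only one left enabled.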
Using the GSPN model, we can easily construct reachability graphs and the corresponding CTMC states. Two reachability graphs are generated by modeling the nested interrupt in the ARM7 and Cortex-M3 cores, as shown in Figs. 4 and 7, respectively. Once the graph is formed, we can use the elimination of vanishing markings technique to generate the corresponding CTMC. The related methods are described in detail in [20].
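A minimal sketch of the elimination idea is given here in Python, under simplifying assumptions: vanishing markings are resolved by their branching probabilities, and no cycles exist among vanishing markings. The function name, the data layout, and the example rates are hypothetical; the complete method, including the general case, is the one described in [20].

```python
def eliminate_vanishing(rates, switch):
    """Fold vanishing markings into the rates between tangible markings.

    rates  : {tangible marking: {successor marking: rate}}
    switch : {vanishing marking: {successor marking: branching probability}}
    Assumes there is no cycle made only of vanishing markings.
    """
    def absorb(marking, weight, out):
        if marking in switch:
            # Vanishing marking: pass the flow on, split by probability.
            for nxt, prob in switch[marking].items():
                absorb(nxt, weight * prob, out)
        else:
            out[marking] = out.get(marking, 0.0) + weight

    reduced = {}
    for src, targets in rates.items():
        out = {}
        for dst, rate in targets.items():
            absorb(dst, rate, out)
        out.pop(src, None)  # a self-loop does not change the CTMC
        reduced[src] = out
    return reduced


# Hypothetical example: tangible markings "A" and "B", vanishing marking "v".
rates = {"A": {"v": 4.0}, "B": {"A": 1.0}}
switch = {"v": {"A": 0.25, "B": 0.75}}
reduced = eliminate_vanishing(rates, switch)
```

Here the rate 4.0 from `A` into the vanishing marking is split by the branching probabilities, so the reduced chain moves from `A` to `B` at rate 3.0, and the part that returns to `A` is dropped as a self-loop.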
Continuous-Time Markov Chains
Markov chains, introduced by A.A. Markov in 1907, and CTMCs in particular, are used to analyze the performance and reliability of systems. According to the definition presented by Kwiatkowska et al. [21], a labeled CTMC is a tuple $C = (S, \bar{s}, R, L)$ consisting of the following four components.
(1) $S$ is a finite set of states;
(2) $\bar{s} \in S$ is the initial state;
(3) $R: S \times S \to \mathbb{R}_{\geq 0}$ is the transition rate function;
(4) $L: S \to 2^{AP}$ is a labelling function which assigns to each state $s \in S$ the set $L(s)$ of atomic propositions (AP) that are valid in the state.
To analyze a CTMC, the transition rate matrix $R$ is converted into the infinitesimal generator matrix $Q$. The matrix $Q$ is obtained from $R$ by
$$Q(s, s') = \begin{cases} R(s, s') & \text{if } s \neq s', \\ -\sum_{s'' \neq s} R(s, s'') & \text{otherwise.} \end{cases} \tag{1}$$
For transient solutions of CTMCs, we use the matrix $Q$ and the initial probability vector $\pi_0$. The transient state probability vector $\pi(t)$ is obtained by solving the following linear differential equation,
$$\frac{d\pi(t)}{dt} = \pi(t)\, Q, \qquad \pi(0) = \pi_0. \tag{2}$$
For steady-state solutions of CTMCs, the steady-state probability vector $\pi$ is obtained by solving the following linear equation together with the normalization condition,
$$\pi Q = 0, \qquad \pi \mathbf{1} = 1, \tag{3}$$
where the unit vector $\mathbf{1} = [1, 1, \ldots, 1]^T$.
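As an illustration of the conversion from $R$ to $Q$ and of the steady-state solution, the following Python sketch solves a two-state idle/busy chain. It uses only the standard library; the function names and the two rates are assumptions chosen for the example, and the solver is plain Gaussian elimination rather than the method used inside PIPE2.

```python
def generator_from_rates(R):
    """Build Q from R: copy the off-diagonal rates and set each diagonal
    entry to minus the sum of the off-diagonal entries of its row."""
    n = len(R)
    Q = [[float(R[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(R[i][j] for j in range(n) if j != i)
    return Q


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    # Transpose so the unknowns form a column vector, then replace the
    # last balance equation with the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    pi = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * pi[j] for j in range(i + 1, n))
        pi[i] = (b[i] - tail) / A[i][i]
    return pi


# Hypothetical two-state chain: state 0 = idle, state 1 = busy.
arrival_rate = 1e-4   # idle -> busy
service_rate = 1e-3   # busy -> idle
R = [[0.0, arrival_rate],
     [service_rate, 0.0]]
Q = generator_from_rates(R)
pi = steady_state(Q)
```

For this chain the result can be checked by hand: the idle probability is `service_rate / (arrival_rate + service_rate)`.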
Markov Reward Models
In the previous subsection, the steady-state and transient state probabilities were obtained from simple linear equations. For performance evaluation, we are more concerned with measures such as reliability, availability and throughput once the state probabilities have been obtained. Markov reward models (MRMs) provide a unifying framework for obtaining the values of these performance parameters using reward rate assignment [20]. A reward rate is assigned based on the system requirements. Let the reward rate $r_i$ be assigned to the state $i \in S$. The expected instantaneous reward rate, $E[Z(t)]$, can be obtained from the solution of Eq. (2) using
$$E[Z(t)] = \sum_{i \in S} r_i \pi_i(t). \tag{4}$$
For the steady state of CTMCs, based on Eq. (3), the expected reward rate in the limit as $t \to \infty$ can be calculated using
$$E[Z] = \lim_{t \to \infty} E[Z(t)] = \sum_{i \in S} r_i \pi_i. \tag{5}$$
The expected accumulated reward is used to calculate throughput, capacity, and utilization. Before calculating the reward, cumulative probabilities are of interest. We denote by $L_i(t)$ the expected total time the CTMC spends in state $i$ during a given period of time $[0, t)$. The expected total time is calculated using
$$L_i(t) = \int_0^t \pi_i(u)\, du. \tag{6}$$
Hence, the expected accumulated reward, denoted here by $E[Y(t)]$, is given by
$$E[Y(t)] = \sum_{i \in S} r_i L_i(t). \tag{7}$$
For CTMCs with absorbing states, the limit as $t \to \infty$ of the expected accumulated reward exists. We call this reward the expected accumulated reward until absorption. With $N \subset S$ denoting the set of non-absorbing states, it is calculated using
$$E[Y(\infty)] = \sum_{i \in N} r_i L_i(\infty). \tag{8}$$
Nested Interrupt Analysis
In this section, we analyze nested interrupt performance using the GSPN model. We first formally present the nested interrupt problem and then model two microcontrollers, ARM7 and Cortex-M3.
Problem Formulation
Embedded systems are designed to perform dedicated functions. This characteristic distinguishes them from other types of computer systems such as personal computers and supercomputers. Embedded systems are applied in airplanes, automobiles, industry and the military. These application domains require short response times. To meet this requirement, reducing the interrupt latency becomes important, because interacting with I/O devices through the interrupt mechanism produces a shorter response time than using a polling mechanism. Interrupts are signals triggered by certain events while an embedded processor is executing an instruction stream. When an interrupt signal occurs, the processor stops executing the current instruction stream and begins an interrupt service routine (ISR) to handle the interrupt request.
In general, the main interrupt types are software, internal hardware, and external hardware interrupts. Interrupts generated by external hardware use one of two trigger mechanisms: level-triggered or edge-triggered. These trigger mechanisms are modeled using GSPNs, as shown in Fig. 1. When an interrupt signal is raised, the token in P3 moves to P2 because T2 fires. T2 is a timed transition whose exponentially distributed delay represents the interval between two successive signals; that is, the rate of T2 is the interrupt arrival rate. An ISR is executed when the immediate transition T0 fires after P0 and P2 each hold a token. T1 represents the ISR service rate. The difference between the two trigger models lies in what happens when T0 fires: in the edge-triggered mechanism, a token moves to P3 to wait for another interrupt request, whereas in the level-triggered mechanism the token moves back to P2 to wait for the interrupt signal to disappear. The transition T3 represents the interval during which a triggered interrupt signal stays at the triggered level. Figure 1 (a) shows the level-triggered mechanism. After T0 fires, one token is stored in each of the places P1 and P2. In this new (tangible) marking, two transitions, T3 and T1, are enabled, so both can fire. Nevertheless, depending on the rates assigned to T1 and T3, the sequence T0→T1→T0→T1... may fire with very high probability if the transition rate of T1 is higher than that of T2. The level-triggered interrupt mechanism generates a series of continuous interrupts (P1) as long as the interrupt source (P2) stays at the triggered level. This is not a good mechanism if the interrupt source can remain at the triggered level for a long time.
When two interrupt signals occur at the same time, an interrupt priority method must be applied. After one ISR has been executed, another ISR is immediately executed. We call this situation a nested interrupt. Figure 2 illustrates a nested interrupt model using a GSPN. The immediate transitions T0 and T2 are assigned different priority levels, which determine which ISR is executed first when two interrupts occur at the same time or when another interrupt occurs during ISR execution. A lower priority interrupt is not triggered while the embedded processor is handling a higher priority interrupt. When a lower priority interrupt is being serviced and a higher priority interrupt occurs, the higher priority interrupt is executed immediately. We call this mechanism preemptive. In general, lower priority interrupts can be masked, and the highest priority interrupt is called a non-maskable interrupt (NMI). External hardware interrupts belong to the maskable interrupt category. In this paper, we focus only on nested interrupts generated by external hardware signals. Hence, we assume that all ISRs are non-preemptive.
Modeling Nested ISR of ARM7
In this subsection, we focus on the ARM7 nested interrupt mechanism. ARM7 is a popular core, and a large number of products are currently implemented using it. ARM7 is a reduced instruction set computer (RISC) core with excellent efficiency because it employs a simple 3-stage pipeline: fetch, decode, and execute. Two kinds of interrupts are provided: the fast interrupt request (FIQ), which has higher priority, and the normal interrupt request (IRQ). An interrupt controller is implemented to provide a uniform way of enabling, disabling, and examining the status of up to 32 level-sensitive IRQ sources and one FIQ source [22]. Sadasivan [23] described interrupt handling in terms of three subtasks: Push, ISR, and Pop. The Push task is responsible for pushing registers onto the stack, including the program counter, program status register, link register, and several general purpose registers. Once the Push task is completed, the ISR is automatically executed. After the ISR is over, the Pop task pops the pushed registers from the stack. For the ARM7 processor, the Push and Pop tasks require 26 and 16 cycles, respectively.
In this section, we use the three models and the PIPE2 tool to evaluate nested interrupt performance, starting from model generation. PIPE2 [19] is an open source, platform-independent tool for creating and analyzing GSPNs. The tool is implemented entirely in Java and provides an easy-to-use graphical user interface. It allows for the creation, saving, loading and analysis of Petri nets. The framework steps for modeling embedded software are listed below:
(1) The PIPE2 tool is first used to draw a GSPN graph modeling the nested interrupt behavior.
(2) Next, in order to obtain an ergodic CTMC, we check whether the state space generated by the drawn GSPN is bounded and deadlock-free. A Petri net is in deadlock when no transition is enabled, and if a Petri net is unbounded, its reachability graph becomes infinite. An ergodic CTMC has a unique steady-state probability distribution.
First of all, we use the PIPE2 tool to model the execution behavior of two interrupt requests as a GSPN, as shown in Fig. 3. For the first interrupt request, the Push, ISR, and Pop subtasks are represented by P0, P1, and P2, respectively. Those of the second request are denoted by P6, P7, and P8. The nested interrupt model uses 15 places, 8 timed transitions, and 3 immediate transitions. Among the places, P12, P13, and P14 are flags that indicate idle, busy, and end of ISR, respectively. The pairs (P3, T3) and (P9, T8) are two interrupt arrival generators. Hence, T3 and T8 define the interrupt arrival rates of the two ISRs, respectively. T4 and T9 are two interrupt-triggered transitions with two priority levels: T4 has a higher priority than T9. We assume that the edge-triggered mechanism is used. Figure 4 shows the reachability graph generated by the PIPE2 tool. A reachability graph has two types of nodes: tangible markings and vanishing markings. A vanishing marking enables at least one immediate transition. A tangible marking enables only timed transitions, or no transition at all.
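The checks in step (2) and the split into tangible and vanishing markings can be sketched in Python as a breadth-first exploration of the reachability graph. This is an illustrative sketch, not the algorithm implemented in PIPE2: the function name, the dictionary layout, and the four-place toy net are assumptions, priorities among immediate transitions are ignored, and boundedness is only approximated by a per-place token cap.

```python
from collections import deque


def explore(places, transitions, initial, token_cap=16):
    """Breadth-first reachability exploration of a small GSPN."""
    index = {p: i for i, p in enumerate(places)}

    def enabled(marking):
        live = [t for t in transitions
                if all(marking[index[p]] >= n for p, n in t["in"].items())]
        immediate = [t for t in live if t["immediate"]]
        return immediate or live  # immediate transitions take precedence

    start = tuple(initial.get(p, 0) for p in places)
    seen, queue = {start}, deque([start])
    tangible, vanishing, deadlocks = [], [], []
    bounded = True
    while queue:
        marking = queue.popleft()
        live = enabled(marking)
        if live and live[0]["immediate"]:
            vanishing.append(marking)
        else:
            tangible.append(marking)
        if not live:
            deadlocks.append(marking)  # no transition enabled
        for t in live:
            nxt = list(marking)
            for p, n in t["in"].items():
                nxt[index[p]] -= n
            for p, n in t["out"].items():
                nxt[index[p]] += n
            nxt = tuple(nxt)
            if max(nxt) > token_cap:
                bounded = False  # heuristic: give up past the cap
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"tangible": tangible, "vanishing": vanishing,
            "deadlocks": deadlocks, "bounded": bounded}


# Hypothetical single-interrupt, edge-triggered toy net.
places = ["idle", "source", "request", "running"]
transitions = [
    {"name": "arrive", "in": {"source": 1}, "out": {"request": 1},
     "immediate": False},
    {"name": "start", "in": {"request": 1, "idle": 1},
     "out": {"running": 1}, "immediate": True},
    {"name": "serve", "in": {"running": 1},
     "out": {"idle": 1, "source": 1}, "immediate": False},
]
result = explore(places, transitions, {"idle": 1, "source": 1})
```

For the toy net, the exploration finds two tangible markings and one vanishing marking, no deadlock, and no marking above the cap, so the reduced chain is a candidate for steady-state analysis.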
Figure 4 shows that the reachability graph consists of 25 tangible markings and 7 vanishing markings. Each marking lists the token counts of the places in the order {P0, P6, P1, P7, P12, P13, P14, P2, P8, P3, P9, P5, P11, P4, P10}. For example, only P12, P3, P9, P5, and P11 contain a token in S0, as shown in Fig. 4. S0 is a tangible marking because T3 and T8 are timed transitions and the two immediate transitions, T4 and T9, are not enabled; the state changes into S1 and S2 when T8 and T3 fire, respectively. S1 and S2 are vanishing markings because each of them enables an immediate transition. Table 1 lists the 25 marking states presented in Fig. 4. These states are generated by the GSPN analysis function built into the PIPE2 tool. A marking state $M_i$ corresponds to state $i$ of the CTMC, whose probability is $\pi_i$. We use the places in the table as binary rewards. For example, P12 indicates that the system is idle if the place holds a token; otherwise the system is busy. P12 holds no token in any state other than state 0, so state 0 can be viewed as the idle state and states 1 to 24 as busy states. In other words, the probability that the system is idle is equal to the probability of state 0. Hence, the probabilities that the system is idle and busy are given by
$$P_{system\ idle} = \pi_0 \tag{9}$$
and
$$P_{system\ busy} = 1 - P_{system\ idle} = 1 - \pi_0, \tag{10}$$
respectively. Next, we assign a power consumption reward to each state. This reward is not binary. The sleep power reward $P_{sleep}$ is assigned to a state if it belongs to the set of system idle states; otherwise the working power reward $P_{work}$ is assigned to it. $P_{sleep}$ and $P_{work}$ are rewards in the sense of Sect. 2.3, and they are mutually exclusive: each state receives exactly one of them. The assignment of both rewards is presented in Sect. 4.1. Hence, we can define the total power consumption $P_{total}$ by
$$P_{total} = P_{sleep} \cdot P_{system\ idle} + P_{work} \cdot P_{system\ busy}. \tag{11}$$
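The reward assignment can be illustrated with a short Python sketch that treats the total power as the steady-state expected reward of Sect. 2.3, with the sleep reward on idle states and the working reward on all others. The function names, the three-state probability vector, and the power values in milliwatts are placeholders chosen for the example, not values from Table 1 or Table 4.

```python
def expected_reward(rewards, pi):
    """Steady-state expected reward: sum of r_i * pi_i over all states."""
    if len(rewards) != len(pi):
        raise ValueError("rewards and pi must have the same length")
    return sum(r * p for r, p in zip(rewards, pi))


def total_power(pi, idle_states, p_sleep, p_work):
    """Assign p_sleep to idle states and p_work to every other state."""
    rewards = [p_sleep if i in idle_states else p_work
               for i in range(len(pi))]
    return expected_reward(rewards, pi)


# Placeholder numbers: state 0 is idle, states 1 and 2 are busy.
pi = [0.9, 0.06, 0.04]
p_sleep_mw = 10.0
p_work_mw = 100.0
power_mw = total_power(pi, {0}, p_sleep_mw, p_work_mw)
```

Because only state 0 is idle, the result equals `p_sleep_mw * pi[0] + p_work_mw * (1 - pi[0])`, which is 19 mW for these placeholder values.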
Next, we discuss the ISR execution time. Each triggered ISR consists of three parts: push registers, ISR program, and pop registers. The three parts are represented by P0, P1, P2 for ISR1 and P6, P7, P8 for ISR2, respectively. From Table 1, the set of markings {M2, M6, M7, M8, M13, M14, M15, M16, M20, M21, M22, M24} has 12 states that can be divided into three parts. The execution time of the ISR during $[0, t)$ is given by
where $P_{Push1} = \pi_2 + \pi_6 + \pi_7 + \pi_{13}$,
The execution time of another ISR can be represented by
where $P_{ISR2} = \pi_5 + \pi_{10} + \pi_{11} + \pi_{17}$,
Although the ISR execution time is very important for meeting application constraints, we are more concerned with the ISR execution overhead. The overhead consists of pushing the registers onto the stack and popping them from the stack. We define the ISR overhead as the ratio between the push/pop register execution time and the total ISR execution time. Therefore, the ISR overhead can be represented by
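The ratio defined above can be sketched in Python as follows. The function name and argument names are illustrative assumptions; the three arguments may be state probabilities or cycle counts, since only their proportions matter. The example uses the ARM7 push and pop costs of 26 and 16 cycles quoted earlier, together with a hypothetical ISR body of 958 cycles.

```python
def isr_overhead(push, isr, pop):
    """Share of the total ISR time spent pushing and popping registers."""
    total = push + isr + pop
    if total <= 0:
        raise ValueError("total ISR time must be positive")
    return (push + pop) / total


# ARM7 push/pop cycle counts from the text; the ISR body length of
# 958 cycles is a hypothetical value chosen for illustration.
arm7_overhead = isr_overhead(push=26, isr=958, pop=16)
```

With these inputs the overhead is 42 cycles out of 1000, or 4.2%.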
Modeling Nested ISR of Cortex-M3
In the previous section, we modeled ARM7, a very popular core with low cost and high performance. In this section, we present another embedded core, Cortex-M3, with lower cost and higher performance than the ARM7 core. The Cortex-M3 core owes its performance to several crucial technologies, including the Harvard architecture, Thumb-2, branch speculation in the pipeline, sleep modes integrated into the core, and tail-chaining for interrupt support. In this paper, we focus on the interrupt issue. The new interrupt architecture builds on the ARM7 architecture and is designed to support nested interrupts with reduced interrupt latency. The Cortex-M3 processor implements the nested vectored interrupt controller (NVIC), based on the interrupt controller in ARM7. The controller supports an NMI and 32 general purpose physical interrupts with 8 levels of preemption priority. The Cortex-M3 processor simplifies moves between active and pending interrupts by implementing tail-chaining technology in the NVIC hardware. Tail-chaining achieves much lower latency than ARM7 by replacing the serial stack pop and push actions, which normally take 32 clock cycles, with a 6-cycle sequence.
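The saving from tail-chaining can be made concrete with a small Python sketch that counts only the cycles spent between consecutive back-to-back ISRs, using the 32-cycle and 6-cycle figures quoted above. The function name and the choice of three chained ISRs are assumptions made for the example; the first push and the final pop are left out of the count.

```python
def handover_cycles(n_isrs, serial_pop_push=32, tail_chain=6):
    """Cycles spent between consecutive back-to-back ISRs.

    Returns (cycles without tail-chaining, cycles with tail-chaining).
    """
    if n_isrs < 1:
        raise ValueError("at least one ISR is required")
    handovers = n_isrs - 1  # one handover between each pair of ISRs
    return handovers * serial_pop_push, handovers * tail_chain


# Hypothetical burst of three back-to-back interrupts.
without_tc, with_tc = handover_cycles(3)
saved = without_tc - with_tc
```

For three chained ISRs there are two handovers, so the count drops from 64 to 12 cycles, a saving of 52 cycles.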
The tail-chaining mechanism reduces the interrupt latency when a nested interrupt occurs. Without it, the cost is 32 cycles, consisting of the pop action of the current ISR and the push action of the next ISR. The tail-chaining mechanism requires only 6 cycles for the combined pop and push sequence, as shown in Fig. 5. We use a GSPN to model the nested interrupt behavior of Cortex-M3. The model has three important steps, listed below:
(1) A flag, represented by P6, indicates that another interrupt occurred while the current ISR is about to enter the pop action.
(2) The flag determines whether the 6-cycle action (P3, T4) or the 12-cycle action (P4, T5) is performed. This If-Then-Else is implemented by connecting the flag P6 to two immediate transitions: to T2 with an ordinary (arrowed) arc and to T3 with an inhibitor arc.
(3) A signal is generated by T4 when the tail-chaining of place P3 is performed: a new token is created and placed in P9 when T4 fires. P9 is another flag, used to determine whether the push action is performed. Notice that the priority of T10 is higher than that of T0.
Following the tail-chaining design principle, a GSPN is used to model the nested ISR for Cortex-M3. We found that this model is more complex than that of ARM7. Figure 6 shows the model, which consists of 29 places, 10 timed transitions, and 13 immediate transitions. The reachability graph of the nested ISR for Cortex-M3 is illustrated in Fig. 7. The reachability graph generates 35 marking states, as listed in Table 2.
Table 2 shows that M0 is the idle state and M1 to M34 are all busy states. The power consumption conditions of Cortex-M3 are the same as those of ARM7. Hence, Eqs. (9)-(11) can be used to evaluate the Cortex-M3 power consumption. For the interrupt evaluation, Eqs. (12), (16), and (20) are used to analyze the interrupt service time and interrupt overhead, but the probabilities for ISR, Pop, and Push in Cortex-M3 differ from those in ARM7. These probabilities are calculated using
$$P_{ISR2} = \pi_5 + \pi_{10} + \pi_{11} + \pi_{17}. \tag{26}$$
Experimental Results
In this section, we present the power consumption and interrupt overhead results based on Eqs. (9) to (26), analyzing the nested interrupt performance of ARM7 and Cortex-M3 using the GSPN analysis results provided by the PIPE2 tool. The default transition rates of the timed transitions drawn in Figs. 3 and 6 are listed in Table 3. Each interrupt task consists of three timed transitions (Push, ISR, and Pop), with an extra timed transition (Pop 6 cycles) added for Cortex-M3 to perform tail-chaining.
Power Consumption
In this section, we evaluate two kinds of microcontrollers with 32-bit RISC cores to compare their power consumption. We select the STR750 family for ARM7 and the STM32F101 family for Cortex-M3 as our evaluation targets. The STR750 family features high performance, very low power, and very dense code, with a comprehensive set of peripherals and embedded Flash technology. The STM32F101 family incorporates the high-performance Cortex-M3 32-bit RISC core operating at a 36 MHz frequency, high-speed embedded memories, and an extensive range of enhanced I/Os and peripherals connected to two Advanced Peripheral Bus (APB) buses. Table 4 lists the supply current at different internal Advanced High-performance Bus (AHB) clock frequencies $f_{HCLK}$ for ARM7 and Cortex-M3. We are especially concerned with the power consumption in the run and sleep modes. Notice that only Cortex-M3 has an integrated sleep mode; the STR750 family implements a low power mode called the Wait For Interrupt (WFI) mode, which is similar to the sleep mode in Cortex-M3.
The "reward" parameter plays a key role in comparing the power consumption of ARM7 and Cortex-M3. We use Eq. (11) to calculate the power consumption. $P_{sleep}$ and $P_{work}$ in Eq. (11) represent the rewards associated with the idle and busy states, respectively. According to Table 4, we know the reward $P_{sleep}$ to be 3.3 V × 65 mA when the ARM7 processor executes at run mode and 48 MHz. The assignment of $P_{work}$ is similar to that of $P_{sleep}$. Figure 8 compares the power consumption of the ARM7 and Cortex-M3 families when the interrupt arrival rate is varied from $1 \times 10^{-4}$ to $9 \times 10^{-4}$ and the other parameters are set according to Table 4. Each family has four frequency modes listed in Table 4 [24], [25]. The numerical results show that Cortex-M3 performs better than ARM7. As the interrupt arrival rate increases, the power consumption of ARM7 grows faster than that of Cortex-M3. For ARM7, the power consumption at $f_{HCLK}$ = 16 MHz is smaller than that of Cortex-M3 at $f_{HCLK}$ = 24 MHz. When the interrupt arrival rate is smaller than or equal to $2 \times 10^{-4}$, its power consumption is larger than that of Cortex-M3. The power consumption of ARM7 running at $f_{HCLK}$ = 8 MHz is close to that of Cortex-M3 running at $f_{HCLK}$ = 16 MHz when the interrupt arrival rate increases to $9 \times 10^{-4}$. However, Cortex-M3 working at 8 MHz has better power consumption than the other controllers, requiring only about 37.8−47 mW. ARM7 working at $f_{HCLK}$ = 48 MHz requires more than four times the power of Cortex-M3 working at $f_{HCLK}$ = 8 MHz.
Table 4 Comparison of ARM7 and Cortex-M3 in run/sleep modes.
After observing how the power consumption changes with the interrupt arrival rate, we conducted another experiment by changing the interrupt service rate. The ISR execution time becomes shorter as the interrupt service rate increases. Figure 9 shows the power consumption obtained by varying the interrupt service rate from $1 \times 10^{-3}$ to $9 \times 10^{-3}$ while the interrupt arrival rate is set to $1 \times 10^{-4}$. In the figure, ARM7 running at $f_{HCLK}$ = 48 MHz consumes at least twice the power of the other chips. ARM7 working at $f_{HCLK}$ = 32 MHz is close to Cortex-M3 running at $f_{HCLK}$ = 36 MHz. These results lead to the conclusion that Cortex-M3 has lower power consumption than ARM7 at the same running frequency.
Interrupt Overhead
Executing an ISR requires two extra tasks. The first is saving the current working registers onto the stack before the controller can execute the ISR. The second is restoring (popping) the working registers from the stack after the ISR is over. The interrupt overhead is defined as the ratio between the execution time of these extra tasks and the total ISR execution time including the extra tasks. The smaller the overhead, the better the performance. Figure 10 shows that Cortex-M3 has a lower interrupt overhead than ARM7. ARM7 maintains its overhead at 4%, but Cortex-M3 reduces its overhead as the interrupt arrival rate increases. This is because Cortex-M3 supports the tail-chaining mechanism, which reduces the interrupt latency when at least two interrupts occur at the same time or another interrupt occurs during ISR execution.
Next, we vary the interrupt service rate instead of the interrupt arrival rate. Figure 11 shows that the interrupt overhead increases for both ARM7 and Cortex-M3 when the interrupt service rate increases from $1 \times 10^{-3}$ to $9 \times 10^{-3}$. The Cortex-M3 overhead increases more slowly than that of ARM7 because the probability of a nested interrupt, which enables tail-chaining, grows as the interrupt service rate increases.
Fig. 10 Interrupt overhead vs. interrupt arrival rate.
Fig. 11 Interrupt overhead vs. interrupt service rate.
Conclusions
Many products have been implemented using embedded systems. With 32-bit embedded cores now widely available, combining high performance with low power consumption has become a key requirement in information product designs. Modeling embedded systems and evaluating their performance are therefore important. We presented a performance evaluation framework in which three models work in concert. The GSPN assists us in constructing the system behavior. We used the PIPE2 tool for GSPNs to construct the nested interrupt behavior of two embedded cores with high performance and low power consumption. After generating the Petri nets, the CTMC model was used to calculate all state probabilities. Once the probability of each state in the embedded system is calculated, a reward assignment using the MRM model assists us in evaluating performance.
ARM7 and Cortex-M3 both combine high performance with power-saving characteristics. The experimental results show that Cortex-M3 performs better than ARM7 because of its tail-chaining implementation for nested interrupts. Tail-chaining is useful in reducing interrupt latency.
Our future work includes the following issues.
1. Continuing to develop efficient software for embedded systems.
2. Designing the proposed performance analysis models.
3. Implementing modeling tools.
4. Designing algorithms to improve the performance of embedded systems.
