1.0 INTRODUCTION
The emergence of healthcare services in acute care can provide patients with additional benefits, for example in intensive care and medical imaging services [1] [2] [3]. The electrocardiograph (ECG) monitoring device plays a vital role in measuring and diagnosing heart conditions and detecting irregularities. Furthermore, the growth of very large scale integrated (VLSI) circuits in medical diagnostic tools for digital signal processing applications can provide low-cost ECG with real-time processing, suitable for monitoring and alert systems. Such inventions benefit patients: greater mobility lets them keep their regular habits, wireless data transmission reduces the stress of being connected to a medical system, analysis is immediate, cost is lower, and monitoring is easier.
Specifically, digital signal processing applications in electrocardiogram (ECG) systems, with particular emphasis on embedded systems and system-on-chip (SoC), have emerged. Digital bio-signal processing systems for ECG detection, classification [4] [5] [6] [7] [8] and data compression [9, 10] have been developed on FPGA platforms.
A heart abnormality such as atrial fibrillation (AF) is a problem of the heart's electrical system. AF disturbs the normal rhythm between the atria and the ventricles. The atrial rate can increase to 600 beats per minute, with ventricular rates above 100 beats per minute. The atria can no longer pump blood effectively to the ventricles. AF is associated with heart attack, high blood pressure, coronary heart disease, and heart valve disease, and sometimes produces no symptoms. When the atrial muscle fibrillates irregularly, as in AF, ischemic stroke can occur [11]. Women are at higher risk of AF, which calls for more genomic, clinical and biomarker profiling [12] across society rather than a focus on certain populations only [13, 14].
A previous study has shown the viability of a second-order system for classifying ECG signals as normal sinus rhythm of a healthy human or atrial fibrillation [15, 16]. This study therefore illustrates a high-level design method for developing an AF digital signal processing application-specific module in hardware. The hardware designed in this study serves a specific purpose: it translates the natural frequency, ω, algorithm of the ECG signal, derived from the second-order system, into a hardware description language for FPGA prototyping. The steps taken from the algorithmic level to construct the register-transfer level (RTL) model of the proposed design are discussed. High-level synthesis is the mapping of a behavioral description of a digital system into an RTL design, that is, the transformation of an algorithm into an RTL design [17].
2.0 METHODOLOGY
Several techniques are needed to map the algorithmic description, in the form of a data flow graph (DFG), to an RTL design suitable for FPGA prototyping of an AF detection system. The hardware design was implemented in three phases: parsing, transformation and RTL synthesis [18] [19] [20]. In the parsing phase, the algorithm is factored or redundant operations are removed so that fewer resources are needed. In the transformation phase, the DFG corresponding to the algorithm is allocated and scheduled to minimize resources and overall computation time. Finally, in the RTL synthesis phase, the RTL is coded and designed for the specific algorithm. This study used the natural frequency algorithm of the ECG signal, which was derived from a second-order system. The algorithm was chosen based on a previous study that characterized and classified atrial fibrillation and normal sinus rhythm [15, 16].
Parsing Phase
The derivation of the algorithm in Equation (1) was discussed in previous studies [15, 21], where ω is the natural frequency of the ECG signal and the primed terms (′, ′′, ′′′ and ′′′′) are the first to fourth derivatives of the signal with respect to time, t. As an early step, Equation (1) is expressed with variables as Equation (3), where the first to fourth derivatives are denoted by m, n, p, and q, respectively. These derivatives are expressed by Equations (4) to (7), respectively. Parameters a, b, c, d, and e represent the current signal value and the values delayed by 1, 2, 3, and 4 time steps, respectively [21]. This ensures that the mapping of the algorithm to the hardware design is correct and allows later use with a continuous signal or with a pipelining technique in the hardware design. Referring to Equation (2)
Transformation Phase
The algorithms of Equations (3) to (7) were transformed into a data flow graph (DFG). Three architectures were considered in the transformation phase: Design 1, Design 2, and Design 3. Design 1 is a single-cycle design that implements Equations (3) to (7) directly. Design 2 is a single-cycle design obtained from Equations (3) to (7) by replacing t with 4. Design 3 is a multi-cycle, fully constrained design that uses only 1 multiplier/divider and 1 adder/subtractor per cycle.
A total of seven resources was needed to map the algorithm of Equation (3) in the single-cycle design, while only two resources (1 multiplier/divider and 1 adder/subtractor) were allowed in the multi-cycle, fully constrained design. Additional resources were needed to construct the RTL design of the natural frequency module, namely those for the parameters given by Equations (4) to (7). The designs were therefore instantiated as 5 submodules corresponding to the algorithms of Equations (3) to (7). All three designs comprised 5 submodules, namely the m-, n-, p-, q-, and w-modules.
The top-level module was named NaturalFrequencyModule. Figure 1 shows the DFG of the w-module for (a) the single-cycle design and (b) the multi-cycle design with optimal resource utilization. Figure 1(c) shows the top-level NaturalFrequencyModule. Its five submodules correspond to the algorithms of Equations (3) to (7). The m-, n-, p-, and q-modules run concurrently with each other and in sequence with the w-module.
RTL Synthesis Phase
Each algorithm was included in the top module as shown in Figure 1(c). The three designs were coded in SystemVerilog HDL and synthesized. The platform was Quartus II v13.1. An example of the RTL code is shown in Table 1.
Table 1 RTL code for the q-module of the three designs

Design 1
    S1 : q ← ( 6*c - 4*b + a + (e - 4*d) ) / (t*t*t*t) ;
Design 2
    S1 : q ← ( 6*c - 4*b + a + (e - 4*d) ) / 256 ;
Design 3
    S1 : R1 ← R1 - 4*R2 ;
    S2 : R1 ← 6*R3 + R1 ;
    S3 : R1 ← R1 - 4*R3 ;
    S4 : R1 ← (R1 + R5) / 256 ;

Table 1 shows the RTL code for the q-module of the three designs under study. The q-module was chosen as the example because it takes the maximum number of inputs (a, b, c, d and e) before its output, q, can be generated; it needs more inputs than the m-, n-, and p-modules to produce a correct output. The q-module involves four states in Design 3. The w-module consists of five states in Design 3, as previously shown in Figure 2(b). Since the m-, n-, p-, and q-modules are arranged concurrently in the top-level design, the states required by those modules were synchronized. The outputs of the m-, n-, p-, and q-modules are therefore synchronized inputs to the w-module.
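The relation between the single-cycle and multi-cycle schedules of the q-module can be checked with a small software model. The Python sketch below is illustrative only and is not the SystemVerilog used in this study. It assumes integer inputs, assumes that registers R1 to R5 are loaded with a, b, c, d and e before state S1, and assumes that state S3 subtracts 4 times the register holding d (here R4), so that the four states reproduce the single-cycle expression of Design 1 and Design 2. The function names are hypothetical.

```python
from fractions import Fraction


def q_single_cycle(a, b, c, d, e, t=None):
    """Design 1 (t given) or Design 2 (t omitted, divisor fixed at 256)."""
    divisor = t * t * t * t if t is not None else 256
    return Fraction(6 * c - 4 * b + a + (e - 4 * d), divisor)


def q_multi_cycle(a, b, c, d, e):
    """Design 3 style schedule: at most one mul/div and one add/sub per state."""
    # Assumption: R1..R5 hold a..e before state S1.
    r1, r2, r3, r4, r5 = a, b, c, d, e
    r1 = r1 - 4 * r2               # S1
    r1 = 6 * r3 + r1               # S2
    r1 = r1 - 4 * r4               # S3 (assumed to use the register holding d)
    r1 = Fraction(r1 + r5, 256)    # S4
    return r1


if __name__ == "__main__":
    sample = (10, 2, 3, 4, 5)      # arbitrary integers, not data from this study
    print(q_single_cycle(*sample, t=4), q_multi_cycle(*sample))
```

Exact fractions are used so that the comparison between the two schedules is not affected by floating-point rounding.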
3.0 RESULTS AND DISCUSSION
The algorithms of Equations (3) to (7) were defined as the submodules w-module, m-module, n-module, p-module and q-module, respectively, as in Figure 1(c). Each submodule was synthesized separately in Quartus II. As Figure 1(c) shows, the outputs of the m-, n-, p-, and q-modules were connected to the inputs of the w-module, while the inputs of the top-level module, NaturalFrequencyModule, were signals a, b, c, d and e, and its output was w.
Initially, the inputs were stored in registers during the first state. The mapping of the algorithm to the hardware design was realized first as a single-cycle design with one state. The difference between the two single-cycle designs (Design 1 and Design 2) is that Design 2 uses fixed values for the powers of t, whereas Design 1 uses multiplier(s) to compute them. In Design 2, the powers of t in Equations (4) to (7) were replaced by the integers 4, 16, 64, and 256, respectively. Design 1 used Equations (4) to (7) directly, as written.
Theoretically, t is time; our previous study showed that the optimum value of t for characterizing AF from the ECG signal was 4 seconds [15, 16]. Only one state was designed for both single-cycle designs (Design 1 and Design 2). For Design 3, rescheduling involved three, four and five states for the p-, q- and w-modules, and one and two states for the m- and n-modules, respectively. Since the m-, n-, p-, and q-modules were concurrent with each other, their outputs were synchronized. The number of states of each module is summarized in Table 2.
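The fixed divisors of Design 2 follow from the value t = 4 used in this study. The short Python sketch below is a hypothetical check, not part of the design flow, that the constants 4, 16, 64 and 256 are the first to fourth powers of t. The pairing of constants with module names follows the order of Equations (4) to (7).

```python
T = 4  # value of t used in this study

# Constants substituted for the powers of t in Equations (4) to (7).
DESIGN2_DIVISORS = {"m": 4, "n": 16, "p": 64, "q": 256}

# Design 1 computes these powers with multipliers; Design 2 hard-codes them.
computed = {name: T ** order
            for order, name in enumerate(("m", "n", "p", "q"), start=1)}

if __name__ == "__main__":
    print(computed == DESIGN2_DIVISORS)
```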
RTL Synthesis
After the RTL code of each module was synthesized in Quartus II, the output waveforms were observed. Every design input and output was monitored in the timing waveforms of Figure 3. Table 3 shows the truth table for the top-level design, NaturalFrequencyModule. All calculations are based on the algorithms of Equations (3) to (7), with inputs labeled a, b, c, d and e and outputs labeled w, m, n, p and q. The port connections between the submodules and the top-level module, shown previously in Figure 1(c), were synthesized into RTL code as shown in Figure 2. The output, w, of NaturalFrequencyModule is 0.227362 after the first clock cycle, and the outputs of the m-, n-, p-, and q-modules are labeled moutt, noutt, poutt and qoutt in Figure 3. The values obtained from the timing waveforms of Figure 3(a) and Figure 3(b) for the m-, n-, p-, and q-modules are 4.75, 0.5625, -1.5, and 0.992188, respectively. Submodules m, n, p, and q were triggered on the positive clock edge, while the w-submodule was triggered on the negative clock edge. The w-module was clocked differently to show that a differently triggered clock delays the output by half a clock cycle. Because Design 1 and Design 2 are single-cycle designs, every set of inputs needs 1 cycle to produce its output. The second batch of inputs therefore produces its output after the second clock cycle, which was 1.12273 as shown in Figure 3(a) and Figure 3(b). These outputs agree with the truth table of Table 3. Both Design 1 and Design 2 were therefore verified, since the truth table and the output waveforms gave the same result.
The inputs of the algorithm of Equation (3) are m, n, p and q and its output is w; in Figure 3(c) these are labeled moutt, noutt, poutt, qoutt, and woutt, respectively. In Design 3, five clock cycles were needed to complete the concurrent m-, n-, p-, and q-modules. These modules must generate their outputs and pass them to the w-module. Five more clock cycles were then needed to complete the algorithm of Equation (3) (refer to Table 2). The inputs to the top-level module were applied every 20 ps, and the w-module is invoked when the control input (labeled as the done signal) is low ('0'). Refer to Figure 3. The truth table of Table 3 and the waveform outputs of Figure 3 agree when every number is rounded to two decimal places. The outputs were thus valid and verified, and the RTL code of Design 1, Design 2, and Design 3 was verified.
Based on Figure 3, the performance of each design is summarized in Table 4. The Quartus II default clock speed was used during the design, giving the same fmax of 50 GHz for all designs. For one complete operation of the natural frequency algorithm, Equation (3), Design 1, Design 2, and Design 3 needed 20 ps, 20 ps, and 220 ps, respectively. According to the DFG, Design 2 had 5 fewer mul/div than Design 1, both designs contained 13 add/sub, and Design 3 needed only 1 mul/div and 1 add/sub. The Quartus II compilation results show a total logic utilization of 2530, 36, and 1 for Design 1, Design 2, and Design 3, respectively. Design 1 used 12 DSP blocks, while Design 2 and Design 3 used none. Design 3 therefore needed the fewest resources of the proposed designs, although its execution time for a complete operation was the longest. Further study is needed to optimize the performance of Design 3, because the current study emphasized only the mapping of the algorithm into an RTL design using the high-level synthesis approach, in order to address multi-cycle input complexity and resource scheduling [22].
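The timing figures above can be related to each other by simple arithmetic. The Python sketch below is a hedged illustration that assumes a clock period of 20 ps, inferred from the 20 ps input interval and the reported fmax of 50 GHz. The helper names are hypothetical.

```python
CLOCK_PERIOD_PS = 20  # assumed clock period, inferred from the reported figures


def fmax_ghz(period_ps):
    """Clock frequency in GHz for a period given in picoseconds."""
    return 1000.0 / period_ps


def cycles(total_ps, period_ps=CLOCK_PERIOD_PS):
    """Number of whole clock periods contained in an execution time."""
    return total_ps // period_ps


if __name__ == "__main__":
    print(fmax_ghz(CLOCK_PERIOD_PS))      # clock frequency in GHz
    for name, total in (("Design 1", 20), ("Design 2", 20), ("Design 3", 220)):
        print(name, cycles(total))        # clock periods per complete operation
```

Under this assumption, the single-cycle designs take one clock period per operation and Design 3 takes eleven.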
4.0 CONCLUSION AND FUTURE WORKS
It has been shown that the natural frequency, ω, algorithm of the ECG signal, obtained from a second-order system, can be successfully implemented in hardware using the high-level (HL) synthesis approach. Single-cycle and multi-cycle, fully constrained architectures were compared. Design 3 had the lowest logic utilization but needed the longest time for one complete execution. Design 2 ranked second in logic utilization, took less time than Design 3, and had the same execution time as Design 1. Design 2 is therefore chosen as the better design among those proposed in this study.
Optimization of resource utilization, such as using shift registers for multiplication and division, is intended to be explored further. The input variables of this study will also be extended to a single input that changes over time, to realize real-time ECG signal processing for detecting AF in humans. In addition, multi-cycle operations using pipelined functions can enhance the performance of a system design based on the high-level synthesis approach.
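As an illustration of the shift-based arithmetic mentioned above, the Python sketch below shows that, for integers, division by 256 and multiplication by 4 can be replaced by shifts of 8 and 2 bits. It is a software analogy under stated assumptions, not the planned hardware: Python's right shift rounds toward negative infinity, and a hardware implementation would need to define its own rounding and fixed-point format. The function names are hypothetical.

```python
def div256_shift(x: int) -> int:
    """Divide an integer by 256 (2**8) with an arithmetic right shift."""
    return x >> 8


def mul4_shift(x: int) -> int:
    """Multiply an integer by 4 (2**2) with a left shift."""
    return x << 2


if __name__ == "__main__":
    for value in (1024, 255, -512):
        print(value, div256_shift(value), mul4_shift(value))
```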
