Abstract-Real-time networks, such as industrial networks and field buses, have become a vital component for developing large-scale, cooperative embedded systems. As one important branch, wireless real-time networks have also attracted increasing attention in recent years because of their convenience in constructing flexible control systems. Ethernet Powerlink is a typical real-time industrial network protocol that provides a master-slave, time-slot mechanism able to avoid radio collisions. In this paper, based on a self-designed FPGA-based hardware node, a new method to extend Powerlink with wireless capability is explored and described. Concretely, the Powerlink architecture and especially its original mechanisms are analyzed first. To connect the OpenMAC module of Powerlink with the RF module effectively, an interface logic at the MAC layer is introduced. Given the limited resources on the FPGA, it is designed in a multiplexing mode with a dual-FIFO logic, and its key designs and mechanisms are detailed. Finally, all designed mechanisms and logics are implemented in VHDL as an extension of the Powerlink IP core, and the communication functions and performance of the extended protocol are verified.
I. INTRODUCTION
Networking and intelligence have become two remarkable emergent characteristics of novel industrial embedded systems built on innovative information technologies. Among them, the real-time network is a key infrastructure because it can flexibly connect the electronic devices inside equipment and support flexible production lines. With these obvious advantages, such networks and the corresponding application technologies have attracted increasing engineering and research interest.
Corresponding author: Kailong ZHANG (e-mail: kl.zhang@nwpu.edu.cn).
In recent decades, several different industrial networks have been developed, among which Ethernet-based real-time networks form one main branch. For instance, EtherCAT, Profinet IRT and Ethernet Powerlink (EPL) are three typical ones [1] [2] [3]. Many studies have also been carried out to improve the performance of these networks. Yoon et al. [4] designed a new topology-adaptation network management protocol, which can provide a recoverable control network. After analyzing the schemes and performance of EtherCAT, Rostan et al. [5] studied an EtherCAT-enabled control architecture with extraordinary real-time performance and flexible topology. On the basis of the Dynamic Frame Packing (DFP) algorithm in Profinet IRT, Schlesinger et al. [6] proposed an automatic packing mechanism with subframes, the positions of which are recognizable for each device. This design makes scheduling dispensable and achieves better performance than DFP, especially in star and tree topologies. Schlesinger et al. [7] compared the performance of three real-time Ethernets and concluded that Profinet IRT has the advantage with asymmetric throughput data payloads, while EtherCAT and VABS are better for very small data payloads. Limal et al. [8] employed a model-checking approach, concretely timed finite state automata, to validate the medium redundancy management part of the Ethernet Powerlink high availability extension in the context of special power-critical applications. These studies mainly focus on the optimization of protocols and applications. On the other hand, some researchers have also attempted to migrate real-time networks to a wireless mode because of its convenience and flexibility in constructing local control networks. Typically, Kjellsson et al. [10] discussed how the WISA (Wireless Interface for Sensors and Actuators) concept can be efficiently integrated into wired field networks, and proposed amendments to WISA to improve the 802.11b/g coexistence and harmonize the integration of WISA. Seno et al. [11] analyzed the possibility of realizing a wireless version of the Ethernet Powerlink protocol based on 802.11.
These works show that a wireless mode is feasible for some real-time networks, because radio collisions can be avoided with time-division multiple access (TDMA) or other contention-free communication mechanisms.
As a time-slot-based protocol, Powerlink can therefore feasibly be migrated to a wireless mode [9]. After analyzing the architecture and characteristics of Powerlink, in this paper we design a new interface logic at the MAC layer to extend a Powerlink node with a wireless interface. Primarily, the FIFO-based multiplexing mechanisms at the physical and data link layers are explained in detail. Finally, some key implementation methods are also presented.
II. ARCHITECTURE OF WIRELESS ETHERNET POWERLINK (WEPL)
The time-slot communication mechanism of Powerlink is the foundation for supporting a wireless mode. In this section, the related features of this protocol and the self-designed FPGA-based hardware are explained first.
A. Ethernet Powerlink Architecture
Ethernet Powerlink (EPL), regarded as a combination of Ethernet and CANopen, was originally defined by a specification that was subsequently included in the IEC 61784 International Standard [9]. Its architecture, shown in Figure 1, is consistent with the OSI model. EPL is completely compatible with legacy Ethernet, since it is defined as a data link layer protocol placed on top of the Ethernet Medium Access Control (MAC) layer. This means EPL frames are encapsulated in, and transmitted as, Ethernet protocol data units. Among the different physical layers encompassed by the original Ethernet specification, the EPL standard refers explicitly to 100BASE-X with half-duplex transmission. At the higher layers of the communication stack, it includes an application layer protocol based on CANopen profiles. In particular, the Process Data Object (PDO) and Service Data Object (SDO) of the object dictionary are kept to make EPL open and flexible [11]. In EPL networks, all stations are connected via either hubs or switches, and one managing node (MN) is responsible for polling a set of controlled nodes (CNs). The operation of EPL is based on a cycle of predefined, fixed duration that is continuously repeated on the network and handled by the MN. As shown in Figure 2, the MN broadcasts a Start-of-Cycle (SoC) frame at the beginning of each cycle to inform all CNs that a new cycle has started. After this initialization, an isochronous period begins, in which the MN polls all CNs in round-robin order. Concretely, the MN issues a Poll-Request (PReq) frame carrying command data to each CN, and on receiving a PReq the addressed CN responds with a Poll-Response (PRes) frame carrying its data. Typically, the communication procedure is organized as the "Cycle i-Classic" shown in Figure 2.
During this period, if the MN does not receive a correct PRes from a CN within a predefined interval (the EPL time-out), it marks that query as failed and moves on to the next CN. If a CN does not respond for a predefined number of consecutive cycles, it is removed from the isochronous cycle. At the end of each isochronous period, the MN broadcasts a Start-of-Asynchronous (SoA) frame to inform the CNs that an asynchronous period has started, in which a station that made a request during one of the previous isochronous periods may be granted permission to transmit an asynchronous message. Additionally, in the optimized EPL procedure, the Poll-Request-Chaining (PRC) technique is adopted, based on a synchronized distributed timer on each EPL node, as in the "Cycle i-PRC" shown in Figure 2. Most of the repeated hand-shaking between PReq and PRes is thus eliminated, and the transmission efficiency of the whole EPL network increases by about 40%.
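The classic cycle described above can be sketched as a small simulation. This is only an illustrative model, not part of the EPL stack: the class name `ManagingNode`, the callback `responds`, and the threshold `MAX_MISSES` are assumptions introduced here, and real frame formats and timing are not modelled.

```python
from dataclasses import dataclass, field

MAX_MISSES = 3  # assumed value for the "predefined number of consecutive cycles"


@dataclass
class ManagingNode:
    cn_ids: list                               # controlled nodes polled in order
    misses: dict = field(default_factory=dict)  # consecutive failed polls per CN

    def run_cycle(self, responds):
        """Run one classic cycle; responds(cn) is True if a PRes arrives in time."""
        log = ["SoC"]                          # broadcast: a new cycle starts
        for cn in list(self.cn_ids):           # isochronous period, round-robin
            log.append(f"PReq->{cn}")
            if responds(cn):
                log.append(f"PRes<-{cn}")
                self.misses[cn] = 0
            else:
                log.append(f"timeout:{cn}")    # mark the query as failed, move on
                self.misses[cn] = self.misses.get(cn, 0) + 1
                if self.misses[cn] >= MAX_MISSES:
                    self.cn_ids.remove(cn)     # drop CN from the isochronous cycle
        log.append("SoA")                      # broadcast: asynchronous period starts
        return log
```

Calling `ManagingNode([1, 2]).run_cycle(lambda cn: cn == 1)` yields the frame sequence for one cycle in which CN 2 misses its slot; after `MAX_MISSES` such cycles CN 2 is no longer polled.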
B. FPGA-based WEPL Hardware Structure
In industry, the Powerlink protocol is typically implemented in standard C or VHDL, so that it can run on heterogeneous platforms. In our previous work, we designed an FPGA-based hardware node, shown in Figure 3, in which a CC2530 RF module is adopted for the possible wireless extension.
To meet the performance requirements, an Altera Cyclone IV FPGA was chosen as the central protocol processor, with a 50 MHz crystal oscillator and other peripheral circuits attached. Meanwhile, a CC2530, which has a built-in MCU and a 2.4 GHz RF transceiver, serves as the RF module, with a 32 MHz crystal oscillator. According to our analysis, these two modules satisfy the expected requirements well. However, the oscillator frequencies and data formats of the FPGA and the RF module are clearly different, which requires the IP core logic of EPL to be extended. This is the key problem we need to address.
III. MULTIPLEXED INTERFACE LOGIC DESIGN
In the EPL protocol stack, OpenMAC is the unit that connects to the Packet Buffer via a DMA channel and to the physical interface via OpenFILTER. In effect, it separates the protocol layer from the interface layer, so only the logics and interfaces between OpenMAC and the physical layer need to be changed. In our design, the extended logics mainly include one S2P/P2S unit, two data FIFOs (FIFO-O and FIFO-I), and one I/O Multiplexer unit. Additionally, a dual-port RAM logic and 40 pins are set up on the FPGA for parallel communication with the master system.
A. Dual-FIFO Mechanism
To bridge the differences in data formats and communication frequencies, data buffers are introduced between OpenMAC and the RF unit. Concretely, FIFO-I and FIFO-O buffer the data coming from and going to the RF unit, respectively. These two buffers are implemented with the existing DCFIFO model in Quartus II, because this model supports crossing between different clock domains through its separate wrclk and rdclk signals.
In the concrete implementation, each FIFO is configured in the legacy synchronous FIFO mode, which can accommodate the possible transmission delay in the WEPL hardware. Two 8-bit data buses, data[7:0] and q[7:0], serve as the input and output buses, respectively. On the write side, wrreq, wrfull, and wrempty are the write-request, FIFO-full, and FIFO-empty signals; on the read side, rdreq, rdfull, and rdempty are the read-request, FIFO-full, and FIFO-empty signals, respectively. With these signals, the other designed logics can operate the FIFOs in a valid sequence of operations.
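To make the request and status handshake concrete, the following is a minimal behavioural sketch of one byte-wide FIFO. It is a software model under stated assumptions, not the Quartus DCFIFO itself: the class name `ByteFifo` and the depth of 256 are hypothetical, and the two independent clock domains of the real component are not modelled.

```python
from collections import deque


class ByteFifo:
    """Behavioural model of a byte-wide FIFO with request and status flags."""

    def __init__(self, depth=256):      # depth is an assumed value
        self.depth = depth
        self._q = deque()

    @property
    def wrfull(self):                   # write side must not push when set
        return len(self._q) >= self.depth

    @property
    def rdempty(self):                  # read side must not pop when set
        return not self._q

    def write(self, byte, wrreq=True):
        """Store one byte if wrreq is asserted and the FIFO is not full."""
        if wrreq and not self.wrfull:
            self._q.append(byte & 0xFF)
            return True
        return False                    # request ignored, data would be lost

    def read(self, rdreq=True):
        """Return the oldest byte if rdreq is asserted and data is available."""
        if rdreq and not self.rdempty:
            return self._q.popleft()
        return None
```

In this model the producer checks `wrfull` before asserting a write request and the consumer checks `rdempty` before asserting a read request, which mirrors how the surrounding logics are described as using the status signals.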
B. Serial-Parallel-Converter Logic
The Serial-Parallel Converter (SPC) logic is designed as an internal interface between OpenMAC and the FIFOs. In the transmit direction, it converts the 2-bit data stream from OpenMAC into an 8-bit data stream, writes the converted data into FIFO-O, and notifies the Multiplexer to read the data from FIFO-O. In the receive direction, when it receives a read signal from the Multiplexer, it reads 8-bit data from FIFO-I, converts them into a 2-bit data stream, and transfers the stream to OpenMAC. The detailed logic of the SPC is presented in Figure 4, where the Tx Block and Rx Block carry out the conversion functions, and its I/O signals are defined in Table I.
1) Tx Block Processes
The Tx Block of the SPC is composed of two hardware process modules: the Tx Sm process and the Tx Ctl process. The former receives control signals in real time and switches the state of the state automaton according to the received signal. The latter executes the functions associated with the current state. The control signals and states of the Tx Block, and their relationships, are presented in Figure 5, where R Idl, R Crs, R Sof, R Rxd, and R Stat are the idle state, data monitor state, start state, receiving state and terminate state, respectively.
It should be noted that the signal Wr En sent to the I/O Multiplexer in R Stat must be held at high level for long enough, because the clock frequency of the I/O Multiplexer differs from that of the SPC. To achieve accurate control, we design a 128-period time counter, Stat Count[7..0]; only when its highest bit Stat Count[7] switches to 1 does the automaton switch to R Idl. With this counter, a qualified Wr En signal lasting 2.56 μs (128 periods of the 50 MHz clock) can be generated. On the other hand, the rCrs Dv signal synchronizes the beginning of the data: by monitoring its rising edge, R Crs can learn the correct start time of the data frame from OpenMAC. This is an empirical design that eliminates the loss of partial data observed during our experiments. To transform four 2-bit data items into one 8-bit datum, a shift register logic Rx Sr is also designed in the Tx Ctl process. At each clock period, the two bits Rx Dat[1] and Rx Dat[0] are serially shifted into this register, and after every 4 clock periods an 8-bit datum is formed.
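The shift-register packing and the counter timing described above can be illustrated with a short sketch. The function names are hypothetical, and the bit order (first 2-bit item ends up in the low bits of the byte) is an assumption made here, since the text does not state the order used by Rx Sr.

```python
CLOCK_HZ = 50_000_000  # FPGA clock frequency given in the text


def pack_dibits(dibits):
    """Shift 2-bit items into bytes, four per byte.

    Assumed order: the first item received ends up in bits [1:0].
    A trailing group of fewer than four items is discarded.
    """
    out, sr, n = [], 0, 0
    for d in dibits:
        sr = (sr >> 2) | ((d & 0b11) << 6)  # new item enters at the top
        n += 1
        if n == 4:                          # every 4 clock periods: one byte
            out.append(sr)
            sr, n = 0, 0
    return out


def unpack_byte(byte):
    """Inverse operation: split one byte into four 2-bit items, low bits first."""
    return [(byte >> shift) & 0b11 for shift in (0, 2, 4, 6)]


def hold_time_us(periods=128):
    """Duration of a signal held for `periods` clock periods, in microseconds."""
    return periods / CLOCK_HZ * 1e6
```

With these definitions, `pack_dibits([1, 2, 3, 0])` gives `[0x39]`, `unpack_byte(0x39)` restores the four items, and `hold_time_us()` evaluates to 2.56, matching the Wr En duration stated above.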
2) Rx Block Processes
Similar to the Tx Block, the Rx Block also contains two processes: the state automaton logic Rx Sm and the read-related control logic Rx Ctl. As shown in Figure 6, R Idl, R Bop, R Pre and R Txd denote the idle state, initial state, pre-sending state and sending state, respectively. In the R Bop state, all required registers and timers are initialized, and the Rx Block then switches to the R Pre state. In the R Pre state, the Rx Ctl process sends the beginning of the Ethernet frame while monitoring the timer signal Tx Time. When Tx Time changes to 1, the state automaton of the Rx Block switches to R Txd. It should be noted that a shift register is also adopted in Rx Ctl to convert between the 8-bit and 2-bit data formats. After a read operation, the state automaton returns to the R Idl state.
C. I/O Multiplexer Logic
To realize duplex communication between the FPGA and the RF module with limited I/O resources, an I/O Multiplexer logic is designed. It connects to the RF module, the FIFOs and the SPC logic, as presented in Figure 7 together with its important signals. To request the multiplexed data bus C Data, OpenMAC triggers the F Rd Dv signal, whereas the RF module triggers the C Wr Start signal. Obviously, these two mutually exclusive operations compete for the same resource. In our work, this competition is resolved by the state automaton Mult Sm together with the Rd Ctl and Wr Ctl processes, whose decision-making logics are detailed below.
1) Mult Sm Process
The state automaton Mult Sm implements the main multiplexing logic. It covers five states organized as two independent, mutually exclusive state rings, as shown in Figure.
2) Rd Ctl and Wr Ctl Processes
The Rd Ctl process handles signals and transmits data to the RF module in the R Rd and R RdE states. When Mult Sm switches to R Rd, Rd Ctl generates an F Rd Req signal and continuously monitors the status of FIFO-O (empty or full). At the same time, it sends the control signals C Rd Clk and C Rd Start to the RF module to guarantee correct data reception. In R RdE, the Rd Ctl process triggers the termination signal C Rd End and switches Mult Sm to the R Idl state.
Correspondingly, the Wr Ctl process is in charge of receiving data from the RF module in the R Wr and R WrE states. In R Wr, Wr Ctl continuously receives the signals C Wr Clk, C Wr Start, and C Wr End from the RF module. This mechanism ensures that all data from the RF module are received synchronously and correctly. The Wr Ctl process then writes the received data into FIFO-I by triggering the F Wr Req signal, provided FIFO-I is in a valid status, for example not full. When Mult Sm switches to R WrE, Wr Ctl notifies the SPC logic to fetch the data from FIFO-I via the termination signal F Wr En, and switches Mult Sm to the R Idl state.
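The two mutually exclusive state rings can be sketched as a transition function. This is a hedged illustration only: the state names follow the text with underscores added, `rd_done` is a hypothetical condition standing for "FIFO-O has been drained", and the priority given to the RF side when both requests arrive in the idle state is an assumption, since the text does not specify how ties are resolved.

```python
from enum import Enum, auto


class State(Enum):
    R_IDL = auto()   # idle, bus C_Data is free
    R_RD = auto()    # reading FIFO-O, sending to the RF module
    R_RDE = auto()   # read ring finished, C_Rd_End is triggered
    R_WR = auto()    # receiving from the RF module, writing FIFO-I
    R_WRE = auto()   # write ring finished, F_Wr_En is triggered


def next_state(state, f_rd_dv=False, c_wr_start=False,
               rd_done=False, c_wr_end=False):
    """One step of the multiplexer automaton; only one ring is active at a time."""
    if state is State.R_IDL:
        if c_wr_start:               # assumed priority: RF module request first
            return State.R_WR
        if f_rd_dv:
            return State.R_RD
        return State.R_IDL
    if state is State.R_RD:
        return State.R_RDE if rd_done else State.R_RD
    if state is State.R_WR:
        return State.R_WRE if c_wr_end else State.R_WR
    return State.R_IDL               # R_RDE and R_WRE both return to idle
```

Because a request is only evaluated in `R_IDL`, a request from the other side that arrives while one ring is active is simply not served until the automaton has returned to idle, which is what makes the two rings mutually exclusive.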
3) Sampling Synchronized Signal
As mentioned above, synchronization is an important factor for correct data transfer. For WEPL, this means that the synchronization signal Wr Clk triggered by the RF module must be sampled exactly. Initially, we set the high-level time of Wr Clk to be longer than one sampling period but no longer than two. However, we observed that whenever the synchronization pulse is longer than one sampling period, redundant synchronization signals are sampled, which leads to falsely sampled data. We also observed that each sampled synchronization signal produces a high-level signal lasting exactly one sampling period. Therefore, we introduce two signal-monitoring registers, L1 and L2, which store the synchronization signal sampled in two consecutive clock periods. In the Multiplexer logic, Wr Req is output to FIFO-I only when L1 holds a low level and L2 holds a high level, as in the example shown in Figure 9.
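The two-register filter amounts to rising-edge detection on the sampled synchronization signal. The sketch below models it on a list of samples; the function name is hypothetical, and treating L1 as the older sample and L2 as the newer one is an interpretation of the description above.

```python
def wr_req_from_samples(samples):
    """Return the Wr_Req output per sampling period for a sampled Wr_Clk signal.

    l1 holds the sample of the previous period, l2 the current one; a write
    request is issued only on a low-to-high transition (l1 == 0 and l2 == 1).
    """
    l1 = l2 = 0
    wr_req = []
    for s in samples:
        l1, l2 = l2, s                       # shift the two monitoring registers
        wr_req.append(1 if (l1 == 0 and l2 == 1) else 0)
    return wr_req
```

For the sample sequence `[0, 1, 1, 0, 0, 1, 1, 1, 0]`, which contains two synchronization pulses spanning two and three sampling periods, the function issues exactly two write requests, whereas writing on every high sample would issue five.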
D. Driving RF Chip
In the RF module, we design a group of functions, including hal McuInit(), hal port init(), and hal RFInit(), to configure the I/O ports of the CC2530 as a communication interface and to initialize the corresponding functions. Concretely, P0 is in general I/O mode, P1 is in GPIO mode, and the lower three bits of P1 are set as outputs. Further, P1 0, P1 1, P1 2, P1 3, P1 4, P1 5, and the port P0 are defined as Wr Clk, Wr Start, Wr End, Rd Clk, Rd Start, Rd End, and C Data, respectively. Based on these signals, the functions rfRecvData(), hal port send(), hal port receive() and rfSendData() are designed to receive and send wireless data. Because the CC2530 is tightly coupled with the Multiplexer, the logic of these functions is similar to that of the Multiplexer.
Further, the CC2530 allows the carrier frequency to range from 2394 MHz to 2507 MHz in 1 MHz steps, and provides 16 channels conforming to IEEE 802.15.4. We therefore extend the communication interface of the RF module to be frequency-reconfigurable, with programmable frequencies and channels. In a concrete application, users can configure the wireless parameters of all WEPL nodes according to their requirements via hal RFInit().
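As a worked illustration of the frequency parameters, the sketch below maps an IEEE 802.15.4 channel in the 2.4 GHz band (channels 11 to 26, spaced 5 MHz apart starting at 2405 MHz) to its centre frequency and to the 1 MHz-step offset from the 2394 MHz lower bound quoted above. The function names are hypothetical; how hal RFInit() actually programs the transceiver is not described in the text and is not modelled here.

```python
BASE_MHZ = 2394   # lower end of the carrier range given in the text
MAX_MHZ = 2507    # upper end of the carrier range given in the text


def channel_to_mhz(channel):
    """Centre frequency of an IEEE 802.15.4 2.4 GHz channel (11..26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz channels are numbered 11 to 26")
    return 2405 + 5 * (channel - 11)


def freq_offset(mhz):
    """Offset in 1 MHz steps from the lowest supported carrier frequency."""
    if not BASE_MHZ <= mhz <= MAX_MHZ:
        raise ValueError("carrier frequency outside the supported range")
    return mhz - BASE_MHZ
```

For example, channel 11 corresponds to 2405 MHz, an offset of 11 steps, and channel 26 to 2480 MHz, an offset of 86 steps; all 16 channels fall inside the supported carrier range.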
IV. NODE INTEGRATION AND VERIFICATION

A. Integration of WEPL Node
All the extended logics designed above are developed in VHDL using Quartus and ModelSim, and the software in the RF module is developed in IAR Embedded Workbench. We then programmed the implemented IP core onto the WEPL node hardware, and the final WEPL node is shown in Figure.
B. Experiments and Verification
Further, we construct a prototype wireless network with three nodes, in which one node serves as the master and the others are slaves. On this basis, several experiments are carried out to verify the communication performance. Figure 11 shows the waveform of a shortest data frame captured from openMAC at 50 MHz. The transmission-enable signal Tx En of openMAC is held at high level for 6.08 μs, and during this period the 8-byte Ethernet header, 64 bytes of data and the 4-byte CRC code are transmitted on the 2-bit data bus within T h (640 ns), T d (5.12 μs) and T c (320 ns), respectively. It follows that one byte is transmitted in T 1 (80 ns), so the bandwidth of openMAC is about 12.5 MB/s. Figure 12 presents the data and timing when the Multiplexer transmits 8-bit data from FIFO-O to the CC2530. As shown in this figure, the C Rd Start signal is triggered first to notify the CC2530 to prepare for reception, while C Rd End is the last signal to be triggered, indicating that the transmission procedure is complete. C Rd Clk is the signal that activates the CC2530 to sample the data on C Data. The data signal is wider than the C Rd Clk signal to ensure that the data are sampled correctly. The waveforms of rRD Data and C Data are also identical, which means the data are correctly transferred via the Multiplexer. Through experiments, we measure the duration of C Rd Start as 138 μs and the effective transmission time as 136 μs. Thus, the transmission rate of the Multiplexer is approximately 500 KB/s. Figure 13 presents a typical timing of the waveforms when one 8-bit datum is transferred from the CC2530 to the Multiplexer. Setting the duration of each data bit to be longer than that of C Wr Clk, and the high-level width of C Wr Clk to be no less than the sampling period (1 μs) of the Multiplexer, guarantees correct sampling of the synchronous data.
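The openMAC timing figures quoted above follow directly from the 50 MHz clock and the 2-bit data bus, as the following small calculation shows. The constant and function names are introduced here for illustration only, and MB is taken as 10^6 bytes.

```python
CLOCK_PERIOD_NS = 20        # one period of the 50 MHz clock
BITS_PER_CLOCK = 2          # width of the openMAC data bus


def tx_time_ns(n_bytes):
    """Time to transfer n_bytes over the 2-bit bus, in nanoseconds."""
    clocks_per_byte = 8 // BITS_PER_CLOCK      # 4 clock periods per byte
    return n_bytes * clocks_per_byte * CLOCK_PERIOD_NS


t_h = tx_time_ns(8)     # Ethernet header
t_d = tx_time_ns(64)    # data
t_c = tx_time_ns(4)     # CRC code
t_total = t_h + t_d + t_c
bandwidth_mb_s = 1000 / tx_time_ns(1)          # bytes per ns scaled to MB/s
```

The results are 640 ns, 5120 ns and 320 ns for the three fields, 6080 ns (6.08 μs) in total, 80 ns per byte, and a bandwidth of 12.5 MB/s, in agreement with the measured Tx En duration.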
Considering the delay introduced by the software code, the widths of C Wr Clk and of each data bit are set to 1.06 μs and 4.22-4.26 μs, respectively. Sending one byte from the CC2530 to the Multiplexer therefore takes about 4.24 μs, and the data rate is approximately 235.85 KB/s. Figure 14 shows the timing relations and the data transmission procedure when the Multiplexer writes FIFO-I. With the filtering mechanism of Figure 9, the average transmission time of one 8-bit datum is about 4.375 μs, so the receiving speed of the Multiplexer is approximately 228.57 KB/s.
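Both rates above are simply the reciprocal of the per-byte transfer time. A minimal check, with a hypothetical helper name and KB taken as 10^3 bytes:

```python
def rate_kb_s(us_per_byte):
    """Data rate in KB/s for a given transfer time per byte in microseconds.

    One byte per microsecond equals 10^6 bytes/s, i.e. 1000 KB/s.
    """
    return 1000.0 / us_per_byte


cc2530_to_mux = round(rate_kb_s(4.24), 2)     # CC2530 to Multiplexer
mux_to_fifo_i = round(rate_kb_s(4.375), 2)    # Multiplexer writing FIFO-I
```

This gives 235.85 KB/s and 228.57 KB/s, so the write into FIFO-I is the slightly slower of the two steps on the receive path.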
Based on the prototype network, the full communication performance is also evaluated. When the average length of a data frame is 64 bytes, the maximum transmission and receiving speeds of WEPL are 29.27 KB/s and 27.56 KB/s, respectively. Assuming the two slave nodes are independent, their shortest control period can be 5 ms. Our target application is a self-designed pattern-sewing machine, whose three-axis linkage electromechanical mechanism coordinates motion in the X-Y plane with a needle moving in the vertical direction. For this three-axis linkage system based on WEPL, as long as the maximum control time of the X-Y plane motion for each frame does not exceed 10 ms, the maximum sewing speed reaches 3000 stitches per minute, which satisfies our requirements.
V. CONCLUSION
Real-time networks have become a mainstream technology for novel industrial systems and embedded applications. Owing to the TDMA mechanism of Powerlink, extending it with a wireless interface is possible. In this paper, we proposed a wireless extension scheme at the MAC layer on the basis of self-designed WEPL hardware. Further, a dual-FIFO-based multiplexer logic and interface were designed and implemented, and several synchronization problems were resolved in the process. Experiments show that the basic communication performance can satisfy the requirements of different applications.
Our ongoing and future studies on this topic focus on optimizing the hardware design and protocols, applying this extension to other time-slot-based protocols, and applying such real-time wireless networks in the design of novel industrial equipment.
