This contribution presents the successful design and implementation of an advanced DSP circuit for the direct measurement of electrical network parameters (RMS values and real and reactive power), with application to network monitoring and quality assurance.
Introduction
Technological development has increased integration density so significantly that more and more parts of a complete system are being included inside the main core, constituted by a single chip. This chip has traditionally been referred to as an Integrated Circuit, but new terms like Integrated System or System on Chip (SoC) have become popular because they better express the fact that the chip can contain not only a part of the system but the whole system itself.
Most of the time, an SoC structure includes a microprocessor as the central core in charge of system control, and a set of general-purpose and/or custom peripherals that carry out different operations: co-processing, display control, communication, etc. SoC clearly meets the demand for more compact, function-rich and portable appliances, such as multimedia-enabled cellular phones, tiny multimedia players and pocket computers. In fact, SoC design is currently the most relevant methodology for the design of a wide range of embedded systems.
SoC designs are commonly built out of already available parts in the form of IP cores (microprocessors, memory blocks, Ethernet controllers, standard input-output devices, etc.), so that SoC designers typically integrate existing parts and design specific functions and glue logic. This scheme improves reusability and reduces time to market and total cost.
Advances in technology have also made it possible to produce better programmable circuits (FPGAs) with increased performance and higher integration densities. FPGAs offer the highest flexibility and cost effectiveness because hardware implementation can be done several times on the same chip at no additional cost and because mass production of FPGA chips has led to very competitive prices. Today, full systems may be implemented in mid-range FPGAs from several vendors [1, 2] that can accommodate 1.5 × 10^6 equivalent gates working at 100 MHz or more (Spartan-3 XC3S1500 FPGA [3]). FPGA vendors also provide high-level development tools with which the designer typically works by assembling high-level configurable building blocks from a library and designing custom parts in a hardware description language (Verilog, VHDL, etc.) [4, 5].
This makes FPGAs [6] from Xilinx, since Xilinx Spartan-3 FPGAs were selected to implement the hardware platform. The use of system-level tools resulted in a much more productive and resource-efficient design flow and produced better performance, due to the variety of fully parameterizable building blocks available in the library and the fact that these blocks are highly optimized for the target programmable chip. This is the methodology described in this work.
The rest of the paper is organized as follows: in the next section, the system tools used to develop the DSP device will be described. In section 3, DSP specifications will be presented. In the fourth section, the most important details of the design are discussed. The fifth section is dedicated to presenting the main results both from simulation and implementation. Finally, some conclusions will be derived.
Design and implementation methodology
As mentioned previously, the system was designed using the tools developed by Xilinx to facilitate the design and implementation of DSPs on its FPGAs: System Generator for DSP [6] and EDK [7].
System Generator for DSP is a software platform, integrated within the MATLAB and Simulink [8] tools from The MathWorks, that allows for the design of DSP systems using the Xilinx BlockSet [9, 10]. System Generator also handles the automatic generation of peripherals for MicroBlaze [11], synthesizable on Xilinx FPGAs. The typical methodology for digital system design has been followed, based on a control unit implemented in VHDL code and a data path built out of System Generator's building blocks. The different tools used in the design of the DSP peripheral are shown in Fig. 1. Most of the system simulation is carried out using Simulink; however, the Mentor Graphics ModelSim tool [12] has been used to simulate VHDL code (HDL Co-simulation). Synthesis and implementation of the system on the FPGA have been done using the Xilinx Platform Studio for embedded systems (XPS), integrated within the Embedded Development Kit (EDK).
where $X_i^{rms}(n_0)$ is the RMS value of the input signal $i$ in the instant $n_0$.
Real ($P_{ij}(n_0)$) and reactive ($Q_{ij}(n_0)$) power are given by:
$$P_{ij}(n_0) = \frac{1}{N} \sum_{n=0}^{N-1} x_i(n_0 - n) \times x_j(n_0 - n) \qquad (4)$$
$$Q_{ij}^{ac}(n_0) = Q_{ij}(n_0) - DC_i(n_0) \times DC_j(n_0) \qquad (6)$$
Parameters calculated by the two previous subsystems are average values, the average period being one cycle of the electrical signal. For the estimation of reactive power the approximation that the input signal is sinusoidal and stationary has been used, so that the derivative of the current signal is the same signal shifted 3N/4 samples.
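The one-cycle averages can be illustrated with a minimal Python sketch. It assumes N samples per cycle, a sum over the N most recent samples, and a reactive-power estimate formed by multiplying one signal by the other delayed 3N/4 samples. The function names, the choice of which signal is delayed and the resulting sign convention of Q are assumptions made for illustration, not details taken from the design.

```python
import math

def cycle_averages(x_i, x_j, n0, N):
    """Return (P, Q) averaged over the N samples ending at index n0.

    Q uses x_j delayed by 3N/4 samples (assumed convention).
    """
    shift = (3 * N) // 4
    p = sum(x_i[n0 - n] * x_j[n0 - n] for n in range(N)) / N
    q = sum(x_i[n0 - n] * x_j[n0 - n - shift] for n in range(N)) / N
    return p, q

def remove_dc(q, dc_i, dc_j):
    """AC part of the reactive estimate: Q - DC_i * DC_j, as in equation (6)."""
    return q - dc_i * dc_j

N = 64                       # samples per cycle (hypothetical value)
phi = math.pi / 6            # current lags the voltage by pi/6 rad
total = 3 * N                # three cycles of unit-amplitude test signals
v = [math.cos(2 * math.pi * n / N) for n in range(total)]
i = [math.cos(2 * math.pi * n / N - phi) for n in range(total)]

P, Q = cycle_averages(v, i, total - 1, N)

# Same signals with constant offsets: the DC product appears in Q and is removed.
v_off = [s + 0.2 for s in v]
i_off = [s + 0.1 for s in i]
_, Q_off = cycle_averages(v_off, i_off, total - 1, N)
Q_ac = remove_dc(Q_off, 0.2, 0.1)
```

For these unit-amplitude sinusoids the sketch gives P = 0.5·cos(π/6) and Q = 0.5·sin(π/6), and the offset-corrected value matches the offset-free Q, which is the purpose of subtracting the DC product.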
System design and implementation
In this section, the most important aspects of the design and implementation are discussed.
The AFE Interface subsystem has been designed as a VHDL behavioral description, according to the datasheets of the external devices it interfaces to: the AD7656 converter [13] and the AD5233 digital potentiometer [14]. The OPB Interface has been modeled using the Xilinx BlockSet, as described in [11].
To design the Direct Measurements Processing subsystem, we have modeled a data path using System Generator blocks, together with a VHDL control unit that is in charge of controlling the processing of the eight analog inputs.
In the design of this DSP peripheral, a great effort has been made to optimize both FPGA resources and operation frequency.
In order to optimize resources, advantage has been taken of the fact that the operation frequency is much higher than the AI sampling frequency (75-100 MHz versus 3200-7680 Hz). This leaves a large number of clock cycles available between two consecutive captures. Thus, the suggested approach consists of designing a data path that uses the same processing elements (multipliers, adders, etc.) for every input, one input after another. The control unit is in charge of arbitrating the processing, establishing which input is processed at each moment. A general algorithm of the functionality implemented by the control unit is shown in Fig. 3.
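The clock-cycle budget behind this resource sharing can be checked with a short calculation. The sketch below uses the frequency ranges quoted above and the eight analog inputs of the design. The even split of cycles among inputs and the function names are assumptions for illustration only.

```python
def cycles_between_samples(f_clk_hz, f_sample_hz):
    """Whole clock cycles available between two consecutive captures."""
    return f_clk_hz // f_sample_hz

def cycles_per_input(f_clk_hz, f_sample_hz, num_inputs=8):
    """Cycles each input could use if the budget were split evenly (assumption)."""
    return cycles_between_samples(f_clk_hz, f_sample_hz) // num_inputs

# Tightest case: slowest clock with the fastest sampling rate.
tightest = cycles_per_input(75_000_000, 7680)
# Loosest case: fastest clock with the slowest sampling rate.
loosest = cycles_per_input(100_000_000, 3200)

print(tightest, loosest)  # prints "1220 3906"
```

Even in the tightest case more than a thousand cycles are available per input, which is why a single shared set of multipliers and adders can serve all inputs serially.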
In (7), RMS value (8), real power (9) and reactive power (10) .
The most relevant implementation details are discussed below.
First, FIR filter design and coefficient calculation have been carried out using FDATool, a MATLAB package [10]. FDATool features an advanced interface that allows the designer to define filter type, filter order, pass and stop frequencies, passband ripple, stopband attenuation, etc. The System Generator FIR block has been used for the filter implementation, providing a great variety of implementation options: filter coefficients (those generated by FDATool), number of bits per coefficient (16 bits), number of input channels (eight channels), processing type (serial input) and latency (10 cycles).
In order to implement the offset control, RMS, real power and reactive power subsystems, a data path where all inputs share the same resources has been modeled. Thus, multiplexing logic has been added to select the input channel that will be processed. System Generator's CORDIC SQRT block has been used to implement the square root operation. This block has been modified so that negative inputs produce zero as the result, which is part of the system specifications.
System Generator facilitates DSP design and verification [10], reducing the design time and simplifying the exploration of the design space. Therefore, the challenge is to improve hardware area and speed while producing acceptable results. To do this, as well as optimizing the hardware architecture to suit the ideal algorithm [15], a number of options are available: using on-chip resources (embedded multipliers, BRAMs, etc.) and configuring System Generator blocks [9, 10]: arithmetic type, precision, latency, overflow, quantization, etc. By using these options, the functional behavior of the system is not affected; only the precision and overall performance change.
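The effect of storing coefficients in 16 bits can be illustrated with a minimal Python sketch. It does not reproduce FDATool or the filter actually used. It swaps in a Hamming-windowed sinc low-pass design, and the number of taps (31), the cutoff (0.1 of the sampling rate) and the Q15 coefficient format are hypothetical values chosen for illustration.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass taps with unity DC gain.

    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]

def quantize_q15(taps):
    """Round each tap to a signed 16-bit integer with 15 fractional bits."""
    return [max(-32768, min(32767, round(c * 32768))) for c in taps]

taps = lowpass_taps(31, 0.1)       # hypothetical specification
q_taps = quantize_q15(taps)

# Worst-case coefficient error introduced by the 16-bit representation.
max_error = max(abs(q / 32768 - c) for q, c in zip(q_taps, taps))
```

Each quantized coefficient differs from its ideal value by at most half a least significant bit (2^-16), which bounds the deviation of the implemented frequency response from the designed one.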
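The specified behavior for negative inputs can be expressed in a few lines of Python. The sketch below is not the CORDIC algorithm used by the System Generator block. It uses a bit-serial integer square root as a stand-in, and the function name is hypothetical. It only illustrates the clamping rule: negative inputs yield zero.

```python
def sqrt_clamped(x):
    """Floor of the square root of an integer; negative inputs return 0."""
    if x <= 0:
        return 0
    result = 0
    # Highest power of four that does not exceed x.
    bit = 1 << ((x.bit_length() - 1) & ~1)
    while bit:
        if x >= result + bit:
            x -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result

print(sqrt_clamped(25), sqrt_clamped(-7))  # prints "5 0"
```

A negative argument can arise, for example, when rounding errors make a difference of accumulated squares slightly negative. Returning zero keeps the output well defined in that case.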
Taking these last two alternatives into account, two versions of the peripheral have been developed: 1.0 and 1.2. In the first version, all intermediate operations are carried out using 32-bit fixed-point arithmetic. In version 1.2, the design has been modeled using 16-bit fixed-point arithmetic, and operations that produce 32-bit results are rounded to 16 bits.
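The difference between keeping a 32-bit product and rounding it to 16 bits can be sketched as follows. The Q15 format (15 fractional bits), the round-half-up rule and the saturation on overflow are assumptions. The actual binary point positions and the rounding mode configured in the System Generator blocks are not stated here.

```python
def to_q15(value):
    """Convert a real number in [-1, 1) to a saturated signed 16-bit Q15 integer."""
    return max(-32768, min(32767, round(value * 32768)))

def mul_full(a, b):
    """Q15 x Q15 product kept at full width (Q30, fits in 32 bits)."""
    return a * b

def mul_rounded(a, b):
    """Q15 x Q15 product rounded back to 16-bit Q15 (round half up, saturating)."""
    rounded = (a * b + (1 << 14)) >> 15
    return max(-32768, min(32767, rounded))

a = to_q15(0.5)
b = to_q15(0.25)
full = mul_full(a, b)          # Q30 representation of 0.125
short = mul_rounded(a, b)      # Q15 representation of 0.125

# Rounding error for an arbitrary pair of operands, in real-number units.
x, y = 12345, -6789
error = abs(mul_rounded(x, y) / 2 ** 15 - mul_full(x, y) / 2 ** 30)
```

Barring saturation, the rounded result differs from the full-width product by at most half a 16-bit least significant bit (2^-16), which is the precision given up in exchange for narrower multipliers and adders.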
System Generator's blocks may also be configured for frequency optimization, by adding pipeline stages and latency in the different data path components: multipliers, adders, etc. However, these modifications affect the system behavior, making it necessary to modify the control unit so that it behaves properly.
Once system requirements are fulfilled and checked by simulation at the block level, the two versions of the DSP peripheral are generated and imported into an EDK project for later integration with the MicroBlaze processor provided by Xilinx. OPB_Export_Tool [11] carries out these tasks.
Results
In this section, simulation and hardware implementation results are described in some detail.
Simulation results
To check that the design fulfills the specifications, Simulink and ModelSim have been used together (HDL Co-simulation) for system simulation. This kind of simulation is mandatory in order to account for blocks that are only available as black boxes.
A wide range of system configurations has been simulated (different sampling rates, NCP, amplitudes, offsets, etc.), with correct operation in all cases.
A simulation example that corresponds to the following configuration is shown in Fig. 4. Once the simulation begins, the analogue-to-digital conversion control subsystem samples the generated input signals (see Fig. 4a). In Fig. 4a, the blue signal is the voltage and the red signal is the current. The two signals are π/6 rad out of phase.
As can be observed in Fig. 4b-4e and mainly in Fig. 4f, a transient phase exists during the first cycle of the input signal. From the second cycle on, when the processing of each cycle concludes, valid and updated data of the different calculated parameters are obtained: offset, RMS values and real and reactive power.
For the offset and RMS value calculations, the filtered signal is used (Fig. 4b). The signals in Fig. 4c-4f show the outputs of the accumulators ACDC, ACRms, ACP and ACQ, respectively, which store inter-cycle transient values of the parameters being processed. When a cycle of the input signal is completed, the accumulated values at that instant are used to calculate the offset, the RMS value and the real and reactive power corresponding to that cycle, according to equations (1) to (6).
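The accumulate-and-latch behavior described above can be sketched in Python. The accumulator names follow the text. The class name, the use of the usual definitions (mean for the offset, root of the mean square for the RMS value), the zero-initialized delay line and the omission of the FIR filter are assumptions made to keep the sketch short.

```python
import math
from collections import deque

class CycleAccumulators:
    """Per-sample accumulation with results latched once per cycle of N samples."""

    def __init__(self, N):
        self.N = N
        depth = (3 * N) // 4
        self.delay = deque([0.0] * depth, maxlen=depth)  # current delayed 3N/4 samples
        self.count = 0
        self.acdc = self.acrms = self.acp = self.acq = 0.0
        self.results = []

    def push(self, v, i):
        i_shifted = self.delay[0]      # oldest stored current sample
        self.delay.append(i)           # the oldest entry drops out automatically
        self.acdc += v
        self.acrms += v * v
        self.acp += v * i
        self.acq += v * i_shifted
        self.count += 1
        if self.count == self.N:       # end of cycle: latch results, clear accumulators
            n = self.N
            self.results.append({
                "offset": self.acdc / n,
                "rms": math.sqrt(self.acrms / n),
                "p": self.acp / n,
                "q": self.acq / n,
            })
            self.count = 0
            self.acdc = self.acrms = self.acp = self.acq = 0.0

N = 64                                  # samples per cycle (hypothetical value)
acc = CycleAccumulators(N)
for n in range(3 * N):
    theta = 2 * math.pi * n / N
    acc.push(math.cos(theta), math.cos(theta - math.pi / 6))

first, second = acc.results[0], acc.results[1]
```

In this sketch the first latched reactive-power value is wrong because the delay line still holds its initial zeros, while the values latched from the second cycle on are correct. This is one plausible reading of why the transient is most visible in the ACQ trace.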
In Table 1 
Hardware implementation results
In order to compare the implementations of the two versions of the peripheral, they have been synthesized separately using XPS from Xilinx [7] . Some figures of merit will be analyzed in this section: used hardware resources and maximum frequency of operation.
Both designs have been implemented on a Spartan-3 XC3S1500 FPGA. These devices have all the features required for efficiently implementing DSP functions: embedded 18x18 multipliers, distributed RAM, shift register logic, etc. [15]. In short, these features allow for the implementation of high-performance DSP functions in a small fraction of the total device. The XC3S1500 device features about 1.5 × 10^6 equivalent gates, 32 multipliers and 32 BRAMs, and is in the moderate to low capacity range of the Spartan-3 family.
The hardware implementation results show that version 1.0 of the peripheral occupies 5181 slices (38% of an XC3S1500 FPGA) and 6 embedded multipliers, whereas version 1.2 only uses 4232 slices (31%) and half as many embedded multipliers, leaving plenty of available resources for implementing the MicroBlaze processor and additional circuitry (see Table 2).
Finally, we observe that frequency optimization has provided a maximum frequency of 110.461 MHz, which easily meets the initial 75-100 MHz requirement.
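The utilization figures can be cross-checked with a short calculation. The total of 13,312 slices for the XC3S1500 is taken from the device data sheet rather than from the text above, and the function name is hypothetical.

```python
TOTAL_SLICES = 13312   # XC3S1500 slice count (data sheet value, assumed here)

def percent(used, total):
    """Share of a resource in use, as a percentage."""
    return 100.0 * used / total

v10 = percent(5181, TOTAL_SLICES)      # version 1.0
v12 = percent(4232, TOTAL_SLICES)      # version 1.2
saving = percent(5181 - 4232, 5181)    # slices saved by version 1.2 relative to 1.0

print(f"{v10:.1f}% {v12:.1f}% {saving:.1f}%")  # prints "38.9% 31.8% 18.3%"
```

Under this assumption the two versions occupy roughly 38.9% and 31.8% of the device, so the 16-bit version saves about 18% of the slices used by the 32-bit version.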
Conclusions
An advanced DSP for direct electrical measurements has been successfully designed and implemented on a Xilinx Spartan-3 FPGA. The DSP takes the form of a standard OPB peripheral so that it can be managed by a system processor implemented in the same FPGA. We can also conclude that state-of-the-art FPGAs are ready to be used for advanced DSP design, taking advantage of a variety of tools and integrated design environments that automate the most tedious design tasks, allowing the designer to focus on architecture and specifications and facilitating the exploration of the design space in search of an optimal solution.
