Abstract-Power quality (PQ) monitoring is an important issue for electric utilities and many industrial power customers. This paper presents a DSP-based hardware monitoring system built on a recently proposed PQ classification algorithm. The algorithm is implemented on a Texas Instruments (TI) TMS320VC5416 digital signal processor (DSP) with the TI THS1206 12-bit 6 MSPS analog-to-digital converter. A TI TMS320VC5416 DSP Starter Kit (DSK) is used as the host board, with the THS1206 mounted on a daughter card. The implemented PQ classification algorithm consists of two processes: feature extraction and classification. Feature extraction projects a PQ signal onto a time-frequency representation (TFR) designed to maximize the separability between classes. The classifiers include a Heaviside-function linear classifier and feedforward neural networks. The algorithm is optimized for the architecture of the DSP to meet the hard real-time constraints of classifying a 5-cycle segment of the 60 Hz sinusoidal voltage/current signals in power systems. The classification output can be transmitted serially to an operator interface or control mechanism for logging and issue resolution.
I. INTRODUCTION
The increasing popularity of power electronics has led to recent focus on power quality (PQ) in power systems by electric utilities and industrial power customers. Software and hardware for automatic classification of voltage and current disturbances are highly desired. Existing recognition methods need much improvement in terms of their capability, reliability, and accuracy. Today, power quality has become a very interesting cross-disciplinary topic, coupling power engineering and power electronics with other research areas, such as digital signal processing, software engineering, networking, and VLSI.
Voltage-related PQ disturbances are the major causes of disruption in industrial and commercial power supply systems. A study ([...] Support a Digital Society) shows that the U.S. economy is losing between $104 billion and $161 billion each year due to outages and another $15 billion to $24 billion due to PQ phenomena [1].
Traditional monitoring methods are based on RMS measurements and are constrained by their accuracy. Recently proposed approaches for automated detection and classification of PQ disturbances are based on wavelet analysis, artificial neural networks, hidden Markov models, and bispectra [2]-[6]. Real-time PQ monitoring hardware should be capable of acquiring voltage or current waveforms, identifying the event type based on the waveform pattern, understanding the cause of the disturbance, and making system protection and prevention decisions.
Digital signal processors (DSPs) are distinct from general-purpose microprocessors, mainly due to their capacity for real-time computing. With architectures more optimized for fast multiplications and accumulations than those of general-purpose microprocessors, DSPs have wide applications in speech, digital audio, image, and video processing, and telecommunications. This paper presents a DSP-based hardware system for PQ classification built on a PQ classification algorithm recently proposed by the authors [7]. The algorithm is implemented on a Texas Instruments (TI) TMS320VC5416 DSP with the TI THS1206 12-bit 6 MSPS analog-to-digital converter. A TI TMS320VC5416 DSP Starter Kit is used as the host board, with the THS1206 mounted on a daughter card. This paper demonstrates the feasibility of implementing the proposed PQ classification algorithm in real time with a DSP-based system and is one of the first case studies of using DSP technologies in the area of power quality monitoring [8,9].
II. THE PQ CLASSIFICATION ALGORITHM
PQ disturbances cover a broad frequency range and significantly different magnitude variations. In this paper, a new PQ classification algorithm is presented with an example application of discriminating five major power system waveform events: harmonics, voltage sags, capacitor high-frequency switching, capacitor low-frequency switching, and normal voltage variations, as shown in Fig. 1. The complete implementation algorithm presented in this paper is shown in Fig. 2. The two sequential processes, feature extraction and classification, are explained in detail in the following two subsections.
A. Feature extraction

A.1. Theoretical background
There is an infinite number of possible time-frequency representations (TFRs) corresponding to a signal [10]. For waveform recognition problems, features need to be selected from a TFR that maximizes the separability of signals in different classes and minimizes the spread of signals within the same class. Therefore, it is desirable to design a classification-optimal TFR that specifically emphasizes the differences between classes but does not necessarily describe the time-frequency information accurately [11,12].
The time-frequency ambiguity plane has been an important tool in the radar field for analyzing and constructing radar signals, formulating the performance characteristics of a waveform, and relating range and velocity resolution [13]. It has also been used extensively in the fields of sonar, radio astronomy, communications, and optics. The connection between the ambiguity plane and time-frequency representations has been recognized for a long time. Any bilinear (Cohen class) TFR $P(t,f)$ can be expressed as the two-dimensional Fourier transform of the product of the ambiguity plane $A(\eta,\tau)$ of the signal and a kernel function $\phi(\eta,\tau)$ [10],
where $t$ represents time, $f$ represents frequency, $\eta$ represents continuous frequency shift, and $\tau$ represents continuous time lag. Equation (1) shows that the kernel functions determine the TFRs and their properties. A kernel function is a generating function that operates upon the signal to produce the TFR. The characteristic function for each TFR $P(t,f)$ is $A(\eta,\tau)\,\phi(\eta,\tau)$.
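Writing $\eta$ for the frequency shift, $\tau$ for the time lag, and $\phi$ for the kernel, the relation referred to as Equation (1) can be sketched as follows. The sign convention of the exponents is an assumption, since it differs between references; only the structure (a two-dimensional Fourier transform of the product of ambiguity plane and kernel) is taken from the text.

```latex
P(t,f) = \int\!\!\int A(\eta,\tau)\,\phi(\eta,\tau)\,
         e^{j 2\pi \eta t}\, e^{-j 2\pi \tau f}\; d\eta\, d\tau \tag{1}
```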
The classification-optimal TFR can be obtained by smoothing the ambiguity plane with an appropriate kernel $\phi$, which is a classification-optimal kernel.
The problem of designing the classification-optimal TFR thus becomes equivalent to designing the classification-optimal kernel $\phi(\eta,\tau)$. With Fisher's criterion, locations on the ambiguity plane are ranked according to their importance for this classification task. For example, when designing kernel $i$, a Fisher's discriminant score is calculated for each location $(\eta,\tau)$ on the ambiguity plane, where $m_i[\eta,\tau]$ ...
where the function $\mathrm{mod}(p_1, p_2)$ represents the modulus after dividing the first parameter $p_1$ by the second parameter $p_2$.
In this application, the kernel $\phi_i[\eta,\tau]$ is defined as a binary matrix (each matrix element is either 0 or 1). Feature points are therefore the ambiguity plane points at locations $(\eta,\tau)$ where $\phi_i[\eta,\tau] = 1$, and the process of feature extraction is to select the points of the ambiguity plane that are optimal for the classification task. The feature ranking mechanism is given in Equation (2): locations that receive higher discriminant scores are selected as feature locations.
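The ranking-and-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score is the textbook two-class Fisher ratio (squared difference of class means over the sum of class variances), assumed here because Equation (2) is not reproduced in the text, and the names `fisher_scores`, `binary_kernel`, and `extract_features` are hypothetical.

```python
from statistics import mean, pvariance

def fisher_scores(class_a, class_b):
    """Score each ambiguity-plane location with a two-class Fisher ratio.

    class_a, class_b: lists of training examples; each example is a flat
    list of ambiguity-plane magnitudes, one per (eta, tau) location.
    """
    scores = []
    for k in range(len(class_a[0])):
        a = [example[k] for example in class_a]
        b = [example[k] for example in class_b]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Assumed convention: zero spread with a non-zero gap is a
            # perfect separator; zero gap scores zero.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores

def binary_kernel(scores, n_points):
    """Return a 0/1 kernel with ones at the n_points best-scoring locations."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    keep = set(ranked[:n_points])
    return [1 if k in keep else 0 for k in range(len(scores))]

def extract_features(ambiguity, kernel):
    """Select the ambiguity-plane values where the kernel equals 1."""
    return [value for value, bit in zip(ambiguity, kernel) if bit == 1]

# Toy data: location 1 separates the two classes, locations 0 and 2 do not.
class_a = [[1.0, 5.0, 2.0], [1.1, 5.2, 1.9], [0.9, 4.9, 2.1]]
class_b = [[1.0, 1.0, 2.0], [1.2, 1.1, 2.2], [0.8, 0.9, 1.8]]
kernel = binary_kernel(fisher_scores(class_a, class_b), n_points=1)
features = extract_features([0.5, 4.0, 2.5], kernel)
```

Because the kernel is binary, applying it reduces to picking values at fixed indices, which is what makes the later restriction to a handful of ambiguity-plane points possible.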
B. Classification
Multiple classifiers are adopted in the presented method. Each classification node consists of a kernel function and a classifier. Depending on the nature of the kernel, classification node $i$ either discriminates signals that belong to class $i$ from signals that belong to classes $\{i+1, \ldots, n\}$, or discriminates signals in classes $\{i, \ldots, i+m\}$ from signals in classes $\{i+m+1, \ldots, n\}$. For a two-class classification problem, the Heaviside linear classifier compares its input with a real threshold value. Training this classifier amounts to determining the threshold parameter.
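A threshold classifier of this kind can be sketched as below. The orientation (output 1 when the input is at or above the threshold) and the midpoint training rule are assumptions made for illustration; the text only states that training determines the threshold. The function names are hypothetical.

```python
def heaviside_classify(x, threshold):
    """Return 1 if x is at or above the threshold, else 0 (assumed orientation)."""
    return 1 if x >= threshold else 0

def train_threshold(low_class, high_class):
    """Place the threshold midway between the closest training values.

    Assumes the two classes are separable, with low_class lying below
    high_class. This midpoint rule is an illustrative choice only.
    """
    return (max(low_class) + min(high_class)) / 2.0

threshold = train_threshold([0.0, 0.125, 0.25], [0.75, 1.0, 1.25])
labels = [heaviside_classify(x, threshold) for x in (0.2, 0.5, 0.8)]
```

A single comparison per decision keeps this node far cheaper than the neural network nodes, which suits classes that are already well separated by one feature.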
Three feedforward neural network (FNN) classifiers are adopted in this algorithm, all with three layers. The structure of the FNN for discriminating sags is 2-12-2 (input layer node number-hidden layer node number-output layer node number); the one for capacitor switching is 3-10-2; the one for capacitor high-frequency switching is 3-10-2. The transfer and training ...
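The layer sizes fix the shapes of the weight matrices. The sketch below runs a forward pass through a 2-12-2 network; the tanh hidden activation, the linear output layer, and the random weights are placeholders assumed for illustration, since the transfer functions and trained weights are not given here.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create placeholder weights (n_out rows of n_in values) and biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Forward pass: tanh on hidden layers, linear on the last layer."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        x = z if is_last else [math.tanh(v) for v in z]
    return x

rng = random.Random(0)
sizes = [2, 12, 2]  # input-hidden-output node counts of the sag network
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
outputs = forward([0.3, -0.7], layers)
predicted = outputs.index(max(outputs))  # index of the larger output node
```

Changing `sizes` to `[3, 10, 2]` gives the shape used by the two capacitor-switching networks.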
III. DATA FLOW AND DSP FEATURES
A global block diagram of the monitoring system is shown in Fig. 3. The input signal is first passed through a potential transformer and sampled using a 12-bit analog-to-digital converter (ADC) daughter card. The 12-bit ADC collects signed integer values in the range from -2047 to 2047. This data is then placed into a 32-word FIFO buffer. When the buffer is full, a "data available" signal activates an external interrupt on the C5416 processor External Peripheral Interface bus. Within the interrupt service routine, the FIFO is read through an input/output (I/O) port via the External Memory Interface bus on the C5416 processor. This data is moved into a 640-element array for input to the feature extraction and classification algorithm. In file mode, text files are sent from a host computer via the USB port to the C5416 using the C standard I/O functions, conveniently modified for bidirectional transmission over the USB port. In standalone mode, the resulting classification of a sampled signal would be relayed to a control device as a binary number via the general-purpose I/O port on the Host Port Interface.
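The buffering path from FIFO to analysis window can be modeled in a few lines. This is a host-side sketch under stated assumptions, not DSP code: `SampleCollector` and `on_data_available` are hypothetical names standing in for the interrupt service routine, and the FIFO depth is assumed to divide the window size evenly (640 / 32 = 20 blocks).

```python
FIFO_DEPTH = 32      # words delivered per FIFO fill, as described in the text
WINDOW_SIZE = 640    # samples per classification window

class SampleCollector:
    """Collects FIFO blocks into a fixed-size window buffer."""

    def __init__(self):
        self.window = [0] * WINDOW_SIZE
        self.count = 0

    def on_data_available(self, fifo_words):
        """Stand-in for the ISR: copy one FIFO block into the window.

        Returns a copy of the full window once WINDOW_SIZE samples have
        been gathered, otherwise None.
        """
        for word in fifo_words:
            self.window[self.count] = word
            self.count += 1
        if self.count == WINDOW_SIZE:
            self.count = 0
            return list(self.window)
        return None

collector = SampleCollector()
results = [collector.on_data_available([block] * FIFO_DEPTH)
           for block in range(WINDOW_SIZE // FIFO_DEPTH)]
```

Only the last of the twenty calls returns a window; keeping the routine to a plain copy mirrors the need for the interrupt handler to finish before the next FIFO fill.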
The TMS320VC5416 is a fixed-point DSP with 128 KB of on-chip memory and a 160 MHz clock, which can perform 160 MIPS. This processor has a 17 x 17-bit parallel multiply-accumulate unit, which allows single-cycle multiply-accumulate operations and therefore fast execution of integer multiplications. While floating-point multipliers on other processors allow direct multiplication of floating-point values, this DSP executes integer multiplications in a single clock cycle. Optimization to use all integers is therefore necessary. However, if a loss of precision is allowable, this processor will actually execute an integer multiplication faster than a floating-point processor of a similar clock speed, because it has parallel multiplier and accumulation units in place of a pipelined multiplier. The pipelined multiplier on the TMS320C6711, a 32-bit DSP, requires 4 cycles to complete a 32-bit multiplication. While the pipeline may theoretically allow for faster sequential multiplications, in practice a single multiplication is carried out and stalls the pipeline while it finishes and stores the result to the accumulator or a memory location.
IV. OPTIMIZATION FOR REAL-TIME COMPUTING
Because the major task of the presented PQ monitor is to classify disturbances in real time, significant optimization effort was invested when programming the DSP in order to reduce the algorithm computation time.
A. Reduce the quantities to be calculated
The results of kernel and classifier training show that only nine kernel points from seven columns of the ambiguity plane are needed to implement the classification process. According to Equations (5) and (4), it is enough to calculate just the seven kernel-related columns of the matrix $R[n,\tau]$ and the nine kernel points of the matrix $A[\eta,\tau]$.
If the process window size is $N$, the computation cost for ...
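The saving can be illustrated with a short sketch. The definitions used here are assumptions, because Equations (4) and (5) are not reproduced in the text: the local autocorrelation is taken as $R[n,\tau] = x[n]\,x[\mathrm{mod}(n+\tau, N)]$ for a real signal, and the ambiguity plane as the DFT of $R$ along $n$. The kernel points in the example are hypothetical.

```python
import cmath
import math

def autocorr_column(x, tau):
    """One lag column R[n, tau] = x[n] * x[(n + tau) mod N] (assumed form)."""
    n_samples = len(x)
    return [x[n] * x[(n + tau) % n_samples] for n in range(n_samples)]

def ambiguity_point(column, eta):
    """Single DFT bin of one lag column, i.e. A[eta, tau] (assumed form)."""
    n_samples = len(column)
    return sum(r * cmath.exp(-2j * math.pi * eta * n / n_samples)
               for n, r in enumerate(column))

def kernel_features(x, points):
    """Evaluate the ambiguity plane only at the listed (eta, tau) points."""
    columns = {}  # each required lag column is computed once and reused
    features = []
    for eta, tau in points:
        if tau not in columns:
            columns[tau] = autocorr_column(x, tau)
        features.append(abs(ambiguity_point(columns[tau], eta)))
    return features

N = 640
x = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]  # five cycles
points = [(0, 0), (10, 0), (0, 1)]  # hypothetical kernel locations
features = kernel_features(x, points)
```

Under these assumed definitions, nine points drawn from seven columns need 7N products for the columns and 9N multiply-accumulates for the bins, compared with N^2 products for the full autocorrelation matrix alone.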
B. Use fixed-point integer multiplication as much as possible
Due to the 16-bit fixed-point nature of the processor used in this paper, optimization was required to ensure that floating-point values were avoided. The analog-to-digital converter conveniently produces integer values ranging from -2047 to 2047, allowing a smooth transition into the algorithm execution without conversion. While these values could be stored as 16-bit integers, the subsequent steps required the use of long (32-bit) integers. The discrete Fourier transform (DFT) requires multiply-accumulate operations whose results would quickly exceed 32 bits. The long integer values were therefore broken into seven-bit integers to allow use of the single-cycle multiply-accumulate (MAC) function. Each accumulation represents a portion of the final summation after being multiplied by $2^7$ and is then normalized for storage into a floating-point value. For each DFT operation, this normalization and addition step occurs once for the real part and once for the imaginary part. This all-integer optimization cut the algorithm execution time in half.
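One plausible reading of the seven-bit splitting is sketched below. It is an assumption about the scheme rather than the authors' code: each long value is split into signed seven-bit pieces, each piece gets its own multiply-accumulate sum, and the partial sums are recombined with weights that are powers of $2^7$. The function names and the choice of four pieces are hypothetical.

```python
def split_7bit(value, limbs=4):
    """Split a signed integer into `limbs` signed 7-bit pieces, low piece first.

    Four pieces cover magnitudes below 2**28, enough for products of
    two 12-bit samples.
    """
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    return [sign * ((magnitude >> (7 * k)) & 0x7F) for k in range(limbs)]

def mac_by_limbs(values, coeffs, limbs=4):
    """Dot product computed with one small-operand accumulator per piece."""
    accumulators = [0] * limbs
    for value, coeff in zip(values, coeffs):
        for k, piece in enumerate(split_7bit(value, limbs)):
            accumulators[k] += piece * coeff  # 7-bit by 16-bit multiply
    # Recombine: accumulator k carries a weight of (2**7)**k.
    return sum(acc * (1 << (7 * k)) for k, acc in enumerate(accumulators))

values = [2047 * 2047, -2047 * 1000, 12345]  # autocorrelation-sized inputs
coeffs = [32767, -32767, 100]                # table-sized coefficients
result = mac_by_limbs(values, coeffs)
```

Each individual product stays small enough for a narrow hardware multiplier, at the price of several accumulations per input value.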
C. Use hard-coded sin table and cos table
The discrete Fourier transform (DFT) is implemented with cos and sin functions instead of the exponential function, according to Euler's formula.
Because the standard C math header file focuses on accuracy, its sin and cos functions are quite costly in processor time. Since the on-chip memory had not been completely consumed by other operations of the algorithm, a lookup table was chosen for these functions. The values were stored as signed integers ranging from -32767 to 32767. Given the 12-bit ADC resolution, this range was adequate.
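A table-driven DFT bin can be sketched as follows. The table length (one entry per sample of a 640-point window), the rounding, and the function name `dft_bin_int` are assumptions for illustration; only the integer range of -32767 to 32767 comes from the text.

```python
import math

N = 640          # window length, so the table covers one full period
SCALE = 32767    # table values span -32767..32767

# Precomputed once; on a DSP these would be constant arrays in memory.
COS_TABLE = [round(SCALE * math.cos(2 * math.pi * k / N)) for k in range(N)]
SIN_TABLE = [round(SCALE * math.sin(2 * math.pi * k / N)) for k in range(N)]

def dft_bin_int(samples, eta):
    """Compute DFT bin `eta` using integer table lookups only.

    The phase index (eta * n) mod N replaces the call to cos/sin.
    Returns (real, imaginary) rescaled by SCALE at the very end.
    """
    real_acc = 0
    imag_acc = 0
    for n, x in enumerate(samples):
        k = (eta * n) % N
        real_acc += x * COS_TABLE[k]
        imag_acc -= x * SIN_TABLE[k]
    return real_acc / SCALE, imag_acc / SCALE

# Full-scale 12-bit cosine with five cycles in the window.
samples = [round(2047 * math.cos(2 * math.pi * 5 * n / N)) for n in range(N)]
real_part, imag_part = dft_bin_int(samples, 5)
```

For this input the real part comes out close to 2047 * N / 2, which indicates that 16-bit table entries lose little against 12-bit samples.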
V. RESULTS AND DISCUSSIONS
A. Real-time monitoring capability
The classification process of an 83.33 ms window takes 10.9 ms when the ADC is not running on the same board, which satisfies the real-time constraints in most power quality monitoring tasks. Within the 10.9 ms, 1.7 ms are used for the autocorrelation step, 8.5 ms for the DFT step, and 0.70 ms for the classifier step. In Fig. 4, the classification process of the same five-cycle window takes 11.2 ms, measured when the ADC is running on the same board and interrupting 960 times per second. This imposes a real-time constraint of moving the data from the FIFO buffer into program memory within a 1/960 s window.
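The timing figures can be cross-checked with simple arithmetic. The sketch below uses only numbers reported in this paper (five cycles at 60 Hz, 640 samples per window, the three stage times, 960 interrupts per second); the derived sampling rate and load fraction are inferences from those numbers, not separately reported values.

```python
LINE_FREQUENCY_HZ = 60
CYCLES_PER_WINDOW = 5
SAMPLES_PER_WINDOW = 640

# Window length and the sampling rate implied by 640 samples per window.
window_ms = 1000.0 * CYCLES_PER_WINDOW / LINE_FREQUENCY_HZ
sampling_rate_hz = SAMPLES_PER_WINDOW * LINE_FREQUENCY_HZ / CYCLES_PER_WINDOW
samples_per_cycle = SAMPLES_PER_WINDOW // CYCLES_PER_WINDOW

# Reported stage times without the ADC running on the same board.
stage_ms = {"autocorrelation": 1.7, "DFT": 8.5, "classifier": 0.70}
total_ms = sum(stage_ms.values())

# Share of each window spent computing, and the time between interrupts.
load_fraction = total_ms / window_ms
interrupt_period_ms = 1000.0 / 960
```

The stage times add up to the reported 10.9 ms, roughly 13% of the window, and the DFT step accounts for most of it.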
B. Classification performance
In this study, the classification experiment is conducted with five-class examples, as shown in Fig. 1. A five-cycle window is used for two reasons. First, a five-cycle window is long enough to capture the characteristics of all types of PQ events under study. Second, a five-cycle window is short enough to generate real-time monitoring outputs for many PQ-related applications. The 83.3 ms window size used in this paper to demonstrate the algorithm can be adjusted appropriately for specific applications. For example, when this method is applied to discriminate different types of high-frequency power system transients, the window size can be reduced to one or two cycles.
The classification results from MATLAB simulations and from the DSP system (both 12-bit and 14-bit) are presented in Table I. MATLAB uses 64-bit double-precision calculations, but the presented system uses 12-bit precision.
C. Discussions
The ADC daughter card allows for rapid evaluation of different ADCs with a host DSK. However, limitations due to processor context changes for interrupt service routines occur and typically limit these ADCs to 1 MSPS (without Direct Memory Access ports). For the purposes of this paper, this was not an issue. However, this DSK may not be adequate for the high sampling rates associated with power protection applications. A custom printed circuit board would be required for such an application, and non-interrupt-based techniques, such as polling, would most likely be required to manage context switching delays. Programmatically, the optimizations enacted to yield faster algorithm performance could be carried further with the use of all-assembly-language routines and intrinsic functions. Average calculation times could also be decreased by performing only the portion of the algorithm required for each classification step and checking the neural network output immediately. While this would yield faster average computation times, it would increase the worst-case execution time, as the function calls to perform these short queries would slow the DFT step even further. The assumption that only one power quality class will occur within a five-cycle window introduces the possibility of inaccurate classification.
VI. CONCLUSIONS
A DSP-based hardware monitoring system for power quality event identification is presented in this paper. The algorithm is implemented on a Texas Instruments (TI) TMS320VC5416 digital signal processor (DSP) with the TI THS1206 12-bit 6 MSPS analog-to-digital converter. In the algorithm, classification-optimal TFRs are designed by selecting features from the time-frequency ambiguity plane based on Fisher's criterion. Four classifiers, linear and neural-network based, are used. The algorithm is optimized for the architecture of the DSP to meet the real-time constraints of classifying a five-cycle segment of the 60 Hz sinusoidal voltage/current signals in power systems.
The proposed system is successfully tested with a five-class PQ classification experiment. A waveform window of 83 ms (640 sample points) can be classified in 10.9 ms. Recognition rates of 96.5% with a 14-bit ADC and 95.0% with a 12-bit ADC are achieved on 860 testing PQ waveforms. The real-time power quality monitoring system has potential applications in enhancing power system protection and accumulating PQ event statistics for power quality assessment.
