Abstract-This paper describes a mixed-platform framework dedicated to Dynamic Stability Assessment of power systems. DSA refers to tools capable of characterizing the dynamic stability of the system. Time domain simulation is critical for DSA analysis and is done by algorithms known as TD engines. In this work, operations are shared between a software platform and a hardware one. TD simulation is handled by a dedicated mixed-signal electronics implementation. Data flow control, user interfacing, configuration, result post-processing and other auxiliary operations are realized in software. This architecture combines the flexibility of the software with the highperformance of dedicated hardware. Results of a multicontingency analysis and a critical clearing time determination analysis for sample test cases are presented. It is demonstrated that an increase in speed of almost three orders of magnitude can be achieved, compared to single-platform solutions.
INTRODUCTION
The ongoing major restructuring of power systems has led to them being operated closer to their limits [1] , [9] . Therefore stability concerns have been raised, and while security is an inherent function of the system, secure operation can be facilitated by the availability of adequate analytical tools [2] .
One such tool is Dynamic Stability Assessment (DSA), which is concerned with the quantitative/qualitative characterization of the ability of the system to retain a state of operating equilibrium after being subjected to severe disturbances. In such transient events examined during DSA studies, the behavior of the system is described by complex, highly non-linear differential-algebraic equations (DAEs), which are handled by specific solvers known as time-domain (TD) engines. The bottleneck of traditional software-onlybased solutions lies in the computational complexity of the TD simulation.
This work proposes an alternative mixed-platform architecture. A software platform is responsible for peripheral tasks, and the TD simulation proper is handled by a dedicated hardware platform. Itself, the hardware uses a mixed-signal architecture: algebraic grid equations are solved on an analog resistor lattice, and differential nodal injection equations (generators, loads, etc.) are solved digitally on specially designed reconfigurable hardware. In this way the flexibility of software is combined with the high performance of dedicated hardware.
The rest of the paper is organized as follows. Section II gives the conceptual and theoretical background of the proposed framework. Section III presents the system that has been implemented in the Electronics Laboratory in EPFL. In section IV the application of the latter on a sample test case is showcased, and results are discussed. Finally, V concludes this paper, highlighting the prospect of the proposed architecture.
II. PROPOSED ARCHITECTURE
Recent trend in the operation of power systems is that emergency acting is gradually overtaking prevention because of the conservativeness and thereof incurred costs of the latter. Emergency acting suggests for the availability of online and real-time DSA analysis [3] . This requires a massive number of TD simulations to be solved in a small amount of time. It is this computational burden involved in full-system scale DSA that hinders the development of true online real-time implementations. It will be shown in section IV that the proposed architecture lifts the computational bottleneck and enables such approaches.
A. TD simulation fundamentals
The behavior of the power system is governed by a set of parametric differential-algebraic equations. The linear components of the system, such as transmission lines, transformers, and shunt capacitors, form the augmented transmission grid, in red in fig. 1 . Non-linear components of the system are nodal injections to the grid, shown in green in fig. 1 If the study is confined to electromechanical phenomena in the order of Hz, then the fast continuous dynamics of the system, such as electromagnetic phenomena with time constants in the order of milliseconds, can be neglected. Also, given that signals in power systems are periodical and mostly sinusoidal, the phasor representation is used. The system frequency is taken as a reference and signals are expressed as complex quantities in that frame of reference.
With this assumptions, simple algebraic equations can describe the behavior of the grid ܻ. The latter can be seen as a system where the inputs are complex nodal current injections ‫ܫ‬ and the outputs are nodal voltages magnitudes and angles ܸ.
The injections are coupled with the grid in the form of pairs of voltages ("input" to the injection) and current injections ("output" of the dynamic behavior of the injection). The behavior of injection ݅ is described by a set of state variables ‫ݔ‬ and instantaneous variables ‫ݕ‬ . The first are governed by differential and the latter by algebraic equations, both of which are parametrized by a parameter set ߣ .
The complete system of equations (1) and (2) for all ݅, can be solved in a number of different ways. One of them, following the partitioned solution scheme, is presented schematically in fig. 2 . At each time step of the integration, the nodal flow algebraic equation of the grid is solved and then the dynamic behavior of the injections is determined.
B. Parallelization 1) Grid equations:
Mapping the topology of a power grid on a miniaturized electronic implementation has been proved feasible in [4] - [7] Starting from (1):
Neglecting the conductance under the assumption ‫‖ܩ‖‬ ≪ ‫‖ܤ‖‬ for the transmission system:
In (4), ‫ܤ‬ defines two identical separate grids that link voltage to current quantities. These two grids are built using reconfigurable discrete electronic elements.
Inputs to this analog computer are the nodal current injections, outputs are the nodal voltages and the "computing" force is the laws of Ohm and Kirchhoff. This way, immense parallelization is achieved and the solution of the system (1) is available with a complexity of ܱ(1). Comparatively, digital solvers have a complexity of ܱ(݊ ଷ ) for dense matrices or ܱ(݊ ଵ.௫ ) for favorably structured sparse matrices. Provided that node injections ‫ܫ‬ are handled instantaneously, the speed of this calculation does not depend on the size of the system. Preposterously, it is mentioned here that for the implementation described in section III, the solution of a ‫ܫ‬ = ܻ ܸ system of size up to 96 is attained in a constant time of ‫ݐ‬ ௦ = 200 ‫.ݏ݊‬
2) Injection equations:
In the partitioned scheme of fig. 2 , the solution of the equations of each injection is independent from other injections. The latter strongly suggests parallelization of that part of the solution procedure.
The set of injections ‫ܧ‬ is partitioned in ‫ܭ‬ equivalence classes ‫ܧ‬ of structurally similar equations. For all ݅, ݆ members of class ‫ܧ‬ , the following holds.
Where ߣ , ݇ = 1 … ‫ܭ‬ is selected so that ∃߰ : ߣ = ߰ (ߣ )∀݉ ∈ ‫ܧ‬ . The above relation means that differences between DAEs of in-class injections are limited to parameters ߣ. A simple example for the above is a set of generators modeled using the classical generator model. They are all put under the same equivalence class ‫ܧ‬ , and they are all described using a similar set of DAEs (the swing equation), adapted to the different parameters of each individual generator (mechanical starting time and transient impedance). In this work, parallelization according to the above partitioning scheme is suggested: there exists one computational module ‫ܥ‬ per equivalence class ‫ܧ‬ that is able to process injections of that class.
To further enhance performance, ‫ܥ‬ computations are performed in a pipelined fashion. At each pipeline clock cycle the parameters of the system to be solved are iterated across class injections ߣ , ݅ ∈ ‫ܧ‬ . As a result the system ܵ (ߣ ) is solved, for all the injections of the class. After all pipelined computations are finished, the nodal injections are concurrently updated and grid computations are performed for the next time step, as per section II-B1.
III. IMPLEMENTATION
An implementation of the concept sketched in section II has been realized. Fig. 3 presents the architecture of the system.
A. Hardware
The hardware platform, called a Field Programmable Power Network System (FPPNS), is a multi-layered system itself [5] . It consists of slices that can be interconnected in a stacked fashion. In this implementation, there are four slices. Each of them has a capacity of 24 power system nodes, giving the system a total capacity of 96 nodes. A photo of the complete hardware can be seen in fig. 5 . The analog part is implemented using discrete electronic elements. Reconfigurable resistors and switches allow for the mapping of almost any arbitrary power grid to the topology of the emulator. Two identical resistor lattices are realized, implementing a scaled version of matrix ‫ܤ‬ in (4) . Interfacing between the nodes of this analog grid and the FPGA is done using Analog-to-Digital (A/D) and Digital-to-Analog (D/A) converters.
B. Communication
The communication layer spans between both platforms. On the hardware side, a Cypress CY7C68013 USB 2.0 controller is interfaced to the FPGA via a two-port RAM. On the software side, a wrapper dynamic-link library (DLL) has been created. The protocol used to communicate commands and data between the two platforms is a flag-polled shared memory scheme. It is shown in the results section IV that communication is the bottleneck of the system.
C. Software
A standalone application has been created in C++. This does not aspire to be an EMS-DSA replacement, but is In the frontend, the Graphical User Interface (GUI) is based on the QT framework and the Qwt widget library. On the GUI there are environments dedicated to the creation of the power system, to the definition of the set of scenarios to investigate, and to the analysis of the scenarios. An interface is also provided, where results of various analyses can be realtime graphically superimposed on the schematic of the system under study.
In the backend, a TD-engine API has been created. TDengine implementations can register to the application, provided they respect the API. Registered engines are fully interchangeable in runtime. Two implementations have been realized: one purely in software and one encapsulating the dedicated hardware of section III-A.
In the software implementation, the set of DAEs is solved in a partitioned fashion, i.e. the solution iterates between differential ‫ݔ‬ and algebraic ‫ݕ‬ variables. The former are handled using a 4-stage explicit Runge-Kutta integration scheme. Resulting linear systems are solved using LU decomposition with partial pivoting. This engine has been created for comparison purposes only.
The TD-engine implementation that encapsulates the hardware is actually a Hardware Abstraction Layer (HAL) that respects the TD-engine API. A block diagram of it can be seen in fig. 4 . The HAL is runtime interconnected to the actual dedicated hardware. A short description of the most important modules is given hereunder.
x The Calibration module is responsible for performing a series of automated tests to calibrate the finiteprecision reconfigurable elements of the hardware. x The Mapper deals with the graph translation of the real topology of a given arbitrary grid to the constrained topology of the emulator. x The Fitter translates real power system parameters to corresponding scaled parameters of the hardware.
x The USB communication module refers to how communication is implemented on the PC side -see also section III-B. Each task assigned to the HAL involves a series of handshaked communication runs with the hardware to complete.
From this point on, the hardware-based TD-engine implementation will be referred to as the emulator.
D. Limitations
There are a number of limitations in the implementation of section III. The most important ones are listed here.
x Limited number of available nodes x Neglecting of conductance ‫ܩ‬ of the transmission lines x Lack of support for a full 3φ model of the grid x Limited accuracy due to finite precision hardware x Lack of advanced models of nodal elements x Support of a narrow spectrum of events It is the firm intent of the authors to push this concept forward and tackle challenges listed hereinabove.
IV. RESULTS
A demonstration of the use of the proposed architecture is given in this section. Results for sample test systems of 5, 18, 36 and 59 buses are presented.
A. Single TD simulation
The 59-bus system is a modified version of the IEEE 57 bus test system. The response of the system to an indicative 3φ fault has been simulated for 6 ‫,ݏ‬ using a time-step of ‫ݐ‬ ௦௧ ௦௪ = 10 ‫ݏ݉‬ for the software engine and ‫ݐ‬ ௦௧ ௪ = 61.035 ‫ݏݑ‬ for the emulator. Fig. 5 shows a comparison of machine internal angles coming from the two engines. Fig. 6 shows the timing break-up of a single emulator run for different system sizes. The pie-charts show how time is divided between the software platform ("pc"), the hardware platform ("hw") and the inter-platform communication ("comm"). As the full set of internal angles of all machines is requested and communicated back to the software, communication requirements increase (linearly) with the number of machines in the system. It can be easily verified that communication is the performance bottleneck of the proposed architecture. Time spent on software (for preconditioning and other auxiliary tasks) and on hardware (for the computation proper) remains almost the same for all system sizes. The detrimental effect of communication becomes less pronounced when the results requested can be in a compact form. This is normally the case for a multitude of DSA analyses; two examples follow.
B. Branch contingency analysis
During an N-1 contingency analysis, the system operator is interested in knowing whether the system retains or not its stability, after an outage/perturbation is applied on each one of its elements. Therefore, in this case the answer from the hardware platform can be communicated with a simple boolean statement for each of the scenarios (stable/unstable), thus greatly reducing communication requirements.
A variation of this procedure was applied to different test systems. A set of branches was selected to be examined and perfect 3φ faults were applied in the middle of each branch for 200 ‫.ݏ݉‬ The system was simulated for 4 ‫,ݏ‬ using a time-step of ‫ݐ‬ ௦௧ ௦௪ = 10 ‫ݏ݉‬ for the software and ‫ݐ‬ ௦௧ ௪ = 244.1 ‫ݏݑ‬ for the hardware-based engine. Stability was determined based on angle separation of the machines in the system. Table Ι summarizes the timing results for the software-only (SO) and the mixed-platform (MP) solution. 
C. Critical Clearing Time determination
Critical Clearing Time (CCT) determination is concerned with the maximum sustained duration of a fault, for which the system retains its stability. This is usually done using a binary search over a predefined time window. The CCT value is approximated by an upper and a lower bound, the difference of which needs to be under an asked-for precision.
A CCT algorithm similar to the one described here, was implemented on the FPGA. Faults considered were perfect 3φ fugitive faults in the middle of the branches examined in the previous section. A binary search was performed for each fault location, in the search window ‫ݐ‬ ௦ ∈ [0.0 ‫,ݏ‬ 1.5 ‫,]ݏ‬ and a precision of ‫ݐ‬ = 10 ‫ݏ݉‬ was requested. Timing results for the software-only (SO) and the mixed-platform (MP) architectures are summarized in table ΙΙ. Simulation options are the same as in the previous section.
It is clear that in both analyses, the performance of the mixed-platform approach is superior by almost three orders of magnitude, compared to the software-only solution. In both cases, results are available online, in exactly the same software environment. Notice that if a larger time-step was selected for the emulator, the time spent on it would be reduced accordingly. In this way much greater speed-ups can be achieved.
V. CONCLUSIONS
The prospect of mixed-platform DSA systems as the one introduced in this paper is very promising. Especially so, as dynamic instability is usually the result of a chain of "blackswan" events, rather than of a single big-scale catastrophic incident [8] . The investigation of multiple contingency scenarios, such as in the form of ݊ − ݇ outages, is enabled with the use of the hybrid architecture described in this work.
Furthermore, the advent of the smart grid reshapes the nature of the transmission and principally of the distribution system. Emerging concepts such as Active Distribution Networks (ADN), microgrids and virtual power plants are self-organizing, self-regulating electrical micro-worlds that stress for distributed intelligence [9] . The pervasiveness of intelligence implied there, strongly suggests the use of dedicated electronics, and computational architectures similar to the one proposed in this work.
