
    Modelling of 3-Phase p-q Theory-Based Dynamic Load for Real-Time Simulation

    This article proposes a new method of modelling dynamic loads based on instantaneous p-q theory, to be employed in large power system network simulations in a digital real-time environment. Due to the use of computationally heavy blocks such as phase-locked loops (PLL), mean calculation, and coordinate transformation blocks (e.g., abc–dq0), real-time simulation of large networks with dynamic loads can be challenging. In order to decrease the computational burden associated with dynamic load modelling, a p-q theory-based approach for load modelling is proposed in this paper. This approach is based on the well-known instantaneous p-q theory developed for power electronics converters, and it consists only of linear controllers and a minimal number of control loops, reducing the required computational power. This improves real-time performance and allows larger-scale simulations. The introduced p-q theory-based load (PQL) model has been tested on standard networks implemented in a digital real-time simulator, such as the SimBench semi-urban medium-voltage network and the 118-bus Distribution System, showing significant improvement in computational capability with respect to standard load models (e.g., the MATLAB/Simulink dynamic load).
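    The core quantities of such a model come from the instantaneous p-q theory the abstract references; a minimal sketch follows (the power-invariant Clarke scaling is an assumption, and the paper's actual controller structure is not reproduced here):

```python
import math

def clarke(a, b, c):
    """Power-invariant Clarke (abc -> alpha-beta) transform."""
    alpha = math.sqrt(2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = math.sqrt(2.0 / 3.0) * (math.sqrt(3.0) / 2.0) * (b - c)
    return alpha, beta

def instantaneous_pq(v_abc, i_abc):
    """Instantaneous real power p and imaginary power q from
    three-phase voltage and current samples (no PLL, no mean blocks)."""
    va, vb = clarke(*v_abc)
    ia, ib = clarke(*i_abc)
    p = va * ia + vb * ib   # instantaneous real power
    q = vb * ia - va * ib   # instantaneous imaginary power
    return p, q
```

    For a balanced set with current in phase with voltage, p is constant and q is zero, which is what makes the formulation attractive for a load model without heavy filtering blocks.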

    Synthetic Aperture Radar Algorithms on Transport Triggered Architecture Processors using OpenCL

    Live SAR imaging from small UAVs is an emerging field. On-board processing of the radar data requires high-performance, energy-efficient platforms; one candidate is the Transport Triggered Architecture (TTA) processor. We implement Backprojection and Backprojection Autofocus using OpenCL on a TTA processor specially adapted for this task. The resulting implementation is compared to other platforms in terms of energy efficiency. We find that the TTA is on par with embedded GPUs and surpasses other OpenCL-based platforms; it is outperformed only by a dedicated FPGA implementation.
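    Time-domain Backprojection, the algorithm accelerated here, amounts to summing range-compressed samples with two-way carrier-phase compensation for every image pixel. A toy pure-Python sketch follows (the nearest-bin sampling, range spacing, and interface are illustrative assumptions, not the paper's OpenCL kernel):

```python
import cmath, math

def backprojection(pulses, positions, grid, fc=1.0e9, c=3.0e8, dr=0.5, r0=0.0):
    """Toy time-domain backprojection.
    pulses[k][n]: complex range-compressed sample n of pulse k
    positions[k]: (x, y, z) antenna position for pulse k
    grid: list of (x, y, z) pixel centres
    Returns one complex reflectivity value per pixel."""
    image = []
    for px in grid:
        acc = 0j
        for pulse, pos in zip(pulses, positions):
            r = math.dist(pos, px)             # antenna-to-pixel range
            n = int(round((r - r0) / dr))      # nearest range bin
            if 0 <= n < len(pulse):
                # undo the two-way carrier phase before coherent summation
                acc += pulse[n] * cmath.exp(4j * math.pi * fc * r / c)
        image.append(acc)
    return image
```

    The per-pixel independence of the outer loop is what makes the algorithm map well onto parallel hardware such as TTAs, GPUs, and FPGAs.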

    Two fast and accurate routines for solving the elliptic Kepler equation for all values of the eccentricity and mean anomaly

    Context. The repetitive solution of Kepler's equation (KE) is the slowest step for several highly demanding computational tasks in astrophysics. Moreover, a recent work demonstrated that current solvers face an accuracy limit that becomes particularly stringent for high-eccentricity orbits. Aims. Here we describe two routines, ENRKE and ENP5KE, for solving KE with both high speed and optimal accuracy, circumventing the abovementioned limit by avoiding the use of derivatives for the critical values of the eccentricity e and mean anomaly M, namely e > 0.99 and M within 0.0045 rad of the periapsis. Methods. The ENRKE routine enhances the Newton-Raphson algorithm with a conditional switch to the bisection algorithm in the critical region, an efficient stopping condition, a rational first guess, and one fourth-order iteration. The ENP5KE routine uses a class of infinite series solutions of KE to build an optimized piecewise quintic polynomial, also enhanced with a conditional switch for close bracketing and bisection in the critical region. High-performance Cython routines are provided that implement these methods, with the option of parallel execution. Results. These routines outperform other KE solvers in both accuracy and speed. They solve KE for every e ∈ [0, 1 − ϵ], where ϵ is the machine epsilon, and for every M, at the best accuracy that can be obtained in a given M interval. In particular, since the ENP5KE routine involves no transcendental function evaluations in its generation phase, apart from a minimal number in the critical region, it outperforms any other KE solver, including ENRKE, when the solution E(M) is required for a large number N of values of M. Conclusions. The ENRKE routine can be recommended as a general-purpose solver for KE, and ENP5KE can be the best choice in the large-N regime. Funding: Axencia Galega de Innovación; Agencia Estatal de Investigación | Ref. FIS2017-83762-
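    The Newton-Raphson-with-bisection-switch idea behind ENRKE can be sketched as below (a simplified illustration only; the actual routine adds a rational first guess, a fourth-order iteration, and an optimized stopping condition that are not reproduced here):

```python
import math

def kepler_E(M, e, tol=1e-14, max_iter=50):
    """Solve Kepler's equation E - e*sin(E) = M by Newton-Raphson,
    falling back to bisection whenever the Newton step leaves the
    bracket, as happens near e -> 1, M -> 0 (the critical region)."""
    M = math.fmod(M, 2.0 * math.pi)
    if M < 0.0:
        M += 2.0 * math.pi
    lo, hi = 0.0, 2.0 * math.pi       # f(lo) <= 0 <= f(hi)
    E = M + e * math.sin(M)           # simple first guess
    for _ in range(max_iter):
        f = E - e * math.sin(E) - M
        if abs(f) < tol:
            return E
        if f > 0.0:                   # tighten the bracket
            hi = E
        else:
            lo = E
        E -= f / (1.0 - e * math.cos(E))   # Newton step
        if not (lo < E < hi):              # Newton escaped: bisect
            E = 0.5 * (lo + hi)
    return E
```

    The bisection safeguard is what keeps the solver robust where the derivative 1 − e·cos(E) nearly vanishes and a bare Newton iteration would overshoot.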

    Improving low latency applications for reconfigurable devices

    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. 
Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.
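    Memory-based computing, as targeted by the ring-based BRAM architecture, replaces the evaluation of a complex function with a constant-time table read. A software sketch of the idea follows (the table size and linear interpolation scheme are illustrative assumptions, not the thesis's design):

```python
import math

def build_table(fn, lo, hi, n):
    """Precompute fn over [lo, hi] into n entries, i.e. the contents
    that would be loaded into a block memory (BRAM)."""
    step = (hi - lo) / (n - 1)
    return [fn(lo + i * step) for i in range(n)], lo, step

def lookup(table, lo, step, x):
    """O(1) table read with linear interpolation between entries,
    standing in for a single-cycle BRAM access plus a small datapath."""
    pos = (x - lo) / step
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])
```

    Accessing many such tables independently and in parallel, rather than as one monolithic memory, is the property the ring-based architecture exploits for low-latency shared access.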

    Implementation of a real time Hough transform using FPGA technology

    This thesis is concerned with the modelling, design and implementation of efficient architectures for performing the Hough Transform (HT) on mega-pixel resolution real-time images using Field Programmable Gate Array (FPGA) technology. Although the HT has been around for many years and a number of algorithms have been developed, it still remains a significant bottleneck in many image processing applications. Although the basic idea of the HT is to locate curves in an image that can be parameterized in a suitable parameter space (e.g. straight lines, polynomials or circles), the research presented in this thesis focuses only on the location of straight lines in binary images. The HT algorithm uses an accumulator array (accumulator bins) to detect the existence of a straight line in an image. As the image needs to be binarized, a novel generic synchronization circuit for windowing operations was designed to perform edge detection. An edge detection method of special interest, the Canny method, is used, and its design and implementation in hardware are presented in this thesis. As each image pixel can be processed independently, parallel processing can be performed. However, the main disadvantage of the HT is its large storage and computational requirements. This thesis presents new, state-of-the-art hardware implementations that minimize the computational cost by using the Hybrid-Logarithmic Number System (Hybrid-LNS) to calculate the HT for fixed bit-width architectures. It is shown that using the Hybrid-LNS minimizes the computational cost while maintaining the precision of the HT algorithm. Advances in FPGA technology now make it possible to implement functions such as the HT in reconfigurable fabrics. Methods for storing large arrays on FPGAs are presented, allowing data from a 1024 x 1024 pixel camera to be processed at rates of up to 25 frames per second.
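    The accumulator-based line detection at the heart of the HT can be sketched in software as follows (the angle and distance quantization choices are illustrative; the thesis's Hybrid-LNS arithmetic is not reproduced):

```python
import math

def hough_lines(points, width, height, n_theta=180, n_rho=None):
    """Vote in (theta, rho) parameter space for a set of edge pixels.
    Each line satisfies rho = x*cos(theta) + y*sin(theta); collinear
    pixels pile votes into the same accumulator bin."""
    rho_max = math.hypot(width, height)
    if n_rho is None:
        n_rho = int(2 * rho_max) + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    for t in range(n_theta):
        theta = t * math.pi / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = x * c + y * s
            r = int(round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)))
            acc[t][r] += 1
    return acc
```

    The accumulator array's size (n_theta x n_rho bins, each a counter) is exactly the large storage requirement the abstract identifies as the HT's main disadvantage.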

    Error Analysis of CORDIC Processor with FPGA Implementation

    The coordinate rotation digital computer (CORDIC) is a fast, shift-and-add-based computing algorithm used in many digital signal processing (DSP) applications. In this paper, a detailed error analysis based on the mean square error criterion, together with an FPGA implementation, is presented. The two error sources considered are the angle approximation error and the quantization error due to finite word length in a fixed-point number system. The error bound and variance are discussed in theory. The CORDIC algorithm is implemented on an FPGA using the Xilinx Zynq-7000 development board (ZedBoard), and the theoretical error analysis is verified against the errors measured on the actual board. In addition, MATLAB, set up in double-precision floating point, is used to provide theoretical baseline values for comparison with the practical errors of the FPGA implementation.
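    The CORDIC rotation iterations whose errors the paper analyzes can be sketched as below. This is a floating-point model: the angle approximation error is the residual rotation angle left after a fixed number of iterations, while the quantization error studied in the paper would additionally come from the fixed-point word length (not modelled here):

```python
import math

def cordic_rotate(x, y, angle, n=24):
    """Rotate (x, y) by `angle` (rad, within CORDIC's ~+/-1.74 rad
    convergence range) using only shift-and-add style updates.
    K compensates the gain accumulated by the pseudo-rotations."""
    K = 1.0
    z = angle                      # remaining angle to rotate through
    for i in range(n):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    # residual z is the angle approximation error, bounded by atan(2^-(n-1))
    return K * x, K * y
```

    Starting from (1, 0), the outputs converge to (cos(angle), sin(angle)), which is how CORDIC evaluates trigonometric functions in hardware with no multipliers.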

    Equalization in Dispersion-Managed Systems Using Learned Digital Back-Propagation

    In this paper, we investigate the use of learned digital back-propagation (LDBP) for equalizing dual-polarization fiber-optic transmission in dispersion-managed (DM) links. LDBP is a deep neural network that optimizes the parameters of DBP using stochastic gradient descent. We evaluate DBP and LDBP in a simulated WDM dual-polarization fiber transmission system operating at a bitrate of 256 Gbit/s per channel, with a dispersion map designed for a 2016 km link with 15% residual dispersion. Our results show that in single-channel transmission, LDBP achieves an effective signal-to-noise ratio improvement of 6.3 dB and 2.5 dB over linear equalization and DBP, respectively. In WDM transmission, the corresponding Q-factor gains are 1.1 dB and 0.4 dB, respectively. Additionally, we conduct a complexity analysis, which reveals that a frequency-domain implementation of LDBP and DBP is more favorable in terms of complexity than the time-domain implementation. These findings demonstrate the effectiveness of LDBP in mitigating the nonlinear effects in DM fiber-optic transmission systems.

    Design of Hardware Accelerators for Optimized and Quantized Neural Networks to Detect Atrial Fibrillation in Patch ECG Device with RISC-V

    Atrial Fibrillation (AF) is one of the most common heart arrhythmias and is known to cause up to 15% of all strokes. Modern detection systems for arrhythmias, such as single-use patch electrocardiogram (ECG) devices, have to be energy efficient, small, and affordable. In this work, specialized hardware accelerators were developed. First, an artificial neural network (NN) for the detection of AF was optimized, with special attention paid to the minimum requirements for inference on a RISC-V-based microcontroller. Hence, a 32-bit floating-point-based NN was analyzed. To reduce the silicon area needed, the NN was quantized to an 8-bit fixed-point datatype (Q7). Based on this datatype, specialized accelerators were developed, including single-instruction multiple-data (SIMD) hardware as well as accelerators for activation functions such as the sigmoid and hyperbolic tangent. To accelerate activation functions that require the exponential function as part of their computation (e.g., softmax), an e-function accelerator was implemented in hardware. To compensate for the losses from quantization, the network was expanded and optimized for run-time and memory requirements. The resulting NN has a 7.5% lower run-time in clock cycles (cc) without the accelerators and 2.2 percentage points (pp) lower accuracy compared to the floating-point-based net, while requiring 65% less memory. With the specialized accelerators, the inference run-time was lowered by 87.2% while the F1-score decreased by 6.1 pp. By implementing the Q7 accelerators instead of the floating-point unit (FPU), the silicon area needed for the microcontroller in 180 nm technology stays below 1 mm².
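    The Q7 fixed-point datatype used for the quantized NN maps floats in [-1, 1) onto 8-bit signed integers with 7 fraction bits. A minimal sketch of the quantization and a rounded Q7 multiply follows (the accelerator's exact rounding and saturation behaviour is an assumption):

```python
def to_q7(x):
    """Quantise a float in [-1, 1) to Q7: 8-bit signed, 7 fraction
    bits, saturating at the representable range [-128, 127]."""
    return max(-128, min(127, int(round(x * 128))))

def q7_mul(a, b):
    """Q7 * Q7 -> Q7. The raw product is Q14; adding 64 before the
    7-bit right shift rounds instead of truncating (assumed mode)."""
    return max(-128, min(127, (a * b + 64) >> 7))
```

    Replacing 32-bit floating-point multiply-accumulates with such 8-bit operations is what shrinks both the memory footprint and the silicon area reported in the abstract.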

    Efficient FPGA Implementation of Recursive Least Square Adaptive Filter Using Non-Restoring Division Algorithm

    In this paper, Recursive Least Square (RLS) and Affine Projection (AP) adaptive filters are designed using Xilinx System Generator and implemented on the Spartan-6 xc6slx16-2csg324 FPGA platform. The implementation uses the non-restoring division algorithm and the COordinate Rotation DIgital Computer (CORDIC) division algorithm to perform the division tasks of the RLS and AP adaptive filters. The non-restoring division algorithm demonstrates efficient performance in terms of convergence speed and signal-to-noise ratio; moreover, whereas the CORDIC division algorithm requires 31 cycles for division initialization, the non-restoring algorithm initializes division in just one cycle. To validate the effectiveness of the proposed filters, a set of ten ECG records from the MIT-BIH database is used to test their ability to remove Power Line Interference (PLI) noise from the ECG signal. The proposed adaptive filters are compared with various adaptive algorithms in terms of Signal-to-Noise Ratio (SNR), convergence speed, residual noise, steady-state Mean Square Error (MSE), and complexity.
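    Non-restoring division, which gives the design its single-cycle initialization advantage, produces one quotient bit per step with a single add or subtract and corrects the remainder only once at the end. A bit-level software sketch follows (word width and interface are illustrative, not the paper's hardware design):

```python
def nonrestoring_divide(dividend, divisor, bits=16):
    """Non-restoring division of unsigned integers: unlike restoring
    division, a negative partial remainder is carried forward (adding
    the divisor next step) rather than undone immediately."""
    assert 0 <= dividend < (1 << bits) and divisor > 0
    r, q = 0, 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((dividend >> i) & 1)   # shift in next bit
        if r >= 0:
            r -= divisor
        else:
            r += divisor
        q = (q << 1) | (1 if r >= 0 else 0)    # quotient bit
    if r < 0:                                  # single final correction
        r += divisor
    return q, r
```

    One add/subtract per bit, with no per-step restore, is what makes the scheme map to a compact iterative FPGA datapath.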

    Field programmable gate array hardware in the loop validation of fuzzy direct torque control for induction machine drive

    Introduction. Currently, direct torque control is very popular in industry and is of great interest to scientists working on variable-speed drives for asynchronous machines. This technique provides decoupling between torque control and flux control without the need for pulse width modulation or coordinate transformation. Nevertheless, it presents two major drawbacks: the switching frequency is highly variable on the one hand, and on the other hand, the amplitude of the torque and stator-flux ripples remains poorly controlled throughout the considered operating speed range. The novelty of this article lies in improving the performance of direct torque control of asynchronous machines through the development of a fuzzy direct torque control algorithm. This algorithm provides solutions to the major problems of this control technique, namely torque ripples, flux ripples, and the failure to control the switching frequency. Purpose. The emergence of this method has given rise to various works whose objective is to demonstrate its performance or to provide solutions to its limitations. This work consists in the validation of a fuzzy direct torque control architecture implemented on the ML402 development kit (based on a Xilinx Virtex-4 field programmable gate array), using the hardware description language VHDL and Xilinx System Generator. The obtained results showed the robustness of the sensorless control against load and parameter variations of the induction motor. Research directions for the model were determined for the subsequent implementation of the results with simulation samples.