
    An Algebraic Approach to Switch-Level Simulation

    Coordinated Science Laboratory (formerly known as Control Systems Laboratory). Semiconductor Research Corp. / SRC 88-DP-10

    Probabilistic Integral Circuits

    Continuous latent variables (LVs) are a key ingredient of many generative models, as they allow modelling expressive mixtures with an uncountable number of components. In contrast, probabilistic circuits (PCs) are hierarchical discrete mixtures represented as computational graphs composed of input, sum and product units. Unlike continuous LV models, PCs provide tractable inference but are limited to discrete LVs with categorical (i.e., unordered) states. We bridge these model classes by introducing probabilistic integral circuits (PICs), a new language of computational graphs that extends PCs with integral units representing continuous LVs. PICs are symbolic computational graphs and are fully tractable in simple cases where analytical integration is possible. In practice, we parameterise PICs with lightweight neural networks, yielding an intractable hierarchical continuous mixture that can be approximated arbitrarily well by large PCs using numerical quadrature. On several distribution estimation benchmarks, we show that such PIC-approximating PCs systematically outperform PCs commonly learned via expectation-maximization or SGD.
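    As a hedged illustration of the quadrature idea in this abstract (not the authors' implementation), the sketch below approximates a single integral unit p(x) = ∫ p(x|z) p(z) dz by a finite weighted mixture, i.e. a sum unit over input units evaluated at quadrature nodes; the Gaussian densities, node count, and Gauss-Legendre rule are assumptions made for the example.

# Minimal sketch: approximate an integral unit by a finite mixture (sum unit)
# using numerical quadrature. Gaussian densities and Gauss-Legendre nodes are
# illustrative assumptions, not the paper's parameterisation.
import numpy as np
from scipy.stats import norm

def integral_unit(x, n_nodes=32, z_range=(-5.0, 5.0)):
    """Approximate p(x) = integral of N(x | z, 1) * N(z | 0, 1) dz by quadrature."""
    # Gauss-Legendre nodes/weights on [-1, 1], rescaled to z_range.
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    a, b = z_range
    z = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    w = 0.5 * (b - a) * weights
    # Sum unit: mixture weights w_k * p(z_k); input units: p(x | z_k).
    mix_weights = w * norm.pdf(z, loc=0.0, scale=1.0)
    components = norm.pdf(x[:, None], loc=z[None, :], scale=1.0)
    return components @ mix_weights

if __name__ == "__main__":
    x = np.linspace(-3, 3, 5)
    # The exact marginal here is N(x | 0, sqrt(2)); the quadrature mixture should match it.
    print(integral_unit(x))
    print(norm.pdf(x, loc=0.0, scale=np.sqrt(2.0)))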

    Sparse matrix based power flow solver for real-time simulation of power system

    Analyzing a massive number of Power Flow (PF) equations, even on nearly identical or similar network topologies, is a highly time-consuming process for large-scale power systems. Most of the computation time is consumed by the iterative linear solves required to drive the nonlinear equations to convergence, which is a major concern for any PF analysis method. This thesis presents a sparse-matrix-based power flow solver that is fast enough to be used in the real-time analysis of large-scale power systems. It uses KLU, a sparse matrix solver, for PF analysis. It also employs parallel processing on the CPU and GPU, which enables the simultaneous computation of multiple blocks in the algorithm and leads to faster execution. It runs 1000 times and 200 times faster than the Newton-Raphson method for DC and AC power systems, respectively. On average, it is around 10 times faster than MATPOWER for both AC and DC power systems.
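    As a minimal illustration of the sparse linear solve at the core of power flow analysis, the sketch below solves the DC power flow equations B'θ = P for a made-up 4-bus network, using SciPy's sparse LU factorisation as a stand-in for the KLU solver named in the abstract; the susceptances and injections are invented values.

# Minimal DC power flow sketch with a sparse LU solve (splu as a stand-in for KLU).
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Lines as (from_bus, to_bus, susceptance) for a small illustrative 4-bus network.
lines = [(0, 1, 10.0), (0, 2, 8.0), (1, 2, 5.0), (2, 3, 12.0)]
n_bus = 4
B = np.zeros((n_bus, n_bus))
for i, j, b in lines:
    B[i, i] += b
    B[j, j] += b
    B[i, j] -= b
    B[j, i] -= b

# Net injections in per unit; bus 0 is the slack bus and is removed from the solve.
P = np.array([0.0, 0.5, -0.3, -0.2])
B_red = csc_matrix(B[1:, 1:])

# In repeated real-time solves the factorisation is reused across similar
# topologies, which is where sparse solvers such as KLU gain their speed.
lu = splu(B_red)
theta = np.concatenate([[0.0], lu.solve(P[1:])])   # bus voltage angles (rad)
flows = [(i, j, b * (theta[i] - theta[j])) for i, j, b in lines]
print("angles:", theta)
print("line flows:", flows)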

    Near Deterministic Signal Processing Using GPU, DPDK, and MKL

    RÉSUMÉ: In software-defined radio, digital signal processing requires real-time processing of data and signals. Moreover, in the development of wireless communication systems based on the Long Term Evolution (LTE) standard, real-time operation and low-latency computation are essential for a good user experience. Since computational latency is a key concern in LTE processing, we want to explore whether graphics processing units (GPUs) can be used to accelerate it. To this end, we investigate NVIDIA GPU technology with the Compute Unified Device Architecture (CUDA) programming model to reduce the computation time associated with LTE processing. We briefly present the CUDA architecture and GPU parallel processing under Matlab, then compare computation times obtained with Matlab and CUDA. We conclude that CUDA and Matlab accelerate functions based on parallel algorithms operating on uniform data types, but that this speed-up varies strongly with the algorithm implemented. Intel has proposed its Data Plane Development Kit (DPDK) to facilitate the development of high-performance software for telecommunication processing. In this project, we explore its use, together with operating-system isolation, to reduce the variability of the computation times of LTE processes. More precisely, we use DPDK with the Math Kernel Library (MKL) to compute the Fast Fourier Transforms (FFTs) associated with LTE processing and measure their computation times. We evaluate four cases: 1) FFT code on a slave core without CPU isolation, 2) FFT code on a slave core with CPU isolation, 3) FFT code using MKL without DPDK, and 4) baseline FFT code. We combine DPDK and MKL for cases 1 and 2 and assess which case is more deterministic and reduces the latency of LTE processes the most. We show that the mean computation time of the baseline FFT is about 100 times larger, while its standard deviation is about 20 times higher. We find that MKL offers excellent performance, but since it is not scalable on its own in a cloud environment, combining it with DPDK is a very promising alternative. DPDK improves performance and memory management and makes MKL scalable.
    ABSTRACT: In software-defined radio, digital signal processing requires strict real-time processing of data and signals. Specifically, in the development of the Long Term Evolution (LTE) standard, real-time operation and low latency of computation processes are essential to obtain a good user experience. As low-latency computation is critical in real-time processing of LTE, we explore the possibility of using Graphics Processing Units (GPUs) to accelerate its functions. As the first contribution of this thesis, we adopt NVIDIA GPU technology using the Compute Unified Device Architecture (CUDA) programming model in order to reduce the computation times of LTE. Furthermore, we investigate the efficiency of using MATLAB for parallel computing on GPUs. This allows us to evaluate the MATLAB and CUDA programming paradigms and provide a comprehensive comparison between them for parallel computing of LTE processes on GPUs. We conclude that CUDA and MATLAB accelerate the processing of structured basic algorithms, but that the acceleration is variable and depends on which algorithm is involved. Intel has proposed its Data Plane Development Kit (DPDK) as a tool to develop high-performance software for processing telecommunication data. As the second contribution of this thesis, we explore the possibility of using DPDK and operating-system isolation to reduce the variability of the computation times of LTE processes. Specifically, we use DPDK along with the Math Kernel Library (MKL) provided by Intel to calculate the Fast Fourier Transforms (FFTs) associated with LTE processes and measure their computation times. We study the computation times in different scenarios where the FFT calculation is done with and without the isolation of processing units, along with the use of DPDK. Our experimental analysis shows that when DPDK and MKL are used simultaneously and the processing units are isolated, the resulting processing times of the FFT calculation are reduced and have a near-deterministic characteristic. Explicitly, using DPDK and MKL along with the isolation of processing units reduces the mean and standard deviation of processing times for the FFT calculation by 100 times and 20 times, respectively. Moreover, we conclude that although MKL reduces the computation time of FFTs, it does not offer a scalable solution; combining it with DPDK, however, is a promising avenue.
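    The four cases above amount to measuring the mean and spread of FFT computation times under different degrees of core isolation. A rough, hedged sketch of that kind of measurement follows, using NumPy's FFT and Linux CPU-affinity pinning as stand-ins for the MKL FFT and the DPDK core isolation used in the thesis; the FFT size, run count, and core index are assumptions.

# Sketch: compare the mean and standard deviation of FFT timings with and
# without pinning the process to one core (Linux-only; a weak stand-in for
# the DPDK core isolation and MKL FFT described above).
import os
import time
import numpy as np

def time_fft(n_points=2048, n_runs=1000, pin_to_core=None):
    """Return (mean, std) of FFT wall-clock times in microseconds."""
    if pin_to_core is not None:
        # Restrict this process to a single core; os.sched_setaffinity is
        # available on Linux only.
        os.sched_setaffinity(0, {pin_to_core})
    x = np.random.randn(n_points) + 1j * np.random.randn(n_points)
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        np.fft.fft(x)
        samples.append((time.perf_counter() - t0) * 1e6)
    return float(np.mean(samples)), float(np.std(samples))

if __name__ == "__main__":
    print("unpinned (mean, std):", time_fft())
    print("pinned   (mean, std):", time_fft(pin_to_core=0))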

    Parallel solution of power system linear equations

    At the heart of many power system computations lies the solution of a large, sparse set of linear equations. These equations arise from the modelling of the network and are the cause of a computational bottleneck in power system analysis applications. Efficient sequential techniques have been developed to solve these equations, but the solution is still too slow for applications such as real-time dynamic simulation and on-line security analysis. Parallel computing techniques have been explored in an attempt to find faster solutions, but the methods developed to date have not efficiently exploited the full power of parallel processing. This thesis considers the solution of the linear network equations encountered in power system computations. Based on the insight provided by the elimination tree, a novel matrix structure is proposed to exploit the parallelism that exists within the cutset of a typical parallel solution. Using this matrix structure it is possible to reduce the size of the sequential part of the problem and to increase the speed and efficiency of a typical LU-based parallel solution. A method for transforming the admittance matrix into the required form is presented, along with network partitioning and load balancing techniques. Sequential solution techniques are considered and existing parallel methods are surveyed to determine their strengths and weaknesses. Combining the benefits of existing solutions with the new matrix structure allows an improved LU-based parallel solution to be derived. A simulation of the improved LU solution is used to show the improvements in performance over a standard LU-based solution that result from the adoption of the new techniques. The results of a multiprocessor implementation of the method are presented, and the new method is shown to outperform existing methods on distributed-memory multiprocessors.
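    To illustrate the kind of parallelism a cutset-oriented ordering exposes, the sketch below solves a bordered block-diagonal system by factorising the independent diagonal blocks separately and reducing the coupling (cutset) variables to a small Schur-complement solve. It is a generic NumPy illustration with made-up sizes and values, not the thesis's matrix structure or multiprocessor implementation.

# Sketch: bordered block-diagonal solve via a Schur complement on the cutset.
import numpy as np

def solve_bbd(blocks, col_borders, row_borders, cut_block, b_blocks, b_cut):
    """Solve a bordered block-diagonal system; blocks couple only through the cutset."""
    schur = cut_block.astype(float)
    rhs_cut = b_cut.astype(float)
    partial = []
    for A_i, B_i, C_i, b_i in zip(blocks, col_borders, row_borders, b_blocks):
        # Each per-block factorisation is independent of the others, so these
        # could be assigned to different processors; here they run in a loop.
        A_inv_B = np.linalg.solve(A_i, B_i)
        A_inv_b = np.linalg.solve(A_i, b_i)
        schur = schur - C_i @ A_inv_B      # accumulate the cutset Schur complement
        rhs_cut = rhs_cut - C_i @ A_inv_b
        partial.append((A_inv_B, A_inv_b))
    x_cut = np.linalg.solve(schur, rhs_cut)          # small sequential cutset solve
    x_blocks = [A_inv_b - A_inv_B @ x_cut for A_inv_B, A_inv_b in partial]
    return x_blocks, x_cut

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c = 4, 2                                      # block size, cutset size
    A1 = rng.random((n, n)) + n * np.eye(n)
    A2 = rng.random((n, n)) + n * np.eye(n)
    B1, B2 = rng.random((n, c)), rng.random((n, c))
    C1, C2 = rng.random((c, n)), rng.random((c, n))
    D = rng.random((c, c)) + 4 * np.eye(c)
    b1, b2, bc = rng.random(n), rng.random(n), rng.random(c)
    xb, xc = solve_bbd([A1, A2], [B1, B2], [C1, C2], D, [b1, b2], bc)
    # Cross-check against a dense solve of the assembled system.
    full = np.block([[A1, np.zeros((n, n)), B1],
                     [np.zeros((n, n)), A2, B2],
                     [C1, C2, D]])
    ref = np.linalg.solve(full, np.concatenate([b1, b2, bc]))
    print("max error vs dense solve:", np.max(np.abs(np.concatenate([*xb, xc]) - ref)))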

    Performance Analysis of Hardware/Software Co-Design of Matrix Solvers

    Solving a system of linear and nonlinear equations lies at the heart of many scientific and engineering applications such as circuit simulation, electric power networks, and structural analysis. The exponentially increasing complexity of these computing applications and the high cost of supercomputing force us to explore affordable high-performance computing platforms. The ultimate goal of this research is to develop hardware-friendly parallel processing algorithms and build cost-effective high-performance parallel systems in hardware in order to enable the solution of large linear systems. In this thesis, FPGA-based general hardware architectures of selected iterative and direct methods are discussed. Xilinx Embedded Development Kit (EDK) hardware/software (HW/SW) codesigns of these methods are also presented. For iterative methods, FPGA-based hardware architectures of Jacobi, combined Jacobi and Gauss-Seidel, and conjugate gradient (CG) are proposed. The convergence analysis of the LNS-based Jacobi processor demonstrates to what extent the hardware resource constraints and the additional conversion error affect the convergence of the Jacobi iterative method. Matlab simulations were performed to compare the performance of the three iterative methods in three ways: number of iterations for a given tolerance, number of iterations for different matrix sizes, and computation time for different matrix sizes. The simulation results indicate that the key to a fast implementation of the three methods is a fast implementation of matrix multiplication. They also show that CG requires fewer iterations for any given tolerance but more computation time as the matrix size increases compared with the other two methods, since matrix-vector multiplication is a more dominant factor in CG than in the other two methods. By implementing the matrix multiplications of the three methods in hardware with a Xilinx EDK HW/SW codesign, performance is significantly improved over a pure-software PowerPC (PPC) based implementation. The EDK implementation results show that CG takes less computation time than the other two methods for any matrix size in the HW/SW codesign, because matrix multiplications dominate the computation time of all three methods while CG requires fewer iterations to converge. For direct methods, an FPGA-based general hardware architecture and a Xilinx EDK HW/SW codesign of WZ factorization are presented. Single-unit and scalable hardware architectures of WZ factorization are proposed and analyzed under different constraints. Matlab simulations show that WZ runs faster than LU on parallel processors but slower on a single processor, and that the most time-consuming part of WZ factorization is the matrix update. By implementing the matrix update of WZ factorization in hardware with a Xilinx EDK HW/SW codesign, performance is likewise clearly improved over the PPC-based pure-software implementation.
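    As a rough companion to the comparison of iterative methods described above, the sketch below counts iterations of plain Jacobi and conjugate gradient solvers on a small synthetic symmetric positive-definite system; it is an illustrative NumPy version with made-up test data, not the thesis's FPGA/EDK implementation.

# Sketch: compare iteration counts of Jacobi and conjugate gradient (CG) on a
# diagonally dominant SPD test matrix (synthetic data, illustrative only).
import numpy as np

def jacobi(A, b, tol=1e-8, max_iter=10_000):
    """Plain Jacobi iteration; returns (solution, iterations)."""
    D = np.diag(A)
    R = A - np.diag(D)
    x = np.zeros_like(b)
    for k in range(1, max_iter + 1):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

def conjugate_gradient(A, b, tol=1e-8, max_iter=10_000):
    """Textbook CG for symmetric positive-definite A; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 100
    M = rng.random((n, n))
    A = 0.5 * (M + M.T)
    A += np.diag(2.0 * np.abs(A).sum(axis=1))   # force strong diagonal dominance (SPD)
    b = rng.random(n)
    for name, solver in [("Jacobi", jacobi), ("CG", conjugate_gradient)]:
        x, iters = solver(A, b)
        print(f"{name}: {iters} iterations, residual {np.linalg.norm(A @ x - b):.2e}")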
    • …