
    Parallel solution of power system linear equations

    At the heart of many power system computations lies the solution of a large sparse set of linear equations. These equations arise from modelling the network and are the computational bottleneck in power system analysis applications. Efficient sequential techniques have been developed to solve these equations, but the solution is still too slow for applications such as real-time dynamic simulation and on-line security analysis. Parallel computing techniques have been explored in the attempt to find faster solutions, but the methods developed to date have not efficiently exploited the full power of parallel processing. This thesis considers the solution of the linear network equations encountered in power system computations. Based on the insight provided by the elimination tree, a novel matrix structure is proposed that allows the exploitation of the parallelism which exists within the cutset of a typical parallel solution. Using this matrix structure it is possible to reduce the size of the sequential part of the problem and to increase the speed and efficiency of a typical LU-based parallel solution. A method for transforming the admittance matrix into the required form is presented, along with network partitioning and load-balancing techniques. Sequential solution techniques are considered and existing parallel methods are surveyed to determine their strengths and weaknesses. Combining the benefits of existing solutions with the new matrix structure allows an improved LU-based parallel solution to be derived. A simulation of the improved LU solution is used to show the improvements in performance over a standard LU-based solution that result from the adoption of the new techniques. The results of a multiprocessor implementation of the method are presented, and the new method is shown to outperform existing methods on distributed-memory multiprocessors.
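The cutset structure this kind of solution exploits can be sketched with a small Schur-complement example: independent diagonal blocks are eliminated in parallel, the reduced (sequential) cutset system is solved, and back-substitution again runs per block. This is a minimal illustration with scalar "blocks" and made-up values, not the thesis's actual matrix transformation or partitioning scheme.

```python
# Bordered block-diagonal (BBD) solve, the pattern behind LU-based
# parallel network solutions. Scalar blocks and all numeric values
# are illustrative assumptions.
#
# System:  a_i * x_i + b_i * y = r_i   (independent sub-networks, parallel)
#          sum_i c_i * x_i + d * y = s (cutset / border, sequential part)
a = [4.0, 5.0]          # diagonal blocks (here scalars for clarity)
b = [1.0, 2.0]          # border columns coupling each block to the cutset
c = [1.0, 2.0]          # border rows
d, s = 10.0, 20.0       # cutset diagonal and right-hand side
r = [9.0, 12.0]

# Phase 1 (parallelizable per block): eliminate each x_i, accumulating
# the Schur complement of the cutset variable y.
schur_d, schur_s = d, s
for ai, bi, ci, ri in zip(a, b, c, r):
    schur_d -= ci * bi / ai
    schur_s -= ci * ri / ai

# Phase 2 (sequential): solve the reduced cutset system for y.
y = schur_s / schur_d

# Phase 3 (parallelizable per block): back-substitute for each x_i.
x = [(ai_bi_ri[2] - ai_bi_ri[1] * y) / ai_bi_ri[0]
     for ai_bi_ri in zip(a, b, r)]
print(x, y)
```

Shrinking the cutset (the sequential Phase 2) is exactly what the proposed matrix structure aims at, since it bounds the speedup of the whole solve.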

    Hardware Architectures and Implementations for Associative Memories: The Building Blocks of Hierarchically Distributed Memories

    During the past several decades, the semiconductor industry has grown into a global industry with revenues around $300 billion. Intel no longer relies only on transistor scaling for higher CPU performance, but instead focuses more on multiple cores on a single die. It has been projected that in 2016 most CMOS circuits will be manufactured with a 22 nm process. These CMOS circuits will have a large number of defects; especially as transistors shrink below the sub-micron scale, originally deterministic circuits will begin to exhibit probabilistic characteristics. Hence, it would be challenging to map traditional computational models onto probabilistic circuits, suggesting a need for fault-tolerant computational algorithms. Biologically inspired algorithms, or associative memories (AMs), the building blocks of the cortical hierarchically distributed memories (HDMs) discussed in this dissertation, exhibit a remarkable match to nano-scale electronics, in addition to their great fault tolerance. Research on the potential mapping of the HDM onto CMOL (hybrid CMOS/nanoelectronic circuits) nanogrids provides useful insight into the development of non-von Neumann neuromorphic architectures and the semiconductor industry. In this dissertation, we investigated implementations of AMs on different hardware platforms, including a microprocessor-based personal computer (PC), a PC cluster, field-programmable gate arrays (FPGAs), CMOS, and CMOL nanogrids. We studied two types of neural associative memory models, with and without temporal information. In this research, we first decomposed the computational models into basic and common operations, such as the matrix-vector inner product and k-winners-take-all (k-WTA). We then analyzed the baseline performance/price ratio of implementing the AMs on a PC, and continued with a similar performance/price analysis of implementations on more parallel hardware platforms, such as a PC cluster and FPGAs. However, the majority of the research emphasized implementations with all-digital and mixed-signal full-custom CMOS and CMOL nanogrids. In this dissertation, we draw the conclusion that mixed-signal CMOL nanogrids exhibit the best performance/price ratio of the hardware platforms considered. We also highlight some of the trade-offs between dedicated and virtualized hardware circuits for the HDM models: a simple time-multiplexing scheme for the digital CMOS implementations can achieve throughput comparable to the mixed-signal CMOL nanogrids.
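The two core operations named above, the matrix-vector inner product and k-WTA, can be sketched together in a few lines. The Willshaw-style binary Hebbian storage rule and the sparse index coding are illustrative assumptions, not necessarily the models studied in the dissertation.

```python
# Minimal associative-memory recall: inner product then k-WTA.
# Binary clipped Hebbian weights and sparse coding are assumptions.

def store(patterns, n):
    # Hebbian outer-product storage with clipped (0/1) weights.
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in p:
            for j in p:
                W[i][j] = 1
    return W

def recall(W, cue, k):
    # Matrix-vector inner product: dendritic sum for each neuron.
    sums = [sum(row[j] for j in cue) for row in W]
    # k-WTA: only the k neurons with the largest sums fire
    # (ties at the threshold all fire in this simple version).
    threshold = sorted(sums, reverse=True)[k - 1]
    return sorted(i for i, s in enumerate(sums) if s >= threshold)

n = 8
pattern = [0, 2, 5]                 # a stored sparse pattern (active units)
W = store([pattern], n)
print(recall(W, [0, 2], k=3))       # noisy cue, one unit missing → [0, 2, 5]
```

Both operations map naturally onto parallel hardware: the inner products are independent per neuron, and k-WTA is a global reduction, which is why the platforms above differ mainly in how they implement these two steps.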

    Computational valve plate design

    Axial piston machines are widely used in many industries for their compact design, flexibility in power transfer, variable flow rate, and high efficiency relative to their manufacturing costs. One component of all axial piston machines that is very influential on the performance of the unit is the valve plate. The aim of this research is to develop a design methodology that is general enough to design all types of valve plates and simple enough not to require advanced technical knowledge from the user. A new style of valve plate design has been developed that comprehensively considers all previous design techniques and does not require significant changes to the manufacturing processes of valve plates. The design methodology utilizes a previously developed, accurate computer model of the physical phenomena. This allows the precise optimization of the valve plate design through the use of simulations rather than expensive trial-and-error processes. The design of the valve plate is cast in the form of an optimization problem, and this formulation has motivated the selection of an optimization algorithm that satisfies the requirements of the design. The proposed design methodology was tested in a case study and shown to be very successful in improving the required performance of the valve plate design.
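The "design as optimization problem" formulation can be sketched as follows. Everything here is hypothetical: `simulate_overshoot` is a stand-in for the accurate fluid-power model mentioned above, the two design variables (groove length and area) are invented for illustration, and a plain random search stands in for whatever optimizer the thesis actually selects.

```python
import random

# Simulation-driven design optimization, sketched with a hypothetical
# surrogate objective in place of the real valve-plate model.

def simulate_overshoot(groove_len, groove_area):
    # Hypothetical surrogate: penalizes designs away from a fictitious
    # optimum at (3.0, 1.5); a real run would call the fluid simulation.
    return (groove_len - 3.0) ** 2 + 2.0 * (groove_area - 1.5) ** 2

def random_search(objective, bounds, iters=2000, seed=0):
    # Derivative-free search: simulations are black boxes, so gradient
    # information is generally unavailable.
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(iters):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        f = objective(*x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

best, cost = random_search(simulate_overshoot, [(0.0, 6.0), (0.0, 3.0)])
print(best, cost)
```

The point of the formulation is that each candidate design is evaluated by simulation rather than by building hardware, which is what replaces the trial-and-error process.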

    Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

    The objective of this Phase I research was to establish the required software and hardware strategies to achieve large-scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and the solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
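The coarsest level of parallelism in a probabilistic analysis like this is across independent Monte Carlo samples. The sketch below is an illustrative assumption, not the report's code: the response function and property distributions are invented, threads stand in for the shared-memory paradigm, and a distributed-memory run would use processes or MPI instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Embarrassingly parallel Monte Carlo over uncertain inputs; each
# sample would, in a real PCM run, invoke a composite-mechanics model.

def sample_response(seed):
    rng = random.Random(seed)            # per-sample RNG keeps samples independent
    modulus = rng.gauss(200.0, 10.0)     # hypothetical uncertain modulus (GPa)
    load = rng.gauss(1.0, 0.05)          # hypothetical uncertain load
    return load / modulus                # hypothetical strain-like response

def monte_carlo(n_samples, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        responses = list(pool.map(sample_response, range(n_samples)))
    mean = sum(responses) / n_samples
    var = sum((r - mean) ** 2 for r in responses) / (n_samples - 1)
    return mean, var

mean, var = monte_carlo(10_000)
print(mean, var)
```

Because the samples share no state, this level scales on both shared-memory machines and workstation networks; the harder levels of parallelism are inside each deterministic solve.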

    Improving Short DNA Sequence Alignment with Parallel Computing

    Variations in different types of genomes have been found to be responsible for a large degree of physical diversity, such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must first be mapped to a reference sequence. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In order to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall-time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
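The core of any MPI-parallel aligner is the data decomposition: the read set is split into chunks, one per rank, each rank aligns its chunk independently, and outputs are concatenated in rank order. The block-decomposition scheme below is an illustrative assumption, not Parallel BWA's exact internals.

```python
# Block decomposition of n_reads across n_ranks MPI ranks:
# chunk sizes differ by at most one read, so the load stays balanced.

def partition(n_reads, n_ranks):
    base, extra = divmod(n_reads, n_ranks)
    bounds, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

# Each (start, end) pair is the half-open slice of reads owned by one rank.
print(partition(10, 4))   # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because reads are aligned independently, this decomposition needs no communication during alignment, which is why wall-time speedup scales well across cluster nodes.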

    Parallelization of a software based intrusion detection system - Snort

    Computer networks are already ubiquitous in people's lives and work, and network security is becoming a critical concern. A simple firewall, which can only scan the bottom four OSI layers, cannot satisfy all security requirements. An intrusion detection system (IDS) with deep packet inspection, which can filter all seven OSI layers, is becoming necessary for more and more networks. However, the processing throughput of IDSs lags far behind current network speeds. Researchers have begun to improve the performance of IDSs by implementing them on different hardware platforms, such as Field-Programmable Gate Arrays (FPGAs) or special-purpose network processors. Nevertheless, all of these options are either less flexible or more expensive to deploy. This research focuses on the possibilities of implementing a parallelized IDS in a general computing environment based on Snort, currently the most popular open-source IDS. In this thesis, several possible methods are analyzed for parallelizing the pattern-matching engine on a multicore computer. However, owing to the small granularity of network packets, the pattern-matching engine of Snort proves unsuitable for parallelization. In addition, a pipelined structure of Snort has been implemented and analyzed. The universal packet-capture API, LibPCAP, has been modified with a new feature that can capture a packet directly into an external buffer. With this change, the performance of the pipelined Snort improves by up to 60% on an Intel i7 multicore computer for jumbo frames. A primary limitation is memory bandwidth; with higher bandwidth, the performance of the parallelization could be improved further.
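The pipelined structure can be sketched as a two-stage producer-consumer: one stage fills an external buffer with captured packets while a second stage runs pattern matching, so the stages overlap on separate cores. The packet list and substring "rules" below are illustrative stand-ins for LibPCAP capture and Snort's matching engine.

```python
import queue
import threading

# Two-stage pipeline: capture -> external buffer -> detection.

packets = [b"GET /index.html", b"EXPLOIT payload", b"ping", b"EXPLOIT again"]
rules = [b"EXPLOIT"]            # hypothetical signatures
buf = queue.Queue(maxsize=64)   # the external buffer between stages
alerts = []

def capture_stage():
    for pkt in packets:
        buf.put(pkt)            # real system: pcap writes into the buffer
    buf.put(None)               # end-of-stream sentinel

def detect_stage():
    while (pkt := buf.get()) is not None:
        if any(rule in pkt for rule in rules):
            alerts.append(pkt)

t1 = threading.Thread(target=capture_stage)
t2 = threading.Thread(target=detect_stage)
t1.start(); t2.start()
t1.join(); t2.join()
print(alerts)                   # → [b'EXPLOIT payload', b'EXPLOIT again']
```

Note the buffer is also where the memory-bandwidth limitation bites: every packet crosses it once on write and once on read, so for jumbo frames the pipeline's throughput is bounded by how fast memory can move packet payloads between the two stages.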

    Testing of Two Novel Semi-Implicit Particle-In-Cell Techniques

    PIC (particle-in-cell) modeling is a computational technique that advances computer particles through a spatial grid of cells on which electric and magnetic fields can be placed. This method has proven useful for simulating a wide range of plasmas and excels at yielding accurate, detailed results such as particle number densities, particle energies, particle currents, and electric potentials. However, the detailed results of a PIC simulation come at a substantial computational cost, and the algorithm can be susceptible to numerical instabilities. As processors become faster and contain more cores, the computational expense of PIC simulations is somewhat addressed, but this is not enough; improvements must be made in the numerical algorithms as well. A physical limit exists for how fast a silicon processor can operate, and increasing the number of processing cores increases the overhead of passing information between processors. Essentially, the solution for decreasing the computational time required by a PIC simulation lies in improving the solution algorithms, not in increasing the hardware capacity of the machine performing the simulation. In order to decrease the computational time and increase the stability of a PIC algorithm, it must be altered to circumvent the current limitations. The goal of the work presented in this thesis is twofold. The first objective is to develop a three-dimensional PIC simulation code that can be used to study different numerical algorithms. This computer code focuses on the solution of the equation of motion for charged particles moving in an electromagnetic field (the Newton-Lorentz equation), the solution of the electric potentials caused by boundary conditions and charged particles (Poisson's equation), and the coupling of these two equations. The numerical solution of these two equations, their coupling (the primary cause of instabilities), and the severe computational requirements of PIC codes make writing this code a difficult task; solving the Newton-Lorentz equation for large numbers of charged particles together with Poisson's equation is complex, and this is the focus of the newly developed computer code. The second objective of the work presented in this thesis is to use the developed computer code to study two ideas for improving the numerical algorithm used in PIC codes. The two techniques investigated are: 1) implementing a fourth-order electric field approximation in the equation of motion, and 2) solving for the electric field, i.e. solving Poisson's equation, multiple times within a single time step. The first method uses the electric fields of the many cells that a charged particle may pass through in one time step, as opposed to using only the electric field of the cell of origin for the particle's entire path during one time step. The idea is to allow PIC codes to use larger time steps while remaining stable and avoiding numerical heating, thus reducing the overall computer time required. The second technique is to perform multiple Poisson-equation solves during a single time step. Typically, an explicit PIC model solves the electric field only once per time step; solving the field multiple times during the particle push allows particles to distribute themselves in a more electrically neutral manner within a single time step. The idea is to allow larger time steps to be used without obtaining unrealistic electric potentials due to an artificial degree of charge separation, eliminating instabilities and numerical heating; explicit PIC codes have limits on how large the numerical time step can be before the electric potentials blow up. This work has shown that neither of these techniques, in its current state, is a practical option for increasing the time step of the PIC algorithm while ...
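The second technique (multiple field solves per time step) can be sketched with a minimal 1D electrostatic PIC step: the time step is split into sub-pushes, and charge deposition plus a field solve are repeated before each one. The grid size, particle data, normalized units, and nearest-grid-point deposition are all illustrative assumptions, not the thesis's 3D code; NGP deposition is also known to introduce self-force artifacts, which a production code would mitigate with higher-order weighting.

```python
# Minimal 1D electrostatic PIC step with n_solves field solves per step.

N, L = 16, 1.0          # grid cells and periodic domain length
dx = L / N

def deposit(xs, qs):
    # Nearest-grid-point charge deposition onto the periodic grid.
    rho = [0.0] * N
    for x, q in zip(xs, qs):
        rho[int(x / dx) % N] += q / dx
    return rho

def solve_field(rho):
    # 1D field solve: dE/dx = rho (Gauss's law), periodic, zero-mean E.
    mean_rho = sum(rho) / N
    E, acc = [], 0.0
    for r in rho:
        acc += (r - mean_rho) * dx
        E.append(acc)
    mean_E = sum(E) / N
    return [e - mean_E for e in E]

def push(xs, vs, qs, ms, dt, n_solves):
    # Split one time step into n_solves sub-pushes, re-depositing charge
    # and re-solving the field before each, instead of solving once.
    sub = dt / n_solves
    for _ in range(n_solves):
        E = solve_field(deposit(xs, qs))
        for i in range(len(xs)):
            vs[i] += qs[i] / ms[i] * E[int(xs[i] / dx) % N] * sub
            xs[i] = (xs[i] + vs[i] * sub) % L
    return xs, vs

xs, vs = [0.25, 0.75], [0.0, 0.0]   # two opposite charges, initially at rest
qs, ms = [1.0, -1.0], [1.0, 1.0]
xs, vs = push(xs, vs, qs, ms, dt=0.1, n_solves=4)
print(xs, vs)
```

With `n_solves=1` this reduces to the standard explicit step; increasing it lets the charge distribution relax within the step, which is the mechanism the thesis evaluates for permitting larger time steps.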