19 research outputs found

    A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment


    Asymmetrical three-phase fault evaluation in a distribution network using the genetic algorithm and the particle swarm optimisation

    Abstract: Modern electric power systems are made up of three main sub-systems: generation, transmission and distribution. The most common faults in distribution sub-systems are asymmetrical three-phase short-circuit faults, since they can occur as line-to-line, two-lines-to-earth or single-line-to-earth faults, which increases their probability of occurrence; symmetrical three-phase faults, by contrast, occur only when all three phases are shorted simultaneously. Standards IEC 60909 and IEC 61363 provide the basic information used for the detection of short-circuit faults. However, the two standards use numerous estimates in their fault evaluation procedures: they estimate voltage factors (c), impedance correction factors (k), resistance-to-reactance ratios (R/X), resistance-to-impedance ratios (R/Z) and various other scaling factors for rotating machines. These IEC estimates are not evenly distributed across the nominal voltage range up to 550 kV and, as such, do not sufficiently cater for every nominal voltage; when the need arises, the user has to estimate these values accordingly. This research presents a genetic algorithm (GA) and a particle swarm optimisation (PSO) for the detection of asymmetrical three-phase short-circuit faults within electric distribution networks of power systems with nominal voltages below 550 kV. GA and PSO are nature-inspired optimisation techniques. Although PSO converges quickly, it suffers from partial optimism and premature stagnation, so coding adjustments were made to the creation of initial positions and to the distribution of particles within the swarm. The GA struggles with the survival rates of individuals, stalling during optimisation and proper gene replacement, so coding adjustments were also made to the GA with regard to strategic gene replacement, crossover when combining the properties of parents, and the arrangement of scores and expectation. Pattern search and fmincon algorithms were added to both algorithms as minimisation functions that commence after the evolutionary algorithms (EAs) terminate. The EAs were first validated on the Rastrigin and Rosenbrock benchmark functions. During fault detection, the developed EAs were used to determine stochastically some of the most crucial estimates (the R/X and R/Z ratios); the proposed methodology computes these values on a case-by-case basis for every optimisation case according to the parameters and unique specifications of the power system. The EAs were tested on a nominal voltage that is properly catered for by the IEC standards and obtained ratios, impedances and currents close to the IEC values for that nominal voltage. This implies that EAs can be used reliably to determine these ratios stochastically, compute impedances and detect fault currents for all nominal voltages, including those that are not sufficiently catered for by the IEC standards. Since the R/X and R/Z ratios play a key role in determining the upstream and fault-point impedances, the proposed methodology can be used to compute much more precise fault magnitudes at various network levels, thereby supporting the correct setting-up and repair of power systems.
    M.Ing. (Electrical and Electronic Engineering Science)
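The abstract notes that the evolutionary algorithms were first validated on the Rastrigin and Rosenbrock benchmark functions before being applied to fault detection. Purely as an illustration of that validation step (not the thesis's implementation; the swarm size, inertia weight and acceleration coefficients below are assumed values), a minimal particle swarm optimisation sketch minimising the 2-D Rastrigin function could look like this:

```python
import numpy as np

def rastrigin(x):
    """Rastrigin benchmark: global minimum of 0 at x = 0."""
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def pso(obj, dim=2, n_particles=30, iters=200, bounds=(-5.12, 5.12),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimisation (parameter values are illustrative only)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))   # initial particle positions
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                              # personal best positions
    pbest_val = np.array([obj(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()        # global best position

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([obj(p) for p in pos])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()

    return gbest, pbest_val.min()

best_x, best_f = pso(rastrigin)
print(best_x, best_f)   # converges towards x = (0, 0), f = 0
```

The Rastrigin function's many local minima make it a standard stress test for exactly the premature-stagnation behaviour the abstract attributes to plain PSO.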

    High performance reconfigurable architectures for biological sequence alignment

    Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality-of-life aspects such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads; however, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Because of these limitations, optimized FPGA core implementations are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman algorithm with affine gap penalty, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are, firstly, that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state of the art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted in the hardware architectures: when the alignment matrix computation task is overlapped with PE configuration in a folded systolic array, the overall throughput of the core increases significantly, owing to the bounded PE configuration time and the parallel PE configuration approach irrespective of the number of PEs in the systolic array. In addition, the use of only two configuration elements per PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted on-board memory resources. Finally, a new performance metric is devised which facilitates the effective comparison of design performance across different FPGA devices and families: the normalized performance indicator (speed-up per area per process technology) discounts the area and lithography advantages of any FPGA, resulting in fairer comparisons. The cores have been designed in Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA.
    The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 for the acceleration of the Smith-Waterman algorithm with affine gap penalty, the profile HMM algorithm and the BLAST algorithm respectively. In terms of speed-up, the designed cores were compared against their corresponding software implementations and against reported FPGA implementations. Compared with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x over the SSEARCH 35 software. For profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. The implementation of gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared with the latest NCBI BLAST software. For comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than a 50 percent improvement, while acceleration of profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. For gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after discounting the advantages of the Virtex-5 FPGA. Further analysis of cost and power performance showed that the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, offering a smaller area footprint and an economical 'green' solution compared with other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
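The normalized performance indicator above is described only as speed-up per area per process technology. The sketch below shows one plausible way such a normalization could be computed; the slice counts, the reference node and the linear feature-size scaling are assumptions made for illustration, not the thesis's exact formula.

```python
def normalized_speedup(speedup, slices_used, feature_size_nm, ref_feature_size_nm=65.0):
    """Illustrative 'speed-up per area per process technology' metric.

    speedup             -- raw speed-up of the FPGA core over its software baseline
    slices_used         -- FPGA area consumed by the design (e.g. Virtex-5 slices)
    feature_size_nm     -- lithography node of the device used (65 nm for Virtex-5)
    ref_feature_size_nm -- reference node used to discount newer-process advantages

    The linear feature-size scaling here is an assumption, not the thesis's formula.
    """
    per_area = speedup / slices_used                           # speed-up per unit area
    return per_area * (feature_size_nm / ref_feature_size_nm)  # discount process advantage

# Hypothetical designs on different devices and nodes (numbers are made up):
core_a = normalized_speedup(speedup=250.0, slices_used=9000, feature_size_nm=65)
core_b = normalized_speedup(speedup=180.0, slices_used=12000, feature_size_nm=90)
print(core_a, core_b)
```

The cost and power figures quoted above follow the same pattern: MCUPS per dollar and MCUPS per watt are simply the measured throughput divided by board cost and by measured power respectively.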

    Computer vision algorithms on reconfigurable logic arrays


    Microarchitecture Choices and Tradeoffs for Maximizing Processing Efficiency.

    This thesis is concerned with hardware approaches for maximizing the number of independent instructions in the execution core and thereby maximizing processing efficiency for a given amount of compute bandwidth. Compute bandwidth is the number of parallel execution units multiplied by the pipelining depth of those units in the processor. Keeping those computing elements busy is key to maximizing processing efficiency and therefore power efficiency. While some applications have many independent instructions that can be issued in parallel without inefficiencies due to branch behavior, cache behavior, or instruction dependencies, most applications have limited parallelism and plenty of stalling conditions. This thesis presents two approaches to this problem, which in combination greatly increase the efficiency with which the processor utilizes its resources. The first approach addresses the problem of small basic blocks that arise when code branches frequently. We introduce algorithms and mechanisms to predict multiple branches simultaneously and to fetch multiple non-contiguous basic blocks every cycle along a predicted branch path. This turns what was previously an inherently serial process into a parallelized instruction fetch approach. For integer applications, the result is an increase in useful instruction fetch capacity of 40% when two basic blocks are fetched per cycle and 63% for three blocks per cycle. For floating-point benchmarks, the corresponding improvements are 27% and 59%. The second approach increases the number of independent instructions available to the execution core through simultaneous multi-threading (SMT). We compare SMT with another multithreading approach, Switch-on-Event multithreading, and show that SMT is far superior. The Intel Pentium 4 SMT microarchitecture algorithms are analyzed, and we examine the impact of SMT on the power efficiency of the Pentium 4 processor. A new metric, the SMT Energy Benefit, is defined. Not only do we show that the SMT Energy Benefit for a given workload can be quite significant, we also generalize the results and build a model of what future processors' SMT Energy Benefit would be. We conclude that while SMT will continue to be an energy-efficient feature, as processors become more energy-efficient in general the relative SMT Energy Benefit may be reduced.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/61740/1/dtmarr_1.pd
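The abstract names an SMT Energy Benefit metric but does not give its formula. As a hedged illustration only, one common way to express such a benefit is the fractional reduction in energy per unit of work when SMT is enabled; the sketch below uses that formulation with made-up throughput and power numbers and should not be read as the thesis's actual definition.

```python
def smt_energy_benefit(perf_st, power_st, perf_smt, power_smt):
    """Illustrative SMT Energy Benefit: fractional reduction in energy per unit
    of work (power / throughput) when SMT is enabled. This formulation is an
    assumption; the thesis defines its own metric."""
    energy_per_work_st = power_st / perf_st
    energy_per_work_smt = power_smt / perf_smt
    return 1.0 - energy_per_work_smt / energy_per_work_st

# Made-up numbers: SMT raises throughput by 20% while raising power by 10%.
benefit = smt_energy_benefit(perf_st=1.00, power_st=50.0,
                             perf_smt=1.20, power_smt=55.0)
print(f"SMT Energy Benefit ~ {benefit:.1%}")   # about 8.3% less energy per unit of work
```

Under this kind of formulation, the benefit shrinks as the baseline processor itself becomes more energy-efficient, which matches the abstract's concluding observation.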

    SCALE: A modular code system for performing standardized computer analyses for licensing evaluation. Miscellaneous -- Volume 3, Revision 4


    SCALE: A modular code system for performing standardized computer analyses for licensing evaluation


    SCALE: A modular code system for performing standardized computer analyses for licensing evaluation: Control modules C4, C6
