
    Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Improvements and advances in computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost parallel systems that increase system performance. Research conducted on the development of a specialized computer architecture for the real-time algorithmic execution of an avionics guidance and control problem is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on critical path analysis. The final stage is the design and development of hardware structures suitable for efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for execution of the particular algorithm, is discussed.
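
    The allocation step described above can be illustrated with a small, self-contained sketch of critical-path-driven list scheduling. The task names, durations, and two-PE configuration below are invented for illustration and are not taken from the guidance algorithm itself; the sketch only shows the general technique of ranking tasks by critical path length and assigning each to the earliest-free processing element.

```python
import functools

# Illustrative task graph: names, durations, and the number of processing
# elements are assumptions made up for this sketch, not values from the
# avionics guidance algorithm described above.
tasks = {"integrate": 4, "estimate": 3, "guidance": 5, "time_to_go": 2, "command": 1}
deps = {"guidance": ["integrate", "estimate"],
        "time_to_go": ["estimate"],
        "command": ["guidance", "time_to_go"]}
NUM_PES = 2

@functools.lru_cache(maxsize=None)
def critical_path(task):
    """Length of the longest path from `task` to any sink, including `task` itself."""
    successors = [t for t, ds in deps.items() if task in ds]
    return tasks[task] + max((critical_path(s) for s in successors), default=0)

# List scheduling: the task with the longest remaining critical path is placed
# first, on whichever processing element becomes free earliest. Sorting by
# descending critical path length also yields a valid topological order.
pe_free_at = [0] * NUM_PES
finish = {}
for task in sorted(tasks, key=critical_path, reverse=True):
    ready = max((finish[d] for d in deps.get(task, [])), default=0)
    pe = min(range(NUM_PES), key=pe_free_at.__getitem__)
    start = max(ready, pe_free_at[pe])
    finish[task] = start + tasks[task]
    pe_free_at[pe] = finish[task]
    print(f"{task:>10s} -> PE{pe}  [{start:2d}, {finish[task]:2d})")
```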

    Implementing an Affordable High Performance Computing for Teaching-oriented Computer Science Curriculum

    The main objective of this poster is to present an affordable and easy-to-use high performance cluster system that can be used in the classroom for a teaching-oriented computer science curriculum. To address this, we design and implement an affordable high performance cluster system based on the PlayStation 3® (PS3), a well-known game console manufactured by Sony. Since each PS3 console has an IBM Cell BE processor consisting of 8 Synergistic Processing Elements (SPEs) and 1 Power Processing Element (PPE), it can be used as a multi-core processing node in the cluster system. In addition, the implemented cluster system has been used for new and existing computer science courses, such as CPSC 592: Parallel and Distributed Database, CPSC 590: Parallel and Distributed Processing, and CPSC 591: Parallel Programming.
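
    A typical classroom exercise on such a cluster is distributing a computation across nodes with MPI. The sketch below uses mpi4py (an assumption; the poster does not name a message-passing library) to approximate pi by splitting a numerical integration across ranks; the Cell-specific SPE programming model is not shown.

```python
# Minimal MPI exercise of the kind a teaching cluster supports: each rank
# integrates f(x) = 4 / (1 + x^2) over its share of [0, 1]; the sum of the
# partial results approximates pi. Requires mpi4py and an MPI runtime.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 1_000_000
local = sum(4.0 / (1.0 + ((i + 0.5) / n) ** 2) for i in range(rank, n, size)) / n

pi = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi ~= {pi:.6f}")
```

    Run, for example, with mpiexec -n 4 python pi_mpi.py (the file name is arbitrary).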

    Application of the p-version of the finite-element method to global-local problems

    A brief survey is given of some recent developments in finite-element analysis technology which bear upon the three main research areas under consideration in this workshop: (1) analysis methods; (2) software testing and quality assurance; and (3) parallel processing. The variational principle incorporated in a finite-element computer program, together with a particular set of input data, determines the exact solution corresponding to that input data. Most finite-element analysis computer programs are based on the principle of virtual work. In the following, researchers consider only programs based on the principle of virtual work and denote the exact displacement vector field corresponding to some specific set of input data by vector u(EX). The exact solution vector u(EX) is independent of the design of the mesh or the choice of elements. Except for very simple problems, or specially constructed test problems, vector u(EX) is not known. Researchers perform a finite-element analysis (or any other numerical analysis) because they wish to draw conclusions concerning the response of a physical system to certain imposed conditions, as if vector u(EX) were known.
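
    The point that u(EX) is fixed by the variational principle and the input data, while the finite-element solution merely approaches it, can be seen in a minimal one-dimensional sketch. The model problem, mesh sizes, and use of h-refinement with linear elements below are illustrative choices (the paper itself concerns the p-version); the computed solution converges toward the known exact solution as the discretization is enriched.

```python
# Linear FEM for -u'' = pi^2 sin(pi x), u(0) = u(1) = 0, whose exact solution
# is u(x) = sin(pi x). The error at element midpoints shrinks as the mesh is
# refined, while the exact solution itself never depends on the mesh.
import numpy as np

def fem_solve(n):
    """Solve the model problem with n linear elements on a uniform mesh."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    # Tridiagonal stiffness matrix for the n - 1 interior nodes.
    K = (np.diag(2.0 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    f = lambda t: np.pi ** 2 * np.sin(np.pi * t)
    # Load vector by midpoint quadrature over the two elements adjacent to node i.
    b = np.array([h / 2 * (f(x[i] - h / 2) + f(x[i] + h / 2)) for i in range(1, n)])
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(K, b)
    return x, u

exact = lambda t: np.sin(np.pi * t)
for n in (4, 8, 16, 32):
    x, u = fem_solve(n)
    mid = (x[:-1] + x[1:]) / 2
    err = np.max(np.abs(exact(mid) - (u[:-1] + u[1:]) / 2))
    print(f"n = {n:3d}, midpoint error = {err:.2e}")
```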

    SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80

    The finite element method has proven to be an invaluable tool for the analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increases, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements. SAPNEW provides a stand-alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format, making SAPNEW more readily usable by researchers at NASA Lewis Research Center.
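
    The eigen analysis such a code performs amounts to the generalized symmetric eigenproblem K·phi = lambda·M·phi for natural frequencies and mode shapes. A minimal sketch follows, using scipy and a two-degree-of-freedom spring-mass stand-in for the stiffness and mass matrices; the values are illustrative and are not the thin-shell element matrices SAPNEW assembles.

```python
# Generalized eigenproblem K*phi = lambda*M*phi for a 2-DOF spring-mass chain.
# The matrices are illustrative stand-ins for assembled FE stiffness and mass.
import numpy as np
from scipy.linalg import eigh

k, m = 1.0e4, 2.0                     # spring stiffness [N/m], lumped mass [kg]
K = np.array([[2 * k, -k], [-k, k]])  # stiffness matrix
M = np.diag([m, m])                   # lumped mass matrix

lam, phi = eigh(K, M)                 # symmetric generalized eigenproblem
freqs = np.sqrt(lam) / (2 * np.pi)    # natural frequencies [Hz]
for i, f in enumerate(freqs):
    print(f"mode {i + 1}: {f:.2f} Hz, shape {phi[:, i]}")
```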

    The Design of a Processing Element for the Systolic Array Implementation of a Kalman Filter

    The Kalman filter is an important component of optimal estimation theory. It has applications in a wide range of high performance control systems, including navigational, fire control, and targeting systems. The Kalman filter, however, has not been utilized to its full potential because of its inherent computational intensiveness, which requires off-line processing or allows only low-bandwidth real-time applications. Recent advances in VLSI circuit technology have created the opportunity to design algorithms and data structures for direct implementation in integrated circuits. A systolic architecture is a concept which allows the construction of massively parallel systems in integrated circuits and has been utilized as a means of achieving high data rates. A systolic system consists of a set of interconnected processing elements, each capable of performing some simple operation. The design of a processing element in an orthogonal systolic architecture will be investigated using the state of the art in VLSI technology. The goal is to create a high-speed, high-precision processing element which is adaptable to a highly configurable systolic architecture. In order to achieve the necessary high computational throughput, the arithmetic unit of the processing element will be implemented using the Logarithmic Number System. The systolic architecture approach will be used in an attempt to implement a Kalman filtering system with both a high sampling rate and a small package size. The design of such a Kalman filter would enable this filtering technology to be applied to the areas of process control, computer vision, and robotics.
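
    For reference, one predict/update cycle of the Kalman filter, the computation the systolic array accelerates, is sketched below in ordinary floating-point numpy. The one-dimensional constant-velocity tracking model, noise covariances, and measurement setup are illustrative assumptions; the thesis maps these same matrix operations onto processing elements using the Logarithmic Number System.

```python
# One Kalman predict/update cycle for a constant-velocity tracking model.
# All model parameters below are illustrative, not taken from the thesis.
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])              # only position is measured
Q = 1e-3 * np.eye(2)                    # process noise covariance
R = np.array([[0.05]])                  # measurement noise covariance

def kalman_step(x, P, z):
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update.
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for step in range(50):
    true_pos = 0.5 * step * dt                  # target moving at 0.5 units/s
    z = np.array([true_pos + rng.normal(0, 0.2)])
    x, P = kalman_step(x, P, z)
print("estimated [position, velocity]:", x)
```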

    RTL implementation of one-sided jacobi algorithm for singular value decomposition

    Multi-dimensional digital signal processing tasks such as image processing and image reconstruction involve manipulating matrix data. Better quality images involve larger amounts of data, which result in unacceptably slow computation. A parallel processing scheme is a possible solution to this problem. This project presents an analysis and comparison of various algorithms for widely used matrix decomposition techniques on various computer architectures. As a result, a parallel implementation of the one-sided Jacobi algorithm for computing the singular value decomposition (SVD) of a 2×2 matrix on a field programmable gate array (FPGA) is developed. The proposed SVD design is based on a pipelined-datapath architecture. The design process starts by evaluating the algorithm in Matlab, followed by design of the datapath unit and control unit, coding in SystemVerilog HDL, and verification and synthesis using Quartus II with simulation in ModelSim-Altera. Original matrices of size 4x4 and 8x8 are used to test the SVD processing element (PE). The results are compared with the Matlab version of the algorithm to evaluate the PE. The computation of the SVD can be sped up by a factor of more than 2 by increasing the number of PEs, at the cost of increased circuit area.
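
    As a reference for the algorithm named above, the following is a plain software sketch of the one-sided (Hestenes) Jacobi SVD in numpy. The matrix size, tolerance, and sweep limit are illustrative; the FPGA design operates on 2×2 sub-problems in a pipelined datapath, whereas this sketch simply applies the column-pair rotations until convergence.

```python
# One-sided (Hestenes) Jacobi SVD: rotate column pairs of A until all columns
# are mutually orthogonal; the column norms are then the singular values.
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    B = A.astype(float).copy()
    m, n = B.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = B[:, p] @ B[:, p]
                beta = B[:, q] @ B[:, q]
                gamma = B[:, p] @ B[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                converged = False
                # Rotation angle that orthogonalizes columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                rot = np.array([[c, s], [-s, c]])
                B[:, [p, q]] = B[:, [p, q]] @ rot   # rotate the working matrix
                V[:, [p, q]] = V[:, [p, q]] @ rot   # accumulate V
        if converged:
            break
    sigma = np.linalg.norm(B, axis=0)
    U = B / sigma
    return U, sigma, V

A = np.random.default_rng(1).normal(size=(8, 8))
U, s, V = one_sided_jacobi_svd(A)
print("max reconstruction error:", np.max(np.abs(U * s @ V.T - A)))
```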

    A Parallel Processor System for Nuclear Shell-Model Calculations

    This thesis describes the design and implementation of a dedicated parallel processor system for nuclear shell-model calculations. The purpose of these calculations is to determine nuclear energy eigenvalues by the tridiagonalisation of the nuclear Hamiltonian matrix using the Lanczos method. The Theoretical Nuclear Structure group at Glasgow University's Physics Department would normally perform this type of calculation on a high-performance mainframe computer. However, these machines have limitations which restrict the number and scope of the calculations that can be performed. The Shell Model Processor system consists of a Multiple Microprocessor Unit (MMPU) driven by a highly pipelined dedicated front-end processor. The MMPU has a modular, moderately coupled, MIMD architecture based on autonomous processing modules. The elements within the system communicate via three shared buses. The front-end is responsible for determining the position of non-zero elements within the Hamiltonian matrix. Once the position of an element has been found, it is passed to one of the free processing modules within the MMPU. The processing module then determines the value of the matrix element and performs the appropriate arithmetic to accumulate the resultant Lanczos vector. Two such processing modules have been developed. The most recently developed module is based on two MC68000 16/32-bit microprocessors. In addition, there are two supervisory processor modules, one of which controls the front-end and also assists it in its function. The other module has privileged system capabilities and is responsible for supervising the system as a whole. The system has been successfully tested and performance figures are presented. The future expansion of the system to allow it to perform larger calculations is also discussed.
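
    The Lanczos tridiagonalisation that the Shell Model Processor accelerates can be sketched compactly in numpy. The random symmetric matrix below stands in for the nuclear Hamiltonian (whose non-zero elements the front-end generates), and the iteration count is an arbitrary choice; the smallest Ritz value of the tridiagonal matrix approximates the lowest energy eigenvalue.

```python
# Lanczos iteration: reduce a symmetric matrix H to a small tridiagonal T
# whose extremal eigenvalues approximate those of H.
import numpy as np

def lanczos_eigvals(H, k, seed=0):
    n = H.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(n)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(k):
        w = H @ v - beta * v_prev          # three-term recurrence
        alpha = v @ w
        w -= alpha * v
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return np.linalg.eigvalsh(T)           # Ritz values

rng = np.random.default_rng(42)
A = rng.normal(size=(200, 200))
H = (A + A.T) / 2                          # stand-in symmetric "Hamiltonian"
print("lowest Lanczos Ritz value:", lanczos_eigvals(H, 50).min())
print("lowest exact eigenvalue:  ", np.linalg.eigvalsh(H).min())
```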

    Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA

    Background: In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods using free energy minimization. However, general-purpose computers, including parallel computers and multi-core computers, exhibit parallel efficiency of no more than 50%. Field Programmable Gate Array (FPGA) chips provide a new approach to accelerate RNAalifold by exploiting fine-grained custom design. Results: RNAalifold shows complicated data dependences, in which the dependence distance is variable and the dependence direction spans two dimensions. We propose a systolic array structure including one master Processing Element (PE) and multiple slave PEs for fine-grained hardware implementation on FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods to reduce the energy table parameter size by 80%. Conclusion: To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. The experimental results show a factor of 12.2 speedup over the RNAalifold software (ViennaPackage 1.6.5) for a group of aligned RNA sequences with 2981 residues, running on a Personal Computer (PC) platform with a Pentium 4 2.6 GHz CPU.
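
    The dependence pattern described in the Results section can be seen in a much simpler relative of RNAalifold: the Nussinov base-pair maximisation recurrence. Every cell (i, j) depends on cells for shorter subsequences, so the table fills diagonal by diagonal, which is the structure a systolic PE array pipelines. The sketch below is an illustrative stand-in, not the RNAalifold energy model or its consensus scoring over an alignment.

```python
# Nussinov base-pair maximisation: a simplified dynamic program whose
# diagonal-by-diagonal fill order mirrors the dependence structure that the
# systolic PE array exploits. Not the RNAalifold energy model.
def nussinov(seq, min_loop=3):
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # anti-diagonals: growing spans
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)   # pair i with j
            for k in range(i + 1, j):            # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))   # maximum number of base pairs
```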