51 research outputs found

    Vector support for multicore processors with major emphasis on configurable multiprocessors

    It has recently become increasingly difficult to build higher-speed uniprocessor chips because of performance degradation and high power consumption. The quadratically increasing circuit complexity forbade the exploration of more instruction-level parallelism (ILP). To continue raising performance, processor designers then focused on thread-level parallelism (TLP) to realize a new architecture design paradigm. Multicore processor design is the result of this trend. It has proven quite capable of increasing performance and provides new opportunities in power management and system scalability. But current multicore processors do not provide powerful vector architecture support, which could yield significant speedups for array operations while maintaining area/power efficiency. This dissertation proposes and presents the realization of an FPGA-based prototype of a multicore architecture with a shared vector unit (MCwSV). FPGA stands for Field-Programmable Gate Array. The idea is that rather than improving only scalar or TLP performance, some hardware budget could be used to realize a vector unit to greatly speed up applications abundant in data-level parallelism (DLP). To be realistic, limited by the parallelism in the application itself and by the compiler's vectorizing abilities, most general-purpose programs can only be partially vectorized. Thus, for efficient resource usage, one vector unit should be shared by several scalar processors. This approach could also keep the overall budget within acceptable limits. We suggest that this type of vector-unit sharing be established in future multicore chips. The design, implementation and evaluation of an MCwSV system with two scalar processors and a shared vector unit are presented for FPGA prototyping.
The MicroBlaze processor, which is a commercial IP (Intellectual Property) core from Xilinx, is used as the scalar processor; in the experiments the vector unit is connected to a pair of MicroBlaze processors through standard bus interfaces. The overall system is organized in a decoupled and multi-banked structure. This organization provides substantial system scalability and better vector performance. For a given area budget, benchmarks from several areas show that the MCwSV system can provide a significant performance increase compared to a multicore system without a vector unit. However, an MCwSV system with two MicroBlazes and a shared vector unit is not always the optimal configuration for applications with different percentages of vectorization. On the other hand, the MCwSV framework was designed for easy scalability, so that it can incorporate various numbers of scalar/vector units and various function units. Also, the flexibility inherent to FPGAs can aid the task of matching target applications. These benefits can be taken into account to create optimized MCwSV systems for various applications. So the work eventually focused on building an architecture design framework incorporating performance and resource management for application-specific MCwSV (AS-MCwSV) systems. For embedded system design, resource usage, power consumption and execution latency are three metrics to be used in design tradeoffs. The product of these metrics is used here to choose the MCwSV system with the smallest value
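
The selection rule described in the abstract above (multiply resource usage, power consumption and execution latency, then pick the configuration with the smallest product) can be sketched as follows. This is only an illustration: the `Config` class, the configuration names, the units and every number are hypothetical and do not come from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate system configuration (all values are made up)."""
    name: str
    resource_usage: int  # hypothetical unit: FPGA slices
    power_mw: int        # hypothetical unit: milliwatts
    latency_ms: int      # hypothetical unit: milliseconds

    def cost(self) -> int:
        # Product of the three metrics; smaller is better.
        return self.resource_usage * self.power_mw * self.latency_ms


def best_config(configs):
    """Return the configuration with the smallest metric product."""
    return min(configs, key=lambda c: c.cost())


# Hypothetical candidates, for illustration only.
candidates = [
    Config("2 scalar + 1 vector", 12000, 1800, 4),
    Config("4 scalar + 1 vector", 18000, 2400, 3),
    Config("2 scalar, no vector", 8000, 1200, 10),
]

if __name__ == "__main__":
    for c in candidates:
        print(c.name, c.cost())
    print("selected:", best_config(candidates).name)
```

A plain product weights the three metrics equally; a design flow that cares more about one of them would need a different cost function.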

    OPTIMIZATION OF FPGA-BASED PROCESSOR ARCHITECTURE FOR SOBEL EDGE DETECTION OPERATOR

    This dissertation introduces an optimized processor architecture for the Sobel edge detection operator on field programmable gate arrays (FPGAs). The processor applies several optimization techniques that aim to increase throughput and reduce logic utilization and memory usage. FPGAs offer high levels of parallelism, which the processor exploits to implement edge detection as a parallel process. To achieve this, the proposed processor consists of several Sobel instances that are able to produce multiple output pixels in parallel. This parallelism enables data reuse within the processor block. Moreover, the processor gains performance by a factor equal to the number of instances contained in the processor block. A processor that consists of one row of Sobel instances exploits data reuse within one image line in the calculation of the horizontal gradient. Data reuse within one and multiple image lines is enabled by using a processor with multiple rows of Sobel instances, which allow the reuse of both the horizontal and vertical gradients. By applying these optimization techniques, the proposed Sobel processor is able to meet real-time performance constraints thanks to its high throughput, even at a considerably low clock frequency. In addition, logic utilization of the processor is low compared to other Sobel processors when implemented on ALTERA Cyclone II DE2-70
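
For readers unfamiliar with the operator, a minimal software sketch of Sobel edge detection is given below. It is not the dissertation's hardware architecture: it is a sequential reference using the standard 3x3 Sobel kernels, and the `|Gx| + |Gy|` magnitude approximation and zero-valued border are assumptions made here for brevity. Adjacent output pixels read overlapping 3x3 windows, which is the overlap a parallel implementation can reuse.

```python
# Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel(image):
    """Return the |Gx| + |Gy| edge magnitude for a 2-D list of pixels.

    Border pixels are left at 0 (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * pixel
                    gy += GY[dy][dx] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    # A 4x4 test image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel(image):
        print(row)
```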

    Efficient design and implementation of image processing algorithms on reconfigurable hardware using Handel-C

    Computer manipulation of images is generally defined as Digital Image Processing (DIP). DIP is used in a variety of applications, including video surveillance, target recognition, and image enhancement. These applications are usually implemented in software but may use special-purpose hardware for speed. With advances in VLSI technology, hardware implementation has become an attractive alternative. Assigning complex computation tasks to hardware and exploiting the parallelism and pipelining in algorithms yields significant speedups in running time. In this thesis, image processing algorithms such as the median filter, basic morphological operators, convolution and edge detection are implemented on an FPGA. A pipelined architecture for these algorithms is presented. The proposed architectures are capable of producing one output on every clock cycle. The hardware modeling was accomplished using Handel-C (DK2 environment). The algorithms were tested on standard image processing benchmarks and the results are compared with those obtained in software

    Application of advanced on-board processing concepts to future satellite communications systems

    An initial definition of on-board processing requirements for an advanced satellite communications system to service domestic markets in the 1990's is presented. An exemplar system architecture with both RF on-board switching and demodulation/remodulation baseband processing was used to identify important issues related to system implementation, cost, and technology development

    A distributed control microprocessor system

    Imperial Users onl

    A Parallel Processor System for Nuclear Shell-Model Calculations

    This thesis describes the design and implementation of a dedicated parallel processor system for nuclear shell-model calculations. The purpose of these calculations is to determine nuclear energy eigenvalues by the tridiagonalisation of the nuclear Hamiltonian matrix using the Lanczos method. The Theoretical Nuclear Structure group at Glasgow University's Physics Department would normally perform this type of calculation on a high-performance mainframe computer. However, these machines have limitations which restrict the number and scope of the calculations that can be performed. The Shell Model Processor system consists of a Multiple Microprocessor Unit (MMPU) driven by a highly pipelined dedicated front-end processor. The MMPU has a modular, moderately coupled, MIMD architecture based on autonomous processing modules. The elements within the system communicate via three shared buses. The front-end is responsible for determining the position of non-zero elements within the Hamiltonian matrix. Once the position of an element has been found, it is passed to one of the free processing modules within the MMPU. The processing module then determines the value of the matrix element and performs the appropriate arithmetic to accumulate the resultant Lanczos vector. Two such processing modules have been developed. The most recently developed module is based on two MC68000 16/32-bit microprocessors. In addition there are two supervisory processor modules, one of which controls the front-end and also assists it in its function. The other module has privileged system capabilities and is responsible for supervising the system as a whole. The system has been successfully tested and performance figures are presented. The future expansion of the system to allow it to perform larger calculations is also discussed
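
As background to the abstract above, the Lanczos method reduces a symmetric matrix to tridiagonal form using only matrix-vector products, which is why the work can be split into locating non-zero elements and accumulating the resulting vector. The sketch below is a textbook sequential version in plain Python, without reorthogonalisation; the function names and the small test matrix are illustrative assumptions, not taken from the thesis.

```python
import math


def lanczos(matvec, v0, steps):
    """Run up to `steps` Lanczos iterations on a symmetric operator.

    `matvec(v)` must return the matrix-vector product as a list.
    Returns (alphas, betas): the diagonal and off-diagonal entries
    of the resulting tridiagonal matrix.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    while True:
        w = matvec(v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        # Remove the components along the current and previous vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        # Stop at the step limit or when the Krylov space is exhausted.
        if len(alphas) == steps or beta < 1e-12:
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas


def dense_matvec(matrix):
    """Wrap a dense list-of-lists matrix as a matvec function."""
    return lambda v: [sum(a * x for a, x in zip(row, v)) for row in matrix]


if __name__ == "__main__":
    hamiltonian = [[2.0, 1.0], [1.0, 3.0]]  # tiny illustrative matrix
    print(lanczos(dense_matvec(hamiltonian), [1.0, 0.0], steps=2))
```

The eigenvalues of the tridiagonal matrix built from `alphas` and `betas` approximate those of the original matrix; in floating point, long runs generally need reorthogonalisation, which this sketch omits.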

    Home Automation and Transparent Data Transmission Using Single-Medium Network Concept

    The purpose of this thesis is to present a new universal communication network for transparent data transmission and control applications used in home automation.
The communication platform, called Wiseriver, is a ubiquitous wired twisted-pair network designed to meet all kinds of individual data transmission needs in homes and buildings. The technology is based on configurable, protocol-independent communication resources called virtual wires. The thesis began with a general survey of related technologies already on the market, followed by a more specific introduction to the transmission principles used in the operation of the Wiseriver system. The main contribution of this thesis was the implementation of these Wiseriver functions on an FPGA. The implementation included RTL coding in VHDL, functional simulation and logic synthesis. Two different but similar FPGA designs are used as controllers in the master and access node prototype components of Wiseriver. A complete Wiseriver system prototype will in turn be used as groundwork for developing a pilot system. The outcome of the simulation and debugging process was a base design that can transmit Ethernet-based traffic transparently and handle a simple light control application. Simulation results and timing analysis reports indicate that the design works in the completed prototype hardware. Other related developments, such as PCB layout and software design, are ongoing during the prototype phase of the whole system. Several follow-up developments have also already been considered for improving the system

    Hardware/Software Co-Design via Specification Refinement

    System-level design is an engineering discipline focused on producing methods, technologies, and tools that enable the specification, design, and implementation of complex, multi-discipline, and multi-domain systems. System-level specifications are as abstract as possible, defining required system behaviors while eliding implementation details. These implementation details must be added during the implementation process, and the high effort associated with this locks system engineers into the chosen implementation architecture. This work provides two contributions that ease the implementation process. The Rosetta synthesis capability generates hardware/software co-designed implementations from specifications that contain low-level implementation details. The Rosetta refinement capability extends this by allowing a system's functional behavior and its implementation details to be described separately. The Rosetta Refinement Tool combines the functional behavior and the implementation details to form a system specification that can be synthesized using the Rosetta synthesis capability. The Rosetta refinement capability is exposed using existing Rosetta language constructs that had not been exploited prior to this work. Together these two capabilities allow the refinement of high-level, architecture-independent specifications into low-level, architecture-specific hardware/software co-designed implementations. The result is an effective platform for rapid prototyping of hardware/software co-designs that gives system engineers the novel ability to explore different system architectures with low effort

    State of the art survey of technologies applicable to NASA's aeronautics, avionics and controls program

    The state of the art survey (SOAS) covers six technology areas including flightpath management, aircraft control system, crew station technology, interface & integration technology, military technology, and fundamental technology. The SOAS included contributions from over 70 individuals in industry, government, and the universities