1,256 research outputs found

    A case study for NoC based homogeneous MPSoC architectures

    Get PDF
    The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, after prototyping with field-programmable gate array (FPGA), has been laid out in 90-nm technology. Post-layout results show very low power, area, as well as 500 MHz of clock frequency. Results show that an array of small and simple processors outperform a single high-end general purpose processo

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Get PDF
    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    An Implementation of a Dual-Processor System on FPGA

    Full text link
    In recent years, Field-Programmable Gate Arrays (FPGA) have evolved rapidly paving the way for a whole new range of computing paradigms. On the other hand, computer applications are evolving. There is a rising demand for a system that is general-purpose and yet has the processing abilities to accommodate current trends in application processing. This work proposes a design and implementation of a tightly-coupled FPGA-based dual-processor platform. We architect a platform that optimizes the utilization of FPGA resources and allows for the investigation of practical implementation issues such as cache design. The performance of the proposed prototype is then evaluated, as different configurations of a uniprocessor and a dual-processor system are studied and compared against each other and against published results for common industry-standard CPU platforms. The proposed implementation utilizes the Nios II 32-bit embedded soft-core processor architecture designed for the Altera Cyclone III family of FPGAs

    Partitioning of computationally intensive tasks between FPGA and CPUs

    Get PDF
    With the recent development of faster and more complex Multiprocessor System-on-Cips (MPSoCs), a large number of different resources have become available on a single chip. For example, Xilinx's UltraScale+ is a powerful MPSoC with four ARM Cortex-A53 CPUs, two Cortex-R5 real-time cores, an FPGA fabric and a Mali-400 GPU. Optimal partitioning between CPUs, real-time cores, GPU and FPGA is therefore a challenge. For many scientific applications with high sampling rates and real-time signal analysis, an FFT needs to be calculated and analyzed directly in the measuring device. The goal of partitioning such an FFT in an MPSoC is to make best use of the available resources, to minimize latency and to optimize performance. The paper compares different partitioning designs and discusses their advantages and disadvantages. Measurement results with up to 250 MSamples per second are shown

    FPGA Based Embedded Multiprocessor Architecture

    Get PDF
    Multiprocessor is a typical subject within the Computer architecture field of scope. A new methodology based on practical sessions with real devices and design is proposed. Embedded multiprocessor design presents challenges and opportunities that stem from task coarse granularity and the large number of inputs and outputs for each task. We have therefore designed a new architecture called embedded concurrent computing (ECC), which is implementing on FPGA chip using VHDL. The design methodology is expected to allow scalable embedded multiprocessors for system expansion. In recent decades, two forces have driven the increase of the processor performance: Advances in very large-scale integration (VLSI) technology and Micro architectural enhancements. Therefore, we aim to design the full architecture of an embedded processor for realistic to perform arithmetic, logical, shifting and branching operations. We will be synthesize and evaluated the embedded system based on Xilinx environment. Processor performance is going to be improving through clock speed increases and the clock speed increases and the exploitation of instruction- level parallelism. We will be designing embedded multiprocessor based on Xilinx environment or Modelsim environment

    A real-time asymmetric multiprocessor-reconfigurable system-on-chip architecture

    Get PDF
    We propose an asymmetric multi-processor SoC architecture, featuring a master CPU running uClinux, and multiple loosely-coupled slave CPUs running real-time threads assigned by the master CPU. Real-time SoC architectures often demand a compromise between a generic platform for different applications, and application-specific customizations to achieve performance requirements. Our proposed architecture offers a generic platform running a conventional embedded operating system providing a traditional software-oriented development approach, while multiple slave CPUs act as a dedicated independent real-time threads execution unit running in parallel of master CPU to achieve performance requirements. In this paper, the architecture is described, including the application / threading development environment. The performance of the architecture with several standard benchmark routines is also analysed
    • ā€¦
    corecore