685 research outputs found

    A Quantitative Analysis of the Memory Architecture of FPGA-SoCs

    Get PDF
    In recent years, so called FPGA-SoCs have been introduced by Intel (formerly Altera) and Xilinx. These devices combine multi-core processors with programmable logic. This paper analyzes the various memory and communication interconnects found in actual devices, particularly the Zynq-7020 and Zynq-7045 from Xilinx and the Cyclone V SE SoC from Intel. Issues such as different access patterns, cache coherence and full-duplex communication are analyzed, for both generic accesses as well as for a real workload from the field of video coding. Furthermore, the paper shows that by carefully choosing the memory interconnect networks as well as the software interface, high-speed memory access can be achieved for various scenarios

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    High performance memory accesses on FPGA-SoCs : a quantitative analysis

    Get PDF
    FPGA-SoCs like Xilinx's Zynq-7000 and Altera's Generation 10 SoCs provide an integrated platform for HW/SW-co design applications. Computationally complex tasks can be implemented in the programmable logic part while control logic is implemented on the CPU. A potential bottleneck in such approaches is the interface latency and the data transfer throughput. Especially the data transfer to and from the memory subsystems can decrease the achievable performance significantly. Therefore, an analysis of the according subsystems of the Zynq-7000 has been performed in order to estimate the possible performance of HW/SW-codesigns with a special focus on two-dimensional memory accesses

    Study of heterogeneous and reconfigurable architectures in the communication domain

    Get PDF
    One of the most challenging design issues for next generations of (mobile) communication systems is fulfilling the computational demands while finding an appropriate trade-off between flexibility and implementation aspects, especially power consumption. Flexibility of modern architectures is desirable, e.g. concerning adaptation to new standards and reduction of time-to-market of a new product. Typical target architectures for future communication systems include embedded FPGAs, dedicated macros as well as programmable digital signal and control oriented processor cores as each of these has its specific advantages. These will be integrated as a System-on-Chip (SoC). For such a heterogeneous architecture a design space exploration and an appropriate partitioning plays a crucial role.</p><p style=&quot;line-height: 20px;&quot;> On the exemplary vehicle of a Viterbi decoder as frequently used in communication systems we show which costs in terms of <i>ATE</i> complexity arise implementing typical components on different types of architecture blocks. A factor of about seven orders of magnitude spans between a physically optimised implementation and an implementation on a programmable DSP kernel. An implementation on an embedded FPGA kernel is in between these two representing an attractive compromise with high flexibility and low power consumption. Extending this comparison to further components, it is shown quantitatively that the cost ratio between different implementation alternatives is closely related to the operation to be performed. This information is essential for the appropriate partitioning of heterogeneous systems

    A Methodology for Predicting Application-Specific Achievable Memory Bandwidth for HW/SW-Codesign

    Get PDF
    The trend of using heterogeneous computing and HW/SW-Codesign approaches allows increasing performance significantly while reducing power consumption. One of the main challenges when combining multiple processing devices is the communication, as an inefficient communication configuration can pose a bottleneck to the overall system performance. To address this problem, we present a methodology that assists the designer in making good design decisions for systems using shared DDR memory for communication. Our methodology analyzes a software implementation of the application and subsequently predicts the memory accesses of a functionally equivalent hardware implementation of the selected function. We furthermore propose an IP core that can perform these predicted memory accesses to estimate the achievable memory bandwidth between a functionally equivalent hardware implementation and shared memory. The resulting achievable memory bandwidth estimations differ by less than 2% from the actual achievable memory bandwidth of a functionally equivalent hardware implementation, demonstrating the feasibility of the presented methodology

    Automatic synthesis of application-specific processors

    Get PDF
    Thesis (D. Tech. (Engineering: Electrical)) -- Central University of technology, Free State, 2012This thesis describes a method for the automatic generation of appli- cation speci_c processors. The thesis was organized into three sepa- rate but interrelated studies, which together provide: a justi_cation for the method used, a theory that supports the method, and a soft- ware application that realizes the method. The _rst study looked at how modern day microprocessors utilize their hardware resources and it proposed a metric, called core density, for measuring the utilization rate. The core density is a function of the microprocessor's instruction set and the application scheduled to run on that microprocessor. This study concluded that modern day microprocessors use their resources very ine_ciently and proposed the use of subset processors to exe- cute the same applications more e_ciently. The second study sought to provide a theoretical framework for the use of subset processors by developing a generic formal model of computer architecture. To demonstrate the model's versatility, it was used to describe a number of computer architecture components and entire computing systems. The third study describes the development of a set of software tools that enable the automatic generation of application speci_c proces- sors. The FiT toolkit automatically generates a unique Hardware Description Language (HDL) description of a processor based on an application binary _le and a parameterizable template of a generic mi- croprocessor. Area-optimized and performance-optimized custom soft processors were generated using the FiT toolkit and the utilization of the hardware resources by the custom soft processors was character- ized. The FiT toolkit was combined with an ANSI C compiler and a third-party tool for programming _eld-programmable gate arrays (FPGAs) to create an unconstrained C-to-silicon compiler
    • 

    corecore