769 research outputs found

    An Empirical Model of Packet Processing Delay of the Open vSwitch

    Full text link
    Network virtualization offers flexibility by decoupling virtual network from the underlying physical network. Software-Defined Network (SDN) could utilize the virtual network. For example, in Software-Defined Networks, the entire network can be run on commodity hardware and operating systems that use virtual elements. However, this could present new challenges of data plane performance. In this paper, we present an empirical model of the packet processing delay of a widely used OpenFlow virtual switch, the Open vSwitch. In the empirical model, we analyze the effect of varying Random Access Memory (RAM) and network parameters on the performance of the Open vSwitch. Our empirical model captures the non-network processing delays, which could be used in enhancing the network modeling and simulation

    A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

    Get PDF
    Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated light-weight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource-utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case-study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power-efficiency using our cost model based approach

    An Intermediate Language and Estimator for Automated Design Space Exploration on FPGAs

    Full text link
    We present the TyTra-IR, a new intermediate language intended as a compilation target for high-level language compilers and a front-end for HDL code generators. We develop the requirements of this new language based on the design-space of FPGAs that it should be able to express and the estimation-space in which each configuration from the design-space should be mappable in an automated design flow. We use a simple kernel to illustrate multiple configurations using the semantics of TyTra-IR. The key novelty of this work is the cost model for resource-costs and throughput for different configurations of interest for a particular kernel. Through the realistic example of a Successive Over-Relaxation kernel implemented both in TyTra-IR and HDL, we demonstrate both the expressiveness of the IR and the accuracy of our cost model.Comment: Pre-print and extended version of poster paper accepted at international symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2015) Boston, MA, USA, June 1-2, 201

    Digital implementation of the cellular sensor-computers

    Get PDF
    Two different kinds of cellular sensor-processor architectures are used nowadays in various applications. The first is the traditional sensor-processor architecture, where the sensor and the processor arrays are mapped into each other. The second is the foveal architecture, in which a small active fovea is navigating in a large sensor array. This second architecture is introduced and compared here. Both of these architectures can be implemented with analog and digital processor arrays. The efficiency of the different implementation types, depending on the used CMOS technology, is analyzed. It turned out, that the finer the technology is, the better to use digital implementation rather than analog

    Data path analysis for dynamic circuit specialisation

    Get PDF
    Dynamic Circuit Specialisation (DCS) is a method that exploits the reconfigurability of modern FPGAs to allow the specialisation of FPGA circuits at run-time. Currently, it is only explored as part of Register-transfer level design. However, at the Register-transfer level (RTL), a large part of the design is already locked in. Therefore, maximally exploiting the opportunities of DCS could require a costly redesign. It would be interesting to already have insight in the opportunities for DCS from the higher abstraction level. Moreover, the general design trend in FPGA design is to work on higher abstraction levels and let tool(s) translate this higher level description to RTL. This paper presents the first profiler that, based on the high-level description of an application, estimates the benefits of an implementation using DCS. This allows a designer to determine much earlier in the design cycle whether or not DCS would be interesting. The high-level profiling methodology was implemented and tested on a set of PID designs

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
    • 

    corecore