3,058 research outputs found

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Power Estimation Technique for DSP Architectures.

    Get PDF
    The main goal of power estimation is to optimize the power consumption of a electronic design. Power is a strongly pattern dependent function. Input statistics greatly influence on average power. We solve the pattern dependence problem for intellectual property (IP) designs. In this paper, we present a power macro-modeling technique for digital signal processing (DSP) architectures in terms of the statistical knowledge of their primary inputs. During the power estimation procedure, the sequence of an input stream is generated by a genetic algorithm using input metrics. Then, a Monte Carlo zero delay simulation is performed and a power dissipation macro-model function is built from power dissipation results. From then on, this macro-model function can be used to estimate power dissipation of the system just by using the statistics of the macro-block’s primary in puts. In experiments with the DSP system, the average error is 26%

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    SIM-DSP: A DSP-Enhanced CAD Platform for Signal Integrity Macromodeling and Simulation

    Get PDF
    Macromodeling-Simulation process for signal integrity verifications has become necessary for the high speed circuit system design. This paper aims to introduce a “VLSI Signal Integrity Macromodeling and Simulation via Digital Signal Processing Techniques” framework (known as SIM-DSP framework), which applies digital signal processing techniques to facilitate the SI verification process in the pre-layout design phase. Core identification modules and peripheral (pre-/post-)processing modules have been developed and assembled to form a verification flow. In particular, a single-step discrete cosine transform truncation (DCTT) module has been developed for modeling-simulation process. In DCTT, the response modeling problem is classified as a signal compression problem, wherein the system response can be represented by a truncated set of non-pole based DCT bases, and error can be analyzed through Parseval’s theorem. Practical examples are given to show the applicability of our proposed framework

    High-level power optimisation for Digital Signal Processing in Recon gurable Logic

    No full text
    This thesis is concerned with the optimisation of Digital Signal Processing (DSP) algorithm implementations on recon gurable hardware via the selection of appropriate word-lengths for the signals in these algorithms, in order to minimise system power consumption. Whilst existing word-length optimisation work has concentrated on the minimisation of the area of algorithm implementations, this work introduces the rst set of power consumption models that can be evaluated quickly enough to be used within the search of the enormous design space of multiple word-length optimisation problems. These models achieve their speed by estimating both the power consumed within the arithmetic components of an algorithm and the power in the routing wires that connect these components, using only a high-level description of the algorithm itself. Trading o a small reduction in power model accuracy for a large increase in speed is one of the major contributions of this thesis. In addition to the work on power consumption modelling, this thesis also develops a new technique for selecting the appropriate word-lengths for an algorithm implementation in order to minimise its cost in terms of power (or some other metric for which models are available). The method developed is able to provide tight lower and upper bounds on the optimal cost that can be obtained for a particular word-length optimisation problem and can, as a result, nd provably near-optimal solutions to word-length optimisation problems without resorting to an NP-hard search of the design space. Finally the costs of systems optimised via the proposed technique are compared to those obtainable by word-length optimisation for minimisation of other metrics (such as logic area) and the results compared, providing greater insight into the nature of wordlength optimisation problems and the extent of the improvements obtainable by them

    Modeling and Mapping of Optimized Schedules for Embedded Signal Processing Systems

    Get PDF
    The demand for Digital Signal Processing (DSP) in embedded systems has been increasing rapidly due to the proliferation of multimedia- and communication-intensive devices such as pervasive tablets and smart phones. Efficient implementation of embedded DSP systems requires integration of diverse hardware and software components, as well as dynamic workload distribution across heterogeneous computational resources. The former implies increased complexity of application modeling and analysis, but also brings enhanced potential for achieving improved energy consumption, cost or performance. The latter results from the increased use of dynamic behavior in embedded DSP applications. Furthermore, parallel programming is highly relevant in many embedded DSP areas due to the development and use of Multiprocessor System-On-Chip (MPSoC) technology. The need for efficient cooperation among different devices supporting diverse parallel embedded computations motivates high-level modeling that expresses dynamic signal processing behaviors and supports efficient task scheduling and hardware mapping. Starting with dynamic modeling, this thesis develops a systematic design methodology that supports functional simulation and hardware mapping of dynamic reconfiguration based on Parameterized Synchronous Dataflow (PSDF) graphs. By building on the DIF (Dataflow Interchange Format), which is a design language and associated software package for developing and experimenting with dataflow-based design techniques for signal processing systems, we have developed a novel tool for functional simulation of PSDF specifications. This simulation tool allows designers to model applications in PSDF and simulate their functionality, including use of the dynamic parameter reconfiguration capabilities offered by PSDF. With the help of this simulation tool, our design methodology helps to map PSDF specifications into efficient implementations on field programmable gate arrays (FPGAs). Furthermore, valid schedules can be derived from the PSDF models at runtime to adapt hardware configurations based on changing data characteristics or operational requirements. Under certain conditions, efficient quasi-static schedules can be applied to reduce overhead and enhance predictability in the scheduling process. Motivated by the fact that scheduling is critical to performance and to efficient use of dynamic reconfiguration, we have focused on a methodology for schedule design, which complements the emphasis on automated schedule construction in the existing literature on dataflow-based design and implementation. In particular, we have proposed a dataflow-based schedule design framework called the dataflow schedule graph (DSG), which provides a graphical framework for schedule construction based on dataflow semantics, and can also be used as an intermediate representation target for automated schedule generation. Our approach to applying the DSG in this thesis emphasizes schedule construction as a design process rather than an outcome of the synthesis process. Our approach employs dataflow graphs for representing both application models and schedules that are derived from them. By providing a dataflow-integrated framework for unambiguously representing, analyzing, manipulating, and interchanging schedules, the DSG facilitates effective codesign of dataflow-based application models and schedules for execution of these models. As multicore processors are deployed in an increasing variety of embedded image processing systems, effective utilization of resources such as multiprocessor systemon-chip (MPSoC) devices, and effective handling of implementation concerns such as memory management and I/O become critical to developing efficient embedded implementations. However, the diversity and complexity of applications and architectures in embedded image processing systems make the mapping of applications onto MPSoCs difficult. We help to address this challenge through a structured design methodology that is built upon the DSG modeling framework. We refer to this methodology as the DEIPS methodology (DSG-based design and implementation of Embedded Image Processing Systems). The DEIPS methodology provides a unified framework for joint consideration of DSG structures and the application graphs from which they are derived, which allows designers to integrate considerations of parallelization and resource constraints together with the application modeling process. We demonstrate the DEIPS methodology through cases studies on practical embedded image processing systems

    High Performance Computing via High Level Synthesis

    Get PDF
    As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry. FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain. Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion. Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices

    SdrLift: A Domain-Specific Intermediate Hardware Synthesis Framework for Prototyping Software-Defined Radios

    Get PDF
    Modern design of Software-Defined Radio (SDR) applications is based on Field Programmable Gate Arrays (FPGA) due to their ability to be configured into solution architectures that are well suited to domain-specific problems while achieving the best trade-off between performance, power, area, and flexibility. FPGAs are well known for rich computational resources, which traditionally include logic, register, and routing resources. The increased technological advances have seen FPGAs incorporating more complex components that comprise sophisticated memory blocks, Digital Signal Processing (DSP) blocks, and high-speed interfacing to Gigabit Ethernet (GbE) and Peripheral Component Interconnect Express (PCIe) bus. Gateware for programming FPGAs is described at a lowlevel of design abstraction using Register Transfer Language (RTL), typically using either VHSIC-HDL (VHDL) or Verilog code. In practice, the low-level description languages have a very steep learning curve, provide low productivity for hardware designers and lack readily available open-source library support for fundamental designs, and consequently limit the design to only hardware experts. These limitations have led to the adoption of High-Level Synthesis (HLS) tools that raise design abstraction using syntax, semantics, and software development notations that are well-known to most software developers. However, while HLS has made programming of FPGAs more accessible and can increase the productivity of design, they are still not widely adopted in the design community due to the low-level skills that are still required to produce efficient designs. Additionally, the resultant RTL code from HLS tools is often difficult to decipher, modify and optimize due to the functionality and micro-architecture that are coupled together in a single High-Level Language (HLL). In order to alleviate these problems, Domain-Specific Languages (DSL) have been introduced to capture algorithms at a high level of abstraction with more expressive power and providing domain-specific optimizations that factor in new transformations and the trade-off between resource utilization and system performance. The problem of existing DSLs is that they are designed around imperative languages with an instruction sequence that does not match the hardware structure and intrinsics, leading to hardware designs with system properties that are unconformable to the high-level specifications and constraints. The aim of this thesis is, therefore, to design and implement an intermediatelevel framework namely SdrLift for use in high-level rapid prototyping of SDR applications that are based on an FPGA. The SdrLift input is a HLL developed using functional language constructs and design patterns that specify the structural behavior of the application design. The functionality of the SdrLift language is two-fold, first, it can be used directly by a designer to develop the SDR applications, secondly, it can be used as the Intermediate Representation (IR) step that is generated by a higher-level language or a DSL. The SdrLift compiler uses the dataflow graph as an IR to structurally represent the accelerator micro-architecture in which the components correspond to the fine-level and coarse-level Hardware blocks (HW Block) which are either auto-synthesized or integrated from existing reusable Intellectual Property (IP) core libraries. Another IR is in the form of a dataflow model and it is used for composition and global interconnection of the HW Blocks while making efficient interfacing decisions in an attempt to satisfy speed and resource usage objectives. Moreover, the dataflow model provides rules and properties that will be used to provide a theoretical framework that formally analyzes the characteristics of SDR applications (i.e. the throughput, sample rate, latency, and buffer size among other factors). Using both the directed graph flow (DFG) and the dataflow model in the SdrLift compiler provides two benefits: an abstraction of the microarchitecture from the high-level algorithm specifications and also decoupling of the microarchitecture from the low-level RTL implementation. Following the IR creation and model analyses is the VHDL code generation which employs the low-level optimizations that ensure optimal hardware design results. The code generation process per forms analysis to ensure the resultant hardware system conforms to the high-level design specifications and constraints. SdrLift is evaluated by developing representative SDR case studies, in which the VHDL code for eight different SDR applications is generated. The experimental results show that SdrLift achieves the desired performance and flexibility, while also conserving the hardware resources utilized
    • …
    corecore