2,415 research outputs found

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    Pixel detector R&D for the Compact Linear Collider

    Full text link
    The physics aims at the proposed future CLIC high-energy linear e+e−e^+ e^- collider pose challenging demands on the performance of the detector system. In particular the vertex and tracking detectors have to combine precision measurements with robustness against the expected high rates of beam-induced backgrounds. A spatial resolution of a few microns and a material budget down to 0.2\% of a radiation length per vertex-detector layer have to be achieved together with a few nanoseconds time stamping accuracy. These requirements are addressed with innovative technologies in an ambitious detector R\&D programme, comprising hardware developments as well as detailed device and Monte Carlo simulations based on TCAD, Geant4 and Allpix-Squared. Various fine pitch hybrid silicon pixel detector technologies are under investigation for the CLIC vertex detector. The CLICpix and CLICpix2 readout ASICs with \SI{25}{\micro\meter} pixel pitch have been produced in a \SI{65}{\nano\meter} commercial CMOS process and bump-bonded to planar active edge sensors as well as capacitively coupled to High-Voltage (HV) CMOS sensors. Monolithic silicon tracking detectors are foreseen for the large surface (≈\approx \SI{140}{\meter\squared}) CLIC tracker. Fully monolithic prototypes are currently under development in High-Resistivity (HR) CMOS, HV-CMOS and Silicon on Insulator (SOI) technologies. The laboratory and beam tests of all recent prototypes profit from the development of the CaRIBou universal readout system. This talk presents an overview of the CLIC pixel-detector R\&D programme, focusing on recent test-beam and simulation results.Comment: On behalf of CLICdp collaboration, Conference proceedings for PIXEL201

    TROUTE : a reconfigurability-aware FPGA router

    Get PDF

    A Defect-tolerant Cluster in a Mesh SRAM-based FPGA

    No full text
    International audienceIn this paper, we propose the implementation of multiple defect-tolerant techniques on an SRAM-based FPGA. These techniques include redundancy at both the logic block and intra-cluster interconnect. In the logic block, redundancy is implemented at the multiplexer level. Its efficiency is analyzed by injecting a single defect at the output of a multiplexer, considering all possible locations and input combinations. While at the interconnect level, fine grain redundancy is introduced which not only bypasses defects but also increases routability. Taking advantage of the sparse intra-cluster interconnect structures, routability is further improved by efficient distribution of feedback paths allowing more flexibility in the connections among logic blocks. Emulation results show a significant improvement of about 15% and 34% in the robustness of logic block and intra-cluster interconnect respectively. Furthermore, the impact of these hardening schemes on the testability of the FPGA cluster for manufacturing defects is also investigated in terms of maximum achievable fault coverage and the respective cost

    Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm

    Get PDF
    An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however it is compute intensive and is NP-complete in terms of obtaining an optimal solution. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible due to time to market constraints or sheer compute effort required. Heuristic algorithms allow for efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA). The goal of this work was to develop a model that would enable an analysis into the feasibility of developing a hardware accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the possibility to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware accelerated model. A large speedup was analytically shown from both accelerating the critical path of the SA algorithm as well as novel methods of improving SA\u27s efficiency. As data throughput requirements were not included in this work, the results presented may be optimistic for an overall system speedup. However, the results clearly show that future work is warranted in studying the concept of a hardware accelerated placement system

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    A Micro Power Hardware Fabric for Embedded Computing

    Get PDF
    Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor
    • …
    corecore