692 research outputs found

    Baseband analog front-end and digital back-end for reconfigurable multi-standard terminals

    Get PDF
    Multimedia applications are driving wireless network operators to add high-speed data services such as Edge (E-GPRS), WCDMA (UMTS) and WLAN (IEEE 802.11a,b,g) to the existing GSM network. This creates the need for multi-mode cellular handsets that support a wide range of communication standards, each with a different RF frequency, signal bandwidth, modulation scheme etc. This in turn generates several design challenges for the analog and digital building blocks of the physical layer. In addition to the above-mentioned protocols, mobile devices often include Bluetooth, GPS, FM-radio and TV services that can work concurrently with data and voice communication. Multi-mode, multi-band, and multi-standard mobile terminals must satisfy all these different requirements. Sharing and/or switching transceiver building blocks in these handsets is mandatory in order to extend battery life and/or reduce cost. Only adaptive circuits that are able to reconfigure themselves within the handover time can meet the design requirements of a single receiver or transmitter covering all the different standards while ensuring seamless inter-interoperability. This paper presents analog and digital base-band circuits that are able to support GSM (with Edge), WCDMA (UMTS), WLAN and Bluetooth using reconfigurable building blocks. The blocks can trade off power consumption for performance on the fly, depending on the standard to be supported and the required QoS (Quality of Service) leve

    High Performance Computing via High Level Synthesis

    Get PDF
    As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry. FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain. Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion. Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    Get PDF
    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

    FPGA based technical solutions for high throughput data processing and encryption for 5G communication: A review

    Get PDF
    The field programmable gate array (FPGA) devices are ideal solutions for high-speed processing applications, given their flexibility, parallel processing capability, and power efficiency. In this review paper, at first, an overview of the key applications of FPGA-based platforms in 5G networks/systems is presented, exploiting the improved performances offered by such devices. FPGA-based implementations of cloud radio access network (C-RAN) accelerators, network function virtualization (NFV)-based network slicers, cognitive radio systems, and multiple input multiple output (MIMO) channel characterizers are the main considered applications that can benefit from the high processing rate, power efficiency and flexibility of FPGAs. Furthermore, the implementations of encryption/decryption algorithms by employing the Xilinx Zynq Ultrascale+MPSoC ZCU102 FPGA platform are discussed, and then we introduce our high-speed and lightweight implementation of the well-known AES-128 algorithm, developed on the same FPGA platform, and comparing it with similar solutions already published in the literature. The comparison results indicate that our AES-128 implementation enables efficient hardware usage for a given data-rate (up to 28.16 Gbit/s), resulting in higher efficiency (8.64 Mbps/slice) than other considered solutions. Finally, the applications of the ZCU102 platform for high-speed processing are explored, such as image and signal processing, visual recognition, and hardware resource management

    Reconfigurable acceleration of Recurrent Neural Networks

    Get PDF
    Recurrent Neural Networks (RNNs) have been successful in a wide range of applications involving temporal sequences such as natural language processing, speech recognition and video analysis. However, RNNs often require a significant amount of memory and computational resources. In addition, the recurrent nature and data dependencies in RNN computations can lead to system stall, resulting in low throughput and high latency. This work describes novel parallel hardware architectures for accelerating RNN inference using Field-Programmable Gate Array (FPGA) technology, which considers the data dependencies and high computational costs of RNNs. The first contribution of this thesis is a latency-hiding architecture that utilizes column-wise matrix-vector multiplication instead of the conventional row-wise operation to eliminate data dependencies and improve the throughput of RNN inference designs. This architecture is further enhanced by a configurable checkerboard tiling strategy which allows large dimensions of weight matrices, while supporting element-based parallelism and vector-based parallelism. The presented reconfigurable RNN designs show significant speedup over CPU, GPU, and other FPGA designs. The second contribution of this thesis is a weight reuse approach for large RNN models with weights stored in off-chip memory, running with a batch size of one. A novel blocking-batching strategy is proposed to optimize the throughput of large RNN designs on FPGAs by reusing the RNN weights. Performance analysis is also introduced to enable FPGA designs to achieve the best trade-off between area, power consumption and performance. Promising power efficiency improvement has been achieved in addition to speeding up over CPU and GPU designs. The third contribution of this thesis is a low latency design for RNNs based on a partially-folded hardware architecture. It also introduces a technique that balances initiation interval of multi-layer RNN inferences to increase hardware efficiency and throughput while reducing latency. The approach is evaluated on a variety of applications, including gravitational wave detection and Bayesian RNN-based ECG anomaly detection. To facilitate the use of this approach, we open source an RNN template which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools.Open Acces

    Circuit design and analysis for on-FPGA communication systems

    No full text
    On-chip communication system has emerged as a prominently important subject in Very-Large- Scale-Integration (VLSI) design, as the trend of technology scaling favours logics more than interconnects. Interconnects often dictates the system performance, and, therefore, research for new methodologies and system architectures that deliver high-performance communication services across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable Gate Array (FPGA), as a type of ASIC where the hardware can be programmed post-fabrication. Communication across an FPGA will be deteriorating as a result of interconnect scaling. The programmable fabrics, switches and the specific routing architecture also introduce additional latency and bandwidth degradation further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs. Communication with programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems that are built on top of programmable fabrics and proposes methodologies to maximize the interconnect throughput performance. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestions in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves the interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures. This new scheme can potentially mitigate the interconnect scaling challenges. (iii) a novel Dynamic Programming (DP)-network to provide adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime optimization for route planning and dynamic routing which, effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance the on-FPGA communication throughput performance that is of vital importance in new technology processes

    Ultrasound Beamforming on a FPGA

    Get PDF

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    Get PDF
    The implementation efficiency of electronic systems is a combination of conflicting requirements, as increasing volumes of computations, accelerating the exchange of data, at the same time increasing energy consumption forcing the researchers not only to optimize the algorithm, but also to quickly implement in a specialized hardware. Therefore in this work, the problem of efficient and straightforward implementation of operating in a real-time electronic intelligent systems on field-programmable gate array (FPGA) is tackled. The object of research is specialized FPGA intellectual property (IP) cores that operate in a real-time. In the thesis the following main aspects of the research object are investigated: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process of selected class dynamic artificial neural networks. In order to solve stated problem and reach the goal following main tasks of the thesis are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and its electronic intelligent system test-bed – a speaker dependent Lithuanian speech recognizer, to be created and investigated; develop dedicated technique for implementation of LLMLP class on FPGA that is based on specialized efficiency criteria for a circuitry synthesis; develop and experimentally affirm the efficiency of optimized FPGA IP cores used in Lithuanian speech recognizer. The dissertation contains: introduction, four chapters and general conclusions. The first chapter reveals the fundamental knowledge on computer-aideddesign, artificial neural networks and speech recognition implementation on FPGA. In the second chapter the efficiency criteria and technique of LLMLP IP cores implementation are proposed in order to make multi-objective optimization of throughput, LLMLP complexity and resource utilization. The data flow graphs are applied for optimization of LLMLP computations. The optimized neuron processing element is proposed. The IP cores for features extraction and comparison are developed for Lithuanian speech recognizer and analyzed in third chapter. The fourth chapter is devoted for experimental verification of developed numerous LLMLP IP cores. The experiments of isolated word recognition accuracy and speed for different speakers, signal to noise ratios, features extraction and accelerated comparison methods were performed. The main results of the thesis were published in 12 scientific publications: eight of them were printed in peer-reviewed scientific journals, four of them in a Thomson Reuters Web of Science database, four articles – in conference proceedings. The results were presented in 17 scientific conferences

    Design of High Performance Modified Wave pipelined DAA Filter with Critical Path Approach

    Get PDF
    In this paper, a new high speed control circuit is proposed which will act as a critical path for the data which will go from input to output to improve the performance of wave pipelining circuits The wave pipelining is a method of high performance circuit designs which implements pipelining in logic without the use of intermediate registers. Wave pipelining has been widely used in the past few years with a great deal of significant features in technology and applications. It has the ability to improve speed, efficiency, economy in every aspect which it presents. Wave pipelining is being used in wide range of applications like digital filters, network routers, multipliers, fast convolvers, MODEMs, image processing, control systems, radars and many others. In previous work, the operating speed of the wave-pipelined circuit can be increased by the following three tasks: adjustment of the clock period, clock skew and equalization of path delays. The path-delay equalization task can be done theoretically, but the real challenge is to accomplish it in the presence of various different delays. So, the main objective of this paper is to solve the path delay equalization problem by inserting the control circuit in wave pipelined based circuit which will act as critical path for the data that moves from input to output. The proposed technique is evaluated for DSP applications by designing 4- tap FIR filter using Distributed arithmetic algorithm (DAA). Then comparison of this design is done with 4-tap FIR filter designs using conventional pipelining and non pipelining. The synthesis and simulation results based on Xilinx ISE Navigator 12.3 shows that wave pipelined DAA based filter is faster by a factor of 1.43 compared to non pipelined one and the conventional pipelined filter is faster than non pipelined by factor of 1.61 but at the cost of increased logic utilization by 200 %. So, the wave-pipelined DA filters designed with the proposed control circuit can operate at higher frequency than that of non-pipelined but less than that of pipelined. The gain in speed in pipelined compared to that of wavepipelined is at the cost of increased area and more dissipated power. When latency is considered, wavepipelined design filters with the proposed scheme are having the lowest latency among three schemes designed
    corecore