    Resource and thermal management in 3D-stacked multi-/many-core systems

    Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited in traditional (2D) chips because of high interconnect and memory latencies, high power consumption, and low manufacturing yield. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies on top of each other and using through-silicon vias (TSVs) for on-chip communication; it thus provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by high power densities and temperatures. 3D stacking also enables the integration of heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with a silicon-photonic network-on-chip (PNoC), which significantly reduces on-chip communication latency and provides higher bandwidth than electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices with micro-heaters, but at the cost of substantial power overhead. This thesis claims that realizing the energy-efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves the energy efficiency of 3D-stacked systems by targeting three major components of computing systems: cache, memory, and on-chip communication. We analyze the computation, memory-usage, and communication characteristics of workloads, and present techniques that leverage these characteristics for energy-efficient computing. 
This thesis introduces 3D cache resource pooling, a cache design that allows flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory. At the main-memory level, we claim that using heterogeneous memory modules together with memory-object-level management significantly improves energy efficiency. This thesis proposes a memory management scheme at a finer granularity, the memory-object level, and a page allocation policy that leverages the heterogeneity of the available memory modules and caters to the diverse memory requirements of workloads. On the on-chip communication side, we introduce an approach that limits the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies, coupled with an adaptive thermal tuning policy, minimize the thermal tuning power required for PNoC and in this way help enable broader integration of PNoC. The thesis also introduces techniques for the placement and floorplanning of optical devices that reduce optical loss and, thus, laser source power consumption.
2018-03-09
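The abstract does not spell out the page allocation policy. As a purely illustrative sketch, assuming a hypothetical two-tier system with a small fast memory module and a large slow one, plus per-object access counts obtained from profiling, a greedy rule can place the memory objects with the highest access density (accesses per page) in fast memory until it is full. `MemoryObject`, `allocate_objects`, and the 4 KiB page size are assumptions, not the thesis's design.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page size


@dataclass
class MemoryObject:
    name: str
    size_bytes: int
    accesses: int  # profiled access count (hypothetical input)

    @property
    def pages(self) -> int:
        # Round the object size up to whole pages.
        return -(-self.size_bytes // PAGE_SIZE)

    @property
    def density(self) -> float:
        # Accesses per page: how much each fast-memory page would be worth.
        return self.accesses / self.pages


def allocate_objects(objects, fast_capacity_pages):
    """Greedily map the densest objects to fast memory; the rest go to slow memory."""
    placement = {}
    free_fast = fast_capacity_pages
    for obj in sorted(objects, key=lambda o: o.density, reverse=True):
        if obj.pages <= free_fast:
            placement[obj.name] = "fast"
            free_fast -= obj.pages
        else:
            placement[obj.name] = "slow"
    return placement
```

For example, with 24 fast pages, a heavily accessed 8-page index and a 16-page hash table would land in fast memory, while a large, rarely touched 64-page log would go to slow memory. Greedy density ordering is only a heuristic; finding the best placement under a capacity limit is a 0/1 knapsack problem.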

    Resource-Constrained Low-Complexity Video Coding for Wireless Transmission

    Algorithms and Circuits for Analog-Digital Hybrid Multibeam Arrays

    Fifth-generation (5G) and beyond wireless communication systems will rely heavily on larger antenna arrays combined with beamforming to mitigate the high free-space path loss that prevails at millimeter-wave (mmW) and higher frequencies. Sharp beams that support wide bandwidths are desired at both the transmitter and the receiver to exploit the abundant bandwidth available in these frequency bands. Further, multiple simultaneous sharp beams are imperative for such systems to exploit mmW/sub-THz wireless channels through multiple reflected paths simultaneously. Therefore, multibeam antenna arrays that support wider bandwidths are a key enabler for 5G and beyond systems. In general, N-beam systems using N-element antenna arrays involve circuit complexities on the order of N². This dissertation investigates new analog, digital, and hybrid low-complexity multibeam beamforming algorithms and circuits for reducing the size, weight, and power (SWaP) of larger multibeam arrays. On the digital beamforming side, the research proposes a new class of discrete Fourier transform (DFT) approximations for multibeam generation that eliminates the need for digital multipliers in the beamforming circuitry. To this end, 8-, 16-, and 32-beam multiplierless multibeam algorithms have been proposed for uniform linear array applications. A 2.4 GHz 16-element array receiver and a 5.8 GHz 32-element array receiver, both using field-programmable gate arrays (FPGAs) as the digital back end, have been built for real-time experimental verification of the multiplierless algorithms. The algorithms have been verified experimentally by digitally measuring the beams, and the measured beams agree well with those of the exact counterpart algorithms. 
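Row k of an N-point DFT matrix, applied to the N element signals of a uniform linear array, forms beam k, so a single DFT yields N beams at once. The dissertation's specific DFT approximations are not given in the abstract; the sketch below substitutes a simple, hypothetical stand-in that rounds the real and imaginary parts of each twiddle factor to multiples of 1/2, so every coefficient lies in {0, ±1/2, ±1} and needs only shifts and additions. It compares the beam outputs of the exact and approximate matrices for one incoming plane wave.

```python
import cmath
import math


def dft_matrix(n):
    """Exact N-point DFT matrix; row k, applied to the N element signals of a ULA, forms beam k."""
    return [[cmath.exp(-2j * math.pi * k * m / n) for m in range(n)] for k in range(n)]


def quantize(z, step=0.5):
    """Round real and imaginary parts to multiples of `step`.
    With step = 1/2 every coefficient is in {0, +-1/2, +-1}: shifts and adds only."""
    return complex(round(z.real / step) * step, round(z.imag / step) * step)


def approx_dft_matrix(n, step=0.5):
    """Hypothetical multiplierless stand-in: the DFT matrix with quantized coefficients."""
    return [[quantize(w, step) for w in row] for row in dft_matrix(n)]


def beam_outputs(matrix, psi):
    """Magnitude of every beam output for a plane wave with inter-element phase psi (rad)."""
    x = [cmath.exp(1j * psi * m) for m in range(len(matrix[0]))]
    return [abs(sum(w * xm for w, xm in zip(row, x))) for row in matrix]
```

For N = 8 and a wave aligned with beam 3 (psi = 2π·3/8), both matrices put the strongest output on beam 3; this crude quantization mainly lowers the peak gain and leaks some energy into other beams, which is the kind of error a well-designed DFT approximation tries to keep small.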
Analog realizations of the proposed approximate DFT transforms have also been investigated, leading to low-complexity, high-bandwidth circuits in CMOS. Further, a novel approach for reducing the circuit complexity of analog true-time delay (TTD) N-beam beamforming networks using N-element arrays has been proposed for wideband squint-free operation. A sparse factorization of the N-beam delay Vandermonde beamforming matrix is used to reduce the total number of TTD elements needed to obtain N beams in a wideband array. The method has been verified using measured responses of CMOS all-pass filters (APFs). The wideband squint-free multibeam algorithm is also used to propose a new low-complexity hybrid beamforming architecture targeting future 5G mmW systems. In addition, the dissertation explores multibeam beamforming architectures for uniform circular arrays (UCAs). An algorithm with N log N circuit complexity for the simultaneous generation of N beams in an N-element UCA is explored and verified.
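The abstract does not give the sparse factorization itself. The sketch below only illustrates the property such a network must preserve: each row of an assumed delay Vandermonde matrix applies a true-time delay of m·k·τ0 to element m for beam k, so the weights scale with frequency and the beam direction stays fixed, whereas phase shifters set at a design frequency f0 make the beam squint as the frequency changes. Built directly, such a matrix needs roughly one delay per entry, i.e. on the order of N² TTD elements. The 10 GHz design frequency, element spacing, and τ0 are hypothetical.

```python
import cmath
import math

C = 3e8  # speed of light in m/s (approximate)


def delay_vandermonde(n_elems, n_beams, tau0, f):
    """Weights at frequency f for true-time delays m*k*tau0 (element m, beam k).
    Row k is the power sequence 1, z_k, z_k**2, ... with z_k = exp(-j*2*pi*f*k*tau0),
    i.e. the matrix has Vandermonde structure."""
    return [[cmath.exp(-2j * math.pi * f * m * k * tau0) for m in range(n_elems)]
            for k in range(n_beams)]


def peak_direction(weights, f, d, grid=2001):
    """sin(theta) in [-1, 1] where |array output| peaks for a plane wave at frequency f,
    for a uniform linear array with element spacing d (metres)."""
    best_s, best_mag = 0.0, -1.0
    for i in range(grid):
        s = -1 + 2 * i / (grid - 1)  # candidate sin(theta)
        # Element m receives the wave with phase 2*pi*f*m*d*s/C relative to element 0.
        out = sum(w * cmath.exp(2j * math.pi * f * m * d * s / C) for m, w in enumerate(weights))
        if abs(out) > best_mag:
            best_s, best_mag = s, abs(out)
    return best_s
```

With an 8-element array, 15 mm spacing, and τ0 chosen so that beam 1 points at sin θ = 0.3, the TTD weights (recomputed at each frequency) keep the peak at 0.3 at both 0.8·f0 and 1.2·f0, while phase-shifter weights frozen at f0 drift to about 0.375 and 0.25, respectively.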

    Dynamically reconfigurable management of energy, performance, and accuracy applied to digital signal, image, and video processing applications

    There is strong interest in the development of dynamically reconfigurable systems that can meet real-time constraints in energy/power-performance-accuracy (EPA/PPA). In this dissertation, I introduce a framework for implementing dynamically reconfigurable digital signal, image, and video processing systems. The basic idea is to first generate a collection of Pareto-optimal realizations in the EPA/PPA space. Dynamic EPA/PPA management is then achieved by selecting the Pareto-optimal implementations that meet the real-time constraints. The systems are demonstrated using Dynamic Partial Reconfiguration (DPR) and dynamic frequency control on FPGAs. The framework is demonstrated on: i) a dynamic pixel processor, ii) a dynamically reconfigurable 1-D digital filtering architecture, and iii) a dynamically reconfigurable 2-D separable digital filtering system. Efficient implementations of the pixel processor use look-up tables and local multiplexers to minimize FPGA resources. For the pixel processor, different realizations are generated by varying the number of input bits, the number of cores, the number of output bits, and the operating frequency; each parameter combination yields a different pixel-processor realization. Pareto-optimal realizations are selected based on measurements of energy per frame, accuracy (PSNR), and performance in frames per second. Dynamic EPA/PPA management is demonstrated for a sequential list of real-time constraints by selecting optimal realizations and implementing them using DPR and dynamic frequency control. Efficient FPGA implementations of the 1-D and 2-D FIR filters are based on a distributed arithmetic technique. Different realizations are generated by varying the number of coefficients, the coefficient bitwidth, and the output bitwidth. Pareto-optimal realizations are selected in the EPA space. 
Dynamic EPA management is demonstrated by applying real-time EPA constraints to a digital video. The results suggest that the general framework can be applied to a variety of digital signal, image, and video processing systems. The framework relies on offline processing to determine the Pareto-optimal realizations. Real-time constraints are met by selecting Pareto-optimal realizations, pre-loaded in memory, that are then implemented efficiently using DPR and/or dynamic frequency control.
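The two-phase scheme described above (offline Pareto filtering, then run-time selection) can be sketched as follows. The three objectives match the abstract (energy per frame, PSNR, frames per second), but the `Realization` type, the selection rule (lowest energy among realizations that meet minimum PSNR and frame-rate constraints), and the example numbers are assumptions; reconfiguration through DPR and frequency control is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realization:
    name: str
    energy_mj: float  # energy per frame in mJ (lower is better)
    psnr_db: float    # accuracy in dB (higher is better)
    fps: float        # performance in frames per second (higher is better)


def dominates(a: Realization, b: Realization) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = a.energy_mj <= b.energy_mj and a.psnr_db >= b.psnr_db and a.fps >= b.fps
    strictly_better = a.energy_mj < b.energy_mj or a.psnr_db > b.psnr_db or a.fps > b.fps
    return no_worse and strictly_better


def pareto_front(realizations):
    """Offline step: keep the realizations that no other realization dominates."""
    return [r for r in realizations if not any(dominates(o, r) for o in realizations)]


def select(front, min_psnr_db, min_fps):
    """Run-time step: the lowest-energy Pareto realization meeting both constraints, or None."""
    feasible = [r for r in front if r.psnr_db >= min_psnr_db and r.fps >= min_fps]
    return min(feasible, key=lambda r: r.energy_mj, default=None)
```

With hypothetical realizations A (2.0 mJ, 30 dB, 60 fps), B (3.5 mJ, 40 dB, 60 fps), C (4.0 mJ, 38 dB, 50 fps), and D (5.0 mJ, 45 dB, 30 fps), C is dominated by B and is dropped offline; a constraint of at least 35 dB and 50 fps then selects B at run time.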

    Optimization of integrated photonics solutions for next-generation optical systems (Otimização de soluções de fotónica integrada para sistemas óticos de nova geração)

    Next-generation optical systems can benefit greatly from optimized photonic integrated solutions. Photonic integrated circuits (PICs) are a promising technology given the current demand for flexibility and reconfigurability in optical systems and telecommunications networks. PIC-based optical systems offer an efficient and cost-effective answer to growing data-transmission demands. To contribute to the development of integrated photonic technology, optimized PIC solutions addressing different steps of the PIC development chain, mainly design, testing, and packaging, are investigated. Optical signal compression techniques are progressing to sustain fast processing and storage of large amounts of bandwidth-demanding data, and photonic integrated solutions are advantageous for implementing optical transforms such as the Haar transform (HT). This demand motivated the research of an optimized PIC design in a silicon nitride (Si3N4) platform comprising a two-level HT network for compression and a switching network that supplies all logical inputs of the HT network for testing and characterization. Optimized design models are proposed for the multimode interference structure, the key building block of the PIC design. Additionally, a first test and characterization of PIC solutions implementing the HT for compression were carried out in an indium phosphide (InP) platform and in a new organic-inorganic hybrid material. Using a tunable lattice-filter dispersion compensator in a Si3N4 integrated platform, a real-time, extended-reach PAM-4 transmission over 40 km enabled by the integrated dispersion compensator was demonstrated, with application in data center interconnects. 
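To illustrate what a two-level HT network computes, the sketch below applies an orthonormal Haar transform in software: each level replaces pairs of samples by their scaled sum and difference, and the second level repeats this on the first-level sums. The 1/√2 normalization is an assumption (it keeps the signal energy unchanged); the PIC performs these operations optically through interference rather than arithmetic.

```python
import math


def haar_level(x):
    """One orthonormal Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def haar_two_level(x):
    """Two-level Haar transform; returns [level-2 approx, level-2 detail, level-1 detail]."""
    if len(x) % 4 != 0:
        raise ValueError("length must be a multiple of 4 for two levels")
    a1, d1 = haar_level(x)
    a2, d2 = haar_level(a1)
    return a2 + d2 + d1
```

For the input [4, 4, 2, 2], the result is [6, 2, 0, 0] up to floating-point rounding: the two zero detail coefficients can be discarded without loss, which is the basis of Haar-transform compression.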
Motivated by the need for accurate performance measurements of high-Q integrated photonic resonators, a measurement technique based on an RF-calibrated Mach-Zehnder interferometer and on Brillouin gain measurements with Lorentzian fitting analysis was successfully implemented. Finally, because the technical and functional requirements of PICs demand thorough characterization and testing to predict their performance accurately, and current testing platforms can be expensive and inflexible, a proof of concept is proposed for a new flexible soft-packaging platform for photonic integrated processors and spatial division multiplexing systems, based on the spatial light modulation operating principle.
Doctoral Programme in Telecommunications
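The abstract reports Lorentzian fitting of measured resonance data. A common figure of merit for a resonator is the loaded quality factor Q = f0 / Δf, where Δf is the full width at half maximum (FWHM) of the resonance. The sketch below fits a Lorentzian through a linearization (the reciprocal of a Lorentzian is a parabola in frequency) solved by ordinary least squares. This is a simple stand-in, not necessarily the dissertation's method: it is exact only for noise-free data and weights noisy samples unevenly, so a nonlinear least-squares fit is usually preferred for real measurements. The carrier frequency, detuning grid, and linewidth used below are hypothetical.

```python
import math


def lorentzian(x, amp, x0, hwhm):
    """Lorentzian line shape with peak `amp` at `x0` and half width at half maximum `hwhm`."""
    return amp / (1 + ((x - x0) / hwhm) ** 2)


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_lorentzian(xs, ys):
    """Linearized fit: 1/y = c0 + c1*x + c2*x**2 for a Lorentzian.
    Returns (center, fwhm) in the units of xs."""
    inv = [1 / y for y in ys]
    # Normal equations for least squares on the basis [1, x, x^2].
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(v * x ** k for x, v in zip(xs, inv)) for k in range(3)]
    c0, c1, c2 = solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    center = -c1 / (2 * c2)
    inv_peak = c0 - c1 ** 2 / (4 * c2)  # 1/amplitude at the parabola's vertex
    hwhm = math.sqrt(inv_peak / c2)
    return center, 2 * hwhm
```

Working in detuning units (here MHz around a nominal carrier) keeps the normal equations well conditioned. For synthetic noise-free data with a 10 MHz FWHM near 193.1 THz, the fit recovers a Q of about 1.93e7.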

    Fast Motion Estimation Algorithms for Block-Based Video Coding Encoders

    The objective of my research is to reduce the complexity of video coding standards in real-time scalable and multi-view applications.
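The abstract does not name the algorithms developed. As background only, the sketch below implements the classic three-step search (TSS), a well-known fast block-matching method and not the author's algorithm. It scores candidate motion vectors by the sum of absolute differences (SAD). Starting with a step of 4, it evaluates 25 candidate positions for a ±7 search range, versus 225 for exhaustive full search; like other fast searches, it can settle on a local minimum that full search would avoid.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref` (frames are lists of rows)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def three_step_search(cur, ref, bx, by, n, step=4):
    """Classic three-step search: test the 8 neighbours at distance `step` around the
    current best vector, move to the lowest-SAD point, halve the step, repeat until step < 1.
    Candidates whose block would fall outside the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # the centre's cost is already best_cost
                mx, my = cx + dx, cy + dy
                inside = 0 <= bx + mx and bx + mx + n <= w and 0 <= by + my and by + my + n <= h
                if inside:
                    cost = sad(cur, ref, bx, by, mx, my, n)
                    if cost < best_cost:
                        best, best_cost = (mx, my), cost
        step //= 2
    return best, best_cost
```

The function returns the chosen motion vector and its SAD; an encoder would then code the vector and the motion-compensated residual.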

    Applications of MATLAB in Science and Engineering

    The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image, and video processing, and system modeling and simulation. Many interesting problems are included throughout the book, and its contents will benefit students and professionals across a wide range of fields.

    Ultra-low power IoT applications: from transducers to wireless protocols

    This dissertation explores Internet of Things (IoT) sensor nodes in various application scenarios with different design requirements. The research provides a comprehensive exploration of all the IoT layers composing an advanced device, from transducers to on-board processing, through low-power hardware schemes and wireless protocols for wide area networks. The widespread, massive deployment of wireless sensor nodes is pushing research and industry to overcome the main limitations of such constrained devices, with the aim of making them easy to deploy at lower cost. Significant challenges involve battery lifetime, which directly affects device operation, and wireless communication bandwidth: factors that commonly trade off against system scalability and energy per bit, as well as maximum coverage. This thesis aims to serve as a reference and guideline document for future IoT projects, with results structured according to a conventional development pipeline. Such projects usually treat communication standards and sensing as project requirements and low-power operation as a necessity. Topics discussed include a detailed overview of five leading IoT wireless protocols, together with custom solutions that overcome throughput limitations and decrease power consumption. Low-power hardware engineering in multiple applications is also introduced, focusing especially on improving the trade-off between energy, functionality, and on-board processing capabilities. To enhance these features and to provide a bottom-up overview of an IoT sensor node, an innovative, low-cost transducer for structural health monitoring is presented. Lastly, high-performance computing at the extreme edge of the IoT framework is addressed, with special attention to image processing algorithms running on a state-of-the-art RISC-V architecture. 
As a specific deployment scenario, an OpenCV-based stack, together with a convolutional neural network, is assessed on the octa-core PULP SoC.
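Battery lifetime and energy per bit, both named above as key constraints, can be budgeted to first order as follows. The sketch assumes a node that alternates between an active state and a sleep state; all power figures, the duty cycle, and the battery parameters are hypothetical, and effects such as self-discharge, temperature, and regulator losses are ignored.

```python
def average_power_w(p_active_w, p_sleep_w, duty_cycle):
    """Time-weighted average power of a node that is active a fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return duty_cycle * p_active_w + (1.0 - duty_cycle) * p_sleep_w


def battery_lifetime_days(capacity_mah, voltage_v, p_avg_w):
    """Ideal lifetime: stored energy divided by average power (no self-discharge or losses)."""
    energy_j = capacity_mah / 1000 * 3600 * voltage_v  # mAh -> Ah -> coulombs -> joules
    return energy_j / p_avg_w / 86400


def energy_per_bit_j(p_tx_w, bitrate_bps):
    """Radio energy spent per transmitted bit at a given transmit power and bit rate."""
    return p_tx_w / bitrate_bps
```

For example, a node drawing 30 mW when active and 3 µW asleep, active 0.1% of the time, averages about 33 µW and would last roughly 3,790 days (about 10 years) on an ideal 1000 mAh, 3 V battery; at 0.1 W transmit power and 10 kbit/s, each bit costs 10 µJ.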

    Novel OFDM System Based on Dual-Tree Complex Wavelet Transform

    The demand for ever-higher capacity in wireless networks, such as cellular, mobile, and local area networks, is driving the development of new signaling techniques with improved spectral and power efficiency. At all stages of a transceiver, from the bandwidth efficiency of the modulation schemes, through the highly nonlinear power amplifiers of the transmitters, to channel sharing among different users, problems related to power usage and spectrum abound. In the near future, orthogonal frequency division multiplexing (OFDM) technology promises to be a ready solution for achieving high data capacity and better spectral efficiency in wireless communication systems by virtue of its well-known and desirable characteristics. Towards these ends, this dissertation investigates a novel OFDM system based on the dual-tree complex wavelet transform (D