55 research outputs found

    Feasibility Study of High-Level Synthesis : Implementation of a Real-Time HEVC Intra Encoder on FPGA

    High-Level Synthesis (HLS) is an automated design process that seeks to improve productivity over traditional design methods by raising the design abstraction from the register transfer level (RTL) to the behavioural level. Various commercial HLS tools have been on the market since the 1990s, but only recently have they started to gain adoption across industry and academia. The slow adoption rate has mainly stemmed from a lower quality of results (QoR) than that obtained with conventional hardware description languages (HDLs). However, the latest HLS tool generations have substantially narrowed the QoR gap. This thesis studies the feasibility of HLS in video codec development. It introduces several HLS implementations for High Efficiency Video Coding (HEVC), which is the key enabling technology for numerous modern media applications. HEVC doubles the coding efficiency over its predecessor, the Advanced Video Coding (AVC) standard, for the same subjective visual quality, but typically at the cost of considerably higher computational complexity. Therefore, real-time HEVC calls for automated design methodologies that can be used to minimize the hardware (HW) implementation and verification effort. This thesis proposes to use HLS throughout the whole encoder design process, from data-intensive coding tools, like intra prediction and discrete transforms, to more control-oriented tools, such as entropy coding. The C source code of the open-source Kvazaar HEVC encoder serves as a design entry point for the HLS flow, and it is also utilized in design verification. The performance results are gathered on and reported for a field-programmable gate array (FPGA). The main contribution of this thesis is an HEVC intra encoder prototype built on a Nokia AirFrame Cloud Server equipped with dual 2.4 GHz 14-core Intel Xeon processors and two Intel Arria 10 GX FPGA Development Kits, which can be connected to the server via peripheral component interconnect express (PCIe) generation 3 or 40 Gigabit Ethernet. The proof-of-concept system achieves real-time 4K coding speeds of up to 120 fps, which can be further scaled up by adding practically any number of network-connected FPGA cards. Overcoming the complexity of HEVC and customizing its rich features for a real-time HEVC encoder implementation on hardware is not a trivial task, as hardware development has traditionally been very time-consuming. This thesis shows that HLS is able to shorten development time, provide previously unseen design scalability, and still deliver competitive performance and QoR compared with state-of-the-art hardware implementations.
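To put the 4K, 120 fps figure in perspective, the short calculation below estimates the raw sample rate the encoder has to sustain. It is only a back-of-the-envelope sketch: the abstract does not state the exact frame size or chroma format, so 3840x2160 pixels and 8-bit 4:2:0 sampling (12 bits per pixel) are assumptions, and the function name is hypothetical.

```python
def raw_video_rates(width, height, fps, bits_per_pixel):
    """Return (pixels per second, raw bits per second) for uncompressed video."""
    pixels_per_second = width * height * fps
    return pixels_per_second, pixels_per_second * bits_per_pixel


# Assumed parameters: 3840x2160 frames, 8-bit 4:2:0 sampling (12 bits/pixel).
pixel_rate, bit_rate = raw_video_rates(3840, 2160, 120, 12)
print(f"{pixel_rate / 1e6:.1f} Mpixel/s, {bit_rate / 1e9:.2f} Gbit/s raw")
```

Under these assumptions the raw input stream is on the order of 12 Gbit/s, which is below the nominal rate of a single 40 Gigabit Ethernet link.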

    Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach

    A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Because of their inherent parallelism, most applications gain performance by executing parallel tasks on multiple processors. Moreover, heterogeneous structures provide high performance and energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as a scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters, such as virtual channels, routing algorithms, and buffering techniques, to fulfill the system requirements. This thesis contributes to the state of the art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced. As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations, which are often used in signal processing applications. The second architecture, introduced as Application-Specific Instruction Set Routers (ASIRs), contains a processing unit capable of executing any operation and hence is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis. The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration is investigated in terms of relocation and security aspects. The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN)-based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN model. The impact of external hardware components such as sensors, actuators, and accelerators connected to the processors is also discussed, which makes the approach of high interest for embedded systems.
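The image processing application named in this abstract (Sobel filter, RGB-to-grayscale conversion, threshold) can be illustrated with a small software model. This is a minimal sketch and not the thesis implementation: the stage ordering (grayscale first, then Sobel, then threshold), the BT.601 luma weights, the |Gx| + |Gy| magnitude approximation, and all function names are assumptions chosen for illustration.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def rgb_to_gray(pixel):
    """Convert an (R, G, B) tuple to integer luma (assumed BT.601 weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def sobel_magnitude(gray):
    """Return |Gx| + |Gy| for interior pixels; border pixels stay at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = abs(gx) + abs(gy)
    return out


def threshold(img, limit):
    """Binarize: 1 where the value reaches the limit, else 0."""
    return [[1 if v >= limit else 0 for v in row] for row in img]


def edge_pipeline(rgb_image, limit=128):
    """Chain the three stages on a 2-D list of (R, G, B) tuples."""
    gray = [[rgb_to_gray(p) for p in row] for row in rgb_image]
    return threshold(sobel_magnitude(gray), limit)
```

Each stage consumes the previous stage's output, which is the kind of streaming structure that could plausibly be distributed over processing units along a data path.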

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Discrete Wavelet Transforms

    Discrete wavelet transform (DWT) algorithms have a firm position in signal processing in several areas of research and industry. As the DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT- and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low bit rate image compression, low-complexity implementation of CQF wavelets, and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications.
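Lifting, one of the construction methods listed in this abstract, splits a signal into even and odd samples and then applies a predict step and an update step. The sketch below shows one level of the integer Haar transform (often called the S-transform) built this way. It is an illustrative example, not taken from the book, and the function names are hypothetical.

```python
def haar_lifting_forward(signal):
    """One level of the integer Haar transform via lifting.

    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict: each odd sample is predicted by its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: adjust the even samples so they carry the (floored) pair mean.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Because every lifting step is inverted by subtracting exactly what was added, the transform maps integers to integers and is perfectly reversible, which is one reason lifting is attractive for hardware implementations.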

    Intelligent Circuits and Systems

    ICICS-2020 is the third conference initiated by the School of Electronics and Electrical Engineering at Lovely Professional University. It explored recent innovations by researchers working on the development of smart and green technologies in the fields of Energy, Electronics, Communications, Computers, and Control. ICICS provides a platform for innovators to identify new opportunities for the social and economic benefit of society. The conference bridges the gap between academics and R&D institutions, social visionaries, and experts from all strata of society, allowing them to present their ongoing research activities and foster research relations. It provides opportunities for the exchange of new ideas, applications, and experiences in the field of smart technologies and for finding global partners for future collaboration. ICICS-2020 was conducted in two broad categories: Intelligent Circuits & Intelligent Systems, and Emerging Technologies in Electrical Engineering.

    Assessment and Real Time Implementation of Wireless Communications Systems and Applications in Transportation Systems

    Programa Oficial de Doutoramento en Tecnoloxías da Información e das Comunicacións en Redes Móbiles. 5029V01
    [Abstract] Fourth- and fifth-generation wireless communication systems (4G and 5G) use a physical layer (PHY) based on multicarrier modulations for high-bandwidth data transmission. This type of modulation has been shown to provide high spectral efficiency while allowing low-complexity radio channel equalization. These systems use OFDMA as a mechanism for distributing the available radio resources among different users. This allocation is done by assigning a subset of subcarriers to each user at a given instant of time. This gives the system great flexibility, allowing it to adapt both to the quality of service requirements of users and to the radio channel state. The medium access control (MAC) layer of these systems is in charge of configuring the multiple OFDMA PHY layer parameters, in addition to managing the data flows of each user and transforming the higher-layer packets into PHY layer packets. This work studies the design and implementation of the MAC and PHY layers of 4G communication systems as well as their applicability in rail transport systems. On the one hand, the real-time design and implementation of the WiMAX standard is addressed. The mechanisms required to establish bidirectional communications between a base station and several mobile devices are also evaluated. Moreover, a MAC layer and PHY layer implementation is presented, using a hardware architecture based on DSPs and FPGAs. Since this architecture has limited computational resources, the requirements of each processing block of the system are also studied in order to guarantee the real-time operation of the complete system. On the other hand, the applicability of 4G systems to public transportation systems is also studied. Communications and signaling systems are a vital part of rail and metro transport systems. The wireless communications used by these systems must be robust and provide high reliability to enable the supervision, control, and safety of rail traffic. To carry out this feasibility assessment, LTE communications network simulations are performed in rail transport environments to verify that reliability and safety requirements are met. Several simulations are carried out in order to evaluate the system performance and select the most appropriate system configuration and architecture for each scenario. Simulations of Wi-Fi-based networks are also carried out, since Wi-Fi is the most widely used solution in subways, to compare the results with those obtained for LTE. For the simulation results to be realistic, appropriate radio propagation models must be used. Both deterministic models and models based on the results of measurement campaigns in these scenarios are used in the simulations. The simulations use the different information flows present in railway transportation systems to verify that their quality of service (QoS) requirements are met. For example, critical flows for railway control, such as the European Train Control System (ETCS) or Communication-Based Train Control (CBTC), require high reliability and low-delay communications to ensure the proper functioning of the system.
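The OFDMA resource sharing described in this abstract, where each user receives a subset of subcarriers at a given instant, can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: subcarriers are split in proportion to a per-user demand weight and assigned as contiguous blocks, whereas a real MAC scheduler would also account for channel state and QoS classes. The function name and the demand model are hypothetical and do not come from the thesis or from the WiMAX or LTE specifications.

```python
def allocate_subcarriers(num_subcarriers, demands):
    """Split subcarrier indices among users in proportion to their demand.

    demands maps a user name to a positive integer weight.
    Returns a dict mapping each user to a list of subcarrier indices.
    """
    total = sum(demands.values())
    if total <= 0:
        raise ValueError("total demand must be positive")
    users = sorted(demands)
    # Integer share per user, rounded down.
    shares = {u: num_subcarriers * demands[u] // total for u in users}
    # Hand the few subcarriers lost to rounding to the most demanding users.
    leftover = num_subcarriers - sum(shares.values())
    for u in sorted(users, key=lambda name: -demands[name])[:leftover]:
        shares[u] += 1
    # Assign contiguous index ranges for one scheduling instant.
    allocation, start = {}, 0
    for u in users:
        allocation[u] = list(range(start, start + shares[u]))
        start += shares[u]
    return allocation
```

Calling the function again with updated weights for the next scheduling instant models the flexibility the abstract refers to: the same set of subcarriers is redistributed as user requirements change.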

    Teaching Your Wireless Card New Tricks: Smartphone Performance and Security Enhancements Through Wi-Fi Firmware Modifications

    Smartphones come with a variety of sensors and communication interfaces, which make them perfect candidates for mobile communication testbeds. Nevertheless, proprietary firmware hinders us from accessing the full capabilities of the underlying hardware platform, which impedes innovation. Focusing on FullMAC Wi-Fi chips, we present Nexmon, a C-based firmware modification framework. It gives access to raw Wi-Fi frames and advanced capabilities that we found by reverse engineering chips and their firmware. As firmware modifications pose security risks, we discuss how to secure firmware handling without impeding experimentation on Wi-Fi chips. To present and evaluate our findings in the field, we developed the following applications. We start by presenting a ping-offloading application that handles ping requests in the firmware instead of the operating system. It significantly reduces energy consumption and processing delays. Then, we present a software-defined wireless networking application that enhances scalable video streaming by setting flow-based requirements on physical-layer parameters. As a security application, we present a reactive Wi-Fi jammer that analyses incoming frames during reception and transmits arbitrary jamming waveforms by operating Wi-Fi chips as software-defined radios (SDRs). We further introduce an acknowledging jammer to ensure the flow of non-targeted frames and an adaptive power-control jammer to adjust transmission powers based on measured jamming successes. Additionally, we discovered how to extract channel state information (CSI) on a per-frame basis. Using both SDR and CSI-extraction capabilities, we present a physical-layer covert channel. It hides covert symbols in phase changes of selected OFDM subcarriers. Those manipulations can be extracted from CSI measurements at a receiver. To ease the analysis of firmware binaries, we created a debugging application that supports single stepping and runs as a firmware patch on the Wi-Fi chip. We published the source code of our framework and our applications to ensure reproducibility of our results and to enable other researchers to extend our work. Our framework and the applications emphasize the need for freely modifiable firmware and detailed hardware documentation to create novel and exciting applications on commercial off-the-shelf devices.

    Low-complexity, low-area computer architectures for cryptographic application in resource constrained environments

    A Resource Constrained Environment (RCE) is known for its stringent hardware design requirements. With the rise of the Internet of Things (IoT), low-complexity and low-area designs are becoming prominent in the face of complex security threats. Two low-complexity, low-area cryptographic processors based on the ultimate reduced instruction set computer (URISC) are created to provide security features for wireless visual sensor networks (WVSN) by using field-programmable gate array (FPGA) based visual processors typically used in RCEs. The first processor is the Two Instruction Set Computer (TISC) running the Skipjack cipher. To improve security, a Compact Instruction Set Architecture (CISA) processor running the full AES with a modified S-Box was created. The modified S-Box achieved a gate count reduction of 23% with no functional compromise compared to Boyar's. On the Spartan-3L XC3S1500L-4-FG320 FPGA, the implementation of the TISC occupies 71 slices and 1 block RAM. The TISC achieved a throughput of 46.38 kbps at a stable 24 MHz clock. The CISA, which occupies 157 slices and 1 block RAM, achieved a throughput of 119.3 kbps at a stable 24 MHz clock. The CISA processor is demonstrated in two main applications. The first is a multilevel, multi-cipher architecture (MMA) with two modes of operation: (1) selecting cipher programs (primitives) and sharing crypto-blocks, and (2) using simple authentication and key renewal schemes, and showing perceptual improvements over direct AES on images. The second application demonstrates the use of the CISA processor as part of a selective encryption architecture (SEA) in combination with the millions instructions per second set partitioning in hierarchical trees (MIPS SPIHT) visual processor. The SEA is implemented on a Celoxica RC203 Virtex XC2V3000 FPGA, occupying 6251 slices, and a visual sensor is used to capture real-world images. Four image frames were captured from a camera sensor, compressed, selectively encrypted, and sent to a PC environment for decryption. The final design emulates a working visual sensor, from on-node processing and encryption to back-end data processing on a server computer.
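The throughput figures in this abstract can be converted into an approximate cycle budget per cipher block, which makes the cost of the minimal instruction sets easier to compare. The sketch below assumes that 1 kbps means 1000 bits per second and that the reported throughput refers to processed cipher data. The block sizes (64 bits for Skipjack, 128 bits for AES) are properties of the ciphers themselves, and the function name is hypothetical.

```python
def cycles_per_block(clock_hz, throughput_bps, block_bits):
    """Estimate clock cycles spent per cipher block at a given throughput."""
    return clock_hz * block_bits / throughput_bps


CLOCK_HZ = 24_000_000  # 24 MHz clock reported for both processors

# TISC running Skipjack (64-bit blocks) at 46.38 kbps.
tisc = cycles_per_block(CLOCK_HZ, 46_380, 64)
# CISA running AES (128-bit blocks) at 119.3 kbps.
cisa = cycles_per_block(CLOCK_HZ, 119_300, 128)

print(f"TISC/Skipjack: about {tisc:,.0f} cycles per block")
print(f"CISA/AES:      about {cisa:,.0f} cycles per block")
```

Under these assumptions both processors spend tens of thousands of cycles per block, which is consistent with trading speed for a very small slice count.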

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report); 2017. Since their introduction, FPGAs have appeared in more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field imposes special requirements on the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units such as CPUs.