
    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends.

    A comparative survey of open-source application-class RISC-V processor implementations

    The numerous emerging implementations of RISC-V processors and frameworks underline the success of this Instruction Set Architecture (ISA) specification. The free and open-source character of many implementations facilitates their adoption in academic and commercial projects. As yet, it is not easy to say which implementation best fits a system with given requirements such as processing performance or power consumption. With varying backgrounds and histories, the developed RISC-V processors differ greatly from each other. Comparisons are difficult because results are reported for arbitrary technologies and configuration settings. Scaling factors are used to draw comparisons, but these give only rough estimates. In order to give more substantiated results, this paper compares the most prominent open-source application-class RISC-V projects by running identical benchmarks on identical platforms with defined configuration settings. The Rocket, BOOM, CVA6, and SHAKTI C-Class implementations are evaluated for processing performance, area and resource utilization, and power consumption, as well as efficiency. Results are presented for the Xilinx Virtex UltraScale+ family and GlobalFoundries 22FDX ASIC technology.
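    As a rough illustration of the kind of normalized comparison described above, the sketch below derives performance, area, and power efficiency figures from a set of measurements. The metric choice (CoreMark/MHz) and all numbers are placeholders for illustration, not results from the paper.

```python
# Hypothetical per-core measurements: benchmark score, FPGA LUT usage, and power.
# All values are invented placeholders; only the processor names come from the survey.
cores = {
    "Rocket":         {"coremark_per_mhz": 2.7, "kluts": 45.0,  "power_mw": 150.0},
    "BOOM":           {"coremark_per_mhz": 4.6, "kluts": 180.0, "power_mw": 420.0},
    "CVA6":           {"coremark_per_mhz": 2.8, "kluts": 55.0,  "power_mw": 170.0},
    "SHAKTI C-Class": {"coremark_per_mhz": 2.5, "kluts": 40.0,  "power_mw": 140.0},
}

for name, m in cores.items():
    area_eff = m["coremark_per_mhz"] / m["kluts"]      # score per thousand LUTs
    power_eff = m["coremark_per_mhz"] / m["power_mw"]  # score per milliwatt
    print(f"{name:15s} CoreMark/MHz={m['coremark_per_mhz']:.2f} "
          f"per kLUT={area_eff:.3f} per mW={power_eff:.4f}")
```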

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Computing systems on SoCs have been a long-term research topic since FPGA technology emerged, owing to its re-programmable fabric, reconfigurable computing, and fast development time to market. Over the last decade, a single processor in a SoC has no longer been able to cope with the fast-growing market for complex applications such as mobile-phone audio and video encoding and image and network processing. Because the number of transistors on a silicon wafer keeps increasing, recent FPGAs and embedded systems are moving toward multi-processor-based designs to deliver the required performance, making such systems practical and ushering in the age of the MPSoC. In addition, most embedded processors are soft cores, because they are flexible and reconfigurable for specific software functions and make it easy to build homogeneous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming far more powerful: they can generate datapaths of logic units from high-level algorithms (C to HDL) and support partitioning within a concurrent HW/SW co-design methodology. A range of embedded processors can be implemented on an FPGA-based prototype to integrate the CPUs on a programmable device. This research, firstly, presents the different computer architectures used in modern embedded processors and how they map to different types of software applications (e.g. multi-threading operations or complex functions) on FPGA-based SoCs; secondly, it investigates their capability by executing a wide range of multimedia software codes (integer arithmetic only) on different models of processor system (uni-processor, multi-processor, or co-design); and finally, it compares the results in terms of benchmark performance and resource utilization within FPGAs. All the examined programs were written in standard C and executed on varying numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors, their customizable configurations, and the hardware datapaths that can be generated are limited by the target FPGA's resources, and designers need to understand the FPGA-based trade-off considered here: speed versus area. For this experimental purpose, the benchmarks were divided into DLP and HLS categories, which are "data" and "function" intensive respectively. The DLP programs are executed on the LEON3 MP and LE1 CMP multi-processor systems, and the HLS programs in the LegUp co-design system, on the target FPGAs. As a preliminary step, the performance of the soft-core processors is examined by executing all the benchmarks. The thesis as a whole centres on execution times, speed-up, and area breakdown on FPGA devices for the different programs.
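    To make the speed-versus-area trade-off discussed above concrete, here is a minimal sketch that computes speed-up and relative area cost when scaling from one soft-core processor to several. The execution times and LUT counts are invented placeholders, not measurements from the thesis.

```python
# Hypothetical execution times and FPGA LUT counts for one benchmark run on
# 1, 2, and 4 soft-core processors; placeholder values for illustration only.
configs = [
    {"cpus": 1, "time_ms": 400.0, "luts": 12000},
    {"cpus": 2, "time_ms": 230.0, "luts": 23500},
    {"cpus": 4, "time_ms": 140.0, "luts": 46000},
]

base = configs[0]
for c in configs:
    speedup = base["time_ms"] / c["time_ms"]  # higher is better
    area = c["luts"] / base["luts"]           # area cost relative to one CPU
    print(f"{c['cpus']} CPU(s): speed-up {speedup:.2f}x, "
          f"area {area:.2f}x, speed-up per area {speedup / area:.2f}")
```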

    Hardware Architectures for Post-Quantum Cryptography

    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. In the search for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers that are conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first to build a hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that a complex code-based PQC algorithm can be run efficiently in hardware. Furthermore, this dissertation shows that the hardware implementation of this scheme can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme in hardware increased confidence in this scheme, and helped Classic McEliece get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of device, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first to explore and present a practical hardware-based XMSS solution for low-end embedded devices. In the design of the XMSS hardware, a heterogeneous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today's widely adopted public-key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, and hence are limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first to develop hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-hardware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first to present versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardware co-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era.
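    As a back-of-the-envelope illustration of why offloading the compute-intensive kernels pays off in such software-hardware co-designs, the sketch below applies Amdahl's law. The offloaded fraction and accelerator speed-up are invented placeholders, not figures from the dissertation.

```python
def codesign_speedup(offloaded_fraction: float, accel_speedup: float) -> float:
    """Amdahl's-law estimate of overall speed-up when a fraction of the
    software runtime is offloaded to a hardware accelerator."""
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / accel_speedup)

# Example: if 90% of a signature operation is spent in field/isogeny arithmetic
# and the accelerator runs that part 50x faster (placeholder values), the whole
# operation speeds up by roughly 8.5x.
print(f"{codesign_speedup(0.90, 50.0):.1f}x")
```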

    A comparative survey of open-source application-class RISC-V processor implementations

    Revision notice: This version does not contain CVA6 SPEC CPU2017 scores. There is an updated version available with additional CVA6 SPEC CPU2017 scores: https://doi.org/10.24355/dbbs.084-202105101615-

    A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

    In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators in the design of heterogeneous HPC platforms has become more and more pressing. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective on this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.
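    One way to see why these methodologies emphasize maximizing parallelism while minimizing data movement is a simple roofline-style estimate, sketched below. The peak-throughput and bandwidth figures are placeholders, not values from the survey.

```python
def attainable_gflops(arith_intensity: float, peak_gflops: float, mem_bw_gbs: float) -> float:
    """Roofline model: performance is capped by either compute or memory bandwidth."""
    return min(peak_gflops, arith_intensity * mem_bw_gbs)

PEAK_GFLOPS = 512.0  # hypothetical accelerator peak throughput
MEM_BW_GBS = 64.0    # hypothetical off-chip memory bandwidth (GB/s)

# A layer that performs only 2 FLOPs per fetched byte is memory bound; raising the
# arithmetic intensity to 16 FLOPs/byte via tiling and data reuse makes it compute bound.
for intensity in (2.0, 8.0, 16.0):
    print(f"{intensity:5.1f} FLOP/byte -> "
          f"{attainable_gflops(intensity, PEAK_GFLOPS, MEM_BW_GBS):6.1f} GFLOP/s")
```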

    Parallel Embedded Computing Architectures

    It was around the years 2003 to 2005 that a dramatic change seized the semiconductor industry and the manufacturers of processors. The increase in processor computing performance achieved by simply raising the clock frequency could no longer be sustained. In all the years before, the clock frequency could be steadily increased through improvements on both the technology and the architecture side. Scaling of the technology processes, leading to smaller channel lengths and shorter switching times in the devices, together with measures like instruction-level parallelism and out-of-order processing, leading to high fill rates in the processor pipelines, were the guarantors of keeping pace with Moore's law.