    Multi-Softcore Architecture on FPGA

    To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual properties while maintaining high performance is also a challenging issue. Softcore processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of building the SoC with the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to measure the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).
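
    The FIR filter and matrix-matrix multiplication used as benchmarks are data-parallel kernels, so a natural mapping is to split the output range evenly across the soft cores and report speedup as S = T_sequential / T_parallel and efficiency as E = S / p for p cores. The C sketch below shows one such static partitioning; the core count, tap count, and partitioning scheme are assumptions for illustration, not details taken from the paper.

        /* Hypothetical partitioning of a FIR filter across soft cores.
         * NUM_CORES and NUM_TAPS are assumed values, not the paper's. */
        #include <stddef.h>

        #define NUM_CORES 4          /* assumed number of soft cores */
        #define NUM_TAPS  16         /* assumed FIR filter length    */

        /* Compute output samples in [start, end); x must hold
         * end + NUM_TAPS - 1 input samples. */
        static void fir_block(const float *x, const float *h, float *y,
                              size_t start, size_t end)
        {
            for (size_t n = start; n < end; n++) {
                float acc = 0.0f;
                for (size_t k = 0; k < NUM_TAPS; k++)
                    acc += h[k] * x[n + NUM_TAPS - 1 - k];
                y[n] = acc;
            }
        }

        /* Core `id` (0..NUM_CORES-1) processes one contiguous block, so the
         * cores run independently and speedup can approach NUM_CORES. */
        void fir_on_core(int id, const float *x, const float *h,
                         float *y, size_t n_out)
        {
            size_t block = n_out / NUM_CORES;
            size_t start = (size_t)id * block;
            size_t end   = (id == NUM_CORES - 1) ? n_out : start + block;
            fir_block(x, h, y, start, end);
        }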

    Mppsocgen: A framework for automatic generation of mppsoc architecture

    Automatic code generation is a standard method in software engineering, since it improves code consistency and reduces overall development time. In this context, this paper presents a design flow for automatic VHDL code generation of mppSoC (massively parallel processing System-on-Chip) configurations. Depending on the application requirements, a framework named MppSoCGEN, built on the NetBeans Platform, was developed to accelerate the design process of complex mppSoC systems. Starting from the architecture parameters, VHDL code is automatically generated using a parsing method. Configuration rules are proposed to ensure a correct and valid VHDL configuration. Finally, Processing Element and network topology models of the mppSoC architecture are generated automatically for the Stratix II device family. The framework runs on NetBeans 5.5 on a 2 GHz Centrino Duo Core, using 22 Kbytes of memory with a 3-second average runtime. Experimental results for a reduction algorithm validate the MppSoCGEN design flow and demonstrate the efficiency of the generated architectures. Comment: 16 pages; International Journal of Computer Science & Information Technology (IJCSIT), Vol 4, No 2, April 2012.
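
    As a rough illustration of the parameter-to-VHDL idea (the actual MppSoCGEN tool is a Java framework on the NetBeans Platform), the sketch below emits a small VHDL configuration from a parameter record. The parameter names and the emitted VHDL fragment are assumptions made for this sketch, not the tool's real templates.

        /* Minimal, hypothetical parameter-driven VHDL emitter. */
        #include <stdio.h>

        struct mppsoc_params {
            int num_pes;     /* assumed: number of processing elements */
            int mem_words;   /* assumed: local memory words per PE     */
        };

        static void emit_vhdl(FILE *out, const struct mppsoc_params *p)
        {
            fprintf(out, "-- generated mppSoC configuration constants\n");
            fprintf(out, "constant NB_PE    : integer := %d;\n", p->num_pes);
            fprintf(out, "constant MEM_SIZE : integer := %d;\n", p->mem_words);
            fprintf(out,
                "pe_array : for i in 0 to NB_PE - 1 generate\n"
                "  pe_i : entity work.pe generic map (MEM_SIZE => MEM_SIZE);\n"
                "end generate;\n");
        }

        int main(void)
        {
            struct mppsoc_params p = { .num_pes = 16, .mem_words = 1024 };
            emit_vhdl(stdout, &p);   /* print the generated VHDL fragment */
            return 0;
        }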

    SCAC-Net: Reconfigurable Interconnection Network in SCAC Massively parallel SoC

    Parallel communication plays a critical role in massively parallel systems, especially in distributed-memory systems executing parallel programs on shared data. Integrating an interconnection network in these systems is therefore essential to support data exchange between nodes. Choosing the most effective communication structure means meeting certain criteria: speed, size, and power consumption. Indeed, the communication phase should be as fast as possible so as not to hold back the parallel computation, and it should rely on small, low-power modules so that the interconnection network remains extensible in a scalable system. To meet these criteria, and following a module-reuse methodology, we chose to integrate the reconfigurable SCAC-Net interconnection network to exchange data in the SCAC massively parallel SoC. This paper presents the detailed hardware implementation and discusses the performance evaluation of the proposed reconfigurable SCAC-Net network.
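
    The abstract does not spell out the routing scheme here, so the sketch below uses plain XY routing on a 2D mesh only as a generic stand-in for the kind of inter-node data exchange an on-chip interconnection network provides; the topology, port names, and routing policy are assumptions, not SCAC-Net's actual design.

        /* Hypothetical XY routing decision for a packet on a 2D mesh. */
        #include <stdio.h>

        enum port { PORT_LOCAL, PORT_NORTH, PORT_SOUTH, PORT_EAST, PORT_WEST };

        /* Choose the output port at node (x, y) for destination (dx, dy):
         * travel along X first, then along Y, then deliver locally. */
        static enum port route_xy(int x, int y, int dx, int dy)
        {
            if (dx > x) return PORT_EAST;
            if (dx < x) return PORT_WEST;
            if (dy > y) return PORT_NORTH;
            if (dy < y) return PORT_SOUTH;
            return PORT_LOCAL;   /* packet has reached its node */
        }

        int main(void)
        {
            /* A packet at node (1,1) destined for node (3,0) goes east first. */
            printf("first hop port: %d\n", route_xy(1, 1, 3, 0));
            return 0;
        }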

    IP based configurable SIMD massively parallel SoC

    Significant advances in the field of configurable computing have enabled parallel processing within a single Field-Programmable Gate Array (FPGA) chip. This paper presents the implementation of a flexible and programmable Single Instruction Multiple Data (SIMD) processing system on FPGA that can be adapted to the application. Its implementation is based on an IP (Intellectual Property) assembly approach, making its design fast and easy. A generation tool is also developed to produce the SIMD configuration matching the application requirements. The proposed parallel processing system on chip is portable, scalable, and flexible, since it can be customized to the needs of a data-parallel application. Different SIMD configurations have been evaluated on FPGA in terms of performance and area trade-offs. The proposed parametric system shows good results when executing signal processing applications such as parallel matrix multiplication, FIR filtering, and RGB to YIQ image color conversion.
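
    Of the benchmarks listed, RGB to YIQ conversion shows most directly why a SIMD array fits: every pixel is converted independently with the same instructions. The sketch below uses the usual NTSC conversion coefficients; the planar array layout and function signature are assumptions for illustration, not the paper's implementation.

        /* Per-pixel RGB -> YIQ conversion; each iteration is independent,
         * which is what a SIMD array of processing elements exploits. */
        #include <stddef.h>

        void rgb_to_yiq(const float *r, const float *g, const float *b,
                        float *y, float *i_, float *q, size_t n)
        {
            for (size_t k = 0; k < n; k++) {      /* one pixel per lane */
                y[k]  = 0.299f * r[k] + 0.587f * g[k] + 0.114f * b[k];
                i_[k] = 0.596f * r[k] - 0.274f * g[k] - 0.322f * b[k];
                q[k]  = 0.211f * r[k] - 0.523f * g[k] + 0.312f * b[k];
            }
        }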

    Cognitive Radio Programming: Existing Solutions and Open Issues

    Software-defined radio (SDR) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, many issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and we present current architectures and solutions for programming SDRs. We also list the challenges to overcome in order to master future cognitive radio systems.

    Klessydra-T: Designing Vector Coprocessors for Multi-Threaded Edge-Computing Cores

    Computation-intensive kernels, such as convolutions, matrix multiplication, and the Fourier transform, are fundamental to edge-computing AI, signal processing, and cryptographic applications. Interleaved Multi-Threading (IMT) processor cores are attractive for energy efficiency and low hardware cost in edge computing, yet they need hardware acceleration schemes to run heavy computational workloads. Following a vector approach to accelerating computations, this study explores possible alternatives for implementing vector coprocessing units in RISC-V cores, showing the synergy between IMT and data-level parallelism in the target workloads. Comment: Final revision accepted for publication in IEEE Micro Journal.
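
    The kernels named above reduce to multiply-accumulate loops whose iterations are independent, which is exactly the data-level parallelism a vector coprocessor exploits. The scalar C sketch below marks the vectorizable inner loop of a matrix multiplication; it only illustrates the workload shape and does not use the Klessydra instruction set.

        /* Row-major n x n matrix multiply: C += A * B. */
        #include <stddef.h>
        #include <stdint.h>

        void matmul_i32(const int32_t *a, const int32_t *b,
                        int32_t *c, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                for (size_t k = 0; k < n; k++) {
                    int32_t aik = a[i * n + k];
                    /* Every j iteration is independent, so this inner loop
                     * maps directly onto vector lanes. */
                    for (size_t j = 0; j < n; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
        }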

    A Memory-Centric Customizable Domain-Specific FPGA Overlay for Accelerating Machine Learning Applications

    Low-latency inferencing is of paramount importance to a wide range of real-time and user-facing Machine Learning (ML) applications. Field Programmable Gate Arrays (FPGAs) offer unique advantages in delivering low-latency as well as energy-efficient accelerators for inferencing. Unfortunately, creating machine learning accelerators in FPGAs is not easy, requiring the use of vendor-specific CAD tools and low-level digital and hardware microarchitecture design knowledge that the majority of ML researchers do not possess. The continued refinement of High Level Synthesis (HLS) tools can reduce but not eliminate the need for hardware-specific design knowledge. The designs produced by these tools can also use FPGA resources inefficiently, which ultimately limits the performance of the neural network. This research investigated a new FPGA-based software-hardware co-designed overlay architecture that opens the advantages of FPGAs to the broader ML user community. As an overlay, the proposed design allows rapid coding and deployment of different ML network configurations and data widths, eliminating the prior barrier of needing to resynthesize each design. This also brings code portability across different FPGA families. The proposed overlay design is a Single-Instruction-Multiple-Data (SIMD) Processor-In-Memory (PIM) architecture developed as a programmable overlay for FPGAs. In contrast to point designs, it can be programmed to implement different types of machine learning algorithms. The overlay architecture integrates bit-serial Arithmetic Logic Units (ALUs) with distributed Block RAMs (BRAMs). The PIM design increases the size of arithmetic operations and the on-chip storage capacity. User-visible inference latencies are reduced by exploiting concurrent accesses to network parameters (weights and biases) and partial results stored throughout the distributed BRAMs. Run-time performance comparisons show that the proposed design achieves a speedup over HLS-based or custom-tuned equivalent designs. Notably, the proposed design is programmable, allowing rapid design-space exploration without the need to resynthesize when changing ML algorithms on the FPGA.
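
    To make the bit-serial ALU idea concrete, the sketch below models bit-serial addition in software: a 1-bit full adder is applied once per "cycle" while the carry is kept between cycles, which is how a serial ALU builds a W-bit result from single-bit operations. The operand width and bit ordering are assumptions of the sketch, not the overlay's actual parameters.

        /* Software model of a bit-serial adder (one result bit per cycle). */
        #include <stdint.h>
        #include <stdio.h>

        #define WIDTH 8   /* assumed operand width in bits */

        static uint8_t bit_serial_add(uint8_t a, uint8_t b)
        {
            uint8_t sum = 0, carry = 0;
            for (int i = 0; i < WIDTH; i++) {              /* one bit per cycle */
                uint8_t ai = (a >> i) & 1u;
                uint8_t bi = (b >> i) & 1u;
                uint8_t s  = ai ^ bi ^ carry;              /* full-adder sum   */
                carry = (ai & bi) | (carry & (ai ^ bi));   /* full-adder carry */
                sum |= (uint8_t)(s << i);
            }
            return sum;                                    /* final carry dropped */
        }

        int main(void)
        {
            printf("%u\n", bit_serial_add(100, 27));       /* prints 127 */
            return 0;
        }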

    G-MPSoC: Generic Massively Parallel Architecture on FPGA

    Modern intensive signal processing applications are evolving and are characterized by the diversity of their algorithms (filtering, correlation, etc.) and by their numerous parameters. Having a flexible and programmable system that adapts to the changing and varied characteristics of these applications reduces the design cost. In this context, this paper proposes the Generic Massively Parallel architecture (G-MPSoC). G-MPSoC is a System-on-Chip based on a grid of clusters of hardware and software computation elements of different size, performance, and complexity. It is composed of parametric, reusable IP modules: processor, controller, accelerator, memory, interconnection network, etc., which can be assembled into different architecture configurations. The generic structure of G-MPSoC facilitates its adaptation to the requirements of intensive signal processing applications. This paper presents the G-MPSoC architecture and details its components. The FPGA-based implementation and the experimental results validate the architectural model and show the effectiveness of this design.
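
    One way to picture the "parametric" side of such an architecture is a configuration record that fixes the cluster grid and the kind of computation element each cluster instantiates, from which the SoC is assembled out of reused IPs. The field names and values below are assumptions used purely to illustrate the idea, not G-MPSoC's actual configuration format.

        /* Hypothetical configuration record for a grid of clusters. */
        #include <stdio.h>

        enum element_kind { ELEM_SOFT_CPU, ELEM_HW_ACCELERATOR };

        struct cluster_cfg {
            enum element_kind kind;   /* software or hardware computation element */
            int local_mem_kib;        /* assumed per-cluster local memory         */
        };

        struct gmpsoc_cfg {
            int grid_rows;
            int grid_cols;
            struct cluster_cfg cluster;   /* homogeneous grid kept for brevity */
        };

        int main(void)
        {
            struct gmpsoc_cfg cfg = {
                .grid_rows = 4, .grid_cols = 4,
                .cluster   = { .kind = ELEM_HW_ACCELERATOR, .local_mem_kib = 32 },
            };
            printf("G-MPSoC config: %dx%d clusters, %d KiB local memory each\n",
                   cfg.grid_rows, cfg.grid_cols, cfg.cluster.local_mem_kib);
            return 0;
        }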