3 research outputs found

    Evaluation of high-speed FPGA IO for inter-board communication

    Get PDF
    Growing demand for computation power requires high speed interconnects between FPGA devices. While there are multiple solutions available it is still challenging to choose one suited for the particular task. Is it therefore extremely import for both academic and industrial purposes to have access to real world performance evaluation of high speed interconnect technologies commonly offered on FPGAs. In this thesis we study the feasibility of high-speed interconnect and find that it is most relevant to evaluate the performance of LVDS and dedicated transceivers for board-to-board communication scenario. To address this requirement we design evaluation of a system implemented in Altera Cyclone V devices and conduct measurements of the transmission performance and resource usage. LVDS inter-board communication was implemented as point-to-point topology between two FPGA boards. The maximum received data rate is 823 Mbps per channel. On the base of the transceiver interface, the chain topology was created for communication of three devices. The maximum measured speed in the transceiver system is 1822 Mbps. The average logic utilization of the designs is about 3% of the FPGA resources. At the same time, 38% of the global clocks are used in the transceiver design. On the base of the performed experiments, we conclude that required high-speed interconnection can be implemented by establishing FPGA-to-FPGA communication via LVDS and the dedicated transceivers interfaces

    Partially shared cache and adaptive replacement algorithm for NoC-based many-core systems

    Get PDF
    The Network-on-Chip(NoC) is a promising alternative to traditional bus-based architectures that has been widely applied to interconnect multi/many-core systems due to its scalable and modular design. Undoubtedly, the memory wall problem is one of the most important challenges; however, this problem can now be somewhat be alleviated by cache subsystems. In this paper, to overcome the high resource consumption and low data-sharing rate problems of the private cache scheme, we propose a partially shared cache structure and a corresponding replacement algorithm based on a mesh NoC. In this scheme, the L2 cache is shared by each group of four cores that connected as a cluster to a given node by the local bus. To maximize the performance of this partially shared cache structure, we propose a core-aware re-reference interval prediction (CA-RRIP) replacement algorithm. The algorithm performs dynamic virtual partitioning on the partially shared cache; the core that initiated the cache access request will be given top priority when a cache area needs to be replaced or inserted. This approach guarantees cache exclusivity and can mitigate interactions among cores using different access patterns. We implement the traditional private, the proposed partially shared and the row-shared cache subsystems in our experiments. The comparisons indicate that the overall system resource occupation can be reduced by 20% with the same number of cores, and the instructions per cycle(IPC) of the system could increase by up to 49.2%. Moreover, the system throughput(STP) increased by an average of 5.89%. Our experimental results showed that the proposed CA-RRIP algorithm also reduces the average cache miss rate of the system under various cache access patterns

    Arquiteturas de hardware para aceleração de algoritmos de reconstrução morfológica

    Get PDF
    Este trabalho apresenta um estudo da implementação de algoritmos para a reconstrução morfológica de imagens bio-medicas em FPGAs (Field Programmable Gate Arrays). As arquiteturas foram baseadas nos algoritmos Sequential Reconstruction (SR) e Fast Hybrid (FH) usando linguagem de descrição de hardware VHDL (Very High Description Language). A metodologia para avaliar a plataforma consistiu em verificar a arquitetura projetada no QuestaSim, fornecendo como dados de entrada as imagens a ser reconstruídas. Adicionalmente, a validação dos resultados da arquitetura foi feita usando linguagem C ou Matlab (usando a função imreconstruct). Além disso, um estudo consumo de recursos de hardware para diferentes tamanhos e conteúdos de imagens foram realizados com o intuito de verificar a aplicabilidade dos algoritmos em arquiteturas reconfiguráveis. Neste trabalho, para a aceleração do processo de reconstrução da imagem foi proposta uma arquitetura reconfigurável baseada no algoritmo FH junto com um algoritmo de aprendizagem de máquina, especificamente uma máquina de vetores de suporte (SVM). Para o treinamento da SVM foi usada uma metodologia de verificação/validação obtendo aproximadamente 20.000 dados de treinamento. Finalmente, foi implementada uma arquitetura que particiona a imagem original em quatro unidades de processamento, processando cada unidade em paralelo. O sistema final implementado fornece um pixel processado por cada ciclo de relógio, depois de um tempo de latência, sendo aproximadamente 8 vezes mais rápida que sua versão não particionada. Adicionalmente, foram feitas comparações rodando os algoritmos de reconstrução morfológica em um processador ARM embarcado dentro do FPGA.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).This work presents a study of the implementation of algorithms for the morphological reconstruction of bio-medical images in FPGAs (Field Programmable Gate Arrays). The architectures were based on Sequential Reconstruction (SR) algorithms and Fast Hybrid (FH), using VHDL (Very High Description Language). The methodology for the evaluation of the platform consisted of verifying the architecture designed in QuestaSim, providing the images to be reconstructed as input data. Additionally, the validation of the results of the architecture was made using C or Matlab languages (using the imreconstruct function). Additionally, a study of hardware resource consumption for different sizes and content of images was conducted, in order to verify the applicability of the algorithms in reconfigurable architectures. In this work, in order to accelerate the image reconstruction process, a reconfigurable architecture based on the FH algorithm is proposed together with machine learning, specifically a support vector machine (SVM). For the SVM training a verification/validation methodology was used, obtaining approximately 20,000 training data. Finally, an architecture was implemented that partitions the original image in four processing units, processing each unit in parallel. The final system implemented provides one pixel processed for each clock cycle, after a latency time, being approximately 8 times faster than its unpartitioned version. Lastly, comparisons were made by running the morphological reconstruction algorithms in an ARM processor embedded within the FPGA
    corecore