Search CORE

218 research outputs found

Field-Configurable GPU

Author: Pedro Rodrigues de Castro
Publication venue
Publication date: 09/07/2021
Field of study

Nesta dissertação pretende-se desenvolver uma arquitetura de processamento dedicada destinada à aceleração de aplicações específicas, inspirada na estrutura de unidades de processamento do tipo GPU. A unidade de processamento deverá ser programável e configurável para os requisitos de aplicações específicas, sendo adaptada aos tipos e à quantidade de recursos lógicos disponíveis num dispositivo FPGA selecionado. Pretende-se que o acelerador consiga tirar o máximo partido dos recursos disponíveis num determinado dispositivo FPGA (memória, unidades aritméticas, recursos lógicos) com o objetivo de maximizar o desempenho de aplicações selecionadas. Serão consideradas aplicações alvo no domínio do processamento de imagem e de "machine learning". Uma vez selecionada uma arquitetura base, a especialização para uma aplicação (ou classe de aplicações) terá por base a configuração de 3 componentes fundamentais: organização do sistema de memória distribuída (construído com os blocos de memória RAM internos da FPGA), organização das unidades de processamento aritmético (que podem ser heterogéneas) e dimensão dos caminhos de dados. O sistema a desenvolver deverá ser desenhado ao nível RTL, em Verilog, e contemplar um processo automatizado para personalizar o acelerador a partir de um conjunto de especificações definidas com base nas características da aplicação alvo. Esse processo de personalização poderá ser feito com base na definição de parâmetros em Verilog, ou também recorrendo a aplicações dedicadas, a desenvolver, para gerar diretamente código Verilog. Deverá também ser desenvolvido um conjunto elementar de ferramentas de suporte, nomeadamente para geração do código a executar pelo processador. Como validação final, pretende-se integrar e demonstrar o acelerador num sistema de processamento de imagem em tempo real

Repositório Aberto da Universidade do Porto

Evaluation of the parallel computational capabilities of embedded platforms for critical systems

Author: Jover-Alvarez Alvaro
Publication venue: Universitat Politècnica de Catalunya
Publication date: 28/10/2021
Field of study

Modern critical systems need higher performance which cannot be delivered by the simple architectures used so far. Latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and perform a comparison of the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite in the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project, NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+, in order to drive the selection of the research platform which will be used in the next phases of the project. Our result indicate that in terms of CPU and GPU performance, the NVIDIA Xavier is the highest performing platform

UPCommons. Portal del coneixement obert de la UPC

Android Application Development for the Intel Platform

Author: Cohen Ryan
Wang Tao
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Computer scienc

OAPEN Library

Improving GPU performance : reducing memory conflicts and latency

Author: Braak van den, G.J.W.
Publication venue: Technische Universiteit Eindhoven
Publication date: 25/11/2015
Field of study

Pure OAI Repository

Improving GPU performance : reducing memory conflicts and latency

Author: Braak van den, G.J.W.
Publication venue: Technische Universiteit Eindhoven
Publication date: 25/11/2015
Field of study

Pure OAI Repository

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

Media processor implementations of image rendering algorithms

Author: Smith Josephine
Publication venue: RIT Scholar Works
Publication date: 09/05/2003
Field of study

Demands for fast execution of image processing are a driving force for today\u27s computing market. Many image processing applications require intense numeric calculations to be done on large sets of data with minimal overhead time. To meet this challenge, several approaches have been used. Custom-designed hardware devices are very fast implementations used in many systems today. However, these devices are very expensive and inflexible. General purpose computers with enhanced multimedia instructions offer much greater flexibility but process data at a much slower rate than the custom-hardware devices. Digital signal processors (DSP\u27s) and media processors, such as the MAP-CA created by Equator Technologies, Inc., may be an efficient alternative that provides a low-cost combination of speed and flexibility. Today, DSP\u27s and media processors are commonly used in image and video encoding and decoding, including JPEG and MPEG processing techniques. Little work has been done to determine how well these processors can perform other image process ing techniques, specifically image rendering for printing. This project explores various image rendering algorithms and the performance achieved by running them on a me dia processor to determine if this type of processor is a viable competitor in the image rendering domain. Performance measurements obtained when implementing rendering algorithms on the MAP-CA show that a 4.1 speedup can be achieved with neighborhood-type processes, while point-type processes achieve an average speedup of 21.7 as compared to general purpose processor implementations

RIT Scholar Works

PC-grade parallel processing and hardware acceleration for large-scale data analysis

Author: Yang Su
Publication venue
Publication date: 01/01/2009
Field of study

Arguably, modern graphics processing units (GPU) are the first commodity, and desktop parallel processor. Although GPU programming was originated from the interactive rendering in graphical applications such as computer games, researchers in the field of general purpose computation on GPU (GPGPU) are showing that the power, ubiquity and low cost of GPUs makes them an ideal alternative platform for high-performance computing. This has resulted in the extensive exploration in using the GPU to accelerate general-purpose computations in many engineering and mathematical domains outside of graphics. However, limited to the development complexity caused by the graphics-oriented concepts and development tools for GPU-programming, GPGPU has mainly been discussed in the academic domain so far and has not yet fully fulfilled its promises in the real world. This thesis aims at exploiting GPGPU in the practical engineering domain and presented a novel contribution to GPGPU-driven linear time invariant (LTI) systems that are employed by the signal processing techniques in stylus-based or optical-based surface metrology and data processing. The core contributions that have been achieved in this project can be summarized as follow. Firstly, a thorough survey of the state-of-the-art of GPGPU applications and their development approaches has been carried out in this thesis. In addition, the category of parallel architecture pattern that the GPGPU belongs to has been specified, which formed the foundation of the GPGPU programming framework design in the thesis. Following this specification, a GPGPU programming framework is deduced as a general guideline to the various GPGPU programming models that are applied to a large diversity of algorithms in scientific computing and engineering applications. Considering the evolution of GPU’s hardware architecture, the proposed frameworks cover through the transition of graphics-originated concepts for GPGPU programming based on legacy GPUs and the abstraction of stream processing pattern represented by the compute unified device architecture (CUDA) in which GPU is considered as not only a graphics device but a streaming coprocessor of CPU. Secondly, the proposed GPGPU programming framework are applied to the practical engineering applications, namely, the surface metrological data processing and image processing, to generate the programming models that aim to carry out parallel computing for the corresponding algorithms. The acceleration performance of these models are evaluated in terms of the speed-up factor and the data accuracy, which enabled the generation of quantifiable benchmarks for evaluating consumer-grade parallel processors. It shows that the GPGPU applications outperform the CPU solutions by up to 20 times without significant loss of data accuracy and any noticeable increase in source code complexity, which further validates the effectiveness of the proposed GPGPU general programming framework. Thirdly, this thesis devised methods for carrying out result visualization directly on GPU by storing processed data in local GPU memory through making use of GPU’s rendering device features to achieve realtime interactions. The algorithms employed in this thesis included various filtering techniques, discrete wavelet transform, and the fast Fourier Transform which cover the common operations implemented in most LTI systems in spatial and frequency domains. Considering the employed GPUs’ hardware designs, especially the structure of the rendering pipelines, and the characteristics of the algorithms, the series of proposed GPGPU programming models have proven its feasibility, practicality, and robustness in real engineering applications. The developed GPGPU programming framework as well as the programming models are anticipated to be adaptable for future consumer-level computing devices and other computational demanding applications. In addition, it is envisaged that the devised principles and methods in the framework design are likely to have significant benefits outside the sphere of surface metrology.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

OpenGrey Repository