114 research outputs found

    Whole-function vectorization

    Data-parallel programming languages are an important component in today’s parallel computing landscape. Among those are domain-specific languages like shading languages in graphics (HLSL, GLSL, RenderMan, etc.) and “general-purpose” languages like CUDA or OpenCL. Current implementations of those languages on CPUs rely solely on multi-threading to implement parallelism and ignore the additional intra-core parallelism provided by the SIMD instruction sets of those processors (like Intel’s SSE and the upcoming AVX or Larrabee instruction sets). In this paper, we discuss several aspects of implementing data-parallel languages on machines with SIMD instruction sets. Our main contribution is a language- and platform-independent code transformation that performs whole-function vectorization on low-level intermediate code given by a control flow graph in SSA form. We evaluate our technique in two scenarios: first, incorporated in a compiler for a domain-specific language used in real-time ray tracing; second, in a stand-alone OpenCL driver. We observe average speedup factors of 3.9 for the ray tracer and factors between 0.6 and 5.2 for different OpenCL kernels.

    Speeding up computer vision applications on mobile computing platforms

    This project investigates ways of speeding up computer vision kernels through different optimisation and parallelisation techniques. We ported the KinectFusion algorithm to a mobile platform using OpenCL.

    Efficient evolutionary-based neural architecture search in few GPU hours for image classification and medical image segmentation

    Advisor: Lucas Ferrari de Oliveira. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 20/09/2021. Includes references: p. 132-139. Area of concentration: Computer Science.
    Abstract: Deep learning (DL) usage is growing fast since current computational power provides fast optimization and inference. Furthermore, several unique DL methods are evolving, enabling superior results in computer vision, speech recognition, and text analysis. DL methods automatically extract features to better represent a specific problem, removing the hard work of feature engineering required by conventional methods. Even if this process is automated, intelligent network design is necessary for proper representation learning, which requires expertise in DL. The neural architecture search (NAS) field focuses on developing intelligent approaches that automatically design robust networks to reduce the expertise required for developing efficient networks. NAS may provide ways to discover different network representations, improving the state of the art in different applications. Although NAS is relatively new, several approaches have been developed for discovering robust models. Efficient evolutionary-based methods are widely popular in NAS, but their high GPU consumption (from a few days to months) discourages practical use. In the present work, we propose two efficient evolutionary-based NAS approaches with low GPU cost, requiring only a few GPU hours (less than twelve on an RTX 2080Ti) to discover competitive models. Our approaches extract concepts from gene expression programming to represent and generate robust cell-based networks, combined with fast candidate training, weight sharing, and dynamic combinations. Furthermore, the proposed methods are employed in a broader search space, with more cells representing a unique network. Our central hypothesis is that evolutionary-based NAS can be used in a low-cost GPU search (combined with a robust strategy and efficient search) in diverse computer vision tasks without losing competitiveness. Our methods are evaluated on different problems to validate our hypothesis: image classification and medical image semantic segmentation. For this purpose, the CIFAR datasets are studied for the classification task and the CHAOS challenge for the segmentation task. The lowest error rates found on the CIFAR-10 and CIFAR-100 datasets were 2.17% ± 0.10 and 15.47% ± 0.51, respectively. As for the CHAOS challenge tasks, the Dice scores were between 90% and 96%. The results obtained with our proposals showed the discovery of robust networks for both tasks with little GPU cost in the search phase, being competitive with state-of-the-art approaches in both challenges.

    Enhancement of a Formula Student car perception system using a global 3D map

    The implementation of an accurate localization and mapping system in our driverless Formula Student car has revolutionized this season's car perception pipeline. Now, a new system that takes advantage of this improved 3D map is needed. Cone positions are obtained in this map, and these are classified with the information extracted from camera images in order to compute the track limits. This thesis proposes a new system to classify and keep track of cones (CCAT) and another system (Urimits) to extend partial color-dependent track limits using unclassified cones. Both systems have achieved an improvement over last season's in range and accuracy. It is now possible to detect the whole track limits, closing the loop, before the car physically completes the lap.

    Computational Optimizations for Machine Learning

    The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various types of machine learning classes, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning as well as for those having the appropriate mathematical background and willing to become familiar with recent advances of machine learning computational optimization mathematics, which has nowadays permeated into almost all sectors of human life and activity

    Evolutionary Reinforcement Learning: A Survey

    Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, there remain several crucial challenges, including brittle convergence properties caused by sensitive hyperparameters, difficulties in temporal credit assignment with long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search space scenarios, difficulties in credit assignment in multi-agent reinforcement learning, and conflicting objectives for rewards. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field

    Review on Computational Electromagnetics

    Computational electromagnetics (CEM) is applied to model the interaction of electromagnetic fields with objects such as antennas, waveguides, and aircraft, and with their environment, using Maxwell's equations. In this paper, the strengths and weaknesses of various computational electromagnetic techniques are discussed. The performance of these techniques in terms of accuracy, memory, and computational time is presented for application-specific tasks such as modeling RCS (radar cross section), space applications, thin wires, and antenna arrays.

    Advances in Image Processing, Analysis and Recognition Technology

    For many decades, researchers have been trying to make computers’ analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment but quite often significantly increase our safety. In fact, the range of practical implementations of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches.