
    Cross-Layer System Design for Autonomous Driving

    Autonomous driving has gained tremendous popularity and has become one of the most prominent emerging applications in recent years, allowing a vehicle to drive itself without help from a human. Demand for this application continues to grow, leading to ever-increasing investment from industry over the last decade. Unfortunately, even with the considerable recent advances achieved in our community, autonomous driving systems remain unavailable to the public and are still under development. Several key challenges are observed across the stack of autonomous driving systems and must be addressed to bridge the gap. This dissertation investigates cross-layer autonomous driving systems, from hardware architecture and software algorithms to human-vehicle interaction. In the hardware architecture layer, we investigate and present the design constraints of autonomous driving systems. With an end-to-end autonomous driving system framework we built, we accelerate the computational bottlenecks identified and thoroughly investigate the implications and trade-offs across various accelerator platforms. In the software algorithm layer, we propose an acceleration technique for object recognition, one of the critical bottlenecks in autonomous driving systems. We exploit the similarity across frames in the streaming videos captured by autonomous vehicles and reuse the intermediate outputs computed by the algorithm, reducing the computation required and improving performance. In the human-vehicle interaction layer, we design a conversational in-vehicle interface framework that enables drivers to interact with vehicles in natural human language, improving the usability of autonomous driving features. We also integrate this framework into a commercially available vehicle and conduct a real-world driving study.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/149951/1/shihclin_1.pd
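    The frame-reuse idea in the software algorithm layer can be sketched in a few lines. The following is a minimal illustration only, not the dissertation's actual pipeline: the detector callback, the mean-absolute-difference similarity metric, and the 0.98 threshold are all assumptions made for the example.

```python
import numpy as np

def frame_similarity(prev: np.ndarray, curr: np.ndarray) -> float:
    """Cheap similarity proxy: 1 minus the normalized mean absolute pixel difference."""
    diff = np.abs(prev.astype(np.float32) - curr.astype(np.float32)).mean()
    return 1.0 - diff / 255.0

class ReusingDetector:
    """Runs the expensive object detector only when the scene has changed enough;
    otherwise reuses the cached detections from the previous keyframe."""

    def __init__(self, detect_fn, threshold: float = 0.98):
        self.detect_fn = detect_fn   # expensive per-frame detector (assumed callback)
        self.threshold = threshold   # similarity above which cached results are reused
        self.prev_frame = None
        self.cached = None

    def __call__(self, frame: np.ndarray):
        if self.prev_frame is not None and \
                frame_similarity(self.prev_frame, frame) >= self.threshold:
            return self.cached       # near-identical frame: skip recomputation
        self.cached = self.detect_fn(frame)
        self.prev_frame = frame
        return self.cached
```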

    Compiler-centric across-stack deep learning acceleration

    Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach, spanning topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep-learning-based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. An emerging family of specialized compilers for deep learning, tensor compilers, appears to be a strong candidate both to help manage the complexity of across-stack optimization choices and to enable new approaches. This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges of exploring DLAS by describing our experience running a perturbation study that varies parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although realizing it still requires significant engineering effort. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference-time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean). Finally, we propose a technique, transfer-tuning, that reduces the search time required for automatic tensor-compiler code optimization by 6.5× on average. The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis enable new lines of research that help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.
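    To see why grouped convolution should scale well with the number of groups, a direct NumPy reference implementation makes the arithmetic visible: with g groups, each output channel reads only C_in/g input channels, so the multiply-accumulate count drops by roughly a factor of g. This is an illustrative sketch only; it reflects none of the optimized tensor-compiler schedules developed in the thesis.

```python
import numpy as np

def grouped_conv2d(x: np.ndarray, w: np.ndarray, groups: int) -> np.ndarray:
    """Naive grouped 2D convolution (stride 1, no padding).
    x: (C_in, H, W); w: (C_out, C_in // groups, kH, kW)."""
    c_in, h, wd = x.shape
    c_out, c_in_g, kh, kw = w.shape
    assert c_in % groups == 0 and c_out % groups == 0 and c_in_g == c_in // groups
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1), dtype=np.float64)
    per_group = c_out // groups
    for g in range(groups):
        xs = x[g * c_in_g:(g + 1) * c_in_g]          # input slice owned by this group
        for oc in range(g * per_group, (g + 1) * per_group):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    # each output channel touches only C_in/groups input channels
                    out[oc, i, j] = np.sum(xs[:, i:i + kh, j:j + kw] * w[oc])
    return out
```

    Total work is C_out · H' · W' · (C_in/g) · kH · kW multiply-accumulates, which is why the poor inference-time scaling of existing solutions noted in the abstract left room for a better implementation.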

    Programming Languages for Data-Intensive HPC Applications: a Systematic Mapping Study

    A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity of software for HPC, it is useful to identify programming languages that can alleviate this issue. Because the existing literature on the topic of HPC is very dispersed, we performed a Systematic Mapping Study (SMS) in the context of the European COST Action cHiPSet. This literature study maps characteristics of various programming languages for data-intensive HPC applications, including category, typical user profiles, effectiveness, and type of articles. We organised the SMS in two phases. In the first phase, relevant articles were identified through an automated keyword-based search in eight digital libraries. This led to an initial sample of 420 papers, which was then narrowed down in a second phase, by human inspection of article abstracts, titles, and keywords, to 152 relevant articles published in the period 2006–2018. The analysis of these articles enabled us to identify 26 programming languages referred to in 33 of the relevant articles. We compared the outcome of the mapping study with the results of our questionnaire-based survey, which involved 57 HPC experts. The mapping study and the survey revealed that the desired features of programming languages for data-intensive HPC applications are portability, performance, and usability. Furthermore, we observed that the majority of the programming languages used in the context of data-intensive HPC applications are text-based general-purpose programming languages. Typically these have a steep learning curve, which makes them difficult to adopt. We believe that the outcome of this study will inspire future research and development in programming languages for data-intensive HPC applications.
    Additional co-authors: Sabri Pllana, Ana Respício, José Simão, Luís Veiga, Ari Vis

    Communication patterns abstractions for programming SDN to optimize high-performance computing applications

    Advisor: Luis Carlos Erpen de Bona. Co-advisors: Magnos Martinello; Marcos Didonet Del Fabro. Doctoral thesis - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 04/09/2017. Includes references: f. 95-113.
    The evolution of computing and networking allowed multiple computers to be interconnected, aggregating their processing power to form high-performance computing (HPC) environments. Applications that run in these environments process huge amounts of information and can take several hours or even days to complete, motivating researchers from various computational fields to study different ways of accelerating them. During processing, these applications exchange large amounts of data among the computers, causing the network to become a bottleneck. The network was long considered a static resource, allowing no dynamic adjustments to optimize its links or devices. However, Software-Defined Networking (SDN) emerged as a new paradigm, allowing the network to be reprogrammed according to users' requirements. SDN has already been used to optimize the network for specific HPC applications, but no existing work takes advantage of the communication patterns those applications express. The main objective of this thesis is therefore to research how these patterns can be used to tune the network, creating new abstractions for programming it, with the aim of speeding up HPC applications. To achieve this goal, we first surveyed all SDN programmability levels. This study resulted in our first contribution: a taxonomy for grouping the high-level abstractions offered by SDN programming languages. Next, we investigated the communication patterns of HPC applications, observing their spatial and temporal behaviors by analyzing their traffic matrices (TMs). We concluded that TMs can represent the communications and, furthermore, that applications tend to transmit the same amounts of data among the same computational nodes. The second contribution of this thesis is a framework for avoiding the network factors that can degrade application performance, such as topology overhead, unbalanced link utilization, and issues introduced by SDN programmability. The framework provides an API and maintains a database of TMs, one per communication pattern, annotated with bandwidth and latency constraints. This information is used to reprogram network devices, placing the communications evenly across the network paths. This approach reduced the execution time of benchmarks and real applications by up to 26.5%. To avoid modifying application source code, as a third contribution we developed a method to automatically identify the communication patterns. This method generates a distinct visual texture for each TM and, through machine learning (ML) techniques, identifies the applications that are using the network. In our experiments, the method achieved an accuracy above 98%. Finally, we incorporated this method into the framework, creating an abstraction that allows the network to be programmed without changing the HPC applications, reducing their execution times by 15.8% on average.
    Keywords: Software-Defined Networking, Communication Patterns, HPC Applications
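    The pattern-identification step lends itself to a compact sketch: render each traffic matrix as a normalized grayscale texture and hand the flattened pixels to an off-the-shelf classifier. Everything below is an assumption-laden illustration (log scaling, a k-nearest-neighbors model, and TMs of a fixed N×N size), not the thesis's actual ML pipeline.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def tm_to_texture(tm: np.ndarray) -> np.ndarray:
    """Turn a bytes-transferred traffic matrix into a flat grayscale texture.
    Log scaling compresses the huge dynamic range of HPC transfer sizes."""
    img = np.log1p(tm.astype(np.float64))
    m = img.max()
    return (img / m).ravel() if m > 0 else img.ravel()

def train_identifier(tms, labels):
    """Fit a classifier on labeled TM textures, one per observed application run.
    All TMs are assumed to share the same N x N shape (same node count)."""
    X = np.stack([tm_to_texture(tm) for tm in tms])
    return KNeighborsClassifier(n_neighbors=3).fit(X, labels)

def identify(clf, tm: np.ndarray):
    """Classify a TM measured from the SDN switch counters at runtime."""
    return clf.predict([tm_to_texture(tm)])[0]
```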

    Memory-Efficient and Parallel Simulation of Super Carbon Nanotubes

    Carbon nanotubes (CNTs) have received much attention since their description in Nature in 1991. In principle, a carbon nanotube is a rolled-up sheet of graphene, which can be imagined as a honeycomb grid of carbon atoms. This allotrope of carbon has many interesting properties, such as high tensile strength at very low weight and high temperature resistance. This motivates the application of CNTs in materials science to create new carbon-nanotube-reinforced materials. They also possess interesting electronic properties, since CNTs show either metallic or semiconducting behavior depending on their configuration. The synthesis of branched carbon nanotubes allows straight CNTs to be connected into carbon nanotube networks, with branched tubes employed as junction elements. Among such networks are the so-called super carbon nanotubes (SCNTs), proposed in 2006. In an SCNT, each carbon-carbon bond within the honeycomb grid is replaced by a CNT of equal size, and each carbon atom by a Y-branched tube with three arms of equal length and a regular angle of 120° between the arms. This results in a structure that originates from tubes and regains the outer shape of a tube. It is also possible to repeat this process, replacing carbon-carbon bonds not with CNTs but with SCNTs, leading to very regular and self-similar structures of increasingly higher order. Simulations demonstrate that SCNTs also exhibit very interesting mechanical properties. They are even more flexible than CNTs and are thus good candidates for high-strength composites or very lightweight actuators. Other applications arise in microelectronics, because of their configurable electronic behavior, and in biology, due to the biocompatibility of SCNTs. Despite progress in synthesis processes for straight and branched CNTs, the production of SCNTs is still beyond current technological capabilities. In addition, real experiments at the nanoscale are expensive and complex; hence, simulations are important to predict the properties of SCNTs and to guide experimental research. The atomic-scale finite element method (AFEM) already provides a well-established approach for simulating CNTs at the atomic level. However, the model size of SCNTs grows very fast for larger tubes, and the arising n-body and linear equation systems quickly exceed the memory capacity of available computer systems. This renders the simulation of large SCNTs at the atomic level infeasible, unless the regular structure of SCNTs can be exploited to reduce the memory footprint. This thesis presents ways to exploit the symmetry and hierarchy within SCNTs, enabling the simulation of higher-order SCNTs. We develop structure-tailored and memory-saving data structures which allow the storage of very large SCNT models of up to several billion atoms while providing fast data access. We realize this with a novel graph data structure called Compressed Symmetric Graphs, which dynamically recomputes large parts of the structural information for tubes instead of storing it. We also present a new structure-aware and SMP-parallelized matrix-free solver for the linear equation systems involving the stiffness matrix, which employs an efficient caching mechanism for the data used during the sparse matrix-vector multiplication. The matrix-free solver is twice as fast as a reference solver based on the compressed row storage format, requiring only half the memory while caching all contributions of the matrix. We demonstrate that this solver, in combination with the Compressed Symmetric Graphs, is able to instantiate equation systems with matrices of order higher than 5×10^7 on a single compute node, while still fully caching all matrix data.
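    The matrix-free idea is easy to state in code: a conjugate gradient solver never needs the stiffness matrix K explicitly, only the action v -> K·v, which can be recomputed from the compressed structure on each iteration. Below is a generic textbook CG sketch under that assumption; apply_K is a placeholder for the thesis's structure-aware, cached matrix-vector product, not its actual implementation.

```python
import numpy as np

def conjugate_gradient(apply_K, b, tol=1e-8, max_iter=1000):
    """Solve K x = b for a symmetric positive-definite K, given only
    the matrix-vector product apply_K(v) -> K @ v (K is never stored)."""
    x = np.zeros_like(b)
    r = b - apply_K(x)                    # initial residual
    p = r.copy()                          # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Kp = apply_K(p)                   # the only place K is ever "touched"
        alpha = rs / (p @ Kp)             # optimal step length along p
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:         # converged to the requested tolerance
            break
        p = r + (rs_new / rs) * p         # conjugate new search direction
        rs = rs_new
    return x
```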

    Languages for High-Performance Computing Used in Big Data Processing: A Systematic Mapping Study

    Big Data are high-Volume, high-Velocity and/or high-Variety information assets that demand innovative and economical forms of processing, enabling better insight, decision-making, and process automation. Since 2002, the rate of performance improvement in single processors has dropped sharply. To keep increasing processor power, multiple cores operating in parallel were placed on a single chip. To benefit from this kind of architecture, sequential programs must be rewritten. The goal of High-Performance Computing (HPC) is to study the methodologies and techniques that allow these architectures to be exploited. The challenge is the need to combine software development for HPC with the management and analysis of Big Data. When parallel and distributed computing is mandatory, the code becomes harder to write, so it is necessary to know which languages make that task easier. Because the existing literature on the topic of HPC is very dispersed, we conducted a Systematic Mapping Study (SMS) that aggregates characteristics of the different languages found (category; nature; typical user profiles; effectiveness; types of articles published in the area) for Big Data processing, helping students, researchers, and other professionals who need an introduction or a panoramic view of this topic. Articles were retrieved through an automated keyword-based search in the databases of 8 selected digital libraries. This process resulted in an initial sample of 420 articles, which was reduced to 152 articles published between January 2006 and March 2018. Manual analysis of these articles allowed us to identify 26 languages in the 33 included publications. We summarized and compared this information with the opinions of professionals. The results indicate that most of these languages are general-purpose languages (GPLs) rather than domain-specific languages (DSLs), which leads us to conclude that there is an opportunity for applied research on languages that make coding easier for domain experts.

    Applicability of Recurrent Neural Networks to Player Data Analysis in Freemium Video Games

    We demonstrate the applicability and practicality of recurrent neural networks (RNNs), a machine learning methodology suited to sequential data, on player data from the mobile video game My Singing Monsters. Since this data arrives as a stream of events, RNNs are a natural choice for analyzing it with minimal preprocessing. We apply RNNs to monitor and forecast game metrics, predict player conversion, estimate lifetime player value, and cluster player behaviours. In each case, we discuss why the results are interesting, how the trained models can be applied in a business setting, and how the preliminary work can serve as a foundation for future research. Finally, as data on video game players is typically proprietary and confidential, and the results of such research often go unpublished, this thesis also serves to contribute to the literature on game user research.
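    To make the setup concrete, here is a minimal PyTorch sketch of the player-conversion task; the event vocabulary size, embedding and hidden widths, and the binary target are assumptions for illustration, not details taken from the thesis.

```python
import torch
import torch.nn as nn

class ConversionRNN(nn.Module):
    """Classifies a player's event stream (integer-coded game events)
    as converted vs. not converted."""

    def __init__(self, n_event_types: int = 100, emb_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, seq_len) integer event codes -- minimal preprocessing
        x = self.embed(events)
        _, (h_n, _) = self.lstm(x)        # final hidden state summarizes the stream
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

# usage: conversion probabilities for a batch of two 50-event sequences
model = ConversionRNN()
batch = torch.randint(0, 100, (2, 50))
print(model(batch))                        # shape (2,), values in (0, 1)
```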

    Speeding up a Video Summarization Approach Using GPUs and Multicore CPUs
