2 research outputs found

    swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

    The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can hide the diversity of both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor has emerged as a competitive candidate thanks to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architectures, particularly with productivity and efficiency in mind. We would like to open source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.

    Enabling the use of embedded and mobile technologies for high-performance computing

    In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in High-Performance Computing (HPC). This transformation has been so effective that the November 2016 TOP500 list is still dominated by the x86 architecture. In 2016, the largest commodity market in computing is not PCs or servers, but mobile computing, comprising smartphones and tablets, most of which are built with ARM-based Systems on Chips (SoCs). This suggests that once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC. This thesis addresses this question in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend in the 1990s. Through the development of real system prototypes and their performance analysis, we assess the feasibility of building an HPC system based on mobile SoCs. Through simulation of future mobile SoCs, we identify the missing features and suggest improvements that would enable the use of future mobile SoCs in HPC environments. Thus, we present design guidelines for future generations of mobile SoCs, and for HPC systems built around them, enabling a new class of cheap supercomputers.