    Communication reduction techniques in numerical methods and deep neural networks

    Inter-node communication has turned out to be one of the determining factors of performance on modern HPC systems, and the situation only worsens as the number of cores involved keeps growing. This thesis therefore explores techniques to reduce communication during the execution of a parallel program. There is no one-size-fits-all approach to this challenge; instead, the problems in each field, owing to their unique characteristics, offer distinct opportunities for communication reduction. The thesis first delves into numerical linear algebra and develops IFCG, an evolution of Pipelined CG. IFCG eliminates the synchronizations that normally take place towards the end of each iteration in order to increase parallelism. Secondly, the thesis turns its attention to reducing the need to transfer parameters between the CPU host and the GPUs during neural network training. It develops two routines, ADT and AWP, which compress and decompress the weights using a reduced data representation format just before and right after the data transfer takes place. The compression rate is adjusted according to the L2-norm of the weights of every layer. In the third contribution, the thesis reduces the communication involved in model-parallelizing a deep neural network. Instead of splitting and distributing the neurons of each layer to the available processes on the system, this is done only for every other layer, while the remaining layers are replicated. This yields a 50% reduction in communication at the cost of roughly 50% extra local floating-point computation.
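
    The synchronization-hiding idea behind pipelined CG variants such as IFCG can be illustrated with a non-blocking reduction: the global dot products are started asynchronously, and local work (e.g. the sparse matrix-vector product) proceeds while the reduction is in flight. The sketch below shows only that overlap pattern with mpi4py, not the IFCG algorithm itself; the function and variable names are illustrative assumptions.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD

        def overlapped_dot(u_local, v_local, local_work):
            """Sketch of hiding a global reduction behind local work (not IFCG itself)."""
            partial = np.array([u_local @ v_local])   # local contribution to the dot product
            result = np.empty(1)
            req = comm.Iallreduce(partial, result, op=MPI.SUM)  # reduction runs in the background
            out = local_work()                        # e.g. a sparse matrix-vector product
            req.Wait()                                # synchronize only when the value is needed
            return result[0], out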
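    The ADT and AWP routines compress the weights to a reduced representation before the host-GPU transfer, with the compression rate driven by each layer's L2-norm. The exact format and norm policy are defined in the thesis; the following is a minimal sketch of the general idea, assuming a simple linear quantizer and an illustrative norm threshold.

        import numpy as np

        def choose_bits(weights, lo=8, hi=16, threshold=1.0):
            """Pick a per-layer bit width from the L2-norm of its weights (illustrative rule)."""
            return hi if np.linalg.norm(weights) > threshold else lo

        def compress(weights, bits):
            """Linear quantization to `bits` bits before the host-to-GPU transfer."""
            scale = max(np.abs(weights).max(), 1e-12) / (2 ** (bits - 1) - 1)
            q = np.round(weights / scale).astype(np.int16)
            return q, scale

        def decompress(q, scale):
            """Restore an approximate float representation right after the transfer."""
            return q.astype(np.float32) * scale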
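    The third contribution distributes the neurons of only every other layer and replicates the rest, trading roughly half of the activation exchanges for extra local computation. Below is a rough sketch of a forward pass under such a scheme using mpi4py collectives; the names, the ReLU activation, and the even/odd split choice are assumptions made for illustration.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD

        def forward(x, weights):
            """Layers at even positions are split column-wise across ranks (one collective
            per split layer); odd layers are replicated and computed fully on every rank."""
            for i, W in enumerate(weights):
                if i % 2 == 0:
                    cols = np.array_split(np.arange(W.shape[1]), comm.size)[comm.rank]
                    part = np.maximum(x @ W[:, cols], 0.0)             # this rank's neuron slice
                    x = np.concatenate(comm.allgather(part), axis=-1)  # exchange activations
                else:
                    x = np.maximum(x @ W, 0.0)                         # replicated: no communication
            return x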
