    Exploiting and coping with sparsity to accelerate DNNs on CPUs

    Deep Neural Networks (DNNs) have become ubiquitous, achieving state-of-the-art results across a wide range of tasks. While GPUs and domain-specific accelerators are emerging, general-purpose CPUs hold a firm position in the DNN market due to their high flexibility, high availability, high memory capacity, and low latency. Various working sets in DNN workloads can be sparse, i.e., contain zeros, and the level of sparsity varies with its source. First, when the level is low enough, traditional sparse algorithms are not competitive against dense algorithms. In such cases, the common practice is to apply dense algorithms on uncompressed sparse inputs; however, this implies that a fraction of the computations is ineffectual because it operates on zero-valued inputs. Second, when the level is high, one may apply traditional sparse algorithms on compressed sparse inputs. Although such an approach induces no ineffectual computations, the indirection in a compressed format often causes irregular memory accesses, hampering performance. This thesis studies how to improve DNN training and inference performance on CPUs by both discovering work-skipping opportunities in the first case and coping with the irregularity in the second case.

    To tackle the first case, this thesis proposes both a pure software approach and a software-transparent hardware approach. The software approach, SparseTrain, leverages the moderately sparse activations in Convolutional Neural Networks (CNNs) to speed up their training and inference. Such sparsity changes dynamically and is unstructured, i.e., it has no discernible patterns. SparseTrain detects the zeros inside a dense representation and dynamically skips over useless computations at run time. The hardware approach, the Sparsity Aware Vector Engine (SAVE), exploits the unstructured sparsity in both the activations and the weights. Like SparseTrain, SAVE dynamically detects zeros in a dense representation and then skips the ineffectual work. SAVE augments a CPU's vector processing pipeline: it assembles denser vector operands by combining effectual vector lanes from multiple vector instructions that contain ineffectual lanes. SAVE is general-purpose, accelerating any vector workload that has zeros in its inputs, yet it also contains optimizations targeting matrix-multiplication-based DNN models. Both SparseTrain and SAVE accelerate DNN training and inference on CPUs significantly.

    For the second case, this thesis focuses on a type of DNN that is severely impacted by the irregularity arising from sparsity: Graph Neural Networks (GNNs). GNNs take graphs as input, and graphs often contain highly sparse connections. This thesis proposes software optimizations that (i) overlap the irregular memory accesses with the compute, (ii) compress and decompress the features dynamically, and (iii) improve the temporal reuse of the features. The optimized implementation significantly outperforms a state-of-the-art GNN implementation. In addition, this thesis discusses, as future work, the idea of offloading a GNN's irregular memory access phase to an augmented Direct Memory Access (DMA) engine.
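
    As a rough illustration of the first case, the sketch below is a minimal, hypothetical C example (not taken from the thesis) of SparseTrain-style dynamic zero-skipping: in a GEMM-like inner loop over a dense activation buffer, each activation scales one row of weights, so a zero-valued activation lets the whole row update be skipped at run time. The function and variable names are assumptions made for this sketch.

        /* Minimal, hypothetical sketch of dynamic zero-skipping over a dense
         * activation buffer (in the spirit of SparseTrain; not the thesis code).
         * Each activation act[k] scales row k of the weight matrix, so a
         * zero-valued activation lets the whole row update be skipped. */
        #include <stddef.h>

        void gemm_skip_zero_rows(const float *act,  /* activations, length K        */
                                 const float *wgt,  /* weights, K x N, row-major    */
                                 float *out,        /* output accumulator, length N */
                                 size_t K, size_t N)
        {
            for (size_t k = 0; k < K; ++k) {
                float a = act[k];
                if (a == 0.0f)        /* run-time sparsity check on a dense input */
                    continue;         /* skip the ineffectual multiply-adds       */
                const float *row = &wgt[k * N];
                for (size_t n = 0; n < N; ++n)
                    out[n] += a * row[n];   /* effectual work only */
            }
        }

    Whether such a run-time check pays off depends on the sparsity level, which is why the thesis treats the low-sparsity and high-sparsity cases separately.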