
    Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

    We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), and 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
    Comment: Published as a conference paper at ASPLOS 202
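The "seven nested loops" and the role of loop blocking can be illustrated with a small sketch. The code below is not taken from the paper or from Halide: the function names (`conv_naive`, `conv_blocked`), the tile sizes, and the choice of which loops to split are all hypothetical, and plain Python lists stand in for on-chip buffers. It only shows that splitting and reordering the loops changes the traversal order, not the result.

```python
def make_tensor(shape, seed):
    """Build a nested list of small integers so results compare exactly."""
    if len(shape) == 1:
        return [(seed + 3 * i) % 5 for i in range(shape[0])]
    return [make_tensor(shape[1:], seed + 7 * i) for i in range(shape[0])]


def zeros(B, K, Y, X):
    return [[[[0] * X for _ in range(Y)] for _ in range(K)] for _ in range(B)]


def conv_naive(inp, wgt, B, K, C, Y, X, FY, FX):
    """Seven loops: batch, output channel, input channel, output row,
    output column, filter row, filter column."""
    out = zeros(B, K, Y, X)
    for b in range(B):
        for k in range(K):
            for c in range(C):
                for y in range(Y):
                    for x in range(X):
                        for fy in range(FY):
                            for fx in range(FX):
                                out[b][k][y][x] += (
                                    inp[b][c][y + fy][x + fx] * wgt[k][c][fy][fx]
                                )
    return out


def conv_blocked(inp, wgt, B, K, C, Y, X, FY, FX, k_tile=2, x_tile=2):
    """Same computation with the k and x loops split into tiles and the
    tile loops hoisted outward (an assumed, illustrative blocking)."""
    out = zeros(B, K, Y, X)
    for k0 in range(0, K, k_tile):          # outer tile loop over output channels
        for x0 in range(0, X, x_tile):      # outer tile loop over output columns
            for b in range(B):
                for c in range(C):
                    for y in range(Y):
                        for fy in range(FY):
                            for fx in range(FX):
                                # inner tile loops: the part a design might
                                # unroll spatially across processing units
                                for k in range(k0, min(k0 + k_tile, K)):
                                    for x in range(x0, min(x0 + x_tile, X)):
                                        out[b][k][y][x] += (
                                            inp[b][c][y + fy][x + fx]
                                            * wgt[k][c][fy][fx]
                                        )
    return out


# Small illustrative problem size (hypothetical values).
B, K, C, Y, X, FY, FX = 1, 3, 2, 4, 5, 3, 3
inp = make_tensor((B, C, Y + FY - 1, X + FX - 1), seed=1)
wgt = make_tensor((K, C, FY, FX), seed=2)
reference = conv_naive(inp, wgt, B, K, C, Y, X, FY, FX)
blocked = conv_blocked(inp, wgt, B, K, C, Y, X, FY, FX)
print(reference == blocked)
```

Integer inputs are used so that the two loop orders can be compared with exact equality; with floating-point data the accumulation order could introduce small rounding differences.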

    Variational Inference for Stochastic Block Models from Sampled Data

    This paper deals with non-observed dyads during the sampling of a network and the resulting issues in the inference of the Stochastic Block Model (SBM). We review sampling designs and recover Missing At Random (MAR) and Not Missing At Random (NMAR) conditions for the SBM. We introduce variants of the variational EM algorithm for inferring the SBM under various sampling designs (MAR and NMAR), all available as an R package. Model selection criteria based on Integrated Classification Likelihood are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We explore two real-world networks from ethnology (seed circulation network) and biology (protein-protein interaction network), where the interpretation depends considerably on the sampling design considered.
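To make the notion of non-observed dyads concrete, here is a minimal sketch that simulates an undirected SBM and then hides dyads independently of their values, which is one simple design that satisfies MAR. It is not the paper's variational EM and does not use the R package: all function names and parameter values are hypothetical, and the connectivity estimate assumes the block labels are known, which real inference does not.

```python
import random


def simulate_sbm(n, alpha, pi, rng):
    """Draw block labels with proportions alpha, then edges with
    probabilities pi[q][l] between blocks q and l (undirected, no loops)."""
    blocks = rng.choices(range(len(alpha)), weights=alpha, k=n)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            edge = 1 if rng.random() < pi[blocks[i]][blocks[j]] else 0
            adj[i][j] = adj[j][i] = edge
    return blocks, adj


def sample_dyads_mar(adj, rho, rng):
    """Observe each dyad with probability rho, regardless of its value.
    Unobserved dyads (and the diagonal) are marked None."""
    n = len(adj)
    obs = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rho:
                obs[i][j] = obs[j][i] = adj[i][j]
    return obs


def estimate_connectivity(obs, blocks, n_blocks):
    """Edge frequency per block pair over observed dyads only.
    Assumes known labels (illustration only); None if a pair was never observed."""
    edges = [[0] * n_blocks for _ in range(n_blocks)]
    dyads = [[0] * n_blocks for _ in range(n_blocks)]
    n = len(obs)
    for i in range(n):
        for j in range(i + 1, n):
            if obs[i][j] is None:
                continue
            q, l = sorted((blocks[i], blocks[j]))
            edges[q][l] += obs[i][j]
            dyads[q][l] += 1
    est = [[None] * n_blocks for _ in range(n_blocks)]
    for q in range(n_blocks):
        for l in range(q, n_blocks):
            if dyads[q][l] > 0:
                est[q][l] = est[l][q] = edges[q][l] / dyads[q][l]
    return est


rng = random.Random(0)                      # fixed seed for repeatability
alpha = [0.5, 0.5]                          # hypothetical block proportions
pi = [[0.8, 0.1], [0.1, 0.8]]               # hypothetical connectivity matrix
blocks, adj = simulate_sbm(60, alpha, pi, rng)
obs = sample_dyads_mar(adj, rho=0.5, rng=rng)
print(estimate_connectivity(obs, blocks, n_blocks=2))
```

Because observation here does not depend on the edge value, the observed-dyad frequencies remain reasonable estimates of the connectivity. Under an NMAR design, for example one where present edges are more likely to be observed than absent ones, this naive estimate would generally be biased, which is the situation the abstract's dedicated algorithms address.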

    T-WAS and T-XAS algorithms for fiber-loop optical buffers

    In optical packet/burst switched networks, fiber loops provide a viable and compact means of contention resolution. For fixed-size packets, it is known that a basic void-avoiding schedule (VAS) can vastly outperform a more classical pre-reservation algorithm such as FCFS. For the setting of a uniformly distributed packet size and a restricted buffer size, we proposed two novel forward-looking algorithms, WAS and XAS, that, in specific settings, outperform VAS by up to 20% in terms of packet loss. This contribution extends the usage and improves the performance of the WAS and XAS algorithms by introducing an additional threshold variable. By optimizing this threshold, the process of selectively delaying packets longer than strictly necessary can be made more or less strict and as such be fitted to each setting. Monte Carlo simulation shows that the resulting T-WAS and T-XAS algorithms are most effective for those instances where the algorithms without threshold can offer no or only limited performance improvement.