Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
We show that DNN accelerator micro-architectures and their program mappings
represent specific choices of loop order and hardware parallelism for computing
the seven nested loops of DNNs, which enables us to create a formal taxonomy of
all existing dense DNN accelerators. Surprisingly, the loop transformations
needed to create these hardware variants can be precisely and concisely
represented by Halide's scheduling language. By modifying the Halide compiler
to generate hardware, we create a system that can fairly compare these prior
accelerators. As long as proper loop blocking schemes are used, and the
hardware can support mapping replicated loops, many different hardware
dataflows yield similar energy efficiency with good performance. This is
because the loop blocking can ensure that most data references stay on-chip
with good locality and the processing units have high resource utilization. How
resources are allocated, especially in the memory system, has a large impact on
energy and performance. By optimizing hardware resource allocation while
keeping throughput constant, we achieve up to 4.2X energy improvement for
Convolutional Neural Networks (CNNs), and 1.6X and 1.8X improvement for Long
Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
Comment: Published as a conference paper at ASPLOS 202
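The "seven nested loops" mentioned in this abstract can be illustrated with a minimal sketch in plain Python (not Halide). The dimension names B, K, C, Y, X, R, S (batch, output channels, input channels, output rows, output columns, filter rows, filter columns) follow a common convention and are an assumption here, as are unit stride, no padding, and the tile sizes. The second function reorders and blocks two of the loops, which changes the order of memory accesses but not the result.

```python
def zeros(B, K, Y, X):
    """Allocate a B x K x Y x X output tensor filled with zeros."""
    return [[[[0 for _ in range(X)] for _ in range(Y)]
             for _ in range(K)] for _ in range(B)]


def conv_naive(inp, wgt, dims):
    """Direct convolution written as seven nested loops.

    inp[b][c][y + r][x + s], wgt[k][c][r][s] -> out[b][k][y][x]
    (unit stride, no padding: an assumption of this sketch).
    """
    B, K, C, Y, X, R, S = dims
    out = zeros(B, K, Y, X)
    for b in range(B):
        for k in range(K):
            for c in range(C):
                for y in range(Y):
                    for x in range(X):
                        for r in range(R):
                            for s in range(S):
                                out[b][k][y][x] += (
                                    inp[b][c][y + r][x + s] * wgt[k][c][r][s]
                                )
    return out


def conv_blocked(inp, wgt, dims, k_tile=2, x_tile=2):
    """Same computation with the k and x loops blocked (tiled).

    The outer k0/x0 loops walk over tiles; the innermost k/x loops
    stay inside one tile, so a small set of outputs and weights is
    reused before moving on. Tile sizes are hypothetical.
    """
    B, K, C, Y, X, R, S = dims
    out = zeros(B, K, Y, X)
    for k0 in range(0, K, k_tile):
        for x0 in range(0, X, x_tile):
            for b in range(B):
                for c in range(C):
                    for y in range(Y):
                        for r in range(R):
                            for s in range(S):
                                for k in range(k0, min(k0 + k_tile, K)):
                                    for x in range(x0, min(x0 + x_tile, X)):
                                        out[b][k][y][x] += (
                                            inp[b][c][y + r][x + s]
                                            * wgt[k][c][r][s]
                                        )
    return out
```

Both functions compute the same sums in a different order, so with integer inputs they return identical results. In hardware terms, the innermost tile loops are the ones a design could unroll spatially across processing units.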
Variational Inference for Stochastic Block Models from Sampled Data
This paper deals with non-observed dyads during the sampling of a network and
the resulting issues in the inference of the Stochastic Block Model (SBM). We
review sampling designs and recover Missing At Random (MAR) and Not Missing At
Random (NMAR) conditions for the SBM. We introduce variants of the variational
EM algorithm for inferring the SBM under various sampling designs (MAR and
NMAR), all available as an R package. Model selection criteria based on
Integrated Classification Likelihood are derived for selecting both the number
of blocks and the sampling design. We investigate the accuracy and the range of
applicability of these algorithms with simulations. We explore two real-world
networks from ethnology (seed circulation network) and biology (protein-protein
interaction network), where the interpretation depends considerably on the
sampling design considered.
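To make the setting concrete, here is a minimal sketch in Python of an undirected SBM whose dyads are then observed under one simple MAR design: each dyad is observed independently with a fixed probability that does not depend on the network. This is an illustration only, not the authors' R package or their variational EM algorithm; the function names and parameters (`sample_sbm`, `observe_dyads_mar`, `rho`) are hypothetical.

```python
import random


def sample_sbm(n, block_probs, connect, rng):
    """Draw block labels and a symmetric 0/1 adjacency matrix from an SBM.

    block_probs[q] is the prior probability of block q;
    connect[q][l] is the edge probability between blocks q and l.
    """
    z = rng.choices(range(len(block_probs)), weights=block_probs, k=n)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            edge = 1 if rng.random() < connect[z[i]][z[j]] else 0
            adj[i][j] = adj[j][i] = edge
    return z, adj


def observe_dyads_mar(adj, rho, rng):
    """Observe each dyad independently with probability rho.

    Unobserved dyads are marked None. The diagonal is left as None
    because self-loops are not modelled in this sketch. The sampling
    does not look at the edge values, which is what makes it MAR.
    """
    n = len(adj)
    obs = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rho:
                obs[i][j] = obs[j][i] = adj[i][j]
    return obs
```

An NMAR design would instead let the observation probability depend on the unobserved quantities, for example a different `rho` for present and absent edges, which is why the sampling design has to be modelled jointly with the network in that case.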
T-WAS and T-XAS algorithms for fiber-loop optical buffers
In optical packet/burst switched networks, fiber loops provide a viable and compact means of contention resolution. For fixed-size packets, it is known that a basic void-avoiding schedule (VAS) can vastly outperform a more classical pre-reservation algorithm such as FCFS. For the setting of a uniformly distributed packet size and a restricted buffer size, we proposed two novel forward-looking algorithms, WAS and XAS, that, in specific settings, outperform VAS by up to 20% in terms of packet loss. This contribution extends the usage and improves the performance of the WAS and XAS algorithms by introducing an additional threshold variable. By optimizing this threshold, the process of selectively delaying packets longer than strictly necessary can be made more or less strict, and as such be fitted to each setting. Monte Carlo simulation shows that the resulting T-WAS and T-XAS algorithms are most effective in those instances where the algorithms without threshold offer no or only limited performance improvement.