Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-Temporal Sparsity
Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this paper proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced using a new Column-Balanced Targeted Dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for balanced workloads. The pruned networks running on Spartus hardware achieve weight sparsity of up to 96% and 94%, with negligible accuracy loss, on the TIMIT and Librispeech datasets, respectively.
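The abstract gives no pseudocode for CBTD, but the balancing idea can be sketched. Below is a minimal NumPy illustration, assuming a one-shot magnitude prune at per-column granularity so every column (and hence every processing element reading it) keeps the same number of nonzeros; the actual CBTD method instead drops the low-magnitude weights stochastically during training, and the function name is hypothetical.

```python
import numpy as np

def column_balanced_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude entries of each column of W so that every
    column keeps the same number of nonzeros (a hypothetical one-shot variant
    of Column-Balanced Targeted Dropout, which drops these weights
    stochastically during training instead)."""
    rows, _cols = W.shape
    keep = max(1, int(round(rows * (1.0 - sparsity))))  # nonzeros per column
    pruned = np.zeros_like(W)
    for c in range(W.shape[1]):
        # Indices of the `keep` largest-magnitude weights in column c.
        top = np.argsort(np.abs(W[:, c]))[-keep:]
        pruned[top, c] = W[top, c]
    return pruned

# Example: a 1024x1024 gate matrix pruned to 96% sparsity gives every
# column exactly the same nonzero count, i.e. a balanced workload.
W = np.random.randn(1024, 1024)
W_sparse = column_balanced_prune(W, sparsity=0.96)
assert np.count_nonzero(W_sparse) == 1024 * 41  # 41 nonzeros per column
```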
To induce temporal sparsity in the LSTM, we extend the previous DeltaGRU method to a DeltaLSTM method. Combining the spatial sparsity from CBTD with the temporal sparsity from DeltaLSTM saves weight memory accesses and the associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Exploiting spatio-temporal sparsity gives Spartus a 46X speedup over its theoretical hardware performance, achieving an effective batch-1 throughput of 9.4 TOp/s and a power efficiency of 1.1 TOp/s/W.
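For readers unfamiliar with delta networks, the sketch below illustrates the principle DeltaLSTM inherits from DeltaGRU: an input component is propagated only when it has changed by more than a threshold since its last propagated value, so the matching weight column, and its memory fetch, can be skipped. The function name, threshold value, and accumulator formulation are illustrative, not the paper's implementation.

```python
import numpy as np

THETA = 0.1  # delta threshold (assumed value)

def delta_matvec(W, x_t, x_prev, M_prev):
    """One delta-network step: only input components whose change since their
    last propagated value exceeds THETA contribute, so the corresponding
    columns of W need not be fetched or multiplied.

    W       -- weight matrix (out x in)
    x_t     -- current input vector
    x_prev  -- last *propagated* value of each input component
    M_prev  -- running pre-activation accumulator, M_t = M_{t-1} + W @ dx
    """
    dx = x_t - x_prev
    active = np.abs(dx) >= THETA             # components worth updating
    x_prop = np.where(active, x_t, x_prev)   # refresh propagated state
    # Sparse column gather: touch only the columns with an active delta.
    M_t = M_prev + W[:, active] @ dx[active]
    return M_t, x_prop

# Tiny usage example: a slowly varying input leaves most columns skipped.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
x_prev = np.zeros(16)
M = W @ x_prev                          # initial accumulator for x = 0
x_t = 0.05 * rng.standard_normal(16)    # small change: mostly below THETA
M, x_prev = delta_matvec(W, x_t, x_prev, M)
```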
Intrinsic sparse LSTM using structured targeted dropout for efficient hardware inference
Recurrent Neural Networks (RNNs) are useful for speech recognition, but their fully-connected structure leads to a large memory footprint, making them difficult to deploy on resource-constrained embedded systems. Previous structured RNN pruning methods can effectively reduce RNN size; however, they either struggle to balance high sparsity against high task accuracy, or the pruned models yield only moderate speedups on custom hardware accelerators. This work proposes a novel structured pruning method called Structured Targeted Dropout (STD)-Intrinsic Sparse Structures (ISS), which stochastically drops grouped rows and columns of the weight matrices during training. The compressed network is equivalent to a smaller dense network, which can be processed efficiently by Graphics Processing Units (GPUs). STD-ISS is evaluated on the TIMIT phone recognition task using Long Short-Term Memory (LSTM) RNNs. It outperforms previous state-of-the-art hardware-friendly methods on both accuracy and compression ratio. STD-ISS achieves a size compression ratio of up to 50× with <1% accuracy loss, leading to a 19.1× speedup on the embedded Jetson Xavier NX GPU platform.
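The property STD-ISS exploits is that an LSTM hidden unit owns one row in each of the four gate blocks of the input and recurrent weight matrices plus one column of the recurrent matrix, so dropping whole units collapses the pruned layer into a smaller dense LSTM that a GPU runs at full efficiency. A hypothetical NumPy sketch of that collapse, assuming the common stacked-gate weight layout, follows.

```python
import numpy as np

def shrink_lstm(W_ih, W_hh, b, keep):
    """Collapse a structurally pruned LSTM layer into a smaller dense one.

    Weights use the stacked-gate layout (4H rows for the i, f, g, o gates),
    as in common deep-learning frameworks. `keep` lists the surviving hidden
    units; every dropped unit removes its row in all four gate blocks of
    W_ih, W_hh, and b, plus its column in W_hh -- so the result is simply a
    dense LSTM of hidden size len(keep).
    """
    H = W_hh.shape[1]
    keep = np.asarray(keep)
    rows = np.concatenate([g * H + keep for g in range(4)])  # same units, all gates
    return W_ih[rows], W_hh[np.ix_(rows, keep)], b[rows]

# Example: drop half the units of a 512-unit layer -> dense 256-unit layer.
H, I = 512, 256
W_ih = np.random.randn(4 * H, I)
W_hh = np.random.randn(4 * H, H)
b = np.random.randn(4 * H)
keep = np.arange(0, H, 2)  # suppose STD kept the even-indexed units
W_ih2, W_hh2, b2 = shrink_lstm(W_ih, W_hh, b, keep)
assert W_hh2.shape == (4 * 256, 256)
```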
Directed diffraction without negative refraction
Using the FDTD method, we investigate electromagnetic propagation in two-dimensional photonic crystals formed by parallel air cylinders in a dielectric medium. The corresponding frequency band structure is computed using the standard plane-wave expansion method. It is shown that within partial bandgaps, waves tend to bend away from the forbidden directions. This phenomenon perhaps need not be explained in terms of negative refraction or 'superlensing' behavior, in contrast to what has been conjectured.
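FDTD itself is standard; as a rough sketch of the kind of simulation described (not the authors' code), the loop below advances the TM-polarized 2D Maxwell equations on a Yee grid with a permittivity map encoding a square lattice of air cylinders in a dielectric background. The grid resolution, lattice geometry, source, and (omitted) absorbing boundaries are all assumptions.

```python
import numpy as np

# Grid and material: air cylinders (eps0) in a dielectric background.
nx = ny = 200
dx = dy = 1e-8                        # 10 nm cells (assumed)
c0, eps0, mu0 = 299792458.0, 8.854e-12, 4e-7 * np.pi
dt = 0.5 * dx / (c0 * np.sqrt(2))     # Courant-stable time step in 2D

eps = np.full((nx, ny), 9.0 * eps0)   # dielectric background (assumed eps_r = 9)
X, Y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
for cx in range(20, nx, 40):          # square lattice of air holes
    for cy in range(20, ny, 40):
        eps[(X - cx) ** 2 + (Y - cy) ** 2 < 8 ** 2] = eps0

Ez = np.zeros((nx, ny))
Hx = np.zeros((nx, ny - 1))
Hy = np.zeros((nx - 1, ny))

for n in range(1000):
    # Yee updates for the TM polarization (Ez, Hx, Hy).
    Hx -= dt / (mu0 * dy) * (Ez[:, 1:] - Ez[:, :-1])
    Hy += dt / (mu0 * dx) * (Ez[1:, :] - Ez[:-1, :])
    Ez[1:-1, 1:-1] += dt / eps[1:-1, 1:-1] * (
        (Hy[1:, 1:-1] - Hy[:-1, 1:-1]) / dx
        - (Hx[1:-1, 1:] - Hx[1:-1, :-1]) / dy
    )
    Ez[nx // 2, ny // 2] += np.sin(2 * np.pi * n / 60.0)  # soft point source
```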