352 research outputs found

    Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-Temporal Sparsity

    Full text link
    Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this paper proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low latency inference. Spatial sparsity is induced using a new Column-Balanced Targeted Dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for balanced workloads. The pruned networks running on Spartus hardware achieve weight sparsity of up to 96% and 94% with negligible accuracy loss on the TIMIT and Librispeech datasets, respectively. To induce temporal sparsity in LSTMs, we extend the previous DeltaGRU method to the DeltaLSTM method. Combining spatio-temporal sparsity through CBTD and DeltaLSTM saves weight memory accesses and the associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Exploiting spatio-temporal sparsity leads to a 46X speedup of Spartus over its theoretical hardware performance, achieving a 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/s/W power efficiency. Comment: Preprint. Under review.
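    The Column-Balanced Targeted Dropout idea can be sketched in a few lines of NumPy: within each fixed-size column block of a weight matrix, only the lowest-magnitude weights become dropout candidates, so every block keeps roughly the same number of nonzeros and the pruned workload stays balanced across hardware processing elements. The sketch below is illustrative only; the helper name cbtd_mask, the block size, and the drop probability are assumptions made for the example, not the paper's implementation.

    import numpy as np

    def cbtd_mask(W, keep_per_block, block_size, drop_prob, rng):
        # Column-Balanced Targeted Dropout sketch (hypothetical helper).
        # Within every block of `block_size` rows of each column, the
        # lowest-magnitude weights are dropout candidates; each candidate is
        # zeroed with probability `drop_prob` during training, so every
        # column block retains about `keep_per_block` nonzeros.
        rows, cols = W.shape
        mask = np.ones_like(W)
        for c in range(cols):
            for r0 in range(0, rows, block_size):
                block = np.abs(W[r0:r0 + block_size, c])
                order = np.argsort(block)                  # ascending magnitude
                candidates = order[:max(len(block) - keep_per_block, 0)]
                drop = candidates[rng.random(len(candidates)) < drop_prob]
                mask[r0 + drop, c] = 0.0
        return mask

    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 8))
    W_train = W * cbtd_mask(W, keep_per_block=1, block_size=4, drop_prob=0.75, rng=rng)

    After training, the surviving nonzeros per block are equal by construction, which is what allows an accelerator to assign one block per processing element without load imbalance.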

    Intrinsic sparse LSTM using structured targeted dropout for efficient hardware inference

    Full text link
    Recurrent Neural Networks (RNNs) are useful for speech recognition, but their fully-connected structure leads to a large memory footprint, making them difficult to deploy on resource-constrained embedded systems. Previous structured RNN pruning methods can effectively reduce RNN size; however, they either struggle to balance high sparsity against high task accuracy, or the pruned models achieve only moderate speedups on custom hardware accelerators. This work proposes a novel structured pruning method called Structured Targeted Dropout (STD)-Intrinsic Sparse Structures (ISS), which stochastically drops grouped rows and columns of the weight matrices during training. The compressed network is equivalent to a smaller dense network that can be processed efficiently by Graphics Processing Units (GPUs). STD-ISS is evaluated on the TIMIT phone recognition task using Long Short-Term Memory (LSTM) RNNs. It outperforms previous state-of-the-art hardware-friendly methods in both accuracy and compression ratio. STD-ISS achieves a size compression ratio of up to 50× with <1% accuracy loss, leading to a 19.1× speedup on the embedded Jetson Xavier NX GPU platform.
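    As a rough illustration, the NumPy sketch below applies targeted dropout to whole columns ranked by L2 norm; in the ISS formulation the dropped groups span the corresponding rows and columns of all LSTM gate matrices, which this single-matrix version simplifies. The function name std_iss_column_mask and all hyperparameters are assumptions made for the example, not the paper's code.

    import numpy as np

    def std_iss_column_mask(W, n_keep, drop_prob, rng):
        # Structured targeted dropout over column groups (sketch).
        # Columns outside the top `n_keep` by L2 norm are dropout candidates
        # and are zeroed with probability `drop_prob` during training; after
        # training they can be removed outright, leaving a smaller dense
        # matrix that a GPU can run without sparse kernels.
        norms = np.linalg.norm(W, axis=0)
        order = np.argsort(norms)                       # weakest columns first
        candidates = order[:max(W.shape[1] - n_keep, 0)]
        mask = np.ones(W.shape[1])
        drop = candidates[rng.random(len(candidates)) < drop_prob]
        mask[drop] = 0.0
        return mask

    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 64))                  # e.g. a recurrent weight matrix
    mask = std_iss_column_mask(W, n_keep=16, drop_prob=0.9, rng=rng)
    W_train = W * mask                                  # training-time targeted dropout
    W_small = W[:, mask.astype(bool)]                   # final dense, compressed matrix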

    Directed diffraction without negative refraction

    Full text link
    Using the FDTD method, we investigate electromagnetic wave propagation in two-dimensional photonic crystals formed by parallel air cylinders in a dielectric medium. The corresponding frequency band structure is computed using the standard plane-wave expansion method. It is shown that within partial bandgaps, waves tend to bend away from the forbidden directions. This phenomenon need not be explained in terms of negative refraction or `superlensing' behavior, in contrast to what has been conjectured. Comment: 3 pages, 4 figures.
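    For readers unfamiliar with the simulation technique, the following is a minimal 2-D FDTD update for the TM polarisation (Ez, Hx, Hy on a Yee grid) in normalised units. The lattice of air cylinders, the grid size, the source frequency, and the simple reflecting boundaries are placeholder assumptions and do not reproduce the geometry or parameters studied in the paper.

    import numpy as np

    nx, ny, nt = 200, 200, 600
    dt = 0.5                                   # Courant-stable for dx = dy = 1
    eps = np.full((nx, ny), 12.0)              # dielectric background
    X, Y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    for cx in range(20, nx, 20):               # square lattice of air cylinders
        for cy in range(20, ny, 20):
            eps[(X - cx) ** 2 + (Y - cy) ** 2 < 6 ** 2] = 1.0

    Ez = np.zeros((nx, ny))
    Hx = np.zeros((nx, ny))
    Hy = np.zeros((nx, ny))
    for t in range(nt):
        Hx[:, :-1] -= dt * (Ez[:, 1:] - Ez[:, :-1])
        Hy[:-1, :] += dt * (Ez[1:, :] - Ez[:-1, :])
        Ez[1:, 1:] += dt / eps[1:, 1:] * (
            (Hy[1:, 1:] - Hy[:-1, 1:]) - (Hx[1:, 1:] - Hx[1:, :-1]))
        Ez[nx // 2, ny // 2] += np.sin(2 * np.pi * 0.05 * t)   # point source
    # Ez now holds the field pattern propagating through the crystal.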

    Energy-efficient activity-driven computing architectures for edge intelligence

    Full text link