ESPnet-ONNX: Bridging a Gap Between Research and Production
In the field of deep learning, researchers often focus on inventing novel
neural network models and improving benchmarks. In contrast, application
developers are interested in making models suitable for actual products, which
involves optimizing a model for faster inference and adapting a model to
various platforms (e.g., C++ and Python). In this work, to fill the gap between
the two, we establish an effective procedure for optimizing a PyTorch-based
research-oriented model for deployment, taking ESPnet, a widely used toolkit
for speech processing, as an instance. We introduce different techniques to
ESPnet, including converting a model into an ONNX format, fusing nodes in a
graph, and quantizing parameters, which lead to an approximately 1.3-2x
speedup in various tasks (i.e., ASR, TTS, speech translation, and spoken
language understanding) while keeping its performance without any additional
training. Our ESPnet-ONNX will be publicly available at
https://github.com/espnet/espnet_onnx

Comment: Accepted to APSIPA ASC 202
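The three deployment steps the abstract names translate into a short script. The sketch below is illustrative rather than code from espnet_onnx itself: TinyEncoder is a made-up stand-in for a real ESPnet model, the export is done with PyTorch's standard ONNX exporter, node fusion is delegated to onnxruntime's built-in graph optimizations, and the weights are quantized post-training with onnxruntime's dynamic quantizer.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a research model; not an ESPnet module."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(80, 256)
        self.out = nn.Linear(256, 64)

    def forward(self, feats):
        return self.out(torch.relu(self.proj(feats)))

model = TinyEncoder().eval()
dummy = torch.randn(1, 100, 80)  # (batch, frames, mel bins)

# Step 1: convert the PyTorch graph into the ONNX format.
torch.onnx.export(
    model, dummy, "encoder.onnx",
    input_names=["feats"], output_names=["enc_out"],
    dynamic_axes={"feats": {0: "batch", 1: "frames"}},
)

# Step 2: node fusion. onnxruntime fuses compatible nodes in the graph
# when a session is created with graph optimizations enabled, so no
# manual graph rewriting is needed for this sketch.
import onnxruntime as ort
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess = ort.InferenceSession("encoder.onnx", sess_options=opts)

# Step 3: post-training dynamic quantization of the weights to int8;
# no additional training is required, matching the abstract's claim.
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("encoder.onnx", "encoder.int8.onnx",
                 weight_type=QuantType.QInt8)

out, = sess.run(None, {"feats": dummy.numpy()})
```

Dynamic quantization is shown because it needs no calibration data; the speedup actually obtained will depend on the model, task, and hardware, as the paper's 1.3-2x range suggests.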
Spectral pruning of fully connected layers
Training of neural networks can be reformulated in spectral space by
allowing the eigenvalues and eigenvectors of the network to act as targets
of the optimization instead of the individual weights. Working in this
setting, we show that the eigenvalues can be used to rank the nodes'
importance within the ensemble. Indeed, we prove that sorting the nodes by
their associated eigenvalues enables effective pre- and post-processing
pruning strategies that yield massively compacted networks (in terms of
the number of composing neurons) with virtually unchanged performance. The
proposed methods
are tested on different architectures, with either a single or multiple
hidden layers, and on distinct classification tasks of general interest.

Comment: 16 pages, 11 figures. Sections rearranged in v
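As a concrete illustration of the post-processing variant, the sketch below prunes a fully connected hidden layer by keeping only its top-ranked nodes and shrinking the surrounding weight matrices. The per-node scores are stood in by a placeholder vector eigenvalues, and the helper prune_hidden_layer is hypothetical, since the abstract does not reproduce the paper's spectral parametrization; only the ranking-and-slicing logic is shown.

```python
import torch
import torch.nn as nn

def prune_hidden_layer(fc_in: nn.Linear, fc_out: nn.Linear,
                       eigenvalues: torch.Tensor, keep: int):
    """Keep the `keep` hidden units with the largest |eigenvalue|."""
    idx = torch.argsort(eigenvalues.abs(), descending=True)[:keep]
    new_in = nn.Linear(fc_in.in_features, keep)
    new_out = nn.Linear(keep, fc_out.out_features)
    with torch.no_grad():
        # Slice the rows feeding the kept units and the columns reading them.
        new_in.weight.copy_(fc_in.weight[idx])
        new_in.bias.copy_(fc_in.bias[idx])
        new_out.weight.copy_(fc_out.weight[:, idx])
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out

# Usage: compact a 1000-unit hidden layer down to its 100 highest-ranked
# nodes. The eigenvalue vector here is random for demonstration; in the
# paper it comes from the spectral training procedure.
fc1, fc2 = nn.Linear(784, 1000), nn.Linear(1000, 10)
eigs = torch.randn(1000)  # placeholder for the trained eigenvalues
fc1_small, fc2_small = prune_hidden_layer(fc1, fc2, eigs, keep=100)
```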