3 research outputs found
Convolutional neural network compression for natural language processing
Convolutional neural networks are modern models that are very efficient in
many classification tasks. They were originally created for image processing
purposes. They have since been tried in other domains, such as natural
language processing. Artificial intelligence systems (such as humanoid
robots) are very often built on embedded systems with constraints on memory,
power consumption, etc. The memory footprint of a convolutional neural network
must therefore be reduced before it can be mapped to the given hardware. This
paper presents results of compressing efficient convolutional neural
networks for sentiment analysis. The main steps are quantization and pruning.
The method for mapping the compressed network to an FPGA and the
results of this implementation are also presented. The described simulations showed
that a 5-bit width is enough to avoid any drop in accuracy relative to the
floating-point version of the network. Additionally, a significant memory
footprint reduction was achieved (from 85% up to 93%).
Comment: 7 pages, 4 figures, 6 table
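The quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes symmetric uniform quantization with a single scale per weight list and a 32-bit floating-point baseline; the abstract does not say which scheme the paper uses, and the function names here are hypothetical. Under these assumptions, bit width alone accounts for a storage reduction of roughly 84%. The 85% to 93% range reported above comes from the paper's own combination of quantization and pruning, which this sketch does not reproduce.

```python
def quantize_uniform(weights, bits):
    """Map floats to signed integer codes of the given bit width.

    Assumption: symmetric uniform quantization with one scale per list.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    qmax = 2 ** (bits - 1) - 1                  # largest code, e.g. 15 for 5 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def footprint_reduction(bits, baseline_bits=32):
    """Fraction of weight storage saved by the narrower bit width alone."""
    return 1.0 - bits / baseline_bits


# Illustrative values only, not taken from the paper.
weights = [0.8, -0.31, 0.05, -0.77, 0.12]
codes, scale = quantize_uniform(weights, bits=5)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error, footprint_reduction(5))
```

The rounding error per weight is bounded by half the scale, which is why a small bit width can still leave accuracy close to the floating-point model when the weight range is narrow.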
Retrain or not retrain? -- efficient pruning methods of deep CNN networks
Convolutional neural networks (CNN) play a major role in image processing
tasks such as image classification, object detection, and semantic
segmentation. CNNs very often have from several to a hundred stacked layers
with several megabytes of weights. One possible method to reduce complexity and
memory footprint is pruning. Pruning is the process of removing weights that
connect neurons in two adjacent layers of the network. Finding a
near-optimal solution with a specified drop in accuracy can be more involved
when the deep learning (DL) model has a larger number of convolutional layers.
In this paper, a few approaches, with and without retraining, are described and
compared.
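Pruning as defined in this abstract, removing individual weights between two adjacent layers, can be sketched as follows. The sketch assumes magnitude-based selection (the smallest absolute values are zeroed) and covers only the no-retraining case. The abstract does not state the paper's selection criterion, so the criterion and the name magnitude_prune are illustrative assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights in one layer.

    weights:  2D list, weights[i][j] connects input j to output i.
    sparsity: fraction of weights to remove, between 0.0 and 1.0.
    Assumption: magnitude is the pruning criterion; no retraining follows.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    # Rank every connection by absolute value, smallest first.
    ranked = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    to_remove = int(len(ranked) * sparsity)
    pruned = [list(row) for row in weights]     # leave the input untouched
    for _, i, j in ranked[:to_remove]:
        pruned[i][j] = 0.0
    return pruned


# Illustrative layer with 6 connections; half of them are removed.
layer = [[0.9, -0.02, 0.4],
         [0.01, -0.7, 0.05]]
pruned_layer = magnitude_prune(layer, sparsity=0.5)
print(pruned_layer)
```

A retraining-based variant would fine-tune the surviving weights after each pruning round and re-check accuracy against the allowed drop; that loop needs a training framework and is omitted here.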
Accelerating Natural Language Understanding in Task-Oriented Dialog
Task-oriented dialog models typically leverage complex neural architectures
and large-scale, pre-trained Transformers to achieve state-of-the-art
performance on popular natural language understanding benchmarks. However,
these models frequently have in excess of tens of millions of parameters,
making them impossible to deploy on-device where resource-efficiency is a major
concern. In this work, we show that a simple convolutional model compressed
with structured pruning achieves largely comparable results to BERT on ATIS and
Snips, with under 100K parameters. Moreover, we perform acceleration
experiments on CPUs, where we observe our multi-task model predicts intents and
slots nearly 63x faster than even DistilBERT.
Comment: Accepted to ACL 2020 Workshop on NLP for Conversational A
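Structured pruning, as named in this abstract, removes whole units such as convolutional filters rather than individual weights, so the layer becomes genuinely smaller and the parameter count drops. The sketch below assumes 1D convolution filters stored as nested lists and ranks them by L1 norm. The abstract does not specify the ranking criterion, so both the criterion and the helper names are hypothetical.

```python
def l1_norm(filt):
    """Sum of absolute weights in one filter (channels x kernel width)."""
    return sum(abs(w) for channel in filt for w in channel)


def prune_filters(filters, keep):
    """Keep the `keep` filters with the largest L1 norm, in original order.

    filters: list of filters, each a list of channels, each a list of taps.
    Assumption: L1 norm is the importance score.
    """
    if not 0 < keep <= len(filters):
        raise ValueError("keep must be between 1 and the number of filters")
    ranked = sorted(range(len(filters)),
                    key=lambda f: l1_norm(filters[f]),
                    reverse=True)
    kept = sorted(ranked[:keep])
    return [filters[f] for f in kept], kept


def count_params(filters):
    """Number of weights in the layer (biases ignored)."""
    return sum(len(channel) for filt in filters for channel in filt)


# Illustrative layer: 4 filters, 2 input channels, kernel width 3.
filters = [
    [[0.5, -0.4, 0.3], [0.2, 0.1, -0.6]],
    [[0.01, 0.0, -0.02], [0.0, 0.03, 0.0]],
    [[-0.9, 0.8, 0.7], [0.6, -0.5, 0.4]],
    [[0.1, -0.1, 0.05], [0.0, 0.02, -0.03]],
]
smaller, kept_indices = prune_filters(filters, keep=2)
print(kept_indices, count_params(filters), count_params(smaller))
```

In a full network, dropping a filter also removes the matching input channel from the next layer, which is part of why structured pruning translates directly into faster inference on CPUs.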