Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks
Architecture optimization, which is a technique for finding an efficient
neural network that meets certain requirements, generally reduces to a set of
multiple-choice selection problems among alternative sub-structures or
parameters. The discrete nature of the selection problem, however, makes this
optimization difficult. To tackle this problem we introduce a novel concept of
a trainable gate function. The trainable gate function, which confers a
differentiable property to discrete-valued variables, allows us to directly
optimize loss functions that include non-differentiable discrete values such as
0-1 selection. The proposed trainable gate can be applied to pruning: pruning is
carried out simply by appending the proposed trainable gate functions to each
intermediate output tensor and then fine-tuning the overall model with any
gradient-based training method. The proposed method therefore jointly optimizes
the selection of the pruned channels and the weights of the pruned model at the
same time. Our experimental results demonstrate that
the proposed method efficiently optimizes arbitrary neural networks in various
tasks such as image classification, style transfer, optical flow estimation,
and neural machine translation.
Comment: Accepted to AAAI 2020 (Poster)
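A minimal sketch of the gating idea described in this abstract, written in PyTorch; the class name TrainableGate, the sigmoid surrogate, and the initialization are illustrative assumptions rather than the paper's exact formulation:

    import torch
    import torch.nn as nn

    class TrainableGate(nn.Module):
        """Illustrative per-channel 0-1 gate with a differentiable surrogate."""
        def __init__(self, num_channels):
            super().__init__()
            # One real-valued score per channel; its sign decides keep (1) vs. prune (0).
            # Positive initialization keeps all channels at the start of fine-tuning.
            self.scores = nn.Parameter(torch.ones(num_channels))

        def forward(self, x):
            hard = (self.scores > 0).float()      # hard 0-1 selection in the forward pass
            soft = torch.sigmoid(self.scores)     # smooth surrogate used only for gradients
            gate = hard + soft - soft.detach()    # value equals hard; gradient flows through soft
            return x * gate.view(1, -1, 1, 1)     # gate an (N, C, H, W) intermediate tensor

Appending such a gate after each intermediate output tensor and fine-tuning the overall model with any gradient-based optimizer then trains the 0-1 channel selection jointly with the remaining weights, as the abstract describes.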
Injecting Logical Constraints into Neural Networks via Straight-Through Estimators
Injecting discrete logical constraints into neural network learning is one of
the main challenges in neuro-symbolic AI. We find that a
straight-through-estimator, a method introduced to train binary neural
networks, could effectively be applied to incorporate logical constraints into
neural network learning. More specifically, we design a systematic way to
represent discrete logical constraints as a loss function; minimizing this loss
using gradient descent via a straight-through-estimator updates the neural
network's weights in a direction in which the binarized outputs satisfy the
logical constraints. The experimental results show that by leveraging GPUs and
batch training, this method scales significantly better than existing
neuro-symbolic methods that require heavy symbolic computation for computing
gradients. Also, we demonstrate that our method applies to different types of
neural networks, such as MLPs, CNNs, and GNNs, allowing them to learn with fewer
or no labeled data by learning directly from known constraints.
Comment: 27 pages
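A minimal sketch of the straight-through-estimator idea in PyTorch; the exactly-one constraint used here is an illustrative stand-in for the paper's systematic encoding of logical constraints as a loss:

    import torch

    def binarize_ste(probs):
        """Hard 0/1 values in the forward pass; gradients pass straight through."""
        hard = (probs > 0.5).float()
        return hard + probs - probs.detach()

    def constraint_loss(logits):
        """Penalty that is zero exactly when the binarized outputs satisfy
        the (illustrative) constraint 'exactly one output is true'."""
        b = binarize_ste(torch.sigmoid(logits))
        return ((b.sum(dim=-1) - 1.0) ** 2).mean()

Adding this term to the usual training loss and minimizing it with gradient descent pushes the weights toward constraint-satisfying binarized outputs, and because it is an ordinary loss it batches naturally on GPUs, which is the scaling advantage the abstract highlights.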