88 research outputs found
ネパール … コミュニティ・フォレストリー … コミュニティ・フォレストリー … (Community Forestry in Nepal)
PDF/A formats. Access: via World Wide Web. Doctoral (Academic) dissertation, Graduate School of Global Studies, Tokyo University of Foreign Studies, May 2017. Author's thesis (Ph.D.)--Tokyo University of Foreign Studies, 2017, 博甲第228号. "A dissertation submitted to the Graduate School of Global Studies in partial fulfillment of the Requirements for the Degree of Doctor of Philosophy in Area and International Studies. Supervised by Prof. Okada Akito." Bibliography: pp. 140-161. Summary in English and Japanese. Tokyo University of Foreign Studies, Doctor of Philosophy (Academic).
Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
The recent success of multiple neural architectures like CNNs, Transformers,
and MLP-Mixers motivated us to look for similarities and differences between
them. We found that these architectures can be interpreted through the lens of
a general concept of dimension mixing. Research on coupling flows and the
butterfly transform shows that partial and hierarchical signal mixing schemes
are sufficient for efficient and expressive function approximation. In this
work, we study group-wise sparse, non-linear, multi-layered and learnable
mixing schemes of inputs and find that they are complementary to many standard
neural architectures. Following these observations and drawing inspiration from
the Fast Fourier Transform, we generalize the butterfly structure to use a
non-linear mixer function, allowing an MLP to serve as the mixing function; we
call this Butterfly MLP. We also mix along the sequence dimension of
Transformer-based architectures, which we call Butterfly Attention. Experiments
on the CIFAR and LRA datasets demonstrate that the proposed Non-Linear
Butterfly Mixers are efficient and scale well when the host architectures are
used as the mixing function. Additionally, we propose a Patch-Only MLP-Mixer
for processing spatial 2D signals, demonstrating a different dimension-mixing
strategy. Comment: 11 pages, 4 figures, 7 tables
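As a rough illustration of the group-wise, butterfly-style mixing described in
this abstract, the sketch below (our own reconstruction, not the authors' code;
the block size, number of stages, residual connection, and module names are
illustrative assumptions) mixes small feature blocks with a shared MLP and
stacks stages with growing stride so that, as in the FFT butterfly, every pair
of features can eventually interact.

    import torch
    import torch.nn as nn

    class ButterflyMLPStage(nn.Module):
        """One mixing stage: features that are `stride` apart form a block of
        size `block`, and each block is mixed by a small shared MLP."""

        def __init__(self, dim, block, stride, hidden_mult=2):
            super().__init__()
            assert dim % (block * stride) == 0
            self.dim, self.block, self.stride = dim, block, stride
            self.mlp = nn.Sequential(
                nn.Linear(block, hidden_mult * block),
                nn.GELU(),
                nn.Linear(hidden_mult * block, block),
            )

        def forward(self, x):                      # x: (batch, dim)
            b, s = self.block, self.stride
            # Group features with the given stride: (batch, dim) -> (batch, groups, block)
            x = x.reshape(-1, self.dim // (b * s), b, s).transpose(2, 3)
            x = x.reshape(-1, self.dim // b, b)
            x = x + self.mlp(x)                    # residual group-wise mixing
            # Undo the grouping so the next stage can regroup with a different stride
            x = x.reshape(-1, self.dim // (b * s), s, b).transpose(2, 3)
            return x.reshape(-1, self.dim)

    # Stages with stride 1, 4, 16: with block size 4, three stages let all 64
    # features interact (4**3 = 64), mirroring the hierarchy of an FFT butterfly.
    dim, block = 64, 4
    mixer = nn.Sequential(*[ButterflyMLPStage(dim, block, stride=block ** i)
                            for i in range(3)])
    out = mixer(torch.randn(8, dim))               # -> shape (8, 64)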
Importance Estimation with Random Gradient for Neural Network Pruning
Global Neuron Importance Estimation is used to prune neural networks for
efficiency reasons. To determine the global importance of each neuron or
convolutional kernel, most existing methods use activation information,
gradient information, or both, which requires abundant labelled examples. In
this work, we use heuristics to derive importance estimates similar to Taylor
First Order (TaylorFO) approximation-based methods; we name our methods
TaylorFO-abs and TaylorFO-sq. We propose two additional techniques to improve
these importance estimates. First, we propagate random gradients from the last
layer of the network, thus avoiding the need for labelled examples. Second, we
normalize the gradient magnitude at the last-layer output before propagating,
which allows all examples to contribute similarly to the importance score. With
these additional techniques, our methods perform better than previous methods
when tested on ResNet and VGG architectures on the CIFAR-100 and STL-10
datasets. Furthermore, our method also complements existing methods and
improves their performance when combined with them. Comment: 7 pages, 2
figures, ICLR 2023 Workshop on Sparsity in Neural Networks. arXiv admin note:
text overlap with arXiv:2306.1320
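The label-free scoring described in this abstract can be sketched roughly as
follows (an illustrative PyTorch reconstruction, not the authors' released
code; the hook-based bookkeeping, the per-example normalization, and the
TaylorFO-abs-style |activation x gradient| score are assumptions about the
details): propagate a random, per-example-normalized gradient from the output
and accumulate a first-order Taylor term for each channel.

    import torch
    import torch.nn as nn

    def random_gradient_importance(model, layers, inputs):
        """Per-channel importance scores for the hooked conv layers, using a
        random gradient at the output instead of a label-derived one."""
        acts, grads = {}, {}

        def save(name):
            def hook(module, inp, out):
                acts[name] = out
                out.register_hook(lambda g: grads.__setitem__(name, g))
            return hook

        handles = [m.register_forward_hook(save(n)) for n, m in layers.items()]

        out = model(inputs)                          # forward pass, no labels needed
        g = torch.randn_like(out)                    # random "gradient" at the output
        g = g / g.norm(dim=1, keepdim=True)          # normalize it per example
        out.backward(g)                              # propagate the random gradient

        scores = {}
        for name in layers:
            taylor = (acts[name] * grads[name]).abs()   # TaylorFO-abs style term
            scores[name] = taylor.sum(dim=(0, 2, 3))    # accumulate per channel
        for h in handles:
            h.remove()
        return scores

    # Usage on a toy conv net: each conv channel receives one importance score.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(32 * 32 * 32, 10))
    layers = {"conv1": model[0], "conv2": model[2]}
    scores = random_gradient_importance(model, layers, torch.randn(4, 3, 32, 32))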
Input Invex Neural Network
In this paper, we present a novel method to enforce invexity in neural
networks (NN). Invex functions ensure that every stationary point is a global
minimum; hence, gradient descent started from any point converges to the
global minimum. Another advantage of invexity in a NN is that simply
thresholding its output divides the data space into two connected sets
separated by a highly non-linear decision boundary. To this end, we formulate
a universal invex function approximator and employ it to enforce invexity in
NNs; we call the result Input Invex Neural Networks (II-NN). We first fit the
data with a known invex function and then modify it with a NN, comparing the
direction of the NN's gradient against that of the reference invex function
and penalizing the NN wherever the two directions contradict. To impose this
penalty we use a Gradient Clipped Gradient Penalty (GC-GP). We applied our
method to existing NNs for both image classification and regression tasks.
Extensive empirical and qualitative experiments show that our method achieves
performance similar to an ordinary NN while guaranteeing invexity, and it
outperforms a linear NN and the Input Convex Neural Network (ICNN) by a large
margin. We publish our code and implementation details on GitHub. Comment: 20
pages
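A much simplified sketch of the gradient-direction penalty this abstract
describes (our own illustration, not the paper's exact GC-GP; the quadratic
reference invex function, the cosine-based comparison, the clamping, and the
penalty weight are assumed choices) is shown below: the network's
input-gradient is penalized wherever it points against the gradient of the
reference invex function.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    c = torch.zeros(2)               # centre of the reference invex (here convex) function

    def gradient_direction_penalty(x):
        """Penalize input-gradients of `net` that contradict the reference gradient."""
        x = x.clone().requires_grad_(True)
        y = net(x).sum()
        g_net = torch.autograd.grad(y, x, create_graph=True)[0]   # network input-gradient
        g_ref = 2.0 * (x - c)                                      # gradient of ||x - c||^2
        cos = nn.functional.cosine_similarity(g_net, g_ref, dim=1)
        # Only contradicting directions (negative cosine) are penalized; the
        # clamp keeps the penalty bounded, loosely mirroring gradient clipping.
        return torch.clamp(-cos, min=0.0).mean()

    # Training loop: task loss plus the direction penalty (weight 1.0 is arbitrary).
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    x, target = torch.randn(128, 2), torch.randn(128, 1)
    for _ in range(100):
        loss = nn.functional.mse_loss(net(x), target) + 1.0 * gradient_direction_penalty(x)
        opt.zero_grad()
        loss.backward()
        opt.step()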
NepBERTa: Nepali Language Model Trained in a Large Corpus
We would like to thank Google's TPU Research Cloud program for providing us with free and unlimited usage of TPU v3-128 for 90 days. It would not have been possible without the continuous support and response of the TRC team. Publisher PD
- β¦