14 research outputs found
Multiclassification of license plate based on deep convolution neural networks
License plate classification faces several challenges, such as the varying sizes of plate numbers, differing plate backgrounds, and the limited number of plate images available. In this paper, a multiclass classification model is established using a deep convolutional neural network (CNN) to classify license plates from three countries (Armenia, Belarus, Hungary) on a dataset of 600 images, 200 per class (160 for training and 40 for validation). Because the dataset is small, it is preprocessed with pixel normalization and image data augmentation (rotation, horizontal flip, zoom range) to increase the number of training images. The augmented images are then fed into the classification model, which consists of four convolutional blocks. The model is trained with categorical cross-entropy loss and the Adam optimizer with a learning rate of 0.0001. It reaches 99.17% and 97.50% accuracy on the training and validation sets respectively, with a total classification accuracy of 96.66%, and training takes 12 minutes. The implementation uses Anaconda Python 3.7 and Keras with a TensorFlow backend.
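As a rough illustration of the pipeline described above, the sketch below builds a four-block Keras CNN with the stated augmentation and optimizer settings. The filter counts, input size, dense-layer width, augmentation magnitudes, and directory layout are assumptions, since the abstract does not specify them; this is a minimal sketch, not the authors' implementation.

```python
# Sketch of the described pipeline: pixel normalization, augmentation,
# a four-block CNN, and Adam (lr=0.0001) with categorical cross-entropy.
# Filter counts, 224x224 input size, and augmentation values are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # pixel normalization
    rotation_range=15,        # rotation (angle is an assumption)
    horizontal_flip=True,
    zoom_range=0.2,           # zoom range (value is an assumption)
    validation_split=0.2,     # 160 train / 40 validation images per class
)

train_gen = datagen.flow_from_directory(
    "plates/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(
    "plates/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="validation")

model = models.Sequential()
model.add(layers.Input(shape=(224, 224, 3)))
for filters in (32, 64, 128, 256):        # four convolutional blocks
    model.add(layers.Conv2D(filters, 3, activation="relu", padding="same"))
    model.add(layers.MaxPooling2D())
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(3, activation="softmax"))  # Armenia, Belarus, Hungary

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=50)
```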
HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks
The behaviors of deep neural networks (DNNs) are notoriously resistant to
human interpretations. In this paper, we propose Hypergradient Data Relevance
Analysis, or HYDRA, which interprets the predictions made by DNNs as effects of
their training data. Existing approaches generally estimate data contributions
around the final model parameters and ignore how the training data shape the
optimization trajectory. By unrolling the hypergradient of test loss w.r.t. the
weights of training data, HYDRA assesses the contribution of training data
toward test data points throughout the training trajectory. In order to
accelerate computation, we remove the Hessian from the calculation and prove
that, under moderate conditions, the approximation error is bounded.
Corroborating this theoretical claim, empirical results indicate the error is
indeed small. In addition, we quantitatively demonstrate that HYDRA outperforms
influence functions in accurately estimating data contribution and detecting
noisy data labels. The source code is available at
https://github.com/cyyever/aaai_hydra_8686
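The Hessian-free approximation can be sketched as follows: assuming model checkpoints and per-step learning rates are saved along the training trajectory, a training example's contribution to a test prediction is estimated by accumulating inner products between its training gradient at each checkpoint and the test-loss gradient at the final model. The PyTorch sketch below is only illustrative, in the spirit of the abstract; the checkpoint format, helper names, and sign/weighting convention are assumptions rather than the authors' implementation.

```python
# Hedged sketch of a Hessian-free data-contribution estimate: accumulate,
# over saved checkpoints of the training trajectory, inner products of a
# training example's gradient with the test-loss gradient at the final model.
import torch

def flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def contribution(model, loss_fn, train_example, test_example,
                 checkpoints, learning_rates):
    """Estimate how much `train_example` contributed to the test loss.

    checkpoints: state_dicts saved along the training trajectory.
    learning_rates: learning rate used at each corresponding step.
    """
    x_te, y_te = test_example
    x_tr, y_tr = train_example

    # Test-loss gradient at the final parameters.
    model.load_state_dict(checkpoints[-1])
    params = [p for p in model.parameters() if p.requires_grad]
    g_test = flat_grad(loss_fn(model(x_te), y_te), params)

    score = 0.0
    for state, lr in zip(checkpoints, learning_rates):
        model.load_state_dict(state)
        g_train = flat_grad(loss_fn(model(x_tr), y_tr), params)
        # Negative sign: a training point whose gradient pushes parameters
        # toward lower test loss receives a positive contribution score.
        score += -lr * torch.dot(g_test, g_train).item()
    return score
```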
Mnemosyne: Learning to Train Transformers with Transformers
In this work, we propose a new class of learnable optimizers, called
Mnemosyne. It is based on novel spatio-temporal low-rank implicit-attention
Transformers that can learn to train entire neural network
architectures, including other Transformers, without any task-specific
optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM
optimizers (also with new feature engineering to mitigate catastrophic
forgetting of LSTMs), (b) can successfully train Transformers while using
simple meta-training strategies that require minimal computational resources,
(c) matches the accuracy of SOTA hand-designed optimizers with carefully tuned
hyper-parameters (often producing top performing models). Furthermore,
Mnemosyne provides space complexity comparable to that of its hand-designed
first-order counterparts, which allows it to scale to training larger sets of
parameters. We conduct an extensive empirical evaluation of Mnemosyne on: (a)
fine-tuning a wide range of Vision Transformers (ViTs) from medium-size
architectures to massive ViT-Hs (36 layers, 16 heads), (b) pre-training BERT
models and (c) soft prompt-tuning large 11B+ T5XXL models. We complement our
results with a comprehensive theoretical analysis of the compact associative
memory used by Mnemosyne, which we believe has not been done before.
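For readers unfamiliar with learnable optimizers, the toy sketch below shows the generic interface such an optimizer exposes: a learned network consumes per-parameter features (here, the current gradient and a running state) and emits parameter updates. It is only a schematic of the learned-optimizer idea under assumed feature choices and step scaling; Mnemosyne's actual spatio-temporal low-rank implicit-attention architecture and meta-training procedure are not reproduced here.

```python
# Schematic of a learned optimizer (not Mnemosyne's architecture): a small
# learned network maps per-parameter features to an additive update.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        # In Mnemosyne this role is played by a spatio-temporal low-rank
        # implicit-attention Transformer; an MLP is used here as a stand-in.
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def step(self, params, state):
        """Apply one learned update to each parameter of the trainee model."""
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                s = state.setdefault(p, torch.zeros_like(p))
                s.mul_(0.9).add_(p.grad)                       # running state
                feats = torch.stack([p.grad.reshape(-1),
                                     s.reshape(-1)], dim=-1)   # (n, 2) features
                update = self.net(feats).reshape(p.shape)      # learned update
                p.add_(0.01 * update)                          # assumed step scale
```

During meta-training, the weights of the optimizer network itself would be trained so that models optimized with it reach low loss; the abstract's point (b) is that for Mnemosyne this meta-training can use simple strategies and minimal compute.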