Generation of Explicit Knowledge from Empirical Data through Pruning of Trainable Neural Networks
This paper presents a generalized technology for extracting explicit
knowledge from data. The main ideas are: 1) maximal reduction of network
complexity (not only removal of neurons or synapses, but removal of all
unnecessary elements and signals, and simplification of the remaining
elements); 2) an adjustable and flexible pruning process (the pruning
sequence should not be predetermined; users should be able to prune the
network in their own way to reach a structure from which rules of the
desired type and form can be extracted); and 3) extraction of rules not in
a predetermined form but in any desired form. We also discuss network
architecture, the training process, and the applicability of currently
developed pruning techniques and rule-extraction algorithms. This
technology, which we have been developing for more than 10 years, has
allowed us to create dozens of knowledge-based expert systems; in this
paper we present it as a generalized three-step process for extracting
explicit knowledge from empirical data.
Comment: 9 pages. The talk was given at IJCNN '99 (Washington DC, July 1999).
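
As an illustration of the kind of simplification this line of work builds
on, here is a minimal sketch of one elementary step, magnitude-based
synapse pruning, in PyTorch. It is not the authors' adjustable technology;
the function name and the 50% fraction are illustrative assumptions.

    import torch
    import torch.nn as nn

    def prune_smallest_weights(layer: nn.Linear, fraction: float) -> None:
        # Zero out the given fraction of synapses with the smallest magnitude.
        with torch.no_grad():
            w = layer.weight.abs().flatten()
            k = int(fraction * w.numel())
            if k == 0:
                return
            threshold = w.kthvalue(k).values      # k-th smallest magnitude
            layer.weight.mul_(layer.weight.abs() > threshold)

    # Example: remove half of the synapses of a small layer.
    layer = nn.Linear(8, 4)
    prune_smallest_weights(layer, 0.5)
    print((layer.weight == 0).float().mean())     # roughly 0.5

In the flexible process the paper describes, the user would choose which
elements to prune and in what order, rather than applying one fixed rule
like this.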
Artificial Neural Network Pruning to Extract Knowledge
Artificial Neural Networks (NNs) are widely used for solving complex
problems, from medical diagnostics to face recognition. Despite notable
successes, the main disadvantages of NNs are also well known: the risk of
overfitting, lack of explainability (inability to extract algorithms from
a trained NN), and high consumption of computing resources. Determining
the appropriate NN structure for each problem can help overcome these
difficulties: a network that is too small cannot be trained successfully,
while one that is too large gives unexplainable results and has a higher
chance of overfitting. Reducing the precision of NN parameters simplifies
implementation, saves computing resources, and makes the NN's skills more
transparent. This paper lists the basic NN simplification problems and
the controlled pruning procedures that solve them; all the described
pruning procedures can be implemented in one framework. The developed
procedures, in particular, find the optimal NN structure for each task,
measure the influence of each input signal and each NN parameter, and
provide a detailed verbal description of the NN's algorithms and skills.
The methods are illustrated by a simple example: the generation of
explicit algorithms for predicting the results of the US presidential
election.
Comment: IJCNN 202
Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models
In this work, we propose a method that combines two popular research areas
by injecting linguistic structures into pre-trained language models in the
parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel
adapter modules encoding different linguistic structures are combined
using a novel Mixture-of-Linguistic-Experts architecture, where
Gumbel-Softmax gates determine the importance of these modules at each
layer of the model. To reduce the number of parameters, we first train the
model for a small, fixed number of steps before pruning the experts based
on their importance scores. Our experimental results with three different
pre-trained models show that our approach can outperform state-of-the-art
PEFT methods with a comparable number of parameters. In addition, we
analyze the experts selected by each model at each layer to provide
insights for future studies.
Comment: 14 pages, 3 figures. Camera-ready for EMNLP 2023 Findings (Long Paper).
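
A minimal sketch of the gating idea described above: parallel bottleneck
adapters combined by a Gumbel-Softmax gate, whose learned logits double as
the importance scores used for pruning. Module sizes and names are
assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedParallelAdapters(nn.Module):
        def __init__(self, dim: int, n_experts: int, bottleneck: int = 16):
            super().__init__()
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                              nn.Linear(bottleneck, dim))
                for _ in range(n_experts)
            ])
            self.gate_logits = nn.Parameter(torch.zeros(n_experts))

        def forward(self, h: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
            # Differentiable sample of per-layer expert importances.
            gates = F.gumbel_softmax(self.gate_logits, tau=tau)
            mixed = sum(g * e(h) for g, e in zip(gates, self.experts))
            return h + mixed                      # residual adapter output

    layer = GatedParallelAdapters(dim=32, n_experts=3)
    out = layer(torch.randn(4, 10, 32))
    print(F.softmax(layer.gate_logits, dim=-1))   # importance estimates

After a few training steps, experts with consistently low gate weight
would be pruned, which is how the parameter reduction described in the
abstract could be realized.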
Improving the Cross-Lingual Generalisation in Visual Question Answering
While multilingual vision-language pretrained models have brought several
benefits, recent benchmarks across various tasks and languages show poor
cross-lingual generalisation when these models are applied to non-English
data, with a large gap between (supervised) English performance and
(zero-shot) cross-lingual transfer. In this work, we examine the poor
performance of these models on zero-shot cross-lingual visual question
answering (VQA), where models are fine-tuned on English visual-question
data and evaluated on 7 typologically diverse languages. We improve
cross-lingual transfer with three strategies: (1) we introduce a
linguistic prior objective that augments the cross-entropy loss with a
similarity-based loss to guide the model during training; (2) we learn a
task-specific subnetwork that improves cross-lingual generalisation and
reduces variance without modifying the model; (3) we augment training
examples using synthetic code-mixing to promote alignment of embeddings
between source and target languages. Our experiments on xGQA with the
pretrained multilingual multimodal transformers UC2 and M3P demonstrate
the consistent effectiveness of the proposed fine-tuning strategies across
7 languages, outperforming existing transfer methods with sparse models.
Code and data to reproduce our findings are publicly available.
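
Strategy (3), synthetic code-mixing, can be as simple as swapping some
source-language tokens for target-language translations drawn from a
bilingual lexicon. The sketch below illustrates the idea; the toy lexicon
and replacement probability are placeholders, not the paper's setup.

    import random

    def code_mix(tokens, lexicon, p=0.3, seed=0):
        # Replace each token with its target-language translation
        # (when the lexicon has one) with probability p.
        rng = random.Random(seed)
        return [lexicon.get(t, t) if rng.random() < p else t
                for t in tokens]

    # Toy English->German lexicon, for illustration only.
    lexicon = {"what": "was", "color": "Farbe", "is": "ist", "the": "die"}
    print(code_mix("what color is the car".split(), lexicon, p=0.5))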
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment
With the continuous growth in the number of parameters of transformer-based
pretrained language models (PLMs), particularly the emergence of large language
models (LLMs) with billions of parameters, many natural language processing
(NLP) tasks have demonstrated remarkable success. However, the enormous size
and computational demands of these models pose significant challenges for
adapting them to specific downstream tasks, especially in environments with
limited computational resources. Parameter-Efficient Fine-Tuning (PEFT) offers
an effective solution by reducing the number of fine-tuning parameters and
memory usage while achieving comparable performance to full fine-tuning. The
demands for fine-tuning PLMs, especially LLMs, have led to a surge in the
development of PEFT methods, as depicted in Fig. 1. In this paper, we present a
comprehensive and systematic review of PEFT methods for PLMs. We summarize
these PEFT methods, discuss their applications, and outline future directions.
Furthermore, we conduct experiments using several representative PEFT methods
to better understand their effectiveness in parameter efficiency and memory
efficiency. By offering insights into the latest advancements and practical
applications, this survey serves as an invaluable resource for researchers and
practitioners seeking to navigate the challenges and opportunities presented by
PEFT in the context of PLMs.
Comment: 20 pages, 4 figures.
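
To make the survey's subject concrete, here is a minimal sketch of one
representative PEFT method, a LoRA-style low-rank update added to a frozen
linear layer. The rank and scaling values are illustrative defaults, not
recommendations from the paper.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Frozen base layer plus trainable low-rank update:
        # y = W x + (alpha / r) * B A x
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False           # only A and B are tuned
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    layer = LoRALinear(nn.Linear(768, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)    # 12,288 trainable vs 590,592 in the full layer

The memory savings the survey measures come from exactly this gap: only
the small adapter matrices need gradients and optimizer state.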