929 research outputs found

    Generation of Explicit Knowledge from Empirical Data through Pruning of Trainable Neural Networks

    This paper presents a generalized technology for extracting explicit knowledge from data. The main ideas are 1) maximal reduction of network complexity (not only removal of neurons or synapses, but removal of all unnecessary elements and signals and reduction of the complexity of the elements themselves), 2) use of an adjustable and flexible pruning process (the pruning sequence should not be predetermined; the user should be able to prune the network in their own way to reach a network structure suited to extracting rules of the desired type and form), and 3) extraction of rules not in a predetermined form but in any desired form. Considerations and notes on network architecture, the training process, and the applicability of currently developed pruning techniques and rule extraction algorithms are discussed. This technology, which we have been developing for more than 10 years, has allowed us to create dozens of knowledge-based expert systems. In this paper we present a generalized three-step technology for extracting explicit knowledge from empirical data.
    Comment: 9 pages. The talk was given at IJCNN '99 (Washington DC, July 1999)

    Artificial Neural Network Pruning to Extract Knowledge

    Artificial Neural Networks (NN) are widely used for solving complex problems, from medical diagnostics to face recognition. Despite notable successes, the main disadvantages of NN are also well known: the risk of overfitting, lack of explainability (inability to extract algorithms from a trained NN), and high consumption of computing resources. Determining the appropriate NN structure for each problem can help overcome these difficulties: an NN that is too poor cannot be trained successfully, whereas an NN that is too rich gives unexplainable results and is more likely to overfit. Reducing the precision of NN parameters simplifies the implementation of these NN, saves computing resources, and makes the NN skills more transparent. This paper lists the basic NN simplification problems and the controlled pruning procedures that solve them. All the described pruning procedures can be implemented in one framework. The developed procedures, in particular, find the optimal structure of NN for each task, measure the influence of each input signal and NN parameter, and provide a detailed verbal description of the algorithms and skills of NN. The described methods are illustrated by a simple example: the generation of explicit algorithms for predicting the results of the US presidential election.
    Comment: IJCNN 202

    Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models

    In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we analyse the experts selected by each model at each layer to provide insights for future studies.
    Comment: 14 pages, 3 figures, Camera-Ready for EMNLP 2023 Findings (Long Paper)
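
    The gating and pruning steps described in this abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `gumbel_softmax`, `gate_experts`, and `prune_experts`, the temperature `tau`, and the top-k pruning rule are all assumptions made for the example, and each expert output is treated as a plain list of floats.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample over len(logits) options."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def gate_experts(expert_outputs, gate_logits, tau=1.0, rng=random):
    """Mix expert output vectors with sampled gate weights (assumed design)."""
    weights = gumbel_softmax(gate_logits, tau, rng)
    dim = len(expert_outputs[0])
    mixed = [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]
    return mixed, weights


def prune_experts(importance, keep):
    """Return the indices of the `keep` highest-importance experts."""
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])
```

    A lower `tau` pushes the sampled weights closer to a one-hot choice, while a higher `tau` spreads them more evenly. How importance scores are actually computed is not stated in the abstract, so `prune_experts` simply takes them as given.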

    Improving the Cross-Lingual Generalisation in Visual Question Answering

    While several benefits have been realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages have shown poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective that augments the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, and (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.

    Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

    With the continuous growth in the number of parameters of transformer-based pretrained language models (PLMs), particularly the emergence of large language models (LLMs) with billions of parameters, many natural language processing (NLP) tasks have demonstrated remarkable success. However, the enormous size and computational demands of these models pose significant challenges for adapting them to specific downstream tasks, especially in environments with limited computational resources. Parameter Efficient Fine-Tuning (PEFT) offers an effective solution by reducing the number of fine-tuning parameters and memory usage while achieving comparable performance to full fine-tuning. The demands for fine-tuning PLMs, especially LLMs, have led to a surge in the development of PEFT methods, as depicted in Fig. 1. In this paper, we present a comprehensive and systematic review of PEFT methods for PLMs. We summarize these PEFT methods, discuss their applications, and outline future directions. Furthermore, we conduct experiments using several representative PEFT methods to better understand their effectiveness in parameter efficiency and memory efficiency. By offering insights into the latest advancements and practical applications, this survey serves as an invaluable resource for researchers and practitioners seeking to navigate the challenges and opportunities presented by PEFT in the context of PLMs.
    Comment: 20 pages, 4 figures