3 research outputs found

    BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer

    Retrieval-based language models are increasingly employed in question-answering tasks. These models search a corpus of documents for relevant information instead of having all factual knowledge stored in their parameters, thereby enhancing efficiency, transparency, and adaptability. We develop the first Norwegian retrieval-based model by adapting the REALM framework and evaluate it on various tasks. After training, we also separate the language model, which we call the reader, from the retriever components, and show that it can be fine-tuned on a range of downstream tasks. Results show that retrieval-augmented language modeling improves the reader's performance on extractive question-answering, suggesting that this type of training improves language models' general ability to use context, and that this does not come at the expense of other abilities such as part-of-speech tagging, dependency parsing, named entity recognition, and lemmatization. Code, trained models, and data are made publicly available.
    Comment: Accepted for NoDaLiDa 2023, main conference.
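
    As a rough illustration of the retrieve-then-read pattern this abstract describes, the sketch below retrieves the passages most similar to a question and hands them to an extractive reader. It is a toy example with made-up embeddings and a stand-in reader, not the BRENT/REALM implementation; all names and data are illustrative.

        import numpy as np

        def retrieve(query_vec, doc_vecs, docs, k=2):
            """Return the k corpus documents whose dense embeddings score highest against the query."""
            scores = doc_vecs @ query_vec                 # inner-product similarity
            top = np.argsort(-scores)[:k]
            return [docs[i] for i in top]

        def toy_reader(question, context):
            """Stand-in extractive reader: picks the context sentence with the most word overlap with the question."""
            sentences = [s for s in context.split(". ") if s]
            q_words = set(question.lower().split())
            return max(sentences, key=lambda s: len(set(s.lower().split()) & q_words))

        # Toy corpus; in a real system doc_vecs would come from a trained dense retriever.
        docs = [
            "Oslo is the capital of Norway. It lies on the Oslofjord",
            "BERT-style encoders map text to dense vectors",
            "Norway uses the Norwegian krone as its currency",
        ]
        rng = np.random.default_rng(0)
        doc_vecs = rng.normal(size=(len(docs), 8))
        question = "What is the capital of Norway?"
        question_vec = doc_vecs[0] + 0.1 * rng.normal(size=8)  # pretend the question encoder lands near doc 0

        passages = retrieve(question_vec, doc_vecs, docs)
        print(toy_reader(question, ". ".join(passages)))      # expected: "Oslo is the capital of Norway"

    Separating the reader from the retriever, as the paper does after training, amounts to keeping only the reader component and fine-tuning it on downstream tasks without the retrieval step.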

    To prune or not to prune: Exploring the effects of nodes in neural networks

    Over the past decade, computational power has become more accessible, as hardware previously found only in research or military computers has become mainstream. These improvements have vastly increased the speed of repetitive calculations, allowing for the widespread adoption of machine learning and, especially, of deep neural networks. Whereas large neural networks were previously impractical due to limited memory and computing power, they have now become commonplace in many areas. But with the rising popularity of deep neural networks came a need to better understand their internal behavior. Research has accordingly deepened our understanding of the layers that make up these networks: practitioners now rely not only on fully connected layers but also on locally connected and temporally connected layers, and residual layers resolved the puzzle of deeper models performing worse. However, even though our understanding of neural network layers has considerably improved, our understanding of the components of these layers, the nodes, is still lacking.

    In this thesis, we focus on understanding how individual nodes contribute to a neural network's performance. To this end, we classify them using a metric that connects each node to the loss of the model, and we analyze how removing nodes affects the network. Our underlying assumption is that not all nodes contribute to the network's performance: some nodes are redundant, adding nothing to the model's accuracy while increasing its resource requirements. By removing them, we reduce wasted resources and speed up the model at both the training and the inference stage.

    Our results show that not all nodes contribute to a model's performance. Some nodes are redundant, extracting the same features as others; some harm performance overall; and some initially contribute positively but, in effect, only correct errors introduced by other nodes. Another important result is the increased stability among models from which we remove nodes: models of the same architecture trained on the same dataset, but with different initial weights, show noticeably less variation in performance after node pruning than before. In particular, the pruned convolutional neural network had, on average, a 5% lower loss and a 19% lower loss variation. While this study makes no claim to provide a full understanding of how nodes impact neural networks, it suggests promising directions for future work that could enhance our understanding of neural networks.
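
    The abstract does not spell out the node-scoring metric, but the general recipe it describes (score each node by its connection to the loss, then drop the lowest-scoring ones) can be sketched as below, assuming a first-order Taylor criterion |activation x gradient| in PyTorch; the model, data, and 25% pruning ratio are illustrative, not the thesis's actual setup.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        # Tiny MLP standing in for the networks studied; we score and prune its 32 hidden nodes.
        model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
        x = torch.randn(256, 10)
        y = torch.randint(0, 2, (256,))

        # First-order Taylor importance: |activation * d(loss)/d(activation)|, averaged over the batch.
        act = torch.relu(model[0](x))
        act.retain_grad()
        loss = nn.functional.cross_entropy(model[2](act), y)
        loss.backward()
        importance = (act * act.grad).abs().mean(dim=0)   # one score per hidden node

        # "Prune" the least important quarter of the nodes by zeroing their weights.
        n_prune = importance.numel() // 4
        prune_idx = torch.argsort(importance)[:n_prune]
        with torch.no_grad():
            model[0].weight[prune_idx] = 0.0              # incoming weights of pruned nodes
            model[0].bias[prune_idx] = 0.0
            model[2].weight[:, prune_idx] = 0.0           # outgoing weights of pruned nodes
        print(f"pruned {n_prune} of {importance.numel()} hidden nodes")

    Zeroing weights keeps the architecture unchanged for simplicity; actually removing the nodes (shrinking the weight matrices) is what yields the memory and speed savings the abstract mentions.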