BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer
Retrieval-based language models are increasingly employed in
question-answering tasks. These models search a corpus of documents for
relevant information instead of storing all factual knowledge in their
parameters, thereby enhancing efficiency, transparency, and adaptability. We
develop the first Norwegian retrieval-based model by adapting the REALM
framework and evaluating it on various tasks. After training, we also separate
the language model, which we call the reader, from the retriever components,
and show that this can be fine-tuned on a range of downstream tasks. Results
show that retrieval-augmented language modeling improves the reader's
performance on extractive question-answering, suggesting that this type of
training improves language models' general ability to use context and that this
does not happen at the expense of other abilities such as part-of-speech
tagging, dependency parsing, named entity recognition, and lemmatization. Code,
trained models, and data are made publicly available.
Accepted for NoDaLiDa 2023, main conference.
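The abstract above describes the retrieve-then-read pattern that REALM-style models use: score every document in a corpus against the query in a shared embedding space, then hand the top-k documents to a separate reader as context for extractive question answering. The sketch below illustrates only the retrieval step with a toy deterministic embedding; the corpus, the `embed` function, and all names are illustrative stand-ins, not BRENT's actual components.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding (a stand-in for a trained encoder)."""
    vec = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) * (i + 1)) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus documents with the highest dot-product score."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

corpus = [
    "Oslo is the capital of Norway.",
    "BERT is a bidirectional transformer.",
    "Norwegian is a North Germanic language.",
]
docs = retrieve("What is the capital of Norway?", corpus)
# A reader model would now receive the question concatenated with `docs`
# as context and extract the answer span from it.
```

In a real system the encoder is trained jointly with the reader, and retrieval runs over dense indexes of millions of passages rather than a Python list; the separation of retriever and reader is what lets the reader later be fine-tuned on its own, as the abstract describes.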
To prune or not to prune: Exploring the effects of nodes in neural networks
Over the past decade, computational power has become more accessible, as hardware previously found only in research or military computers has become mainstream. These improvements have vastly increased the speed of repetitive calculations, allowing for the widespread adoption of machine learning and, especially, of deep neural networks. Whereas large neural networks were previously impractical due to limited memory and compute, they have now become commonplace in many areas. But with the rise in popularity of deep neural networks came a need for a better grasp of their internal behavior. Research has accordingly deepened our understanding of the layers that make up these networks: practitioners now rely not only on fully connected layers but also on locally and temporally connected layers, and residual layers resolved the puzzle of deeper models performing worse. However, even though our understanding of neural network layers has considerably improved, our understanding of the components of these layers, the nodes, is still lacking.

In this thesis, we focus on understanding how individual nodes contribute to a neural network's performance. To this end, we classify them using a metric that connects each node to the model's loss, and we analyze how removing nodes affects the network. Our underlying assumption is that not all nodes contribute to the network's performance: some are redundant, adding nothing while still consuming resources. By removing them, we reduce wasted resources and speed up the model at both the training and inference stage.

Our results show that not all nodes contribute to a model's performance. Some are redundant, as they extract the same features as others; some negatively impact performance as a whole; and some initially contribute positively but in fact only correct errors introduced by other nodes. Another important result is the increased stability of models from which we remove nodes: models of the same architecture trained on the same dataset, but with different initial weights, show noticeably lower variation in performance after node pruning than before. This can be seen in the pruned convolutional neural network having a 5% lower loss on average and a 19% lower loss variation. While this study makes no claim to provide a full understanding of how nodes impact neural networks, it does suggest promising paths for future work that could enhance our understanding of neural networks.
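The core technique the abstract describes, scoring individual nodes by their connection to the loss and pruning the least important ones, can be sketched as follows. This is a hedged illustration, not the thesis's actual metric or setup: the toy network, data, masking-based score, and median threshold are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))            # toy inputs
y = (X.sum(axis=1) > 0).astype(float)   # toy binary targets

W1 = rng.normal(size=(4, 8))            # input -> 8 hidden nodes
W2 = rng.normal(size=(8,))              # hidden -> scalar output

def loss(mask: np.ndarray) -> float:
    """Mean squared error with the given hidden-node mask applied."""
    h = np.maximum(X @ W1, 0.0) * mask  # ReLU hidden layer, masked
    pred = h @ W2
    return float(np.mean((pred - y) ** 2))

base = loss(np.ones(8))

# Score node i by how much the loss changes when that node alone is masked:
# a small score means removing the node barely affects the model.
scores = np.array([
    loss(np.where(np.arange(8) == i, 0.0, 1.0)) - base
    for i in range(8)
])

# Prune the half of the nodes with the least loss impact (illustrative rule).
keep = scores > np.median(scores)
pruned_loss = loss(keep.astype(float))
```

A real pruning study would re-evaluate or fine-tune the network after removal and score nodes on held-out data; the point of the sketch is only the shape of the procedure: a per-node importance score derived from the loss, followed by removal of the low-scoring nodes.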