    On the differences between BERT and MT encoder spaces and how to address them in translation tasks

    Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation, despite their enormous success in other tasks. This is all the more surprising given the similarities between the two architectures. This paper sheds light on the embedding spaces they create, comparing them with average cosine similarity, contextuality metrics, and measures of representational similarity, and revealing that BERT and NMT encoder representations look significantly different from one another. To address this issue, we propose a supervised transformation from one space into the other, using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.
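    A minimal sketch of two ingredients named in the abstract: average cosine similarity between row-aligned representations, and a least-squares linear map as one simple form of explicit alignment between embedding spaces. The paper's actual alignment and fine-tuning procedure is more involved; the array names, shapes, and random data here are illustrative assumptions.

```python
import numpy as np

def avg_cosine_similarity(A, B):
    """Mean cosine similarity between row-aligned representations A and B."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(np.mean(np.sum(A_n * B_n, axis=1)))

def fit_linear_map(src, tgt):
    """Least-squares W minimizing ||src @ W - tgt||: a simple explicit
    alignment from one embedding space into another."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

# Toy stand-ins for token representations from BERT and an NMT encoder.
rng = np.random.default_rng(0)
bert_vecs = rng.normal(size=(1000, 768))
nmt_vecs = rng.normal(size=(1000, 512))

W = fit_linear_map(bert_vecs, nmt_vecs)
print(avg_cosine_similarity(bert_vecs @ W, nmt_vecs))
```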

    Tracking the Traces of Passivization and Negation in Contextualized Representations

    Contextualized word representations encode rich information about syntax and semantics, alongside specificities of each context of use. While contextual variation does not always reflect actual meaning shifts, it can still reduce the similarity of embeddings for word instances that have the same meaning. We explore the imprint of two specific linguistic alternations, namely passivization and negation, on the representations generated by neural models trained with two different objectives: masked language modeling and translation. Our exploration methodology is inspired by an approach previously proposed for removing societal biases from word vectors. We show that passivization and negation leave traces on the representations, and that neutralizing this information leads to more similar embeddings for words that should preserve their meaning across the transformation. We also find clear differences in how the respective features generalize across datasets.
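    A minimal sketch of the kind of neutralization described: estimate a "passivization direction" as the difference of mean embeddings between active and passive instances, then project that direction out of each vector. This mirrors the hard-debiasing idea the abstract alludes to; the variable names and random data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def feature_direction(active_vecs, passive_vecs):
    """Unit vector pointing from the active to the passive mean embedding."""
    d = passive_vecs.mean(axis=0) - active_vecs.mean(axis=0)
    return d / np.linalg.norm(d)

def neutralize(vecs, direction):
    """Remove the component of each embedding along `direction`."""
    return vecs - np.outer(vecs @ direction, direction)

rng = np.random.default_rng(0)
active = rng.normal(size=(200, 768))
passive = rng.normal(loc=0.1, size=(200, 768))

d = feature_direction(active, passive)
neutral = neutralize(passive, d)
print(abs(neutral @ d).max())  # ~0: the feature direction has been removed
```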

    Learning and Using Context on a Humanoid Robot Using Latent Dirichlet Allocation

    2014 Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Genoa, Italy, 13-16 October 2014.
    In this work, we model context in terms of a set of concepts grounded in a robot's sensorimotor interactions with the environment. To this end, we treat context as a latent variable in Latent Dirichlet Allocation, which is widely used in computational linguistics for modeling topics in texts. The flexibility of our approach allows many-to-many relationships between objects and contexts, as well as between scenes and contexts. We use a concept-web representation of the robot's perceptions as a basis for context analysis. The detected contexts of a scene can be used for several cognitive problems; our results demonstrate that the robot can use learned contexts to improve object recognition and planning.
    Funded by the Scientific and Technological Research Council of Turkey (TÜBİTAK).
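    A minimal sketch of the core idea: treat each scene as a "document" whose "words" are concept activations from the robot's concept web, and let LDA infer latent contexts. The concept names and counts below are invented for illustration; the paper grounds them in real sensorimotor data.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

concepts = ["cup", "graspable", "table", "rollable", "ball", "edge"]
# Rows: scenes; columns: how often each concept fired in that scene.
scene_concept_counts = np.array([
    [3, 2, 4, 0, 0, 1],   # kitchen-like scene
    [0, 1, 0, 3, 4, 0],   # play-like scene
    [2, 3, 3, 0, 1, 1],
    [0, 0, 1, 4, 3, 0],
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
scene_contexts = lda.fit_transform(scene_concept_counts)
print(scene_contexts)  # per-scene mixture over latent contexts
```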

    Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

    This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks. We apply the approach to standard tasks in natural language inference (NLI) and demonstrate the effectiveness of the method in terms of prediction accuracy and correlation with human annotation disagreements. We argue that the uncertainty representations in SWAG better reflect subjective interpretation and the natural variation that is also present in human language understanding. The results reveal the importance of uncertainty modeling, an often neglected aspect of neural language modeling, in NLU tasks.
    Comment: NoDaLiDa 2023 camera-ready.
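    A minimal sketch of SWAG-diagonal: collect weight snapshots along an SGD trajectory, maintain running first and second moments, then sample weight vectors from the fitted Gaussian at test time and average the resulting predictions. The snapshot source and dimensions are placeholders (assumptions); the full method also keeps a low-rank covariance term.

```python
import numpy as np

class SwagDiagonal:
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.sq_mean = np.zeros(dim)
        self.n = 0

    def collect(self, weights):
        """Update running first and second moments with one snapshot."""
        self.n += 1
        self.mean += (weights - self.mean) / self.n
        self.sq_mean += (weights**2 - self.sq_mean) / self.n

    def sample(self, rng):
        """Draw one weight vector from the fitted diagonal Gaussian."""
        var = np.clip(self.sq_mean - self.mean**2, 1e-12, None)
        return self.mean + np.sqrt(var) * rng.normal(size=self.mean.shape)

rng = np.random.default_rng(0)
swag = SwagDiagonal(dim=10)
for step in range(50):                      # stand-in for SGD snapshots
    swag.collect(rng.normal(loc=1.0, scale=0.1, size=10))

samples = [swag.sample(rng) for _ in range(30)]
# In practice: load each sampled weight vector into the model, predict,
# and average the softmax outputs for an uncertainty-aware prediction.
print(np.mean(samples, axis=0))
```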

    Learning Context on a Humanoid Robot using Incremental Latent Dirichlet Allocation

    In this article, we formalize and model context in terms of a set of concepts grounded in the sensorimotor interactions of a robot. The concepts are modeled as a web using a Markov Random Field, inspired by the concept web hypothesis for representing concepts in humans. On this concept web, we treat context as a latent variable of Latent Dirichlet Allocation (LDA), a widely used method in computational linguistics for modeling topics in texts. We extend the standard LDA method to make it incremental, so that (i) it does not re-learn everything from scratch given new interactions (i.e., it is online) and (ii) it can discover and add a new context to its model when necessary. We demonstrate on the iCub platform that, partly owing to modeling context on top of the concept web, our approach is adaptive, online and robust: it is adaptive and online since it can learn and discover a new context from new interactions, and it is robust since it is not affected by irrelevant stimuli and can discover contexts after only a few interactions. Moreover, we show how to use the context learned in such a model for two important tasks: object recognition and planning.
    Funded by the Scientific and Technological Research Council of Turkey and a Marie Curie International Outgoing Fellowship titled "Towards Better Robot Manipulation: Improvement through Interaction".
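    A minimal sketch of the incremental aspect using scikit-learn's online variational LDA (partial_fit), which updates topics from mini-batches without retraining from scratch. This is a generic stand-in: the paper's extension can also grow the number of contexts on the fly, which stock LDA implementations do not do. The data below are invented counts.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=3, learning_method="online",
                                random_state=0)

rng = np.random.default_rng(0)
for batch in range(5):
    # Each batch: new scenes (rows) x concept activation counts (columns).
    new_scenes = rng.poisson(lam=2.0, size=(8, 6))
    lda.partial_fit(new_scenes)            # incremental update, no restart

print(lda.transform(rng.poisson(lam=2.0, size=(1, 6))))  # context mixture
```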

    Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

    In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge, this is the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark in detail, and train a number of different models, ranging from feature-based classifiers to neural network systems, for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally, we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and the methods of predicting prosodic prominence from text. The dataset and the code for the models are publicly available.
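    A minimal sketch of the strongest model family in the abstract: BERT used as a token classifier over discretized prominence labels. The label count, model name, and toy inputs are assumptions for illustration; the actual benchmark would fine-tune on the released dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # e.g. non-prominent / weak / strong

words = ["tomorrow", "we", "leave", "for", "Helsinki"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits          # shape: (1, seq_len, num_labels)
pred = logits.argmax(dim=-1)
# Map subword predictions back to words via enc.word_ids() before scoring.
print(pred)
```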

    Decoding Emotional Valence from Electroencephalographic Rhythmic Activity

    We attempt to decode emotional valence from electroencephalographic (EEG) rhythmic activity in a naturalistic setting. We employ a data-driven method developed in a previous study, Spectral Linear Discriminant Analysis, to discover the relationships between the classification task and independent neuronal sources, optimally utilizing multiple frequency bands. A detailed investigation of the classifier provides insight into the neuronal sources related to emotional valence and into individual differences in how subjects process emotions. Our findings show that: (1) sources whose locations are similar across subjects are consistently involved in emotional responses, with the involvement of parietal sources being especially significant, and (2) even though the locations of the involved neuronal sources are consistent, subjects can display highly varying degrees of valence-related EEG activity in those sources.
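    A minimal generic stand-in for the kind of pipeline described: band-power features per EEG channel and frequency band, classified with linear discriminant analysis. The paper's Spectral Linear Discriminant Analysis couples band weighting with the discriminant itself; this sketch, with invented data and an assumed sampling rate, shows only the conventional two-stage version.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 250                                  # sampling rate in Hz (assumed)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(epoch):
    """Mean PSD per band per channel for one (channels x samples) epoch."""
    freqs, psd = welch(epoch, fs=FS, axis=-1)
    feats = [psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=-1)
             for lo, hi in BANDS.values()]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
epochs = rng.normal(size=(60, 32, FS * 2))     # 60 trials, 32 channels, 2 s
labels = rng.integers(0, 2, size=60)           # 0 = negative, 1 = positive

X = np.stack([band_powers(e) for e in epochs])
clf = LinearDiscriminantAnalysis().fit(X, labels)
print(clf.score(X, labels))                    # training accuracy only
```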