How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Bidirectional Encoder Representations from Transformers (BERT) achieve
state-of-the-art results in a variety of Natural Language Processing tasks.
However, understanding of their internal functioning remains insufficient. To
better understand BERT and other Transformer-based
models, we present a layer-wise analysis of BERT's hidden states. Unlike
previous research, which mainly focuses on explaining Transformer models by
their attention weights, we argue that hidden states contain equally valuable
information. Specifically, our analysis focuses on models fine-tuned on the
task of Question Answering (QA) as an example of a complex downstream task. We
inspect how QA models transform token vectors in order to find the correct
answer. To this end, we apply a set of general and QA-specific probing tasks
that reveal the information stored in each representation layer. Our
qualitative analysis of hidden state visualizations provides additional
insights into BERT's reasoning process. Our results show that the
transformations within BERT go through phases that are related to traditional
pipeline tasks. The system can therefore implicitly incorporate task-specific
information into its token representations. Furthermore, our analysis reveals
that fine-tuning has little impact on the models' semantic abilities and that
prediction errors can be recognized in the vector representations of even early
layers.Comment: Accepted at CIKM 201
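
The layer-wise probing described in the abstract can be illustrated with a short sketch: extract BERT's hidden states at every layer and fit a simple linear probe per layer, so that each probe's accuracy indicates how much task-relevant information that layer encodes. This is a minimal illustration, assuming the Hugging Face transformers and scikit-learn libraries; the checkpoint name, the toy sentences and labels, and the use of the [CLS] vector are placeholders, not the authors' actual QA-fine-tuned model or probing tasks.

# Minimal sketch (not the authors' code): probing BERT's per-layer hidden
# states with a linear classifier. Checkpoint, sentences, and labels are
# placeholders for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy probing dataset: sentence-level labels standing in for a probing task.
sentences = ["The capital of France is Paris.", "Bananas are yellow."]
labels = [1, 0]

# Collect one [CLS] vector per layer for each sentence.
per_layer_features = None
with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        hidden_states = model(**inputs).hidden_states  # embeddings + one entry per layer
        cls_vectors = [h[0, 0].numpy() for h in hidden_states]
        if per_layer_features is None:
            per_layer_features = [[] for _ in hidden_states]
        for layer, vec in enumerate(cls_vectors):
            per_layer_features[layer].append(vec)

# Fit a separate linear probe per layer; higher accuracy suggests that layer's
# representation encodes more of the probed information.
for layer, feats in enumerate(per_layer_features):
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {layer}: train accuracy {probe.score(feats, labels):.2f}")

In practice one would use a held-out set and many more examples per probing task; the sketch only shows the mechanics of reading out each layer's representation and attaching a lightweight classifier to it.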