Understanding Learning Dynamics Of Language Models with SVCCA
Research has shown that neural models implicitly encode linguistic features,
but there has been no research showing how these encodings arise as the
models are trained. We present the first study on the learning dynamics of
neural language models, using a simple and flexible analysis method called
Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to
compare learned representations across time and across models, without the need
to evaluate directly on annotated data. We probe the evolution of syntactic,
semantic, and topic representations and find that part-of-speech is learned
earlier than topic; that recurrent layers become more similar to those of a
tagger during training; and that embedding layers become less similar. Our results and
methods could inform better learning algorithms for NLP models, possibly to
incorporate linguistic information more effectively.
Comment: Accepted for publication in NAACL 2019
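As a concrete illustration of the comparison the abstract describes, the following is a minimal sketch of the SVCCA procedure (an SVD step to prune low-variance directions, then CCA between the reduced activations). The array shapes, variance threshold, and checkpoint names are illustrative assumptions, not values from the paper.

```python
# Minimal SVCCA sketch: compare two activation matrices over the same datapoints.
import numpy as np

def svcca_similarity(acts1, acts2, var_kept=0.99):
    """Mean canonical correlation between two activation matrices
    of shape (num_datapoints, num_neurons)."""
    def svd_reduce(acts):
        # Center, then keep the top singular directions that explain
        # `var_kept` of the variance (the "SV" step).
        acts = acts - acts.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept)) + 1
        return u[:, :k] * s[:k]

    x, y = svd_reduce(acts1), svd_reduce(acts2)
    # CCA via QR + SVD: the canonical correlations are the singular values
    # of Qx^T Qy (the "CCA" step).
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))

# Hypothetical example: the same layer's activations at two training checkpoints.
rng = np.random.default_rng(0)
layer_step_1k = rng.normal(size=(2000, 512))
layer_step_10k = layer_step_1k @ rng.normal(size=(512, 512)) * 0.1
print(svcca_similarity(layer_step_1k, layer_step_10k))
```

A similarity near 1 means the two checkpoints span nearly the same subspace on these datapoints; tracking this value across training steps is the kind of trajectory the paper analyzes.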
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematically manipulating the input to neural networks and investigating the
impact on their performance; testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks;
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable; and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
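One of the categories above, decoding interpretable knowledge from intermediate representations, is commonly realized as a probing (diagnostic) classifier trained on frozen activations. Below is a minimal sketch with scikit-learn; the activation matrix and part-of-speech labels are placeholder assumptions, not data from any of the reviewed studies.

```python
# Minimal probing-classifier sketch: train a simple model on frozen activations
# and test whether a linguistic property is linearly decodable from them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5000, 768))   # placeholder per-token activations
pos_labels = rng.integers(0, 12, size=5000)    # placeholder POS tag ids

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, pos_labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)      # deliberately simple probe
probe.fit(X_train, y_train)

# Held-out accuracy (relative to a baseline) is read as evidence that the
# property is encoded in the representation.
print("probe accuracy:", probe.score(X_test, y_test))
```

Probe accuracy is usually compared against a simple baseline (e.g., the same probe on random embeddings) so that decodability of the representation is not confused with the probe itself learning the task.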
An Investigation into the Effects of Pre-training Data Distributions for Pathology Report Classification
Pre-trained transformer models have demonstrated success across many natural
language processing (NLP) tasks. In applying these models to the clinical
domain, a prevailing assumption is that pre-training language models from
scratch on large-scale biomedical data results in substantial improvements. We
test this assumption with 4 pathology classification tasks on a corpus of 2907
prostate cancer pathology reports. We evaluate 5 transformer pre-trained models
that are the same size but differ in pre-training corpora. Specifically, we
analyze 3 categories of models: 1) General-domain: BERT and Turing Natural
Language Representation (TNLR) models, which use general corpora for
pre-training; 2) Mixed-domain: BioBERT, which is obtained from BERT by including
PubMed abstracts in pre-training, and Clinical BioBERT, which additionally
includes MIMIC-III clinical notes; and 3) Domain-specific: PubMedBERT, which is
pre-trained from scratch on PubMed abstracts. We find that the mixed-domain and
domain-specific models exhibit faster feature disambiguation during
fine-tuning. However, the domain-specific model, PubMedBERT, can overfit to
minority classes when presented with class imbalance, a common scenario in
pathology report data. At the same time, the mixed-domain models are more
resistant to overfitting. Our findings indicate that general natural language
and domain-specific corpora used in pre-training serve complementary
purposes for pathology report classification. The first enables resistance to
overfitting when fine-tuning on an imbalanced dataset, while the second allows
for more accurate modelling of the fine-tuning domain. An expert evaluation is
also conducted to reveal common outlier modes of each model. Our results could
inform better fine-tuning practices in the clinical domain, possibly
leveraging the benefits of mixed-domain models for imbalanced downstream
datasets.
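One practical way to act on the overfitting finding is to counteract class imbalance during fine-tuning. The sketch below shows a common mitigation, inverse-frequency class weights in the loss, using Hugging Face transformers and PyTorch; the model name, example reports, and labels are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: fine-tune a pre-trained transformer classifier with
# inverse-frequency class weights so minority classes contribute more to the loss.
import torch
from collections import Counter
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["gleason score 3+4, adenocarcinoma", "benign prostatic tissue"]  # placeholder reports
labels = [1, 0]                                                           # placeholder classes
num_labels = 2

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")            # any pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels)

# Inverse-frequency weights computed from the training label distribution.
counts = Counter(labels)
weights = torch.tensor([len(labels) / counts[c] for c in range(num_labels)],
                       dtype=torch.float)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
targets = torch.tensor(labels)

model.train()
logits = model(**batch).logits
loss = loss_fn(logits, targets)   # weighted loss instead of the model's default
loss.backward()
optimizer.step()
```

In practice this step would run over mini-batches for several epochs; the point of the sketch is only that a class-weighted loss replaces the default unweighted cross-entropy when the downstream dataset is imbalanced.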