Probing Brain Context-Sensitivity with Masked-Attention Generation
Two fundamental questions in neurolinguistics concern the brain regions that
integrate information beyond the lexical level, and the size of their window of
integration. To address these questions, we introduce a new approach named
masked-attention generation. It uses GPT-2 transformers to generate word
embeddings that capture a fixed amount of contextual information. We then test
whether these embeddings predict fMRI brain activity in humans listening to
naturalistic text. The results show that most of the cortex within the language
network is sensitive to contextual information, and that the right hemisphere
is more sensitive to longer contexts than the left. Masked-attention generation
supports previous analyses of context-sensitivity in the brain, and complements
them by quantifying the window size of context integration per voxel. Comment:
2 pages, 2 figures, CCN 202
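The core idea can be sketched with off-the-shelf tools: limit how much preceding text GPT-2 sees when embedding each word, so that every embedding carries a fixed window of context. The snippet below is a minimal illustration of that idea, not the authors' released masked-attention code; the `window` size and the example sentence are made-up placeholders.

```python
# Minimal sketch of fixed-context word embeddings with GPT-2: each token only
# sees the preceding `window` tokens, so its embedding carries a fixed amount
# of contextual information. Illustrative only, not the paper's implementation.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def fixed_context_embeddings(text, window=8):
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    embeddings = []
    with torch.no_grad():
        for i in range(len(ids)):
            ctx = ids[max(0, i - window + 1): i + 1].unsqueeze(0)  # truncated context
            hidden = model(ctx).last_hidden_state                  # (1, ctx_len, 768)
            embeddings.append(hidden[0, -1])                       # current-token embedding
    return torch.stack(embeddings)                                 # (n_tokens, 768)

emb = fixed_context_embeddings("The children who were playing outside came home.")
print(emb.shape)
```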
Language acquisition: do children and language models follow similar learning stages?
During language acquisition, children follow a typical sequence of learning
stages, whereby they first learn to categorize phonemes before they develop
their lexicon and eventually master increasingly complex syntactic structures.
However, the computational principles that lead to this learning trajectory
remain largely unknown. To investigate this, we here compare the learning
trajectories of deep language models to those of children. Specifically, we
test whether, during its training, GPT-2 exhibits stages of language
acquisition comparable to those observed in children aged between 18 months and
6 years. For this, we train 48 GPT-2 models from scratch and evaluate their
syntactic and semantic abilities at each training step, using 96 probes curated
from the BLiMP, Zorro and BIG-Bench benchmarks. We then compare these
evaluations with the behavior of 54 children during language production. Our
analyses reveal three main findings. First, similarly to children, the language
models tend to learn linguistic skills in a systematic order. Second, this
learning scheme is parallel: the language tasks that are learned last improve
from the very first training steps. Third, some - but not all - learning stages
are shared between children and these language models. Overall, these results
shed new light on the principles of language acquisition, and highlight
important divergences in how humans and modern algorithms learn to process
natural language. Comment: Accepted to ACL 2023. *Equal Contribution
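The probe evaluations follow the standard minimal-pair logic of BLiMP and Zorro: a checkpoint is credited with a syntactic skill when it assigns higher probability to the grammatical member of a sentence pair. Below is a hedged sketch of that scoring step; the pretrained "gpt2" checkpoint and the single agreement pair are stand-ins for the 48 trained-from-scratch models and the 96 curated probes.

```python
# Sketch of a BLiMP/Zorro-style probe: the model "passes" an item if the
# grammatical sentence gets a higher log-probability than the ungrammatical one.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out = model(ids, labels=ids)              # out.loss = mean token NLL
    return -out.loss.item() * (ids.shape[1] - 1)  # total log-probability of the sentence

pairs = [("The keys to the cabinet are on the table.",
          "The keys to the cabinet is on the table.")]  # hypothetical probe item
accuracy = sum(sentence_logprob(g) > sentence_logprob(b) for g, b in pairs) / len(pairs)
print(f"probe accuracy: {accuracy:.2f}")
```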
The emergence of number and syntax units in LSTM language models
Recent work has shown that LSTMs trained on a generic language modeling
objective capture syntax-sensitive generalizations such as long-distance number
agreement. We have however no mechanistic understanding of how they accomplish
this remarkable feat. Some have conjectured it depends on heuristics that do
not truly take hierarchical structure into account. We present here a detailed
study of the inner mechanics of number tracking in LSTMs at the single neuron
level. We discover that long-distance number information is largely managed by
two 'number units'. Importantly, the behaviour of these units is partially
controlled by other units independently shown to track syntactic structure. We
conclude that LSTMs are, to some extent, implementing genuinely syntactic
processing mechanisms, paving the way to a more general understanding of
grammatical encoding in LSTMs. Comment: To appear in Proceedings of NAACL, Minneapolis, MN, 201
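Single-neuron analyses of this kind typically record each hidden unit's activation across a long-distance agreement sentence and ablate candidate units to test whether number information is lost. The toy sketch below illustrates only the ablation step; the tiny untrained LSTM, the vocabulary, and "unit 7" are hypothetical placeholders, not the network or units studied in the paper.

```python
# Illustrative ablation of a single candidate "number unit" in an LSTM cell state,
# then a check of the verb-number preference at the end of the sentence.
import torch
import torch.nn as nn

vocab = {"the": 0, "boy": 1, "boys": 2, "near": 3, "car": 4, "is": 5, "are": 6}
emb = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(16, 32, batch_first=True)   # toy, untrained network
readout = nn.Linear(32, len(vocab))

sentence = ["the", "boys", "near", "the", "car"]
ids = torch.tensor([[vocab[w] for w in sentence]])

with torch.no_grad():
    h = c = torch.zeros(1, 1, 32)
    for t in range(ids.shape[1]):
        out, (h, c) = lstm(emb(ids[:, t:t+1]), (h, c))
        c[..., 7] = 0.0                    # ablate hypothetical unit 7 at every step
    logits = readout(out[:, -1])

print("logit(are) - logit(is):",
      (logits[0, vocab["are"]] - logits[0, vocab["is"]]).item())
```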
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps
Neural Language Models (NLMs) have made tremendous advances during the last years, achieving impressive performance on various linguistic tasks. Capitalizing on this, studies in neuroscience have started to use NLMs to study neural activity in the human brain during language processing. However, many questions remain unanswered regarding which factors determine the ability of a neural language model to capture brain activity (aka its 'brain score'). Here, we take first steps in this direction and examine the impact of test loss, training corpus and model architecture (comparing GloVe, LSTM, GPT-2 and BERT) on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find that (1) untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words, with the untrained LSTM outperforming the transformer-based models, being less impacted by the effect of context; (2) training NLP models improves brain scores in the same brain regions irrespective of the model's architecture; (3) perplexity (test loss) is not a good predictor of brain score; and (4) training data have a strong influence on the outcome and, notably, off-the-shelf models may lack statistical power to detect brain activations. Overall, we outline the impact of model-training choices, and suggest good practices for future studies aiming at explaining the human language system using neural language models.
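A 'brain score' in such studies is usually computed with an encoding model: a ridge regression maps model activations to each voxel's fMRI timecourse, and the score is the correlation between predicted and observed signal on held-out data. The sketch below shows that generic recipe on random stand-in arrays; the dimensions, regularization strength and fold count are illustrative assumptions, not the study's settings.

```python
# Schematic brain-score computation: cross-validated ridge regression from
# model activations to voxel timecourses, scored by per-voxel Pearson correlation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 768))   # model activations, one row per fMRI volume
Y = rng.standard_normal((500, 1000))  # BOLD signal, one column per voxel

scores = np.zeros(Y.shape[1])
for train, test in KFold(n_splits=5).split(X):
    reg = Ridge(alpha=1.0).fit(X[train], Y[train])
    pred = reg.predict(X[test])
    # Pearson correlation per voxel between predicted and observed timecourses
    pred_z = (pred - pred.mean(0)) / pred.std(0)
    true_z = (Y[test] - Y[test].mean(0)) / Y[test].std(0)
    scores += (pred_z * true_z).mean(0) / 5   # average over the 5 folds

print("mean brain score across voxels:", scores.mean())
```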
Limitations of conventional drinking water technologies in pollutant removal
This chapter gives an overview of the more traditional treatment of drinking water from ground and surface waters. Water is treated to meet the objectives of drinking water quality and standards; water treatment and water quality are therefore closely connected. The objectives of water treatment are to prevent acute diseases caused by exposure to pathogens, to prevent long-term adverse health effects caused by exposure to chemicals and micropollutants, and finally to produce a drinking water that is palatable and conditioned in such a way that transport from the treatment works to the customer will not lead to quality deterioration. The traditional treatment technologies described in this chapter are mainly designed to remove macro parameters such as suspended solids, natural organic matter, and dissolved iron and manganese. However, these technologies have only limited performance in removing micropollutants. Advancing analytical technologies and the increased and changing use of compounds show strong evidence of new and emerging threats to drinking water quality. Therefore, more advanced treatment technologies are required.
Modelling human choices: MADeM and decision‑making
Research supported by FAPESP 2015/50122-0 and DFG-GRTK 1740/2. RP and AR are also part of the Research, Innovation and Dissemination Center for Neuromathematics FAPESP grant (2013/07699-0). RP is supported by a FAPESP scholarship (2013/25667-8). ACR is partially supported by a CNPq fellowship (grant 306251/2014-0).