Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models
Lexical ambiguity presents a profound and enduring challenge to the language
sciences. For decades, researchers have grappled with the problem of how
language users learn, represent, and process words with more than one meaning.
Our work offers new insight into psychological understanding of lexical
ambiguity through a series of simulations that capitalise on recent advances in
contextual language models. These models have no grounded understanding of the
meanings of words at all; they simply learn to predict words based on the
surrounding context provided by other words. Yet, our analyses show that their
representations capture fine-grained meaningful distinctions between
unambiguous, homonymous, and polysemous words that align with lexicographic
classifications and psychological theorising. These findings provide
quantitative support for modern psychological conceptualisations of lexical
ambiguity and raise new challenges for understanding the way that contextual
information shapes the meanings of words across different timescales.
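The core analysis described above can be sketched numerically. A minimal illustration, using synthetic vectors as stand-ins for the token embeddings a contextualised model would produce for an ambiguous word such as "bank": if the model separates senses, similarity among tokens used in the same sense should exceed similarity across senses. The sense directions, noise scale, and `cosine` helper are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
dim = 16

# Two synthetic "sense" directions for a homonym like "bank"
# (river bank vs. financial bank), plus small contextual noise
# standing in for variation across usage contexts.
sense_river = rng.normal(size=dim)
sense_money = rng.normal(size=dim)

river_tokens = [sense_river + 0.1 * rng.normal(size=dim) for _ in range(5)]
money_tokens = [sense_money + 0.1 * rng.normal(size=dim) for _ in range(5)]

# Average pairwise similarity within one sense vs. across senses.
within = np.mean([cosine(a, b) for i, a in enumerate(river_tokens)
                  for b in river_tokens[i + 1:]])
across = np.mean([cosine(a, b) for a in river_tokens for b in money_tokens])

print(f"within-sense similarity: {within:.2f}")
print(f"across-sense similarity: {across:.2f}")
```

A real study of this kind would replace the synthetic vectors with hidden states extracted from a contextual language model, but the within/across comparison is the same.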
How sketches work: a cognitive theory for improved system design
Evidence is presented that in the early stages of design or composition the
mental processes used by artists for visual invention require a different type of
support from those used for visualising a nearly complete object. Most research
into machine visualisation has as its goal the production of realistic images which
simulate the light pattern presented to the retina by real objects. In contrast, sketch
attributes preserve the results of cognitive processing which can be used
interactively to amplify visual thought. The traditional attributes of sketches
include many types of indeterminacy which may reflect the artist's need to be
"vague".
Drawing on contemporary theories of visual cognition and neuroscience, this
study discusses in detail the evidence for the following functions which are better
served by rough sketches than by the very realistic imagery favoured in machine
visualising systems.
1. Sketches are intermediate representational types which facilitate the
mental translation between descriptive and depictive modes of representing visual
thought.
2. Sketch attributes exploit automatic processes of perceptual retrieval and
object recognition to improve the availability of tacit knowledge for visual
invention.
3. Sketches are percept-image hybrids. The incomplete physical attributes
of sketches elicit and stabilise a stream of super-imposed mental images which
amplify inventive thought.
4. By segregating and isolating meaningful components of visual
experience, sketches may assist the user to attend selectively to a limited part of a
visual task, freeing otherwise over-loaded cognitive resources for visual thought.
5. Sequences of sketches and sketching acts support the short term episodic
memory for cognitive actions. This assists creativity, providing voluntary control
over highly practised mental processes which can otherwise become stereotyped.
An attempt is made to unite the five hypothetical functions. Drawing on the
Baddeley and Hitch model of working memory, it is speculated that the five
functions may be related to a limited capacity monitoring mechanism which makes
tacit visual knowledge explicitly available for conscious control and manipulation.
It is suggested that the resources available to the human brain for imagining nonexistent
objects are a cultural adaptation of visual mechanisms which evolved in
early hominids for responding to confusing or incomplete stimuli from immediately
present objects and events. Sketches are cultural inventions which artificially
mimic aspects of such stimuli in order to capture these shared resources for the
different purpose of imagining objects which do not yet exist.
Finally, the implications of the theory for the design of improved machine systems
are discussed. The untidy attributes of traditional sketches are revealed to include
cultural inventions which serve subtle cognitive functions. However, traditional
media have many shortcomings which it should be possible to correct with new
technology. Existing machine systems for sketching tend to imitate, non-selectively,
the media-bound properties of sketches without regard to the functions they serve.
This may prove to be a mistake. It is concluded that new system designs are needed
in which meaningfully structured data and specialised imagery amplify, without
interference or replacement, the impressive but limited creative resources of the
visual brain.
Training dynamics of neural language models
Why do artificial neural networks model language so well? We claim that in order to answer this question and understand the biases that lead to such high-performing language models---and all models that handle language---we must analyze the training process. For decades, linguists have used the tools of developmental linguistics to study human bias towards linguistic structure. Similarly, we wish to consider a neural network's training dynamics, i.e., the analysis of training in practice and the study of why our optimization methods work when applied. This framing shows us how structural patterns and linguistic properties are gradually built up, revealing more about why LSTM models learn so effectively on language data.
To explore these questions, we might be tempted to appropriate methods from developmental linguistics, but we do not wish to make cognitive claims, so we avoid analogizing between human and artificial language learners. We instead use mathematical tools designed for investigating language model training dynamics. These tools can take advantage of crucial differences between child development and model training: we have access to activations, weights, and gradients in a learning model, and can manipulate learning behavior directly or by perturbing inputs. While most research in training dynamics has focused on vision tasks, language offers direct annotation of its well-documented and intuitive latent hierarchical structures (e.g., syntax and semantics) and is therefore an ideal domain for exploring the effect of training dynamics on the representation of such structure.
Focusing on LSTM models, we investigate the natural sparsity of gradients and activations, finding that word representations are focused on just a few neurons late in training. Similarity analysis reveals how word embeddings learned for different tasks are highly similar at the beginning of training, but gradually become task-specific. Using synthetic data and measuring feature interactions, we also discover that hierarchical representations in LSTMs may be a result of their learning strategy: they tend to build new trees out of familiar phrases, mingling the meanings of constituents so that they depend on each other. These discoveries constitute just a few possible explanations for how LSTMs learn generalized language representations, with further theories on more architectures to be uncovered by the growing field of NLP training dynamics.
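The sparsity finding above can be made concrete with a simple concentration metric. A hedged sketch, using synthetic vectors rather than the thesis's actual measurements: `topk_mass` (a hypothetical helper) reports what fraction of a word representation's total magnitude sits in its k largest-magnitude neurons, so a diffuse early-training vector scores low and a late-training vector dominated by a few neurons scores high.

```python
import numpy as np

def topk_mass(vec, k):
    # Fraction of total absolute magnitude carried by the k
    # largest-magnitude neurons of a word representation.
    mags = np.abs(vec)
    top = np.sort(mags)[-k:]
    return float(top.sum() / mags.sum())

rng = np.random.default_rng(1)
dim, k = 512, 8

# Hypothetical snapshots of one word's representation:
# early in training (diffuse Gaussian noise) vs. late in
# training (a few dominant neurons plus small background noise).
early = rng.normal(size=dim)
late = np.zeros(dim)
late[rng.choice(dim, size=k, replace=False)] = rng.normal(loc=5.0, size=k)
late += 0.05 * rng.normal(size=dim)

print(f"early top-{k} mass: {topk_mass(early, k):.2f}")
print(f"late  top-{k} mass: {topk_mass(late, k):.2f}")
```

Applied to real checkpoints, the same metric tracked over training steps would show when representations begin concentrating on a few neurons.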
Sociolinguistically Driven Approaches for Just Natural Language Processing
Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed they can be harmful; NLP systems reproduce stereotypes, prevent speakers of non-standard language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, fairness and justice in machine learning, and the relationships between language and social justice. In this thesis, we propose to address two questions at this intersection: i) How can we conceptualize harms arising from NLP systems?, and ii) How can we quantify such harms?
We propose the following contributions. First, we contribute a model to collect the first large dataset of African American Language (AAL)-like social media text. We use the dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)- and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and propose a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to effectively evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work.
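The disparity quantification described above reduces, in its simplest form, to comparing a model's performance per language variety. A minimal sketch with toy data (the labels and predictions are illustrative inventions, not the thesis's dataset, and the real evaluation uses a richer measurement-modeling framework):

```python
from collections import defaultdict

def per_group_accuracy(examples):
    # examples: (group, gold_label, predicted_label) triples.
    # Returns accuracy per group so disparities can be compared.
    correct, total = defaultdict(int), defaultdict(int)
    for group, gold, pred in examples:
        total[group] += 1
        correct[group] += int(gold == pred)
    return {g: correct[g] / total[g] for g in total}

# Toy predictions from a hypothetical classifier on MUSE- and
# AAL-like text (illustrative only, not real data).
examples = [
    ("MUSE", "pos", "pos"), ("MUSE", "neg", "neg"),
    ("MUSE", "pos", "pos"), ("MUSE", "neg", "pos"),
    ("AAL", "pos", "neg"), ("AAL", "neg", "neg"),
    ("AAL", "pos", "pos"), ("AAL", "neg", "pos"),
]

acc = per_group_accuracy(examples)
gap = acc["MUSE"] - acc["AAL"]
print(acc, f"gap: {gap:.2f}")
```

A nonzero gap on its own does not establish harm; the thesis's point is that interpreting such measurements requires grounding in how language varieties are socially situated.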
Material Symbols
What is the relation between the material, conventional symbol structures that we encounter in the spoken and written word, and human thought? A common assumption, that structures a wide variety of otherwise competing views, is that the way in which these material, conventional symbol-structures do their work is by being translated into some kind of content-matching inner code. One alternative to this view is the tempting but thoroughly elusive idea that we somehow think in some natural language (such as English). In the present treatment I explore a third option, which I shall call the “complementarity” view of language. According to this third view the actual symbol structures of a given language add cognitive value by complementing (without being replicated by) the more basic modes of operation and representation endemic to the biological brain. The “cognitive bonus” that language brings is, on this model, not to be cashed out either via the ultimately mysterious notion of “thinking in a given natural language” or via some process of exhaustive translation into another inner code. Instead, we should try to think in terms of a kind of coordination dynamics in which the forms and structures of a language qua material symbol system play a key and irreducible role. Understanding language as a complementary cognitive resource is, I argue, an important part of the much larger project (sometimes glossed in terms of the “extended mind”) of understanding human cognition as essentially and multiply hybrid: as involving a complex interplay between internal biological resources and external non-biological resources.