Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings
Advances in object recognition flourished in part because of the availability
of high-quality datasets and associated benchmarks. However, these
benchmarks---such as ILSVRC---are relatively task-specific, focusing
predominantly on predicting class labels. We introduce a publicly available
dataset that embodies the task-general capabilities of human perception and
reasoning. The Human Similarity Judgments extension to ImageNet (ImageNet-HSJ)
is composed of human similarity judgments that supplement the ILSVRC validation
set. The new dataset supports a range of task and performance metrics,
including the evaluation of unsupervised learning algorithms. We demonstrate
two methods of assessment: using the similarity judgments directly and using a
psychological embedding trained on the similarity judgments. This embedding
space contains an order of magnitude more points (i.e., images) than previous
efforts based on human judgments. Scaling to the full 50,000 image set was made
possible through a selective sampling process that used variational Bayesian
inference and model ensembles to sample aspects of the embedding space that
were most uncertain. This methodological innovation not only enables scaling,
but should also improve the quality of solutions by focusing sampling where it
is needed. To demonstrate the utility of ImageNet-HSJ, we used the similarity
ratings and the embedding space to evaluate how well several popular models
conform to human similarity judgments. One finding is that more complex models
that perform better on task-specific benchmarks do not better conform to human
semantic judgments. In addition to the human similarity judgments, pre-trained
psychological embeddings and code for inferring variational embeddings are made
publicly available. Collectively, ImageNet-HSJ assets support the appraisal of
internal representations and the development of more human-like models.
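As a rough illustration of the kind of evaluation described above (not the released ImageNet-HSJ tooling), the sketch below rank-correlates a model's pairwise image similarities with human similarity judgments. The array names, data shapes, and the cosine-similarity choice are assumptions made for the example.

```python
# Sketch: rank-correlate a model's pairwise similarities with human similarity
# judgments, assuming one human similarity value per image pair. Names, shapes,
# and the cosine-similarity choice are illustrative assumptions, not the
# released ImageNet-HSJ pipeline.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_features = 50, 128

# Stand-ins for real data: model activations for each image and
# human-derived similarity scores for every image pair.
model_features = rng.normal(size=(n_images, n_features))
human_similarities = rng.uniform(size=n_images * (n_images - 1) // 2)

# Model similarity for each pair: 1 - cosine distance, in pdist's pair order.
model_similarities = 1.0 - pdist(model_features, metric="cosine")

rho, _ = spearmanr(model_similarities, human_similarities)
print(f"Spearman correlation with human judgments: {rho:.3f}")
```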
Learning as the Unsupervised Alignment of Conceptual Systems
Concept induction requires the extraction and naming of concepts from noisy
perceptual experience. For supervised approaches, as the number of concepts
grows, so does the number of required training examples. Philosophers,
psychologists, and computer scientists have long recognized that children can
learn to label objects without being explicitly taught. In a series of
computational experiments, we highlight how information in the environment can
be used to build and align conceptual systems. Unlike supervised learning, the
learning problem becomes easier the more concepts and systems there are to
master. The key insight is that each concept has a unique signature within one
conceptual system (e.g., images) that is recapitulated in other systems (e.g.,
text or audio). As predicted, children's early concepts form readily aligned
systems.
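To make the "unique signature" idea concrete, here is a toy sketch, not the paper's algorithm: each concept's signature is its sorted profile of within-system similarities, and concepts are matched across two systems by solving an assignment problem over signature distances. The synthetic data, the sorted-profile signature, and the Euclidean cost are all assumptions.

```python
# Sketch: toy unsupervised alignment of two conceptual systems. Each concept's
# "signature" is its sorted profile of similarities to the other concepts in
# its own system; mirrored similarity structure lets us match concepts across
# systems without any labelled pairs. A simplified heuristic for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_concepts, dim = 40, 16

# System A (e.g., images) and system B (e.g., text): same latent structure,
# different coordinate frame, with B's rows presented in shuffled order.
latent = rng.normal(size=(n_concepts, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
system_a = latent + 0.01 * rng.normal(size=latent.shape)
permutation = rng.permutation(n_concepts)
system_b = (latent @ rotation)[permutation] + 0.01 * rng.normal(size=latent.shape)

def signatures(x):
    """Sorted within-system similarity profile for each concept."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, 0.0)
    return np.sort(sim, axis=1)

cost = np.linalg.norm(signatures(system_a)[:, None, :] -
                      signatures(system_b)[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)

# cols[i] is the row of B matched to concept i of A; check against the shuffle.
accuracy = np.mean(permutation[cols] == rows)
print(f"Fraction of concepts correctly aligned: {accuracy:.2f}")
```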
System alignment supports cross-domain learning and zero-shot generalisation
Recent findings suggest that conceptual relationships hold across modalities. For instance, if two concepts occur in similar linguistic contexts, they also likely occur in similar visual contexts. These similarity structures may provide a valuable signal for alignment when learning to map between domains, such as when learning the names of objects. To assess this possibility, we conducted a paired-associate learning experiment in which participants mapped objects that varied on two visual features to locations that varied along two spatial dimensions. We manipulated whether the featural and spatial systems were aligned or misaligned. Although system alignment was not required to complete this supervised learning task, we found that participants learned more efficiently when the systems were aligned and that aligned systems facilitated zero-shot generalisation. We fit a variety of models to individuals' responses and found that models that included an offline unsupervised alignment mechanism best accounted for human performance. Our results provide empirical evidence that people align entire representation systems to accelerate learning, even when learning seemingly arbitrary associations between two domains.
Signatures of cross-modal alignment in children's early concepts
Whether supervised or unsupervised, human and machine learning is usually characterized as event-based. However, learning may also proceed by systems alignment, in which mappings are inferred between entire systems, such as visual and linguistic systems. Systems alignment is possible because items that share similar visual contexts, such as a car and a truck, will also tend to share similar linguistic contexts. Because of the mirrored similarity relationships across systems, the visual and linguistic systems can be aligned at some later time, absent either input. In a series of simulation studies, we considered whether children's early concepts support systems alignment. We found that children's early concepts are close to optimal for inferring novel concepts through systems alignment, enabling agents to correctly infer more than 85% of visual-word mappings absent supervision. One possible explanation for why children's early concepts support systems alignment is that they are distinguished structurally by their dense semantic neighborhoods. Artificial agents using these structural features to select concepts proved highly effective, both in environments mirroring children's conceptual world and in those that exclude the concepts that children commonly acquire. For children, systems alignment and event-based learning likely complement one another. Likewise, artificial systems can benefit from incorporating these developmental principles.
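A minimal sketch of the neighborhood-density idea mentioned above, under assumed inputs (placeholder concepts and embeddings, an arbitrary k): score each concept by its mean similarity to its k nearest neighbors and select the densest concepts as candidates for alignment.

```python
# Sketch: score concepts by semantic neighborhood density (mean similarity to
# the k nearest neighbors in some embedding space) and pick the densest ones.
# The embeddings, k, and the density definition are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
concepts = [f"concept_{i}" for i in range(30)]      # placeholder vocabulary
embeddings = rng.normal(size=(len(concepts), 50))   # placeholder word vectors
k = 5

unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T
np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity

# Density = average similarity to the k most similar other concepts.
density = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
densest = [concepts[i] for i in np.argsort(density)[::-1][:10]]
print("Densest-neighborhood concepts:", densest)
```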
The Costs and Benefits of Goal-Directed Attention in Deep Convolutional Neural Networks
People deploy top-down, goal-directed attention to accomplish tasks, such as
finding lost keys. By tuning the visual system to relevant information sources,
object recognition can become more efficient (a benefit) and more biased toward
the target (a potential cost). Motivated by selective attention in
categorisation models, we developed a goal-directed attention mechanism that
can process naturalistic (photographic) stimuli. Our attention mechanism can be
incorporated into any existing deep convolutional neural network (DCNN). The
processing stages in DCNNs have been related to the ventral visual stream. In that
light, our attentional mechanism incorporates top-down influences from
prefrontal cortex (PFC) to support goal-directed behaviour. Akin to how
attention weights in categorisation models warp representational spaces, we
introduce a layer of attention weights to the mid-level of a DCNN that amplify
or attenuate activity to further a goal. We evaluated the attentional mechanism
using photographic stimuli, varying the attentional target. We found that
increasing goal-directed attention has benefits (increasing hit rates) and
costs (increasing false alarm rates). At a moderate level, attention improves
sensitivity (i.e., increases d′) at only a moderate increase in bias
for tasks involving standard images, blended images, and natural adversarial
images chosen to fool DCNNs. These results suggest that goal-directed attention
can reconfigure general-purpose DCNNs to better suit the current task goal,
much like PFC modulates activity along the ventral stream. In addition to being
more parsimonious and brain-consistent, the mid-level attention approach
performed better than a standard machine learning approach for transfer
learning, namely retraining the final network layer to accommodate the new
task.
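The general mechanism, a learnable layer of per-channel gains inserted mid-network and tuned for the current goal while the backbone stays fixed, can be sketched in PyTorch as follows. The toy architecture, layer placement, and training details are assumptions for illustration, not the paper's implementation.

```python
# Sketch: goal-directed attention as learnable per-channel gains at a
# mid-level layer of a small CNN. Only the gains are trained for the current
# goal; the backbone stays frozen. Architecture and placement are assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Per-channel multiplicative gains, initialised to 1 (no attention)."""
    def __init__(self, n_channels):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n_channels))

    def forward(self, x):                       # x: (batch, channels, H, W)
        return x * self.gain.view(1, -1, 1, 1)

attention = ChannelAttention(32)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    attention,                                  # mid-level attention layer
    nn.MaxPool2d(2), nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                  # assumes 32x32 inputs, 10 classes
)

# Freeze the backbone; train only the attention gains toward the goal class.
for p in model.parameters():
    p.requires_grad = False
attention.gain.requires_grad = True

optimizer = torch.optim.SGD([attention.gain], lr=0.1)
images = torch.randn(8, 3, 32, 32)                    # placeholder batch
goal = torch.full((8,), 3, dtype=torch.long)          # pretend the goal is class 3
loss = nn.functional.cross_entropy(model(images), goal)
loss.backward()
optimizer.step()
```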
Demystifying unsupervised learning: how it helps and hurts
Humans and machines rarely have access to explicit external feedback or supervision, yet manage to learn. Most modern machine learning systems succeed because they benefit from unsupervised data. Humans are also expected to benefit, and yet, mysteriously, empirical results are mixed. Does unsupervised learning help humans or not? Here, we argue that the mixed results are not conflicting answers to this question, but reflect that humans self-reinforce their predictions in the absence of supervision, which can help or hurt depending on whether predictions and task align. We use this framework to synthesize empirical results across various domains to clarify when unsupervised learning will help or hurt. This provides new insights into the fundamentals of learning with implications for instruction and lifelong learning.
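As a toy illustration of the self-reinforcement account (not a model from the paper), the sketch below has a learner nudge the prototype behind its own prediction after each unlabelled item; unsupervised exposure then preserves good performance when the initial predictions align with the true task and entrenches errors when they do not. The prototype model and learning rate are assumptions.

```python
# Sketch: self-reinforced learning without supervision. A learner classifies
# items by the nearest of two category prototypes and, lacking feedback,
# nudges the winning prototype toward each item (reinforcing its own
# prediction). This can sharpen correct predictions or entrench incorrect
# ones, depending on how the initial predictions align with the true task.
import numpy as np

rng = np.random.default_rng(0)
items = np.concatenate([rng.normal(-1, 0.5, 200), rng.normal(1, 0.5, 200)])
true_labels = np.concatenate([np.zeros(200, int), np.ones(200, int)])

def accuracy(prototypes):
    predicted = np.argmin(np.abs(items[:, None] - prototypes[None, :]), axis=1)
    return np.mean(predicted == true_labels)

def self_reinforce(prototypes, lr=0.05, epochs=5):
    prototypes = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x in rng.permutation(items):
            winner = np.argmin(np.abs(x - prototypes))
            prototypes[winner] += lr * (x - prototypes[winner])  # reinforce own prediction
    return prototypes

for name, start in [("aligned", np.array([-0.5, 0.5])),
                    ("misaligned", np.array([0.5, -0.5]))]:
    before, after = accuracy(start), accuracy(self_reinforce(start))
    print(f"{name}: accuracy {before:.2f} -> {after:.2f}")
```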