
    Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings

    Advances in object recognition flourished in part because of the availability of high-quality datasets and associated benchmarks. However, these benchmarks, such as ILSVRC, are relatively task-specific, focusing predominantly on predicting class labels. We introduce a publicly available dataset that embodies the task-general capabilities of human perception and reasoning. The Human Similarity Judgments extension to ImageNet (ImageNet-HSJ) is composed of human similarity judgments that supplement the ILSVRC validation set. The new dataset supports a range of task and performance metrics, including the evaluation of unsupervised learning algorithms. We demonstrate two methods of assessment: using the similarity judgments directly and using a psychological embedding trained on the similarity judgments. This embedding space contains an order of magnitude more points (i.e., images) than previous efforts based on human judgments. Scaling to the full 50,000-image set was made possible through a selective sampling process that used variational Bayesian inference and model ensembles to sample the aspects of the embedding space that were most uncertain. This methodological innovation not only enables scaling but should also improve the quality of solutions by focusing sampling where it is needed. To demonstrate the utility of ImageNet-HSJ, we used the similarity ratings and the embedding space to evaluate how well several popular models conform to human similarity judgments. One finding is that more complex models that perform better on task-specific benchmarks do not conform better to human semantic judgments. In addition to the human similarity judgments, pre-trained psychological embeddings and code for inferring variational embeddings are made publicly available. Collectively, the ImageNet-HSJ assets support the appraisal of internal representations and the development of more human-like models.
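    As a minimal sketch of the kind of evaluation the abstract describes (not the released ImageNet-HSJ tooling), the snippet below scores a model's representation against a human similarity matrix by rank-correlating the two sets of pairwise similarities; the array names and the use of cosine similarity are illustrative assumptions.

```python
# Illustrative sketch (not the released ImageNet-HSJ tooling): score how well a
# model's representation matches human similarity judgments by rank-correlating
# the two sets of pairwise similarities. Array names and shapes are assumptions.
import numpy as np
from scipy.stats import spearmanr

def pairwise_cosine(features: np.ndarray) -> np.ndarray:
    """Cosine similarity between all rows of an (n_items, n_dims) matrix."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

def human_agreement(model_features: np.ndarray, human_sim: np.ndarray) -> float:
    """Spearman correlation between model and human similarity matrices,
    computed over the upper triangle (self-similarities excluded)."""
    model_sim = pairwise_cosine(model_features)
    iu = np.triu_indices_from(human_sim, k=1)
    rho, _ = spearmanr(model_sim[iu], human_sim[iu])
    return rho

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(50, 128))                  # stand-in model embeddings
    human = pairwise_cosine(rng.normal(size=(50, 16)))  # stand-in human similarity matrix
    print(f"agreement (Spearman rho): {human_agreement(feats, human):.3f}")
```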

    Learning as the Unsupervised Alignment of Conceptual Systems

    Concept induction requires the extraction and naming of concepts from noisy perceptual experience. For supervised approaches, as the number of concepts grows, so does the number of required training examples. Philosophers, psychologists, and computer scientists have long recognized that children can learn to label objects without being explicitly taught. In a series of computational experiments, we highlight how information in the environment can be used to build and align conceptual systems. Unlike supervised learning, the learning problem becomes easier the more concepts and systems there are to master. The key insight is that each concept has a unique signature within one conceptual system (e.g., images) that is recapitulated in other systems (e.g., text or audio). As predicted, children's early concepts form readily aligned systems. Comment: This is a post-peer-review, pre-copyedit version of an article published in Nature Machine Intelligence. The final authenticated version is available online at: https://doi.org/10.1038/s42256-019-0132-
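    The alignment idea can be illustrated with a much-simplified sketch (not the authors' algorithm): a handful of anchor concepts with known cross-system correspondence seed the mapping, and the remaining concepts are matched by comparing their similarity signatures to those anchors across systems. All names and the anchor-based shortcut below are assumptions for illustration.

```python
# Much-simplified sketch of similarity-based alignment between two conceptual
# systems (e.g., image and text embeddings of the same concepts). Not the
# authors' algorithm: a few anchor concepts with known correspondence seed the
# mapping, and the remaining concepts are matched by comparing their
# similarity-to-anchor signatures across systems.
import numpy as np

def similarity_to_anchors(embeddings, anchor_idx):
    """Each row: cosine similarity of one concept to every anchor concept."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed[anchor_idx].T

def align(system_a, system_b, anchor_idx):
    """Map each concept in system A to the concept in system B whose
    anchor-similarity signature correlates with it most strongly."""
    sig_a = similarity_to_anchors(system_a, anchor_idx)
    sig_b = similarity_to_anchors(system_b, anchor_idx)
    n = len(sig_a)
    corr = np.corrcoef(sig_a, sig_b)[:n, n:]
    return corr.argmax(axis=1)  # index into system B for each concept in A

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    concepts = rng.normal(size=(30, 64))                         # shared concept structure
    system_a = concepts + 0.1 * rng.normal(size=concepts.shape)  # e.g., image system
    system_b = concepts + 0.1 * rng.normal(size=concepts.shape)  # e.g., text system
    mapping = align(system_a, system_b, anchor_idx=[0, 1, 2, 3, 4])
    print("fraction correctly aligned:", np.mean(mapping == np.arange(30)))
```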

    System alignment supports cross-domain learning and zero-shot generalisation

    Recent findings suggest that conceptual relationships hold across modalities. For instance, if two concepts occur in similar linguistic contexts, they also likely occur in similar visual contexts. These similarity structures may provide a valuable signal for alignment when learning to map between domains, such as when learning the names of objects. To assess this possibility, we conducted a paired-associate learning experiment in which participants mapped objects that varied on two visual features to locations that varied along two spatial dimensions. We manipulated whether the featural and spatial systems were aligned or misaligned. Although system alignment was not required to complete this supervised learning task, we found that participants learned more efficiently when the systems aligned and that aligned systems facilitated zero-shot generalisation. We fit a variety of models to individuals' responses and found that models which included an offline unsupervised alignment mechanism best accounted for human performance. Our results provide empirical evidence that people align entire representation systems to accelerate learning, even when learning seemingly arbitrary associations between two domains.
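    One way to make the alignment manipulation concrete (an illustrative sketch, not the paper's analysis code) is to correlate the pairwise distance structure of the featural system with that of the spatial system: aligned assignments give a high second-order correlation, shuffled assignments do not.

```python
# Minimal sketch (not the paper's analysis code) of quantifying alignment
# between a featural system and a spatial layout: correlate the two systems'
# pairwise distance structures (second-order similarity).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def alignment(features: np.ndarray, locations: np.ndarray) -> float:
    """Spearman correlation between pairwise distances in the two systems."""
    rho, _ = spearmanr(pdist(features), pdist(locations))
    return rho

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    feats = rng.uniform(size=(9, 2))       # objects varying on two visual features
    aligned = feats * 3.0                  # spatial layout mirrors the feature space
    misaligned = rng.permutation(aligned)  # same locations, shuffled assignment
    print("aligned:   ", alignment(feats, aligned))
    print("misaligned:", alignment(feats, misaligned))
```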

    Signatures of cross-modal alignment in children's early concepts

    Whether supervised or unsupervised, human and machine learning is usually characterized as event-based. However, learning may also proceed by systems alignment, in which mappings are inferred between entire systems, such as visual and linguistic systems. Systems alignment is possible because items that share similar visual contexts, such as a car and a truck, will also tend to share similar linguistic contexts. Because of the mirrored similarity relationships across systems, the visual and linguistic systems can be aligned at some later time absent either input. In a series of simulation studies, we considered whether children's early concepts support systems alignment. We found that children's early concepts are close to optimal for inferring novel concepts through systems alignment, enabling agents to correctly infer more than 85% of visual-word mappings absent supervision. One possible explanation for why children's early concepts support systems alignment is that they are distinguished structurally by their dense semantic neighborhoods. Artificial agents using these structural features to select concepts proved highly effective, both in environments mirroring children's conceptual world and in those that exclude the concepts that children commonly acquire. For children, systems alignment and event-based learning likely complement one another. Likewise, artificial systems can benefit from incorporating these developmental principles.
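    A hedged sketch of the dense-semantic-neighborhood idea follows: score each concept by its mean similarity to its k nearest semantic neighbours and treat the densest concepts as alignment candidates. The embedding source and the choice of k are assumptions, not the paper's pipeline.

```python
# Illustrative sketch of the dense-semantic-neighborhood idea: score each
# concept by its mean similarity to its k nearest neighbours in a semantic
# embedding and treat the densest concepts as alignment candidates. The
# embedding source and the choice of k are assumptions, not the paper's pipeline.
import numpy as np

def neighborhood_density(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean cosine similarity of each concept to its k nearest neighbours."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)        # exclude self-similarity
    top_k = np.sort(sim, axis=1)[:, -k:]  # k most similar neighbours
    return top_k.mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    vocab = [f"concept_{i}" for i in range(100)]  # stand-in early vocabulary
    emb = rng.normal(size=(100, 50))              # stand-in semantic embedding
    density = neighborhood_density(emb, k=5)
    densest = [vocab[i] for i in np.argsort(density)[::-1][:10]]
    print("densest neighbourhoods:", densest)
```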

    The Costs and Benefits of Goal-Directed Attention in Deep Convolutional Neural Networks

    People deploy top-down, goal-directed attention to accomplish tasks, such as finding lost keys. By tuning the visual system to relevant information sources, object recognition can become more efficient (a benefit) and more biased toward the target (a potential cost). Motivated by selective attention in categorisation models, we developed a goal-directed attention mechanism that can process naturalistic (photographic) stimuli. Our attention mechanism can be incorporated into any existing deep convolutional neural network (DCNN). The processing stages in DCNNs have been related to the ventral visual stream. In that light, our attentional mechanism incorporates top-down influences from prefrontal cortex (PFC) to support goal-directed behaviour. Akin to how attention weights in categorisation models warp representational spaces, we introduce a layer of attention weights to the mid-level of a DCNN that amplify or attenuate activity to further a goal. We evaluated the attentional mechanism using photographic stimuli, varying the attentional target. We found that increasing goal-directed attention has benefits (increasing hit rates) and costs (increasing false alarm rates). At a moderate level, attention improves sensitivity (i.e., increases d′) at only a moderate increase in bias for tasks involving standard images, blended images, and natural adversarial images chosen to fool DCNNs. These results suggest that goal-directed attention can reconfigure general-purpose DCNNs to better suit the current task goal, much like PFC modulates activity along the ventral stream. In addition to being more parsimonious and brain-consistent, the mid-level attention approach performed better than a standard machine learning approach for transfer learning, namely retraining the final network layer to accommodate the new task.
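    The attention layer can be sketched in a few lines of PyTorch (an illustration, not the authors' implementation): learnable per-channel gains are spliced into the middle of a convolutional network and tuned while the rest of the network stays frozen. The choice of resnet18 and of layer2 as the insertion point are assumptions.

```python
# Minimal PyTorch sketch (an illustration, not the authors' implementation):
# learnable per-channel gains spliced into the middle of a convolutional network.
# The choice of resnet18 and of layer2 as the insertion point are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ChannelAttention(nn.Module):
    """Per-channel attention weights that amplify or attenuate feature maps."""
    def __init__(self, num_channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_channels))  # start neutral

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width); broadcast the gains over space
        return x * self.weights.view(1, -1, 1, 1)

model = resnet18(weights=None)                  # pretrained weights omitted to keep the sketch self-contained
attention = ChannelAttention(num_channels=128)  # layer2 of resnet18 outputs 128 channels
model.layer2 = nn.Sequential(model.layer2, attention)

# Freeze the network and tune only the attention gains toward the current goal
# (e.g., raising the hit rate on a target category) with an ordinary optimiser.
for p in model.parameters():
    p.requires_grad = False
attention.weights.requires_grad = True

out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 1000])
```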

    Demystifying unsupervised learning: how it helps and hurts

    Humans and machines rarely have access to explicit external feedback or supervision, yet manage to learn. Most modern machine learning systems succeed because they benefit from unsupervised data. Humans are also expected to benefit, and yet, mysteriously, empirical results are mixed. Does unsupervised learning help humans or not? Here, we argue that the mixed results are not conflicting answers to this question, but reflect that humans self-reinforce their predictions in the absence of supervision, which can help or hurt depending on whether predictions and task align. We use this framework to synthesize empirical results across various domains to clarify when unsupervised learning will help or hurt. This provides new insights into the fundamentals of learning, with implications for instruction and lifelong learning.
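    A toy simulation makes the help-or-hurt argument concrete (an illustrative sketch, not the paper's model): a prototype learner labels unlabelled items with its own predictions and nudges the winning prototype toward each item, which improves accuracy when the unlabelled structure matches the task boundary and degrades it when it does not. The distributions and learning rate are assumptions.

```python
# Toy simulation of self-reinforcement (an illustrative sketch, not the paper's
# model): a prototype learner labels unlabelled items with its own predictions
# and nudges the winning prototype toward each item. The true category boundary
# is at 0; exposure helps when the unlabelled structure straddles that boundary
# and hurts when it straddles a different one.
import numpy as np

rng = np.random.default_rng(4)

def self_reinforce(prototypes, unlabeled, lr=0.05):
    """Assign each item to its nearest prototype, then move that prototype
    toward the item (self-generated 'feedback')."""
    protos = prototypes.copy()
    for x in unlabeled:
        c = np.argmin(np.abs(protos - x))
        protos[c] += lr * (x - protos[c])
    return protos

def accuracy(prototypes, x, y):
    pred = (np.abs(x - prototypes[1]) < np.abs(x - prototypes[0])).astype(int)
    return (pred == y).mean()

# Imperfect supervised warm start: the implied boundary sits at 0.6, not 0.
protos = np.array([-0.2, 1.4])
x_test = rng.uniform(-2, 2, size=2000)
y_test = (x_test > 0).astype(int)

# Aligned exposure: unlabelled modes straddle the true boundary (0).
aligned = np.concatenate([rng.normal(-1, 0.3, 500), rng.normal(1, 0.3, 500)])
# Misaligned exposure: unlabelled modes straddle a different boundary (1).
misaligned = np.concatenate([rng.normal(0, 0.3, 500), rng.normal(2, 0.3, 500)])

print("before exposure:   ", accuracy(protos, x_test, y_test))
print("aligned (helps):   ", accuracy(self_reinforce(protos, rng.permutation(aligned)), x_test, y_test))
print("misaligned (hurts):", accuracy(self_reinforce(protos, rng.permutation(misaligned)), x_test, y_test))
```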