26,932 research outputs found
Capturing human category representations by sampling in deep feature spaces
Understanding how people represent categories is a core problem in cognitive
science. Decades of research have yielded a variety of formal theories of
categories, but validating them with naturalistic stimuli is difficult. The
challenge is that human category representations cannot be directly observed
and running informative experiments with naturalistic stimuli such as images
requires a workable representation of these stimuli. Deep neural networks have
recently been successful in solving a range of computer vision tasks and
provide a way to compactly represent image features. Here, we introduce a
method to estimate the structure of human categories that combines ideas from
cognitive science and machine learning, blending human-based algorithms with
state-of-the-art deep image generators. We provide qualitative and quantitative
results as a proof-of-concept for the method's feasibility. Samples drawn from
human distributions rival those from state-of-the-art generative models in
quality and outperform alternative methods for estimating the structure of
human categories.Comment: 6 pages, 5 figures, 1 table. Accepted as a paper to the 40th Annual
Meeting of the Cognitive Science Society (CogSci 2018
Modeling Human Categorization of Natural Images Using Deep Feature Representations
Over the last few decades, psychologists have developed sophisticated formal
models of human categorization using simple artificial stimuli. In this paper,
we use modern machine learning methods to extend this work into the realm of
naturalistic stimuli, enabling human categorization to be studied over the
complex visual domain in which it evolved and developed. We show that
representations derived from a convolutional neural network can be used to
model behavior over a database of >300,000 human natural image classifications,
and find that a group of models based on these representations perform well,
near the reliability of human judgments. Interestingly, this group includes
both exemplar and prototype models, contrasting with the dominance of exemplar
models in previous work. We are able to improve the performance of the
remaining models by preprocessing neural network representations to more
closely capture human similarity judgments.Comment: 13 pages, 7 figures, 6 tables. Preliminary work presented at CogSci
201
Characterizing the impact of geometric properties of word embeddings on task performance
Analysis of word embedding properties to inform their use in downstream NLP
tasks has largely been studied by assessing nearest neighbors. However,
geometric properties of the continuous feature space contribute directly to the
use of embedding features in downstream models, and are largely unexplored. We
consider four properties of word embedding geometry, namely: position relative
to the origin, distribution of features in the vector space, global pairwise
distances, and local pairwise distances. We define a sequence of
transformations to generate new embeddings that expose subsets of these
properties to downstream models and evaluate change in task performance to
understand the contribution of each property to NLP models. We transform
publicly available pretrained embeddings from three popular toolkits (word2vec,
GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model
linguistic information in the vector space, and extrinsic tasks, which use
vectors as input to machine learning models. We find that intrinsic evaluations
are highly sensitive to absolute position, while extrinsic tasks rely primarily
on local similarity. Our findings suggest that future embedding models and
post-processing techniques should focus primarily on similarity to nearby
points in vector space.Comment: Appearing in the Third Workshop on Evaluating Vector Space
Representations for NLP (RepEval 2019). 7 pages + reference
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
- …