f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
When labeled training data is scarce, a promising data augmentation approach
is to generate visual features of unknown classes using their attributes. To
learn the class conditional distribution of CNN features, these models rely on
pairs of image features and class attributes. Hence, they cannot make use of
the abundance of unlabeled data samples. In this paper, we tackle any-shot
learning problems, i.e. zero-shot and few-shot learning, in a unified feature generating
framework that operates in both inductive and transductive learning settings.
We develop a conditional generative model that combines the strengths of VAEs and
GANs and, in addition, via an unconditional discriminator, learns the marginal
feature distribution of unlabeled images. We empirically show that our model
learns highly discriminative CNN features for five datasets, i.e. CUB, FLO, SUN, AWA
and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e.
inductive and transductive (generalized) zero- and few-shot learning settings.
We also demonstrate that our learned features are interpretable: we visualize
them by inverting them back to the pixel space and we explain them by
generating textual arguments for why they are associated with a certain label. Comment: Accepted at CVPR 2019
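As a rough sketch of the kind of model this abstract describes, the snippet below pairs a conditional VAE/GAN feature generator with an additional unconditional discriminator that only sees feature vectors (and can therefore be trained on unlabeled data). All module names, layer sizes and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a conditional VAE-GAN feature generator with an extra
# unconditional discriminator; names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, ATTR_DIM, Z_DIM = 2048, 312, 64  # e.g. ResNet features, CUB attributes

class Encoder(nn.Module):              # q(z | x, a)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 512), nn.ReLU())
        self.mu, self.logvar = nn.Linear(512, Z_DIM), nn.Linear(512, Z_DIM)
    def forward(self, x, a):
        h = self.net(torch.cat([x, a], dim=1))
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):            # p(x | z, a), shared by VAE and GAN
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM + ATTR_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, FEAT_DIM))
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))

class CondDiscriminator(nn.Module):    # D1(x, a): real/fake given class attributes
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 1))
    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=1))

class UncondDiscriminator(nn.Module):  # D2(x): matches the marginal feature
    def __init__(self):                # distribution of unlabeled images
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 1))
    def forward(self, x):
        return self.net(x)

# Demo forward pass with random "features" and "attributes".
x, a = torch.randn(8, FEAT_DIM), torch.randn(8, ATTR_DIM)
mu, logvar = Encoder()(x, a)
x_fake = Generator()(mu, a)
cond_score, uncond_score = CondDiscriminator()(x, a), UncondDiscriminator()(x_fake)
```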
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing that the network implicitly learns a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation. Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.edu
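The core "cross convolution" idea, encoding the image as feature maps and the motion as per-sample convolution kernels that are then applied to those maps, can be sketched as follows; the shapes, kernel size and helper name are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of cross convolution: motion is predicted as convolution
# kernels and applied to image feature maps. Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def cross_convolve(feature_maps, kernels):
    """feature_maps: (B, C, H, W) encoding of the input image.
    kernels: (B, C, k, k) motion kernels predicted per sample and channel.
    Returns the feature maps transformed by the sample-specific kernels."""
    B, C, H, W = feature_maps.shape
    k = kernels.shape[-1]
    # Fold the batch into the channel dimension and use grouped convolution
    # so each (sample, channel) pair gets its own kernel.
    x = feature_maps.reshape(1, B * C, H, W)
    w = kernels.reshape(B * C, 1, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=B * C)
    return out.reshape(B, C, H, W)

# Example: 4 images, 32 feature channels, 5x5 motion kernels.
feats = torch.randn(4, 32, 16, 16)
motion_kernels = torch.randn(4, 32, 5, 5)
print(cross_convolve(feats, motion_kernels).shape)  # torch.Size([4, 32, 16, 16])
```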
Analysis of Student Behaviour in Habitable Worlds Using Continuous Representation Visualization
We introduce a novel approach to visualizing temporal clickstream behaviour
in the context of a degree-satisfying online course, Habitable Worlds, offered
through Arizona State University. The current practice for visualizing
behaviour within a digital learning environment has been to generate plots
based on hand-engineered or coded features using domain knowledge. While this
approach has been effective in relating behaviour to known phenomena, features
crafted from domain knowledge are unlikely to be well suited to making unfamiliar
phenomena salient and can thus preclude discovery. We introduce a methodology
for organically surfacing behavioural regularities from clickstream data,
conducting an expert-in-the-loop hyperparameter search, and identifying
anticipated as well as newly discovered patterns of behaviour. While these
visualization techniques have been used before in the broader machine learning
community to better understand neural networks and relationships between word
vectors, we apply them to online behavioural learner data and go a step
further, exploring the impact of the model's parameters on producing
tangible, non-trivial observations of behaviour that suggest pedagogical
improvements to the course designers and instructors. The
methodology introduced in this paper led to an improved understanding of
passing and non-passing student behaviour in the course and is widely
applicable to other datasets of clickstream activity where investigators and
stakeholders wish to organically surface principal patterns of behaviour.
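One plausible reading of "continuous representation visualization" is to embed clickstream events in a continuous space and project that space to 2-D for inspection. The sketch below does this with simple co-occurrence statistics and t-SNE purely to illustrate the pipeline; the event names, window size and choice of projection are assumptions, not the authors' method.

```python
# Illustrative pipeline: embed clickstream events by co-occurrence and project
# to 2-D. Event names and parameters are made up for the example.
import numpy as np
from sklearn.manifold import TSNE

sessions = [  # toy clickstream sessions (one list of events per learner)
    ["video", "quiz_attempt", "quiz_fail", "video", "quiz_attempt", "quiz_pass"],
    ["forum", "video", "quiz_attempt", "quiz_pass", "forum"],
    ["quiz_attempt", "quiz_fail", "quiz_attempt", "quiz_fail", "forum"],
]

events = sorted({e for s in sessions for e in s})
idx = {e: i for i, e in enumerate(events)}

# Co-occurrence counts within a sliding window stand in for a learned
# continuous representation of each event type.
window = 2
cooc = np.zeros((len(events), len(events)))
for s in sessions:
    for i, e in enumerate(s):
        for j in range(max(0, i - window), min(len(s), i + window + 1)):
            if i != j:
                cooc[idx[e], idx[s[j]]] += 1

# Project the event representations to 2-D for plotting; perplexity must stay
# below the number of event types.
xy = TSNE(n_components=2, perplexity=3, init="random", random_state=0).fit_transform(cooc)
for e, (x, y) in zip(events, xy):
    print(f"{e:>12s}  ({x:6.1f}, {y:6.1f})")
```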
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, an NN LF will likely have lower quality than a pixel-basis representation of the same size. Second, only little training data is available for sparse LF videos, e.g., 9 exemplar views per frame. Third, there is no generalization across LFs, but across view and time instead; consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: unlike the linear pixel basis, an NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output is plausible for the sparse view coordinates, it is plausible for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.
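A minimal illustration of the "coordinates to color" idea, a small per-video network that maps view, time and pixel coordinates to RGB, is sketched below; it omits the paper's occlusion-aware warping step, and all layer sizes and names are assumptions.

```python
# Minimal sketch of a per-video coordinate network: (view, time, pixel) -> RGB.
# Sizes and names are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class CoordinateLF(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # input: (u, v) view coordinates, t time, (x, y) pixel position
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )
    def forward(self, coords):          # coords: (N, 5)
        return self.mlp(coords)

model = CoordinateLF()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy training step on random samples; in practice the targets come from the
# few captured views per frame.
coords = torch.rand(1024, 5)
target_rgb = torch.rand(1024, 3)
loss = nn.functional.mse_loss(model(coords), target_rgb)
loss.backward()
opt.step()

# Rendering an intermediate, unseen view is just evaluating the network at
# continuous coordinates between the training views.
novel_rgb = model(torch.tensor([[0.35, 0.72, 0.5, 0.25, 0.75]]))
```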
Sequential Modeling Enables Scalable Learning for Large Vision Models
We introduce a novel sequential modeling approach which enables learning a
Large Vision Model (LVM) without making use of any linguistic data. To do this,
we define a common format, "visual sentences", in which we can represent raw
images and videos as well as annotated data sources such as semantic
segmentations and depth reconstructions without needing any meta-knowledge
beyond the pixels. Once this wide variety of visual data (comprising 420
billion tokens) is represented as sequences, the model can be trained to
minimize a cross-entropy loss for next token prediction. By training across
various scales of model architecture and data diversity, we provide empirical
evidence that our models scale effectively. Many different vision tasks can be
solved by designing suitable visual prompts at test time. Comment: Website: https://yutongbai.com/lvm.htm
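Assuming images have already been mapped to discrete visual tokens by some tokenizer (not shown), the training objective described above reduces to standard next-token prediction with a cross-entropy loss; the tiny autoregressive model below illustrates that objective and is not the paper's architecture.

```python
# Hedged sketch of next-token prediction on tokenized "visual sentences".
# Vocabulary size, sequence length and model depth are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 8192, 256, 64   # assumed visual-token vocabulary size

class TinyVisualLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(SEQ_LEN, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)
    def forward(self, tokens):               # tokens: (B, T) integer visual tokens
        T = tokens.shape[1]
        h = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.blocks(h, mask=mask)        # causal mask -> autoregressive model
        return self.head(h)

model = TinyVisualLM()
tokens = torch.randint(0, VOCAB, (4, SEQ_LEN))   # a batch of visual sentences
logits = model(tokens[:, :-1])                   # predict the next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```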
A Framework for Creative Visualization-Opportunities Workshops
Applied visualization researchers often work closely with domain collaborators to explore new and useful applications of visualization. The early stages of collaborations are typically time-consuming for all stakeholders, as researchers piece together an understanding of domain challenges from disparate discussions and meetings. A number of recent projects, however, report on the use of creative visualization-opportunities (CVO) workshops to accelerate the early stages of applied work, eliciting a wealth of requirements in a few days of focused work. Yet, there is no established guidance for how to use such workshops effectively. In this paper, we present the results of a two-year collaboration in which we analyzed the use of 17 workshops in 10 visualization contexts. Its primary contribution is a framework for CVO workshops that 1) identifies a process model for using workshops; 2) describes a structure of what happens within effective workshops; 3) recommends 25 actionable guidelines for future workshops; and 4) presents an example workshop and workshop methods. The creation of this framework exemplifies the use of critical reflection to learn about visualization in practice from diverse studies and experience.