f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
When labeled training data is scarce, a promising data augmentation approach
is to generate visual features of unknown classes using their attributes. To
learn the class conditional distribution of CNN features, these models rely on
pairs of image features and class attributes. Hence, they cannot make use of
the abundance of unlabeled data samples. In this paper, we tackle any-shot
learning problems, i.e. zero-shot and few-shot learning, in a unified feature generating
framework that operates in both inductive and transductive learning settings.
We develop a conditional generative model that combines the strengths of VAEs and
GANs and, in addition, via an unconditional discriminator, learns the marginal
feature distribution of unlabeled images. We empirically show that our model
learns highly discriminative CNN features for five datasets, i.e. CUB, FLO, SUN, AWA
and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e.
inductive and transductive (generalized) zero- and few-shot learning settings.
We also demonstrate that our learned features are interpretable: we visualize
them by inverting them back to the pixel space and we explain them by
generating textual arguments for why they are associated with a certain label. Comment: Accepted at CVPR 2019
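As a rough sketch of the kind of model this abstract describes, the snippet below pairs a conditional VAE/GAN feature generator with an additional unconditional discriminator that only sees feature vectors (and can therefore be trained on unlabeled data). All module names, layer sizes and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a conditional VAE-GAN feature generator with an extra
# unconditional discriminator; names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, ATTR_DIM, Z_DIM = 2048, 312, 64  # e.g. ResNet features, CUB attributes

class Encoder(nn.Module):              # q(z | x, a)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 512), nn.ReLU())
        self.mu, self.logvar = nn.Linear(512, Z_DIM), nn.Linear(512, Z_DIM)
    def forward(self, x, a):
        h = self.net(torch.cat([x, a], dim=1))
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):            # p(x | z, a), shared by VAE and GAN
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM + ATTR_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, FEAT_DIM))
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))

class CondDiscriminator(nn.Module):    # D1(x, a): real/fake given class attributes
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 1))
    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=1))

class UncondDiscriminator(nn.Module):  # D2(x): matches the marginal feature
    def __init__(self):                # distribution of unlabeled images
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 1))
    def forward(self, x):
        return self.net(x)

# Demo forward pass with random "features" and "attributes".
x, a = torch.randn(8, FEAT_DIM), torch.randn(8, ATTR_DIM)
mu, logvar = Encoder()(x, a)
x_fake = Generator()(mu, a)
cond_score, uncond_score = CondDiscriminator()(x, a), UncondDiscriminator()(x_fake)
```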
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing that the network implicitly learns a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation. Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.edu
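The core "cross convolution" idea, encoding the image as feature maps and the motion as per-sample convolution kernels that are then applied to those maps, can be sketched as follows; the shapes, kernel size and helper name are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of cross convolution: motion is predicted as convolution
# kernels and applied to image feature maps. Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def cross_convolve(feature_maps, kernels):
    """feature_maps: (B, C, H, W) encoding of the input image.
    kernels: (B, C, k, k) motion kernels predicted per sample and channel.
    Returns the feature maps transformed by the sample-specific kernels."""
    B, C, H, W = feature_maps.shape
    k = kernels.shape[-1]
    # Fold the batch into the channel dimension and use grouped convolution
    # so each (sample, channel) pair gets its own kernel.
    x = feature_maps.reshape(1, B * C, H, W)
    w = kernels.reshape(B * C, 1, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=B * C)
    return out.reshape(B, C, H, W)

# Example: 4 images, 32 feature channels, 5x5 motion kernels.
feats = torch.randn(4, 32, 16, 16)
motion_kernels = torch.randn(4, 32, 5, 5)
print(cross_convolve(feats, motion_kernels).shape)  # torch.Size([4, 32, 16, 16])
```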
Analysis of Student Behaviour in Habitable Worlds Using Continuous Representation Visualization
We introduce a novel approach to visualizing temporal clickstream behaviour
in the context of a degree-satisfying online course, Habitable Worlds, offered
through Arizona State University. The current practice for visualizing
behaviour within a digital learning environment has been to generate plots
based on hand-engineered or coded features using domain knowledge. While this
approach has been effective in relating behaviour to known phenomena, features
crafted from domain knowledge are unlikely to be well suited to making unfamiliar
phenomena salient and can thus preclude discovery. We introduce a methodology
for organically surfacing behavioural regularities from clickstream data,
conducting an expert-in-the-loop hyperparameter search, and identifying
anticipated as well as newly discovered patterns of behaviour. While these
visualization techniques have been used before in the broader machine learning
community to better understand neural networks and relationships between word
vectors, we apply them to online behavioural learner data and go a step
further, exploring the impact of the model's parameters on producing
tangible, non-trivial observations of behaviour that suggest pedagogical
improvements to the course designers and instructors. The
methodology introduced in this paper led to an improved understanding of
passing and non-passing student behaviour in the course and is widely
applicable to other datasets of clickstream activity where investigators and
stakeholders wish to organically surface principal patterns of behaviour.
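One plausible reading of "continuous representation visualization" is to embed clickstream events in a continuous space and project that space to 2-D for inspection. The sketch below does this with simple co-occurrence statistics and t-SNE purely to illustrate the pipeline; the event names, window size and choice of projection are assumptions, not the authors' method.

```python
# Illustrative pipeline: embed clickstream events by co-occurrence and project
# to 2-D. Event names and parameters are made up for the example.
import numpy as np
from sklearn.manifold import TSNE

sessions = [  # toy clickstream sessions (one list of events per learner)
    ["video", "quiz_attempt", "quiz_fail", "video", "quiz_attempt", "quiz_pass"],
    ["forum", "video", "quiz_attempt", "quiz_pass", "forum"],
    ["quiz_attempt", "quiz_fail", "quiz_attempt", "quiz_fail", "forum"],
]

events = sorted({e for s in sessions for e in s})
idx = {e: i for i, e in enumerate(events)}

# Co-occurrence counts within a sliding window stand in for a learned
# continuous representation of each event type.
window = 2
cooc = np.zeros((len(events), len(events)))
for s in sessions:
    for i, e in enumerate(s):
        for j in range(max(0, i - window), min(len(s), i + window + 1)):
            if i != j:
                cooc[idx[e], idx[s[j]]] += 1

# Project the event representations to 2-D for plotting; perplexity must stay
# below the number of event types.
xy = TSNE(n_components=2, perplexity=3, init="random", random_state=0).fit_transform(cooc)
for e, (x, y) in zip(events, xy):
    print(f"{e:>12s}  ({x:6.1f}, {y:6.1f})")
```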
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, an NN LF will likely have lower quality than a pixel-basis representation of the same size. Second, only little training data is available for sparse LF videos, e.g., 9 exemplar views per frame. Third, there is no generalization across LFs, but across view and time instead; consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: unlike the linear pixel basis, an NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output is plausible for the sparse view coordinates, it is plausible for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.
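A minimal illustration of the "coordinates to color" idea, a small per-video network that maps view, time and pixel coordinates to RGB, is sketched below; it omits the paper's occlusion-aware warping step, and all layer sizes and names are assumptions.

```python
# Minimal sketch of a per-video coordinate network: (view, time, pixel) -> RGB.
# Sizes and names are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class CoordinateLF(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # input: (u, v) view coordinates, t time, (x, y) pixel position
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )
    def forward(self, coords):          # coords: (N, 5)
        return self.mlp(coords)

model = CoordinateLF()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy training step on random samples; in practice the targets come from the
# few captured views per frame.
coords = torch.rand(1024, 5)
target_rgb = torch.rand(1024, 3)
loss = nn.functional.mse_loss(model(coords), target_rgb)
loss.backward()
opt.step()

# Rendering an intermediate, unseen view is just evaluating the network at
# continuous coordinates between the training views.
novel_rgb = model(torch.tensor([[0.35, 0.72, 0.5, 0.25, 0.75]]))
```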
Sequential Modeling Enables Scalable Learning for Large Vision Models
We introduce a novel sequential modeling approach which enables learning a
Large Vision Model (LVM) without making use of any linguistic data. To do this,
we define a common format, "visual sentences", in which we can represent raw
images and videos as well as annotated data sources such as semantic
segmentations and depth reconstructions without needing any meta-knowledge
beyond the pixels. Once this wide variety of visual data (comprising 420
billion tokens) is represented as sequences, the model can be trained to
minimize a cross-entropy loss for next token prediction. By training across
various scales of model architecture and data diversity, we provide empirical
evidence that our models scale effectively. Many different vision tasks can be
solved by designing suitable visual prompts at test time. Comment: Website: https://yutongbai.com/lvm.htm
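Assuming images have already been mapped to discrete visual tokens by some tokenizer (not shown), the training objective described above reduces to standard next-token prediction with a cross-entropy loss; the tiny autoregressive model below illustrates that objective and is not the paper's architecture.

```python
# Hedged sketch of next-token prediction on tokenized "visual sentences".
# Vocabulary size, sequence length and model depth are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 8192, 256, 64   # assumed visual-token vocabulary size

class TinyVisualLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(SEQ_LEN, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)
    def forward(self, tokens):               # tokens: (B, T) integer visual tokens
        T = tokens.shape[1]
        h = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.blocks(h, mask=mask)        # causal mask -> autoregressive model
        return self.head(h)

model = TinyVisualLM()
tokens = torch.randint(0, VOCAB, (4, SEQ_LEN))   # a batch of visual sentences
logits = model(tokens[:, :-1])                   # predict the next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```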
A Framework for Creative Visualization-Opportunities Workshops
Applied visualization researchers often work closely with domain collaborators to explore new and useful applications of visualization. The early stages of collaborations are typically time-consuming for all stakeholders, as researchers piece together an understanding of domain challenges from disparate discussions and meetings. A number of recent projects, however, report on the use of creative visualization-opportunities (CVO) workshops to accelerate the early stages of applied work, eliciting a wealth of requirements in a few days of focused work. Yet, there is no established guidance for how to use such workshops effectively. In this paper, we present the results of a two-year collaboration in which we analyzed the use of 17 workshops in 10 visualization contexts. Its primary contribution is a framework for CVO workshops that 1) identifies a process model for using workshops; 2) describes a structure of what happens within effective workshops; 3) recommends 25 actionable guidelines for future workshops; and 4) presents an example workshop and workshop methods. The creation of this framework exemplifies the use of critical reflection to learn about visualization in practice from diverse studies and experience.