VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository
Researchers currently rely on ad hoc datasets to train automated
visualization tools and evaluate the effectiveness of visualization designs.
These exemplars often lack the characteristics of real-world datasets, and
their one-off nature makes it difficult to compare different techniques. In
this paper, we present VizNet: a large-scale corpus of over 31 million datasets
compiled from open data repositories and online visualization galleries. On
average, these datasets comprise 17 records over 3 dimensions, and across the
corpus we find that 51% of the dimensions record categorical data, 44%
quantitative, and only 5% temporal. VizNet provides the necessary common
baseline for comparing visualization design techniques, and developing
benchmark models and algorithms for automating visual analysis. To demonstrate
VizNet's utility as a platform for conducting online crowdsourced experiments
at scale, we replicate a prior study assessing the influence of user task and
data distribution on visual encoding effectiveness, and extend it by
considering an additional task: outlier detection. To contend with running such
studies at scale, we demonstrate how a metric of perceptual effectiveness can
be learned from experimental results, and show its predictive power across test
datasets.
Comment: CHI'19
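A corpus at this scale implies an automated column-typing step. As a rough illustration (a hypothetical heuristic, not VizNet's published pipeline), a pandas column can be bucketed into the three dimension types reported above:

    import pandas as pd

    def classify_dimension(series: pd.Series) -> str:
        # Numeric columns count as quantitative (~44% of VizNet dimensions).
        if pd.api.types.is_numeric_dtype(series):
            return "quantitative"
        # Columns that mostly parse as dates count as temporal (~5%).
        parsed = pd.to_datetime(series, errors="coerce")
        if parsed.notna().mean() > 0.9:
            return "temporal"
        # Everything else is treated as categorical (~51%).
        return "categorical"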
Linear dynamical neural population models through nonlinear embeddings
A body of recent work in modeling neural activity focuses on recovering
low-dimensional latent features that capture the statistical structure of
large-scale neural populations. Most such approaches have focused on linear
generative models, where inference is computationally tractable. Here, we
propose fLDS, a general class of nonlinear generative models that permits the
firing rate of each neuron to vary as an arbitrary smooth function of a latent,
linear dynamical state. This extra flexibility allows the model to capture a
richer set of neural variability than a purely linear model, but retains an
easily visualizable low-dimensional latent space. To fit this class of
non-conjugate models we propose a variational inference scheme, along with a
novel approximate posterior capable of capturing rich temporal correlations
across time. We show that our techniques permit inference in a wide class of
generative models. In application to two neural datasets, we also show that,
compared to state-of-the-art neural population models, fLDS captures a much
larger proportion of neural variability with a small number of latent
dimensions, providing superior predictive performance and interpretability.
Comment: NIPS 2016
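A minimal generative sketch of the model class, with Poisson spike counts and a random MLP standing in for the learned smooth rate map f (an assumption; the paper fits f and the dynamics jointly via variational inference, which is omitted here):

    import numpy as np

    rng = np.random.default_rng(0)
    T, D, N = 100, 2, 50  # time bins, latent dims, neurons

    # Linear latent dynamics: x_{t+1} = A x_t + w_t (the "LDS" in fLDS).
    A = 0.95 * np.eye(D)
    x = np.zeros((T, D))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + 0.1 * rng.standard_normal(D)

    # A random one-hidden-layer MLP stands in for the learned rate map f.
    W1 = rng.standard_normal((D, 32))
    W2 = rng.standard_normal((32, N))
    rates = np.exp(0.1 * np.tanh(x @ W1) @ W2)

    # Poisson observations: each neuron's firing rate is an arbitrary
    # smooth function of the shared low-dimensional latent state.
    spikes = rng.poisson(rates)   # shape (T, N)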
BAR: Bayesian Activity Recognition using variational inference
Uncertainty estimation in deep neural networks is essential for designing
reliable and robust AI systems. Applications such as video surveillance for
identifying suspicious activities are designed with deep neural networks
(DNNs), but DNNs do not provide uncertainty estimates. Capturing reliable
uncertainty estimates in safety and security critical applications will help to
establish trust in the AI system. Our contribution is to apply a Bayesian deep
learning framework to a visual activity recognition application and to quantify
model uncertainty along with principled confidence measures. We utilize the stochastic
variational inference technique while training the Bayesian DNNs to infer the
approximate posterior distribution around model parameters and perform Monte
Carlo sampling on the posterior of model parameters to obtain the predictive
distribution. We show that Bayesian inference applied to DNNs provides
reliable confidence measures for the visual activity recognition task compared
to conventional DNNs. We also show that our method improves the visual activity
recognition precision-recall AUC by 6.2% compared to a non-Bayesian baseline. We
evaluate our models on the Moments-in-Time (MiT) activity recognition dataset by
selecting a subset of in- and out-of-distribution video samples.
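The predictive-distribution step can be sketched as follows, assuming a model whose forward pass draws a fresh weight sample from the learned variational posterior (the specific DNN architecture and the variational training loop are the paper's and are omitted):

    import torch

    @torch.no_grad()
    def mc_predictive(model, x, num_samples=20):
        # Assumes each forward pass of `model` draws a new weight sample
        # from the approximate posterior learned during training.
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(num_samples)])
        mean = probs.mean(dim=0)  # Monte Carlo predictive distribution
        entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)
        return mean, entropy  # class probabilities and an uncertainty proxy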
A Unified Treatment of Predictive Model Comparison
The predictive performance of any inferential model is critical to its
practical success, but quantifying predictive performance is a subtle
statistical problem. In this paper I show how the natural structure of any
inferential problem defines a canonical measure of relative predictive
performance and then demonstrate how approximations of this measure yield many
of the model comparison techniques popular in statistics and machine learning.
Comment: 20 pages, 11 figures
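One familiar instantiation of such a measure is the expected log predictive density, estimated on held-out data; whether this matches the paper's canonical construction exactly is an assumption of this sketch:

    import numpy as np

    def heldout_elpd(log_pred_density, x_heldout):
        # Average log predictive density over held-out observations; for a
        # Bayesian model, log_pred_density(x) should already marginalize
        # over the posterior.
        return float(np.mean([log_pred_density(x) for x in x_heldout]))

    # Two models are then compared by the difference of their estimates,
    # e.g. heldout_elpd(lpd_model_a, xs) - heldout_elpd(lpd_model_b, xs).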
ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections
Deep neural networks have become ubiquitous for applications related to
visual recognition and language understanding tasks. However, it is often
prohibitive to use typical neural networks on devices like mobile phones or
smart watches since the model sizes are huge and cannot fit in the limited
memory available on such devices. While these devices could make use of machine
learning models running on high-performance data centers with CPUs or GPUs,
this is not feasible for many applications because data can be privacy
sensitive and inference needs to be performed directly "on" device.
We introduce a new architecture for training compact neural networks using a
joint optimization framework. At its core lies a novel objective that jointly
trains using two different types of networks--a full trainer neural network
(using existing architectures like Feed-forward NNs or LSTM RNNs) combined with
a simpler "projection" network that leverages random projections to transform
inputs or intermediate representations into bits. The simpler network encodes
lightweight and efficient-to-compute operations in bit space with a low memory
footprint. The two networks are trained jointly using backpropagation, where
the projection network learns from the full network, similar to apprenticeship
learning. Once trained, the smaller network can be used directly for inference
at low memory and computation cost. We demonstrate the effectiveness of the new
approach at significantly shrinking the memory requirements of different types
of neural networks while preserving good accuracy on visual recognition and
text classification tasks. We also study the question "how many neural bits are
required to solve a given task?" using the new framework and show empirical
results contrasting model predictive capacity (in bits) versus accuracy on
several datasets.
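The bit-space transformation can be illustrated with a locality-sensitive-hashing-style sign projection; the joint trainer/projection objective and the learned layers on top of the bits are the paper's contribution and are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, n_bits = 128, 64  # illustrative sizes

    # Fixed random projection matrix: never trained, and cheap to
    # regenerate from a seed, so it adds no parameters to the small model.
    P = rng.standard_normal((d_in, n_bits))

    def project_to_bits(x):
        # sign(xP) maps an input (or intermediate representation) to bits.
        return (x @ P > 0).astype(np.uint8)

    bits = project_to_bits(rng.standard_normal(d_in))  # 64 bits vs. 128 floats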
Contrastive Predictive Coding Based Feature for Automatic Speaker Verification
This thesis describes our ongoing work on Contrastive Predictive Coding (CPC)
features for speaker verification. CPC is a recently proposed representation
learning framework based on predictive coding and noise contrastive estimation.
We focus on incorporating CPC features into the standard automatic speaker
verification systems, and we present our methods, experiments, and analysis.
This thesis also details necessary background knowledge in past and recent work
on automatic speaker verification systems, conventional speech features, and
the motivation and techniques behind CPC.
Image Reconstruction with Predictive Filter Flow
We propose a simple, interpretable framework for solving a wide range of
image reconstruction problems such as denoising and deconvolution. Given a
corrupted input image, the model synthesizes a spatially varying linear filter
which, when applied to the input image, reconstructs the desired output. The
model parameters are learned using supervised or self-supervised training. We
test this model on three tasks: non-uniform motion blur removal,
lossy-compression artifact reduction, and single-image super-resolution. We
demonstrate that our model substantially outperforms state-of-the-art methods
on all these tasks and is significantly faster than optimization-based
approaches to deconvolution. Unlike models that directly predict output pixel
values, the predicted filter flow is controllable and interpretable, which we
demonstrate by visualizing the space of predicted filters for different tasks.
Comment: https://www.ics.uci.edu/~skong2/pff.htm
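The core operation, applying a predicted spatially varying linear filter, can be sketched in PyTorch; the filter size k and the network that predicts the per-pixel weights are assumptions here:

    import torch
    import torch.nn.functional as F

    def apply_filter_flow(image, filters, k=11):
        # image:   (B, C, H, W) corrupted input
        # filters: (B, k*k, H, W) per-pixel filter weights produced by
        #          the prediction network (omitted); k=11 is illustrative.
        B, C, H, W = image.shape
        # Gather a k x k neighborhood around every pixel: (B, C*k*k, H*W).
        patches = F.unfold(image, kernel_size=k, padding=k // 2)
        patches = patches.view(B, C, k * k, H * W)
        weights = filters.view(B, 1, k * k, H * W)
        # Each output pixel is a weighted sum of its input neighborhood.
        return (patches * weights).sum(dim=2).view(B, C, H, W)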
Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference
Deep neural networks (DNNs) provide state-of-the-art results for a multitude
of applications, but the approaches using DNNs for multimodal audiovisual
applications do not consider predictive uncertainty associated with individual
modalities. Bayesian deep learning methods provide principled confidence and
quantify predictive uncertainty. Our contribution in this work is to propose an
uncertainty aware multimodal Bayesian fusion framework for activity
recognition. We demonstrate a novel approach that combines deterministic and
variational layers to scale Bayesian DNNs to deeper architectures. Our
experiments using in- and out-of-distribution samples selected from a subset of
the Moments-in-Time (MiT) dataset show a more reliable confidence measure as
compared to the non-Bayesian baseline and the Monte Carlo dropout (MC dropout)
approximate Bayesian inference. We also demonstrate that the uncertainty estimates
obtained from the proposed framework can identify out-of-distribution data on
the UCF101 and MiT datasets. In the multimodal setting, the proposed framework
improved precision-recall AUC by 10.2% on the subset of the MiT dataset compared
to the non-Bayesian baseline.
Comment: Accepted at ICCV 2019 for Oral presentation
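A minimal sketch of uncertainty-aware fusion, assuming stacks of Monte Carlo softmax samples from each modality's Bayesian network; the inverse-entropy weighting is an illustrative choice, not necessarily the paper's fusion rule:

    import torch

    def predictive_entropy(p):
        # Entropy of a categorical predictive distribution: (B, K) -> (B,)
        return -(p * p.clamp_min(1e-12).log()).sum(-1)

    def fuse(probs_audio, probs_video):
        # probs_*: (S, B, K) stacks of S Monte Carlo softmax samples.
        m_a, m_v = probs_audio.mean(0), probs_video.mean(0)
        w_a = 1.0 / (predictive_entropy(m_a) + 1e-6)  # confident => heavy
        w_v = 1.0 / (predictive_entropy(m_v) + 1e-6)
        z = w_a + w_v
        return (w_a / z).unsqueeze(-1) * m_a + (w_v / z).unsqueeze(-1) * m_v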
Representation Learning with Contrastive Predictive Coding
While supervised learning has enabled great progress in many applications,
unsupervised learning has not seen such widespread adoption, and remains an
important and challenging endeavor for artificial intelligence. In this work,
we propose a universal unsupervised learning approach to extract useful
representations from high-dimensional data, which we call Contrastive
Predictive Coding. The key insight of our model is to learn such
representations by predicting the future in latent space using powerful
autoregressive models. We use a probabilistic contrastive loss that induces
the latent space to capture information that is maximally useful to predict
future samples. It also makes the model tractable by using negative sampling.
While most prior work has focused on evaluating representations for a
particular modality, we demonstrate that our approach is able to learn useful
representations achieving strong performance on four distinct domains: speech,
images, text, and reinforcement learning in 3D environments.
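The contrastive loss can be sketched as an InfoNCE objective with in-batch negatives; the bilinear scoring matrix per prediction step follows the usual CPC setup, while the encoder and autoregressive model are omitted:

    import torch
    import torch.nn.functional as F

    def info_nce(c_t, z_future, W):
        # c_t:      (B, D) autoregressive context vectors
        # z_future: (B, D) encoded samples k steps ahead
        # W:        (D, D) bilinear weights for prediction step k
        # Every (context, future) pair in the batch is scored; the matching
        # pair on the diagonal is the positive, the rest act as negatives.
        logits = c_t @ W @ z_future.t()                       # (B, B)
        targets = torch.arange(c_t.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)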
SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control
In this work, we present an approach to deep visuomotor control using
structured deep dynamics models. Our deep dynamics model, a variant of
SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an
encoder-decoder structure. Unlike prior work, our dynamics model is structured:
given an input scene, our network explicitly learns to segment salient parts
and predict their pose-embedding along with their motion modeled as a change in
the pose space due to the applied actions. We train our model using a pair of
point clouds separated by an action and show that, given supervision only in the
form of point-wise data associations between the frames, our network is able to
learn a meaningful segmentation of the scene along with consistent poses. We
further show that our model can be used for closed-loop control directly in the
learned low-dimensional pose space, where the actions are computed by
minimizing error in the pose space using gradient-based methods, similar to
traditional model-based control. We present results on controlling a Baxter
robot from raw depth data in simulation and in the real world and compare
against two baseline deep networks. Our method runs in real-time, achieves good
prediction of scene dynamics and outperforms the baseline methods on multiple
control runs. Video results can be found at:
https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/
Comment: 8 pages, initial submission to the IEEE International Conference on
Robotics and Automation (ICRA) 2018
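Closed-loop control in the learned pose space reduces to gradient descent on pose error with respect to the action; a minimal sketch, where `dynamics` stands for the trained pose-transition model and `action_dim` and the optimizer settings are assumptions:

    import torch

    def plan_action(dynamics, pose_now, pose_target, action_dim,
                    steps=100, lr=0.1):
        # `dynamics(pose, action)` predicts the next pose embedding.
        action = torch.zeros(action_dim, requires_grad=True)
        opt = torch.optim.SGD([action], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = (dynamics(pose_now, action) - pose_target).pow(2).sum()
            loss.backward()  # gradient of pose-space error w.r.t. the action
            opt.step()
        return action.detach()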