164,081 research outputs found

    VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository

    Full text link
    Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions and across the corpus, we find 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNet's utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.Comment: CHI'1

    Linear dynamical neural population models through nonlinear embeddings

    Full text link
    A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models.We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.Comment: NIPS 201

    BAR: Bayesian Activity Recognition using variational inference

    Full text link
    Uncertainty estimation in deep neural networks is essential for designing reliable and robust AI systems. Applications such as video surveillance for identifying suspicious activities are designed with deep neural networks (DNNs), but DNNs do not provide uncertainty estimates. Capturing reliable uncertainty estimates in safety and security critical applications will help to establish trust in the AI system. Our contribution is to apply Bayesian deep learning framework to visual activity recognition application and quantify model uncertainty along with principled confidence. We utilize the stochastic variational inference technique while training the Bayesian DNNs to infer the approximate posterior distribution around model parameters and perform Monte Carlo sampling on the posterior of model parameters to obtain the predictive distribution. We show that the Bayesian inference applied to DNNs provide reliable confidence measures for visual activity recognition task as compared to conventional DNNs. We also show that our method improves the visual activity recognition precision-recall AUC by 6.2% compared to non-Bayesian baseline. We evaluate our models on Moments-In-Time (MiT) activity recognition dataset by selecting a subset of in- and out-of-distribution video samples

    A Unified Treatment of Predictive Model Comparison

    Full text link
    The predictive performance of any inferential model is critical to its practical success, but quantifying predictive performance is a subtle statistical problem. In this paper I show how the natural structure of any inferential problem defines a canonical measure of relative predictive performance and then demonstrate how approximations of this measure yield many of the model comparison techniques popular in statistics and machine learning.Comment: 20 pages, 11 figure

    ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

    Full text link
    Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly "on" device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks--a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler "projection" network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question "how many neural bits are required to solve a given task?" using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets

    Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

    Full text link
    This thesis describes our ongoing work on Contrastive Predictive Coding (CPC) features for speaker verification. CPC is a recently proposed representation learning framework based on predictive coding and noise contrastive estimation. We focus on incorporating CPC features into the standard automatic speaker verification systems, and we present our methods, experiments, and analysis. This thesis also details necessary background knowledge in past and recent work on automatic speaker verification systems, conventional speech features, and the motivation and techniques behind CPC

    Image Reconstruction with Predictive Filter Flow

    Full text link
    We propose a simple, interpretable framework for solving a wide range of image reconstruction problems such as denoising and deconvolution. Given a corrupted input image, the model synthesizes a spatially varying linear filter which, when applied to the input image, reconstructs the desired output. The model parameters are learned using supervised or self-supervised training. We test this model on three tasks: non-uniform motion blur removal, lossy-compression artifact reduction and single image super resolution. We demonstrate that our model substantially outperforms state-of-the-art methods on all these tasks and is significantly faster than optimization-based approaches to deconvolution. Unlike models that directly predict output pixel values, the predicted filter flow is controllable and interpretable, which we demonstrate by visualizing the space of predicted filters for different tasks.Comment: https://www.ics.uci.edu/~skong2/pff.htm

    Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference

    Full text link
    Deep neural networks (DNNs) provide state-of-the-art results for a multitude of applications, but the approaches using DNNs for multimodal audiovisual applications do not consider predictive uncertainty associated with individual modalities. Bayesian deep learning methods provide principled confidence and quantify predictive uncertainty. Our contribution in this work is to propose an uncertainty aware multimodal Bayesian fusion framework for activity recognition. We demonstrate a novel approach that combines deterministic and variational layers to scale Bayesian DNNs to deeper architectures. Our experiments using in- and out-of-distribution samples selected from a subset of Moments-in-Time (MiT) dataset show a more reliable confidence measure as compared to the non-Bayesian baseline and the Monte Carlo dropout (MC dropout) approximate Bayesian inference. We also demonstrate the uncertainty estimates obtained from the proposed framework can identify out-of-distribution data on the UCF101 and MiT datasets. In the multimodal setting, the proposed framework improved precision-recall AUC by 10.2% on the subset of MiT dataset as compared to non-Bayesian baseline.Comment: Accepted at ICCV 2019 for Oral presentatio

    Representation Learning with Contrastive Predictive Coding

    Full text link
    While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments

    SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control

    Full text link
    In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured: given an input scene, our network explicitly learns to segment salient parts and predict their pose-embedding along with their motion modeled as a change in the pose space due to the applied actions. We train our model using a pair of point clouds separated by an action and show that given supervision only in the form of point-wise data associations between the frames our network is able to learn a meaningful segmentation of the scene along with consistent poses. We further show that our model can be used for closed-loop control directly in the learned low-dimensional pose space, where the actions are computed by minimizing error in the pose space using gradient-based methods, similar to traditional model-based control. We present results on controlling a Baxter robot from raw depth data in simulation and in the real world and compare against two baseline deep networks. Our method runs in real-time, achieves good prediction of scene dynamics and outperforms the baseline methods on multiple control runs. Video results can be found at: https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/Comment: 8 pages, Initial submission to IEEE International Conference on Robotics and Automation (ICRA) 201
    • …
    corecore