VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository
Researchers currently rely on ad hoc datasets to train automated
visualization tools and evaluate the effectiveness of visualization designs.
These exemplars often lack the characteristics of real-world datasets, and
their one-off nature makes it difficult to compare different techniques. In
this paper, we present VizNet: a large-scale corpus of over 31 million datasets
compiled from open data repositories and online visualization galleries. On
average, these datasets comprise 17 records over 3 dimensions. Across the
corpus, we find that 51% of the dimensions record categorical data, 44%
quantitative, and only 5% temporal. VizNet provides the necessary common
baseline for comparing visualization design techniques, and developing
benchmark models and algorithms for automating visual analysis. To demonstrate
VizNet's utility as a platform for conducting online crowdsourced experiments
at scale, we replicate a prior study assessing the influence of user task and
data distribution on visual encoding effectiveness, and extend it by
considering an additional task: outlier detection. To contend with running such
studies at scale, we demonstrate how a metric of perceptual effectiveness can
be learned from experimental results, and show its predictive power across test
datasets.
Comment: CHI'1
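For intuition, a minimal Python sketch of how such corpus-level dimension
statistics could be tallied (the pandas-based type heuristic and all names
below are illustrative assumptions, not VizNet's actual pipeline):

import pandas as pd

def column_type(series: pd.Series) -> str:
    # Illustrative heuristic, not VizNet's actual type detection.
    if pd.api.types.is_datetime64_any_dtype(series):
        return "temporal"
    if pd.api.types.is_numeric_dtype(series):
        return "quantitative"
    return "categorical"

def corpus_type_proportions(datasets):
    # datasets: iterable of DataFrames; returns the fraction of dimensions
    # (columns) falling into each type across the whole corpus.
    counts = {"categorical": 0, "quantitative": 0, "temporal": 0}
    for df in datasets:
        for col in df.columns:
            counts[column_type(df[col])] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}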
Meta Dropout: Learning to Perturb Features for Generalization
A machine learning model that generalizes well should obtain low errors on
unseen test examples. Thus, if we know how to optimally perturb training
examples to account for test examples, we may achieve better generalization
performance. However, obtaining such perturbation is not possible in standard
machine learning frameworks as the distribution of the test data is unknown. To
tackle this challenge, we propose a novel regularization method, meta-dropout,
which learns to perturb the latent features of training examples for
generalization in a meta-learning framework. Specifically, we meta-learn a
noise generator which outputs a multiplicative noise distribution for latent
features, to obtain low errors on the test instances in an input-dependent
manner. Then, the learned noise generator can perturb the training examples of
unseen tasks at the meta-test time for improved generalization. We validate our
method on few-shot classification datasets, whose results show that it
significantly improves the generalization performance of the base model, and
largely outperforms existing regularization methods such as information
bottleneck, manifold mixup, and information dropout.
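A minimal PyTorch sketch of the core mechanism, input-dependent multiplicative
noise applied to latent features (the single-layer generator and softplus
parameterization are assumptions, not the paper's exact formulation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaNoiseGenerator(nn.Module):
    # Maps latent features to parameters of a multiplicative noise
    # distribution; hypothetical names, simplified parameterization.
    def __init__(self, feat_dim):
        super().__init__()
        self.param = nn.Linear(feat_dim, feat_dim)

    def forward(self, h):
        eps = torch.randn_like(h)                # reparameterization trick
        noise = F.softplus(self.param(h) + eps)  # input-dependent, positive
        return h * noise                         # perturbed latent features

In the meta-learning loop, the generator's parameters would be updated in the
outer loop so that a model trained on perturbed support examples attains low
loss on each task's query (meta-test) examples.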
Coupled Recurrent Network (CRN)
Many semantic video analysis tasks can benefit from multiple, heterogeneous
signals. For example, in addition to the original RGB input sequences,
sequences of optical flow are usually used to boost the performance of human
action recognition in videos. To learn from these heterogeneous input sources,
existing methods rely on two-stream architectural designs that contain
independent, parallel streams of Recurrent Neural Networks (RNNs). However,
two-stream RNNs do not fully exploit the reciprocal information contained in
the multiple signals, let alone exploit it in a recurrent manner. To this end,
we propose in this paper a novel recurrent architecture, termed Coupled
Recurrent Network (CRN), to deal with multiple input sources. In CRN, the
parallel streams of RNNs are coupled together. A key design of CRN is the
Recurrent Interpretation Block (RIB), which supports learning reciprocal
feature representations from multiple signals in a recurrent manner. Unlike
conventional RNNs, which place the training loss at each time step or only at
the last time step, we
propose an effective and efficient training strategy for CRN. Experiments show
the efficacy of the proposed CRN. In particular, we achieve the new state of
the art on the benchmark datasets of human action recognition and multi-person
pose estimation.
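A minimal sketch of the coupling idea, assuming per-frame RGB and optical-flow
feature vectors; the interpretation block here is reduced to a single linear
exchange of hidden states, far simpler than the paper's RIB:

import torch
import torch.nn as nn

class CoupledStreams(nn.Module):
    def __init__(self, rgb_dim, flow_dim, hidden):
        super().__init__()
        self.rgb_cell = nn.GRUCell(rgb_dim + hidden, hidden)
        self.flow_cell = nn.GRUCell(flow_dim + hidden, hidden)
        # Toy stand-in for the RIB: mixes the two hidden states into
        # reciprocal messages passed back to the opposite stream.
        self.rib = nn.Linear(2 * hidden, 2 * hidden)

    def forward(self, rgb_seq, flow_seq):        # (batch, time, features)
        B = rgb_seq.shape[0]
        h_rgb = rgb_seq.new_zeros(B, self.rgb_cell.hidden_size)
        h_flow = flow_seq.new_zeros(B, self.flow_cell.hidden_size)
        for t in range(rgb_seq.shape[1]):
            msg_rgb, msg_flow = self.rib(
                torch.cat([h_rgb, h_flow], -1)).chunk(2, -1)
            h_rgb = self.rgb_cell(
                torch.cat([rgb_seq[:, t], msg_flow], -1), h_rgb)
            h_flow = self.flow_cell(
                torch.cat([flow_seq[:, t], msg_rgb], -1), h_flow)
        return h_rgb, h_flow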
ChronoNet: A Deep Recurrent Neural Network for Abnormal EEG Identification
Brain-related disorders such as epilepsy can be diagnosed by analyzing
electroencephalograms (EEG). However, manual analysis of EEG data requires
highly trained clinicians, and is a procedure that is known to have relatively
low inter-rater agreement (IRA). Moreover, the volume of the data and the rate
at which new data becomes available make manual interpretation a
time-consuming, resource-hungry, and expensive process. In contrast, automated
analysis of EEG data offers the potential to improve the quality of patient
care by shortening the time to diagnosis and reducing manual error. In this
paper, we focus on one of the first steps in interpreting an EEG session -
identifying whether the brain activity is abnormal or normal. To solve this
task, we propose a novel recurrent neural network (RNN) architecture termed
ChronoNet which is inspired by recent developments from the field of image
classification and designed to work efficiently with EEG data. ChronoNet is
formed by stacking multiple 1D convolution layers followed by deep gated
recurrent unit (GRU) layers where each 1D convolution layer uses multiple
filters of exponentially varying lengths and the stacked GRU layers are densely
connected in a feed-forward manner. We used the recently released TUH Abnormal
EEG Corpus dataset for evaluating the performance of ChronoNet. Unlike previous
studies using this dataset, ChronoNet directly takes time-series EEG as input
and learns meaningful representations of brain activity patterns. ChronoNet
outperforms the previously reported best results by 7.79%, thereby setting a new
benchmark for this dataset. Furthermore, we demonstrate the domain-independent
nature of ChronoNet by successfully applying it to classify speech commands.
Comment: 8 pages, 2 figures, 2 tables
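A minimal PyTorch sketch of the two ingredients named above; kernel lengths,
strides, and layer counts are illustrative guesses, not the published
configuration:

import torch
import torch.nn as nn

class InceptionConv1d(nn.Module):
    # Parallel 1D convolutions with exponentially varying kernel lengths,
    # outputs concatenated along the channel axis.
    def __init__(self, in_ch, out_ch_each):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch_each, k, stride=2, padding=k // 2)
            for k in (2, 4, 8))

    def forward(self, x):                      # x: (batch, channels, time)
        outs = [b(x) for b in self.branches]
        t = min(o.shape[-1] for o in outs)     # align lengths after padding
        return torch.cat([o[..., :t] for o in outs], dim=1)

class DenseGRUStack(nn.Module):
    # Each GRU layer receives the concatenation of the stack input and all
    # earlier GRU outputs (dense feed-forward connections).
    def __init__(self, in_dim, hidden, layers=3):
        super().__init__()
        self.grus = nn.ModuleList()
        dim = in_dim
        for _ in range(layers):
            self.grus.append(nn.GRU(dim, hidden, batch_first=True))
            dim += hidden

    def forward(self, x):                      # x: (batch, time, features)
        feats = [x]
        for gru in self.grus:
            out, _ = gru(torch.cat(feats, dim=-1))
            feats.append(out)
        return feats[-1]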
Quda: Natural Language Queries for Visual Data Analytics
Visualization-oriented natural language interfaces (V-NLIs) have been
explored and developed in recent years. One challenge faced by V-NLIs is
making effective design decisions, which usually requires a deep
understanding of user queries. Learning-based approaches have shown potential
in V-NLIs and reached state-of-the-art performance in various NLP tasks.
However, because of the lack of sufficient training samples that cater to
visual data analytics, cutting-edge techniques have rarely been employed to
facilitate the development of V-NLIs. We present a new dataset, called Quda, to
help V-NLIs understand free-form natural language. Our dataset contains 14,035
diverse user queries annotated with 10 low-level analytic tasks that assist in
the deployment of state-of-the-art techniques for parsing complex human
language. We achieve this goal by first gathering seed queries with data
analysts who are target users of V-NLIs. Then we employ a large crowd workforce
for paraphrase generation and validation. We demonstrate the usefulness of Quda
in building V-NLIs by creating a prototype that makes effective design
decisions for free-form user queries. We also show that Quda can be beneficial
for a wide range of applications in the visualization community by analyzing
the design tasks described in academic publications.
Comment: This work isn't sufficiently exhaustive. We need to do some new work
on this.
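For intuition, a toy sketch of the query-to-task classification such a dataset
enables (a scikit-learn pipeline; the example queries and task labels are
placeholders, not Quda's actual annotation scheme):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training pairs: free-form queries and low-level analytic tasks.
queries = ["show the average price per region",
           "which country has the most users"]
tasks = ["Compute Derived Value", "Find Extremum"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(queries, tasks)
print(clf.predict(["what is the total revenue by month"]))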
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
As deep reinforcement learning driven by visual perception becomes more
widely used, there is a growing need to better understand and probe the learned
agents. Understanding the decision making process and its relationship to
visual inputs can be very valuable to identify problems in learned behavior.
However, this topic has been relatively under-explored in the research
community. In this work we present a method for synthesizing visual inputs of
interest for a trained agent. Such inputs or states could be situations in
which specific actions are necessary. Further, critical states in which a very
high or a very low reward can be achieved are often interesting for
understanding the situational awareness of the system, as they can correspond
to risky states.
To this end, we learn a generative model over the state space of the
environment and use its latent space to optimize a target function for the
state of interest. In our experiments we show that this method can generate
insights for a variety of environments and reinforcement learning methods. We
explore results in the standard Atari benchmark games as well as in an
autonomous driving simulator. Based on the efficiency with which we have been
able to identify behavioural weaknesses with this technique, we believe this
general approach could serve as an important tool for AI safety applications
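A minimal sketch of the latent-space search described above, assuming a
pretrained, frozen generator and a scalar target function such as the agent's
value estimate for a chosen action (both interfaces are assumptions):

import torch

def synthesize_state(generator, target_fn, z_dim, steps=200, lr=0.05):
    # Gradient-ascend the target over the generator's latent space.
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        state = generator(z)        # decode latent into an input state
        loss = -target_fn(state)    # maximize the target of interest
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()    # synthesized state of interest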
SkipNet: Learning Dynamic Routing in Convolutional Networks
While deeper convolutional networks are needed to achieve maximum accuracy in
visual perception tasks, for many inputs shallower networks are sufficient. We
exploit this observation by learning to skip convolutional layers on a
per-input basis. We introduce SkipNet, a modified residual network that uses a
gating network to selectively skip convolutional blocks based on the
activations of the previous layer. We formulate the dynamic skipping problem in
the context of sequential decision making and propose a hybrid learning
algorithm that combines supervised learning and reinforcement learning to
address the challenges of non-differentiable skipping decisions. We show
SkipNet reduces computation by 30-90% while preserving the accuracy of the
original model on four benchmark datasets and outperforms the state-of-the-art
dynamic networks and static compression methods. We also qualitatively evaluate
the gating policy to reveal a relationship between image scale and saliency and
the number of layers skipped.
Comment: ECCV 2018 camera-ready version. Code is available at
https://github.com/ucbdrive/skipne
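A minimal PyTorch sketch of per-input gating around one residual block; the
hard threshold below is exactly the non-differentiable decision that the
paper's hybrid supervised/reinforcement learning procedure is designed to
handle:

import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    def __init__(self, block, channels):
        super().__init__()
        self.block = block                    # any residual function F(x)
        # Tiny gate: pooled activations -> scalar execute probability.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x):
        g = (self.gate(x) > 0.5).float().view(-1, 1, 1, 1)  # 1 = execute
        return g * (x + self.block(x)) + (1 - g) * x        # skip if g = 0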
How is Gaze Influenced by Image Transformations? Dataset and Model
Data size is the bottleneck for developing deep saliency models, because
collecting eye-movement data is very time-consuming and expensive. Most
current studies on human attention and saliency modeling have used high-quality,
stereotyped stimuli. In the real world, however, captured images undergo various
types of transformations. Can we use these transformations to augment existing
saliency datasets? Here, we first create a novel saliency dataset including
fixations of 10 observers over 1900 images degraded by 19 types of
transformations. Second, by analyzing eye movements, we find that observers
look at different locations over transformed versus original images. Third, we
use the new data over transformed images, treating each transformation as a
data augmentation transformation (DAT), to train deep saliency models. We find
that label-preserving DATs with negligible impact on human gaze boost saliency
prediction, whereas other DATs that severely impact human gaze degrade
performance. These valid, label-preserving augmentation transformations provide
a solution to enlarge existing saliency datasets. Finally, we introduce a novel
saliency model based on generative adversarial network (dubbed GazeGAN). A
modified UNet is proposed as the generator of GazeGAN, combining classic skip
connections with a novel center-surround connection (CSC) to leverage
multi-level features. We also propose a histogram loss based
on Alternative Chi Square Distance (ACS HistLoss) to refine the saliency map in
terms of luminance distribution. Extensive experiments and comparisons over 3
datasets indicate that GazeGAN achieves the best performance in terms of
popular saliency evaluation metrics, and is more robust to various
perturbations. Our code and data are available at:
https://github.com/CZHQuality/Sal-CFS-GAN
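For intuition, a sketch of a luminance-histogram loss in the spirit of ACS
HistLoss, assuming saliency maps normalized to [0, 1]; the paper's exact
'alternative' chi-square form may differ:

import torch

def soft_histogram(x, bins=32):
    # Soft-binned luminance histogram built from triangular kernels.
    centers = torch.linspace(0, 1, bins, device=x.device)
    width = 1.0 / (bins - 1)
    w = (1 - (x.flatten().unsqueeze(1) - centers).abs() / width).clamp(min=0)
    h = w.sum(0)
    return h / h.sum()

def chi_square_hist_loss(pred, target, bins=32):
    # Symmetric chi-square distance between the two luminance histograms.
    p, q = soft_histogram(pred, bins), soft_histogram(target, bins)
    return (2 * (p - q).pow(2) / (p + q + 1e-8)).sum()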
How to improve the interpretability of kernel learning
In recent years, machine learning researchers have focused on methods to
construct flexible and interpretable prediction models. However, three issues
remain open: how to evaluate interpretability, how interpretability relates to
generalization performance, and how to improve interpretability. In this paper,
a quantitative index of interpretability is proposed and its rationality is
proved, and the equilibrium problem between interpretability and generalization
performance is analyzed. A probability upper bound on the sum of the two
performances is derived. For the traditional supervised kernel learning
problem, a universal learning framework is put forward to solve this
equilibrium problem. The condition for a globally optimal solution under the
framework is deduced. The learning framework is applied to the least-squares
support vector machine and is evaluated in several experiments.
Comment: arXiv admin note: text overlap with arXiv:1811.0774
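As background, a minimal least-squares SVM fit via its KKT linear system
(plain NumPy baseline, not the paper's interpretability-regularized
framework):

import numpy as np

def lssvm_fit(X, y, gamma=1.0, sigma=1.0):
    # RBF kernel matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    n = len(y)
    # LS-SVM dual KKT system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]          # alpha, bias b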