ProbAct: A Probabilistic Activation Function for Deep Neural Networks
Activation functions play an important role in training artificial neural
networks. The majority of activation functions in current use are deterministic
in nature, with a fixed input-output relationship. In this work, we propose
a novel probabilistic activation function, called ProbAct. ProbAct is
decomposed into a mean and a variance, and the output value is sampled from the
resulting distribution, making ProbAct a stochastic activation function. The
values of the mean and variance can be fixed using known functions or trained for
each element. In the trainable ProbAct, the mean and the variance of the
activation distribution are trained within the back-propagation framework
alongside other parameters. We show that the stochastic perturbation induced
through ProbAct acts as a viable generalization technique for feature
augmentation. In our experiments, we compare ProbAct with well-known activation
functions on classification tasks on different modalities: Images (CIFAR-10,
CIFAR-100, and STL-10) and Text (Large Movie Review). We show that ProbAct
increases the classification accuracy by 2-3% compared to ReLU or other
conventional activation functions on both original datasets and when datasets
are reduced to 50% and 25% of the original size. Finally, we show that ProbAct
learns an ensemble of models by itself, which can be used to estimate the
uncertainty associated with a prediction, and provides robustness to noisy
inputs.
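A minimal PyTorch sketch of such a stochastic activation follows; the ReLU-based mean, the single shared sigma parameter, and the deterministic test-time behaviour are our assumptions for illustration, not the paper's exact formulation:

import torch
import torch.nn as nn

class ProbAct(nn.Module):
    """Stochastic activation: the pre-activation supplies the mean and a
    trainable parameter supplies the standard deviation; the output is a
    sample from the resulting Gaussian (reparameterized for backprop)."""
    def __init__(self):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(1))  # trainable log std

    def forward(self, x):
        mu = torch.relu(x)                   # mean of the output distribution
        if not self.training:
            return mu                        # deterministic at test time
        sigma = torch.exp(self.log_sigma)    # positive standard deviation
        eps = torch.randn_like(mu)           # reparameterization trick
        return mu + sigma * eps              # gradients flow to log_sigma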
Neural Network Multitask Learning for Traffic Flow Forecasting
Traditional neural network approaches for traffic flow forecasting are
usually single task learning (STL) models, which do not take advantage of the
information provided by related tasks. In contrast to STL, multitask learning
(MTL) has the potential to improve generalization by transferring information
in training signals of extra tasks. In this paper, MTL based neural networks
are used for traffic flow forecasting. For neural network MTL, a
backpropagation (BP) network is constructed by incorporating traffic flows at
several contiguous time instants into an output layer. Nodes in the output
layer can be seen as outputs of different but closely related STL tasks.
Comprehensive experiments on urban vehicular traffic flow data and comparisons
with STL show that MTL in BP neural networks is a promising and effective
approach for traffic flow forecasting.
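As a hedged illustration, the construction described above amounts to a single network with one output node per forecasting horizon; the layer sizes, the sigmoid nonlinearity, and the loss below are assumptions made for the sketch:

import torch
import torch.nn as nn

class MTLTrafficNet(nn.Module):
    """One BP network whose output layer holds traffic flows at several
    contiguous time instants, so each output node is one STL task."""
    def __init__(self, n_lags=12, n_hidden=64, n_tasks=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_lags, n_hidden),   # shared hidden representation
            nn.Sigmoid(),                  # classic BP-era nonlinearity
            nn.Linear(n_hidden, n_tasks),  # outputs for t+1 .. t+n_tasks
        )

    def forward(self, x):
        return self.net(x)

# Training minimizes the summed MSE over all tasks, so gradients from
# related horizons regularize the shared hidden layer.
model = MTLTrafficNet()
loss_fn = nn.MSELoss()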
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
In this paper we introduce a generative parametric model capable of producing
high quality samples of natural images. Our approach uses a cascade of
convolutional networks within a Laplacian pyramid framework to generate images
in a coarse-to-fine fashion. At each level of the pyramid, a separate
generative convnet model is trained using the Generative Adversarial Nets (GAN)
approach (Goodfellow et al.). Samples drawn from our model are of significantly
higher quality than those from alternative approaches. In a quantitative assessment by human
evaluators, our CIFAR10 samples were mistaken for real images around 40% of the
time, compared to 10% for samples drawn from a GAN baseline model. We also show
samples from models trained on the higher resolution images of the LSUN scene
dataset.
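The coarse-to-fine sampling procedure can be sketched as follows; the generator call signature g(z, up) and the bilinear upsampling are illustrative assumptions rather than the paper's exact interfaces:

import torch
import torch.nn.functional as F

def lapgan_sample(generators, z_list, low_res):
    """Coarse-to-fine sampling through a Laplacian pyramid of GANs: each
    generator predicts a residual (high-frequency detail) that is added to
    the upsampled image from the coarser level."""
    img = low_res  # coarsest sample, e.g. from an unconditional GAN
    for g, z in zip(generators, z_list):
        up = F.interpolate(img, scale_factor=2, mode='bilinear',
                           align_corners=False)  # expand to next level
        residual = g(z, up)                      # conditional convnet generator
        img = up + residual                      # reconstruct finer image
    return img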
Recursive Autoconvolution for Unsupervised Learning of Convolutional Neural Networks
In visual recognition tasks, such as image classification, unsupervised
learning exploits cheap unlabeled data and can help to solve these tasks more
efficiently. We show that the recursive autoconvolution operator, adopted from
physics, boosts existing unsupervised methods by learning more discriminative
filters. We take well established convolutional neural networks and train their
filters layer-wise. In addition, building on previous work, we design a network
which extracts more than 600k features per sample, but with the total number of
trainable parameters greatly reduced by introducing shared filters in higher
layers. We evaluate our networks on the MNIST, CIFAR-10, CIFAR-100 and STL-10
image classification benchmarks and report several state-of-the-art results
among unsupervised methods. Comment: 8 pages, accepted to the International
Joint Conference on Neural Networks (IJCNN 2017).
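A rough numpy sketch of the operator, assuming the standard frequency-domain definition of autoconvolution; the normalization shown is a simplification, and the paper additionally crops and resizes between recursive steps:

import numpy as np

def autoconv(x):
    """One autoconvolution step: convolve a patch with itself via the
    convolution theorem (pointwise product of its spectrum with itself)."""
    x = (x - x.mean()) / (x.std() + 1e-8)  # zero mean, unit variance
    f = np.fft.fft2(x)
    return np.real(np.fft.ifft2(f * f))    # x convolved with x (circular)

def recursive_autoconv(x, order=2):
    """Apply autoconvolution `order` times to obtain higher-order patches."""
    for _ in range(order):
        x = autoconv(x)
    return x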
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach
With the advent of Big Data, databases containing large quantities of similar
time series are now available in many applications. Forecasting time series in
these domains with traditional univariate forecasting procedures leaves great
potential for producing accurate forecasts untapped. Recurrent
neural networks (RNNs), and in particular Long Short-Term Memory (LSTM)
networks, have proven recently that they are able to outperform
state-of-the-art univariate time series forecasting methods in this context
when trained across all available time series. However, if the time series
database is heterogeneous, accuracy may degrade, so that on the way towards
fully automatic forecasting methods in this space, a notion of similarity
between the time series needs to be built into the methods. To this end, we
present a prediction model that can be used with different types of RNN models
on subgroups of similar time series, which are identified by time series
clustering techniques. We assess our proposed methodology using LSTM networks,
a widely popular RNN variant. Our method achieves competitive results on
benchmarking datasets under competition evaluation procedures. In particular,
in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM
model and beats all other methods on the CIF2016 forecasting competition
dataset.
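A hedged sketch of the pipeline: the summary features below stand in for the paper's time-series clustering techniques, and in practice one LSTM would then be trained across each resulting subgroup:

import numpy as np
from sklearn.cluster import KMeans

def group_similar_series(series_list, n_clusters=3):
    """Cluster series on simple summary features (mean, std, length); the
    paper uses richer time-series clustering, so these features are only
    a stand-in. One RNN/LSTM is then trained per subgroup."""
    feats = np.array([[s.mean(), s.std(), len(s)] for s in series_list])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return [
        [s for s, l in zip(series_list, labels) if l == c]
        for c in range(n_clusters)
    ]

# Usage: each subgroup feeds a separate LSTM trained across all of its series.
groups = group_similar_series([np.random.rand(50) for _ in range(20)])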
Activation Ensembles for Deep Neural Networks
Many activation functions have been proposed in the past, but selecting an
adequate one requires trial and error. We propose a new methodology of
designing activation functions within a neural network at each layer. We call
this technique an "activation ensemble" because it allows the use of multiple
activation functions at each layer. This is done by introducing additional
variables, α, at each activation layer of a network to allow multiple
activation functions to be active at each neuron. By design, activations with
larger α values at a neuron have the largest magnitude. Hence, those
higher-magnitude activations are "chosen" by
the network. We implement the activation ensembles on a variety of datasets
using an array of Feed Forward and Convolutional Neural Networks. By using the
activation ensemble, we achieve superior results compared to traditional
techniques. In addition, because of the flexibility of this methodology, we
explore activation functions and the features they capture in greater depth.
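One plausible PyTorch rendering of the idea follows; the softmax normalization of the α weights and the particular set of candidate functions are our assumptions:

import torch
import torch.nn as nn

class ActivationEnsemble(nn.Module):
    """Several candidate activations combined per layer through trainable
    weights alpha, one weight per (function, unit) pair."""
    def __init__(self, n_units, fns=(torch.relu, torch.tanh, torch.sigmoid)):
        super().__init__()
        self.fns = fns
        self.alpha = nn.Parameter(torch.zeros(len(fns), n_units))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)  # weights sum to 1 per unit
        return sum(w[i] * f(x) for i, f in enumerate(self.fns))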
Non-linear Multitask Learning with Deep Gaussian Processes
We present a multi-task learning formulation for Deep Gaussian processes
(DGPs), through non-linear mixtures of latent processes. The latent space is
composed of private processes that capture within-task information and shared
processes that capture across-task dependencies. We propose two different
methods for segmenting the latent space: through hard coding shared and
task-specific processes or through soft sharing with Automatic Relevance
Determination kernels. We show that our formulation is able to improve the
learning performance and transfer information between the tasks, outperforming
other probabilistic multi-task learning models across real-world and
benchmark settings.
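As a toy numpy illustration of the latent-space split (a simplification: the mixing here is linear and the draws are shallow GP samples, whereas the paper uses non-linear mixtures within a deep GP):

import numpy as np

def sample_task_outputs(x, n_tasks=2, n_shared=2, n_private=1, rng=None):
    """Each task mixes shared latent processes (across-task information)
    with its own private ones (within-task information)."""
    if rng is None:
        rng = np.random.default_rng(0)
    def gp_draw():
        # crude GP sample with an RBF kernel
        K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
        return rng.multivariate_normal(np.zeros(len(x)), K + 1e-6 * np.eye(len(x)))
    shared = [gp_draw() for _ in range(n_shared)]
    outputs = []
    for _ in range(n_tasks):
        private = [gp_draw() for _ in range(n_private)]
        w = rng.normal(size=n_shared + n_private)  # task-specific mixing weights
        outputs.append(sum(wi * f for wi, f in zip(w, shared + private)))
    return outputs

curves = sample_task_outputs(np.linspace(0.0, 5.0, 40))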
Unsupervised Deep Embedding for Clustering Analysis
Clustering is central to many data-driven application domains and has been
studied extensively in terms of distance functions and grouping algorithms.
Relatively little work has focused on learning representations for clustering.
In this paper, we propose Deep Embedded Clustering (DEC), a method that
simultaneously learns feature representations and cluster assignments using
deep neural networks. DEC learns a mapping from the data space to a
lower-dimensional feature space in which it iteratively optimizes a clustering
objective. Our experimental evaluations on image and text corpora show
significant improvement over state-of-the-art methods. Comment: ICML 2016.
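The iteratively optimized clustering objective can be sketched with the soft-assignment and target-distribution formulas from the DEC paper (the numpy rendering below is ours):

import numpy as np

def soft_assignments(z, centroids, alpha=1.0):
    """Soft cluster assignments q: a Student's t kernel between embedded
    points z and cluster centroids, as in DEC."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p = q^2 / cluster frequency, renormalized; training
    minimizes KL(P || Q) while backpropagating into the encoder."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)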
Better AI through Logical Scaffolding
We describe the concept of logical scaffolds, which can be used to improve
the quality of software that relies on AI components. We explain how some of
the existing ideas on runtime monitors for perception systems can be seen as a
specific instance of logical scaffolds. Furthermore, we describe how logical
scaffolds may be useful for improving AI programs beyond perception systems, to
include general prediction systems and agent behavior models. Comment: CAV
Workshop on Formal Methods for ML-enabled Autonomous Systems, 2019.
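As a hypothetical illustration of a logical scaffold acting as a runtime monitor for a perception system (the consistency rule and threshold below are invented for the example, not taken from the paper):

def monitor_detections(prev_frame, curr_frame, max_new=5):
    """Temporal-consistency monitor over sets of tracked object IDs: flags
    detections that appear or vanish implausibly fast between frames."""
    new = curr_frame - prev_frame    # objects absent a frame ago
    lost = prev_frame - curr_frame   # objects that just vanished
    violations = []
    if len(new) > max_new:
        violations.append(f"{len(new)} objects appeared at once")
    if prev_frame and lost == prev_frame:
        violations.append("all tracked objects vanished simultaneously")
    return violations

# Usage with sets of tracked object IDs per frame:
print(monitor_detections({1, 2, 3}, {4, 5, 6, 7, 8, 9}))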
Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
We perform scalable approximate inference in a continuous-depth Bayesian
neural network family. In this model class, uncertainty about separate weights
in each layer gives hidden units that follow a stochastic differential
equation. We demonstrate gradient-based stochastic variational inference in
this infinite-parameter setting, producing arbitrarily-flexible approximate
posteriors. We also derive a novel gradient estimator that approaches zero
variance as the approximate posterior over weights approaches the true
posterior. This approach brings continuous-depth Bayesian neural nets into
competitive comparison with discrete-depth alternatives, while inheriting
the memory-efficient training and tunable precision of Neural ODEs.
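A toy Euler-Maruyama sketch of hidden units following a stochastic differential equation dh = f(h, t) dt + g(h, t) dW; the drift and diffusion callables stand in for networks with uncertain weights and are not the paper's implementation:

import torch

def sde_hidden_trajectory(h0, drift, diffusion, t1=1.0, n_steps=20):
    """Integrate the hidden-unit SDE forward in 'depth' with the
    Euler-Maruyama scheme."""
    h = h0
    dt = t1 / n_steps
    for i in range(n_steps):
        t = i * dt
        dw = torch.randn_like(h) * dt ** 0.5  # Brownian increment
        h = h + drift(h, t) * dt + diffusion(h, t) * dw
    return h

# Usage with simple placeholder dynamics:
h = sde_hidden_trajectory(torch.zeros(4),
                          drift=lambda h, t: -h,
                          diffusion=lambda h, t: 0.1 * torch.ones_like(h))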