Discontinuous Piecewise Polynomial Neural Networks
An artificial neural network is presented based on the idea of connections
between units that are only active for a specific range of input values and
zero outside that range (and so are not evaluated outside the active range).
The connection function is represented by a polynomial with compact support.
The finite range of activation allows for great activation sparsity in the
network and means that, in principle, computational capacity can be added to
the network without increasing the time required to evaluate the network for a
given input. The polynomial order ranges from first to fifth. Unit dropout is
used for regularization and a parameter-free weight update is used. Moving from
piecewise linear to piecewise quadratic connections improves performance, and
moving to higher-order polynomials improves it further. The algorithm is tested
on the MAGIC gamma-ray data set as well as the MNIST data set.
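A minimal sketch of the core idea (not the paper's implementation): a connection function represented by a polynomial with compact support is identically zero outside its active range, so inactive connections contribute nothing and need not be evaluated. The interval [-1, 1] and the quadratic bump factor are illustrative assumptions:

```python
import numpy as np

def poly_connection(x, coeffs=(1.0,), support=(-1.0, 1.0)):
    """Polynomial connection with compact support: exactly zero outside [a, b].

    Inside the support the value is a polynomial in the rescaled input times a
    bump factor (1 - t**2) that vanishes at the boundary; outside, the
    connection is inactive and returns zero.
    """
    a, b = support
    x = np.asarray(x, dtype=float)
    # Rescale the input so the support interval maps onto [-1, 1].
    t = 2.0 * (x - a) / (b - a) - 1.0
    inside = (x >= a) & (x <= b)
    # coeffs are in ascending order; np.polyval wants descending.
    val = np.polyval(coeffs[::-1], t) * (1.0 - t**2)
    return np.where(inside, val, 0.0)

x = np.linspace(-2, 2, 9)
y = poly_connection(x)   # zero wherever |x| > 1
```

Because the output is exactly zero (not merely small) outside the support, a sparse evaluator can skip inactive connections entirely, which is what allows capacity to grow without growing per-input cost.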
Multi-parametric Solution-path Algorithm for Instance-weighted Support Vector Machines
An instance-weighted variant of the support vector machine (SVM) has
attracted considerable attention recently since it is useful in various
machine learning tasks such as non-stationary data analysis, heteroscedastic
data modeling, transfer learning, learning to rank, and transduction. An
important challenge in these scenarios is to overcome the computational
bottleneck---instance weights often change dynamically or adaptively, and thus
the weighted SVM solutions must be repeatedly computed. In this paper, we
develop an algorithm that can efficiently and exactly update the weighted SVM
solutions for arbitrary change of instance weights. Technically, this
contribution can be regarded as an extension of the conventional solution-path
algorithm for a single regularization parameter to multiple instance-weight
parameters. However, this extension gives rise to a significant problem that
breakpoints (at which the solution path turns) have to be identified in
high-dimensional space. To facilitate this, we introduce a parametric
representation of instance weights. We also provide a geometric interpretation
in weight space using a notion of critical region: a polyhedron in which the
current affine solution remains optimal. Then we find breakpoints at
intersections of the solution path and boundaries of polyhedrons. Through
extensive experiments on various practical applications, we demonstrate the
usefulness of the proposed algorithm.
Comment: Submitted to the Journal of Machine Learning Research.
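To illustrate the parametric representation of instance weights, the sketch below traces solutions along a line segment w(theta) between two weight vectors. It uses weighted ridge regression as a stand-in for the weighted SVM (ridge has a closed-form weighted solution, which keeps the example short); the data, step count, and regularizer are illustrative assumptions:

```python
import numpy as np

def weighted_ridge(X, y, w, lam=1e-2):
    """Closed-form solution of instance-weighted ridge regression
    (a stand-in for the weighted SVM of the abstract)."""
    W = np.diag(w)
    d = X.shape[1]
    return np.linalg.solve(X.T @ W @ X + lam * np.eye(d), X.T @ W @ y)

def solution_path(X, y, w_start, w_end, n_steps=5):
    """Trace solutions along the segment w(theta) = (1-theta)*w_start +
    theta*w_end, the one-parameter representation of the weight change."""
    path = []
    for theta in np.linspace(0.0, 1.0, n_steps):
        w = (1.0 - theta) * w_start + theta * w_end
        path.append(weighted_ridge(X, y, w))
    return np.array(path)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
betas = solution_path(X, y, np.ones(20), np.linspace(0.1, 2.0, 20))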
MgNet: A Unified Framework of Multigrid and Convolutional Neural Network
We develop a unified model, known as MgNet, that simultaneously recovers some
convolutional neural networks (CNN) for image classification and multigrid (MG)
methods for solving discretized partial differential equations (PDEs). This
model is based on close connections that we have observed and uncovered between
the CNN and MG methodologies. For example, pooling operation and feature
extraction in CNN correspond directly to restriction operation and iterative
smoothers in MG, respectively. As the solution space is often the dual of the
data space in PDEs, the analogous concept of feature space and data space
(which are dual to each other) is introduced in CNN. With these connections
and the new concept in the unified model, the role of the various convolution
and pooling operations used in CNN can be better understood. As a result,
modified CNN models (with fewer weights and hyperparameters) are developed
that exhibit competitive and sometimes better performance in comparison with
existing CNN models when applied to both the CIFAR-10 and CIFAR-100 data sets.
Comment: 30 pages.
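The pooling/restriction correspondence mentioned above can be made concrete in a few lines: on a regular grid, the standard multigrid restriction by cell averaging is exactly average pooling with stride 2. This is a generic illustration, not MgNet code:

```python
import numpy as np

def restriction(u):
    """Multigrid-style restriction of a fine-grid field by 2x2 cell averaging.

    This coincides with 2x2 average pooling (stride 2), the CNN operation
    the abstract identifies with restriction.
    """
    h, w = u.shape
    # Trim odd edges, group into 2x2 cells, and average each cell.
    return u[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fine = np.arange(16.0).reshape(4, 4)
coarse = restriction(fine)   # a 2x2 coarse-grid field
```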
Extensions of Morse-Smale Regression with Application to Actuarial Science
The problem of subgroups is ubiquitous in scientific research (e.g., disease
heterogeneity, spatial distributions in ecology), and piecewise regression
is one way to deal with this phenomenon. Morse-Smale regression offers a way to
partition the regression function based on level sets of a defined function and
that function's basins of attraction. This topologically-based piecewise
regression algorithm has shown promise in its initial applications, but the
current implementation in the literature has been limited to elastic net and
generalized linear regression. It is possible that nonparametric methods, such
as random forest or conditional inference trees, may provide better prediction
and insight through modeling interaction terms and other nonlinear
relationships between predictors and a given outcome.
This study explores the use of several machine learning algorithms within a
Morse-Smale piecewise regression framework, including boosted regression with
linear base learners, homotopy-based LASSO, conditional inference trees, random
forest, and a wide neural network framework called extreme learning machines.
Simulations on Tweedie regression problems with varying Tweedie parameter and
dispersion suggest that many machine learning approaches to Morse-Smale
piecewise regression improve the original algorithm's performance, particularly
for outcomes with lower dispersion and linear or a mix of linear and nonlinear
predictor relationships. On a real actuarial problem, several of these new
algorithms perform as well as or better than the original Morse-Smale
regression algorithm, and most provide information on the nature of predictor
relationships within each partition, offering insight into differences between
dataset partitions.
Comment: 14 pages, 10 figures.
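A toy sketch of piecewise regression over a partition (not the Morse-Smale implementation): the data are split at a fixed boundary standing in for a basin-of-attraction boundary of the regression function, and a separate linear model is fit on each cell. The V-shaped target, boundary at 0, and noise level are illustrative assumptions:

```python
import numpy as np

def fit_linear(x, y):
    """Ordinary least squares for y ~ a*x + b; returns (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def piecewise_fit(x, y, boundary=0.0):
    """Fit one linear model per partition cell; the split point stands in
    for a Morse-Smale basin boundary of the regression function."""
    models = {}
    for name, mask in [("left", x < boundary), ("right", x >= boundary)]:
        models[name] = fit_linear(x[mask], y[mask])
    return models

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.abs(x) + 0.05 * rng.normal(size=200)   # V-shape: one basin per side
models = piecewise_fit(x, y)
```

Each per-cell model recovers the local slope (-1 on the left, +1 on the right) that a single global linear fit would average away, which is the point of partition-based regression.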
A Tropical Approach to Neural Networks with Piecewise Linear Activations
We present a new, unifying approach following some recent developments on the
complexity of neural networks with piecewise linear activations. We treat
neural network layers with piecewise linear activations as tropical
polynomials, which generalize polynomials in the so-called (max, +), or
tropical, algebra, with possibly real-valued exponents. Motivated by the
discussion in (arXiv:1402.1869), this approach enables us to refine their upper
bounds on linear regions of layers with ReLU or leaky ReLU activations to
the sum over j = 0, ..., n of the binomial coefficients C(m, j), where n and m
are the number of inputs and outputs, respectively. Additionally, we recover their
upper bounds on maxout layers. Our work follows a novel path, exclusively under
the lens of tropical geometry, which is independent of the improvements
reported in (arXiv:1611.01491, arXiv:1711.02114). Finally, we present a
geometric approach for effective counting of linear regions using random
sampling in order to avoid the computational overhead of exact counting
approaches.
Comment: v2: Removed morphological perceptron section and added vertex
sampling section. Updated references. 18 pages, 7 figures.
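A minimal sketch of the random-sampling idea for counting linear regions (illustrative, not the paper's procedure): sample inputs, record each point's ReLU activation pattern, and count distinct patterns. This gives a lower bound on the true region count; the layer sizes and sampling box are assumptions:

```python
import numpy as np

def count_regions_sampled(W, b, n_samples=20000, scale=3.0, seed=0):
    """Estimate the number of linear regions of x -> relu(W @ x + b) by
    sampling inputs and counting distinct ReLU on/off patterns.

    Each distinct pattern corresponds to one linear region, so the sampled
    count is a lower bound on the true count."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-scale, scale, size=(n_samples, W.shape[1]))
    patterns = (X @ W.T + b > 0)
    return len({tuple(p) for p in patterns})

rng = np.random.default_rng(0)
n, m = 2, 4                       # number of inputs and outputs
W = rng.normal(size=(m, n))
b = rng.normal(size=m)
regions = count_regions_sampled(W, b)
```

For a single ReLU layer the count can never exceed the hyperplane-arrangement bound (the sum of C(m, j) for j = 0, ..., n), so the estimate is easy to sanity-check.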
Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.
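One common family of techniques for characterizing DNN behavior in this setting is bound propagation. The sketch below shows interval bound propagation through a small random two-layer ReLU network; the network itself is an illustrative assumption, not a system from the survey:

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> W @ x + b exactly."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps boxes to boxes elementwise."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

# Sound output bounds for every input in the box [-0.1, 0.1]^3.
lo, hi = -0.1 * np.ones(3), 0.1 * np.ones(3)
lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)
```

The bounds are sound but conservative; verifying a specification then amounts to checking that the unsafe set does not intersect [lo, hi].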
Predicting Nearly As Well As the Optimal Twice Differentiable Regressor
We study nonlinear regression of real valued data in an individual sequence
manner, where we provide results that are guaranteed to hold without any
statistical assumptions. We address the convergence and undertraining issues of
conventional nonlinear regression methods and introduce an algorithm that
elegantly mitigates these issues via an incremental hierarchical structure,
(i.e., via an incremental decision tree). Particularly, we present a piecewise
linear (or nonlinear) regression algorithm that partitions the regressor space
in a data driven manner and learns a linear model at each region. Unlike the
conventional approaches, our algorithm gradually increases the number of
disjoint partitions on the regressor space in a sequential manner according to
the observed data. Through this data driven approach, our algorithm
sequentially and asymptotically achieves the performance of the optimal twice
differentiable regression function for any data sequence with an unknown and
arbitrary length. The computational complexity of the introduced algorithm is
only logarithmic in the data length under certain regularity conditions. We
provide an explicit description of the algorithm and demonstrate significant
gains on well-known real benchmark data sets and chaotic signals.
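A toy version of the data-driven partition refinement (not the paper's incremental tree): fit a separate linear model on each of 2**depth equal-width cells of [0, 1], and grow `depth` as more data arrives. The target function, sample size, and cell layout are illustrative assumptions:

```python
import numpy as np

def piecewise_linear_predict(x_train, y_train, x_query, depth):
    """Piecewise linear regression on 2**depth equal-width cells of [0, 1].

    Doubling `depth` as data accumulates mimics the sequential refinement
    of the partition: each cell gets its own least-squares linear model."""
    edges = np.linspace(0.0, 1.0, 2**depth + 1)
    preds = np.empty_like(x_query)
    for a, b in zip(edges[:-1], edges[1:]):
        m = (x_train >= a) & (x_train <= b)
        A = np.column_stack([x_train[m], np.ones(m.sum())])
        coef, *_ = np.linalg.lstsq(A, y_train[m], rcond=None)
        q = (x_query >= a) & (x_query <= b)
        preds[q] = coef[0] * x_query[q] + coef[1]
    return preds

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=400)
# Finer partitions track the smooth nonlinearity better.
mse = {d: np.mean((piecewise_linear_predict(x, y, x, d) - y) ** 2)
       for d in (0, 1, 3)}
```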
Learning Halfspaces and Neural Networks with Random Initialization
We study non-convex empirical risk minimization for learning halfspaces and
neural networks. For loss functions that are L-Lipschitz continuous, we
present algorithms to learn halfspaces and multi-layer neural networks that
achieve arbitrarily small excess risk ε > 0. The time complexity is
polynomial in the input dimension d and the sample size n, but exponential
in a quantity that depends only on L and 1/ε. These algorithms run multiple
rounds of random initialization followed by arbitrary optimization steps. We
further show that if the data is separable by some neural network with constant
margin γ > 0, then there is a polynomial-time algorithm for learning a
neural network that separates the training data with margin Ω(γ).
As a consequence, the algorithm achieves arbitrary generalization error ε > 0
with polynomially bounded sample and time complexity. We
establish the same learnability result when the labels are randomly flipped
with probability η < 1/2.
Comment: 31 pages.
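The "multiple rounds of random initialization followed by optimization" scheme can be sketched for a halfspace with logistic loss (an L-Lipschitz surrogate). The round count, step size, and data below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def logistic_risk(w, X, y):
    """Empirical risk under the (Lipschitz) logistic loss."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def learn_halfspace(X, y, rounds=10, steps=200, lr=0.5, seed=0):
    """Run several rounds of random initialization, each followed by plain
    gradient descent; keep the lowest-risk halfspace found."""
    rng = np.random.default_rng(seed)
    best_w, best_risk = None, np.inf
    for _ in range(rounds):
        w = rng.normal(size=X.shape[1])
        for _ in range(steps):
            # Negative derivative of the logistic loss w.r.t. the margin.
            s = 1.0 / (1.0 + np.exp(y * (X @ w)))
            w += lr * np.mean((s * y)[:, None] * X, axis=0)
        risk = logistic_risk(w, X, y)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]))   # linearly separable labels
w_hat, risk = learn_halfspace(X, y)
```

Restarting from fresh random points is what lets the overall procedure escape bad local behavior of the non-convex objective, at the cost of the exponential factor in the runtime.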
Learning Compressed Transforms with Low Displacement Rank
The low displacement rank (LDR) framework for structured matrices represents
a matrix through two displacement operators and a low-rank residual. Existing
use of LDR matrices in deep learning has applied fixed displacement operators
encoding forms of shift invariance akin to convolutions. We introduce a class
of LDR matrices with more general displacement operators, and explicitly learn
over both the operators and the low-rank component. This class generalizes
several previous constructions while preserving compression and efficient
computation. We prove bounds on the VC dimension of multi-layer neural networks
with structured weight matrices and show empirically that our compact
parameterization can reduce the sample complexity of learning. When replacing
weight layers in fully-connected, convolutional, and recurrent neural networks
for image classification and language modeling tasks, our new classes exceed
the accuracy of existing compression approaches, and on some tasks also
outperform general unstructured layers while using more than 20x fewer
parameters.
Comment: NeurIPS 2018. Code available at
https://github.com/HazyResearch/structured-net
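To see what "representing a matrix through displacement operators and a low-rank residual" means, the sketch below computes the Stein displacement M - Z M Z^T with Z the lower shift matrix: for any Toeplitz matrix the result has rank at most 2, so O(n) parameters suffice. This is a textbook LDR fact, not code from the paper:

```python
import numpy as np

def displacement(M):
    """Stein displacement M - Z @ M @ Z.T with Z the lower shift matrix.

    For a Toeplitz matrix the displacement is nonzero only in the first
    row and first column, hence has rank at most 2 -- the low-rank
    residual that makes the LDR representation compact."""
    n = M.shape[0]
    Z = np.eye(n, k=-1)   # shifts a vector down by one position
    return M - Z @ M @ Z.T

# Build a 6x6 Toeplitz matrix with entries T[i, j] = i - j.
n = 6
t = np.arange(-(n - 1), n, dtype=float)
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])
rank = np.linalg.matrix_rank(displacement(T))
```

Learning the operators themselves, as the paper proposes, generalizes this fixed-shift construction while keeping the residual rank (and hence the parameter count) low.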
Deep Frank-Wolfe For Neural Network Optimization
Learning a deep neural network requires solving a challenging optimization
problem: it is a high-dimensional, non-convex and non-smooth minimization
problem with a large number of terms. The current practice in neural network
optimization is to rely on the stochastic gradient descent (SGD) algorithm or
its adaptive variants. However, SGD requires a hand-designed schedule for the
learning rate. In addition, its adaptive variants tend to produce solutions
that generalize less well on unseen data than SGD with a hand-designed
schedule. We present an optimization method that offers empirically the best of
both worlds: our algorithm yields good generalization performance while
requiring only one hyper-parameter. Our approach is based on a composite
proximal framework, which exploits the compositional nature of deep neural
networks and can leverage powerful convex optimization algorithms by design.
Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes
an optimal step-size in closed-form at each time-step. We further show that the
descent direction is given by a simple backward pass in the network, yielding
the same computational cost per iteration as SGD. We present experiments on the
CIFAR and SNLI data sets, where we demonstrate the significant superiority of
our method over Adam, Adagrad, as well as the recently proposed BPGrad and
AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed
learning rate schedule, and show that it provides similar generalization while
converging faster. The code is publicly available at
https://github.com/oval-group/dfw.
Comment: Published as a conference paper at ICLR 2019.
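The closed-form step size that Frank-Wolfe admits can be seen on a small example. The sketch below runs FW with exact line search on a quadratic over the probability simplex, standing in for the SVM subproblem the method actually solves; the objective and dimensions are illustrative assumptions:

```python
import numpy as np

def frank_wolfe_simplex(Q, c, steps=50):
    """Frank-Wolfe on f(x) = 0.5*x'Qx + c'x over the probability simplex.

    The linear subproblem is solved by picking the best simplex vertex, and
    the step size comes from exact line search in closed form, as in the
    closed-form step-size computation described above."""
    n = len(c)
    x = np.ones(n) / n
    for _ in range(steps):
        g = Q @ x + c                      # gradient at the current iterate
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0              # vertex minimizing the linearization
        d = s - x
        denom = d @ Q @ d
        # Optimal step on [0, 1] for a convex quadratic, in closed form.
        gamma = 1.0 if denom <= 0 else float(np.clip(-(g @ d) / denom, 0.0, 1.0))
        x = x + gamma * d
    return x

Q = np.diag([1.0, 2.0, 4.0])
c = np.array([-1.0, -1.0, -1.0])
x_opt = frank_wolfe_simplex(Q, c)
```

Every iterate stays a convex combination of vertices, so feasibility never needs a projection step; the per-iteration cost is one gradient plus an argmin, mirroring the SGD-like cost the abstract claims.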