Discontinuous Piecewise Polynomial Neural Networks
An artificial neural network is presented based on the idea of connections
between units that are only active for a specific range of input values and
zero outside that range (and so are not evaluated outside the active range).
The connection function is represented by a polynomial with compact support.
The finite range of activation allows for great activation sparsity in the
network and means that, in principle, computational capacity can be added to
the network without increasing the time required to evaluate the network for a
given input. The polynomial order ranges from first to fifth. Unit dropout is
used for regularization and a parameter-free weight update is used. Moving from
piecewise linear to piecewise quadratic connections improves performance, and
moving to higher-order polynomials improves it further. The algorithm is tested
on the MAGIC gamma-ray data set as well as the MNIST data set.
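A minimal sketch of the core idea (not the paper's implementation): a connection function represented by a polynomial with compact support is identically zero outside its active range, so inactive connections contribute nothing and need not be evaluated. The interval [-1, 1] and the quadratic bump factor are illustrative assumptions:

```python
import numpy as np

def poly_connection(x, coeffs=(1.0,), support=(-1.0, 1.0)):
    """Polynomial connection with compact support: exactly zero outside [a, b].

    Inside the support the value is a polynomial in the rescaled input times a
    bump factor (1 - t**2) that vanishes at the boundary; outside, the
    connection is inactive and returns zero.
    """
    a, b = support
    x = np.asarray(x, dtype=float)
    # Rescale the input so the support interval maps onto [-1, 1].
    t = 2.0 * (x - a) / (b - a) - 1.0
    inside = (x >= a) & (x <= b)
    # coeffs are in ascending order; np.polyval wants descending.
    val = np.polyval(coeffs[::-1], t) * (1.0 - t**2)
    return np.where(inside, val, 0.0)

x = np.linspace(-2, 2, 9)
y = poly_connection(x)   # zero wherever |x| > 1
```

Because the output is exactly zero (not merely small) outside the support, a sparse evaluator can skip inactive connections entirely, which is what allows capacity to grow without growing per-input cost.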
Multi-parametric Solution-path Algorithm for Instance-weighted Support Vector Machines
An instance-weighted variant of the support vector machine (SVM) has
attracted considerable attention recently since it is useful in various
machine learning tasks such as non-stationary data analysis, heteroscedastic
data modeling, transfer learning, learning to rank, and transduction. An
important challenge in these scenarios is to overcome the computational
bottleneck---instance weights often change dynamically or adaptively, and thus
the weighted SVM solutions must be repeatedly computed. In this paper, we
develop an algorithm that can efficiently and exactly update the weighted SVM
solutions for arbitrary change of instance weights. Technically, this
contribution can be regarded as an extension of the conventional solution-path
algorithm for a single regularization parameter to multiple instance-weight
parameters. However, this extension gives rise to a significant problem that
breakpoints (at which the solution path turns) have to be identified in
high-dimensional space. To facilitate this, we introduce a parametric
representation of instance weights. We also provide a geometric interpretation
in weight space using a notion of critical region: a polyhedron in which the
current affine solution remains optimal. Then we find breakpoints at
intersections of the solution path and boundaries of polyhedrons. Through
extensive experiments on various practical applications, we demonstrate the
usefulness of the proposed algorithm.
Comment: Submitted to the Journal of Machine Learning Research.
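To illustrate the parametric representation of instance weights, the sketch below traces solutions along a line segment w(theta) between two weight vectors. It uses weighted ridge regression as a stand-in for the weighted SVM (ridge has a closed-form weighted solution, which keeps the example short); the data, step count, and regularizer are illustrative assumptions:

```python
import numpy as np

def weighted_ridge(X, y, w, lam=1e-2):
    """Closed-form solution of instance-weighted ridge regression
    (a stand-in for the weighted SVM of the abstract)."""
    W = np.diag(w)
    d = X.shape[1]
    return np.linalg.solve(X.T @ W @ X + lam * np.eye(d), X.T @ W @ y)

def solution_path(X, y, w_start, w_end, n_steps=5):
    """Trace solutions along the segment w(theta) = (1-theta)*w_start +
    theta*w_end, the one-parameter representation of the weight change."""
    path = []
    for theta in np.linspace(0.0, 1.0, n_steps):
        w = (1.0 - theta) * w_start + theta * w_end
        path.append(weighted_ridge(X, y, w))
    return np.array(path)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
betas = solution_path(X, y, np.ones(20), np.linspace(0.1, 2.0, 20))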
MgNet: A Unified Framework of Multigrid and Convolutional Neural Network
We develop a unified model, known as MgNet, that simultaneously recovers some
convolutional neural networks (CNN) for image classification and multigrid (MG)
methods for solving discretized partial differential equations (PDEs). This
model is based on close connections that we have observed and uncovered between
the CNN and MG methodologies. For example, pooling operation and feature
extraction in CNN correspond directly to restriction operation and iterative
smoothers in MG, respectively. As the solution space is often the dual of the
data space in PDEs, the analogous concept of feature space and data space
(which are dual to each other) is introduced in CNN. With these connections
and the new concept in the unified model, the role of the various convolution
and pooling operations used in CNN can be better understood. As a result,
modified CNN models (with fewer weights and hyperparameters) are developed
that exhibit competitive and sometimes better performance in comparison with
existing CNN models when applied to both the CIFAR-10 and CIFAR-100 data sets.
Comment: 30 pages.
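The pooling/restriction correspondence mentioned above can be made concrete in a few lines: on a regular grid, the standard multigrid restriction by cell averaging is exactly average pooling with stride 2. This is a generic illustration, not MgNet code:

```python
import numpy as np

def restriction(u):
    """Multigrid-style restriction of a fine-grid field by 2x2 cell averaging.

    This coincides with 2x2 average pooling (stride 2), the CNN operation
    the abstract identifies with restriction.
    """
    h, w = u.shape
    # Trim odd edges, group into 2x2 cells, and average each cell.
    return u[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fine = np.arange(16.0).reshape(4, 4)
coarse = restriction(fine)   # a 2x2 coarse-grid field
```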
Extensions of Morse-Smale Regression with Application to Actuarial Science
The problem of subgroups is ubiquitous in scientific research (e.g., disease
heterogeneity, spatial distributions in ecology), and piecewise regression
is one way to deal with this phenomenon. Morse-Smale regression offers a way to
partition the regression function based on level sets of a defined function and
that function's basins of attraction. This topologically-based piecewise
regression algorithm has shown promise in its initial applications, but the
current implementation in the literature has been limited to elastic net and
generalized linear regression. It is possible that nonparametric methods, such
as random forest or conditional inference trees, may provide better prediction
and insight through modeling interaction terms and other nonlinear
relationships between predictors and a given outcome.
This study explores the use of several machine learning algorithms within a
Morse-Smale piecewise regression framework, including boosted regression with
linear base learners, homotopy-based LASSO, conditional inference trees, random
forest, and a wide neural network framework called extreme learning machines.
Simulations on Tweedie regression problems with varying Tweedie parameter and
dispersion suggest that many machine learning approaches to Morse-Smale
piecewise regression improve the original algorithm's performance, particularly
for outcomes with lower dispersion and linear or a mix of linear and nonlinear
predictor relationships. On a real actuarial problem, several of these new
algorithms perform as well as or better than the original Morse-Smale
regression algorithm, and most provide information on the nature of predictor
relationships within each partition, offering insight into differences between
dataset partitions.
Comment: 14 pages, 10 figures.
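A toy sketch of piecewise regression over a partition (not the Morse-Smale implementation): the data are split at a fixed boundary standing in for a basin-of-attraction boundary of the regression function, and a separate linear model is fit on each cell. The V-shaped target, boundary at 0, and noise level are illustrative assumptions:

```python
import numpy as np

def fit_linear(x, y):
    """Ordinary least squares for y ~ a*x + b; returns (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def piecewise_fit(x, y, boundary=0.0):
    """Fit one linear model per partition cell; the split point stands in
    for a Morse-Smale basin boundary of the regression function."""
    models = {}
    for name, mask in [("left", x < boundary), ("right", x >= boundary)]:
        models[name] = fit_linear(x[mask], y[mask])
    return models

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.abs(x) + 0.05 * rng.normal(size=200)   # V-shape: one basin per side
models = piecewise_fit(x, y)
```

Each per-cell model recovers the local slope (-1 on the left, +1 on the right) that a single global linear fit would average away, which is the point of partition-based regression.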
A Tropical Approach to Neural Networks with Piecewise Linear Activations
We present a new, unifying approach following some recent developments on the
complexity of neural networks with piecewise linear activations. We treat
neural network layers with piecewise linear activations as tropical
polynomials, which generalize polynomials in the so-called (max, +), or
tropical, algebra, with possibly real-valued exponents. Motivated by the
discussion in (arXiv:1402.1869), this approach enables us to refine their upper
bounds on linear regions of layers with ReLU or leaky ReLU activations to
the sum over j = 0, ..., n of the binomial coefficients C(m, j), where n and m
are the number of inputs and outputs, respectively. Additionally, we recover their
upper bounds on maxout layers. Our work follows a novel path, exclusively under
the lens of tropical geometry, which is independent of the improvements
reported in (arXiv:1611.01491, arXiv:1711.02114). Finally, we present a
geometric approach for effective counting of linear regions using random
sampling in order to avoid the computational overhead of exact counting
approaches.
Comment: v2: Removed morphological perceptron section and added vertex
sampling section. Updated references. 18 pages, 7 figures.
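A minimal sketch of the random-sampling idea for counting linear regions (illustrative, not the paper's procedure): sample inputs, record each point's ReLU activation pattern, and count distinct patterns. This gives a lower bound on the true region count; the layer sizes and sampling box are assumptions:

```python
import numpy as np

def count_regions_sampled(W, b, n_samples=20000, scale=3.0, seed=0):
    """Estimate the number of linear regions of x -> relu(W @ x + b) by
    sampling inputs and counting distinct ReLU on/off patterns.

    Each distinct pattern corresponds to one linear region, so the sampled
    count is a lower bound on the true count."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-scale, scale, size=(n_samples, W.shape[1]))
    patterns = (X @ W.T + b > 0)
    return len({tuple(p) for p in patterns})

rng = np.random.default_rng(0)
n, m = 2, 4                       # number of inputs and outputs
W = rng.normal(size=(m, n))
b = rng.normal(size=m)
regions = count_regions_sampled(W, b)
```

For a single ReLU layer the count can never exceed the hyperplane-arrangement bound (the sum of C(m, j) for j = 0, ..., n), so the estimate is easy to sanity-check.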
Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.
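One common family of techniques for characterizing DNN behavior in this setting is bound propagation. The sketch below shows interval bound propagation through a small random two-layer ReLU network; the network itself is an illustrative assumption, not a system from the survey:

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> W @ x + b exactly."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps boxes to boxes elementwise."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

# Sound output bounds for every input in the box [-0.1, 0.1]^3.
lo, hi = -0.1 * np.ones(3), 0.1 * np.ones(3)
lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)
```

The bounds are sound but conservative; verifying a specification then amounts to checking that the unsafe set does not intersect [lo, hi].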
Predicting Nearly As Well As the Optimal Twice Differentiable Regressor
We study nonlinear regression of real valued data in an individual sequence
manner, where we provide results that are guaranteed to hold without any
statistical assumptions. We address the convergence and undertraining issues of
conventional nonlinear regression methods and introduce an algorithm that
elegantly mitigates these issues via an incremental hierarchical structure,
(i.e., via an incremental decision tree). Particularly, we present a piecewise
linear (or nonlinear) regression algorithm that partitions the regressor space
in a data driven manner and learns a linear model at each region. Unlike the
conventional approaches, our algorithm gradually increases the number of
disjoint partitions on the regressor space in a sequential manner according to
the observed data. Through this data driven approach, our algorithm
sequentially and asymptotically achieves the performance of the optimal twice
differentiable regression function for any data sequence with an unknown and
arbitrary length. The computational complexity of the introduced algorithm is
only logarithmic in the data length under certain regularity conditions. We
provide an explicit description of the algorithm and demonstrate significant
gains on well-known real benchmark data sets and chaotic signals.
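A toy version of the data-driven partition refinement (not the paper's incremental tree): fit a separate linear model on each of 2**depth equal-width cells of [0, 1], and grow `depth` as more data arrives. The target function, sample size, and cell layout are illustrative assumptions:

```python
import numpy as np

def piecewise_linear_predict(x_train, y_train, x_query, depth):
    """Piecewise linear regression on 2**depth equal-width cells of [0, 1].

    Doubling `depth` as data accumulates mimics the sequential refinement
    of the partition: each cell gets its own least-squares linear model."""
    edges = np.linspace(0.0, 1.0, 2**depth + 1)
    preds = np.empty_like(x_query)
    for a, b in zip(edges[:-1], edges[1:]):
        m = (x_train >= a) & (x_train <= b)
        A = np.column_stack([x_train[m], np.ones(m.sum())])
        coef, *_ = np.linalg.lstsq(A, y_train[m], rcond=None)
        q = (x_query >= a) & (x_query <= b)
        preds[q] = coef[0] * x_query[q] + coef[1]
    return preds

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=400)
# Finer partitions track the smooth nonlinearity better.
mse = {d: np.mean((piecewise_linear_predict(x, y, x, d) - y) ** 2)
       for d in (0, 1, 3)}
```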
Learning Halfspaces and Neural Networks with Random Initialization
We study non-convex empirical risk minimization for learning halfspaces and
neural networks. For loss functions that are L-Lipschitz continuous, we
present algorithms to learn halfspaces and multi-layer neural networks that
achieve arbitrarily small excess risk ε > 0. The time complexity is
polynomial in the input dimension d and the sample size n, but exponential
in a quantity that depends only on L and 1/ε. These algorithms run multiple
rounds of random initialization followed by arbitrary optimization steps. We
further show that if the data is separable by some neural network with constant
margin γ > 0, then there is a polynomial-time algorithm for learning a
neural network that separates the training data with margin Ω(γ).
As a consequence, the algorithm achieves arbitrary generalization error ε > 0
with polynomially bounded sample and time complexity. We
establish the same learnability result when the labels are randomly flipped
with probability η < 1/2.
Comment: 31 pages.
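The "multiple rounds of random initialization followed by optimization" scheme can be sketched for a halfspace with logistic loss (an L-Lipschitz surrogate). The round count, step size, and data below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def logistic_risk(w, X, y):
    """Empirical risk under the (Lipschitz) logistic loss."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def learn_halfspace(X, y, rounds=10, steps=200, lr=0.5, seed=0):
    """Run several rounds of random initialization, each followed by plain
    gradient descent; keep the lowest-risk halfspace found."""
    rng = np.random.default_rng(seed)
    best_w, best_risk = None, np.inf
    for _ in range(rounds):
        w = rng.normal(size=X.shape[1])
        for _ in range(steps):
            # Negative derivative of the logistic loss w.r.t. the margin.
            s = 1.0 / (1.0 + np.exp(y * (X @ w)))
            w += lr * np.mean((s * y)[:, None] * X, axis=0)
        risk = logistic_risk(w, X, y)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]))   # linearly separable labels
w_hat, risk = learn_halfspace(X, y)
```

Restarting from fresh random points is what lets the overall procedure escape bad local behavior of the non-convex objective, at the cost of the exponential factor in the runtime.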
Learning Compressed Transforms with Low Displacement Rank
The low displacement rank (LDR) framework for structured matrices represents
a matrix through two displacement operators and a low-rank residual. Existing
use of LDR matrices in deep learning has applied fixed displacement operators
encoding forms of shift invariance akin to convolutions. We introduce a class
of LDR matrices with more general displacement operators, and explicitly learn
over both the operators and the low-rank component. This class generalizes
several previous constructions while preserving compression and efficient
computation. We prove bounds on the VC dimension of multi-layer neural networks
with structured weight matrices and show empirically that our compact
parameterization can reduce the sample complexity of learning. When replacing
weight layers in fully-connected, convolutional, and recurrent neural networks
for image classification and language modeling tasks, our new classes exceed
the accuracy of existing compression approaches, and on some tasks also
outperform general unstructured layers while using more than 20x fewer
parameters.
Comment: NeurIPS 2018. Code available at
https://github.com/HazyResearch/structured-net
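To see what "representing a matrix through displacement operators and a low-rank residual" means, the sketch below computes the Stein displacement M - Z M Z^T with Z the lower shift matrix: for any Toeplitz matrix the result has rank at most 2, so O(n) parameters suffice. This is a textbook LDR fact, not code from the paper:

```python
import numpy as np

def displacement(M):
    """Stein displacement M - Z @ M @ Z.T with Z the lower shift matrix.

    For a Toeplitz matrix the displacement is nonzero only in the first
    row and first column, hence has rank at most 2 -- the low-rank
    residual that makes the LDR representation compact."""
    n = M.shape[0]
    Z = np.eye(n, k=-1)   # shifts a vector down by one position
    return M - Z @ M @ Z.T

# Build a 6x6 Toeplitz matrix with entries T[i, j] = i - j.
n = 6
t = np.arange(-(n - 1), n, dtype=float)
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])
rank = np.linalg.matrix_rank(displacement(T))
```

Learning the operators themselves, as the paper proposes, generalizes this fixed-shift construction while keeping the residual rank (and hence the parameter count) low.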
Deep Frank-Wolfe For Neural Network Optimization
Learning a deep neural network requires solving a challenging optimization
problem: it is a high-dimensional, non-convex and non-smooth minimization
problem with a large number of terms. The current practice in neural network
optimization is to rely on the stochastic gradient descent (SGD) algorithm or
its adaptive variants. However, SGD requires a hand-designed schedule for the
learning rate. In addition, its adaptive variants tend to produce solutions
that generalize less well on unseen data than SGD with a hand-designed
schedule. We present an optimization method that offers empirically the best of
both worlds: our algorithm yields good generalization performance while
requiring only one hyper-parameter. Our approach is based on a composite
proximal framework, which exploits the compositional nature of deep neural
networks and can leverage powerful convex optimization algorithms by design.
Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes
an optimal step-size in closed-form at each time-step. We further show that the
descent direction is given by a simple backward pass in the network, yielding
the same computational cost per iteration as SGD. We present experiments on the
CIFAR and SNLI data sets, where we demonstrate the significant superiority of
our method over Adam, Adagrad, as well as the recently proposed BPGrad and
AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed
learning rate schedule, and show that it provides similar generalization while
converging faster. The code is publicly available at
https://github.com/oval-group/dfw.
Comment: Published as a conference paper at ICLR 2019.
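The closed-form step size that Frank-Wolfe admits can be seen on a small example. The sketch below runs FW with exact line search on a quadratic over the probability simplex, standing in for the SVM subproblem the method actually solves; the objective and dimensions are illustrative assumptions:

```python
import numpy as np

def frank_wolfe_simplex(Q, c, steps=50):
    """Frank-Wolfe on f(x) = 0.5*x'Qx + c'x over the probability simplex.

    The linear subproblem is solved by picking the best simplex vertex, and
    the step size comes from exact line search in closed form, as in the
    closed-form step-size computation described above."""
    n = len(c)
    x = np.ones(n) / n
    for _ in range(steps):
        g = Q @ x + c                      # gradient at the current iterate
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0              # vertex minimizing the linearization
        d = s - x
        denom = d @ Q @ d
        # Optimal step on [0, 1] for a convex quadratic, in closed form.
        gamma = 1.0 if denom <= 0 else float(np.clip(-(g @ d) / denom, 0.0, 1.0))
        x = x + gamma * d
    return x

Q = np.diag([1.0, 2.0, 4.0])
c = np.array([-1.0, -1.0, -1.0])
x_opt = frank_wolfe_simplex(Q, c)
```

Every iterate stays a convex combination of vertices, so feasibility never needs a projection step; the per-iteration cost is one gradient plus an argmin, mirroring the SGD-like cost the abstract claims.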