2,076 research outputs found

    Parameterizing and Aggregating Activation Functions in Deep Neural Networks

    Get PDF
    The nonlinear activation functions applied by each neuron in a neural network are essential for making neural networks powerful representational models. If these are omitted, even deep neural networks reduce to simple linear regression due to the fact that a linear combination of linear combinations is still a linear combination. In much of the existing literature on neural networks, just one or two activation functions are selected for the entire network, even though the use of heterogenous activation functions has been shown to produce superior results in some cases. Even less often employed are activation functions that can adapt their nonlinearities as network parameters along with standard weights and biases. This dissertation presents a collection of papers that advance the state of heterogenous and parameterized activation functions. Contributions of this dissertation include three novel parametric activation functions and applications of each, a study evaluating the utility of the parameters in parametric activation functions, an aggregated activation approach to modeling time-series data as an alternative to recurrent neural networks, and an improvement upon existing work that aggregates neuron inputs using product instead of sum

    Developments in the theory of randomized shortest paths with a comparison of graph node distances

    Get PDF
    There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has risen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity as it defines a metric, in addition to which it satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments.Comment: 30 pages, 4 figures, 3 table

    Deep Learning with Functional Inputs

    Full text link
    We present a methodology for integrating functional data into deep densely connected feed-forward neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to greater interpretability of the relationship between the covariates and the response relative to conventional neural networks. The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying functional weights; these results were confirmed through real applications and simulation studies. A forthcoming R package is developed on top of a popular deep learning library (Keras) allowing for general use of the approach.Comment: 28 pages, 6 figures, submitted to JCG
    • …
    corecore