
    Do ideas have shape? Plato's theory of forms as the continuous limit of artificial neural networks

    We show that ResNets converge, in the infinite-depth limit, to a generalization of image registration algorithms. In this generalization, images are replaced by abstractions (ideas) living in high-dimensional RKHS spaces, and material points are replaced by data points. Whereas computational anatomy aligns images via deformations of the material space, this generalization aligns ideas via transformations of their RKHS. This identification of ResNets as idea registration algorithms has several remarkable consequences. The search for good architectures can be reduced to the search for good kernels, and we show that the composition of idea registration blocks with reduced equivariant multi-channel kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and groups of transformations. Minimizers of $L_2$-regularized ResNets satisfy a discrete least action principle implying the near preservation of the norm of weights and biases across layers. The parameters of trained ResNets can be identified as solutions of an autonomous Hamiltonian system defined by the activation function and the architecture of the ANN. Momenta variables provide a sparse representation of the parameters of a ResNet. The registration regularization strategy provides a provably robust alternative to Dropout for ANNs. Pointwise RKHS error estimates lead to deterministic error estimates for ANNs.
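    The infinite-depth limit invoked here can be illustrated with a minimal sketch (a toy under our own assumptions — weight-tied layers and a tanh activation — not the paper's construction): a residual update x_{l+1} = x_l + (1/L) tanh(W x_l + b) is a forward-Euler step of the autonomous ODE dx/dt = tanh(W x + b), so growing the depth L recovers a continuous flow of the data points.

```python
import numpy as np

def resnet_forward(x, W, b, depth):
    """Forward-Euler reading of a weight-tied ResNet:
    x_{l+1} = x_l + (1/depth) * tanh(W @ x_l + b).
    As depth grows, this approximates the time-1 flow of the
    autonomous ODE dx/dt = tanh(W x + b)."""
    h = 1.0 / depth  # step size shrinks as the network deepens
    for _ in range(depth):
        x = x + h * np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
b = rng.standard_normal(2)
x0 = np.array([1.0, -0.5])
for depth in (4, 16, 64, 256):
    print(depth, resnet_forward(x0, W, b, depth))
```

    The printed outputs stabilize as depth increases, which is the sense in which the discrete network converges to its continuous (infinite-depth) limit.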

    Singular solutions, momentum maps and computational anatomy

    This paper describes the variational formulation of template matching problems in computational anatomy (CA); introduces the EPDiff evolution equation in the context of an analogy between CA and fluid dynamics; and discusses the singular solutions of the EPDiff equation, explaining why they exist (the singular momentum map). It then draws the consequences of EPDiff for the outline matching problem in CA and gives numerical examples.
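    For reference, the EPDiff equation and its singular solutions discussed above take the following standard form in the computational anatomy literature (the notation below is ours and may differ from the paper's):

```latex
% EPDiff: the Euler-Poincare equation on the diffeomorphism group.
% u is the velocity field and m = Lu its momentum, where L is the
% symmetric positive operator defining the metric ||u||^2 = <Lu, u>.
\frac{\partial m}{\partial t}
  + \underbrace{u \cdot \nabla m}_{\text{convection}}
  + \underbrace{(\nabla u)^{\mathsf{T}} m}_{\text{stretching}}
  + \underbrace{m \,(\nabla \cdot u)}_{\text{expansion}}
  = 0,
\qquad m = L u.

% The singular (measure-valued) solutions are supported on moving
% points q_a(t) carrying momenta p_a(t) -- the singular momentum map:
m(\mathbf{x}, t)
  = \sum_{a=1}^{N} \mathbf{p}_a(t)\,
    \delta\!\bigl(\mathbf{x} - \mathbf{q}_a(t)\bigr).
```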

    Deriving Grover's lower bound from simple physical principles

    Grover's algorithm constitutes the optimal quantum solution to the search problem and provides a quadratic speed-up over all possible classical search algorithms. Quantum interference between computational paths has been posited as a key resource behind this computational speed-up. However, there is a limit to this interference: at most, pairs of paths can ever interact in a fundamental way. Could more interference imply more computational power? Sorkin has defined a hierarchy of possible interference behaviours, currently under experimental investigation, in which classical theory sits at the first level and quantum theory at the second. Informally, the order in the hierarchy corresponds to the number of paths that have an irreducible interaction in a multi-slit experiment. In this work, we consider how Grover's speed-up depends on the order of interference in a theory. Surprisingly, we show that the quadratic lower bound holds regardless of the order of interference. Thus, at least from the point of view of the search problem, post-quantum interference does not imply a computational speed-up over quantum theory.
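    The quadratic speed-up at issue can be made concrete with a small simulation of standard Grover search (an illustration of the quantum case only, not of the post-quantum interference models studied in the paper): a classical search over N items needs Θ(N) oracle queries on average, while Grover's algorithm needs about (π/4)√N.

```python
import numpy as np

def grover_search(n_qubits, marked):
    """Dense-vector simulation of standard Grover search.
    Returns the oracle-query count ~ (pi/4) * sqrt(N) and the
    success probability of measuring the marked item."""
    N = 2 ** n_qubits
    state = np.full(N, 1.0 / np.sqrt(N))       # uniform superposition
    iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
    for _ in range(iterations):
        state[marked] *= -1.0                   # oracle: phase-flip the target
        state = 2.0 * state.mean() - state      # diffusion: inversion about the mean
    return iterations, abs(state[marked]) ** 2

for n in (4, 8, 12):
    its, p = grover_search(n, marked=3)
    print(f"N = 2^{n}: {its} queries, success probability {p:.4f}")
```

    With about (π/4)√N oracle calls the success probability approaches 1; the abstract's result is that this √N query count cannot be beaten even in theories with higher-order interference.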

    An Efficient Dual Approach to Distance Metric Learning

    Distance metric learning is of fundamental interest in machine learning because the distance metric employed can significantly affect the performance of many learning methods. Quadratic Mahalanobis metric learning is a popular approach to the problem, but typically requires solving a semidefinite programming (SDP) problem, which is computationally expensive. Standard interior-point SDP solvers typically have a complexity of $O(D^{6.5})$ (with $D$ the dimension of the input data), and can thus only practically solve problems with fewer than a few thousand variables. Since the number of variables is $D(D+1)/2$, this restricts such solvers to around a few hundred dimensions, which in turn limits the size of problem to which metric learning can be applied. Here we propose a significantly more efficient approach based on the Lagrange dual formulation of the problem. The proposed formulation is much simpler to implement, and therefore allows much larger Mahalanobis metric learning problems to be solved. The time complexity of the proposed method is $O(D^3)$, significantly lower than that of the SDP approach. Experiments on a variety of datasets demonstrate that the proposed method achieves an accuracy comparable to the state-of-the-art, but is applicable to significantly larger problems. We also show that the proposed method can be applied to solve more general Frobenius-norm regularized SDP problems approximately.
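    A minimal sketch of the objects involved (illustrative, not the authors' solver): a quadratic Mahalanobis metric is parameterized by a PSD matrix $M$ with $D(D+1)/2$ free entries, and the $O(D^3)$ cost quoted above is characteristic of dual-style methods whose bottleneck is a single eigendecomposition projecting a symmetric update back onto the PSD cone.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Quadratic Mahalanobis distance d_M(x, y) = sqrt((x-y)^T M (x-y)),
    well defined whenever M is positive semidefinite."""
    d = x - y
    return np.sqrt(d @ M @ d)

def project_psd(S):
    """Project a symmetric matrix onto the PSD cone (nearest point in
    Frobenius norm) by clipping negative eigenvalues. One
    eigendecomposition costs O(D^3), versus O(D^6.5) per iteration
    for interior-point SDP solvers."""
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(0)
D = 5
S = rng.standard_normal((D, D))
S = (S + S.T) / 2                  # symmetrize an arbitrary update
M = project_psd(S)                 # nearest PSD matrix to S
x, y = rng.standard_normal(D), rng.standard_normal(D)
print(mahalanobis(x, y, M))
```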