33,314 research outputs found

    QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks

    Full text link
    Recent progress in computer vision-oriented neural network designs is mostly driven by capturing high-order neural interactions among inputs and features. And there emerged a variety of approaches to accomplish this, such as Transformers and its variants. However, these interactions generate a large amount of intermediate state and/or strong data dependency, leading to considerable memory consumption and computing cost, and therefore compromising the overall runtime performance. To address this challenge, we rethink the high-order interactive neural network design with a quadratic computing approach. Specifically, we propose QuadraNet -- a comprehensive model design methodology from neuron reconstruction to structural block and eventually to the overall neural network implementation. Leveraging quadratic neurons' intrinsic high-order advantages and dedicated computation optimization schemes, QuadraNet could effectively achieve optimal cognition and computation performance. Incorporating state-of-the-art hardware-aware neural architecture search and system integration techniques, QuadraNet could also be well generalized in different hardware constraint settings and deployment scenarios. The experiment shows thatQuadraNet achieves up to 1.5×\times throughput, 30% less memory footprint, and similar cognition performance, compared with the state-of-the-art high-order approaches.Comment: ASP-DAC 2024 Best Paper Nominatio

    Stochastic Training of Neural Networks via Successive Convex Approximations

    Full text link
    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show how the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning System

    Surrogate modeling approximation using a mixture of experts based on EM joint estimation

    Get PDF
    An automatic method to combine several local surrogate models is presented. This method is intended to build accurate and smooth approximation of discontinuous functions that are to be used in structural optimization problems. It strongly relies on the Expectation-Maximization (EM) algorithm for Gaussian mixture models (GMM). To the end of regression, the inputs are clustered together with their output values by means of parameter estimation of the joint distribution. A local expert is then built (linear, quadratic, artificial neural network, moving least squares) on each cluster. Lastly, the local experts are combined using the Gaussian mixture model parameters found by the EM algorithm to obtain a global model. This method is tested over both mathematical test cases and an engineering optimization problem from aeronautics and is found to improve the accuracy of the approximation

    Optimal Observer Design Using Reinforcement Learning and Quadratic Neural Networks

    Full text link
    This paper introduces an innovative approach based on policy iteration (PI), a reinforcement learning (RL) algorithm, to obtain an optimal observer with a quadratic cost function. This observer is designed for systems with a given linearized model and a stabilizing Luenberger observer gain. We utilize two-layer quadratic neural networks (QNN) for policy evaluation and derive a linear correction term using the input and output data. This correction term effectively rectifies inaccuracies introduced by the linearized model employed within the observer design. A unique feature of the proposed methodology is that the QNN is trained through convex optimization. The main advantage is that the QNN's input-output mapping has an analytical expression as a quadratic form, which can then be used to obtain a linear correction term policy. This is in stark contrast to the available techniques in the literature that must train a second neural network to obtain policy improvement. It is proven that the obtained linear correction term is optimal for linear systems, as both the value function and the QNN's input-output mapping are quadratic. The proposed method is applied to a simple pendulum, demonstrating an enhanced correction term policy compared to relying solely on the linearized model. This shows its promise for addressing nonlinear systems

    Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

    Full text link
    In practical Bayesian optimization, we must often search over structures with differing numbers of parameters. For instance, we may wish to search over neural network architectures with an unknown number of layers. To relate performance data gathered for different architectures, we define a new kernel for conditional parameter spaces that explicitly includes information about which parameters are relevant in a given structure. We show that this kernel improves model quality and Bayesian optimization results over several simpler baseline kernels.Comment: 6 pages, 3 figures. Appeared in the NIPS 2013 workshop on Bayesian optimizatio

    Reactive Planar Manipulation with Convex Hybrid MPC

    Full text link
    This paper presents a reactive controller for planar manipulation tasks that leverages machine learning to achieve real-time performance. The approach is based on a Model Predictive Control (MPC) formulation, where the goal is to find an optimal sequence of robot motions to achieve a desired object motion. Due to the multiple contact modes associated with frictional interactions, the resulting optimization program suffers from combinatorial complexity when tasked with determining the optimal sequence of modes. To overcome this difficulty, we formulate the search for the optimal mode sequences offline, separately from the search for optimal control inputs online. Using tools from machine learning, this leads to a convex hybrid MPC program that can be solved in real-time. We validate our algorithm on a planar manipulation experimental setup where results show that the convex hybrid MPC formulation with learned modes achieves good closed-loop performance on a trajectory tracking problem
    • …
    corecore