283 research outputs found

    Adaptive Normalized Risk-Averting Training For Deep Neural Networks

    This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $\lambda$, we explain why learning $\lambda$ adaptively using gradient descent works. In practice, we show how this method improves training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective to address the non-convex optimization problem in DNNs. Comment: AAAI 2016, 0.39%~0.4% ER on MNIST with single 32-32-256-10 ConvNets, code available at https://github.com/cauchyturing/ANRA
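
The lower-bound claim in this abstract can be illustrated with a short derivation. This is only a sketch: it assumes the risk-averting criterion takes a normalized log-sum-exp form with per-sample error $e_i = \|y_i - f(x_i; W)\|_p^p$. The abstract does not spell the formula out, so the symbols $m$, $e_i$, $W$, and $C_\lambda$ are assumptions introduced here.

```latex
% Assumed form of the criterion (not quoted from the abstract):
C_{\lambda}(W) = \frac{1}{\lambda}
  \log\!\left( \frac{1}{m} \sum_{i=1}^{m} e^{\lambda e_i} \right),
  \qquad \lambda > 0.

% Jensen's inequality for the convex function exp gives
\frac{1}{m} \sum_{i=1}^{m} e^{\lambda e_i}
  \;\ge\; \exp\!\left( \frac{\lambda}{m} \sum_{i=1}^{m} e_i \right)
\quad\Longrightarrow\quad
C_{\lambda}(W) \;\ge\; \frac{1}{m} \sum_{i=1}^{m} e_i .
```

Under that assumption, the criterion is never smaller than the mean $L_p$-norm error, and the two coincide in the limit $\lambda \to 0^{+}$.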

    Adopting Robustness and Optimality in Fitting and Learning

    We generalized a modified exponentialized estimator by pushing the robust-optimal (RO) index $\lambda$ to $-\infty$ for achieving robustness to outliers by optimizing a quasi-Minimin function. The robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expansion of the convexity region in the Hessian matrix to largely avoid local optima. Detailed quantitative analyses of both robustness and optimality are provided. The results of proposed experiments on fitting tasks for three noisy non-convex functions and the digits recognition task on the MNIST dataset consolidate the conclusions. Comment: arXiv admin note: text overlap with arXiv:1506.0269

    inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data

    The real-time crash likelihood prediction model is an essential component of the proactive traffic safety management system. Over the years, numerous studies have attempted to construct a crash likelihood prediction model in order to enhance traffic safety, but mostly on freeways. In the majority of the existing studies, researchers have primarily employed a deep learning-based framework to identify crash potential. Lately, Transformer has emerged as a potential deep neural network that fundamentally operates through attention-based mechanisms. Transformer has several functional benefits over extant deep learning models such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), etc. Firstly, Transformer can readily handle long-term dependencies in a data sequence. Secondly, Transformer can process all elements in a data sequence in parallel during training. Finally, Transformer does not have the vanishing gradient issue. Realizing the immense possibility of Transformer, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real-time. The proposed model was evaluated using connected vehicle data extracted from INRIX's Signal Analytics Platform. The data was formatted in parallel and stacked at different timesteps to develop nine inTformer models. The best inTformer model achieved a sensitivity of 73%. This model was also compared to earlier studies on crash likelihood prediction at intersections and with several established deep learning models trained on the same connected vehicle dataset. In every scenario, this inTformer outperformed the benchmark models, confirming the viability of the proposed inTformer architecture. Comment: 29 pages, 7 figures, 9 tables

    Peer-to-Peer Energy Trading in Smart Residential Environment with User Behavioral Modeling

    Electric power systems are transforming from a centralized unidirectional market to a decentralized open market. With this shift, the end-users have the possibility to actively participate in local energy exchanges, with or without the involvement of the main grid. Rapidly falling prices for Renewable Energy Technologies (RETs), supported by their ease of installation and operation, together with Electric Vehicles (EV) and Smart Grid (SG) technologies that make bidirectional flow of energy possible, have contributed to this changing landscape in the distribution side of the traditional power grid. Trading energy among users in a decentralized fashion has been referred to as Peer-to-Peer (P2P) Energy Trading, which has attracted significant attention from the research and industry communities in recent times. However, previous research has mostly focused on engineering aspects of P2P energy trading systems, often neglecting the central role of users in such systems. P2P trading mechanisms require active participation from users to decide factors such as selling prices, storing versus trading energy, and selection of energy sources, among others. The complexity of these tasks, paired with the limited cognitive and time capabilities of human users, can result in sub-optimal decisions or even abandonment of such systems if performance is not satisfactory. Therefore, it is of paramount importance for P2P energy trading systems to incorporate user behavioral modeling that captures users’ individual trading behaviors, preferences, and perceived utility in a realistic and accurate manner. Often, such user behavioral models are not known a priori in real-world settings, and therefore need to be learned online as the P2P system is operating. In this thesis, we design novel algorithms for P2P energy trading. By exploiting a variety of statistical, algorithmic, machine learning, and behavioral economics tools, we propose solutions that are able to jointly optimize the system performance while taking into account and learning a realistic model of user behavior. The results in this dissertation have been published in IEEE Transactions on Green Communications and Networking 2021, Proceedings of IEEE Global Communication Conference 2022, Proceedings of IEEE Conference on Pervasive Computing and Communications 2023, and ACM Transactions on Evolutionary Learning and Optimization 2023.

    On Tilted Losses in Machine Learning: Theory and Applications

    Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches. Comment: arXiv admin note: substantial text overlap with arXiv:2007.0116
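
How a tilt parameter can raise or lower the influence of outlying losses is easier to see numerically. The sketch below assumes the tilted risk has the form (1/t) * log(mean(exp(t * loss_i))), which is how the objective is commonly written; the abstract itself does not state the formula, and the function name `tilted_risk` and the sample losses are hypothetical.

```python
import numpy as np


def tilted_risk(losses, t):
    """Tilted average of per-sample losses (assumed log-mean-exp form).

    t > 0 emphasises large losses, t < 0 emphasises small losses,
    and t == 0 falls back to the ordinary mean (the ERM objective).
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0.0:
        return float(losses.mean())
    z = t * losses
    shift = z.max()  # subtract the maximum exponent for numerical stability
    return float((shift + np.log(np.mean(np.exp(z - shift)))) / t)


if __name__ == "__main__":
    sample_losses = [0.1, 0.2, 0.3, 5.0]  # hypothetical; 5.0 plays the outlier
    for t in (-50.0, 0.0, 50.0):
        print(t, tilted_risk(sample_losses, t))
```

With these illustrative values, a large negative t yields a value close to the smallest loss (the outlier is nearly ignored), while a large positive t yields a value close to the largest loss.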

    Application of Spectral Solution and Neural Network Techniques in Plasma Modeling for Electric Propulsion

    A solver for Poisson's equation was developed using the Radix-2 FFT method first invented by Carl Friedrich Gauss. Its performance was characterized using simulated data and identical boundary conditions to those found in a Hall Effect Thruster. The characterization showed errors below machine-zero with noise-free data, and above 20% noise-to-signal strength, the error increased linearly with the noise. This solver can be implemented into AFRL's plasma simulator, the Thermophysics Universal Research Framework (TURF) and used to quickly and accurately compute the electric field based on charge distributions. The validity of a machine learning approach and data-based complex system modeling approach was demonstrated. To this end, several multilayer perceptrons were created and validated against AFRL-provided Hall Thruster test data, with two networks showing mean error below 1% and standard deviations below 10%. These results, while not ready for implementation as a replacement for lookup tables, strongly suggest paths for future work and the development of networks that would be acceptable in such a role, saving both RAM space and time in plasma simulations.
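
The general idea of a spectral Poisson solve can be sketched in a few lines. This is not the thesis's solver: it assumes a one-dimensional periodic domain, unit permittivity, and NumPy's built-in FFT rather than a hand-written radix-2 routine, and it does not reproduce Hall Effect Thruster boundary conditions. The function name `solve_poisson_periodic` is hypothetical.

```python
import numpy as np


def solve_poisson_periodic(rho, length):
    """Solve d2(phi)/dx2 = -rho on a periodic 1-D grid (assumed setup).

    Returns the zero-mean potential phi and the field E = -d(phi)/dx.
    """
    rho = np.asarray(rho, dtype=float)
    n = rho.size
    # Angular wavenumbers matching NumPy's FFT output ordering.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    rho_hat = np.fft.fft(rho)
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k != 0
    # In Fourier space the equation reads -k^2 * phi_hat = -rho_hat.
    phi_hat[nonzero] = rho_hat[nonzero] / k[nonzero] ** 2
    phi = np.fft.ifft(phi_hat).real
    field = np.fft.ifft(-1j * k * phi_hat).real
    return phi, field


if __name__ == "__main__":
    n, length = 64, 2.0 * np.pi  # power-of-two grid, as radix-2 FFTs expect
    x = np.linspace(0.0, length, n, endpoint=False)
    phi, field = solve_poisson_periodic(np.sin(x), length)
    print(np.max(np.abs(phi - np.sin(x))))  # error against the exact solution
```

For the charge density sin(x) the exact potential is sin(x) and the exact field is -cos(x), which gives a simple check on the noise-free accuracy of the spectral approach.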