
    Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

    Recently, a considerable amount of work has been devoted to the study of algorithmic stability and generalization for stochastic gradient descent (SGD). However, the existing stability analysis requires imposing restrictive assumptions on the boundedness of gradients and on the strong smoothness and convexity of loss functions. In this paper, we provide a fine-grained analysis of stability and generalization for SGD by substantially relaxing these assumptions. Firstly, we establish stability and generalization for SGD by removing the existing bounded gradient assumptions. The key idea is the introduction of a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting using a stability approach. Secondly, the smoothness assumption is relaxed by considering loss functions with Hölder continuous (sub)gradients, for which we show that optimal bounds are still achieved by balancing computation and stability. To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions. Finally, we study learning problems with (strongly) convex objectives but non-convex loss functions. Comment: to appear in ICML 202

    Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks

    While significant theoretical progress has been achieved, the generalization mystery of overparameterized neural networks remains largely unresolved. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds by balancing optimization and generalization via early stopping. Compared to existing analyses of GD, our new analysis requires a relaxed overparameterization assumption and also applies to SGD. The key to the improvement is a better estimation of the smallest eigenvalues of the Hessian matrices of the empirical risks and the loss function along the trajectories of GD and SGD, obtained by providing a refined estimation of their iterates. Comment: to appear in Neural Information Processing Systems (NeurIPS 2022)

    ADAPTIVE TRANSMISSION POWER IN LOW-POWER AND LOSSY NETWORK

    Techniques are provided herein for intelligent transmission power control under different transmission patterns in a connected grid mesh. The transmission patterns include asynchronous transmission, broadcast transmission, and unicast transmission. The techniques also provide a mechanism that helps data packets compete against interference on specific channels and gives high-priority Quality of Service (QoS) packets a greater chance of being received when congestion occurs. This enables the connected grid mesh to achieve higher communication reliability with efficient power consumption.

    Emergent Communication in Interactive Sketch Question Answering

    Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, in which two collaborative players interact through sketches to answer a question about an image over multiple rounds. To accomplish this task, we design a new and efficient interactive EC system that achieves an effective balance among three evaluation factors: question answering accuracy, drawing complexity, and human interpretability. Our experimental results, including human evaluation, demonstrate that the multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability. Comment: Accepted by NeurIPS 202

    A Reinforced Improved Attention Model for Abstractive Text Summarization
