
    Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization

    The ResNet structure has achieved great empirical success since its debut. Recent work established the convergence of learning over-parameterized ResNet with a scaling factor $\tau = 1/L$ on the residual branch, where $L$ is the network depth. However, it is not clear how learning ResNet behaves for other values of $\tau$. In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of $\tau$. Specifically, hiding logarithmic factors and constant coefficients, we show that for $\tau \le 1/\sqrt{L}$ gradient descent is guaranteed to converge to the global minima, and in particular when $\tau \le 1/L$ the convergence is independent of the network depth. Conversely, we show that for $\tau > L^{-\frac{1}{2}+c}$, the forward output grows at least at rate $L^c$ in expectation, and learning then fails because of gradient explosion for large $L$. This means the bound $\tau \le 1/\sqrt{L}$ is sharp for learning ResNet with arbitrary depth. To the best of our knowledge, this is the first work that studies learning ResNet over the full range of $\tau$. Comment: 31 pages
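    The depth-scaling behavior described above is easy to see numerically. The sketch below (an illustration, not the paper's construction) propagates a random input through $L$ linear residual blocks $x_{l+1} = x_l + \tau\, W_l x_l$ with Gaussian $W_l$, where the squared output norm grows roughly like $(1+\tau^2)^L$; names and the omission of nonlinearities are simplifying assumptions.

    ```python
    import numpy as np

    def forward_norm(L, tau, n=256, seed=0):
        """Propagate a random input through L residual blocks
        x_{l+1} = x_l + tau * W_l x_l (entries of W_l ~ N(0, 1/n))
        and return the final squared norm relative to the input."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n) / np.sqrt(n)
        x0_sq = x @ x
        for _ in range(L):
            W = rng.standard_normal((n, n)) / np.sqrt(n)
            x = x + tau * (W @ x)
        return (x @ x) / x0_sq

    L = 64
    stable = forward_norm(L, tau=1.0 / L)             # tau = 1/L: depth-independent
    critical = forward_norm(L, tau=1.0 / np.sqrt(L))  # tau = 1/sqrt(L): O(1) growth
    unstable = forward_norm(L, tau=1.0)               # tau >> 1/sqrt(L): blow-up
    ```

    With $\tau = 1/L$ the relative norm stays near 1 regardless of depth, at $\tau = 1/\sqrt{L}$ it stays bounded by a constant, and above the threshold it explodes exponentially in $L$, matching the three regimes in the abstract.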

    Silicon photonic subsystem for broadband and RoF detection while enabling carrier reuse

    We experimentally validate a silicon photonic subsystem designed for passive optical networks (PONs) with carrier reuse. The subsystem is intended for future wavelength division multiplexed (WDM) PONs. It enables radio-over-fiber (RoF) signals to cohabit an assigned wavelength slot without perturbing the PON signal, while conserving carrier power for the uplink. A microring modulator remodulates the residual carrier for the RoF uplink. We successfully detected the dropped 8 GHz broadband signal and five 125 MHz radio-over-fiber signals. Two 125 MHz radio-over-fiber signals are remodulated onto the carrier. The uplink signal shows good performance, validating that the residual downlink signals have been well rejected by the microring filters. The subsystem conserves a clean carrier for remodulation with a good signal-to-carrier ratio.

    SiP-based SSBI cancellation for OFDM

    We propose for the first time to use a silicon photonics (SiP) solution for a passive optical network to both reduce signal-signal beat interference (SSBI) and recuperate part of the downlink carrier for use in the uplink. The Kramers-Kronig (KK) receiver for direct detection of advanced modulation formats overcomes SSBI at the cost of a moderate carrier-to-signal ratio (>6 dB) and high oversampling (4x). We propose an optical SSBI solution that achieves better performance than KK and requires only standard sampling and a low (3 dB) carrier-to-signal power ratio. The receiver is conceived for the downlink in passive optical networks, where the carrier must be husbanded for re-use in the uplink. Using cost-effective and power-efficient SiP, the receiver filters the incoming signal, suppresses SSBI, and routes a portion of the carrier for use in the uplink. We experimentally examine the SSBI suppression in this paper. While previous demonstrations used bulky, discrete components, we achieve significant Q-factor improvement with a simple SiP solution. We examine the optimal frequency offset between the carrier and the microring resonator center frequency. The robustness to frequency drift, as well as the impact of imperfect filtering, is discussed and quantified.
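    For reference, the KK receiver that the abstract compares against can be sketched in a few lines: it reconstructs the optical field from intensity-only samples under a minimum-phase assumption, recovering the phase as the Hilbert transform of the log-amplitude. The toy single-sideband signal below is illustrative only, not the paper's experimental setup.

    ```python
    import numpy as np

    def hilbert_transform(x):
        """Discrete Hilbert transform via FFT (assumes a periodic signal):
        zero out negative frequencies and take the imaginary part."""
        N = len(x)
        X = np.fft.fft(x)
        h = np.zeros(N)
        h[0] = 1.0          # keep DC
        h[1:N // 2] = 2.0   # double positive frequencies
        h[N // 2] = 1.0     # keep Nyquist bin
        return np.fft.ifft(X * h).imag

    def kk_receiver(intensity):
        """Kramers-Kronig field reconstruction from photodiode samples.
        Valid when the field is minimum-phase, i.e. a dominant carrier
        plus a single-sideband signal."""
        log_amp = 0.5 * np.log(intensity)
        phase = hilbert_transform(log_amp)
        return np.sqrt(intensity) * np.exp(1j * phase)

    # toy single-sideband signal: strong carrier + weak analytic tone
    N = 4096
    t = np.arange(N)
    carrier = 4.0
    s = 0.5 * np.exp(2j * np.pi * 200 * t / N)
    E = carrier + s
    E_hat = kk_receiver(np.abs(E) ** 2)  # direct detection, then KK
    ```

    Because the carrier dominates the sideband here (12 dB above it), the field is minimum-phase and `E_hat` matches `E` to numerical precision; the oversampling and carrier-to-signal penalties cited in the abstract arise when these margins shrink.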

    Lion: Adversarial Distillation of Proprietary Large Language Models

    The practice of transferring knowledge from a sophisticated, proprietary large language model (LLM) to a compact, open-source LLM has garnered considerable attention. Previous works have focused on unidirectional knowledge distillation, aligning the responses of the student model with those of the teacher model on a set of instructions. Nevertheless, they overlooked the possibility of incorporating reciprocal "feedback"--identifying challenging instructions where the student model's performance falls short--to boost the student model's proficiency iteratively. To this end, we propose a novel adversarial distillation framework for more efficient knowledge transfer. Leveraging the versatile role adaptability of LLMs, we prompt the teacher model to identify "hard" instructions and generate new "hard" instructions for the student model, creating a three-stage adversarial loop of imitation, discrimination, and generation. By applying this adversarial framework, we successfully transfer knowledge from ChatGPT to a student model (named Lion), using a mere 70k training data. Our results show that Lion-13B not only achieves open-ended generation capabilities comparable to ChatGPT but also surpasses conventional state-of-the-art (SOTA) instruction-tuned models like Vicuna-13B by 55.4% on challenging zero-shot reasoning benchmarks such as BIG-Bench Hard (BBH) and by 16.7% on AGIEval. Code and model can be found at https://github.com/YJiangcm/Lion. Comment: 21 pages, 5 figures, EMNLP 2023 main conference
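    The three-stage loop can be sketched schematically. The teacher/student interfaces below (`respond`, `score_gap`, `generate_like`, `finetune`) are hypothetical stand-ins for the paper's prompting and fine-tuning steps, not the released code.

    ```python
    def adversarial_distill(instructions, teacher, student, rounds=3, hard_frac=0.3):
        """One possible shape of the imitation/discrimination/generation loop."""
        pool = list(instructions)
        for _ in range(rounds):
            # 1. Imitation: the student is fine-tuned on teacher responses.
            demos = [(q, teacher.respond(q)) for q in pool]
            student.finetune(demos)
            # 2. Discrimination: the teacher scores the gap between its own
            #    answer and the student's; a large gap marks a "hard" instruction.
            scored = sorted(pool,
                            key=lambda q: -teacher.score_gap(q, student.respond(q)))
            hard = scored[: max(1, int(hard_frac * len(scored)))]
            # 3. Generation: the teacher writes new instructions in the
            #    style of the identified hard ones for the next round.
            pool = hard + [teacher.generate_like(q) for q in hard]
        return student
    ```

    The discrimination step is what makes the loop adversarial: the instruction pool is steadily reshaped toward the student's weaknesses instead of remaining a fixed distillation set.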

    Polarization-insensitive silicon microring modulator for single sideband modulation

    We propose and experimentally demonstrate a polarization-insensitive single sideband modulator based on silicon microring modulators (MRM). The proposed modulator splits and modulates the two orthogonal polarization states of an input laser in a loopback structure, with an on-chip silicon polarization splitter rotator (PSR), overcoming the polarization dependence of the silicon photonic modulator. The IQ configuration of the modulator enables single sideband modulation, thus improving the resistance of the modulated signal to chromatic dispersion and extending the transmission reach. The adoption of an MRM relieves the bandwidth limitation of polarization-diverse versions of SiP Mach-Zehnder modulators (MZM). Our experiments validate the proposed modulator's polarization insensitivity and transmission performance.

    Towards Accelerating Training of Batch Normalization: A Manifold Perspective

    Batch normalization (BN) has become a crucial component across diverse deep neural networks. A network with BN is invariant to positive linear re-scaling of its weights, which means there exist infinitely many functionally equivalent networks with different weight scales. However, optimizing these equivalent networks with first-order methods such as stochastic gradient descent converges to different local optima, owing to different gradients across training. To alleviate this, we propose a quotient manifold, the \emph{PSI manifold}, on which all equivalent weights of the network with BN are regarded as a single element. We then construct gradient descent and stochastic gradient descent on the PSI manifold. The two algorithms guarantee that every group of equivalent weights (related by positive re-scaling) converges to equivalent optima. Moreover, we give the convergence rate of the proposed algorithms on the PSI manifold and show that they accelerate training compared with algorithms on the Euclidean weight space. Empirical studies show that our algorithms consistently achieve better performance over various experimental settings.
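    The positive-scale invariance that motivates the quotient construction is easy to verify numerically. Below is a minimal sketch with a single linear layer followed by BN (without learnable affine parameters); the layer sizes and scale factor are arbitrary.

    ```python
    import numpy as np

    def bn_layer(X, W, eps=1e-5):
        """Linear layer followed by batch normalization over the batch axis
        (no learnable scale/shift, as in the invariance argument)."""
        Z = X @ W
        return (Z - Z.mean(axis=0)) / np.sqrt(Z.var(axis=0) + eps)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((128, 16))
    W = rng.standard_normal((16, 8))

    out1 = bn_layer(X, W)
    out2 = bn_layer(X, 3.7 * W)  # positively re-scaled weights
    ```

    The two outputs coincide (up to the `eps` regularizer), yet the Euclidean gradients with respect to `W` and `3.7 * W` differ by the scale factor, which is exactly why plain SGD treats these equivalent points differently and a quotient-manifold method is attractive.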

    Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation

    In this paper, we present an algorithmic study on how to surpass competitors in popularity by strategic promotions in social networks. We first propose a novel model, called the PA-IC model, in which we integrate the Preferential Attachment (PA) model for popularity growth with the Independent Cascade (IC) model for influence propagation in social networks. In PA-IC, a popular item and a novice item grab shares of popularity from the natural popularity growth via the PA model, while the novice item tries to gain extra popularity via influence cascade in a social network. The {\em popularity ratio} is defined as the ratio of the popularity measure between the novice item and the popular item. We formulate {\em Popularity Ratio Maximization (PRM)} as the problem of selecting seeds in multiple rounds to maximize the final popularity ratio. We analyze the popularity ratio and show that it is monotone but not submodular. To provide an effective solution, we devise a surrogate objective function and show that empirically it is very close to the original objective function, while theoretically it is monotone and submodular. We design two efficient algorithms, one for the overlapping-influence and non-overlapping-seeds (across rounds) setting and the other for the non-overlapping-influence and overlapping-seeds setting, and further discuss how to deal with other models and problem variants. Our empirical evaluation further demonstrates that the proposed PRM-IMM method consistently achieves the best popularity promotion compared to other methods. Our theoretical and empirical analyses shed light on the interplay between influence maximization and preferential attachment in social networks. Comment: 22 pages, 8 figures, to appear in SIGMOD 202
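    Monotonicity and submodularity of the surrogate are what make greedy seed selection effective. The sketch below shows the standard greedy routine with a toy coverage objective standing in for the paper's PA-IC surrogate; the names and the coverage sets are illustrative assumptions.

    ```python
    def greedy_seeds(candidates, objective, k):
        """Greedy maximization of a monotone submodular set function,
        which carries the classic (1 - 1/e) approximation guarantee."""
        S = set()
        for _ in range(k):
            # pick the candidate with the largest marginal gain
            best = max((c for c in candidates if c not in S),
                       key=lambda c: objective(S | {c}) - objective(S))
            S.add(best)
        return S

    # toy coverage objective (monotone and submodular) in place of the surrogate
    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
    coverage = lambda S: len(set().union(*(covers[v] for v in S))) if S else 0

    seeds = greedy_seeds(list(covers), coverage, k=2)
    ```

    Greedy first takes "a" (gain 3), then "c" (gain 2), covering all five elements. Because the true popularity ratio is not submodular, the paper's approach of running such a routine on a submodular surrogate, rather than on the ratio itself, is what restores the approximation guarantee.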