
    Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization

    The ResNet structure has achieved great empirical success since its debut. Recent work established the convergence of learning over-parameterized ResNet with a scaling factor $\tau = 1/L$ on the residual branch, where $L$ is the network depth. However, it is not clear how learning ResNet behaves for other values of $\tau$. In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of $\tau$. Specifically, hiding logarithmic factors and constant coefficients, we show that for $\tau \le 1/\sqrt{L}$ gradient descent is guaranteed to converge to the global minima, and in particular when $\tau \le 1/L$ the convergence is independent of the network depth. Conversely, we show that for $\tau > L^{-\frac{1}{2}+c}$, the forward output grows at least at rate $L^c$ in expectation, and learning then fails because of gradient explosion for large $L$. This means the bound $\tau \le 1/\sqrt{L}$ is sharp for learning ResNet with arbitrary depth. To the best of our knowledge, this is the first work that studies learning ResNet over the full range of $\tau$. Comment: 31 pages
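    The depth-scaling behavior described above is easy to see numerically. The sketch below (an illustration, not the paper's construction) propagates a random input through $L$ linear residual blocks $x_{l+1} = x_l + \tau\, W_l x_l$ with Gaussian $W_l$, where the squared output norm grows roughly like $(1+\tau^2)^L$; names and the omission of nonlinearities are simplifying assumptions.

    ```python
    import numpy as np

    def forward_norm(L, tau, n=256, seed=0):
        """Propagate a random input through L residual blocks
        x_{l+1} = x_l + tau * W_l x_l (entries of W_l ~ N(0, 1/n))
        and return the final squared norm relative to the input."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n) / np.sqrt(n)
        x0_sq = x @ x
        for _ in range(L):
            W = rng.standard_normal((n, n)) / np.sqrt(n)
            x = x + tau * (W @ x)
        return (x @ x) / x0_sq

    L = 64
    stable = forward_norm(L, tau=1.0 / L)             # tau = 1/L: depth-independent
    critical = forward_norm(L, tau=1.0 / np.sqrt(L))  # tau = 1/sqrt(L): O(1) growth
    unstable = forward_norm(L, tau=1.0)               # tau >> 1/sqrt(L): blow-up
    ```

    With $\tau = 1/L$ the relative norm stays near 1 regardless of depth, at $\tau = 1/\sqrt{L}$ it stays bounded by a constant, and above the threshold it explodes exponentially in $L$, matching the three regimes in the abstract.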

    Silicon photonic subsystem for broadband and RoF detection while enabling carrier reuse

    We experimentally validate a silicon photonic subsystem designed for passive optical networks (PONs) with carrier reuse. The subsystem is intended for future wavelength division multiplexed (WDM) PONs. It enables radio-over-fiber (RoF) signals to cohabit an assigned wavelength slot without perturbing the PON signal, while conserving carrier power for the uplink. A microring modulator remodulates the residual carrier for the RoF uplink. We successfully detected the dropped 8 GHz broadband signal and five 125 MHz radio-over-fiber signals. Two 125 MHz radio-over-fiber signals are remodulated onto the carrier. The uplink signal shows good performance, validating that the residual downlink signals have been well rejected by the microring filters. The subsystem conserves a clean carrier for remodulation with a good signal-to-carrier ratio.

    SiP-based SSBI cancellation for OFDM

    We propose for the first time to use a silicon photonics (SiP) solution for a passive optical network to both reduce signal-signal beat interference (SSBI) and recuperate part of the downlink carrier for use in the uplink. The Kramers-Kronig (KK) receiver for direct detection of advanced modulation formats overcomes SSBI at the cost of a moderate carrier-to-signal ratio (>6 dB) and high oversampling (4x). We propose an optical SSBI solution that achieves better performance than KK and requires only standard sampling and a low (3 dB) carrier-to-signal power ratio. The receiver is conceived for the downlink in passive optical networks, where the carrier must be husbanded for re-use in the uplink. Using cost-effective and power-efficient SiP, the receiver filters the incoming signal, suppresses SSBI, and routes a portion of the carrier for use in the uplink. We experimentally examine the SSBI suppression in this paper. While previous demonstrations used bulky, discrete components, we achieve significant Q-factor improvement with a simple SiP solution. We examine the optimal frequency offset between the carrier and the microring resonator center frequency. The robustness to frequency drift, as well as the impact of imperfect filtering, is discussed and quantified.
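    For reference, the KK receiver that the abstract compares against can be sketched in a few lines: it reconstructs the optical field from intensity-only samples under a minimum-phase assumption, recovering the phase as the Hilbert transform of the log-amplitude. The toy single-sideband signal below is illustrative only, not the paper's experimental setup.

    ```python
    import numpy as np

    def hilbert_transform(x):
        """Discrete Hilbert transform via FFT (assumes a periodic signal):
        zero out negative frequencies and take the imaginary part."""
        N = len(x)
        X = np.fft.fft(x)
        h = np.zeros(N)
        h[0] = 1.0          # keep DC
        h[1:N // 2] = 2.0   # double positive frequencies
        h[N // 2] = 1.0     # keep Nyquist bin
        return np.fft.ifft(X * h).imag

    def kk_receiver(intensity):
        """Kramers-Kronig field reconstruction from photodiode samples.
        Valid when the field is minimum-phase, i.e. a dominant carrier
        plus a single-sideband signal."""
        log_amp = 0.5 * np.log(intensity)
        phase = hilbert_transform(log_amp)
        return np.sqrt(intensity) * np.exp(1j * phase)

    # toy single-sideband signal: strong carrier + weak analytic tone
    N = 4096
    t = np.arange(N)
    carrier = 4.0
    s = 0.5 * np.exp(2j * np.pi * 200 * t / N)
    E = carrier + s
    E_hat = kk_receiver(np.abs(E) ** 2)  # direct detection, then KK
    ```

    Because the carrier dominates the sideband here (12 dB above it), the field is minimum-phase and `E_hat` matches `E` to numerical precision; the oversampling and carrier-to-signal penalties cited in the abstract arise when these margins shrink.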

    Lion: Adversarial Distillation of Proprietary Large Language Models

    The practice of transferring knowledge from a sophisticated, proprietary large language model (LLM) to a compact, open-source LLM has garnered considerable attention. Previous works have focused on unidirectional knowledge distillation, aligning the responses of the student model with those of the teacher model on a set of instructions. Nevertheless, they overlooked the possibility of incorporating reciprocal "feedback"--identifying challenging instructions where the student model's performance falls short--to boost the student model's proficiency iteratively. To this end, we propose a novel adversarial distillation framework for more efficient knowledge transfer. Leveraging the versatile role adaptability of LLMs, we prompt the teacher model to identify "hard" instructions and generate new "hard" instructions for the student model, creating a three-stage adversarial loop of imitation, discrimination, and generation. By applying this adversarial framework, we successfully transfer knowledge from ChatGPT to a student model (named Lion), using a mere 70k training data. Our results show that Lion-13B not only achieves open-ended generation capabilities comparable to ChatGPT but also surpasses conventional state-of-the-art (SOTA) instruction-tuned models like Vicuna-13B by 55.4% on challenging zero-shot reasoning benchmarks such as BIG-Bench Hard (BBH) and by 16.7% on AGIEval. Code and model can be found at https://github.com/YJiangcm/Lion. Comment: 21 pages, 5 figures, EMNLP 2023 main conference
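    The three-stage loop can be sketched schematically. The teacher/student interfaces below (`respond`, `score_gap`, `generate_like`, `finetune`) are hypothetical stand-ins for the paper's prompting and fine-tuning steps, not the released code.

    ```python
    def adversarial_distill(instructions, teacher, student, rounds=3, hard_frac=0.3):
        """One possible shape of the imitation/discrimination/generation loop."""
        pool = list(instructions)
        for _ in range(rounds):
            # 1. Imitation: the student is fine-tuned on teacher responses.
            demos = [(q, teacher.respond(q)) for q in pool]
            student.finetune(demos)
            # 2. Discrimination: the teacher scores the gap between its own
            #    answer and the student's; a large gap marks a "hard" instruction.
            scored = sorted(pool,
                            key=lambda q: -teacher.score_gap(q, student.respond(q)))
            hard = scored[: max(1, int(hard_frac * len(scored)))]
            # 3. Generation: the teacher writes new instructions in the
            #    style of the identified hard ones for the next round.
            pool = hard + [teacher.generate_like(q) for q in hard]
        return student
    ```

    The discrimination step is what makes the loop adversarial: the instruction pool is steadily reshaped toward the student's weaknesses instead of remaining a fixed distillation set.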

    Polarization-insensitive silicon microring modulator for single sideband modulation

    We propose and experimentally demonstrate a polarization-insensitive single sideband modulator based on silicon microring modulators (MRM). The proposed modulator splits and modulates the two orthogonal polarization states of an input laser in a loopback structure, with an on-chip silicon polarization splitter rotator (PSR), overcoming the polarization dependence of the silicon photonic modulator. The IQ configuration of the modulator enables single sideband modulation, thus improving the resistance of the modulated signal to chromatic dispersion and extending the transmission reach. The adoption of an MRM relieves the bandwidth limitation of polarization-diverse versions of SiP Mach-Zehnder modulators (MZM). Our experiments validate the proposed modulator's polarization insensitivity and transmission performance.

    Towards Accelerating Training of Batch Normalization: A Manifold Perspective

    Batch normalization (BN) has become a crucial component across diverse deep neural networks. A network with BN is invariant to positive linear re-scaling of its weights, which means there exist infinitely many functionally equivalent networks with different weight scales. However, optimizing these equivalent networks with first-order methods such as stochastic gradient descent converges to different local optima, owing to different gradients across training. To alleviate this, we propose a quotient manifold, the \emph{PSI manifold}, on which all equivalent weights of the network with BN are regarded as a single element. We then construct gradient descent and stochastic gradient descent on the PSI manifold. The two algorithms guarantee that every group of equivalent weights (related by positive re-scaling) converges to equivalent optima. Moreover, we give the convergence rate of the proposed algorithms on the PSI manifold and show that they accelerate training compared with algorithms on the Euclidean weight space. Empirical studies show that our algorithms consistently achieve better performance over various experimental settings.
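    The positive-scale invariance that motivates the quotient construction is easy to verify numerically. Below is a minimal sketch with a single linear layer followed by BN (without learnable affine parameters); the layer sizes and scale factor are arbitrary.

    ```python
    import numpy as np

    def bn_layer(X, W, eps=1e-5):
        """Linear layer followed by batch normalization over the batch axis
        (no learnable scale/shift, as in the invariance argument)."""
        Z = X @ W
        return (Z - Z.mean(axis=0)) / np.sqrt(Z.var(axis=0) + eps)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((128, 16))
    W = rng.standard_normal((16, 8))

    out1 = bn_layer(X, W)
    out2 = bn_layer(X, 3.7 * W)  # positively re-scaled weights
    ```

    The two outputs coincide (up to the `eps` regularizer), yet the Euclidean gradients with respect to `W` and `3.7 * W` differ by the scale factor, which is exactly why plain SGD treats these equivalent points differently and a quotient-manifold method is attractive.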

    Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation

    In this paper, we present an algorithmic study on how to surpass competitors in popularity by strategic promotions in social networks. We first propose a novel model, called the PA-IC model, in which we integrate the Preferential Attachment (PA) model for popularity growth with the Independent Cascade (IC) model for influence propagation in social networks. In PA-IC, a popular item and a novice item grab shares of popularity from the natural popularity growth via the PA model, while the novice item tries to gain extra popularity via influence cascade in a social network. The {\em popularity ratio} is defined as the ratio of the popularity measure between the novice item and the popular item. We formulate {\em Popularity Ratio Maximization (PRM)} as the problem of selecting seeds in multiple rounds to maximize the final popularity ratio. We analyze the popularity ratio and show that it is monotone but not submodular. To provide an effective solution, we devise a surrogate objective function and show that empirically it is very close to the original objective function, while theoretically it is monotone and submodular. We design two efficient algorithms, one for the overlapping-influence and non-overlapping-seeds (across rounds) setting and the other for the non-overlapping-influence and overlapping-seeds setting, and further discuss how to deal with other models and problem variants. Our empirical evaluation further demonstrates that the proposed PRM-IMM method consistently achieves the best popularity promotion compared to other methods. Our theoretical and empirical analyses shed light on the interplay between influence maximization and preferential attachment in social networks. Comment: 22 pages, 8 figures, to appear in SIGMOD 202
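    Monotonicity and submodularity of the surrogate are what make greedy seed selection effective. The sketch below shows the standard greedy routine with a toy coverage objective standing in for the paper's PA-IC surrogate; the names and the coverage sets are illustrative assumptions.

    ```python
    def greedy_seeds(candidates, objective, k):
        """Greedy maximization of a monotone submodular set function,
        which carries the classic (1 - 1/e) approximation guarantee."""
        S = set()
        for _ in range(k):
            # pick the candidate with the largest marginal gain
            best = max((c for c in candidates if c not in S),
                       key=lambda c: objective(S | {c}) - objective(S))
            S.add(best)
        return S

    # toy coverage objective (monotone and submodular) in place of the surrogate
    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
    coverage = lambda S: len(set().union(*(covers[v] for v in S))) if S else 0

    seeds = greedy_seeds(list(covers), coverage, k=2)
    ```

    Greedy first takes "a" (gain 3), then "c" (gain 2), covering all five elements. Because the true popularity ratio is not submodular, the paper's approach of running such a routine on a submodular surrogate, rather than on the ratio itself, is what restores the approximation guarantee.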