Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization
ResNet structure has achieved great empirical success since its debut. Recent
work established the convergence of learning over-parameterized ResNet with a
scaling factor on the residual branch where is the network
depth. However, it is not clear how learning ResNet behaves for other values of
. In this paper, we fully characterize the convergence theory of gradient
descent for learning over-parameterized ResNet with different values of .
Specifically, with hiding logarithmic factor and constant coefficients, we show
that for gradient descent is guaranteed to converge to the
global minima, and especially when the convergence is independent
of the network depth. Conversely, we show that for ,
the forward output grows at least with rate in expectation and then the
learning fails because of gradient explosion for large . This means the
bound is sharp for learning ResNet with arbitrary depth.
To the best of our knowledge, this is the first work that studies learning
ResNet with full range of . Comment: 31 pages
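The dependence of the forward output on the residual scaling factor can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's construction: the update rule `h + tau * relu(W h)`, the He-style Gaussian initialization, the width of 32, and the two values `tau = 1` and `tau = 1/depth` are illustrative choices. They are not the thresholds from the abstract, which are missing from this listing.

```python
import math
import random


def relu_residual_growth(depth, tau, width=32, seed=0):
    """Return ||h_depth|| / ||h_0|| for h_{l+1} = h_l + tau * relu(W_l h_l).

    Assumptions (not from the abstract): fresh Gaussian weights per layer
    with variance 2 / width, and a random Gaussian input vector.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm0 = math.sqrt(sum(v * v for v in h))
    std = math.sqrt(2.0 / width)
    for _ in range(depth):
        # Residual branch: one ReLU unit per output coordinate.
        branch = []
        for _ in range(width):
            pre = sum(rng.gauss(0.0, std) * v for v in h)
            branch.append(max(pre, 0.0))
        # Skip connection plus the scaled residual branch.
        h = [v + tau * b for v, b in zip(h, branch)]
    return math.sqrt(sum(v * v for v in h)) / norm0


depth = 16
growth_large = relu_residual_growth(depth, tau=1.0)
growth_small = relu_residual_growth(depth, tau=1.0 / depth)
print("tau = 1       :", growth_large)
print("tau = 1/depth :", growth_small)
```

With the unscaled branch the output norm compounds layer by layer, while the depth-scaled branch keeps it within a small constant factor of the input norm in this toy setting.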
Silicon photonic subsystem for broadband and RoF detection while enabling carrier reuse
We experimentally validate a silicon photonic subsystem designed for passive optical networks
with carrier reuse. The subsystem is intended for future wavelength division multiplexed
(WDM) PONs. It enables radio-over-fiber signals to cohabit an assigned wavelength slot without
perturbing the PON signal, while conserving carrier power for the uplink. A microring modulator
remodulates the residual carrier for the RoF uplink. We successfully detected the dropped
8 GHz broadband signal and five 125 MHz radio-over-fiber signals. Two 125 MHz radio-over-fiber
signals are remodulated onto the carrier. The uplink signal shows good performance,
validating that the residual downlink signals have been well rejected by the microring filters. The
subsystem conserves a clean carrier for remodulation with good signal-to-carrier ratio.
SiP-based SSBI cancellation for OFDM
We propose for the first time to use a silicon photonics (SiP) solution for a passive optical network to both reduce signal-signal beat interference (SSBI) and recuperate a part of the downlink carrier for use in the uplink. The Kramers-Kronig (KK) receiver for direct detection of advanced modulation formats overcomes SSBI at the cost of a moderate carrier to signal ratio (>6 dB) and high oversampling (4X). We propose an optical SSBI solution that achieves better performance than KK and requires only standard sampling and low (3 dB) carrier to signal power ratio. The receiver is conceived for the downlink in passive optical networks, where carrier signal must be husbanded for re-use in the uplink. Using cost effective and power efficient SiP, the receiver filters the incoming signal, suppresses SSBI, and routes a portion of the carrier for use in the uplink. We experimentally examine the SSBI suppression in this paper. While previous demonstrations used bulky, discrete components, we achieve significant Q-factor improvement with a simple SiP solution. We examine the optimal frequency offset between the carrier and the microring resonator center frequency. The robustness to frequency drift, as well as the impact of imperfect filtering, is discussed and quantified
Lion: Adversarial Distillation of Proprietary Large Language Models
The practice of transferring knowledge from a sophisticated, proprietary
large language model (LLM) to a compact, open-source LLM has garnered
considerable attention. Previous works have focused on a unidirectional
knowledge distillation approach, aligning the responses of the student model with
those of the teacher model on a set of instructions. Nevertheless, they
overlooked the possibility of incorporating any reciprocal
"feedback"--identifying challenging instructions where the student model's
performance falls short--to boost the student model's proficiency iteratively.
To this end, we propose a novel adversarial distillation framework for a more
efficient knowledge transfer. Leveraging the versatile role adaptability of
LLMs, we prompt the teacher model to identify "hard" instructions and generate
new "hard" instructions for the student model, creating a three-stage
adversarial loop of imitation, discrimination, and generation. By applying this
adversarial framework, we successfully transfer knowledge from ChatGPT to a
student model (named Lion), using only 70k training examples. Our results show
that Lion-13B not only achieves comparable open-ended generation capabilities
to ChatGPT but surpasses conventional state-of-the-art (SOTA) instruction-tuned
models like Vicuna-13B by 55.4% in challenging zero-shot reasoning benchmarks
such as BIG-Bench Hard (BBH) and 16.7% on AGIEval. Code and model can be found
at https://github.com/YJiangcm/Lion. Comment: 21 pages, 5 figures, EMNLP 2023 main conference
Polarization-insensitive silicon microring modulator for single sideband modulation
We propose and experimentally demonstrate a
polarization-insensitive single sideband modulator based on silicon microring modulators (MRM). The proposed modulator
splits and modulates the two orthogonal polarization states of
an input laser in a loopback structure, with an on-chip silicon
polarization splitter rotator (PSR), overcoming the polarization
dependence of the silicon photonic modulator. The IQ configuration of the modulator enables single sideband modulation, thus
improving the resistance of the modulated signal to chromatic
dispersion and extending the transmission reach. The adoption
of an MRM relieves the bandwidth limitation in polarization-diverse versions of SiP Mach-Zehnder modulators (MZM). Our
experiments validate the proposed modulator's polarization insensitivity and transmission performance.
Towards Accelerating Training of Batch Normalization: A Manifold Perspective
Batch normalization (BN) has become a crucial component across diverse deep
neural networks. A network with BN is invariant to positive linear
re-scaling of its weights, so there exist infinitely many functionally equivalent
networks with different weight scales. However, optimizing these equivalent
networks with a first-order method such as stochastic gradient descent will
converge to different local optima owing to different gradients across
training. To alleviate this, we propose a quotient manifold, the PSI
manifold, in which all the equivalent weights of the network with BN are
regarded as the same element. Then, gradient descent and stochastic
gradient descent on the PSI manifold are also constructed. The two algorithms
guarantee that every group of equivalent weights (caused by positive
re-scaling) converges to equivalent optima. Besides that, we give the
convergence rate of the proposed algorithms on the PSI manifold and justify that
they accelerate training compared with the algorithms on the Euclidean weight
space. Empirical studies show that our algorithms consistently achieve
better performance across various experimental settings.
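The scale invariance that motivates the quotient manifold can be checked numerically. The following is a minimal sketch, assuming a single linear unit followed by batch normalization without learnable scale or shift and with `eps = 0`. The helper names, batch, and weights are hypothetical, and the PSI manifold optimizers themselves are not shown.

```python
import math


def batch_norm(values, eps=0.0):
    """Normalize a batch of scalars to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def linear_bn(weights, batch):
    """Apply one linear unit to every sample, then batch-normalize."""
    pre = [sum(w * x for w, x in zip(weights, sample)) for sample in batch]
    return batch_norm(pre)


# Hypothetical toy batch (4 samples, 2 features) and weight vector.
batch = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
weights = [0.7, -1.2]
scaled = [5.0 * w for w in weights]  # positive re-scaling of the weights

out_a = linear_bn(weights, batch)
out_b = linear_bn(scaled, batch)
print(out_a)
print(out_b)  # same outputs up to floating-point error
```

With a nonzero `eps`, as used in practice for numerical stability, the invariance holds only approximately.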
HDA-LVIO: A High-Precision LiDAR-Visual-Inertial Odometry in Urban Environments with Hybrid Data Association
To enhance localization accuracy in urban environments, an innovative
LiDAR-Visual-Inertial odometry, named HDA-LVIO, is proposed by employing hybrid
data association. The proposed HDA-LVIO system can be divided into two
subsystems: the LiDAR-Inertial subsystem (LIS) and the Visual-Inertial
subsystem (VIS). In the LIS, the LiDAR point cloud is used to calculate the
Iterative Closest Point (ICP) error, which serves as the measurement of the Error
State Iterated Kalman Filter (ESIKF) to construct the global map. In the VIS,
an incremental method is first employed to adaptively extract planes from the
global map, and the centroids of these planes are projected onto the image to
obtain projection points. Then, feature points are extracted from the image and
tracked along with projection points using Lucas-Kanade (LK) optical flow.
Next, leveraging the vehicle states from previous intervals, sliding window
optimization is performed to estimate the depth of feature points.
Concurrently, a method based on epipolar geometric constraints is proposed to
address tracking failures for feature points, which can improve the accuracy of
depth estimation for feature points by ensuring sufficient parallax within the
sliding window. Subsequently, the feature points and projection points are
associated in a hybrid manner to construct the reprojection error, which serves as the measurement
of the ESIKF to estimate vehicle states. Finally, the localization accuracy
of the proposed HDA-LVIO is validated using public datasets and data from our
equipment. The results demonstrate that the proposed algorithm achieves
a clear improvement in localization accuracy compared to various existing
algorithms.
Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation
In this paper, we present an algorithmic study on how to surpass competitors
in popularity by strategic promotions in social networks. We first propose a
novel model, in which we integrate the Preferential Attachment (PA) model for
popularity growth with the Independent Cascade (IC) model for influence
propagation in social networks, called the PA-IC model. In PA-IC, a popular item and
a novice item grab shares of popularity from the natural popularity growth via
the PA model, while the novice item tries to gain extra popularity via
influence cascade in a social network. The popularity ratio is defined as
the ratio of the popularity measure between the novice item and the popular
item. We formulate Popularity Ratio Maximization (PRM) as the problem of
selecting seeds in multiple rounds to maximize the popularity ratio in the end.
We analyze the popularity ratio and show that it is monotone but not
submodular. To provide an effective solution, we devise a surrogate objective
function and show that empirically it is very close to the original objective
function while theoretically, it is monotone and submodular. We design two
efficient algorithms, one for the overlapping influence and non-overlapping
seeds (across rounds) setting and the other for the non-overlapping influence
and overlapping seed setting, and further discuss how to deal with other models
and problem variants. Our empirical evaluation further demonstrates that the
proposed PRM-IMM method consistently achieves the best popularity promotion
compared to other methods. Our theoretical and empirical analyses shed light on
the interplay between influence maximization and preferential attachment in
social networks. Comment: 22 pages, 8 figures, to appear in SIGMOD 202
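The Independent Cascade component on its own can be sketched as follows. This is a minimal illustration of one standard IC simulation, assuming a single uniform activation probability for every edge. The function name, graph, and probabilities are hypothetical, and the preferential attachment growth, multi-round seed selection, and PRM-IMM algorithm from the abstract are not modeled.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, prob, rng):
    """Simulate one Independent Cascade run and return the activated nodes.

    graph: dict mapping each node to a list of out-neighbors.
    prob:  activation probability applied to every edge (an assumption;
           IC models often use per-edge probabilities).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, []):
            # Each newly active node gets exactly one chance per neighbor.
            if neighbor not in active and rng.random() < prob:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


# Hypothetical toy network: a -> b -> c, with d isolated.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": []}
rng = random.Random(0)

spread_all = independent_cascade(graph, ["a"], prob=1.0, rng=rng)
spread_none = independent_cascade(graph, ["a"], prob=0.0, rng=rng)
print(sorted(spread_all))   # every node reachable from the seed
print(sorted(spread_none))  # only the seed itself
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected influence spread of a seed set.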