68 research outputs found
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Bilevel optimization has shown its utility across various machine learning
settings, yet most algorithms in practice require second-order information,
making it challenging to scale them up. Only recently has a paradigm of
first-order algorithms emerged that can effectively address bilevel
optimization problems. Nevertheless, the practical efficiency of this paradigm
remains unverified, particularly in the context of large language models
(LLMs). This paper introduces ScaleBiO, the first scalable instantiation of this
paradigm, focusing on bilevel optimization for large-scale LLM data
reweighting. Combined with a recently proposed memory-efficient training
technique called LISA, our novel algorithm allows the paradigm to scale to
34-billion-parameter LLMs on eight A40 GPUs, marking the first successful
application of bilevel optimization in practical scenarios for large LLMs.
Empirically, extensive experiments on data reweighting verify the
effectiveness of ScaleBiO for models of different scales, including GPT-2,
LLaMA-3-8B, GPT-NeoX-20B, and Yi-34B, where bilevel optimization succeeds in
filtering out irrelevant data samples and selecting informative ones.
Theoretically, ScaleBiO ensures the optimality of the learned data weights,
along with a convergence guarantee matching the conventional first-order
bilevel optimization paradigm on smooth and strongly convex objectives.
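The abstract does not give ScaleBiO's update rules or how LISA is integrated, so the following is only a toy sketch of the generic first-order (penalty-based) bilevel idea applied to data reweighting: the outer variables are per-source data weights, the inner problem trains a model on the weighted data, and every update uses first-order gradients only (no Hessians). The 1-D linear model, the softmax weight parameterization, the penalty coefficient `lam`, the step sizes, the helper names, and the two synthetic data sources are all hypothetical; LISA is not modeled.

```python
import math


def softmax(alpha):
    """Numerically stable softmax: maps unconstrained logits to data weights."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mse_and_grad(theta, xs, ys):
    """Mean squared error of the 1-D model y_hat = theta * x and its derivative in theta."""
    n = len(xs)
    loss = sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def penalty_reweight(sources, val, lam=10.0, lr=0.01, lr_inner=0.5, steps=3000):
    """Hypothetical first-order penalty sketch of bilevel data reweighting:

        min_{alpha, theta}  L_val(theta) + lam * (L_train(theta; w) - min_u L_train(u; w)),
        w = softmax(alpha),  L_train(theta; w) = sum_k w_k * L_k(theta).

    u is a second copy of the model that tracks the inner minimizer with plain
    gradient steps, so no Hessian-vector products are needed.
    """
    alpha = [0.0] * len(sources)
    theta, u = 0.0, 0.0
    for _ in range(steps):
        w = softmax(alpha)
        at_theta = [mse_and_grad(theta, xs, ys) for xs, ys in sources]
        at_u = [mse_and_grad(u, xs, ys) for xs, ys in sources]
        _, g_val = mse_and_grad(theta, *val)

        # theta: descend on validation loss plus the penalized weighted training loss.
        g_theta = g_val + lam * sum(wk * g for wk, (_, g) in zip(w, at_theta))
        # u: descend on the weighted training loss only (approximate inner argmin).
        g_u = sum(wk * g for wk, (_, g) in zip(w, at_u))
        # alpha: gradient of lam * sum_k w_k * (L_k(theta) - L_k(u)) through the
        # softmax, treating u as the inner minimizer (envelope theorem).
        gaps = [lt - lu for (lt, _), (lu, _) in zip(at_theta, at_u)]
        mean_gap = sum(wk * gk for wk, gk in zip(w, gaps))
        g_alpha = [lam * wk * (gk - mean_gap) for wk, gk in zip(w, gaps)]

        theta -= lr * g_theta
        u -= lr_inner * g_u
        alpha = [a - lr * ga for a, ga in zip(alpha, g_alpha)]
    return softmax(alpha), theta


# Hypothetical toy data: source 0 matches the validation task (y = 2x),
# source 1 is irrelevant to it (y = -3x).
xs = [i / 10 for i in range(-10, 11)]
sources = [(xs, [2 * x for x in xs]), (xs, [-3 * x for x in xs])]
val_x = [i / 7 for i in range(-7, 8)]
val = (val_x, [2 * x for x in val_x])

weights, theta = penalty_reweight(sources, val)
print(weights, theta)
```

In this toy setting the weight on the irrelevant source shrinks toward zero, which mirrors, at a very small scale, the "filtering irrelevant data samples" behavior the abstract reports.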
DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics
The Federated Learning (FL) paradigm is known to face challenges under
heterogeneous client data. Local training on non-IID data results in deflected
local optima, which cause the client models to drift further away from each
other and degrade the aggregated global model's performance. A
natural solution is to gather all client data onto the server, such that the
server has a global view of the entire data distribution. Unfortunately, this
reduces to regular training, which compromises clients' privacy and conflicts
with the purpose of FL. In this paper, we put forth an idea to collect and
leverage global knowledge on the server without hindering data privacy. We
unearth such knowledge from the dynamics of the global model's trajectory.
Specifically, we first reserve a short trajectory of global model snapshots on
the server. Then, we synthesize a small pseudo dataset such that the model
trained on it mimics the dynamics of the reserved global model trajectory.
Afterward, the synthesized data is used to help aggregate the deflected clients
into the global model. We name our method Dynafed, which enjoys the following
advantages: 1) it does not rely on any external on-server dataset, so it incurs
no additional data-collection cost; 2) the pseudo data can be synthesized
in early communication rounds, which enables Dynafed to take effect early for
boosting the convergence and stabilizing training; 3) the pseudo data only
needs to be synthesized once and can be directly utilized on the server to help
aggregation in subsequent rounds. Experiments across extensive benchmarks
showcase the effectiveness of Dynafed. We also provide insights into the
underlying mechanism of our method.
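The abstract does not give Dynafed's exact matching objective, so the following is only a toy sketch of the general trajectory-matching idea it describes: the server keeps a short trajectory of global-model snapshots and learns a tiny pseudo dataset such that a gradient step on that pseudo data reproduces each step of the trajectory. Assumptions (all hypothetical): a 1-D linear model, single-step rather than multi-step matching, a centrally simulated trajectory standing in for the aggregated global model, two learnable pseudo points, and the helper names below.

```python
def sq_loss_stats(xs, ys):
    """For y_hat = theta * x, mean((theta*x - y)^2) has gradient 2*(a*theta - b),
    with a = mean(x^2) and b = mean(x*y)."""
    n = len(xs)
    a = sum(x * x for x in xs) / n
    b = sum(x * y for x, y in zip(xs, ys)) / n
    return a, b


def reserve_trajectory(xs, ys, eta=0.5, snapshots=20):
    """Snapshots theta_0..theta_{T-1} of full-batch GD on (xs, ys); a hypothetical
    stand-in for the global-model trajectory reserved on the server."""
    a, b = sq_loss_stats(xs, ys)
    traj = [0.0]
    for _ in range(snapshots - 1):
        traj.append(traj[-1] - eta * 2 * (a * traj[-1] - b))
    return traj


def synthesize_pseudo_data(traj, m=2, eta=0.5, lr=0.2, iters=5000):
    """Learn m pseudo points so that one GD step on them from theta_t
    reproduces theta_{t+1} for every consecutive pair of snapshots."""
    pairs = list(zip(traj[:-1], traj[1:]))
    xs = [0.5 if j % 2 == 0 else -0.5 for j in range(m)]
    ys = [0.0] * m
    for _ in range(iters):
        a, b = sq_loss_stats(xs, ys)
        # Residual between the simulated step and the reserved next snapshot.
        res = [t - eta * 2 * (a * t - b) - t_next for t, t_next in pairs]
        # Gradients of the matching loss mean(res^2) w.r.t. the statistics a and b.
        d_a = sum(2 * r * (-2 * eta * t) for r, (t, _) in zip(res, pairs)) / len(pairs)
        d_b = sum(2 * r * (2 * eta) for r in res) / len(pairs)
        # Chain rule through a = mean(x^2) and b = mean(x * y); update x and y together.
        gx = [(d_a * 2 * x + d_b * y) / m for x, y in zip(xs, ys)]
        gy = [d_b * x / m for x in xs]
        xs = [x - lr * g for x, g in zip(xs, gx)]
        ys = [y - lr * g for y, g in zip(ys, gy)]
    a, b = sq_loss_stats(xs, ys)
    loss = sum((t - eta * 2 * (a * t - b) - t_next) ** 2 for t, t_next in pairs) / len(pairs)
    return xs, ys, loss


# Hypothetical private data (y = 2x); only the trajectory reaches the synthesizer.
private_x = [i / 10 for i in range(-10, 11)]
private_y = [2 * x for x in private_x]
traj = reserve_trajectory(private_x, private_y)
pseudo_x, pseudo_y, match_loss = synthesize_pseudo_data(traj)

# A fresh model trained only on the pseudo data heads toward the same solution.
a, b = sq_loss_stats(pseudo_x, pseudo_y)
theta = 0.0
for _ in range(100):
    theta -= 0.5 * 2 * (a * theta - b)
```

The synthesizer never sees `private_x` or `private_y`, only the snapshots, which is the privacy-preserving property the abstract emphasizes; the two pseudo points can then be reused in later rounds without re-synthesis.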
Li2NiO2F a new oxyfluoride disordered rocksalt cathode material
Lithium-rich disordered rocksalts such as Li1.3Nb0.3Mn0.4O2 and Li2MnO2F are being investigated as high-energy-density cathodes for next-generation Li-ion batteries. They can accommodate (de)lithiation over large compositional ranges while preserving the same overall structure. Here, we present a new Ni-rich oxyfluoride cathode, Li2NiO2F, with a disordered rocksalt structure. Li2NiO2F can deliver a discharge capacity of 200 mAh g−1 at an average voltage of 3.2 V.
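The quoted capacity and average voltage can be combined into a rough specific energy for the cathode material alone. This is a sketch under stated assumptions: it counts only the active material (no binder, conductive additive, current collector, or the rest of the cell) and assumes 3.2 V is the capacity-weighted average voltage over the discharge.

```latex
E_{\mathrm{sp}} \approx Q \,\bar{V}
  = 200~\mathrm{mAh\,g^{-1}} \times 3.2~\mathrm{V}
  = 640~\mathrm{mWh\,g^{-1}}
  = 640~\mathrm{Wh\,kg^{-1}}
```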
Corrigendum to: The TianQin project: current progress on science and technology
In the originally published version, this manuscript included an error related to indicating the corresponding author within the author list. This has now been corrected online to reflect the fact that author Jun Luo is the corresponding author of the article
High-strength, highly conductive and woven organic hydrogel fibers for flexible electronics
Sustainable MXene/PDA hydrogel with core-shell structure tailored for highly efficient solar evaporation and long-term desalination
Effective Bilevel Optimization via Minimax Reformulation
Bilevel optimization has found successful applications in various machine
learning problems, including hyper-parameter optimization, data cleaning, and
meta-learning. However, its huge computational cost presents a significant
challenge for its utilization in large-scale problems. This challenge arises
from the nested structure of the bilevel formulation, where each
hyper-gradient computation necessitates a costly inner optimization procedure.
To address this issue, we propose a reformulation of bilevel optimization as a
minimax problem, effectively decoupling the outer-inner dependency. Under mild
conditions, we show these two problems are equivalent. Furthermore, we
introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve
the resulting minimax problem with convergence guarantees. Extensive
experimental results demonstrate that our method outperforms state-of-the-art
bilevel methods while significantly reducing the computational cost.
Comment: Typos and intended inclusion of additional experiment
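The abstract does not write out the reformulation. One common way to express such a minimax reformulation, assumed here, introduces an auxiliary copy z of the inner variable: min over (x, y), max over z of f(x, y) + alpha * (g(x, y) - g(x, z)). Maximizing over z recovers -alpha * min_z g(x, z), so the bracket penalizes inner suboptimality of y without any nested inner loop. The toy outer loss f, inner loss g, the penalty schedule alpha in {1, 10, 100}, the step sizes, and the helper names are hypothetical; "multi-stage" is interpreted here as raising alpha between stages with a warm start.

```python
def phi_grads(x, y, z, alpha):
    """Partial derivatives of the toy minimax objective
        Phi(x, y, z) = f(x, y) + alpha * (g(x, y) - g(x, z)),
    with outer loss f(x, y) = (y - 3)^2 + 0.1 * x^2 and inner loss g(x, y) = (y - x)^2.
    """
    dx = 0.2 * x - 2 * alpha * (y - x) + 2 * alpha * (z - x)
    dy = 2 * (y - 3) + 2 * alpha * (y - x)
    dz = -2 * alpha * (z - x)
    return dx, dy, dz


def multistage_gda(alphas=(1.0, 10.0, 100.0), steps=3000):
    """Simultaneous GDA: descend on (x, y), ascend on z. Each stage raises alpha
    and warm-starts from the previous stage's iterate."""
    x = y = z = 0.0
    for alpha in alphas:
        lr = 0.5 / (1.0 + 2.0 * alpha)  # smaller steps as the objective stiffens
        for _ in range(steps):
            dx, dy, dz = phi_grads(x, y, z, alpha)
            x, y, z = x - lr * dx, y - lr * dy, z + lr * dz
    return x, y, z


x, y, z = multistage_gda()
# Bilevel reference: y*(x) = x, so the outer problem is min_x (x - 3)^2 + 0.1 x^2,
# whose solution is x = 3 / 1.1.
print(x, 3 / 1.1)
```

For a finite alpha the saddle point sits slightly off the exact bilevel solution (here x = 6 alpha / (0.2 + 2.2 alpha)), and raising alpha across stages shrinks that gap, which is one reason a staged schedule is attractive.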
Vitamin D alleviates hypoxia/reoxygenation-induced injury of human trophoblast HTR-8 cells by activating autophagy
Effect of Glycerol on an N-Vinylpyrrolidone-Based Photopolymer for Transmission Holography
Because N-vinylpyrrolidone (NVP) is a relatively large molecule, it diffuses with difficulty during holographic recording, especially at low spatial frequencies. We used glycerol to promote the diffusion of NVP and thereby improved the holographic performance of the photopolymer at low spatial frequencies. As the glycerol concentration increases, the holographic performance first improves and then plateaus. The optimal glycerol concentration is 0.21 mol/L. At this concentration, the maximum diffraction efficiency of the photopolymer is 84%, the refractive index modulation is 1.95 × 10−3, and the photosensitivity is 7.91 × 10−4 cm2/mJ. Compared with the control group, the maximum diffraction efficiency, maximum refractive index modulation, and photosensitivity at low spatial frequencies (800 lp/mm) increased by 11.19 times, 4.69 times, and 1.71 times, respectively. Using the optimized photopolymer for transmission holographic recording and reproduction, we obtained a clear and bright transmission hologram. The glycerol-modified photopolymer is expected to find applications in holography, diffractive optics, and related fields.
Effect of continuous equal channel angular pressing on microstructure and properties of Al-Ti-C alloy
The Al-Ti-C alloy was extruded in multiple consecutive passes by continuous equal channel angular pressing (ECAP). Based on the observed microstructure evolution, the grain refinement mechanism and the changes in mechanical properties are discussed. The results show that continuous ECAP effectively refines the microstructure of the Al-Ti-C alloy, reducing the grain size to about 1 μm. Deformation induction is the most important grain refinement mechanism during processing. The accumulation of high-density dislocations causes cracks at the interface between the Al matrix and TiAl3, as well as voids inside the TiAl3. The cracks further propagate through entire TiAl3 particles, ultimately refining the second-phase TiAl3 structure. At the same time, pinning and shearing by the fine second-phase TiAl3 promote refinement of the Al matrix. After one pass of continuous ECAP, the hardness of the alloy increases most markedly, rising 59.2% above that of the original state. As the number of extrusion passes increases, the rise in hardness slows, the plasticity of the alloy decreases, and the toughness increases.
- …
