68 research outputs found

    ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

    Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently has a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particularly in the context of large language models (LLMs). This paper introduces the first scalable instantiation of this paradigm called ScaleBiO, focusing on bilevel optimization for large-scale LLM data reweighting. Combined with a recently proposed memory-efficient training technique called LISA, our novel algorithm allows the paradigm to scale to 34-billion-parameter LLMs on eight A40 GPUs, marking the first successful application of bilevel optimization under practical scenarios for large-sized LLMs. Empirically, extensive experiments on data reweighting verify the effectiveness of ScaleBiO for models of different scales, including GPT-2, LLaMA-3-8B, GPT-NeoX-20B, and Yi-34B, where bilevel optimization succeeds in filtering irrelevant data samples and selecting informative samples. Theoretically, ScaleBiO ensures the optimality of the learned data weights, along with a convergence guarantee matching the conventional first-order bilevel optimization paradigm on smooth and strongly convex objectives.

    DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics

    The Federated Learning (FL) paradigm is known to face challenges under heterogeneous client data. Local training on non-iid distributed data results in deflected local optima, which causes the client models to drift further away from each other and degrades the aggregated global model's performance. A natural solution is to gather all client data onto the server, such that the server has a global view of the entire data distribution. Unfortunately, this reduces to regular training, which compromises clients' privacy and conflicts with the purpose of FL. In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy. We unearth such knowledge from the dynamics of the global model's trajectory. Specifically, we first reserve a short trajectory of global model snapshots on the server. Then, we synthesize a small pseudo dataset such that the model trained on it mimics the dynamics of the reserved global model trajectory. Afterward, the synthesized data is used to help aggregate the deflected clients into the global model. We name our method Dynafed, which enjoys the following advantages: 1) we do not rely on any external on-server dataset, which requires no additional cost for data collection; 2) the pseudo data can be synthesized in early communication rounds, which enables Dynafed to take effect early for boosting the convergence and stabilizing training; 3) the pseudo data only needs to be synthesized once and can be directly utilized on the server to help aggregation in subsequent rounds. Experiments across extensive benchmarks are conducted to showcase the effectiveness of Dynafed. We also provide insights and understanding of the underlying mechanism of our method.

    Li2NiO2F a new oxyfluoride disordered rocksalt cathode material

    Lithium-rich disordered rocksalts such as Li1.3Nb0.3Mn0.4O2 and Li2MnO2F are being investigated as high energy density cathodes for next generation Li-ion batteries. They can support the (de)lithiation of lithium ions over large compositional ranges while preserving the same overall structure. Here, we present a new Ni-rich oxyfluoride cathode, Li2NiO2F, with a disordered rocksalt structure. Li2NiO2F can deliver a discharge capacity of 200 mAh g⁻¹ at an average voltage of 3.2 V.

    Corrigendum to: The TianQin project: current progress on science and technology

    In the originally published version, this manuscript included an error related to indicating the corresponding author within the author list. This has now been corrected online to reflect the fact that author Jun Luo is the corresponding author of the article.

    Effective Bilevel Optimization via Minimax Reformulation

    Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.

    Effect of Glycerol on an N-Vinylpyrrolidone-Based Photopolymer for Transmission Holography

    N-vinylpyrrolidone (NVP) has a large molecular structure, so it diffuses with difficulty during holographic recording, especially at low spatial frequencies. We used glycerol to promote the diffusion of NVP, and successfully improved the holographic performance of the photopolymer at low spatial frequencies. As the concentration of glycerol increases, the holographic performance first increases and then remains stable. The optimal concentration of glycerol is 0.21 mol/L. At this concentration, the maximum diffraction efficiency of the photopolymer is 84%, the refractive index modulation is 1.95 × 10⁻³, and the photosensitivity is 7.91 × 10⁻⁴ cm²/mJ. Compared with the control group, the maximum diffraction efficiency, maximum refractive index modulation and photosensitivity at low spatial frequencies (800 lp/mm) have increased by 11.19 times, 4.69 times and 1.71 times, respectively. Using the optimized photopolymer for transmission holographic recording and reproduction, we have obtained a clear and bright transmission hologram. The photopolymer modified with glycerol is expected to be applied to the fields of holography, diffractive optics, and so on.

    Effect of continuous equal channel angular pressing on microstructure and properties of Al-Ti-C alloy

    The Al-Ti-C alloy was extruded in multiple passes in a continuous manner by continuous equal channel angular pressing process. Through observation of the microstructure evolution, the mechanism of grain refinement and changes in mechanical properties were discussed. The results show that continuous equal channel angular pressing process can effectively refine the microstructure of Al-Ti-C alloy, and the grain size is reduced to about 1 μm. The deformation induction is the most important grain refinement mechanism in the deformation process. The accumulation of high density dislocations causes cracks at the interface between the Al matrix and TiAl3 and voids inside the TiAl3. The cracks further propagate through the entire TiAl3 particles, ultimately leading to the refinement of the second phase TiAl3 structure. At the same time, the pinning mechanism and shearing mechanism of the fine second phase TiAl3 structure promote the refinement of the Al matrix. After one pass of continuous equal channel angular pressing, the hardness of the alloy increases most obviously, which is 59.2% higher than that of the original state. With the increase of the number of extrusion passes, the increasing trend of hardness slows down, the plasticity of the alloy decreases, and toughness increases.