38 research outputs found

    Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization

    Full text link
    Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during fine-tuning, the inherent size of pre-trained LLM weights remains a pressing concern. Even though quantization techniques are widely proposed to ease memory demands and accelerate LLM inference, most of these techniques are geared towards the deployment phase. To bridge this gap, this paper presents Parameter-Efficient and Quantization-aware Adaptation (PEQA) - a simple yet effective method that combines the advantages of PEFT with quantized LLMs. By updating solely the quantization scales, PEQA can be directly applied to quantized LLMs, ensuring seamless task transitions. Like existing PEFT methods, PEQA significantly reduces the memory overhead associated with the optimizer state. Furthermore, it leverages the advantages of quantization to substantially reduce model sizes. Even after fine-tuning, the quantization structure of a PEQA-tuned LLM remains intact, allowing for accelerated inference at the deployment stage. We employ PEQA-tuning for task-specific adaptation on LLMs with up to 65 billion parameters. To assess the logical reasoning and language comprehension of PEQA-tuned LLMs, we fine-tune low-bit quantized LLMs using an instruction dataset. Our results show that even when LLMs are quantized to below 4-bit precision, their capabilities in language modeling, few-shot in-context learning, and comprehension can be resiliently restored to (or even improved over) their original full-precision performance with PEQA. Comment: Published at NeurIPS 2023. Camera-ready version.
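    The abstract describes fine-tuning a quantized LLM by updating only its quantization scales while the integer weights stay frozen. Below is a minimal PyTorch-style sketch of that idea; the class name, tensor shapes, and de-quantization formula are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PEQALinear(nn.Module):
    """Illustrative linear layer in the spirit of PEQA: sub-4-bit integer weights
    and zero-points stay frozen; only the per-channel scales are trained."""

    def __init__(self, w_int: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor):
        super().__init__()
        # Frozen quantized weights (e.g. 4-bit codes in [0, 15]) and zero-points.
        self.register_buffer("w_int", w_int)
        self.register_buffer("zero", zero)
        # Per-output-channel scales are the only trainable parameters.
        self.scale = nn.Parameter(scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # De-quantize on the fly: W ~= scale * (w_int - zero), channel-wise.
        w = self.scale.unsqueeze(1) * (self.w_int.float() - self.zero.unsqueeze(1))
        return x @ w.t()

# Usage: only `scale` reaches the optimizer, so the optimizer state stays tiny.
layer = PEQALinear(
    w_int=torch.randint(0, 16, (1024, 1024)),      # frozen 4-bit codes
    scale=torch.full((1024,), 0.01),
    zero=torch.full((1024,), 7.5),
)
opt = torch.optim.AdamW([p for p in layer.parameters() if p.requires_grad], lr=1e-4)
```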

    AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

    Full text link
    There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression has not been thoroughly explored yet. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning only some parts of the quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are frozen for all tasks, while the scaling factors are fine-tuned for the downstream task. We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving a >10x compression ratio under 4-bit quantization and a >1,000x reduction in the number of trainable parameters. Comment: Findings of EMNLP 2022.
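    The abstract describes binary-coding quantization: a weight matrix is factorized into frozen binary codes and trainable scaling factors. The PyTorch sketch below illustrates that split; the layer name, tensor shapes, and per-channel handling of the scaling factors are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class AlphaTunedLinear(nn.Module):
    """Illustrative layer in the spirit of AlphaTuning: W ~= sum_k alpha_k * B_k,
    with the binary codes B_k frozen and only the scaling factors alpha_k
    fine-tuned for the downstream task."""

    def __init__(self, binary_codes: torch.Tensor, alphas: torch.Tensor):
        super().__init__()
        # binary_codes: (num_bits, out_features, in_features), values in {-1, +1}, frozen.
        self.register_buffer("binary_codes", binary_codes)
        # alphas: (num_bits, out_features) scaling factors, trainable.
        self.alphas = nn.Parameter(alphas)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the weight from its binary expansion, then apply it.
        w = (self.alphas.unsqueeze(-1) * self.binary_codes).sum(dim=0)
        return x @ w.t()

codes = torch.randint(0, 2, (3, 512, 512)).float() * 2 - 1   # 3-bit binary codes in {-1, +1}
layer = AlphaTunedLinear(codes, alphas=torch.full((3, 512), 0.02))
print(sum(p.numel() for p in layer.parameters()))            # only the alphas are trainable
```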

    Structural optimization in magnetic fields using the homogenization design method.

    Full text link
    The purpose of this dissertation is to study the optimal topology design of structures in magnetic fields using the homogenization design method. The applications are classified into two parts: frequency response optimization of a structure which is excited by magnetic forces, and magnetic energy optimization of a structure to maximize the magnetic energy/vector potential. For the topology optimization of a structure using the homogenization design method, the accuracy of the finite element analysis is important since the homogenization design method is based on the results of the finite element analysis. A new hexahedral eight-node element is formulated based on the displacement method to overcome shear and volumetric locking for three-dimensional elastic analysis. Another hexahedral eight-node element is formulated to perform a simple, static electromagnetic analysis. The examples verify that these formulations are effective for elastic structural analysis and magnetic field analysis. The topology optimization of a structure excited by magnetic forces is an important issue in minimizing the vibration/noise level of an electric machine. In this dissertation, the magnetic force is computed using the Maxwell stress method based on the finite element analysis of magnetic fields. The optimization problem is formulated to minimize the frequency response based on the homogenization design method. The examples show that this method successfully decreases the vibration level of a structure excited by magnetic forces. To improve the performance of electric machinery, it is necessary to obtain an optimal topology of a structure in magnetic fields that maximizes the magnetic energy. In this dissertation, a design process is formulated to achieve this goal based on the homogenization design methodology. The application is possible not only for simple linear cases but also for nonlinear cases when the saturation effect is considered. The examples show that the homogenization design method can be extended to obtain the optimal topology of a structure in magnetic fields considering magnetic energy.
    PhD. Applied Sciences; Electrical Engineering; Electromagnetics; Mechanical Engineering; Pure Sciences. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/132042/2/9938575.pdf
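    One step the abstract relies on is computing magnetic forces with the Maxwell stress method from a finite element field solution. The NumPy sketch below evaluates the standard Maxwell stress traction t = (1/mu0)((B.n)B - 0.5|B|^2 n) on a surface in air and integrates it to a net force; the discretization and inputs are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # permeability of free space [H/m]

def maxwell_stress_traction(B: np.ndarray, n: np.ndarray) -> np.ndarray:
    """Surface traction t = (1/mu0) * ((B.n) B - 0.5 |B|^2 n) for points in air.

    B, n: (num_points, 3) flux density [T] and outward unit normals on the surface."""
    bn = np.sum(B * n, axis=1, keepdims=True)                       # normal component B.n
    return (bn * B - 0.5 * np.sum(B * B, axis=1, keepdims=True) * n) / MU0

def total_force(B: np.ndarray, n: np.ndarray, areas: np.ndarray) -> np.ndarray:
    """Integrate the traction over the surface patches to get the net magnetic force [N]."""
    return np.sum(maxwell_stress_traction(B, n) * areas[:, None], axis=0)

# Toy usage: a uniform 1 T field crossing a 1 m^2 patch normally.
B = np.array([[0.0, 0.0, 1.0]])
n = np.array([[0.0, 0.0, 1.0]])
print(total_force(B, n, np.array([1.0])))   # ~[0, 0, 3.98e5] N
```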

    Shape Design of the Surface Mounted Permanent Magnet in a Synchronous Machine

    No full text

    Magnetic Actuator Design Using Level Set Based Topology Optimization

    Get PDF
    This paper presents a novel design methodology for the optimum structural design of magnetic actuators using a level set-based topology optimization method, in which the level set method can represent the precise boundary shape of a structure and also handle complex topological changes during the optimization process. The distribution of ferromagnetic material is represented by introducing a level set function into the definition of the magnetic reluctivity. The optimization problem is defined to obtain optimal configurations that maximize the magnetic energy of actuators under a minimum bound of total volume. The movement of the implicit moving boundaries of the structure is driven by a transformation of design sensitivities of the objective and the constraints into speed functions that govern the level set propagation. The proposed method is applied to the structural design of magnetic actuators, and is confirmed to be useful for achieving optimal configurations that deliver higher-performance and lighter-weight designs.
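    The method couples two ingredients: the magnetic reluctivity is defined through the level set function, and the level set is advected by a speed field derived from the design sensitivities. The 2-D NumPy sketch below shows both pieces under simplifying assumptions (smoothed Heaviside interpolation, explicit Hamilton-Jacobi update, a hand-picked speed field); it is not the paper's implementation.

```python
import numpy as np

def reluctivity(phi, nu_iron=1.0 / (4e-7 * np.pi * 1000), nu_air=1.0 / (4e-7 * np.pi), eps=0.5):
    """Interpolate reluctivity via a smoothed Heaviside of the level set function:
    phi > 0 -> ferromagnetic material (low reluctivity), phi < 0 -> air."""
    h = 0.5 * (1.0 + np.tanh(phi / eps))          # smoothed Heaviside
    return nu_air + (nu_iron - nu_air) * h

def advect_level_set(phi, speed, dt=0.1, dx=1.0):
    """One explicit Hamilton-Jacobi step: phi <- phi - dt * speed * |grad phi|.
    `speed` stands in for the design velocity obtained from the sensitivities
    of the objective (magnetic energy) and the volume constraint."""
    gx, gy = np.gradient(phi, dx)
    return phi - dt * speed * np.sqrt(gx**2 + gy**2)

# Toy usage: a circular ferromagnetic inclusion shrinking under a uniform positive speed.
xs = np.linspace(-1, 1, 64)
dx = xs[1] - xs[0]
x, y = np.meshgrid(xs, xs)
phi = 0.5 - np.sqrt(x**2 + y**2)                   # phi > 0 inside the circle
for _ in range(10):
    phi = advect_level_set(phi, speed=np.full_like(phi, 0.05), dx=dx)
print(float((phi > 0).mean()))                     # material volume fraction after 10 steps
```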
