Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Large language models (LLMs) face challenges in fine-tuning and
deployment due to their high memory demands and computational costs. While
parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage
of the optimizer state during fine-tuning, the inherent size of pre-trained LLM
weights continues to be a pressing concern. Even though quantization techniques
are widely proposed to ease memory demands and accelerate LLM inference, most
of these techniques are geared towards the deployment phase. To bridge this
gap, this paper presents Parameter-Efficient and Quantization-aware Adaptation
(PEQA) - a simple yet effective method that combines the advantages of PEFT
with quantized LLMs. By updating solely the quantization scales, PEQA can be
directly applied to quantized LLMs, ensuring seamless task transitions.
Parallel to existing PEFT methods, PEQA significantly reduces the memory
overhead associated with the optimizer state. Furthermore, it leverages the
advantages of quantization to substantially reduce model sizes. Even after
fine-tuning, the quantization structure of a PEQA-tuned LLM remains intact,
allowing for accelerated inference at the deployment stage. We employ
PEQA-tuning for task-specific adaptation on LLMs with up to 65 billion
parameters. To assess the logical reasoning and language comprehension of
PEQA-tuned LLMs, we fine-tune low-bit quantized LLMs using an instruction
dataset. Our results show that even when LLMs are quantized to below 4-bit
precision, their capabilities in language modeling, few-shot in-context
learning, and comprehension can be resiliently restored to (or even improved
over) their original full-precision performance with PEQA.
Comment: Published at NeurIPS 2023. Camera-ready version.
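The core idea of updating only the quantization scales while the integer weights stay frozen can be sketched as follows. This is a toy NumPy illustration of the mechanism, not the authors' implementation: the layer sizes, the regression objective, and the learning-rate choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy full-precision pre-trained weight matrix.
W = rng.normal(size=(4, 8))

# Per-row (per-channel) symmetric quantization to 3-bit integers.
bits = 3
qmax = 2 ** (bits - 1) - 1                       # 3 for 3-bit signed
s = np.abs(W).max(axis=1, keepdims=True) / qmax  # initial scales, one per row
W_int = np.round(W / s).astype(np.int8)          # frozen integer weights
W_int_frozen = W_int.copy()

# A toy downstream objective: match a target output for one input.
x = rng.normal(size=(8,))
target = rng.normal(size=(4,))
z = W_int.astype(np.float64) @ x                 # fixed: the ints never change

def loss(scales):
    y = scales[:, 0] * z                         # dequantized layer output
    return 0.5 * np.sum((y - target) ** 2)

# PEQA-style adaptation: gradient descent on the scales only.
lr = 1.0 / (np.max(z ** 2) + 1.0)                # step size chosen for stability
loss_hist = [loss(s)]
for _ in range(500):
    err = s[:, 0] * z - target
    s[:, 0] -= lr * err * z                      # dL/ds_i = err_i * z_i
    loss_hist.append(loss(s))
```

Only the per-row scale parameters receive gradients, which is where the optimizer-state savings come from, and the integer tensor is reused unchanged at deployment.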
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
There is growing interest in adapting large-scale language models using
parameter-efficient fine-tuning methods. However, accelerating the model itself
and achieving better inference efficiency through model compression has not
been thoroughly explored yet. Model compression could provide the benefits of
reducing memory footprints, enabling low-precision computations, and ultimately
achieving cost-effective inference. To combine parameter-efficient adaptation
and model compression, we propose AlphaTuning, which consists of post-training
quantization of the pre-trained language model and fine-tuning only some parts
of quantized parameters for a target task. Specifically, AlphaTuning works by
employing binary-coding quantization, which factorizes the full-precision
parameters into binary parameters and a separate set of scaling factors. During
the adaptation phase, the binary values are frozen for all tasks, while the
scaling factors are fine-tuned for the downstream task. We demonstrate that
AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full
fine-tuning on a variety of downstream tasks while achieving >10x compression
ratio under 4-bit quantization and >1,000x reduction in the number of trainable
parameters.
Comment: Findings of EMNLP 202
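The binary-coding quantization described above, which factorizes the weights into binary parameters and scaling factors, can be sketched with a minimal greedy variant, followed by the adaptation step that freezes the binary codes and tunes only the scales. The matrix sizes and the regression objective below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))          # toy pre-trained weights

# Greedy binary-coding quantization: W ≈ sum_k alpha_k * B_k,
# with B_k in {-1, +1} and alpha_k a per-row scaling factor.
num_bits = 3
alphas, Bs, res_sq = [], [], []
R = W.copy()
for _ in range(num_bits):
    B = np.where(R >= 0, 1.0, -1.0)
    a = np.abs(R).mean(axis=1, keepdims=True)
    alphas.append(a)
    Bs.append(B)
    R = R - a * B                    # residual shrinks with each added bit
    res_sq.append(np.sum(R ** 2))

# Adaptation phase: binary codes frozen; fine-tune only the scaling factors.
x = rng.normal(size=(8,))
target = rng.normal(size=(4,))
Z = np.stack([B @ x for B in Bs])                # (num_bits, 4), fixed
A = np.stack([a[:, 0] for a in alphas])          # trainable scales

def loss(A):
    y = (A * Z).sum(axis=0)
    return 0.5 * np.sum((y - target) ** 2)

lr = 1.0 / (np.max((Z ** 2).sum(axis=0)) + 1.0)  # step size chosen for stability
loss_hist = [loss(A)]
for _ in range(500):
    err = (A * Z).sum(axis=0) - target
    A -= lr * err[None, :] * Z                   # dL/dA_{k,i} = err_i * Z_{k,i}
    loss_hist.append(loss(A))
```

The trainable parameters per task are just the `num_bits × rows` scaling factors, which is the source of the very large reduction in trainable-parameter count the abstract reports.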
Structural optimization in magnetic fields using the homogenization design method.
This dissertation studies the optimal topology design of structures in magnetic fields using the homogenization design method. The applications fall into two parts: frequency response optimization of a structure excited by magnetic forces, and magnetic energy optimization of a structure to maximize the magnetic energy/vector potential.

For the topology optimization of a structure using the homogenization design method, the accuracy of the finite element analysis is important, since the homogenization design method is based on the results of that analysis. A new hexahedral eight-node element is formulated based on the displacement method to overcome shear and volumetric locking in three-dimensional elastic analysis. Another hexahedral eight-node element is formulated to perform a simple static electromagnetic analysis. The examples verify that these formulations are effective for elastic structural analysis and magnetic field analysis.

The topology optimization of a structure excited by magnetic forces is important for minimizing the vibration/noise level of an electric machine. In this dissertation, the magnetic force is computed using the Maxwell stress method based on the finite element analysis of magnetic fields. The optimization problem is formulated to minimize the frequency response based on the homogenization design method. The examples show that this method successfully decreases the vibration level of a structure excited by magnetic forces.

To improve the performance of electric machinery, it is necessary to obtain an optimal topology of a structure in magnetic fields that maximizes the magnetic energy. In this dissertation, a design process is formulated to achieve this goal based on the homogenization design methodology. The approach applies not only to simple linear cases but also to nonlinear cases where the saturation effect is considered.
The examples show that the homogenization design method can be extended to obtain the optimal topology of a structure in magnetic fields considering magnetic energy.
PhD, Applied Sciences; Electrical Engineering; Electromagnetics; Mechanical Engineering; Pure Sciences. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/132042/2/9938575.pd
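For reference, the Maxwell stress method mentioned above computes the magnetic force from the finite element field solution by integrating the Maxwell stress tensor over a surface enclosing the body. The standard textbook form (not a formula reproduced from the dissertation) is:

```latex
T_{ij} = \frac{1}{\mu_0}\left( B_i B_j - \frac{1}{2}\,\delta_{ij}\,|\mathbf{B}|^2 \right),
\qquad
F_i = \oint_S T_{ij}\, n_j \,\mathrm{d}S,
```

where $\mathbf{B}$ is the magnetic flux density from the finite element analysis, $\mu_0$ the vacuum permeability, and $\mathbf{n}$ the outward normal of the integration surface $S$.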
Magnetic Actuator Design Using Level Set Based Topology Optimization
This paper presents a novel design methodology for optimum structural design of magnetic actuators using a level set based topology optimization method, where the level set method can represent the precise boundary shape of a structure and also deal with complex topological changes during the optimization process. The distribution of ferromagnetic material is represented by introducing a level set function into the definition of the magnetic reluctivity. The optimization problem is defined to obtain optimal configurations that maximize the magnetic energy of actuators under a minimum bound of total volume. The movement of the implicit moving boundaries of the structure is driven by a transformation of design sensitivities of the objective and the constraints into speed functions that govern the level set propagation. The proposed method is applied to the structural design of magnetic actuators, and is confirmed to be useful for achieving optimal configurations that deliver higher performance and lighter-weight designs.
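The level set propagation described above, in which a speed function advects the implicit boundary via the Hamilton-Jacobi update φ ← φ − Δt·V·|∇φ|, can be illustrated with a toy 2D sketch. Here a hypothetical uniform speed V stands in for the paper's sensitivity-derived speed function, and the grid and initial shape are assumptions.

```python
import numpy as np

# 2D grid; the material occupies the region where phi < 0.
n = 64
xs = np.linspace(-1.0, 1.0, n)
h = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)

# Initial structure: a disc of radius 0.5 (phi is a signed distance function).
phi = np.sqrt(X ** 2 + Y ** 2) - 0.5
frac_before = np.mean(phi < 0)        # initial material fraction

# Speed function: in the actual method this is built from the design
# sensitivities of the objective and constraints; a uniform positive
# speed simply moves the boundary outward, growing the shape.
V = 1.0
dt = 0.5 * h                          # CFL-respecting time step
for _ in range(10):
    gx, gy = np.gradient(phi, h)      # central-difference gradient of phi
    phi = phi - dt * V * np.sqrt(gx ** 2 + gy ** 2)

frac_after = np.mean(phi < 0)         # boundary has moved outward
```

In the actual optimization the sign and magnitude of V vary over the boundary, so material is added where it increases the magnetic energy and removed where the volume constraint binds, rather than growing uniformly as in this sketch.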