QuIP: 2-Bit Quantization of Large Language Models With Guarantees
This work studies post-training parameter quantization in large language
models (LLMs). We introduce quantization with incoherence processing (QuIP), a
new method based on the insight that quantization benefits from incoherent
weight and Hessian matrices, i.e., from the weights being
even in magnitude and the directions in which it is important to round them
accurately being unaligned with the coordinate axes. QuIP consists of two
steps: (1) an adaptive rounding procedure minimizing a quadratic proxy
objective; (2) efficient pre- and post-processing that ensures weight and
Hessian incoherence via multiplication by random orthogonal matrices. We
complement QuIP with the first theoretical analysis for an LLM-scale
quantization algorithm, and show that our theory also applies to an existing
method, OPTQ. Empirically, we find that our incoherence preprocessing improves
several existing quantization algorithms and yields the first LLM quantization
methods that produce viable results using only two bits per weight. Our code
can be found at https://github.com/Cornell-RelaxML/QuIP
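
As a rough illustration of step (2) above, the following minimal sketch conjugates a weight matrix by random orthogonal matrices, quantizes it, and rotates back. It is not the authors' implementation: the function names are hypothetical, plain nearest rounding on a uniform grid stands in for the adaptive rounding of step (1), and the Hessian is not modeled at all.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix the column signs so the factorization is unique.
    return q * np.sign(np.diag(r))

def quantize_nearest(w, bits=2):
    """Round to the nearest point of a uniform grid with 2**bits levels.

    Assumption: this simple rounding replaces the adaptive rounding
    procedure described in the abstract.
    """
    levels = 2**bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels
    if scale == 0:
        return w.copy()
    return np.round((w - lo) / scale) * scale + lo

def incoherent_quantize(w, bits, rng):
    """Rotate, quantize, and rotate back (hypothetical helper)."""
    m, n = w.shape
    u = random_orthogonal(m, rng)
    v = random_orthogonal(n, rng)
    w_rot = u @ w @ v.T                    # pre-processing: spread out large entries
    w_hat_rot = quantize_nearest(w_rot, bits)
    return u.T @ w_hat_rot @ v             # post-processing: undo the rotation

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))
w_hat = incoherent_quantize(w, bits=2, rng=rng)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Because the rotations are orthogonal, the Frobenius-norm error made in the rotated basis equals the error after rotating back, which is why the quantization can be carried out entirely in the rotated coordinates.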
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
We propose a memory-efficient finetuning algorithm for large language models
(LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit
precision on as little as one 48GB GPU. Our method, modular low-rank adaptation
(ModuLoRA), integrates any user-specified weight quantizer with finetuning via
low-rank adapters (LoRAs). Our approach relies on a simple
quantization-agnostic backward pass that adaptively materializes low-precision
LLM weights from a custom black-box quantization module. This approach enables
finetuning 3-bit LLMs for the first time--leveraging state-of-the-art 3-bit
OPTQ quantization often outperforms finetuning that relies on less
sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains
competitive performance on text classification, natural language inference, and
instruction following tasks using significantly less memory than existing
approaches, and we also surpass the state-of-the-art ROUGE score on a popular
summarization task. We release ModuLoRA together with a series of low-precision
models--including the first family of 3-bit instruction following Alpaca
LLMs--as part of LLMTOOLS, a user-friendly library for quantizing, running, and
finetuning LLMs on consumer GPUs.
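
The idea of combining a black-box quantizer with low-rank adapters can be sketched as follows. This is a toy NumPy illustration under stated assumptions, not the LLMTOOLS API: `ToyQuantizer` and `QuantLoRALinear` are hypothetical names, the quantizer is a simple uniform grid rather than OPTQ, and gradients are written by hand. The point it shows is that the frozen base weight is stored only as low-bit codes and is materialized temporarily in both the forward and the backward pass, while only the adapter matrices receive gradients.

```python
import numpy as np

class ToyQuantizer:
    """Hypothetical black-box quantizer: integer codes plus scale and offset."""
    def __init__(self, w, bits=3):
        levels = 2**bits - 1
        self.lo = float(w.min())
        span = float(w.max()) - self.lo
        self.scale = span / levels if span > 0 else 1.0
        self.codes = np.round((w - self.lo) / self.scale).astype(np.uint8)

    def dequantize(self):
        # Materialize a floating-point copy of the weights on demand.
        return self.codes.astype(np.float32) * self.scale + self.lo

class QuantLoRALinear:
    """Linear layer y = x W^T + (x A^T) B^T with W frozen and quantized."""
    def __init__(self, w, rank, bits, rng):
        out_dim, in_dim = w.shape
        self.q = ToyQuantizer(w, bits)
        self.a = (0.01 * rng.standard_normal((rank, in_dim))).astype(np.float32)
        self.b = np.zeros((out_dim, rank), dtype=np.float32)

    def forward(self, x):
        w = self.q.dequantize()            # temporary; not kept after the call
        return x @ w.T + (x @ self.a.T) @ self.b.T

    def backward(self, x, grad_y):
        w = self.q.dequantize()            # re-materialized for the backward pass
        grad_h = grad_y @ self.b           # gradient w.r.t. the low-rank activations
        grad_b = grad_y.T @ (x @ self.a.T)
        grad_a = grad_h.T @ x
        grad_x = grad_y @ w + grad_h @ self.a
        return grad_x, grad_a, grad_b      # no gradient for the quantized weights

rng = np.random.default_rng(0)
layer = QuantLoRALinear(rng.standard_normal((6, 10)), rank=2, bits=3, rng=rng)
x = rng.standard_normal((4, 10))
y = layer.forward(x)
grad_x, grad_a, grad_b = layer.backward(x, np.ones_like(y))
print(y.shape, grad_a.shape, grad_b.shape)
```

Since the layer only ever calls `dequantize()`, any quantizer exposing that method could be swapped in, which is the sense in which the sketch is quantization-agnostic.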
Derived categories of Burniat surfaces and exceptional collections
We construct an exceptional collection of maximal possible length
6 on any of the Burniat surfaces with $K_X^2=6$, a 4-dimensional family of
surfaces of general type with $p_g=q=0$. We also calculate the DG algebra of
endomorphisms of this collection and show that the subcategory generated by
this collection is the same for all Burniat surfaces.
The semiorthogonal complement $\mathcal{A}$ of this collection is an "almost
phantom" category: it has trivial Hochschild homology, and
$K_0(\mathcal{A})=\mathbb{Z}_2^6$. Comment: 15 pages, 1 figure; further remarks expanded
Photodisintegration of $^4$He into p+t
The two-body photodisintegration of $^4$He into a proton and a triton has
been studied using the CEBAF Large-Acceptance Spectrometer (CLAS) at Jefferson
Laboratory. Real photons produced with the Hall-B bremsstrahlung-tagging system
in the energy range from 0.35 to 1.55 GeV were incident on a liquid $^4$He
target. This is the first measurement of the photodisintegration of $^4$He
above 0.4 GeV. The differential cross sections for the $\gamma\,^4\mathrm{He} \to p\,t$
reaction have been measured as a function of photon-beam energy and
proton-scattering angle, and are compared with the latest model calculations by
J.-M. Laget. At 0.6-1.2 GeV, our data are in good agreement only with the
calculations that include three-body mechanisms, thus confirming their
importance. These results reinforce the conclusion of our previous study of the
three-body breakup of $^3$He that demonstrated the great importance of
three-body mechanisms in the energy region 0.5-0.8 GeV. Comment: 13 pages submitted in one tgz file containing 2 tex files and 22
postscript figures
Differential cross sections and spin density matrix elements for the reaction gamma p -> p omega
High-statistics differential cross sections and spin density matrix elements
for the reaction gamma p -> p omega have been measured using the CLAS at
Jefferson Lab for center-of-mass (CM) energies from threshold up to 2.84 GeV.
Results are reported in 112 10-MeV wide CM energy bins, each subdivided into
cos(theta_CM) bins of width 0.1. These are the most precise and extensive omega
photoproduction measurements to date. A number of prominent structures are
clearly present in the data. Many of these have not previously been observed
due to limited statistics in earlier measurements.
Exclusive $\rho^0$ electroproduction on the proton at CLAS
The $e p \to e' p \rho^0$ reaction has been measured, using the 5.754
GeV electron beam of Jefferson Lab and the CLAS detector. This represents the
largest ever set of data for this reaction in the valence region. Integrated
and differential cross sections are presented. The $W$, $Q^2$ and $t$
dependences of the cross section are compared to theoretical calculations based
on $t$-channel meson-exchange Regge theory on the one hand and on quark handbag
diagrams related to Generalized Parton Distributions (GPDs) on the other hand.
The Regge approach can describe at the 30% level most of the features
of the present data, while the two GPD calculations presented in this
article, which successfully reproduce the high-energy data, strongly underestimate
the present data. The question is then raised whether this discrepancy
originates from an incomplete or inexact way of modelling the GPDs or the
associated hard scattering amplitude or whether the GPD formalism is simply
inapplicable in this region due to higher-twist contributions, incalculable at
present. Comment: 29 pages, 29 figures
First Measurement of Beam-Recoil Observables Cx and Cz in Hyperon Photoproduction
Spin transfer from circularly polarized real photons to recoiling hyperons
has been measured for the reactions $\vec\gamma + p \to K^+ + \vec\Lambda$ and
$\vec\gamma + p \to K^+ + \vec\Sigma^0$. The data were obtained using the CLAS
detector at Jefferson Lab for center-of-mass energies between 1.6 and 2.53
GeV, and for $-0.85 < \cos\theta_{K^+}^{c.m.} < +0.95$. For the $\Lambda$, the
polarization transfer coefficient along the photon momentum axis, $C_z$, was
found to be near unity for a wide range of energy and kaon production angles.
The associated transverse polarization coefficient, $C_x$, is smaller than $C_z$
by a roughly constant difference of unity. Most significantly, the total
$\Lambda$ polarization vector, including the induced polarization $P$,
has magnitude consistent with unity at all measured energies and production
angles when the beam is fully polarized. For the $\Sigma^0$ this simple
phenomenology does not hold. All existing hadrodynamic models are in poor
agreement with these results. Comment: 28 pages, 18 figures, Submitted to Physical Review
Search for the $\Theta^+$ pentaquark in the reaction $\gamma d \to p K^- K^+ n$
A search for the $\Theta^+$ in the reaction $\gamma d \to p K^- K^+ n$ was completed
using the CLAS detector at Jefferson Lab. A study of the same reaction,
published earlier, reported the observation of a narrow $\Theta^+$ resonance. The
present experiment, with more than 30 times the integrated luminosity of our
earlier measurement, does not show any evidence for a narrow pentaquark
resonance. The angle-integrated upper limit on $\Theta^+$ production in the mass
range of 1.52 to 1.56 GeV/$c^2$ for the $\gamma d \to p K^- \Theta^+$ reaction is
0.3 nb (95% CL). This upper limit depends on assumptions made for the mass and
angular distribution of $\Theta^+$ production. Using $\Lambda(1520)$ production as an
empirical measure of rescattering in the deuteron, the cross section upper
limit for the elementary $\gamma n \to K^- \Theta^+$ reaction is estimated to be
a factor of 10 higher, i.e., $\sim 3$ nb (95% CL). Comment: 5 figures, submitted to PRL, revised for referee comments
Search For Heavy Pointlike Dirac Monopoles
We have searched for central production of a pair of photons with high
transverse energies in $p\bar{p}$ collisions at $\sqrt{s} = 1.8$ TeV using
$70\ \mathrm{pb}^{-1}$ of data collected with the DØ detector at the Fermilab Tevatron in
1994--1996. If they exist, virtual heavy pointlike Dirac monopoles could
rescatter pairs of nearly real photons into this final state via a box diagram.
We observe no excess of events above background, and set lower 95% C.L. limits
of $610$, $870$, or $1580\ \mathrm{GeV}/c^2$ on the mass of a spin 0, 1/2, or 1 Dirac
monopole. Comment: 12 pages, 4 figures
Search for New Physics in e mu X Data at D0 Using Sleuth: A Quasi-Model-Independent Search Strategy for New Physics
We present a quasi-model-independent search for the physics responsible for
electroweak symmetry breaking. We define final states to be studied, and
construct a rule that identifies a set of relevant variables for any particular
final state. A new algorithm ("Sleuth") searches for regions of excess in those
variables and quantifies the significance of any detected excess. After
demonstrating the sensitivity of the method, we apply it to the semi-inclusive
channel e mu X collected in 108 pb^-1 of ppbar collisions at sqrt(s) = 1.8 TeV
at the D0 experiment during 1992-1996 at the Fermilab Tevatron. We find no
evidence of new high p_T physics in this sample. Comment: 23 pages, 12 figures. Submitted to Physical Review