2,191 research outputs found

    QuIP: 2-Bit Quantization of Large Language Models With Guarantees

    This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/Cornell-RelaxML/QuIP
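The two-step structure described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the conjugation forms `U.T @ W @ V` and the proxy `tr((W_hat - W) H (W_hat - W)^T)` are assumptions about how the "random orthogonal matrices" and the "quadratic proxy objective" might be written, and plain nearest rounding stands in for the adaptive rounding procedure.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flipping column signs keeps q orthogonal and removes the QR sign ambiguity.
    return q * np.sign(np.diag(r))

def round_to_grid(w, bits=2):
    """Nearest rounding onto 2**bits evenly spaced levels in [-max|w|, max|w|].
    Stand-in only: the abstract describes an adaptive rounding procedure instead."""
    levels = 2 ** bits
    scale = np.abs(w).max()
    if scale == 0:
        return w.copy()
    idx = np.clip(np.round((w / scale + 1) / 2 * (levels - 1)), 0, levels - 1)
    return (idx / (levels - 1) * 2 - 1) * scale

def proxy_loss(w_hat, w, h):
    """Assumed quadratic proxy: tr((W_hat - W) H (W_hat - W)^T)."""
    d = w_hat - w
    return float(np.trace(d @ h @ d.T))

def quantize_with_incoherence(w, h, bits=2, seed=0):
    """Rotate, quantize in the rotated basis, then rotate back.
    h is unused by nearest rounding; an adaptive rounder would work with V^T H V."""
    rng = np.random.default_rng(seed)
    u = random_orthogonal(w.shape[0], rng)
    v = random_orthogonal(w.shape[1], rng)
    w_rot = u.T @ w @ v              # pre-processing (assumed form)
    w_q = round_to_grid(w_rot, bits)
    return u @ w_q @ v.T             # post-processing undoes the rotation
```

Because `U` and `V` are orthogonal, the assumed proxy loss measured in the rotated basis (with `V.T @ H @ V`) equals the loss measured in the original basis, so the rotation changes how rounding errors are distributed without changing the objective itself.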

    ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

    We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time. Leveraging state-of-the-art 3-bit OPTQ quantization often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models (including the first family of 3-bit instruction following Alpaca LLMs) as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.

    Derived categories of Burniat surfaces and exceptional collections

    We construct an exceptional collection $\Upsilon$ of maximal possible length 6 on any of the Burniat surfaces with $K_X^2=6$, a 4-dimensional family of surfaces of general type with $p_g=q=0$. We also calculate the DG algebra of endomorphisms of this collection and show that the subcategory generated by this collection is the same for all Burniat surfaces. The semiorthogonal complement $\mathcal A$ of $\Upsilon$ is an "almost phantom" category: it has trivial Hochschild homology, and $K_0(\mathcal A)=\mathbb{Z}_2^6$. Comment: 15 pages, 1 figure; further remarks expanded

    Photodisintegration of $^4$He into p+t

    The two-body photodisintegration of $^4$He into a proton and a triton has been studied using the CEBAF Large-Acceptance Spectrometer (CLAS) at Jefferson Laboratory. Real photons produced with the Hall-B bremsstrahlung-tagging system in the energy range from 0.35 to 1.55 GeV were incident on a liquid $^4$He target. This is the first measurement of the photodisintegration of $^4$He above 0.4 GeV. The differential cross sections for the $\gamma\,{}^4$He $\to pt$ reaction have been measured as a function of photon-beam energy and proton-scattering angle, and are compared with the latest model calculations by J.-M. Laget. At 0.6-1.2 GeV, our data are in good agreement only with the calculations that include three-body mechanisms, thus confirming their importance. These results reinforce the conclusion of our previous study of the three-body breakup of $^3$He that demonstrated the great importance of three-body mechanisms in the energy region 0.5-0.8 GeV. Comment: 13 pages submitted in one tgz file containing 2 tex files and 22 postscript figures

    Differential cross sections and spin density matrix elements for the reaction gamma p -> p omega

    High-statistics differential cross sections and spin density matrix elements for the reaction gamma p -> p omega have been measured using the CLAS at Jefferson Lab for center-of-mass (CM) energies from threshold up to 2.84 GeV. Results are reported in 112 CM energy bins, each 10 MeV wide and subdivided into cos(theta_CM) bins of width 0.1. These are the most precise and extensive omega photoproduction measurements to date. A number of prominent structures are clearly present in the data. Many of these have not previously been observed due to limited statistics in earlier measurements.

    Exclusive $\rho^0$ electroproduction on the proton at CLAS

    The $e p\to e^\prime p \rho^0$ reaction has been measured, using the 5.754 GeV electron beam of Jefferson Lab and the CLAS detector. This represents the largest ever set of data for this reaction in the valence region. Integrated and differential cross sections are presented. The $W$, $Q^2$ and $t$ dependences of the cross section are compared to theoretical calculations based on $t$-channel meson-exchange Regge theory on the one hand and on quark handbag diagrams related to Generalized Parton Distributions (GPDs) on the other hand. The Regge approach can describe most of the features of the present data at the $\approx 30\%$ level, while the two GPD calculations presented in this article, which successfully reproduce the high-energy data, strongly underestimate the present data. The question is then raised whether this discrepancy originates from an incomplete or inexact way of modelling the GPDs or the associated hard scattering amplitude, or whether the GPD formalism is simply inapplicable in this region due to higher-twist contributions, incalculable at present. Comment: 29 pages, 29 figures

    First Measurement of Beam-Recoil Observables Cx and Cz in Hyperon Photoproduction

    Spin transfer from circularly polarized real photons to recoiling hyperons has been measured for the reactions $\vec\gamma + p \to K^+ + \vec\Lambda$ and $\vec\gamma + p \to K^+ + \vec\Sigma^0$. The data were obtained using the CLAS detector at Jefferson Lab for center-of-mass energies $W$ between 1.6 and 2.53 GeV, and for $-0.85<\cos\theta_{K^+}^{c.m.}< +0.95$. For the $\Lambda$, the polarization transfer coefficient along the photon momentum axis, $C_z$, was found to be near unity for a wide range of energy and kaon production angles. The associated transverse polarization coefficient, $C_x$, is smaller than $C_z$ by a roughly constant difference of unity. Most significantly, the total $\Lambda$ polarization vector, including the induced polarization $P$, has magnitude consistent with unity at all measured energies and production angles when the beam is fully polarized. For the $\Sigma^0$ this simple phenomenology does not hold. All existing hadrodynamic models are in poor agreement with these results. Comment: 28 pages, 18 figures, Submitted to Physical Review

    Search for the $\Theta^+$ pentaquark in the reaction $\gamma d \to p K^- K^+ n$

    A search for the $\Theta^+$ in the reaction $\gamma d \to pK^-K^+n$ was completed using the CLAS detector at Jefferson Lab. A study of the same reaction, published earlier, reported the observation of a narrow $\Theta^+$ resonance. The present experiment, with more than 30 times the integrated luminosity of our earlier measurement, does not show any evidence for a narrow pentaquark resonance. The angle-integrated upper limit on $\Theta^+$ production in the mass range of 1.52 to 1.56 GeV/$c^2$ for the $\gamma d \to pK^-\Theta^+$ reaction is 0.3 nb (95% CL). This upper limit depends on assumptions made for the mass and angular distribution of $\Theta^+$ production. Using $\Lambda^*$ production as an empirical measure of rescattering in the deuteron, the cross section upper limit for the elementary $\gamma n \to K^-\Theta^+$ reaction is estimated to be a factor of 10 higher, i.e., $\sim 3$ nb (95% CL). Comment: 5 figures, submitted to PRL, revised for referee comments

    Search For Heavy Pointlike Dirac Monopoles

    We have searched for central production of a pair of photons with high transverse energies in $p\bar p$ collisions at $\sqrt{s} = 1.8$ TeV using $70\ \mathrm{pb}^{-1}$ of data collected with the DØ detector at the Fermilab Tevatron in 1994-1996. If they exist, virtual heavy pointlike Dirac monopoles could rescatter pairs of nearly real photons into this final state via a box diagram. We observe no excess of events above background, and set lower 95% C.L. limits of 610, 870, or 1580 GeV/$c^2$ on the mass of a spin 0, 1/2, or 1 Dirac monopole. Comment: 12 pages, 4 figures

    Search for New Physics in e mu X Data at D0 Using Sleuth: A Quasi-Model-Independent Search Strategy for New Physics

    We present a quasi-model-independent search for the physics responsible for electroweak symmetry breaking. We define final states to be studied, and construct a rule that identifies a set of relevant variables for any particular final state. A new algorithm ("Sleuth") searches for regions of excess in those variables and quantifies the significance of any detected excess. After demonstrating the sensitivity of the method, we apply it to the semi-inclusive channel e mu X collected in $108\ \mathrm{pb}^{-1}$ of $p\bar p$ collisions at $\sqrt{s} = 1.8$ TeV at the D0 experiment during 1992-1996 at the Fermilab Tevatron. We find no evidence of new high-$p_T$ physics in this sample. Comment: 23 pages, 12 figures. Submitted to Physical Review