Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
Finetuning large language models (LLMs) has been empirically effective on a
variety of downstream tasks. Existing approaches to finetuning an LLM either
focus on parameter-efficient finetuning, which only updates a small number of
trainable parameters, or attempt to reduce the memory footprint during the
training phase of the finetuning. Typically, the memory footprint during
finetuning stems from three contributors: model weights, optimizer states, and
intermediate activations. However, existing works still require considerable
memory and none can simultaneously mitigate memory footprint for all three
sources. In this paper, we present Quantized Side Tuning (QST), which enables
memory-efficient and fast finetuning of LLMs through a dual-stage
process. First, QST quantizes an LLM's model weights to 4 bits to reduce the
memory footprint of the LLM's original weights; QST also introduces a side
network separated from the LLM, which utilizes the hidden states of the LLM to
make task-specific predictions. Using a separate side network avoids performing
backpropagation through the LLM, thus reducing the memory requirement of the
intermediate activations. Furthermore, QST leverages several low-rank adaptors
and gradient-free downsample modules to significantly reduce the trainable
parameters, so as to save the memory footprint of the optimizer states.
Experiments show that QST can reduce the total memory footprint by up to 2.3×
and speed up the finetuning process by up to 3× while
achieving competitive performance compared with the state of the art. When it
comes to full finetuning, QST can reduce the total memory footprint by up to 7×.
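The memory argument above reduces to simple arithmetic. The sketch below assumes a hypothetical 7B-parameter model, a hypothetical 4096×4096 weight matrix with rank-16 factors, and an Adam-style optimizer that keeps two fp32 states per trainable parameter. It shows why 4-bit weights and low-rank trainable modules shrink two of the three memory contributors. It is not QST's implementation, and it ignores quantization constants, activations, and framework overhead.

```python
"""Back-of-the-envelope memory arithmetic (hypothetical sizes, not QST code)."""


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight // 8


def full_matrix_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when a whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r factorization B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


def adam_state_bytes(trainable_params: int, bytes_per_value: int = 4) -> int:
    """Adam keeps two moment estimates per trainable parameter (fp32 assumed)."""
    return 2 * trainable_params * bytes_per_value


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    print("16-bit weights (GB):", weight_bytes(n, 16) / 1e9)  # 14.0
    print(" 4-bit weights (GB):", weight_bytes(n, 4) / 1e9)   # 3.5

    d, r = 4096, 16  # hypothetical hidden size and adaptor rank
    full = full_matrix_params(d, d)
    low = low_rank_params(d, d, r)
    print("trainable params, full vs low-rank:", full, low)  # 128x fewer
    print("Adam state for low-rank module (MB):", adam_state_bytes(low) / 1e6)
```

The ratio of trainable parameters is d·d / (r·2d) = d / (2r), so the optimizer-state saving grows with hidden size and shrinks as the rank increases.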
SpotServe: Serving Generative Large Language Models on Preemptible Instances
The high computational and memory requirements of generative large language
models (LLMs) make it challenging to serve them cheaply. This paper aims to
reduce the monetary cost for serving LLMs by leveraging preemptible GPU
instances on modern clouds, which offer access to spare GPUs at a much
cheaper price than regular instances but may be preempted by the cloud at any
time. Serving LLMs on preemptible instances requires addressing challenges
induced by frequent instance preemptions and the necessity of migrating
instances to handle these preemptions.
This paper presents SpotServe, the first distributed LLM serving system on
preemptible instances. Several key techniques in SpotServe realize fast and
reliable serving of generative LLMs on cheap preemptible instances. First,
SpotServe dynamically adapts the LLM parallelization configuration for dynamic
instance availability and fluctuating workload, while balancing the trade-off
among the overall throughput, inference latency and monetary costs. Second, to
minimize the cost of migrating instances for dynamic reparallelization, the
task of migrating instances is formulated as a bipartite graph matching
problem, and the Kuhn-Munkres algorithm is used to identify an optimal migration
plan that minimizes communications. Finally, to take advantage of the grace
period offered by modern clouds, we introduce stateful inference recovery, a
new inference mechanism that commits inference progress at a much finer
granularity and allows SpotServe to cheaply resume inference upon preemption.
We evaluate on real spot instance preemption traces and various popular LLMs
and show that SpotServe can reduce the P99 tail latency by 2.4-9.1× compared
with the best existing LLM serving systems. We also show that SpotServe can
leverage the price advantage of preemptible instances, saving 54% of the monetary cost
compared with using only on-demand instances. Comment: ASPLOS 202
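The migration step is an assignment problem: each existing instance is matched to exactly one position in the new parallel configuration so that the total migration cost is minimized. The sketch below solves a tiny instance by exhaustive search over permutations. It finds the same optimum the Kuhn-Munkres algorithm would, but in O(n!) time instead of polynomial time, and the cost matrix is a made-up stand-in for SpotServe's communication estimate.

```python
from itertools import permutations
from typing import Sequence, Tuple


def min_cost_assignment(
    cost: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, ...], float]:
    """Return the one-to-one assignment of rows to columns with minimum total cost.

    cost[i][j] is a hypothetical migration cost (e.g. bytes of model state or
    cache that must be transferred) if existing instance i takes over position j
    in the new configuration. Brute force: only for tiny illustrative inputs.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


if __name__ == "__main__":
    # Made-up costs for 3 old instances (rows) and 3 new positions (columns).
    costs = [[4, 1, 3],
             [2, 0, 5],
             [3, 2, 2]]
    plan, total = min_cost_assignment(costs)
    print("instance -> position:", plan, "total cost:", total)  # (1, 0, 2) 5
```

For realistic cluster sizes, a polynomial-time solver such as SciPy's scipy.optimize.linear_sum_assignment would replace the brute-force loop.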
Overall survival and cancer-specific survival were improved in local treatment of metastatic prostate cancer
Background: For metastatic prostate cancer (mPCa), radical prostatectomy (RP) and radiation therapy (RT) may improve overall survival (OS) and cancer-specific survival (CSS). Compared with RT, RP shows significant advantages in improving patient outcomes. External beam radiation therapy (EBRT) even slightly elevates cancer-specific mortality (CSM), with no statistically significant difference in OS compared with no local treatment (NLT).
Objective: To evaluate OS and CSS after local treatment (LT), including RP and RT, versus NLT in mPCa.
Design, setting, and participants: From the Surveillance, Epidemiology, and End Results (SEER) database (2000-2018), 20098 patients with metastatic prostate cancer were selected, of whom 19433 received no local treatment, 377 underwent RP, and 288 received RT.
Outcome measurements and statistical analysis: Multivariable competing-risks regression analysis after propensity score matching (PSM) was used to calculate CSM. Multivariable Cox regression analysis was used to identify risk factors. Kaplan-Meier methods were used to calculate OS.
Results and limitations: A total of 20098 patients were included: NLT (n = 19433), RP (n = 377), and RT (n = 288). In a competing-risks regression analysis after 1:1 PSM, RP resulted in a significantly lower CSM than NLT (hazard ratio [HR] 0.36, 95% confidence interval [CI] 0.29-0.45), while RT showed a slightly lower CSM (HR 0.77, 95% CI 0.63-0.95). In a competing-risks regression analysis after 1:1 PSM comparing the two local treatments, RP led to a lower CSM than RT (HR 0.56, 95% CI 0.41-0.76). RP (HR 0.37, 95% CI 0.31-0.45) and RT (HR 0.66, 95% CI 0.56-0.79) were also associated with lower all-cause mortality (ACM). In terms of OS, RP and RT significantly improved the survival probability compared with NLT, with a more pronounced effect for RP. Older age, Gleason score ≥8, AJCC T3-T4 stage, AJCC N1, and AJCC M1b-M1c were all associated with higher CSM (P < 0.05). The same results held true for ACM. A limitation of this study is that the effect of differences in systemic therapy on CSM in mPCa patients could not be assessed; clinical trials are needed to verify the results.
Conclusions: For patients with mPCa, both RP and RT are beneficial, and RP is more effective than RT in terms of CSM and ACM. Older age, higher Gleason scores, and more advanced AJCC TNM stage all put patients at higher risk of death.
Patient summary: A large population-based cancer database showed that, in addition to first-line hormonal therapy, RP and radiotherapy can also benefit patients with mPCa.
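The OS curves in this design rest on the Kaplan-Meier product-limit estimator, S(t) = Π over event times t_j ≤ t of (1 − d_j / n_j). The sketch below shows the estimator on made-up follow-up data rather than SEER records. It does not reproduce the competing-risks or Cox models. The tie-handling convention, in which subjects censored at an event time still count as at risk at that time, is a common choice assumed here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times[i]  -- follow-up time of subject i
    events[i] -- 1 if death was observed at times[i], 0 if censored
    Returns (t, S(t)) at each distinct event time, where d_j = deaths at t_j
    and n_j = subjects still at risk just before t_j.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone who leaves the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # deaths and censorings both exit after time t
    return curve


if __name__ == "__main__":
    # Toy data: follow-up in months, 1 = death observed, 0 = censored.
    follow_up = [1, 2, 2, 3, 4]
    died = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(follow_up, died):
        print(f"month {t}: S = {s:.2f}")  # 0.80, 0.60, 0.30
```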
Minute-cadence Observations of the LAMOST Fields with the TMTS: III. Statistic Study of the Flare Stars from the First Two Years
Tsinghua University-Ma Huateng Telescopes for Survey (TMTS) aims to detect
fast-evolving transients in the Universe, which has led to the discovery of
thousands of short-period variables and eclipsing binaries since 2020. In this
paper, we present the observed properties of 125 flare stars identified by the
TMTS within the first two years, with an attempt to constrain their eruption
physics. As expected, most of these flares were recorded in late-type red stars
with a color index > 2.0 mag; however, the flares associated with
bluer stars tend to be on average more energetic and have broader profiles. The
peak flux (F_peak) of the flare is found to depend strongly on the equivalent
duration (ED) of the energy release, consistent with results derived from the Kepler
and Evryscope samples. This relation is likely related to the magnetic loop
emission, while, for the more popular non-thermal electron heating model, a
specific time evolution may be required to generate this relation. We note
that flares produced by hotter stars follow a flatter F_peak-ED relation than those from cooler stars. This is related to the
statistical discrepancy in the light-curve shapes of flare events with different
colors. In spectra from LAMOST, we find that flare stars have noticeably
stronger Hα emission than inactive stars, especially at the low
temperature end, suggesting that chromospheric activity plays an important role
in producing flares. On the other hand, stars in the subclass with frequent flares are
found to show Hα emission of similar strength to that of stars
with only a single recorded flare and similar effective temperature, implying
that chromospheric activity may not be the only trigger for eruptions. Comment: 17 pages, 15 figures, 2 tables, refereed version. For associated data
files, see https://cdsarc.cds.unistra.fr/viz-bin/cat/J/MNRAS/523/219
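Equivalent duration is the time integral of the flare's fractional flux excess over the quiescent level, ED = ∫ (F(t) − F_q) / F_q dt, so it has units of time. The sketch below computes ED with the trapezoidal rule. It also fits the slope α of a power law y ∝ x^α in log-log space, one way to compare how steep an F_peak-ED relation is for hotter versus cooler stars. The light curve is a toy example, and both the power-law form and ordinary least squares are assumed choices, not necessarily the paper's.

```python
import math
from typing import Sequence


def equivalent_duration(times_s: Sequence[float], flux: Sequence[float],
                        quiescent_flux: float) -> float:
    """ED = integral of (F(t) - F_q) / F_q dt, via the trapezoidal rule.

    Returned in the units of times_s (seconds here).
    """
    if len(times_s) != len(flux) or len(times_s) < 2:
        raise ValueError("need at least two matching time/flux samples")
    excess = [(f - quiescent_flux) / quiescent_flux for f in flux]
    return sum(0.5 * (excess[i] + excess[i + 1]) * (times_s[i + 1] - times_s[i])
               for i in range(len(excess) - 1))


def loglog_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of log10(y) against log10(x),
    i.e. the exponent alpha in y proportional to x**alpha."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


if __name__ == "__main__":
    # Toy triangular flare: quiescent flux 1, peak flux 2 at t = 5 s.
    print(equivalent_duration([0.0, 5.0, 10.0], [1.0, 2.0, 1.0], 1.0))  # 5.0
```

A flatter relation for hotter stars would show up as a smaller fitted α for that subsample.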
A spectral data release for 104 Type II Supernovae from the Tsinghua Supernova Group
We present 206 unpublished optical spectra of 104 type II supernovae obtained
by the Xinglong 2.16 m telescope and the Lijiang 2.4 m telescope during the period
from 2011 to 2018, spanning phases from about 1 to 200 days after the SN
explosion. The spectral line identifications, the evolution of line velocities and
pseudo-equivalent widths, and correlations between several important
spectral parameters are presented. Our sample displays a wide range of
expansion velocities. For instance, the Fe II velocities measured
from spectra at … days after the explosion vary from 2000 km s⁻¹ to 5500 km s⁻¹,
with a mean of 3872 ± 949 km s⁻¹. Further measurements involving the Hβ and Hα lines,
including the Hα absorption-to-emission ratio (a/e), are also presented.
In our sample, two objects show possible flash-ionization features at early
phases. In addition, we note that multiple high-velocity components may exist on
the blue side of the hydrogen lines of SN 2013ab, possibly suggesting that these
features arise from a complex line-forming region. All our spectra can be found
in WISeREP and Zenodo.
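Expansion velocities like the Fe II values above are usually obtained from the blueshift of an absorption minimum relative to the line's rest wavelength. The sketch below applies the relativistic Doppler formula. The 5169 Å rest wavelength (the Fe II λ5169 line) and the 5100 Å observed minimum are illustrative assumptions, and the observed wavelength is assumed to be already corrected to the host-galaxy rest frame.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def doppler_velocity(lambda_obs: float, lambda_rest: float) -> float:
    """Line-of-sight velocity in km/s from the relativistic Doppler formula.

    Negative values mean blueshift (material moving toward the observer),
    as in the absorption trough of expanding SN ejecta.
    """
    r2 = (lambda_obs / lambda_rest) ** 2
    return C_KM_S * (r2 - 1.0) / (r2 + 1.0)


if __name__ == "__main__":
    # Illustrative only: Fe II 5169 A line with its absorption minimum at 5100 A.
    v = doppler_velocity(5100.0, 5169.0)
    print(f"expansion velocity ~ {-v:.0f} km/s")
```

With these inputs the expansion velocity comes out at roughly 4000 km s⁻¹, within the 2000-5500 km s⁻¹ range quoted for the sample.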
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easy to assess for quality control, also determines the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
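The microsphere calibration amounts to fitting how OD scales with a known particle count across a serial dilution, then inverting that scale factor to convert culture OD into an estimated count. The sketch below uses a least-squares fit through the origin and reports fold residuals, max(observed/predicted, predicted/observed), in the spirit of the "<1.2-fold" metric. The dilution data are made up, the fitting method is an assumption rather than the study's exact procedure, and points outside the instrument's linear range would need to be excluded before fitting.

```python
from typing import List, Sequence


def fit_particles_per_od(particles: Sequence[float], od: Sequence[float]) -> float:
    """Fit OD = k * particles by least squares through the origin and return
    1 / k, i.e. the number of particles corresponding to one OD unit.

    od values are assumed blank-subtracted and within the linear range.
    """
    if len(particles) != len(od) or not particles:
        raise ValueError("need matching, non-empty inputs")
    sxx = sum(n * n for n in particles)
    sxy = sum(n * y for n, y in zip(particles, od))
    return sxx / sxy  # 1 / k, where k = sxy / sxx


def fold_residuals(particles: Sequence[float], od: Sequence[float],
                   particles_per_od: float) -> List[float]:
    """Fold error of each point: max(observed/predicted, predicted/observed)."""
    folds = []
    for n, y in zip(particles, od):
        predicted = n / particles_per_od
        ratio = y / predicted
        folds.append(max(ratio, 1.0 / ratio))
    return folds


if __name__ == "__main__":
    # Made-up two-fold dilution series of microspheres (particles per well).
    beads = [4.0e8, 2.0e8, 1.0e8, 5.0e7]
    od_blanked = [1.62, 0.79, 0.41, 0.20]
    scale = fit_particles_per_od(beads, od_blanked)
    folds = fold_residuals(beads, od_blanked, scale)
    print(f"{scale:.3g} particles per OD unit")
    print("share of residuals < 1.2-fold:", sum(f < 1.2 for f in folds) / len(folds))
    # Converting a culture reading: estimated count = OD * particles per OD unit.
    print(f"OD 0.5 -> ~{0.5 * scale:.3g} particle-equivalent cells")
```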
Multidifferential study of identified charged hadron distributions in Z-tagged jets in proton-proton collisions at 13 TeV
Jet fragmentation functions are measured for the first time in proton-proton
collisions for charged pions, kaons, and protons within jets recoiling against
a Z boson. The charged-hadron distributions are studied longitudinally and
transversely to the jet direction for jets with transverse momentum above 20 GeV in a restricted pseudorapidity range. The
data sample was collected with the LHCb experiment at a center-of-mass energy
of 13 TeV, corresponding to an integrated luminosity of 1.64 fb⁻¹. Triple-differential
distributions as a function of the hadron longitudinal momentum
fraction, hadron transverse momentum, and jet transverse momentum are also
measured for the first time. These measurements help constrain transverse-momentum-dependent
fragmentation functions. Differences in the shapes and magnitudes of the
measured distributions for the different hadron species provide insights into
the hadronization process for jets predominantly initiated by light quarks. Comment: All figures and tables, along with machine-readable versions and any
supplementary material and additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb
public pages)
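The longitudinal momentum fraction z and the hadron momentum transverse to the jet axis, j_T, are commonly defined as z = p_h·p_jet / |p_jet|² and j_T = |p_h × p_jet| / |p_jet|. The sketch below assumes these definitions and uses toy momentum vectors. It is not LHCb analysis code, and the helper names are illustrative.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def fragmentation_variables(p_hadron: Vec3, p_jet: Vec3) -> Tuple[float, float]:
    """Return (z, j_T) for a hadron inside a jet.

    z   = p_h . p_jet / |p_jet|^2   (longitudinal momentum fraction)
    j_T = |p_h x p_jet| / |p_jet|   (momentum transverse to the jet axis)
    """
    jet_norm2 = dot(p_jet, p_jet)
    if jet_norm2 == 0.0:
        raise ValueError("jet momentum must be non-zero")
    z = dot(p_hadron, p_jet) / jet_norm2
    c = cross(p_hadron, p_jet)
    j_t = math.sqrt(dot(c, c)) / math.sqrt(jet_norm2)
    return z, j_t


if __name__ == "__main__":
    # Toy momenta in GeV: jet along the beam-like z axis, hadron slightly off-axis.
    print(fragmentation_variables((1.0, 0.0, 4.0), (0.0, 0.0, 10.0)))
```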
Study of the … decay
The decay … is studied
in proton-proton collisions at a center-of-mass energy of … TeV
using data corresponding to an integrated luminosity of 5 …
collected by the LHCb experiment. In the … system, the
… state observed at the BaBar and Belle experiments is
resolved into two narrower states, … and …,
whose masses and widths are measured to be …, where the first uncertainties are statistical and the second
systematic. The results are consistent with a previous LHCb measurement using a
prompt sample. Evidence of a new …
state is found with a local significance of …, whose mass and width
are measured to be … and …, respectively. In addition, evidence of a new decay mode …
is found with a significance of
…. The relative branching fraction of … with respect to the
… decay is measured to be …, where the first
uncertainty is statistical, the second systematic, and the third originates from
the branching fractions of charm hadron decays. Comment: All figures and tables, along with any supplementary material and
additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-028.html (LHCb
public pages)
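A relative branching fraction is typically obtained by correcting each mode's signal yield for its total efficiency and for the branching fractions of the intermediate charm-hadron decays used to reconstruct it. The uncertainties on those external branching fractions are what a separate, third uncertainty usually reflects. The sketch below assumes this standard form and uncorrelated external inputs, with made-up yields, efficiencies, and branching fractions. It is not the paper's analysis.

```python
import math


def relative_branching_fraction(n_sig: float, eff_sig: float, bf_sig_daughters: float,
                                n_norm: float, eff_norm: float,
                                bf_norm_daughters: float) -> float:
    """B(signal) / B(normalization) from yields and total efficiencies.

    bf_*_daughters is the product of the intermediate (e.g. charm hadron)
    branching fractions needed to reconstruct each mode.
    """
    corrected_sig = n_sig / (eff_sig * bf_sig_daughters)
    corrected_norm = n_norm / (eff_norm * bf_norm_daughters)
    return corrected_sig / corrected_norm


def external_bf_uncertainty(ratio: float, rel_unc_sig_daughters: float,
                            rel_unc_norm_daughters: float) -> float:
    """Absolute uncertainty on the ratio from the external branching fractions,
    assuming their relative uncertainties are uncorrelated."""
    return ratio * math.hypot(rel_unc_sig_daughters, rel_unc_norm_daughters)


if __name__ == "__main__":
    # Hypothetical inputs, not measured values.
    r = relative_branching_fraction(100, 0.01, 0.0625, 1000, 0.02, 0.25)
    print(f"ratio = {r:.3f} +- {external_bf_uncertainty(r, 0.03, 0.04):.3f} (ext. BF)")
```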
Measurement of the ratios of branching fractions … and …
The ratios of branching fractions
… and … are measured, assuming isospin symmetry, using a
sample of proton-proton collision data corresponding to 3.0 fb⁻¹ of
integrated luminosity recorded by the LHCb experiment during 2011 and 2012. The
tau lepton is identified in the decay mode
…. The measured values are
… and
…, where the first uncertainty is
statistical and the second is systematic. The correlation between these
measurements is …. Results are consistent with the current average
of these quantities and lie a combined 1.9 standard deviations from the
predictions based on lepton flavor universality in the Standard Model. Comment: All figures and tables, along with any supplementary material and
additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-039.html (LHCb
public pages)
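Quoting a single combined deviation for two correlated measurements typically means forming a χ² with their covariance and converting its p-value into an equivalent Gaussian significance. The sketch below does this for two quantities with correlation coefficient rho. It assumes Gaussian uncertainties, folds any prediction uncertainty into the sigmas, and uses hypothetical placeholder inputs, not the measured values.

```python
import math
from statistics import NormalDist


def combined_deviation(d1: float, d2: float, sigma1: float, sigma2: float,
                       rho: float):
    """Combined deviation of two correlated measurements from their predictions.

    d1, d2         -- measured minus predicted values
    sigma1, sigma2 -- total uncertainties (assumed Gaussian)
    rho            -- correlation coefficient between the two measurements
    Returns (chi2, p_value, z): chi2 with 2 degrees of freedom, its p-value,
    and the equivalent two-sided Gaussian significance.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    u1, u2 = d1 / sigma1, d2 / sigma2
    chi2 = (u1 * u1 - 2.0 * rho * u1 * u2 + u2 * u2) / (1.0 - rho * rho)
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function for 2 dof
    if p_value == 0.0:
        return chi2, p_value, math.inf
    z = -NormalDist().inv_cdf(p_value / 2.0)  # two-sided conversion
    return chi2, p_value, z


if __name__ == "__main__":
    # Hypothetical deviations and uncertainties, for illustration only.
    print(combined_deviation(0.07, 0.04, 0.04, 0.03, -0.4))
```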
Measurement of forward charged hadron flow harmonics in peripheral PbPb collisions at √sNN = 5.02 TeV with the LHCb detector
Flow harmonic coefficients, v_n, which are key to studying the hydrodynamics of the quark-gluon plasma (QGP) created in heavy-ion collisions, have been measured in various collision systems and kinematic regions and using various particle species. The study of flow harmonics over a wide pseudorapidity range is particularly valuable for understanding the temperature dependence of the shear viscosity to entropy density ratio of the QGP. This paper presents the first LHCb results for the second- and third-order flow harmonic coefficients of charged hadrons as a function of transverse momentum in the forward region, corresponding to pseudorapidities between 2.0 and 4.9, using data collected from PbPb collisions in 2018 at a center-of-mass energy of 5.02 TeV. The coefficients measured using the two-particle angular correlation analysis method are smaller than the central-pseudorapidity measurements at ALICE and ATLAS from the same collision system but share similar features…
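The two-particle correlation method estimates v_n from the average of cos n(φ₁ − φ₂) over particle pairs, v_n{2} = √⟨⟨cos n(φ₁ − φ₂)⟩⟩. The sketch below computes this with per-event Q-vectors on toy events generated with a known v_2. It assumes a single flow source with no non-flow correlations and omits the non-flow suppression (e.g. pseudorapidity gaps) and detector corrections that a real measurement requires. Function names are illustrative.

```python
import cmath
import math
import random
from typing import List, Sequence


def v_n_two_particle(events: Sequence[Sequence[float]], n: int) -> float:
    """Estimate v_n{2} from per-event lists of azimuthal angles (radians).

    Per event, with Q_n = sum_k exp(i*n*phi_k) and multiplicity M, the mean of
    cos(n*(phi_j - phi_k)) over distinct pairs is (|Q_n|^2 - M) / (M(M-1)).
    Events are combined with pair-count weights M(M-1); v_n{2} = sqrt(<<2>>).
    """
    num, den = 0.0, 0.0
    for phis in events:
        m = len(phis)
        if m < 2:
            continue
        q = sum(cmath.exp(1j * n * phi) for phi in phis)
        num += abs(q) ** 2 - m
        den += m * (m - 1)
    if den == 0:
        raise ValueError("need at least one event with two or more particles")
    c2 = num / den
    return math.sqrt(c2) if c2 > 0 else float("nan")


def toy_event(m: int, v2: float, rng: random.Random) -> List[float]:
    """Draw m angles from dN/dphi ~ 1 + 2 v2 cos(2(phi - psi)) by rejection
    sampling with a random reaction-plane angle psi (requires 0 <= v2 <= 0.5)."""
    psi = rng.uniform(0.0, 2.0 * math.pi)
    phis: List[float] = []
    while len(phis) < m:
        phi = rng.uniform(0.0, 2.0 * math.pi)
        if rng.uniform(0.0, 1.0 + 2.0 * v2) < 1.0 + 2.0 * v2 * math.cos(2.0 * (phi - psi)):
            phis.append(phi)
    return phis


if __name__ == "__main__":
    rng = random.Random(42)
    events = [toy_event(200, 0.10, rng) for _ in range(500)]
    print(f"v2{{2}} = {v_n_two_particle(events, 2):.3f}")  # close to the input 0.10
```

Subtracting M removes each particle's self-correlation, so only distinct pairs contribute to the average.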