53 research outputs found
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Large Language Models (LLMs) exhibit impressive reasoning and data
augmentation capabilities in various NLP tasks. However, what about small
models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant
fundamentals, chain of thought, and common mistakes for most NLP samples, which
makes annotation more than just an answer, thus allowing other models to learn
"why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot
score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even
more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we
augmented 58 NLP datasets and taught various student models with different
parameters from OPT and BLOOM series in a multi-task setting. The experimental
results indicate that the data augmentation provided by TeacherLM has brought
significant benefits. We will release the TeacherLM series of models and
augmented datasets as open-source. Comment: 5 figures, 15 pages
GLM-130B: An Open Bilingual Pre-trained Model
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language
model with 130 billion parameters. It is an attempt to open-source a 100B-scale
model at least as good as GPT-3 (davinci) and unveil how models of such a scale
can be successfully pre-trained. Over the course of this effort, we face
numerous unexpected technical and engineering challenges, particularly on loss
spikes and divergence. In this paper, we introduce the training process of
GLM-130B including its design choices, training strategies for both efficiency
and stability, and engineering efforts. The resultant GLM-130B model offers
significant outperformance over GPT-3 175B (davinci) on a wide range of popular
English benchmarks while the performance advantage is not observed in OPT-175B
and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN
3.0 260B -- the largest Chinese language model -- across related benchmarks.
Finally, we leverage a unique scaling property of GLM-130B to reach INT4
quantization without post training, with almost no performance loss, making it
the first among 100B-scale models and more importantly, allowing its effective
inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, the
most affordable GPUs required for using 100B-scale models. The GLM-130B model
weights are publicly accessible and its code, training logs, related toolkit,
and lessons learned are open-sourced at
https://github.com/THUDM/GLM-130B/. Comment: Accepted to ICLR 2023
AgentBench: Evaluating LLMs as Agents
Large Language Models (LLMs) are becoming increasingly smart and autonomous,
targeting real-world pragmatic missions beyond traditional NLP tasks. As a
result, there has been an urgent need to evaluate LLMs as agents on challenging
tasks in interactive environments. We present AgentBench, a multi-dimensional
evolving benchmark that currently consists of 8 distinct environments to assess
LLM-as-Agent's reasoning and decision-making abilities in a multi-turn
open-ended generation setting. Our extensive test over 27 API-based and
open-sourced (OSS) LLMs shows that, while top commercial LLMs present a strong
ability of acting as agents in complex environments, there is a significant
disparity in performance between them and OSS competitors. We identify the
typical reasons of failures in environments and LLMs, showing that poor
long-term reasoning, decision-making, and instruction following abilities are
the main obstacles for developing usable LLM agents. Training on code and high
quality multi-turn alignment data could improve agent performance. Datasets,
environments, and an integrated evaluation package for AgentBench are released
at https://github.com/THUDM/AgentBench. Comment: 55 pages
Multi-messenger observations of a binary neutron star merger
On 2017 August 17 a binary neutron star coalescence candidate (later designated GW170817) with merger time 12:41:04 UTC was observed through gravitational waves by the Advanced LIGO and Advanced Virgo detectors. The Fermi Gamma-ray Burst Monitor independently detected a gamma-ray burst (GRB 170817A) with a time delay of ~1.7 s with respect to the merger time. From the gravitational-wave signal, the source was initially localized to a sky region of 31 deg² at a luminosity distance of 40 ± 8 Mpc and with component masses consistent with neutron stars. The component masses were later measured to be in the range 0.86 to 2.26 M☉. An extensive observing campaign was launched across the electromagnetic spectrum leading to the discovery of a bright optical transient (SSS17a, now with the IAU identification of AT 2017gfo) in NGC 4993 (at ~40 Mpc) less than 11 hours after the merger by the One-Meter, Two Hemisphere (1M2H) team using the 1 m Swope Telescope. The optical transient was independently detected by multiple teams within an hour. Subsequent observations targeted the object and its environment. Early ultraviolet observations revealed a blue transient that faded within 48 hours. Optical and infrared observations showed a redward evolution over ~10 days. Following early non-detections, X-ray and radio emission were discovered at the transient's position ~9 and ~16 days, respectively, after the merger. Both the X-ray and radio emission likely arise from a physical process that is distinct from the one that generates the UV/optical/near-infrared emission. No ultra-high-energy gamma-rays and no neutrino candidates consistent with the source were found in follow-up searches. These observations support the hypothesis that GW170817 was produced by the merger of two neutron stars in NGC 4993 followed by a short gamma-ray burst (GRB 170817A) and a kilonova/macronova powered by the radioactive decay of r-process nuclei synthesized in the ejecta.
Repression of G1/S Transition by Transient Inhibition of miR-10404 Expression in Drosophila Primordial Germ Cells
DataSheet_1_N6-methyladenosine regulators-related immune genes enable predict graft loss and discriminate T-cell mediate rejection in kidney transplantation biopsies for cause.zip
Objective: The role of m6A modification in kidney transplant-associated immunity, especially in alloimmunity, still remains unknown. This study aims to explore the potential value of m6A-related immune genes in predicting graft loss and diagnosing T cell mediated rejection (TCMR), as well as the possible role they play in renal graft dysfunction.
Methods: Renal transplant-related cohorts and transcript expression data were obtained from the GEO database. First, we conducted correlation analysis in the discovery cohort to identify the m6A-related immune genes. Then, lasso regression and random forest were used respectively to build prediction models in the prognosis and diagnosis cohort, to predict graft loss and discriminate TCMR in dysfunctional renal grafts. Connectivity map (CMap) analysis was applied to identify potential therapeutic compounds for TCMR.
Results: The prognostic prediction model effectively predicts the prognosis and survival of renal grafts with clinical indications (P
Conclusions: Together, our findings explore the value of m6A-related immune genes in predicting the prognosis of renal grafts and diagnosis of TCMR.
Enhanced activation of mechanistic target of rapamycin complex 1 signaling in eruptive xanthomas