LawBench: Benchmarking Legal Knowledge of Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal tasks. To address this gap, we propose LawBench, a comprehensive evaluation benchmark. LawBench has been meticulously crafted to precisely assess LLMs' legal capabilities at three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize the necessary legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge application: whether LLMs can properly use their legal knowledge and carry out the reasoning steps needed to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.
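To make the SLC and MLC task types concrete, here is a minimal Python sketch of two common scoring functions: accuracy for single-label predictions and micro-averaged F1 for multi-label predictions. This is an illustrative assumption, not LawBench's evaluation code; the metric LawBench actually applies to each task is defined in its repository. The function names `slc_accuracy` and `mlc_f1` are hypothetical.

```python
from typing import Sequence, Set


def slc_accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Single-label classification: fraction of exact label matches."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def mlc_f1(preds: Sequence[Set[str]], golds: Sequence[Set[str]]) -> float:
    """Multi-label classification: micro-averaged F1 over per-example label sets."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    true_pos = sum(len(p & g) for p, g in zip(preds, golds))
    if true_pos == 0:
        return 0.0  # avoids dividing by zero when nothing overlaps
    precision = true_pos / sum(len(p) for p in preds)
    recall = true_pos / sum(len(g) for g in golds)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(slc_accuracy(["theft", "fraud"], ["theft", "robbery"]))  # 0.5
    print(mlc_f1([{"theft", "fraud"}], [{"theft", "bribery"}]))     # 0.5
```

Micro-averaging pools true positives across all examples before computing precision and recall, so examples with many gold labels weigh more than under per-example averaging; which averaging a benchmark uses changes the reported numbers.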
GNL3L stabilizes the TRF1 complex and promotes mitotic transition
Telomeric repeat binding factor 1 (TRF1) is a component of the multiprotein complex “shelterin,” which organizes the telomere into a high-order structure. TRF1 knockout embryos suffer from severe growth defects without apparent telomere dysfunction, suggesting an obligatory role for TRF1 in cell cycle control. To date, the mechanism regulating the mitotic increase in TRF1 protein expression and the function of TRF1 in mitosis remain unclear. Here, we identify guanine nucleotide-binding protein-like 3 (GNL3L), a GTP-binding protein most similar to nucleostemin, as a novel TRF1-interacting protein in vivo. GNL3L binds TRF1 in the nucleoplasm and is capable of promoting the homodimerization and telomeric association of TRF1, preventing promyelocytic leukemia body recruitment of telomere-bound TRF1, and stabilizing TRF1 protein by inhibiting its ubiquitylation and its binding to FBX4, an E3 ubiquitin ligase for TRF1. Most importantly, the TRF1 protein-stabilizing activity of GNL3L mediates the mitotic increase of TRF1 protein and promotes the metaphase-to-anaphase transition. This work reveals novel aspects of TRF1 modulation by GNL3L.
Potential of UAV-Based Active Sensing for Monitoring Rice Leaf Nitrogen Status
Unmanned aerial vehicle (UAV)-based active canopy sensors are a promising, widely applicable and flexible solution for estimating crop nitrogen (N) status. This study aimed to determine the feasibility of UAV-based active sensing for monitoring the leaf N status of rice (Oryza sativa L.) and to examine the transferability of handheld-based predictive models to UAV-based active sensing. In this 3-year, multi-location study, field experiments with varied N rates (0–405 kg N ha⁻¹) were conducted using five rice varieties. Plant samples and sensing data were collected at critical growth stages for growth analysis and monitoring. The portable active canopy sensor RapidSCAN CS-45, with red, red edge and near-infrared wavebands, was used in handheld mode and in aerial mode on a gimbal under a multi-rotor UAV. The results showed the great potential of UAV-based active sensing for monitoring rice leaf N status. Vegetation index-based regression models were built and evaluated using the Akaike information criterion and independent validation to predict rice leaf dry matter, leaf area index and leaf N accumulation. Vegetation indices composed of near-infrared and red edge bands (NDRE or RERVI) acquired at a 1.5 m flight height performed well for practical application. Future studies are needed on the proper operation mode and means of precision N management with this system.
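The two indices named above are commonly defined as NDRE = (NIR - RE) / (NIR + RE) and RERVI = NIR / RE, where NIR and RE are the near-infrared and red-edge readings. The Python sketch below computes both, fits a simple linear model y = a + b·x by least squares, and scores it with AIC = n·ln(RSS/n) + 2k. The linear model form and the helper names (`fit_linear`, `aic_linear`) are assumptions for illustration; the abstract does not state which regression forms the study compared.

```python
import math
from typing import List, Tuple


def ndre(nir: float, red_edge: float) -> float:
    """Normalized difference red edge index: (NIR - RE) / (NIR + RE)."""
    return (nir - red_edge) / (nir + red_edge)


def rervi(nir: float, red_edge: float) -> float:
    """Red edge ratio vegetation index: NIR / RE."""
    return nir / red_edge


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def aic_linear(x: List[float], y: List[float], a: float, b: float, k: int = 2) -> float:
    """AIC under Gaussian errors, n*ln(RSS/n) + 2k, with k fitted coefficients.

    Counting the error variance as an extra parameter adds 2 to every model's
    AIC, so model rankings are unchanged. Requires RSS > 0 (a perfect fit
    would make the log undefined).
    """
    n = len(x)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return n * math.log(rss / n) + 2 * k
```

Fitting one model with NDRE and another with RERVI as the predictor against the same plant measurements, the one with the lower AIC would be preferred; as in the study, that choice should still be checked against independent validation data.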
Comparison of Peptide Array Substrate Phosphorylation of c-Raf and Mitogen Activated Protein Kinase Kinase Kinase 8
Kinases are pivotal regulators of cellular physiology. The human genome contains more than 500 putative kinases, which exert their action via the phosphorylation of specific substrates. The determinants of this specificity are still only partly understood; as a consequence, it is difficult to predict kinase substrate preferences from primary structure, which hampers the understanding of kinase function in physiology and has prompted the development of technologies that allow easy assessment of kinase substrate consensus sequences. We therefore explored the usefulness of phosphorylating peptide arrays comprising 1176 different peptide substrates with recombinant kinases to determine kinase substrate preferences, based on the contribution of individual amino acids to total array phosphorylation. Using this technology, we determined the consensus peptide sequences for substrates of both c-Raf and Mitogen Activated Protein Kinase Kinase Kinase 8, two highly homologous kinases with distinct signalling roles in cellular physiology. The results show that although the consensus sequences identified for these two kinases share important chemical similarities, there is still some sequence specificity that could explain the different biological actions of the two enzymes. Thus, peptide arrays are a useful instrument for deducing substrate consensus sequences, and highly homologous kinases can differ in their requirements for phosphorylation events.
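One way to turn "the contribution of individual amino acids to total array phosphorylation" into a consensus sequence is sketched below in Python: at each position, divide the share of total signal carried by peptides with a given residue by the share of peptides carrying it, then take the most enriched residue. This enrichment score, the assumption of equal-length aligned peptides, and the function names are illustrative assumptions, not the authors' published procedure.

```python
from collections import defaultdict
from typing import Dict, List


def residue_enrichment(peptides: List[str], signals: List[float]) -> List[Dict[str, float]]:
    """Per position: (share of total signal) / (share of peptides) for each residue.

    Values above 1 mean peptides carrying that residue at that position were
    phosphorylated more than their abundance on the array alone would predict.
    """
    if not peptides or len(peptides) != len(signals):
        raise ValueError("need at least one peptide and one signal per peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("this sketch assumes equal-length, aligned peptides")
    total_signal = sum(signals)
    if total_signal <= 0:
        raise ValueError("total phosphorylation signal must be positive")
    n = len(peptides)
    profile = []
    for pos in range(length):
        signal_by_aa: Dict[str, float] = defaultdict(float)
        count_by_aa: Dict[str, int] = defaultdict(int)
        for pep, s in zip(peptides, signals):
            signal_by_aa[pep[pos]] += s
            count_by_aa[pep[pos]] += 1
        profile.append({
            aa: (signal_by_aa[aa] / total_signal) / (count_by_aa[aa] / n)
            for aa in count_by_aa
        })
    return profile


def consensus(profile: List[Dict[str, float]]) -> str:
    """Most enriched residue at each position (ties go to the first residue seen)."""
    return "".join(max(scores, key=scores.get) for scores in profile)
```

Normalizing by residue frequency matters because a residue that simply appears on many peptides would otherwise dominate the raw signal share even if it does not favor phosphorylation.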
Synthekines are surrogate cytokine and growth factor agonists that compel signaling through non-natural receptor dimers
Secrets of RLHF in Large Language Models Part I: PPO
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans is of paramount importance, and reinforcement learning from human feedback (RLHF) has emerged as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, AI researchers face a significant barrier to advancing technical alignment and the safe deployment of LLMs. Stable RLHF training remains a puzzle. In this first report, we dissect the RLHF framework, re-evaluate the inner workings of PPO, and explore how the components of PPO algorithms affect policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. We therefore explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models and PPO code.
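The abstract singles out policy constraints as the key to stable PPO training. For reference, PPO's core policy-constraining mechanism is the clipped surrogate objective, and RLHF pipelines commonly add a second constraint by subtracting a KL-style penalty against the SFT reference model from the reward-model score. The stdlib-only Python sketch below implements both in their textbook form. It does not reproduce PPO-max, whose specific modifications are not described in this abstract; `clip_eps`, `beta` and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative mean of PPO's clipped surrogate, min(r*A, clip(r, 1-eps, 1+eps)*A).

    r = exp(logp_new - logp_old) is the probability ratio between the current
    policy and the policy that generated the samples.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)  # minimizing this maximizes the objective


def kl_penalized_reward(
    rm_score: float, logp_policy: float, logp_ref: float, beta: float = 0.05
) -> float:
    """Reward-model score minus beta * log(pi_policy / pi_ref) for a sampled response.

    The log-ratio of a sampled response is a single-sample estimate of the KL
    divergence from the reference (SFT) model; the penalty keeps the policy near it.
    """
    return rm_score - beta * (logp_policy - logp_ref)
```

The clip only takes effect when it makes the objective more pessimistic: with a ratio of 2 and a positive advantage the gain is capped at 1 + clip_eps, while with a negative advantage the unclipped, larger penalty is kept. This is what keeps each policy update small.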
The LDL Receptor Clustering Motif Interacts with the Clathrin Terminal Domain in a Reverse Turn Conformation
Previously, the hexapeptide motif FXNPXY807 in the cytoplasmic tail of the LDL receptor was shown to be essential for clustering in clathrin-coated pits. We used nuclear magnetic resonance line-broadening and transferred nuclear Overhauser effect measurements to identify the molecule in the clathrin lattice that interacts with this hexapeptide and to determine the structure of the bound motif. The wild-type peptide bound in a single conformation with a reverse turn at residues NPVY. The Tyr807Ser peptide, which harbors a mutation that disrupts receptor clustering, displayed markedly reduced interactions. Clustering motif peptides interacted with clathrin cages assembled in the presence or absence of AP2 and with recombinant clathrin terminal domains, but not with clathrin hubs. The identification of terminal domains as the primary site of interaction for FXNPXY807 suggests that adaptor molecules are not required for receptor-mediated endocytosis of LDL, and that at least two different tyrosine-based internalization motifs exist for clustering receptors in coated pits.
Nucleostemin prevents telomere damage by promoting PML-IV recruitment to SUMOylated TRF1
Continuously dividing cells must be protected from telomeric and nontelomeric DNA damage in order to maintain their proliferative potential. Here, we report a novel telomere-protecting mechanism regulated by nucleostemin (NS). NS depletion increased the number of telomere damage foci in both telomerase-active (TA(+)) and alternative lengthening of telomere (ALT) cells, and decreased both the percentage of damaged telomeres associated with ALT-associated PML bodies (APB) and the number of APB in ALT cells. Mechanistically, NS could promote the recruitment of PML-IV to SUMOylated TRF1 in TA(+) and ALT cells. This event was stimulated by DNA damage. Supporting the importance of NS and PML-IV in telomere protection, we demonstrate that loss of NS or PML-IV increased the frequency of telomere damage and aberration, reduced telomeric length, and perturbed the TRF2(ΔBΔM)-induced telomeric recruitment of RAD51. Conversely, overexpression of either NS or PML-IV protected ALT and TA(+) cells from telomere damage. This work reveals a novel mechanism in telomere protection.
CKI isoforms α and ε regulate Star–PAP target messages by controlling Star–PAP poly(A) polymerase activity and phosphoinositide stimulation
Star–PAP is a non-canonical, nuclear poly(A) polymerase (PAP) that is regulated by the lipid signaling molecule phosphatidylinositol 4,5-bisphosphate (PI4,5P2) and is required for the expression of a select set of mRNAs. It was previously reported that a PI4,5P2-sensitive CKI isoform, CKIα, associates with and phosphorylates Star–PAP in its catalytic domain. Here, we show that oxidative stress induced by tBHQ treatment stimulates CKI-mediated phosphorylation of Star–PAP, which is critical for both its polyadenylation activity and its stimulation by PI4,5P2. CKI activity was required for the expression and efficient 3′-end processing of Star–PAP target mRNAs in vivo, as well as for the polyadenylation activity of Star–PAP in vitro. Specific CKI activity inhibitors (IC261 and CKI7) block Star–PAP activity in vivo, but knockdown of CKIα did not inhibit the expression of Star–PAP targets to the same extent. We show that, in addition to CKIα, Star–PAP associates with another CKI isoform, CKIε, in the Star–PAP complex; CKIε phosphorylates Star–PAP and compensates for the loss of CKIα. Knockdown of both CKI isoforms (α and ε) resulted in loss of expression and 3′-end processing of Star–PAP targets, similar to the effect of the CKI activity inhibitors. Our results demonstrate that CKI isoforms α and ε modulate Star–PAP activity and regulate Star–PAP target messages.
Deciphering the Arginine-Binding Preferences at the Substrate-Binding Groove of Ser/Thr Kinases by Computational Surface Mapping
Protein kinases are key signaling enzymes that catalyze the transfer of the γ-phosphate from an ATP molecule to a phospho-accepting residue in the substrate. Unraveling the molecular features that govern the preference of kinases for particular residues flanking the phosphoacceptor is important for understanding kinase specificities toward their substrates and for designing substrate-like peptidic inhibitors. We applied ANCHORSmap, a new fragment-based computational approach for mapping amino acid side chains on protein surfaces, to predict and characterize kinase preferences for arginine binding. We focus on positions P−2 and P−5, which are commonly occupied by arginine (Arg) in substrates of basophilic Ser/Thr kinases. The method accurately identified all the P−2/P−5 Arg binding sites previously determined by X-ray crystallography and produced Arg preferences that corresponded to those found experimentally with peptide arrays. The predicted Arg-binding positions and their associated pockets were analyzed in terms of shape, physicochemical properties, amino acid composition and in silico mutagenesis, providing a structural rationale for previously unexplained trends in kinase preferences toward Arg moieties. This methodology sheds light on several kinases that were described in the literature as having non-trivial preferences for Arg, and reveals some surprising departures from the prevailing views regarding the residues that determine kinase specificity toward Arg. In particular, we found that the preference for a P−5 Arg is not necessarily governed by the 170/230 acidic pair, as was previously assumed, but by several different pairs of acidic residues selected from positions 133, 169 and 230 (PKA numbering). The acidic residue at position 230 serves as a pivotal element in recognizing Arg at both the P−2 and P−5 positions.
- …