9 research outputs found
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Extending large language models to effectively handle long contexts requires
instruction fine-tuning on input sequences of similar length. To address this,
we present LongAlign -- a recipe of the instruction data, training, and
evaluation for long context alignment. First, we construct a long
instruction-following dataset using Self-Instruct. To ensure the data
diversity, it covers a broad range of tasks from various long context sources.
Second, we adopt the packing and sorted batching strategies to speed up
supervised fine-tuning on data with varied length distributions. Additionally,
we develop a loss weighting method to balance the contribution to the loss
across different sequences during packing training. Third, we introduce the
LongBench-Chat benchmark for evaluating instruction-following capabilities on
queries of 10k-100k in length. Experiments show that LongAlign outperforms
existing recipes for LLMs in long context tasks by up to 30%, while also
maintaining their proficiency in handling short, generic tasks. The code, data,
and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign
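
The sorted batching and loss weighting ideas mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this is only one plausible reading, not the authors' implementation: the function names (`sorted_batches`, `token_mean_loss`, `sequence_balanced_loss`) and the toy numbers are hypothetical. It shows how grouping by length keeps similar-length sequences together, and how a plain token-level mean over a pack lets long sequences dominate while a per-sequence mean weights each sequence equally.

```python
from typing import List, Sequence


def sorted_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so each batch holds sequences of similar length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def token_mean_loss(per_token_losses: List[List[float]]) -> float:
    """Average over every target token in a pack: longer sequences dominate."""
    flat = [x for seq in per_token_losses for x in seq]
    return sum(flat) / len(flat)


def sequence_balanced_loss(per_token_losses: List[List[float]]) -> float:
    """Average each sequence's own mean loss, so every sequence counts equally."""
    means = [sum(seq) / len(seq) for seq in per_token_losses]
    return sum(means) / len(means)


# Hypothetical sequence lengths (in tokens) for six training samples.
lengths = [900, 12, 450, 15, 880, 20]
batches = sorted_batches(lengths, batch_size=2)

# Hypothetical pack: one sequence with four target tokens, one with a single token.
pack = [[2.0, 2.0, 2.0, 2.0], [1.0]]
unweighted = token_mean_loss(pack)        # pulled toward the longer sequence
balanced = sequence_balanced_loss(pack)   # both sequences contribute equally
```

In a real training loop the sorted batches would typically be shuffled at the batch level so that the model does not see lengths in monotonic order; that step is omitted here.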
Metric Multi-View Graph Clustering
Graph-based methods have long been used to uncover coherent patterns in data owing to their ease of implementation and efficiency. These methods have been increasingly applied in multi-view learning and have achieved promising performance in various clustering tasks. However, despite their noticeable empirical success, existing graph-based multi-view clustering methods may still yield suboptimal solutions, since multi-view data can be very complicated in the raw feature space. Moreover, existing methods usually choose the similarity metric in an ad hoc manner, which oversimplifies the relationships among real-world data and results in inaccurate output. To address these issues, we propose to seamlessly integrate metric learning and graph learning for multi-view clustering. Specifically, we employ a useful metric to depict the inherent structure through a linearity-aware affinity graph representation learned from the self-expressiveness property. Furthermore, instead of directly using the raw features, we recover a smooth representation such that the geometric structure of the original data is retained. We integrate these concerns into a unified learning framework, so that the learning subtasks reinforce one another. The empirical studies corroborate our theoretical findings and demonstrate that the proposed method boosts multi-view clustering performance.
Structural Characteristics and the Antioxidant and Hypoglycemic Activities of a Polysaccharide from Lonicera caerulea L. Pomace
In this study, a novel polysaccharide, LPP, was obtained from Lonicera caerulea L. pomace by ultrasonic-assisted heating and was purified by Sephadex G-100. Structural characterization of LPP showed that its molecular weight (Mw) was 8.53 × 10^4 Da; that it was mainly composed of galacturonic acid, followed by galactose; that it possessed the characteristic functional groups of polysaccharides; and that it lacked O-glycosidic bonds and crystalline and triple-helix structures. Furthermore, LPP exhibited favorable thermodynamic stability and antioxidant, hypoglycemic, and hypolipidemic activities in a dose-dependent manner in vitro, demonstrating that LPP can be used as an agent to regulate glycolipid metabolism. Additionally, the relationship between its bio-activities is discussed in this paper. The results revealed that the RP, •OH, and NO2− radicals had synergistic promoting effects, and polysaccharides with a strong antioxidant ability may have excellent hypoglycemic and hypolipidemic effects. Collectively, these results suggest that LPP has strong bio-activity and that Lonicera caerulea L. pomace can be used as a potential polysaccharide source.
Combined oral low-dose cyclophosphamide endocrine therapy may improve clinical response among patients with metastatic breast cancer via Tregs in TLSs
Despite limited research on refractory and/or endocrine therapy failure in elderly metastatic breast cancer (MBC) patients, a prior study showed that low-dose oral cyclophosphamide (CY) can improve the overall survival rate of MBC patients, possibly through the immunoregulation of regulatory T cells (Tregs). We preliminarily investigated the combination of endocrine therapy (ET) with oral low-dose CY as salvage therapy in elderly patients via peripheral blood regulatory T-cell analyses. In addition, we evaluated the associations of tumor tertiary lymphoid structures (TLSs) with therapeutic outcomes. HR+/HER2− advanced breast cancer patients who received low-dose CY combined with ET or ET only from April 2015 to August 2021 were enrolled in this retrospective study. The primary outcome was the clinical control rate (CCR), and the secondary outcome was progression-free survival (PFS). Circulating T lymphocyte subpopulations represented by Tregs were monitored during treatment by flow cytometry. TLSs were confirmed by hematoxylin–eosin staining of pretreatment specimens, and CD3, CD4, and Foxp3 were detected using Opal multicolor immunofluorescence. A total of 85 patients who received CY + ET and 50 patients who received ET only were enrolled. The CCR was 73% (62/85) vs. 70% (45/50), and the objective response rate (ORR) was 28% (24/85) vs. 24% (12/50). No deaths occurred during the study period. The mean PFS time was 13 vs. 11 months (P = 0.03). In the CY + ET group, decreases in CD4+/CD25+/Foxp3+ T cells (P < 0.001) were favorable for both clinical control and prolonged PFS (P < 0.001). Compared with patients without TLSs, those with TLSs were more likely to have better clinical control and PFS (mean time = 6 months), and a greater number of Treg cells in TLSs before treatment correlated with longer PFS (P = 0.043).
Oral low-dose CY combined with standard ET exerts immunological effects by decreasing Treg levels to achieve improved clinical responses. Moreover, patients with TLSs might benefit more from such therapy than those without TLSs, and a high Treg cell count in TLSs before treatment predicts better therapeutic efficacy.
Determination of Genetic Effects of LIPK and LIPJ Genes on Milk Fatty Acids in Dairy Cattle
In our previous genome-wide association study (GWAS) on milk fatty acids (FAs) in Chinese Holstein, we discovered 83 genome-wide significant single nucleotide polymorphisms (SNPs) associated with milk FAs. Two of them were close to lipase family member K (LIPK) and lipase family member J (LIPJ), respectively. Hence, this follow-up study verifies whether LIPK and LIPJ have significant genetic effects on milk FAs in dairy cattle. By re-sequencing all exons and 3 kb of the 5′ and 3′ flanking regions, we identified two and seven SNPs in LIPK and LIPJ, respectively, including a novel SNP, ss158213049726. With the Haploview 4.1 software, we found that five of the SNPs in LIPJ formed a haplotype block (D′ = 0.96 to 1.00). Single-locus association analyses revealed that each SNP in LIPK and LIPJ was significantly associated with at least one milk FA (p = <1.00 × 10^-4 to 4.88 × 10^-2), and the haplotype-based association analyses showed significant genetic effects on nine milk FAs (p = <1.00 × 10^-4 to 3.98 × 10^-2). Of these SNPs, the missense mutation in the LIPK gene, rs42774527, was predicted by SOPMA, SIFT, and PROVEAN to change the protein's secondary structure and function. With the Genomatix software, we predicted that two SNPs, rs110322221 in LIPK and rs211373799 in LIPJ, altered transcription factor binding sites (TFBSs), indicating their potential to regulate the promoter activity of the genes. Furthermore, we found that both LIPK and LIPJ had relatively high expression in the mammary gland. In conclusion, our research is the first to demonstrate that the LIPK and LIPJ genes have significant associations with milk FAs, and the identified SNPs might serve as genetic markers to optimize breeding programs for milk FAs in dairy cattle. These findings warrant in-depth verification.
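
For readers unfamiliar with the D′ statistic quoted for the haplotype block, the following minimal sketch computes the standard normalized linkage disequilibrium coefficient for two biallelic loci. It assumes the haplotype and allele frequencies are already known; tools that start from unphased genotypes must first estimate those frequencies, which is not shown. The function name `d_prime` and the example frequencies are hypothetical and are not taken from the study.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0
    # The largest |D| attainable depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Hypothetical frequencies: A and B always occur together (complete LD).
complete = d_prime(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical frequencies: A and B co-occur more often than chance, but not always.
partial = d_prime(p_ab=0.4, p_a=0.5, p_b=0.5)
```

Values of |D′| close to 1, such as the 0.96 to 1.00 range reported above, indicate that the SNPs are inherited together almost without historical recombination between them.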