110 research outputs found

    Augmenting Large Language Model Translators via Translation Memories

    Full text link
    Using translation memories (TMs) as prompts is a promising approach to in-context learning for machine translation. In this work, we take a step towards prompting large language models (LLMs) with TMs and making them better translators. We find that the ability of LLMs to "understand" prompts is indeed helpful for making better use of TMs. Experiments show that the results of a pre-trained LLM translator can be greatly improved by using high-quality TM-based prompts. These results are even comparable to those of state-of-the-art NMT systems which have access to large-scale in-domain bilingual data and are well tuned on the downstream tasks. Comment: Accepted to Findings of ACL 202
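
The TM-prompting idea above can be sketched in a few lines: retrieve the memory pairs most similar to the input, then format them as few-shot examples ahead of the sentence to translate. The `difflib` similarity measure and the prompt template are illustrative assumptions, not the paper's actual retrieval or prompting setup:

```python
import difflib

def retrieve_tms(source, memory, k=2):
    """Rank (source, target) pairs in the translation memory by surface
    similarity to the input sentence and keep the top-k as examples."""
    ranked = sorted(
        memory,
        key=lambda pair: difflib.SequenceMatcher(None, source, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]

def build_tm_prompt(tms, source, src_lang="German", tgt_lang="English"):
    """Format the retrieved TM pairs as few-shot examples, ending with
    the sentence the LLM should translate."""
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for tm_src, tm_tgt in tms:
        lines.append(f"{src_lang}: {tm_src}\n{tgt_lang}: {tm_tgt}")
    lines.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(lines)
```

The completed prompt would then be sent to the LLM, which continues after the final `English:` marker.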

    Cross-layer Attention Sharing for Large Language Models

    Full text link
    As large language models (LLMs) evolve, the increase in model depth and parameter count leads to substantial redundancy. To enhance the efficiency of the attention mechanism, previous works primarily compress the KV cache or group attention heads, while largely overlooking redundancy between layers. Our comprehensive analyses across various LLMs show that highly similar attention patterns persist within most layers. It is intuitive to save computation by sharing attention weights across layers. However, further analysis reveals two challenges: (1) directly sharing the weight matrix without carefully rearranging the attention heads proves to be ineffective; (2) shallow layers are vulnerable to small deviations in attention weights. Driven by these insights, we introduce LiSA, a lightweight substitute for self-attention in well-trained LLMs. LiSA employs tiny feed-forward networks to align attention heads between adjacent layers and low-rank matrices to approximate differences in layer-wise attention weights. Evaluations encompassing 13 typical benchmarks demonstrate that LiSA maintains high response quality in terms of accuracy and perplexity while eliminating redundant attention calculations in 53-84% of the total layers. Our implementations of LiSA achieve a 6X compression of Q and K, with maximum throughput improvements of 19.5% for LLaMA3-8B and 32.3% for LLaMA2-7B. Comment: Work in progress
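
The low-rank correction idea can be illustrated with a minimal NumPy sketch: reuse the previous layer's attention weight matrix and add a rank-r term obtained from a truncated SVD of the layer-wise difference. The function name and the plain SVD truncation are assumptions for illustration, not LiSA's actual implementation (which additionally uses tiny FFNs to align heads between layers):

```python
import numpy as np

def share_with_low_rank_delta(w_prev, w_curr, rank):
    """Approximate the current layer's attention weight matrix as the
    previous layer's matrix plus a rank-r correction, so only the small
    factors of the difference need to be stored per layer."""
    u, s, vt = np.linalg.svd(w_curr - w_prev, full_matrices=False)
    return w_prev + (u[:, :rank] * s[:rank]) @ vt[:rank]
```

When the true difference between adjacent layers is close to low-rank, as the paper's similarity analysis suggests, this reconstruction is nearly exact.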

    Large Language Models are Parallel Multilingual Learners

    Full text link
    In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input into several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages and 8 state-of-the-art multilingual LLMs. Experimental results show that (1) incorporating more languages helps PiM surpass conventional ICL further; (2) even combining translations that are inferior to the baseline can also help. Moreover, by examining the activated neurons in LLMs, we discover a counterintuitive but interesting phenomenon. Contrary to the common belief that PiM would activate more neurons than monolingual input to leverage knowledge learned from diverse languages, PiM actually inhibits neurons and promotes more precise neuron activation, especially when more languages are added. This phenomenon aligns with the neuroscience insight on synaptic pruning, which removes less-used neural connections, strengthens the remaining ones, and thereby enhances brain intelligence. Comment: Work in progress
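
Mechanically, PiM amounts to prepending the same passage in several languages before the task instruction. A minimal sketch of such a prompt builder, with a hypothetical bracketed-language template (the paper's exact format may differ):

```python
def pim_prompt(parallel_inputs, instruction):
    """Parallel Input in Multiple Languages: present the same passage
    in several languages, then append the task instruction."""
    blocks = [f"[{lang}] {text}" for lang, text in parallel_inputs]
    blocks.append(instruction)
    return "\n".join(blocks)
```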

    The Applications of Finite Element Analysis in Proximal Humeral Fractures

    Get PDF
    Proximal humeral fractures are common and among the most challenging to treat, owing to the complexity of the glenohumeral joint, especially impacted fractures in the geriatric population; implant development continues because current problems with fixation remain unsolved. Pre-, intra-, and postoperative assessments are crucial in the management of these patients. Finite element analysis, one of the valuable tools, has been implemented as an effective and noninvasive method to analyze proximal humeral fractures, providing solid evidence for the management of troublesome patients. However, no review article on the applications and effects of finite element analysis in assessing proximal humeral fractures has been published yet. This review article summarizes the applications, contributions, and clinical significance of finite element analysis in assessing proximal humeral fractures. Furthermore, the limitations of finite element analysis, the difficulties of more realistic simulation, and the validation and creation of validated FE models are discussed. We conclude that although some advances in research on proximal humeral fractures have been made by using finite element analysis, applying this powerful tool to routine clinical management and adequate simulation requires more state-of-the-art studies to provide evidence and a basis

    Translate-and-Revise: Boosting Large Language Models for Constrained Translation

    Full text link
    Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of a translation and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct the outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show a 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms state-of-the-art neural machine translation (NMT) methods. Comment: 16 pages
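
The translate-then-revise loop can be sketched as follows. The `llm` callable stands in for any LLM API, the substring check is a simplification of real constraint verification, and the prompt wording is an assumption, not the paper's actual prompts:

```python
def unmet(translation, constraints):
    """Constraint terms not yet present in the output (string match)."""
    return [c for c in constraints if c not in translation]

def translate_and_revise(llm, source, constraints, max_rounds=3):
    """Translate once, then repeatedly prompt for a revision listing the
    constraint terms the current output still misses."""
    output = llm(f"Translate: {source}\nUse the terms: {', '.join(constraints)}")
    for _ in range(max_rounds):
        missing = unmet(output, constraints)
        if not missing:
            break
        output = llm(
            f"Revise this translation of '{source}' so it also uses: "
            f"{', '.join(missing)}\nCurrent translation: {output}"
        )
    return output
```

The key design choice is that each revision prompt names only the still-unmet constraints, focusing the model on exactly what it overrode.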

    Hybrid Alignment Training for Large Language Models

    Full text link
    Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed in two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and LLMs cannot be guaranteed to align well with both the instructions and human preferences simultaneously. In response, in this work we propose a Hybrid Alignment Training (Hbat) approach based on alternating alignment and modified elastic weight consolidation methods. The basic idea is to alternate between the different objectives during alignment training, so that better collaboration can be achieved between the two alignment tasks. We experiment with Hbat on summarization and dialogue tasks. Experimental results show that the proposed Hbat can significantly outperform all baselines. Notably, Hbat yields consistent performance gains over traditional two-stage alignment training when using both proximal policy optimization and direct preference optimization. Comment: Accepted by ACL (Findings) 202
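
The alternating scheme with an elastic-weight-consolidation penalty can be illustrated on a toy scalar parameter: odd steps take an instruction-following gradient, even steps a preference gradient, and a Fisher-weighted quadratic term pulls the parameter back toward an anchor. This is a minimal sketch of the idea, not Hbat's actual training procedure:

```python
def hybrid_alignment(theta, grad_instr, grad_pref, fisher, anchor,
                     rounds=4, lr=0.1, lam=0.5):
    """Alternate between two alignment objectives; the EWC-style penalty
    lam * fisher * (theta - anchor) discourages drifting away from the
    anchor parameters while either objective is being optimized."""
    for t in range(rounds):
        grad = grad_instr(theta) if t % 2 == 0 else grad_pref(theta)
        grad = grad + lam * fisher * (theta - anchor)
        theta = theta - lr * grad
    return theta
```

With two conflicting quadratic objectives (minima at +1 and -1), the alternation keeps the parameter near a compromise instead of oscillating to either extreme.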

    Values of lymphocyte-related ratios in predicting the clinical outcome of acute ischemic stroke patients receiving intravenous thrombolysis based on different etiologies

    Get PDF
    Background: While the neutrophil-to-lymphocyte ratio (NLR), lymphocyte-to-monocyte ratio (LMR), and platelet-to-lymphocyte ratio (PLR) have been associated with acute ischemic stroke (AIS) outcomes, their differential predictive value across etiological subtypes (TOAST classification) in thrombolysis-treated patients remains underexplored.
    Methods: In this retrospective cohort study, we analyzed 381 AIS patients receiving intravenous thrombolysis. Hematological indices were calculated from pre-thrombolysis blood samples. Using multivariable logistic regression adjusted for age, NIHSS, and comorbidities, we assessed associations between baseline ratios and 90-day unfavorable outcomes (mRS 3–6). Receiver operating characteristic (ROC) analysis was used to determine optimal cutoffs stratified by TOAST subtypes.
    Results: A total of 381 patients were included in the study. NLR showed superior predictive performance: large-artery atherosclerosis, AUC = 0.702 (aOR = 1.35, 95% CI = 1.14–1.61, p = 0.001); small-artery occlusion, AUC = 0.750 (aOR = 1.51, 95% CI = 1.08–2.10, p = 0.015); cardioembolic stroke, AUC = 0.679 (aOR = 1.82, 95% CI = 1.07–3.10, p = 0.028). LMR showed predictive value only in large-artery atherosclerosis (AUC = 0.632, p = 0.004). Optimal NLR cutoffs were 3.19 (large-artery), 3.94 (small-artery), and 3.17 (cardioembolic stroke).
    Conclusion: NLR emerged as a robust, subtype-specific predictor of post-thrombolysis outcomes, particularly in atherosclerotic stroke variants. These findings support NLR's clinical utility for risk stratification in thrombolysis-eligible AIS patients.
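
The two core computations, the ratio itself and the ROC AUC, are standard formulas and can be sketched directly (this is generic code, not the study's analysis scripts). The AUC uses the Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one:

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio from an absolute differential count."""
    return neutrophils / lymphocytes

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: fraction of
    positive/negative pairs ranked correctly, ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, a patient with 6.0 x 10^9/L neutrophils and 2.0 x 10^9/L lymphocytes has an NLR of 3.0, just below the reported 3.19 large-artery cutoff.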

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Full text link
    Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face a difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM) that improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
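
Optimal transport-based data selection can be sketched with entropy-regularized Sinkhorn iterations: compute a transport plan between the textual examples and the visual data distribution, then keep the textual examples with the lowest expected transport cost (i.e., those closest in feature space). The feature representation, uniform marginals, and scoring rule here are illustrative assumptions, not RoVRM's exact selection procedure:

```python
import numpy as np

def sinkhorn_plan(cost, reg=1.0, iters=200):
    """Entropy-regularized optimal transport between two uniform
    distributions via Sinkhorn scaling; returns the transport plan."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def select_preference_data(text_feats, visual_feats, k):
    """Score each textual preference example by its expected transport
    cost to the visual distribution; keep the k cheapest (closest)."""
    cost = np.linalg.norm(
        text_feats[:, None, :] - visual_feats[None, :, :], axis=-1
    )
    plan = sinkhorn_plan(cost)
    scores = (plan * cost).sum(axis=1)
    return np.argsort(scores)[:k]
```

Textual examples that lie far from the visual distribution incur a high transported cost and are filtered out, which is the intuition behind using auxiliary text data without drifting from the visual domain.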

    Human tumor necrosis factor (TNF)-alpha-induced protein 8-like 2 suppresses hepatocellular carcinoma metastasis through inhibiting Rac1

    Full text link
    Background: Tumor invasion and metastasis are the leading causes of death in patients with hepatocellular carcinoma (HCC). Identifying molecules that can suppress tumor invasion and metastasis will therefore provide novel targets for HCC therapies. Tumor necrosis factor (TNF)-alpha-induced protein 8-like 2 (TIPE2) is a novel negative immune regulator and an inhibitor of oncogenic Ras in mice, but its function in humans is unclear. Our previous research showed that TIPE2 is downregulated in human primary HCC compared with paired adjacent non-tumor tissues.
    Results: In the present study, we provide evidence that TIPE2 effectively inhibits human hepatocellular carcinoma metastasis. Forced expression of TIPE2 in HCC-derived cell lines markedly inhibits tumor cell growth, migration and invasion in vitro and suppresses growth and metastasis of HCC in vivo. Clinical information from a cohort of 112 patients reveals that loss or reduced expression of TIPE2 in primary HCC tissues is significantly associated with tumor metastasis. Mechanistically, TIPE2 inhibits migration and invasion by targeting Rac1, thereby reducing F-actin polymerization and the expression of matrix metallopeptidase 9 (MMP9) and urokinase plasminogen activator (uPA).
    Conclusion: Our results indicate that human TIPE2 is an endogenous inhibitor of Rac1 in HCC, through which it attenuates invasion and metastasis. These data suggest that TIPE2 may be a new target for HCC therapy.