Augmenting Large Language Model Translators via Translation Memories
Using translation memories (TMs) as prompts is a promising approach to
in-context learning of machine translation models. In this work, we take a step
towards prompting large language models (LLMs) with TMs and making them better
translators. We find that the ability of LLMs to "understand" prompts is
indeed helpful for making better use of TMs. Experiments show that the results
of a pre-trained LLM translator can be greatly improved by using high-quality
TM-based prompts. These results are even comparable to those of the
state-of-the-art NMT systems which have access to large-scale in-domain
bilingual data and are well tuned on the downstream tasks.
Comment: Accepted to Findings of ACL 202
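A minimal sketch of what TM-based prompting can look like in practice; the retrieval heuristic, prompt template, and language pair below are illustrative assumptions, not the paper's exact setup:

# Sketch: prompt an LLM with retrieved translation memories (TMs).
def retrieve_tms(source: str, tm_store: list[tuple[str, str]], k: int = 3):
    """Return the k TM pairs whose source side overlaps most with the input."""
    def overlap(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)
    return sorted(tm_store, key=lambda pair: overlap(source, pair[0]), reverse=True)[:k]

def build_tm_prompt(source: str, tms: list[tuple[str, str]]) -> str:
    """Format retrieved TMs as in-context examples, then request the translation."""
    lines = ["Translate from German to English. Use the examples as reference."]
    for src, tgt in tms:
        lines.append(f"German: {src}\nEnglish: {tgt}")
    lines.append(f"German: {source}\nEnglish:")
    return "\n\n".join(lines)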
Cross-layer Attention Sharing for Large Language Models
As large language models (LLMs) evolve, the increase in model depth and
parameter number leads to substantial redundancy. To enhance the efficiency of
the attention mechanism, previous works primarily compress the KV cache or
group attention heads, while largely overlooking redundancy between layers. Our
comprehensive analyses across various LLMs show that highly similar attention
patterns persist within most layers. It is intuitive, then, to save computation by
sharing attention weights across layers. However, further analysis reveals two
challenges: (1) Directly sharing the weight matrix without carefully
rearranging the attention heads proves to be ineffective; (2) Shallow layers
are vulnerable to small deviations in attention weights. Driven by these
insights, we introduce LiSA, a lightweight substitute for self-attention in
well-trained LLMs. LiSA employs tiny feed-forward networks to align attention
heads between adjacent layers and low-rank matrices to approximate differences
in layer-wise attention weights. Evaluations encompassing 13 typical benchmarks
demonstrate that LiSA maintains high response quality in terms of accuracy and
perplexity while reducing redundant attention calculations within 53-84% of the
total layers. Our implementations of LiSA achieve a 6X compression of Q and K,
with maximum throughput improvements of 19.5% for LLaMA3-8B and 32.3% for
LLaMA2-7B.
Comment: Working in process
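The abstract suggests reusing Q/K from an adjacent layer, aligning heads with a tiny feed-forward network, and adding a low-rank correction for the layer-wise difference. A minimal sketch of that idea; all dimensions and wiring are assumptions for illustration:

import torch
import torch.nn as nn

class LiSAProjection(nn.Module):
    """Sketch: replace a layer's Q or K projection by transforming the
    previous layer's result instead of recomputing it from scratch."""
    def __init__(self, head_dim: int, rank: int = 8):
        super().__init__()
        # Tiny FFN that aligns heads from the adjacent layer to this layer.
        self.align = nn.Sequential(
            nn.Linear(head_dim, head_dim // 2),
            nn.ReLU(),
            nn.Linear(head_dim // 2, head_dim),
        )
        # Low-rank factors approximating the difference in attention weights.
        self.down = nn.Linear(head_dim, rank, bias=False)
        self.up = nn.Linear(rank, head_dim, bias=False)

    def forward(self, prev_qk: torch.Tensor) -> torch.Tensor:
        # prev_qk: (batch, heads, seq, head_dim) from the previous layer.
        return self.align(prev_qk) + self.up(self.down(prev_qk))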
Large Language Models are Parallel Multilingual Learners
In this study, we reveal an in-context learning (ICL) capability of
multilingual large language models (LLMs): by translating the input to several
languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which
significantly enhances their comprehension abilities. To test this capability,
we design extensive experiments encompassing 8 typical datasets, 7 languages
and 8 state-of-the-art multilingual LLMs. Experimental results show that (1)
incorporating more languages helps PiM surpass conventional ICL further; (2)
even combining translations that individually perform below the baseline can
still help. Moreover, by examining the activated neurons in LLMs, we
discover a counterintuitive but interesting phenomenon. Contrary to the common
thought that PiM would activate more neurons than monolingual input to leverage
knowledge learned from diverse languages, PiM actually inhibits neurons and
promotes more precise neuron activation especially when more languages are
added. This phenomenon aligns with the neuroscience insight about synaptic
pruning, which removes less-used neural connections, strengthens the remaining
ones, and thereby enhances brain intelligence.
Comment: Working in process
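A minimal sketch of how a PiM prompt might be assembled; the template and the translate callable are stand-ins, not the paper's exact format:

# Sketch: Parallel Input in Multiple Languages (PiM). The input is translated
# into several languages and all versions are concatenated into one prompt.
def build_pim_prompt(question: str, languages: list[str], translate) -> str:
    versions = [question] + [translate(question, lang) for lang in languages]
    parts = ["Answer the question. The same question is given in several languages."]
    parts += [f"Input {i + 1}: {v}" for i, v in enumerate(versions)]
    parts.append("Answer:")
    return "\n".join(parts)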
The Applications of Finite Element Analysis in Proximal Humeral Fractures
Proximal humeral fractures are common and among the most challenging to treat, owing to the complexity of the glenohumeral joint, especially in the geriatric population with impacted fractures; implant development continues because current problems with fixation remain unsolved. Pre-, intra-, and postoperative assessments are crucial in the management of these patients. Finite element analysis, as a valuable tool, has been implemented as an effective and noninvasive method to analyze proximal humeral fractures, providing solid evidence for the management of difficult cases. However, no review article on the applications and effects of finite element analysis in assessing proximal humeral fractures has been reported yet. This review article summarizes the applications, contributions, and clinical significance of finite element analysis in assessing proximal humeral fractures. Furthermore, the limitations of finite element analysis, the difficulties of achieving more realistic simulation, and the validation and creation of validated FE models are discussed. We conclude that although finite element analysis has advanced research on proximal humeral fractures, using this powerful tool for routine clinical management and adequate simulation requires more state-of-the-art studies to provide supporting evidence and a sound basis.
Translate-and-Revise: Boosting Large Language Models for Constrained Translation
Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of translation and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct their outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show a 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms state-of-the-art neural machine translation (NMT) methods.
Comment: 16 pages
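A minimal sketch of the translate-and-revise loop described above; the prompts, the substring-based constraint check, and the llm callable are illustrative assumptions:

# Sketch: translate, find unmet constraints, and prompt the model to revise.
def constrained_translate(llm, source: str, constraints: dict[str, str],
                          max_rounds: int = 3) -> str:
    cons = "; ".join(f"'{s}' -> '{t}'" for s, t in constraints.items())
    output = llm(f"Translate to English, obeying these term constraints: {cons}\n{source}")
    for _ in range(max_rounds):
        unmet = [t for t in constraints.values() if t not in output]
        if not unmet:
            break  # all constraints satisfied, stop revising
        output = llm(
            f"Source: {source}\nDraft translation: {output}\n"
            f"The draft is missing these required terms: {', '.join(unmet)}. "
            "Revise the translation so that it includes them."
        )
    return output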
Hybrid Alignment Training for Large Language Models
Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed in two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and LLMs cannot be guaranteed to align well with both the instructions and human preferences simultaneously. In response, in this work we propose a Hybrid Alignment Training (Hbat) approach based on alternating alignment and modified elastic weight consolidation methods. The basic idea is to alternate between the different objectives during alignment training, so that better collaboration can be achieved between the two alignment tasks. We experiment with Hbat on summarization and dialogue tasks. Experimental results show that the proposed Hbat can significantly outperform all baselines. Notably, Hbat yields consistent performance gains over traditional two-stage alignment training when using both proximal policy optimization and direct preference optimization.
Comment: Accepted by ACL (Findings) 202
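A minimal sketch of alternating alignment with an elastic-weight-consolidation (EWC) penalty, as the abstract describes it; the losses, Fisher estimates, and alternation schedule are assumptions, not the paper's exact recipe:

import torch

def ewc_penalty(model, ref_params, fisher, lam: float = 0.1):
    # Penalize drift from parameters learned on the other objective,
    # weighted by an importance estimate (e.g., the Fisher information).
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return lam * loss

def hybrid_alignment_step(model, optimizer, batch, step,
                          sft_loss_fn, pref_loss_fn, ref_params, fisher):
    # Alternate objectives: instruction following on even steps,
    # human-preference alignment on odd steps.
    task_loss = sft_loss_fn(model, batch) if step % 2 == 0 else pref_loss_fn(model, batch)
    loss = task_loss + ewc_penalty(model, ref_params, fisher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()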
Values of lymphocyte-related ratios in predicting the clinical outcome of acute ischemic stroke patients receiving intravenous thrombolysis based on different etiologies
Background: While the neutrophil-to-lymphocyte ratio (NLR), lymphocyte-to-monocyte ratio (LMR), and platelet-to-lymphocyte ratio (PLR) have been associated with acute ischemic stroke (AIS) outcomes, their differential predictive value across etiological subtypes (TOAST classification) in thrombolysis-treated patients remains underexplored.
Methods: In this retrospective cohort study, we analyzed 381 AIS patients receiving intravenous thrombolysis. Hematological indices were calculated from pre-thrombolysis blood samples. Using multivariable logistic regression adjusted for age, NIHSS, and comorbidities, we assessed associations between baseline ratios and 90-day unfavorable outcomes (mRS 3–6). Receiver operating characteristic (ROC) analysis was used to determine optimal cutoffs stratified by TOAST subtype.
Results: A total of 381 patients were included in the study. NLR showed superior predictive performance: large-artery atherosclerosis, AUC = 0.702 (aOR = 1.35, 95% CI = 1.14–1.61, p = 0.001); small-artery occlusion, AUC = 0.750 (aOR = 1.51, 95% CI = 1.08–2.10, p = 0.015); cardioembolic stroke, AUC = 0.679 (aOR = 1.82, 95% CI = 1.07–3.10, p = 0.028). LMR showed predictive value only in large-artery atherosclerosis (AUC = 0.632, p = 0.004). Optimal NLR cutoffs were 3.19 (large-artery), 3.94 (small-artery), and 3.17 (cardioembolic stroke).
Conclusion: NLR emerged as a robust, subtype-specific predictor of post-thrombolysis outcomes, particularly in atherosclerotic stroke variants. These findings support NLR's clinical utility for risk stratification in thrombolysis-eligible AIS patients.
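For reference, the three ratios are simple quotients of routine blood counts; a minimal sketch using the subtype-specific NLR cutoffs reported above (function and variable names are mine):

def lymphocyte_ratios(neutrophils: float, lymphocytes: float,
                      monocytes: float, platelets: float) -> dict:
    return {
        "NLR": neutrophils / lymphocytes,  # neutrophil-to-lymphocyte ratio
        "LMR": lymphocytes / monocytes,    # lymphocyte-to-monocyte ratio
        "PLR": platelets / lymphocytes,    # platelet-to-lymphocyte ratio
    }

# Reported optimal NLR cutoffs for an unfavorable 90-day outcome (mRS 3-6):
NLR_CUTOFFS = {"large-artery": 3.19, "small-artery": 3.94, "cardioembolic": 3.17}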
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Large vision-language models (LVLMs) often fail to align with human
preferences, leading to issues like generating misleading content without
proper visual context (also known as hallucination). A promising solution to
this problem is using human-preference alignment techniques, such as best-of-n
sampling and reinforcement learning. However, these techniques face the
difficulty arising from the scarcity of visual preference data, which is
required to train a visual reward model (VRM). In this work, we continue this
line of research. We present a Robust Visual Reward Model (RoVRM), which
improves human-preference alignment for LVLMs. RoVRM leverages auxiliary
textual preference data through a three-phase progressive training and optimal
transport-based preference data selection to effectively mitigate the scarcity
of visual preference data. We experiment with RoVRM on the commonly used
vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental
results demonstrate that RoVRM consistently outperforms traditional VRMs.
Furthermore, our three-phase progressive training and preference data selection
approaches can yield consistent performance gains over ranking-based alignment
techniques, such as direct preference optimization.
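A minimal sketch of what optimal-transport-based preference data selection could look like: score each textual preference example by how much transport mass it sends toward the visual-task embeddings, and keep the best-aligned ones. The embeddings, the entropic Sinkhorn solver, and the scoring rule are assumptions for illustration:

import numpy as np

def sinkhorn(cost: np.ndarray, eps: float = 0.1, iters: int = 200) -> np.ndarray:
    """Entropic-regularized OT between two uniform distributions."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(iters):
        u = (np.ones(n) / n) / (K @ v)
        v = (np.ones(m) / m) / (K.T @ u)
    return u[:, None] * K * v[None, :]  # transport plan

def select_text_preferences(text_emb: np.ndarray, visual_emb: np.ndarray, k: int):
    # Cost matrix: pairwise squared Euclidean distances between embeddings.
    cost = ((text_emb[:, None, :] - visual_emb[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn(cost)
    scores = plan.sum(axis=1)  # mass each text example sends to the visual domain
    return np.argsort(-scores)[:k]  # indices of the k best-aligned examples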
Human tumor necrosis factor (TNF)-alpha-induced protein 8-like 2 suppresses hepatocellular carcinoma metastasis through inhibiting Rac1
Abstract
Background
Tumor invasion and metastasis are the major causes of death in patients with hepatocellular carcinoma (HCC). Therefore, identifying molecules that can suppress tumor invasion and metastasis will provide novel targets for HCC therapies. Tumor necrosis factor (TNF)-alpha-induced protein 8-like 2 (TIPE2) is a novel negative immune regulator and an inhibitor of oncogenic Ras in mice, but its function in humans is unclear. Our previous research has shown that TIPE2 is downregulated in human primary HCC compared with paired adjacent non-tumor tissues.
Results
In the present study, we provide evidence that TIPE2 effectively inhibits human hepatocellular carcinoma metastasis. Forced expression of TIPE2 in HCC-derived cell lines markedly inhibits tumor cell growth, migration, and invasion in vitro and suppresses growth and metastasis of HCC in vivo. Clinical data from a cohort of 112 patients reveal that loss or reduced expression of TIPE2 in primary HCC tissues is significantly associated with tumor metastasis. Mechanistically, TIPE2 inhibits migration and invasion by targeting Rac1, thereby reducing F-actin polymerization and the expression of matrix metallopeptidase 9 (MMP9) and urokinase plasminogen activator (uPA).
Conclusion
Our results indicate that human TIPE2 is an endogenous inhibitor of Rac1 in HCC, through which it attenuates HCC invasion and metastasis. These data suggest that TIPE2 may be a new target for HCC therapy.
