Guiding AMR Parsing with Reverse Graph Linearization
Abstract Meaning Representation (AMR) parsing aims to extract an abstract
semantic graph from a given sentence. The sequence-to-sequence approaches,
which linearize the semantic graph into a sequence of nodes and edges and
generate the linearized graph directly, have achieved good performance.
However, we observed that these approaches suffer from structure loss
accumulation during the decoding process, leading to a much lower F1-score for
nodes and edges decoded later compared to those decoded earlier. To address
this issue, we propose a novel Reverse Graph Linearization (RGL) enhanced
framework. RGL defines both default and reverse linearization orders of an AMR
graph, where most structures at the back part of the default order appear at
the front part of the reversed order and vice versa. RGL incorporates the
reversed linearization to the original AMR parser through a two-pass
self-distillation mechanism, which guides the model when generating the default
linearizations. Our analysis shows that our proposed method significantly
mitigates the problem of structure loss accumulation, outperforming the
previously best AMR parsing model by 0.8 and 0.5 Smatch points on the AMR 2.0
and AMR 3.0 datasets, respectively. The code is available at
https://github.com/pkunlp-icler/AMR_reverse_graph_linearization.
Comment: Findings of EMNLP 2023
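As a minimal illustration of the idea, the following Python sketch linearizes a toy AMR-like graph in a default depth-first order and in a reversed child order, so material emitted late in one order tends to appear early in the other. The graph, token format, and traversal are illustrative assumptions, not the paper's exact RGL procedure or its two-pass self-distillation mechanism.

def linearize(graph, root, reverse=False):
    """Depth-first linearization of an adjacency-list graph into node/edge tokens."""
    tokens, seen = [], set()

    def visit(node):
        seen.add(node)
        tokens.append(node)
        children = graph.get(node, [])
        for edge, child in (reversed(children) if reverse else children):
            tokens.append(edge)
            if child in seen:
                tokens.append(child)  # re-entrant node: emit a reference only
            else:
                visit(child)

    visit(root)
    return tokens

# "The boy wants to go": want-01 with :ARG0 boy and :ARG1 go-02, which
# re-enters boy as its own :ARG0.
amr = {
    "want-01": [(":ARG0", "boy"), (":ARG1", "go-02")],
    "go-02": [(":ARG0", "boy")],
}
print(linearize(amr, "want-01"))                # default order
print(linearize(amr, "want-01", reverse=True))  # reversed order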
On the Pareto Front of Multilingual Neural Machine Translation
In this work, we study how the performance of a given direction changes with
its sampling ratio in Multilingual Neural Machine Translation (MNMT). By
training over 200 multilingual models with various model sizes, data sizes, and
language directions, we find, interestingly, that the performance of a given
translation direction does not always improve as its weight in the multi-task
optimization objective increases. Accordingly, scalarization leads to a
multitask trade-off front that deviates from the traditional Pareto front when
the training corpus is imbalanced, which makes it challenging to improve the
overall performance of all directions. Based on
our observations, we propose the Double Power Law to predict the unique
performance trade-off front in MNMT, which is robust across various languages,
data adequacy, and the number of tasks. Finally, we formulate the sample ratio
selection problem in MNMT as an optimization problem based on the Double Power
Law. In our experiments, it achieves better performance than temperature
searching and gradient manipulation methods with only 1/5 to 1/2 of the total
training budget. We release the code at
https://github.com/pkunlp-icler/ParetoMNMT for reproduction.
Comment: NeurIPS 2023
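To make the modeling idea concrete, the sketch below evaluates one hypothetical double-power-law curve of translation quality against sampling ratio. The functional form and all constants are assumptions for illustration, not the paper's fitted law.

import numpy as np

def double_power_law(p, c1=30.0, a1=0.15, c2=6.0, a2=2.0):
    # quality(p): a slowly saturating gain term minus a growing interference term
    return c1 * p**a1 - c2 * p**a2

ratios = np.linspace(0.05, 0.95, 10)
for p, q in zip(ratios, double_power_law(ratios)):
    print(f"ratio={p:.2f}  predicted quality={q:.2f}")

A curve of this shape flattens and can even decline as the ratio grows, matching the observation that more weight does not always mean better performance; with one fitted law per direction, ratio selection becomes a constrained optimization problem (ratios summing to 1) rather than a grid search.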
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
In this study, we explore the potential of Multimodal Large Language Models
(MLLMs) in improving embodied decision-making processes for agents. While Large
Language Models (LLMs) have been widely used due to their advanced reasoning
skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual
understanding and reasoning capabilities. We investigate whether
state-of-the-art MLLMs can handle embodied decision-making in an end-to-end
manner and whether collaborations between LLMs and MLLMs can enhance
decision-making. To address these questions, we introduce a new benchmark
called PCA-EVAL, which evaluates embodied decision-making from the perspectives
of Perception, Cognition, and Action. Additionally, we propose HOLMES, a
multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs
to gather multimodal information for informed decision-making. We compare
end-to-end embodied decision-making and HOLMES on our benchmark and find that
the GPT4-Vision model demonstrates strong end-to-end embodied decision-making
abilities, outperforming GPT4-HOLMES in terms of average decision accuracy
(+3%). However, this capability is exclusive to the latest GPT4-Vision model,
which surpasses the open-source state-of-the-art MLLM by 26%. Our results indicate
that powerful MLLMs like GPT4-Vision hold promise for decision-making in
embodied agents, offering new avenues for MLLM research. Code and data are
available at https://github.com/pkunlp-icler/PCA-EVAL/.
Comment: FMDM@NeurIPS2023
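A minimal sketch of the kind of scoring the benchmark implies, assuming each instance is judged separately on Perception, Cognition, and Action and models are compared by per-perspective accuracy; the record format and field names here are hypothetical, not PCA-EVAL's actual schema.

from statistics import mean

predictions = [  # one record per benchmark instance (toy data)
    {"perception_ok": True,  "cognition_ok": True,  "action_ok": True},
    {"perception_ok": True,  "cognition_ok": False, "action_ok": False},
    {"perception_ok": False, "cognition_ok": False, "action_ok": True},
]

for key in ("perception_ok", "cognition_ok", "action_ok"):
    acc = mean(1.0 if r[key] else 0.0 for r in predictions)
    print(f"{key}: {acc:.2f}")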
Artificial disc and vertebra system: a novel motion preservation device for cervical spinal disease after vertebral corpectomy
OBJECTIVE: To determine the range of motion and stability of the human cadaveric cervical spine after the implantation of a novel artificial disc and vertebra system by comparing an intact group and a fusion group. METHODS: Biomechanical tests were conducted on 18 human cadaveric cervical specimens. The range of motion and the stability index range of motion were measured to study the function and stability of the artificial disc and vertebra system of the intact group compared with the fusion group. RESULTS: In all cases, the artificial disc and vertebra system maintained intervertebral motion and reestablished vertebral height at the operative level. After its implantation, there was no significant difference in the range of motion (ROM) of C3-7 in any direction in the non-fusion group compared with the intact group (p > 0.05), but significant differences were detected in flexion, extension, and axial rotation compared with the fusion group.
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Artificial Intelligence (AI), along with the recent progress in biomedical
language understanding, is gradually changing medical practice. With the
development of biomedical language understanding benchmarks, AI applications
are widely used in the medical field. However, most benchmarks are limited to
English, which makes it challenging to replicate many of the successes in
English for other languages. To facilitate research in this direction, we
collect real-world biomedical data and present the first Chinese Biomedical
Language Understanding Evaluation (CBLUE) benchmark: a collection of natural
language understanding tasks including named entity recognition, information
extraction, clinical diagnosis normalization, single-sentence/sentence-pair
classification, and an associated online platform for model evaluation,
comparison, and analysis. To establish evaluation on these tasks, we report
empirical results for 11 current pre-trained Chinese models, and the
experimental results show that state-of-the-art neural models still perform
far worse than the human ceiling. Our benchmark is released at
https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us
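As an illustration of one CBLUE task type, the sketch below computes span-level F1 for named entity recognition, where entities are compared as exact (start, end, type) spans; the data format is an assumption for illustration, not CBLUE's official scorer.

def span_f1(gold, pred):
    """Exact-match F1 over (start, end, type) entity spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 3, "disease"), (10, 14, "drug")]
pred = [(0, 3, "disease"), (10, 14, "symptom")]  # second span mistyped
print(f"span F1: {span_f1(gold, pred):.2f}")     # 0.50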
Clustering of Chinese Sentences Using the SMM Model
This article investigates a clustering method based on a statistical model and applies it to the problem of clustering Chinese sentences on a bilingual lexicographical platform. Viewing the data as co-occurrence data, we develop the Sentence Cluster Model as a multidimensional SMM and derive its parameter estimates with the EM algorithm. Based on this model, we present three methods for sentence clustering and use the Rand index to evaluate them in experiments on a corpus, with comparison to the k-means algorithm. We mainly discuss the results with respect to word-sense distinction, part-of-speech distinction, and window-size choice.
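In the same spirit, here is a minimal EM loop for a mixture-of-multinomials sentence clusterer over bag-of-words counts; the exact SMM parameterization used in the article may differ, so this is a sketch of the general technique rather than the paper's model.

import numpy as np

def em_cluster(counts, k, iters=50, seed=0, eps=1e-9):
    """counts: (n_sentences, vocab_size) word-count matrix; returns cluster ids."""
    rng = np.random.default_rng(seed)
    n, v = counts.shape
    pi = np.full(k, 1.0 / k)                   # cluster priors
    theta = rng.dirichlet(np.ones(v), size=k)  # per-cluster word distributions
    for _ in range(iters):
        # E-step: responsibilities from multinomial log-likelihoods
        log_r = np.log(pi + eps) + counts @ np.log(theta + eps).T  # shape (n, k)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors and smoothed word distributions
        pi = r.mean(axis=0)
        theta = r.T @ counts + eps
        theta /= theta.sum(axis=1, keepdims=True)
    return r.argmax(axis=1)

# Toy corpus: two clearly separated groups over a 4-word vocabulary.
X = np.array([[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 6, 1], [0, 0, 5, 2]])
print(em_cluster(X, k=2))  # e.g. [0 0 1 1]; cluster labels are arbitrary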
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
While multilingual neural machine translation has achieved great success, it
suffers from the off-target issue, where the translation is in the wrong
language. This problem is more pronounced on zero-shot translation tasks. In
this work, we find that a failure to encode a discriminative target-language
signal leads to off-target translation, and that a closer lexical distance
(i.e., a smaller KL-divergence) between two languages' vocabularies is
associated with a higher off-target rate. We also find that simply isolating
the vocabularies of different languages in the decoder can alleviate the
problem. Motivated by these findings, we propose Language Aware Vocabulary
Sharing (LAVS), a simple and effective algorithm for constructing the
multilingual vocabulary that greatly alleviates the off-target problem of the
translation model by increasing the KL-divergence
between languages. We conduct experiments on a multilingual machine translation
benchmark in 11 languages. Experiments show that the off-target rate for 90
translation tasks is reduced from 29% to 8%, while the overall BLEU score is
improved by an average of 1.9 points without extra training cost or sacrificing
performance on the supervised directions. We release the code at
https://github.com/chenllliang/Off-Target-MNMT for reproduction.
Comment: Findings of ACL 2023
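As a rough illustration of the quantity LAVS manipulates, the sketch below estimates the KL divergence between two languages' token-frequency distributions over a shared vocabulary; the tokenization, corpora, and vocabulary here are toy assumptions, not the paper's construction algorithm.

import math
from collections import Counter

def token_dist(corpus, vocab):
    """Relative token frequencies over a fixed shared vocabulary."""
    counts = Counter(tok for sent in corpus for tok in sent if tok in vocab)
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in vocab}

def kl_divergence(p, q, eps=1e-12):
    return sum(pv * math.log((pv + eps) / (q[t] + eps))
               for t, pv in p.items() if pv > 0)

vocab = {"the", "de", "la", "cat", "chat"}
en = [["the", "cat"], ["the", "the", "cat"]]
fr = [["la", "chat"], ["de", "la", "chat"]]
p, q = token_dist(en, vocab), token_dist(fr, vocab)
print(f"KL(en || fr) = {kl_divergence(p, q):.3f}")

Per the paper's finding, a larger KL-divergence between language-specific vocabulary distributions correlates with a lower off-target rate, which is what LAVS exploits when deciding how tokens are shared.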