Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs
Attribution in question answering means providing citations that support
generated statements, and it has attracted wide research attention. Current
methods for automatically evaluating attribution, which are often based on
Large Language Models (LLMs), remain inadequate, particularly in recognizing
subtle differences between attributions and complex relationships between
citations and statements. To compare these attribution evaluation methods and
develop new ones, we introduce a set of fine-grained categories (i.e.,
supportive, insufficient, contradictory and irrelevant) for measuring the
attribution, and develop a Complex Attributed Question Answering (CAQA)
benchmark by leveraging knowledge graphs (KGs) for automatically generating
attributions of different categories to question-answer pairs. Our analysis
reveals that existing evaluators perform poorly under fine-grained attribution
settings and exhibit weaknesses in complex citation-statement reasoning. Our
CAQA benchmark, validated with human annotations, emerges as a promising tool
for selecting and developing LLM attribution evaluators.
Comment: 13 pages, 5 figures
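To make the four-way attribution categories concrete, here is a minimal sketch of an LLM-based evaluator that classifies a citation-statement pair into one of them; the `complete` callable and the prompt wording are assumptions for illustration, not part of the CAQA benchmark itself.

```python
# Minimal sketch of a fine-grained attribution evaluator in the spirit of the
# categories above. `complete` is a placeholder for any LLM completion API;
# it is an assumption, not the CAQA benchmark's own interface.
CATEGORIES = ("supportive", "insufficient", "contradictory", "irrelevant")

PROMPT = """You are judging whether a citation supports a statement.
Statement: {statement}
Citation: {citation}
Answer with exactly one word from: supportive, insufficient, contradictory, irrelevant."""

def classify_attribution(statement: str, citation: str, complete) -> str:
    """Return one of the four fine-grained attribution categories."""
    reply = complete(PROMPT.format(statement=statement, citation=citation))
    label = reply.strip().lower()
    return label if label in CATEGORIES else "irrelevant"  # fallback on malformed output
```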
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Open-vocabulary semantic segmentation is a challenging task, which requires
the model to output semantic masks of an image beyond a close-set vocabulary.
Although many efforts have been made to utilize powerful CLIP models to
accomplish this task, they still tend to overfit to the training classes due
to the natural gaps in semantic information between training and new classes.
To overcome this challenge, we propose a novel framework for open-vocabulary
semantic segmentation called EBSeg, incorporating an Adaptively Balanced
Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss).
The AdaB Decoder is designed to generate different image embeddings for both
training and new classes. Subsequently, these two types of embeddings are
adaptively balanced to fully exploit their ability to recognize training
classes and generalization ability for new classes. To learn a consistent
semantic structure from CLIP, the SSC Loss aligns the inter-classes affinity in
the image feature space with that in the text feature space of CLIP, thereby
improving the generalization ability of our model. Furthermore, we employ a
frozen SAM image encoder to complement the spatial information that CLIP
features lack due to the low training image resolution and image-level
supervision inherent in CLIP. Extensive experiments conducted across various
benchmarks demonstrate that the proposed EBSeg outperforms the state-of-the-art
methods. Our code and trained models will be available at:
https://github.com/slonetime/EBSeg.
Comment: CVPR202
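The SSC Loss described above aligns inter-class affinities between the image embedding space and the CLIP text embedding space. A minimal sketch of that idea, assuming per-class embeddings and an L1 penalty (the paper's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

def ssc_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Hypothetical semantic-structure consistency sketch: match the
    inter-class affinity (cosine similarity) computed from per-class image
    embeddings with the affinity computed from CLIP text embeddings.
    Both inputs have shape (num_classes, dim)."""
    img = F.normalize(image_embeds, dim=-1)
    txt = F.normalize(text_embeds, dim=-1)
    affinity_img = img @ img.t()   # (C, C) inter-class affinity in image space
    affinity_txt = txt @ txt.t()   # (C, C) inter-class affinity in text space
    return F.l1_loss(affinity_img, affinity_txt)
```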
Learn from Yesterday: A Semi-Supervised Continual Learning Method for Supervision-Limited Text-to-SQL Task Streams
Conventional text-to-SQL studies are limited to a single task with a
fixed-size training and test set. When confronted with a stream of tasks common
in real-world applications, existing methods struggle with the problems of
insufficient supervised data and high retraining costs. The former tends to
cause overfitting on unseen databases for the new task, while the latter makes
a full review of instances from past tasks impractical for the model, resulting
in forgetting of learned SQL structures and database schemas. To address the
problems, this paper proposes integrating semi-supervised learning (SSL) and
continual learning (CL) in a stream of text-to-SQL tasks and offers two
promising solutions in turn. The first solution, Vanilla, performs
self-training, augmenting the supervised training data with predicted
pseudo-labeled instances of the current task, while replacing full-volume
retraining with episodic memory replay to balance training efficiency against
performance on previous tasks. The improved solution, SFNet, takes advantage
of the intrinsic connection between CL and SSL. It uses in-memory past
information to help current SSL, while adding high-quality pseudo instances in
memory to improve future replay. Experiments on two datasets show that
SFNet outperforms the widely used SSL-only and CL-only baselines on multiple
metrics.
Comment: Accepted by AAAI-202
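The Vanilla recipe combines self-training on the current task with episodic memory replay of earlier tasks. A schematic sketch under assumed interfaces (model.fit, model.predict_with_score, and task.labeled/unlabeled are hypothetical, not the paper's API):

```python
import random

def train_on_task_stream(model, tasks, memory_size=256, confidence=0.9):
    """Illustrative sketch of self-training plus episodic memory replay over a
    stream of text-to-SQL tasks. All helper methods are assumptions."""
    memory = []                                    # episodic memory of past instances
    for task in tasks:
        pseudo = []
        for question in task.unlabeled:
            sql, score = model.predict_with_score(question)
            if score >= confidence:                # keep only confident pseudo-labels
                pseudo.append((question, sql))
        replay = random.sample(memory, min(len(memory), memory_size))
        model.fit(task.labeled + pseudo + replay)  # joint training instead of full retraining
        memory.extend(task.labeled + pseudo)       # store instances for future replay
    return model
```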
Real-time Local Feature with Global Visual Information Enhancement
Local features provide compact and invariant image representations for various
visual tasks. Current deep learning-based local feature algorithms typically
use convolutional neural network (CNN) architectures with limited receptive
fields. Moreover, even with high-performance GPU devices, the computational
efficiency of local features is often unsatisfactory. In this paper, we tackle
these problems by proposing a CNN-based local feature algorithm. The proposed
method introduces a global enhancement module to fuse global visual clues into
a lightweight network, and then optimizes the network with a novel deep
reinforcement learning scheme designed from the perspective of the local
feature matching task. Experiments on public benchmarks demonstrate that the
proposed method achieves considerable robustness against visual interference
while running in real time.
Comment: 6 pages, 5 figures, 2 tables. Accepted by ICIEA 202
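One plausible reading of a global enhancement module is to pool a global descriptor from the feature map and inject it back at every spatial location. A hypothetical PyTorch sketch of that idea (the actual module in the paper may differ):

```python
import torch
import torch.nn as nn

class GlobalEnhancement(nn.Module):
    """Hypothetical sketch: pool the local feature map into a global
    descriptor, project it, and add it back to every spatial location."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:  # feats: (B, C, H, W)
        g = feats.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        g = self.proj(g)[:, :, None, None]          # project and broadcast over H, W
        return feats + g                            # globally enhanced local features
```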
Study on the biomechanical properties of 3D printed blended esophageal stents with different structural parameters based on patient CT
Introduction: Esophageal stenting is a widely used treatment for esophageal diseases, and it can also be used for adjuvant therapy and feeding after chemotherapy for esophageal cancer. The structural parameters of a stent have a significant impact on its mechanical properties and on patient comfort.
Methods: In the present work, we reconstructed the esophagus model based on the patient's computed tomography (CT) data and designed stents with different structural parameters. We used 3D printing technology to rapidly produce the designed stents from thermoplastic polyurethane (TPU)/poly-ε-caprolactone (PCL) blends. The mechanical properties of polymer stents with four different structural parameters (diameter, wall thickness, length, and flaring), and their effects on the esophagus, were investigated by in vitro radial compression and migration tests of the stents, as well as by finite element simulations of the stent implantation and migration processes in the esophagus. An artificial neural network model was established to predict the radial force of the stent and the maximum equivalent stress of the esophagus during implantation from these four structural parameters.
Results: The results show that wall thickness was the structural parameter with the greatest impact on the radial force of the stent (statistically significant, p < 0.01), and flaring was the structural parameter with the greatest impact on the maximum equivalent stress of the esophageal wall after stent implantation (statistically significant, p < 0.01). The No. 6 stent had a maximum radial force of 18.07 N, which exceeded that of commercial esophageal stents and indicated good mechanical properties, and the maximum equivalent stress on the esophagus caused by its implantation was only 30.39 kPa, which can improve patient comfort. The predictions of the constructed back propagation (BP) neural network model deviated from the true values by less than 10%, and the overall prediction accuracies were both above 97%, which can provide guidance for optimizing stent design and for clinical research.
Discussion: 3D printing technology presents a wide range of applications for the rapid fabrication of personalized TPU/PCL blend stents that are better suited to individual patients.
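The BP neural network maps the four structural parameters to the stent's radial force and the esophagus's maximum equivalent stress. A toy scikit-learn sketch, using random placeholder data instead of the study's simulated samples and an assumed layer configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical surrogate in the spirit of the BP network above: four structural
# parameters in, radial force and maximum equivalent stress out. The random
# data below only stands in for the study's simulated/measured samples.
rng = np.random.default_rng(0)
X = rng.uniform([16, 1.0, 60, 0], [24, 2.5, 120, 10], size=(40, 4))  # diameter (mm), thickness (mm), length (mm), flaring (mm)
y = rng.uniform([5, 10], [20, 40], size=(40, 2))                     # radial force (N), max stress (kPa)

surrogate = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0))
surrogate.fit(X, y)
print(surrogate.predict([[20, 1.5, 90, 5]]))  # predicted [radial force, max stress] for one design
```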
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Recent advancements in generative Large Language Models (LLMs) have been
remarkable; however, the quality of the text generated by these models often
reveals persistent issues. Evaluating the quality of text generated by these
models, especially open-ended text, has consistently presented a significant
challenge. Addressing this, recent work has explored using LLMs as evaluators.
While using a single LLM as an evaluation agent shows potential, it suffers
from significant uncertainty and instability. To address these issues, we
propose MATEval, a Multi-Agent Text Evaluation framework in which all agents
are played by LLMs such as GPT-4. The MATEval
framework emulates human collaborative discussion methods, integrating multiple
agents' interactions to evaluate open-ended text. Our framework incorporates
self-reflection and Chain-of-Thought (CoT) strategies, along with feedback
mechanisms, enhancing the depth and breadth of the evaluation process and
guiding discussions towards consensus; the framework then generates
comprehensive evaluation reports, including error localization, error types,
and scores. Experimental results show that our framework outperforms existing
open-ended text evaluation methods and achieves the highest correlation with
human evaluation, which confirms the effectiveness and advancement of our
framework in addressing the uncertainties and instabilities in evaluating
LLM-generated text. Furthermore, our framework significantly improves the
efficiency of text evaluation and model iteration in industrial scenarios.
Comment: This paper has been accepted as a long paper presentation by DASFAA 2024 Industrial Track
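A rough sketch of a multi-agent discussion loop in the spirit of MATEval: several LLM-backed agents take turns critiquing the text and each other's comments before a final report is summarized. The agent interface below is an assumption, not the framework's actual API.

```python
def mateval_discussion(text: str, agents, rounds: int = 3) -> str:
    """Toy multi-agent evaluation loop. `agents` maps a name to a callable
    wrapping an LLM; the prompts and roles here are assumptions."""
    transcript = []
    for r in range(rounds):
        for name, agent in agents.items():
            prompt = (f"Round {r + 1}. Evaluate the text below, reflect on the "
                      f"discussion so far, and point out errors with their type "
                      f"and location.\nText:\n{text}\nDiscussion so far:\n" +
                      "\n".join(transcript))
            transcript.append(f"{name}: {agent(prompt)}")
    summarizer = agents[next(iter(agents))]          # reuse the first agent as summarizer
    return summarizer("Write a consensus evaluation report with error locations, "
                      "error types and a score, based on:\n" + "\n".join(transcript))
```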
MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model
In the realm of data-driven AI technology, the application of open-source
large language models (LLMs) in robotic task planning represents a significant
milestone. Recent robotic task planning methods based on open-source LLMs
typically leverage vast task planning datasets to enhance models' planning
abilities. While these methods show promise, they struggle with complex
long-horizon tasks, which require comprehending more context and generating
longer action sequences. This paper addresses this limitation by proposing
MLDT, the Multi-Level Decomposition Task planning method. This method
innovatively decomposes tasks at the goal level, task level, and action level
to mitigate the challenge of complex long-horizon tasks. In order to enhance
open-source LLMs' planning abilities, we introduce a goal-sensitive corpus
generation method to create high-quality training data and conduct instruction
tuning on the generated corpus. Since the complexity of the existing datasets
is not high enough, we construct a more challenging dataset, LongTasks, to
specifically evaluate planning ability on complex long-horizon tasks. We
evaluate our method using various LLMs on four datasets in VirtualHome. Our
results demonstrate a significant performance enhancement in robotic task
planning, showcasing MLDT's effectiveness in overcoming the limitations of
existing methods based on open-source LLMs as well as its practicality in
complex, real-world scenarios.
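The multi-level decomposition can be pictured as chained LLM calls: the goal is split into sub-tasks and each sub-task into primitive actions. A schematic sketch, with `complete` standing in for any instruction-tuned open-source LLM and the prompt wording being an assumption:

```python
def mldt_plan(goal: str, complete) -> list[str]:
    """Schematic goal -> sub-task -> action decomposition via separate LLM calls."""
    tasks = complete("Decompose the goal into an ordered list of sub-tasks, "
                     f"one per line:\n{goal}").splitlines()
    plan = []
    for task in filter(None, (t.strip() for t in tasks)):
        actions = complete("Decompose the sub-task into primitive robot actions, "
                           f"one per line:\n{task}").splitlines()
        plan.extend(a.strip() for a in actions if a.strip())
    return plan
```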
Geochemical characteristics of dissolved heavy metals in Zhujiang River, Southwest China: spatial-temporal distribution, source, export flux estimation, and a water quality assessment
To investigate the sources and spatial-temporal distribution of dissolved heavy metals in river water, and to evaluate water quality, a total of 162 water samples were collected from 81 key sampling points in the high and low flow seasons separately in the Zhujiang River, Southwest China. Ten dissolved heavy metals (V, Cr, Mn, Co, Ni, Cu, Mo, Cd, Ba, and Pb) in the Zhujiang River water exhibit little variation at the temporal scale but show significant spatial heterogeneity. Furthermore, different metals present different variation trends along the main channel of the Zhujiang River. Our results suggest that Ba (14.72 μg L−1 in the low flow season and 12.50 μg L−1 in the high flow season) and Cr (6.85 μg L−1 in the low flow season and 7.52 μg L−1 in the high flow season) are consistently the most abundant metals in the two sampling periods. According to the water quality index (WQI values ranged from 1.3 to 43.9) and a health risk assessment, the metals investigated in the Zhujiang River are below the hazard level (all hazard index (HI) values < 1). Application of statistical approaches, including a correlation matrix and principal component analysis (PCA), identifies three principal components that account for 61.74% of the total variance; the results indicate that the anthropogenic heavy metals (V, Cr, Ni, and Cu) are greatly affected by the dilution effect, and that the heavy metals in the Zhujiang River mainly present a natural-source signature from the perspective of the entire basin. Moreover, our results reveal that the estimated export budgets of several heavy metals to the ocean, including V (735.6 t year−1), Cr (1,561.1 t year−1), Ni (498.2 t year−1), and Mo (118.9 t year−1), are higher than the world averages.
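For context on the export flux figures above, a common back-of-the-envelope estimate multiplies the mean dissolved concentration by the annual water discharge; the sketch below uses an illustrative discharge value, not the gauged Zhujiang River discharge, and the paper's own flux method may differ.

```python
def export_flux_t_per_year(conc_ug_per_L: float, discharge_m3_per_year: float) -> float:
    """Simple flux estimate: concentration times annual discharge.
    1 ug/L equals 1 mg/m^3, so the product (mg/yr) is scaled by 1e-9 to t/yr."""
    return conc_ug_per_L * discharge_m3_per_year * 1e-9

# Example: a metal at 6.85 ug/L in a river carrying 2.0e11 m^3/yr (placeholder discharge)
print(export_flux_t_per_year(6.85, 2.0e11))  # -> 1370.0 t/yr
```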