
    Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs

    Attribution in question answering provides citations that support generated statements and has attracted wide research attention. Current methods for automatically evaluating attribution, which are often based on Large Language Models (LLMs), remain inadequate, particularly in recognizing subtle differences between attributions and complex relationships between citations and statements. To compare these attribution evaluation methods and develop new ones, we introduce a set of fine-grained categories (i.e., supportive, insufficient, contradictory, and irrelevant) for measuring attribution, and develop a Complex Attributed Question Answering (CAQA) benchmark that leverages knowledge graphs (KGs) to automatically generate attributions of different categories for question-answer pairs. Our analysis reveals that existing evaluators perform poorly under fine-grained attribution settings and exhibit weaknesses in complex citation-statement reasoning. Our CAQA benchmark, validated with human annotations, emerges as a promising tool for selecting and developing LLM attribution evaluators. Comment: 13 pages, 5 figures
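    A minimal sketch of how an LLM-based evaluator might assign the four fine-grained attribution categories named in the abstract to a citation-statement pair; the prompt wording and the call_llm helper are assumptions for illustration, not the benchmark's actual interface.

        # Hypothetical LLM-based attribution evaluator using the four
        # fine-grained categories defined in the CAQA benchmark.
        CATEGORIES = ["supportive", "insufficient", "contradictory", "irrelevant"]

        def classify_attribution(statement: str, citation: str, call_llm) -> str:
            """Ask an LLM to label how well `citation` supports `statement`."""
            prompt = (
                "Given a statement and a citation, label their relationship as one of: "
                + ", ".join(CATEGORIES) + ".\n"
                f"Statement: {statement}\nCitation: {citation}\nLabel:"
            )
            answer = call_llm(prompt).strip().lower()
            # Fall back to 'irrelevant' if the model's answer is not a known category.
            return answer if answer in CATEGORIES else "irrelevant"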

    Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

    Open-vocabulary semantic segmentation is a challenging task that requires the model to output semantic masks of an image beyond a closed-set vocabulary. Although many efforts have been made to exploit powerful CLIP models for this task, they remain prone to overfitting to the training classes due to the natural gap in semantic information between training and new classes. To overcome this challenge, we propose a novel framework for open-vocabulary semantic segmentation called EBSeg, incorporating an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss). The AdaB Decoder is designed to generate different image embeddings for both training and new classes. These two types of embeddings are then adaptively balanced to fully exploit their ability to recognize training classes and their generalization ability for new classes. To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-class affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model. Furthermore, we employ a frozen SAM image encoder to complement the spatial information that CLIP features lack due to the low training image resolution and image-level supervision inherent in CLIP. Extensive experiments across various benchmarks demonstrate that the proposed EBSeg outperforms state-of-the-art methods. Our code and trained models are available at: https://github.com/slonetime/EBSeg. Comment: CVPR202
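    A rough sketch of one way the described SSC Loss could be realized: build inter-class affinity (cosine similarity) matrices from the image-side class embeddings and from CLIP's text embeddings, then penalize their mismatch. The tensor shapes and the mean-squared-error choice are assumptions for illustration, not necessarily the paper's exact formulation.

        import torch
        import torch.nn.functional as F

        def ssc_loss(image_class_embeds: torch.Tensor,
                     text_class_embeds: torch.Tensor) -> torch.Tensor:
            """Align inter-class affinity of image embeddings with that of CLIP text embeddings.

            Both inputs are assumed to have shape (num_classes, dim).
            """
            img = F.normalize(image_class_embeds, dim=-1)
            txt = F.normalize(text_class_embeds, dim=-1)
            affinity_img = img @ img.t()   # (C, C) cosine affinities in image space
            affinity_txt = txt @ txt.t()   # (C, C) cosine affinities in text space
            return F.mse_loss(affinity_img, affinity_txt)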

    Learn from Yesterday: A Semi-Supervised Continual Learning Method for Supervision-Limited Text-to-SQL Task Streams

    Conventional text-to-SQL studies are limited to a single task with fixed-size training and test sets. When confronted with a stream of tasks, as is common in real-world applications, existing methods struggle with insufficient supervised data and high retraining costs. The former tends to cause overfitting on unseen databases for the new task, while the latter makes a full review of instances from past tasks impractical, resulting in forgetting of learned SQL structures and database schemas. To address these problems, this paper proposes integrating semi-supervised learning (SSL) and continual learning (CL) in a stream of text-to-SQL tasks and offers two solutions in turn. The first solution, Vanilla, performs self-training, augmenting the supervised training data with predicted pseudo-labeled instances of the current task, while replacing full-volume retraining with episodic memory replay to balance training efficiency against performance on previous tasks. The improved solution, SFNet, takes advantage of the intrinsic connection between CL and SSL: it uses past information held in memory to help current SSL, while adding high-quality pseudo instances to memory to improve future replay. Experiments on two datasets show that SFNet outperforms the widely used SSL-only and CL-only baselines on multiple metrics. Comment: Accepted by AAAI-202
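    A condensed sketch of the self-training-plus-replay recipe the Vanilla solution describes; the model interface, confidence threshold, and memory size are placeholder assumptions, not the paper's actual components or hyperparameters.

        import random

        def train_on_task_stream(model, tasks, memory_size=100, conf_threshold=0.9):
            """Self-training with episodic memory replay over a stream of text-to-SQL tasks."""
            memory = []  # episodic memory of labeled instances from past tasks
            for task in tasks:
                # 1) Pseudo-label unlabeled questions of the current task.
                pseudo = []
                for question in task.unlabeled:
                    sql, conf = model.predict_with_confidence(question)  # hypothetical API
                    if conf >= conf_threshold:
                        pseudo.append((question, sql))
                # 2) Train on labeled + pseudo-labeled data plus a replayed memory sample.
                replay = random.sample(memory, min(len(memory), memory_size))
                model.fit(task.labeled + pseudo + replay)
                # 3) Store a few instances of this task for future replay.
                memory.extend(random.sample(task.labeled, min(len(task.labeled), memory_size)))
            return model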

    Real-time Local Feature with Global Visual Information Enhancement

    Local features provide compact and invariant image representations for various visual tasks. Current deep learning-based local feature algorithms typically rely on convolutional neural network (CNN) architectures with limited receptive fields. Moreover, even with high-performance GPUs, their computational efficiency is often unsatisfactory. In this paper, we tackle these problems by proposing a CNN-based local feature algorithm. The proposed method introduces a global enhancement module to fuse global visual clues into a lightweight network, and then optimizes the network with a novel deep reinforcement learning scheme from the perspective of the local feature matching task. Experiments on public benchmarks demonstrate that the proposed method achieves considerable robustness against visual interference while running in real time. Comment: 6 pages, 5 figures, 2 tables. Accepted by ICIEA 202
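    The abstract does not detail the global enhancement module; one plausible reading is a squeeze-and-excitation-style block that pools a global descriptor and uses it to re-weight local feature channels. The sketch below is an illustrative assumption, not the paper's actual architecture.

        import torch
        import torch.nn as nn

        class GlobalEnhancement(nn.Module):
            """Fuse a globally pooled descriptor back into the local feature map."""
            def __init__(self, channels: int, reduction: int = 4):
                super().__init__()
                self.gate = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),                        # global visual clue
                    nn.Conv2d(channels, channels // reduction, 1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1),
                    nn.Sigmoid(),
                )

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return x * self.gate(x)  # channel-wise re-weighting by global context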

    Study on the biomechanical properties of 3D printed blended esophageal stents with different structural parameters based on patient CT

    Introduction: Esophageal stenting is a widely used treatment for esophageal diseases; it can also be used for adjuvant therapy and feeding after chemotherapy for esophageal cancer. The structural parameters of a stent have a significant impact on its mechanical properties and on patient comfort.
    Methods: In the present work, we reconstructed an esophagus model from the patient's computed tomography (CT) data and designed stents with different structural parameters. We used 3D printing to rapidly produce the designed stents from thermoplastic polyurethane (TPU)/poly-ε-caprolactone (PCL) blends. The mechanical properties of polymer stents with four different structural parameters (diameter, wall thickness, length, and flaring) and their effects on the esophagus were investigated by in vitro radial compression and migration tests, as well as by finite element simulations of stent implantation into the esophagus and of stent migration. An artificial neural network model was established to predict the radial force of the stent and the maximum equivalent stress of the esophagus during implantation from these four structural parameters.
    Results: Wall thickness was the structural parameter with the greatest impact on the radial force of the stent (statistically significant, p < 0.01), and flaring was the structural parameter with the greatest impact on the maximum equivalent stress of the esophageal wall after stent implantation (statistically significant, p < 0.01). Stent No. 6 had a maximum radial force of 18.07 N, which exceeded that of commercial esophageal stents, indicating good mechanical properties, and the maximum equivalent stress on the esophagus caused by its implantation was only 30.39 kPa, which can improve patient comfort. The predictions of the constructed back-propagation (BP) neural network model deviated from the true values by less than 10%, and the overall prediction accuracies were both above 97%, which can provide guidance for optimizing stent design and for clinical research.
    Discussion: 3D printing technology offers a wide range of applications for the rapid fabrication of personalized TPU/PCL blend stents that are better suited to individual patients.
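    A minimal sketch of the kind of back-propagation network described: a small multilayer perceptron mapping the four structural parameters to the two predicted quantities. The layer sizes, the placeholder data, and the use of scikit-learn are illustrative assumptions, not the authors' setup.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # Inputs: diameter (mm), wall thickness (mm), length (mm), flaring (mm) - placeholder designs.
        X = np.array([[20.0, 1.0, 80.0, 2.0],
                      [22.0, 1.5, 90.0, 3.0],
                      [18.0, 2.0, 70.0, 4.0]])
        # Targets: radial force (N), max equivalent stress on the esophagus (kPa) - placeholder values.
        y = np.array([[12.5, 25.0],
                      [15.8, 28.0],
                      [18.1, 30.4]])

        model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
        model.fit(X, y)
        print(model.predict([[21.0, 1.8, 85.0, 3.5]]))  # predicted [radial force, stress]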

    MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

    Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of generated text, especially open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored using LLMs as evaluators. While a single LLM used as an evaluation agent shows potential, it suffers from significant uncertainty and instability. To address these issues, we propose MATEval, a multi-agent text evaluation framework in which all agents are played by LLMs such as GPT-4. MATEval emulates human collaborative discussion, integrating multiple agents' interactions to evaluate open-ended text. Our framework incorporates self-reflection and Chain-of-Thought (CoT) strategies, along with feedback mechanisms, enhancing the depth and breadth of the evaluation process and guiding discussions towards consensus, while generating comprehensive evaluation reports, including error localization, error types, and scoring. Experimental results show that our framework outperforms existing open-ended text evaluation methods and achieves the highest correlation with human evaluation, confirming its effectiveness in addressing the uncertainties and instabilities of evaluating LLM-generated text. Furthermore, our framework significantly improves the efficiency of text evaluation and model iteration in industrial scenarios. Comment: This paper has been accepted as a long paper presentation by DASFAA 2024 Industrial Track
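    A very rough sketch of a multi-agent discussion loop of the kind described: evaluator agents comment on the text in turn, see the shared transcript, and a final summarizer produces a report. The agent roles, round count, and call_llm helper are assumptions for illustration, not MATEval's actual design.

        def multi_agent_evaluate(text, call_llm,
                                 agent_roles=("fluency critic", "factuality critic"),
                                 rounds=2):
            """Agents discuss the text over several rounds, then a summarizer writes a report."""
            transcript = []
            for _ in range(rounds):
                for role in agent_roles:
                    prompt = (
                        f"You are a {role}. Think step by step, reflect on earlier comments, "
                        "and point out remaining errors in the text.\n"
                        f"Text: {text}\nDiscussion so far:\n" + "\n".join(transcript)
                    )
                    transcript.append(f"{role}: {call_llm(prompt)}")
            report_prompt = (
                "Summarize the discussion into an evaluation report with error locations, "
                "error types, and a score from 1 to 10.\n" + "\n".join(transcript)
            )
            return call_llm(report_prompt)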

    MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

    In the realm of data-driven AI technology, the application of open-source large language models (LLMs) to robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, the Multi-Level Decomposition Task planning method. This method decomposes tasks at the goal level, task level, and action level to mitigate the challenge of complex long-horizon tasks. To enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method to create high-quality training data and conduct instruction tuning on the generated corpus. Since existing datasets are not sufficiently complex, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method with various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance enhancement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs, as well as its practicality in complex, real-world scenarios.
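    A skeletal sketch of goal/task/action decomposition in the spirit described: the LLM is prompted once per level, with each level's output feeding the next. The prompt texts and the call_llm helper are illustrative assumptions, not MLDT's actual prompts.

        def plan_multi_level(goal, call_llm):
            """Decompose a high-level goal into sub-tasks, then each sub-task into actions."""
            sub_tasks = call_llm(
                f"Decompose the goal '{goal}' into an ordered list of sub-tasks, one per line."
            ).splitlines()
            plan = []
            for task in sub_tasks:
                actions = call_llm(
                    f"For the sub-task '{task}', list the primitive robot actions, one per line."
                ).splitlines()
                plan.append({"task": task, "actions": actions})
            return plan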

    Geochemical characteristics of dissolved heavy metals in Zhujiang River, Southwest China: spatial-temporal distribution, source, export flux estimation, and a water quality assessment

    To investigate the sources and spatial-temporal distribution of dissolved heavy metals in river water and to evaluate water quality, a total of 162 water samples were collected from 81 key sampling points in the high and low flow seasons in the Zhujiang River, Southwest China. Ten dissolved heavy metals (V, Cr, Mn, Co, Ni, Cu, Mo, Cd, Ba, and Pb) in the Zhujiang River water exhibit little temporal variation but show significant spatial heterogeneity. Furthermore, different metals present different variation trends along the main channel of the Zhujiang River. Our results suggest that Ba (14.72 μg L−1 in the low flow season and 12.50 μg L−1 in the high flow season) and Cr (6.85 μg L−1 in the low flow season and 7.52 μg L−1 in the high flow season) are consistently the most abundant metals in the two sampling periods. According to the water quality index (WQI values ranged from 1.3 to 43.9) and the health risk assessment, the metals investigated in the Zhujiang River are below the hazard level (all hazard index (HI) values < 1). Statistical approaches, including a correlation matrix and principal component analysis (PCA), identify three principal components that account for 61.74% of the total variance. The results indicate that the anthropogenic heavy metals (V, Cr, Ni, and Cu) are greatly affected by the dilution effect, and that the heavy metals in the Zhujiang River mainly present a natural-source signature from the perspective of the entire basin. Moreover, our results reveal that the estimated export budgets of several heavy metals, including V (735.6 t year−1), Cr (1,561.1 t year−1), Ni (498.2 t year−1), and Mo (118.9 t year−1), to the ocean are higher than the world averages.
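    A small sketch of the kind of PCA step described, applied to a samples-by-metals concentration matrix to see how much variance the leading three components explain. The data here is a random placeholder, and standardizing before PCA is an assumption, not necessarily the authors' preprocessing.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        metals = ["V", "Cr", "Mn", "Co", "Ni", "Cu", "Mo", "Cd", "Ba", "Pb"]
        rng = np.random.default_rng(0)
        concentrations = rng.lognormal(size=(162, len(metals)))  # placeholder for 162 samples

        X = StandardScaler().fit_transform(concentrations)       # standardize each metal
        pca = PCA(n_components=3).fit(X)
        print(pca.explained_variance_ratio_.sum())               # fraction of variance explained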