42 research outputs found

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

    Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs at utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles must be overcome to leverage tools? To address these questions, we introduce API-Bank, a benchmark specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant room for further improvement. Moreover, Lynx surpasses Alpaca's tool-utilization performance by more than 26 points and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field, answering the third question.
    Comment: EMNLP 202
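
The call-level part of such an evaluation can be illustrated with a minimal sketch. The tool names (`AddNumbers`, `ToUpper`), the `ApiCall` type, and the scoring rule (a predicted call counts as correct if executing it returns the same result as the gold call) are hypothetical choices for illustration, not API-Bank's actual tools or its metric definition.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ApiCall:
    """One API invocation: a tool name plus keyword arguments."""
    name: str
    args: tuple  # sorted (key, value) pairs, so calls compare by value

    @staticmethod
    def make(name: str, **kwargs: Any) -> "ApiCall":
        return ApiCall(name, tuple(sorted(kwargs.items())))


# Hypothetical tool registry standing in for a benchmark's runnable APIs.
TOOLS: dict[str, Callable[..., Any]] = {
    "AddNumbers": lambda a, b: a + b,
    "ToUpper": lambda text: text.upper(),
}


def execute(call: ApiCall) -> Any:
    """Run a call against the registry; raises KeyError for unknown tools."""
    return TOOLS[call.name](**dict(call.args))


def call_accuracy(predicted: list[ApiCall], gold: list[ApiCall]) -> float:
    """Share of gold calls whose paired prediction returns the same result.

    Assumption: a call is correct when executing it reproduces the gold
    call's response; unknown tools or bad arguments count as failures.
    """
    if not gold:
        return 0.0
    correct = 0
    for pred, ref in zip(predicted, gold):
        try:
            if execute(pred) == execute(ref):
                correct += 1
        except (KeyError, TypeError):
            pass
    return correct / len(gold)


gold = [ApiCall.make("AddNumbers", a=2, b=3), ApiCall.make("ToUpper", text="hi")]
pred = [ApiCall.make("AddNumbers", a=3, b=2), ApiCall.make("ToUpper", txt="hi")]
print(call_accuracy(pred, gold))  # 0.5: the second call uses a wrong argument name
```

Comparing execution results rather than raw strings lets semantically equivalent calls (here, swapped addends) count as correct, which is one reason a runnable evaluation system is useful.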

    SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

    Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets have been proposed to address robustness issues such as ASR errors, they ignore the unique challenges of spoken conversation. To tackle these limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD containing 8 domains, 203k turns, 5.7k dialogues, and 249 hours of audio from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot detection and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs such as ChatGPT. The results show that current models still have substantial room for improvement in spoken conversation: the most advanced dialogue state tracker achieves only 25.65% joint goal accuracy, and the SOTA end-to-end model correctly completes the user request in only 52.1% of dialogues. The dataset, code, and leaderboard are available at https://spokenwoz.github.io/SpokenWOZ-github.io/
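
Joint goal accuracy, the tracker metric quoted above, is conventionally the fraction of turns whose full predicted dialogue state exactly matches the annotated state. A minimal sketch under that definition follows; the slot names and values are hypothetical, and SpokenWOZ's official scorer may normalize values differently.

```python
def joint_goal_accuracy(predicted: list[dict[str, str]],
                        gold: list[dict[str, str]]) -> float:
    """Fraction of turns whose predicted dialogue state equals the gold state.

    A turn counts only if every slot-value pair matches exactly, so a single
    wrong or missing slot makes the whole turn incorrect.
    """
    if len(predicted) != len(gold):
        raise ValueError("need exactly one predicted state per gold turn")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Hypothetical slot names and values for a three-turn dialogue.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-day": "friday"},
]
predicted = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-day": "thursday"},
]
print(joint_goal_accuracy(predicted, gold))  # 2 of 3 turns match
```

Because the state is cumulative, one misheard value (as in ASR-affected speech) keeps the turn wrong until it is corrected, which helps explain why scores on spoken data can be low.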

    Livestock production, greenhouse gas emissions, air pollution, and grassland conservation: Quasi-natural experimental evidence

    Serious climate challenges and environmental concerns have led to calls to mitigate greenhouse effects and pollution by controlling livestock production. In this study, we performed a cross-boundary quasi-natural experimental analysis of the Mongolian Plateau to examine the causal effects of livestock reduction on greenhouse gas (GHG) emissions and air pollutants. China’s grassland ecological compensation policy (GECP), which aims at grassland conservation by controlling overgrazing, unintentionally offered an opportunity to estimate the causal effects of livestock reduction. To this end, we used official statistical data, remote sensing data, reanalysis data, and household survey data. Empirical findings based on the synthetic difference-in-differences (SDID) approach showed that, with the implementation of the GECP, livestock reduction reduced atmospheric GHG and air pollutant concentrations and increased grassland quality and carbon sequestration in grasslands. We extended the basic SDID to a dynamic SDID and used it to estimate the causal effects in each policy year, which showed that the policy effects became more pronounced after several years of continuous implementation. The pathway analysis revealed that atmospheric CH4 concentrations decreased with the reduction in animal CH4 emissions and that PM2.5 and PM10 concentrations decreased with grassland restoration. These findings provide empirical references for reforming the global food system to ensure both food security and environmental protection.
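
The core SDID estimate, as formulated by Arkhangelsky et al. (2021), is a weighted double difference: the treated units' post-period mean minus a time-weighted pre-period baseline, compared with the same quantity for unit-weighted control units. The sketch below assumes the unit weights `omega` and time weights `lam` are already given; estimating them (a regularized, constrained least-squares step) is omitted, and the outcome series are hypothetical.

```python
from typing import Optional, Sequence


def weighted_did(treated: Sequence[Sequence[float]],
                 controls: Sequence[Sequence[float]],
                 n_pre: int,
                 omega: Optional[Sequence[float]] = None,
                 lam: Optional[Sequence[float]] = None) -> float:
    """Weighted double difference used by SDID, given precomputed weights.

    Each row is one unit's outcome series; the first n_pre entries are
    pre-treatment periods. omega weights control units and lam weights
    pre-treatment periods (each should sum to 1). Uniform weights, the
    default here, reduce this to the ordinary difference-in-differences.
    """
    n_post = len(treated[0]) - n_pre
    if omega is None:
        omega = [1.0 / len(controls)] * len(controls)
    if lam is None:
        lam = [1.0 / n_pre] * n_pre

    def delta(row: Sequence[float]) -> float:
        # Post-period mean minus the lam-weighted pre-period baseline.
        post_mean = sum(row[n_pre:]) / n_post
        pre_baseline = sum(w * y for w, y in zip(lam, row[:n_pre]))
        return post_mean - pre_baseline

    delta_treated = sum(delta(r) for r in treated) / len(treated)
    delta_control = sum(w * delta(r) for w, r in zip(omega, controls))
    return delta_treated - delta_control


# Hypothetical outcome series: 2 pre-policy years followed by 2 policy years.
controls = [[10.0, 11.0, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]
treated = [[15.0, 16.0, 14.0, 15.0]]
print(weighted_did(treated, controls, n_pre=2))  # -3.0
```

A dynamic variant would apply the same comparison separately to each post-policy year instead of averaging all post periods together.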

    From cropland to cropped field: A robust algorithm for national-scale mapping by fusing time series of Sentinel-1 and Sentinel-2

    Detailed, up-to-date maps of actively cropped fields at the national scale are vital for global food security. Unfortunately, existing land cover datasets do not provide this information, and it is especially lacking for smallholder farming systems. Mapping national-scale cropped fields remains challenging because of spectral confusion with abandoned vegetated land and the high heterogeneity of fields over large areas. This study proposes a large-area mapping framework that automatically identifies actively cropped fields by fusing Vegetation-Soil-Pigment indices and synthetic-aperture radar (SAR) time-series images (VSPS). Three temporal indicators were proposed, which highlight cropped fields through consistently higher values caused by cropping activities. The VSPS algorithm was applied to national-scale mapping in China, without regional adjustments, using Sentinel-2 and Sentinel-1 images. Agriculture in China is highly heterogeneous and has undergone tremendous changes such as non-grain orientation and cropland abandonment, yet little is known about the locations and extents of fields cultivated with field crops at the national scale. Here, we produced the first national-scale, 20 m updated map of cropped and fallow/abandoned land in China and found that 77% of national cropland (151.23 million hectares) was actively cropped in 2020. Fallow/abandoned cropland in mountainous and hilly regions was far more extensive than expected and was significantly underestimated by the commonly applied VImax-based approach using MODIS images. The VSPS method shows robust generalization, achieving an overall accuracy of 94% on 4,934 widely distributed reference sites. The proposed framework can detect cropped fields while fully accounting for the high diversity of cropping systems and the complexity of fallow/abandoned cropland. The processing code for Google Earth Engine is provided in the hope of stimulating operational mapping of cropped fields at finer resolution, from the national to the global scale.
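
The abstract does not define the three VSPS indicators, so the sketch below only illustrates the general idea of a temporal indicator that rewards cropping activity. It uses a single, hypothetical rule (the seasonal amplitude of NDVI, computed from NIR and red reflectance) and an illustrative threshold; it is not the authors' algorithm, and the reflectance values are made up.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def seasonal_amplitude(values: list[float]) -> float:
    """Spread between the greenest and barest observations in a season."""
    return max(values) - min(values)


def looks_cropped(nir: list[float], red: list[float], threshold: float = 0.3) -> bool:
    """Flag a pixel as actively cropped when its NDVI amplitude is large.

    Hypothetical rule: sowing and harvest produce a strong rise and fall in
    NDVI, while abandoned vegetated land stays green with little variation.
    The 0.3 threshold is illustrative only.
    """
    series = [ndvi(n, r) for n, r in zip(nir, red)]
    return seasonal_amplitude(series) >= threshold


# Hypothetical surface reflectances (e.g. Sentinel-2 B8 = NIR, B4 = red) over one season.
crop_nir, crop_red = [0.25, 0.45, 0.50, 0.30], [0.15, 0.05, 0.04, 0.12]
grass_nir, grass_red = [0.35, 0.38, 0.37, 0.36], [0.08, 0.07, 0.07, 0.08]
print(looks_cropped(crop_nir, crop_red))    # True
print(looks_cropped(grass_nir, grass_red))  # False
```

A single optical index like this is exactly what suffers from spectral confusion and cloud gaps, which is the motivation the abstract gives for fusing soil and pigment indices with SAR time series.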

    Twofold Symmetry Observed in Bi₂Te₃/FeTe Interfacial Superconductor

    Superconducting pairing symmetry is crucial for understanding the microscopic superconducting mechanism of a superconductor. Here we report the observation of a twofold superconducting gap symmetry in the interfacial superconductor Bi₂Te₃/FeTe, using the quasiparticle interference (QPI) technique in scanning tunneling microscopy together with macroscopic magnetoresistance measurements. The QPI patterns at energies inside and outside the gap reveal a clearly anisotropic superconducting gap. Furthermore, both the in-plane angle-dependent magnetoresistance and the in-plane upper critical field exhibit a clear twofold symmetry. This twofold symmetry aligns with the Te-Te direction in FeTe, which weakens the possibility that it originates from the bi-collinear antiferromagnetic order. Our findings provide key information for further understanding the topological properties of the Bi₂Te₃/FeTe superconducting system and should stimulate further theoretical interest in the pairing mechanism of this system.
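
One generic way to quantify a twofold signal in angle-dependent magnetoresistance is to project the data onto angular harmonics: a dominant cos 2θ component with a negligible cos 4θ component indicates twofold rather than fourfold symmetry, and its phase gives the axis to compare with a lattice direction such as Te-Te. The sketch below uses synthetic data with an assumed 30° axis; it is not the authors' analysis procedure.

```python
import math


def angular_harmonic(angles_deg: list[float], values: list[float], n: int) -> tuple[float, float]:
    """Amplitude and axis (degrees) of the n-fold component of values(theta).

    Assumes the angles are evenly spaced over one full 360-degree rotation
    and that there are more than 2 * n samples, so the projection onto
    cos(n*theta) and sin(n*theta) is exact for a pure n-fold signal.
    """
    m = len(values)
    mean = sum(values) / m
    c = sum((v - mean) * math.cos(n * math.radians(a)) for a, v in zip(angles_deg, values))
    s = sum((v - mean) * math.sin(n * math.radians(a)) for a, v in zip(angles_deg, values))
    amplitude = 2.0 * math.hypot(c, s) / m
    axis_deg = math.degrees(math.atan2(s, c)) / n  # direction of the maximum
    return amplitude, axis_deg


# Synthetic in-plane rotation: R(theta) = 1 + 0.05 * cos(2 * (theta - 30 deg)).
angles = [15.0 * k for k in range(24)]
resistance = [1.0 + 0.05 * math.cos(2 * math.radians(a - 30.0)) for a in angles]
amp2, axis2 = angular_harmonic(angles, resistance, 2)
amp4, _ = angular_harmonic(angles, resistance, 4)
print(round(amp2, 6), round(axis2, 6), round(amp4, 6))  # 0.05 30.0 0.0
```

Real data would also carry noise and possible misalignment offsets, so in practice one would compare the relative size of the two- and fourfold amplitudes rather than expect an exact zero.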

    Correlation Between Circulating Tumor Cell DNA Genomic Alterations and Mesenchymal CTCs or CTC-Associated White Blood Cell Clusters in Hepatocellular Carcinoma

    Purpose: Liquid biopsy is attracting attention as a method for real-time monitoring of patients with tumors. It can be used to understand the temporal and spatial heterogeneity of tumors and has good prospects for clinical application. We explored a new type of circulating tumor cell (CTC) enrichment technology combined with next-generation sequencing (NGS) to analyze the correlation between genomic alterations in circulating tumor cells of hepatocellular carcinoma and the counts of mesenchymal CTCs and CTC-associated white blood cell (CTC-WBC) clusters.
    Methods: We collected peripheral blood samples from 29 patients with hepatocellular carcinoma from January 2016 to December 2019. We then used the CanPatrol™ system to capture and analyze mesenchymal CTCs and CTC-WBC clusters for all patients. A customized Illumina panel was used for DNA sequencing, and the Mann–Whitney U test was used to test the correlation between mesenchymal CTC counts, CTC-WBC cluster counts, and specific genomic changes.
    Results: At least one somatic hotspot mutation was detected in each of the 29 sequenced patients. A total of 42 somatic hotspot mutations were detected in tumor tissue DNA and 39 in CTC-DNA, including common alterations in PTEN, MET, EGFR, RET, and FGFR3. The number of mesenchymal CTCs was positively correlated with somatic genomic alterations in the PTEN and MET genes (PTEN, P = 0.021; MET, P = 0.008; Mann–Whitney U test) and negatively correlated with somatic genomic alterations in the EGFR gene (P = 0.006, Mann–Whitney U test). The number of CTC-WBC clusters was positively correlated with somatic genomic alterations in the RET gene (P = 0.01, Mann–Whitney U test) and negatively correlated with somatic genomic alterations in FGFR3 (P = 0.039, Mann–Whitney U test).
    Conclusions: We report a novel approach that combines a CTC enrichment platform with NGS to analyze genetic variation, which further demonstrates the potential clinical application of this method for monitoring the spatiotemporal heterogeneity of hepatocellular carcinoma. We found that the numbers of peripheral blood mesenchymal CTCs and CTC-WBC clusters in patients with hepatocellular carcinoma were related to specific genomic profiles.
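
The Mann–Whitney U test used above compares counts between two groups (for example, patients with and without a given mutation) using ranks rather than raw values. A minimal sketch of the U statistic with average ranks for ties is shown below; the counts are hypothetical, and p-values are not computed here (in practice `scipy.stats.mannwhitneyu` returns both the statistic and a p-value).

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y); U_x counts pairs with x > y, ties counting 1/2."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


# Hypothetical mesenchymal CTC counts, split by whether a gene is mutated.
with_mutation = [5, 8, 12, 9]
without_mutation = [2, 3, 6, 1, 4]
print(mann_whitney_u(with_mutation, without_mutation))  # (19.0, 1.0)
```

A U value near its maximum (here 20 possible pairs) means one group's counts almost always exceed the other's, which is what a "positive correlation" between a mutation and CTC counts amounts to in this test.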

    CDBA: a novel multi-branch feature fusion model for EEG-based emotion recognition

    EEG-based emotion recognition through artificial intelligence is one of the major areas of biomedical and machine learning research, and it plays a key role in understanding brain activity and developing decision-making systems. However, traditional EEG-based emotion recognition uses a single-feature input mode, which cannot capture information from multiple features and cannot meet the requirements of intelligent, highly real-time brain-computer interfaces. Moreover, because EEG signals are nonlinear, traditional time-domain or frequency-domain methods are not well suited to them. In this paper, we present a CNN-DSC-Bi-LSTM-Attention (CDBA) model for automatic EEG-based emotion recognition, which contains three feature-extraction channels. The normalized EEG signals are used as input; features are extracted by the multiple branches and then concatenated, and the attention mechanism layer assigns a weight to each channel's features. Finally, a Softmax layer classifies the EEG signals. To evaluate the performance of the proposed CDBA model, experiments were performed separately on the SEED and DREAMER datasets. The validation results show that the proposed CDBA model is effective in classifying EEG emotions. For three-category (positive, neutral, and negative) and four-category (happiness, sadness, fear, and neutrality) classification on the SEED datasets, the accuracies were 99.44% and 99.99%, respectively. For five-category classification (Valence 1 to Valence 5) on the DREAMER dataset, the accuracy is 84.49%. To further verify the accuracy and credibility of the model, multi-class experiments based on ten-fold cross-validation were conducted, and all evaluation metrics were higher than those of other models. The results show that the attention-based multi-branch feature-fusion deep learning model has strong fitting and generalization ability and can handle nonlinear modeling problems, making it an effective emotion recognition method. It may therefore help in the diagnosis and treatment of nervous system diseases and is expected to be applied to emotion-based brain-computer interface systems.
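
The fusion step described above (branch features concatenated, weighted by an attention layer, then classified with Softmax) can be sketched as follows. This assumes branch-level attention scored by a dot product with a query vector and branches of equal feature size; the actual CDBA layer may be parameterized differently, and all numbers, including the "learned" query and classifier weights, are hypothetical.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branches: list[list[float]], query: list[float]) -> tuple[list[float], list[float]]:
    """Score each branch by a dot product with a query vector, softmax the
    scores into weights, and concatenate the weighted branch features."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in branches]
    weights = softmax(scores)
    fused = [w * f for w, feat in zip(weights, branches) for f in feat]
    return fused, weights


def classify(fused: list[float], weight_rows: list[list[float]], bias: list[float]) -> list[float]:
    """Linear layer followed by softmax, giving one probability per class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)


# Hypothetical 2-dimensional outputs of the CNN, DSC and Bi-LSTM branches.
branches = [[2.0, 0.1], [0.5, 0.3], [1.0, -0.2]]
query = [1.0, 0.0]  # would be learned in a trained model
fused, weights = attention_fuse(branches, query)
# Hypothetical classifier for three classes (positive, neutral, negative).
weight_rows = [[0.5, 0.1, 0.2, 0.0, 0.3, 0.1],
               [0.1, 0.4, 0.1, 0.2, 0.0, 0.3],
               [-0.2, 0.1, 0.3, 0.1, 0.2, 0.0]]
bias = [0.0, 0.0, 0.0]
probs = classify(fused, weight_rows, bias)
print([round(w, 3) for w in weights], [round(p, 3) for p in probs])
```

Here the first branch has the highest score and therefore the largest attention weight, so its features contribute most to the class probabilities.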

    Evaluation and probabilistic prediction of shear strength for RC beams without shear reinforcement

    Over the past decades, numerous researchers have developed a large number of shear strength models. Most of these models work in a deterministic manner, based on a collected database and a simplified mechanical or semi-empirical representation, and neglect the uncertainty of shear strength in application. Because of the large scatter among deterministic predictions, it is difficult for engineers to choose an appropriate prediction model in engineering practice. Therefore, this report aimed to evaluate the accuracy of existing deterministic shear strength prediction models and to develop a probabilistic prediction model of shear strength for RC beams without shear reinforcement. Research on shear behavior over the past one hundred years was reviewed, with attention paid to the development of load transfer mechanisms. Furthermore, eight well-known shear strength models were evaluated against a database of 127 tested beams to investigate the reliability of deterministic shear strength predictions for RC beams without shear reinforcement. The evaluation concludes that the shear strength predicted by the CSA (2004) design provisions is conservative with relatively high accuracy. A probabilistic model to predict the shear strength of RC beams without shear reinforcement was then proposed using both a mechanical approach and a data-driven approach. Specifically, a function giving the probabilistic shear strength was derived from the well-known relationship between the shear force and the rate of change of bending moment along the beam, to reflect the combination of beam action and arch action. The GLUE (Generalized Likelihood Uncertainty Estimation) method was then adopted to update the probability distributions of the two unknown model parameters, k1 and k2, from uniform priors to Weibull posteriors. Mean prediction and standard deviation prediction models were proposed to give a prediction band of shear strength for each specimen, to facilitate the use of this probabilistic prediction model in engineering practice.
    Bachelor of Engineering (Civil
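
The GLUE procedure mentioned above can be outlined in a short sketch: sample parameter sets (k1, k2) from uniform priors, keep the best-fitting "behavioural" fraction, weight each kept set by an informal likelihood, and report a likelihood-weighted mean and standard deviation. The two-parameter linear model, the data, the 10% retention fraction, and the inverse-SSE likelihood are all assumptions for illustration (GLUE leaves the likelihood measure to the modeller); this is not the report's shear-strength function, and no Weibull fit is performed.

```python
import math
import random


def model(k1: float, k2: float, x: tuple[float, float]) -> float:
    # Hypothetical two-parameter model; NOT the report's shear-strength function.
    return k1 * x[0] + k2 * x[1]


def glue(data, bounds, n_samples=5000, keep_fraction=0.1, seed=0):
    """GLUE sketch: sample (k1, k2) from uniform priors, keep the best-fitting
    ("behavioural") fraction, and weight each kept set by an informal
    likelihood, here the inverse sum of squared errors."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_samples):
        k1 = rng.uniform(*bounds[0])
        k2 = rng.uniform(*bounds[1])
        sse = sum((model(k1, k2, x) - y) ** 2 for x, y in data)
        scored.append((sse, k1, k2))
    scored.sort()  # lowest error first
    kept = scored[: max(1, int(keep_fraction * n_samples))]
    likelihoods = [1.0 / (sse + 1e-12) for sse, _, _ in kept]
    total = sum(likelihoods)
    return [(lk / total, k1, k2) for lk, (_, k1, k2) in zip(likelihoods, kept)]


def predict(posterior, x):
    """Likelihood-weighted mean and standard deviation of the prediction at x."""
    preds = [(w, model(k1, k2, x)) for w, k1, k2 in posterior]
    mean = sum(w * p for w, p in preds)
    var = sum(w * (p - mean) ** 2 for w, p in preds)
    return mean, math.sqrt(var)


# Hypothetical observations generated near k1 = 1.5, k2 = 2.5 with small noise.
data = [((1.0, 0.0), 1.52), ((0.0, 1.0), 2.47), ((1.0, 1.0), 4.03),
        ((2.0, 1.0), 5.48), ((1.0, 2.0), 6.51)]
posterior = glue(data, bounds=[(0.0, 5.0), (0.0, 5.0)])
k1_mean = sum(w * k1 for w, k1, _ in posterior)
k2_mean = sum(w * k2 for w, _, k2 in posterior)
mean, sd = predict(posterior, (1.0, 1.0))
print(f"k1 ~ {k1_mean:.2f}, k2 ~ {k2_mean:.2f}, band at (1, 1): {mean:.2f} +/- {sd:.2f}")
```

The retained, weighted samples form an empirical posterior; fitting a parametric distribution (such as a Weibull) to them would be a separate step.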

    Prediction of Arrival Time of Pure Electric Bus Based on FA-BP Algorithm

    This paper develops an arrival time prediction model for pure electric buses. Based on an analysis of the factors influencing the arrival time of pure electric buses, a BP neural network arrival time prediction model optimized by the firefly algorithm (FA-BP prediction model) is established, using vehicle type, SOC (state of charge) value, battery age, and time as inputs. The model is trained and tested using bus operation data. The root mean square error is 0.351 for the Kalman filter model, 0.059 for the BP neural network model, and 0.04 for the FA-BP prediction model. The results show that the proposed model effectively improves prediction accuracy and has good reliability and feasibility. It can provide theoretical references for pure electric bus operators and managers and a basis for improving bus reliability.
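
The abstract does not say which quantities the firefly algorithm tunes; a common FA-BP arrangement is to let it search the BP network's initial weights and thresholds by minimizing training error (an assumption here). The sketch below shows a basic firefly algorithm on a stand-in cost function; the function names, the sphere cost, and the parameter values (`gamma`, `alpha`, decay rate) are hypothetical choices, not the paper's settings.

```python
import math
import random
from typing import Callable


def firefly_minimize(f: Callable[[list[float]], float], dim: int, lo: float, hi: float,
                     n_fireflies: int = 15, n_iter: int = 60, beta0: float = 1.0,
                     gamma: float = 0.1, alpha: float = 0.2, alpha_decay: float = 0.97,
                     seed: int = 0) -> tuple[list[float], float, list[float]]:
    """Basic firefly algorithm (minimization).

    Lower cost means a brighter firefly. Each firefly moves toward every
    brighter one with attractiveness beta0 * exp(-gamma * r^2) plus a small
    random step whose size alpha shrinks each iteration.
    Returns (best position, best cost, best cost after each iteration).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(p) for p in pos]
    best_i = min(range(n_fireflies), key=lambda i: cost[i])
    best_pos, best_cost = list(pos[best_i]), cost[best_i]
    history = [best_cost]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [min(hi, max(lo, a + beta * (b - a)
                                          + alpha * (rng.random() - 0.5) * (hi - lo)))
                              for a, b in zip(pos[i], pos[j])]
                    cost[i] = f(pos[i])
                    if cost[i] < best_cost:
                        best_pos, best_cost = list(pos[i]), cost[i]
        alpha *= alpha_decay
        history.append(best_cost)
    return best_pos, best_cost, history


def sphere(x: list[float]) -> float:
    # Stand-in cost; in FA-BP this would be the BP network's training error
    # as a function of its initial weights and thresholds (assumption).
    return sum(v * v for v in x)


best_pos, best_cost, history = firefly_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
print(best_pos, best_cost)
```

In an FA-BP pipeline, the best position found would seed the BP network before ordinary gradient-based training, which is the usual rationale for combining a global search with backpropagation.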