51 research outputs found

    Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

    Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks. However, it is difficult to know whether the models are reasoning based on deep understandings of truth and logic, or leveraging their memorized patterns in a relatively superficial way. In this work, we explore testing LLMs' reasoning by engaging with them in a debate-like conversation, where given a question, the LLM and the user need to discuss to make the correct decision starting from opposing arguments. Upon mitigating the Clever Hans effect, our task requires the LLM to not only achieve the correct answer on its own, but also be able to hold and defend its belief instead of blindly believing or getting misled by the user's (invalid) arguments and critiques, thus testing in greater depth whether the LLM grasps the essence of the reasoning required to solve the problem. Across a range of complex reasoning benchmarks spanning math, commonsense, logic and BIG-Bench tasks, we find that despite their impressive performance as reported in existing work on generating correct step-by-step solutions in the beginning, LLMs like ChatGPT cannot maintain their beliefs in truth for a significant portion of examples when challenged by oftentimes absurdly invalid arguments. Our work points to danger zones of model alignment, and also suggests more careful treatments and interpretations of the recent findings that LLMs can improve their responses based on feedback.
    Comment: EMNLP-23 (findings

    Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

    Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem, by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations - prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting, and open up new questions regarding LLMs' capability to learn to reason in context.
    Comment: ACL-23 Camera Ready. Code and model input/output are available at https://github.com/sunlab-osu/Understanding-Co

    Mind2Web: Towards a Generalist Agent for the Web

    We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, and are thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use of real-world websites instead of simulated and simplified ones, and 3) a broad spectrum of user interaction patterns. Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) for building generalist web agents. While the raw HTML of real-world websites is often too large to be fed to LLMs, we show that first filtering it with a small LM significantly improves the effectiveness and efficiency of LLMs. Our solution demonstrates a decent level of performance, even on websites or entire domains the model has never seen before, but there is still substantial room to improve towards truly generalizable agents. We open-source our dataset, model implementation, and trained models (https://osu-nlp-group.github.io/Mind2Web) to facilitate further research on building a generalist agent for the web.
    Comment: website: https://osu-nlp-group.github.io/Mind2We
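The filter-then-prompt idea in the Mind2Web abstract (rank page elements with a cheap scorer, then pass only the top few to the LLM) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: `Element`, `overlap_score`, `filter_candidates`, and `build_prompt` are hypothetical names, and a simple word-overlap score stands in for the small LM.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A simplified page element (hypothetical structure, not the dataset schema)."""
    element_id: str
    text: str


def overlap_score(task: str, element: Element) -> float:
    """Hypothetical stand-in for the small LM: share of task words found in the element text."""
    task_words = set(task.lower().split())
    element_words = set(element.text.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & element_words) / len(task_words)


def filter_candidates(task: str, elements: list[Element], top_k: int = 2) -> list[Element]:
    """Keep only the top_k highest-scoring elements so the prompt stays short."""
    ranked = sorted(elements, key=lambda e: overlap_score(task, e), reverse=True)
    return ranked[:top_k]


def build_prompt(task: str, candidates: list[Element]) -> str:
    """Assemble the shortened prompt that would be handed to the larger model."""
    lines = [f"Task: {task}", "Candidate elements:"]
    lines += [f"- [{e.element_id}] {e.text}" for e in candidates]
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        Element("a", "search flights button"),
        Element("b", "footer privacy policy"),
        Element("c", "book flights to paris"),
    ]
    kept = filter_candidates("book flights to Paris", page)
    print(build_prompt("book flights to Paris", kept))
```

The design point is only that the expensive model never sees the full page: the scorer can be swapped for any ranking model without changing the rest of the pipeline.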

    Fatty acid metabolism is related to the immune microenvironment changes of gastric cancer and RGS2 is a new tumor biomarker

    Background: Alterations in lipid metabolism promote tumor progression. However, the role of lipid metabolism in the occurrence and development of gastric cancer has not been fully clarified. Method: Genes that are related to fatty acid metabolism and differentially expressed between normal and gastric cancer tissues were identified in the TCGA-STAD cohort. The intersection of the identified differentially expressed genes with Geneset was determined to obtain 78 fatty acid metabolism-related genes. The ConsensusClusterPlus R package was used to perform consensus clustering on the differentially expressed genes, which divided gastric cancer into two subtypes, termed cluster 1 and cluster 2. Results: Patients in cluster 2 were found to have a poorer prognosis than patients in cluster 1. A machine learning method was used to select 8 genes differentially expressed between the subtypes and to construct a fatty acid prognostic risk score model (FARS), which displayed good prognostic efficacy. We also found that certain anticancer drugs, such as bortezomib, elesclomol, GW843682X, and nilotinib, showed significant sensitivity in the high FARS score group. RGS2 was selected as the core gene based on a single-cell analysis of gastric cancer, and Western blotting and immunofluorescence staining revealed a high level of expression of this gene in gastric cancer cells. Immunohistochemical staining showed that a large amount of RGS2 was deposited in the stroma in gastric cancer. A pan-cancer analysis also revealed a significant association of RGS2 with TMB, TIDE, and CD8+ T-cell infiltration in other cancer types. RGS2 may thus be studied further as a new target for immunotherapy in gastric cancer. Conclusion: In summary, the FARS model developed here enhances our understanding of lipid metabolism in the TME in gastric cancer, and provides a theoretical basis for predicting tumor prognosis and clinical treatment.

    A comparative analysis of aerosol microphysical, optical and radiative properties during the Spring Festival holiday over Beijing and surrounding regions

    Using ground-based data, meteorological observations, and atmospheric environmental monitoring data, a comparative analysis of the microphysical and optical properties, and radiative forcing of aerosols was conducted between three stations in different developed environments during a severe air pollution episode during the Spring Festival over Beijing. During the most polluted period, the daily peak values of the aerosol optical depth were ~1.62, ~1.73, and ~0.74, which were about 2.6, 2.9, and 2.1 times higher than the background levels at the CAMS, Xianghe, and Shangdianzi sites, respectively. The daily peak values of the single scattering albedo were ~0.95, ~0.96, and ~0.87. The volume of fine-mode particles varied from 0.04 to 0.21 µm³ µm⁻², 0.06 to 0.17 µm³ µm⁻², and 0.01 to 0.10 µm³ µm⁻², which were about 0.3 to 5.8, 1.1 to 4.7, and 1.2 to 8.9 times greater than the background values, respectively. The daily absorption aerosol optical depth was ~0.01 to ~0.13 at CAMS, ~0.03 to ~0.14 at Xianghe, and ~0.01 to ~0.09 at Shangdianzi, and the absorption Ångström exponents reflected a significant increase in organic aerosols over CAMS and Xianghe and in black carbon over Shangdianzi. Aerosol radiative forcing at the bottom of the atmosphere varied from -20 to -130, -40 to -150, and -10 to -110 W m⁻² for the whole holiday period, indicating a cooling effect. The potential source contribution function and concentration-weighted trajectory analysis showed that Beijing, the southern parts of Hebei and Shanxi, and the central northern part of Shandong contributed greatly to the pollution.

    Nutlin-3 overcomes arsenic trioxide resistance and tumor metastasis mediated by mutant p53 in Hepatocellular Carcinoma

    Background: Arsenic trioxide has been demonstrated as an effective anti-cancer drug against leukemia and solid tumors both in vitro and in vivo. However, recent phase II trials demonstrated that single-agent arsenic trioxide was poorly effective against hepatocellular carcinoma (HCC), which might be due to drug resistance. Methods: Mutation detection of the p53 gene in arsenic trioxide-resistant HCC cell lines was performed. The therapeutic effects of arsenic trioxide and Nutlin-3 on HCC were evaluated both in vitro and in vivo. A series of experiments including MTT, apoptosis assays, co-immunoprecipitation, siRNA transfection, lentiviral infection, cell migration, invasion, and epithelial-mesenchymal transition (EMT) assays were performed to investigate the underlying mechanisms. Results: The acquisition of p53 mutation contributed to arsenic trioxide resistance and enhanced the metastatic potential of HCC cells. Silencing mutant p53 (mutp53) could re-sensitize resistant HCC cells to arsenic trioxide and inhibit their metastatic activities, while mutp53 overexpression showed the opposite effects. Neither arsenic trioxide nor Nutlin-3 alone exhibited obvious effects against arsenic trioxide-resistant HCC cells, while their combination showed significant effects. Nutlin-3 can not only increase intracellular arsenicals through inhibition of P-gp but also promote the p73 activation and mutp53 degradation mediated by arsenic trioxide. In vivo experiments indicated that Nutlin-3 can potentiate the antitumor activities of arsenic trioxide in an orthotopic hepatic tumor model and inhibit metastasis to the lung. Conclusions: Acquisition of p53 mutations contributed to the resistance of HCC to arsenic trioxide. Nutlin-3 could overcome arsenic trioxide resistance and inhibit tumor metastasis through p73 activation and by promoting the mutant p53 degradation mediated by arsenic trioxide.

    Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history

    Colobines are a unique group of Old World monkeys that principally eat leaves and seeds rather than fruits and insects. We report the sequencing at 146× coverage, de novo assembly and analyses of the genome of a male golden snub-nosed monkey (Rhinopithecus roxellana) and resequencing at 30× coverage of three related species (Rhinopithecus bieti, Rhinopithecus brelichi and Rhinopithecus strykeri). Comparative analyses showed that Asian colobines have an enhanced ability to derive energy from fatty acids and to degrade xenobiotics. We found evidence for functional evolution in the colobine RNASE1 gene, encoding a key secretory RNase that digests the high concentrations of bacterial RNA derived from symbiotic microflora. Demographic reconstructions indicated that the profile of ancient effective population sizes for R. roxellana more closely resembles that of the giant panda than those of its congeners. These findings offer new insights into the dietary adaptations and evolutionary history of colobine primates.

    Research on Optimal Charging of Power Lithium-Ion Batteries in Wide Temperature Range Based on Variable Weighting Factors

    With the growing popularity of electric vehicles (EVs), charging technology has become one of the bottlenecks that limit their large-scale deployment. In this paper, a charging method using multi-stage constant current based on SOC (MCCS) is proposed, and then the charging time, charging capacity and temperature increase of the battery are optimized by a multi-objective particle swarm optimization (MOPSO) algorithm. The influence of the number of charging stages, the cut-off voltage, the combination of different target weight factors and the ambient temperature on the charging strategy is further compared and discussed. Finally, according to the ambient temperature and users’ requirements for charging time, a charging strategy suitable for the specific situation is obtained by adjusting the weight factors, and the results are analyzed and justified on the basis of experiments. The results show that the proposed strategy can intelligently make more reasonable adjustments according to the ambient temperature while meeting the charging demands of users.
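How adjusting weight factors can shift the preferred charging strategy may be easier to see with a toy example. The sketch below uses a plain weighted sum over normalized objectives, which is a simplification and not the MOPSO procedure the abstract describes; the candidate profiles, their numbers, and the names `normalize` and `pick_strategy` are all hypothetical.

```python
def normalize(values):
    """Rescale a list of numbers to the range [0, 1]; constant lists map to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pick_strategy(candidates, w_time, w_capacity, w_temp):
    """Return the name of the candidate with the lowest weighted cost.

    Charging time and temperature rise are costs (lower is better);
    charged capacity is a benefit, so it enters the cost as (1 - normalized value).
    """
    times = normalize([c["time_min"] for c in candidates])
    caps = normalize([c["capacity_ah"] for c in candidates])
    temps = normalize([c["temp_rise_c"] for c in candidates])
    costs = [
        w_time * t + w_capacity * (1.0 - cap) + w_temp * dt
        for t, cap, dt in zip(times, caps, temps)
    ]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best]["name"]


# Hypothetical candidate charging profiles (illustrative numbers, not measured data).
CANDIDATES = [
    {"name": "fast", "time_min": 40, "capacity_ah": 2.8, "temp_rise_c": 12},
    {"name": "balanced", "time_min": 55, "capacity_ah": 2.9, "temp_rise_c": 8},
    {"name": "gentle", "time_min": 80, "capacity_ah": 3.0, "temp_rise_c": 4},
]

if __name__ == "__main__":
    # A user in a hurry weights charging time heavily.
    print(pick_strategy(CANDIDATES, w_time=0.8, w_capacity=0.1, w_temp=0.1))
    # A user who cares most about limiting temperature rise.
    print(pick_strategy(CANDIDATES, w_time=0.1, w_capacity=0.1, w_temp=0.8))
```

With these made-up numbers, a time-heavy weighting selects the fastest profile and a temperature-heavy weighting selects the gentlest one, which is the qualitative behavior the abstract attributes to adjusting the weight factors.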

    Research on the Combined Control Strategy of Low Temperature Charging and Heating of Lithium-Ion Power Battery Based on Adaptive Fuzzy Control

    A low-temperature environment decreases the chemical reaction rate and increases the internal resistance of a lithium battery. In addition, an excessive charging current causes lithium to separate out and can even permanently reduce battery capacity. To solve these problems, this paper proposes a combined low-temperature charging and heating control strategy, which takes the acceptable charging current of the battery at low temperature as the charging current constraint and the maximum output power of the system as the power constraint. First, a scheme for a combined charging and heating control system is put forward. Second, a low-temperature charging control strategy based on adaptive fuzzy control is established, and the model is simulated and analyzed in MATLAB. Finally, a Chroma 72,001 charge and discharge tester is used to conduct a low-temperature test on 18650 lithium iron phosphate battery cells. The results show that the proposed low-temperature charging control strategy has a more stable temperature control effect on the battery, the constant-current charging time of the battery is reduced by 14% compared with the traditional threshold control method, and the overall charging energy consumption is reduced by 5.6%.

    Substrate Cleaning Threshold for Various Coated Al Alloys Using a Continuous-Wave Laser

    In this study, different coatings (gray epoxy primer, white epoxy varnish and red alkyd paint) on 7075 aluminum alloy are cleaned with a 500 W continuous-wave (CW) fiber laser. We analyzed the influence of the laser power density on the temperature evolution and target surface morphology. Under continuous laser irradiation for 1 s, the experimental results indicated that the suitable cleaning thresholds for the single-layer coating (epoxy primer), the two-layer coating (epoxy primer and epoxy varnish), and the three-layer coating (epoxy primer, epoxy varnish and alkyd paint) were 177.74, 192.89 and 147.44 W/mm², respectively. The results show that the cleaning threshold of the thicker three-layer paint target was smaller than that of the single-layer paint target, and we analyze the mechanism of this phenomenon.