24 research outputs found

    Progressive-Hint Prompting Improves Reasoning in Large Language Models

    The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted an extensive and comprehensive evaluation to demonstrate the effectiveness of the proposed method. Our experimental results on six benchmarks show that combining CoT and self-consistency with PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (91.9%), GSM8K (95.5%) and AQuA (79.9%). Comment: Tech Report
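
The idea of feeding earlier answers back as hints can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `progressive_hint`, the `ask` callback standing in for an LLM call, the hint wording, and the rule of stopping once two consecutive answers agree are all assumptions made for this example.

```python
from typing import Callable, List, Optional


def progressive_hint(
    question: str,
    ask: Callable[[str], str],  # hypothetical stand-in for a call to an LLM
    max_rounds: int = 5,
) -> Optional[str]:
    """Re-ask the question, appending all earlier answers as hints.

    Assumed stopping rule: return as soon as two consecutive answers match.
    """
    hints: List[str] = []
    previous: Optional[str] = None
    for _ in range(max_rounds):
        prompt = question
        if hints:
            # Assumed hint wording; any phrasing that exposes past answers works here.
            prompt += " (Hint: The answer is near to " + ", ".join(hints) + ".)"
        answer = ask(prompt)
        if answer == previous:
            return answer  # the answer has stabilised
        hints.append(answer)
        previous = answer
    # No two consecutive answers agreed; fall back to the last answer seen.
    return previous
```

The first round sends the bare question, so the sketch reduces to ordinary prompting when the model repeats its first answer immediately.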

    Aria-NeRF: Multimodal Egocentric View Synthesis

    We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real-world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics

    Genetic liability to inflammatory bowel disease is causally associated with increased risk of erectile dysfunction: Evidence from a bidirectional Mendelian randomization study

    Background: Several observational cohort studies suggested a close correlation between inflammatory bowel disease and erectile dysfunction. Nevertheless, whether there was a causal effect between them remained debatable. In this study, we aimed to detect the underlying causal links between genetically predicted inflammatory bowel disease and the risk of erectile dysfunction. Methods: A bidirectional Mendelian randomization (MR) study was performed to assess the causal link between inflammatory bowel disease and erectile dysfunction. Inverse variance weighted (IVW), MR-Egger, weighted median, weighted mode, and simple mode were utilized to estimate the causality. The top single nucleotide polymorphisms (SNPs) associated with inflammatory bowel disease cases (n = 25,800) and erectile dysfunction cases (n = 1,154) were extracted from the summary genome-wide association study (GWAS) data obtained from a publicly available database. MR-PRESSO global outlier test and MR-Egger regression were utilized to explore the horizontal pleiotropy and outlier instrumental variables. Cochran’s Q statistic was utilized to detect the heterogeneity. Results: In the forward MR study, the IVW approach demonstrated that genetically determined inflammatory bowel disease exhibited a suggestively causal association with an increased risk of erectile dysfunction (OR: 1.11, 95% CI: 1.02–1.21, p = 0.019), and also the genetically determined Crohn’s disease was found to be causally associated with an increased risk of erectile dysfunction (OR: 1.09, 95% CI: 1.02–1.17, p = 0.014). However, the MR analysis results showed no significant evidence supporting a causal effect of ulcerative colitis on erectile dysfunction (OR: 1.02, 95% CI: 0.92–1.14, p = 0.679). Furthermore, the reverse MR analysis showed no causal effects of genetically determined erectile dysfunction on inflammatory bowel disease. Additionally, sensitivity analysis demonstrated no pleiotropy or heterogeneity. Conclusion: Our MR analysis substantiated causal links of inflammatory bowel disease and Crohn’s disease with erectile dysfunction, which may further elucidate how inflammatory bowel disease affects the initiation and development of erectile dysfunction, and may facilitate the prevention and clinical management of erectile dysfunction in individuals with inflammatory bowel disease
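
For readers unfamiliar with the IVW estimator named in the Methods, the sketch below shows the standard fixed-effect form: each SNP contributes a Wald ratio (outcome effect divided by exposure effect), weighted by the squared exposure effect over the outcome variance. It is an illustration only, not the study's pipeline; the function names and the numbers in the usage note are assumptions made for this example and are not data from the paper.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],  # per-SNP effect on the exposure
    beta_outcome: Sequence[float],   # per-SNP effect on the outcome (log odds)
    se_outcome: Sequence[float],     # per-SNP standard error of the outcome effect
) -> Tuple[float, float]:
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    # First-order weights: w_j = beta_exposure_j^2 / se_outcome_j^2
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # Wald ratio per SNP: outcome effect per unit of exposure effect
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into an odds ratio with a 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

With made-up inputs such as `ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.01, 0.01])`, both SNPs have the same Wald ratio, so the pooled estimate equals that ratio; `odds_ratio_ci` then puts it on the odds-ratio scale used in the Results.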

    Effects of Dual/Threefold Rootstock Grafting on the Plant Growth, Yield and Quality of Watermelon

    To test the feasibility of multi-rootstock grafting, bottle gourd and pumpkin were used as rootstocks in a comparative analysis of the effects of single, dual, and threefold rootstock grafting on the plant growth, fruit yield, and quality of watermelon. Results showed that different grafts have significant effects on the abovementioned properties. The appropriate dual/threefold rootstock grafting allowed for higher survival rates. The combined rootstock of bottle gourd and pumpkin can enhance the plant growth potential and lower the incidence of wilt. The single fruit weight of the grafted plants with a combined rootstock from bottle gourd and pumpkin was the median of the weights obtained with the pumpkin rootstock and the bottle gourd rootstock. The plot yield of grafted plants with a pumpkin rootstock was higher than that of the plants with a bottle gourd rootstock. The low soluble solids content of the fruit grafted with a pumpkin rootstock had relatively high acidity, which could be improved by adding bottle gourd to the rootstock. The vitamin C content of the grafted fruit from the combined bottle gourd and pumpkin rootstock was higher than that of plants grafted with either bottle gourd or pumpkin alone. The subsequent analysis showed that the combined rootstock of bottle gourd and pumpkin has significant or extremely significant interaction effects on the stem diameter, number of leaves, single fruit weight, plot yield, and fruit vitamin C content of the grafted watermelon plants, which probably led to the higher related index values of some of grafting combinations

    FIMO: A Challenge Formal Dataset for Automated Theorem Proving

    We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes

    DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

    Recent advances in natural language processing, primarily propelled by Large Language Models (LLMs), have showcased their remarkable capabilities grounded in in-context learning. A promising avenue for guiding LLMs in intricate reasoning tasks involves the utilization of intermediate reasoning steps within the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies in the effective selection of exemplars for facilitating in-context learning. In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking (DQ-LoRe) to automatically select exemplars for in-context learning. Dual Queries first query LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars via both question and the knowledge. Moreover, for the second query, LoRe employs dimensionality reduction techniques to refine exemplar selection, ensuring close alignment with the input question's knowledge. Through extensive experiments, we demonstrate that DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%. Our comprehensive analysis further reveals that DQ-LoRe consistently outperforms retrieval-based approaches in terms of both performance and adaptability, especially in scenarios characterized by distribution shifts. DQ-LoRe pushes the boundary of in-context learning and opens up new avenues for addressing complex reasoning challenges. Our code is released at https://github.com/AI4fun/DQ-LoRe. Comment: Accepted in ICLR 202

    TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

    Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models. However, current ATP benchmarks mainly focus on symbolic inference, but rarely involve the understanding of complex number combination reasoning. In this work, we propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas and its capability to manipulate, group, and factor number terms. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We then automatically generate additional examples from the annotated samples to expand the dataset. Furthermore, we develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability. Our extensive experiments show that TRIGO poses a new challenge for advanced generative LMs, including GPT-4, which is pre-trained on a considerable amount of open-source formal theorem-proving language data, and provides a new tool to study generative LMs' ability in both formal and mathematical reasoning. Comment: Accepted by EMNLP 2023. Code is available at https://github.com/menik1126/TRIG

    LEGO-Prover: Neural Theorem Proving with Growing Libraries

    Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills

    MPC-based path following design for automated vehicles with rear wheel steering

    Many recent studies have discussed path following control algorithms for automated vehicles using various control techniques. However, path following for automated vehicles with rear wheel steering (RWS) remains less investigated. In this study, we implemented nonlinear model predictive control (NMPC) on a passenger vehicle with active RWS for path following. The controller was compared to two other variations of NMPC in which the rear steering angle is proportional to the front steering angle or fixed to zero. Simulation results suggested that the proposed controller outperforms the other two variations and the baseline controllers (Stanley and LQR) in terms of accuracy and responsiveness. Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. Intelligent Vehicle