24 research outputs found
Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends
heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency
being critical methods that enhance this ability. However, these methods do not
fully exploit the answers generated by the LLM to guide subsequent responses.
This paper proposes a new prompting method, named Progressive-Hint Prompting
(PHP), that enables automatic multiple interactions between users and LLMs by
using previously generated answers as hints to progressively guide toward the
correct answers. PHP is orthogonal to CoT and self-consistency, making it easy
to combine with state-of-the-art techniques to further improve performance. We
conducted an extensive and comprehensive evaluation to demonstrate the
effectiveness of the proposed method. Our experimental results on six
benchmarks show that combining CoT and self-consistency with PHP significantly
improves accuracy while remaining highly efficient. For instance, with
text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding
compared to Complex CoT, and a 46.17% reduction in sample paths with
self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances
on SVAMP (91.9%), GSM8K (95.5%) and AQuA (79.9%). Comment: Tech Report
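The hint loop described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `ask` is a hypothetical stand-in for an LLM call, the hint wording is made up, and stopping once two consecutive answers agree is an assumption that the abstract does not state.

```python
from typing import Callable, List


def progressive_hint(question: str,
                     ask: Callable[[str], str],
                     max_rounds: int = 5) -> str:
    """Re-ask `question`, feeding earlier answers back as hints.

    `ask` is a hypothetical stand-in for an LLM call (prompt -> answer).
    Stopping when two consecutive answers agree is an assumption.
    """
    hints: List[str] = []
    answer = ask(question)  # base round: no hints yet
    for _ in range(max_rounds):
        hints.append(answer)
        # Illustrative hint format; the real prompt wording may differ.
        prompt = f"{question} (Hint: the answer is near {', '.join(hints)})"
        new_answer = ask(prompt)
        if new_answer == answer:  # answer is stable, so stop
            return new_answer
        answer = new_answer
    return answer


def toy_model(prompt: str) -> str:
    """Deterministic fake model: '41' without a hint, '42' with one."""
    return "42" if "Hint" in prompt else "41"


result = progressive_hint("What is 6 * 7?", toy_model)
```

With the toy model, the first hinted round changes the answer and the second confirms it, so the loop ends after two hinted calls.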
Aria-NeRF: Multimodal Egocentric View Synthesis
We seek to accelerate research in developing rich, multimodal scene models
trained from egocentric data, based on differentiable volumetric ray-tracing
inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like
model from an egocentric image sequence plays a pivotal role in understanding
human behavior and holds diverse applications within the realms of VR/AR. Such
egocentric NeRF-like models may be used as realistic simulations, contributing
significantly to the advancement of intelligent agents capable of executing
tasks in the real world. The future of egocentric view synthesis may lead to
novel environment representations going beyond today's NeRFs by augmenting
visual data with multimodal sensors such as IMU for egomotion tracking, audio
sensors to capture surface texture and human language context, and eye-gaze
trackers to infer human attention patterns in the scene. To support and
facilitate the development and evaluation of egocentric multimodal scene
modeling, we present a comprehensive multimodal egocentric video dataset. This
dataset offers a comprehensive collection of sensory data, featuring RGB
images, eye-tracking camera footage, audio recordings from a microphone,
atmospheric pressure readings from a barometer, positional coordinates from
GPS, connectivity details from Wi-Fi and Bluetooth, and information from
dual-frequency IMU datasets (1 kHz and 800 Hz) paired with a magnetometer. The
dataset was collected with the Meta Aria Glasses wearable device platform. The
diverse data modalities and the real-world context captured within this dataset
serve as a robust foundation for furthering our understanding of human behavior
and enabling more immersive and intelligent experiences in the realms of VR,
AR, and robotics.
Genetic liability to inflammatory bowel disease is causally associated with increased risk of erectile dysfunction: Evidence from a bidirectional Mendelian randomization study
Background: Several observational cohort studies have suggested a close correlation between inflammatory bowel disease and erectile dysfunction. Nevertheless, whether there is a causal effect between them remains debatable. In this study, we aimed to detect the underlying causal links between genetically predicted inflammatory bowel disease and the risk of erectile dysfunction. Methods: A bidirectional Mendelian randomization (MR) study was performed to assess the causal link between inflammatory bowel disease and erectile dysfunction. Inverse variance weighted (IVW), MR-Egger, weighted median, weighted mode, and simple mode methods were used to estimate the causality. The top single nucleotide polymorphisms (SNPs) associated with inflammatory bowel disease cases (n = 25,800) and erectile dysfunction cases (n = 1,154) were extracted from summary genome-wide association study (GWAS) data obtained from a publicly available database. The MR-PRESSO global outlier test and MR-Egger regression were used to explore horizontal pleiotropy and outlier instrumental variables. Cochran’s Q statistic was used to detect heterogeneity. Results: In the forward MR study, the IVW approach demonstrated that genetically determined inflammatory bowel disease exhibited a suggestive causal association with an increased risk of erectile dysfunction (OR: 1.11, 95% CI: 1.02–1.21, p = 0.019), and genetically determined Crohn’s disease was also found to be causally associated with an increased risk of erectile dysfunction (OR: 1.09, 95% CI: 1.02–1.17, p = 0.014). However, the MR analysis showed no significant evidence supporting a causal effect of ulcerative colitis on erectile dysfunction (OR: 1.02, 95% CI: 0.92–1.14, p = 0.679). Furthermore, the reverse MR analysis showed no causal effects of genetically determined erectile dysfunction on inflammatory bowel disease.
Additionally, sensitivity analysis demonstrated no pleiotropy or heterogeneity. Conclusion: Our MR analysis substantiated causal links of inflammatory bowel disease and Crohn’s disease with erectile dysfunction, which may further elucidate how inflammatory bowel disease affects the initiation and development of erectile dysfunction and facilitate the prevention and clinical management of inflammatory bowel disease in individuals with erectile dysfunction.
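For readers unfamiliar with the IVW method named in the abstract above, here is a minimal sketch of the standard fixed-effect inverse variance weighted estimator and its conversion to an odds ratio with a 95% confidence interval. The SNP effect sizes are made-up numbers for illustration, not data from the study, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error (log-odds scale)."""
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Made-up SNP effects for illustration only (not data from the study).
bx = [0.10, 0.20, 0.15]
by = [0.010, 0.020, 0.015]
se = [0.01, 0.01, 0.01]

beta, beta_se = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)               # causal estimate as an OR
ci_low = math.exp(beta - 1.96 * beta_se)  # lower 95% bound
ci_high = math.exp(beta + 1.96 * beta_se) # upper 95% bound
```

An interval whose lower bound stays above 1, as in the reported forward analyses, is what supports a positive causal association under the MR assumptions.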
Effects of Dual/Threefold Rootstock Grafting on the Plant Growth, Yield and Quality of Watermelon
To test the feasibility of multi-rootstock grafting, bottle gourd and pumpkin were used as rootstocks in a comparative analysis of the effects of single, dual, and threefold rootstock grafting on the plant growth, fruit yield, and quality of watermelon. The results showed that the graft type had significant effects on these properties. Appropriate dual/threefold rootstock grafting allowed for higher survival rates. The combined rootstock of bottle gourd and pumpkin enhanced plant growth potential and lowered the incidence of wilt. The single fruit weight of plants grafted onto the combined bottle gourd and pumpkin rootstock was intermediate between the weights obtained with the pumpkin rootstock and the bottle gourd rootstock. The plot yield of plants grafted onto a pumpkin rootstock was higher than that of plants grafted onto a bottle gourd rootstock. Fruit from plants grafted onto a pumpkin rootstock had a low soluble solids content and relatively high acidity, which could be improved by adding bottle gourd to the rootstock. The vitamin C content of fruit from the combined bottle gourd and pumpkin rootstock was higher than that of fruit from plants grafted with either bottle gourd or pumpkin alone. Subsequent analysis showed that the combined rootstock of bottle gourd and pumpkin had significant or extremely significant interaction effects on the stem diameter, number of leaves, single fruit weight, plot yield, and fruit vitamin C content of the grafted watermelon plants, which probably led to the higher related index values of some of the grafting combinations.
FIMO: A Challenge Formal Dataset for Automated Theorem Proving
We present FIMO, an innovative dataset comprising formal mathematical problem
statements sourced from the International Mathematical Olympiad (IMO)
Shortlisted Problems. Designed to facilitate advanced automated theorem proving
at the IMO level, FIMO is currently tailored for the Lean formal language. It
comprises 149 formal problem statements, accompanied by both informal problem
descriptions and their corresponding LaTeX-based informal proofs. Through
initial experiments involving GPT-4, our findings underscore the existing
limitations in current methodologies, indicating a substantial journey ahead
before achieving satisfactory IMO-level automated theorem proving outcomes.
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
Recent advances in natural language processing, primarily propelled by Large
Language Models (LLMs), have showcased their remarkable capabilities grounded
in in-context learning. A promising avenue for guiding LLMs in intricate
reasoning tasks involves the utilization of intermediate reasoning steps within
the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies
in the effective selection of exemplars for facilitating in-context learning.
In this study, we introduce a framework that leverages Dual Queries and
Low-rank approximation Re-ranking (DQ-LoRe) to automatically select exemplars
for in-context learning. Dual Queries first query the LLM to obtain LLM-generated
knowledge such as CoT, then query the retriever to obtain the final exemplars
using both the question and the knowledge. Moreover, for the second query, LoRe
employs dimensionality reduction techniques to refine exemplar selection,
ensuring close alignment with the input question's knowledge. Through extensive
experiments, we demonstrate that DQ-LoRe significantly outperforms prior
state-of-the-art methods in the automatic selection of exemplars for GPT-4,
enhancing performance from 92.5% to 94.2%. Our comprehensive analysis further
reveals that DQ-LoRe consistently outperforms retrieval-based approaches in
terms of both performance and adaptability, especially in scenarios
characterized by distribution shifts. DQ-LoRe pushes the boundary of in-context
learning and opens up new avenues for addressing complex reasoning challenges.
Our code is released at
https://github.com/AI4fun/DQ-LoRe. Comment: Accepted in ICLR 202
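The two-stage "dual query" flow in this abstract can be illustrated as follows. This is a rough sketch only: `generate_knowledge` and `retrieve` are hypothetical callables, the word-overlap retriever is a toy substitute for a trained retriever, and the low-rank re-ranking step is omitted entirely.

```python
from typing import Callable, List


def dual_query_select(question: str,
                      generate_knowledge: Callable[[str], str],
                      retrieve: Callable[[str, int], List[str]],
                      k: int = 2) -> List[str]:
    """First query an LLM for knowledge, then query a retriever with both."""
    knowledge = generate_knowledge(question)        # first query (LLM)
    return retrieve(f"{question} {knowledge}", k)   # second query (retriever)


# Toy exemplar pool and components, all hypothetical.
POOL = [
    "add 2 and 3 step by step: 2 + 3 = 5",
    "the capital of France is Paris",
    "multiply 4 by 5: 4 * 5 = 20",
]


def toy_llm(question: str) -> str:
    """Stand-in for LLM-generated chain-of-thought knowledge."""
    return "add the numbers step by step"


def toy_retrieve(query: str, k: int) -> List[str]:
    """Rank pool entries by the number of words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(POOL, key=lambda ex: -len(words & set(ex.lower().split())))
    return ranked[:k]


exemplars = dual_query_select("What is 7 plus 8?", toy_llm, toy_retrieve)
```

The point of the design is that the retriever sees the generated reasoning as well as the question, so exemplars with similar reasoning steps can rank higher than ones that only share surface wording with the question.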
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
Automated theorem proving (ATP) has become an appealing domain for exploring
the reasoning ability of the recent successful generative language models.
However, current ATP benchmarks mainly focus on symbolic inference, but rarely
involve the understanding of complex number combination reasoning. In this
work, we propose TRIGO, an ATP benchmark that not only requires a model to
reduce a trigonometric expression with step-by-step proofs but also evaluates a
generative LM's reasoning ability on formulas and its capability to manipulate,
group, and factor number terms. We gather trigonometric expressions and their
reduced forms from the web, annotate the simplification process manually, and
translate it into the Lean formal language system. We then automatically
generate additional examples from the annotated samples to expand the dataset.
Furthermore, we develop an automatic generator based on Lean-Gym to create
dataset splits of varying difficulties and distributions in order to thoroughly
analyze the model's generalization ability. Our extensive experiments show that
TRIGO poses a new challenge for advanced generative LMs, including
GPT-4, which is pre-trained on a considerable amount of open-source formal
theorem-proving language data, and provides a new tool to study generative
LMs' ability in both formal and mathematical reasoning. Comment: Accepted by EMNLP 2023. Code is available at
https://github.com/menik1126/TRIG
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Despite the success of large language models (LLMs), the task of theorem
proving still remains one of the hardest reasoning tasks that is far from being
fully solved. Prior methods using language models have demonstrated promising
results, but they still struggle to prove even middle school level theorems.
One common limitation of these methods is that they assume a fixed theorem
library during the whole theorem proving process. However, as we all know,
creating new useful theorems or even new theories is not only helpful but
crucial and necessary for advancing mathematics and proving harder and deeper
results. In this work, we present LEGO-Prover, which employs a growing skill
library containing verified lemmas as skills to augment the capability of LLMs
used in theorem proving. By constructing the proof modularly, LEGO-Prover
enables LLMs to utilize existing skills retrieved from the library and to
create new skills during the proving process. These skills are further evolved
(by prompting an LLM) to enrich the library on another scale. Modular and
reusable skills are constantly added to the library to enable tackling
increasingly intricate mathematical problems. Moreover, the learned library
further bridges the gap between human proofs and formal proofs by making it
easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass
rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%).
During the proving process, LEGO-Prover also manages to generate over 20,000
skills (theorems/lemmas) and adds them to the growing library. Our ablation
study indicates that these newly added skills are indeed helpful for proving
theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We
also release our code and all the generated skills.
MPC-based path following design for automated vehicles with rear wheel steering
Many recent studies have discussed path following control algorithms for automated vehicles using various control techniques. However, path following for automated vehicles with rear wheel steering (RWS) remains less investigated. In this study, we implemented nonlinear model predictive control (NMPC) on a passenger vehicle with active RWS for path following. The controller was compared to two other variations of NMPC in which the rear steering angle is either proportional to the front steering angle or fixed to zero. Simulation results suggested that the proposed controller outperforms the other two variations and the baseline controllers (Stanley and LQR) in terms of accuracy and responsiveness. Intelligent Vehicle
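To make the three rear-steering variants concrete, here is a small sketch of one Euler step of a kinematic bicycle model with both front and rear steering. The abstract does not say which vehicle model the NMPC uses, so the kinematic model, the axle distances, and the function name are all assumptions chosen for illustration.

```python
import math
from typing import Tuple

State = Tuple[float, float, float]  # (x [m], y [m], yaw [rad])


def step(state: State, v: float, delta_f: float, delta_r: float,
         lf: float = 1.2, lr: float = 1.6, dt: float = 0.01) -> State:
    """One Euler step of a kinematic bicycle with front and rear steering.

    lf and lr (distances from the centre of gravity to each axle) are
    illustrative values, not parameters from the study.
    """
    x, y, psi = state
    # Slip angle at the centre of gravity.
    beta = math.atan((lf * math.tan(delta_r) + lr * math.tan(delta_f))
                     / (lf + lr))
    x += v * math.cos(psi + beta) * dt
    y += v * math.sin(psi + beta) * dt
    psi += (v * math.cos(beta)
            * (math.tan(delta_f) - math.tan(delta_r)) / (lf + lr) * dt)
    return (x, y, psi)


start: State = (0.0, 0.0, 0.0)
front_only = step(start, 10.0, 0.05, 0.0)            # rear fixed to zero
proportional = step(start, 10.0, 0.05, 0.5 * 0.05)   # rear = k * front, k = 0.5
crab = step(start, 10.0, 0.05, 0.05)                 # rear equals front
```

In this model, steering the rear wheels in phase with the front reduces the yaw rate for the same front angle, and equal angles produce pure lateral translation with no yaw. An active RWS controller is free to pick the rear angle independently at each step instead of tying it to the front.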