Enabling Large Language Models to Generate Text with Citations
Large language models (LLMs) have emerged as a widely-used tool for
information seeking, but their generated outputs are prone to hallucination. In
this work, our aim is to allow LLMs to generate text with citations, improving
their factual correctness and verifiability. Existing work mainly relies on
commercial search engines and human evaluation, making it challenging to
reproduce and compare different modeling approaches. We propose ALCE, the first
benchmark for Automatic LLMs' Citation Evaluation. ALCE collects a diverse set
of questions and retrieval corpora and requires building end-to-end systems to
retrieve supporting evidence and generate answers with citations. We develop
automatic metrics along three dimensions -- fluency, correctness, and citation
quality -- and demonstrate their strong correlation with human judgements. Our
experiments with state-of-the-art LLMs and novel prompting strategies show that
current systems have considerable room for improvement. For example, on the
ELI5 dataset, even the best models lack complete citation support 50% of the
time. Our analyses further highlight promising future directions, including
developing better retrievers, advancing long-context LLMs, and improving the
ability to synthesize information from multiple sources.
Comment: Accepted by EMNLP 2023. Code and data are available at
https://github.com/princeton-nlp/ALCE
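
As a concrete illustration of the citation-quality dimension, the sketch below scores citation recall by checking whether each statement is entailed by the concatenation of its cited passages under an off-the-shelf NLI model. The checkpoint, data schema, and function names are illustrative assumptions, not the ALCE codebase.

    # Hypothetical citation-recall scorer in the spirit of ALCE; the model
    # choice and data layout are assumptions, not the benchmark's own code.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    NAME = "roberta-large-mnli"  # any NLI-capable checkpoint would do
    tok = AutoTokenizer.from_pretrained(NAME)
    model = AutoModelForSequenceClassification.from_pretrained(NAME).eval()

    def entails(premise: str, hypothesis: str) -> bool:
        """True if the NLI model predicts ENTAILMENT for (premise, hypothesis)."""
        inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        return model.config.id2label[int(logits.argmax())].upper() == "ENTAILMENT"

    def citation_recall(statements: list[dict]) -> float:
        """statements: [{"text": str, "citations": [passage_str, ...]}, ...]"""
        supported = sum(
            entails(" ".join(s["citations"]), s["text"])
            for s in statements
            if s["citations"]  # uncited statements count as unsupported
        )
        return supported / max(len(statements), 1)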
DAMM: Directionality-Aware Mixture Model Parallel Sampling for Efficient Dynamical System Learning
The Linear Parameter Varying Dynamical System (LPV-DS) is a promising
framework for learning stable time-invariant motion policies in robot control.
By employing statistical modeling and semi-definite optimization, LPV-DS
encodes complex motions via non-linear DS, ensuring the robustness and
stability of the system. However, the current LPV-DS scheme faces challenges in
accurately interpreting trajectory data while maintaining both model and
computational efficiency. To address these limitations, we propose the
Directionality-aware Mixture Model (DAMM), a new statistical model that
leverages the Riemannian metric on the $d$-dimensional sphere $\mathbb{S}^d$, and
efficiently incorporates non-Euclidean directional information with position.
Additionally, we introduce a hybrid Markov chain Monte Carlo method that
combines the Gibbs Sampling and the Split/Merge Proposal, facilitating parallel
computation and enabling faster inference for near real-time learning
performance. Through extensive empirical validation, we demonstrate that the
improved LPV-DS framework with DAMM produces physically meaningful
representations of the trajectory data and improves the performance of the
generated DS, while learning significantly faster than its previous iterations.
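
To make the directional ingredient concrete, the sketch below augments trajectory positions with unit velocity directions, which live on the unit sphere, and compares them by geodesic (great-circle) distance, i.e., the Riemannian metric the abstract refers to. It illustrates the geometry only and is not the DAMM implementation; all names are assumptions.

    # Geometry-only sketch: positions -> unit headings on the sphere, compared
    # by arc-length distance. Function names are illustrative assumptions.
    import numpy as np

    def unit_directions(trajectory: np.ndarray) -> np.ndarray:
        """(T, d) positions -> (T-1, d) unit velocity directions."""
        vel = np.diff(trajectory, axis=0)
        return vel / np.linalg.norm(vel, axis=1, keepdims=True)

    def geodesic_distance(u: np.ndarray, v: np.ndarray) -> float:
        """Arc-length (Riemannian) distance between two unit vectors."""
        return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

    # Two passes through the same point in opposite directions coincide in
    # position space but are maximally separated (distance pi) directionally.
    traj = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
    d0, d1 = unit_directions(traj)
    print(geodesic_distance(d0, d1))  # ~3.1416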
Free electron emission in vacuum assisted by photonic time crystals
Cerenkov radiation and the Smith-Purcell effect hold that free electrons
emit radiation only inside dielectrics, when the particle velocity exceeds
the speed of light in the medium, or in vacuum in the immediate vicinity of
periodic gratings. We demonstrate that free
electrons in a vacuum can also emit highly directional monochromatic waves when
they are in close proximity to a medium that is periodically modulated
temporally, suggesting the existence of a temporal Smith-Purcell effect. The
momentum band gaps of time-varying media, such as photonic time crystals
(PTCs), create new pathways for the injection of external energy, allowing the
frequency, intensity, and spatial distribution of the electromagnetic fields to
be controlled. Moreover, the PTC substrate enables the conversion of localized
evanescent fields into amplified, highly directional propagating plane waves
that are only sensitive to the velocity of particles and the modulation
frequency, which allows us to observe and utilize Cerenkov-like radiation in
free space. Our work exhibits significant opportunities for the utilization of
time-varying structures in various fields, including particle identification,
ultraweak signal detection, and improved radiation source design.
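
For context, the two classical emission conditions alluded to in the opening sentence are standard textbook results (not this paper's new derivation): the Cerenkov angle $\theta_c$ for a particle of normalized velocity $\beta = v/c$ in a medium of refractive index $n$, and the Smith-Purcell wavelength $\lambda_m$ radiated at angle $\theta$ into order $m$ by a grating of period $d$:

    \[
      \cos\theta_c = \frac{1}{n\beta} \quad (n\beta > 1), \qquad
      \lambda_m = \frac{d}{m}\left(\frac{1}{\beta} - \cos\theta\right).
    \]

Loosely speaking, the temporal effect described here replaces the spatial period $d$ with a temporal modulation period, so the emitted spectrum is set by the particle velocity and the modulation frequency rather than by a physical grating.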
Tight Collision Probability for UAV Motion Planning in Uncertain Environment
Operating unmanned aerial vehicles (UAVs) in complex environments that
feature dynamic obstacles and external disturbances poses significant
challenges, primarily due to the inherent uncertainty in such scenarios.
Additionally, inaccurate robot localization and modeling errors further
exacerbate these challenges. Recent research on UAV motion planning in static
environments cannot cope with rapidly changing surroundings, resulting in
trajectories that may be infeasible. Moreover, previous
approaches that have addressed dynamic obstacles or external disturbances in
isolation are insufficient to handle the complexities of such environments.
This paper proposes a reliable motion planning framework for UAVs, integrating
various uncertainties into a chance constraint that characterizes the
uncertainty in a probabilistic manner. The chance constraint provides a
probabilistic safety certificate by calculating the collision probability
between the robot's Gaussian-distributed forward reachable set and states of
obstacles. To reduce the conservatism of the planned trajectory, we propose a
tight upper bound of the collision probability and evaluate it both exactly and
approximately. The approximated solution is used to generate motion primitives
as a reference trajectory, while the exact solution is leveraged to iteratively
optimize the trajectory for better results. Our method is thoroughly tested in
simulation and real-world experiments, verifying its reliability and
effectiveness in uncertain environments.
Comment: Accepted by IROS 2023.
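
The chance constraint hinges on evaluating the probability that a Gaussian-distributed robot state collides with an obstacle. As a minimal point of reference (a brute-force Monte Carlo estimate, not the authors' tight analytical upper bound), that probability for a spherical obstacle can be computed as follows; all names are illustrative assumptions.

    # Monte Carlo baseline for P(||x - o|| <= r) with x ~ N(mu, Sigma); the
    # paper's contribution is a tight closed-form upper bound on this quantity.
    import numpy as np

    rng = np.random.default_rng(0)

    def collision_probability_mc(mu, Sigma, obstacle, radius, n=100_000):
        """Estimate the probability that x ~ N(mu, Sigma) lies within
        `radius` of `obstacle` (spherical obstacle model)."""
        samples = rng.multivariate_normal(mu, Sigma, size=n)
        dists = np.linalg.norm(samples - obstacle, axis=1)
        return float(np.mean(dists <= radius))

    # Example: reachable-set mean 1 m from an obstacle of radius 0.5 m, with
    # 0.2 m standard deviation per axis; planning enforces p <= delta.
    mu = np.array([1.0, 0.0, 0.0])
    Sigma = 0.04 * np.eye(3)
    print(collision_probability_mc(mu, Sigma, np.zeros(3), 0.5))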
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged
moderate-sized large language models (LLMs) highlights the potential of
building smaller yet powerful LLMs. Nevertheless, the cost of training such
models from scratch on trillions of tokens remains high. In this work, we study
structured pruning as an effective means to develop smaller LLMs from
pre-trained, larger models. Our approach employs two key techniques: (1)
targeted structured pruning, which prunes a larger model to a specified target
shape by removing layers, heads, and intermediate and hidden dimensions in an
end-to-end manner, and (2) dynamic batch loading, which dynamically updates the
composition of sampled data in each training batch based on varying losses
across different domains. We demonstrate the efficacy of our approach by
presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B
and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art
open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA
models, on a wide range of downstream and instruction tuning evaluations, while
requiring only 3% of compute compared to training such models from scratch.
This work provides compelling evidence that leveraging existing LLMs with
structured pruning is a far more cost-effective approach for building smaller
LLMs.
Comment: The code and models are available at
https://github.com/princeton-nlp/LLM-Shearing
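
As a sketch of the dynamic-batch-loading idea, the snippet below shifts per-domain sampling proportions toward domains whose current loss exceeds a reference loss, via an exponentiated multiplicative update. The specific rule, step size, and domain names are assumptions in the spirit of the abstract, not the released implementation.

    # Hypothetical dynamic batch loading step: boost sampling of domains whose
    # loss lags its reference; the exact update rule is an assumption.
    import numpy as np

    def update_domain_weights(weights, current_loss, reference_loss, lr=1.0):
        """All arrays have shape (num_domains,); returns new proportions."""
        excess = np.maximum(current_loss - reference_loss, 0.0)  # per-domain gap
        new_w = weights * np.exp(lr * excess)                    # multiplicative boost
        return new_w / new_w.sum()                               # renormalize

    # Three domains (e.g., web, code, books): web lags its reference most,
    # so its share of each training batch grows.
    w = np.array([0.6, 0.2, 0.2])
    cur = np.array([2.9, 1.8, 2.4])
    ref = np.array([2.7, 1.9, 2.3])
    print(update_domain_weights(w, cur, ref))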