Enabling Large Language Models to Generate Text with Citations
Large language models (LLMs) have emerged as a widely-used tool for
information seeking, but their generated outputs are prone to hallucination. In
this work, our aim is to allow LLMs to generate text with citations, improving
their factual correctness and verifiability. Existing work mainly relies on
commercial search engines and human evaluation, making it challenging to
reproduce and compare different modeling approaches. We propose ALCE, the first
benchmark for Automatic LLMs' Citation Evaluation. ALCE collects a diverse set
of questions and retrieval corpora and requires building end-to-end systems to
retrieve supporting evidence and generate answers with citations. We develop
automatic metrics along three dimensions -- fluency, correctness, and citation
quality -- and demonstrate their strong correlation with human judgements. Our
experiments with state-of-the-art LLMs and novel prompting strategies show that
current systems have considerable room for improvement. For example, on the
ELI5 dataset, even the best models lack complete citation support 50% of the
time. Our analyses further highlight promising future directions, including
developing better retrievers, advancing long-context LLMs, and improving the
ability to synthesize information from multiple sources.
Comment: Accepted by EMNLP 2023. Code and data are available at
https://github.com/princeton-nlp/ALCE
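
As a concrete illustration of the citation-quality dimension, the sketch below scores citation recall by checking whether each statement is entailed by the concatenation of its cited passages under an off-the-shelf NLI model. The checkpoint, data schema, and function names are illustrative assumptions, not the ALCE codebase.

    # Hypothetical citation-recall scorer in the spirit of ALCE; the model
    # choice and data layout are assumptions, not the benchmark's own code.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    NAME = "roberta-large-mnli"  # any NLI-capable checkpoint would do
    tok = AutoTokenizer.from_pretrained(NAME)
    model = AutoModelForSequenceClassification.from_pretrained(NAME).eval()

    def entails(premise: str, hypothesis: str) -> bool:
        """True if the NLI model predicts ENTAILMENT for (premise, hypothesis)."""
        inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        return model.config.id2label[int(logits.argmax())].upper() == "ENTAILMENT"

    def citation_recall(statements: list[dict]) -> float:
        """statements: [{"text": str, "citations": [passage_str, ...]}, ...]"""
        supported = sum(
            entails(" ".join(s["citations"]), s["text"])
            for s in statements
            if s["citations"]  # uncited statements count as unsupported
        )
        return supported / max(len(statements), 1)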
DAMM: Directionality-Aware Mixture Model Parallel Sampling for Efficient Dynamical System Learning
The Linear Parameter Varying Dynamical System (LPV-DS) is a promising
framework for learning stable time-invariant motion policies in robot control.
By employing statistical modeling and semi-definite optimization, LPV-DS
encodes complex motions via non-linear DS, ensuring the robustness and
stability of the system. However, the current LPV-DS scheme faces challenges in
accurately interpreting trajectory data while maintaining both model and
computational efficiency. To address these limitations, we propose the
Directionality-aware Mixture Model (DAMM), a new statistical model that
leverages the Riemannian metric on the $d$-dimensional sphere $\mathbb{S}^d$, and
efficiently incorporates non-Euclidean directional information with position.
Additionally, we introduce a hybrid Markov chain Monte Carlo method that
combines the Gibbs Sampling and the Split/Merge Proposal, facilitating parallel
computation and enabling faster inference for near real-time learning
performance. Through extensive empirical validation, we demonstrate that the
improved LPV-DS framework with DAMM produces physically meaningful
representations of the trajectory data and improves the performance of the
generated DS, while learning significantly faster than its previous iterations.
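
To make the directional ingredient concrete, the sketch below augments trajectory positions with unit velocity directions, which live on the unit sphere, and compares them by geodesic (great-circle) distance, i.e., the Riemannian metric the abstract refers to. It illustrates the geometry only and is not the DAMM implementation; all names are assumptions.

    # Geometry-only sketch: positions -> unit headings on the sphere, compared
    # by arc-length distance. Function names are illustrative assumptions.
    import numpy as np

    def unit_directions(trajectory: np.ndarray) -> np.ndarray:
        """(T, d) positions -> (T-1, d) unit velocity directions."""
        vel = np.diff(trajectory, axis=0)
        return vel / np.linalg.norm(vel, axis=1, keepdims=True)

    def geodesic_distance(u: np.ndarray, v: np.ndarray) -> float:
        """Arc-length (Riemannian) distance between two unit vectors."""
        return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

    # Two passes through the same point in opposite directions coincide in
    # position space but are maximally separated (distance pi) directionally.
    traj = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
    d0, d1 = unit_directions(traj)
    print(geodesic_distance(d0, d1))  # ~3.1416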
Free electron emission in vacuum assisted by photonic time crystals
Cerenkov radiation and the Smith-Purcell effect hold that free electrons
emit radiation only inside dielectrics, when the particle velocity exceeds
the speed of light in the medium, or in vacuum in the immediate vicinity of
periodic gratings. We demonstrate that free
electrons in a vacuum can also emit highly directional monochromatic waves when
they are in close proximity to a medium that is periodically modulated
temporally, suggesting the existence of a temporal Smith-Purcell effect. The
momentum band gaps of time-varying media, such as photonic time crystals
(PTCs), create new pathways for the injection of external energy, allowing the
frequency, intensity, and spatial distribution of the electromagnetic fields to
be controlled. Moreover, the PTC substrate enables the conversion of localized
evanescent fields into amplified, highly directional propagating plane waves
that are only sensitive to the velocity of particles and the modulation
frequency, which allows us to observe and utilize Cerenkov-like radiation in
free space. Our work exhibits significant opportunities for the utilization of
time-varying structures in various fields, including particle identification,
ultraweak signal detection, and improved radiation source design.
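
For context, the two classical emission conditions alluded to in the opening sentence are standard textbook results (not this paper's new derivation): the Cerenkov angle $\theta_c$ for a particle of normalized velocity $\beta = v/c$ in a medium of refractive index $n$, and the Smith-Purcell wavelength $\lambda_m$ radiated at angle $\theta$ into order $m$ by a grating of period $d$:

    \[
      \cos\theta_c = \frac{1}{n\beta} \quad (n\beta > 1), \qquad
      \lambda_m = \frac{d}{m}\left(\frac{1}{\beta} - \cos\theta\right).
    \]

Loosely speaking, the temporal effect described here replaces the spatial period $d$ with a temporal modulation period, so the emitted spectrum is set by the particle velocity and the modulation frequency rather than by a physical grating.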
Tight Collision Probability for UAV Motion Planning in Uncertain Environment
Operating unmanned aerial vehicles (UAVs) in complex environments that
feature dynamic obstacles and external disturbances poses significant
challenges, primarily due to the inherent uncertainty in such scenarios.
Additionally, inaccurate robot localization and modeling errors further
exacerbate these challenges. Recent research on UAV motion planning in static
environments cannot cope with rapidly changing surroundings, resulting in
trajectories that may be infeasible. Moreover, previous
approaches that have addressed dynamic obstacles or external disturbances in
isolation are insufficient to handle the complexities of such environments.
This paper proposes a reliable motion planning framework for UAVs, integrating
various uncertainties into a chance constraint that characterizes the
uncertainty in a probabilistic manner. The chance constraint provides a
probabilistic safety certificate by calculating the collision probability
between the robot's Gaussian-distributed forward reachable set and states of
obstacles. To reduce the conservatism of the planned trajectory, we propose a
tight upper bound of the collision probability and evaluate it both exactly and
approximately. The approximated solution is used to generate motion primitives
as a reference trajectory, while the exact solution is leveraged to iteratively
optimize the trajectory for better results. Our method is thoroughly tested in
simulation and real-world experiments, verifying its reliability and
effectiveness in uncertain environments.
Comment: Accepted by IROS 2023.
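
The chance constraint hinges on evaluating the probability that a Gaussian-distributed robot state collides with an obstacle. As a minimal point of reference (a brute-force Monte Carlo estimate, not the authors' tight analytical upper bound), that probability for a spherical obstacle can be computed as follows; all names are illustrative assumptions.

    # Monte Carlo baseline for P(||x - o|| <= r) with x ~ N(mu, Sigma); the
    # paper's contribution is a tight closed-form upper bound on this quantity.
    import numpy as np

    rng = np.random.default_rng(0)

    def collision_probability_mc(mu, Sigma, obstacle, radius, n=100_000):
        """Estimate the probability that x ~ N(mu, Sigma) lies within
        `radius` of `obstacle` (spherical obstacle model)."""
        samples = rng.multivariate_normal(mu, Sigma, size=n)
        dists = np.linalg.norm(samples - obstacle, axis=1)
        return float(np.mean(dists <= radius))

    # Example: reachable-set mean 1 m from an obstacle of radius 0.5 m, with
    # 0.2 m standard deviation per axis; planning enforces p <= delta.
    mu = np.array([1.0, 0.0, 0.0])
    Sigma = 0.04 * np.eye(3)
    print(collision_probability_mc(mu, Sigma, np.zeros(3), 0.5))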
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged
moderate-sized large language models (LLMs) highlights the potential of
building smaller yet powerful LLMs. Nevertheless, the cost of training such
models from scratch on trillions of tokens remains high. In this work, we study
structured pruning as an effective means to develop smaller LLMs from
pre-trained, larger models. Our approach employs two key techniques: (1)
targeted structured pruning, which prunes a larger model to a specified target
shape by removing layers, heads, and intermediate and hidden dimensions in an
end-to-end manner, and (2) dynamic batch loading, which dynamically updates the
composition of sampled data in each training batch based on varying losses
across different domains. We demonstrate the efficacy of our approach by
presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B
and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art
open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA
models, on a wide range of downstream and instruction tuning evaluations, while
requiring only 3% of compute compared to training such models from scratch.
This work provides compelling evidence that leveraging existing LLMs with
structured pruning is a far more cost-effective approach for building smaller
LLMs.
Comment: The code and models are available at
https://github.com/princeton-nlp/LLM-Shearing
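
As a sketch of the dynamic-batch-loading idea, the snippet below shifts per-domain sampling proportions toward domains whose current loss exceeds a reference loss, via an exponentiated multiplicative update. The specific rule, step size, and domain names are assumptions in the spirit of the abstract, not the released implementation.

    # Hypothetical dynamic batch loading step: boost sampling of domains whose
    # loss lags its reference; the exact update rule is an assumption.
    import numpy as np

    def update_domain_weights(weights, current_loss, reference_loss, lr=1.0):
        """All arrays have shape (num_domains,); returns new proportions."""
        excess = np.maximum(current_loss - reference_loss, 0.0)  # per-domain gap
        new_w = weights * np.exp(lr * excess)                    # multiplicative boost
        return new_w / new_w.sum()                               # renormalize

    # Three domains (e.g., web, code, books): web lags its reference most,
    # so its share of each training batch grows.
    w = np.array([0.6, 0.2, 0.2])
    cur = np.array([2.9, 1.8, 2.4])
    ref = np.array([2.7, 1.9, 2.3])
    print(update_domain_weights(w, cur, ref))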