44 research outputs found

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

Author: Bansal Mohit
Mei Hongyuan
Walter Matthew R.
Publication date: 17/12/2015

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We analyze our model through a series of ablations that elucidate the contributions of the primary components of our model. Comment: To appear at AAAI 2016 (and an extended version of a NIPS 2015 Multimodal Machine Learning workshop paper).

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Neural Probabilistic Methods for Event Sequence Modeling

Author: Mei Hongyuan
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 16/09/2021

This thesis focuses on modeling event sequences, namely, sequences of discrete events in continuous time. We build a family of generative probabilistic models that is able to reason about what events will happen in the future and when, given the history of previous events. Under our models, each event—as it happens—is allowed to update the future intensities of multiple event types, and the intensity of each event type—as nothing happens—is allowed to evolve with time along a trajectory. We use neural networks to allow the “updates” and “trajectories” to be complex and realistic. In the purely neural version of our model, all future event intensities are conditioned on the hidden state of a continuous-time LSTM, which has consumed every past event as it happened. To exploit domain-specific knowledge of how an event might only affect a few—but not all—future event intensities, we propose to introduce domain-specific structure into the model. We design a modeling language, by which a domain expert can write down the rules of a temporal deductive database. The database tracks facts over time; the rules deduce facts from other facts and from past events. Each fact has a time-varying state, computed by a neural network whose topology is determined by the fact’s provenance, including its experience of the past events that have contributed to deducing it. The possible event types at any time are given by special facts, whose intensities are neurally modeled alongside their states. We develop efficient methods for training our models, and doing inference with them. Applying the general principle of noise-contrastive estimation, we work out a stochastic training objective that is less expensive to optimize than the log-likelihood, which people typically maximize for parameter estimation. As in the discrete-time case that inspired us, the parameters that maximize our objective will provably maximize the log-likelihood as well. 
For the scenarios where we are given incomplete sequences, we propose particle smoothing—a form of sequential importance sampling—to impute the missing events. This thesis includes extensive experiments, demonstrating the effectiveness of our models and algorithms. On many synthetic and real-world datasets, on held-out sequences, we show empirically: (1) our purely neural model achieves competitive likelihood and predictive accuracy; (2) our neural-symbolic model improves prediction by encoding appropriate domain knowledge in the architecture; (3) for models to achieve the same level of log-likelihood, our noise-contrastive estimation needs considerably fewer function evaluations and less wall-clock time than maximum likelihood estimation; (4) our particle smoothing method is effective at inferring the ground-truth unobserved events. In this thesis, I will also discuss a few future research directions, including embedding our models within a reinforcement learner to discover causal structure and learn an intervention policy

JScholarship

Explicit Planning Helps Language Models in Logical Reasoning

Author: Mei Hongyuan
Wang Kangrui
Yu Mo
Zhao Hongyu
Publication date: 27/03/2023

Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose a novel system that uses language models to perform multi-step logical reasoning. Our system incorporates explicit planning into its inference procedure and is thus able to make more informed reasoning decisions at each step by looking ahead to their future effects. In our experiments, our full system significantly outperforms other competing systems. On a multiple-choice question answering task, our system performs competitively compared to GPT-3-davinci despite having only around 1.5B parameters. We conduct several ablation studies to demonstrate that explicit planning plays a crucial role in the system's performance.

arXiv.org e-Print Archive

HYPRO: A Hybridly Normalized Probabilistic Model for Long-Horizon Prediction of Event Sequences

Author: Mei Hongyuan
Shi Xiaoming
Xue Siqiao
Zhang James Y
Publication date: 04/10/2022

In this paper, we tackle the important yet under-investigated problem of making long-horizon predictions of event sequences. Existing state-of-the-art models do not perform well at this task due to their autoregressive structure. We propose HYPRO, a hybridly normalized probabilistic model that naturally fits this task: its first part is an autoregressive base model that learns to propose predictions; its second part is an energy function that learns to reweight the proposals such that more realistic predictions end up with higher probabilities. We also propose efficient training and inference algorithms for this model. Experiments on multiple real-world datasets demonstrate that our proposed HYPRO model can significantly outperform previous models at making long-horizon predictions of future events. We also conduct a range of ablation studies to investigate the effectiveness of each component of our proposed methods. Comment: NeurIPS 2022 camera-ready.

arXiv.org e-Print Archive