Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
Robustness has been extensively studied in reinforcement learning (RL) to
handle various forms of uncertainty such as random perturbations, rare events,
and malicious attacks. In this work, we consider one critical type of
robustness against spurious correlation, where different portions of the state
are not causally related but are correlated through unobserved confounders. These spurious
correlations are ubiquitous in real-world tasks, for instance, a self-driving
car usually observes heavy traffic in the daytime and light traffic at night
due to unobservable human activity. A model that learns such useless or even
harmful correlation could catastrophically fail when the confounder in the test
case deviates from the training one. Although well motivated, enabling robustness
against spurious correlation poses significant challenges since the uncertainty
set, shaped by the unobserved confounder and causal structure, is difficult to
characterize and identify. Existing robust algorithms that assume simple and
unstructured uncertainty sets are therefore inadequate to address this
challenge. To solve this issue, we propose Robust State-Confounded Markov
Decision Processes (RSC-MDPs) and theoretically demonstrate their superiority in
avoiding learning spurious correlations compared with other robust RL
counterparts. We also design an empirical algorithm to learn the robust optimal
policy for RSC-MDPs, which outperforms all baselines in eight realistic
self-driving and manipulation tasks.
Comment: Accepted to NeurIPS 202
Seasonal variability does not impact in vitro fertilization success
Peer reviewed. Publisher PD
Solving Math Word Problems with Reexamination
Math word problem (MWP) solving aims to understand the descriptive math
problem and calculate the result, for which previous efforts are mostly devoted
to upgrading different technical modules. This paper brings a different
perspective of a reexamination process during training by introducing a
pseudo-dual task to enhance MWP solving. We propose a pseudo-dual (PseDual)
learning scheme to model such a process, which is model-agnostic and thus can be
adapted to any existing MWP solver. The pseudo-dual task is specifically
defined as filling the numbers in the expression back into the original word
problem with numbers masked. To facilitate the effective joint learning of the
two tasks, we further design a scheduled fusion strategy for the number
infilling task, which smoothly switches the input from the ground-truth math
expressions to the predicted ones. Empirical studies show that our pseudo-dual
learning scheme is effective when applied to several representative MWP
solvers. The codes and trained models are available at:
https://github.com/steven640pixel/PsedualMWP.
Comment: To appear at NeurIPS 2023 Workshop on MATH-A
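The number-infilling task and the scheduled switch from ground-truth to predicted expressions described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the `[NUM]` mask token, the function names, and the linear schedule are all assumptions made for illustration, and the model that would actually predict the numbers is left out.

```python
import re

# Assumption: numbers are plain integers or decimals such as "3" or "2.5".
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def mask_numbers(problem, mask_token="[NUM]"):
    """Replace every number with mask_token; return the masked text and the numbers in order."""
    numbers = NUM_PATTERN.findall(problem)
    masked = NUM_PATTERN.sub(mask_token, problem)
    return masked, numbers


def fill_numbers(masked, numbers, mask_token="[NUM]"):
    """Put the numbers back into the masked slots, left to right."""
    parts = masked.split(mask_token)
    if len(parts) - 1 != len(numbers):
        raise ValueError("number of slots and number of values differ")
    out = parts[0]
    for num, tail in zip(numbers, parts[1:]):
        out += num + tail
    return out


def predicted_ratio(step, total_steps):
    """Hypothetical linear schedule: share of inputs taken from predicted expressions."""
    return min(1.0, max(0.0, step / total_steps))


problem = "Tom has 3 apples and buys 2.5 kg more."
masked, numbers = mask_numbers(problem)
restored = fill_numbers(masked, numbers)
```

In a training loop, `predicted_ratio` could decide how often the infilling task sees the solver's predicted expression instead of the ground truth, starting at 0 and rising to 1.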
LLaSM: Large Language and Speech Model
Multi-modal large language models have garnered significant interest
recently. However, most existing work focuses on vision-language multi-modal
models, which provide strong capabilities in following vision-and-language
instructions. We argue that speech is also an important modality through which
humans interact with the world. Hence, it is crucial for a general-purpose
assistant to be able to follow multi-modal speech-and-language instructions. In
this work, we propose Large Language and Speech Model (LLaSM). LLaSM is an
end-to-end trained large multi-modal speech-language model with cross-modal
conversational abilities, capable of following speech-and-language
instructions. Our early experiments show that LLaSM demonstrates a more
convenient and natural way for humans to interact with artificial intelligence.
We also release a large Speech Instruction Following dataset,
LLaSM-Audio-Instructions. Code and demo are available at
https://github.com/LinkSoul-AI/LLaSM and
https://huggingface.co/spaces/LinkSoul/LLaSM. The LLaSM-Audio-Instructions
dataset is available at
https://huggingface.co/datasets/LinkSoul/LLaSM-Audio-Instructions
Non-Autoregressive Sentence Ordering
Existing sentence ordering approaches generally employ encoder-decoder
frameworks with the pointer net to recover the coherence by recurrently
predicting each sentence step-by-step. Such an autoregressive manner only
leverages unilateral dependencies during decoding and cannot fully explore the
semantic dependency between sentences for ordering. To overcome these
limitations, in this paper, we propose a novel Non-Autoregressive Ordering
Network, dubbed NAON, which explores bilateral dependencies between
sentences and predicts the sentence for each position in parallel. We claim
that the non-autoregressive manner is not just applicable but also particularly
suitable to the sentence ordering task because of two peculiar characteristics
of the task: 1) each generation target has a deterministic length, and 2) the
sentences and positions should match exclusively. Furthermore, to address the
repetition issue of the naive non-autoregressive Transformer, we introduce an
exclusive loss to constrain the exclusiveness between positions and sentences.
To verify the effectiveness of the proposed model, we conduct extensive
experiments on several commonly used datasets and the experimental results show
that our method outperforms all the autoregressive approaches and yields
competitive performance compared with the state of the art. The codes are
available at:
https://github.com/steven640pixel/nonautoregressive-sentence-ordering.
Comment: Accepted at Findings of EMNLP202
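The repetition issue mentioned in the abstract above arises because predicting every position in parallel lets two positions pick the same sentence. The sketch below illustrates the idea with a simple penalty on a position-by-sentence probability matrix. The penalty shown here is a hypothetical stand-in chosen for illustration and is not the exclusive loss defined in the paper; the function names and example matrices are likewise assumptions.

```python
def argmax_order(probs):
    """Pick the most likely sentence for each position independently (parallel decoding)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def exclusiveness_penalty(probs):
    """Sum over sentences of (probability mass across all positions - 1) squared.

    probs[p][s] is the probability that position p holds sentence s.
    The penalty is zero when every sentence receives a total mass of exactly one.
    """
    n = len(probs)
    penalty = 0.0
    for s in range(n):
        mass = sum(probs[p][s] for p in range(n))
        penalty += (mass - 1.0) ** 2
    return penalty


# Positions 0 and 1 both prefer sentence 0, so sentence 0 is repeated.
repeated = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
# Each position prefers a different sentence.
exclusive = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

repeated_order = argmax_order(repeated)
exclusive_order = argmax_order(exclusive)
```

Adding a term of this kind to the training objective would push the model toward assignments in which sentences and positions match one to one.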
CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller
We present CAJun, a novel hierarchical learning and control framework that
enables legged robots to jump continuously with adaptive jumping distances.
CAJun consists of a high-level centroidal policy and a low-level leg
controller. In particular, we use reinforcement learning (RL) to train the
centroidal policy, which specifies the gait timing, base velocity, and swing
foot position for the leg controller. The leg controller optimizes motor
commands for the swing and stance legs according to the gait timing to track
the swing foot target and base velocity commands using optimal control.
Additionally, we reformulate the stance leg optimizer in the leg controller to
speed up policy training by an order of magnitude. By combining RL
with optimal control methods, our system achieves the versatility of learning
while enjoying the robustness of control methods, making it easily transferable
to real robots. We show that after 20 minutes of training on a single GPU,
CAJun can achieve continuous, long jumps with adaptive distances on a Go1 robot
with small sim-to-real gaps. Moreover, the robot can jump across gaps with a
maximum width of 70 cm, which is over 40% wider than existing methods.
Comment: Please visit https://yxyang.github.io/cajun/ for additional result
Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution
Click-through rate (CTR) prediction is an essential task in industrial
applications such as video recommendation. Recently, deep learning models have
been proposed to learn the representation of users' overall interests, while
ignoring the fact that interests may dynamically change over time. We argue
that it is necessary to consider the continuous-time information in CTR models
to track trends in user interest from rich historical behaviors. In this paper, we
propose a novel Deep Time-Stream framework (DTS), which introduces time
information through an ordinary differential equation (ODE). DTS continuously
models the evolution of interests using a neural network, and thus is able to
tackle the challenge of dynamically representing users' interests based on
their historical behaviors. In addition, our framework can be seamlessly
applied to any existing deep CTR models by leveraging the additional
Time-Stream Module, while no changes are made to the original CTR models.
Experiments on a public dataset as well as a real industry dataset with billions of
samples demonstrate the effectiveness of the proposed approach, which achieves
superior performance compared with existing methods.
Comment: 8 pages. arXiv admin note: text overlap with arXiv:1809.03672 by
other author
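The DTS abstract above describes modelling interest evolution in continuous time with an ODE whose dynamics come from a neural network. A minimal sketch of that idea follows, using forward Euler integration of dh/dt = f(h, t). The integrator choice, the function names, and the exponential-decay dynamics standing in for a learned network are all assumptions for illustration, not details from the paper.

```python
import math


def euler_evolve(h0, t0, t1, deriv, steps=100):
    """Integrate dh/dt = deriv(h, t) from t0 to t1 with forward Euler steps."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = deriv(h, t)
        h = [hi + dt * di for hi, di in zip(h, dh)]
        t += dt
    return h


def decay(h, t):
    """Stand-in for a learned network: each interest decays at rate 0.5."""
    return [-0.5 * hi for hi in h]


# Evolve a two-dimensional interest state over two time units.
state = euler_evolve([1.0, 2.0], 0.0, 2.0, decay, steps=1000)
```

Because the state is defined at any time t, behaviors observed at irregular intervals can be handled by integrating over the actual elapsed time between them.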
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure
Existing MWP solvers employ a sequence or binary tree to represent the solution
expression and decode it from the given problem description. However, such
structures fail to handle the variants that can be derived via mathematical
manipulation, e.g., two algebraically equivalent expressions can both be
valid solutions for the same problem but formulated as different
expression sequences or trees. The multiple solution variants depicting
different possible solving procedures for the same input problem would raise
two issues: 1) making it hard for the model to learn the mapping function
between the input and output spaces effectively, and 2) wrongly marking a valid
expression variant as wrong during evaluation. To address these
issues, we introduce a unified tree structure to represent a solution expression,
where the elements are permutable and identical for all the expression
variants. We propose a novel non-autoregressive solver, named MWP-NAS,
to parse the problem and deduce the solution expression based on the unified
tree. For evaluating the possible expression variants, we design a path-based
metric to evaluate the partial accuracy of expressions of a unified tree. The
results from extensive experiments conducted on Math23K and MAWPS demonstrate
the effectiveness of our proposed MWP-NAS. The codes and checkpoints are
available at: https://github.com/mengqunhan/MWP-NAS.
Comment: Accepted at EMNLP202
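The problem of equivalent expression variants raised in the abstract above can be illustrated with a small canonicalisation sketch. It covers only one kind of variant, reordering the operands of commutative operators, and it is not the unified tree structure proposed in the paper. The nested-tuple representation `(op, operand, ...)` and the function name are assumptions made for this example.

```python
COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Return a canonical form in which commutative operands are flattened and sorted.

    expr is either a leaf (a string such as "a") or a tuple (op, operand, ...).
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canonical(a) for a in args]
    if op in COMMUTATIVE:
        flat = []
        for a in args:
            # Merge nested uses of the same operator: (a + b) + c becomes a + b + c.
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])
            else:
                flat.append(a)
        args = sorted(flat, key=repr)
    return (op, *args)


variant_1 = ("+", "a", ("*", "c", "b"))   # a + c * b
variant_2 = ("+", ("*", "b", "c"), "a")   # b * c + a
same = canonical(variant_1) == canonical(variant_2)
```

Comparing canonical forms rather than raw sequences lets an evaluator accept both variants, while order-sensitive operators such as subtraction keep their operand order.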
The interplay between thermodynamics and kinetics in the solid-state synthesis of layered oxides.
In the synthesis of inorganic materials, reactions often yield non-equilibrium kinetic byproducts instead of the thermodynamic equilibrium phase. Understanding the competition between thermodynamics and kinetics is a fundamental step towards the rational synthesis of target materials. Here, we use in situ synchrotron X-ray diffraction to investigate the multistage crystallization pathways of the important two-layer (P2) sodium oxides Na0.67MO2 (M = Co, Mn). We observe a series of fast non-equilibrium phase transformations through metastable three-layer O3, O3' and P3 phases before formation of the equilibrium two-layer P2 polymorph. We present a theoretical framework to rationalize the observed phase progression, demonstrating that even though P2 is the equilibrium phase, compositionally unconstrained reactions between powder precursors favour the formation of non-equilibrium three-layered intermediates. These insights can guide the choice of precursors and parameters employed in the solid-state synthesis of ceramic materials, and constitute a step forward in unravelling the complex interplay between thermodynamics and kinetics during materials synthesis.