60 research outputs found
Positive-unlabeled learning for the prediction of conformational B-cell epitopes
© 2015 Ren et al. Background: The incomplete ground truth of B-cell epitope training data is a challenging problem in computational epitope prediction. Only a small fraction of the surface residues of an antigen are confirmed as antigenic residues (positive training data); the remaining residues are unlabeled. As some of these uncertain residues may group together to form novel but currently unknown epitopes, it is misguided to uniformly classify all the unlabeled residues as negative training data, as the traditional supervised learning scheme would. Results: We propose a positive-unlabeled learning algorithm to address this problem. The key idea is to distinguish between epitope-likely residues and reliable negative residues in the unlabeled data. The method has two steps: (1) identify reliable negative residues using a weighted SVM with high recall on the positives; and (2) construct a classification model on the positive residues and the reliable negative residues. Complex-based 10-fold cross-validation shows that this method outperforms the commonly used predictors DiscoTope 2.0, ElliPro, and SEPPA 2.0 in every aspect. We conducted four case studies, testing the approach on antigens of West Nile virus, dihydrofolate reductase, beta-lactamase, and two Ebola antigens whose epitopes are currently unknown. All results were assessed on a newly established data set of antigen structures not bound by antibodies, rather than on antibody-bound antigen structures. Bound structures may contain unfair binding information, such as bound-state B-factors and protrusion indices, which could exaggerate epitope prediction performance. Source code is available on request.
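As a rough illustration of the two-step scheme this abstract describes, the sketch below (assuming scikit-learn, toy residue features, and an illustrative probability cutoff; it is not the authors' released code) trains a weighted SVM to extract reliable negatives from the unlabeled residues and then fits the final classifier on positives versus those reliable negatives.

    # Minimal sketch of a two-step positive-unlabeled scheme like the one
    # described above (not the authors' implementation). The features and
    # the 0.1 score cutoff are assumptions made for illustration only.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy surface-residue features: rows are residues, columns are descriptors.
    X_pos = rng.normal(loc=1.0, size=(30, 5))    # confirmed antigenic residues
    X_unl = rng.normal(loc=0.0, size=(200, 5))   # unlabeled surface residues

    # Step 1: weighted SVM; unlabeled residues are provisionally treated as
    # negative but down-weighted so that positives are recalled with priority.
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos), dtype=int),
                        np.zeros(len(X_unl), dtype=int)])
    step1 = SVC(kernel="rbf", class_weight={1: 10.0, 0: 1.0}, probability=True)
    step1.fit(X, y)

    # Unlabeled residues scored far from the positive class become
    # "reliable negatives".
    p_unl = step1.predict_proba(X_unl)[:, 1]
    X_rel_neg = X_unl[p_unl < 0.1]

    # Step 2: final classifier on positives vs. reliable negatives only.
    X2 = np.vstack([X_pos, X_rel_neg])
    y2 = np.concatenate([np.ones(len(X_pos), dtype=int),
                         np.zeros(len(X_rel_neg), dtype=int)])
    step2 = SVC(kernel="rbf", probability=True)
    step2.fit(X2, y2)

    # Epitope propensity for a new residue feature vector.
    print(step2.predict_proba(rng.normal(size=(1, 5)))[:, 1])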
Model-free trajectory optimization for reinforcement learning
Many recent trajectory optimization algorithms alternate between local approximation of the dynamics and a conservative policy update.
However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy.
In this article, we propose a new model-free algorithm that backpropagates a local, quadratic, time-dependent Q-function, allowing the policy
update to be derived in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system
dynamics, and demonstrates improved performance in comparison to related trajectory optimization algorithms that linearize the dynamics.
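For intuition, the following sketch shows the kind of closed-form, exactly KL-constrained update that a local quadratic Q-function permits for a Gaussian policy at a single time step; the curvature matrix, trust-region size, and temperature search are illustrative assumptions, not the authors' algorithm.

    # Numerical sketch: a quadratic Q(a) = -0.5 a^T A a + a^T b combined with
    # a Gaussian policy gives a closed-form update proportional to
    # old_policy(a) * exp(Q(a) / eta); eta is tuned so the KL constraint
    # holds exactly. All quantities are toy values for one time step.
    import numpy as np

    def kl_gauss(mu1, cov1, mu0, cov0):
        """KL( N(mu1, cov1) || N(mu0, cov0) )."""
        d = len(mu0)
        inv0 = np.linalg.inv(cov0)
        diff = mu1 - mu0
        return 0.5 * (np.trace(inv0 @ cov1) + diff @ inv0 @ diff - d
                      + np.log(np.linalg.det(cov0) / np.linalg.det(cov1)))

    mu0, cov0 = np.zeros(2), np.eye(2)            # old Gaussian policy
    A = np.array([[2.0, 0.3], [0.3, 1.0]])        # local negative curvature of Q
    b = np.array([1.0, -0.5])                     # local linear term of Q
    epsilon = 0.1                                 # KL trust region

    def update(eta):
        prec = np.linalg.inv(cov0) + A / eta
        cov1 = np.linalg.inv(prec)
        mu1 = cov1 @ (np.linalg.inv(cov0) @ mu0 + b / eta)
        return mu1, cov1

    # Search the temperature eta so the constraint holds with equality.
    lo, hi = 1e-6, 1e6
    for _ in range(100):
        eta = np.sqrt(lo * hi)
        mu1, cov1 = update(eta)
        if kl_gauss(mu1, cov1, mu0, cov0) > epsilon:
            lo = eta                              # step too aggressive: raise eta
        else:
            hi = eta
    print("new mean:", mu1, "KL:", kl_gauss(mu1, cov1, mu0, cov0))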
Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
While bisimulation-based approaches hold promise for learning robust state
representations for Reinforcement Learning (RL) tasks, their efficacy in
offline RL tasks has not been up to par; in some instances, they have even
significantly underperformed alternative methods. We aim to understand
why bisimulation methods succeed in online settings, but falter in offline
tasks. Our analysis reveals that missing transitions in the dataset are
particularly harmful to the bisimulation principle, leading to ineffective
estimation. We also shed light on the critical role of reward scaling in
bounding the scale of bisimulation measurements and of the value error they
induce. Based on these findings, we propose to apply the expectile operator for
representation learning to our offline RL setting, which helps to prevent
overfitting to incomplete data. Meanwhile, by introducing an appropriate reward
scaling strategy, we avoid the risk of feature collapse in representation
space. We implement these recommendations on two state-of-the-art
bisimulation-based algorithms, MICo and SimSR, and demonstrate performance
gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at
https://github.com/zanghyu/Offline_Bisimulation.
Comment: NeurIPS 202
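A minimal sketch of the expectile operator mentioned in this abstract, written as an asymmetric squared loss over scalar regression targets of the kind used when fitting bisimulation-style distances; the tau value and sign convention are illustrative, and this is not the MICo or SimSR implementation.

    # Expectile (asymmetric squared) loss: errors where the prediction
    # undershoots the target are weighted by tau, overshoots by (1 - tau),
    # so the fit is pulled toward an expectile rather than the mean.
    import numpy as np

    def expectile_loss(pred, target, tau=0.7):
        diff = target - pred
        weight = np.where(diff > 0, tau, 1.0 - tau)
        return np.mean(weight * diff ** 2)

    # Toy targets: with missing transitions, targets computed from an
    # incomplete dataset are biased; the asymmetric weighting shifts the
    # fit toward a conservative side instead of averaging over the gaps.
    pred = np.array([0.2, 0.5, 0.9])
    target = np.array([0.4, 0.3, 1.2])
    print(expectile_loss(pred, target, tau=0.7))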
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
We propose to directly map raw visual observations and text input to actions
for instruction execution. While existing approaches assume access to
structured environment representations or use a pipeline of separately trained
models, we learn a single model to jointly reason about linguistic and visual
input. We use reinforcement learning in a contextual bandit setting to train a
neural network agent. To guide the agent's exploration, we use reward shaping
with different forms of supervision. Our approach does not require intermediate
representations, planning procedures, or training different models. We evaluate
in a simulated environment, and show significant improvements over supervised
learning and common reinforcement learning variants.
Comment: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 201
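The sketch below illustrates the contextual-bandit policy-gradient setup with a shaped reward on a toy linear softmax policy; the context features, action set, and shaping signal are stand-ins for illustration, not the paper's neural-network agent.

    # Single-step (contextual bandit) REINFORCE update with reward shaping.
    import numpy as np

    rng = np.random.default_rng(0)
    n_actions, n_features = 4, 8
    W = np.zeros((n_actions, n_features))      # linear softmax policy parameters

    def policy(x):
        logits = W @ x
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def shaped_reward(task_reward, progress_delta, coef=0.1):
        # Task reward plus a supervision-derived shaping signal.
        return task_reward + coef * progress_delta

    for step in range(1000):
        x = rng.normal(size=n_features)        # context: joint text/visual features
        probs = policy(x)
        a = rng.choice(n_actions, p=probs)
        task_r = 1.0 if a == 0 else 0.0        # toy task reward
        r = shaped_reward(task_r, progress_delta=rng.normal(scale=0.1))
        # Bandit policy gradient: grad log pi(a|x) scaled by the reward.
        grad_log = -np.outer(probs, x)
        grad_log[a] += x
        W += 0.05 * r * grad_log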
Model-Free Trajectory-based Policy Optimization with Monotonic Improvement
Many of the recent trajectory optimization algorithms alternate between linear approximation
of the system dynamics around the mean trajectory and conservative policy update.
One way of constraining the policy change is by bounding the Kullback-Leibler (KL)
divergence between successive policies. These approaches have already demonstrated great experimental
success in challenging problems such as end-to-end control of physical systems.
However, these approaches lack any improvement guarantee as the linear approximation of
the system dynamics can introduce a bias in the policy update and prevent convergence
to the optimal policy. In this article, we propose a new model-free trajectory-based policy
optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates
a local, quadratic and time-dependent Q-Function learned from trajectory data
instead of a model of the system dynamics. Our policy update ensures exact KL-constraint
satisfaction without simplifying assumptions on the system dynamics. We experimentally
demonstrate on highly non-linear control tasks the improved performance of our algorithm
in comparison to approaches that linearize the system dynamics. To show the
monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of
our policy update scheme and derive a lower bound on the change in policy return between
successive iterations.
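To make the model-free ingredient concrete, the sketch below fits a local, quadratic, time-dependent Q-function to trajectory data by least squares, rather than fitting a model of the system dynamics; the dimensions, return estimates, and feature expansion are toy assumptions, not the authors' estimator.

    # Fit Q_t(s, a) as a quadratic in (s, a) from sampled returns at one time step.
    import numpy as np

    rng = np.random.default_rng(0)
    ds, da, n_samples = 3, 2, 500

    def quad_features(s, a):
        z = np.concatenate([s, a])
        # Constant, linear, and upper-triangular quadratic terms of (s, a).
        iu = np.triu_indices(len(z))
        return np.concatenate([[1.0], z, np.outer(z, z)[iu]])

    # One time step's data: states, actions, and Monte-Carlo return estimates.
    S = rng.normal(size=(n_samples, ds))
    A = rng.normal(size=(n_samples, da))
    returns = -np.sum(S ** 2, axis=1) - np.sum(A ** 2, axis=1) \
              + 0.1 * rng.normal(size=n_samples)      # toy cost-to-go

    Phi = np.stack([quad_features(s, a) for s, a in zip(S, A)])
    w, *_ = np.linalg.lstsq(Phi, returns, rcond=None)

    # The fitted weights define a quadratic Q_t that a closed-form,
    # KL-constrained policy update (see the abstract above) can then use.
    print("fitted Q at a sample point:", quad_features(S[0], A[0]) @ w)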
- …