Offline Reinforcement Learning from Imperfect Human Guidance
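Kyoto University, doctoral degree by coursework (new system), Doctor of Informatics, Kō No. 24856, Jōhaku No. 838, 新制||情||140 (University Library). Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. Examiners: (chief examiner) Professor Kashima, Hisashi; Professor Kawahara, Tatsuya; Professor Morimoto, Jun. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA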
Batch Reinforcement Learning from Crowds
A shortcoming of batch reinforcement learning is its requirement for rewards
in data, which makes it inapplicable to tasks without reward functions. Existing
settings for lack of reward, such as behavioral cloning, rely on optimal
demonstrations collected from humans. Unfortunately, extensive expertise is
required for ensuring optimality, which hinders the acquisition of large-scale
data for complex tasks. This paper addresses the lack of reward in a batch
reinforcement learning setting by learning a reward function from preferences.
Generating preferences only requires a basic understanding of a task. Being a
mental process, generating preferences is faster than performing
demonstrations. So preferences can be collected at scale from non-expert humans
using crowdsourcing. This paper tackles a critical challenge that emerged when
collecting data from non-expert humans: the noise in preferences. A novel
probabilistic model is proposed for modelling the reliability of labels, which
utilizes labels collaboratively. Moreover, the proposed model smooths the
estimation with a learned reward function. Evaluation on Atari datasets
demonstrates the effectiveness of the proposed model, followed by an ablation
study to analyze the relative importance of the proposed ideas.
Comment: 16 pages. Accepted by ECML-PKDD 202
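To make "learning a reward function from preferences" concrete, the sketch below scores one preference label under a Bradley-Terry model, a common choice for this kind of problem. The abstract does not say which preference model the paper uses, so the model itself and the names `segment_return`, `preference_probability`, and `preference_loss` are assumptions for illustration. The paper's label-reliability model is not reproduced here.

```python
import math


def segment_return(rewards):
    """Sum of the per-step rewards that a candidate reward function assigns to one segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B (assumed model)."""
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(rewards_a, rewards_b, label):
    """Negative log-likelihood of one label: 1 means A was preferred, 0 means B was preferred."""
    p = preference_probability(rewards_a, rewards_b)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)
```

Minimizing this loss over many labeled pairs would fit the reward function. A noisy crowd label that contradicts the current reward estimate simply incurs a larger loss under this plain formulation, and that is the situation a reliability model is meant to handle.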
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
Offline reinforcement learning (RL) has received rising interest due to its
appealing data efficiency. The present study addresses behavior estimation, a
task that lays the foundation of many offline RL algorithms. Behavior
estimation aims at estimating the policy with which training data are
generated. In particular, this work considers a scenario where the data are
collected from multiple sources. In this case, neglecting data heterogeneity,
existing approaches for behavior estimation suffer from behavior
misspecification. To overcome this drawback, the present study proposes a
latent variable model to infer a set of policies from data, which allows an
agent to use as behavior policy the policy that best describes a particular
trajectory. This model provides an agent with a fine-grained characterization of
multi-source data and helps it overcome behavior misspecification. This work
also proposes a learning algorithm for this model and illustrates its practical
usage via extending an existing offline RL algorithm. Lastly, with extensive
evaluation this work confirms the existence of behavior misspecification and
the efficacy of the proposed model.
Comment: Accepted by AAAI 2023. Fixed errors in Fig. 4 presented in the
camera-ready version and Table
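The idea of using "the policy that best describes a particular trajectory" can be sketched as a maximum-likelihood selection among a set of candidate policies. The code below is a minimal illustration under assumptions: policies are tabular dictionaries of action probabilities, the set of policies is already given, and the function names are hypothetical. How the paper's latent variable model learns the set is not shown in the abstract and is not attempted here.

```python
import math


def trajectory_log_likelihood(policy, trajectory):
    """Log-probability of the actions in a trajectory under one tabular policy.

    policy: dict mapping state -> dict mapping action -> probability.
    trajectory: list of (state, action) pairs.
    """
    total = 0.0
    for state, action in trajectory:
        prob = policy.get(state, {}).get(action, 0.0)
        # Floor the probability so unseen actions give a large penalty instead of log(0).
        total += math.log(max(prob, 1e-12))
    return total


def best_policy_index(policies, trajectory):
    """Index of the candidate policy under which the trajectory is most likely."""
    scores = [trajectory_log_likelihood(p, trajectory) for p in policies]
    return max(range(len(policies)), key=lambda i: scores[i])
```

For example, with two candidate policies that favor different actions in the same state, a trajectory made of "right" actions is assigned to the policy that puts more probability on "right".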
On Modeling Long-Term User Engagement from Stochastic Feedback
An ultimate goal of recommender systems (RS) is to improve user engagement.
Reinforcement learning (RL) is a promising paradigm for this goal, as it
directly optimizes overall performance of sequential recommendation. However,
many existing RL-based approaches induce huge computational overhead, because
they require not only the recommended items but also all other candidate items
to be stored. This paper proposes an efficient alternative that does not
require the candidate items. The idea is to model the correlation between user
engagement and items directly from data. Moreover, the proposed approach
considers randomness in user feedback and termination behavior, which are
ubiquitous in RS but rarely discussed in RL-based prior work. With online A/B
experiments on real-world RS, we confirm the efficacy of the proposed approach
and the importance of modeling the two types of randomness.
Comment: Accepted by the workshop on decision making for information retrieval
and recommender systems (the Web Conference 2023
Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction
Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is
crucial in industrial production, yet existing models often struggle with
limited generalization capabilities due to their inability to fully process all
vibration signal patterns. We introduce a novel multi-input autoregressive
model to address this challenge in RUL prediction for bearings. Our approach
uniquely integrates vibration signals with previously predicted Health
Indicator (HI) values, employing feature fusion to output current window HI
values. Through autoregressive iterations, the model attains a global receptive
field, effectively overcoming the limitations in generalization. Furthermore,
we innovatively incorporate a segmentation method and multiple training
iterations to mitigate error accumulation in autoregressive models. Empirical
evaluation on the PMH2012 dataset demonstrates that our model, compared to
other backbone networks using similar autoregressive approaches, achieves
significantly lower Root Mean Square Error (RMSE) and Score. Notably, it
outperforms traditional autoregressive models that use label values as inputs
and non-autoregressive networks, showing superior generalization abilities with
a marked lead in RMSE and Score metrics
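The autoregressive loop described above, where each window's Health Indicator is predicted from the vibration signal together with previously predicted HI values, can be sketched as follows. This is only an illustration of the rollout: `toy_predictor` is a hypothetical stand-in for a trained network, and `history_length` and `initial_hi` are assumed parameters that do not come from the paper.

```python
def autoregressive_rollout(windows, predict_window, history_length=3, initial_hi=1.0):
    """Predict one HI value per vibration window, feeding earlier predictions back in."""
    history = [initial_hi] * history_length
    predictions = []
    for window in windows:
        hi = predict_window(window, history)
        predictions.append(hi)
        # Slide the history: drop the oldest prediction and append the newest.
        history = history[1:] + [hi]
    return predictions


def toy_predictor(window, history):
    """Hypothetical predictor: blends the last predicted HI with a signal-energy term."""
    energy = sum(x * x for x in window) / len(window)
    return 0.5 * history[-1] + 0.5 * (1.0 / (1.0 + energy))
```

Because each prediction depends on earlier predictions, errors can compound over the rollout, which is the error accumulation the abstract says its segmentation method and multiple training iterations are meant to mitigate.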
Utilizing VQ-VAE for End-to-End Health Indicator Generation in Predicting Rolling Bearing RUL
The prediction of the remaining useful life (RUL) of rolling bearings is a
pivotal issue in industrial production. A crucial approach to tackling this
issue involves transforming vibration signals into health indicators (HI) to
aid model training. This paper presents an end-to-end HI construction method,
vector quantised variational autoencoder (VQ-VAE), which addresses the need for
dimensionality reduction of latent variables in traditional unsupervised
learning methods such as autoencoder. Moreover, concerning the inadequacy of
traditional statistical metrics in reflecting curve fluctuations accurately,
two novel statistical metrics, mean absolute distance (MAD) and mean variance
(MV), are introduced. These metrics accurately depict the fluctuation patterns
in the curves, thereby indicating the model's accuracy in discerning similar
features. On the PMH2012 dataset, methods employing VQ-VAE for label
construction achieved lower values for MAD and MV. Furthermore, the ASTCN
prediction model trained with VQ-VAE labels demonstrated commendable
performance, attaining the lowest values for MAD and MV.
Comment: 17 figure
Arylsulfonyl indoline-benzamide exhibits inhibitory effect on nasopharyngeal carcinoma
Purpose: To investigate the effect of arylsulfonyl indoline-benzamide (ASIB) on the viability of nasopharyngeal carcinoma (NPC) cells, and the underlying mechanism of action.
Methods: The viability of C666 and NPC 039 cells was determined using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. Cell migration was analysed by wound healing assay, while protein expression levels of matrix metalloproteinases (MMPs), p50, p65 and NF-κB were assayed using western blotting.
Results: MTT assay results showed that ASIB treatment led to significant and dose-dependent reductions in the viability of C666 and NPC 039 cells (p < 0.05). The migratory and invasive potential of C666 cells decreased after incubation with ASIB for 48 h. Western blotting data showed a significant decrease in MMP-2/9 expression in C666 cells treated with ASIB (p < 0.05). The levels of p65 and p50 in the nuclear fraction of C666 cells were markedly lower than those in the negative control group. ASIB treatment for 48 h decreased the level of NF-κB expression in C666 cells (p < 0.05). The volume of tumors excised from ASIB-treated NPC mice was lower than that from the untreated group.
Conclusion: Arylsulfonyl indoline-benzamide (ASIB) exhibits inhibitory effects on the viability and metastatic potential of NPC cells. It may therefore be beneficial in the treatment of nasopharyngeal carcinoma, but this requires further investigation
Estimating Treatment Effects Under Heterogeneous Interference
Treatment effect estimation can assist in effective decision-making in
e-commerce, medicine, and education. One popular application of this estimation
lies in the prediction of the impact of a treatment (e.g., a promotion) on an
outcome (e.g., sales) of a particular unit (e.g., an item), known as the
individual treatment effect (ITE). In many online applications, the outcome of
a unit can be affected by the treatments of other units, as units are often
associated, which is referred to as interference. For example, on an online
shopping website, sales of an item will be influenced by an advertisement of
its co-purchased item. Prior studies have attempted to model interference to
estimate the ITE accurately, but they often assume a homogeneous interference,
i.e., relationships between units only have a single view. However, in
real-world applications, interference may be heterogeneous, with multi-view
relationships. For instance, the sale of an item is usually affected by the
treatment of its co-purchased and co-viewed items. We hypothesize that ITE
estimation will be inaccurate if this heterogeneous interference is not
properly modeled. Therefore, we propose a novel approach to model heterogeneous
interference by developing a new architecture to aggregate information from
diverse neighbors. Our proposed method contains graph neural networks that
aggregate same-view information, a mechanism that aggregates information from
different views, and attention mechanisms. In our experiments on multiple
datasets with heterogeneous interference, the proposed method significantly
outperforms existing methods for ITE estimation, confirming the importance of
modeling heterogeneous interference
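A much-simplified sketch of multi-view interference: for each view (for example co-purchased or co-viewed), compute how heavily a unit's neighbors are treated, then combine the per-view values with softmax weights. The abstract describes graph neural networks and attention mechanisms. The mean aggregation, the fixed `view_scores`, and all function names below are assumptions that stand in for those learned components.

```python
import math


def view_exposure(unit, neighbors, treatments):
    """Mean treatment among a unit's neighbors in one view (0.0 if it has none)."""
    ids = neighbors.get(unit, [])
    if not ids:
        return 0.0
    return sum(treatments[n] for n in ids) / len(ids)


def combined_exposure(unit, views, treatments, view_scores):
    """Softmax-weighted combination of the per-view exposures.

    views: list of dicts, each mapping unit -> list of neighbor units in that view.
    view_scores: one unnormalized score per view (learned in a real model, fixed here).
    """
    weights = [math.exp(s) for s in view_scores]
    total = sum(weights)
    exposures = [view_exposure(unit, v, treatments) for v in views]
    return sum(w / total * e for w, e in zip(weights, exposures))
```

A homogeneous-interference model would correspond to merging all views into one neighbor list. Keeping them separate lets the weights express that, say, an advertisement on a co-purchased item matters more than one on a co-viewed item.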