    Offline Reinforcement Learning from Imperfect Human Guidance

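    Kyoto University, doctoral degree by coursework (new system). Doctor of Informatics, degree no. Kō 24856 (Informatics doctorate no. 838). Call number: 新制||情||140 (Main Library). Graduate School of Informatics, Kyoto University, Department of Intelligence Science and Technology. Examiners: Professor Hisashi Kashima (chief examiner), Professor Tatsuya Kawahara, Professor Jun Morimoto. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA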

    Batch Reinforcement Learning from Crowds

    A shortcoming of batch reinforcement learning is that it requires rewards in the data, which makes it inapplicable to tasks without reward functions. Existing settings that handle the lack of reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, ensuring optimality requires extensive expertise, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences. Generating preferences requires only a basic understanding of a task. Being a mental process, generating preferences is faster than performing demonstrations, so preferences can be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerges when collecting data from non-expert humans: noise in the preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, and an ablation study analyzes the relative importance of the proposed ideas.
    Comment: 16 pages. Accepted by ECML-PKDD 202
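
Learning a reward function from pairwise preferences is often framed with a Bradley-Terry likelihood. The sketch below illustrates that generic framing only; it is an assumption and not the probabilistic reliability model proposed in the paper. The names `preference_prob`, `segment_return`, and `preference_nll` are hypothetical, and the per-step rewards stand in for the output of some learned reward model.

```python
import math


def preference_prob(return_a: float, return_b: float) -> float:
    """P(segment A is preferred over segment B) under a Bradley-Terry model."""
    diff = return_a - return_b
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def segment_return(rewards):
    """Sum of per-step rewards predicted by a (hypothetical) reward model."""
    return sum(rewards)


def preference_nll(pairs):
    """Average negative log-likelihood of preference labels.

    Each item is (rewards_a, rewards_b, label), with label = 1 if
    segment A was preferred and 0 if segment B was preferred.
    """
    total = 0.0
    for rewards_a, rewards_b, label in pairs:
        p = preference_prob(segment_return(rewards_a), segment_return(rewards_b))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(pairs)


# Toy data: one label that agrees with the predicted rewards, one that does not.
pairs = [
    ([1.0, 1.0], [0.0, 0.0], 1),
    ([0.5, 0.5], [2.0, 2.0], 1),
]
print(preference_nll(pairs))
```

Minimizing this loss with respect to the reward model's parameters fits rewards to the labels; a noisy label, such as the second pair above, contributes a large loss term, which is the kind of effect a reliability model would need to discount.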

    Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

    Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims at estimating the policy with which the training data were generated. In particular, this work considers a scenario where the data are collected from multiple sources. In this case, existing approaches to behavior estimation neglect data heterogeneity and therefore suffer from behavior misspecification. To overcome this drawback, the present study proposes a latent variable model to infer a set of policies from data, which allows an agent to use as its behavior policy the policy that best describes a particular trajectory. This model provides an agent with a fine-grained characterization of multi-source data and helps it overcome behavior misspecification. This work also proposes a learning algorithm for this model and illustrates its practical usage by extending an existing offline RL algorithm. Lastly, extensive evaluation confirms the existence of behavior misspecification and the efficacy of the proposed model.
    Comment: Accepted by AAAI 2023. Fixed errors in Fig. 4 presented in the camera-ready version and Table
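
The idea of using "the policy that best describes a particular trajectory" can be illustrated with a maximum-likelihood selection over a set of candidate policies. This is a minimal sketch under stated assumptions: policies are tabular dictionaries mapping each state to action probabilities, and the function names are hypothetical. It does not reproduce the latent variable model or the learning algorithm from the paper.

```python
import math


def trajectory_log_likelihood(policy, trajectory):
    """Sum of log pi(a | s) over the (state, action) pairs of one trajectory."""
    total = 0.0
    for state, action in trajectory:
        prob = policy[state].get(action, 0.0)
        if prob <= 0.0:
            return float("-inf")  # the policy can never produce this action
        total += math.log(prob)
    return total


def best_behavior_policy(policies, trajectory):
    """Index of the candidate policy under which the trajectory is most likely."""
    scores = [trajectory_log_likelihood(p, trajectory) for p in policies]
    return max(range(len(policies)), key=lambda i: scores[i])


# Two hypothetical data sources with opposite action tendencies.
policies = [
    {"s0": {"left": 0.9, "right": 0.1}, "s1": {"left": 0.8, "right": 0.2}},
    {"s0": {"left": 0.2, "right": 0.8}, "s1": {"left": 0.1, "right": 0.9}},
]
trajectory = [("s0", "right"), ("s1", "right")]
print(best_behavior_policy(policies, trajectory))
```

Fitting a single policy to the pooled data from both sources would assign roughly even probabilities to both actions and describe neither source well, which is one way to picture behavior misspecification.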

    On Modeling Long-Term User Engagement from Stochastic Feedback

    An ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based approaches incur huge computational overhead, because they require not only the recommended items but also all other candidate items to be stored. This paper proposes an efficient alternative that does not require the candidate items. The idea is to model the correlation between user engagement and items directly from data. Moreover, the proposed approach considers randomness in user feedback and termination behavior, both of which are ubiquitous in RS but rarely discussed in RL-based prior work. With online A/B experiments on a real-world RS, we confirm the efficacy of the proposed approach and the importance of modeling the two types of randomness.
    Comment: Accepted by the workshop on decision making for information retrieval and recommender systems (the Web Conference 2023)

    Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction

    Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often generalize poorly because they cannot fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output the HI values of the current window. Through autoregressive iterations, the model attains a global receptive field, which overcomes the limitations in generalization. Furthermore, we incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PMH2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms both traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization with a marked lead in the RMSE and Score metrics.

    Utilizing VQ-VAE for End-to-End Health Indicator Generation in Predicting Rolling Bearing RUL

    The prediction of the remaining useful life (RUL) of rolling bearings is a pivotal issue in industrial production. A crucial approach to tackling this issue involves transforming vibration signals into health indicators (HI) to aid model training. This paper presents an end-to-end HI construction method based on the vector quantised variational autoencoder (VQ-VAE), which addresses the need for dimensionality reduction of latent variables in traditional unsupervised learning methods such as autoencoders. Moreover, because traditional statistical metrics do not reflect curve fluctuations accurately, two novel statistical metrics, mean absolute distance (MAD) and mean variance (MV), are introduced. These metrics depict the fluctuation patterns in the curves, thereby indicating the model's accuracy in discerning similar features. On the PMH2012 dataset, methods employing VQ-VAE for label construction achieved lower values for MAD and MV. Furthermore, the ASTCN prediction model trained with VQ-VAE labels attained the lowest values for MAD and MV.
    Comment: 17 figure
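
The vector quantisation step that gives a VQ-VAE its name can be sketched as a nearest-neighbour lookup in a codebook. The code below shows only that generic step, assuming plain Python lists for the latent vector and a hypothetical three-entry codebook; how the paper turns quantised latents into health indicators is not specified in the abstract and is not modelled here.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`.

    Distance is squared Euclidean, the usual choice for VQ-VAE lookups.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook of three 2-D codes and one encoder output.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
print(index, code)
```

Because every latent vector is replaced by one of a finite set of codes, the discrete index itself is a compact, low-dimensional summary of the input.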

    Arylsulfonyl indoline-benzamide exhibits inhibitory effect on nasopharyngeal carcinoma

    Purpose: To investigate the effect of arylsulfonyl indoline-benzamide (ASIB) on the viability of nasopharyngeal carcinoma (NPC) cells, and the underlying mechanism of action. Methods: The viability of C666 and NPC 039 cells was determined using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. Cell migration was analysed by wound healing assay, while protein expression levels of matrix metalloproteinases (MMPs), p50, p65 and NF-κB were assayed using western blotting. Results: MTT assay results showed that ASIB treatment led to significant and dose-dependent reductions in the viability of C666 and NPC 039 cells (p < 0.05). The migration and invasive potential of C666 cells decreased on incubation with ASIB for 48 h. Western blotting data showed a significant decrease in MMP-2/9 expression in C666 cells on treatment with ASIB (p < 0.05). The levels of p65 and p50 in the nuclear fraction of C666 cells were markedly lower than those in the negative control group. ASIB treatment for 48 h decreased the level of NF-κB expression in C666 cells (p < 0.05). The volume of tumor excised from ASIB-treated NPC mice was lower than that of the untreated group. Conclusion: ASIB exhibits inhibitory effects on the viability and metastatic potential of NPC cells. Thus, it may be beneficial in the treatment of nasopharyngeal carcinoma, but this requires further investigation.

    Estimating Treatment Effects Under Heterogeneous Interference

    Treatment effect estimation can assist in effective decision-making in e-commerce, medicine, and education. One popular application of this estimation lies in predicting the impact of a treatment (e.g., a promotion) on an outcome (e.g., sales) of a particular unit (e.g., an item), known as the individual treatment effect (ITE). In many online applications, the outcome of a unit can be affected by the treatments of other units, as units are often associated, which is referred to as interference. For example, on an online shopping website, sales of an item will be influenced by an advertisement of its co-purchased item. Prior studies have attempted to model interference to estimate the ITE accurately, but they often assume homogeneous interference, i.e., relationships between units have only a single view. However, in real-world applications, interference may be heterogeneous, with multi-view relationships. For instance, the sales of an item are usually affected by the treatments of both its co-purchased and co-viewed items. We hypothesize that ITE estimation will be inaccurate if this heterogeneous interference is not properly modeled. Therefore, we propose a novel approach to model heterogeneous interference by developing a new architecture to aggregate information from diverse neighbors. Our proposed method contains graph neural networks that aggregate same-view information, a mechanism that aggregates information from different views, and attention mechanisms. In our experiments on multiple datasets with heterogeneous interference, the proposed method significantly outperforms existing methods for ITE estimation, confirming the importance of modeling heterogeneous interference.