
    Simulation study of an estimator of the bivariate survivor function and its variance estimator

    Bivariate survival data arise when we have either a pair of observation times for each individual or times on two related individuals, such as infection times for the two kidneys of a person or death times of twins. Such data are also often subject to censoring (bivariate censoring), i.e., exact observations may not be available on one or both components because of drop-out or other reasons. Hence it is important to have an efficient, nonparametric bivariate survivor function estimator under censoring, i.e., a bivariate Kaplan-Meier estimator. In this thesis we carry out an extensive simulation study of an estimator proposed by Sen and Stute (2007), which involves solving for an eigenvector of a certain matrix. A comparison of the estimator with two other existing but unsatisfactory ones is also given using a small data set. Moreover, the variance of the former is computed using a bivariate analogue of Greenwood's formula, which involves solving a matrix equation of the form AXB = C.
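    The variance computation above reduces to a linear matrix equation. As a minimal sketch (not the thesis's actual Greenwood-type construction, whose A, B, and right-hand side are specific to the estimator; the toy matrices below are illustrative), an equation of the form AXB = C can be solved for X by vectorization, using the identity vec(AXB) = (B^T kron A) vec(X) with column-major vec:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 4
        A = rng.standard_normal((n, n))
        B = rng.standard_normal((n, n))
        X_true = rng.standard_normal((n, n))
        C = A @ X_true @ B  # construct a consistent right-hand side

        # vec(A X B) = (B^T kron A) vec(X), with column-major ("F") vec
        K = np.kron(B.T, A)
        vec_X = np.linalg.solve(K, C.flatten(order="F"))
        X = vec_X.reshape((n, n), order="F")

        assert np.allclose(X, X_true)

    The Kronecker form is the general statement, but solving the dense n^2-by-n^2 system costs O(n^6); when A and B are invertible, X = A^{-1} C B^{-1} computed via two triangular-factor solves is far cheaper.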

    Advantage Actor-Critic with Reasoner: Explaining the Agent's Behavior from an Exploratory Perspective

    Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability has been a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a novel Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models to make them interpretable. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the underlying purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable paradigm for understanding the agent's decision-making process. It offers a range of functionalities such as purpose-based saliency, early failure detection, and model supervision, thereby promoting responsible and trustworthy RL. Evaluations conducted in action-rich Super Mario Bros environments yield intriguing findings: Reasoner-predicted label proportions decrease for "Breakout" and increase for "Hovering" as the exploration level of the RL algorithm intensifies. Additionally, purpose-based saliencies are more focused and comprehensible.
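    As a minimal PyTorch sketch of the three-part design described above (here collapsed into heads on one shared encoder for brevity, whereas the paper describes three interconnected networks; the dimensions, purpose taxonomy, and training losses are assumptions not given in the abstract):

        import torch
        import torch.nn as nn
        from torch.distributions import Categorical

        class A2CR(nn.Module):
            """Actor-Critic with an auxiliary Reasoner head that classifies each
            action into a predefined purpose class (e.g. "Breakout", "Hovering");
            the class list and layer sizes here are illustrative."""

            def __init__(self, obs_dim: int, n_actions: int, n_purposes: int):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
                self.policy = nn.Linear(128, n_actions)     # Policy Network head
                self.value = nn.Linear(128, 1)              # Value Network head
                self.reasoner = nn.Linear(128, n_purposes)  # Reasoner Network head

            def forward(self, obs: torch.Tensor):
                h = self.encoder(obs)
                return self.policy(h), self.value(h), self.reasoner(h)

        model = A2CR(obs_dim=16, n_actions=6, n_purposes=2)
        obs = torch.randn(1, 16)
        logits, value, purpose_logits = model(obs)
        action = Categorical(logits=logits).sample()   # actor's sampled action
        purpose = purpose_logits.argmax(dim=-1)        # Reasoner's predicted purpose label

    In this reading, the Reasoner is trained as a classifier over the predefined purpose classes, so its predicted label proportions can be tracked as exploration varies, matching the kind of analysis the abstract reports.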