An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
Increasing technological sophistication and the widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode, and dose of intervention based on a user's ongoing performance and changing needs. A Just-In-Time Adaptive Intervention (JITAI) employs the real-time data collection and communication capabilities of modern mobile devices to adapt and deliver interventions in real time. Despite the increasing popularity of JITAIs, the lack of methodological guidance for constructing high-quality, data-driven JITAIs remains a hurdle to advancing JITAI research. In the first part of the dissertation, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real time as a contextual bandit problem. Under the linear reward assumption, we choose the reward-function parameterization (the "critic") separately from a lower-dimensional parameterization of stochastic JITAIs (the "actor"). We provide an online actor-critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor-critic algorithm, including the consistency, asymptotic distribution, and regret bound of the estimated optimal JITAI parameters, are derived and tested in numerical experiments. We also present numerical experiments that test the algorithm's performance when the contextual-bandit assumptions are violated. In the second part of the dissertation, we propose a statistical decision procedure that identifies whether a patient characteristic is useful for AI.
We define a discrete-valued characteristic as useful in adaptive intervention if, for some values of the characteristic, there is sufficient evidence to recommend a particular intervention, while for other values there is either sufficient evidence to recommend a different intervention or insufficient evidence to recommend any particular intervention.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133223/1/ehlei_1.pd
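The actor-critic construction described above can be illustrated with a toy sketch: a "critic" fits a linear reward model by online ridge regression, and a stochastic softmax "actor" takes one policy-gradient step per round against the critic's estimate. All names, dimensions, features, and step sizes below are illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                                      # context dimension (illustrative)

def features(s, a):
    # Joint feature vector for context s and binary action a (illustrative choice).
    return np.concatenate([s, a * s])

def policy_probs(theta, s):
    # Stochastic policy ("actor") over actions {0, 1}: softmax of theta . features.
    logits = np.array([theta @ features(s, a) for a in (0, 1)])
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

A = np.eye(2 * d)                          # ridge Gram matrix for the critic
b = np.zeros(2 * d)                        # accumulated reward-weighted features
theta = np.zeros(2 * d)                    # actor parameters
true_w = np.array([1.0, -1.0, 2.0, 0.5])   # synthetic linear-reward weights

for t in range(2000):
    s = rng.normal(size=d)                 # observe a context
    probs = policy_probs(theta, s)
    a = rng.choice(2, p=probs)             # sample an action from the actor
    x = features(s, a)
    r = true_w @ x + 0.1 * rng.normal()    # linear reward plus noise

    # Critic: online ridge-regression estimate of the reward weights.
    A += np.outer(x, x)
    b += r * x
    w_hat = np.linalg.solve(A, b)

    # Actor: one gradient-ascent step on the critic's estimated expected reward,
    # grad = sum_a pi(a|s) * (f(s,a) - E_pi[f]) * (w_hat . f(s,a)).
    feats = [features(s, act) for act in (0, 1)]
    mean_feat = probs[0] * feats[0] + probs[1] * feats[1]
    grad = sum(probs[act] * (w_hat @ feats[act]) * (feats[act] - mean_feat)
               for act in (0, 1))
    theta += 0.05 * grad
```

Under these synthetic weights, action 1 yields the higher reward for most contexts, so the learned policy comes to favor it; the centered-feature term makes the gradient baseline-invariant.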
Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks
This effort examines the behavior of reinforcement learning
systems in personalization environments and details the differences in policy
entropy associated with the type of learning algorithm used. We demonstrate
that Policy Optimization agents often possess low-entropy policies during
training, which in practice results in agents prioritizing certain actions and
avoiding others. Conversely, we also show that Q-Learning agents are far less
susceptible to such behavior and generally maintain high-entropy policies
throughout training, which is often preferable in real-world applications. We
provide a wide range of numerical experiments as well as theoretical
justification to show that these differences in entropy are due to the type of
learning being employed.
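The policy entropy this abstract contrasts is just the Shannon entropy of the policy's action distribution: it is zero for a collapsed, deterministic policy and log(|A|) for a uniform, maximally exploratory one. A minimal sketch (the example distributions are illustrative, not from the paper):

```python
import numpy as np

def policy_entropy(probs):
    # Shannon entropy H(pi) = -sum_a pi(a) log pi(a);
    # 0 for a deterministic policy, log(|A|) for a uniform one.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                         # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())

peaked = [0.97, 0.01, 0.01, 0.01]       # a policy collapsed onto one action
uniform = [0.25, 0.25, 0.25, 0.25]      # a maximally exploratory policy

low = policy_entropy(peaked)            # low entropy
high = policy_entropy(uniform)          # log(4), the maximum for 4 actions
```

In the paper's terms, the low-entropy `peaked` distribution is the behavior attributed to policy-optimization agents during training, while Q-learning agents tend to stay closer to `uniform`.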
Policy Learning for Individualized Treatment Regimes on Infinite Time Horizon
With the recent advancements of technology in facilitating real-time
monitoring and data collection, "just-in-time" interventions can be delivered
via mobile devices to achieve both real-time and long-term management and
control. Reinforcement learning formalizes such mobile interventions as a
sequence of decision rules and assigns treatment arms based on the user's
status at each decision point. In practice, applications involve a large
number of decision points beyond the time horizon of the currently collected
data, which is usually framed as reinforcement learning in the infinite-horizon
setting and becomes much more challenging. This article provides a selective
overview of statistical methodologies on this topic. We discuss their
modeling frameworks, generalizability, and interpretability, and provide some
use-case examples. We conclude by discussing future research directions.
Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-armed bandits is that the true context
is perfectly known before arm selection. Nonetheless, in many practical
applications (e.g., cloud resource management), prior to arm selection, the
context information can only be acquired by prediction subject to errors or
adversarial modification. In this paper, we study a contextual bandit setting
in which only imperfect context is available for arm selection while the true
context is revealed at the end of each round. We propose two robust arm
selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the
worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes
the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and
MinWD by deriving both regret and reward bounds compared to an oracle that
knows the true context. Our results show that, as time goes on, MaxMinUCB and
MinWD both perform asymptotically as well as their optimal counterparts that
know the reward function. Finally, we apply MaxMinUCB and MinWD to online
edge-datacenter selection and run synthetic simulations to validate our
theoretical analysis.
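The max-min idea behind MaxMinUCB can be sketched in a few lines: for each arm, evaluate an upper confidence bound at every context in the plausible set implied by the imperfect prediction, keep the worst case, and play the arm whose worst-case UCB is largest. The function and parameter names below are hypothetical, and the constant confidence width stands in for the paper's actual linear-UCB confidence sets.

```python
import numpy as np

def maxmin_ucb_select(theta_hat, conf_width, arms, contexts):
    # For each arm, take the smallest UCB (estimate + confidence width) over
    # the plausible contexts, then pick the arm maximizing that worst case.
    best_arm, best_val = None, -np.inf
    for a in arms:
        worst = min(theta_hat[a] @ s + conf_width(a, s) for s in contexts)
        if worst > best_val:
            best_arm, best_val = a, worst
    return best_arm

# Toy example: two linear arms, two candidate contexts consistent with an
# imperfect context prediction, and a constant confidence half-width.
theta_hat = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
contexts = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
width = lambda a, s: 0.1

chosen = maxmin_ucb_select(theta_hat, width, [0, 1], contexts)
```

Here arm 0's worst-case UCB (0.9) beats arm 1's (0.2), so the robust rule selects arm 0 even though no single true context is known.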
From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques
Mobile Sensing Apps have been widely used as a practical approach to collect
behavioral and health-related information from individuals and to provide timely
interventions that promote health and well-being, such as mental health and
chronic care. As the objective of mobile sensing can be either \emph{(a)
personalized medicine for individuals} or \emph{(b) public health for
populations}, in this work we review the design of these mobile sensing apps
and propose to categorize them into two paradigms --
\emph{(i) Personal Sensing} and \emph{(ii) Crowd Sensing}. While both
sensing paradigms may incorporate common ubiquitous sensing
technologies, such as wearable sensors, mobility monitoring, mobile data
offloading, and/or cloud-based data analytics, to collect and process sensing
data from individuals, we present a novel taxonomy system with three major
components that specify and classify apps/systems across the
life-cycle of mHealth Sensing: \emph{(1) Sensing Task Creation \&
Participation}, \emph{(2) Health Surveillance \& Data Collection}, and
\emph{(3) Data Analysis \& Knowledge Discovery}. With respect to the different
goals of the two paradigms, this work systematically reviews the field and
summarizes the design of typical apps/systems in view of the configurations
and interactions among these three components. Beyond summarization,
the proposed taxonomy also helps identify potential directions of
mobile sensing for health from both the personalized medicine and population
health perspectives.
Comment: Submitted to a journal for review