An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
Increasing technological sophistication and the widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode, and dose of intervention based on a user's ongoing performance and changing needs. A Just-In-Time Adaptive Intervention (JITAI) employs the real-time data collection and communication capabilities of modern mobile devices to adapt and deliver interventions in real time. Despite the increasing popularity of JITAIs, the lack of methodological guidance for constructing high-quality, data-driven JITAIs remains a hurdle to advancing JITAI research. In the first part of the dissertation, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real time as a contextual bandit problem. Under the linear reward assumption, we choose the reward-function parameterization (the "critic") separately from a lower-dimensional parameterization of stochastic JITAIs (the "actor"). We provide an online actor-critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor-critic algorithm, including the consistency, asymptotic distribution, and regret bound of the estimated optimal JITAI parameters, are derived and tested in numerical experiments. We also present numerical experiments that test the algorithm's performance when the contextual-bandit assumptions are violated. In the second part of the dissertation, we propose a statistical decision procedure that identifies whether a patient characteristic is useful for AI.
We define a discrete-valued characteristic as useful in adaptive intervention if, for some values of the characteristic, there is sufficient evidence to recommend a particular intervention, while for other values there is either sufficient evidence to recommend a different intervention or insufficient evidence to recommend any particular intervention.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133223/1/ehlei_1.pd
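The actor-critic construction described above can be illustrated with a toy sketch: a "critic" fits a linear reward model by online ridge regression, and a stochastic softmax "actor" takes one policy-gradient step per round against the critic's estimate. All names, dimensions, features, and step sizes below are illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                                      # context dimension (illustrative)

def features(s, a):
    # Joint feature vector for context s and binary action a (illustrative choice).
    return np.concatenate([s, a * s])

def policy_probs(theta, s):
    # Stochastic policy ("actor") over actions {0, 1}: softmax of theta . features.
    logits = np.array([theta @ features(s, a) for a in (0, 1)])
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

A = np.eye(2 * d)                          # ridge Gram matrix for the critic
b = np.zeros(2 * d)                        # accumulated reward-weighted features
theta = np.zeros(2 * d)                    # actor parameters
true_w = np.array([1.0, -1.0, 2.0, 0.5])   # synthetic linear-reward weights

for t in range(2000):
    s = rng.normal(size=d)                 # observe a context
    probs = policy_probs(theta, s)
    a = rng.choice(2, p=probs)             # sample an action from the actor
    x = features(s, a)
    r = true_w @ x + 0.1 * rng.normal()    # linear reward plus noise

    # Critic: online ridge-regression estimate of the reward weights.
    A += np.outer(x, x)
    b += r * x
    w_hat = np.linalg.solve(A, b)

    # Actor: one gradient-ascent step on the critic's estimated expected reward,
    # grad = sum_a pi(a|s) * (f(s,a) - E_pi[f]) * (w_hat . f(s,a)).
    feats = [features(s, act) for act in (0, 1)]
    mean_feat = probs[0] * feats[0] + probs[1] * feats[1]
    grad = sum(probs[act] * (w_hat @ feats[act]) * (feats[act] - mean_feat)
               for act in (0, 1))
    theta += 0.05 * grad
```

Under these synthetic weights, action 1 yields the higher reward for most contexts, so the learned policy comes to favor it; the centered-feature term makes the gradient baseline-invariant.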
Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks
This effort examines the behavior of reinforcement learning
systems in personalization environments and details the differences in policy
entropy associated with the type of learning algorithm used. We demonstrate
that Policy Optimization agents often possess low-entropy policies during
training, which in practice results in agents prioritizing certain actions and
avoiding others. Conversely, we also show that Q-Learning agents are far less
susceptible to such behavior and generally maintain high-entropy policies
throughout training, which is often preferable in real-world applications. We
provide a wide range of numerical experiments as well as theoretical
justification to show that these differences in entropy are due to the type of
learning being employed.
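The policy entropy this abstract contrasts is just the Shannon entropy of the policy's action distribution: it is zero for a collapsed, deterministic policy and log(|A|) for a uniform, maximally exploratory one. A minimal sketch (the example distributions are illustrative, not from the paper):

```python
import numpy as np

def policy_entropy(probs):
    # Shannon entropy H(pi) = -sum_a pi(a) log pi(a);
    # 0 for a deterministic policy, log(|A|) for a uniform one.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                         # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())

peaked = [0.97, 0.01, 0.01, 0.01]       # a policy collapsed onto one action
uniform = [0.25, 0.25, 0.25, 0.25]      # a maximally exploratory policy

low = policy_entropy(peaked)            # low entropy
high = policy_entropy(uniform)          # log(4), the maximum for 4 actions
```

In the paper's terms, the low-entropy `peaked` distribution is the behavior attributed to policy-optimization agents during training, while Q-learning agents tend to stay closer to `uniform`.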
Policy Learning for Individualized Treatment Regimes on Infinite Time Horizon
With the recent advancements of technology in facilitating real-time
monitoring and data collection, "just-in-time" interventions can be delivered
via mobile devices to achieve both real-time and long-term management and
control. Reinforcement learning formalizes such mobile interventions as a
sequence of decision rules and assigns treatment arms based on the user's
status at each decision point. In practice, applications involve a large
number of decision points beyond the time horizon of the currently collected
data, which is usually framed as reinforcement learning in the infinite-horizon
setting and becomes much more challenging. This article provides a selective
overview of statistical methodologies on this topic. We discuss their
modeling frameworks, generalizability, and interpretability, and provide some
use-case examples. We conclude by discussing future research directions.
Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-armed bandits is that the true context
is perfectly known before arm selection. Nonetheless, in many practical
applications (e.g., cloud resource management), prior to arm selection, the
context information can only be acquired by prediction subject to errors or
adversarial modification. In this paper, we study a contextual bandit setting
in which only imperfect context is available for arm selection while the true
context is revealed at the end of each round. We propose two robust arm
selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the
worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes
the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and
MinWD by deriving both regret and reward bounds compared to an oracle that
knows the true context. Our results show that, as time goes on, MaxMinUCB and
MinWD both perform asymptotically as well as their optimal counterparts that
know the reward function. Finally, we apply MaxMinUCB and MinWD to online
edge-datacenter selection and run synthetic simulations to validate our
theoretical analysis.
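The max-min idea behind MaxMinUCB can be sketched in a few lines: for each arm, evaluate an upper confidence bound at every context in the plausible set implied by the imperfect prediction, keep the worst case, and play the arm whose worst-case UCB is largest. The function and parameter names below are hypothetical, and the constant confidence width stands in for the paper's actual linear-UCB confidence sets.

```python
import numpy as np

def maxmin_ucb_select(theta_hat, conf_width, arms, contexts):
    # For each arm, take the smallest UCB (estimate + confidence width) over
    # the plausible contexts, then pick the arm maximizing that worst case.
    best_arm, best_val = None, -np.inf
    for a in arms:
        worst = min(theta_hat[a] @ s + conf_width(a, s) for s in contexts)
        if worst > best_val:
            best_arm, best_val = a, worst
    return best_arm

# Toy example: two linear arms, two candidate contexts consistent with an
# imperfect context prediction, and a constant confidence half-width.
theta_hat = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
contexts = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
width = lambda a, s: 0.1

chosen = maxmin_ucb_select(theta_hat, width, [0, 1], contexts)
```

Here arm 0's worst-case UCB (0.9) beats arm 1's (0.2), so the robust rule selects arm 0 even though no single true context is known.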
From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques
Mobile Sensing Apps have been widely used as a practical approach to collect
behavioral and health-related information from individuals and to provide timely
interventions that promote health and well-being, such as mental health and
chronic care. As the objective of mobile sensing can be either \emph{(a)
personalized medicine for individuals} or \emph{(b) public health for
populations}, in this work we review the design of these mobile sensing apps
and propose to categorize them into two paradigms --
\emph{(i) Personal Sensing} and \emph{(ii) Crowd Sensing}. While both
sensing paradigms may incorporate common ubiquitous sensing
technologies, such as wearable sensors, mobility monitoring, mobile data
offloading, and/or cloud-based data analytics, to collect and process sensing
data from individuals, we present a novel taxonomy system with three major
components that specify and classify apps/systems across the
life-cycle of mHealth Sensing: \emph{(1) Sensing Task Creation \&
Participation}, \emph{(2) Health Surveillance \& Data Collection}, and
\emph{(3) Data Analysis \& Knowledge Discovery}. With respect to the different
goals of the two paradigms, this work systematically reviews the field and
summarizes the design of typical apps/systems in view of the configurations
and interactions among these three components. Beyond summarization,
the proposed taxonomy also helps identify potential directions of
mobile sensing for health from both the personalized medicine and population
health perspectives.
Comment: Submitted to a journal for review