An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode and dose of intervention based on users' ongoing performance and changing needs. A Just-In-Time Adaptive Intervention (JITAI) employs the real-time data collection and communication capabilities that modern mobile devices provide to adapt and deliver interventions in real time. Despite the increasing popularity of JITAIs, the lack of methodological guidance for constructing high-quality, data-based JITAIs remains a hurdle in advancing JITAI research. In the first part of the dissertation, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real time as a contextual bandit problem. Under the linear reward assumption, we choose the reward function (the "critic") parameterization separately from a lower dimensional parameterization of stochastic JITAIs (the "actor"). We provide an online actor critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor critic algorithm, including consistency, the asymptotic distribution and a regret bound of the optimal JITAI parameters, are developed and tested by numerical experiments. We also present numerical experiments that test the performance of the algorithm when the contextual bandit assumptions are violated. In the second part of the dissertation, we propose a statistical decision procedure that identifies whether a patient characteristic is useful for AI.
We define a discrete-valued characteristic as useful in adaptive intervention if for some values of the characteristic, there is sufficient evidence to recommend a particular intervention, while for other values of the characteristic, either there is sufficient evidence to recommend a different intervention, or there is insufficient evidence to recommend a particular intervention.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133223/1/ehlei_1.pd
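The actor critic structure described in the first part of the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the dissertation's algorithm: the scalar context, the four reward features, the simulated environment, the ridge and penalty constants, and the function names (`features`, `prob_treat`, `rls_update`, `run_actor_critic`) are all hypothetical. The critic fits a linear reward model by recursive ridge regression, and the actor is a two-parameter logistic policy updated by gradient ascent on the estimated, penalized average reward.

```python
import math
import random

def features(s, a):
    # Assumed reward features: intercept, context, action, action-by-context.
    return [1.0, s, float(a), a * s]

def prob_treat(theta, s):
    # Actor: logistic policy pi_theta(a=1 | s) with two parameters.
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * s)))

def rls_update(P, b, x, r):
    # Critic: ridge regression updated in place via the Sherman-Morrison identity.
    d = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
    for i in range(d):
        for j in range(d):
            P[i][j] -= Px[i] * Px[j] / denom
        b[i] += r * x[i]
    # Current weight estimate w = P b.
    return [sum(P[i][j] * b[j] for j in range(d)) for i in range(d)]

def run_actor_critic(steps=500, seed=0, ridge=1.0, penalty=0.05, lr=0.5):
    rng = random.Random(seed)
    d = 4
    P = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d
    w = [0.0] * d
    theta = [0.0, 0.0]
    contexts = []
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < prob_treat(theta, s) else 0
        # Hypothetical environment: treating helps when s > 0 and hurts when s < 0.
        r = 0.5 + a * s + rng.gauss(0.0, 0.1)
        w = rls_update(P, b, features(s, a), r)
        contexts.append(s)
        # Actor step: gradient of the estimated average reward over seen contexts.
        g0 = g1 = 0.0
        for c in contexts:
            p = prob_treat(theta, c)
            adv = w[2] + w[3] * c  # estimated gain of a=1 over a=0
            g0 += p * (1.0 - p) * adv
            g1 += p * (1.0 - p) * adv * c
        n = len(contexts)
        theta[0] += lr * (g0 / n - 2.0 * penalty * theta[0])
        theta[1] += lr * (g1 / n - 2.0 * penalty * theta[1])
    return theta, w
```

Calling `run_actor_critic()` returns the learned policy parameters and the critic's weights. In this toy environment the policy should come to favor treatment for positive contexts and withhold it for negative ones, while the penalty keeps the policy stochastic.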
Reinforcement learning in real-time geometry assurance
To improve the assembly quality during production, expert systems are often used. These expert systems typically use a system model as a basis for identifying improvements. However, since a model uses approximate dynamics or imperfect parameters, the expert advice is bound to be biased. This paper presents a reinforcement learning agent that can identify and limit systematic errors of an expert system used for geometry assurance. By observing the resulting assembly quality over time, and understanding how different decisions affect the quality, the agent learns when and how to override the biased advice from the expert software.
An Overview on Application of Machine Learning Techniques in Optical Networks
Today's telecommunication networks have become sources of enormous amounts of
widely heterogeneous data. This information can be retrieved from network
traffic traces, network alarms, signal quality indicators, users' behavioral
data, etc. Advanced mathematical tools are required to extract meaningful
information from these data and take decisions pertaining to the proper
functioning of the networks from the network-generated data. Among these
mathematical tools, Machine Learning (ML) is regarded as one of the most
promising methodological approaches to perform network-data analysis and enable
automated network self-configuration and fault management. The adoption of ML
techniques in the field of optical communication networks is motivated by the
unprecedented growth of network complexity faced by optical networks in the
last few years. Such complexity increase is due to the introduction of a huge
number of adjustable and interdependent system parameters (e.g., routing
configurations, modulation format, symbol rate, coding schemes, etc.) that are
enabled by the usage of coherent transmission/reception technologies, advanced
digital signal processing and compensation of nonlinear effects in optical
fiber propagation. In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and survey relevant
literature dealing with the topic, and we also provide an introductory tutorial
on ML for researchers and practitioners interested in this field. Although a
good number of research papers have recently appeared, the application of ML to
optical networks is still in its infancy. To stimulate further work in this
area, we conclude the paper by proposing possible new research directions.
Simulator adaptation at runtime for component-based simulation software
Component-based simulation software can provide many opportunities to compose and configure simulators, resulting in an algorithm selection problem for the user of this software. This thesis aims to automate the selection and adaptation of simulators at runtime in an application-independent manner. Further, it explores the potential of tailored and approximate simulators (in this thesis concretely developed for the modeling language ML-Rules) to support the effectiveness of the adaptation scheme.
Component-based simulation software can offer many ways to compose and configure simulators, which leaves users of the software with a configuration problem. The goal of this work is to develop a generic, automated selection and adaptation method for simulators. In addition, the potential of specialized and approximate simulators is examined using the modeling language ML-Rules; such simulators can increase the effectiveness of the developed adaptation mechanism.
The art of clustering bandits.
Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithms could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. The purpose of this thesis is to introduce novel and principled algorithmic approaches to the solution of such networked bandit problems. Starting from a global (Laplacian-based) strategy which allocates a bandit algorithm to each network node (user), and allows it to "share" signals (contexts and payoffs) with the neighboring nodes, our goal is to derive and experimentally test more scalable approaches based on different ways of clustering the graph nodes. More importantly, we shall investigate the case when the graph structure is not given ahead of time, and has to be inferred based on past user behavior. A general difficulty arising in such practical scenarios is that data sequences are typically nonstationary, implying that traditional statistical inference methods should be used cautiously, possibly replacing them with more robust nonstochastic (e.g., game-theoretic) inference methods.
In this thesis, we first introduce the centralized clustering bandits. Then, we propose the corresponding solution in the decentralized scenario. After that, we explain the generic collaborative clustering bandits. Finally, we extend the state-of-the-art clustering bandits that we developed and showcase them on the quantification problem.
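The idea of letting users in the same cluster share payoff signals can be sketched as follows. This is a deliberately simplified illustration, not the thesis's method: it swaps in plain (non-contextual) UCB1, assumes the clusters are known in advance rather than inferred from behavior, and uses hypothetical names (`ClusterUCB`, `simulate`) and made-up Bernoulli payoff means.

```python
import math
import random

class ClusterUCB:
    """UCB1 statistics pooled over every user assigned to one cluster."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        # Play each arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def simulate(user_cluster, arm_means, rounds=2000, seed=0):
    # user_cluster maps user -> cluster id (assumed known here);
    # arm_means maps cluster id -> Bernoulli payoff mean per arm (hypothetical).
    rng = random.Random(seed)
    bandits = {c: ClusterUCB(len(m)) for c, m in arm_means.items()}
    users = list(user_cluster)
    for _ in range(rounds):
        user = rng.choice(users)
        cluster = user_cluster[user]
        bandit = bandits[cluster]
        arm = bandit.select()
        reward = 1.0 if rng.random() < arm_means[cluster][arm] else 0.0
        bandit.update(arm, reward)  # the signal is shared with the whole cluster
    return bandits
```

For example, `simulate({"u1": "A", "u2": "A", "u3": "B"}, {"A": [0.9, 0.1], "B": [0.1, 0.9]})` lets users `u1` and `u2` learn from each other's payoffs, so cluster `A` needs fewer pulls per user to settle on its better arm than two independent bandits would.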
Item pool quality control in educational testing: change point model, compound risk, and sequential detection
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric properties, where a change can be caused by, for example, leakage of the item or change of the corresponding curriculum. We propose a statistical framework for the detection of abrupt changes in individual items. This framework consists of (1) a multistream Bayesian change point model describing sequential changes in items, (2) a compound risk function quantifying the risk in sequential decisions, and (3) sequential decision rules that control the compound risk. Throughout the sequential decision process, the proposed decision rule balances the trade-off between two sources of error: the false detection of prechange items and the nondetection of postchange items. An item-specific monitoring statistic is proposed based on an item response theory model that eliminates the confounding from the examinee population, which changes over time. Sequential decision rules and their theoretical properties are developed under two settings: the oracle setting where the Bayesian change point model is completely known and a more realistic setting where some parameters of the model are unknown. Simulation studies are conducted under settings that mimic real operational tests.
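To make the Bayesian change point idea concrete, the sketch below tracks, for a single item, the posterior probability that a change has already occurred. It is a standard Shiryaev-type recursion under assumptions that are not taken from the article: a geometric prior on the change time with rate `rho`, Gaussian monitoring statistics with known pre-change and post-change means, and a fixed posterior threshold in place of the article's compound-risk decision rule. All function names and parameter values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_change_probs(xs, rho=0.05, mu_pre=0.0, mu_post=1.0, sd=1.0):
    """Posterior probability, after each observation, that the change has occurred."""
    p = 0.0
    probs = []
    for x in xs:
        # Prior step: the change may occur just before this observation.
        prior = p + (1.0 - p) * rho
        # Bayes step: weigh the post-change and pre-change likelihoods.
        num = prior * normal_pdf(x, mu_post, sd)
        den = num + (1.0 - prior) * normal_pdf(x, mu_pre, sd)
        p = num / den
        probs.append(p)
    return probs

def first_alarm(probs, threshold=0.9):
    """Index of the first observation whose posterior reaches the threshold."""
    for t, p in enumerate(probs):
        if p >= threshold:
            return t
    return None
```

With `xs = [0.0] * 10 + [1.5] * 10`, the posterior stays low over the first ten observations and rises quickly once the shifted values arrive. Raising `threshold` trades fewer false detections for a longer detection delay, which is the single-stream analogue of the trade-off the abstract describes.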