Gambler's Ruin Bandit Problem
In this paper, we propose a new multi-armed bandit problem called the
Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a
sequence of rounds, where each round is a Markov Decision Process (MDP) with
two actions (arms): a continuation action that moves the learner randomly over
the state space around the current state; and a terminal action that moves the
learner directly into one of the two terminal states (goal and dead-end state).
The current round ends when a terminal state is reached, and the learner incurs
a positive reward only when the goal state is reached. The objective of the
learner is to maximize its long-term reward (expected number of times the goal
state is reached), without having any prior knowledge on the state transition
probabilities. We first prove a result on the form of the optimal policy for
the GRBP. Then, we define the regret of the learner with respect to an
omnipotent oracle, which acts optimally in each round, and prove that it
increases logarithmically over rounds. We also identify a condition under which
the learner's regret is bounded. A potential application of the GRBP is optimal
medical treatment assignment, in which the continuation action corresponds to a
conservative treatment and the terminal action corresponds to a risky treatment
such as surgery.
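The round dynamics described above (a continuation action that performs a local random walk, and a terminal action that jumps straight to the goal or dead-end state) can be sketched in a few lines. The threshold-form policy, the state layout, and all parameter names below are illustrative assumptions, not the paper's exact model:

```python
import random

def run_round(p_up, p_goal, policy_threshold, n_states=10):
    """Simulate one GRBP-style round on states 0..n_states-1.

    State n_states-1 is the goal and state 0 is the dead-end (both
    terminal). The continuation action moves one step up with
    probability p_up, else one step down; the terminal action jumps
    straight to the goal with probability p_goal, else to the dead-end.
    The policy takes the terminal action whenever the current state is
    below policy_threshold (a hypothetical threshold-type rule).
    Returns 1 if the goal is reached, else 0 (the round's reward).
    """
    state = n_states // 2                 # start in the middle of the chain
    goal, dead_end = n_states - 1, 0
    while state not in (goal, dead_end):
        if state < policy_threshold:      # terminal action: risky jump
            state = goal if random.random() < p_goal else dead_end
        else:                             # continuation action: local walk
            state += 1 if random.random() < p_up else -1
    return 1 if state == goal else 0      # positive reward only at the goal
```

A learner would repeat such rounds while estimating `p_up` and `p_goal` from observed transitions, which is the quantity the regret analysis tracks.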
Inverse Prism based on Temporal Discontinuity and Spatial Dispersion
We introduce the concept of the inverse prism as the dual of the conventional
prism and deduce from this duality an implementation of it based on temporal
discontinuity and spatial dispersion provided by anisotropy. Moreover, we show
that this inverse prism exhibits the following three unique properties:
chromatic refraction birefringence, ordinary-monochromatic and extraordinary-
polychromatic temporal refraction, and linear-to-Lissajous polarization
transformation.
Two families of indexable partially observable restless bandits and Whittle index computation
We consider the restless bandits with general state space under partial
observability with two observational models: first, the state of each bandit is
not observable at all, and second, the state of each bandit is observable only
if it is chosen. We assume both models satisfy the restart property under which
we prove indexability of the models and propose the Whittle index policy as the
solution. For the first model, we derive a closed-form expression for the
Whittle index. For the second model, we propose an efficient algorithm to
compute the Whittle index by exploiting the qualitative properties of the
optimal policy. We present detailed numerical experiments for multiple
instances of a machine maintenance problem. The results indicate that the Whittle
index policy outperforms the myopic policy and can be close to optimal in different
setups.
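Under the restart property described above, the belief state of an unobserved arm admits a simple update, and the Whittle index policy reduces to activating the arms with the largest current indices. A minimal sketch, in which the restart semantics, the no-observation belief propagation, and all names are assumptions for illustration rather than the paper's exact formulation:

```python
import numpy as np

def belief_update(belief, P, chosen, restart_state=0):
    """One-step belief update for a restless arm with a restart property.

    If the arm is not chosen, its hidden state evolves under the
    transition matrix P and, with no observation, the belief is
    propagated as b' = b @ P. If the arm is chosen, the (assumed)
    restart property resets it to a known restart state, so the belief
    collapses to a point mass there.
    """
    if chosen:
        b = np.zeros(len(belief))
        b[restart_state] = 1.0        # state known exactly after restart
        return b
    return belief @ P                 # prediction step, no observation

def whittle_policy(indices, m):
    """Activate the m arms with the largest Whittle indices.

    indices: each arm's index evaluated at its current belief state.
    Returns the set of chosen arm ids (ties broken by arm id).
    """
    order = np.argsort(-indices, kind="stable")   # descending, stable
    return set(order[:m].tolist())
```

The closed-form index of the first model, or the algorithmically computed index of the second, would be plugged in as `indices` at each decision epoch.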
Approximate information state based convergence analysis of recurrent Q-learning
In spite of the large literature on reinforcement learning (RL) algorithms
for partially observable Markov decision processes (POMDPs), a complete
theoretical understanding is still lacking. In a partially observable setting,
the history of data available to the agent increases over time so most
practical algorithms either truncate the history to a finite window or compress
it using a recurrent neural network leading to an agent state that is
non-Markovian. In this paper, it is shown that in spite of the lack of the
Markov property, recurrent Q-learning (RQL) converges in the tabular setting.
Moreover, it is shown that the quality of the converged limit depends on the
quality of the representation which is quantified in terms of what is known as
an approximate information state (AIS). Based on this characterization of the
approximation error, a variant of RQL with AIS losses is presented. This
variant performs better than a strong baseline for RQL that does not use AIS
losses. It is demonstrated that there is a strong correlation between the
performance of RQL over time and the loss associated with the AIS
representation.
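A simple stand-in for the recurrent agent state is a finite window of recent observations, which is non-Markovian in exactly the sense discussed above. A minimal tabular sketch under that assumption; the environment interface, the toy environment, and all names are hypothetical, not the paper's setup:

```python
import random
from collections import defaultdict, deque

def window_q_learning(env_step, env_reset, n_actions, k=3,
                      episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning on a finite-window agent state for a POMDP.

    The agent state is the tuple of the last k observations, a crude
    stand-in for a recurrent network's hidden state in RQL.
    env_reset() -> first observation; env_step(a) -> (obs, reward, done).
    """
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        window = deque([env_reset()], maxlen=k)
        s = tuple(window)
        done = False
        while not done:
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda i: Q[s][i]))
            obs, r, done = env_step(a)
            window.append(obs)
            s2 = tuple(window)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])   # standard TD update
            s = s2
    return Q

class OneShotEnv:
    """Toy one-step environment: action 1 pays reward 1, action 0 pays 0."""
    def reset(self):
        return 0                       # single dummy observation
    def step(self, a):
        return 0, float(a), True       # obs, reward, episode ends
```

Convergence of such a scheme holds despite the non-Markovian state, with the limit's quality governed by the AIS approximation error of the window representation.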
The Translation and Psychometric Evaluation of the Persian Version of the Neuropsychological Vertigo Inventory
Background and Aim: Experiencing dizziness/vertigo is often an indication of dysfunction in the vestibular system. Recent findings show a connection between peripheral vestibular dysfunction and cognitive impairments. The Neuropsychological Vertigo Inventory (NVI) can assess physical, emotional, and cognitive issues in individuals with dizziness/vertigo. The aim of this research was to translate the NVI into Persian, adapt it culturally, and evaluate its reliability and validity.
Methods: In this descriptive-analytical study, the NVI scale was translated and adapted to the Iranian cultural context following the international quality of life assessment protocol for translation and equivalence. After face validity was established, the scale was administered to 140 patients with peripheral vestibular system dysfunction and 70 controls (aged between 25 and 80 years). After one week, 50 participants were asked to complete the questionnaire again. Finally, reliability was evaluated using both internal consistency and test-retest reproducibility.
Results: Of the 28 items in the NVI scale, 3 were modified to better align with Iranian cultural conditions. The impact scores for most items were higher than 1.5. Cronbach’s alpha for the overall scale was 0.90, and test-retest reliability, assessed with the intra-class correlation coefficient for the overall scale, was 0.91.
Conclusion: The Persian version of the NVI scale demonstrates excellent validity and reliability, and it exhibits a high level of content alignment with the original version. Therefore, it can be a useful tool for better understanding the physical, emotional, and cognitive disturbances in patients with vertigo/dizziness.
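The internal-consistency figure quoted above follows the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). A minimal sketch of that computation on synthetic data, not the study's actual analysis code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
    with sample variances (ddof=1). Values near 1 indicate that the
    items measure a common construct consistently.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # per-item spread
    total_var = scores.sum(axis=1).var(ddof=1)    # spread of total score
    return k / (k - 1) * (1.0 - item_var / total_var)
```

Perfectly correlated items give alpha = 1, while items whose covariance is zero drive alpha toward 0, which is why the reported 0.90 counts as excellent internal consistency.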
The global burden of cancer attributable to risk factors, 2010-19 : a systematic analysis for the Global Burden of Disease Study 2019
Background Understanding the magnitude of cancer burden attributable to potentially modifiable risk factors is crucial for development of effective prevention and mitigation strategies. We analysed results from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 to inform cancer control planning efforts globally. Methods The GBD 2019 comparative risk assessment framework was used to estimate cancer burden attributable to behavioural, environmental and occupational, and metabolic risk factors. A total of 82 risk-outcome pairs were included on the basis of the World Cancer Research Fund criteria. Estimated cancer deaths and disability-adjusted life-years (DALYs) in 2019 and change in these measures between 2010 and 2019 are presented. Findings Globally, in 2019, the risk factors included in this analysis accounted for 4.45 million (95% uncertainty interval 4.01-4.94) deaths and 105 million (95.0-116) DALYs for both sexes combined, representing 44.4% (41.3-48.4) of all cancer deaths and 42.0% (39.1-45.6) of all DALYs. There were 2.88 million (2.60-3.18) risk-attributable cancer deaths in males (50.6% [47.8-54.1] of all male cancer deaths) and 1.58 million (1.36-1.84) risk-attributable cancer deaths in females (36.3% [32.5-41.3] of all female cancer deaths). The leading risk factors at the most detailed level globally for risk-attributable cancer deaths and DALYs in 2019 for both sexes combined were smoking, followed by alcohol use and high BMI. Risk-attributable cancer burden varied by world region and Socio-demographic Index (SDI), with smoking, unsafe sex, and alcohol use being the three leading risk factors for risk-attributable cancer DALYs in low SDI locations in 2019, whereas DALYs in high SDI locations mirrored the top three global risk factor rankings. 
From 2010 to 2019, global risk-attributable cancer deaths increased by 20.4% (12.6-28.4) and DALYs by 16.8% (8.8-25.0), with the greatest percentage increase in metabolic risks (34.7% [27.9-42.8] and 33.3% [25.8-42.0]). Interpretation The leading risk factors contributing to global cancer burden in 2019 were behavioural, whereas metabolic risk factors saw the largest increases between 2010 and 2019. Reducing exposure to these modifiable risk factors would decrease cancer mortality and DALY rates worldwide, and policies should be tailored appropriately to local cancer risk factor burden.
Online learning in Markov decision processes with special structure
Thesis (M.S.): İhsan Doğramacı Bilkent University, Department of Electrical and Electronics Engineering, 2017. This thesis proposes three new multi-armed bandit problems, in which the learner
proceeds in a sequence of rounds where each round is a Markov Decision Process
(MDP). The learner's goal is to maximize its cumulative reward without any a
priori knowledge on the state transition probabilities. The first problem considers
an MDP with sorted states and a continuation action that moves the learner to an
adjacent state; and a terminal action that moves the learner to a terminal state
(goal or dead-end state). In this problem, a round ends and the next round starts
when a terminal state is reached, and the aim of the learner in each round is to
reach the goal state. First, the structure of the optimal policy is derived. Then,
the regret of the learner with respect to an oracle, who takes optimal actions in
each round is defined, and a learning algorithm that exploits the structure of the
optimal policy is proposed. Finally, it is shown that the regret either increases
logarithmically over rounds or becomes bounded. In the second problem, we
investigate the personalization of a clinical treatment. This process is modeled
as a goal-oriented MDP with dead-end states. Moreover, the state transition
probabilities of the MDP depend on the context of the patients. An algorithm
that uses the principle of optimism in the face of uncertainty is proposed to maximize the
number of rounds in which the goal state is reached. In the third problem, we
propose an online learning algorithm for optimal execution in the limit order book
of a financial asset. Given a certain number of shares to sell and an allocated time to complete the transaction, the proposed algorithm dynamically learns the
optimal number of shares to sell at each time slot of the allocated time. We model
this problem as an MDP and derive the form of the optimal policy. By Nima Akbarzadeh, M.S.
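The principle of optimism in the face of uncertainty used in the second problem can be illustrated with a UCB1-style optimistic estimate of an action's unknown success probability; the bonus form and constants below are standard-textbook choices taken as assumptions, not the thesis's exact algorithm:

```python
import math

def optimistic_estimate(successes, trials, t, c=2.0):
    """Optimistic estimate of a success probability after `trials` tries.

    Returns the empirical mean plus a confidence bonus that shrinks as
    the action is tried more often; an untried action gets the most
    optimistic value 1. The bonus c * sqrt(log t / trials) is the
    classic UCB1-style form, shown here purely for illustration.
    """
    if trials == 0:
        return 1.0                    # maximally optimistic when untried
    bonus = c * math.sqrt(math.log(max(t, 2)) / trials)
    return min(1.0, successes / trials + bonus)
```

Acting greedily with respect to such inflated estimates forces under-explored actions to be tried, which is what drives the logarithmic-or-bounded regret behaviour described above.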