Towards Thompson Sampling for Complex Bayesian Reasoning
Papers III, IV, and VI are not available as part of the dissertation due to copyright. Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that it could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found.
The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective: stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Rather than employing Bayesian reasoning to obtain a solution, the existing schemes rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework and introduce dedicated TS-based solution schemes. For all of the addressed problems, the results show that, besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
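As an illustration of the Beta-Bernoulli setting that plain TS is built on, a minimal sketch follows (not one of the thesis's schemes; the arm means, the horizon, and the `thompson_sampling` helper are all illustrative assumptions):

```python
import random

def thompson_sampling(true_means, horizon=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling for a K-armed bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # Beta(1, 1) uniform prior per arm
    failures = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the argmax.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

With two arms of (assumed) means 0.2 and 0.8, the posterior mass, and hence the pulls, quickly concentrates on the better arm.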
Reinforcement Learning in Education: A Multi-Armed Bandit Approach
Advances in reinforcement learning research have demonstrated the ways in
which different agent-based models can learn to perform a task optimally
within a given environment. Reinforcement learning addresses problems
without explicit supervision, in which agents move through a
state-action-reward loop to maximize the overall reward, which in turn
optimizes the solving of a specific problem in a given environment.
However, these algorithms are designed based on our understanding of the
actions that should be taken in a real-world environment to solve a
specific problem. One such problem is the ability to identify, recommend,
and execute an action within a system where the users are the subject, as
in education. In recent years, the use of blended learning approaches,
integrating face-to-face learning with online learning in the education
context, has increased. Additionally, online platforms used for education
require the automation of certain functions, such as the identification,
recommendation, or execution of actions that can benefit the user, in this
sense the student or learner. As promising as these scientific advances
are, there is still a need for research in a variety of areas to ensure
the successful deployment of these agents within education systems.
Therefore, the aim of this study was to contextualise and simulate the
cumulative reward within an environment for an intervention recommendation
problem in the education context.
Comment: 17 pages, 6 figures, 1 table, EAI AFRICATEK 2022 Conference
Bayesian Optimization for Partially Overlapping Covariate Data Sources
One problem in real-world industrial processes is how to utilize diverse information on best practices from different data sources. It becomes more complicated when those best practices differ, though not entirely, from each other. The goal is to find the optimal best practices from those diverse and somewhat different data.
This problem has been formulated as finding the optimal parameter settings in diverse, partially overlapping covariate data sources. First, the data from the different sources are stacked row-wise to form a master data set with missing entries. Then, Bayesian Optimization with Missing Inputs is employed to find the optimal experimental parameter settings.
Different methods of modeling the missing data are tested, such as Bayesian Non-negative Matrix Factorization (BNMF) and Bayesian Probabilistic Matrix Factorization (BPMF). Both provide a quality representation of the missing data, allowing the Bayesian Optimization algorithm to work. The BPMF-based methods perform significantly better than the BNMF-based methods; however, BNMF-based methods are helpful in some specific cases due to the structure of the missing data set.
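As a rough, non-Bayesian stand-in for what BPMF/BNMF do here, the sketch below fills the missing entries of a stacked master matrix by low-rank factorization fitted on the observed entries only (the `complete_matrix` helper, rank, and step sizes are illustrative assumptions, not the paper's method):

```python
import numpy as np

def complete_matrix(X, rank=2, steps=2000, lr=0.01, reg=0.1, seed=0):
    """Fill missing entries (NaN) of X via low-rank factorization X ~ U @ V.T,
    fitted by gradient descent on the observed entries only."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)
    n, m = X.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    X0 = np.nan_to_num(X)  # zero out the NaNs; mask keeps them out of the loss
    for _ in range(steps):
        E = (U @ V.T - X0) * mask  # reconstruction error on observed entries
        U -= lr * (E @ V + reg * U)
        V -= lr * (E.T @ U + reg * V)
    # Keep observed values as-is; impute only where data were missing.
    return np.where(mask, X, U @ V.T)
```

On a matrix with exact low-rank structure the missing entry is recovered up to optimization error, which is the property the downstream Bayesian Optimization step relies on.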
Multi-armed bandit algorithms are used to tackle a budget constraint on the parameter settings tested in each iteration. Both ε-greedy and UCB1 were evaluated. ε-greedy can occasionally give better results because of its randomness, whereas UCB1 consistently improves its performance over iterations.
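Of the two bandit rules mentioned, UCB1 can be sketched as follows (a generic textbook implementation under assumed Bernoulli rewards, not the paper's exact setup):

```python
import math
import random

def ucb1(true_means, horizon=3000, seed=0):
    """UCB1: play the arm maximizing mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    total = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total
```

The confidence bonus shrinks as an arm is sampled, which is why UCB1's performance improves steadily rather than by chance.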
This work proposes a framework to utilize the information from partially overlapping data sources to find the parameter settings that yield a maximum return. It benefits a wide range of real-world industrial production processes and opens exciting research directions.
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have substantial potential to support a broad
range of complex, compelling applications in both military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have achieved great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to apply
them to hitherto unexplored services and scenarios of future wireless
networks.
Comment: 46 pages, 22 figures
Gradient-free Online Learning in Games with Delayed Rewards
Motivated by applications to online advertising and recommender systems, we
consider a game-theoretic model with delayed rewards and asynchronous,
payoff-based feedback. In contrast to previous work on delayed multi-armed
bandits, we focus on multi-player games with continuous action spaces, and we
examine the long-run behavior of strategic agents that follow a no-regret
learning policy (but are otherwise oblivious to the game being played, the
objectives of their opponents, etc.). To account for the lack of a consistent
stream of information (for instance, rewards can arrive out of order, with an a
priori unbounded delay, etc.), we introduce a gradient-free learning policy
where payoff information is placed in a priority queue as it arrives. In this
general context, we derive new bounds for the agents' regret; furthermore,
under a standard diagonal concavity assumption, we show that the induced
sequence of play converges to Nash equilibrium with probability 1, even if
the delay between choosing an action and receiving the corresponding reward is
unbounded.
Comment: 26 pages, 4 figures; to appear in ICML 202
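The priority-queue idea can be illustrated in miniature: payoffs arrive with arbitrary delays and are pushed into a heap as they land, then drained in play order (a toy sketch of the bookkeeping only, not the paper's learning policy; all names are hypothetical):

```python
import heapq
import random

def delayed_feedback_stream(rounds=20, max_delay=5, seed=0):
    """Simulate payoffs arriving out of order: the reward for the action
    played at round t is delivered at round t + delay."""
    rng = random.Random(seed)
    arrivals = [
        (t + rng.randrange(max_delay + 1), t, rng.random())  # (arrival, played, payoff)
        for t in range(rounds)
    ]
    arrivals.sort()  # sort by delivery time, i.e. arrival order
    return arrivals

def process_in_order(arrivals):
    """Push each payoff into a priority queue keyed by the round its action
    was played, and drain contiguously so payoffs are consumed in play order."""
    heap = []
    processed = []
    next_round = 0
    for _arrival_time, played_round, payoff in arrivals:
        heapq.heappush(heap, (played_round, payoff))
        # Drain every payoff whose turn has come.
        while heap and heap[0][0] == next_round:
            processed.append(heapq.heappop(heap))
            next_round += 1
    return processed
```

Because every delay is finite, the queue eventually releases all payoffs, and the learner sees them in the order the actions were chosen.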
Interpretability of AI in Computer Systems and Public Policy
Advances in Artificial Intelligence (AI) have led to spectacular innovations and sophisticated systems for tasks that were thought to be achievable only by humans. Examples include playing chess and Go, face and voice recognition, driving vehicles, and more. In recent years, the impact of AI has moved beyond offering mere predictive models into building interpretable models that appeal to human logic and intuition because they ensure transparency and simplicity and can be used to make meaningful decisions in real-world applications. A second trend in AI is characterized by important advancements in the realm of causal reasoning. Identifying causal relationships is an important aspect of scientific endeavors in a variety of fields. Causal models and Bayesian inference can help us gain better domain-specific insight and make better data-driven decisions because of their interpretability. The main objective of this dissertation was to adapt theoretically sound AI-based interpretable data-analytic approaches to solve domain-specific problems in the two unrelated fields of Storage Systems and Public Policy. For the first task, we considered the well-studied cache replacement problem in computing systems, which can be modeled as a variant of the well-known Multi-Armed Bandit (MAB) problem with delayed feedback and decaying costs, and developed an algorithm called EXP4-DFDC. We proved theoretically that EXP4-DFDC exhibits an important feature called vanishing regret. Based on the theoretical analysis, we designed a machine-learning algorithm called ALeCaR, with adaptive hyperparameters. We used extensive experiments on a wide range of workloads to show that ALeCaR performed better than LeCaR, the best machine learning algorithm for cache replacement at that time. We concluded that reinforcement learning can offer an outstanding approach for implementing cache management policies.
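EXP4-DFDC itself is not reproduced here, but it belongs to the exponential-weights family; a generic EXP3-style update with importance-weighted reward estimates (a textbook sketch under assumed rewards in [0, 1], not the authors' algorithm) looks like:

```python
import math
import random

def exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """EXP3 exponential-weights bandit, the family EXP4-DFDC extends.
    reward_fn(arm) must return a reward in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * k
    total = 0.0
    for _ in range(horizon):
        wsum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / wsum + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(arm)
        total += reward
        # Importance-weighted estimate keeps the update unbiased.
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / k)
        # Renormalize to avoid floating-point overflow over long horizons.
        m = max(weights)
        weights = [w / m for w in weights]
    return weights, total
```

The exploration mixture lower-bounds every arm's probability, which caps the importance weights and is what makes the regret analysis go through.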
For the second task, we used Bayesian networks to analyze the service request data from three 311 centers providing non-emergency services in the cities of Miami-Dade, New York City, and San Francisco. Using a causal inference approach, this study investigated the presence of inequities in the quality of the 311 services to neighborhoods with varying demographics and socioeconomic status. We concluded that the services provided by the local governments showed no detectable biases on the basis of race, ethnicity, or socioeconomic status.