398 research outputs found

    Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits

    Full text link
    We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To this end, we design and analyze the BoBW-lil'UCB(γ)(\gamma) algorithm. Complementarily, by establishing lower bounds on the regret achievable by any algorithm with a given BAI failure probability, we show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives, and (ii) BoBW-lil'UCB(γ)(\gamma) achieves order-wise optimal performance for RM or BAI under different values of γ\gamma. Our work elucidates the trade-off more precisely by showing how the constants in previous works depend on certain hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close competitor UCBα_\alpha (Degenne et al., 2019) in terms of the time complexity and the regret on diverse datasets such as MovieLens and Published Kinase Inhibitor Set.Comment: 43 pages, 10 figure

    Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

    Full text link
    This paper considers a stochastic multi-armed bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of TT consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces \emph{Regret Optimal Best Arm Identification} (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present the EOCP\mathsf{EOCP} algorithm and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in O(logT)\mathcal{O}(\log T) rounds with pre-determined stopping time and O(log2T)\mathcal{O}(\log^2 T) rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to sample complexity) of ROBAI, showing that EOCP\mathsf{EOCP} and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting ``over-exploration'' phenomenon carried by classic UCB\mathsf{UCB} algorithms, such that EOCP\mathsf{EOCP} has smaller regret even though it stops exploration much earlier than UCB\mathsf{UCB} (O(logT)\mathcal{O}(\log T) versus O(T)\mathcal{O}(T)), which suggests over-exploration is unnecessary and potentially harmful to system performance

    Pareto Front Identification with Regret Minimization

    Full text link
    We consider Pareto front identification for linear bandits (PFILin) where the goal is to identify a set of arms whose reward vectors are not dominated by any of the others when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is O~(d/Δ2)\tilde{O}(d/\Delta^2), where dd is the dimension of contexts and Δ\Delta is a measure of problem complexity. Our sample complexity is optimal up to a logarithmic factor. A novel feature of our algorithm is that it uses the contexts of all actions. In addition to efficiently identifying the Pareto front, our algorithm also guarantees O~(d/t)\tilde{O}(\sqrt{d/t}) bound for instantaneous Pareto regret when the number of samples is larger than Ω(dlogdL)\Omega(d\log dL) for LL dimensional vector rewards. By using the contexts of all arms, our proposed algorithm simultaneously provides efficient Pareto front identification and regret minimization. Numerical experiments demonstrate that the proposed algorithm successfully identifies the Pareto front while minimizing the regret.Comment: 25 pages including appendi

    On Experimentation in Software-Intensive Systems

    Get PDF
    Context: Delivering software that has value to customers is a primary concern of every software company. Prevalent in web-facing companies, controlled experiments are used to validate and deliver value in incremental deployments. At the same that web-facing companies are aiming to automate and reduce the cost of each experiment iteration, embedded systems companies are starting to adopt experimentation practices and leverage their activities on the automation developments made in the online domain. Objective: This thesis has two main objectives. The first objective is to analyze how software companies can run and optimize their systems through automated experiments. This objective is investigated from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process. The second objective is to analyze how non web-facing companies can adopt experimentation as part of their development process to validate and deliver value to their customers continuously. This objective is investigated from the perspectives of the software development process and focuses on the experimentation aspects that are distinct from web-facing companies. Method: To achieve these objectives, we conducted research in close collaboration with industry and used a combination of different empirical research methods: case studies, literature reviews, simulations, and empirical evaluations. Results: This thesis provides six main results. First, it proposes an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, it proposes a new experimentation process to capture the details of a trustworthy experimentation process that can be used as the basis for an automated experimentation process. Third, it identifies the restrictions and pitfalls of different multi-armed bandit algorithms for automating experiments in industry. This thesis also proposes a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls. Fourth, it proposes statistical models to analyze optimization algorithms that can be used in automated experimentation. Fifth, it identifies the key challenges faced by embedded systems companies when adopting controlled experimentation, and we propose a set of strategies to address these challenges. Sixth, it identifies experimentation techniques and proposes a new continuous experimentation model for mission-critical and business-to-business. Conclusion: The results presented in this thesis indicate that the trustworthiness in the experimentation process and the selection of algorithms still need to be addressed before automated experimentation can be used at scale in industry. The embedded systems industry faces challenges in adopting experimentation as part of its development process. In part, this is due to the low number of users and devices that can be used in experiments and the diversity of the required experimental designs for each new situation. This limitation increases both the complexity of the experimentation process and the number of techniques used to address this constraint

    How Technology Impacts and Compares to Humans in Socially Consequential Arenas

    Full text link
    One of the main promises of technology development is for it to be adopted by people, organizations, societies, and governments -- incorporated into their life, work stream, or processes. Often, this is socially beneficial as it automates mundane tasks, frees up more time for other more important things, or otherwise improves the lives of those who use the technology. However, these beneficial results do not apply in every scenario and may not impact everyone in a system the same way. Sometimes a technology is developed which produces both benefits and inflicts some harm. These harms may come at a higher cost to some people than others, raising the question: {\it how are benefits and harms weighed when deciding if and how a socially consequential technology gets developed?} The most natural way to answer this question, and in fact how people first approach it, is to compare the new technology to what used to exist. As such, in this work, I make comparative analyses between humans and machines in three scenarios and seek to understand how sentiment about a technology, performance of that technology, and the impacts of that technology combine to influence how one decides to answer my main research question.Comment: Doctoral thesis proposal. arXiv admin note: substantial text overlap with arXiv:2110.08396, arXiv:2108.12508, arXiv:2006.1262

    Kernel Methods for Learning with Limited Labeled Data

    Full text link
    Machine learning is a rapidly developing technology that enables a system to automatically learn and improve from experience. Modern machine learning algorithms have achieved state-of-the-art performances on a variety of tasks such as speech recognition, image classification, machine translation, playing games like Go, Dota 2, etc. However, one of the biggest challenges in applying these machine learning algorithms in the real world is that they require huge amount of labeled data for the training. In the real world, the amount of labeled training data is often limited. In this thesis, we address three challenges in learning with limited labeled data using kernel methods. In our first contribution, we provide an efficient way to solve an existing domain generalization algorithm and extend the theoretical analysis to multiclass classification. As a second contribution, we propose a multi-task learning framework for contextual bandit problems. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. Our third contribution is to provide a simple regret guarantee (best policy identification) in a contextual bandits setup. Our experiments examine a novel application to adaptive sensor selection for magnetic field estimation in interplanetary spacecraft and demonstrate considerable improvements of our algorithm over algorithms designed to minimize the cumulative regret.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/149810/1/aniketde_1.pd

    Towards Automated Experiments in Software Intensive Systems

    Get PDF
    Context: Delivering software that has value to customers is a primary concern of every software company. One of the techniques to continuously validate and deliver value in online software systems is the use of controlled experiments. The time cost of each experiment iteration, the increasing growth in the development organization to run experiments and the need for a more automated and systematic approach is leading companies to look for different techniques to automate the experimentation process. Objective: The overall objective of this thesis is to analyze how to automate different types of experiments and how companies can support and optimize their systems through automated experiments. This thesis explores the topic of automated online experiments from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process, and focuses on two main application domains: the online and the embedded systems domain. Method: To achieve the objective, we conducted this research in close collaboration with industry using a combination of different empirical research methods: case studies, literature reviews, simulations and empirical evaluations. Results and conclusions: This thesis provides five main results. First, we propose an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, we identify the key challenges faced by embedded systems companies when adopting controlled experimentation and we propose a set of strategies to address these challenges. Third, we develop a new algorithm for online experiments. Fourth, we identify restrictions and pitfalls of different algorithms for automating experiments in industry and we propose a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls. Fifth, we propose a new experimentation process to capture the details of a trustworthy experimentation process that can be used as basis for an automated experimentation process. Future work: In future work, we plan to investigate how embedded systems can incorporate experiments in their development process without compromising existing real-time and safety requirements. We also plan to analyze the impact and costs of automating the different parts of the experimentation process

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF
    corecore