Online decision problems with large strategy sets
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2005. Includes bibliographical references (p. 165-171). In an online decision problem, an algorithm performs a sequence of trials, each of which involves selecting one element from a fixed set of alternatives (the "strategy set") whose costs vary over time. After T trials, the combined cost of the algorithm's choices is compared with that of the single strategy whose combined cost is minimum. Their difference is called regret, and one seeks algorithms which are efficient in that their regret is sublinear in T and polynomial in the problem size. We study an important class of online decision problems called generalized multi-armed bandit problems. In the past such problems have found applications in areas as diverse as statistics, computer science, economic theory, and medical decision-making. Most existing algorithms were efficient only in the case of a small (i.e. polynomial-sized) strategy set. We extend the theory by supplying non-trivial algorithms and lower bounds for cases in which the strategy set is much larger (exponential or infinite) and the cost function class is structured, e.g. by constraining the cost functions to be linear or convex. As applications, we consider adaptive routing in networks, adaptive pricing in electronic markets, and collaborative decision-making by untrusting peers in a dynamic environment. By Robert David Kleinberg. Ph.D.
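The regret defined in the abstract above can be illustrated with a minimal sketch. It assumes a finite strategy set and a fully known cost table, which is a simplification for illustration only; the names `regret`, `costs`, and `choices` are hypothetical and do not come from the thesis.

```python
from typing import Sequence


def regret(costs: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Regret after T trials against the best single fixed strategy.

    costs[t][i] is the cost of strategy i at trial t (assumed known here,
    purely for illustration); choices[t] is the strategy picked at trial t.
    """
    if len(costs) != len(choices):
        raise ValueError("costs and choices must cover the same trials")
    if not costs:
        return 0.0
    num_strategies = len(costs[0])
    # Combined cost of the choices the algorithm actually made.
    algorithm_cost = sum(costs[t][choices[t]] for t in range(len(costs)))
    # Combined cost of the single strategy that is cheapest in hindsight.
    best_fixed_cost = min(
        sum(row[i] for row in costs) for i in range(num_strategies)
    )
    return algorithm_cost - best_fixed_cost


# Hypothetical data: two strategies, four trials.
costs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
choices = [0, 1, 1, 0]
example_regret = regret(costs, choices)
```

Here the algorithm pays a total of 3.0 while always playing strategy 1 would have cost 1.0, so the regret is 2.0. An algorithm is efficient in the abstract's sense when this quantity grows sublinearly in the number of trials T.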
Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
A Multi-Agent Cooperative Learning (MACL) system is an artificial
intelligence (AI) system where multiple learning agents work together to
complete a common task. Recent empirical success of MACL systems in various
domains (e.g. traffic control, cloud computing, robotics) has sparked active
research into the design and analysis of MACL systems for sequential decision
making problems. One important metric of the learning algorithm for decision
making problems is its regret, i.e. the difference between the highest
achievable reward and the actual reward that the algorithm gains. The design
and development of a MACL system with low-regret learning algorithms can create
substantial economic value. In this thesis, I analyze MACL systems for different
sequential decision making problems. Concretely, Chapters 3 and 4
investigate the cooperative multi-agent multi-armed bandit problems, with
full-information or bandit feedback, in which multiple learning agents can
exchange their information through a communication network and the agents can
only observe the rewards of the actions they choose. Chapter 5 considers the
communication-regret trade-off for online convex optimization in the
distributed setting. Chapter 6 discusses how to form highly productive teams of
agents based on their unknown but fixed types using adaptive incremental
matchings. For the above problems, I present regret lower bounds for
feasible learning algorithms and provide efficient algorithms that achieve
these bounds. The regret bounds I present in Chapters 3, 4 and 5 quantify how the
regret depends on the connectivity of the communication network and the
communication delay, thus giving useful guidance on the design of the communication
protocol in MACL systems.
Comment: Thesis submitted to London School of Economics and Political Science
for PhD in Statistics
Abstractions in Reasoning for Long-Term Autonomy
The path to building adaptive, robust, intelligent agents has led researchers to develop a suite of powerful models and algorithms for agents with a single objective. However, in recent years, attempts to use this monolithic approach to solve an ever-expanding set of complex real-world problems, which increasingly include long-term autonomous deployments, have illuminated challenges in its ability to scale. Consequently, a fragmented collection of hierarchical and multi-objective models were developed. This trend continues into the algorithms as well, as each approximates an optimal solution in a different manner for scalability. These models and algorithms represent an attempt to solve pieces of an overarching problem: how can an agent explicitly model and integrate the necessary aspects of reasoning required to achieve long-term autonomy?
This thesis presents a general hierarchical and multi-objective model called a policy network that unifies prior fragmented solutions into a single graphical decision-making structure. Policy networks are broadly useful to solve numerous real-world problems. This thesis focuses on autonomous vehicle (AV) problems: (1) route-planning with multiple objectives; (2) semi-autonomy with proactive transfer of control; and (3) intersection decision-making for reasoning online about any number of other vehicles and pedestrians. Formal models are presented for each of the distinct problems. Solutions are evaluated using real-world map data in simulation and demonstrated on a fully operational AV prototype driving on real public roads. Policy networks serve as a shared underlying framework for all three, enabling their seamless integration as parts of an overall solution for rich, real-world, scalable decision-making in agents with long-term autonomy
Meta-RaPS Hybridization with Machine Learning Algorithms
This dissertation focuses on advancing the Metaheuristic for Randomized Priority Search algorithm, known as Meta-RaPS, by integrating it with machine learning algorithms. Introducing a new metaheuristic algorithm starts with demonstrating its performance. This is accomplished by using the new algorithm to solve various combinatorial optimization problems in their basic form. The next stage focuses on advancing the new algorithm by strengthening its relatively weaker characteristics. In the third traditional stage, the algorithms are exercised in solving more complex optimization problems. In the case of effective algorithms, the second and third stages can occur in parallel as researchers are eager to employ good algorithms to solve complex problems. The third stage can inadvertently strengthen the original algorithm. The simplicity and effectiveness Meta-RaPS enjoys place it in both the second and third research stages concurrently. This dissertation explores strengthening Meta-RaPS by incorporating memory and learning features. The major conceptual frameworks that guided this work are the Adaptive Memory Programming framework (or AMP) and the metaheuristic hybridization taxonomy. The concepts from both frameworks are followed when identifying useful information that Meta-RaPS can collect during execution. Hybridizing Meta-RaPS with machine learning algorithms helped in transforming the collected information into knowledge. The learning concepts selected are supervised and unsupervised learning. The algorithms selected to achieve both types of learning are the Inductive Decision Tree (supervised learning) and Association Rules (unsupervised learning). The objective behind hybridizing Meta-RaPS with an Inductive Decision Tree algorithm is to perform online control for Meta-RaPS' parameters. This Inductive Decision Tree algorithm is used to find favorable parameter values using knowledge gained from previous Meta-RaPS iterations. The values selected are used in future Meta-RaPS iterations. The objective behind hybridizing Meta-RaPS with an Association Rules algorithm is to identify patterns associated with good solutions. These patterns are considered knowledge and are inherited as starting points for future Meta-RaPS iterations. The performance of the hybrid Meta-RaPS algorithms is demonstrated by solving the capacitated Vehicle Routing Problem with and without time windows.
Multiobjective synchronization of coupled systems
Copyright @ 2011 American Institute of Physics. Synchronization of coupled chaotic systems has been a subject of great interest and importance, both in theory and in various fields of application, such as secure communication and neuroscience. Recently, based on stability theory, synchronization of coupled chaotic systems by designing appropriate coupling has been widely investigated. However, almost all the available results have focused on ensuring the synchronization of coupled chaotic systems with coupling strengths that are as small as possible. In this contribution, we study multiobjective synchronization of coupled chaotic systems by considering two objectives in parallel, namely optimizing both the coupling strength and the convergence speed. The coupling form and coupling strength are optimized by an improved multiobjective evolutionary approach. The constraints on the coupling form are also investigated by formulating the problem as a constrained multiobjective problem. We find that the proposed evolutionary method can outperform the conventional adaptive strategy in several respects. The results presented in this paper can be extended to nonlinear time-series analysis and synchronization of complex networks, and have various applications.
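Treating two objectives in parallel, as the abstract above describes, usually means comparing candidates by Pareto dominance rather than by a single score. The sketch below shows only that comparison. It assumes each candidate coupling is summarized by a hypothetical pair (coupling strength, convergence time), both to be minimized; the paper's actual objective definitions and its evolutionary algorithm are not given in the abstract and are not reproduced here.

```python
from typing import List, Tuple

# Assumed encoding: (coupling_strength, convergence_time), both minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate couplings, for illustration only.
candidates = [(0.2, 9.0), (0.5, 4.0), (0.6, 5.0), (1.0, 1.5), (1.2, 1.5)]
front = pareto_front(candidates)
```

In this made-up data, (0.6, 5.0) is dominated by (0.5, 4.0) and (1.2, 1.5) by (1.0, 1.5), so three trade-off candidates remain. A multiobjective evolutionary method typically maintains such a non-dominated set across generations.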