Truth and Regret in Online Scheduling
We consider a scheduling problem where a cloud service provider has multiple
units of a resource available over time. Selfish clients submit jobs, each with
an arrival time, deadline, length, and value. The service provider's goal is to
implement a truthful online mechanism for scheduling jobs so as to maximize the
social welfare of the schedule. Recent work shows that under a stochastic
assumption on job arrivals, there is a single-parameter family of mechanisms
that achieves near-optimal social welfare. We show that given any such family
of near-optimal online mechanisms, there exists an online mechanism that in the
worst case performs nearly as well as the best of the given mechanisms. Our
mechanism is truthful whenever the mechanisms in the given family are truthful
and prompt, and achieves optimal (within constant factors) regret.
We model the problem of competing against a family of online scheduling
mechanisms as one of learning from expert advice. A primary challenge is that
any scheduling decisions we make affect not only the payoff at the current
step, but also the resource availability and payoffs in future steps.
Furthermore, switching from one algorithm (a.k.a. expert) to another in an
online fashion is challenging both because it requires synchronization with the
state of the latter algorithm and because it affects the incentive
structure of the algorithms. We further show how to adapt our algorithm to a
non-clairvoyant setting where job lengths are unknown until jobs are run to
completion. Once again, in this setting, we obtain truthfulness along with
asymptotically optimal regret (within poly-logarithmic factors).
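As a rough illustration of the "learning from expert advice" framing, the following is a minimal sketch of the standard Hedge (multiplicative weights) rule, not the mechanism from the abstract. It assumes stateless experts with payoffs in [0, 1], so it ignores exactly the difficulties the abstract highlights (shared resource state, synchronization on switching, and incentives). The function name `hedge` and its parameters are hypothetical.

```python
import math
import random


def hedge(payoffs, eta):
    """Run Hedge on a payoff table (illustrative sketch only).

    payoffs[t][i] is the payoff in [0, 1] of expert i at round t.
    Returns (expected payoff of the learner, payoff of the best fixed expert).
    """
    n = len(payoffs[0])
    weights = [1.0] * n
    totals = [0.0] * n  # cumulative payoff of each expert
    learner = 0.0       # expected cumulative payoff of the learner
    for round_payoffs in payoffs:
        z = sum(weights)
        probs = [w / z for w in weights]
        # The learner follows expert i with probability probs[i].
        learner += sum(p * g for p, g in zip(probs, round_payoffs))
        for i, g in enumerate(round_payoffs):
            totals[i] += g
            weights[i] *= math.exp(eta * g)  # reward experts that did well
    return learner, max(totals)


if __name__ == "__main__":
    rng = random.Random(0)
    T, n = 500, 3
    table = [[rng.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)  # standard tuning for a known horizon
    got, best = hedge(table, eta)
    print("regret:", best - got)
```

With this tuning, the regret against the best fixed expert is at most sqrt(T ln(n) / 2) for any payoff sequence in [0, 1]. The scheduling setting is harder because an expert's payoff depends on the learner's own past decisions.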
Entrepreneurial Operations Management
In the presence of tight capital, time and talent constraints, many traditional operational challenges are reinforced (and sometimes redefined) in the entrepreneurial setting. This dissertation addresses some of these challenges by examining theoretically and experimentally several problems in entrepreneurship and innovation for which the existing literature offers little guidance. The dissertation is organized into three chapters.
When tight time-to-market constraints are binding, an important question in product development is how much time a development team should spend on generating new ideas and designs versus executing the idea, and who should make that decision. In the first chapter of this dissertation I develop an experimental approach to examining this question. Entrepreneurial ventures can have limited (often zero) cash inflow and limited access to capital, and so use equity ownership to compensate founders and early employees. In the second chapter I focus on the challenges of equity-based incentive design, examining the effects of contract form (equal versus non-equal equity splits) and time (upfront versus delayed contracting) on effort and value generation in startups. In "technology-push" (relative to "demand-pull") innovation, technology teams often develop a new capability that may find voice in a wide range of industrial settings. However, the team may lack the appropriate marketing budget to explore each in great depth, or even all of them at any depth. In the third chapter I study entrepreneurial market identification, developing and testing search strategies for choosing a market for a new technology when the number of potential markets is large but the search budget is small.
PhD, Business Administration, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145946/1/ekagan_1.pd
The Exploration-Exploitation Trade-Off in Sequential Decision Making Problems
Sequential decision making problems require an agent to repeatedly choose between
a series of actions. Common to such problems is the exploration-exploitation
trade-off, where an agent must choose between taking the action expected to yield the best
reward (exploitation) and trying an alternative action for potential future benefit (exploration).
The main focus of this thesis is to understand in more detail the role this
trade-off plays in various important sequential decision making problems, in terms
of maximising finite-time reward.
The most common and best studied abstraction of the exploration-exploitation
trade-off is the classic multi-armed bandit problem. In this thesis we study several
important extensions that are more suitable than the classic problem to real-world
applications. These extensions include scenarios where the rewards for actions
change over time or the presence of other agents must be repeatedly considered. In
these contexts, the exploration-exploitation trade-off has a more complicated role
in terms of maximising finite-time performance. For example, the amount of exploration
required will constantly change in a dynamic decision problem, in multiagent
problems agents can explore by communication, and in repeated games, the
exploration-exploitation trade-off must be jointly considered with game theoretic
reasoning.
Existing techniques for balancing exploration-exploitation are focused on achieving
desirable asymptotic behaviour and are in general only applicable to basic decision
problems. The most flexible state-of-the-art approaches, ε-greedy and ε-first,
require exploration parameters to be set a priori, the optimal values of which are
highly dependent on the problem faced. To overcome this, we construct a novel
algorithm, ε-ADAPT, which has no exploration parameters and can adapt exploration
on-line for a wide range of problems. ε-ADAPT is built on newly proven theoretical
properties of the ε-first policy and we demonstrate that ε-ADAPT can accurately
learn not only how much to explore, but also when and which actions to explore.
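To make the two baseline policies concrete, here is a minimal sketch of ε-greedy and ε-first on a Bernoulli bandit. The abstract does not specify how ε-ADAPT works, so it is not shown. The function `run_bandit`, its parameters, and the choice to score unpulled arms as 0 are all assumptions made for this illustration.

```python
import random


def run_bandit(means, horizon, policy, epsilon, seed=0):
    """Simulate a Bernoulli bandit under "greedy" (epsilon-greedy) or
    "first" (epsilon-first) exploration. Illustrative sketch only.

    Returns (total reward, number of pulls per arm).
    """
    if policy not in ("greedy", "first"):
        raise ValueError("policy must be 'greedy' or 'first'")
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    explore_steps = int(epsilon * horizon)  # used by epsilon-first only
    for t in range(horizon):
        if policy == "first":
            explore = t < explore_steps      # all exploration up front
        else:
            explore = rng.random() < epsilon  # explore at a constant rate
        if explore:
            arm = rng.randrange(k)
        else:
            # Exploit the best empirical mean; unpulled arms score 0 (assumption).
            estimates = [s / c if c else 0.0 for s, c in zip(sums, counts)]
            arm = max(range(k), key=estimates.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    for policy in ("greedy", "first"):
        print(policy, run_bandit([0.2, 0.5, 0.7], 1000, policy, epsilon=0.1))
```

In both policies `epsilon` must be chosen before play begins, and the best value depends on the horizon and on the gaps between arm means. This is the a priori tuning burden the abstract describes.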
On the identification and mitigation of weaknesses in the Knowledge Gradient policy for multi-armed bandits
The Knowledge Gradient (KG) policy was originally proposed for online ranking and selection problems but has recently been adapted for use in online decision making in general and multi-armed bandit problems (MABs) in particular. We study its use in a class of exponential family MABs and identify weaknesses, including a propensity to take actions which are dominated with respect to both exploitation and exploration. We propose variants of KG which avoid such errors. These new policies include an index heuristic which deploys a KG approach to develop an approximation to the Gittins index. A numerical study shows this policy to perform well over a range of MABs, including those for which index policies are not optimal. While KG does not take dominated actions when bandits are Gaussian, it fails to be index consistent and, when arms are correlated, appears not to enjoy a performance advantage over competitor policies that would compensate for its greater computational demands.
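For orientation, the sketch below computes the usual one-step KG value for independent Gaussian arms with known observation noise, together with one common finite-horizon online rule: pick the arm maximizing posterior mean plus remaining steps times KG value. It is an illustration under those assumptions only. It does not reproduce the exponential family setting or the proposed variants from the abstract, and the names `kg_factors` and `online_kg_choice` are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(means, variances, noise_variance):
    """One-step KG value of sampling each arm (needs at least two arms).

    means[i], variances[i]: posterior mean and variance of arm i.
    noise_variance: known variance of a single observation.
    """
    factors = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        best_other = max(m for j, m in enumerate(means) if j != i)
        # Standard deviation of the change in posterior mean after one sample.
        sigma_tilde = var / math.sqrt(var + noise_variance)
        if sigma_tilde == 0.0:
            factors.append(0.0)  # nothing left to learn about this arm
            continue
        z = -abs(mu - best_other) / sigma_tilde
        factors.append(sigma_tilde * (z * normal_cdf(z) + normal_pdf(z)))
    return factors


def online_kg_choice(means, variances, noise_variance, remaining):
    """Pick the arm maximizing mean + remaining * KG value (assumed rule)."""
    factors = kg_factors(means, variances, noise_variance)
    scores = [mu + remaining * nu for mu, nu in zip(means, factors)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(kg_factors([0.0, 0.0], [1.0, 1.0], 1.0))
    print(online_kg_choice([1.0, 0.9], [0.01, 4.0], 1.0, remaining=100))
```

With no steps remaining the rule is purely greedy. With many steps remaining it favours uncertain arms whose mean is close to the current best.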