Online Reinforcement Learning for Dynamic Multimedia Systems
In our previous work, we proposed a systematic cross-layer framework for
dynamic multimedia systems, which allows each layer to make autonomous and
foresighted decisions that maximize the system's long-term performance, while
meeting the application's real-time delay constraints. The proposed solution
solved the cross-layer optimization offline, under the assumption that the
multimedia system's probabilistic dynamics were known a priori. In practice,
however, these dynamics are unknown a priori and therefore must be learned
online. In this paper, we address this problem by allowing the multimedia
system layers to learn, through repeated interactions with each other, to
autonomously optimize the system's long-term performance at run-time. We
propose two reinforcement learning algorithms for optimizing the system under
different design constraints: the first algorithm solves the cross-layer
optimization in a centralized manner, and the second solves it in a
decentralized manner. We analyze both algorithms in terms of their required
computation, memory, and inter-layer communication overheads. After noting that
the proposed reinforcement learning algorithms learn too slowly, we introduce a
complementary accelerated learning algorithm that exploits partial knowledge
about the system's dynamics in order to dramatically improve the system's
performance. In our experiments, we demonstrate that decentralized learning can
perform as well as centralized learning, while enabling the layers to act
autonomously. Additionally, we show that existing application-independent
reinforcement learning algorithms, and existing myopic learning algorithms
deployed in multimedia systems, perform significantly worse than our proposed
application-aware and foresighted learning methods. Comment: 35 pages, 11 figures, 10 tables
Learning in evolutionary environments
Not available
Decision Making in Uncertain and Changing Environments
We consider an agent who has to repeatedly make choices in an uncertain and changing environment, who has full information of the past, who discounts future payoffs, but who has no prior. We provide a learning algorithm that performs almost as well as the best of a given finite number of experts or benchmark strategies and does so at any point in time, provided the agent is sufficiently patient. The key is to find the appropriate degree of forgetting the distant past. Standard learning algorithms that treat the recent and distant past equally do not have the sequential epsilon-optimality property. Keywords: Adaptive learning, experts, distribution-free, epsilon-optimality, Hannan regret
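The role of forgetting described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it is a generic exponential-weights (Hedge-style) rule over experts in which past payoffs are discounted geometrically. The function name `discounted_hedge` and the parameters `eta` (learning rate) and `forget` (discount applied to past payoffs) are assumptions made for illustration only.

```python
import math


def discounted_hedge(payoffs, eta=1.0, forget=0.9):
    """Exponential weights over experts with geometric forgetting.

    payoffs: list of rounds; each round is a list with one payoff per expert.
    eta:     learning rate (hypothetical parameter name).
    forget:  factor in (0, 1] applied to old scores each round;
             forget=1.0 treats recent and distant past equally.
    Returns the probability vector used in each round.
    """
    n_experts = len(payoffs[0])
    scores = [0.0] * n_experts  # discounted cumulative payoff per expert
    history = []
    for round_payoffs in payoffs:
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(eta * (s - top)) for s in scores]
        total = sum(weights)
        history.append([w / total for w in weights])
        # Old payoffs decay by `forget`; the newest payoff enters at full weight.
        scores = [forget * s + p for s, p in zip(scores, round_payoffs)]
    return history
```

In an environment where the best expert switches halfway through, the forgetting variant moves its probability mass to the new best expert quickly, while `forget=1.0` keeps favoring the expert that was good in the distant past until the cumulative scores cross.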
The Value of Information for Populations in Varying Environments
The notion of information pervades informal descriptions of biological
systems, but formal treatments face the problem of defining a quantitative
measure of information rooted in a concept of fitness, which is itself an
elusive notion. Here, we present a model of population dynamics where this
problem is amenable to a mathematical analysis. In the limit where any
information about future environmental variations is common to the members of
the population, our model is equivalent to known models of financial
investment. In this case, the population can be interpreted as a portfolio of
financial assets and previous analyses have shown that a key quantity of
Shannon's communication theory, the mutual information, sets a fundamental
limit on the value of information. We show that this bound can be violated when
accounting for features that are irrelevant in finance but inherent to
biological systems, such as the stochasticity present at the individual level.
This leads us to generalize the measures of uncertainty and information usually
encountered in information theory.
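The classical finance result that this abstract refers to can be checked numerically. The sketch below assumes the textbook horse-race setting with proportional (Kelly) betting, which is an assumption about which "known models of financial investment" are meant; it does not implement the paper's population model. The function names and the example joint distribution are hypothetical.

```python
import math


def optimal_growth_rate(joint, odds, use_side_info):
    """Expected log2 growth per round under proportional (Kelly) betting.

    joint[y][x] = P(X=x, Y=y), where X is the outcome and Y the side information.
    odds[x]     = payout per unit wagered on outcome x.
    The bettor wagers p(x|y) with side information, p(x) without.
    """
    n_outcomes = len(odds)
    p_x = [sum(row[x] for row in joint) for x in range(n_outcomes)]
    rate = 0.0
    for row in joint:
        p_y = sum(row)
        for x, p_xy in enumerate(row):
            if p_xy == 0.0:
                continue  # zero-probability events contribute nothing
            bet = p_xy / p_y if use_side_info else p_x[x]
            rate += p_xy * math.log2(bet * odds[x])
    return rate


def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table."""
    n_outcomes = len(joint[0])
    p_x = [sum(row[x] for row in joint) for x in range(n_outcomes)]
    info = 0.0
    for row in joint:
        p_y = sum(row)
        for x, p_xy in enumerate(row):
            if p_xy > 0.0:
                info += p_xy * math.log2(p_xy / (p_y * p_x[x]))
    return info
```

For any joint table and odds, `optimal_growth_rate(joint, odds, True) - optimal_growth_rate(joint, odds, False)` equals `mutual_information(joint)` up to floating-point error. This is the bound that, according to the abstract, can be violated once individual-level stochasticity is taken into account.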
A Theory of Firm Decline
We study the problem of an investor who buys an equity stake in an entrepreneurial venture, under the assumption that the former cannot monitor the latter's operations. The dynamics implied by the optimal incentive scheme are rich and quite different from those induced by other models of repeated moral hazard. In particular, our framework generates a rationale for firm decline. As young firms accumulate capital, the claims of both investor (outside equity) and entrepreneur (inside equity) increase. At some juncture, however, even as the latter keeps on growing, invested capital and firm value start declining and so does the value of outside equity. The reason is that incentive provision is costlier the wealthier the entrepreneur (the greater is inside equity). In turn, this leads to a decline in the constrained-efficient level of effort and therefore to a drop in the return to investment. Keywords: Principal Agent, Moral Hazard, Hidden Action, Incentives, Survival, Firm Dynamics