119 research outputs found
Actively Learning to Attract Followers on Twitter
Twitter, a popular social network, presents great opportunities for on-line
machine learning research. However, previous research has focused almost
entirely on learning from passively collected data. We study the problem of
learning to acquire followers through normative user behavior, as opposed to
the mass following policies applied by many bots. We formalize the problem as a
contextual bandit problem, in which we consider retweeting content to be the
action chosen and each tweet (content) is accompanied by context. We design
reward signals based on the change in followers. The results of our month-long
experiment with 60 agents suggest that (1) aggregating experience across
agents can adversely impact prediction accuracy and (2) the Twitter community's
response to different actions is non-stationary. Our findings suggest that
actively learning on-line can provide deeper insights about how to attract
followers than machine learning over passively collected data alone.
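The contextual bandit framing above can be illustrated with a minimal sketch. The abstract does not state which bandit algorithm the agents used, so everything here is an assumption for illustration: the epsilon-greedy rule, the linear reward model, and the names `EpsilonGreedyLinearBandit`, `choose`, and `update` are hypothetical. Each candidate tweet is a feature vector (its context), the action is which tweet to retweet, and the reward stands in for the observed change in followers.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical epsilon-greedy contextual bandit with a linear reward model.

    Not the method from the paper; a sketch of the general problem setup only.
    """

    def __init__(self, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.weights = [0.0] * n_features  # linear estimate of reward per feature
        self.epsilon = epsilon             # probability of exploring at random
        self.lr = lr                       # step size for the online update
        self.rng = random.Random(seed)

    def predict(self, context):
        """Estimated reward (e.g. follower change) for retweeting this content."""
        return sum(w * x for w, x in zip(self.weights, context))

    def choose(self, contexts):
        """Pick the index of the candidate tweet to retweet."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(contexts))  # explore
        scores = [self.predict(c) for c in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)  # exploit

    def update(self, context, reward):
        """One stochastic-gradient step on the squared prediction error."""
        error = reward - self.predict(context)
        self.weights = [
            w + self.lr * error * x for w, x in zip(self.weights, context)
        ]
```

In use, an agent would call `choose` on the feature vectors of the available tweets, retweet the selected one, and later call `update` with the measured change in followers. Because the abstract reports a non-stationary community response, a constant step size (as assumed here) is one plausible way to keep tracking drift.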
Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)
For complex, high-dimensional Markov Decision Processes (MDPs), it may be
necessary to represent the policy with function approximation. A problem is
misspecified whenever the representation cannot express any policy with
acceptable performance. We introduce IHOMP, an approach for solving
misspecified problems. IHOMP iteratively learns a set of context-specialized
options and combines these options to solve an otherwise misspecified problem.
Our main contribution is proving that IHOMP enjoys theoretical convergence
guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI),
enabling it to decide where the learned options can be reused. Our experiments
demonstrate that IHOMP can find near-optimal solutions to otherwise
misspecified problems and that OI can further improve the solutions.
Comment: arXiv admin note: text overlap with arXiv:1506.0362
Strategic Formation of Heterogeneous Networks
We establish a network formation game for the Internet's Autonomous System
(AS) interconnection topology. The game includes different types of players,
accounting for the heterogeneity of ASs in the Internet. In this network
formation game, the utility of a player depends on the network structure, e.g.,
the distances between nodes and the cost of links. We also consider the case
where utility (or monetary) transfers are allowed between the players. We
incorporate reliability considerations in the player's utility function, and
analyze static properties of the game as well as its dynamic evolution. We
provide dynamic analysis of topological quantities, and explain the prevalence
of some "network motifs" in the Internet graph. We assess our predictions with
real-world data.
Comment: arXiv admin note: text overlap with arXiv:1307.4102, arXiv:1412.850
Bootstrapping Skills
The monolithic approach to policy representation in Markov Decision Processes
(MDPs) looks for a single policy that can be represented as a function from
states to actions. For the monolithic approach to succeed (and this is not
always possible), a complex feature representation is often necessary since the
policy is a complex object that has to prescribe what actions to take all over
the state space. This is especially true in large domains with complicated
dynamics. It is also computationally inefficient to both learn and plan in MDPs
using a complex monolithic approach. We present a different approach where we
restrict the policy space to policies that can be represented as combinations
of simpler, parameterized skills---a type of temporally extended action, with a
simple policy representation. We introduce Learning Skills via Bootstrapping
(LSB), which can use a broad family of Reinforcement Learning (RL) algorithms as
a "black box" to iteratively learn parameterized skills. Initially, the learned
skills are short-sighted but each iteration of the algorithm allows the skills
to bootstrap off one another, improving each skill in the process. We prove
that this bootstrapping process returns a near-optimal policy. Furthermore, our
experiments demonstrate that LSB can solve MDPs that, given the same
representational power, could not be solved by a monolithic approach. Thus,
planning with learned skills results in better policies without requiring
complex policy representations.
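To make the notion of a skill as a temporally extended action concrete, here is a minimal sketch on a one-dimensional chain of integer states. It is not LSB itself: the `Skill` class, `run_skill`, `run_skills`, and the two example skills are hypothetical names chosen for illustration, and the skills are hand-written rather than learned.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Skill:
    """A temporally extended action: a simple policy plus a stopping rule."""
    policy: Callable[[int], int]       # state -> primitive action (step of -1 or +1)
    terminates: Callable[[int], bool]  # state -> True when the skill should stop


def run_skill(skill: Skill, state: int, max_steps: int = 100) -> Tuple[int, int]:
    """Execute one skill until it terminates; return (final state, steps taken)."""
    steps = 0
    while not skill.terminates(state) and steps < max_steps:
        state += skill.policy(state)
        steps += 1
    return state, steps


def run_skills(skills: List[Skill], state: int) -> Tuple[int, int]:
    """Chain several simple skills; together they form a more complex behavior."""
    total_steps = 0
    for skill in skills:
        state, steps = run_skill(skill, state)
        total_steps += steps
    return state, total_steps


# Two illustrative skills, each with a trivially simple policy.
go_to_5 = Skill(policy=lambda s: 1 if s < 5 else -1, terminates=lambda s: s == 5)
go_to_2 = Skill(policy=lambda s: 1 if s < 2 else -1, terminates=lambda s: s == 2)
```

Each skill here is easy to represent on its own, while the chained behavior (walk to state 5, then back to state 2) needs different actions in the same state depending on which skill is active, which hints at why combining simple skills can express more than a single policy of the same simplicity.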
Adaptive Skills, Adaptive Partitions (ASAP)
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that
(1) learns skills (i.e., temporally extended actions or options) as well as (2)
where to apply them. We believe that both (1) and (2) are necessary for a truly
general skill learning framework, which is a key building block needed to scale
up to lifelong learning agents. The ASAP framework can also solve related new
tasks simply by adapting where it applies its existing learned skills. We prove
that ASAP converges to a local optimum under natural conditions. Finally, our
experimental results, which include a RoboCup domain, demonstrate the ability
of ASAP to learn where to reuse skills as well as solve multiple tasks with
considerably less experience than solving each task from scratch.
Formation Games of Reliable Networks
We establish a network formation game for the Internet's Autonomous System
(AS) interconnection topology. The game includes different types of players,
accounting for the heterogeneity of ASs in the Internet. We incorporate
reliability considerations in the player's utility function, and analyze static
properties of the game as well as its dynamic evolution. We provide dynamic
analysis of its topological quantities, and explain the prevalence of some
"network motifs" in the Internet graph. We assess our predictions with
real-world data.
Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks
We consider the problem of controlling a partially-observed dynamic process
on a graph by a limited number of interventions. This problem naturally arises
in contexts such as scheduling virus tests to curb an epidemic; targeted
marketing in order to promote a product; and manually inspecting posts to
detect fake news spreading on social networks.
We formulate this setup as a sequential decision problem over a temporal
graph process. In the face of an exponential state space, a combinatorial action
space and partial observability, we design a novel tractable scheme to control
dynamical processes on temporal graphs. We successfully apply our approach to
two popular problems that fall into our framework: prioritizing which nodes
should be tested in order to curb the spread of an epidemic, and influence
maximization on a graph.
Comment: ICML 202
Optimizing Tensor Network Contraction Using Reinforcement Learning
Quantum Computing (QC) stands to revolutionize computing, but is currently
still limited. To develop and test quantum algorithms today, quantum circuits
are often simulated on classical computers. Simulating a complex quantum
circuit requires computing the contraction of a large network of tensors. The
order (path) of contraction can have a drastic effect on the computing cost,
but finding an efficient order is a challenging combinatorial optimization
problem.
We propose a Reinforcement Learning (RL) approach combined with Graph Neural
Networks (GNN) to address the contraction ordering problem. The problem is
extremely challenging due to the huge search space, the heavy-tailed reward
distribution, and the challenging credit assignment. We show how a carefully
implemented RL-agent that uses a GNN as the basic policy construct can address
these challenges and obtain significant improvements over state-of-the-art
techniques in three varieties of circuits, including the largest scale networks
used in contemporary QC.
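The claim that contraction order drastically affects cost can be seen in the smallest possible case, a chain of three matrices. The sketch below only counts scalar multiplications; the shapes and the function names `contract`, `cost_left_first`, and `cost_right_first` are illustrative assumptions and do not come from the paper, which addresses far larger tensor networks.

```python
def contract(shape_a, shape_b):
    """Return (scalar multiplications, result shape) for the product a @ b."""
    rows, inner = shape_a
    inner_b, cols = shape_b
    if inner != inner_b:
        raise ValueError("inner dimensions do not match")
    return rows * inner * cols, (rows, cols)


def cost_left_first(a, b, c):
    """Cost of evaluating (a @ b) @ c."""
    cost_ab, ab = contract(a, b)
    cost_abc, _ = contract(ab, c)
    return cost_ab + cost_abc


def cost_right_first(a, b, c):
    """Cost of evaluating a @ (b @ c)."""
    cost_bc, bc = contract(b, c)
    cost_abc, _ = contract(a, bc)
    return cost_bc + cost_abc


# Illustrative shapes: the shared dimension of A and B is large, the output of A @ B is small.
A, B, C = (10, 1000), (1000, 10), (10, 1000)

print(cost_left_first(A, B, C))   # 200000
print(cost_right_first(A, B, C))  # 20000000
```

Both orders produce the same 10 x 1000 result, yet one needs 100 times more multiplications than the other. With many tensors of higher rank, the number of possible orders grows combinatorially, which is the search problem the abstract describes.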
Soft-Robust Actor-Critic Policy-Gradient
Robust Reinforcement Learning aims to derive optimal behavior that accounts
for model uncertainty in dynamical systems. However, previous studies have
shown that by considering the worst case scenario, robust policies can be
overly conservative. Our soft-robust framework is an attempt to overcome this
issue. In this paper, we present a novel Soft-Robust Actor-Critic algorithm
(SR-AC). It learns an optimal policy with respect to a distribution over an
uncertainty set and stays robust to model uncertainty but avoids the
conservativeness of robust strategies. We show the convergence of SR-AC and
test the efficiency of our approach on different domains by comparing it
against regular learning methods and their robust formulations.
Comment: UAI 201
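The contrast between a worst-case criterion and a criterion defined by a distribution over the uncertainty set can be sketched with a toy comparison. This is not SR-AC: the numbers, the policy names, and the functions `robust_value`, `soft_robust_value`, and `best_policy` are all made up for illustration, and each policy's return under each candidate model is assumed to be already known.

```python
def robust_value(returns):
    """Worst-case criterion: score a policy by its return under the worst model."""
    return min(returns)


def soft_robust_value(returns, weights):
    """Soft-robust criterion: average the return over a distribution on models."""
    return sum(w * r for w, r in zip(weights, returns))


def best_policy(policy_returns, criterion):
    """Name of the policy that maximizes the given criterion."""
    return max(policy_returns, key=lambda name: criterion(policy_returns[name]))


# Hypothetical returns of two policies under three candidate models.
policy_returns = {
    "cautious": [5.0, 5.0, 5.0],
    "bold": [9.0, 8.0, 4.0],
}
# Hypothetical distribution over the three models; the third is judged unlikely.
model_weights = [0.5, 0.4, 0.1]

robust_choice = best_policy(policy_returns, robust_value)
soft_robust_choice = best_policy(
    policy_returns, lambda r: soft_robust_value(r, model_weights)
)
print(robust_choice)       # cautious
print(soft_robust_choice)  # bold
```

The worst-case criterion rejects the "bold" policy because of a single unlikely model, which is the conservativeness the abstract refers to, while weighting the models by their assumed likelihood prefers it.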
Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong
guarantees even in the face of environments/transition models whose parameters
have strong uncertainty. Existing work uses value-based methods and the usual
primitive action setting. In this paper, we propose robust methods for learning
temporally abstract actions, in the framework of options. We present a Robust
Options Policy Iteration (ROPI) algorithm with convergence guarantees, which
learns options that are robust to model uncertainty. We utilize ROPI to learn
robust options with the Robust Options Deep Q Network (RO-DQN) that solves
multiple tasks and mitigates model misspecification due to model uncertainty.
We present experimental results which suggest that policy iteration with linear
features may have an inherent form of robustness when using coarse feature
representations. In addition, we present experimental results which demonstrate
that robustness helps policy iteration implemented on top of deep neural
networks to generalize over a much broader range of dynamics than non-robust
policy iteration.