962 research outputs found
Actively Learning to Attract Followers on Twitter
Twitter, a popular social network, presents great opportunities for on-line
machine learning research. However, previous research has focused almost
entirely on learning from passively collected data. We study the problem of
learning to acquire followers through normative user behavior, as opposed to
the mass following policies applied by many bots. We formalize the problem as a
contextual bandit problem, in which we consider retweeting content to be the
action chosen and each tweet (content) is accompanied by context. We design
reward signals based on the change in followers. The result of our month-long
experiment with 60 agents suggests that (1) aggregating experience across
agents can adversely impact prediction accuracy and (2) the Twitter community's
response to different actions is non-stationary. Our findings suggest that
actively learning on-line can provide deeper insights about how to attract
followers than machine learning over passively collected data alone.
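The formulation above maps naturally onto standard contextual-bandit machinery. As a minimal sketch (not the authors' agent), here is an epsilon-greedy learner with one linear reward model per action; the two-action setup, the context features, and the reward function standing in for follower change are all illustrative assumptions:

```python
import random

random.seed(0)

class EpsilonGreedyContextualBandit:
    """Minimal contextual bandit: one linear reward model per action,
    trained online by SGD. Actions stand in for candidate tweets to
    retweet; the reward stands in for the observed follower change."""

    def __init__(self, n_actions, n_features, epsilon=0.1, lr=0.05):
        self.epsilon = epsilon
        self.lr = lr
        # One weight vector per action; index 0 is the bias term.
        self.w = [[0.0] * (n_features + 1) for _ in range(n_actions)]

    def _predict(self, a, x):
        return self.w[a][0] + sum(wi * xi for wi, xi in zip(self.w[a][1:], x))

    def choose(self, x):
        if random.random() < self.epsilon:                    # explore
            return random.randrange(len(self.w))
        preds = [self._predict(a, x) for a in range(len(self.w))]
        return max(range(len(preds)), key=preds.__getitem__)  # exploit

    def update(self, a, x, reward):
        err = reward - self._predict(a, x)                    # SGD on squared error
        self.w[a][0] += self.lr * err
        for i, xi in enumerate(x):
            self.w[a][i + 1] += self.lr * err * xi

# Toy run: action 1 pays off when the first context feature is high.
bandit = EpsilonGreedyContextualBandit(n_actions=2, n_features=2)
for _ in range(2000):
    x = [random.random(), random.random()]
    a = bandit.choose(x)
    reward = x[0] if a == 1 else 0.1   # hypothetical follower change
    bandit.update(a, x, reward)
```

A LinUCB-style confidence bound would be the more common choice here; epsilon-greedy is used only to keep the sketch short.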
A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness with
respect to changing or adversarial system behavior. In this framework,
transitions are modeled as arbitrary elements of a known and properly
structured uncertainty set and a robust optimal policy can be derived under the
worst-case scenario. In this study, we address the issue of learning in RMDPs
using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation
(URBE) which encourages safe exploration for adapting the uncertainty set to
new observations while preserving robustness. We propose a URBE-based
algorithm, DQN-URBE, that scales this method to higher dimensional domains. Our
experiments show that the derived URBE-based strategy leads to a better
trade-off between less conservative solutions and robustness in the presence of
model misspecification. In addition, we show that the DQN-URBE algorithm can
adapt significantly faster to changing dynamics online compared to existing
robust techniques with fixed uncertainty sets.
Comment: Accepted to UAI 201
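The worst-case planning that RMDPs perform can be summarized by the robust Bellman equation; the notation below is the standard one for this setting, not copied from the paper:

```latex
V(s) \;=\; \max_{a \in \mathcal{A}} \Big[\, r(s,a) \;+\; \gamma \min_{p \,\in\, \mathcal{P}(s,a)} \sum_{s'} p(s')\, V(s') \,\Big]
```

Here $\mathcal{P}(s,a)$ is the uncertainty set of transition models; the inner minimization is what makes the resulting policy robust, and it is this set that a URBE-based learner adapts as new observations arrive.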
Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)
For complex, high-dimensional Markov Decision Processes (MDPs), it may be
necessary to represent the policy with function approximation. A problem is
misspecified whenever the representation cannot express any policy with
acceptable performance. We introduce IHOMP: an approach for solving
misspecified problems. IHOMP iteratively learns a set of context-specialized
options and combines these options to solve an otherwise misspecified problem.
Our main contribution is proving that IHOMP enjoys theoretical convergence
guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI),
enabling it to decide where the learned options can be reused. Our experiments
demonstrate that IHOMP can find near-optimal solutions to otherwise
misspecified problems and that OI can further improve the solutions.
Bootstrapping Skills
The monolithic approach to policy representation in Markov Decision Processes
(MDPs) looks for a single policy that can be represented as a function from
states to actions. For the monolithic approach to succeed (and this is not
always possible), a complex feature representation is often necessary since the
policy is a complex object that has to prescribe what actions to take all over
the state space. This is especially true in large domains with complicated
dynamics. It is also computationally inefficient to both learn and plan in MDPs
using a complex monolithic approach. We present a different approach where we
restrict the policy space to policies that can be represented as combinations
of simpler, parameterized skills---a type of temporally extended action, with a
simple policy representation. We introduce Learning Skills via Bootstrapping
(LSB) that can use a broad family of Reinforcement Learning (RL) algorithms as
a "black box" to iteratively learn parametrized skills. Initially, the learned
skills are short-sighted but each iteration of the algorithm allows the skills
to bootstrap off one another, improving each skill in the process. We prove
that this bootstrapping process returns a near-optimal policy. Furthermore, our
experiments demonstrate that LSB can solve MDPs that, given the same
representational power, could not be solved by a monolithic approach. Thus,
planning with learned skills results in better policies without requiring
complex policy representations.
Off-policy evaluation for MDPs with unknown structure
Off-policy learning in dynamic decision problems is essential for providing
strong evidence that a new policy is better than the one in use. But how can we
prove superiority without testing the new policy? To answer this question, we
introduce the G-SCOPE algorithm that evaluates a new policy based on data
generated by the existing policy. Our algorithm is both computationally and
sample efficient because it greedily learns to exploit factored structure in
the dynamics of the environment. We present a finite sample analysis of our
approach and show through experiments that the algorithm scales well on
high-dimensional problems with few samples.
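G-SCOPE itself exploits factored structure in the dynamics; as a point of reference for the evaluation task it solves, per-trajectory importance sampling is the textbook baseline for estimating a new policy's value from data logged under the old one. Everything below (policies, trajectories) is illustrative:

```python
def importance_sampling_ope(trajectories, pi_new, pi_old, gamma=0.99):
    """Estimate the value of pi_new from trajectories collected under pi_old.

    Each trajectory is a list of (state, action, reward) triples;
    pi_new(a, s) and pi_old(a, s) return action probabilities."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_new(a, s) / pi_old(a, s)   # likelihood ratio
            ret += (gamma ** t) * r                 # discounted return
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)

# Sanity check: if the policies are identical, the estimate reduces to
# the empirical mean discounted return of the logged data.
pi = lambda a, s: 0.5
logged = [[("s0", 0, 1.0), ("s1", 1, 0.0)],
          [("s0", 1, 0.0), ("s1", 0, 1.0)]]
value = importance_sampling_ope(logged, pi, pi, gamma=1.0)
```

This estimator is unbiased but its variance grows with horizon length, which is exactly the weakness that structure-exploiting methods such as G-SCOPE aim to avoid.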
Adaptive Skills, Adaptive Partitions (ASAP)
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that
(1) learns skills (i.e., temporally extended actions or options) as well as (2)
where to apply them. We believe that both (1) and (2) are necessary for a truly
general skill learning framework, which is a key building block needed to scale
up to lifelong learning agents. The ASAP framework can also solve related new
tasks simply by adapting where it applies its existing learned skills. We prove
that ASAP converges to a local optimum under natural conditions. Finally, our
experimental results, which include a RoboCup domain, demonstrate the ability
of ASAP to learn where to reuse skills as well as solve multiple tasks with
considerably less experience than solving each task from scratch
A Dual Approach to Scalable Verification of Deep Networks
This paper addresses the problem of formally verifying desirable properties
of neural networks, i.e., obtaining provable guarantees that neural networks
satisfy specifications relating their inputs and outputs (robustness to bounded
norm adversarial perturbations, for example). Most previous work on this topic
was limited in its applicability by the size of the network, network
architecture and the complexity of properties to be verified. In contrast, our
framework applies to a general class of activation functions and specifications
on neural network inputs and outputs. We formulate verification as an
optimization problem (seeking to find the largest violation of the
specification) and solve a Lagrangian relaxation of the optimization problem to
obtain an upper bound on the worst case violation of the specification being
verified. Our approach is anytime, i.e., it can be stopped at any time and a
valid bound on the maximum violation can be obtained. We develop specialized
verification algorithms with provable tightness guarantees under special
assumptions and demonstrate the practical significance of our general
verification approach on a variety of verification tasks.
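The weak-duality argument behind the upper bound can be sketched in generic notation for a K-layer network with layer maps $h_k$; this is our paraphrase, not the paper's exact formulation:

```latex
\text{(primal)}\quad p^{*} \;=\; \max_{x \in \mathcal{X},\, z}\; c(z_K)
\quad \text{s.t.}\quad z_0 = x,\;\; z_{k+1} = h_k(z_k)

\text{(dual)}\quad g(\lambda) \;=\; \max_{x \in \mathcal{X},\, z}\;
c(z_K) + \sum_{k} \lambda_k^{\top} \big( h_k(z_k) - z_{k+1} \big)
\;\;\ge\;\; p^{*}
```

Since $g(\lambda)$ upper-bounds the maximum specification violation $p^{*}$ for every choice of multipliers $\lambda$, optimization over $\lambda$ can be interrupted at any point and still yield a valid bound, which is the anytime property claimed above.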
Black Hole Squeezers
We show that the gravitational quasi-normal modes (QNMs) of a Schwarzschild
black hole play the role of a multimode squeezer that can generate particles.
For a minimally coupled scalar field, the QNMs "squeeze" the initial state of
the scalar field (even for the vacuum) and produce scalar particles. The
maximal squeezing amplitude is inversely proportional to the cube of the
imaginary part of the QNM frequency, implying that the particle generation
efficiency is higher for lower decaying QNMs. Our results show that the
gravitational perturbations can amplify Hawking radiation.
Comment: 19 pages, 3 figures, 1 table. Comments are welcome.
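The stated scaling can be written compactly (the symbols are ours, not the paper's):

```latex
r_{\max} \;\propto\; \frac{1}{\left| \mathrm{Im}\, \omega_{\mathrm{QNM}} \right|^{3}}
```

That is, longer-lived quasi-normal modes (smaller $|\mathrm{Im}\,\omega_{\mathrm{QNM}}|$) squeeze the scalar field more strongly and hence generate particles more efficiently.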
Automated soft tissue lesion detection and segmentation in digital mammography using a u-net deep learning network
Computer-aided detection or decision support systems aim to improve breast
cancer screening programs by helping radiologists to evaluate digital
mammography (DM) exams. Commonly such methods proceed in two steps: selection
of candidate regions for malignancy, and later classification as either
malignant or not. In this study, we present a candidate detection method based
on deep learning to automatically detect and additionally segment soft tissue
lesions in DM. A database of DM exams (mostly bilateral and two views) was
collected from our institutional archive. In total, 7196 DM exams (28294 DM
images) acquired with systems from three different vendors (General Electric,
Siemens, Hologic) were collected, of which 2883 contained malignant lesions
verified with histopathology. Data were randomly split on an exam level into
training (50%), validation (10%), and testing (40%) sets for a deep neural
network with a u-net architecture. The u-net classifies the image but also provides
lesion segmentation. Free receiver operating characteristic (FROC) analysis was
used to evaluate the model, on an image and on an exam level. On an image
level, a maximum sensitivity of 0.94 at 7.93 false positives (FP) per image was
achieved. Similarly, per exam a maximum sensitivity of 0.98 at 7.81 FP per
image was achieved. In conclusion, the method could be used as a candidate
selection model with high accuracy and with the additional information of
lesion segmentation.
Comment: To appear in IWBI 201
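FROC analysis plots lesion-level sensitivity against false positives per image as the detection threshold varies. A minimal sketch of computing one operating point, under the simplifying assumption that each true-positive candidate hits a distinct lesion (the data are made up):

```python
def froc_point(candidates, n_lesions, n_images, threshold):
    """One FROC operating point from scored candidate detections.

    candidates: list of (score, is_true_positive) pairs, where each true
    positive hits a distinct lesion. Returns (sensitivity, FPs per image)."""
    kept = [is_tp for score, is_tp in candidates if score >= threshold]
    tps = sum(kept)                 # lesions found at this threshold
    fps = len(kept) - tps           # spurious candidates kept
    return tps / n_lesions, fps / n_images

# Toy data: 3 lesions across 2 images, 5 scored candidates.
cands = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.2, True)]
sens, fp_rate = froc_point(cands, n_lesions=3, n_images=2, threshold=0.5)
```

Sweeping the threshold over all candidate scores traces out the full FROC curve; the abstract's figures (e.g., sensitivity 0.94 at 7.93 FP/image) are single points on such a curve.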
Non-Stationary Delayed Bandits with Intermediate Observations
Online recommender systems often face long delays in receiving feedback,
especially when optimizing for some long-term metrics. While mitigating the
effects of delays in learning is well-understood in stationary environments,
the problem becomes much more challenging when the environment changes. In
fact, if the timescale of the change is comparable to the delay, it is
impossible to learn about the environment, since the available observations are
already obsolete. However, the arising issues can be addressed if intermediate
signals are available without delay, such that given those signals, the
long-term behavior of the system is stationary. To model this situation, we
introduce the problem of stochastic, non-stationary, delayed bandits with
intermediate observations. We develop a computationally efficient algorithm
based on UCRL, and prove sublinear regret guarantees for its performance.
Experimental results demonstrate that our method is able to learn in
non-stationary delayed environments where existing methods fail.
Comment: 18 pages, 17 figures, ICML 202
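The structural assumption described above, a delayed long-term reward mediated by promptly observed intermediate signals, can be written as follows (our notation, not the paper's):

```latex
\mathbb{E}\left[ r_t \mid a_t = a \right] \;=\; \sum_{o} p_t(o \mid a)\, m(o)
```

Here $p_t(o \mid a)$ is the non-stationary probability of intermediate observation $o$ under action $a$, available without delay, while $m(o)$ is the stationary mapping from intermediate observation to expected long-term reward. The agent can track the changing $p_t$ in real time and tolerate delay only in learning the fixed $m$.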