Agnostic Active Learning Without Constraints
We present and analyze an agnostic active learning algorithm that works
without keeping a version space. This is unlike all previous approaches where a
restricted set of candidate hypotheses is maintained throughout learning, and
only hypotheses from this set are ever returned. By avoiding this version space
approach, our algorithm sheds the computational burden and brittleness
associated with maintaining version spaces, yet still allows for substantial
improvements over supervised learning for classification.
Iterative Random Forests to detect predictive and stable high-order interactions
Genomics has revolutionized biology, enabling the interrogation of whole
transcriptomes, genome-wide binding sites for proteins, and many other
molecular processes. However, individual genomic assays measure elements that
interact in vivo as components of larger molecular machines. Understanding how
these high-order interactions drive gene expression presents a substantial
statistical challenge. Building on Random Forests (RF), Random Intersection
Trees (RITs), and through extensive, biologically inspired simulations, we
developed the iterative Random Forest algorithm (iRF). iRF trains a
feature-weighted ensemble of decision trees to detect stable, high-order
interactions with the same order of computational cost as RF. We demonstrate the
utility of iRF for high-order interaction discovery in two prediction problems:
enhancer activity in the early Drosophila embryo and alternative splicing of
primary transcripts in human-derived cell lines. In Drosophila, among the 20
pairwise transcription factor interactions iRF identifies as stable (returned
in more than half of bootstrap replicates), 80% have been previously reported
as physical interactions. Moreover, novel third-order interactions, e.g.
between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order
relationships that are candidates for follow-up experiments. In human-derived
cells, iRF re-discovered a central role of H3K36me3 in chromatin-mediated
splicing regulation, and identified novel 5th and 6th order interactions,
indicative of multi-valent nucleosomes with specific roles in splicing
regulation. By decoupling the order of interactions from the computational cost
of identification, iRF opens new avenues of inquiry into the molecular
mechanisms underlying genome biology.
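The stability criterion mentioned in the abstract (an interaction counts as stable when it is returned in more than half of bootstrap replicates) can be illustrated with a short sketch. This is not the iRF implementation: the function names, the representation of an interaction as a `frozenset` of feature labels, and the toy labels "A", "B", "C" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

Interaction = FrozenSet[str]


def stability_scores(replicates: List[Iterable[Interaction]]) -> Dict[Interaction, float]:
    """Fraction of bootstrap replicates in which each interaction was returned."""
    n = len(replicates)
    if n == 0:
        return {}
    counts: Counter = Counter()
    for interactions in replicates:
        # Count each interaction at most once per replicate.
        counts.update(set(interactions))
    return {inter: c / n for inter, c in counts.items()}


def stable_interactions(
    replicates: List[Iterable[Interaction]], threshold: float = 0.5
) -> Dict[Interaction, float]:
    """Keep interactions returned in strictly more than `threshold` of replicates."""
    scores = stability_scores(replicates)
    return {inter: s for inter, s in scores.items() if s > threshold}


# Toy data (hypothetical): interactions recovered in four bootstrap replicates.
reps = [
    [frozenset({"A", "B"}), frozenset({"B", "C"})],
    [frozenset({"A", "B"})],
    [frozenset({"A", "B"}), frozenset({"B", "C"})],
    [frozenset({"A", "C"})],
]
stable = stable_interactions(reps)
```

In this toy input only the pair {A, B} passes the threshold, since {B, C} appears in exactly half of the replicates rather than more than half.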
Low-Cost Learning via Active Data Procurement
We design mechanisms for online procurement of data held by strategic agents
for machine learning tasks. The challenge is to use past data to actively price
future data and give learning guarantees even when an agent's cost for
revealing her data may depend arbitrarily on the data itself. We achieve this
goal by showing how to convert a large class of no-regret algorithms into
online posted-price and learning mechanisms. Our results in a sense parallel
classic sample complexity guarantees, but with the key resource being money
rather than quantity of data: With a budget constraint $B$, we give robust risk
(predictive error) bounds on the order of $1/\sqrt{B}$. Because we use an
active approach, we can often guarantee to do significantly better by
leveraging correlations between costs and data.
Our algorithms and analysis go through a model of no-regret learning with $T$
arriving pairs (cost, data) and a budget constraint of $B$. Our regret bounds
for this model are on the order of $T/\sqrt{B}$ and we give lower bounds on the
same order. Comment: Full version of EC 2015 paper. Color recommended for figures but
nonessential. 36 pages, of which 12 appendi
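The general shape of an online posted-price interaction under a budget can be sketched as follows. This is only a minimal illustration of the protocol, not the paper's mechanism: the pricing rule derived from no-regret algorithms is not reproduced, and `run_posted_price`, `price_rule`, and the toy arrivals are hypothetical names and values. The sketch assumes that an agent accepts whenever the posted price is at least her cost.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run_posted_price(
    arrivals: Sequence[Tuple[float, Any]],
    budget: float,
    price_rule: Callable[[List[Any]], float],
) -> Tuple[List[Any], float]:
    """Post a price to each arriving (cost, data) pair; buy when price >= cost.

    `price_rule` maps the data purchased so far to the next posted price.
    The posted price is capped by the remaining budget, so total spending
    never exceeds `budget`.
    """
    remaining = budget
    purchased: List[Any] = []
    for cost, data in arrivals:
        price = min(price_rule(purchased), remaining)
        # Assumed agent behaviour: accept iff the posted price covers the cost.
        if price > 0 and price >= cost:
            remaining -= price
            purchased.append(data)
    return purchased, budget - remaining


# Toy run with a constant posted price of 0.5 (hypothetical values).
arrivals = [(0.2, "a"), (0.8, "b"), (0.3, "c"), (0.1, "d")]
bought, spent = run_posted_price(arrivals, budget=1.0, price_rule=lambda hist: 0.5)
```

A constant price ignores past data entirely; the point of the abstract is that the price can instead be chosen actively from what has already been seen, which would correspond to a history-dependent `price_rule` here.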
mfEGRA: Multifidelity Efficient Global Reliability Analysis through Active Learning for Failure Boundary Location
This paper develops mfEGRA, a multifidelity active learning method using
data-driven adaptively refined surrogates for failure boundary location in
reliability analysis. This work addresses the issue of prohibitive cost of
reliability analysis using Monte Carlo sampling for expensive-to-evaluate
high-fidelity models by using cheaper-to-evaluate approximations of the
high-fidelity model. The method builds on the Efficient Global Reliability
Analysis (EGRA) method, which is a surrogate-based method that uses adaptive
sampling for refining Gaussian process surrogates for failure boundary location
using a single-fidelity model. Our method introduces a two-stage adaptive
sampling criterion that uses a multifidelity Gaussian process surrogate to
leverage multiple information sources with different fidelities. The method
combines the expected feasibility criterion from EGRA with one-step lookahead
information gain to refine the surrogate around the failure boundary. The
computational savings from mfEGRA depend on the discrepancy between the
different models, and the relative cost of evaluating the different models as
compared to the high-fidelity model. We show that accurate estimation of
reliability using mfEGRA leads to computational savings of 46% for an
analytic multimodal test problem and 24% for a three-dimensional acoustic horn
problem, when compared to single-fidelity EGRA. We also show the effect of
using a priori drawn Monte Carlo samples in the implementation for the acoustic
horn problem, where mfEGRA leads to computational savings of 45% for the
three-dimensional case and 48% for a rarer event four-dimensional case as
compared to single-fidelity EGRA.
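For readers unfamiliar with the expected feasibility criterion used by EGRA, the sketch below evaluates its usual closed form for a Gaussian process prediction with mean `mu` and standard deviation `sigma` at a candidate point. It is an illustration under stated assumptions, not the mfEGRA implementation: the function name, the default failure threshold `z_bar = 0`, and the band half-width `eps = 2 * sigma` are assumptions, and the multifidelity surrogate and the lookahead information gain stage are not shown. The quantity computed is the expectation of max(eps - |z_bar - G|, 0) for G ~ N(mu, sigma^2).

```python
import math


def _pdf(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expected_feasibility(
    mu: float, sigma: float, z_bar: float = 0.0, eps_factor: float = 2.0
) -> float:
    """Closed form of E[max(eps - |z_bar - G|, 0)] with G ~ N(mu, sigma^2).

    eps = eps_factor * sigma is the half-width of the band around the
    failure threshold z_bar (assumed default: two standard deviations).
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the band has zero width.
        return 0.0
    eps = eps_factor * sigma
    t0 = (z_bar - mu) / sigma
    tm = (z_bar - eps - mu) / sigma
    tp = (z_bar + eps - mu) / sigma
    return (
        (mu - z_bar) * (2.0 * _cdf(t0) - _cdf(tm) - _cdf(tp))
        - sigma * (2.0 * _pdf(t0) - _pdf(tm) - _pdf(tp))
        + eps * (_cdf(tp) - _cdf(tm))
    )
```

The value is largest where the predicted mean sits on the threshold and the surrogate is uncertain, so maximising it over candidate points selects samples near the estimated failure boundary.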
Active Learning based on Data Uncertainty and Model Sensitivity
Robots can rapidly acquire new skills from demonstrations. However, during
generalisation of skills or transitioning across fundamentally different
skills, it is unclear whether the robot has the necessary knowledge to perform
the task. Failing to detect missing information often leads to abrupt movements
or to collisions with the environment. Active learning can quantify the
uncertainty of performing the task and, in general, locate regions of missing
information. We introduce a novel algorithm for active learning and demonstrate
its utility for generating smooth trajectories. Our approach is based on deep
generative models and metric learning in latent spaces. It relies on the
Jacobian of the likelihood to detect non-smooth transitions in the latent
space, i.e., transitions that lead to abrupt changes in the movement of the
robot. When non-smooth transitions are detected, our algorithm asks for an
additional demonstration from that specific region. The newly acquired
knowledge modifies the data manifold and allows for learning a latent
representation for generating smooth movements. We demonstrate the efficacy of
our approach on generalising elementary skills, transitioning across different
skills, and implicitly avoiding collisions with the environment. For our
experiments, we use a simulated pendulum where we observe its motion from
images and a 7-DoF anthropomorphic arm. Comment: Published at the 2018 IEEE/RSJ International Conference on Intelligent
Robots and Syste