1,634 research outputs found
A Meta-Learning Approach to One-Step Active Learning
We consider the problem of learning when obtaining the training labels is
costly, which is usually tackled in the literature using active-learning
techniques. These approaches provide strategies to choose the examples to label
before or during training. These strategies are usually based on heuristics or
theoretical measures; they are applied directly during training rather than learned.
training. We design a model which aims at \textit{learning active-learning
strategies} using a meta-learning setting. More specifically, we consider a
pool-based setting, where the system observes all the examples of the dataset
of a problem and has to choose the subset of examples to label in a single
shot. Experiments show encouraging results.
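The pool-based, single-shot setting described above can be made concrete with a classical baseline: score every unlabeled example in the pool and pick the k examples to label in one shot. The sketch below uses plain uncertainty (entropy) sampling, a standard heuristic, not the learned meta-strategy the paper proposes; all names are illustrative.

```python
import math

def select_batch(probabilities, k):
    """Return indices of the k pool examples with highest predictive entropy.

    probabilities: list of per-example class-probability vectors.
    """
    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)

    # Rank the whole pool once and label the top-k in a single shot.
    scored = sorted(range(len(probabilities)),
                    key=lambda i: entropy(probabilities[i]),
                    reverse=True)
    return scored[:k]

# A model that is unsure about an example (probabilities near uniform)
# makes it a better labeling candidate than a confidently classified one.
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
print(select_batch(pool, 1))  # the maximally uncertain example: [1]
```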
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-studied;
only recently has this issue begun to receive attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic for
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
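One common way to quantify the selection stability discussed above is to run the selector on several resampled versions of the data and average the pairwise Jaccard similarity between the selected feature sets. The sketch below shows only that stability measure; the selector producing the sets stands in for any real method, and the function names are illustrative rather than taken from the article.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def selection_stability(selected_sets):
    """Average pairwise Jaccard similarity over the feature subsets
    selected on different subsamples; 1.0 means perfectly stable."""
    pairs = list(combinations(selected_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Identical selections across resamples -> stability 1.0;
# disagreeing selections push the score toward 0.
print(selection_stability([{1, 2, 3}, {1, 2, 3}]))  # 1.0
print(selection_stability([{1, 2}, {2, 3}]))        # 1/3
```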
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce labeling costs for training a classifier by querying the expert to
provide the labels of most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
\textit{aggregated informativeness} method identifies the most informative
instances based on classifier uncertainty, and queries bags incorporating the
most information. The other proposed method, called \textit{cluster-based
aggregative sampling}, clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods.
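A minimal sketch in the spirit of the \textit{aggregated informativeness} method above: score each instance by classifier uncertainty (here, the distance of its positive-class probability from 0.5) and query the bag whose instances carry the most total information. The exact scoring and aggregation in the paper may differ; this is an illustrative baseline with hypothetical names.

```python
def instance_uncertainty(p_pos):
    """Uncertainty of one instance from its positive-class probability:
    1.0 at p = 0.5 (maximally uncertain), 0.0 at p in {0, 1}."""
    return 1.0 - 2.0 * abs(p_pos - 0.5)

def most_informative_bag(bags):
    """bags: dict mapping bag id -> list of positive-class probabilities
    for that bag's instances. Returns the id of the bag whose summed
    instance uncertainty is highest, i.e. the bag to query next."""
    return max(bags,
               key=lambda b: sum(instance_uncertainty(p) for p in bags[b]))

# Bag "a" is confidently classified; bag "b" contains ambiguous
# instances, so it is the better query under this aggregation.
bags = {"a": [0.95, 0.9], "b": [0.55, 0.4]}
print(most_informative_bag(bags))  # "b"
```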
How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?
We consider the problem of optimally recovering the true ranking of $n$ items from
a randomly chosen subset of their pairwise preferences. It is well known that,
without any further assumption, a sample size of $\Omega(n\log n)$ is required
for this purpose. We analyze the problem with the additional structure of a relational
graph over the items, under an assumption of
\emph{locality}: neighboring items are similar in their rankings. Noting the
preferential nature of the data, we choose to embed not the graph but its
\emph{strong product}, to capture the pairwise node relationships. Furthermore,
unlike existing literature that uses Laplacian embedding for graph-based
learning problems, we use a richer class of graph
embeddings---\emph{orthonormal representations}---that includes the (normalized)
Laplacian as a special case. Our proposed algorithm, {\it Pref-Rank},
predicts the underlying ranking using an SVM-based approach over the chosen
embedding of the product graph, and is the first to provide \emph{statistical
consistency} on two ranking losses, \emph{Kendall's tau} and \emph{Spearman's
footrule}, with a required sample complexity of
$\mathcal{O}\big(n^{2}\chi(\bar{G})\big)^{2/3}$ pairs, $\chi(\bar{G})$ being the
\emph{chromatic number} of the complement graph $\bar{G}$. Clearly, our sample
complexity is smaller for dense graphs, with $\chi(\bar{G})$ characterizing the
degree of node connectivity, which is also intuitive given the locality
assumption: e.g. $\mathcal{O}(n^{4/3})$ for a union of $k$-cliques, or
$\mathcal{O}(n^{5/3})$ for random and power-law graphs---a quantity much
smaller than the fundamental limit of $\Omega(n\log n)$ for large $n$. This,
for the first time, relates ranking complexity to structural properties of the
graph. We also report experimental evaluations on different synthetic and real
datasets, where our algorithm is shown to outperform state-of-the-art methods.
Comment: In Thirty-Third AAAI Conference on Artificial Intelligence, 2019
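Kendall's tau, one of the two ranking losses for which Pref-Rank is shown consistent above, counts the item pairs ordered differently by a predicted and a true ranking. A small self-contained version is below; the paper's exact normalization conventions may differ.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs that the two rankings order differently.

    rank_x[i] is the rank of item i; 0.0 means identical orderings,
    1.0 means fully reversed orderings.
    """
    n = len(rank_a)
    discordant = sum(
        1 for i, j in combinations(range(n), 2)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)

print(kendall_tau_distance([1, 2, 3], [1, 2, 3]))  # 0.0 (identical)
print(kendall_tau_distance([1, 2, 3], [3, 2, 1]))  # 1.0 (reversed)
```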