5,769 research outputs found
Fast optimization of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a density based model
which searches for a linear projection maximizing the Cauchy-Schwarz Divergence
of dataset kernel density estimation. Despite its good empirical results, one
of its drawbacks is the optimization speed. In this paper we analyze how one
can speed it up through solving an approximate problem. We analyze two methods,
both similar to the approximate solutions of the Kernel Density Estimation
querying and provide adaptive schemes for selecting a crucial parameters based
on user-specified acceptable error. Furthermore we show how one can exploit
well known conjugate gradients and L-BFGS optimizers despite the fact that the
original optimization problem should be solved on the sphere. All above methods
and modifications are tested on 10 real life datasets from UCI repository to
confirm their practical usability.Comment: Presented at Theoretical Foundations of Machine Learning 2015
(http://tfml.gmum.net), final version published in Schedae Informaticae
Journa
Lower Bounds on the Oracle Complexity of Nonsmooth Convex Optimization via Information Theory
We present an information-theoretic approach to lower bound the oracle
complexity of nonsmooth black box convex optimization, unifying previous lower
bounding techniques by identifying a combinatorial problem, namely string
guessing, as a single source of hardness. As a measure of complexity we use
distributional oracle complexity, which subsumes randomized oracle complexity
as well as worst-case oracle complexity. We obtain strong lower bounds on
distributional oracle complexity for the box , as well as for the
-ball for (for both low-scale and large-scale regimes),
matching worst-case upper bounds, and hence we close the gap between
distributional complexity, and in particular, randomized complexity, and
worst-case complexity. Furthermore, the bounds remain essentially the same for
high-probability and bounded-error oracle complexity, and even for combination
of the two, i.e., bounded-error high-probability oracle complexity. This
considerably extends the applicability of known bounds
Agnostic Active Learning Without Constraints
We present and analyze an agnostic active learning algorithm that works
without keeping a version space. This is unlike all previous approaches where a
restricted set of candidate hypotheses is maintained throughout learning, and
only hypotheses from this set are ever returned. By avoiding this version space
approach, our algorithm sheds the computational burden and brittleness
associated with maintaining version spaces, yet still allows for substantial
improvements over supervised learning for classification
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing
acknowledgemen
Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search
In search applications, autonomous unmanned vehicles must be able to
efficiently reacquire and localize mobile targets that can remain out of view
for long periods of time in large spaces. As such, all available information
sources must be actively leveraged -- including imprecise but readily available
semantic observations provided by humans. To achieve this, this work develops
and validates a novel collaborative human-machine sensing solution for dynamic
target search. Our approach uses continuous partially observable Markov
decision process (CPOMDP) planning to generate vehicle trajectories that
optimally exploit imperfect detection data from onboard sensors, as well as
semantic natural language observations that can be specifically requested from
human sensors. The key innovation is a scalable hierarchical Gaussian mixture
model formulation for efficiently solving CPOMDPs with semantic observations in
continuous dynamic state spaces. The approach is demonstrated and validated
with a real human-robot team engaged in dynamic indoor target search and
capture scenarios on a custom testbed.Comment: Final version accepted and submitted to 2018 FUSION Conference
(Cambridge, UK, July 2018
- …