Fast Algorithms for Constructing Maximum Entropy Summary Trees
Karloff and Shirley recently proposed summary trees as a new way to
visualize large rooted trees (EuroVis 2013) and gave algorithms for generating
a maximum-entropy k-node summary tree of an input n-node rooted tree. However,
the algorithm generating optimal summary trees was only pseudo-polynomial (and
worked only for integral weights); the authors left open the existence of a
polynomial-time algorithm. In addition, the authors provided an additive
approximation algorithm and a greedy heuristic, both working on real weights.
This paper shows how to construct maximum entropy k-node summary trees in time
O(k^2 n + n log n) for real weights (indeed, as small as the time bound for the
greedy heuristic given previously); how to speed up the approximation algorithm
so that it runs in time O(n + (k^4/eps) log(k/eps)), and how to speed up the
greedy algorithm so as to run in time O(kn + n log n). Altogether, these
results make summary trees a much more practical tool than before.
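As a concrete illustration of the objective these algorithms optimize, here is a minimal Python sketch of the quantity being maximized: the Shannon entropy of the weight distribution over a summary tree's k nodes. The function name and example weights are illustrative, not from the paper.

```python
import math

def summary_entropy(weights):
    """Shannon entropy (in bits) of the weight distribution over the k
    nodes of a summary tree; this is the objective a maximum-entropy
    summary tree maximizes. `weights` are real, nonnegative node weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# A 4-node summary tree carrying weights 8, 4, 2, 2 has entropy 1.75 bits;
# a perfectly balanced 4-node tree would reach the maximum, log2(4) = 2 bits.
print(summary_entropy([8, 4, 2, 2]))
```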
Modelling of selection and mating decisions in tree breeding programs
Hardwood trees from the temperate forests of southern Australia are an important source of timber for high quality paper. Two species in particular, Eucalyptus globulus and Eucalyptus nitens, are well suited to this purpose and are now widely grown in commercial plantations. These plantations have been established by professional tree breeders using seedlings derived originally from a broadly based collection of seed in natural forests. To increase productivity it is desirable to select trees that grow quickly and give high yields of top quality timber. Nevertheless, it is important to maintain genetic diversity in the breeding population and thereby retain a robust capacity to adapt to changing environmental factors. In this article we formulate a number of related mathematical models for the selection and mating processes and discuss the consequences of these models. We recommend a relatively simple scheme which can be implemented on an IBM-compatible PC using standard algorithms.
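The article's models are not reproduced here, but a loose Python sketch of the trade-off they address might rank candidate trees by estimated breeding value while capping how many trees any one family may contribute, a crude stand-in for maintaining genetic diversity. All names, data, and the per-family cap below are illustrative assumptions, not the article's scheme.

```python
def select_parents(candidates, n_select, max_per_family=2):
    """Greedy selection sketch: take the highest-breeding-value trees,
    but admit at most `max_per_family` trees from any one family so the
    selected population retains some genetic diversity.

    `candidates` is a list of (tree_id, family_id, breeding_value)."""
    chosen, family_counts = [], {}
    # Consider candidates from best to worst estimated breeding value.
    for tree_id, family_id, value in sorted(candidates, key=lambda c: -c[2]):
        if family_counts.get(family_id, 0) < max_per_family:
            chosen.append(tree_id)
            family_counts[family_id] = family_counts.get(family_id, 0) + 1
        if len(chosen) == n_select:
            break
    return chosen

# Illustrative data: (id, family, estimated breeding value).
candidates = [(1, "A", 9.1), (2, "A", 8.7), (3, "A", 8.6),
              (4, "B", 8.2), (5, "C", 7.9), (6, "B", 7.5)]
print(select_parents(candidates, n_select=4))  # [1, 2, 4, 5]: family A capped at 2
```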
Model Extraction Warning in MLaaS Paradigm
Cloud vendors are increasingly offering machine learning services as part of
their platform and services portfolios. These services enable the deployment of
machine learning models on the cloud that are offered on a pay-per-query basis
to application developers and end users. However, recent work has shown that the
hosted models are susceptible to extraction attacks. Adversaries may launch
queries to steal the model and compromise future query payments or privacy of
the training data. In this work, we present a cloud-based extraction monitor
that can quantify the extraction status of models by observing the query and
response streams of both individual and colluding adversarial users. We present
a novel technique that uses information gain to measure the model learning rate
by users with an increasing number of queries. Additionally, we present an
alternate technique that maintains intelligent query summaries to measure the
learning rate relative to the coverage of the input feature space in the
presence of collusion. Both these approaches have low computational overhead
and can easily be offered as services to model owners to warn them of possible
extraction attacks from adversaries. We present performance results for these
approaches for decision tree models deployed on the BigML MLaaS platform, using
open-source datasets and different adversarial attack strategies.
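In the spirit of the second technique (query summaries tracking coverage of the input feature space), the following hedged Python sketch bins pooled queries into grid cells and warns when coverage crosses a threshold. The binning scheme, threshold, and all class and method names are illustrative assumptions, not the paper's monitor or the BigML API.

```python
class CoverageMonitor:
    """Sketch of an extraction monitor: summarize observed queries by
    which regions of the feature space they touch, and warn when the
    pooled coverage (pooling models colluding users) crosses a threshold."""

    def __init__(self, feature_ranges, bins_per_feature=4, warn_at=0.5):
        self.ranges = feature_ranges                    # [(lo, hi), ...] per feature
        self.bins = bins_per_feature
        self.warn_at = warn_at
        self.total_cells = bins_per_feature ** len(feature_ranges)
        self.seen = set()                               # grid cells touched so far

    def observe(self, query):
        """Record one query (a feature vector); return (coverage, warn?)."""
        cell = tuple(
            min(int((x - lo) / (hi - lo) * self.bins), self.bins - 1)
            for x, (lo, hi) in zip(query, self.ranges)
        )
        self.seen.add(cell)
        coverage = len(self.seen) / self.total_cells
        return coverage, coverage >= self.warn_at

# Two features on [0, 1], 2 bins each -> 4 cells; four spread-out queries
# cover the whole grid, so the monitor raises a warning.
monitor = CoverageMonitor([(0.0, 1.0), (0.0, 1.0)], bins_per_feature=2, warn_at=0.75)
for q in [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]:
    coverage, warn = monitor.observe(q)
print(coverage, warn)  # 1.0 True
```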
Variable Selection Bias in Classification Trees Based on Imprecise Probabilities
Classification trees based on imprecise probabilities provide an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities, an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of these empirical entropy measures as split selection criteria can lead to a bias in variable selection, such that variables are preferred for features other than their information content. This bias is not eliminated by the imprecise probability approach. The source of variable selection bias for the estimated Shannon entropy, as well as possible corrections, are outlined. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, motivating further investigation of alternative split selection criteria in classification trees based on imprecise probabilities.
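To make the source of the bias concrete: the plug-in entropy estimator is biased downward, and the bias grows with the number of observed categories, so many-valued variables can look spuriously informative at a split. The Python sketch below contrasts the plug-in estimate with the standard Miller-Madow first-order correction; this is a generic illustration, not the specific correction proposed in the paper.

```python
import math
from collections import Counter

def entropy_plugin(labels):
    """Naive (plug-in) estimate of Shannon entropy, in nats. It is biased
    downward, and the bias grows with the number of observed categories."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def entropy_miller_madow(labels):
    """Miller-Madow estimate: add (K - 1) / (2N), a first-order bias
    correction, where K is the number of observed categories and N the
    sample size. One standard correction, shown for illustration."""
    k = len(set(labels))
    return entropy_plugin(labels) + (k - 1) / (2 * len(labels))

sample = ["a", "b", "a", "c", "b", "a", "c", "b"]
print(entropy_plugin(sample), entropy_miller_madow(sample))
```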
Localizing the Latent Structure Canonical Uncertainty: Entropy Profiles for Hidden Markov Models
This report addresses state inference for hidden Markov models. These models
rely on unobserved states, which often have a meaningful interpretation. This
makes it necessary to develop diagnostic tools for quantification of state
uncertainty. The entropy of the state sequence that explains an observed
sequence for a given hidden Markov chain model can be considered as the
canonical measure of state sequence uncertainty. This canonical measure of
state sequence uncertainty is not reflected by the classic multivariate state
profiles computed by the smoothing algorithm, which summarize the possible
state sequences. Here, we introduce a new type of profile with the
following properties: (i) these profiles of conditional entropies form a
decomposition of the canonical measure of state sequence uncertainty along the
sequence and make it possible to localize this uncertainty; (ii) these
profiles are univariate and thus remain easily interpretable on tree
structures. We show how to extend the smoothing algorithms for hidden Markov
chain and tree models to compute these entropy profiles efficiently.
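As a sketch of the decomposition described here, assuming a discrete-emission hidden Markov chain: the posterior state process given the observations is Markov, so H(S_1..T | X) = H(S_1 | X) + sum_t H(S_t | S_{t-1}, X), and each term of this profile can be computed from the usual scaled forward-backward quantities. The Python implementation below is an illustrative re-derivation, not the authors' code.

```python
import numpy as np

def entropy_profile(A, B, pi, obs):
    """Conditional-entropy profile of an HMM state sequence given `obs`.
    A: transition matrix (K x K), B: emission matrix (K x M),
    pi: initial distribution (K,), obs: sequence of symbol indices.
    Returns [H(S_1|X), H(S_2|S_1,X), ...], which sums to H(S_1..T|X)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); beta = np.ones((T, K)); scale = np.ones(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, T):                       # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):              # scaled backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta                        # smoothed state probabilities

    def H(p):                                   # entropy with 0 log 0 := 0
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    profile = [H(gamma[0])]                     # H(S_1 | X)
    for t in range(1, T):
        # cond[i, j] = p(S_t = j | S_{t-1} = i, X), obtained by row-normalizing
        # A[i, j] * B[j, obs[t]] * beta[t, j].
        cond = A * (B[:, obs[t]] * beta[t])
        cond = cond / cond.sum(axis=1, keepdims=True)
        profile.append(sum(gamma[t - 1][i] * H(cond[i]) for i in range(K)))
    return profile

# Toy 2-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
print(entropy_profile(A, B, pi, [0, 0, 1, 1]))
```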