Better subset regression
To find efficient screening methods for high dimensional linear regression
models, this paper studies the relationship between model fitting and screening
performance. Under a sparsity assumption, we show that a subset that includes
the true submodel always yields smaller residual sum of squares (i.e., has
better model fitting) than all that do not in a general asymptotic setting.
This indicates that, for screening important variables, we could follow a
"better fitting, better screening" rule, i.e., pick a "better" subset that has
better model fitting. To seek such a better subset, we consider the
optimization problem associated with best subset regression. An EM algorithm,
called orthogonalizing subset screening, and an accelerated version of it are
proposed to search for the best subset. Although the two algorithms cannot
guarantee that the subset they yield is the best, their monotonicity property
ensures that the subset fits the model better than the initial subsets generated
by popular screening methods, and thus it can achieve better screening
performance asymptotically. Simulation results show that our methods are very
competitive in high dimensional variable screening even for finite sample
sizes.
Comment: 24 pages, 1 figure
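
The following is a minimal Python sketch of the "better fitting, better screening" rule described above, not the paper's orthogonalizing-subset-screening EM algorithm: among candidate subsets produced by any screening method, keep the one whose least-squares fit has the smallest residual sum of squares. The helper names rss and better_subset are illustrative.

    import numpy as np

    def rss(X, y, subset):
        # residual sum of squares of the least-squares fit on the given column subset
        Xs = X[:, list(subset)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(resid @ resid)

    def better_subset(X, y, candidates):
        # "better fitting, better screening": keep the candidate with the smallest RSS
        return min(candidates, key=lambda s: rss(X, y, s))

    # toy check: the true submodel uses columns 0 and 1
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(100)
    print(better_subset(X, y, [(0, 1), (2, 3), (0, 5)]))  # prints (0, 1)
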
Learning to Predict the Wisdom of Crowds
The problem of "approximating the crowd" is that of estimating the crowd's
majority opinion by querying only a subset of it. Algorithms that approximate
the crowd can intelligently stretch a limited budget for a crowdsourcing task.
We present an algorithm, "CrowdSense," that works in an online fashion to
dynamically sample subsets of labelers based on an exploration/exploitation
criterion. The algorithm produces a weighted combination of a subset of the
labelers' votes that approximates the crowd's opinion.
Comment: Presented at Collective Intelligence conference, 2012
(arXiv:1204.2991)
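
Below is a minimal Python sketch of the exploration/exploitation idea behind approximating the crowd; it is not CrowdSense's actual scoring rule. Per example it queries only the currently best-weighted labelers, occasionally adds a random one, and combines their {-1, +1} votes with running agreement weights. The function name, the weight-update rule, and the k/eps parameters are illustrative assumptions.

    import random

    def approximate_crowd(label_fns, examples, k=3, eps=0.1, seed=0):
        # Query only a small subset of labelers per example and combine their
        # votes with running agreement weights; eps controls exploration.
        rng = random.Random(seed)
        weights = [1.0] * len(label_fns)
        estimates = []
        for x in examples:
            # exploitation: the k currently highest-weighted labelers
            chosen = sorted(range(len(label_fns)), key=lambda i: -weights[i])[:k]
            # exploration: occasionally query one additional random labeler
            if rng.random() < eps:
                rest = [i for i in range(len(label_fns)) if i not in chosen]
                if rest:
                    chosen.append(rng.choice(rest))
            votes = {i: label_fns[i](x) for i in chosen}
            score = sum(weights[i] * v for i, v in votes.items())
            estimate = 1 if score >= 0 else -1
            # reward labelers that agree with the combined estimate
            for i, v in votes.items():
                weights[i] += 1.0 if v == estimate else -0.5
            estimates.append(estimate)
        return estimates
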
On the Power of Conditional Samples in Distribution Testing
In this paper we define and examine the power of the {\em
conditional-sampling} oracle in the context of distribution-property testing.
The conditional-sampling oracle for a discrete distribution $D$ over a domain
$[n]$ takes as input a subset $S \subseteq [n]$ of the domain, and outputs a
random sample $i$ drawn according to $D$, conditioned on $S$ (and independently
of all prior samples). The conditional-sampling oracle is a natural
generalization of the ordinary sampling oracle, in which $S$ always equals $[n]$.
We show that with the conditional-sampling oracle, testing uniformity,
testing identity to a known distribution, and testing any label-invariant
property of distributions is easier than with the ordinary sampling oracle. On
the other hand, we also show that for some distribution properties the
sample-complexity remains near-maximal even with conditional sampling
- …
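
As a concrete illustration of the oracle defined above, here is a minimal Python sketch of a conditional-sampling oracle, assuming $D$ is given explicitly as a probability vector (in the testing model $D$ is of course unknown to the tester). The helper name make_cond_oracle and the error raised on zero-mass subsets are illustrative choices.

    import random

    def make_cond_oracle(probs, seed=0):
        # probs[i] = D(i) for a distribution D over {0, ..., n-1}
        rng = random.Random(seed)
        def cond(S):
            # draw i in S with probability D(i) / D(S)
            S = list(S)
            mass = [probs[i] for i in S]
            if sum(mass) == 0:
                raise ValueError("conditioning on a zero-probability subset")
            return rng.choices(S, weights=mass, k=1)[0]
        return cond

    D = [0.5, 0.25, 0.125, 0.125]
    sample = make_cond_oracle(D)
    print(sample(range(len(D))))  # ordinary sampling oracle: S equals the whole domain
    print(sample({2, 3}))         # sample conditioned on the subset {2, 3}
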
