    Better subset regression

    To find efficient screening methods for high-dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the true submodel always yields a smaller residual sum of squares (i.e., better model fitting) than all subsets that do not, in a general asymptotic setting. This indicates that, for screening important variables, we can follow a "better fitting, better screening" rule, i.e., pick a "better" subset that has better model fitting. To seek such a better subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and an accelerated version are proposed for searching for the best subset. Although the two algorithms cannot guarantee that the subset they yield is the best, their monotonicity property ensures that it has better model fitting than the initial subsets generated by popular screening methods, and thus it can have better screening performance asymptotically. Simulation results show that our methods are very competitive in high-dimensional variable screening even for finite sample sizes.
    Comment: 24 pages, 1 figure
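The "better fitting, better screening" rule above can be illustrated with a minimal simulation (all data and subset choices here are hypothetical, not from the paper): under sparsity, a candidate subset that contains the true submodel achieves a smaller residual sum of squares than an equal-sized subset that misses a true variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse linear model: only the first 3 of 20 predictors matter.
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1, 2]] = [3.0, -2.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

def rss(subset):
    """Residual sum of squares after a least-squares fit on the given columns."""
    Xs = X[:, subset]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)

# A subset containing the true submodel fits better (smaller RSS)
# than a same-sized subset that omits a true variable.
good = rss([0, 1, 2, 3])   # includes the true support {0, 1, 2}
bad = rss([0, 1, 5, 7])    # misses true variable 2
```

With a nontrivial signal on the omitted variable, `good < bad` holds with overwhelming probability, which is exactly why RSS comparisons can drive screening.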

    Learning to Predict the Wisdom of Crowds

    The problem of "approximating the crowd" is that of estimating the crowd's majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, "CrowdSense," that works in an online fashion to dynamically sample subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers' votes that approximates the crowd's opinion.
    Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)
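A toy sketch of the exploration/exploitation idea described above (this is a simplification for illustration, not the CrowdSense algorithm itself; the labeler accuracies and update rule are invented): maintain a running weight per labeler, query the currently best-weighted labelers plus one random one, and predict with a weighted majority of the sampled votes.

```python
import random

random.seed(1)

# Hypothetical pool of 7 labelers with fixed (unknown to the algorithm) accuracies.
accuracies = [0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.5]
n_labelers = len(accuracies)
weights = [1.0] * n_labelers  # running quality estimates

def vote(labeler, truth):
    """Labeler answers correctly with probability equal to their accuracy."""
    return truth if random.random() < accuracies[labeler] else 1 - truth

correct = 0
rounds = 200
for _ in range(rounds):
    truth = random.randint(0, 1)
    # Exploitation: the three best-weighted labelers so far;
    # exploration: one uniformly random extra labeler.
    ranked = sorted(range(n_labelers), key=lambda i: -weights[i])
    chosen = set(ranked[:3]) | {random.randrange(n_labelers)}
    votes = {i: vote(i, truth) for i in chosen}
    # Weighted majority of the sampled subset stands in for the full crowd.
    score = sum(weights[i] * (1 if votes[i] == 1 else -1) for i in chosen)
    prediction = 1 if score > 0 else 0
    # Reward agreement with the (here observable) majority label.
    for i, v in votes.items():
        weights[i] += 1.0 if v == truth else -0.5
    correct += prediction == truth

accuracy = correct / rounds
```

Because only about four of the seven labelers are queried per round, the budget is stretched while the weighted vote still tracks the majority well.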

    On the Power of Conditional Samples in Distribution Testing

    In this paper we define and examine the power of the {\em conditional-sampling} oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample complexity remains near-maximal even with conditional sampling.
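The oracle defined above is easy to simulate. A minimal sketch, with an invented distribution `mu` over a small domain (the fallback convention for a zero-mass subset is an assumption, not from the paper): draw from `mu` restricted and renormalized to the query set `S`. Conditioning on a pair of points lets a tester compare two probabilities directly, which hints at why conditional samples are more powerful than ordinary ones.

```python
import random

random.seed(0)

# Hypothetical discrete distribution mu over the domain [n] = {0, ..., n-1}.
n = 8
mu = [0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05]

def conditional_sample(S):
    """Draw i in S with probability mu[i] / mu(S), i.e. mu conditioned on S."""
    S = sorted(S)
    total = sum(mu[i] for i in S)
    if total == 0:
        # Convention for a zero-mass subset: fall back to a uniform draw over S.
        return random.choice(S)
    r = random.random() * total
    acc = 0.0
    for i in S:
        acc += mu[i]
        if r < acc:
            return i
    return S[-1]

# The ordinary sampling oracle is the special case S = [n].
sample = conditional_sample(range(n))

# Conditioning on the pair {0, 5} compares mu[0] and mu[5] head-to-head:
pair_draws = [conditional_sample({0, 5}) for _ in range(2000)]
frac_zero = pair_draws.count(0) / len(pair_draws)
```

Here the conditional draws from `{0, 5}` return `0` with probability `0.30 / 0.35`, so `frac_zero` concentrates near 0.857.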