Universal Sequential Outlier Hypothesis Testing
Universal outlier hypothesis testing is studied in a sequential setting.
Multiple observation sequences are collected, a small subset of which are
outliers. A sequence is considered an outlier if the observations in that
sequence are generated by an "outlier" distribution, distinct from a common
"typical" distribution governing the majority of the sequences. Apart from
being distinct, the outlier and typical distributions can be arbitrarily close.
The goal is to design a universal test to best discern all the outlier
sequences. A universal test with the flavor of the repeated significance test
is proposed and its asymptotic performance is characterized under various
universal settings. The proposed test is shown to be universally consistent.
For the model with identical outliers, the test is shown to be asymptotically
optimal universally when the number of outliers is the largest possible and
with the typical distribution being known, and its asymptotic performance
otherwise is also characterized. An extension of the findings to the model with
multiple distinct outliers is also discussed. In all cases, it is shown that
the asymptotic performance guarantees for the proposed test when neither the
outlier nor typical distribution is known converge to those when the typical
distribution is known.
Comment: Proc. of the Asilomar Conference on Signals, Systems, and Computers,
2014. To appea
Predicting Native Language from Gaze
A fundamental question in language learning concerns the role of a speaker's
first language in second language acquisition. We present a novel methodology
for studying this question: analysis of eye-movement patterns in second
language reading of free-form text. Using this methodology, we demonstrate for
the first time that the native language of English learners can be predicted
from their gaze fixations when reading English. We provide analysis of
classifier uncertainty and learned features, which indicates that differences
in English reading are likely to be rooted in linguistic divergences across
native languages. The presented framework complements production studies and
offers new ground for advancing research on multilingualism.
Comment: ACL 201
Detecting positive correlations in a multivariate sample
We consider the problem of testing whether a correlation matrix of a
multivariate normal population is the identity matrix. We focus on sparse
classes of alternatives where only a few entries are nonzero and, in fact,
positive. We derive a general lower bound applicable to various classes and
study the performance of some near-optimal tests. We pay special attention to
computational feasibility and construct near-optimal tests that can be computed
efficiently. Finally, we apply our results to prove new lower bounds for the
clique number of high-dimensional random geometric graphs.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ565 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
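
The testing problem in the abstract above (is the correlation matrix the identity, against alternatives with a few positive entries?) can be illustrated with a simple baseline. The sketch below is not one of the near-optimal tests the paper studies: it uses the sum of off-diagonal sample correlations as the statistic and calibrates it by Monte Carlo simulation under the null. The function names, sample sizes, and the planted correlation of 0.6 are assumptions chosen for illustration only.

```python
import numpy as np

def sum_offdiag_corr(x):
    """Sum of the upper-triangular off-diagonal sample correlations of x (n rows, p columns)."""
    r = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices_from(r, k=1)
    return r[upper].sum()

def mc_pvalue(x, stat=sum_offdiag_corr, n_sim=500, seed=0):
    """One-sided Monte Carlo p-value for H0: correlation matrix = identity.

    Sample correlations do not depend on the means or variances, so under H0
    with normal data the statistic has the same distribution as for i.i.d.
    standard normal entries, which is what is simulated here.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    observed = stat(x)
    null = np.array([stat(rng.standard_normal((n, p))) for _ in range(n_sim)])
    return (1 + np.sum(null >= observed)) / (1 + n_sim)

# Hypothetical data: 10 variables, the first 4 sharing a positive correlation.
rng = np.random.default_rng(1)
n, p, k, rho = 200, 10, 4, 0.6
x_null = rng.standard_normal((n, p))
x_alt = rng.standard_normal((n, p))
shared = rng.standard_normal((n, 1))
x_alt[:, :k] = np.sqrt(rho) * shared + np.sqrt(1 - rho) * x_alt[:, :k]

p_null = mc_pvalue(x_null)
p_alt = mc_pvalue(x_alt)
print(f"p-value under the null: {p_null:.3f}")
print(f"p-value with a planted positive block: {p_alt:.3f}")
```

A sum statistic of this kind loses power when the correlated block is very small relative to the dimension; a maximum-type statistic would be the usual alternative in that regime.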
A Modern Take on the Bias-Variance Tradeoff in Neural Networks
The bias-variance tradeoff tells us that as model complexity increases, bias
falls and variance increases, leading to a U-shaped test error curve. However,
recent empirical results with over-parameterized neural networks are marked by
a striking absence of the classic U-shaped test error curve: test error keeps
decreasing in wider networks. This suggests that there might not be a
bias-variance tradeoff in neural networks with respect to network width, contrary
to what was originally claimed by, e.g., Geman et al. (1992). Motivated by the shaky
evidence used to support this claim in neural networks, we measure bias and
variance in the modern setting. We find that both bias and variance can
decrease as the number of parameters grows. To better understand this, we
introduce a new decomposition of the variance to disentangle the effects of
optimization and data sampling. We also provide theoretical analysis in a
simplified setting that is consistent with our empirical findings.
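
As a rough illustration of what measuring bias and variance involves, the sketch below estimates both quantities by refitting a model on many independently drawn training sets and comparing the predictions at fixed test points. It uses polynomial regression on synthetic data as a stand-in, not neural networks, and it is not the authors' experimental setup or their new variance decomposition. The target function, noise level, and all names are assumptions.

```python
import numpy as np

def f_true(x):
    """Hypothetical noiseless target function."""
    return np.sin(2 * np.pi * x)

def sample_train(rng, n=30, noise=0.3):
    """Draw one training set of n noisy observations of f_true."""
    x = rng.uniform(0.0, 1.0, n)
    y = f_true(x) + noise * rng.standard_normal(n)
    return x, y

def make_poly(degree):
    """Return a fit-and-predict function for least-squares polynomial regression."""
    def fit_predict(x, y, x_test):
        return np.polyval(np.polyfit(x, y, degree), x_test)
    return fit_predict

def bias_variance(fit_predict, x_test, n_trials=200, seed=0):
    """Estimate squared bias and variance, averaged over x_test.

    Each trial draws a fresh training set, so the variance here reflects
    data sampling only.
    """
    rng = np.random.default_rng(seed)
    preds = np.stack([fit_predict(*sample_train(rng), x_test) for _ in range(n_trials)])
    bias_sq = np.mean((preds.mean(axis=0) - f_true(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

x_test = np.linspace(0.05, 0.95, 50)
results = {d: bias_variance(make_poly(d), x_test) for d in (1, 3, 7)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

In this toy setting the classical pattern appears: higher polynomial degree lowers bias and raises variance. The abstract's claim is that wide neural networks need not follow this pattern, which a toy polynomial model cannot reproduce.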