On nonparametric maximum likelihood for a class of stochastic inverse problems
We establish the consistency of a nonparametric maximum likelihood estimator
for a class of stochastic inverse problems. We proceed by embedding the
framework into the general setting of early results of Pfanzagl related to
mixtures.
Limits of Learning about a Categorical Latent Variable under Prior Near-Ignorance
In this paper, we consider the coherent theory of (epistemic) uncertainty of
Walley, in which beliefs are represented through sets of probability
distributions, and we focus on the problem of modeling prior ignorance about a
categorical random variable. In this setting, it is a known result that a state
of prior ignorance is not compatible with learning. To overcome this problem,
another state of beliefs, called \emph{near-ignorance}, has been proposed.
Near-ignorance resembles ignorance very closely, by satisfying some principles
that can arguably be regarded as necessary in a state of ignorance, and allows
learning to take place. This paper provides new and substantial evidence
that near-ignorance, too, cannot really be regarded as a way out of the
problem of starting statistical inference from a state of very weak beliefs.
The key to this result is focusing on a setting characterized by a variable of
interest that is \emph{latent}. We argue that such a setting is by far the most
common case in practice, and we provide, for the case of categorical latent
variables (and general \emph{manifest} variables) a condition that, if
satisfied, prevents learning from taking place under prior near-ignorance. This
condition is shown to be easily satisfied even in the most common statistical
problems. We regard these results as strong evidence against the
possibility of adopting a condition of prior near-ignorance in real statistical
problems.
Comment: 27 LaTeX pages
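The imprecise Dirichlet model (IDM) is a standard example of a near-ignorance prior for a categorical variable. The following minimal sketch is illustrative only (the function name and the choice s = 2 are ours, not the paper's); it shows the vacuous-to-informative behaviour that, per the abstract, breaks down once the variable of interest is latent:

```python
# Minimal sketch of the imprecise Dirichlet model (IDM), a standard
# near-ignorance prior for a categorical variable. The hyperparameter s
# and all names here are illustrative choices, not the paper's notation.

def idm_interval(counts, category, s=2.0):
    """Lower/upper posterior probability of a category under the IDM."""
    n = sum(counts.values())
    n_i = counts.get(category, 0)
    lower = n_i / (n + s)        # prior mass placed entirely on other categories
    upper = (n_i + s) / (n + s)  # prior mass placed entirely on this category
    return lower, upper

# Before any observation the interval is vacuous:
print(idm_interval({}, "a"))                # (0.0, 1.0)
# With direct (manifest) observations the interval tightens, i.e. learning occurs:
print(idm_interval({"a": 8, "b": 2}, "a"))  # ≈ (0.667, 0.833)
```

When only a noisy manifest variable is observed, the condition given in the paper implies that the analogous posterior intervals can remain vacuous, so no learning takes place.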
Fractional norms and quasinorms do not help to overcome the curse of dimensionality
The curse of dimensionality causes well-known and widely discussed
problems for machine learning methods. There is a hypothesis that using the
Manhattan distance and even fractional quasinorms lp (for p less than 1) can
help to overcome the curse of dimensionality in classification problems. In
this study, we systematically test this hypothesis. We confirm that fractional
quasinorms have a greater relative contrast or coefficient of variation than
the Euclidean norm l2, but we also demonstrate that the distance concentration
shows qualitatively the same behaviour for all tested norms and quasinorms and
the difference between them decays as dimension tends to infinity. Estimation
of classification quality for kNN based on different norms and quasinorms shows
that a greater relative contrast does not mean better classifier performance
and the worst performance for different databases was shown by different norms
(quasinorms). A systematic comparison shows that the difference in the
performance of kNN based on lp for p = 2, 1, and 0.5 is statistically
insignificant.
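The distance-concentration effect discussed above can be reproduced with a short Monte Carlo sketch (function and parameter names are illustrative, not the study's code): the relative contrast decays with dimension for every tested p.

```python
import random

def lp_dist(x, y, p):
    """Minkowski distance; for p < 1 this is a quasinorm (no triangle inequality)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """(D_max - D_min) / D_min from a query point to uniform random points."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(dim)]
    dists = [lp_dist(q, [rng.random() for _ in range(dim)], p)
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

# Contrast decays with dimension for every p; smaller p gives a somewhat
# larger contrast at a fixed dimension, but the qualitative decay is the same.
for p in (0.5, 1.0, 2.0):
    print(p, [round(relative_contrast(d, p=p), 2) for d in (2, 20, 200)])
```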
Direct Ensemble Estimation of Density Functionals
Estimating density functionals of analog sources is an important problem in
statistical signal processing and information theory. Traditionally, estimating
these quantities requires either making parametric assumptions about the
underlying distributions or using non-parametric density estimation followed by
integration. In this paper we introduce a direct nonparametric approach which
bypasses the need for density estimation by using the error rates of k-NN
classifiers as data-driven basis functions that can be combined to estimate a
range of density functionals. However, this method is subject to a non-trivial
bias that dramatically slows the rate of convergence in higher dimensions. To
overcome this limitation, we develop an ensemble method for estimating the
value of the basis function which, under some minor constraints on the
smoothness of the underlying distributions, achieves the parametric rate of
convergence regardless of data dimension.
Comment: 5 pages
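As a toy illustration of using k-NN classifier error rates as data-driven functionals of the underlying densities, the sketch below (1-NN, one dimension; all names are ours, and the bias-correcting ensemble step of the paper is omitted) computes the leave-one-out 1-NN error between two samples, which asymptotically depends only on a functional of the two densities:

```python
import random

def nn_error_rate(xs, ys):
    """Leave-one-out 1-NN classification error on pooled 1-D samples from two
    sources. Asymptotically this error is a density functional of the two
    underlying densities; the plain 1-NN error computed here is the biased
    'basis' quantity that the paper's ensemble method de-biases."""
    data = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    errors = 0
    for i, (v, label) in enumerate(data):
        # nearest neighbour excluding the point itself
        nbr = min((d for j, d in enumerate(data) if j != i),
                  key=lambda t: abs(t[0] - v))
        errors += (nbr[1] != label)
    return errors / len(data)

rng = random.Random(1)
same = nn_error_rate([rng.gauss(0, 1) for _ in range(300)],
                     [rng.gauss(0, 1) for _ in range(300)])
far = nn_error_rate([rng.gauss(0, 1) for _ in range(300)],
                    [rng.gauss(5, 1) for _ in range(300)])
print(same, far)  # near 0.5 for identical sources, near 0 for well-separated ones
```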
Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
In population studies on the etiology of disease, one goal is the estimation
of the fraction of cases attributable to each of several causes. For example,
pneumonia is a clinical diagnosis of lung infection that may be caused by
viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology
is challenging because directly sampling from the lung to identify the
etiologic pathogen is not standard clinical practice in most settings. Instead,
measurements from multiple peripheral specimens are made. This paper introduces
the statistical methodology designed for estimating the population etiology
distribution and the individual etiology probabilities in the Pneumonia
Etiology Research for Child Health (PERCH) study of 9,500 children from 7 sites
around the world. We formulate the scientific problem in statistical terms as
estimating the mixing weights and latent class indicators under a
partially-latent class model (pLCM) that combines heterogeneous measurements
with different error rates obtained from a case-control study. We introduce the
pLCM as an extension of the latent class model. We also introduce graphical
displays of the population data and inferred latent-class frequencies. The
methods are tested with simulated data, and then applied to PERCH data. The
paper closes with a brief description of extensions of the pLCM to the
regression setting and to the case where conditional independence among the
measures is relaxed.
Comment: 25 pages, 4 figures, 1 supplementary material
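As a much-simplified sketch of the estimation problem (not the paper's pLCM: we assume known true/false positive rates and conditional independence of binary measurements given the latent cause, and all variable names are ours), the etiology mixing weights can be recovered by EM:

```python
import random

def em_mixing_weights(obs, tpr, fpr, n_causes, n_iter=200):
    """obs: binary measurement vectors, one entry per candidate cause.
    tpr[j] / fpr[j]: P(measurement j positive | cause is j / is not j)."""
    pi = [1.0 / n_causes] * n_causes            # start from uniform weights
    for _ in range(n_iter):
        counts = [0.0] * n_causes
        for m in obs:
            # E-step: posterior over the latent cause for this case
            lik = []
            for c in range(n_causes):
                l = pi[c]
                for j, x in enumerate(m):
                    p = tpr[j] if j == c else fpr[j]
                    l *= p if x else (1.0 - p)
                lik.append(l)
            z = sum(lik)
            for c in range(n_causes):
                counts[c] += lik[c] / z
        # M-step: mixing weights are the mean posterior memberships
        pi = [c / len(obs) for c in counts]
    return pi

# Recover the weights from simulated data with two candidate causes.
rng = random.Random(0)
truth, tpr, fpr = [0.7, 0.3], [0.9, 0.9], [0.1, 0.1]
obs = []
for _ in range(500):
    c = 0 if rng.random() < truth[0] else 1
    obs.append([int(rng.random() < (tpr[j] if j == c else fpr[j]))
                for j in range(2)])
print(em_mixing_weights(obs, tpr, fpr, 2))  # close to [0.7, 0.3]
```

The full pLCM additionally estimates the error rates from case-control data and places priors on them; this sketch only shows the mixing-weight step.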