Optimal Bayes Classifiers for Functional Data and Density Ratios
Bayes classifiers for functional data pose a challenge. This is because
probability density functions do not exist for functional data. As a
consequence, the classical Bayes classifier using density quotients needs to be
modified. We propose to use density ratios of projections on a sequence of
eigenfunctions that are common to the groups to be classified. The density
ratios can then be factored into density ratios of individual functional
principal components, whence the classification problem is reduced to a sequence
of nonparametric one-dimensional density estimates. This is an extension to
functional data of some of the very earliest nonparametric Bayes classifiers
that were based on simple density ratios in the one-dimensional case. By means
of the factorization of the density quotients the curse of dimensionality that
would otherwise severely affect Bayes classifiers for functional data can be
avoided. We demonstrate that in the case of Gaussian functional data, the
proposed functional Bayes classifier reduces to a functional version of the
classical quadratic discriminant. A study of the asymptotic behavior of the
proposed classifiers in the large sample limit shows that under certain
conditions the misclassification rate converges to zero, a phenomenon that has
been referred to as "perfect classification". The proposed classifiers also
perform favorably in finite sample applications, as we demonstrate in
comparisons with other functional classifiers in simulations and various data
applications, including wine spectral data, functional magnetic resonance
imaging (fMRI) data for attention deficit hyperactivity disorder (ADHD)
patients, and yeast gene expression data.
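The factorization described in this abstract can be illustrated with a minimal sketch. It assumes the projection scores on the common eigenfunctions have already been computed and are approximately independent within each group. The names `kde_1d`, `rule_of_thumb_bandwidth` and `fit_density_ratio_classifier`, the Gaussian kernel, the bandwidth rule and the toy scores are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def kde_1d(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
        return total / norm

    return density


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def fit_density_ratio_classifier(scores0, scores1):
    """Fit one 1-D density per group and per component; classify by the
    log prior ratio plus the sum of log density ratios over components."""
    n_components = len(scores0[0])
    densities = []
    for j in range(n_components):
        col0 = [row[j] for row in scores0]
        col1 = [row[j] for row in scores1]
        densities.append((kde_1d(col0, rule_of_thumb_bandwidth(col0)),
                          kde_1d(col1, rule_of_thumb_bandwidth(col1))))
    log_prior_ratio = math.log(len(scores1) / len(scores0))

    def classify(x):
        log_ratio = log_prior_ratio
        for xj, (f0, f1) in zip(x, densities):
            # A tiny constant guards against log(0) far from the data.
            log_ratio += math.log(f1(xj) + 1e-300) - math.log(f0(xj) + 1e-300)
        return 1 if log_ratio > 0 else 0

    return classify


# Toy scores on two components (hypothetical data, not from the paper).
random.seed(0)
scores0 = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
scores1 = [[random.gauss(2.0, 1.0), random.gauss(0.0, 2.0)] for _ in range(200)]
classify = fit_density_ratio_classifier(scores0, scores1)
print(classify([-1.0, 0.0]), classify([3.0, 0.0]))
```

Because each factor is a one-dimensional density estimate, the cost grows linearly with the number of components retained, which is the sense in which the factorization sidesteps the curse of dimensionality.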
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
The ratio between two probability density functions is an important component
of various tasks, including selection bias correction, novelty detection and
classification. Recently, several estimators of this ratio have been proposed.
Most of these methods fail if the sample space is high-dimensional, and hence
require a dimension reduction step, the result of which can be a significant
loss of information. Here we propose a simple-to-implement, fully nonparametric
density ratio estimator that expands the ratio in terms of the eigenfunctions
of a kernel-based operator; these functions reflect the underlying geometry of
the data (e.g., submanifold structure), often leading to better estimates
without an explicit dimension reduction step. We show how our general framework
can be extended to address another important problem, the estimation of a
likelihood function in situations where that function cannot be
well-approximated by an analytical form. One is often faced with this situation
when performing statistical inference with data from the sciences, due to the
complexity of the data and of the processes that generated those data. We
emphasize applications where using existing likelihood-free methods of
inference would be challenging due to the high dimensionality of the sample
space, but where our spectral series method yields a reasonable estimate of the
likelihood function. We provide theoretical guarantees and illustrate the
effectiveness of our proposed method with numerical experiments.
Comment: With supplementary materia
Kinematics of the local universe IX. The Perseus-Pisces supercluster and the Tolman-Bondi model
We study the mass distribution and the infall pattern of the Perseus-Pisces
(PP) supercluster. First we calculate the mass of the central part of PP, a
sphere with a radius of 15/h Mpc centered at (l,b)=(140.2\deg ,-22.0\deg),
d=50/h Mpc, using the virial and other estimators. We get M_{PP} = 4 -- 7 /h
10^{15} M_{sun}, giving mass-to-light ratio 200 -- 600 h M_{sun} / L_{sun}, and
overdensity \delta \approx 4.
The radially averaged, smoothed density distribution around PP is used as input
to the Tolman-Bondi (TB) equations, calculated for different cosmologies:
\Omega_0 = [0.1,1], \Omega_{\Lambda} = 1-\Omega_0 or 0. As a result we get the
infall velocities towards the PP center. Comparing the TB results to the
peculiar velocities measured for the Kinematics of the Local Universe (KLUN)
Tully-Fisher data set we get the best fit for the conditions \Omega_0 = 0.2 --
0.4 and v_{inf} < 100 km/s for the Local Group infall towards the center of PP.
The applicability of the TB method in a complex environment, such as PP, is
tested on an N-body simulation.
Comment: in press (A&A
The bivariate gas-stellar mass distributions and the mass functions of early- and late-type galaxies at
We report the bivariate HI- and H_2-stellar mass distributions of local
galaxies, in addition to an inventory of galaxy mass functions (MFs) for HI,
H_2, cold gas, and baryonic mass, separately for early- and late-type
galaxies. The MFs are determined using the HI and H_2 conditional
distributions and the galaxy stellar mass function, GSMF. For the conditional
distributions we use the compilation presented in Calette et al. 2018. For
determining the GSMF from to
, we combine two spectroscopic samples from the SDSS at the redshift
range . We find that the low-mass end slope of the GSMF, after
correcting for surface brightness incompleteness, is ,
consistent with previous determinations. The obtained HI MFs agree with radio
blind surveys. Similarly, the H_2 MFs are consistent with CO follow-up of
optically selected samples. We estimate the impact of systematics due to
mass-to-light ratios and find that our MFs are robust against systematic
errors. We deconvolve our MFs from random errors to obtain the intrinsic MFs.
Using the MFs, we calculate cosmic density parameters of all the baryonic
components. Baryons locked inside galaxies represent 5.4% of the universal
baryon content, while % of the HI and H_2 mass inside galaxies resides
in late-type morphologies. Our results imply cosmic depletion times of H_2
and total neutral H in late-type galaxies of and 7.2 Gyr,
respectively, which shows that late-type galaxies are on average inefficient in
converting H_2 into stars and in transforming HI gas into H_2. Our results
provide a fully self-consistent empirical description of galaxy demographics in
terms of the bivariate gas--stellar mass distribution and their projections,
the MFs. This description is ideal to compare and/or to constrain galaxy
formation models.
Comment: 37 pages, 17 figures. Accepted for publication in PASA. A code that
displays tables and figures with all the relevant statistical distributions
and correlations discussed in this paper is available here
https://github.com/arcalette/Python-code-to-generate-Rodriguez-Puebla-2020-result
Transit Costs and Cost Efficiency: Bootstrapping Nonparametric Frontiers.
This paper explores a selection of recently proposed bootstrapping techniques to estimate non-parametric convex (DEA) cost frontiers and efficiency scores for transit firms. Using a sample of Norwegian bus operators, the key results can be summarised as follows: (i) the bias implied by uncorrected cost efficiency measures is numerically important (close to 25%), (ii) the bootstrap-based test rejects the constant returns to scale hypothesis, and (iii) explaining patterns of efficiency scores using a two-stage bootstrapping approach detects only one significant covariate, in contrast to earlier results highlighting, e.g., the positive impact of high-powered contract types. Finally, comparing the average inefficiency obtained for the Norwegian data set with an analogous estimate for a smaller French sample illustrates how the estimated differences in average efficiency almost disappear once sample size differences are accounted for.
Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Within the path sampling framework, we show that probability distribution
divergences, such as the Chernoff information, can be estimated via
thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to
different Hamiltonians is implemented to derive tempered transitions along the
path, linking the distributions of interest at the endpoints. Under this
perspective, a geometric approach is feasible, which prompts intuition and
facilitates tuning the error sources. Additionally, there are direct
applications in Bayesian model evaluation. Existing marginal likelihood and
Bayes factor estimators are reviewed here along with their stepping-stone
sampling analogues. New estimators are presented and the use of compound paths
is introduced.
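As a rough illustration of thermodynamic integration along a tempered path, the sketch below estimates a log marginal likelihood from the identity log Z = \int_0^1 E_t[log L(theta)] dt, where E_t is the expectation under the power posterior proportional to L(theta)^t times the prior. The conjugate Gaussian toy model, the temperature schedule t_k = (k/K)^5 and all function names are assumptions chosen so that the answer can be checked against a closed form; none of them is taken from the paper.

```python
import math
import random

random.seed(0)

# Toy model (assumption): y_i ~ N(theta, 1) with prior theta ~ N(0, 1).
n = 20
y = [random.gauss(1.0, 1.0) for _ in range(n)]
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)


def log_likelihood(theta):
    """Gaussian log likelihood, written in terms of sufficient statistics."""
    sq = sum_y2 - 2.0 * theta * sum_y + n * theta * theta
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sq


def sample_power_posterior(t):
    """Draw theta from the density proportional to L(theta)^t * prior(theta).
    For this conjugate model it is Gaussian with precision 1 + t * n."""
    precision = 1.0 + t * n
    mean = t * sum_y / precision
    return random.gauss(mean, 1.0 / math.sqrt(precision))


def expected_log_likelihood(t, n_draws=2000):
    """Monte Carlo estimate of E_t[log L(theta)] at temperature t."""
    draws = (sample_power_posterior(t) for _ in range(n_draws))
    return sum(log_likelihood(theta) for theta in draws) / n_draws


# Temperatures concentrated near t = 0, where the integrand changes fastest.
K = 50
temperatures = [(k / K) ** 5 for k in range(K + 1)]
means = [expected_log_likelihood(t) for t in temperatures]

# Trapezoidal rule over the temperature path.
log_z_ti = sum(
    0.5 * (temperatures[k + 1] - temperatures[k]) * (means[k] + means[k + 1])
    for k in range(K)
)

# Closed form for comparison: marginally, y ~ N(0, I + 1 1^T).
log_z_exact = (
    -0.5 * n * math.log(2.0 * math.pi)
    - 0.5 * math.log(1.0 + n)
    - 0.5 * (sum_y2 - sum_y ** 2 / (1.0 + n))
)
print(log_z_ti, log_z_exact)
```

The difference of two such estimates for competing models gives a log Bayes factor. In realistic models the power posterior cannot be sampled exactly, so each temperature would need its own MCMC run.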