22,310 research outputs found

    Optimal Bayes Classifiers for Functional Data and Density Ratios

    Bayes classifiers for functional data pose a challenge because probability density functions do not exist for functional data. As a consequence, the classical Bayes classifier using density quotients needs to be modified. We propose to use density ratios of projections on a sequence of eigenfunctions that are common to the groups to be classified. The density ratios can then be factored into density ratios of individual functional principal components, whence the classification problem is reduced to a sequence of nonparametric one-dimensional density estimates. This is an extension to functional data of some of the very earliest nonparametric Bayes classifiers that were based on simple density ratios in the one-dimensional case. By means of the factorization of the density quotients, the curse of dimensionality that would otherwise severely affect Bayes classifiers for functional data can be avoided. We demonstrate that in the case of Gaussian functional data, the proposed functional Bayes classifier reduces to a functional version of the classical quadratic discriminant. A study of the asymptotic behavior of the proposed classifiers in the large sample limit shows that under certain conditions the misclassification rate converges to zero, a phenomenon that has been referred to as "perfect classification". The proposed classifiers also perform favorably in finite sample applications, as we demonstrate in comparisons with other functional classifiers in simulations and various data applications, including wine spectral data, functional magnetic resonance imaging (fMRI) data for attention deficit hyperactivity disorder (ADHD) patients, and yeast gene expression data.
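    The factorization described in this abstract can be illustrated with a minimal sketch. It assumes the projection scores onto the common eigenfunctions have already been computed, so each observation is just a short list of scores. The helper names, the toy Gaussian scores, and the normal-reference bandwidth rule are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde_1d(sample, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
        return max(total / norm, 1e-300)  # floor keeps the logarithm finite

    return density


def fit_component_densities(scores):
    """Fit one 1-D density per score component for a single group."""
    densities = []
    for j in range(len(scores[0])):
        column = [row[j] for row in scores]
        densities.append(kde_1d(column, rule_of_thumb_bandwidth(column)))
    return densities


def log_density_ratio(x, dens1, dens0):
    """Sum of per-component log density ratios (the factorized quotient)."""
    return sum(math.log(f1(xj)) - math.log(f0(xj))
               for xj, f1, f0 in zip(x, dens1, dens0))


def classify(x, dens1, dens0, prior1=0.5):
    """Assign group 1 when the log posterior odds are positive, else group 0."""
    log_prior_ratio = math.log(prior1) - math.log(1.0 - prior1)
    return 1 if log_density_ratio(x, dens1, dens0) + log_prior_ratio > 0 else 0


# Hypothetical toy scores: two components per observation, 200 per group.
rng = random.Random(0)
scores0 = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
scores1 = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.5)] for _ in range(200)]
dens0 = fit_component_densities(scores0)
dens1 = fit_component_densities(scores1)

print(classify([2.5, 0.0], dens1, dens0))
print(classify([-1.0, 0.0], dens1, dens0))
```

    Working with the sum of log ratios keeps every estimate one-dimensional, which is the point of the factorization; truncating the sum at a finite number of components is a tuning choice that this sketch fixes at two.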

    High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation

    The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due to the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments. Comment: With supplementary materia

    Kinematics of the local universe IX. The Perseus-Pisces supercluster and the Tolman-Bondi model

    We study the mass distribution and the infall pattern of the Perseus-Pisces (PP) supercluster. First we calculate the mass of the central part of PP, a sphere with a radius of 15/h Mpc centered at (l,b)=(140.2\deg ,-22.0\deg), d=50/h Mpc, using the virial and other estimators. We get M_{PP} = 4 -- 7 /h 10^{15} M_{sun}, giving mass-to-light ratio 200 -- 600 h M_{sun} / L_{sun}, and overdensity \delta \approx 4. The radially averaged smoothed density distribution around the PP is used as input to the Tolman-Bondi (TB) equations, calculated for different cosmologies: \Omega_0 = [0.1,1], \Omega_{\Lambda} = 1-\Omega_0 or 0. As a result we get the infall velocities towards the PP center. Comparing the TB results to the peculiar velocities measured for the Kinematics of the Local Universe (KLUN) Tully-Fisher data set, we get the best fit for the conditions \Omega_0 = 0.2 -- 0.4 and v_{inf} < 100 km/s for the Local Group infall towards the center of PP. The applicability of the TB method in a complex environment, such as PP, is tested on an N-body simulation. Comment: in press (A&A

    The bivariate gas-stellar mass distributions and the mass functions of early- and late-type galaxies at $z\sim0$

    We report the bivariate HI- and H$_2$-stellar mass distributions of local galaxies in addition to an inventory of galaxy mass functions, MFs, for HI, H$_2$, cold gas, and baryonic mass, separated into early- and late-type galaxies. The MFs are determined using the HI and H$_2$ conditional distributions and the galaxy stellar mass function, GSMF. For the conditional distributions we use the compilation presented in Calette et al. 2018. For determining the GSMF from $M_{\ast}\sim3\times10^{7}$ to $3\times10^{12}$ $M_{\odot}$, we combine two spectroscopic samples from the SDSS at the redshift range $0.0033<z<0.2$. We find that the low-mass end slope of the GSMF, after correcting for surface brightness incompleteness, is $\alpha\approx-1.4$, consistent with previous determinations. The obtained HI MFs agree with radio blind surveys. Similarly, the H$_2$ MFs are consistent with CO follow-up optically-selected samples. We estimate the impact of systematics due to mass-to-light ratios and find that our MFs are robust against systematic errors. We deconvolve our MFs from random errors to obtain the intrinsic MFs. Using the MFs, we calculate cosmic density parameters of all the baryonic components. Baryons locked inside galaxies represent 5.4% of the universal baryon content, while $\sim96$% of the HI and H$_2$ mass inside galaxies resides in late-type morphologies. Our results imply cosmic depletion times of H$_2$ and total neutral H in late-type galaxies of $\sim 1.3$ and 7.2 Gyr, respectively, which shows that late-type galaxies are on average inefficient in converting H$_2$ into stars and in transforming HI gas into H$_2$. Our results provide a fully self-consistent empirical description of galaxy demographics in terms of the bivariate gas--stellar mass distribution and their projections, the MFs. This description is ideal to compare and/or to constrain galaxy formation models. Comment: 37 pages, 17 figures. Accepted for publication in PASA.
A code that displays tables and figures with all the relevant statistical distributions and correlations discussed in this paper is available here https://github.com/arcalette/Python-code-to-generate-Rodriguez-Puebla-2020-result

    Transit Costs and Cost Efficiency: Bootstrapping Nonparametric Frontiers

    This paper explores a selection of recently proposed bootstrapping techniques to estimate non-parametric convex (DEA) cost frontiers and efficiency scores for transit firms. Using a sample of Norwegian bus operators, the key results can be summarised as follows: (i) the bias implied by uncorrected cost efficiency measures is numerically important (close to 25%), (ii) the bootstrap-based test rejects the constant returns to scale hypothesis, and (iii) explaining patterns of efficiency scores using a two-stage bootstrapping approach detects only one significant covariate, in contrast to earlier results highlighting, e.g., the positive impact of high-powered contract types. Finally, comparing the average inefficiency obtained for the Norwegian data set with an analogous estimate for a smaller French sample illustrates how the estimated differences in average efficiency almost disappear once sample size differences are accounted for.

    Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

    Within the path sampling framework, we show that probability distribution divergences, such as the Chernoff information, can be estimated via thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to different Hamiltonians is implemented to derive tempered transitions along the path, linking the distributions of interest at the endpoints. Under this perspective, a geometric approach is feasible, which prompts intuition and facilitates tuning the error sources. Additionally, there are direct applications in Bayesian model evaluation. Existing marginal likelihood and Bayes factor estimators are reviewed here along with their stepping-stone sampling analogues. New estimators are presented and the use of compound paths is introduced.
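    As a rough illustration of thermodynamic integration along a geometric path, the sketch below estimates the log ratio of two normalizing constants, the quantity behind a Bayes factor. It is a generic textbook identity and not one of the estimators proposed in the paper. The two unnormalized Gaussian densities, the value of `SIGMA`, the temperature grid, and the function names are all assumptions chosen so that the tempered distributions can be sampled exactly and the answer, log(SIGMA), is known in closed form.

```python
import math
import random

SIGMA = 2.0  # hypothetical scale of the second endpoint distribution


def log_q0(x):
    """Unnormalized log density of N(0, 1)."""
    return -0.5 * x * x


def log_q1(x):
    """Unnormalized log density of N(0, SIGMA^2)."""
    return -0.5 * x * x / SIGMA ** 2


def sample_tempered(t, n_draws, rng):
    """Draw from q_t proportional to q0^(1-t) * q1^t, a zero-mean Gaussian."""
    precision = (1.0 - t) + t / SIGMA ** 2
    sd = 1.0 / math.sqrt(precision)
    return [rng.gauss(0.0, sd) for _ in range(n_draws)]


def thermodynamic_integration(n_temps=21, n_draws=2000, seed=0):
    """Estimate log(Z1 / Z0) as the integral over t of E_t[log q1 - log q0]."""
    rng = random.Random(seed)
    temps = [i / (n_temps - 1) for i in range(n_temps)]
    means = []
    for t in temps:
        draws = sample_tempered(t, n_draws, rng)
        means.append(sum(log_q1(x) - log_q0(x) for x in draws) / n_draws)
    # Trapezoidal rule over the temperature grid.
    total = 0.0
    for i in range(n_temps - 1):
        total += 0.5 * (means[i] + means[i + 1]) * (temps[i + 1] - temps[i])
    return total


estimate = thermodynamic_integration()
exact = math.log(SIGMA)  # Z1 / Z0 = SIGMA for these two Gaussians
print(estimate, exact)
```

    In realistic problems the tempered distributions cannot be sampled exactly and would be explored with MCMC at each temperature; the grid spacing then controls the discretization error of the integral.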