Optimal Bayes Classifiers for Functional Data and Density Ratios

Authors: Dai Xiongtao; Müller Hans-Georg; Yao Fang
Publication date: 12/05/2016

Bayes classifiers for functional data pose a challenge. This is because probability density functions do not exist for functional data. As a consequence, the classical Bayes classifier using density quotients needs to be modified. We propose to use density ratios of projections on a sequence of eigenfunctions that are common to the groups to be classified. The density ratios can then be factored into density ratios of individual functional principal components, whence the classification problem is reduced to a sequence of nonparametric one-dimensional density estimates. This is an extension to functional data of some of the very earliest nonparametric Bayes classifiers that were based on simple density ratios in the one-dimensional case. By means of the factorization of the density quotients, the curse of dimensionality that would otherwise severely affect Bayes classifiers for functional data can be avoided. We demonstrate that in the case of Gaussian functional data, the proposed functional Bayes classifier reduces to a functional version of the classical quadratic discriminant. A study of the asymptotic behavior of the proposed classifiers in the large sample limit shows that under certain conditions the misclassification rate converges to zero, a phenomenon that has been referred to as "perfect classification". The proposed classifiers also perform favorably in finite sample applications, as we demonstrate in comparisons with other functional classifiers in simulations and various data applications, including wine spectral data, functional magnetic resonance imaging (fMRI) data for attention deficit hyperactivity disorder (ADHD) patients, and yeast gene expression data.
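The factorization described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`gaussian_kde_1d`, `silverman`, `fit_predict`), the rule-of-thumb bandwidth, the use of pooled principal components as a stand-in for common eigenfunctions, and the simulated curves are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(train, query, bandwidth):
    """Gaussian kernel density estimate of `train`, evaluated at `query`."""
    z = (query[:, None] - train[None, :]) / bandwidth
    norm = len(train) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def silverman(x):
    """Rule-of-thumb bandwidth (an illustrative choice, not from the paper)."""
    return 1.06 * x.std(ddof=1) * len(x) ** (-0.2)

def fit_predict(X0, X1, Xnew, n_components=2, prior1=0.5):
    """Label rows of Xnew 0 or 1 using summed log density ratios of scores."""
    pooled = np.vstack([X0, X1])
    mean = pooled.mean(axis=0)
    # Principal directions of the pooled sample stand in for common eigenfunctions.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    basis = vt[:n_components].T
    s0, s1, snew = [(X - mean) @ basis for X in (X0, X1, Xnew)]
    log_ratio = np.full(len(Xnew), np.log(prior1 / (1 - prior1)))
    eps = 1e-300  # guards against log(0) far from the training scores
    for j in range(n_components):
        f1 = gaussian_kde_1d(s1[:, j], snew[:, j], silverman(s1[:, j]))
        f0 = gaussian_kde_1d(s0[:, j], snew[:, j], silverman(s0[:, j]))
        log_ratio += np.log(f1 + eps) - np.log(f0 + eps)
    return (log_ratio > 0).astype(int)

# Simulated curves on a grid: two Fourier components plus measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
phi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t)])

def simulate(n, shift):
    """Draw n noisy curves whose first score has mean `shift`."""
    scores = rng.normal(size=(n, 2)) * [1.0, 0.5] + [shift, 0.0]
    return scores @ phi + 0.1 * rng.normal(size=(n, len(t)))

X0, X1 = simulate(200, 0.0), simulate(200, 3.0)
T0, T1 = simulate(200, 0.0), simulate(200, 3.0)
pred = fit_predict(X0, X1, np.vstack([T0, T1]))
truth = np.repeat([0, 1], 200)
accuracy = (pred == truth).mean()
print(f"test accuracy: {accuracy:.3f}")
```

Each component contributes one univariate density ratio, so no multivariate density is ever estimated; that is the point of the factorization.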

arXiv.org e-Print Archive

eScholarship - University of California

High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation

Authors: Izbicki Rafael; Lee Ann B.; Schafer Chad M.
Publication date: 29/04/2014

The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due to the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments. Comment: With supplementary materia
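A rough one-dimensional sketch of the spectral-series idea in this abstract follows. It is an assumption-laden illustration, not the authors' code: the Gaussian kernel, the bandwidth, the number of terms, the Nyström-style extension of the eigenvectors, and the names `gaussian_gram` and `fit_ratio` are all choices made here. The ratio f/g is expanded in eigenfunctions estimated from the g-sample, and each coefficient is estimated as the mean of the corresponding eigenfunction over the f-sample.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between 1-D samples a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def fit_ratio(xf, xg, n_terms=5, bandwidth=1.0):
    """Return a function estimating f/g from samples xf ~ f and xg ~ g."""
    n = len(xg)
    evals, evecs = np.linalg.eigh(gaussian_gram(xg, xg, bandwidth))
    order = np.argsort(evals)[::-1][:n_terms]   # keep the leading eigenpairs
    evals, evecs = evals[order], evecs[:, order]

    def basis(x):
        # Nystrom-style extension; orthonormal on the empirical g-sample.
        return np.sqrt(n) * gaussian_gram(x, xg, bandwidth) @ evecs / evals

    coefs = basis(xf).mean(axis=0)              # coefficient j: mean of psi_j over xf
    return lambda x: basis(np.asarray(x, dtype=float)) @ coefs

# Toy check with f = N(0.5, 1) and g = N(0, 1), where f/g = exp(0.5 x - 0.125).
rng = np.random.default_rng(0)
xg = rng.normal(0.0, 1.0, size=500)
xf = rng.normal(0.5, 1.0, size=500)
ratio = fit_ratio(xf, xg)
grid = np.array([-1.0, 0.0, 1.0])
est = ratio(grid)
true = np.exp(0.5 * grid - 0.125)
print(np.round(est, 2), np.round(true, 2))
```

The number of terms acts as the smoothing parameter in this sketch; in practice it would be tuned, for example by a held-out loss, rather than fixed at 5.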

arXiv.org e-Print Archive

CiteSeerX

Kinematics of the local universe IX. The Perseus-Pisces supercluster and the Tolman-Bondi model

Authors: Ekholm Timo; Hanski Mikko; Teerikorpi Pekka; Theureau Gilles
Publication venue: EDP Sciences
Publication date: 01/01/2001

We study the mass distribution and the infall pattern of the Perseus-Pisces (PP) supercluster. First we calculate the mass of the central part of PP, a sphere with a radius of $15\,h^{-1}$ Mpc centered at $(l,b)=(140.2^{\circ},-22.0^{\circ})$, $d=50\,h^{-1}$ Mpc, using the virial and other estimators. We get $M_{PP} = 4$--$7 \times 10^{15}\,h^{-1}\,M_{\odot}$, giving mass-to-light ratio $200$--$600\,h\,M_{\odot}/L_{\odot}$, and overdensity $\delta \approx 4$. The radially averaged smoothed density distribution around the PP is inputted to the Tolman-Bondi (TB) equations, calculated for different cosmologies: $\Omega_0 = [0.1,1]$, $\Omega_{\Lambda} = 1-\Omega_0$ or $0$. As a result we get the infall velocities towards the PP center. Comparing the TB results to the peculiar velocities measured for the Kinematics of the Local Universe (KLUN) Tully-Fisher data set we get the best fit for the conditions $\Omega_0 = 0.2$--$0.4$ and $v_{inf} < 100$ km/s for the Local Group infall towards the center of PP. The applicability of the TB method in a complex environment, such as PP, is tested on an N-body simulation. Comment: in press (A&A

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

The bivariate gas-stellar mass distributions and the mass functions of early- and late-type galaxies at $z\sim0$

Authors: Avila-Reese Vladimir; Calette A. R.; Huertas-Company Marc; Rodriguez-Gomez Vicente; Rodriguez-Puebla Aldo
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2020

We report the bivariate HI- and H$_2$-stellar mass distributions of local galaxies in addition of an inventory of galaxy mass functions, MFs, for HI, H$_2$, cold gas, and baryonic mass, separately into early- and late-type galaxies. The MFs are determined using the HI and H$_2$ conditional distributions and the galaxy stellar mass function, GSMF. For the conditional distributions we use the compilation presented in Calette et al. 2018. For determining the GSMF from $M_{\ast}\sim3\times10^{7}$ to $3\times10^{12}$ $M_{\odot}$, we combine two spectroscopic samples from the SDSS at the redshift range $0.0033<z<0.2$. We find that the low-mass end slope of the GSMF, after correcting from surface brightness incompleteness, is $\alpha\approx-1.4$, consistent with previous determinations. The obtained HI MFs agree with radio blind surveys. Similarly, the H$_2$ MFs are consistent with CO follow-up optically-selected samples. We estimate the impact of systematics due to mass-to-light ratios and find that our MFs are robust against systematic errors. We deconvolve our MFs from random errors to obtain the intrinsic MFs. Using the MFs, we calculate cosmic density parameters of all the baryonic components. Baryons locked inside galaxies represent 5.4% of the universal baryon content, while $\sim96$% of the HI and H$_2$ mass inside galaxies reside in late-type morphologies. Our results imply cosmic depletion times of H$_2$ and total neutral H in late-type galaxies of $\sim 1.3$ and 7.2 Gyr, respectively, which shows that late type galaxies are on average inefficient in converting H$_2$ into stars and in transforming HI gas into H$_2$. Our results provide a fully self-consistent empirical description of galaxy demographics in terms of the bivariate gas--stellar mass distribution and their projections, the MFs. This description is ideal to compare and/or to constrain galaxy formation models. Comment: 37 pages, 17 figures. Accepted for publication in PASA. A code that displays tables and figures with all the relevant statistical distributions and correlations discussed in this paper is available here https://github.com/arcalette/Python-code-to-generate-Rodriguez-Puebla-2020-result

arXiv.org e-Print Archive

HAL-INSU

HAL-OBSPM

Hal-Diderot

Transit Costs and Cost Efficiency: Bootstrapping Nonparametric Frontiers.

Authors: Bruno De Borger; Kristiaan Kerstens; Matthias Staat

This paper explores a selection of recently proposed bootstrapping techniques to estimate non-parametric convex (DEA) cost frontiers and efficiency scores for transit firms. Using a sample of Norwegian bus operators, the key results can be summarised as follows: (i) the bias implied by uncorrected cost efficiency measures is numerically important (close to 25%), (ii) the bootstrap-based test rejects the constant returns to scale hypothesis, and (iii) explaining patterns of efficiency scores using a two-stage bootstrapping approach detects only one significant covariate, in contrast to earlier results highlighting, e.g., the positive impact of high-powered contract types. Finally, comparing the average inefficiency obtained for the Norwegian data set with an analogous estimate for a smaller French sample illustrates how the estimated differences in average efficiency almost disappear once sample size differences are accounted for.

Research Papers in Economics

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Authors: Ntzoufras Ioannis; Vitoratou Silia
Publication date: 17/10/2013

Within the path sampling framework, we show that probability distribution divergences, such as the Chernoff information, can be estimated via thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to different Hamiltonians is implemented to derive tempered transitions along the path, linking the distributions of interest at the endpoints. Under this perspective, a geometric approach is feasible, which prompts intuition and facilitates tuning the error sources. Additionally, there are direct applications in Bayesian model evaluation. Existing marginal likelihood and Bayes factor estimators are reviewed here along with their stepping-stone sampling analogues. New estimators are presented and the use of compound paths is introduced.
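As a minimal illustration of thermodynamic integration for a marginal likelihood, the sketch below uses a toy conjugate model chosen here (not taken from the paper): observations y_i ~ N(theta, 1) with prior theta ~ N(0, 1). For this model the power posterior at temperature t is Gaussian and can be sampled exactly, and the log marginal likelihood is known in closed form, so the estimate can be checked. The temperature schedule, sample sizes, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=20)   # simulated data for the toy model
n, s = len(y), y.sum()

def log_lik(theta):
    """Log-likelihood of y under N(theta, 1), vectorised over theta draws."""
    sq = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * sq

# Temperatures concentrated near 0, where the integrand changes fastest.
temps = np.linspace(0.0, 1.0, 51) ** 5

means = []
for t in temps:
    # Power posterior: prior N(0, 1) times likelihood**t is N(t*s*var, var).
    var = 1.0 / (1.0 + t * n)
    theta = rng.normal(t * s * var, np.sqrt(var), size=2000)
    means.append(log_lik(theta).mean())
means = np.array(means)

# Trapezoidal rule for the integral over t of E_t[log-likelihood].
ti_estimate = np.sum(np.diff(temps) * (means[1:] + means[:-1]) / 2)

# Closed form: y ~ N(0, I + 11^T) marginally.
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
         - 0.5 * ((y ** 2).sum() - s ** 2 / (1 + n)))
print(f"thermodynamic integration: {ti_estimate:.3f}, exact: {exact:.3f}")
```

In realistic models the power posteriors are not available in closed form and each temperature would need its own MCMC run; the discretisation of the temperature path is then one of the error sources the abstract refers to.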

arXiv.org e-Print Archive

CiteSeerX