    Unweighted regression models perform better than weighted regression techniques for respondent-driven sampling data: results from a simulation study

    Background: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent-driven sampling. Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering, in the estimation of the risk of group membership from data collected using respondent-driven sampling (RDS). Methods: Twelve networked populations, with varying levels of homophily and prevalence and based on a known distribution of a continuous predictor, were simulated, and 1000 RDS samples were drawn from each population. Weighted and unweighted binomial and Poisson generalized linear models, with and without various clustering controls and standard error adjustments, were fitted to each sample and evaluated with respect to validity, bias and coverage rate. Population prevalence was also estimated. Results: In the regression analysis, the unweighted log-link (Poisson) models maintained the nominal type-I error rate across all populations. Bias was substantial and type-I error rates unacceptably high for weighted binomial regression. Coverage rates for the estimation of prevalence were highest using RDS-weighted logistic regression, except at low prevalence (10%), where unweighted models are recommended. Conclusions: Caution is warranted when undertaking regression analysis of RDS data. Even when reported degree is accurate, low reported degree can unduly influence regression estimates. Unweighted Poisson regression is therefore recommended. York University Libraries
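
The remark that low reported degree can unduly influence estimates can be illustrated with a minimal sketch. It assumes inverse-degree weights (the RDS-II style estimator); the abstract does not say which RDS weights the study used, and the sample data and function names below are hypothetical.

```python
def unweighted_prevalence(outcomes):
    """Plain sample proportion of group members (1 = member, 0 = not)."""
    return sum(outcomes) / len(outcomes)


def inverse_degree_prevalence(outcomes, degrees):
    """Weighted proportion with weights 1/degree.

    Assumption: RDS-II style weighting; not confirmed by the abstract.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical sample of five respondents: the only group member
# reports a network degree of 1, everyone else reports 10.
outcomes = [1, 0, 0, 0, 0]
degrees = [1, 10, 10, 10, 10]

unweighted = unweighted_prevalence(outcomes)
weighted = inverse_degree_prevalence(outcomes, degrees)

print(f"unweighted: {unweighted:.3f}, inverse-degree weighted: {weighted:.3f}")
```

In this toy sample the single low-degree respondent carries a weight ten times larger than any other, so the weighted estimate is pulled far above the unweighted one.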

    Graph Laplacians and their convergence on random neighborhood graphs

    Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods such as semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator. Comment: Improved presentation, typos corrected, to appear in JMLR
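
For orientation, the three graph Laplacians commonly meant here (unnormalized, random walk, and symmetric normalized) can be sketched from a weight matrix. This is a minimal illustration using the usual textbook definitions; the paper's own sign and scaling conventions may differ, and the small weight matrix is hypothetical.

```python
import math


def graph_laplacians(W):
    """Return (unnormalized, random_walk, normalized) Laplacians of W.

    W is a symmetric, non-negative weight matrix given as a list of lists.
    Assumes every vertex has positive degree.
    """
    n = len(W)
    deg = [sum(row) for row in W]

    def eye(i, j):
        return 1.0 if i == j else 0.0

    # L = D - W
    unnormalized = [[deg[i] * eye(i, j) - W[i][j] for j in range(n)]
                    for i in range(n)]
    # L_rw = I - D^{-1} W
    random_walk = [[eye(i, j) - W[i][j] / deg[i] for j in range(n)]
                   for i in range(n)]
    # L_sym = I - D^{-1/2} W D^{-1/2}
    normalized = [[eye(i, j) - W[i][j] / math.sqrt(deg[i] * deg[j])
                   for j in range(n)]
                  for i in range(n)]
    return unnormalized, random_walk, normalized


# Hypothetical weighted path graph on three vertices.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]

L_un, L_rw, L_sym = graph_laplacians(W)
```

The rows of the unnormalized and random walk Laplacians sum to zero, while the symmetric normalized one stays symmetric but generally loses that row-sum property.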

    Galaxy clustering with photometric surveys using PDF redshift information

    Photometric surveys produce large-area maps of the galaxy distribution, but with less accurate redshift information than is obtained from spectroscopic methods. Modern photometric redshift (photo-z) algorithms use galaxy magnitudes, or colors, that are obtained through multi-band imaging to produce a probability density function (PDF) for each galaxy in the map. We used simulated data to study the effect of using different photo-z estimators to assign galaxies to redshift bins in order to compare their effects on angular clustering and galaxy bias measurements. We found that if we use the entire PDF, rather than a single-point (mean or mode) estimate, the deviations are less biased, especially when using narrow redshift bins. When the redshift bin widths are Δz = 0.1, the use of the entire PDF reduces the typical measurement bias from 5%, when using single-point estimates, to 3%. Comment: Matches the MNRAS published version. 19 pages, 19 Figures
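
The difference between a single-point assignment and using the entire PDF can be sketched as follows. This is only an illustration: it assumes a Gaussian photo-z PDF (real photo-z PDFs can have arbitrary shapes), and the bin edges, galaxy parameters, and function names are hypothetical rather than taken from the paper.

```python
import math


def gaussian_cdf(z, mean, sigma):
    """Cumulative distribution of a Gaussian photo-z PDF (an assumption)."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sigma * math.sqrt(2.0))))


def pdf_bin_weights(mean, sigma, edges):
    """Fraction of the galaxy's PDF that falls inside each redshift bin."""
    return [gaussian_cdf(hi, mean, sigma) - gaussian_cdf(lo, mean, sigma)
            for lo, hi in zip(edges[:-1], edges[1:])]


def point_bin(z_point, edges):
    """Index of the bin containing a single-point estimate, or None."""
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        if lo <= z_point < hi:
            return k
    return None


# Hypothetical bins of width 0.1 and one galaxy near a bin edge.
edges = [0.4, 0.5, 0.6, 0.7]
weights = pdf_bin_weights(mean=0.52, sigma=0.05, edges=edges)
bin_index = point_bin(0.52, edges)

print("point estimate bin:", bin_index)
print("PDF weights per bin:", [round(w, 3) for w in weights])
```

The point estimate places the galaxy wholly in one bin, whereas the PDF weights spread it across neighbouring bins, which matters most when the bins are narrow relative to the PDF width.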

    The Weak Clustering of Gas-Rich Galaxies

    We examine the clustering properties of HI-selected galaxies through an analysis of the HI Parkes All-Sky Survey Catalogue (HICAT) two-point correlation function. Various sub-samples are extracted from this catalogue to study the overall clustering of HI-rich galaxies and its dependence on luminosity, HI gas mass and rotational velocity. These samples cover the entire southern sky (Dec < 0 deg), containing up to 4,174 galaxies over the radial velocity range 300-12,700 km/s. A scale length of r_0 = 3.45 +/- 0.25 Mpc/h and slope of gamma = 1.47 +/- 0.08 are obtained for the HI-rich galaxy real-space correlation function, making gas-rich galaxies among the most weakly clustered objects known. HI-selected galaxies also exhibit weaker clustering than optically selected galaxies of comparable luminosities. Good agreement is found between our results and those of synthetic HI-rich galaxy catalogues generated from the Millennium Run CDM simulation. Bisecting HICAT using different parameter cuts, clustering is found to depend most strongly on rotational velocity and luminosity, while the dependency on HI mass is marginal. Splitting the sample around v_rot = 108 km/s, a scale length of r_0 = 2.86 +/- 0.46 Mpc/h is found for galaxies with low rotational velocities compared to r_0 = 3.96 +/- 0.33 Mpc/h for the high rotational velocity sample. Comment: Accepted for publication in the Astrophysical Journal
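
To make the quoted scale lengths concrete, the sketch below evaluates a power-law correlation function, xi(r) = (r / r_0)^(-gamma). The abstract does not spell out this functional form; it is assumed here as the conventional parameterization for a scale length and slope. The slope for the two rotational-velocity sub-samples is not given in the abstract, so reusing gamma = 1.47 for them is purely an illustrative assumption.

```python
def xi_power_law(r, r0, gamma):
    """Power-law two-point correlation function (assumed form).

    r and r0 are in Mpc/h; xi equals 1 at r = r0.
    """
    return (r / r0) ** (-gamma)


# Values quoted in the abstract (Mpc/h).
R0_ALL, GAMMA_ALL = 3.45, 1.47
R0_LOW_VROT, R0_HIGH_VROT = 2.86, 3.96

# Compare the sub-samples at a hypothetical separation of 5 Mpc/h,
# reusing GAMMA_ALL for both (an assumption, see the note above).
r = 5.0
xi_all = xi_power_law(r, R0_ALL, GAMMA_ALL)
xi_low = xi_power_law(r, R0_LOW_VROT, GAMMA_ALL)
xi_high = xi_power_law(r, R0_HIGH_VROT, GAMMA_ALL)

print(f"xi(5 Mpc/h): all={xi_all:.3f}, low v_rot={xi_low:.3f}, "
      f"high v_rot={xi_high:.3f}")
```

Under these assumptions a larger scale length translates directly into a larger correlation amplitude at any fixed separation.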