
    Modeling and Analysing Respondent Driven Sampling as a Counting Process

    Respondent-driven sampling (RDS) is an approach to sampling design and analysis which utilizes the networks of social relationships that connect members of the target population, using chain-referral methods to facilitate sampling. RDS typically leads to biased sampling, favoring participants with many acquaintances. Naive estimates, such as the sample average, which are uncorrected for the sampling bias, will themselves be biased. To compensate for this bias, current methodology suggests inverse-degree weighting, where the "degree" is the number of acquaintances. This stems from the fundamental RDS assumption that the probability of sampling an individual is proportional to their degree. Since this assumption is tenuous at best, we propose to harness the additional information encapsulated in the time of recruitment into a model-based inference framework for RDS. This information is typically collected by researchers, but ignored. We adapt methods developed for inference in epidemic processes to estimate the population size, degree counts and frequencies. While providing valuable information in themselves, these quantities ultimately serve to debias other estimators, such as a disease's prevalence. A fundamental advantage of our approach is that, being model-based, it makes all assumptions of the data-generating process explicit. This enables verification of the assumptions, maximum likelihood estimation, extension with covariates, and model selection. We develop asymptotic theory, proving consistency and asymptotic normality properties. We further compare these estimators to the standard inverse-degree weighting through simulations and using real-world data. In both cases we find our estimators to outperform current methods. The likelihood problem in the model we present is convex, and thus efficiently solvable. We implement these estimators in an R package, chords, available on CRAN.
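
    A minimal sketch of the bias and the correction described above, with hypothetical data and names (this is not the chords package API): a population is sampled with probability proportional to degree, the naive sample mean then overestimates a disease's prevalence, and the standard inverse-degree weighted estimate largely corrects it.

```r
## Hypothetical population: prevalence increases with degree (acquaintance count)
set.seed(1)
N         <- 5000
degree    <- rpois(N, lambda = 5) + 1                     # degree >= 1
infected  <- rbinom(N, 1, prob = plogis(-1.5 + 0.2 * degree))
true_prev <- mean(infected)

# RDS-style sample: inclusion probability proportional to degree
idx <- sample(N, size = 500, prob = degree)

# Naive estimate: ignores that high-degree individuals are oversampled
naive_est <- mean(infected[idx])

# Inverse-degree weighting, following the RDS assumption that sampling
# probability is proportional to degree
w            <- 1 / degree[idx]
weighted_est <- sum(w * infected[idx]) / sum(w)

c(truth = true_prev, naive = naive_est, inverse_degree = weighted_est)
```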

    Cost effectiveness and affordability of trastuzumab in sub-Saharan Africa for early stage HER2-positive breast cancer

    Additional file 3: Figure S2. ICER (incremental cost-effectiveness ratio) results for each country.

    Simplicity Bias in Overparameterized Machine Learning

    A thorough theoretical understanding of the surprising generalization ability of deep networks (and other overparameterized models) is still lacking. Here we demonstrate that simplicity bias is a major phenomenon to be reckoned with in overparameterized machine learning. In addition to explaining the outcome of simplicity bias, we also study its source: following concrete rigorous examples, we argue that (i) simplicity bias can explain generalization in overparameterized learning models such as neural networks; (ii) simplicity bias and excellent generalization are optimizer-independent, as our example shows, and although the optimizer affects training, it is not the driving force behind simplicity bias; (iii) simplicity bias in pre-training models, and subsequent posteriors, is universal and stems from the subtle fact that uniformly-at-random constructed priors are not uniformly-at-random sampled; and (iv) in neural network models, the biasing mechanism in wide (and shallow) networks is different from the biasing mechanism in deep (and narrow) networks.
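
    A toy illustration of simplicity bias itself, not one of the paper's rigorous examples (all names and numbers here are hypothetical): in an overparameterized linear model, infinitely many parameter vectors interpolate the training data, and the minimum-norm interpolant, computable in closed form with no optimizer in the loop, is the "simplest" of them, consistent with the optimizer-independence point above.

```r
## Overparameterized linear regression: p >> n, so the training data can be
## fit exactly in infinitely many ways.
set.seed(2)
n <- 20; p <- 200
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(1, 5), rep(0, p - 5))          # sparse ("simple") ground truth
y <- X %*% beta_true

# Minimum-norm interpolant via the pseudoinverse: t(X) (X t(X))^{-1} y.
# No iterative optimizer is involved.
beta_mn <- t(X) %*% solve(X %*% t(X), y)

# Adding any null-space direction leaves the fit unchanged but only makes the
# interpolant "more complex" (larger norm).
v        <- rnorm(p)
v        <- v - t(X) %*% solve(X %*% t(X), X %*% v)   # project out the row space
beta_alt <- beta_mn + v

max(abs(X %*% beta_mn  - y))                       # ~0: interpolates
max(abs(X %*% beta_alt - y))                       # ~0: also interpolates
c(min_norm = sqrt(sum(beta_mn^2)), alternative = sqrt(sum(beta_alt^2)))
```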

    Polio particles vs vaccinated in 7 cities/towns

    Polio particles vs vaccinated in 7 cities/towns. All on a Log10 scale.

    Random Intersection Graphs and Missing Data

    Random graphs and statistical inference with missing data are two separate topics, each widely explored in its own field. In this paper we demonstrate the relationship between these two topics and take a novel view of the data matrix as a random intersection graph. We use graph properties and theoretical results from random-graph theory, such as connectivity and the emergence of the giant component, to identify two threshold phenomena in statistical inference with missing data: loss of identifiability and slower convergence of algorithms pertinent to statistical inference, such as expectation-maximization (EM). We provide two examples corresponding to these threshold phenomena and illustrate the theoretical predictions with simulations that are consistent with our reduction.
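
    One way to make the reduction above concrete (a sketch under assumed details, not the paper's construction): treat each variable as a vertex whose "feature set" is the set of rows in which it is observed, and join two variables whenever those sets intersect, i.e. whenever they are jointly observed at least once. If this intersection graph is disconnected, some pairs of variables are never observed together, which is the kind of connectivity threshold that can cost identifiability and slow EM-type algorithms.

```r
## Simulate a missingness pattern and build the co-observation graph on the
## p variables: an edge means the two variables are observed together in at
## least one of the n rows. Lowering the observation rate pushes the graph
## past the disconnection threshold.
set.seed(3)
n <- 40; p <- 10
M <- matrix(runif(n * p) < 0.1, n, p)       # TRUE = entry observed

A <- crossprod(M) > 0                       # jointly observed at least once
diag(A) <- FALSE

# Count connected components with a simple breadth-first search.
comp <- rep(NA_integer_, p)
id   <- 0L
for (v in seq_len(p)) {
  if (is.na(comp[v])) {
    id    <- id + 1L
    queue <- v
    while (length(queue) > 0) {
      u     <- queue[1]
      queue <- queue[-1]
      if (is.na(comp[u])) {
        comp[u] <- id
        queue   <- c(queue, which(A[u, ] & is.na(comp)))
      }
    }
  }
}
max(comp)   # more than one connected component signals a disconnected design
```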

    Normalized Information Criteria and Model Selection in the Presence of Missing Data

    Information criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are commonly used for model selection. However, the current theory does not extend to unconventional settings such as data with missing values, so naive use of these criteria is unsuitable there. Imputation, at the core of most alternative methods, is both distorted and computationally demanding. We propose a new approach that enables the use of classic, well-known information criteria for model selection when there are missing data. We adapt the current theory of information criteria through normalization, accounting for the different sample sizes used for each candidate model (focusing on AIC and BIC). Interestingly, when the sample sizes differ, our theoretical analysis finds that AIC_j / n_j is the proper quantity to optimize in place of AIC_j (where n_j is the sample size available to the j-th model), while −(BIC_j − BIC_i)/(n_j − n_i) is the corresponding correction for BIC. Furthermore, we find that the computational complexity of normalized information criteria methods is exponentially better than that of imputation methods. In a series of simulation studies, we find that normalized-AIC and normalized-BIC outperform previous methods (i.e., normalized-AIC is more efficient, and normalized-BIC includes only important variables, although it tends to exclude some of them under large correlation). We propose three additional methods aimed at increasing the statistical efficiency of normalized-AIC: post-selection imputation, Akaike sub-model averaging, and minimum-variance averaging. The latter succeeds in increasing efficiency further.
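
    A minimal sketch of the normalized-AIC recipe described above (assumed details and hypothetical data; not the authors' implementation): each candidate model j is fit on the n_j rows that are complete for its own variables, and candidates are compared by AIC_j / n_j rather than by raw AIC, since the n_j differ across candidates.

```r
## Hypothetical data: y depends on x1 and x2; x3 is irrelevant and often missing.
set.seed(4)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)
d$x3[runif(n) < 0.4] <- NA

candidates <- list(
  m1 = y ~ x1,
  m2 = y ~ x1 + x2,
  m3 = y ~ x1 + x2 + x3
)

# Fit each model on the rows complete for *its* variables (so n_j varies),
# then normalize AIC_j by n_j before comparing.
normalized_aic <- sapply(candidates, function(f) {
  rows <- complete.cases(d[, all.vars(f)])
  fit  <- lm(f, data = d[rows, ])
  AIC(fit) / sum(rows)
})

normalized_aic
names(which.min(normalized_aic))   # model selected by normalized-AIC
```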