
    Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response

    Methods specifically targeting missing values in a wide spectrum of statistical analyses are now part of serious statistical thinking, owing to advances in computational statistics and increased awareness among sophisticated consumers of statistics. Despite many advances in both theory and applied methods for missing data, missing-data methods for multilevel applications have not seen comparable development. In this paper, I consider multiple imputation, a popular inferential tool, in multilevel applications with missing values. I specifically consider missing values occurring arbitrarily at any level of observational units. I use Bayesian arguments for drawing multiple imputations from the underlying (posterior) predictive distribution of the missing data. Multivariate extensions of well-known mixed-effects models form the basis for simulating the posterior predictive distribution, hence creating the multiple imputations. These topics are illustrated in an application assessing correlates of unmet need for mental health care among children with special health care needs.
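    A minimal sketch of the imputation step this abstract describes, using a two-level continuous outcome with made-up data: a mixed-effects model is fit to the observed cases and missing values are replaced by draws from an approximate predictive distribution. It conditions only on the estimated fixed effects and residual variance; the fully Bayesian approach described in the paper also propagates uncertainty in the parameters and random effects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# toy two-level data: pupils nested in schools, with roughly 20% of outcomes missing
n_schools, n_per_school = 20, 30
school = np.repeat(np.arange(n_schools), n_per_school)
x = rng.normal(size=school.size)
u = rng.normal(scale=0.8, size=n_schools)[school]          # school-level random effects
y = 1.0 + 0.5 * x + u + rng.normal(size=school.size)
y[rng.random(y.size) < 0.2] = np.nan                        # ignorable nonresponse

df = pd.DataFrame({"y": y, "x": x, "school": school})
obs = df.dropna(subset=["y"])
fit = smf.mixedlm("y ~ x", obs, groups=obs["school"]).fit()

miss = df["y"].isna()
M = 5                                                       # number of imputations
completed = []
for m in range(M):
    d = df.copy()
    # predictive mean from the estimated fixed effects (random effects omitted here)
    mu = fit.fe_params["Intercept"] + fit.fe_params["x"] * d.loc[miss, "x"]
    # add residual noise; a fully Bayesian version would also draw the parameters
    # and random effects from their posterior before generating each imputation
    d.loc[miss, "y"] = mu + rng.normal(scale=np.sqrt(fit.scale), size=miss.sum())
    completed.append(d)

print(f"created {len(completed)} completed data sets, {miss.sum()} values imputed in each")
```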

    Within-Household Selection Methods: A Critical Review and Experimental Examination

    Probability samples are necessary for making statistical inferences to the general population (Baker et al. 2013). Some countries (e.g. Sweden) have population registers from which to randomly select samples of adults. The U.S. and many other countries, however, do not have population registers. Instead, researchers (i) select a probability sample of households from lists of areas, addresses, or telephone numbers and (ii) select an adult within these sampled households. The process by which individuals are selected from sampled households to obtain a probability-based sample of individuals is called within-household (or within-unit) selection (Gaziano 2005). Within-household selection aims to provide each member of a sampled household with a known, nonzero chance of being selected for the survey (Gaziano 2005; Lavrakas 2008). Thus, it helps to ensure that the sample represents the target population rather than only those most willing and available to participate and, as such, reduces total survey error (TSE). In interviewer-administered surveys, trained interviewers can implement a prespecified within-household selection procedure, making the selection process relatively straightforward. In self-administered surveys, within-household selection is more challenging because households must carry out the selection task themselves. This can lead to errors in the selection process or to nonresponse, resulting in too many or too few of certain types of people in the data (typically too many female, highly educated, older, and white respondents), and may also lead to biased estimates for other items. We expect the smallest biases in estimates for items that do not differ across household members (e.g. political views, household income) and the largest biases for items that do differ across household members (e.g. household division of labor). In this chapter, we review recent literature on within-household selection across survey modes, identify the methodological requirements of studying within-household selection methods experimentally, provide an example of an experiment designed to improve the quality of selecting an adult within a household in mail surveys, and summarize current implications for survey practice regarding within-household selection. We focus on selection of one adult out of all possible adults in a household; screening households for members who have particular characteristics has additional complications (e.g. Tourangeau et al. 2012; Brick et al. 2016; Brick et al. 2011), although designing experimental studies for screening follows the same principles.
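    As an illustration of the principle that within-household selection gives every adult a known, nonzero chance of selection, the sketch below picks one adult uniformly at random from each sampled household's roster and records the implied selection probability and weight. The rosters and labels are hypothetical; self-administered surveys typically rely on procedures such as the next- or last-birthday method rather than a roster supplied to the researcher.

```python
import random

random.seed(42)

# hypothetical rosters of adults in three sampled households
households = {
    "hh01": ["adult_A", "adult_B"],
    "hh02": ["adult_C"],
    "hh03": ["adult_D", "adult_E", "adult_F"],
}

for hh_id, adults in households.items():
    person = random.choice(adults)      # uniform selection within the household
    p_within = 1.0 / len(adults)        # known, nonzero selection probability
    weight = 1.0 / p_within             # within-household weight, later multiplied by
                                        # the household's own selection weight
    print(hh_id, person, f"p={p_within:.2f}", f"weight={weight:.1f}")
```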

    Synthetic Data for Small Area Estimation


    Generating Synthetic Data to Produce Public-Use Microdata for Small Geographic Areas Based on Complex Sample Survey Data with Application to the National Health Interview Survey

    Small area statistics obtained from sample survey data provide a critical source of information used to study health, economic, and sociological trends. However, most large-scale sample surveys are not designed for the purpose of producing small area statistics. Moreover, data disseminators are prevented from releasing public-use microdata for small geographic areas for disclosure reasons, thus limiting the utility of the data they collect. This research evaluates a synthetic data method, intended for data disseminators, for releasing public-use microdata for small geographic areas based on complex sample survey data. The method replaces all observed survey values with synthetic (or imputed) values generated from a hierarchical Bayesian model that explicitly accounts for complex sample design features, including stratification, clustering, and sampling weights. The method is applied to restricted microdata from the National Health Interview Survey, and synthetic data are generated for both sampled and non-sampled small areas. The analytic validity of the resulting small area inferences is assessed by direct comparison with the actual data, a simulation study, and a cross-validation study.
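    The toy sketch below conveys the general idea of replacing observed values with draws from a hierarchical Bayesian model: area means follow a simple normal-normal hierarchy (variances treated as known for brevity), and synthetic unit-level values are drawn from the resulting posterior predictive distribution. It uses fabricated data and is only an illustration; the method evaluated in this research additionally models stratification, clustering, and sampling weights, and generates values for non-sampled areas as well.

```python
import numpy as np

rng = np.random.default_rng(1)

# fabricated "restricted" microdata: a continuous outcome in a few small areas
n_areas, n_per_area = 8, 40
true_means = rng.normal(50.0, 5.0, n_areas)
data = [rng.normal(mu, 10.0, n_per_area) for mu in true_means]

# normal-normal hierarchy with variances treated as known (a deliberate simplification)
sigma2 = 10.0 ** 2                                 # within-area variance
area_means = np.array([d.mean() for d in data])
tau2 = area_means.var(ddof=1)                      # crude between-area variance
grand_mean = area_means.mean()

synthetic = []
for d in data:
    n = len(d)
    # conjugate posterior for the area mean given its observed data
    precision = n / sigma2 + 1.0 / tau2
    post_mean = (d.sum() / sigma2 + grand_mean / tau2) / precision
    post_var = 1.0 / precision
    # one synthetic replicate: draw the area mean, then draw unit-level values
    theta = rng.normal(post_mean, np.sqrt(post_var))
    synthetic.append(rng.normal(theta, np.sqrt(sigma2), n))

print("observed area means: ", np.round(area_means, 1))
print("synthetic area means:", np.round([s.mean() for s in synthetic], 1))
```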

    Sequential imputation for models with latent variables assuming latent ignorability

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/150588/1/anzs12264-sup-0001-Supinfo.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/150588/2/anzs12264_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/150588/3/anzs12264.pd

    Privacy-preserving datamining on vertically partitioned databases

    In a recent paper, Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and introduces noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noise is required to avoid a breach, rendering the database almost useless. As databases grow increasingly large, the possibility of being able to query only a sub-linear number of times becomes realistic. We further investigate this situation, generalizing the previous work in two important directions: multi-attribute databases (previous work dealt only with single-attribute databases) and vertically partitioned databases, in which different subsets of attributes are stored in different databases. In addition, we show how to use our techniques for datamining on published noisy statistics.
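    A minimal sketch of the output-perturbation setting described above: a curator holding a binary database answers subset-sum queries with additive noise, and only a sub-linear number of queries is allowed. The noise scale and query budget below follow the spirit of the Dinur-Nissim analysis but are illustrative only; no claim is made about the exact mechanism or privacy guarantees in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

n = 1000
db = rng.integers(0, 2, size=n)                 # toy single-attribute binary database

def noisy_count(indices, noise_scale):
    """Answer a subset-sum query with additive Gaussian noise (output perturbation)."""
    return int(db[indices].sum()) + rng.normal(0.0, noise_scale)

# allow only a sub-linear number of queries, each perturbed with modest noise
n_queries = int(np.sqrt(n))
answers = []
for _ in range(n_queries):
    subset = rng.choice(n, size=n // 2, replace=False)
    answers.append(noisy_count(subset, noise_scale=np.sqrt(n_queries)))

print(f"answered {n_queries} noisy queries; first answer: {answers[0]:.1f}")
```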