367 research outputs found

    Record-Linkage from a Technical Point of View

    Record linkage is used for preparing sampling frames, deduplicating lists, and combining information on the same object from two different databases. If the same objects in two different databases share error-free, unique common identifiers such as personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues, and future research topics are discussed. Keywords: record linkage, data mining, privacy-preserving protocols.

    Avoiding Problems of Traditional Sampling Strategies for Household Surveys in Germany: Some New Suggestions

    All of the sampling plans currently in use for general population surveys in Germany suffer from methodological and practical problems. A new sampling plan is thus urgently needed: one with a low cost overhead that can be prepared in a very short time. Germany also lacks a sampling plan covering all institutional populations, immigrants in general, and illegal immigrants in particular. The availability of new databases covering these populations suggests ways of developing, implementing, and testing new sampling plans for population surveys in Germany. One such sampling plan (G-Plan) is proposed here for the first time. The implementation problems of this design must be studied in a number of empirical pretests.

    Multiple imputation for unit-nonresponse versus weighting including a comparison with a nonresponse follow-up study

    The results of a national fear of crime survey are compared with results following the use of different nonresponse correction procedures. We compared naive estimates, weighted estimates, estimates after a thorough nonresponse follow-up, and estimates after multiple imputation (MI). A strong similarity between the MI and the follow-up estimates was found. This suggests that, if the missing-at-random (MAR) assumptions hold, carefully selected and collected additional data applied in an MI could yield estimates similar to those of a nonresponse follow-up at a much lower price and respondent burden. Keywords: multiple imputation, unit nonresponse, missing data, complex surveys.

    Biological variables in social surveys

    "Social scientists have long virtually ignored the biological constraints of human behavior. Yet if the prediction of behavior is considered essential to a social science, neglecting any variable that might influence human behavior is unacceptable. This paper provides examples of important biological variables and describes their measurement in social surveys." (author's abstract)

    The effect of the refusal avoidance training experiment on final disposition codes in the German ESS-2

    "The implementation of a Refusal Avoidance Training (RAT) within wave 2 of the German part of the European Social Survey (ESS) successfully reduced the amount of reported refusals by nearly 7%. The effect of the reduction was compensated by a nearly equal increase in the proportion of non-contacted designated respondents. This effect may be due to non-random allocation of trained interviewers. Further randomized experiments are necessary to separate the effects of RAT on response rates." (author's abstract)

    Big Data is not the New Oil: Common Misconceptions about Population Data

    Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken for a guarantee of valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions on population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often implies subtle technical problems, which are easily missed. We discuss a diverse range of misconceptions relevant for anybody capturing, processing, linking, or analysing population data. Remarkably, many of these misconceptions are due to the social nature of data collections and are therefore missed by purely technical accounts of data processing. Many of these misconceptions are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.