
Dealing with the cryptic survey: Processing labels and value labels with Mata

Abstract

Survey data often come as a plain table of cryptic variable names, numbers, and letters. To make sense of the data, the researcher is given a questionnaire or a code book that lists the variable names, their descriptions, and an interpretation of the values (either numbers or strings) that each variable can take. Code books are commonly provided as plain text or in PDF format. Hence, the researcher is left “free” to type variable labels and value labels one by one. This often leads to bad research habits, such as “cutting” and “processing” only the piece of the survey the researcher needs in the short run and leaving the rest for future processing. This is boring and time consuming, and it eventually leads to multiple versions of the same survey, an inability to track important changes, and an inability to reproduce research results, because the researcher cannot recreate the analyzed dataset step by step from the original source. In this talk, I will discuss how to recover the information contained in questionnaires or code books and how to process it in a clean, fast, and efficient way with Mata.
