    Humans vs. Machines: Comparing Coding of Interviewer Question-Asking Behaviors Using Recurrent Neural Networks to Human Coders

    Standardized survey interviewing techniques are intended to reduce interviewers’ effects on survey data. A common method to assess whether or not interviewers read survey questions exactly as worded is behavior coding. However, manually behavior coding an entire survey is expensive and time-consuming. Machine learning techniques such as Recurrent Neural Networks (RNNs) may offer a way to partially automate this process, saving time and money. RNNs learn to categorize sequential data (e.g., conversational speech) based on patterns learned from previously categorized examples. Yet the feasibility of an automated RNN-based behavior coding approach and how accurately this approach codes behaviors compared to human behavior coders are unknown. In this poster, we compare coding of interviewer question-asking behaviors by human undergraduate coders to the coding of transcripts performed by RNNs. Humans transcribe and manually behavior code each interview in the Work and Leisure Today II telephone survey (AAPOR RR3=7.8%) at the conversational turn level (n=47,900 question-asking turns) to identify when interviewers asked questions (1) exactly as worded, (2) with minor changes (i.e., changes not affecting question meaning), or (3) with major changes (i.e., changes affecting question meaning). With a random subset of interview transcripts as learning examples, we train RNNs to classify interviewer question-asking behaviors into these same categories. A random 10% subsample of transcripts (n=94) was also coded by master coders to evaluate inter-coder reliability. We compare the reliability of coding (versus the master coders) by the human coders with the reliability of the coding (versus the master coders) by the RNNs. Preliminary results indicate that the human coders and the RNNs have equal reliability (p > .05) for questions with a large proportion of major and minor changes. We conclude with implications for behavior coding telephone interview surveys using machine learning in general, and RNNs in particular
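
    As a rough illustration of the approach described above, the sketch below trains a small recurrent classifier to assign question-asking transcripts to the three behavior codes. It is a minimal sketch under stated assumptions: the example transcripts, labels, vocabulary settings, layer sizes, and training epochs are placeholders, not the configuration used in the poster.

```python
# Minimal sketch of an RNN behavior-coding classifier (illustrative only).
# The transcripts, labels, and hyperparameters below are assumptions; the
# poster does not specify the architecture or training settings it used.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical coded examples: 0 = exact wording, 1 = minor change, 2 = major change
transcripts = np.array([
    "in the last seven days how many hours did you work for pay",
    "in the last week about how many hours did you work for pay",
    "so how much did you work",
])
labels = np.array([0, 1, 2])

# Convert each transcript to a fixed-length sequence of word indices
vectorizer = layers.TextVectorization(output_sequence_length=30)
vectorizer.adapt(transcripts)
x = vectorizer(transcripts)

# Embedding -> LSTM -> softmax over the three behavior codes
model = tf.keras.Sequential([
    layers.Embedding(input_dim=len(vectorizer.get_vocabulary()), output_dim=32),
    layers.LSTM(64),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train on previously coded turns, then code a new question-asking turn
model.fit(x, labels, epochs=10, verbose=0)
new_turn = vectorizer(np.array(["roughly how many hours did you work last week"]))
predicted_code = int(np.argmax(model.predict(new_turn, verbose=0)))
print(predicted_code)
```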

    Ranking: Perceptions of Tied Ranks and Equal Intervals on a Modified Visual Analog Scale

    This study examined a novel paper-based ranking system (called the BINS format) that was designed to address two limitations of traditional ranking formats. This new system allows respondents to: 1) assign ties to ranked alternatives and 2) indicate distance between ranked alternatives. Participants reported high satisfaction with the ability to express ties using the BINS format, and preferred to use a ranking format that allowed for ties over a format that did not. Two versions of the BINS format (a numbered continuum and an unnumbered continuum) were compared to examine participants’ perception of the distance between ranked alternatives. When a numbered continuum was used, participants saw the relationship between ranked alternatives as both multiplicative and divisible; conversely, participants using the unnumbered continuum did not see either relationship. This lends support to the notion that participants perceived the numbered BINS format as having equal psychological intervals

    Why Do Cell Phone Interviews Last Longer? A Behavior Coding Perspective

    Why do telephone interviews last longer on cell phones than landline phones? Common explanations for this phenomenon include differential selection into subsets of questions, activities outside the question-answer sequence (such as collecting contact information for cell-minute reimbursement), respondent characteristics, behaviors indicating disruption to respondents’ perception and comprehension, and behaviors indicating interviewer reactions to disruption. We find that the time difference persists even when we focus only on the question-answer portion of the interview and only on shared questions (i.e., eliminating the first two explanations above). To learn why the difference persists, we use behavior codes from the U.S./Japan Newspaper Opinion Poll, a dual-frame telephone survey of US adults, to examine indicators of satisficing, line-quality issues, and distraction. Overall, we find that respondents on cell phones are more disrupted, and that the difference in interview duration occurs because cell phone respondents take longer to provide acceptable answers. Interviewers also slow their speed of speech when asking questions. A slower speaking rate from both actors results in a longer and more expensive interview when respondents use cell phones. Includes Supplementary Data
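
    The kind of tabulation behind these duration findings can be illustrated with a toy computation of mean question-answer time and interviewer speaking rate by phone type. The records below are invented for illustration; they are not data from the U.S./Japan Newspaper Opinion Poll.

```python
# Hypothetical comparison of question-answer time and interviewer speaking
# rate by phone type; the records below are invented illustrative values.
from statistics import mean

# Each record: (phone type, seconds for the question-answer sequence,
#               interviewer words spoken, seconds spent asking the question)
turns = [
    ("cell", 14.2, 26, 9.1),
    ("cell", 11.8, 24, 8.7),
    ("landline", 9.6, 26, 7.4),
    ("landline", 10.1, 24, 7.0),
]

for phone in ("cell", "landline"):
    subset = [t for t in turns if t[0] == phone]
    qa_time = mean(t[1] for t in subset)               # mean seconds per turn
    words_per_sec = mean(t[2] / t[3] for t in subset)  # interviewer speaking rate
    print(f"{phone}: {qa_time:.1f} s per question-answer turn, "
          f"{words_per_sec:.1f} words/s when asking")
```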

    How Do Interviewers and Respondents Navigate Sexual Identity Questions in a CATI Survey?

    Survey-based research has demonstrated that sexual minority individuals experience unique outcomes in areas such as physical and mental health (Boehmer et al. 2007; Hatzenbuehler 2014, 2017), crime (Herek 2009), public education (Kosciw et al. 2015), same-sex romantic relationships and family (Powell and Downey 1997; Umberson et al. 2015), and economics (Black et al. 2007). Having a reliable and valid measure of sexual identity (i.e., the way in which an individual self-describes their sexual orientation) (Gagnon and Simon 1973) is essential for conducting research on sexual minorities. Indeed, many national surveys such as the General Social Survey, the National Health Interview Survey, and the National Survey of Family Growth ask survey respondents about their sexual identity. The percentage of US adults identifying as a sexual minority has increased from 3.5% (8.3 million) in December 2012 to 4.1% (10.05 million) in December 2016 (Gates 2017). As the prevalence of sexual minorities is still low, inaccurate answers or item nonresponse to sexual identity questions (SIQs) may result in large distortions of estimates of sexual minorities. In a meta-analysis, Ridolfo et al. (2012) demonstrate that item nonresponse rates for SIQs range from 1.6% to 4.3%, with an average of approximately 2%. For context, this places item nonresponse for SIQs higher than education questions (1.1%), but below income questions (11.2%) (Conron et al. 2008). The threat of item nonresponse due to concerns over question sensitivity has led many researchers to advocate against the use of interviewers to administer SIQs (SMART 2009; Ridolfo et al. 2012). Instead, they recommend that survey researchers ask SIQs using only self-administered modes of data collection (i.e., using mail surveys, web surveys, or Computer-Assisted Self-Interview [CASI] devices for face-to-face surveys) (SMART 2009; Ridolfo et al. 2012). Yet asking SIQs exclusively in a self-administered context may not be a feasible approach. First, self-administered modes are not without drawbacks of their own: mail surveys are time consuming, face-to-face surveys using CASI are time consuming and expensive, and web surveys do not have a sampling frame with adequate coverage of the US population (Dillman et al. 2014). In contrast, researchers using telephone surveys can collect nationally representative data quickly. Second, while SIQs provide important demographic information, they are often not the primary focus of a survey. Many private survey companies (e.g., Pew, Gallup, Abt Associates) and government surveys (e.g., the Behavioral Risk Factor Surveillance System) rely on telephone surveys to achieve a variety of cost, quality, and timeliness objectives; these organizations are not likely to switch modes to improve data quality for a single demographic question like sexual identity. Thus, understanding the implications of administering SIQs in all modes, including telephone, is of continued interest

    Your Best Estimate is Fine. Or is It?

    Providing an exact answer to open-ended numeric questions can be a burdensome task for respondents. Researchers often assume that adding an invitation to estimate (e.g., “Your best estimate is fine”) to these questions reduces cognitive burden, and in turn, reduces rates of undesirable response behaviors like item nonresponse, nonsubstantive answers, and answers that must be processed into a final response (e.g., qualified answers like “about 12” and ranges). Yet there is little research investigating this claim. Additionally, explicitly inviting estimation may lead respondents to round their answers, which may affect survey estimates. In this study, we investigate the effect of adding an invitation to estimate to 22 open-ended numeric questions in a mail survey and three questions in a separate telephone survey. Generally, we find that explicitly inviting estimation does not significantly change rates of item nonresponse, rounding, or qualified/range answers in either mode, though it does slightly reduce nonsubstantive answers for mail respondents. In the telephone survey, an invitation to estimate results in fewer conversational turns and shorter response times. Our results indicate that an invitation to estimate may simplify the interaction between interviewers and respondents in telephone surveys, and neither hurts nor helps data quality in mail surveys
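
    The response behaviors at issue here (item nonresponse, nonsubstantive answers, and answers such as “about 12” or ranges that must be processed into a final value) can be screened automatically. The sketch below is a hypothetical illustration of that screening step; the regular expressions and category labels are assumptions, not the coding rules used in the study.

```python
# Illustrative screening of raw open-ended numeric answers; the patterns and
# category names are assumptions, not the study's actual coding scheme.
import re

def classify_numeric_answer(raw: str) -> str:
    """Return one of: 'item nonresponse', 'nonsubstantive', 'range',
    'qualified', or 'exact'."""
    text = raw.strip().lower()
    if text == "":
        return "item nonresponse"              # no answer given
    if text in {"don't know", "dont know", "refused", "n/a"}:
        return "nonsubstantive"                # no usable numeric content
    if re.search(r"\d+\s*(?:-|to)\s*\d+", text):
        return "range"                         # e.g., "10-12" or "10 to 12"
    if re.search(r"\b(?:about|around|roughly|approximately)\b", text):
        return "qualified"                     # e.g., "about 12"
    if re.fullmatch(r"\d+(?:\.\d+)?", text):
        return "exact"                         # a clean numeric answer
    return "nonsubstantive"

for answer in ["about 12", "10 to 12", "40", ""]:
    print(repr(answer), "->", classify_numeric_answer(answer))
```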

    ASPECT: A Survey to Assess Student Perspective of Engagement in an Active-Learning Classroom

    This paper describes the development and validation of a survey to measure students’ self-reported engagement during a wide variety of in-class active-learning exercises. The survey provides researchers and instructors alike with a tool to rapidly evaluate different active-learning strategies from the perspective of the learner

    Using Recurrent Neural Networks to Code Interviewer Question-Asking Behaviors: A Proof of Concept

    Survey researchers use behavior coding to identify when interviewers do not read survey questions exactly as worded in the questionnaire, creating a potential source of interviewer variance. Manual behavior coding of an entire survey, however, can be expensive and time-consuming. In this dissertation, I examine whether this process can be partially automated by Recurrent Neural Networks (RNNs; a machine learning technique that can learn to categorize sequential data using patterns learned from previously categorized examples), saving time and money. Humans use transcripts from 33 questions in the Work and Leisure Today II telephone survey (WLT2) to manually behavior code n=25,442 interviewer question-askings. Using these transcripts as learning examples, I train RNNs to classify interviewer question-asking behaviors. I also code these transcripts using two string-based methods: 1) Levenshtein distance (i.e., the number of characters in the question-asking transcript that need to be changed to match the questionnaire text exactly); and 2) exact string matching (i.e., whether the question-asking transcript matches the questionnaire text exactly). A random 10% subsample of transcripts (n=2,650) was also coded by master coders to evaluate inter-coder reliability. I compare the reliability of RNN coding versus the master coders to the reliability of human and string-based coding. I also evaluate generalizability by using these RNNs to code n=4,711 question-asking transcripts from 13 of the same questions in a different telephone survey fielded at a different time, Work and Leisure Today I (WLT1). Results demonstrate that RNN coding of interviewer question-asking behaviors was comparable to human coding for most questions in this dissertation. The coding utility of Levenshtein distance and exact matching were limited relative to the RNNs and human coders. Finally, the RNNs had comparable reliability when coding questions from both WLT1 and WLT2, but only when question-asking behaviors were similar across the two surveys. Overall, this dissertation provides a proof-of-concept for using RNNs to code interviewer question-asking behaviors, but more research is necessary to refine the settings used to train these networks
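
    For reference, the two string-based methods defined above can be expressed compactly. The sketch below is a generic implementation with invented example strings; it is not the coding code used in the dissertation.

```python
# Generic implementations of the two string-based methods described above.
# The questionnaire text and transcript are invented examples.

def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions
    needed to turn a into b (standard dynamic-programming formulation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

questionnaire_text = "In the last seven days, how many hours did you work for pay?"
transcript = "In the last week, how many hours did you work for pay?"

exact_match = transcript == questionnaire_text               # exact string matching
edit_distance = levenshtein(transcript, questionnaire_text)  # characters changed
print(exact_match, edit_distance)
```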
