26 research outputs found

Translating data between MySQL and Stata

Author: Michael Johnson
Phil Schumm

As web-based and other electronic data collection methods become more widely used in research, the opportunities to use statistical software in conjunction with conventional database systems are increasing. Among such systems, MySQL is particularly well suited for research purposes. For example, MySQL's ENUM and SET column types are ideal for storing data collected via the multiple choice questions typically used in social surveys. At the same time, Stata is uniquely suited for working in conjunction with a database; for example, its implementation of characteristics makes it possible to preserve (in a usable form) important information about how the database and front-end application are constructed (e.g., column types and other attributes). In this presentation, we shall describe a Python script we have developed for translating data from MySQL to Stata, and will indicate briefly how we are using it in the development of tools for the collection and management of research data.
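The abstract does not show the script itself. As a rough illustration of the idea only, the sketch below turns a MySQL ENUM column definition into Stata value-label commands and stores the original column type in a characteristic. The function names, the `_lbl` suffix, and the `mysql_type` characteristic name are assumptions made for this example, not taken from the authors' script; category labels containing double quotes would need Stata compound quotes, which this sketch does not handle.

```python
import re


def parse_enum(column_type):
    """Return the member strings of a MySQL type such as "enum('a','b')"."""
    match = re.fullmatch(r"enum\((.*)\)", column_type.strip(),
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError(f"not an ENUM type: {column_type!r}")
    # Members are single-quoted; MySQL escapes an embedded quote by doubling it.
    members = re.findall(r"'((?:[^']|'')*)'", match.group(1))
    return [m.replace("''", "'") for m in members]


def stata_commands(variable, column_type):
    """Build Stata commands that label `variable` and record its MySQL type."""
    members = parse_enum(column_type)
    # MySQL numbers ENUM members from 1, so the same codes are reused here.
    pairs = " ".join(f'{i} "{m}"' for i, m in enumerate(members, start=1))
    return [
        f"label define {variable}_lbl {pairs}",
        f"label values {variable} {variable}_lbl",
        # Hypothetical characteristic name; keeps the column type with the data.
        f"char {variable}[mysql_type] {column_type}",
    ]


if __name__ == "__main__":
    for line in stata_commands("q1", "enum('yes','no')"):
        print(line)
```

Writing the returned lines to a do-file and running it after loading the numeric codes would attach the labels and the characteristic to the variable.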

Research Papers in Economics

Using Stata for questionnaire development

Author: Phil Schumm
Theodore Pollari

In studies which collect survey data, the investigator(s) often construct the questionnaire using a word processor and then deliver it to a survey organization which translates it into an electronic data collection instrument (e.g., CAPI or CATI). Unfortunately, this approach suffers from the following problems: (1) a word processor is not well-suited to the development of a complex questionnaire, (2) time is wasted and errors may occur when translating the questionnaire into CAPI, and (3) background information about the individual questions which is often relevant for analysis of the data (e.g., question source and rationale, scoring instructions, etc.) is not preserved in the final data file. We shall describe a system we are developing which permits an investigator to construct a questionnaire in Stata by representing questions as variables and using labels and characteristics to specify attributes such as question text, response categories, and background information together with specifications regarding the structure of the interview (e.g., skip patterns and loops). The resulting .dta file is then automatically translated into a variety of useful forms, including a human-readable version of the questionnaire and a format that may be imported directly into CAPI. The file also serves as a shell into which the actual data may be placed so that researchers analyzing the data have easy access to question attributes.

Research Papers in Economics

Recommended from our members

A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics

Author: Abhinav Kapur
Daniel Heard
Edward O. Laumann
Sayan Mukherjee
Ganesh Oruganti
John A. Schneider
Phil Schumm
Publication date: 04/10/2023

Purpose: Improve the ability to infer sex behaviors more accurately using network data. Methods: A hybrid network analytic approach was utilized to integrate: (1) the plurality of reports from others tied to individual(s) of interest; and (2) structural features of the network generated from those ties. Network data were generated from digitally extracted cell-phone contact lists of a purposeful sample of 241 high-risk men in India. These data were integrated with interview responses to describe the corresponding individuals in the contact lists and the ties between them. HIV serostatus was collected for each respondent and served as an internal validation of the model’s predictions of sex behavior. Results: We found that network-based model predictions of sex behavior and self-reported sex behavior had limited correlation (54% agreement). Additionally, when respondent sex behaviors were re-classified to network model predictions from self-reported data, there was a 30.7% decrease in HIV seroprevalence among groups of men with lower risk behavior, which is consistent with HIV transmission biology. Conclusion: Combining the relative completeness and objectivity of digital network data with the substantive details of classical interview and HIV biomarker data permitted new analyses and insights into the accuracy of self-reported sex behavior.

Knowledge UChicago

Quality of Life and Performance in Advanced Head and Neck Cancer Patients on Concomitant Chemoradiotherapy: A Prospective Examination

Author: Amy Siston
Daniel Haraf
Everett E. Vokes
Kerstin Stenson
Marcy A. List
Merrill Kies
Phil Schumm
Publication venue: 'American Society of Clinical Oncology (ASCO)'

Crossref

Network Characteristics of People Who Inject Drugs Within a New HIV Epidemic Following Austerity in Athens, Greece

Author: Michelle A. Tsang
John A. Schneider
Vana Sypsa
Phil Schumm
Georgios K. Nikolopoulos
Dimitrios Paraskevis
Samuel R. Friedman
Meni Malliori
Angelos Hatzakis
Publication date: 01/01/2015

Background: Greece experienced an unprecedented increase in HIV cases among drug injectors in 2011 after economic crisis. Network-level factors are increasingly understood to drive HIV transmission in emerging epidemics. Methods: We examined the relationship between networks, risk behaviors, and HIV serostatus among 1404 people who inject drugs in Athens, Greece. We generated networks using the chain-referral structure within a large HIV screening program. Network proportions, the proportion of a respondent’s network with a given characteristic, were calculated. Multiple logistic regression models were used to assess the relationship between network proportions and individual HIV seroprevalence, injection frequency and unprotected sex. Results: Of note, 1030 networks were generated. Respondent HIV seroprevalence was associated with greater proportions of network members who were HIV infected (i.e., those with ≥50% of network members HIV positive vs. those with no network members HIV positive) (AOR: 3.11; 95% CI: 2.10 to 4.62), divided drugs (AOR: 1.60; 95% CI: 1.10 to 2.35), or injected frequently (AOR: 1.50; 95% CI: 1.02 to 2.21). Homelessness was the only sociodemographic characteristic associated with a risk outcome measure: high-frequency injecting (AOR: 1.41; 95% CI: 1.03 to 1.93). These associations were weaker for more distal second- and third-degree networks and not present when examined within random networks. Conclusions: Networks are an independently important contributor to the HIV outbreak in Athens, Greece. Network associations were strongest for the immediate network, with residual associations for distal networks. Homelessness was associated with high-frequency injecting. Prevention programs should consider including network-level interventions to prevent future emerging epidemics.
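The "network proportion" defined in the abstract can be illustrated with a short sketch. It assumes ties are undirected, considers first-degree network members only, and skips members whose status is unknown; none of these choices is stated in the abstract, and the function name and toy data are made up for the example.

```python
from collections import defaultdict


def network_proportions(ties, has_trait):
    """Map each person to the share of their direct contacts with the trait.

    ties:      iterable of (person_a, person_b) pairs, treated as undirected.
    has_trait: dict mapping a person to True/False for the characteristic.
    """
    neighbors = defaultdict(set)
    for a, b in ties:
        if a != b:  # ignore self-ties
            neighbors[a].add(b)
            neighbors[b].add(a)

    proportions = {}
    for person, members in neighbors.items():
        # Only members with a recorded status enter the denominator.
        known = [m for m in members if m in has_trait]
        if known:
            proportions[person] = sum(has_trait[m] for m in known) / len(known)
    return proportions


if __name__ == "__main__":
    # Toy referral ties and statuses, invented for illustration.
    ties = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "r4")]
    hiv_positive = {"r1": True, "r2": False, "r3": True, "r4": False}
    for person, share in sorted(network_proportions(ties, hiv_positive).items()):
        print(person, round(share, 2))
```

A proportion computed this way could then be dichotomized (for example at 50%) and entered as a covariate in a logistic regression, which is the kind of comparison the reported odds ratios describe.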

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Self-reported insertive only and receptive only sex behavior classified as versatile according to the network model.

Author: Abhinav Kapur (440762)
Daniel Heard (598348)
Edward O. Laumann (2225809)
Ganesh Oruganti (2225812)
John A. Schneider (244659)
Phil Schumm (2225815)
Sayan Mukherjee (43870)

Versatile sex means engaging in both insertive and receptive anal sex. Respondents self-reporting insertive sex only were more likely to be classified as versatile by the network model. (τ) serves as a metric comparing proportions of insertive and receptive model predictions.

FigShare

Sample recruitment schema of study respondents (n = 241), Southern India 2010.

Author: Abhinav Kapur (440762)
Daniel Heard (598348)
Edward O. Laumann (2225809)
Ganesh Oruganti (2225812)
John A. Schneider (244659)
Phil Schumm (2225815)
Sayan Mukherjee (43870)

Non-respondents were eligible participants who did not present for informed consent at a nearby field office following field recruitment. Name interpreters are a series of questions asked about contact list members of respondents. In this case respondents identified contact list members as MSM or not MSM.

FigShare

35 Gene-Environment Interactions in Crohn's Disease: Identification of a Novel SNP That Interacts Strongly With Smoking to Shorten Time to First Resection

Author: Dan L. Nicolae
Graham L. Radford-Smith
Hae K. Im
Judy H. Cho
Lisa Simms
Maria Ikonomopoulou
Ning Huang
Phil Schumm
Yashoda Sharma
Publication venue: 'Elsevier BV'

Crossref

Percentage of MSM sex behavior agreement with self-report in social compared to sex networks.

Author: Abhinav Kapur (440762)
Daniel Heard (598348)
Edward O. Laumann (2225809)
Ganesh Oruganti (2225812)
John A. Schneider (244659)
Phil Schumm (2225815)
Sayan Mukherjee (43870)

More agreement between self-report and model predictions is evident in the social network (upper surface) than the sex network (lower surface); however, agreement between self-report and model predictions increases across thresholds as closeness (κ) between sex network members increases. (τ) serves as a metric comparing proportions of model predictions for insertive and receptive sex positions. (κ) serves as a measure of closeness indicated by score from 1.0 (least close) to 3.0 (closest).

FigShare

Network redundancy curve of study respondents used to determine adequate sample size for network model (n = 241).

Author: Abhinav Kapur (440762)
Daniel Heard (598348)
Edward O. Laumann (2225809)
Ganesh Oruganti (2225812)
John A. Schneider (244659)
Phil Schumm (2225815)
Sayan Mukherjee (43870)

Curve fit from data on index of respondents and week of respondent interviews versus network size to exponential model. The data were fit to a scaled/shifted exponential cumulative distribution function f(x) = 99.2 − 95.9e^(−4.9x), where x represents the index of the respondent and f(x) represents network size. Data approach horizontal asymptote at approximately 240 respondents.
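The fitted curve in this caption can be evaluated directly. The sketch below only restates the formula; the caption does not say how x or f(x) are scaled, so the inputs are arbitrary illustrative values, and the names `network_size` and `ASYMPTOTE` are chosen for this example.

```python
import math

# Limit of f(x) as x grows, because exp(-4.9 * x) tends to 0.
ASYMPTOTE = 99.2


def network_size(x):
    """Fitted curve from the caption: f(x) = 99.2 - 95.9 * exp(-4.9 * x)."""
    return 99.2 - 95.9 * math.exp(-4.9 * x)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 1.0):  # illustrative inputs only
        value = network_size(x)
        print(f"x={x:.2f}  f(x)={value:.1f}  gap to asymptote={ASYMPTOTE - value:.1f}")
```

The shrinking gap between f(x) and the asymptote is what a redundancy curve is read for: once additional respondents add little to the network, the sample is treated as adequate.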

FigShare