Translating data between MySQL and Stata
As web-based and other electronic data collection methods become more widely used in research, the opportunities to use statistical software in conjunction with conventional database systems are increasing. Among such systems, MySQL is particularly well suited for research purposes. For example, MySQL's ENUM and SET column types are ideal for storing data collected via the multiple choice questions typically used in social surveys. At the same time, Stata is uniquely suited for working in conjunction with a database; for example, its implementation of characteristics makes it possible to preserve (in a usable form) important information about how the database and front-end application are constructed (e.g., column types and other attributes). In this presentation, we shall describe a Python script we have developed for translating data from MySQL to Stata, and will indicate briefly how we are using it in the development of tools for the collection and management of research data.
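As an illustration of the kind of translation described above, the following is a minimal sketch, not the authors' script. It assumes the column type arrives as a string such as `enum('yes','no')` (as MySQL reports it) and emits Stata `label define`, `label values`, and `char` commands. The function names, the `_lbl` suffix, and the `mysql_type` characteristic name are hypothetical.

```python
import re


def parse_enum(column_type):
    """Return the allowed values from a MySQL ENUM type string."""
    match = re.fullmatch(r"(?i)\s*enum\((.*)\)\s*", column_type)
    if match is None:
        raise ValueError(f"not an ENUM type: {column_type!r}")
    # Values are single-quoted; a literal quote is written as two quotes.
    raw = re.findall(r"'((?:[^']|'')*)'", match.group(1))
    return [value.replace("''", "'") for value in raw]


def stata_commands(varname, column_type):
    """Build Stata commands that label a variable from an ENUM column.

    MySQL numbers ENUM members from 1, so the same codes are used here.
    Values containing double quotes are not handled in this sketch.
    """
    values = parse_enum(column_type)
    pairs = " ".join(f'{i} "{v}"' for i, v in enumerate(values, start=1))
    return [
        f"label define {varname}_lbl {pairs}",
        f"label values {varname} {varname}_lbl",
        # Keep the original column type as a characteristic of the variable.
        f"char {varname}[mysql_type] {column_type}",
    ]


if __name__ == "__main__":
    for line in stata_commands("q1", "enum('yes','no')"):
        print(line)
```

SET columns, which allow several values per row, would need a different mapping (for example, one indicator variable per member) and are left out here.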
Using Stata for questionnaire development
In studies which collect survey data, the investigator(s) often construct the questionnaire using a word processor and then deliver it to a survey organization which translates it into an electronic data collection instrument (e.g., CAPI or CATI). Unfortunately, this approach suffers from the following problems: (1) a word processor is not well-suited to the development of a complex questionnaire, (2) time is wasted and errors may occur when translating the questionnaire into CAPI, and (3) background information about the individual questions which is often relevant for analysis of the data (e.g., question source and rationale, scoring instructions, etc.) is not preserved in the final data file. We shall describe a system we are developing which permits an investigator to construct a questionnaire in Stata by representing questions as variables and using labels and characteristics to specify attributes such as question text, response categories, and background information together with specifications regarding the structure of the interview (e.g., skip patterns and loops). The resulting .dta file is then automatically translated into a variety of useful forms, including a human-readable version of the questionnaire and a format that may be imported directly into CAPI. The file also serves as a shell into which the actual data may be placed so that researchers analyzing the data have easy access to question attributes.
A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics
Purpose: Improve the ability to infer sex behaviors more accurately using network data. Methods: A hybrid network analytic approach was utilized to integrate: (1) the plurality of reports from others tied to individual(s) of interest; and (2) structural features of the network generated from those ties. Network data were generated from digitally extracted cell-phone contact lists of a purposeful sample of 241 high-risk men in India. These data were integrated with interview responses to describe the corresponding individuals in the contact lists and the ties between them. HIV serostatus was collected for each respondent and served as an internal validation of the model's predictions of sex behavior. Results: We found that network-based model predictions of sex behavior and self-reported sex behavior had limited correlation (54% agreement). Additionally, when respondent sex behaviors were re-classified to network model predictions from self-reported data, there was a 30.7% decrease in HIV seroprevalence among groups of men with lower risk behavior, which is consistent with HIV transmission biology. Conclusion: Combining the relative completeness and objectivity of digital network data with the substantive details of classical interview and HIV biomarker data permitted new analyses and insights into the accuracy of self-reported sex behavior.
Network Characteristics of People Who Inject Drugs Within a New HIV Epidemic Following Austerity in Athens, Greece
Background: Greece experienced an unprecedented increase in HIV cases
among drug injectors in 2011 after the economic crisis. Network-level
factors are increasingly understood to drive HIV transmission in
emerging epidemics.
Methods: We examined the relationship between networks, risk behaviors,
and HIV serostatus among 1404 people who inject drugs in Athens, Greece.
We generated networks using the chain-referral structure within a large
HIV screening program. Network proportions, the proportion of a
respondent's network with a given characteristic, were calculated.
Multiple logistic regression models were used to assess the relationship
between network proportions and individual HIV seroprevalence, injection
frequency and unprotected sex.
Results: Of note, 1030 networks were generated. Respondent HIV
seroprevalence was associated with greater proportions of network
members who were HIV infected (ie, those with >= 50% of network members
HIV positive vs. those with no network members HIV positive) (AOR: 3.11;
95% CI: 2.10 to 4.62), divided drugs (AOR: 1.60; 95% CI: 1.10 to
2.35), or injected frequently (AOR: 1.50; 95% CI: 1.02 to 2.21).
Homelessness was the only sociodemographic characteristic associated
with a risk outcome measure: high-frequency injecting (AOR: 1.41; 95% CI:
1.03 to 1.93). These associations were weaker for more distal second- and
third-degree networks and not present when examined within random
networks.
Conclusions: Networks are an independently important contributor to the
HIV outbreak in Athens, Greece. Network associations were strongest for
the immediate network, with residual associations for distal networks.
Homelessness was associated with high-frequency injecting. Prevention
programs should consider including network-level interventions to
prevent future emerging epidemics.
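The network proportion defined in the Methods above (the share of a respondent's network members with a given characteristic) can be sketched as follows. The data structures and names are hypothetical, chosen only for illustration; the study's own data handling is not described in the abstract.

```python
def network_proportion(network, has_trait, respondent):
    """Share of a respondent's network members who have a characteristic.

    network   -- dict mapping each respondent to a list of network members
    has_trait -- dict mapping each person to True/False for the characteristic
    Returns None when the respondent has no network members.
    """
    members = network.get(respondent, [])
    if not members:
        return None
    with_trait = sum(1 for member in members if has_trait[member])
    return with_trait / len(members)


if __name__ == "__main__":
    # Made-up example: two of four network members have the characteristic.
    network = {"r1": ["a", "b", "c", "d"]}
    hiv_positive = {"a": True, "b": False, "c": True, "d": False}
    print(network_proportion(network, hiv_positive, "r1"))
```

A proportion computed this way could then be grouped (for example, none versus at least 50%) before entering a regression model, as the Results describe.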
Self-reported insertive only and receptive only sex behavior classified as versatile according to the network model.
Versatile sex means engaging in both insertive and receptive anal sex. Respondents self-reporting insertive sex only were more likely to be classified as versatile by the network model. (Ļ) serves as a metric comparing proportions of insertive and receptive model predictions.
Sample recruitment schema of study respondents (n = 241), Southern India 2010.
Non-respondents were eligible participants who did not present for informed consent at a nearby field office following field recruitment. Name interpreters are a series of questions asked about contact list members of respondents. In this case, respondents identified contact list members as MSM or not MSM.
Percentage of MSM sex behavior agreement with self-report in social compared to sex networks.
More agreement between self-report and model predictions is evident in the social network (upper surface) than the sex network (lower surface); however, agreement between self-report and model predictions increases across thresholds as closeness (κ) between sex network members increases. (Ļ) serves as a metric comparing proportions of model predictions for insertive and receptive sex positions. (κ) serves as a measure of closeness indicated by score from 1.0 (least close) to 3.0 (closest).
Network redundancy curve of study respondents used to determine adequate sample size for network model (n = 241).
Curve fit from data on index of respondents and week of respondent interviews versus network size to exponential model. The data were fit to a scaled/shifted exponential cumulative distribution function f(x) = 99.2 - 95.9e^(-4.9x), where x represents the index of the respondent and f(x) represents network size. Data approach horizontal asymptote at approximately 240 respondents.
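To show how a curve of this shape behaves, here is a small sketch that evaluates f(x) = 99.2 - 95.9e^(-4.9x) as read from the caption. The caption does not state how the respondent index x is scaled, so the inputs below are illustrative only; the function name is hypothetical.

```python
import math


def fitted_network_size(x):
    """Evaluate the scaled/shifted exponential CDF from the caption."""
    return 99.2 - 95.9 * math.exp(-4.9 * x)


if __name__ == "__main__":
    # The curve rises from 99.2 - 95.9 at x = 0 toward the asymptote 99.2.
    for x in (0.0, 0.25, 0.5, 1.0):
        print(x, round(fitted_network_size(x), 1))
```

The flattening of the curve is what the caption uses to judge sample adequacy: once additional respondents add little to f(x), further recruitment contributes mostly redundant network information.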