
    Translating data between MySQL and Stata

    As web-based and other electronic data collection methods become more widely used in research, the opportunities to use statistical software in conjunction with conventional database systems are increasing. Among such systems, MySQL is particularly well suited for research purposes. For example, MySQL's ENUM and SET column types are ideal for storing data collected via the multiple choice questions typically used in social surveys. At the same time, Stata is uniquely suited for working in conjunction with a database; for example, its implementation of characteristics makes it possible to preserve (in a usable form) important information about how the database and front-end application are constructed (e.g., column types and other attributes). In this presentation, we shall describe a Python script we have developed for translating data from MySQL to Stata, and will indicate briefly how we are using it in the development of tools for the collection and management of research data.
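As a rough illustration of the kind of translation described above, the following sketch turns a MySQL ENUM column definition into Stata value-label commands and stores the original column type as a Stata characteristic. It is not the authors' script: the function names, the `_lbl` suffix, and the `mysql_type` characteristic name are assumptions made for this example, and labels containing double quotes or backslash escapes are not handled.

```python
import re


def parse_enum(column_type):
    """Return the member labels of a MySQL ENUM definition such as "enum('yes','no')"."""
    match = re.fullmatch(r"\s*enum\((.*)\)\s*", column_type,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError(f"not an ENUM definition: {column_type!r}")
    # MySQL writes a single quote inside a label as two single quotes ('').
    labels = re.findall(r"'((?:[^']|'')*)'", match.group(1))
    return [label.replace("''", "'") for label in labels]


def stata_commands(variable, column_type):
    """Build Stata commands that label `variable` and record its MySQL type.

    The label name (<variable>_lbl) and the characteristic name (mysql_type)
    are hypothetical choices for this sketch.
    """
    labels = parse_enum(column_type)
    # ENUM members are indexed from 1 in MySQL; index 0 is the empty-string error value.
    pairs = " ".join(f'{i} "{label}"' for i, label in enumerate(labels, start=1))
    return [
        f"label define {variable}_lbl {pairs}",
        f"label values {variable} {variable}_lbl",
        f'char {variable}[mysql_type] "{column_type}"',
    ]


if __name__ == "__main__":
    for line in stata_commands("smoker", "enum('never','former','current')"):
        print(line)
```

Keeping the column type in a characteristic means the information travels with the .dta file, which is the property of Stata the abstract highlights.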

    Using Stata for questionnaire development

    In studies that collect survey data, the investigators often construct the questionnaire using a word processor and then deliver it to a survey organization, which translates it into an electronic data collection instrument (e.g., CAPI or CATI). Unfortunately, this approach suffers from the following problems: (1) a word processor is not well suited to the development of a complex questionnaire; (2) time is wasted and errors may occur when translating the questionnaire into CAPI; and (3) background information about the individual questions that is often relevant for analysis of the data (e.g., question source and rationale, scoring instructions) is not preserved in the final data file. We shall describe a system we are developing that permits an investigator to construct a questionnaire in Stata by representing questions as variables and using labels and characteristics to specify attributes such as question text, response categories, and background information, together with specifications regarding the structure of the interview (e.g., skip patterns and loops). The resulting .dta file is then automatically translated into a variety of useful forms, including a human-readable version of the questionnaire and a format that may be imported directly into CAPI. The file also serves as a shell into which the actual data may be placed so that researchers analyzing the data have easy access to question attributes.

    Network Characteristics of People Who Inject Drugs Within a New HIV Epidemic Following Austerity in Athens, Greece

    Background: Greece experienced an unprecedented increase in HIV cases among drug injectors in 2011 after economic crisis. Network-level factors are increasingly understood to drive HIV transmission in emerging epidemics. Methods: We examined the relationship between networks, risk behaviors, and HIV serostatus among 1404 people who inject drugs in Athens, Greece. We generated networks using the chain-referral structure within a large HIV screening program. Network proportions, the proportion of a respondent's network with a given characteristic, were calculated. Multiple logistic regression models were used to assess the relationship between network proportions and individual HIV seroprevalence, injection frequency and unprotected sex. Results: Of note, 1030 networks were generated. Respondent HIV seroprevalence was associated with greater proportions of network members who were HIV infected (i.e., those with >= 50% of network members HIV positive vs. those with no network members HIV positive) (AOR: 3.11; 95% CI: 2.10 to 4.62), divided drugs (AOR: 1.60; 95% CI: 1.10 to 2.35), or injected frequently (AOR: 1.50; 95% CI: 1.02 to 2.21). Homelessness was the only sociodemographic characteristic associated with a risk outcome measure, high-frequency injecting (AOR: 1.41; 95% CI: 1.03 to 1.93). These associations were weaker for more distal second- and third-degree networks and not present when examined within random networks. Conclusions: Networks are an independently important contributor to the HIV outbreak in Athens, Greece. Network associations were strongest for the immediate network, with residual associations for distal networks. Homelessness was associated with high-frequency injecting. Prevention programs should consider including network-level interventions to prevent future emerging epidemics.
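The network proportion defined in the abstract above can be illustrated with a short sketch. The respondent identifiers and data are hypothetical, and the middle category (some but fewer than 50% of members) is an assumption added so that every proportion maps to a group; the abstract itself only names the ">= 50%" and "none" groups.

```python
def network_proportion(network, with_characteristic):
    """Proportion of a respondent's network members who have a given characteristic.

    `network` is a list of member identifiers; `with_characteristic` is the set
    of identifiers that have the characteristic (for example, HIV positive).
    Returns None for an empty network, where the proportion is undefined.
    """
    if not network:
        return None
    count = sum(1 for member in network if member in with_characteristic)
    return count / len(network)


def exposure_category(proportion):
    """Group a network proportion into 'none', '<50%', or '>=50%'.

    The '<50%' group is an assumption of this sketch, not taken from the abstract.
    """
    if proportion is None:
        return None
    if proportion == 0:
        return "none"
    return ">=50%" if proportion >= 0.5 else "<50%"


if __name__ == "__main__":
    # Hypothetical example: four network members, two of them HIV positive.
    members = ["r2", "r3", "r4", "r5"]
    hiv_positive = {"r2", "r3"}
    p = network_proportion(members, hiv_positive)
    print(p, exposure_category(p))
```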

    Self-reported insertive only and receptive only sex behavior classified as versatile according to the network model.

    Versatile sex means engaging in both insertive and receptive anal sex. Respondents self-reporting insertive sex only were more likely to be classified as versatile by the network model. (τ) serves as a metric comparing proportions of insertive and receptive model predictions.

    Sample recruitment schema of study respondents (n = 241), Southern India 2010.

    Non-respondents were eligible participants who did not present for informed consent at a nearby field office following field recruitment. Name interpreters are a series of questions asked about contact list members of respondents. In this case respondents identified contact list members as MSM or not MSM.

    Percentage of MSM sex behavior agreement with self-report in social compared to sex networks.

    More agreement between self-report and model predictions is evident in the social network (upper surface) than the sex network (lower surface); however, agreement between self-report and model predictions increases across thresholds as closeness (κ) between sex network members increases. (τ) serves as a metric comparing proportions of model predictions for insertive and receptive sex positions. (κ) serves as a measure of closeness indicated by score from 1.0 (least close) to 3.0 (closest).

    Network redundancy curve of study respondents used to determine adequate sample size for network model (n = 241).

    Curve fit from data on index of respondents and week of respondent interviews versus network size to exponential model. The data were fit to a scaled/shifted exponential cumulative distribution function $f(x) = 99.2 - 95.9\,e^{-4.9x}$, where x represents the index of the respondent and f(x) represents network size. Data approach horizontal asymptote at approximately 240 respondents.