11 research outputs found

    Discriminating Gender on Twitter

    Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies with tweet volume and with the inclusion of various kinds of profile metadata in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.

    Supplementing Obesity-Related Surveillance with Persistent Health Assessment Tools

    We developed Persistent Health Assessment Tools (PHAT) to equip public health policy makers with more precise tools and timely information for measuring the success of obesity prevention programs. PHAT monitors social media to supplement traditional surveillance by making real-time estimates based on observations of obesity-relevant behaviors. Specifically, we developed models for predicting obesity rates from sets of tweets and developed a dashboard to provide interactive navigation and time slicing.

    MiTAP for SARS detection

    The MiTAP prototype for SARS detection uses human language technology for detecting, monitoring, and analyzing potential indicators of infectious disease outbreaks and reasoning for issuing warnings and alerts. MiTAP focuses on providing timely, multilingual information access to analysts, domain experts, and decision-makers worldwide. Data sources are captured, filtered, translated, summarized, and categorized by content. Critical information is automatically extracted and tagged to facilitate browsing, searching, and scanning, and to provide key terms at a glance. The processed articles are made available through an easy-to-use news server and cross-language information retrieval system for access and analysis anywhere, any time. Specialized newsgroups and customizable filters or searches on incoming stories allow users to create their own view into the data while a variety of tools summarize, indicate trends, and provide alerts to potentially relevant spikes of activity.