Insights from Machine-Learned Diet Success Prediction
To support people trying to lose weight and stay healthy, a growing number of fitness apps now offer the ability to track both calorie intake and expenditure. Users of such apps are part of a wider ``quantified self'' movement, and many opt in to publicly share their logged data. In this paper, we use public food diaries of more than 4,000 long-term active MyFitnessPal users to study the characteristics of an (un)successful diet. Concretely, we train a machine learning model to predict repeatedly being over or under self-set daily calorie goals and then examine which features contribute to the model's prediction. Our findings include expected results, such as the token ``mcdonalds'' or the category ``dessert'' being indicative of exceeding the calorie goal, but also less obvious ones, such as the difference between pork and poultry with respect to dieting success, or the use of the ``quick added calories'' functionality being indicative of overshooting the calorie goal. This study also hints at the feasibility of using such data for more in-depth data mining, e.g., looking at interactions between consumed foods, such as mixing protein- and carbohydrate-rich foods. To the best of our knowledge, this is the first systematic study of public food diaries.
Comment: Preprint of an article appearing at the Pacific Symposium on Biocomputing (PSB) 2016 in the Social Media Mining for Public Health Monitoring and Surveillance track
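The pipeline described above (predicting calorie-goal over/under-shooting from diary text and then inspecting feature contributions) can be approximated with a linear classifier over diary tokens. The sketch below is only a minimal illustration under that assumption, not the authors' code; the diary entries and labels are invented.

```python
# Minimal sketch: bag-of-words over food diary tokens, with a linear classifier
# whose weights act as feature "contributions". All data here is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Each "document" is a user's concatenated diary entries; label 1 = repeatedly
# over the self-set calorie goal, 0 = repeatedly under it.
diaries = [
    "mcdonalds fries dessert soda quick_added_calories",
    "grilled chicken salad poultry rice",
    "pork ribs dessert pizza",
    "oatmeal poultry vegetables yogurt",
]
over_goal = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(diaries)
clf = LogisticRegression().fit(X, over_goal)

# Positive weights push the prediction toward "over the calorie goal",
# negative weights toward "under".
for token, weight in sorted(zip(vectorizer.get_feature_names_out(), clf.coef_[0]),
                            key=lambda t: t[1], reverse=True):
    print(f"{token:>22s}  {weight:+.2f}")
```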
#greysanatomy vs. #yankees: Demographics and Hashtag Use on Twitter
Demographic attributes, in particular gender, age, and race, are key predictors of human behavior. Despite this significant effect, most scientific studies using online social media do not consider demographics, mainly due to the lack of such information. In this work, we use state-of-the-art face analysis software to infer gender, age, and race from profile images of 350K Twitter users from New York. For the period from November 1, 2014 to October 31, 2015, we study which hashtags are used by different demographic groups. Though we find considerable overlap for the most popular hashtags, there are also many group-specific hashtags.
Comment: This is a preprint of an article appearing at ICWSM 2016
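One simple way to surface group-specific hashtags of the kind described above is to compare each hashtag's relative frequency across demographic groups. The sketch below ranks hashtags by a smoothed ratio of relative frequencies on made-up counts; the paper's exact statistic is not specified here.

```python
# Hypothetical sketch: rank hashtags by how much more often one demographic
# group uses them relative to another. Counts are invented.
from collections import Counter

group_a = Counter({"#greysanatomy": 900, "#nyc": 500, "#yankees": 100})
group_b = Counter({"#yankees": 800, "#nyc": 450, "#greysanatomy": 120})

def relative_freq(counts):
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

fa, fb = relative_freq(group_a), relative_freq(group_b)

# Ratio > 1 means the hashtag is over-represented in group A; the small
# constant avoids division by zero for hashtags unseen in one group.
eps = 1e-6
ratios = {tag: (fa.get(tag, 0) + eps) / (fb.get(tag, 0) + eps)
          for tag in set(fa) | set(fb)}
for tag, r in sorted(ratios.items(), key=lambda t: t[1], reverse=True):
    print(f"{tag:>15s}  {r:.2f}")
```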
U.S. Religious Landscape on Twitter
Religiosity is a powerful force shaping human societies, affecting domains as diverse as economic growth and the ability to cope with illness. As more religious leaders, organizations, and believers start using social networking sites (e.g., Twitter, Facebook), online activities become important extensions to traditional religious rituals and practices. However, there has been a lack of research on religiosity in online social networks. This paper takes a step toward understanding several important aspects of religiosity on Twitter, based on the analysis of more than 250k U.S. users who self-declared their religion/belief, including Atheism, Buddhism, Christianity, Hinduism, Islam, and Judaism. Specifically, (i) we examine the correlation of the geographic distribution of religious people between Twitter and offline surveys; (ii) we analyze users' tweets and networks to identify discriminative features of each religious group, and explore supervised methods to identify believers of different religions; and (iii) we study the linkage preferences of different religious groups, and observe a strong preference of Twitter users for connecting to others sharing the same religion.
Comment: 10 pages
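The linkage preference mentioned in point (iii) can be quantified as the share of ties connecting users of the same religion, compared with what random mixing of the labeled users would predict. The toy network and labels below are invented for illustration; they are not the paper's data or exact measure.

```python
# Toy sketch: same-religion linkage preference (homophily) in a small network.
# Users, religion labels, and edges are invented for illustration.
from collections import Counter

religion = {"u1": "Christianity", "u2": "Christianity", "u3": "Atheism",
            "u4": "Atheism", "u5": "Judaism"}
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u4"), ("u2", "u5"), ("u4", "u1")]

# Observed share of edges that connect two users with the same religion.
same = sum(religion[a] == religion[b] for a, b in edges)
observed = same / len(edges)

# Baseline: probability that two users drawn at random share a religion.
counts = Counter(religion.values())
n = len(religion)
expected = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

print(f"observed same-religion share: {observed:.2f}")
print(f"expected under random mixing: {expected:.2f}")
```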
Automated Hate Speech Detection and the Problem of Offensive Language
A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords, and we use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those containing neither. We train a multi-class classifier to distinguish between these categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech, whereas sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
Comment: To appear in the Proceedings of ICWSM 2017. Please cite that version
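A minimal version of the three-way classification described above can be built with TF-IDF features and a multi-class linear model. This is only a sketch of that setup; the placeholder tweets and labels are invented, and the paper's actual features and model may differ.

```python
# Minimal sketch: three-class tweet classifier (hate speech / offensive /
# neither) over TF-IDF features. Tweets and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "example tweet containing hateful slurs",        # hate speech
    "example tweet with merely offensive language",  # offensive only
    "example tweet about the weather",               # neither
    "another neutral tweet about sports",            # neither
]
labels = ["hate", "offensive", "neither", "neither"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)

# Inspecting predictions on held-out tweets is where the hard cases show up,
# e.g. offensive tweets that contain no explicit hate keywords.
print(clf.predict(["tweet with no explicit keywords"]))
```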
