Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have the most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201
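The abstract measures success against majority-class baselines. The following minimal sketch shows what that comparison means. The labels are invented for illustration, and the paper's actual models and features are not reproduced. A majority-class baseline always predicts the most frequent label, so a useful language-based model has to beat that accuracy:

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent gold label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    # most_common(1) returns [(label, count)] for the most frequent label
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def accuracy(predictions, labels):
    """Fraction of items whose prediction matches the gold label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical labels for a binary task such as "above/below median
    # overweight rate"; these are NOT data from the paper.
    gold = ["high", "high", "low", "high", "low", "low", "high", "high"]
    model = ["high", "high", "low", "low", "low", "low", "high", "high"]
    print(majority_baseline_accuracy(gold))  # 0.625 (5 of 8 are "high")
    print(accuracy(model, gold))             # 0.875 (7 of 8 correct)
```

On a skewed label distribution the baseline can already look strong. This is why the abstract reports beating it on every task rather than reporting raw accuracy alone.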
Hoodsquare: Modeling and Recommending Neighborhoods in Location-based Social Networks
Information garnered from activity on location-based social networks can be
harnessed to characterize urban spaces and organize them into neighborhoods. In
this work, we adopt a data-driven approach to the identification and modeling
of urban neighborhoods using location-based social networks. We represent
geographic points in the city using spatio-temporal information about
Foursquare user check-ins and semantic information about places, with the goal
of developing features to input into a novel neighborhood detection algorithm.
The algorithm first employs a similarity metric that assesses the homogeneity
of a geographic area and then, using a simple geographic navigation mechanism,
detects the boundaries of a city's neighborhoods. The models and
algorithms devised are subsequently integrated into a publicly available,
map-based tool named Hoodsquare that allows users to explore activities and
neighborhoods in cities around the world.
Finally, we evaluate Hoodsquare in the context of a recommendation
application where user profiles are matched to urban neighborhoods. By
comparing against several baselines, we demonstrate how Hoodsquare can be used
to accurately predict the home neighborhood of Twitter users. We also show that
we can suggest neighborhoods that are geographically constrained in size, a
desirable property in mobile recommendation scenarios where geographical
precision is key.
Comment: ASE/IEEE SocialCom 201
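The abstract describes the detection algorithm only at a high level. The sketch below is one hypothetical reading of it, not the Hoodsquare implementation. It makes three assumptions:

* The city is divided into grid cells, each with a feature vector (for example, check-in counts per venue category).
* Cosine similarity stands in for the paper's homogeneity metric.
* "Geographic navigation" is approximated by breadth-first region growing over adjacent cells, admitting a neighbor only while it stays similar enough to the seed cell.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def detect_neighborhoods(features, threshold=0.9):
    """Label grid cells with neighborhood ids by region growing.

    features: dict mapping (row, col) -> feature vector. The grid and the
    threshold are hypothetical stand-ins for the paper's choices.
    """
    labels = {}
    next_id = 0
    for seed in sorted(features):
        if seed in labels:
            continue
        # Start a new neighborhood at the first unlabeled cell.
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            # Walk to the 4-connected neighbors of the current cell.
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (nb in features and nb not in labels
                        and cosine(features[seed], features[nb]) >= threshold):
                    labels[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return labels


if __name__ == "__main__":
    # Invented features: [nightlife check-ins, office check-ins] per cell.
    grid = {
        (0, 0): [10, 1], (0, 1): [9, 1], (0, 2): [1, 9],
        (1, 0): [8, 2], (1, 1): [1, 8], (1, 2): [0, 10],
    }
    print(detect_neighborhoods(grid))
```

With this toy grid, the nightlife-heavy cells (0, 0), (0, 1) and (1, 0) form one neighborhood, and the office-heavy cells (0, 2), (1, 1) and (1, 2) form another. Comparing each candidate against the seed, rather than against the most recently added cell, keeps a region from gradually drifting into a dissimilar area. A running region centroid would be another reasonable choice.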
Using Social Media to Promote STEM Education: Matching College Students with Role Models
STEM (Science, Technology, Engineering, and Mathematics) fields have become
increasingly central to U.S. economic competitiveness and growth. The shortage
in the STEM workforce has brought the promotion of STEM education to the
forefront. The rapid growth of social media usage provides a unique opportunity
to predict users' real-life identities and interests from online texts and
photos. In this paper, we propose an innovative approach that leverages social
media to promote STEM
education: matching Twitter college student users with diverse LinkedIn STEM
professionals using a ranking algorithm based on the similarities of their
demographics and interests. We believe that increasing STEM presence by
introducing career role models who share students' interests and demographics
will inspire students to develop interests in STEM-related fields and to
emulate their role models. Our evaluation on 2,000 real college students
demonstrates the accuracy of our ranking algorithm. We also design a novel
implementation that recommends matched role models to the students.
Comment: 16 pages, 8 figures, accepted by ECML/PKDD 2016, Industrial Trac
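The abstract says role models are ranked by the similarity of demographics and interests but does not give the scoring function. The sketch below rests on several assumptions:

* Interests are sets of keywords compared by Jaccard similarity.
* Demographics are compared attribute by attribute.
* The two scores are combined with hypothetical weights `w_interest` and `w_demo`.
* The profile fields (`interests`, `gender`, `ethnicity`, `name`) are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def demographic_match(student, professional, keys=("gender", "ethnicity")):
    """Fraction of (hypothetical) demographic attributes on which profiles agree."""
    hits = sum(
        1 for k in keys
        if student.get(k) is not None and student.get(k) == professional.get(k)
    )
    return hits / len(keys)


def rank_role_models(student, professionals, w_interest=0.7, w_demo=0.3):
    """Return professional names sorted by descending weighted similarity."""
    scored = []
    for p in professionals:
        score = (w_interest * jaccard(student["interests"], p["interests"])
                 + w_demo * demographic_match(student, p))
        scored.append((score, p["name"]))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    student = {"interests": {"robotics", "music", "hiking"},
               "gender": "F", "ethnicity": "X"}
    pros = [
        {"name": "A", "interests": {"robotics", "hiking"},
         "gender": "F", "ethnicity": "Y"},
        {"name": "B", "interests": {"finance"},
         "gender": "F", "ethnicity": "X"},
        {"name": "C", "interests": {"robotics", "music", "chess"},
         "gender": "M", "ethnicity": "X"},
    ]
    print(rank_role_models(student, pros))  # ['A', 'C', 'B']
```

Weighting interests above demographics here is an arbitrary illustrative choice. Professional B matches the student on every demographic attribute but ranks last because no interests overlap.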
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms have been developed for this task, each drawing on different information sources: textual, temporal, geographic, or community features. Semantic information is often added at the end of event detection to classify events into semantic topics. But semantic information can also be used to drive the event detection itself, an approach that academic research covers less. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how much the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet yields a worse F1 score than the baseline. If, however, semantics are assigned at the coarser hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.
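The abstract does not spell out how hashtag-level semantics are computed. One plausible reading, sketched here purely as an assumption, works in three steps:

1. Label each tweet with a topic from some per-tweet semantic classifier, which is hypothetical and not the paper's annotator.
2. Give every hashtag the majority topic of the tweets that use it.
3. Relabel each tweet from its hashtags, falling back to the tweet-level topic when it has no hashtag.

The `hashtags` and `topic` field names are illustrative.

```python
from collections import Counter, defaultdict


def hashtag_level_topics(tweets):
    """Replace per-tweet topics with topics voted at the hashtag level.

    tweets: list of dicts with "hashtags" (list of str) and "topic"
    (str or None), where "topic" comes from a hypothetical per-tweet
    semantic classifier.
    """
    # Count, for every hashtag, the topics of the tweets that use it.
    votes = defaultdict(Counter)
    for t in tweets:
        if t["topic"] is None:
            continue
        for h in t["hashtags"]:
            votes[h.lower()][t["topic"]] += 1
    # Each hashtag gets the topic most of its tweets agree on.
    hashtag_topic = {h: c.most_common(1)[0][0] for h, c in votes.items()}

    result = []
    for t in tweets:
        tags = [h.lower() for h in t["hashtags"] if h.lower() in hashtag_topic]
        if tags:
            # Majority vote over the topics of this tweet's hashtags.
            topics = Counter(hashtag_topic[h] for h in tags)
            result.append(topics.most_common(1)[0][0])
        else:
            # No usable hashtag: fall back to the tweet-level topic.
            result.append(t["topic"])
    return result


if __name__ == "__main__":
    tweets = [
        {"hashtags": ["#Ajax"], "topic": "sports"},
        {"hashtags": ["#ajax"], "topic": "sports"},
        {"hashtags": ["#Ajax"], "topic": "politics"},  # noisy tweet-level label
        {"hashtags": [], "topic": "music"},
    ]
    print(hashtag_level_topics(tweets))  # ['sports', 'sports', 'sports', 'music']
```

The vote smooths out the misclassified third tweet. This matches the intuition behind the abstract's finding that coarser semantics help, but it is not a reproduction of the paper's algorithm. The relabeled topics would then be passed to the baseline clustering as an extra feature.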