Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201
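The prediction setup this abstract describes, language features measured against a majority-class baseline, can be illustrated with a minimal sketch. The posts, labels, and the tiny Naive Bayes classifier here are hypothetical stand-ins, not the authors' models or data:

```python
from collections import Counter, defaultdict
import math

# Hypothetical stand-in for food-related posts labeled with a latent
# population characteristic (the real corpus has millions of tweets).
posts = [
    ("grits biscuits sweet tea", "south"),
    ("bbq brisket sweet tea", "south"),
    ("fried okra sweet tea", "south"),
    ("bagel lox espresso", "northeast"),
    ("pizza bagel espresso", "northeast"),
]

def majority_baseline(labels):
    """Predict the most frequent class for every example."""
    return Counter(labels).most_common(1)[0][0]

def train_nb(data):
    """Train a tiny multinomial Naive Bayes over unigram food terms."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Score each class by log prior + smoothed log likelihoods."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            # Laplace smoothing handles unseen words.
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(posts)
print(majority_baseline([y for _, y in posts]))    # -> south
print(predict_nb(model, "sweet tea and brisket"))  # -> south
```

A language-based model "outperforming the majority baseline", in the abstract's sense, means the word-feature classifier beats always predicting the most common class on held-out data.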
A Social Citizen Dashboard for Participatory Urban Planning in Berlin: Prototype and Evaluation
Participatory urban planning enables citizens to make their voices heard in the urban planning process. The resulting measures are more likely to be accepted by the community. However, the participation process becomes more effortful and time-consuming. New approaches have been developed using digital technologies to facilitate citizen participation, such as topic modeling based on social media. Using Twitter data for the city of Berlin, we explore how social media and topic modeling can be used to classify and analyze citizen opinions. We develop a Social Citizen Dashboard allowing for a better understanding of changes in citizens' priorities and incorporating constant cycles of feedback throughout planning phases. Evaluation interviews indicate the dashboard's potential usefulness and implications, as well as point to limitations in data quality and directions for further research.
Topic Modeling in the News Document on Sustainable Development Goals
Indonesia is a developing country and supports the program of the Sustainable Development Goals (SDGs), which consists of 17 goals. The SDGs are not only the government's duty, but a shared duty of all elements of society. Online media has a crucial role in implementing the goals of Indonesia's SDGs. Information published in online news related to the SDGs is an important consideration for the government, society, and all other elements. Categorizing news manually to find out news topics is very time-consuming and depends on the ability of news editors. News presented on online media sites can instead be used for topic modeling, in which hidden topics are discovered in the news. Topic modeling classifies data based on particular topics and determines the relationships between texts. Latent Dirichlet allocation (LDA) is one of the methods in topic modeling for finding the trend of topics in SDGs news. Based on the results of this research, the implementation of LDA is a suitable choice for finding topics in a document. Topic modeling with k = 17 obtained the highest coherence score of 0.5405 on topic 8. Topic 8 discussed news related to the eighth SDGs goal, namely decent work and economic growth. This categorization was based on words formed after the LDA process. Then, topic 5 discussed news on the 17th SDGs goal, namely partnerships for the goals. Topic 6 discussed news on the first SDGs goal, namely no poverty.
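The abstract selects among topic models by coherence score. As an illustration, here is a minimal sketch of the UMass coherence measure, one common coherence metric (the paper does not specify which measure it uses), computed over hypothetical toy documents:

```python
import math

# Hypothetical documents standing in for SDG news articles,
# each reduced to its set of terms.
docs = [
    {"jobs", "wages", "economy", "growth"},
    {"economy", "growth", "trade"},
    {"poverty", "aid", "welfare"},
    {"jobs", "economy", "wages"},
]

def umass_coherence(top_words, docs):
    """UMass topic coherence: sum of log((D(wi, wj) + 1) / D(wj))
    over ordered pairs of a topic's top words, where D(...) counts
    documents containing all the given words."""
    def doc_freq(*words):
        return sum(1 for d in docs if all(w in d for w in words))
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            score += math.log((doc_freq(wi, wj) + 1) / doc_freq(wj))
    return score

# A topic whose top words co-occur often scores higher (closer to 0)
# than one mixing words from unrelated documents.
coherent = umass_coherence(["economy", "growth", "jobs"], docs)
mixed = umass_coherence(["economy", "poverty", "aid"], docs)
print(coherent > mixed)  # True
```

In the paper's setting, the same idea is applied per topic of the fitted LDA model, and the topic count k and the reported best topic are chosen by the highest coherence.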
How did the discussion go: Discourse act classification in social media conversations
We propose a novel attention-based hierarchical LSTM model to classify
discourse act sequences in social media conversations, aimed at mining data
from online discussion using textual meanings beyond sentence level. The very
uniqueness of the task is the complete categorization of possible pragmatic
roles in informal textual discussions, contrary to extraction of
question-answers, stance detection, or sarcasm identification, which are very
much role-specific tasks. An early attempt was made on a Reddit discussion
dataset. We train our model on the same data, and present test results on two
different datasets, one from Reddit and one from Facebook. Our proposed model
outperformed the previous one in terms of domain independence; without using
platform-dependent structural features, our hierarchical LSTM with word
relevance attention mechanism achieved F1-scores of 71\% and 66\%,
respectively, in predicting discourse roles of comments in Reddit and Facebook discussions.
The efficiency of recurrent and convolutional architectures in learning
discursive representations on the same task is presented and analyzed,
with different word and comment embedding schemes. Our attention mechanism
enables us to inquire into relevance ordering of text segments according to
their roles in discourse. We present a human annotator experiment to unveil
important observations about modeling and data annotation. Equipped with our
text-based discourse identification model, we inquire into how heterogeneous
non-textual features such as location, time, and leaning of information play
their roles in characterizing online discussions on Facebook.
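The word-relevance attention the abstract mentions can be sketched as softmax-weighted pooling over word vectors. The vectors and scores below are hypothetical (in the real model the relevance scores are learned), but they show how attention weights yield the relevance ordering of text segments:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(word_vecs, scores):
    """Attention pooling: weight each word vector by its softmaxed
    relevance score and sum the weighted vectors."""
    weights = softmax(scores)
    dim = len(word_vecs[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, word_vecs))
              for k in range(dim)]
    return pooled, weights

# Hypothetical 2-d word vectors and relevance scores for a 3-word comment.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.5]
pooled, weights = attend(vecs, scores)

# The weights give the relevance ordering the abstract describes:
# here the second word dominates the pooled representation.
print(max(range(3), key=weights.__getitem__))  # 1
```

The same weights that build the comment representation can be read off directly to rank words by their contribution to the predicted discourse role.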
Landslide Detection in Real-Time Social Media Image Streams
Lack of global data inventories obstructs scientific modeling of and response
to landslide hazards, which are often deadly and costly. To remedy this
limitation, new approaches suggest solutions based on citizen science, which
requires active participation. However, as a non-traditional data source,
social media has been increasingly used in many disaster response and
management studies in recent years. Inspired by this trend, we propose to
capitalize on social media data to mine landslide-related information
automatically with the help of artificial intelligence (AI) techniques.
Specifically, we develop a state-of-the-art computer vision model to detect
landslides in social media image streams in real time. To that end, we create a
large landslide image dataset labeled by experts and conduct extensive model
training experiments. The experimental results indicate that the proposed model
can be deployed in an online fashion to support global landslide susceptibility
maps and emergency response.
Semantics-Space-Time Cube. A Conceptual Framework for Systematic Analysis of Texts in Space and Time
We propose an approach to analyzing data in which texts are associated with spatial and temporal references, with the aim to understand how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having their spatial and temporal references within this location and interval. Based on this structure, we systematically describe the space of analysis tasks for exploring the interrelationships among the three heterogeneous information facets: semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have a similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube.
We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people's reactions to social and natural events of different spatial and temporal scales.
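The cube structure and the projecting and slicing operations described above can be sketched with plain dictionaries; the topics, locations, intervals, and counts below are hypothetical:

```python
from collections import defaultdict

# Minimal sketch of the semantics-space-time cube: cells keyed by
# (topic, location, time interval), each holding an aggregate measure
# (here simply a count of texts).
records = [
    ("traffic", "district_A", "2021-Q1"),
    ("traffic", "district_A", "2021-Q2"),
    ("traffic", "district_B", "2021-Q1"),
    ("weather", "district_A", "2021-Q1"),
]
cube = defaultdict(int)
for topic, loc, interval in records:
    cube[(topic, loc, interval)] += 1

def slice_cube(cube, axis, value):
    """Slicing: fix one facet to a value, keeping the cells of the
    remaining two dimensions (axis 0 = topic, 1 = location, 2 = time)."""
    return {k: v for k, v in cube.items() if k[axis] == value}

def project(cube, axis):
    """Projecting: aggregate over one facet, summing the measure
    across the removed dimension."""
    out = defaultdict(int)
    for key, v in cube.items():
        out[tuple(x for i, x in enumerate(key) if i != axis)] += v
    return dict(out)

topic_slice = slice_cube(cube, 0, "traffic")  # one topic: space x time
space_time = project(cube, 0)                 # drop the topic facet
print(len(topic_slice))                       # 3 cells
print(space_time[("district_A", "2021-Q1")])  # 2 texts
```

Each of the three parallel views in the described system can be seen as such a projection or slice, which is what lets complex semantics-space-time tasks decompose into simpler two-facet subtasks.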