133 research outputs found
From the User to the Medium: Neural Profiling Across Web Communities
Online communities provide a unique way for individuals to access information
from those in similar circumstances, which can be critical for health
conditions that require daily and personalized management. As these groups and
topics often arise organically, identifying the types of topics discussed is
necessary to understand their needs. In addition, these communities and the people in
them can be quite diverse, and existing community detection methods have not
been extended to evaluate this heterogeneity. Progress has been limited
because community detection methods have not focused on the
semantic relations between textual features of the user-generated
content. Thus we develop an approach, NeuroCom, that optimally finds dense
groups of users as communities in a latent space inferred from neural
representations of the content users publish. By embedding words and
messages, we show that NeuroCom improves clustering and identifies
more nuanced discussion topics than other common unsupervised
learning approaches.
Analyzing fluctuation of topics and public sentiment through social media data
Over the past decade, the number of Internet users has expanded rapidly worldwide. They form various online social networks through Internet platforms such as Twitter, Facebook and Instagram. These platforms give users a fast way to receive and disseminate information and to express personal opinions in virtual space. When dealing with massive and chaotic social media data, accurately determining which events or concepts users are discussing is an interesting and important problem.
This dissertation consists of two main parts. First, this research mines hidden topics and user interest trends by analyzing real-world social media activity. Topic modeling and sentiment analysis methods are proposed to classify social media posts into different sentiment classes and then discover the sentiment trend for different topics over time. The presented case study focuses on the COVID-19 pandemic that started in 2019. A large amount of Twitter data is collected and used to discover vaccine-related topics during the pre- and post-vaccine emergency use period. Using the proposed framework, 11 vaccine-related trend topics are discovered. Ultimately, the discovered topics can be used to improve the readability of confusing messages about vaccines on social media and to provide effective results that support policymakers in making informed decisions about public health. Second, conventional topic models cannot deal with the sparsity problem of short text. A novel topic model, named Topic Noise based-Biterm Topic Model with FastText embeddings (TN-BTMF), is proposed to deal with this problem. Word co-occurrence patterns (i.e., biterms) are directly generated in BTM. A scoring method based on word co-occurrence and semantic similarity is proposed to detect noise biterms. In th
Low-resource Personal Attribute Prediction from Conversation
Personal knowledge bases (PKBs) are crucial for a broad range of applications
such as personalized recommendation and Web-based chatbots. A critical
challenge in building PKBs is extracting personal attribute knowledge from users'
conversation data. Given some users of a conversational system, a personal
attribute and these users' utterances, our goal is to predict the ranking of
the given personal attribute values for each user. Previous studies often rely
on a considerable amount of resources such as labeled utterances and external data,
yet the attribute knowledge embedded in unlabeled utterances is underutilized
and their performance in predicting some difficult personal attributes is still
unsatisfactory. In addition, some text classification methods
could be applied to this task directly. However, they also perform
poorly on those difficult personal attributes. In this paper, we propose a
novel framework PEARL to predict personal attributes from conversations by
leveraging the abundant personal attribute knowledge from utterances under a
low-resource setting in which no labeled utterances or external data are
utilized. PEARL combines the biterm semantic information with the word
co-occurrence information seamlessly by using the updated prior attribute
knowledge to refine the biterm topic model's Gibbs sampling process in an
iterative manner. Extensive experimental results show that PEARL
outperforms all the baseline methods not only on the task of personal attribute
prediction from conversations over two data sets, but also on the more general
weakly supervised text classification task over one data set.
Comment: Accepted by AAAI'2