
Semantically Enhanced Topic Modeling and Its Applications in Social Media

Author: Guo Lifan
Publication venue: Drexel University

As social media has prospered over the past few years and "user-generated content" has exploded on the Internet, there is little question that we have entered an era of Big Data. Social media sites such as Facebook, LinkedIn, Quora and Twitter have become important information sources for a wide spectrum of users. A growing number of tasks, such as community detection, personalized message recommendation and sentiment analysis, have become more important in this setting. While many researchers wish to use standard text mining tools to understand social media data, the heterogeneity and restricted length of the data prevent them from applying those tools directly. Among those tools, topic modeling (Blei D., 2003) (Hofmann, 2009) (Steyvers, 2007), a type of probabilistic statistical model for discovering the abstract "topics" that occur in a collection of documents, has drawn a lot of interest over the past ten years. Topic models can uncover the hidden structure in document collections and help us develop new ways to search, browse and summarize large archives of text.

Directly applying topic models to social media data, however, is not straightforward, for the following reasons: (1) social media data are essentially unstructured and include heterogeneous data types, such as text, clicks, votes and so on, while traditional topic models are used to analyze structured data, such as archives of books, journals, and newspapers; (2) the purposes of using social media data go beyond discovering topics and are more complex, such as reliable information detection, sentiment detection, and recommendation; in other words, discovered topics are just intermediate results for further use; (3) traditional topic modeling assumes that words in documents are drawn independently from a set of topics and that documents are identically distributed in the corpus. Such an independent and identically distributed (i.i.d.) assumption, however, often does not hold in reality. Further, the i.i.d. assumption ignores the semantic information that exists on the web. Therefore, it is reasonable to incorporate existing knowledge into current unsupervised topic modeling in order to semantically enhance it.

To address these challenges, this dissertation first proposes a semantically enhanced topic modeling framework that, by utilizing existing knowledge, does not rely on the i.i.d. assumption. Experiments show that this framework improves on current topic models: because it can exploit relations between words, it achieves better results than traditional topic modeling methods. Second, we extend the framework to social media data, targeting two research questions: (1) how to detect reliable authorities and content in community question answering, and (2) how to enhance recommender systems with item reviews in communities. To answer the first question, we extend the LDA model to model questions and answers as drawn from different topic distributions in community question answering, using semantic correlations between terms. Our model outperforms both a model that applies LDA directly to the same problem and a model without the enhanced semantic correlations. Our model can also use topical information from questions, answers, questioners and answerers to detect domain authorities and reliable content. Finally, we apply our model to recommender systems. We propose a new concept, Item Social Reputation (ISR), to enhance current recommender systems. Our model adds a social dimension to items in order to improve the conversion rate of item recommendations. Furthermore, it can automatically determine the number of ISRs for a given item. In our experiments, the model outperforms state-of-the-art algorithms in the domain of sentiment analysis. In addition, our model shows potential for use in designing a new interface for recommender systems.

Ph.D., Information Science -- Drexel University, 201

Drexel Libraries E-Repository and Archives