520 research outputs found

    Toward Effective Knowledge Discovery in Social Media Streams

    The last few decades have seen unprecedented growth in the amount of new data. New computing and communications resources, such as cloud data platforms and mobile devices, have enabled individuals to contribute new ideas, share points of view, and exchange newsworthy bits with each other at a previously unfathomable rate. While there are many ways a modern person can communicate digitally with others, social media outlets such as Twitter and Facebook have occupied much of the focus of interpersonal social networking in recent years. The millions of pieces of content published on social media sites have been both a blessing and a curse for those trying to make sense of the discourse. On one hand, the sheer amount of easily available, real-time, contextually relevant content has been a cause of much excitement in academia and industry. On the other hand, the volume of new, diverse content continuously published on social sites makes it difficult for researchers and industry participants to grasp effectively. Therefore, the goal of this thesis is to discover a set of approaches and techniques that help data miners quickly develop intuitions about what is happening in the social media space. To that aim, I concentrate on effectively visualizing social media streams as hierarchical structures, as such structures have been shown to be useful in human sense making. Ph.D., Information Studies -- Drexel University, 201

    Interpretable classification and summarization of crisis events from microblogs

    The widespread use of social media platforms has created convenient ways to obtain and spread up-to-date information during crisis events such as disasters. Time-critical analysis of crisis-related information helps humanitarian organizations and governmental bodies gain actionable information and plan for aid response. However, situational information is often immersed in a high volume of irrelevant content. Moreover, crisis-related messages also vary greatly in terms of information types, ranging from general situational awareness - such as information about warnings, infrastructure damages, and casualties - to individual needs. Different humanitarian organizations or governmental bodies usually demand information of different types for various tasks such as crisis preparation, resource planning, and aid response. To cope with information overload and efficiently support stakeholders in crisis situations, it is necessary to (a) classify data posted during crisis events into fine-grained humanitarian categories and (b) summarize the situational data in near real-time. In this thesis, we tackle the aforementioned problems and propose novel methods for the classification and summarization of user-generated posts from microblogs. Previous studies have introduced various machine learning techniques to assist humanitarian or governmental bodies, but they primarily focused on model performance. Unlike those works, we develop interpretable machine-learning models which can provide explanations of model decisions. Generally, we focus on three methods for reducing information overload in crisis situations: (i) post classification, (ii) post summarization, and (iii) interpretable models for post classification and summarization. We evaluate our methods using posts from the microblogging platform Twitter, so-called tweets. First, we expand publicly available labeled datasets with rationale annotations.
Each tweet is annotated with a class label and rationales, which are short snippets from the tweet that explain its assigned label. Using the data, we develop trustworthy classification methods that give the best tradeoff between model performance and interpretability. Rationale snippets usually convey essential information in the tweets. Hence, we propose an integer linear programming-based summarization method that maximizes the coverage of rationale phrases to generate summaries of class-level tweet data. Next, we introduce an approach that can enhance latent embedding representations of tweets in vector space. Our approach helps improve the classification performance-interpretability tradeoff and detect near duplicates for designing a summarization model with low computational complexity. Experiments show that rationale labels are helpful for developing interpretable-by-design models. However, annotations are not always available, especially in real-time situations for new tasks and crisis events. In the last part of the thesis, we propose a two-stage approach to extract the rationales under minimal human supervision.

    Personalized Expert Recommendation: Models and Algorithms

    Many large-scale information sharing systems, including social media systems, question-answering sites, and rating and reviewing applications, have been growing rapidly, allowing millions of human participants to generate and consume information on an unprecedented scale. To manage the sheer growth of information generation, there is a need to enable personalization of information resources for users: to surface high-quality content and feeds, to provide personally relevant suggestions, and so on. A fundamental task in creating and supporting user-centered personalization systems is to build rich user profiles that aid recommendation and improve the user experience. Therefore, in this dissertation research, we propose models and algorithms to facilitate the creation of new crowd-powered personalized information sharing systems. Specifically, we first give a principled framework to enable personalization of resources so that information seekers can be matched with customized knowledgeable users based on their historical actions and contextual information; we then focus on creating rich user models that allow accurate and comprehensive modeling of user profiles for long-tail users, including discovering a user’s known-for profile, opinion bias, and geo-topic profile. In particular, this dissertation research makes two unique contributions: First, we introduce the problem of personalized expert recommendation and propose the first principled framework for addressing this problem. To overcome the sparsity issue, we investigate the use of a user’s contextual information that can be exploited to build robust models of personal expertise, study how spatial preference for personally valuable expertise varies across regions, across topics, and based on different underlying social communities, and integrate these different forms of preferences into a matrix factorization-based personalized expert recommender.
Second, to support the personalized recommendation of experts, we focus on modeling and inferring user profiles in online information sharing systems. In order to tap the knowledge of the vast majority of users, we provide frameworks and algorithms to accurately and comprehensively create user models by discovering a user’s known-for profile, opinion bias, and geo-topic profile, each described briefly as follows: (i) We develop a probabilistic model called Bayesian Contextual Poisson Factorization to discover what users are known for by others. Our model takes as input a small fraction of users whose known-for profiles are already known and the vast majority of users for whom we have little (or no) information, learns the implicit relationships between users’ known-for profiles and their contextual signals, and finally predicts known-for profiles for that majority of users. (ii) We explore users’ topic-sensitive opinion bias, propose a lightweight semi-supervised system called “BiasWatch” to semi-automatically infer the opinion bias of long-tail users, and demonstrate how a user’s opinion bias can be exploited to recommend other users with similar opinions in social networks. (iii) We study how a user’s topical profile varies geo-spatially and how we can model a user’s geo-spatial known-for profile as the last step in our dissertation toward creating rich user profiles. We propose a multi-layered Bayesian hierarchical user factorization to overcome user heterogeneity and an enhanced model that alleviates the sparsity issue by integrating user contexts into the two-layered hierarchical user model for a better representation of a user’s geo-topic preference by others.