2,891 research outputs found

    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, generating scores that indicate the presence of ADR. A case-control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage for each medication category during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, health information from the Twitter timeline, such as diseases, symptoms, treatments, and effects, is summarized by the topic modeling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories; currently, that categorization is based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) to manage the output from the topic modeling is proposed. Some attempts have also been made to incorporate topic structure (ontology) and the MTD hierarchy. Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach for drug safety early warning. Dissertation/Thesis, Doctoral Dissertation, Computer Science 201

    What's unusual in online disease outbreak news?

    Background: Accurate and timely detection of public health events of international concern is necessary to help support risk assessment and response and save lives. Novel event-based methods that use the World Wide Web as a signal source offer potential to extend health surveillance into areas where traditional indicator networks are lacking. In this paper we address the issue of systematically evaluating online health news to support automatic alerting using daily disease-country counts text-mined from real-world data using BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance against expert-moderated ProMED-mail postings. Results: We report sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), mean alerts/100 days and F1, with 95% confidence intervals (CI), for 287 ProMED-mail postings on 18 outbreaks across 14 countries over a 366-day period. Results indicate that W2 had the best F1, with a slight benefit for day-of-week effect over C2. In drill-down analysis we indicate issues arising from the granular choice of country-level modeling, sudden drops in reporting due to day-of-week effects and reporting bias. Automatic alerting has been implemented in BioCaster, available from http://born.nii.ac.jp. Conclusions: Online health news alerts have the potential to enhance manual analytical methods by increasing throughput, timeliness and detection rates. Systematic evaluation of health news aberrations is necessary to push forward our understanding of the complex relationship between news report volumes and case numbers and to select the best performing features and algorithms.
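
    As a rough illustration of the evaluation measures named in the abstract above, the Python sketch below computes them from a confusion matrix of alert days. The `Confusion` class, the `alert_metrics` function, and the counts in the usage lines are hypothetical and are not taken from the paper; how the study matched alerts to ProMED-mail postings is not described in this listing.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Day-level counts of alerts versus gold-standard outbreak reports (hypothetical)."""
    tp: int  # alert raised on a day with a gold-standard report
    fp: int  # alert raised on a day without one
    tn: int  # no alert, no report
    fn: int  # no alert, but a report exists


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def alert_metrics(c: Confusion, n_days: int) -> dict:
    """Compute the standard detection measures from confusion counts."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)   # recall
    specificity = safe_div(c.tn, c.tn + c.fp)
    ppv = safe_div(c.tp, c.tp + c.fp)           # precision
    npv = safe_div(c.tn, c.tn + c.fn)
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    alerts_per_100_days = safe_div(100 * (c.tp + c.fp), n_days)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "alerts_per_100_days": alerts_per_100_days,
    }


# Usage with made-up counts over a 366-day window.
metrics = alert_metrics(Confusion(tp=8, fp=2, tn=350, fn=6), n_days=366)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    F1 is the harmonic mean of PPV and sensitivity, so it rewards a detector only when it is both precise and complete; that is why it can serve as a single summary figure when comparing algorithms.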

    Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

    One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, make strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address these limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities, such as user interactions, community dynamics, and textual content, to automatically assess the credibility of user-contributed online content, as well as the expertise of users and its evolution, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, such as incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information. Comment: PhD thesis, Mar 201

    The Role of Bias in News Recommendation in the Perception of the Covid-19 Pandemic

    News recommender systems (NRs) have been shown to shape public discourse and to enforce behaviors that have a critical, oftentimes detrimental effect on democracies. Earlier research on media bias has revealed its strong impact on opinions and preferences. Responsible NRs are supposed to have depolarizing capacities once they go beyond accuracy measures. We performed sequence prediction using the BERT4Rec algorithm to investigate the interplay of news coverage and user behavior. Based on live data and training on a large data set from one news outlet, "event bursts", the "rally around the flag" effect, and "filter bubbles" were investigated in our interdisciplinary approach between data science and psychology. Potentials for fair NRs that go beyond accuracy measures are outlined via training of the models with a large data set of articles, keywords, and user behavior. The development of the news coverage and user behavior during the COVID-19 pandemic, from primarily medical to broader political content and debates, was traced. Our study provides first insights for the future development of responsible news recommendation that acknowledges user preferences while stimulating diversity and accountability instead of accuracy only. Comment: Accepted for presentation at the 5th FAccTRec Workshop on Responsible Recommendation (FAccTRec '22). Revised based on the reviewers' feedback

    Exploiting Social Media Sources for Search, Fusion and Evaluation

    The web contains heterogeneous information that is generated with different characteristics and is presented via different media. Social media, as one of the largest content carriers, has generated information from millions of users worldwide, creating material rapidly in all types of forms such as comments, images, tags, videos and ratings. In social applications, the formation of online communities contributes to conversations of substantially broader aspects, as well as unfiltered opinions about subjects that are rarely covered in public media. Information accrued on social platforms, therefore, presents a unique opportunity to augment web sources such as Wikipedia or news pages, which are usually characterized as being more formal. The goal of this dissertation is to investigate in depth how social data can be exploited and applied in the context of three fundamental information retrieval (IR) tasks: search, fusion, and evaluation. Improving search performance has consistently been a major focus in the IR community. Given the in-depth discussions and active interactions contained in social media, we present approaches to incorporating this type of data to improve search on general web corpora. In particular, we propose two graph-based frameworks, social anchor and information network, to associate related web and social content, where information sources of diverse characteristics can be used to complement each other in a unified manner. We investigate how the enriched representation can potentially reduce vocabulary mismatch and improve retrieval effectiveness. Presenting social media content to users is valuable particularly for queries intended for time-sensitive events or community opinions. Current major search engines commonly blend results from different search services (or verticals) into core web results. Motivated by this real-world need, we explore ways to merge results from different web and social services into a single ranked list. We present an optimization framework for fusion, where the impact of documents, ranked lists, and verticals can be modeled simultaneously to maximize performance. Evaluating search system performance has largely relied on creating reusable test collections in IR. Traditional ways of creating evaluation sets can require substantial manual effort. To reduce such effort, we explore an approach to automating the process of collecting pairs of queries and relevance judgments, using a high-quality social media source, Community Question Answering (CQA). Our approach is based on the idea that CQA services provide platforms for users to raise questions and to share answers, thereby encoding the associations between real user information needs and real user assessments. To demonstrate the effectiveness of our approaches, we conduct extensive retrieval and fusion experiments, and verify the reliability of the new, CQA-based evaluation test sets.

    Identification and characterization of diseases on social web

    [no abstract]

    PARTISANS AND CONTROVERSIAL NEWS ONLINE: COMPARING PERCEIVED BIAS, CREDIBILITY, AND USER BEHAVIOR IN MAINSTREAM NEWS VERSUS BLOGS

    This 2 (partisan opinion) x 2 (content source) x 2 (content valence) factorial experiment investigates how partisans' prior positions on two controversial issues, same-sex marriage (N = 132) and guns on campus (N = 130), influence their perceptions of online content from either a mainstream online news source (the Associated Press) or citizen blogs. Partisans' perceptions of the content included perceived bias and credibility. This study also explores how these perceptions affect partisans' online behaviors, including commenting on the content and subsequent information seeking. Theoretically, the study tests the 'hostile media effect' framework with a blog, then investigates whether the effect differs when the same content appears on a mainstream online news source (the Associated Press). The study also examines the relationship between the hostile media effect and partisans' online behaviors. Participants were randomly assigned to one of the four conditions, each containing stimuli manipulated as either pro or anti on the issues, on either a mainstream online news source (the Associated Press) or a blog. Similar to previous evidence of a relative hostile media effect in traditional printed news articles and national network broadcasts, this study found that online content also generated the effect regardless of whether the content was produced by professional journalists or citizen bloggers. Partisans evaluated both mainstream online news and blog postings with opposing views as biased and less credible. In particular, user-generated content (blog postings) generated a stronger relative hostile media effect than mainstream online news. In addition, the hostile media effect appeared to motivate partisans to comment on content that opposes their position in order to correct perceived bias and amplify their own position. This study also confirms partisans' selective exposure to additional content that supports their position. However, the hostile media effect did not appear to enhance the tendency for selective exposure. In their totality, partisan audiences' perceptions of bias and credibility in mainstream online news and blog postings in a hostile direction, followed by commenting and more information seeking, seem to reinforce partisanship rather than encourage consensus between supporters and opponents of the controversial issues.

    Effectiveness of Corporate Social Media Activities to Increase Relational Outcomes

    This study applies social media analytics to investigate the impact of different corporate social media activities on user word of mouth and attitudinal loyalty. We conduct a multilevel analysis of approximately 5 million tweets regarding the main Twitter accounts of 28 large global companies. We empirically identify different social media activities in terms of social media management strategies (using social media management tools or the web-frontend client), account types (broadcasting or receiving information), and communicative approaches (conversational or disseminative). We find positive effects of social media management tools, broadcasting accounts, and conversational communication on public perception.

    Third Person Effect and Internet Privacy Risks

    The current study tests the third-person effect (TPE) in the context of Internet privacy. TPE refers to the phenomenon that people tend to perceive greater media effects on others than on themselves. The behavioral component of TPE holds that the self-others perceptual gap is positively associated with support for restricting harmful media messages. Using a sample (N=613) from Amazon MTurk, the current research documented firm support for the perceptual and behavioral components of TPE in the context of Internet privacy. Moreover, social distance, perceived Internet privacy knowledge, negative online privacy experiences, and Internet use were found to be significant predictors of the TPE perceptions of Internet privacy risks. There are four novel contributions of the current study. First, this study systematically tests TPE in a new context: Internet privacy. Second, this study examines five antecedents of TPE perceptions, of which perceived Internet privacy knowledge, negative online privacy experiences, and Internet use are novel to TPE studies. Unlike prior studies, which assume social distance and desirability of media content, the current study provides direct empirical tests of these two antecedents. Third, prior research primarily examines support for censorship of harmful media messages, a context in which individuals do not have control over policy enforcement. In the case of Internet privacy, people can decide whether or not to adopt privacy protective measures. The current study addresses two types of behavioral intentions to reduce privacy risks: (1) the willingness to adopt online privacy protection measures; and (2) the willingness to recommend such measures to others. Fourth, unlike prior studies using fear-based theories to investigate Internet privacy issues, the current study tests Internet privacy from a novel perspective: TPE theory.

    Disentangling the effects of efficacy-facilitating informational support on health resilience in online health communities based on phrase-level text analysis

    This study examines the different types of supportive messages posted on a forum in online healthcare communities (OHCs) that facilitate user self-efficacy and response-efficacy, and how such informational messages encourage users to enhance their health resilience via goal-setting for health improvement. We theorize that self-efficacy-oriented messages affect helpfulness by focusing on the efficiency of implementation, while response-efficacy-oriented messages influence the relationships among helpfulness, goal-setting, and health resilience based on outcome expectancy. Using a computer-assisted approach that allows for directed content analysis, we test a conceptual model with text data collected from an OHC.