
    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyze complex longitudinal health information from social media with minimal human annotation, and information on Adverse Drug Events and Reactions (ADR) is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. With this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores indicating the presence of ADRs are generated. A case control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study was to compare the ADR risk of medication usage across medication categories during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: predicting medication usage by identifying risk groups. During prediction, the health information in each Twitter timeline, such as diseases, symptoms, treatments, and effects, is summarized by the topic modeling process, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories, which are currently assigned based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) is proposed to manage the output of the topic modeling. Some attempts have also been made to combine the topic structure (ontology) with the MTD hierarchy.
Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach to drug safety early warning.
Dissertation/Thesis: Doctoral Dissertation, Computer Science 201
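The abstract does not give the model's equations, so the following is only an illustration of one common way to bias a topic model with background knowledge: collapsed Gibbs sampling for LDA with an asymmetric topic-word prior. It assumes that topic 0 acts as the "ADR topic", that the ADR lexicon (`adr_terms`) gets extra prior mass `seed_boost` there, and that a timeline's ADR score is its share of topic 0. The function `biased_lda`, its parameters, and the toy timelines are hypothetical, not the dissertation's implementation.

```python
import random


def biased_lda(docs, num_topics, seed_words, alpha=0.1, beta=0.01,
               seed_boost=5.0, iters=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with a biased (seeded) word prior.

    Hypothetical sketch: topic 0 is the "ADR topic". Every word in
    `seed_words` gets `seed_boost` extra prior mass in topic 0, and its
    tokens start in topic 0. Returns per-document topic proportions.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Asymmetric Dirichlet prior over words, one row per topic.
    prior = [[beta] * V for _ in range(K)]
    for w in seed_words:
        if w in w_id:
            prior[0][w_id[w]] += seed_boost
    prior_sum = [sum(row) for row in prior]

    n_kw = [[0] * V for _ in range(K)]  # topic-word counts
    n_k = [0] * K                       # tokens per topic
    n_dk = [[0] * K for _ in docs]      # document-topic counts
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            # Seed words start in the ADR topic; other tokens start at random.
            k = 0 if w in seed_words else rng.randrange(K)
            z_d.append(k)
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_kw[k][v] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + prior[t][v])
                           / (n_k[t] + prior_sum[t]) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_kw[k][v] += 1
                n_k[k] += 1
                n_dk[d][k] += 1

    return [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]


adr_terms = {"nausea", "headache", "rash"}  # hypothetical ADR lexicon
timelines = [["nausea", "headache", "rash", "nausea"] * 3,
             ["coffee", "walk", "baby", "coffee"] * 3]
theta = biased_lda(timelines, num_topics=2, seed_words=adr_terms)
adr_scores = [row[0] for row in theta]  # share of each timeline on the ADR topic
print(adr_scores)
```

The per-timeline topic proportions in `theta` are also the kind of compact summary that the abstract's dimension reduction and topic similarity steps could operate on.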

    Generating Natural Language from Linked Data: Unsupervised template extraction

    We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions of entities found in Linked Datasets, equivalent to Wikipedia stubs. We have evaluated LOD-DEF against a simple generate-from-triples baseline and against human-generated output. In human evaluation, LOD-DEF significantly outperforms the baseline on two of the three measures: non-redundancy, and structure and coherence.
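As a rough illustration of what unsupervised template extraction from parallel RDF and text can look like (not LOD-DEF's actual procedure, which the abstract does not detail), the sketch below assumes each triple about an entity is a `(predicate, object_literal)` pair: any literal found verbatim in a sentence is replaced by a slot named after its predicate, and the entity label becomes `{subject}`. The helper names `extract_template` and `realise` are hypothetical, and predicate names are assumed to be plain identifiers (for example `birthPlace` rather than `dbo:birthPlace`) so that `str.format` can fill them.

```python
def extract_template(sentence, subject, triples):
    """Turn a sentence into a template by swapping RDF values for slots.

    `triples` holds (predicate, object_literal) pairs about `subject`.
    Returns None when no object literal occurs in the sentence.
    """
    template = sentence
    matched = False
    # Longer literals first, so a literal containing another one
    # (e.g. "New York City" vs "York") is replaced whole.
    for predicate, obj in sorted(triples, key=lambda t: len(t[1]), reverse=True):
        if obj in template:
            template = template.replace(obj, "{" + predicate + "}")
            matched = True
    if not matched:
        return None
    return template.replace(subject, "{subject}")


def realise(template, subject, values):
    """Fill a learned template with another entity's values."""
    return template.format(subject=subject, **values)


triples = [("birthPlace", "London"), ("birthYear", "1815")]
template = extract_template("Ada Lovelace was born in London in 1815.",
                            "Ada Lovelace", triples)
print(template)  # {subject} was born in {birthPlace} in {birthYear}.
print(realise(template, "Grace Hopper",
              {"birthPlace": "New York City", "birthYear": "1906"}))
```

Exact string matching keeps the sketch short; a real system would also need to handle dates, number formats, and literals phrased differently from how they appear in the text.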

    Learning Product Attributes from User-Generated Content for Dynamic Promotion Strategies

    One widely adopted product attribute classification in the literature is the “Search” versus “Experience” dichotomy. Because the costs of searching and experiencing products vary across consumers and over a product’s lifetime, it is important for marketers to understand how consumers evaluate these attributes in order to formulate scalable and dynamic promotion strategies. This thesis addresses this challenge by proposing a text analytics framework for understanding consumers’ evaluation of product attributes to support agile promotion strategies. In the past, researchers have attempted to classify entire product categories as search or experience via questionnaires, or with quantitative approaches that analyze review star ratings. This thesis uses objective consumer reviews and text mining techniques to extract product features that can define search or experience attributes. A hybrid of unsupervised and supervised learning techniques was used to generate labelled training data from eight different Amazon product categories and to train classification models that determine the likely position of a product within the search-experience classification spectrum. Extensive experiments using best-case and worst-case scenarios were conducted to improve the accuracy of the decision-tree-based classification models and to demonstrate the scalability of the text analytics framework. The proposed approach also incorporates a mechanism that aggregates the scores the model assigns to individual reviews in order to determine the likely position at the product level. It is also shown that a product’s position in the search-experience spectrum may change during its review cycle, indicating that marketers need to investigate reviews for any period of interest to develop effective promotion strategies in a more agile fashion. 
From a theoretical perspective, the text mining approach significantly adds to the existing body of knowledge on classifying product attributes to support promotions. In addition to detecting dominant signals for search and experience positions, marketers can uncover a great deal of content with which to formulate more specific advertising messages.
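The abstract says review-level scores are aggregated into a product-level position and that this position can shift over the review cycle, but it does not give the aggregation rule. The sketch below assumes, for illustration only, that each review already has a classifier score in [0, 1] (0 = search, 1 = experience), that the product position is the mean score, and that periods are simple labels such as `"2014-Q1"`. The functions `product_position`, `position_by_period`, and `label`, and the 0.5 threshold, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def product_position(review_scores):
    """Average per-review scores into one product-level position.

    Each score is assumed to be a classifier's probability that a review
    discusses experience (rather than search) attributes.
    """
    return mean(review_scores)


def position_by_period(reviews):
    """Track the product position per period.

    reviews: iterable of (period, score) pairs, e.g. ("2014-Q1", 0.8).
    """
    buckets = defaultdict(list)
    for period, score in reviews:
        buckets[period].append(score)
    return {period: product_position(scores)
            for period, scores in sorted(buckets.items())}


def label(position, threshold=0.5):
    """Map a position on the spectrum to its nearer pole."""
    return "experience" if position >= threshold else "search"


reviews = [("2014-Q1", 0.2), ("2014-Q1", 0.4),
           ("2014-Q2", 0.7), ("2014-Q2", 0.9)]
overall = product_position([score for _, score in reviews])
trend = position_by_period(reviews)
print(overall, {period: label(p) for period, p in trend.items()})
```

Grouping by period is what lets a drift like the one described in the abstract (a product moving along the spectrum during its review cycle) show up, whereas the single overall mean would hide it.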

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into web data extraction, conducted in the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that uses the RSS feeds and HTML representations of blogs. It outlines the possibilities for extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
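The abstract does not describe how the RSS feeds and HTML are combined, so the following is a hedged sketch of one plausible reading: treat the clean text of an RSS item as an automatic label, find the HTML element whose text best matches it, and reuse that element's path on other pages of the same blog. It uses only Python's standard `html.parser` and `difflib`. The path format (`tag.class` segments joined by `/`), the helpers `_TextByPath`, `learn_content_path`, and `extract`, and the sample pages are all hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta",
             "source", "wbr"}


class _TextByPath(HTMLParser):
    """Collect the text under each element, keyed by a tag.class path."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get an end tag
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Credit the text to every enclosing element's path.
        for depth in range(1, len(self.stack) + 1):
            path = "/".join(self.stack[:depth])
            self.texts[path] = self.texts.get(path, "") + " " + data


def learn_content_path(html, rss_text):
    """Return the element path whose text best matches the RSS item."""
    parser = _TextByPath()
    parser.feed(html)
    target = " ".join(rss_text.split())

    def score(path):
        text = " ".join(parser.texts[path].split())
        return SequenceMatcher(None, text, target).ratio()

    return max(parser.texts, key=score)


def extract(html, path):
    """Apply a learned path to another page with the same layout."""
    parser = _TextByPath()
    parser.feed(html)
    return " ".join(parser.texts.get(path, "").split())


page1 = """<html><body><h1 class="title">My trip</h1>
<div class="post-body"><p>We hiked the ridge at dawn.</p></div>
<div class="sidebar">Archive Links About</div></body></html>"""
page2 = """<html><body><h1 class="title">Rainy day</h1>
<div class="post-body"><p>We stayed in and read.</p></div>
<div class="sidebar">Archive Links About</div></body></html>"""

path = learn_content_path(page1, "We hiked the ridge at dawn.")
print(path)                  # html/body/div.post-body
print(extract(page2, path))  # We stayed in and read.
```

Learning a path from pages that appear in the feed and reusing it for older posts is what would make such an approach unsupervised: the feed supplies the labels, so no manual annotation of the HTML is needed.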