Search CORE

12,357 research outputs found

LCCT: a semisupervised model for sentiment classification

Author: Chow KP
Lu Z
Tu W
Yang M
Yin W
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2015
Field of study

Conference Theme: Human Language TechnologiesAnalyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit the domain-specific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm being able to deal with missing labels and learning from incomplete sentiment lexicons. This paper presents a LCCT (Lexicon-based and Corpus-based, Co-Training) model for semi-supervised sentiment classification. The proposed method combines the idea of lexicon-based learning and corpus-based learning in a unified co-training framework. It is capable of incorporating both domain-specific and domain-independent knowledge. Extensive experiments show that it achieves very competitive classification accuracy, even with a small portion of labeled data. Comparing to state-of-the-art sentiment classification methods, the LCCT approach exhibits significantly better performances on a variety of datasets in both English and Chinese. © 2015 Association for Computational Linguisticspublished_or_final_versio

HKU Scholars Hub

A Data Analysis Pipeline for the Study and Categorization of User Content in Online Health Communities

Author: Sara Filipa Mendes da Silva
Publication venue
Publication date: 31/10/2022
Field of study

Repositório Aberto da Universidade do Porto

Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

Author: Geis J Raymond
Kohli Marc D
Summers Ronald M
Publication venue: eScholarship, University of California
Publication date: 17/05/2017
Field of study

At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities

Crossref

eScholarship - University of California

Health Misinformation in Search and Social Media

Author: Ghenai Amira
Publication venue: 'University of Waterloo'
Publication date: 22/11/2019
Field of study

People increasingly rely on the Internet in order to search for and share health-related information. Indeed, searching for and sharing information about medical treatments are among the most frequent uses of online data. While this is a convenient and fast method to collect information, online sources may contain incorrect information that has the potential to cause harm, especially if people believe what they read without further research or professional medical advice. The goal of this thesis is to address the misinformation problem in two of the most commonly used online services: search engines and social media platforms. We examined how people use these platforms to search for and share health information. To achieve this, we designed controlled laboratory user studies and employed large-scale social media data analysis tools. The solutions proposed in this thesis can be used to build systems that better support people's health-related decisions. The techniques described in this thesis addressed online searching and social media sharing in the following manner. First, with respect to search engines, we aimed to determine the extent to which people can be influenced by search engine results when trying to learn about the efficacy of various medical treatments. We conducted a controlled laboratory study wherein we biased the search results towards either correct or incorrect information. We then asked participants to determine the efficacy of different medical treatments. Results showed that people were significantly influenced both positively and negatively by search results bias. More importantly, when the subjects were exposed to incorrect information, they made more incorrect decisions than when they had no interaction with the search results. Following from this work, we extended the study to gain insights into strategies people use during this decision-making process, via the think-aloud method. We found that, even with verbalization, people were strongly influenced by the search results bias. We also noted that people paid attention to what the majority states, authoritativeness, and content quality when evaluating online content. Understanding the effects of cognitive biases that can arise during online search is a complex undertaking because of the presence of unconscious biases (such as the search results ranking) that the think-aloud method fails to show. Moving to social media, we first proposed a solution to detect and track misinformation in social media. Using Zika as a case study, we developed a tool for tracking misinformation on Twitter. We collected 13 million tweets regarding the Zika outbreak and tracked rumors outlined by the World Health Organization and the Snopes fact-checking website. We incorporated health professionals, crowdsourcing, and machine learning to capture health-related rumors as well as clarification communications. In this way, we illustrated insights that the proposed tools provide into potentially harmful information on social media, allowing public health researchers and practitioners to respond with targeted and timely action. From identifying rumor-bearing tweets, we examined individuals on social media who are posting questionable health-related information, in particular those promoting cancer treatments that have been shown to be ineffective. Specifically, we studied 4,212 Twitter users who have posted about one of 139 ineffective ``treatments'' and compared them to a baseline of users generally interested in cancer. Considering features that capture user attributes, writing style, and sentiment, we built a classifier that is able to identify users prone to propagating such misinformation. This classifier achieved an accuracy of over 90%, providing a potential tool for public health officials to identify such individuals for preventive intervention

University of Waterloo's Institutional Repository

Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models

Author: Yang Chenyu
Publication venue: SMU Scholar
Publication date: 16/12/2023
Field of study

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models. The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset application, AMIC can achieve state-of-the-art performance compared to a number of machine learning methods, while providing much improved interpretability. The second model, AMIC 2.0, improves AMIC in two key aspects. Notably, AMIC is limited in integrating positional information in text because it ignores the order of words in documents. AMIC 2.0 comes up with a novel approach to incorporate relative positional information in the self-attention mechanism, enabling the model to capture more accurate sentiment that is position-sensitive. This modification enables the model to better understand how word order and proximity influence sentiment expressions. Secondly, AMIC 2.0 takes a step further by decomposing the sentiment score in AMIC into a context-independent score and a context-dependent score. This decomposition, along with the incorporation of two sentiment shifters linking these scores in a global environment and a local environment of text respectively, elucidate how context of document influences sentiment of words, leading to more interpretable results in sentiment analysis. The utility of AMIC 2.0 is demonstrated by an application to a Twitter dataset. AMIC 2.0 has improved the overall performance of AMIC, with the additional capability of handling more intricate language subtleties, such as different types of negations. Both AMIC and AMIC 2.0 are trained without having to use pre-trained sentiment word dictionary or seeded sentiment words. Compared to some other big language models, their computation cost is relatively low and they are versatile to use conventional datasets to generate domain-specific sentiment dictionary and provide interpretable sentiment analysis results

Southern Methodist University

Classification of socially generated medical data

Author: Alnashwan Rana
Publication venue: 'University College Cork'
Publication date: 01/09/2019
Field of study

The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain knowledge of medical information or interact with peers on medical forum platforms. However, the sheer volume of information so generated – and the consequent ‘noise’ associated with large data volumes – can create difficulties for information consumers. We propose a solution to this problem by applying high-level analytics to the data – primarily sentiment analysis, but also content and topic analysis - for accurate classification. We believe that such analysis can be of significant value to data users, such as identifying a particular aspect of an information space, determining themes that predominate among a large dataset, and allowing people to summarize topics within a big dataset. In this thesis, we apply machine learning strategies to identify sentiments expressed in online medical forums that discuss Lyme Disease. As part of this process, we distinguish a complete and relevant set of categories that can be used to characterize Lyme Disease discourse. We present a feature-based model that employs supervised learning algorithms and assess the feasibility and accuracy of this sentiment classification model. We further evaluate our model by assessing its ability to adapt to an online medical forum discussing a disease with similar characteristics, Lupus. The experimental results demonstrate the effectiveness of our approach. In many sentiment analysis applications, the labelled training datasets are expensive to obtain, whereas unlabelled datasets are readily available. Therefore, we present an adaptation of a well-known semi-supervised learning technique, in which co-training is implemented by combining labelled and unlabelled data. Our results would suggest the ability to learn even with limited labelled data. In addition, we investigate complementary analytic techniques – content and topic analysis – to leverage best used of the data for various consumer groups. Within the work described in this thesis, some particular research issues are addressed, specifically when applied to socially generated medical/health datasets: • When applying binary sentiment analysis to short-form text data (e.g. Twitter), could meta-level features improve performance of classification? • When applying more complex multi-class sentiment analysis to classification of long-form content-rich text data, would meta-level features be a useful addition to more conventional features? • Can this multi-class analysis approach be generalised to other medical/health domains? • How would alternative classification strategies benefit different groups of information consumers

Cork Open Research Archive

CREATE: Concept Representation and Extraction from Heterogeneous Evidence

Author: Bhattarai Archana
Publication venue: University of Memphis Digital Commons
Publication date: 25/07/2013
Field of study

Traditional information retrieval methodology is guided by document retrieval paradigm, where relevant documents are returned in response to user queries. This paradigm faces serious drawback if the desired result is not explicitly present in a single document. The problem becomes more obvious when a user tries to obtain complete information about a real world entity, such as person, company, location etc. In such cases, various facts about the target entity or concept need to be gathered from multiple document sources. In this work, we present a method to extract information about a target entity based on the concept retrieval paradigm that focuses on extracting and blending information related to a concept from multiple sources if necessary. The paradigm is built around a generic notion of concept which is defined as any item that can be thought of as a topic of interest. Concepts may correspond to any real world entity such as restaurant, person, city, organization, etc, or any abstract item such as news topic, event, theory, etc. Web is a heterogeneous collection of data in different forms such as facts, news, opinions etc. We propose different models for different forms of data, all of which work towards the same goal of concept centric retrieval. We motivate our work based on studies about current trends and demands for information seeking. The framework helps in understanding the intent of content, i.e. opinion versus fact. Our work has been conducted on free text data in English. Nevertheless, our framework can be easily transferred to other languages

University of Memphis Digital Commons

INFORMATIONAL SUPPORT OR EMOTIONAL SUPPORT: PRELIMINARY STUDY OF AN AUTOMATED APPROACH TO ANALYZE ONLINE SUPPORT COMMUNITY CONTENTS

Author: Huang Kuang-Yuan
Nambisan Priya
Uzuner Özlem
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2010
Field of study

Recognizing the need for analyzing large amounts of data in the study of online support communities, an automated content analysis method is introduced in this article. By adopting machine learning techniques and tools, this method requires minimal manual intervention while capable of analyzing large amounts of data automatically. Through this method, contents of messages from online support communities spanning over years are categorized as either informational support or emotional support. A case study on the analysis of online breast cancer and prostate cancer message boards is presented to demonstrate that the proposed method generates results comparable to results concluded from traditional manual qualitative content analysis methods

AIS Electronic Library (AISeL)