186 research outputs found

    Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201
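    The URL-following step described in this abstract could start from something like the sketch below. It is only an illustration, not the authors' implementation: the regular expression, the `extract_urls` name, and the trailing-punctuation rule are all assumptions made here, and fetching and parsing the linked pages is left out.

```python
import re

# Assumed pattern: any http(s) URL up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, in order of appearance.

    Hypothetical helper: trailing sentence punctuation is stripped because
    tweets often end a sentence directly after a link.
    """
    urls = []
    for match in URL_PATTERN.findall(tweet_text):
        urls.append(match.rstrip(".,;:!?)"))
    return urls


if __name__ == "__main__":
    tweet = "New ASD study http://example.org/a, see also https://example.com/b."
    print(extract_urls(tweet))
```

    Each returned URL would then be fetched and its page text treated as a document linked to the originating tweet.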

    Discovering topic structures of a temporally evolving document corpus

    In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS), both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithm’s free parameters have on its output and discuss the significance of the findings both in the context of the practical application of our algorithm and in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields. Publisher PDF. Peer reviewed
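    Step (i) of the framework, discretization of time into epochs, might be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how epochs are defined, so fixed-length, non-overlapping epochs are assumed here, and `bucket_into_epochs` and its parameters are hypothetical names.

```python
from collections import defaultdict


def bucket_into_epochs(documents, epoch_length):
    """Group (timestamp, text) pairs into consecutive fixed-length epochs.

    Assumption: epochs do not overlap and are indexed from the earliest
    timestamp in the corpus. Timestamps and epoch_length share one unit
    (for example, years).
    """
    if epoch_length <= 0:
        raise ValueError("epoch_length must be positive")
    docs = list(documents)
    if not docs:
        return {}
    start = min(timestamp for timestamp, _ in docs)
    epochs = defaultdict(list)
    for timestamp, text in docs:
        index = int((timestamp - start) // epoch_length)
        epochs[index].append(text)
    return dict(epochs)


if __name__ == "__main__":
    corpus = [(2000, "paper a"), (2001, "paper b"), (2005, "paper c")]
    print(bucket_into_epochs(corpus, epoch_length=5))
```

    Topic discovery would then be run separately on the documents of each epoch.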

    Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance

    The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within them. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation, and topic convergence and merging, in addition to the more widely recognized emergence and disappearance and gradual evolution. The proposed framework is evaluated on a public medical literature corpus. Publisher PDF. Peer reviewed
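    The two measures named in this title are standard and can be computed for discrete topic-word distributions as sketched below. The `link_topics` helper, the threshold rule, and the smoothing constant `eps` are assumptions added for illustration; the abstract does not specify how the temporal similarity graph is constructed.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance: -ln of the sum of sqrt(p_i * q_i)."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(max(coefficient, eps))


def link_topics(prev_topics, next_topics, distance, threshold):
    """Hypothetical graph step: connect topics of adjacent epochs.

    Returns (i, j, d) edges for every pair whose distance d is at most
    the threshold.
    """
    edges = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            d = distance(p, q)
            if d <= threshold:
                edges.append((i, j, d))
    return edges


if __name__ == "__main__":
    epoch_1 = [[0.9, 0.1], [0.1, 0.9]]
    epoch_2 = [[0.9, 0.1]]
    print(link_topics(epoch_1, epoch_2, bhattacharyya_distance, threshold=0.1))
```

    Under this sketch, a topic with several outgoing edges could be read as a candidate split and a topic with several incoming edges as a candidate merge. Note that the Kullback-Leibler divergence is asymmetric, so the order of its arguments matters.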

    Identification of promising research directions using machine learning aided medical literature analysis

    The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora, and of tracking complex temporal changes within them. Postprint

    Prediction of future hospital admissions - what is the tradeoff between specificity and accuracy?

    Large amounts of electronic medical records collected by hospitals across the developed world offer unprecedented possibilities for knowledge discovery using computer-based data mining and machine learning. Notwithstanding significant research efforts, the use of this data in the prediction of disease development has largely been disappointing. In this paper we examine in detail a recently proposed method which has in preliminary experiments demonstrated highly promising results on real-world data. We scrutinize the authors' claims that the proposed model is scalable and investigate whether the tradeoff between prediction specificity (i.e. the ability of the model to predict a wide range of different ailments) and accuracy (i.e. the ability of the model to make the correct prediction) is practically viable. Our experiments conducted on a data corpus of nearly 3,000,000 admissions support the authors' expectations and demonstrate that the high prediction accuracy is maintained well even when the number of admission types explicitly included in the model is increased to account for 98% of all admissions in the corpus. Thus several promising directions for future work are highlighted. Comment: In Proc. International Conference on Bioinformatics and Computational Biology, April 201

    Using Twitter to learn about the autism community

    Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision making and communication of the latest guidelines pertaining to the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD: their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their kind in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. Comment: Social Network Analysis and Mining, 201

    Cannabidiol tweet miner: a framework for identifying misinformation in CBD tweets

    As regulations surrounding cannabis continue to develop, the demand for cannabis-based products is on the rise. Despite not producing the psychoactive effects commonly associated with THC, products containing cannabidiol (CBD) have gained immense popularity in recent years as a potential treatment option for a range of conditions, particularly those associated with pain or sleep disorders. However, due to current federal policies, these products have yet to undergo comprehensive safety and efficacy testing. Fortunately, utilizing advanced natural language processing (NLP) techniques, data harvested from social networks have been employed to investigate various social trends within healthcare, such as disease tracking and drug surveillance. By leveraging Twitter data, NLP can offer invaluable insights into public perceptions around CBD, as well as the marketing tactics employed by those marketing such loosely-regulated substances to the general public. Given the lack of comprehensive clinical CBD testing, the various health claims made by CBD sellers regarding their products are highly dubious and potentially perilous, as is evident from the ongoing COVID-19 misinformation. It is therefore critically important to efficiently identify unsupportable claims to guide public health policy and action. To this end, we present our proposed framework, the Cannabidiol Tweet Miner (CBD-TM), which utilizes advanced natural language processing (NLP) techniques, including text mining and sentiment analysis, to analyze the similarities and differences between commercial and personal tweets that mention CBD. CBD-TM enables us to identify conditions typically associated with commercial CBD advertising, or conditions not associated with positive sentiment, that are also absent from personal conversations. Through our technical contributions, including NLP, text mining, and sentiment analysis, we can effectively uncover areas where the public may be misled by CBD sellers. 
Since the rise in popularity of CBD, advertisements making bold claims about its benefits have become increasingly prevalent. The COVID-19 pandemic created a new opportunity for sellers to promote and sell products that purportedly treat and/or prevent the virus, with CBD being one of them. Although the U.S. Food and Drug Administration issued multiple warnings to CBD sellers, this type of misinformation still persists. In response, we have extended the CBD-TM framework with an additional layer of tweet classification designed to identify tweets that make potentially misleading claims about CBD's efficacy in treating and/or preventing COVID-19. Our approach harnesses modern NLP algorithms, utilizing a transformer-based language model to establish the semantic relationship between statements extracted from the FDA's website that contain false information and tweets conveying similar false claims. Our technical contributions build upon the impressive performance of deep language models in various natural language processing and understanding tasks. Specifically, we employ transfer learning via pre-trained deep language models, enabling us to achieve improved misinformation identification in tweets, even with relatively small training sets. Furthermore, this extension of CBD-TM can be easily adapted to detect other forms of misinformation. Through our innovative use of NLP techniques and algorithms, we can more effectively identify and combat false and potentially harmful claims related to CBD and COVID-19, as well as other forms of misinformation. As the conversations surrounding CBD on Twitter evolve over time, concept drift can occur, leading to changes in the topics being discussed. We observed significant changes within the CBD Twitter data stream with the emergence of COVID-19, introducing a new medical condition associated with CBD that would not have been discussed in conversations prior to the pandemic. 
These shifts in conversation introduce concept drift into CBD-TM, which has the potential to negatively impact our tweet classification models. Therefore, it is crucial to identify when such concept drift occurs to maintain the accuracy of our models. To this end, we propose an innovative approach for identifying potential changes within social network streams, allowing us to determine how and when these conversations evolve over time. Our approach leverages a BERT-based topic model, which can effectively capture how conversations related to CBD change over time. By incorporating advanced NLP techniques and algorithms, we are able to better understand the changes in topic that occur within the CBD Twitter data stream, allowing us to more effectively manage concept drift in CBD-TM. Our technical contributions enable us to maintain the accuracy and effectiveness of our tweet classification models, ensuring that we can continue to identify and address potentially harmful misinformation related to CBD

    Mapping the evolving landscape of child-computer interaction research: structures and processes of knowledge (re)production

    Implementing an iterative sequential mixed methods design (Quantitative → Qualitative → Quantitative) framed within a sociology of knowledge approach to discourse, this study offers an account of the structure of the field of Child-Computer Interaction (CCI), its development over time, and the practices through which researchers have (re)structured knowledge comprising the field. Thematic structure of knowledge within the field, and its evolution over time, is quantified through implementation of a Correlated Topic Model (CTM), an automated inductive content analysis method, in analysing 4,771 CCI research papers published between 2003 and 2021. Detailed understanding of practices through which researchers (re)structure knowledge within the field, including factors influencing these practices, is obtained through thematic analysis of online workshops involving prominent contributors to the field (n=7). Strategic practices utilised by researchers in negotiating tensions impeding integration of novel concepts in the field are investigated through analysis of semantic features of retrieved papers using linear and negative binomial regression models. Contributing an extensive mapping, results portray the field of CCI as a varied research landscape, comprising 48 major themes of study, which has evolved dynamically over time. Research priorities throughout the field have been subject to influence from a range of endogenous and exogenous factors which researchers actively negotiate through research and publication practices. Tacitly structuring research practices, these factors have broadly sustained a technology-driven, novelty-dominated paradigm throughout the field which has failed to substantively progress cumulative knowledge. 
Through strategic negotiation of persistent tensions arising as a consequence of these factors, researchers have nonetheless effected structural change within the field, contributing to a shift towards a user needs-driven agenda and progression of knowledge therein. Findings demonstrate that the field of CCI is proceeding through an intermediary phase in maturation, forming an increasingly distinct disciplinary shape and identity through the cumulative structuring effect of community members’ continued negotiation of tensions

    An Investigation of Autism Support Groups on Facebook

    Autism-affected users, such as autism patients, caregivers, parents, family members, and researchers, currently seek informational support and social support from communities on social media. To reveal the information needs of autism-affected users, this study centers on users’ interactions and information sharing within autism communities on social media. It aims to understand how autism-affected users utilize support groups on Facebook. A systematic method was proposed to aid in the data analysis including social network analysis, topic modeling, sentiment analysis, and inferential analysis. Social network analysis was adopted to reveal the interaction patterns appearing in the groups, and topic modeling was employed to uncover the discussion themes that users were concerned with in their daily lives. Sentiment analysis helped analyze the emotional characteristics of the content that users expressed in the groups. Inferential analysis was applied to compare the similarities and differences among different autism support groups found on Facebook. This study collected user-generated content from five sampled support groups (an awareness group, a treatment group, a parents group, a research group, and a local support group) on Facebook. Findings show that the discussion topics varied in different groups. Influential users in each Facebook support group were identified through the analysis of the interaction network. The results indicated that the influential users not only attracted more attention from other group members but also led the discussion topics in the group. In addition, it was found that autism support groups on Facebook offered a supportive emotional atmosphere for group members. The findings of this study revealed the characteristics of user interactions and information exchanges in autism support groups on social media. 
Theoretically, the findings demonstrated the significance of social media for autism users. The unique implication of this study is to identify support groups on Facebook as a source of informational, social, and emotional support for autism-related users. The methodology applied in this study presented a systematic approach to evaluating the information exchange in health-related support groups on social media. Further, it investigated the potential role of technology in the social lives of autism-related users. The outcomes of this study can contribute to improving online intervention programs by highlighting effective communication approaches

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories
