    Computational vs. qualitative: analyzing different approaches in identifying networked frames during the Covid-19 crisis

    Despite the increasing adoption of automated text analysis in communication studies, its strengths and weaknesses for framing analysis remain largely unexplored, and few efforts have been made toward the automatic detection of networked frames. Drawing on recent developments in this field, we conduct a comparative exploration, applying Latent Dirichlet Allocation (LDA) and a human-driven qualitative coding process to three different samples. The samples were drawn from a dataset of 4,165,177 tweets collected from the Iranian Twittersphere during the Coronavirus crisis, from 21 January 2020 to 29 April 2020. Findings showed that while LDA is reliable in identifying the most prominent networked frames, it fails to detect less dominant frames. Our investigation also confirmed that LDA performs better on larger datasets and captures mainly lexical semantics. Finally, we argue that LDA can provide useful first intuitions, but qualitative interpretation is indispensable for understanding the deeper layers of meaning.
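
    As a rough illustration of the LDA step described above, the sketch below fits a topic model with scikit-learn and prints the top words per topic, which a human coder would then read as candidate networked frames. The placeholder tweets, the topic count, and all preprocessing choices are assumptions for illustration, not the authors' actual pipeline.

        # Minimal LDA sketch (scikit-learn); corpus and parameters are placeholders.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        tweets = [
            "stay home and wash your hands",
            "government response to the outbreak is too slow",
            "hospitals report rising case numbers and low capacity",
        ]  # stand-in for the 4,165,177-tweet dataset

        # LDA works on raw term counts, i.e. surface-level lexical co-occurrence,
        # which is one reason it favours large corpora.
        vectorizer = CountVectorizer(stop_words="english", max_features=5000)
        doc_term = vectorizer.fit_transform(tweets)

        lda = LatentDirichletAllocation(n_components=3, random_state=0)  # topic count is assumed
        lda.fit(doc_term)

        # Top words per topic: the raw material for qualitative frame interpretation.
        terms = vectorizer.get_feature_names_out()
        for k, weights in enumerate(lda.components_):
            top = weights.argsort()[-10:][::-1]
            print(f"topic {k}:", ", ".join(terms[i] for i in top))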

    Depression and Anxiety Detection from Blog Posts Data

    Depression and anxiety affect the lives of many individuals, and if the diagnosis is not made in time, they can lead to considerable health decline and even suicide. Nowadays, mental health specialists, as well as data scientists, work towards analyzing social media sources, in particular publicly available text messages and blogs, to identify depressed people and provide them with the necessary treatment and support. In this work, we adopt an experimental data collection approach to gather a corpus of blog posts from clinical and control subjects. Authors suffering from depression and/or anxiety are considered clinical subjects, while control subjects are healthy individuals who blog about depression and anxiety. We inspect the latent topics found in the collected data to analyze the blogs' content according to the themes covered by their authors. We experiment with various text encoding techniques such as Bag-of-Words (BOW), Term Frequency-Inverse Document Frequency (TFIDF) and topic-model features. We apply Support Vector Machine (SVM) and Convolutional Neural Network (CNN) classifiers to discriminate between clinical and control subjects. Additionally, we explore the classification performance of CNNs trained on blog posts of different lengths. The best accuracy and recall scores, 78% and 0.72 respectively, were obtained with a CNN classifier initialised with pretrained GloVe word vectors.
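
    One of the configurations evaluated above, TFIDF features with an SVM, can be sketched in a few lines of scikit-learn. The toy corpus, the label encoding, and the hyperparameters below are placeholders; the thesis also compares BOW and topic-model features and a GloVe-initialised CNN.

        # TFIDF + linear SVM sketch; data and hyperparameters are placeholders.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score, recall_score
        from sklearn.svm import LinearSVC

        posts = [
            "I could not get out of bed again today",
            "the anxiety has been overwhelming all week",
            "five ways to support a friend with depression",
            "what the research says about anxiety treatments",
        ]  # stand-in for the collected blog posts
        labels = [1, 1, 0, 0]  # 1 = clinical group, 0 = control group (assumed encoding)

        X_train, X_test, y_train, y_test = train_test_split(
            posts, labels, test_size=0.5, stratify=labels, random_state=0
        )

        vectorizer = TfidfVectorizer(ngram_range=(1, 2))
        clf = LinearSVC().fit(vectorizer.fit_transform(X_train), y_train)

        pred = clf.predict(vectorizer.transform(X_test))
        print("accuracy:", accuracy_score(y_test, pred))
        print("recall:", recall_score(y_test, pred))  # the thesis reports recall alongside accuracy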

    A survey of data mining techniques for social media analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable, and people are becoming more interested in, and more reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, making computational means of analysis essential. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning, and employ data pre-processing, data analysis and data interpretation processes in the course of analysis. This survey discusses the data mining techniques used to mine diverse aspects of social networks over the decades, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.

    Linking social media, medical literature, and clinical notes using deep learning.

    Researchers analyze data, information, and knowledge through many sources, formats, and methods. The dominant data formats include text and images. In the healthcare industry, professionals generate a large quantity of unstructured data, and the complexity of this data combined with a lack of computational power causes delays in analysis. However, with emerging deep learning algorithms and access to computational hardware such as graphics processing units (GPUs) and tensor processing units (TPUs), processing text and images is becoming more accessible. Deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting data sources requires defining a link between them, and our key is finding concepts in medical text. The National Library of Medicine (NLM) maintains the Unified Medical Language System (UMLS), and we use this system as the foundation of our own. We recognize social media's dynamic nature and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process the EMRs' clinical notes via transfer learning. The result is an integrated, end-to-end, web-based system that unifies social media, literature, and clinical notes, and improves access to medical knowledge for the public and experts.
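
    The core linking idea, finding medical concepts in free text and resolving them to UMLS identifiers so that tweets, articles, and clinical notes share a join key, can be illustrated with a toy sketch. Here a small dictionary stands in for both the trained NER model and the full UMLS Metathesaurus; the lookup table, the example CUIs, and the tokeniser choice are all illustrative assumptions.

        # Toy concept linker: a dictionary lookup stands in for the paper's
        # trained NER model and for queries against the UMLS Metathesaurus.
        import re

        # Hypothetical surface-form -> UMLS CUI table; real systems query UMLS.
        UMLS_LOOKUP = {
            "diabetes": "C0011849",   # Diabetes Mellitus
            "metformin": "C0025598",  # Metformin
        }

        def link_concepts(text: str) -> list[tuple[str, str]]:
            """Return (surface form, UMLS CUI) pairs found in the text."""
            tokens = re.findall(r"[A-Za-z]+", text.lower())
            return [(t, UMLS_LOOKUP[t]) for t in tokens if t in UMLS_LOOKUP]

        # The same linker can run over a tweet, a PubMed abstract, or an EMR
        # note, which is what makes concepts a bridge across the three sources.
        print(link_concepts("Patient with diabetes was started on metformin."))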

    Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

    Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health-vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content relevant to Dengue outbreak detection: supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% from a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples, and the clusters can easily be re-generated following changes in the epidemics; however, this approach makes it hard to cleanly segregate relevant tweets into well-defined clusters.
    Comment: Procs. SoWeMine, co-located with ICWE 2016, Lugano, Switzerland
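
    The trade-off described above can be seen side by side in a small sketch: a supervised relevance classifier that needs labelled tweets, next to an unsupervised LDA clustering that does not. The toy tweets, labels, and parameters are placeholders rather than the paper's data or models.

        # Supervised vs. unsupervised relevance detection; all data is a placeholder.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.linear_model import LogisticRegression

        tweets = [
            "so many dengue cases in my neighbourhood this week",
            "mosquito repellent sold out at every pharmacy",
            "great football match last night",
            "high fever and joint pain, worried it is dengue",
        ]
        relevant = [1, 1, 0, 1]  # manual annotations: the costly part of the supervised route

        X = CountVectorizer(stop_words="english").fit_transform(tweets)

        # Supervised: accurate on labelled seasons, but drifts as epidemics change.
        clf = LogisticRegression().fit(X, relevant)
        print("predicted relevance:", clf.predict(X))

        # Unsupervised: cheap to re-fit each season, but clusters need not align
        # with a clean relevant/irrelevant split.
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        print("topic mixtures:\n", lda.transform(X))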