1,959 research outputs found

    Leveraging graph-based semantic annotation for the identification of cause-effect relations

    This research addresses Indonesian-language articles and investigates the identification of cause-effect relations for use in a public health surveillance and information monitoring system. The approach combines feature selection with phrase annotation, paragraph annotation, medical element annotation and graph-based semantic annotation. System performance is evaluated intrinsically using the Multinomial Naive Bayes method, yielding a recall of 0.924, a precision of 0.905, and an F-measure of 0.910.
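
    As a hedged illustration of the intrinsic evaluation step described above, the sketch below trains a Multinomial Naive Bayes classifier on annotated sentences and reports precision, recall and F-measure with scikit-learn. The toy sentences, labels and TF-IDF features are assumptions for illustration, not the study's actual Indonesian-language data or feature set.

```python
# A minimal sketch, under the assumptions stated above: a Multinomial Naive
# Bayes classifier over TF-IDF features, evaluated with precision, recall
# and F-measure. The sentences and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

sentences = [
    "flooding caused an outbreak of diarrhoea in the district",
    "the clinic opened a new vaccination service last week",
    "dengue fever leads to a sharp drop in platelet count",
    "patients were treated at the hospital for three days",
    "contaminated water resulted in hundreds of cholera cases",
    "the health office published its monthly activity report",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = sentence contains a cause-effect relation

X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.5, random_state=0, stratify=labels
)

vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(X_train), y_train)
predictions = model.predict(vectorizer.transform(X_test))

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, predictions, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```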

    Towards a National Security Analysis Approach via Machine Learning and Social Media Analytics

    Various severe threats at national and international level, such as health crises, radicalisation, or organised crime, have the potential to destabilise a nation. Such threats impact directly on elements linked to people's security, known in the literature as human security components. Protecting citizens from such risks is the primary objective of the organisations charged with preserving the legitimacy, stability and security of the state. Given the importance of maintaining security and stability, governments across the globe have been developing a variety of strategies to diminish or negate the devastating effects of the aforementioned threats. Technological progress plays a pivotal role in the evolution of these strategies. Most recently, artificial intelligence has enabled the examination of large volumes of data and the creation of bespoke analytical tools able to perform complex analysis tasks that would usually require significant amounts of human resources. Several research projects have already proposed and studied the use of artificial intelligence to analyse crucial problems that impact national security components, such as violence or ideology. However, this prior research examined isolated components, whereas understanding national security issues requires studying and analysing a multitude of closely interrelated elements and constructing a holistic view of the problem. The work documented in this thesis aims at filling this gap. Its main contribution is a complete pipeline for constructing a big picture that helps understand national security problems. The proposed pipeline covers several stages, beginning with the analysis of the unfolding event, which produces timely detection points indicating that society might be heading toward a disruptive situation. A further examination based on machine learning techniques then enables the interpretation of an already confirmed crisis in terms of high-level national security concepts. Apart from widely accepted national security theoretical constructs developed over years of social and political research, the second pillar of the approach is modern computational paradigms, especially machine learning and its applications in natural language processing.
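
    A minimal sketch of the kind of two-stage pipeline described above is given below: stage one flags detection points when social media activity deviates sharply from its recent baseline, and stage two maps posts from a confirmed crisis onto high-level human security components with a supervised text classifier. The window size, threshold, example posts and component labels are all assumptions for illustration, not the thesis's actual design.

```python
# An illustrative two-stage sketch: (1) flag detection points from a volume
# time series, (2) classify crisis posts into assumed human security
# components. All names, labels and thresholds here are assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def detection_points(hourly_counts, window=24, z_threshold=3.0):
    """Return indices where activity exceeds a rolling z-score threshold."""
    counts = np.asarray(hourly_counts, dtype=float)
    flagged = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        std = baseline.std() or 1.0  # avoid division by zero
        if (counts[t] - baseline.mean()) / std > z_threshold:
            flagged.append(t)
    return flagged

print(detection_points([10] * 30 + [80] + [10] * 5))  # -> [30]

# Stage two: interpret posts from a confirmed crisis as human security
# components (label set assumed for illustration).
train_posts = ["hospitals overwhelmed by new cases", "armed group seizes the district"]
train_labels = ["health_security", "personal_security"]
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_posts, train_labels)
print(classifier.predict(["hospitals struggling with a surge of new cases"]))
```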

    Systematic review on the prevalence, frequency and comparative value of adverse events data in social media

    Aim: The aim of this review was to summarize the prevalence, frequency and comparative value of information on the adverse events of healthcare interventions from user comments and videos in social media. Methods: A systematic review of assessments of the prevalence or type of information on adverse events in social media was undertaken. Sixteen databases and two internet search engines were searched, in addition to handsearching, reference checking and contacting experts. The results were sifted independently by two researchers. Data extraction and quality assessment were carried out by one researcher and checked by a second. The quality assessment tool was devised in-house and a narrative synthesis of the results followed. Results: From 3064 records, 51 studies met the inclusion criteria. The studies assessed over 174 social media sites, with discussion forums (71%) being the most popular. The overall prevalence of adverse event reports in social media varied from 0.2% to 8% of posts. Twenty-nine studies compared the results of searching social media with those from other data sources for identifying adverse events. There was general agreement that a higher frequency of adverse events was found in social media, and that this was particularly true for ‘symptom’-related and ‘mild’ adverse events. The adverse events that were under-represented in social media were laboratory-based and serious adverse events. Conclusions: Reports of adverse events are identifiable within social media. However, there is considerable heterogeneity in the frequency and type of events reported, and the reliability and validity of the data have not been thoroughly evaluated.

    Detecting Frames and Causal Relationships in Climate Change Related Text Databases Based on Semantic Features

    The subliminal impact of the framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package” for average citizens on how to make sense of climate change and its consequences for their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag-of-words and word-level features to detect frames automatically in text. Such works face limitations, since standard keyword-based features may not generalize well to accommodate surface variations in text when different keywords are used for similar concepts. This thesis develops a unique type of textual feature that generalizes triplets extracted from text by clustering them into high-level concepts. These concepts are utilized as features to detect frames in text. Compared to unigram- and bigram-based models, classification and clustering using generalized concepts yield better discriminating features and higher classification accuracy, with a 12% boost (i.e., from 74% to 83% F-measure) and 0.91 clustering purity for Frame/Non-Frame detection. The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies on extracting causal relationships from text were based on laborious and incomplete hand-developed lists of explicit causal verbs, such as “causes” and “results in.” Such approaches yield limited recall, because standard causal verbs may not generalize well to accommodate surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that utilizes generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in written expressions of causal relationships and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor-intensive keyword list development and annotated datasets. Experimental evaluations by domain experts achieve an average precision of 82%. Qualitative assessments of causal chains show that results are consistent with the 2014 IPCC report, illuminating causal mechanisms underlying the linkages between climatic stresses and social instability.
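
    The sketch below illustrates, under simplifying assumptions, the generalized-concept idea described above: extracted (subject, verb, object) triplets are clustered, and the cluster identifiers serve as high-level concept features in place of raw keywords. The example triplets are hypothetical, and surface TF-IDF with k-means stands in for the thesis's richer semantic generalization.

```python
# A sketch under the stated assumptions: triplets are represented as short
# strings, vectorized with TF-IDF and clustered with k-means; the cluster id
# of a triplet then acts as its generalized-concept feature.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

triplets = [
    ("drought", "causes", "crop failure"),
    ("drought", "results in", "crop loss"),
    ("flooding", "displaces", "coastal residents"),
    ("flooding", "forces out", "coastal communities"),
]

texts = [" ".join(parts) for parts in triplets]
vectors = TfidfVectorizer().fit_transform(texts)
concept_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Each cluster id stands in for a concept such as "climate stress -> crop
# damage" or "climate stress -> displacement".
for triplet, concept in zip(triplets, concept_ids):
    print(concept, triplet)
```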

    What Twitter Profile and Posted Images Reveal About Depression and Anxiety

    Previous work has found strong links between the choice of social media images and users' emotions, demographics and personality traits. In this study, we examine which attributes of profile and posted images are associated with depression and anxiety in Twitter users. We used a sample of 28,749 Facebook users to build a language prediction model of survey-reported depression and anxiety, and validated it on Twitter on a sample of 887 users who had taken anxiety and depression surveys. We then applied it to a different set of 4,132 Twitter users to impute language-based depression and anxiety labels, and extracted interpretable features of posted and profile pictures to uncover associations with users' depression and anxiety, controlling for demographics. For depression, we find that profile pictures suppress positive emotions rather than display more negative emotions, likely because of social media self-presentation biases. They also tend to show the single face of the user (rather than showing her in groups of friends), marking an increased focus on the self that is emblematic of depression. Posted images are dominated by grayscale and low aesthetic cohesion across a variety of image features. Profile images of anxious users are similarly marked by grayscale and low aesthetic cohesion, but less so than those of depressed users. Finally, we show that image features can be used to predict depression and anxiety, and that multitask learning that includes joint modeling of demographics improves prediction performance. Overall, we find that the image attributes that mark depression and anxiety offer a rich lens into these conditions, largely congruent with the psychological literature, and that images on Twitter allow inferences about the mental health status of users.
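
    As a hedged illustration of the interpretable image features discussed above, the sketch below computes a colourfulness score (near zero for grayscale pictures) and mean brightness with PIL and NumPy. The synthetic demo images and the two-feature set are assumptions, a small stand-in for the study's much richer feature extraction and multitask models.

```python
# A sketch under the assumptions stated above: two interpretable features
# (colourfulness and mean brightness) computed with PIL/NumPy, demonstrated
# on synthetic images. The study's actual feature set is much larger.
import numpy as np
from PIL import Image

def image_features(image):
    """Return (colourfulness, mean brightness) for a PIL image."""
    rgb = np.asarray(image.convert("RGB"), dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Hasler & Suesstrunk-style colourfulness; close to zero for grayscale.
    rg, yb = r - g, 0.5 * (r + g) - b
    colourfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
    return colourfulness, rgb.mean()

rng = np.random.default_rng(0)
colour_img = Image.fromarray(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8))
gray_img = Image.fromarray(rng.integers(0, 256, (64, 64), dtype=np.uint8))
print("colour:", image_features(colour_img))
print("gray:  ", image_features(gray_img))

# In the study's setting, features like these (plus demographics) would feed
# a model of the imputed depression/anxiety labels, e.g. a multitask learner.
```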

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon, and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities, and conclude with a discussion of open challenges.

    Beyond data collection: Objectives and methods of research using VGI and geo-social media for disaster management

    This paper investigates research using VGI and geo-social media in the disaster management context. Relying on the method of systematic mapping, it develops a classification schema that captures three levels of main category, focus, and intended use, and analyzes the relationships with the employed data sources and analysis methods. The scope is focused on the pioneering field of disaster management, but the described approach and the developed classification schema are easily adaptable to different application domains or future developments. The results show that a hypothesized consolidation of research, characterized by the building of canonical bodies of knowledge and advanced application cases with refined methodology, has not yet happened. The majority of the studies investigate the challenges and potential solutions of data handling, with fewer studies focusing on socio-technological issues or advanced applications. This trend currently shows no sign of change, highlighting that VGI research is still very much technology-driven as opposed to theory- or application-driven. From the results of the systematic mapping study, the authors formulate and discuss several research objectives for future work, which could lead to a stronger, more theory-driven treatment of the topic of VGI in GIScience. Carlos Granell has been partly funded by the Ramón y Cajal Programme (grant number RYC-2014-16913).

    Three Essays on Opinion Mining of Social Media Texts

    This dissertation research is a collection of three essays on opinion mining of social media texts. I explore different theoretical and methodological perspectives in this inquiry. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. The method learns word associations from a large unannotated corpus and uses these associations to identify new sentiment words. Using a Twitter data set containing 743,069 tweets related to the stock market, I show that sentiment lexicons generated using the proposed method significantly outperform existing sentiment lexicons in sentiment classification. As sentiment analysis is being applied to different types of documents to solve different problems, the proposed method provides a useful tool to improve sentiment classification. The second essay focuses on improving supervised sentiment classification. In previous work on sentiment classification, a document was typically represented as a collection of single words. This feature representation suffers from severe ambiguity, especially when classifying short texts such as microblog messages. I propose the use of dependency features in sentiment classification. A dependency describes the relationship between a pair of words even when they are distant from each other. I compare the sentiment classification performance of dependency features with several commonly used features in different experimental settings. The results show that dependency features significantly outperform existing feature representations. In the third essay, I examine the relationship between social media sentiment and stock returns. This is the first study to test the bidirectional effects in this relationship. Based on theories in behavioral finance research, I hypothesize that social media sentiment does not predict stock returns, but rather that stock returns predict social media sentiment. I empirically test a set of research hypotheses by applying a vector autoregression (VAR) model to a social media data set that is much larger than those used in previous studies. The results support the hypotheses. The findings have significant implications for both theory and practice.
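
    As a hedged sketch of the first essay's idea of learning word associations from an unannotated corpus to extend a sentiment lexicon, the example below scores candidate words by their pointwise mutual information (PMI) with small seed sets. The corpus, seed words and PMI scoring are illustrative assumptions, not the dissertation's exact procedure.

```python
# A sketch under the stated assumptions: document-level co-occurrence, PMI
# against small seed sets, and a polarity score for candidate lexicon words.
import math
from collections import Counter
from itertools import combinations

corpus = [
    "bullish rally lifts the market",
    "strong earnings beat expectations great rally",
    "bearish selloff drags stocks lower weak guidance",
    "weak outlook and disappointing losses",
]
seed_positive = {"great", "strong"}
seed_negative = {"weak", "disappointing"}

docs = [doc.split() for doc in corpus]
n_docs = len(docs)
word_counts = Counter(word for doc in docs for word in set(doc))
pair_counts = Counter(
    pair for doc in docs for pair in combinations(sorted(set(doc)), 2)
)

def pmi(w1, w2):
    joint = pair_counts[tuple(sorted((w1, w2)))] / n_docs
    if joint == 0:
        return 0.0
    return math.log(joint / ((word_counts[w1] / n_docs) * (word_counts[w2] / n_docs)))

def polarity(word):
    """Positive score suggests a positive sentiment word, negative the opposite."""
    return sum(pmi(word, s) for s in seed_positive) - sum(
        pmi(word, s) for s in seed_negative
    )

for candidate in ("rally", "selloff"):
    print(candidate, round(polarity(candidate), 3))
```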

    From social media to expert reports: automatically validating and extending complex conceptual models using machine learning approaches

    Given the importance of developing accurate models of any complex system, the modeling process often seeks to be comprehensive by including experts and community members. While many qualitative modeling processes can produce models in the form of maps (e.g., cognitive/concept mapping, causal loop diagrams), they are generally conducted with a facilitator. The limited capacity of facilitators restricts the number of participants, and the need to be either physically present (for face-to-face sessions) or at least in a compatible time zone (for phone interviews) also limits the geographical diversity of participants. In addition, participants may not openly express their beliefs (e.g., weight discrimination, political views) when they perceive that these may not be well received by a facilitator or others in the room. In contrast, the naturally occurring exchange of perspectives on social media provides an unobtrusive approach to collecting beliefs about the causes and consequences within such complex systems. Mining social media also supports a scalable approach and a geographically diverse sample. While obtaining a conceptual model via social media can inform policymakers about popular support for possible policies, the model may stand in stark contrast with an expert-based model. Identifying and reconciling these differences is an important step toward integrating social computing with policy making. Our pipeline automatically validates large conceptual models, here of obesity and of politics, against large text data sets (academic reports or social media such as Twitter), and its technical innovation lies in the application of machine learning approaches. This is achieved by generating relevant keywords using the WordNet interface from NLTK, performing topic modelling with the gensim LDA model, recognizing entities with the Google Cloud Natural Language API, and categorizing themes with the count vectorizer and tf-idf transformer from the scikit-learn library. Once the pipeline has validated the model, extensions are suggested by mining the literature or Twitter conversations and applying Granger causality tests to the time series obtained from the respective data sources. We then examine how shifts in public opinion on Twitter can alter the results of validation and extension of conceptual models under our computational methods, and we compare sentiment analysis and sarcasm detection results on these conceptual models. Analyzing these results, we discuss whether the confirmed and extended associations in our conceptual model are an artifact of our method or an accurate reflection of events related to that complex conceptual model. The combination of these machine learning approaches helps to automatically confirm and extend complex conceptual models at a lower cost in money, time and resources. Rather than formulating public policies only in response to issues brought before decision makers, this approach allows policies to be informed by the issues discussed every day on social media platforms.
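
    As a hedged illustration of the extension step described above, the sketch below runs a Granger causality test with statsmodels on two synthetic time series standing in for, e.g., daily tweet volumes of two linked concepts from the conceptual model; the data and lag choice are assumptions, not the pipeline's actual inputs.

```python
# A sketch under the stated assumptions: a Granger causality test on two
# synthetic series; in the pipeline these would be, e.g., daily mention
# counts of two concepts mined from Twitter or the literature.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 200
cause = rng.normal(size=n)
# "effect" depends on the previous value of "cause", so the test should
# reject the null hypothesis of no Granger causality.
effect = 0.8 * np.roll(cause, 1) + rng.normal(scale=0.3, size=n)

data = np.column_stack([effect, cause])  # column order: [caused, causing]
results = grangercausalitytests(data, maxlag=2)

# results[lag][0]["ssr_ftest"] holds (F, p-value, df_denom, df_num); a small
# p-value supports adding the directed edge cause -> effect to the model.
print(results[1][0]["ssr_ftest"][1])
```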