6 research outputs found

    System and Method for Truth Discovery in social media Big Data

    With the rise of big data and the many advances in communication technologies, enormous amounts of data are produced from different sources at every tick of the clock. One such source of data generation is social media. However, such data is largely noisy, uncertain, and untrustworthy. Finding dependable information in noisy data is therefore one of the characteristic challenges of big data, bearing on its value characteristic. In this article, an attempt is made to address several challenges arising from “misinformation spread”, “data sparsity” or the “long-tail phenomenon” in the domain of social media data analytics. The study uses an instance from the Online Social Network (OSN) datasets to develop scalable, wide-range social sensing by incorporating Scalable Robust Trust Discovery (SRTD) schemes that address these challenges using a distributed parallel computing framework. The dataset selected for the study comprises 128,483 tweets, of which 20% are misinformation and 80% are retweets, resulting in a time of 0.05 milliseconds using Spark parallel processing

    Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification

    Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To illustrate the proposed method in detailed steps, we used three Twitter datasets on various topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors associated with the specific public health topic in the existing literature. The proposed method could also classify tweets into different topic groups appropriately with consistent performance compared with existing text mining methods for automatic classification based on tweet contents
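
    The dimension-reduction idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-term influence weights, the helper names `summary_score` and `fit_line`, and the use of ordinary least squares (the Gaussian, identity-link case of a generalized linear model) are all assumptions made for illustration.

```python
def summary_score(tokens, influence):
    """Collapse one tweet's terms into a single continuous score.

    `influence` is a hypothetical mapping from term to weight; terms
    without a weight contribute nothing.
    """
    return sum(influence.get(term, 0.0) for term in tokens)


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    This is the simplest generalized linear model (Gaussian family,
    identity link), used here only as a stand-in.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical weights and tweets, invented for illustration only.
influence = {"flu": 2.0, "shot": 1.5, "fever": 1.0}
tweets = [["flu", "shot", "today"], ["fever"], ["nice", "weather"]]
outcomes = [8.0, 3.0, 1.0]  # e.g. an outcome of interest per tweet

scores = [summary_score(t, influence) for t in tweets]
intercept, slope = fit_line(scores, outcomes)
```

    Each tweet is reduced to one number before any model is fitted, so the regression has a single predictor instead of one column per vocabulary term.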

    Towards an axiomatic approach to truth discovery

    The problem of truth discovery, i.e., of trying to find the true facts concerning a number of objects based on reports from various information sources of unknown trustworthiness, has received increased attention recently. The problem is made interesting by the fact that the relative believability of facts depends on the trustworthiness of their sources, which in turn depends on the believability of the facts the sources report. Several algorithms for truth discovery have been proposed, but their evaluation has mainly been performed experimentally by computing accuracy against large datasets. Furthermore, it is often unclear how these algorithms behave on an intuitive level. In this paper we take steps towards a framework for truth discovery which allows comparison and evaluation of algorithms based instead on their theoretical properties. To do so we pose truth discovery as a social choice problem, and formulate various axioms that any reasonable algorithm should satisfy. Along the way we provide an axiomatic characterisation of the baseline ‘Voting’ algorithm – which leads to an impossibility result showing that a certain combination of the axioms cannot hold simultaneously – and check which axioms a particular well-known algorithm satisfies. We find that, surprisingly, our more fundamental axioms do not hold, and propose modifications to the algorithms to partially fix these problems
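
    The circular dependence described in this abstract (facts are believable when trusted sources report them, and sources are trusted when their facts are believable) can be sketched in a few lines. The code below is an illustrative assumption, not an algorithm from the paper: `vote` is a plain majority count standing in for the baseline "Voting" algorithm, and `iterate_trust` is a generic fixed-point loop with made-up update rules.

```python
from collections import defaultdict


def vote(reports):
    """Pick, for each object, the fact reported by the most sources.

    `reports` is a list of (source, object, fact) triples. Ties are
    broken in favour of the alphabetically first fact.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _source, obj, fact in reports:
        counts[obj][fact] += 1
    return {obj: max(sorted(facts), key=facts.get)
            for obj, facts in counts.items()}


def iterate_trust(reports, rounds=10):
    """Alternate between scoring facts and scoring sources."""
    sources = {s for s, _, _ in reports}
    trust = {s: 1.0 for s in sources}
    belief = {}
    for _ in range(rounds):
        # Belief in a fact: trust of its reporters, normalised per object.
        raw = defaultdict(float)
        for s, obj, fact in reports:
            raw[(obj, fact)] += trust[s]
        totals = defaultdict(float)
        for (obj, _fact), value in raw.items():
            totals[obj] += value
        belief = {key: value / totals[key[0]] for key, value in raw.items()}
        # Trust in a source: mean belief of the facts it reports.
        sums, n = defaultdict(float), defaultdict(int)
        for s, obj, fact in reports:
            sums[s] += belief[(obj, fact)]
            n[s] += 1
        trust = {s: sums[s] / n[s] for s in sources}
    return trust, belief


# Hypothetical reports: four sources, two objects.
reports = [
    ("a", "obj1", "x"), ("b", "obj1", "x"), ("c", "obj1", "y"),
    ("a", "obj2", "x"), ("c", "obj2", "y"), ("d", "obj2", "y"),
]
winners = vote(reports)
trust, belief = iterate_trust(reports)
```

    In this toy data, source `b` only ever agrees with the majority, so it ends up with a higher trust score than `c`, which is in the minority on `obj1`.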

    Computational and Causal Approaches on Social Media and Multimodal Sensing Data: Examining Wellbeing in Situated Contexts

    A core aspect of our lives is often embedded in the communities we are situated in. The interconnectedness of our interactions and experiences intertwines our situated context with our wellbeing. A better understanding of wellbeing will help us devise proactive and tailored support strategies. However, existing methodologies to assess wellbeing suffer from limitations of scale and timeliness. These limitations are surmountable by social and ubiquitous technologies. Given its ubiquity and wide use, social media can be considered a “passive sensor” that can act as a complementary source of unobtrusive, real-time, and naturalistic data to infer wellbeing. This dissertation leverages social media in concert with multimodal sensing data, which facilitate analyzing dense and longitudinal behavior at scale. This work adopts machine learning, natural language, and causal inference analysis to infer wellbeing of individuals and collectives, particularly in situated communities, such as college campuses and workplaces. Before incorporating sensing modalities in practice, we need to account for confounds. One such confound that might impact behavior change is the phenomenon of “observer effect”: individuals may deviate from their typical or otherwise normal behavior because of the awareness of being “monitored”. I study this problem by leveraging the potential of longitudinal and historical behavioral data through social media. Focused on a multimodal sensing study, I conduct a causal study to measure observer effect in social media behavior, and explain the observations through existing theory in psychology and social science. The findings provide recommendations for correcting biases due to observer effect in social media sensing for human behavior and wellbeing. The novelties and contributions of this dissertation are four-fold. First, I use social media data that uniquely captures the behavior of situated communities. Second, I adopt theory-driven computational and causal methods to make conclusive research claims on wellbeing dynamics. Third, I address major challenges with methods to combine social media with multimodal sensing data for a comprehensive understanding of human behavior. Fourth, I draw interpretations and explanations of online-data-driven offline inferences. This dissertation situates the findings in an interdisciplinary context, including psychology and social science, and bears implications from theoretical, practical, design, methodological, and ethical perspectives catering to various stakeholders, including researchers, practitioners, and policymakers. Ph.D.