1,909 research outputs found

    Word Adjacency Graph Modeling: Separating Signal From Noise in Big Data

    There is a need to develop methods for analyzing Big Data to inform patient-centered interventions for better health outcomes. The purpose of this study was to develop and test a method for exploring Big Data to describe the salient health concerns of people with epilepsy. Specifically, we used Word Adjacency Graph modeling to explore a data set containing 1.9 billion anonymous text queries submitted to the ChaCha question-and-answer service to (a) detect clusters of epilepsy-related topics, and (b) visualize the range of epilepsy-related topics and their mutual proximity to uncover the breadth and depth of particular topics and groups of users. Applied to a large, complex data set, this method successfully identified clusters of epilepsy-related topics while allowing potentially non-relevant topics to be separated out. The method can be used to identify patient-driven research questions from large social media data sets, and its results can inform the development of patient-centered interventions.
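    As an illustration only, the sketch below builds a word adjacency graph from short text queries (an edge links two words that appear next to each other, weighted by how often they do) and then groups words into clusters. The abstract does not say how clusters were detected; connected components over edges at or above a weight threshold are used here as a simple, assumed stand-in. The functions `build_adjacency_graph` and `clusters`, the `min_weight` parameter, and the toy queries are hypothetical and not part of the study.

```python
from collections import Counter


def build_adjacency_graph(queries):
    """Count how often each unordered pair of words appears side by side."""
    edges = Counter()
    for query in queries:
        tokens = query.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # skip self-loops such as "very very"
                edges[tuple(sorted((a, b)))] += 1
    return edges


def clusters(edges, min_weight=1):
    """Group words into connected components using edges with weight >= min_weight.

    Assumption: connected components stand in for whatever clustering the
    original study used; this is not the authors' method.
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for (a, b), weight in edges.items():
        root_a, root_b = find(a), find(b)  # registers both words as nodes
        if weight >= min_weight and root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    return list(groups.values())


queries = [
    "seizure medication side effects",
    "epilepsy seizure medication",
    "pizza delivery near me",
]
edges = build_adjacency_graph(queries)
print(edges[("medication", "seizure")])  # 2
print(clusters(edges))
```

    With these toy queries the epilepsy-related words form one cluster and the unrelated pizza query forms another, a miniature version of separating non-relevant topics. Raising `min_weight` prunes rare adjacencies, so weakly connected words fall into their own groups.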

    Multimedia question answering

    Ph.D. thesis (Doctor of Philosophy)

    Empirical Methodology for Crowdsourcing Ground Truth

    The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity as a way to address the volume of data and the lack of annotators. These practices typically use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, the data are ambiguous and the examples admit a multitude of perspectives. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring high-quality ground truth. We do this by comparing the quality of data aggregated with CrowdTruth metrics against majority vote over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction, and Sound Interpretation. We also show that increasing the number of crowd workers leads to growth and stabilization in annotation quality, contrary to the usual practice of employing a small number of annotators.
    Comment: In publication at the Semantic Web Journal.
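    The contrast between majority vote and a disagreement-aware score can be sketched for a single annotation unit. This is a simplified sketch in the spirit of the CrowdTruth unit-annotation score, assuming each worker picks exactly one label and scoring each label by the cosine similarity between the unit's vote-count vector and that label's one-hot vector. The published CrowdTruth metrics also weight workers and annotations by quality, which this sketch omits; the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def label_scores(labels):
    """Score each label by the cosine between the vote vector and its one-hot vector.

    For a vote-count vector v, cos(v, e_label) = v[label] / ||v||, so a unit where
    every worker agrees scores 1.0, while an evenly split unit scores lower for
    every label, exposing ambiguity that majority vote hides.
    (Simplified: no worker-quality or annotation-quality weighting.)
    """
    counts = Counter(labels)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {label: c / norm for label, c in counts.items()}


clear_unit = ["treats"] * 5
ambiguous_unit = ["treats", "causes", "treats", "causes"]

print(majority_vote(clear_unit), label_scores(clear_unit))
print(majority_vote(ambiguous_unit), label_scores(ambiguous_unit))
```

    Both units receive the same majority-vote label, but the scores show that the second is split evenly between two readings, which is the kind of signal the abstract argues majority vote discards.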

    Health Misinformation in Search and Social Media

    People increasingly rely on the Internet to search for and share health-related information. Indeed, searching for and sharing information about medical treatments are among the most frequent uses of online data. While this is a convenient and fast way to collect information, online sources may contain incorrect information that has the potential to cause harm, especially if people believe what they read without further research or professional medical advice. The goal of this thesis is to address the misinformation problem in two of the most commonly used online services: search engines and social media platforms. We examined how people use these platforms to search for and share health information. To achieve this, we designed controlled laboratory user studies and employed large-scale social media data analysis tools. The solutions proposed in this thesis can be used to build systems that better support people's health-related decisions. The techniques described in this thesis address online searching and social media sharing as follows.
    First, with respect to search engines, we aimed to determine the extent to which people can be influenced by search engine results when trying to learn about the efficacy of various medical treatments. We conducted a controlled laboratory study in which we biased the search results toward either correct or incorrect information, then asked participants to determine the efficacy of different medical treatments. Results showed that search result bias significantly influenced people, both positively and negatively. More importantly, when participants were exposed to incorrect information, they made more incorrect decisions than when they had no interaction with the search results. Following from this work, we extended the study, using the think-aloud method, to gain insight into the strategies people use during this decision-making process. We found that, even with verbalization, people were strongly influenced by search result bias. We also noted that people paid attention to majority opinion, authoritativeness, and content quality when evaluating online content. Understanding the effects of cognitive biases that arise during online search is complex because some biases are unconscious (such as the influence of search result ranking) and the think-aloud method fails to reveal them.
    Moving to social media, we first proposed a solution to detect and track misinformation. Using Zika as a case study, we developed a tool for tracking misinformation on Twitter. We collected 13 million tweets regarding the Zika outbreak and tracked rumors outlined by the World Health Organization and the Snopes fact-checking website. We combined input from health professionals, crowdsourcing, and machine learning to capture health-related rumors as well as clarification communications. In this way, we showed how the proposed tools provide insight into potentially harmful information on social media, allowing public health researchers and practitioners to respond with targeted and timely action. Building on the identification of rumor-bearing tweets, we then examined individuals on social media who post questionable health-related information, in particular those promoting cancer treatments that have been shown to be ineffective. Specifically, we studied 4,212 Twitter users who had posted about one of 139 ineffective "treatments" and compared them to a baseline of users generally interested in cancer. Using features that capture user attributes, writing style, and sentiment, we built a classifier that identifies users prone to propagating such misinformation. This classifier achieved an accuracy of over 90%, providing a potential tool for public health officials to identify such individuals for preventive intervention.

    Framework for opinion as a service on review data of customer using semantics based analytics

    Opinion mining plays a significant role in representing the original and unbiased perception of products and services. However, there are various challenges in performing effective opinion mining in the present era of distributed computing systems with dynamic user behaviour. Existing approaches are laborious: the knowledge extracted from user reviews must go through multiple rounds of complex processing. The proposed system addresses this problem by introducing a novel framework called Opinion-as-a-Service, which allows the extracted knowledge to be used directly in a user-friendly manner. The proposed system introduces a set of three sequential algorithms that aggregate the incoming stream of opinion data, index it, and then apply semantics to extract knowledge. The results show that the proposed system outperforms existing systems in mining performance.

    Using crowdsourced geospatial data to aid in nuclear proliferation monitoring

    In 2014, a Defense Science Board Task Force was convened to assess and explore new technologies that would aid in nuclear proliferation monitoring. One of its recommendations was for the Director of National Intelligence to explore ways that crowdsourced geospatial imagery technologies could aid existing governmental efforts. Our research builds directly on this recommendation and provides feedback on some of the most successful examples of crowdsourced geospatial data (CGD). As of 2016, Special Operations Command (SOCOM) has assumed the new role of primary U.S. agency responsible for counter-proliferation. Historically, this institution has relied on other organizations to execute its many mission sets. SOCOM's unique ability to build relationships makes it particularly suited to harnessing CGD technologies and employing them in the capacity our research recommends. Furthermore, CGD is a low-cost, high-impact tool that is already being employed by commercial companies and non-profit groups around the world. By employing CGD, a wider whole-of-government effort can be created that provides a long-term, cohesive engagement plan for a multi-faceted nuclear proliferation monitoring process.
    http://archive.org/details/usingcrowdsource1094551570
    Major, United States Army; Major, United States Army. Approved for public release; distribution is unlimited.

    Improving User Experience In Information Retrieval Using Semantic Web And Other Technologies

    The need to find, access, and extract information has motivated many different fields of research in the past few years. Fields such as Machine Learning, Question Answering Systems, and the Semantic Web each try to address part of this problem. Each of these fields has introduced many different tools and approaches, many of which are multidisciplinary, drawing on more than one field to provide solutions for one or more of them. Meanwhile, the expansion of the Web with Web 2.0 gave researchers many new tools for helping users find and extract information faster and more easily. Today, the scale of e-commerce and online shopping, the extended use of search engines for different purposes, and the amount of collaborative content creation on the Web present various possibilities and challenges, some of which we address here.

    A classification of social media methods of environmental scanning for entrepreneurial opportunity development

    A limited amount of scholarly literature has focused on environmental scanning and the use of social media by nascent entrepreneurs. This paper aims to address this gap in the literature. A theoretical framework is presented that describes the level of scanning towards entrepreneurial opportunity development and includes fifteen social media based methods for scanning the environment with the objective of developing entrepreneurial opportunities. These methods are reviewed in terms of their data collection, interpretation, and learning. Several implications for both practice and future research derive from this framework and are discussed.

    A Visual Analytics System for Making Sense of Real-Time Twitter Streams

    Massive amounts of data are being produced through social media platforms. Twitter, as one such platform, enables users to post “tweets” on an unprecedented scale. When analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when existing ML approaches are applied to real-time data streams, covariate shifts in the data (i.e., changes in the distributions of the inputs to ML algorithms) introduce various types of bias and make the outputs uncertain. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) to help users make sense of real-time Twitter streams. As proofs of concept, public-health and political discussions were analyzed. The system not only provides categorized and aggregate results but also enables stakeholders to diagnose errors in the outcome and heuristically suggest fixes for them.