14,414 research outputs found

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    This book explores the evolution of big data by interrogating the notion that big data is a disruptive innovation that appears to challenge existing epistemologies in the humanities and social sciences. Exploring various (and often controversial) facets of big data, such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences.

    Intelligent Management and Efficient Operation of Big Data

    This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous, and very often unstructured, large data sources; the enhancement of the performance of the processing and networking (cloud) infrastructures that are the foundational pillars of Big Data applications and services; and novel ways to efficiently manage network infrastructures with high-level composed policies, supporting the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide-area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201
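    As a purely illustrative sketch (not the chapter's actual implementation), steering traffic with high-level composed policies can be pictured as mapping each flow's class and requirements to a forwarding decision; the flow attributes, class names, thresholds, and path labels below are assumptions invented for the example.

```python
from dataclasses import dataclass

# Toy policy engine: classify flows and pick a path according to a
# high-level, composable policy (video prefers low latency, large
# non-video transfers prefer high throughput). All names are illustrative.

@dataclass
class Flow:
    src: str
    dst: str
    kind: str          # "video" or "non-video"
    mbps: float        # offered load

POLICIES = [
    # (predicate, path label) evaluated in order
    (lambda f: f.kind == "video", "low-latency-path"),
    (lambda f: f.kind != "video" and f.mbps > 100, "high-throughput-path"),
    (lambda f: True, "best-effort-path"),
]

def route(flow: Flow) -> str:
    """Return the first path whose policy predicate matches the flow."""
    for predicate, path in POLICIES:
        if predicate(flow):
            return path
    return "best-effort-path"

if __name__ == "__main__":
    print(route(Flow("AS1", "AS2", "video", 25.0)))       # low-latency-path
    print(route(Flow("AS3", "AS4", "non-video", 400.0)))  # high-throughput-path
```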

    A bimodal accessibility analysis of Australia using web-based resources

    A range of potentially disruptive changes to research strategies has been taking root in the field of transport research. Many of these relate to the emergence of data sources and travel applications that are reshaping how we conduct accessibility analyses. This paper, based on Meire et al. (in press) and Meire and Derudder (under review), explores the potential of some of these data sources by focusing on a concrete example: we introduce a framework for (road and air) transport data extraction and processing using publicly available web-based resources that can be accessed via web Application Programming Interfaces (APIs), illustrated by a case study evaluating the combined land- and airside accessibility of Australia at the level of statistical units. Given that car and air travel (or a combination thereof) dominate the production of Australia's accessibility landscape, a systematic bimodal accessibility analysis based on the automated extraction of web-based data demonstrates the practical value of our research framework. The case study results show a largely expected accessibility pattern centred on major agglomerations, supplemented by a number of idiosyncratic and perhaps less expected geographical patterns. Beyond the lessons learned from the case study, we discuss some of the major strengths and limitations of web-based data accessed via web APIs for transport-related research topics.
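    A minimal sketch of the kind of API-based extraction described above, assuming a hypothetical routing endpoint (the URL, parameters, and response field are placeholders, not the services actually used in the paper): for each statistical unit, a road travel time to the nearest airport is requested and combined with an airside leg to approximate bimodal accessibility.

```python
import requests

ROUTING_URL = "https://example-routing-service/route"  # placeholder endpoint

def drive_minutes(origin, destination):
    """Query a (hypothetical) routing API for car travel time in minutes."""
    resp = requests.get(
        ROUTING_URL,
        params={"from": f"{origin[0]},{origin[1]}",
                "to": f"{destination[0]},{destination[1]}",
                "mode": "car"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["duration_minutes"]   # assumed response field

def bimodal_accessibility(unit_centroid, airport, flight_minutes):
    """Road leg from the statistical unit to its airport plus an airside leg."""
    return drive_minutes(unit_centroid, airport) + flight_minutes

# Example call (illustrative coordinates only):
# total = bimodal_accessibility((-33.87, 151.21), (-33.95, 151.18), flight_minutes=85)
```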

    What to do about non-standard (or non-canonical) language in NLP

    Real-world data differs radically from the benchmark corpora we use in natural language processing (NLP). As soon as we apply our technologies to the real world, performance drops. The reason for this problem is obvious: NLP models are trained on samples from a limited set of canonical varieties that are considered standard, most prominently English newswire. However, there are many dimensions (e.g., socio-demographics, language, genre, sentence type) on which texts can differ from the standard. The solution is not obvious: we cannot control for all factors, and it is not clear how best to go beyond the current practice of training on homogeneous data from a single domain and language. In this paper, I review the notion of canonicity and how it shapes our community's approach to language. I argue for leveraging what I call fortuitous data, i.e., non-obvious data that is hitherto neglected, hidden in plain sight, or raw data that needs to be refined. If we embrace the variety of this heterogeneous data by combining it with proper algorithms, we will not only produce more robust models, but will also enable adaptive language technology capable of addressing natural language variation. Comment: KONVENS 201

    Big Data Applications and Challenges in GIScience (Case Studies: Natural Disaster and Public Health Crisis Management)

    This dissertation examines the application and significance of user-generated big data in Geographic Information Science (GIScience), with a focus on managing natural disasters and public health crises. It explores the role of social media data in understanding human-environment interactions and in informing disaster management and public health strategies. A scalable computational framework is developed to model extensive unstructured geotagged data from social media, facilitating systematic spatiotemporal data analysis. The research investigates how individuals and communities respond to high-impact events such as natural disasters and public health emergencies, employing both qualitative and quantitative methods. In particular, it assesses the impact of socio-economic-demographic characteristics and the digital divide on social media engagement during such crises. In addressing the opioid crisis, the dissertation delves into the spatial dynamics of opioid overdose deaths, utilizing Multiscale Geographically Weighted Regression to discern local versus broader-scale determinants. This analysis foregrounds the necessity of targeted public health responses and the importance of localized data in crafting effective interventions, especially within communities that are ethnically diverse and economically disparate. Using Hurricane Irma as a case study, the dissertation analyzes social media activity in Florida in September 2017, leveraging Multiscale Geographically Weighted Regression to explore spatial variations in social media discourse, its correlation with damage severity, and the disproportionate impact on racialized communities. It integrates social media data analysis with political-ecological perspectives and spatial analytical techniques to reveal structural inequalities and political power differentials. The dissertation also tackles the dissemination of false information during the COVID-19 pandemic, examining Twitter activity in the United States from April to July 2020. It identifies misinformation patterns, their origins, and their association with the pandemic's incidence rates. Discourse analysis pinpoints tweets that downplay the pandemic's severity or spread disinformation, while spatial modeling investigates the relationship between social media discourse and disease spread. By concentrating on the experiences of racialized communities, this research aims to highlight and address the environmental and social injustices they face. It contributes empirical and methodological insights into effective policy formulation, with an emphasis on equitable responses to public health emergencies and natural disasters. The dissertation not only provides a nuanced understanding of crisis responses but also advances GIScience research by incorporating social media data into both traditional and critical analytical frameworks.
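    Multiscale Geographically Weighted Regression, used twice above, fits a separate spatial bandwidth for each covariate so that local and broader-scale determinants can be distinguished. Below is a minimal sketch assuming the open-source mgwr package; the synthetic coordinates and covariates are stand-ins, not the dissertation's actual opioid or hurricane data.

```python
import numpy as np
from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW

rng = np.random.default_rng(0)
n = 100
coords = rng.uniform(0, 100, size=(n, 2))      # projected centroids of areal units
X = rng.normal(size=(n, 2))                    # e.g. two socio-economic covariates
y = 1.5 * X[:, [0]] - 0.8 * X[:, [1]] + rng.normal(scale=0.5, size=(n, 1))

# MGWR is conventionally fit on standardized variables so that the
# covariate-specific bandwidths are comparable across processes.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()

selector = Sel_BW(coords, yz, Xz, multi=True)          # one bandwidth per covariate
bandwidths = selector.search(multi_bw_min=[2])         # selected spatial scales
results = MGWR(coords, yz, Xz, selector).fit()

print(bandwidths)            # small bandwidth = local process, large = broad-scale
print(results.params.shape)  # (n, k+1) local coefficient surfaces
```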

    AMIR: Automated MisInformation Rebuttal -- A COVID-19 Vaccination Datasets based Recommendation System

    Misinformation has emerged as a major societal threat in recent years; in the context of the COVID-19 pandemic specifically, it has wreaked havoc, for instance by fuelling vaccine hesitancy. Cost-effective, scalable solutions for combating misinformation are the need of the hour. This work explores how existing information obtained from social media, augmented with more curated fact-checked data repositories, can be harnessed to facilitate automated rebuttal of misinformation at scale. While the ideas herein can be generalized and reapplied in the broader context of misinformation mitigation, using a multitude of information sources and catering to the spectrum of social media platforms, this work serves as a proof of concept and is therefore confined in scope to the rebuttal of tweets, in the specific context of misinformation regarding COVID-19. It leverages two publicly available datasets, viz. FaCov (fact-checked articles) and a dataset of misleading social media (Twitter) posts on COVID-19 vaccination.
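    One simple way to picture automated rebuttal of this kind (an illustrative sketch, not necessarily the pipeline used in the paper) is to retrieve, for each misleading tweet, the most similar fact-checked article and recommend it as the rebuttal; the example texts below are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Fact-checked articles (stand-ins for a repository such as FaCov).
articles = [
    "COVID-19 vaccines underwent large clinical trials and are closely monitored.",
    "mRNA vaccines do not alter human DNA.",
    "Vaccines are not linked to microchips or tracking devices.",
]

tweets = [
    "Heard the new vaccine changes your DNA, scary stuff!",
    "They put microchips in the covid shots to track us.",
]

vectorizer = TfidfVectorizer(stop_words="english")
article_vecs = vectorizer.fit_transform(articles)
tweet_vecs = vectorizer.transform(tweets)

# Recommend the best-matching fact-checked article as a rebuttal.
similarity = cosine_similarity(tweet_vecs, article_vecs)
for tweet, scores in zip(tweets, similarity):
    best = scores.argmax()
    print(f"Tweet: {tweet}\n  Suggested rebuttal: {articles[best]}\n")
```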

    Mining Big Data for Tourist Hot Spots: Geographical Patterns of Online Footprints

    Understanding the complex, and often unequal, spatiality of tourist demand in urban contexts requires new methodologies, among which the information available online and in social networks has gained prominence. Innovation supported by Information and Communication Technologies in terms of data access and data exchange has emerged as a complementary tool to the more traditional data collection techniques currently in use, particularly in urban destinations where there is a need for more (near) real-time monitoring. The capacity to collect and analyse massive amounts of data on individual and group behaviour is leading to new data-rich research approaches. This chapter addresses the potential for discovering geographical insights regarding tourists' spatial patterns within a destination, based on the analysis of geotagged data available from two social networks.
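    A common way to turn geotagged footprints into hot spots (shown here purely as an illustration; the chapter's own method may differ) is density-based clustering of post coordinates, e.g. DBSCAN with a haversine distance threshold. The sample coordinates and radius below are invented.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Geotagged posts as (latitude, longitude) pairs; invented sample points.
points_deg = np.array([
    [38.7139, -9.1394], [38.7142, -9.1390], [38.7137, -9.1399],  # cluster A
    [38.6979, -9.2069], [38.6981, -9.2065],                      # cluster B
    [38.7500, -9.1000],                                          # isolated point
])

# DBSCAN with the haversine metric expects coordinates in radians and
# eps expressed as an angle: radius_km / Earth radius (~6371 km).
radius_km = 0.2
db = DBSCAN(eps=radius_km / 6371.0, min_samples=2, metric="haversine")
labels = db.fit_predict(np.radians(points_deg))

print(labels)  # e.g. [0 0 0 1 1 -1]; -1 marks points outside any hot spot
```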

    Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

    COVID-19 impacted every part of the world, and misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSNs) often misled people from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots, which act as a catalyst in the spread, and focuses on fake news detection in 'articles shared in posts' rather than in the post (textual) content itself. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as the number of mentions, hashtags, and URLs in a tweet) to act as additional indicators for detecting misinformation. Moreover, we analyze the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 million COVID-19-related tweets and used our annotation model to build an extensive and original ground-truth dataset for classification purposes. We utilize various machine learning models to detect misinformation, and our best classification model achieves a precision of 82%, a recall of 96%, and a false positive rate of 3.58%. Our bot analysis also indicates that bots generated approximately 10% of the misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms.
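    The feature combination described above can be sketched with scikit-learn by joining TF-IDF text features with numeric user- and tweet-level columns in a single pipeline; this is an illustrative sketch with invented column names and toy data, not the authors' actual model or dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for a labeled tweet dataset (1 = misinformation).
df = pd.DataFrame({
    "text": ["masks do not work at all", "official guidance on vaccination",
             "5g towers spread the virus", "hospital shares icu statistics"],
    "followers_count": [120, 54000, 87, 230000],
    "friends_count": [300, 1200, 150, 800],
    "num_hashtags": [3, 1, 5, 0],
    "num_urls": [0, 1, 0, 1],
    "label": [1, 0, 1, 0],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),                       # textual features
    ("meta", "passthrough", ["followers_count", "friends_count",
                             "num_hashtags", "num_urls"]),     # user/tweet features
])

model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

model.fit(df.drop(columns="label"), df["label"])
print(model.predict(df.drop(columns="label")))
```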

    Data-Driven Meets Theory-Driven Research in the Era of Big Data: Opportunities and Challenges for Information Systems Research

    The era of big data provides many opportunities for conducting impactful research from both data-driven and theory-driven perspectives. However, data-driven and theory-driven research have progressed somewhat independently. In this paper, we develop a framework that articulates important differences between these two perspectives and propose a role for information systems research at their intersection. The framework presents a set of pathways that combine the data-driven and theory-driven perspectives. From these pathways, we derive a set of challenges and show how they can be addressed by research in information systems. By doing so, we identify an important role that information systems research can play in advancing both data-driven and theory-driven research in the era of big data.