
    DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

    During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to gain timely awareness of crisis circumstances and expedite rescue operations. While existing works use such information to build models for crisis event analysis, fully supervised approaches require annotating vast amounts of data and are impractical given the limited response time. Semi-supervised models, on the other hand, can be biased, performing moderately well on certain classes while performing extremely poorly on others, which can substantially harm disaster monitoring and rescue. In this paper, we first study two recent debiasing methods on semi-supervised crisis tweet classification. We then propose a simple but effective debiasing method, DeCrisisMB, that uses a Memory Bank to store generated pseudo-labels and sample them equally from each class at each training iteration. Extensive experiments compare the performance and generalization ability of different debiasing methods in both in-distribution and out-of-distribution settings. The results demonstrate the superior performance of our proposed method. Our code is available at https://github.com/HenryPengZou/DeCrisisMB.
    Comment: Accepted by EMNLP 2023 (Findings).
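    The memory-bank idea (store pseudo-labels per class, then sample each class equally at every iteration) can be sketched as below. This is a minimal illustration under stated assumptions, not the DeCrisisMB code from the linked repository: the class name ClassBalancedMemoryBank, the per-class FIFO capacity, the 0.9 confidence threshold, and sampling with replacement are all hypothetical choices made for the example.

```python
import random
from collections import deque


class ClassBalancedMemoryBank:
    """Hypothetical sketch of a class-balanced pseudo-label memory bank.

    Each class keeps its own bounded FIFO queue, so a majority class cannot
    crowd out minority classes; sampling draws the same count per class.
    """

    def __init__(self, num_classes, capacity_per_class, seed=0):
        self.queues = [deque(maxlen=capacity_per_class) for _ in range(num_classes)]
        self.rng = random.Random(seed)

    def add(self, examples, pseudo_labels, confidences, threshold=0.9):
        # Assumed confidence filter: keep only pseudo-labels at or above threshold.
        for x, y, c in zip(examples, pseudo_labels, confidences):
            if c >= threshold:
                self.queues[y].append((x, y))  # oldest entry is evicted when full

    def sample(self, per_class):
        # Equal sampling: draw per_class items from every non-empty class queue.
        # Sampling with replacement lets sparse classes still fill their quota.
        batch = []
        for queue in self.queues:
            if queue:
                batch.extend(self.rng.choices(list(queue), k=per_class))
        return batch


bank = ClassBalancedMemoryBank(num_classes=3, capacity_per_class=100)
bank.add(["t1", "t2", "t3", "t4", "t5"], [0, 0, 0, 2, 1], [0.95, 0.97, 0.99, 0.93, 0.5])
batch = bank.sample(per_class=2)
print(sorted(y for _, y in batch))  # [0, 0, 2, 2]
```

    Although class 0 produced three confident pseudo-labels and class 2 only one, each contributes two items to the batch; class 1 contributes nothing because its only pseudo-label fell below the assumed threshold.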

    Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

    Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since annotating data for traditional supervised classification is costly and time-consuming, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference, using the U.S. Census, national and state political polls, and the Cook Partisan Voting Index as population-level data. In LLP classification settings, the training data is divided into a set of unlabeled bags where only the label distribution of each bag is known, removing the requirement for instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization that supports weights for samples inside bags, which makes it applicable in this setting where bags are arranged hierarchically (e.g., county-level bags are nested inside state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, reducing error by 28-44% relative to baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities that are typically not feasible with traditional polling methods.
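    A label-regularization objective with per-sample weights and nested bags might look like the sketch below. It assumes the classic label-regularization form, a KL divergence between each bag's known label proportions and the weighted average of the model's predicted class distributions over that bag's members; the abstract does not give WLR's exact formula or weighting scheme, so the function name weighted_label_regularization and the unit weights in the example are hypothetical.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def weighted_label_regularization(logits, bags, bag_weights, bag_targets, eps=1e-12):
    """Sum over bags of KL(target proportions || weighted mean prediction).

    logits:      (n, k) instance scores from any classifier
    bags:        list of index arrays; an instance may sit in several bags,
                 e.g. a county-level bag and the state-level bag containing it
    bag_weights: list of non-negative weight arrays aligned with `bags`
    bag_targets: (b, k) known label proportions per bag (census, polls, ...)
    """
    probs = softmax(logits)
    loss = 0.0
    for idx, w, target in zip(bags, bag_weights, bag_targets):
        # Weighted average of the predicted class distributions inside the bag.
        p_hat = (w[:, None] * probs[idx]).sum(axis=0) / w.sum()
        loss += float(np.sum(target * (np.log(target + eps) - np.log(p_hat + eps))))
    return loss


logits = np.zeros((4, 2))                      # uniform predictions
county_a, county_b = np.array([0, 1]), np.array([2, 3])
state = np.array([0, 1, 2, 3])                 # nests both counties
bags = [county_a, county_b, state]
weights = [np.ones(2), np.ones(2), np.ones(4)]
targets = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
print(round(weighted_label_regularization(logits, bags, weights, targets), 3))
```

    Representing bags as index arrays rather than one bag id per instance is what lets a tweet count toward both its county bag and its state bag; minimizing this loss with respect to the classifier's parameters pushes each bag's weighted predicted proportions toward the population-level targets.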

    A heuristic approach to flood evacuation planning

    Flood evacuation planning models are an important tool used in preparation for flooding events. Authorities use the plans generated by flood evacuation models to evacuate the population as quickly as possible. Contemporary models consider the whole solution space and use a stochastic search to explore it and produce solutions. One issue with stochastic approaches is that they cannot guarantee the optimality of the solution, and it is important that the plans be of high quality. We present a heuristically driven flood evacuation planning model; the proposed heuristic is deterministic, which allows the model to avoid this problem. The determinism of the model means that the optimality of the solutions found can be readily verified.
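    To illustrate what determinism buys here, the sketch below implements a simple greedy assignment rule whose output depends only on its inputs, so re-running it always reproduces the same plan. It is a hypothetical stand-in, not the heuristic from this paper (which the abstract does not describe): the zone and route names, the per-step route capacities, and the "earliest completion time" rule are all assumptions made for the example.

```python
def greedy_evacuation_plan(zones, routes):
    """Hypothetical deterministic heuristic: assign each zone to one route.

    zones:  zone name -> population to evacuate
    routes: route name -> people the route can move per time step
    Returns (zone -> route assignment, makespan in time steps).
    """

    def steps(people, route):
        return -(-people // routes[route])  # ceiling division

    load = {route: 0 for route in routes}
    plan = {}
    # Fixed processing order (largest population first, ties by name) and
    # fixed tie-breaking among routes make every run produce the same plan.
    for zone in sorted(zones, key=lambda z: (-zones[z], z)):
        best = min(sorted(routes), key=lambda r: steps(load[r] + zones[zone], r))
        load[best] += zones[zone]
        plan[zone] = best
    makespan = max(steps(load[r], r) for r in routes)
    return plan, makespan


zones = {"A": 900, "B": 500, "C": 400}
routes = {"north": 100, "south": 50}
print(greedy_evacuation_plan(zones, routes))
# ({'A': 'north', 'B': 'south', 'C': 'north'}, 13)
```

    Because there is no random sampling, any plan the procedure reports can be regenerated exactly and checked step by step, which is the property the abstract contrasts with stochastic search.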

    Covid-19 in France: Crisis communication and on-the-fly training of populations

    Of the world's countries, 99.3% have been affected by the Covid-19 pandemic, which has been raging since the end of 2019 and continues today in 2021. What began as a health crisis quickly turned into a pandemic spanning every dimension of a crisis (humanitarian, economic, social, etc.). In this article, we focus on two important aspects of crisis management: the communication and dissemination of information, and the preparation of populations for risks. Drawing on interviews with members of the population, we illustrate the importance of communicating information, of preparing populations for future crises, and of the impact of populations' behaviors.

    The use and reporting of airline passenger data for infectious disease modelling: a systematic review

    Background: A variety of airline passenger data sources are used for modelling the international spread of infectious diseases. Questions exist regarding the suitability and validity of these sources. Aim: We conducted a systematic review to identify the sources of airline passenger data used for these purposes and to assess the validation of the data and the reproducibility of the methodology. Methods: Articles matching our search criteria and describing a model of the international spread of human infectious disease, parameterised with airline passenger data, were identified. Information regarding the type and source of airline passenger data used was collated, and the studies' reproducibility was assessed. Results: We identified 136 articles. The majority (n = 96) sourced data primarily used by the airline industry. Governmental data sources were used in 30 studies, and data published by individual airports in four studies. Validation of passenger data was conducted in only seven studies. No study was found to be fully reproducible, although eight were partially reproducible. Limitations: Because the review was limited to articles on international spread, articles focussed on within-country transmission were excluded, even if they used relevant data sources. Authors were not contacted to clarify their methods. Searches were limited to articles in PubMed, Web of Science and Scopus. Conclusion: We recommend greater efforts to assess the validity and biases of airline passenger data used for modelling studies, particularly when model outputs are to inform national and international public health policies. We also recommend improving reporting standards and conducting more detailed studies on biases in commercial and open-access data to assess their reproducibility.
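    The headline counts are easier to compare as shares of the 136 included articles. The short helper below (the name shares is hypothetical) only restates the figures reported in the abstract as percentages; it adds no new data.

```python
def shares(counts, total):
    """Percentage of `total` for each count, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


reported = {
    "airline-industry data": 96,
    "governmental data": 30,
    "airport-published data": 4,
    "validated passenger data": 7,
    "partially reproducible": 8,
}
for name, pct in shares(reported, total=136).items():
    print(f"{name}: {pct}%")
```

    This gives about 70.6% of articles relying primarily on airline-industry data, 5.1% validating their passenger data, and 5.9% being partially reproducible. The three source categories sum to 130 rather than 136, and the abstract does not say how the remaining articles are classified, so each share is taken over all 136 articles.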