7 research outputs found

    Explainable and High-Performance Hate and Offensive Speech Detection

    Full text link
    The spread of information through social media platforms can create environments possibly hostile to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech in social media platforms could incorrectly exclude individuals from social media platforms, which can reduce trust, there is a need to create explainable and interpretable models. Thus, we build an explainable and interpretable high performance model based on the XGBoost algorithm, trained on Twitter data. For unbalanced Twitter data, XGboost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection with an F1 score of 0.75 compared to 0.38 and 0.37, and 0.38 respectively. When we down-sampled the data to three separate classes of approximately 5000 tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT; with F1 scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66 respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the down-sampled version for offensive speech detection with F1 score of 0.83 vs 0.88, 0.82, and 0.79 respectively. We use Shapley Additive Explanations (SHAP) on our XGBoost models' outputs to makes it explainable and interpretable compared to LSTM, AutoGluon and ULMFiT that are black-box models

    How is Vaping Framed on Online Knowledge Dissemination Platforms?

    Full text link
    We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venues for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to design informational tools to reinforce or mitigate vaping (mis)perceptions online.Comment: arXiv admin note: text overlap with arXiv:2206.07765, arXiv:2206.0902

    US News and Social Media Framing around Vaping

    Get PDF
    In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as; flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic model, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as combustible cigarette substitute. We use n-grams to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk based concerns. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, thus possibly aiding smoking cessation; and reducing youth vaping

    Partisan US News Media Representations of Syrian Refugees

    Full text link
    We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims, welcome in the US, and right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations, and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes

    The Right To Confront Your Accusers: Opening the Black Box of Forensic DNA Software

    No full text
    The results of forensic DNA software systems are regularly introduced as compelling evidence in criminal trials, but requests by defendants to evaluate how these results are generated are often denied. Furthermore, there is mounting evidence of problems such as failures to disclose substantial changes in methodology to oversight bodies and substantial differences in the results generated by different software systems. In a society that purports to guarantee defendants the right to face their accusers and confront the evidence against them, what then is the role of black-box forensic software systems in moral decision making in criminal justice? In this paper, we examine the case of the Forensic Statistical Tool (FST), a forensic DNA system developed in 2010 by New York City\u27s Office of Chief Medical Examiner (OCME). For over 5 years, expert witness review requested by defense teams was denied, even under protective order, while the system was used in over 1300 criminal cases. When the first expert review was finally permitted in 2016, many problems were identified including an undisclosed function capable of dropping evidence that could be beneficial to the defense. Overall, the findings were so substantial that a motion to release the full source code of FST publicly was granted. In this paper, we quantify the impact of this undisclosed function on samples from OCME\u27s own validation study and discuss the potential impact on individual defendants. Specifically, we find that 104 of the 439 samples (23.7%) triggered the undisclosed data-dropping behavior and that the change skewed results toward false inclusion for individuals whose DNA was not present in an evidence sample. Beyond this, we consider what changes in the criminal justice system could prevent problems like this from going unresolved in the future

    The Right To Confront Your Accusers: Opening the Black Box of Forensic DNA Software

    No full text
    The results of forensic DNA software systems are regularly introduced as compelling evidence in criminal trials, but requests by defendants to evaluate how these results are generated are often denied. Furthermore, there is mounting evidence of problems such as failures to disclose substantial changes in methodology to oversight bodies and substantial differences in the results generated by different software systems. In a society that purports to guarantee defendants the right to face their accusers and confront the evidence against them, what then is the role of black-box forensic software systems in moral decision making in criminal justice? In this paper, we examine the case of the Forensic Statistical Tool (FST), a forensic DNA system developed in 2010 by New York City\u27s Office of Chief Medical Examiner (OCME). For over 5 years, expert witness review requested by defense teams was denied, even under protective order, while the system was used in over 1300 criminal cases. When the first expert review was finally permitted in 2016, many problems were identified including an undisclosed function capable of dropping evidence that could be beneficial to the defense. Overall, the findings were so substantial that a motion to release the full source code of FST publicly was granted. In this paper, we quantify the impact of this undisclosed function on samples from OCME\u27s own validation study and discuss the potential impact on individual defendants. Specifically, we find that 104 of the 439 samples (23.7%) triggered the undisclosed data-dropping behavior and that the change skewed results toward false inclusion for individuals whose DNA was not present in an evidence sample. Beyond this, we consider what changes in the criminal justice system could prevent problems like this from going unresolved in the future
    corecore