7 research outputs found
Explainable and High-Performance Hate and Offensive Speech Detection
The spread of information through social media platforms can create
environments possibly hostile to vulnerable communities and silence certain
groups in society. To mitigate such instances, several models have been
developed to detect hate and offensive speech. Since detecting hate and
offensive speech in social media platforms could incorrectly exclude
individuals from social media platforms, which can reduce trust, there is a
need to create explainable and interpretable models. Thus, we build an
explainable and interpretable high performance model based on the XGBoost
algorithm, trained on Twitter data. For unbalanced Twitter data, XGBoost
outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection
with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38, respectively. When
we down-sampled the data to three separate classes of approximately 5000
tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with F1
scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66, respectively.
XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the
down-sampled version for offensive speech detection with an F1 score of 0.83 vs
0.88, 0.82, and 0.79, respectively. We use Shapley Additive Explanations (SHAP)
on our XGBoost models' outputs to make them explainable and interpretable
compared to LSTM, AutoGluon, and ULMFiT, which are black-box models.
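The per-class F1 scores quoted in this abstract can be illustrated with a minimal sketch. The function name `per_class_f1` and the toy labels are assumptions made for illustration; they are not the authors' code or data, and the paper may have computed its scores with a library routine instead.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only; not data from the paper.
y_true = ["hate", "hate", "offensive", "neither", "offensive", "hate"]
y_pred = ["hate", "offensive", "offensive", "neither", "offensive", "neither"]
scores = per_class_f1(y_true, y_pred)
```

Reporting F1 per class, rather than overall accuracy, matters on unbalanced data because a rare class such as hate speech can be missed almost entirely while accuracy stays high.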
How is Vaping Framed on Online Knowledge Dissemination Platforms?
We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is
framed across multiple knowledge dissemination platforms (Wikipedia, Quora,
Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to
understand these differences. For example, n-grams, emotion recognition, and
question answering results indicate that Medium, Quora, and Stack Exchange are
appropriate venues for those looking to transition from smoking to vaping.
Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not
sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping
harms, dissuading smokers from transitioning. A strength of our work is how the
different techniques we have applied validate each other. Based on our results,
we provide several recommendations. Stakeholders may utilize our findings to
design informational tools to reinforce or mitigate vaping (mis)perceptions
online.
Comment: arXiv admin note: text overlap with arXiv:2206.07765,
arXiv:2206.0902
US News and Social Media Framing around Vaping
In this paper, we investigate how vaping is framed differently (2008-2021)
between US news and social media. We analyze 15,711 news articles and 1,231,379
Facebook posts about vaping to study the differences in framing between media
varieties. We use word embeddings to provide two-dimensional visualizations of
the semantic changes around vaping for news and for social media. We detail
that news media framing of vaping shifted over time in line with emergent
regulatory trends, such as flavored vaping bans, with little discussion around
vaping as a smoking cessation tool. We found that social media discussions were
far more varied, with transitions toward vaping both as a public health harm
and as a smoking cessation tool. Our cloze test, dynamic topic model, and
question answering showed similar patterns, where social media, but not news
media, characterizes vaping as a combustible cigarette substitute. We use n-grams
to detail that social media data first centered on vaping as a smoking
cessation tool, and in 2019 moved toward narratives around vaping regulation,
similar to news media frames. Overall, social media tracks the evolution of
vaping as a social practice, while news media reflects more risk-based
concerns. A strength of our work is how the different techniques we have
applied validate each other. Stakeholders may utilize our findings to intervene
around the framing of vaping, and may design communications campaigns that
improve the way society sees vaping, thus possibly aiding smoking cessation
and reducing youth vaping.
Partisan US News Media Representations of Syrian Refugees
We investigate how representations of Syrian refugees (2011-2021) differ
across US partisan news outlets. We analyze 47,388 articles from the online US
media about Syrian refugees to detail differences in reporting between left-
and right-leaning media. We use various NLP techniques to understand these
differences. Our polarization and question answering results indicated that
left-leaning media tended to represent refugees as child victims, welcome in
the US, and right-leaning media cast refugees as Islamic terrorists. We noted
similar results with our sentiment and offensive speech scores over time, which
detail possibly unfavorable representations of refugees in right-leaning media.
A strength of our work is how the different techniques we have applied validate
each other. Based on our results, we provide several recommendations.
Stakeholders may utilize our findings to intervene around refugee
representations, and design communications campaigns that improve the way
society sees refugees and possibly aid refugee outcomes.
The Right To Confront Your Accusers: Opening the Black Box of Forensic DNA Software
The results of forensic DNA software systems are regularly introduced as compelling evidence in criminal trials, but requests by defendants to evaluate how these results are generated are often denied. Furthermore, there is mounting evidence of problems such as failures to disclose substantial changes in methodology to oversight bodies and substantial differences in the results generated by different software systems. In a society that purports to guarantee defendants the right to face their accusers and confront the evidence against them, what then is the role of black-box forensic software systems in moral decision making in criminal justice? In this paper, we examine the case of the Forensic Statistical Tool (FST), a forensic DNA system developed in 2010 by New York City's Office of Chief Medical Examiner (OCME). For over 5 years, expert witness review requested by defense teams was denied, even under protective order, while the system was used in over 1300 criminal cases. When the first expert review was finally permitted in 2016, many problems were identified including an undisclosed function capable of dropping evidence that could be beneficial to the defense. Overall, the findings were so substantial that a motion to release the full source code of FST publicly was granted. In this paper, we quantify the impact of this undisclosed function on samples from OCME's own validation study and discuss the potential impact on individual defendants. Specifically, we find that 104 of the 439 samples (23.7%) triggered the undisclosed data-dropping behavior and that the change skewed results toward false inclusion for individuals whose DNA was not present in an evidence sample. Beyond this, we consider what changes in the criminal justice system could prevent problems like this from going unresolved in the future.
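As a quick arithmetic check of the proportion quoted in this abstract, the short sketch below recomputes the share of validation samples that triggered the data-dropping behavior. Only the two counts come from the abstract; the variable names are illustrative.

```python
# Counts taken from the abstract: 104 of 439 validation samples.
triggered = 104
total = 439

# Fraction of samples affected, formatted as a percentage with one decimal.
share = triggered / total
formatted = f"{share:.1%}"
print(formatted)
```

The result rounds to the 23.7% reported in the abstract.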